Hands-On PySpark for Big Data Analysis

By
Udemy

Mode

Online

Fees

₹449 (list price ₹3,499)

Quick Facts

Medium of instruction: English
Mode of learning: Self study
Mode of delivery: Video and text based

Course and certificate fees

Fees information

₹449 (list price ₹3,499)

Certificate availability

Yes

Certificate providing authority

Udemy

What you will learn

Knowledge of big data

The syllabus

Install PySpark and Setup Your Development Environment

  • The Course Overview
  • Core Concepts in Spark and PySpark
  • Setting Up Spark on Windows and PySpark
  • SparkContext, SparkConf and Spark Shell

Getting Your Big Data into the Spark Environment Using RDDs

  • Loading Data onto Spark RDDs
  • Parallelization with Spark RDDs
  • RDD Operation Basics

Big Data Cleaning and Wrangling with Spark Notebooks

  • Using Spark Notebooks for Quick Iteration of Ideas
  • Sampling/Filtering RDDs to Pick-Out Relevant Data Points
  • Splitting Datasets and Creating New Combinations with Set Operations

Aggregating and Summarizing Data into Useful Reports

  • Calculating Averages with Map and Reduce
  • Faster Average Computation with Aggregate
  • Pivot Tabling with Key-Value Paired Data Points

Powerful Exploratory Data Analysis with MLlib

  • Computing Summary Statistics with MLlib
  • Using Pearson and Spearman to Discover Correlations
  • Testing Your Hypotheses on Large Datasets

Putting Structure on Your Big Data with SparkSQL

  • Manipulating DataFrames with SparkSQL Schemas
  • Using the Spark DSL to Build Queries for Structured Data Operations
