Version: 2.0
🕑 Estimated time for completion

This section takes about 3h10m to complete.

Vanilla Spark on your Local Machine (Bonus)

Ever wonder what Spark looks like when it runs on your local machine? Join Syed as he walks you through some Vanilla Spark basics!

Running Spark locally: Spark Shell & UI

  • Basics of the Spark Shell
  • Basics of the Spark UI
    • Accessible from the Spark Shell
    • Also available in Databricks Notebooks

To practice the concepts and examples that are demonstrated in the above video, specific versions of Apache Spark and Java Standard Edition are required to be installed on your machines. Follow the instructions here to do so.

Spark Ecosystem, Spark Session (Spark Object)

Spark Program Structure

Spark Applications and Jobs

Spark Input Partitions, Stages, and DAGs

Spark Tasks and Operations

Do it yourself!

Starting with the video below, a Git repo is used to demonstrate the examples in PyCharm. If you would like to follow along on your local machine, set up the repo in PyCharm by following the instructions here.

Spark Read data from files - Part 1

Spark Read data from files - Part 2

Spark groupBy and Shuffle Partitions - Part 1

Spark groupBy and Shuffle Partitions - Part 2

Spark Repartitions - Part 1

Spark Repartitions - Part 2

Spark Program on the cluster - detailed explanation

Spark PartitionBy

Spark BucketBy

Spark Submit - Part 1

Spark Submit - Part 2

Spark Submit - Part 3

Spark Structured Streaming - Introduction

Spark Structured Streaming - Example 1

Spark Structured Streaming - Example 2

Spark Structured Streaming - Example 3

Spark Caching - Introduction

Spark Caching - Example 1

Spark Caching - Example 2

Spark Caching - Example 3

Summary