Version: 1.0

Vanilla Spark on your Local Machine (Optional)

Goal: see what Spark looks like when you run it on your local machine.

Running Spark locally: Spark Shell & UI

  • Basics of Spark Shell
  • Basics of Spark UI
    • Accessible through the Spark shell
    • Also available in Databricks notebooks

To practice the concepts and examples demonstrated in the video above, you need specific versions of Apache Spark and Java Standard Edition installed on your machine. Follow the instructions here to do so.

Spark Ecosystem, Spark Session (Spark Object)

Spark Program Structure

Spark Applications and Jobs

Spark Input Partitions, Stages, and DAGs

Spark Tasks and Operations

Do it yourself!

Starting from the video below, a Git repo is used to demonstrate the examples in PyCharm. If you would like to follow along on your local machine, set up the repo in PyCharm by following the instructions here.

Spark Read data from files - Part 1

Spark Read data from files - Part 2

Spark groupBy and Shuffle Partitions - Part 1

Spark groupBy and Shuffle Partitions - Part 2

Spark Repartitions - Part 1

Spark Repartitions - Part 2

Spark Program on the cluster - detailed explanation

Spark PartitionBy

Spark BucketBy

Spark Submit - Part 1

Spark Submit - Part 2

Spark Submit - Part 3

Spark Structured Streaming - Introduction

Spark Structured Streaming - Example 1

Spark Structured Streaming - Example 2

Spark Structured Streaming - Example 3

Spark Caching - Introduction

Spark Caching - Example 1

Spark Caching - Example 2

Spark Caching - Example 3

Summary