Version: 2.0
🕑 Estimated time for completion

This section takes about 3h10m to complete.

Vanilla Spark on your Local Machine (Bonus)

Ever wonder what Spark looks like when it runs on your local machine? Join Syed as he walks you through some Vanilla Spark basics!

Running Spark locally: Spark Shell & UI

  • Basics of the Spark Shell
  • Basics of the Spark UI
    • Accessible from the Spark Shell
    • Also available in Databricks Notebooks

To practice the concepts and examples that are demonstrated in the above video, specific versions of Apache Spark and Java Standard Edition are required to be installed on your machines. Follow the instructions here to do so.

Spark Ecosystem, Spark Session (Spark Object)

Spark Program Structure

Spark Applications and Jobs

Spark Input Partitions, Stages, and DAGs

Spark Tasks and Operations

Do it yourself!

Starting with the video below, a Git repo is used to demonstrate the examples in PyCharm. If you would like to follow along on your local machine, set up the repo in PyCharm by following the instructions here.

Spark Read data from files - Part 1

Spark Read data from files - Part 2

Spark groupBy and Shuffle Partitions - Part 1

Spark groupBy and Shuffle Partitions - Part 2

Spark Repartitions - Part 1

Spark Repartitions - Part 2

Spark Program on the cluster - detailed explanation

Spark PartitionBy

Spark BucketBy

Spark Submit - Part 1

Spark Submit - Part 2

Spark Submit - Part 3

Spark Structured Streaming - Introduction

Spark Structured Streaming - Example 1

Spark Structured Streaming - Example 2

Spark Structured Streaming - Example 3

Spark Caching - Introduction

Spark Caching - Example 1

Spark Caching - Example 2

Spark Caching - Example 3

Summary