Introduction

Completed

Apache Spark is an open source parallel processing framework for large-scale data processing and analytics. Spark has become extremely popular in "big data" processing scenarios, and is available in multiple platform implementations; including Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.

This module explores how you can use Spark in Azure Synapse Analytics to ingest, process, and analyze data from a data lake. While the core techniques and code described in this module are common to all Spark implementations, the integrated tools and ability to work with Spark in the same environment as other Synapse analytical runtimes are specific to Azure Synapse Analytics.

After completing this module, you'll be able to:

  • Identify core features and capabilities of Apache Spark.
  • Configure a Spark pool in Azure Synapse Analytics.
  • Run code to load, analyze, and visualize data in a Spark notebook.