This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Which definition best describes Apache Spark?
A highly scalable relational database management system.
A virtual server with a Python runtime.
A distributed platform for parallel data processing using multiple languages.
You need to use Spark to analyze data in a parquet file. What should you do?
Load the parquet file into a dataframe.
Import the data into a table in a serverless SQL pool.
Convert the data to CSV format.
You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use?
%%spark
%%pyspark
%%sql
You must answer all questions before checking your work.
Was this page helpful?