How to limit Synapse Spark to only send WARN/ERROR logs to a Log Analytics workspace?
I have followed the steps here: https://zcusa.951200.xyz/en-us/azure/synapse-analytics/spark/apache-spark-azure-log-analytics
But now all the internal driver/executor logs are getting sent to the Log Analytics workspace, which is leading to storage issues. It sends internal logs from files like "InMemoryCacheClient.scala", "TokenLibraryInternal.scala", etc.
The question is: how do I control/limit this so that the Spark application only sends the application logs (written in my PySpark code)?
Using the Synapse 3.4 runtime (Spark 3.4).
Using PySpark.
I have tried adding log4j properties as part of the Spark config, but it's not working.
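A minimal sketch of the kind of application logging meant here, assuming the logs are written through the driver JVM's log4j logger (the logger name "MyPySparkApp" is made up for illustration):
# Sketch: application-level logging from PySpark via the JVM's log4j bridge.
# Assumes an active SparkSession named `spark`; "MyPySparkApp" is an illustrative name.
log4j = spark._jvm.org.apache.log4j
app_logger = log4j.LogManager.getLogger("MyPySparkApp")
app_logger.info("application INFO message")   # application log, INFO level
app_logger.warn("application WARN message")   # application log, WARN level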
Azure Synapse Analytics
-
phemanth 12,655 Reputation points • Microsoft Vendor
2024-12-18T17:35:46.5266667+00:00 Thanks for the question and for using the MS Q&A platform.
To limit the logs sent to your Log Analytics workspace to only WARN and ERROR levels, you can adjust the log4j configuration in your Synapse Spark application. Here are the steps you can follow:
- Create a file named log4j.properties with the following content:
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c %x - %m%n
- You can upload this file to a location accessible by your Spark job, such as Azure Blob Storage or Azure Data Lake Storage.
- Add the following Spark configuration to your job to point to the custom log4j.properties file:
spark.conf.set("spark.driver.extraJavaOptions", "-Dlog4j.configuration=file:/path/to/log4j.properties")
spark.conf.set("spark.executor.extraJavaOptions", "-Dlog4j.configuration=file:/path/to/log4j.properties")
- Ensure that your Spark job is submitted with the above configurations. This will limit the logs sent to Log Analytics to only WARN and ERROR levels.
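One thing worth noting: spark.driver.extraJavaOptions generally has to be in place before the driver JVM starts, so setting it with spark.conf.set from inside an already-running session may not take effect. In a Synapse notebook, one way to supply these options at session start is the %%configure magic; a minimal sketch (the file path is a placeholder):
%%configure -f
{
    "conf": {
        "spark.driver.extraJavaOptions": "-Dlog4j.configuration=file:/path/to/log4j.properties",
        "spark.executor.extraJavaOptions": "-Dlog4j.configuration=file:/path/to/log4j.properties"
    }
}
For a Spark job definition or pool-level Apache Spark configuration, the same two properties can be supplied as ordinary configuration key-value pairs instead.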
If you continue to experience issues, you might want to double-check the path to the log4j.properties file and ensure that it is correctly referenced in your Spark configuration.
I hope the above steps will resolve the issue; please do let us know if the issue persists. Thank you.
-
_user_unknown_ 1 Reputation point
2024-12-18T18:58:08.5266667+00:00 @phemanth thank you for your quick response. I will try it out later today.
I am also reading online that Spark 3.3+ uses Log4j 2. Does that change things for me, since I am using Spark 3.4?
-
_user_unknown_ 1 Reputation point
2024-12-19T00:08:19.2366667+00:00 @phemanth I tried your suggested steps, but it didn't work. I still see INFO logs.
I passed the Java options like:
-Dlog4j.configuration=abfss://mystoragefs1@mystorage.dfs.core.windows.net/log4j.properties
-
phemanth 12,655 Reputation points • Microsoft Vendor
2024-12-19T18:02:33.9033333+00:00 Thanks for the information.
Starting from Spark 3.3, the logging framework has been updated to use Log4j 2 instead of Log4j 1.
This means you'll need to adjust your configuration accordingly.
Here's how you can set up log4j2 for Spark 3.4:
- Create a log4j2.properties file with the following content:
status = error
name = PropertiesConfig
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = warn
appenders = console
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} [%t] %-5p %c %x - %m%n
rootLogger.level = warn
rootLogger.appenderRefs = stdout
rootLogger.appenderRef.stdout.ref = STDOUT
- Upload this file to a location accessible by your Spark job, such as Azure Blob Storage or Azure Data Lake Storage.
- Add the following Spark configuration to your job to point to the custom log4j2.properties file:
spark.conf.set("spark.driver.extraJavaOptions", "-Dlog4j.configurationFile=file:/path/to/log4j2.properties")
spark.conf.set("spark.executor.extraJavaOptions", "-Dlog4j.configurationFile=file:/path/to/log4j2.properties")
- Ensure your Spark job is submitted with the above configurations. This will limit the logs sent to Log Analytics to only WARN and ERROR levels.
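If only specific internal loggers are noisy, the same log4j2.properties file can also carry per-logger overrides, so the root level can stay at warn while a named application logger keeps emitting INFO. A minimal sketch to append to the file above (the logger names "org.apache.spark" and "MyPySparkApp" are examples; use whatever names actually appear in your logs):
loggers = sparkInternal, app
logger.sparkInternal.name = org.apache.spark
logger.sparkInternal.level = warn
logger.app.name = MyPySparkApp
logger.app.level = info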
If you encounter any issues, double-check the path to the log4j2.properties file and ensure it is correctly referenced in your Spark configuration.
I hope the above steps resolve the issue; please do let us know if it persists. Thank you.
-
_user_unknown_ 1 Reputation point
2024-12-19T21:35:50.2166667+00:00 I did that.
I set:
spark.conf.set("spark.driver.extraJavaOptions", "-Dlog4j.configurationFile=abfss://mycontainer@mystorage.dfs.core.windows.net/log4j2.properties")
spark.conf.set("spark.executor.extraJavaOptions", "-Dlog4j.configurationFile=abfss://mycontainer@mystorage.dfs.core.windows.net/log4j2.properties")
It didn't work.
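A quick sanity check, as a sketch assuming the same running session: read the values back to confirm whether the options were recorded at all. Even if they read back, *.extraJavaOptions set from inside a running session may not reach JVMs that have already started.
# Sketch: confirm whether the options are present in the running session's config.
# Note: even when they read back, extraJavaOptions applied after the driver/executor
# JVMs have launched may have no effect on their log4j configuration.
print(spark.conf.get("spark.driver.extraJavaOptions", "not set"))
print(spark.conf.get("spark.executor.extraJavaOptions", "not set"))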
-
phemanth 12,655 Reputation points • Microsoft Vendor
2024-12-23T18:33:31.62+00:00 Here are a few things you can check and confirm for us:
Verify the Path: Ensure that the path to the log4j2.properties file is correct and accessible. You can try using a local path first to see if it works:
spark.conf.set("spark.driver.extraJavaOptions", "-Dlog4j.configurationFile=file:/path/to/log4j2.properties")
spark.conf.set("spark.executor.extraJavaOptions", "-Dlog4j.configurationFile=file:/path/to/log4j2.properties")
Check File Permissions: Make sure the log4j2.properties file has the correct permissions and is readable by the Spark job.
Log4j2 Configuration: Double-check the content of your log4j2.properties file. It should look something like this:
status = error
name = PropertiesConfig
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = warn
appenders = console
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} [%t] %-5p %c %x - %m%n
rootLogger.level = warn
rootLogger.appenderRefs = stdout
rootLogger.appenderRef.stdout.ref = STDOUT
Spark Configuration: Ensure that the Spark configuration is being applied correctly. Since you are using PySpark, you can print the configuration to verify:
for key, value in spark.sparkContext.getConf().getAll():
    print(key, value)
Driver and Executor Logs: Check both driver and executor logs to see if there are any errors or warnings related to log4j configuration.
Alternative Configuration: If the above steps don't work, you can try setting the log level programmatically in your Spark application:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

# Set the WARN level on a logger obtained through the JVM's log4j bridge.
log4jLogger = spark._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)
logger.setLevel(log4jLogger.Level.WARN)
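Another hedged option, since this runtime is on Log4j 2: adjust levels from PySpark at the start of the application without any external properties file. setLogLevel is the documented PySpark API and raises the root level on the driver; the Configurator call is a sketch that assumes the Log4j 2 core classes bundled with Spark 3.4 are reachable through the Py4J gateway, and it affects the driver JVM rather than executors that are already running.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MyApp").getOrCreate()

# Documented PySpark API: raise the root log level on the driver to WARN.
spark.sparkContext.setLogLevel("WARN")

# Sketch (assumption): use the bundled Log4j 2 API through Py4J to quiet a
# specific noisy logger; "org.apache.spark" is just an example logger name.
jvm = spark._jvm
jvm.org.apache.logging.log4j.core.config.Configurator.setLevel(
    "org.apache.spark", jvm.org.apache.logging.log4j.Level.WARN
)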
-
_user_unknown_ 1 Reputation point
2024-12-26T17:54:27.9166667+00:00 I have tried this; it still doesn't work.
Are you able to get it to work on your end?
-
phemanth 12,655 Reputation points • Microsoft Vendor
2024-12-27T19:14:00.0166667+00:00 Could you please provide more details about the error, and screenshots if possible?