I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same problem can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.
Issue:
We are running transformation logic in an Azure Databricks notebook using PySpark. We first read data from ADLS and load it into a global temporary table to perform data quality checks, and then recreate the same temporary table. We then reference these temporary tables in Spark SQL to perform the transformations, execute the query, and load the results into another global temporary table. Finally, the result set is written back to ADLS.
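For reference, here is a minimal sketch of that flow. The ADLS paths, view names, and columns are hypothetical placeholders and not the original notebook code.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Read the source data from ADLS (the abfss path is illustrative).
src_df = spark.read.format("delta").load(
    "abfss://container@storageaccount.dfs.core.windows.net/raw/orders"
)

# 2. Register it as a global temporary view for data-quality checks,
#    then recreate the same view after the checks if needed.
src_df.createOrReplaceGlobalTempView("orders_stage")

# 3. Run the transformation in Spark SQL against the global temp view
#    (global temp views are resolved through the global_temp database).
result_df = spark.sql("""
    SELECT order_id,
           customer_id,
           SUM(amount) AS total_amount
    FROM global_temp.orders_stage
    GROUP BY order_id, customer_id
""")
result_df.createOrReplaceGlobalTempView("orders_transformed")

# 4. Write the result set back to ADLS.
result_df.write.format("delta").mode("overwrite").save(
    "abfss://container@storageaccount.dfs.core.windows.net/curated/orders"
)
```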
Problem: When reading the transformed data, one of the column values is inconsistent. Each time we query the data, the column alternates between returning null and the expected value.
Solution:
To address the inconsistent data issue, I added a write operation immediately after the read/transformation step, followed by a fresh read of the written data, and used that re-read copy for the downstream steps. This approach effectively resolves the problem.
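A hedged sketch of that workaround is below. It builds on the hypothetical names from the earlier snippet; the staging path is also a placeholder.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 'result_df' stands for the transformed DataFrame from the earlier step;
# it is recreated here only so this snippet is self-contained.
result_df = spark.sql("SELECT * FROM global_temp.orders_transformed")

staging_path = (
    "abfss://container@storageaccount.dfs.core.windows.net/staging/orders_transformed"
)

# Write immediately after the read/transformation...
result_df.write.format("delta").mode("overwrite").save(staging_path)

# ...then read the written copy back and use it for all later steps.
stable_df = spark.read.format("delta").load(staging_path)
stable_df.createOrReplaceGlobalTempView("orders_transformed")
```

Writing the intermediate result out and reading it back forces it to be materialized once, so subsequent queries read stored data instead of re-executing the lazy query plan each time, which is the likely reason the alternating null values disappear.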
If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".