I'm glad you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same problem can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.
Issue:
We are running transformation logic in an Azure Databricks notebook using PySpark. We first read data from ADLS and load it into a global temporary table to perform data quality checks, and then recreate the same temporary table. We then reference these temporary tables in Spark SQL to perform the transformations, execute the query, and load the results into another global temporary table. Finally, the result set is written back to ADLS.
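For reference, here is a minimal sketch of that flow. The ADLS paths, view names, and columns are hypothetical placeholders and not the original notebook code.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Read the source data from ADLS (the abfss path is illustrative).
src_df = spark.read.format("delta").load(
    "abfss://container@storageaccount.dfs.core.windows.net/raw/orders"
)

# 2. Register it as a global temporary view for data-quality checks,
#    then recreate the same view after the checks if needed.
src_df.createOrReplaceGlobalTempView("orders_stage")

# 3. Run the transformation in Spark SQL against the global temp view
#    (global temp views are resolved through the global_temp database).
result_df = spark.sql("""
    SELECT order_id,
           customer_id,
           SUM(amount) AS total_amount
    FROM global_temp.orders_stage
    GROUP BY order_id, customer_id
""")
result_df.createOrReplaceGlobalTempView("orders_transformed")

# 4. Write the result set back to ADLS.
result_df.write.format("delta").mode("overwrite").save(
    "abfss://container@storageaccount.dfs.core.windows.net/curated/orders"
)
```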
Problem: When reading the transformed data, one of the column values is inconsistent. Each time we query the data, the column alternates between returning null and the expected value.
Solution:
To address the inconsistent data issue, I added a write operation immediately after the read/transformation step, followed by a fresh read of the written data, and used that re-read copy for the downstream steps. This approach effectively resolves the problem.
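A hedged sketch of that workaround is below. It builds on the hypothetical names from the earlier snippet; the staging path is also a placeholder.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 'result_df' stands for the transformed DataFrame from the earlier step;
# it is recreated here only so this snippet is self-contained.
result_df = spark.sql("SELECT * FROM global_temp.orders_transformed")

staging_path = (
    "abfss://container@storageaccount.dfs.core.windows.net/staging/orders_transformed"
)

# Write immediately after the read/transformation...
result_df.write.format("delta").mode("overwrite").save(staging_path)

# ...then read the written copy back and use it for all later steps.
stable_df = spark.read.format("delta").load(staging_path)
stable_df.createOrReplaceGlobalTempView("orders_transformed")
```

Writing the intermediate result out and reading it back forces it to be materialized once, so subsequent queries read stored data instead of re-executing the lazy query plan each time, which is the likely reason the alternating null values disappear.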
If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".