Lineage extracted from Unity Catalog, but duplicate processes are created for the same table in Purview

Sri Lakshman Velugubantla 20 Reputation points
2025-01-07T12:52:16.9333333+00:00

Hi Microsoft Team,

I registered and scanned a Databricks Unity Catalog in Azure Purview, and the metadata for all tables and views came into Purview. At that point no lineage was pulled in. I have since enabled the access schema in Unity Catalog; that schema contains the table_lineage and column_lineage tables, which store the lineage information. After I enabled this schema, lineage was extracted from these tables and pulled into Azure Purview.

The issue now is that I'm getting duplicate lineage for the same table in Azure Purview, because those two tables also hold historical lineage. We run notebooks every day, and every time a notebook has new changes, its lineage is stored in those tables with a different entity_id. This creates duplicate lineage for each table in Azure Purview.
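To make the duplication concrete, here is a minimal Python sketch of how several entity_id values can pile up on one source-to-target pair. The rows are made-up sample data, not an export from my workspace. The column names other than entity_id (source_table_full_name, target_table_full_name, event_time) and the catalog and schema prefix are assumptions used for illustration.

```python
from collections import defaultdict

# Hypothetical lineage rows; the values and the "demo_catalog.demo_schema" prefix are made up.
SRC = "demo_catalog.demo_schema.factinventorysnapshot"
TGT = "demo_catalog.demo_schema.vw_factinventorysnapshot"
rows = [
    {"source_table_full_name": SRC, "target_table_full_name": TGT, "entity_id": "101", "event_time": "2025-01-04T06:00:00"},
    {"source_table_full_name": SRC, "target_table_full_name": TGT, "entity_id": "102", "event_time": "2025-01-05T06:00:00"},
    {"source_table_full_name": SRC, "target_table_full_name": TGT, "entity_id": "103", "event_time": "2025-01-06T06:00:00"},
    {"source_table_full_name": SRC, "target_table_full_name": TGT, "entity_id": "104", "event_time": "2025-01-07T06:00:00"},
    # A second pair with a single entity_id, which should not be flagged.
    {"source_table_full_name": TGT, "target_table_full_name": "demo_catalog.demo_schema.other", "entity_id": "200", "event_time": "2025-01-07T06:00:00"},
]


def count_entities_per_edge(lineage_rows):
    """Return the number of distinct entity_id values for each (source, target) pair."""
    entities = defaultdict(set)
    for row in lineage_rows:
        edge = (row["source_table_full_name"], row["target_table_full_name"])
        entities[edge].add(row["entity_id"])
    return {edge: len(ids) for edge, ids in entities.items()}


# Keep only the pairs that have more than one entity_id, i.e. the duplicated lineage.
duplicates = {edge: n for edge, n in count_entities_per_edge(rows).items() if n > 1}
print(duplicates)
```

Each distinct entity_id on the same pair appears to correspond to one extra process in Purview.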

User's image

In the screenshot above, we can see multiple databricks_notebook processes (multiple versions of the same notebook) between factinventorysnapshot and vw_factinventorysnapshot. Of the four, one is the latest and the others are historical. We noticed that Databricks Unity Catalog creates a new entity_id for every version of the notebook.

Can you please suggest a solution to avoid multiple lineages in Purview and pick up only the latest lineage from Databricks Unity Catalog?
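The behavior I am looking for could be sketched as follows: for each source-to-target pair, keep only the most recent lineage record. This is only an illustration of the desired selection over made-up sample rows, not something Purview is known to support. The column names other than entity_id (source_table_full_name, target_table_full_name, event_time) and the catalog and schema prefix are assumptions.

```python
from datetime import datetime

# Hypothetical lineage rows; the values and the "demo_catalog.demo_schema" prefix are made up.
SOURCE = "demo_catalog.demo_schema.factinventorysnapshot"
TARGET = "demo_catalog.demo_schema.vw_factinventorysnapshot"
sample_rows = [
    {"source_table_full_name": SOURCE, "target_table_full_name": TARGET, "entity_id": "101", "event_time": "2025-01-04T06:00:00"},
    {"source_table_full_name": SOURCE, "target_table_full_name": TARGET, "entity_id": "104", "event_time": "2025-01-07T06:00:00"},
    {"source_table_full_name": SOURCE, "target_table_full_name": TARGET, "entity_id": "102", "event_time": "2025-01-05T06:00:00"},
    {"source_table_full_name": SOURCE, "target_table_full_name": TARGET, "entity_id": "103", "event_time": "2025-01-06T06:00:00"},
]


def latest_lineage_per_edge(lineage_rows):
    """Return, for each (source, target) pair, the row with the newest event_time."""
    latest = {}
    for row in lineage_rows:
        edge = (row["source_table_full_name"], row["target_table_full_name"])
        seen = latest.get(edge)
        # Compare parsed timestamps rather than raw strings.
        if seen is None or datetime.fromisoformat(row["event_time"]) > datetime.fromisoformat(seen["event_time"]):
            latest[edge] = row
    return latest


latest = latest_lineage_per_edge(sample_rows)
for (src, tgt), row in latest.items():
    print(src, "->", tgt, "entity_id:", row["entity_id"])
```

With this selection, only one notebook process would remain between the two tables. Whether the Purview scan can be made to apply an equivalent filter is the open question.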

Azure Databricks
Microsoft Purview