Lineage extracted from Unity Catalog, but duplicate processes created for the same table in Purview
Hi Microsoft Team,
I registered and scanned our Databricks Unity Catalog in Azure Purview, and the metadata for all tables and views came into Purview. At that point, however, no lineage was pulled into Purview. I then enabled the access system schema (system.access) in Unity Catalog. This schema contains the table_lineage and column_lineage tables, which store the lineage information. After enabling this schema, lineage was extracted from these tables and pulled into Azure Purview.
The issue now is that I'm getting duplicate lineage for the same table in Azure Purview, because these two tables also hold historical lineage. We run our notebooks every day, and each time a notebook changes, its lineage is stored in those tables under a different entity_id. As a result, Purview shows duplicate lineage for each table.
In the screenshot above, we can see multiple databricks_notebook processes (multiple versions of the same notebook) between factinventorysnapshot and vw_factinventorysnapshot. Of the four, one is the latest and the others are historical. We noticed that Databricks Unity Catalog creates a new entity_id for every version of the notebook.
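To make the expected result concrete, here is a minimal Python sketch of the behaviour I'm after. It is only an illustration, not something Purview or Databricks provides. It assumes lineage rows shaped like system.access.table_lineage records with source_table_full_name, target_table_full_name, entity_id and event_time columns, and it keeps only the most recent row for each source/target pair. The catalog/schema prefix "main.sales" and the entity_id values are hypothetical.

```python
from datetime import datetime
from typing import Iterable


# Assumption: each row mirrors a table_lineage record; the column names used
# here (source_table_full_name, target_table_full_name, entity_id, event_time)
# are assumed for illustration.
def latest_lineage(rows: Iterable[dict]) -> list[dict]:
    """Keep only the most recent lineage row for each (source, target) pair."""
    latest: dict[tuple[str, str], dict] = {}
    for row in rows:
        key = (row["source_table_full_name"], row["target_table_full_name"])
        current = latest.get(key)
        # Replace the stored row when this one has a later event_time.
        if current is None or row["event_time"] > current["event_time"]:
            latest[key] = row
    return list(latest.values())


# Hypothetical data: four notebook versions (entity_ids) writing the same edge.
SRC = "main.sales.factinventorysnapshot"
TGT = "main.sales.vw_factinventorysnapshot"
rows = [
    {
        "source_table_full_name": SRC,
        "target_table_full_name": TGT,
        "entity_id": f"nb-v{i}",
        "event_time": datetime(2024, 1, i),
    }
    for i in range(1, 5)
]

result = latest_lineage(rows)
print(len(result), result[0]["entity_id"])
```

With the four hypothetical versions above, only the row from the newest notebook version (nb-v4) remains, which is the single process I would expect to see in Purview.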
Can you please suggest a solution to avoid multiple lineage entries in Purview and pick up only the latest lineage from Databricks Unity Catalog?