Data quality troubleshooting

Common questions and error symptoms are described below, with possible resolutions:

Connection failed error

Screenshot of the connection failed error.

The Azure Resource Owner or User Access Administrator role grants a high level of access on the control plane; for example, with this role you can modify access or configurations on the subscription. Is there a workaround?

  • Workaround 1: Your IT admin, who has the Azure Resource Owner role or User Access Administrator role, can create a data source connection for Data Quality (DQ). This is a one-time configuration task. The IT admin needs the Data Quality Steward role only temporarily, to create the DQ connection. After the DQ connection is configured, the Data Quality Steward role can be removed from the IT admin, because the IT admin persona has no further use for that role.

  • Workaround 2: Your company can grant the Azure Resource Owner role to one or two data stewards who are accountable and responsible for creating data source connections for Data Quality assessment and data profiling.

Why am I seeing an invalid source error from a scanning job?

  • There can be two reasons why you see this error:
    • The delta table doesn't exist in the location
    • The data in the file aren't in a valid delta format.
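
Both causes can be checked by hand before rerunning the scan. The following is a minimal sketch, assuming the table folder is reachable as a local or mounted path (for cloud storage you would list the folder with your storage SDK instead). The function name `looks_like_delta_table` is hypothetical and is not part of Microsoft Purview. It relies only on the Delta Lake convention that a table folder contains a `_delta_log` folder with commit files named by a 20-digit version number.

```python
import re
from pathlib import Path

# Delta commit files are named with a 20-digit, zero-padded version number.
COMMIT_FILE = re.compile(r"^\d{20}\.json$")


def looks_like_delta_table(table_path):
    """Return (ok, reason) for a local or mounted table folder.

    Hypothetical helper for manual troubleshooting; it only inspects
    the folder layout and does not validate the log contents.
    """
    root = Path(table_path)
    if not root.is_dir():
        return False, "table folder does not exist"
    log_dir = root / "_delta_log"
    if not log_dir.is_dir():
        return False, "no _delta_log folder, so the data is not in delta format"
    if not any(COMMIT_FILE.match(p.name) for p in log_dir.iterdir()):
        return False, "_delta_log contains no commit files"
    return True, "ok"
```

A folder that holds only parquet files, with no `_delta_log` folder next to them, fails the second check; that corresponds to the "not a valid delta format" case.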

I'm setting up data quality scans for my Fabric delta tables. I see all data assets in the data quality view, I selected one asset and applied rules for data quality scanning, but the scan is failing.

  • There can be a number of reasons why your data quality scan is failing:
    • The name of the Fabric workspace contains spaces. Workspace names shouldn't have any spaces (for example, use CorpData, not Corp Data).
    • The data in the tables isn't in a valid delta format. Make sure that your data is in delta format.
    • The data map scan didn't run successfully. If it didn't, rerun the data map scan.
    • Previous data quality runs exist for the data asset. Delete any previous data quality runs for the data asset.

Why am I seeing this error message: 'No connection can be used. Try to create connection first'?

Screenshot of the 'No connection can be used' error in the data quality overview page.

  • To profile data or to run data quality scanning, you first need to configure a data source connection. This alert disappears after you create a data source connection.

Why is the total count of profiled data showing less than the total count in my Azure Data Lake Storage Gen2 delta table?

  • Microsoft Purview Data Quality uses a sample of 1 million records for profiling. The sample is taken randomly. If your delta table has more than 1 million records, the profiled count won't match the table's total count.
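
The relationship between the two counts can be sketched as follows. This is only an illustration of the arithmetic, assuming that tables at or below the sample size are profiled in full; the names `SAMPLE_SIZE` and `profiled_row_count` are hypothetical.

```python
SAMPLE_SIZE = 1_000_000  # profiling sample size stated above


def profiled_row_count(total_rows):
    """Expected profiled count for a table with total_rows records.

    Assumption: tables at or below the sample size are profiled in full.
    """
    return min(total_rows, SAMPLE_SIZE)
```

For example, a table with 2,500,000 records would show a profiled count of 1,000,000, while a table with 40,000 records would show 40,000.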

Why do I see an action saying that the data quality score is missing for a data product, when I can see the score in the data product view?

  • When the action was created, there wasn't any data quality score for that data product. Data quality scanning ran after the action was created, and the score was then published for the data product. We recommend closing the action once the remediation is done, to avoid confusion.

Data quality rule creation from "Suggest rules" throws an error about a "date" column when trying to add all 30 suggested rules

  • This happens because the column's data type is in an unsupported state in the data quality schema view. You can change the data type to date by selecting the schema management toggle and saving the change. After you change the data type, you should be able to add the rule.

When I try to add all suggested rules, I get an error about "ObserverId already exists"

  • Most likely, an identical rule has already been added to the column. When you try to add an identical rule to the same column, the application throws this error message.

Why is my scheduled job skipped instead of running? I see the Skipped status for data quality scanning jobs.

  • The DQ job checks the delta history and runs a scan only if the data has changed since the last run. Skipped means that the data hasn't changed since the last run, so the Spark run for DQ wasn't performed. Skipped doesn't mean failed.
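
The skip behavior can be pictured with a small sketch. This is an illustration of the idea only, not Purview's implementation: it assumes a table folder reachable as a local or mounted path, and it treats "changed since the last run" as "the latest commit version in `_delta_log` is higher than the version seen at the last scan". The function names and the `last_scanned_version` parameter are hypothetical.

```python
import re
from pathlib import Path

# Delta commit files are named with a 20-digit, zero-padded version number.
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_delta_version(table_path):
    """Return the highest commit version found in _delta_log, or None."""
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return None
    versions = [
        int(m.group(1))
        for p in log_dir.iterdir()
        if (m := COMMIT_FILE.match(p.name))
    ]
    return max(versions, default=None)


def scan_decision(table_path, last_scanned_version):
    """Return "Run" or "Skipped" (hypothetical labels for this sketch)."""
    current = latest_delta_version(table_path)
    if current is None:
        raise ValueError("no delta history found for this table")
    if last_scanned_version is not None and current <= last_scanned_version:
        return "Skipped"  # nothing was committed since the last scan
    return "Run"
```

In this sketch, a first scan (`last_scanned_version=None`) always runs, and a later scheduled scan is skipped until a new commit appears in the table's history.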

When I select the profile data tab, I see a number of columns preselected. Can I change the selected columns?

  • Microsoft Purview Data Quality uses an AI-assisted profiling solution. The preselected columns are chosen by the Microsoft Purview Data Profiling AI. You can deselect the preselected columns, reselect columns based on their criticality, and then select save and run to run profiling.

Why can't I select some of the data assets on the data quality asset list page to profile and scan?

  • There can be a few reasons:
    • Those data assets are published from unsupported data sources
    • The file format of those data assets isn't supported

Why is my profiling job failing for supported data sources?

  • Check the schema to make sure that no column name contains spaces. The current version doesn't support column names with spaces. Our engineers are working on a hotfix.
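
If the schema has many columns, a quick check like the following can list the offending names. It is a minimal sketch that assumes you already have the column names as a list of strings (for example, copied from the schema view); `columns_with_spaces` is a hypothetical helper name.

```python
def columns_with_spaces(column_names):
    """Return the column names that contain at least one space."""
    return [name for name in column_names if " " in name]


# Hypothetical schema used only for illustration.
schema = ["order_id", "order date", "Customer Name", "amount"]
print(columns_with_spaces(schema))  # → ['order date', 'Customer Name']
```

Renaming the listed columns at the source (for example, `order date` to `order_date`) removes the cause of the profiling failure described above.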

Why can't I run data quality scanning and data profiling for CSV, parquet, and text files?

  • Microsoft Purview Data Quality currently supports only the delta format of parquet. It doesn't support CSV, text, or plain parquet (non-delta) files.

Why don't I see the data quality freshness rule in the rule list?

  • Data quality freshness isn't supported for Azure SQL tables. If your data asset is an Azure SQL table, then the freshness rule won't be listed to select and apply to the data asset.

My DQ scan job failed. I see an error message 'Internal service error occurred, please retry, or contact Microsoft support.' What should I do to troubleshoot?

  • There can be many reasons the scan is failing with this error message:
    • The user isn't authorized to perform the operation on the workspace that the data quality scan is trying to access.
    • The scan returned error code 403, meaning access to the data sources is temporarily forbidden.
    • The access granted to your managed identity (MSI) for the data source has expired.
    • The Microsoft Purview managed identity (MSI) needs contributor access to the Microsoft Fabric workspace. If the Purview MSI hasn't been granted contributor access to the workspace, the data quality scan fails.

Why am I getting a delta format error even though I'm using delta format?

  • We support Spark 3.4 with Delta Lake 2.4. Make sure that you're using Delta Lake version 2.4.
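
A version check along these lines can be added to the job that writes the table. This is a sketch under two assumptions: that any patch release of Spark 3.4 and Delta Lake 2.4 counts as supported, and that you can obtain both version strings yourself (in a PySpark session the Spark version is available as `spark.version`; for a pip-based setup, `importlib.metadata.version("delta-spark")` is one possible source for the Delta version). The function names are hypothetical.

```python
SUPPORTED_SPARK = (3, 4)  # versions stated above
SUPPORTED_DELTA = (2, 4)


def major_minor(version):
    """Parse a version string such as '3.4.1' into (3, 4).

    Raises ValueError if the string has no minor component.
    """
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported_runtime(spark_version, delta_version):
    """True when both versions match the supported major.minor pair.

    Assumption: patch releases (3.4.x, 2.4.x) are treated as supported.
    """
    return (
        major_minor(spark_version) == SUPPORTED_SPARK
        and major_minor(delta_version) == SUPPORTED_DELTA
    )
```

For example, `is_supported_runtime("3.4.1", "2.4.0")` returns `True`, while a table written from Spark 3.5 with Delta Lake 3.0 would be flagged as a likely cause of the delta format error.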

Why am I seeing an error when I select a reference data asset to configure a Table lookup rule?

  • The reason is that you selected a data asset that isn't linked to a data product under the same governance domain. To select the right data asset:
    • Click select reference table (see the following screenshot)

      Screenshot of look up data asset error.

    • Cancel current selection (see the screenshot below)

      Screenshot of selecting correct data asset.

    • After you cancel the current selection, select another asset.

How can I configure access to data source for Microsoft Purview MSI?

All our data sources are behind private endpoints (in a vNet). Can Purview access data in a vNet for data quality scanning?

Where can I find good documentation about the expression functions used to create custom rules?

Why is the DQ scan for my Fabric Lakehouse table failing?

  • Microsoft Purview Data Map support for Fabric Lakehouse is in private preview. Your Purview tenant needs to be allowlisted for Purview Data Map and Fabric OneLake to enable DQ scanning of Fabric Lakehouse tables with Purview DQ. Contact your Microsoft account team to allowlist your tenant for Fabric Lakehouse support.