Transforming JSON files using data flow

A K 45 Reputation points
2024-11-07T15:23:15.93+00:00

Hello!

I currently have about 60 JSON files inside a blob container, most of which have different fields and values.

I have created a pipeline with a Get Metadata activity that points to the container, with the field list set to Child items. I have also created a parameter within the source dataset called fileName and set its value to @item().name within the Get Metadata activity settings. I have then connected this to a ForEach activity with @activity('Get Metadata1').output.childItems as the items. Inside the ForEach activity I have placed a data flow to remove the header and footer and to flatten the nested JSON value.

I also have a data flow parameter filenameparam with the value of @item().name again.

The problem I am facing now is that the data flow outputs the transformed data in multiple partitioned parts with system-generated names (like in the screenshot below); instead of the original 60 JSON files, I now have 150+ partition files.

Can anyone please advise how I can change the configuration so that the output file names match the original file names, and so that it outputs the 60 files with just the header/footer removed and the value flattened, without the partitioning?

Thank you in advance!


Accepted answer
  1. Keshavulu Dasari 1,730 Reputation points Microsoft Vendor
    2024-11-07T20:00:18.73+00:00

    Hi A K,
    Welcome to Microsoft Q&A Forum, thank you for posting your query here!
    The issue you’re encountering with the output files being split into multiple parts is likely due to the default partitioning behavior in your data flow. To ensure that your output files retain their original names and are not partitioned:

    1. Disable Partitioning in Data Flow:

    • In your data flow, go to the Sink transformation.
    • Under the Settings tab, look for the Partitioning section.
    • Set the Partitioning option to Single partition. This will ensure that the data is not split into multiple files.

    2. Set Output File Names:

    • In the Sink transformation, under the Settings tab, you can specify the output file name.
    • Use the data flow parameter filenameparam to set the output file name: enable the Output to single file option and use a data flow expression such as concat($filenameparam, '.json') to name the file. (Inside a data flow, parameters are referenced with a $ prefix and the data flow expression language, not the pipeline @-expression syntax.)

    3. Ensure Consistent File Naming:

    • Make sure that the filenameparam is correctly passed from the ForEach activity to the Data Flow.
    • In the ForEach activity, ensure that the Items property is set to @activity('Get Metadata1').output.childItems.
    • Inside the ForEach activity, in the Data Flow activity, map the parameter filenameparam to @item().name.
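    As a sketch, the parameter mapping in the pipeline JSON might look roughly like the snippet below. The activity and data flow names (ForEach1, Data flow1, TransformJsonDataFlow) are assumptions based on your description, and the exact JSON shape can vary by ADF version; the single-quote wrapping around @{item().name} passes the value to the data flow as a string literal:

    ```json
    {
      "name": "ForEach1",
      "type": "ForEach",
      "typeProperties": {
        "items": {
          "value": "@activity('Get Metadata1').output.childItems",
          "type": "Expression"
        },
        "activities": [
          {
            "name": "Data flow1",
            "type": "ExecuteDataFlow",
            "typeProperties": {
              "dataflow": {
                "referenceName": "TransformJsonDataFlow",
                "type": "DataFlowReference",
                "parameters": {
                  "filenameparam": "'@{item().name}'"
                }
              }
            }
          }
        ]
      }
    }
    ```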

    Brief steps:

    1. Disable Partitioning: Set the partitioning option to single partition in the Sink transformation.
    2. Set Output File Names: Use the filenameparam to set the output file name in the Sink transformation.
    3. Ensure Parameter Mapping: Ensure the filenameparam is correctly mapped in the For Each activity.

    By following these steps, you should be able to output the transformed data into 60 files with the same names as the original files, without the unwanted partitioning.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.


    If you have any other questions or are still running into issues, let me know in the comments and I would be happy to help you.


1 additional answer

  1. Keshavulu Dasari 1,730 Reputation points Microsoft Vendor
    2024-11-11T12:08:46.3866667+00:00

    Hi A K,
    There might be a couple of issues to address.

    1. Duplicate Files Issue:
      • Ensure that the Sink transformation is set to output to a single file without partitioning. Double-check that the Single partition setting is correctly applied.
      • Verify that the Output to single file option is enabled and that no other settings are causing the duplication.
    2. Using filenameparam in Expressions:
      • The error “Unrecognized expression: filenameparam” suggests that the parameter might not be correctly referenced or passed. Ensure that filenameparam is properly defined and accessible within the Data Flow.

    Step-by-Step:

    1. Check Data Flow Parameter:
      • Ensure that filenameparam is defined in the Data Flow parameters section.
      • In the Data Flow activity, map the parameter correctly:
      • Go to the Parameters tab of the Data Flow activity.
      • Set filenameparam to @item().name.
    2. Set Output File Name:
      • In the Sink transformation, use the Expression Builder to set the file name.
      • Instead of using @concat(filenameparam, '.json') directly, try using the expression within the Data Flow’s context:
      • Go to the Sink transformation.
      • Under Settings, find the File name option.
      • Use the expression: $filenameparam + '.json' (data flow expression syntax, with the $ prefix and without the pipeline @concat).
    3. Ensure No Partitioning:
      • In the Sink transformation, ensure that Partitioning is set to Single partition.
      • Confirm that the Output to single file option is enabled.

    Example Configuration:

    Data Flow Parameters:

    • filenameparam: @item().name

    Sink Transformation Settings:

    • File name option: $filenameparam + '.json'
    • Partitioning: Single partition
    • Output to single file: Enabled
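    Putting the configuration above together, the generated data flow script for the sink might look roughly like the sketch below. The stream names flatten1 and sink1 are assumptions, and partitionBy('hash', 1) is what the single-partition setting typically produces; note the $ prefix when referencing the data flow parameter. Also, if @item().name already includes the .json extension, use [($filenameparam)] instead of the concat to avoid a double extension:

    ```
    flatten1 sink(allowSchemaDrift: true,
        validateSchema: false,
        partitionFileNames: [(concat($filenameparam, '.json'))],
        partitionBy('hash', 1)) ~> sink1
    ```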

    By following these steps, you should be able to resolve the issues with duplicate files and correctly set the output file names.


    If you have any other questions or are still running into issues, let me know in the comments and I would be happy to help you.

