Append vs update in Azure Data Lake Storage Gen2 for a csv file.

Ravikiran Srinivasulu
2024-12-10T03:23:07.9433333+00:00

I have a CSV file in Data Lake Storage Gen2. I am referring to the ACL permissions table at this link:
https://zcusa.951200.xyz/en-us/azure/storage/blobs/data-lake-storage-access-control#common-scenarios-related-to-acl-permissions

The table gives different ACL permissions for updating data and for appending data. I want to know what it means to append data to, and to update data in, a CSV file.

The traditional meaning of append is to add rows to the end of the file. In this case, even if RW- permission is given on the file and --X permission on all its parent directories, I am not able to append.

Of course, to append new rows I need to save the file, so it is treated as an update rather than an append. The file is saved if I assign the ACLs required for an update operation (-WX on the parent directory and none on the file).

So what is meant by "append" here, in a way that I can reasonably test?


2 answers

  1. Keshavulu Dasari, Microsoft Vendor
    2024-12-10T03:51:59.7133333+00:00

    Hi Ravikiran Srinivasulu ,
    Welcome to the Microsoft Q&A Forum. Thanks for posting your query here!
    Appending data typically means adding new data to the end of an existing file without altering the existing content. For a CSV file, this would mean adding new rows at the end of the file. In ADLS Gen2, appending data requires specific permissions:

    • Write (W) permission on the file.
    • Execute (X) permission on the parent directory.

    Updating data involves modifying the existing content of the file. This could mean changing existing rows, adding new rows, or deleting rows. For updating a file, the required permissions are:

    • Write (W) and Execute (X) permissions on the parent directory.
    • No permission on the file itself.
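    To make the table concrete, the sketch below models it as data. It is a hypothetical checker, not an Azure API: the required permission strings are copied from the values quoted in the question (RW- on the file and --X on every directory for append; -WX on the parent and nothing on the file for update), and it assumes that directories above the parent need only --X in both cases. It ignores owners, groups, the mask and default ACLs.

```python
from typing import List

# Permission strings copied from the values quoted in the question. This is an
# illustrative model, not an Azure API. Tuple layout:
# (needed on each ancestor above the parent, needed on the parent directory,
#  needed on the file). The ancestor value for "update" is an assumption.
REQUIRED = {
    "append": ("--X", "--X", "RW-"),
    "update": ("--X", "-WX", "---"),
}


def covers(granted: str, needed: str) -> bool:
    """True if every R/W/X bit set in `needed` is also set in `granted`."""
    return all(n == "-" or n == g for g, n in zip(granted.upper(), needed.upper()))


def allowed(op: str, ancestors: List[str], parent_perm: str, file_perm: str) -> bool:
    """Compare one caller's effective permission strings against the table."""
    anc_need, parent_need, file_need = REQUIRED[op]
    return (
        all(covers(a, anc_need) for a in ancestors)
        and covers(parent_perm, parent_need)
        and covers(file_perm, file_need)
    )


# First setup from the question: RW- on the file, --X on every directory.
print(allowed("append", ["--X"], "--X", "RW-"))  # True
print(allowed("update", ["--X"], "--X", "RW-"))  # False: parent lacks W
# Second setup: -WX on the parent, nothing on the file.
print(allowed("update", ["--X"], "-WX", "---"))  # True
```

    Under this model the question's first setup permits an append but not an update, which matches what was observed when the edited file was saved.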

    Based on Your Scenario

    In your case, when you try to append new rows to the CSV file, it seems the system treats this as an update operation because the file needs to be saved after appending. This is why you need the update permissions (-WX for the parent directory and none for the file) to successfully append data.
    To reasonably test the append operation:

    1. Ensure you have the correct permissions:
      • For appending: RW- on the file and --X on the parent directory.
      • For updating: -WX on the parent directory and none on the file.
    2. Perform the operation:
      • Try adding new rows to the CSV file and saving it. If the operation fails with append permissions, it indicates that the system is treating it as an update.
    3. Adjust permissions if needed:
      • If appending fails, try setting the update permissions and perform the operation again.
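    A reasonable way to test "append" in isolation is to send only the new rows, written at the file's current end, instead of saving the whole edited file. The hypothetical helpers below (not part of any Azure SDK) show the difference: if the new content merely extends the old content, the bytes to append and the offset to write them at can be computed; if any existing byte changed, the edit cannot be expressed as an append and the file has to be rewritten.

```python
import csv
import io
from typing import List, Optional, Tuple


def rows_to_csv_bytes(rows: List[List[str]]) -> bytes:
    """Serialize rows to UTF-8 CSV text, one LF-terminated line per row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def append_payload(old: bytes, new: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (position, tail) if `new` only extends `old`, else None.

    A true append writes just `tail` at position == len(old). If any existing
    byte changed, the edit cannot be expressed as an append and the whole file
    has to be rewritten instead (an update).
    """
    if new.startswith(old):
        return len(old), new[len(old):]
    return None


old = rows_to_csv_bytes([["id", "name"], ["1", "alice"]])
added = old + rows_to_csv_bytes([["2", "bob"]])
edited = rows_to_csv_bytes([["id", "name"], ["1", "ALICE"]])

print(append_payload(old, added))   # (16, b'2,bob\n')
print(append_payload(old, edited))  # None
```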

    By understanding these distinctions and ensuring the correct permissions are in place, you should be able to manage your CSV file operations effectively.
    For more information:
    https://zcusa.951200.xyz/en-us/azure/storage/blobs/data-lake-storage-access-control
    https://zcusa.951200.xyz/en-us/azure/storage/blobs/data-lake-storage-access-control-model


    Please do not forget to "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members
    If you have any other questions or are still running into more issues, let me know in the "comments" and I would be happy to help you


  2. Amrinder Singh, Microsoft Employee
    2024-12-10T05:23:38.1066667+00:00

    Hi Ravikiran Srinivasulu, thanks for reaching out on the Q&A forum.

    When writing data to Storage via the ADLS Gen2 API, three operations are called in sequence: Create File -> Append File -> Flush File.
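    These three calls map onto the Data Lake Storage Gen2 REST API: Path - Create (PUT with resource=file) and Path - Update (PATCH with action=append, then action=flush). The sketch below only builds the request sequence for writing a new file, so the order and the position values are visible; it sends nothing and omits authentication headers. The function name and the account, file system and path values are placeholders.

```python
from typing import List, Tuple
from urllib.parse import quote


def new_file_requests(account: str, filesystem: str, path: str,
                      data: bytes) -> List[Tuple[str, str, bytes]]:
    """Return (method, url, body) triples for Create File -> Append File -> Flush File.

    Only builds the sequence; authentication headers are intentionally omitted.
    """
    base = f"https://{account}.dfs.core.windows.net/{filesystem}/{quote(path)}"
    return [
        ("PUT", f"{base}?resource=file", b""),                        # Create File
        ("PATCH", f"{base}?action=append&position=0", data),           # Append at offset 0
        ("PATCH", f"{base}?action=flush&position={len(data)}", b""),   # Flush (commit) len(data) bytes
    ]


for method, url, body in new_file_requests("myaccount", "myfs",
                                           "Oregon/Portland/Data.txt", b"id,name\n"):
    print(method, url, len(body))
```

    Appended data is not visible in the file until the flush commits it. In the Python SDK (azure-storage-file-datalake), DataLakeFileClient.create_file, append_data and flush_data correspond to these three steps.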

    So that is what "Append" refers to in ADLS Gen2, and it is different from the conventional meaning and from what Blob Storage offers (Append Blobs).

    Update, on the other hand, means modifying the existing file.
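    The difference can be sketched with the same REST calls. An append to an existing file skips Create and writes at position = current file length, so it relies on the file-level permission (plus --X on the directories above it). An update that rewrites the file starts with Path - Create, which overwrites an existing path by default, and then appends the full content from offset 0; creating a file is what needs -WX on the parent directory. Whether a particular editor or tool saves this way is an assumption, though it would be consistent with the behavior described in the question. Both functions are hypothetical helpers that only build (method, URL, body) triples.

```python
from typing import List, Tuple

Request = Tuple[str, str, bytes]  # (HTTP method, URL, body)


def append_requests(file_url: str, current_length: int, data: bytes) -> List[Request]:
    """Append to an existing file: no Create call; write at the current end,
    then flush so the committed length becomes current_length + len(data)."""
    end = current_length + len(data)
    return [
        ("PATCH", f"{file_url}?action=append&position={current_length}", data),
        ("PATCH", f"{file_url}?action=flush&position={end}", b""),
    ]


def rewrite_requests(file_url: str, content: bytes) -> List[Request]:
    """Update by rewriting: Create (overwrites an existing path by default),
    then append the whole content at offset 0 and flush it."""
    return [
        ("PUT", f"{file_url}?resource=file", b""),
        ("PATCH", f"{file_url}?action=append&position=0", content),
        ("PATCH", f"{file_url}?action=flush&position={len(content)}", b""),
    ]


url = "https://myaccount.dfs.core.windows.net/myfs/data.csv"  # placeholder names
for method, req_url, body in append_requests(url, 16, b"2,bob\n"):
    print(method, req_url, len(body))
```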

    Hope that helps!

    If you have any further queries, let me know and I will be glad to assist.


