How to send the history of 3 months of files from one directory to another according to the date in its name

ALER 0 Reputation points
2024-05-14T22:35:22.23+00:00

Hello, I'm having problems because I want to move the history of some files stored in a data lake according to the date in its name, but in the end what I manage to do is only move either one specific day or all of them.

example: I have files October - November - December-January- February- March-April I want to leave only the ones from February- March and April and move the rest to another folder.

Imagen

this is the pipeline expression builder

@and(

    greaterOrEquals(

        formatDateTime(substring(item().name, 15, 10), 'yyyy-MM-dd'),

        formatDateTime(subtractFromTime(utcNow(), -1, 'Month'), 'yyyy-MM-dd')

    ),

    equals(

        substring(item().name, 1, 10),

        formatDateTime(subtractFromTime(utcNow(), -1, 'Month'), 'yyyy-MM')

    )

)

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,527 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,108 questions
Azure Data Lake Analytics
{count} votes

1 answer

Sort by: Most helpful
  1. phemanth 12,975 Reputation points Microsoft Vendor
    2024-05-15T09:53:51.7833333+00:00

    @ANDRES LEONARDO ESPITIA RAMIREZ

    Welcome to the Microsoft Q&A platform and thank you for posting your question here.

    The expression you provided filters the files based on two conditions:

    1. Date comparison: It checks if the date extracted from the filename (using substring and formatDateTime) is greater than or equal to one month before the current date (using subtractFromTime and formatDateTime).
    2. Month match: It checks if the first 10 characters of the filename (which you assume represent the date in YYYY-MM format) exactly match the month one month prior to the current month (using substring and formatDateTime).

    There are a couple of reasons why your current expression might not be filtering files for the desired months (February, March, and April):

    Strict Month Match: The second condition with equals might be too strict. If the filename format includes any additional characters before the date (like a prefix or separator), the first 10 characters won't necessarily match exactly.

    Date Comparison: The first condition might not be filtering correctly. It checks if the date in the filename is greater than or equal to one month before today. This would include files from today and potentially even future dates (if the filenames contain dates in the future).

    Here's how you can adjust the expression to address these issues:

    Relax Month Match: Instead of using equals, try using startsWith to check if the filename starts with the desired month string (e.g., "2024-02").

    Filter for Past Dates: Modify the first condition to check if the date in the filename is less than today's date (e.g., <).

    Here's an example of an adjusted expression that incorporates these changes:

    @and(
      less(formatDateTime(substring(item().name, 15, 10), 'yyyy-MM-dd'), utcNow()),
      startsWith(substring(item().name, 1, 10), formatDateTime(subtractFromTime(utcNow(), -1, 'Month'), 'yyyy-MM'))
    )
    
    

    This expression will filter files where:

    • The date extracted from the filename is less than today's date (ensuring we only move files from the past).
    • The first part of the filename (assuming it contains the date) starts with the month string one month prior to the current month (e.g., "2024-02" for February).

    Remember to test your expression thoroughly to ensure it filters the files as intended before running the pipeline on a large dataset.

    Hope this helps. Do let us know if you any further queries..

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.