Public Holidays
Worldwide public holiday data sourced from the PyPI holidays package and Wikipedia, covering 38 countries or regions from 1970 to 2099.
Each row gives the holiday information for a specific date and country or region, and indicates whether most people have paid time off on that date.
Note
Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.
This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.
Volume and retention
This dataset is stored in Parquet format. It's a snapshot with holiday information from January 1, 1970 to January 1, 2099. The data size is about 500 KB.
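Because the snapshot only spans January 1, 1970 to January 1, 2099, it can help to check a requested date window against that range before querying. The following is a minimal sketch that uses only the Python standard library. The helper name `clamp_to_coverage` and the two constants are illustrative assumptions and are not part of the azureml-opendatasets package; treating both bounds as inclusive is also an assumption.

```python
from datetime import datetime

# Coverage bounds taken from the snapshot description (assumed inclusive).
COVERAGE_START = datetime(1970, 1, 1)
COVERAGE_END = datetime(2099, 1, 1)


def clamp_to_coverage(start, end):
    """Trim (start, end) to the snapshot's range; return None if there is no overlap."""
    if start > end:
        raise ValueError("start must not be after end")
    clamped_start = max(start, COVERAGE_START)
    clamped_end = min(end, COVERAGE_END)
    if clamped_start > clamped_end:
        return None
    return clamped_start, clamped_end


# A window that runs past the end of the snapshot is trimmed to January 1, 2099.
window = clamp_to_coverage(datetime(2098, 12, 1), datetime(2100, 1, 1))
```

A window that lies entirely outside the covered range returns `None`, which a caller can use to skip the query altogether.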
Storage location
This dataset is stored in the East US Azure region. We recommend locating compute resources in East US for affinity.
Additional information
This dataset combines data sourced from Wikipedia (Wikimedia Foundation, Inc.) and the PyPI holidays package.
- Wikipedia: original source, original license
- PyPI holidays: original source, original license
The combined dataset is provided under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
Email aod@microsoft.com if you have any questions about the data source.
Columns
Name | Data type | Unique | Values (sample) | Description |
---|---|---|---|---|
countryOrRegion | string | 38 | Sweden Norway | Country or region full name. |
countryRegionCode | string | 35 | SE NO | Country or region code following the format here. |
date | timestamp | 20,665 | 2074-01-01 00:00:00 2025-12-25 00:00:00 | Date of the holiday. |
holidayName | string | 483 | Søndag Söndag | Full name of the holiday. |
isPaidTimeOff | boolean | 3 | True | Indicates whether most people have paid time off on this date (currently available only for US, GB, and India). NULL means unknown. |
normalizeHolidayName | string | 438 | Søndag Söndag | Normalized name of the holiday. |
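Since `isPaidTimeOff` can be true, false, or NULL (unknown), code that consumes it should keep the three states apart. The sketch below shows one way to do that in plain Python, with NULL represented as `None`. The function name `describe_paid_time_off` and the sample rows are hypothetical and chosen only for illustration; they are not read from the dataset.

```python
from typing import Optional


def describe_paid_time_off(is_paid_time_off: Optional[bool]) -> str:
    """Map the three-valued isPaidTimeOff field to a readable label."""
    if is_paid_time_off is None:
        # NULL in the dataset means the value is unknown, not False.
        return "unknown"
    return "paid time off" if is_paid_time_off else "no paid time off"


# Hypothetical rows for illustration only (not actual dataset values).
rows = [
    {"countryRegionCode": "NO", "holidayName": "Søndag", "isPaidTimeOff": None},
    {"countryRegionCode": "US", "holidayName": "Example Holiday", "isPaidTimeOff": True},
    {"countryRegionCode": "US", "holidayName": "Example Observance", "isPaidTimeOff": False},
]

labels = [describe_paid_time_off(row["isPaidTimeOff"]) for row in rows]
```

A plain truthiness check such as `if row["isPaidTimeOff"]:` would treat unknown values the same as false, which is why the sketch tests for `None` explicitly.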
Preview
countryOrRegion | holidayName | normalizeHolidayName | countryRegionCode | date |
---|---|---|---|---|
Norway | Søndag | Søndag | NO | 12/28/2098 12:00:00 AM |
Sweden | Söndag | Söndag | SE | 12/28/2098 12:00:00 AM |
Australia | Boxing Day | Boxing Day | AU | 12/26/2098 12:00:00 AM |
Hungary | Karácsony másnapja | Karácsony másnapja | HU | 12/26/2098 12:00:00 AM |
Austria | Stefanitag | Stefanitag | AT | 12/26/2098 12:00:00 AM |
Canada | Boxing Day | Boxing Day | CA | 12/26/2098 12:00:00 AM |
Croatia | Sveti Stjepan | Sveti Stjepan | HR | 12/26/2098 12:00:00 AM |
Czech | 2. svátek vánoční | 2. svátek vánoční | CZ | 12/26/2098 12:00:00 AM |
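The preview renders the `date` column as text such as `12/28/2098 12:00:00 AM`, while the column itself is a timestamp. If rendered text like this ever needs to be parsed back into a date (for example, after copying rows from this page), a sketch along these lines should work. The format string is inferred from the sample rows above, and the helper name `parse_preview_date` is an illustrative assumption.

```python
from datetime import datetime

# Month/day/year with a 12-hour clock and AM/PM marker, as shown in the preview rows.
PREVIEW_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_preview_date(text: str) -> datetime:
    """Parse a date string in the preview's display format into a datetime."""
    return datetime.strptime(text, PREVIEW_FORMAT)


# "12:00:00 AM" is midnight, so the parsed time component is 00:00:00.
parsed = parse_preview_date("12/28/2098 12:00:00 AM")
```

When the data is loaded through the package rather than copied from the preview, the `date` column is already a timestamp and no parsing of this kind should be needed.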
Data access
Azure Notebooks
# This is a package in preview.
from azureml.opendatasets import PublicHolidays
from datetime import datetime
from dateutil.relativedelta import relativedelta
# Request holidays for the one-month window ending today.
end_date = datetime.today()
start_date = end_date - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
# Load the selected rows into a pandas DataFrame and print a summary of its columns.
hol_df = hol.to_pandas_dataframe()
hol_df.info()
Azure Databricks
# This is a package in preview.
# Install azureml-opendatasets on the Databricks cluster first (pip install azureml-opendatasets). See https://zcusa.951200.xyz/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import PublicHolidays
from datetime import datetime
from dateutil.relativedelta import relativedelta
# Request holidays for the one-month window ending today.
end_date = datetime.today()
start_date = end_date - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
# Load the selected rows into a Spark DataFrame.
hol_df = hol.to_spark_dataframe()
# Display top 5 rows
display(hol_df.limit(5))
Azure Synapse
# This is a package in preview.
from azureml.opendatasets import PublicHolidays
from datetime import datetime
from dateutil.relativedelta import relativedelta
# Request holidays for the one-month window ending today.
end_date = datetime.today()
start_date = end_date - relativedelta(months=1)
hol = PublicHolidays(start_date=start_date, end_date=end_date)
# Load the selected rows into a Spark DataFrame.
hol_df = hol.to_spark_dataframe()
# Display top 5 rows
display(hol_df.limit(5))
Next steps
View the rest of the datasets in the Open Datasets catalog.