GATK Resource Bundle
Note
Important update (9/19/2024): All URLs are changing. We are enabling public access to all Genomics Data Lake containers. The existing “signed URLs” (shared access signatures) will be retired at 2024-11-04T00:00:00Z. After that time, URLs without a query string will continue to work, but signed URLs will stop working and will return an HTTP 403 status code. Plan to switch to the public URLs by that date: remove the ‘?’ and every character after it from each signed URL.
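The change from a signed URL to a public URL can be sketched in Python as follows. This is a minimal illustration that uses only the standard library; the function name `strip_sas_token` and the placeholder signature `EXAMPLE` are assumptions made for this example and are not part of the dataset.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_sas_token(url: str) -> str:
    """Return the URL with its query string (the SAS token) and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; drop everything from the '?' onward.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical signed URL: the signature value is a placeholder, not a real token.
signed = (
    "https://datasetgatkbestpractices.blob.core.windows.net/dataset"
    "?sv=2020-04-08&si=prod&sr=c&sig=EXAMPLE"
)
print(strip_sas_token(signed))
```

A URL that already has no query string is returned unchanged, so the helper can be applied to a mixed list of old and new URLs.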
The GATK resource bundle is a collection of standard files for working with human resequencing data with the GATK.
Note
Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.
This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.
Data source
This dataset is a mirror of the data store at https://gatk.broadinstitute.org/hc/articles/360035890811-Resource-bundle
Data volumes and update frequency
- datasetgatkbestpractices : 542 GB
- datasetgatklegacybundles : 61 GB
- datasetgatktestdata : 2 TB
- datasetpublicbroadref : 477 GB
- datasetbroadpublic : 3 TB
Datasets are updated during the first week of every month.
Storage location
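This dataset is stored in the West US 2 and West Central US Azure regions. We recommend allocating compute resources in West US 2 or West Central US for affinity. The datasetpublicbroadref and datasetbroadpublic containers are also listed with South Central US endpoints under Data Access.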
Data Access
datasetgatkbestpractices
West US 2: 'https://datasetgatkbestpractices.blob.core.windows.net/dataset'
West Central US: 'https://datasetgatkbestpractices-secondary.blob.core.windows.net/dataset'
SAS Token: ?sv=2020-04-08&si=prod&sr=c&sig=6SaDfKtXAIfdpO%2BkvNA%2FsTNmNij%2Byh%2F%2F%2Bf98WAUqs7I%3D
datasetgatklegacybundles
West US 2: 'https://datasetgatklegacybundles.blob.core.windows.net/dataset'
West Central US: 'https://datasetgatklegacybundles-secondary.blob.core.windows.net/dataset'
SAS Token: ?sv=2020-04-08&si=prod&sr=c&sig=xBfxOPBqHKUCszzwbNCBYF0k9osTQjKnZbEjXCW7gU0%3D
datasetgatktestdata
West US 2: 'https://datasetgatktestdata.blob.core.windows.net/dataset'
West Central US: 'https://datasetgatktestdata-secondary.blob.core.windows.net/dataset'
SAS Token: ?sv=2020-04-08&si=prod&sr=c&sig=fzLts1Q2vKjuvR7g50vE4HteEHBxTcJbNvf%2FZCeDMO4%3D
datasetpublicbroadref
West US 2: 'https://datasetpublicbroadref.blob.core.windows.net/dataset'
West Central US: 'https://datasetpublicbroadref-secondary.blob.core.windows.net/dataset'
SAS Token: ?sv=2020-04-08&si=prod&sr=c&sig=DQxmjB4D1lAfOW9AxIWbXwZx6ksbwjlNkixw597JnvQ%3D
South Central US: 'https://datasetpublicbroadrefsc.blob.core.windows.net/dataset'
SAS Token: ?sv=2023-01-03&st=2024-02-12T19%3A56%3A11Z&se=2029-02-13T19%3A56%3A00Z&sr=c&sp=rl&sig=oGiNUGZ08PaabHVNtIiVEpJ1kcyqcL6ZadQcuN2ns%2FM%3D
datasetbroadpublic
West US 2: 'https://datasetbroadpublic.blob.core.windows.net/dataset'
West Central US: 'https://datasetbroadpublic-secondary.blob.core.windows.net/dataset'
SAS Token: ?sv=2020-04-08&si=prod&sr=c&sig=u%2Bg2Ab7WKZEGiAkwlj6nKiEeZ5wdoJb10Az7uUwis%2Fg%3D
South Central US: 'https://datasetbroadpublicsc.blob.core.windows.net/dataset'
SAS Token: ?sv=2023-01-03&st=2024-02-12T19%3A58%3A33Z&se=2029-02-13T19%3A58%3A00Z&sr=c&sp=rl&sig=C2lDhe1uwu%2FJnC9rbQO65G6%2BdEUQ%2Fl0VheXrlnIQVAs%3D
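The endpoints above follow a regular naming pattern, so the public (unsigned) URLs can be built programmatically. The sketch below is one possible way to do that in Python with the standard library only. The helper names (`container_url`, `blob_url`, `list_url`, `parse_blob_names`) and the blob path `hg38/v0/` are hypothetical and chosen for illustration. The listing URL uses the Azure Blob Storage List Blobs REST operation (`restype=container&comp=list`); whether these containers permit anonymous listing is an assumption, not something this page states.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Account-name suffix per region, following the endpoints listed above.
REGION_SUFFIX = {
    "West US 2": "",
    "West Central US": "-secondary",
    "South Central US": "sc",
}
# Only these two accounts are listed with a South Central US endpoint.
SOUTH_CENTRAL_ACCOUNTS = {"datasetpublicbroadref", "datasetbroadpublic"}


def container_url(account: str, region: str = "West US 2") -> str:
    """Build the public container URL for a dataset account in a region."""
    if region not in REGION_SUFFIX:
        raise ValueError(f"unknown region: {region}")
    if region == "South Central US" and account not in SOUTH_CENTRAL_ACCOUNTS:
        raise ValueError(f"{account} has no South Central US endpoint listed")
    return f"https://{account}{REGION_SUFFIX[region]}.blob.core.windows.net/dataset"


def blob_url(account: str, blob_path: str, region: str = "West US 2") -> str:
    """Build the public URL of a single blob; blob_path is relative to the container."""
    return f"{container_url(account, region)}/{quote(blob_path.lstrip('/'))}"


def list_url(account: str, prefix: str = "", region: str = "West US 2") -> str:
    """Build a List Blobs request URL, optionally restricted to a name prefix."""
    params = {"restype": "container", "comp": "list"}
    if prefix:
        params["prefix"] = prefix
    return f"{container_url(account, region)}?{urlencode(params)}"


def parse_blob_names(xml_body) -> list:
    """Extract blob names from a List Blobs XML response body (bytes or str)."""
    root = ET.fromstring(xml_body)
    return [name.text for name in root.findall("./Blobs/Blob/Name")]


# Usage: build URLs only; no network request is made here.
print(container_url("datasetbroadpublic", "South Central US"))
print(list_url("datasetgatkbestpractices", prefix="hg38/v0/"))  # hypothetical prefix
```

To fetch a listing, the URL from `list_url` could be passed to `urllib.request.urlopen` and the raw response bytes handed to `parse_blob_names`. Large containers return results in pages, which this sketch does not handle.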
Use Terms
Visit the GATK resource bundle official site.
Contact
Visit the GATK resource bundle official site.
Next steps
View the rest of the datasets in the Open Datasets catalog.