Genome in a Bottle Consortium (GIAB)

The Genome in a Bottle Consortium (GIAB) is a public-private-academic consortium hosted by the National Institute of Standards and Technology (NIST) to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of whole human genome sequencing to clinical practice. The GIAB Consortium has selected several genomes to produce and characterize as reference materials. NIST is developing NIST Reference Materials from these genomes, which are DNA extracted from a large homogenized growth of B lymphoblastoid cell lines from the Coriell Institute for Medical Research.

A mirror of the complete data set from the GIAB project is freely available on Amazon S3. Now anyone can use the data on-demand without worrying about storage costs and download time.

For more information, please visit If you have any questions, please email All data generated by GIAB for the genomes below are described in a preprint at:

Accessing the Data

The latest data is publicly available in the “giab” Amazon S3 bucket in the US-East (N. Virginia) region, accessible via HTTP at or S3 at s3://giab. The structure of the bucket is fully described in the file and a manifest of all files is available in the current.tree file.

Please be aware that the prefix “ftp” must be removed from all paths within those files. For example, the following Unix commands download current.tree and convert its list of files into S3 HTTP URLs:

curl -s -O
grep file current.tree | cut -f 1 | sed -e 's/^ftp//' | awk '{print "" $1}' > giab_s3_urls

The “giab_s3_urls” file will now contain one S3 HTTP URL per line.
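
For readers who prefer a script to a shell pipeline, the same conversion can be sketched in Python. This is only an illustration under stated assumptions: the function name manifest_to_urls and the base_url parameter are hypothetical, the manifest is assumed to be tab-separated with the path in the first field (as the cut -f 1 step implies), and the bucket's HTTP base URL must be supplied by the caller because it is not given here.

```python
from typing import Iterable, Iterator


def manifest_to_urls(lines: Iterable[str], base_url: str) -> Iterator[str]:
    """Yield one HTTP URL per file entry of a current.tree-style manifest.

    base_url is a caller-supplied assumption (the bucket's HTTP address,
    without a trailing slash); it is not taken from the manifest.
    """
    for line in lines:
        # Mirror `grep file`: keep only lines that mention "file".
        if "file" not in line:
            continue
        # Mirror `cut -f 1`: the path is the first tab-separated field.
        path = line.rstrip("\n").split("\t", 1)[0]
        # Mirror `sed -e 's/^ftp//'`: drop the leading "ftp" prefix.
        if path.startswith("ftp"):
            path = path[len("ftp"):]
        # Mirror the awk step: prepend the base URL to the remaining path.
        yield base_url + path
```

A typical use would open current.tree, pass the file object as lines, and write each yielded URL to giab_s3_urls. Like the grep step it imitates, the filter matches the word "file" anywhere on the line, so it is only as precise as the shell version.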

You can access the data via simple HTTP requests, or take advantage of the AWS Command Line Interface or AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP.
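
As a rough illustration of the "simple HTTP requests" route, the sketch below streams a single URL to disk using only the Python standard library. The helper name download and its dest_dir parameter are hypothetical, and the URL is assumed to be one of the lines from a file such as giab_s3_urls; error handling and retries are deliberately left out.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest_dir: str = ".") -> Path:
    """Stream one URL into dest_dir and return the path of the saved file."""
    # Name the local file after the last path component of the URL.
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    dest = Path(dest_dir) / filename
    # Copy in chunks so a large file is never held fully in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Streaming matters here because sequencing files can be very large; for bulk transfers, the AWS Command Line Interface or an SDK is likely the better fit.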