Population data for a selection of countries, allocated to 1-arcsecond blocks and provided as a combination of CSV and Cloud Optimized GeoTIFF files. The dataset refines CIESIN’s Gridded Population of the World using machine learning models applied to high-resolution worldwide DigitalGlobe satellite imagery. CIESIN population counts, aggregated from worldwide census data, are allocated to the blocks where the imagery appears to contain buildings.
The data are organized in an S3 bucket using hierarchical object prefixes based on data format, year and month, country ISO code, and demographic type. With the exception of the data format, the naming hierarchy uses the Apache Hive partitioning convention. The hierarchy follows this pattern:
s3://dataforgood-fb-data/[data-format]/month=[year-and-month]/country=[country-ISO]/type=[demographic-type]/[object-name]
In detail, the components of this prefix structure are defined as:
[data-format] = Format of the data, either delimited text (csv) or GeoTIFF (tif). NOTE: here csv denotes delimited text, but the delimiter is a tab rather than a comma.
[year-and-month] = Year and month in YYYY-MM format
[country-ISO] = 3-letter ISO country code (ISO 3166-1 alpha-3)
[demographic-type] = A brief text description of the demographic of interest
[object-name] = The name of the object, excluding the prefix
For example:
s3://dataforgood-fb-data/csv/month=2019-06/country=WLF/type=men/WLF_men.csv.gz
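In this case, we know that:
The data format is tab-delimited text (csv)
The month is June 2019 (2019-06)
The country is Wallis and Futuna (WLF)
The demographic type is men
The object name is WLF_men.csv.gz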
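The prefix pattern can also be assembled and taken apart programmatically. The following Python sketch is illustrative only: the function names build_uri and parse_uri are hypothetical, and the regular expression assumes a four-digit year and an upper-case three-letter country code, as in the example path above.

```python
import re

BUCKET = "dataforgood-fb-data"

# Assumed shape of an object URI, based on the documented prefix pattern.
_URI_RE = re.compile(
    r"^s3://(?P<bucket>[^/]+)/(?P<data_format>[^/]+)"
    r"/month=(?P<month>\d{4}-\d{2})"
    r"/country=(?P<country>[A-Z]{3})"
    r"/type=(?P<type>[^/]+)"
    r"/(?P<object_name>[^/]+)$"
)


def build_uri(data_format, month, country, demographic, object_name):
    """Assemble an object URI from the components of the prefix pattern."""
    if data_format not in ("csv", "tif"):
        raise ValueError(f"unexpected data format: {data_format!r}")
    return (
        f"s3://{BUCKET}/{data_format}/month={month}"
        f"/country={country}/type={demographic}/{object_name}"
    )


def parse_uri(uri):
    """Split an object URI into its named components (returned as a dict)."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"URI does not follow the expected pattern: {uri!r}")
    return match.groupdict()


if __name__ == "__main__":
    uri = build_uri("csv", "2019-06", "WLF", "men", "WLF_men.csv.gz")
    print(uri)
    print(parse_uri(uri))
```

Only the first path segment (the data format) is a bare value; every other segment is a key=value pair, which is what lets Hive-style tools treat month, country, and type as partition columns.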
If you use the AWS Command Line Interface, you can list the top-level contents of the bucket with this command:
aws s3 ls s3://dataforgood-fb-data
You can additionally download data using the AWS CLI. For example, to download the GeoTIFF data and associated XML metadata for youths aged 15-24 in Zimbabwe, we can use the following command:
aws s3 cp s3://dataforgood-fb-data/tif/month=2019-06/country=ZWE/type=youth_15_24/ ./ --recursive
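The same command form works for objects under the csv prefix. Once such a file is on disk, it can be read with the Python standard library. The sketch below is a minimal example under stated assumptions: the file is gzip-compressed and tab-delimited, it has a single header row, and its columns are latitude, longitude, and population in that order (mirroring the Athena table definition given later in this document). The function name total_population is hypothetical.

```python
import csv
import gzip


def total_population(path):
    """Sum the third column of a gzip-compressed, tab-delimited file.

    Assumes one header row and the column order
    latitude, longitude, population.
    """
    total = 0.0
    with gzip.open(path, mode="rt", newline="") as handle:
        # The files are labelled csv, but the delimiter is a tab.
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or truncated lines
            total += float(row[2])
    return total


if __name__ == "__main__":
    # Path to a locally downloaded object, e.g. the example file named above.
    print(total_population("WLF_men.csv.gz"))
```

Reading the file row by row keeps memory use flat, which matters for larger countries where the decompressed file can be big.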
Because the dataset includes a delimited text version, we can query the data with Amazon Athena and ordinary SQL. This requires creating an external table, which can be done from the Athena console.
For convenience, you can use the following query within the Athena console to create this table definition:
CREATE EXTERNAL TABLE IF NOT EXISTS hrsl (
`latitude` double,
`longitude` double,
`population` double
) PARTITIONED BY (
month string,
country string,
type string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://dataforgood-fb-data/csv/'
TBLPROPERTIES ('has_encrypted_data'='false', 'skip.header.line.count'='1');
Because the dataset is partitioned, you must make Athena aware of the partition structure. You can do this by running the following query from the Athena console:
MSCK REPAIR TABLE hrsl;
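MSCK REPAIR TABLE discovers every partition under the table location. If only a few partitions are of interest, Athena also accepts ALTER TABLE ADD PARTITION statements that register them one at a time. As an illustration, the Python sketch below generates such a statement from the partition values. The function name add_partition_statement is hypothetical, and the default location assumes the csv prefix used in the table definition above.

```python
def add_partition_statement(table, month, country, demographic,
                            base_location="s3://dataforgood-fb-data/csv/"):
    """Return an ALTER TABLE statement registering a single partition."""
    for value in (month, country, demographic):
        # Reject characters that would break the SQL literal or the prefix.
        if "'" in value or "/" in value:
            raise ValueError(f"unsupported partition value: {value!r}")
    location = (
        f"{base_location}month={month}/country={country}/type={demographic}/"
    )
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS\n"
        f"PARTITION (month = '{month}', country = '{country}', "
        f"type = '{demographic}')\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    print(add_partition_statement("hrsl", "2019-06", "WLF", "men"))
```

The generated statement can be pasted into the Athena console in place of the MSCK REPAIR TABLE query; the partition values and the LOCATION must agree with the prefix of the objects being registered.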
Once the partitions have been added, you can query the dataset as desired, e.g.:
SELECT *
FROM hrsl
WHERE country = 'WLF'
AND type = 'children_under_five'
LIMIT 10;