- AWS s3 tools
- Rasterio from S3 investigations
- Utilities for data visualization in notebooks
This repository provides a number of small libraries and CLI tools.
Full list of libraries and install instructions:
- `odc.ui`: tools for data visualization in notebook/lab
- `odc.index`: extra utils for working with datacube database
- `odc.geom`: geometry utils and prototypes
- `odc.algo`: algorithms (GeoMedian wrapper is here)
- `odc.io`: common IO utilities, used by apps mainly
- `odc.aws`: AWS/S3 utilities, used by apps mainly
- `odc.aio`: faster concurrent fetching from S3 with async, used by apps
- `odc.dscache`: experimental key-value store where key=UUID, value=Dataset
- `odc.dtools`: tools/experiments in the area of dask.distributed/dask<>datacube integration
- `odc.ppt`: parallel processing helper methods, internal lib
Installation requires using the custom package repo https://packages.dea.ga.gov.au.
```bash
pip install --extra-index-url="https://packages.dea.ga.gov.au" \
  odc-ui \
  odc-index \
  odc-geom \
  odc-algo \
  odc-io \
  odc-aws \
  odc-aio \
  odc-dscache \
  odc-dtools
```
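A quick way to confirm the libraries installed correctly is to import them from Python. This is a minimal smoke test assuming the full set above was installed; module names follow the library list:

```bash
# Minimal smoke test: the odc namespace packages should import cleanly.
# Adjust the import list to whichever subset of libraries you installed.
python3 -c "import odc.ui, odc.aws, odc.aio; print('odc libraries import OK')"
```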
NOTE: on Ubuntu 18.04 the default pip version is very old and does not support the `--extra-index-url` command-line option, so make sure to upgrade pip first: `pip3 install --upgrade pip`.
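For example, on a stock Ubuntu 18.04 system:

```bash
# Upgrade pip before running the install command above
# (the stock pip is too old to understand --extra-index-url, per the note).
pip3 install --upgrade pip
pip3 --version
```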
- For cloud (AWS only)

  ```bash
  pip install --extra-index-url="https://packages.dea.ga.gov.au" odc-apps-cloud
  ```

- For cloud (GCP, THREDDS and AWS)

  ```bash
  pip install --extra-index-url="https://packages.dea.ga.gov.au" 'odc-apps-cloud[GCP,THREDDS]'
  ```

- For `dc-index-from-tar` (indexing to datacube from tar archive)

  ```bash
  pip install --extra-index-url="https://packages.dea.ga.gov.au" odc-apps-dc-tools
  ```
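After installation the command-line tools should be available on PATH; a quick check, assuming the standard `--help` flag:

```bash
# Confirm the CLI entry points from the apps packages are on PATH.
s3-find --help
s3-to-tar --help
dc-index-from-tar --help
```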
NOTE: cloud tools depend on aiobotocore, which pins a specific version of botocore; boto3 also depends on a specific version of botocore, so having both aiobotocore and boto3 in one environment can be tricky. The easiest way to solve this is to install `aiobotocore[awscli,boto3]` before anything else, which will pull compatible versions of boto3 and awscli into the environment.
```bash
pip install -U 'aiobotocore[awscli,boto3]'
```
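Putting the two steps together, a safe install order for the cloud tools might look like this (the ordering is the important part):

```bash
# Install aiobotocore with the awscli and boto3 extras first, so compatible
# botocore/boto3/awscli versions are resolved, then add the cloud apps on top.
pip install -U 'aiobotocore[awscli,boto3]'
pip install --extra-index-url="https://packages.dea.ga.gov.au" 'odc-apps-cloud[GCP,THREDDS]'
```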
- `s3-find`: list S3 bucket with wildcard
- `s3-to-tar`: fetch documents from S3 and dump them to tar archive
- `gs-to-tar`: search GS for documents and dump them to tar archive
- `dc-index-from-tar`: read yaml documents from tar archive and add them to datacube
Example:
```bash
#!/bin/bash

s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

s3-find "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage
```

The fastest way to list regularly placed files is to use a fixed-depth listing:
```bash
#!/bin/bash

# only works when your metadata is all at the same depth and has a fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'

s3-find --skip-check "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage
```

When using Google Storage:
```bash
#!/bin/bash

# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover
dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz
```
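The S3 examples above stream the tar archive straight into the indexer. If you want to keep a copy of the fetched metadata for re-runs, the pipeline can be split into two steps; this is a sketch that assumes `s3-to-tar` writes the archive to stdout (as the piped examples suggest), and the file name `s2-metadata.tar` is only illustrative:

```bash
#!/bin/bash

# Sketch: capture the archive to disk first, then index it as a separate step,
# so indexing can be re-run without re-fetching documents from S3.
# Assumes s3-to-tar streams the tar to stdout, as in the piped examples above.
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

s3-find "${s3_src}" | s3-to-tar > s2-metadata.tar
dc-index-from-tar --env s2 --ignore-lineage s2-metadata.tar
```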