- AWS s3 tools
- Rasterio from S3 investigations
- Utilities for data visualizations in notebooks
This repository provides a number of small libraries and CLI tools.
Full list of libraries, and install instructions:
- `odc.algo`: algorithms (the GeoMedian wrapper is here)
- `odc.stats`: large scale processing framework (moved to `odc-stats`)
- `odc.ui`: tools for data visualization in notebook/lab
- `odc.stac`: STAC to ODC conversion tools (moved to `odc-stac`)
- `odc.dscache`: experimental key-value store where `key=UUID`, `value=Dataset` (moved to `odc-dscache`)
- `odc.io`: common IO utilities, used mainly by apps
- `odc-cloud[ASYNC,AZURE,THREDDS]`: cloud crawling support package
  - `odc.aws`: AWS/S3 utilities, used mainly by apps
  - `odc.aio`: faster concurrent fetching from S3 with async, used by apps (`odc-cloud[ASYNC]`)
  - `odc.{thredds,azure}`: internal libs for cloud IO (`odc-cloud[THREDDS,AZURE]`)
Pre-releases of these libraries are on PyPI, so they can be installed with pip
"the normal way". The most recent development versions of odc-tools packages are
pushed to https://packages.dea.ga.gov.au and can be installed like so:
pip install --extra-index-url="https://packages.dea.ga.gov.au" \
odc-ui \
odc-stac \
odc-stats \
odc-algo \
odc-io \
odc-cloud[ASYNC] \
odc-dscache
NOTE: on Ubuntu 18.04 the default pip version is awfully old and does not
support the --extra-index-url command line option, so make sure to upgrade pip
first: pip3 install --upgrade pip.
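After installing, a quick import check can confirm that the packages resolved; the module names below are taken from the library list above, and this is purely an illustrative smoke test:

```bash
# Illustrative smoke test: import a few of the installed odc namespace packages
python -c "import odc.ui, odc.algo, odc.io, odc.aws; print('odc-tools imports OK')"
```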
Currently there are no odc-tools conda packages, but the majority of odc-tools
dependencies can be installed with conda from the conda-forge channel.
Use conda env update -f <file> to install all the dependencies needed by the
odc-tools libraries and apps.
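For example, assuming the environment file below is saved as `environment.yaml` and the target environment is called `odc` (both names are illustrative), the setup might look like this:

```bash
# Create the environment the first time (environment and file names are illustrative)
conda env create -n odc -f environment.yaml

# Or bring an existing environment up to date with the same file
conda env update -n odc -f environment.yaml
conda activate odc
```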
Conda `environment.yaml`:
channels:
- conda-forge
dependencies:
# Datacube
- datacube>=1.8.5
# odc.dscache
- python-lmdb
- zstandard
# odc.algo
- dask-image
- numexpr
- scikit-image
- scipy
- toolz
# odc.ui
- ipywidgets
- ipyleaflet
- tqdm
# odc-apps-dc-tools
- pystac>=1
- pystac-client>=0.2.0
- azure-storage-blob
- fsspec
- lxml # needed for thredds-crawler
# odc.{aio,aws}: aiobotocore/boto3
# pin aiobotocore for easier resolution of dependencies
- aiobotocore==1.3.3
- boto3
# eodatasets3 (used by odc-stats)
- boltons
- ciso8601
- python-rapidjson
- requests-cache
- ruamel.yaml
- structlog
- url-normalize
# for dev
- pylint
- autopep8
- flake8
- isort
- black
- mypy
# For tests
- pytest
- pytest-httpserver
- pytest-cov
- pytest-timeout
- moto
- deepdiff
- pip>=20
- pip:
  # odc.apps.dc-tools
  - thredds-crawler
  # odc.stats
  - eodatasets3
  # tests
  - pytest-depends
  # odc.ui
  - jupyter-ui-poll
  # odc-tools libs
  - odc-stac
  - odc-algo
  - odc-ui
  - odc-dscache
  - odc-stats
  # odc-tools CLI apps
  - odc-apps-cloud
  - odc-apps-dc-tools

Cloud tools depend on the aiobotocore package, which depends on a specific
version of botocore. Another package we use, boto3, also depends on a
specific version of botocore. As a result, having both aiobotocore and
boto3 in one environment can be a bit tricky. The easiest way to solve this
is to install aiobotocore[awscli,boto3] before anything else, which will pull
compatible versions of boto3 and awscli into the environment.
pip install -U "aiobotocore[awscli,boto3]==1.3.3"
# OR for conda setups
conda install "aiobotocore==1.3.3" boto3 awscli
The specific version of aiobotocore is not important in itself, but pinning one
is needed in practice to keep the pip/conda dependency resolution search tractable.
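To verify that the resulting environment resolved to mutually compatible versions, a generic check (nothing odc-specific) is:

```bash
# Report any broken or conflicting dependency pins, e.g. botocore vs boto3/aiobotocore
pip check

# Print the versions that actually got installed
python -c "import botocore, boto3, aiobotocore; print(botocore.__version__, boto3.__version__, aiobotocore.__version__)"
```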
- For cloud (AWS only): `pip install odc-apps-cloud`
- For cloud (GCP, THREDDS and AWS): `pip install odc-apps-cloud[GCP,THREDDS]`
- For `dc-index-from-tar` (indexing to datacube from a tar archive): `pip install odc-apps-dc-tools`
- `s3-find`: list an S3 bucket with a wildcard
- `s3-to-tar`: fetch documents from S3 and dump them to a tar archive
- `gs-to-tar`: search GS for documents and dump them to a tar archive
- `dc-index-from-tar`: read YAML documents from a tar archive and add them to datacube
Example:
#!/bin/bash
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'
s3-find "${s3_src}" | \
s3-to-tar | \
dc-index-from-tar --env s2 --ignore-lineage

The fastest way to list regularly placed files is to use a fixed-depth listing:
#!/bin/bash
# only works when your metadata is same depth and has fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'
s3-find --skip-check "${s3_src}" | \
s3-to-tar | \
dc-index-from-tar --env s2 --ignore-lineage

When using Google Storage:
#!/bin/bash
# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover
dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz

The following steps are used in the GitHub workflow main.yml:
# build environment from file
mamba env create -f tests/test-env-py38.yml
# this environment name is defined in tests/test-env-py38.yml file
conda activate odc-tests-py38
# install additional packages
./scripts/dev-install.sh --no-deps
# setup database for testing
./scripts/setup-test-db.sh
# run test
echo "Running Tests"
pytest --cov=. \
--cov-report=html \
--cov-report=xml:coverage.xml \
--timeout=30 \
libs apps
# Optional, to delete the environment
conda env remove -n odc-tests-py38

Development versions of packages are pushed to the DEA packages
repo on every push to the develop branch. The version is automatically increased
by a script that runs before creating wheels and source distribution tarballs.
Right now a new dev version is pushed for all the packages, even the ones that
have not changed since the last push.
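Assuming these development builds carry pre-release version numbers, pulling the latest one for a single package might look like the following (the `--pre` flag and the package chosen here are illustrative):

```bash
# Fetch the newest development build of one package from the DEA package index
pip install --upgrade --pre \
    --extra-index-url="https://packages.dea.ga.gov.au" \
    odc-algo
```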
Publishing to PyPI happens automatically when changes are
pushed to the protected pypi/publish branch. Only members of the Open Datacube
Admins group have permission to push to this branch.
Process:
1. Manually edit the `{lib,app}/{pkg}/odc/{pkg}/_version.py` file to increase the version number
2. Merge it to the `develop` branch via a PR
3. Fast-forward the `pypi/publish` branch to match `develop`
4. Push it to GitHub

Steps 3 and 4 can be done by an authorized user with the
`./scripts/sync-publish-branch.sh` script.
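Expressed as plain git commands, steps 3 and 4 amount to something like the sketch below; the actual `./scripts/sync-publish-branch.sh` script may differ in detail:

```bash
# Fast-forward pypi/publish to develop and push it
# (requires push rights on the protected branch)
git fetch origin
git checkout pypi/publish
git merge --ff-only origin/develop
git push origin pypi/publish
```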