A Java application for working with Delta Lake tables across environments. The tool provides two main functions:

- Export: Export Delta Lake tables from S3 or the local filesystem to a local ZIP archive
- Import: Import previously exported Delta tables to S3 or the local filesystem

This lets you create a portable snapshot of a Delta table, from a chosen starting version up to the latest version, and replicate it elsewhere.
## What Gets Exported

An export includes:

- Delta log JSON files (`_delta_log/*.json`) from the specified version range
- The `_last_checkpoint` file (if it exists)
- Parquet data files referenced in the `add` operations within the logs
- Deletion vector files (if any) defined under `add.deletionVector.dvFile`
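The file list above comes from scanning each commit's `add` actions in the log. As an illustration only, here is a stand-alone sketch that pulls `add.path` out of commit lines with a regex; the class and method names are made up for the example, and a real implementation would use a proper JSON parser (e.g. Jackson) rather than a regex:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddActionScan {
    // Matches the "path" field inside an "add" action of a Delta commit line.
    // Simplification: assumes the action is serialized on a single line.
    private static final Pattern ADD_PATH =
        Pattern.compile("\"add\"\\s*:\\s*\\{[^}]*?\"path\"\\s*:\\s*\"([^\"]+)\"");

    // Collects the data file paths referenced by add actions across commit lines.
    static List<String> dataFilePaths(List<String> commitLines) {
        List<String> paths = new ArrayList<>();
        for (String line : commitLines) {
            Matcher m = ADD_PATH.matcher(line);
            if (m.find()) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "{\"commitInfo\":{\"operation\":\"WRITE\"}}",
            "{\"add\":{\"path\":\"part-00000-abc.parquet\",\"size\":1024}}"
        );
        System.out.println(dataFilePaths(log)); // [part-00000-abc.parquet]
    }
}
```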
## Requirements

- Java 17 or higher
- Maven 3.6 or higher
- AWS credentials with access to the S3 bucket containing the Delta table (only needed when accessing S3)
## Building

```bash
mvn clean package
```

This will create a JAR file with all dependencies in the `replicator/target` directory.
## Usage

### Export

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --table-path <path-to-delta-table> \
  --output-zip /path/to/output.zip \
  [--from-version 5] \
  [--s3-access-key YOUR_ACCESS_KEY] \
  [--s3-secret-key YOUR_SECRET_KEY] \
  [--s3-endpoint custom-endpoint.example.com] \
  [--s3-path-style-access] \
  [--temp-dir /path/to/temp] \
  [--cleanup]
```

| Option | Description |
|---|---|
| `-s, --table-path` | Path to the Delta table (`s3a://bucket/path/to/table` or `file:///path/to/table`) |
| `-o, --output-zip` | Local path where the ZIP file will be created |
| `-f, --from-version` | Starting version to export (inclusive, default: 0) |
| `--s3-access-key` | AWS access key (only needed for S3 paths) |
| `--s3-secret-key` | AWS secret key (only needed for S3 paths) |
| `--s3-endpoint` | S3 endpoint (for S3-compatible storage, only needed for S3 paths) |
| `--s3-path-style-access` | Use path-style access (for S3-compatible storage, only needed for S3 paths) |
| `--tmp, --temp-dir` | Temporary directory to use for downloading files |
| `-c, --cleanup` | Clean up the temporary directory after export |
| `-h, --help` | Display help information |
### Import

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --archive-file /path/to/delta-export.zip \
  --target-path <path-to-target-location> \
  [--overwrite] \
  [--merge-schema] \
  [--s3-access-key YOUR_ACCESS_KEY] \
  [--s3-secret-key YOUR_SECRET_KEY] \
  [--s3-endpoint custom-endpoint.example.com] \
  [--s3-path-style-access] \
  [--temp-dir /path/to/temp] \
  [--cleanup]
```

| Option | Description |
|---|---|
| `-a, --archive-file` | Path to the ZIP archive containing the Delta table export |
| `-t, --target-path` | Path where the Delta table will be created (`s3a://bucket/path/to/table` or `file:///path/to/table`) |
| `-o, --overwrite` | Overwrite the target table if it exists |
| `-m, --merge-schema` | Merge the schema with the existing table if it exists |
| `--s3-access-key` | AWS access key (only needed for S3 paths) |
| `--s3-secret-key` | AWS secret key (only needed for S3 paths) |
| `--s3-endpoint` | S3 endpoint (for S3-compatible storage, only needed for S3 paths) |
| `--s3-path-style-access` | Use path-style access (for S3-compatible storage, only needed for S3 paths) |
| `--tmp, --temp-dir` | Temporary directory to use for extracting files |
| `-c, --cleanup` | Clean up the temporary directory after import |
| `-h, --help` | Display help information |
## AWS Credentials

You can provide AWS credentials in several ways (only needed when accessing S3):

- Using the `--s3-access-key` and `--s3-secret-key` command-line options
- Using environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`)
- Using the AWS credentials file (`~/.aws/credentials`)
- Using EC2 instance profiles or container roles if running on AWS
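For instance, using the environment-variable option might look like this (the key values and bucket name below are placeholders):

```shell
# Placeholder credentials -- substitute your own, or rely on
# ~/.aws/credentials or an instance profile instead.
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY

java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --table-path s3a://my-bucket/my-delta-table \
  --output-zip ~/exports/my-table-export.zip
```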
## Examples

Export a table from S3:

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --table-path s3a://my-bucket/my-delta-table \
  --output-zip ~/exports/my-table-export.zip
```

Export a table from the local filesystem:

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --table-path file:///path/to/my-delta-table \
  --output-zip ~/exports/my-table-export.zip
```

Export starting from a specific version:

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --table-path s3a://my-bucket/my-delta-table \
  --output-zip ~/exports/my-table-export.zip \
  --from-version 10
```

Export from S3-compatible storage (e.g. MinIO):

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --table-path s3a://my-bucket/my-delta-table \
  --output-zip ~/exports/my-table-export.zip \
  --s3-endpoint https://minio.example.com \
  --s3-path-style-access
```

Import to S3:

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --archive-file ~/exports/my-table-export.zip \
  --target-path s3a://my-bucket/my-imported-delta-table
```

Import to the local filesystem, overwriting any existing table:

```bash
java -jar replicator/target/replicator-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --archive-file ~/exports/my-table-export.zip \
  --target-path file:///path/to/target-delta-table \
  --overwrite
```

## How It Works

### Export

- The application detects whether the Delta table is on S3 or the local filesystem based on the table path
- It connects to the appropriate filesystem using the Hadoop FileSystem API
- It reads Delta log files directly from the source, from the specified starting version up to the latest version available
- Each log entry is parsed to extract data file paths and deletion vector paths
- All required files are downloaded to a temporary local folder
- A ZIP archive is created with the correct internal structure
- The temporary directory is cleaned up if the `--cleanup` option is specified
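The archive-creation step above can be sketched with the JDK's `java.util.zip`. This is a simplified illustration, not the tool's actual implementation, and the class and method names are hypothetical; the detail worth noting is that entry names are paths relative to the table root, so the `_delta_log/` layout survives inside the ZIP:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipDir {
    // Zips every regular file under `root`, keeping paths relative to `root`
    // so the archive preserves the Delta table's directory structure.
    static void zipDirectory(Path root, Path zipFile) throws IOException {
        try (OutputStream os = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(os);
             Stream<Path> files = Files.walk(root)) {
            for (Path p : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
                // Use '/' separators so entries are portable across platforms.
                String entryName = root.relativize(p).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(entryName));
                Files.copy(p, zos);
                zos.closeEntry();
            }
        }
    }
}
```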
### Import

- The ZIP archive is extracted to a temporary directory
- The application detects whether the target location is on S3 or local filesystem based on the path
- It connects to the appropriate filesystem using the Hadoop FileSystem API
- All files from the extracted archive are uploaded to the target location, maintaining the Delta table structure
- If overwrite is specified, any existing table at the location will be overwritten
- If merge-schema is specified, the schema will be merged with any existing table at the location
- The temporary directory is cleaned up if the `--cleanup` option is specified
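The extraction step can likewise be sketched with `java.util.zip`; again this is a simplified stand-in with hypothetical names, not the tool's code. The one detail worth keeping in any real implementation is the guard against entries escaping the target directory ("zip slip"):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipDir {
    // Extracts a ZIP archive into `targetDir`, recreating the directory
    // structure and refusing entries that would escape the target.
    static void extract(Path zipFile, Path targetDir) throws IOException {
        Path target = targetDir.normalize();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zis = new ZipInputStream(in)) {
            for (ZipEntry e; (e = zis.getNextEntry()) != null; ) {
                Path out = target.resolve(e.getName()).normalize();
                if (!out.startsWith(target)) {
                    throw new IOException("Blocked zip entry outside target: " + e.getName());
                }
                if (e.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zis.closeEntry();
            }
        }
    }
}
```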
## License

This project is licensed under the MIT License - see the LICENSE file for details.