This automation downloads the following resources from the Confluence API:
- Spaces
- Pages
- Attachments
- Blogposts
- Comments
- Tasks
The structured data is stored in Google BigQuery, and the attachments are stored in Google Cloud Storage.
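The Confluence v2 API pages through each of these resources with a cursor carried in the response's `_links.next` field. A minimal download loop could be sketched as below; the `fetch_all` and `next_cursor` names are illustrative helpers, not part of this project:

```python
import base64
import json
import os
import urllib.parse
import urllib.request

# Assumed configuration, matching the .env variables described below.
BASE_URL = os.environ.get("BASE_URL", "https://your-domain.atlassian.net/wiki/api/v2/")
EMAIL = os.environ.get("EMAIL", "")
API_TOKEN = os.environ.get("API_TOKEN", "")


def next_cursor(body: dict):
    """Extract the pagination cursor from a v2 response's `_links.next`, or None."""
    link = body.get("_links", {}).get("next")
    if not link:
        return None
    query = urllib.parse.urlsplit(link).query
    return urllib.parse.parse_qs(query).get("cursor", [None])[0]


def fetch_all(resource: str, limit: int = 100):
    """Yield every item of a paged v2 resource (pages, blogposts, spaces, ...)."""
    auth = base64.b64encode(f"{EMAIL}:{API_TOKEN}".encode()).decode()
    cursor = None
    while True:
        params = {"limit": str(limit)}
        if cursor:
            params["cursor"] = cursor
        url = BASE_URL + resource + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers={"Authorization": f"Basic {auth}"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = json.load(resp)
        yield from body.get("results", [])
        cursor = next_cursor(body)
        if not cursor:
            break
```

Usage would be e.g. `for page in fetch_all("pages"): ...`, repeated for each resource listed above.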
Create a `.env` file in the project root directory and set the following variables:

```
BASE_URL="https://{your-atlassian-domain}/wiki/api/v2/"
EMAIL="<email with access to confluence>"
API_TOKEN="<API token>"
PAGES_TABLE="pages"
COMMENTS_TABLE="comments"
ATTACHMENTS_TABLE="attachment"
SPACES_TABLE="spaces"
TASKS_TABLE="tasks"
BLOGPOSTS_TABLE="blogposts"
PROJECT_NAME="<GCP Project name>"
DATASET="<BigQuery dataset name>"
GCP_STORAGE_BUCKET="<Cloud Storage bucket name>"
GOOGLE_APPLICATION_CREDENTIALS="<path to your GCP service account JSON>"
GCP_LOGGING_SERVICE_NAME="<Cloud Logging logger name>"
TMP_DOWNLOADS_FOLDER="<downloads folder>"
```
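The application needs all of these variables at runtime. A small, hypothetical startup check (the `REQUIRED` list and `missing_vars` helper are illustrative, not part of the project) could catch an incomplete `.env` early:

```python
import os

# Illustrative subset of settings the automation cannot run without.
REQUIRED = [
    "BASE_URL", "EMAIL", "API_TOKEN",
    "PROJECT_NAME", "DATASET", "GCP_STORAGE_BUCKET",
    "GOOGLE_APPLICATION_CREDENTIALS",
]


def missing_vars(env=None) -> list:
    """Return the required keys that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED if not env.get(key)]
```

Calling `missing_vars()` at startup and aborting when the list is non-empty gives a clearer error than a failure deep inside the BigQuery client.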
Grant the following IAM roles to your service account:
- Storage Object Creator
- Storage Object User
- Storage Object Viewer
- BigQuery Admin
- BigQuery Data Editor
- BigQuery Data Owner
- BigQuery Data Viewer
- BigQuery Job User
- Logs Writer
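The roles above can be granted with `gcloud`; the project and service-account names below are placeholders, and the loop only prints each command (via `echo`) so you can review it before running:

```shell
#!/bin/sh
# Placeholder values; substitute your own project and service account.
PROJECT="my-gcp-project"
SA="confluence-export@my-gcp-project.iam.gserviceaccount.com"

# IAM role IDs corresponding to the roles listed above.
for ROLE in roles/storage.objectCreator roles/storage.objectUser \
            roles/storage.objectViewer roles/bigquery.admin \
            roles/bigquery.dataEditor roles/bigquery.dataOwner \
            roles/bigquery.dataViewer roles/bigquery.jobUser \
            roles/logging.logWriter; do
  # Remove the `echo` to actually apply the binding.
  echo gcloud projects add-iam-policy-binding "$PROJECT" \
       --member="serviceAccount:$SA" --role="$ROLE"
done
```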
Run the application from a terminal in the project folder:

```
docker compose up
```
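For orientation, a minimal compose file for this setup might look like the sketch below; the service name, build context, and credentials mount are assumptions, so defer to the actual `docker-compose.yml` in the repo:

```yaml
# Hypothetical sketch, not the project's actual compose file.
services:
  exporter:
    build: .            # assumes a Dockerfile in the project root
    env_file: .env      # the variables described above
    volumes:
      # Mount the service-account key at the path named in
      # GOOGLE_APPLICATION_CREDENTIALS inside the container.
      - ./service-account.json:/app/service-account.json:ro
```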
Refer to the LICENSE file for terms of use.
Contributions are very welcome: open a PR with your additions and request a review.
If you encounter problems, feel free to open an issue.