- Edit the configuration in `config/default.json` and the custom environment variable names in `config/custom-environment-variables.json` (see the example config below).
- Application constants can be configured in `./constants.js`.
- Since the data to download and process is huge, it is better (and safer) to use two separate tools instead of a single script, so that if something goes wrong during processing, the damage is minimised.
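As a rough illustration only (the key names below are assumptions; the actual keys are defined by this project and may differ), `config/default.json` typically holds the PostgreSQL connection and the download directory:

```json
{
  "db": {
    "host": "localhost",
    "port": 5432,
    "database": "datasets",
    "user": "postgres",
    "password": ""
  },
  "downloadDir": "./data"
}
```

The matching `config/custom-environment-variables.json` maps selected keys to environment variable names, which is how the `config` package lets the environment override them at runtime:

```json
{
  "db": {
    "user": "DB_USER",
    "password": "DB_PASSWORD"
  }
}
```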
- Run `npm run download-data` to download all available datasets (a minimal download sketch is shown below).
- The datasets will be stored in the configured directory.
- Old data will be replaced.
- This operation does not affect the database.
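A hedged sketch of what downloading a single dataset might look like (the actual implementation in this project may differ; `downloadFile` and its parameters are illustrative only). Note that `fs.createWriteStream` opens the target with the `w` flag by default, which is why old data is simply replaced:

```js
const fs = require('fs');
const https = require('https');
const path = require('path');

// Download one dataset file into the configured directory, replacing any old copy.
function downloadFile(url, targetDir, fileName) {
  return new Promise((resolve, reject) => {
    const target = path.join(targetDir, fileName);
    const file = fs.createWriteStream(target); // 'w' flag: an existing file is overwritten

    https.get(url, (res) => {
      if (res.statusCode !== 200) {
        reject(new Error(`Unexpected status ${res.statusCode} for ${url}`));
        return;
      }
      res.pipe(file);
      file.on('finish', () => file.close(() => resolve(target)));
    }).on('error', reject);
  });
}
```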
- Run `npm run import-data` to import all data using the files downloaded in the previous step (a streaming import sketch is shown below).
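A minimal sketch of how such a streaming import can work, assuming the `csv-parse` and `pg` packages and a hypothetical `addresses` table with hypothetical column names (the project's actual libraries, schema, and batching strategy may differ):

```js
const fs = require('fs');
const { parse } = require('csv-parse');
const { Pool } = require('pg');

const pool = new Pool(); // connection details come from configuration / environment

// Stream one CSV file into PostgreSQL row by row to keep memory usage flat.
async function importCsv(filePath, hasHeader) {
  const parser = fs.createReadStream(filePath).pipe(
    parse({
      // Headerless files (e.g. the foreign addresses dataset) need explicit column names.
      columns: hasHeader ? true : ['id', 'street', 'city', 'country'],
      relax_column_count: true,
    })
  );

  for await (const row of parser) {
    await pool.query(
      'INSERT INTO addresses (id, street, city, country) VALUES ($1, $2, $3, $4)',
      [row.id, row.street, row.city, row.country]
    );
  }
}
```

Row-by-row inserts keep memory usage flat; a real import would usually batch rows or use `COPY` for speed.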
Before starting the application, make sure that PostgreSQL is running and that you have configured everything correctly in `config/default.json`.
- Install dependencies: `npm i`
- Run the lint check: `npm run lint`
- Start the app: `npm start`. This will run all tools in the following sequence: `npm run download-data` => `npm run import-data` (see the scripts sketch below).
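How `npm start` chains the two tools, and how the memory limit gets applied, lives in the `package.json` scripts. The exact entries and file names in this project may differ; a plausible setup looks roughly like this:

```json
{
  "scripts": {
    "lint": "eslint .",
    "download-data": "node ./download-data.js",
    "import-data": "node --max-old-space-size=4096 ./import-data.js",
    "start": "npm run download-data && npm run import-data"
  }
}
```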
The application will print progress information and the results in the terminal.
- To verify that the data is imported, you can use the pgAdmin tool and browse the database (or run a quick query from Node, as in the snippet after these notes).
- The total size of all datasets is over 1.5 GB, so the operation will take quite some time to finish, depending on your internet connection.
- `max_old_space_size` has been set to 4096 MB to allow parsing/processing such huge data files without any issues. The app cleans up the memory right after using the data to prevent memory/heap leaks.
- The dataset for `FOREIGN ADDRESSES` doesn't have a header row in its CSV file and has a slightly different format (an extra column). The app handles all datasets without any issue.
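For a quick programmatic check of the import (instead of browsing in pgAdmin), a count query can be run from Node with the `pg` package; the `addresses` table name below is a placeholder for whatever table the import actually targets:

```js
const { Pool } = require('pg');

(async () => {
  const pool = new Pool(); // reads PG* environment variables, or pass the config explicitly
  const { rows } = await pool.query('SELECT COUNT(*) AS total FROM addresses');
  console.log(`Imported rows: ${rows[0].total}`);
  await pool.end();
})();
```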