Parse and anonymize pathology whole-slide images
paw Initial release for USCAP 2025 2025-03-09 23:42:32 -04:00

paw: Parse and anonymize WSIs

Prospectively parse and anonymize WSIs by dropping files into a folder.

André Lametti, 2025

Quick start

  1. Make sure Docker is installed for your operating system.
    On some systems (e.g. Synology) it is already installed.
  2. Download or clone this repository:
    git clone https://codeberg.org/bertogatti/paw.git && cd paw
    
  3. Create a .env file to set the master password for the database (you can copy env.example to .env and edit it):
    # SET THIS
    POSTGRES_PASSWORD=some-very-secure-password
    
  4. Create the default data directories:
    mkdir -p data/identified
    mkdir -p data/anonymized
    
  5. Start your instance:
    docker compose up -d
    # Or, in db-less mode
    docker compose -f docker-compose-standalone.yml up -d
    

You can now drop WSIs into data/identified; every minute, new files will be anonymized and output into data/anonymized.
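Steps 3 and 4 can also be scripted. The following is a minimal sketch and is not part of paw: the `paw_prepare` function name is hypothetical, and it assumes `od` and `/dev/urandom` are available to generate a random password. It leaves an existing .env untouched.

```shell
# paw_prepare DIR: create DIR/.env (if missing) and the default data folders.
# Hypothetical helper, not shipped with paw.
paw_prepare() {
    dir=${1:-.}
    if [ ! -f "$dir/.env" ]; then
        # 16 random bytes rendered as 32 hexadecimal characters
        password=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
        # Restrict the file to the current user, since it holds a secret
        ( umask 077 && printf 'POSTGRES_PASSWORD=%s\n' "$password" > "$dir/.env" )
    fi
    mkdir -p "$dir/data/identified" "$dir/data/anonymized"
}

# Example: prepare a scratch directory instead of a real checkout
demo_dir=$(mktemp -d)
paw_prepare "$demo_dir"
ls -A "$demo_dir"
```

In a real checkout you would call `paw_prepare .` from the repository root before running docker compose up -d.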

Database access

With the default configuration, this project exposes a web interface accessible at http://localhost:8080.

Log in with the following credentials (all fields except the password should be pre-filled):

  • System: PostgreSQL
  • Server: db
  • Username: postgres
  • Password: the password you set in step 3 above
  • Database: wsianon

Advanced usage

paw works best under Docker. The most common modifications to the default installation are detailed below, but endless customization is possible.

Changing the location of the identified and anonymized folders

You can define custom directories for the identified and anonymized folders through environment variables. As docker-compose.yml reads these from your .env file, you can add the following definitions:

# SET THIS
POSTGRES_PASSWORD=some-very-secure-password
WSI_IDENTIFIED_FOLDER=/path/to/your/identified/folder
WSI_ANONYMIZED_FOLDER=/path/to/your/anonymized/folder

Make sure these folders have the right permissions (read and write for user 1000:1000, by default).
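One way to check folder ownership before starting paw is sketched below. The `owner_of` and `check_owner` helpers are hypothetical and not part of paw. Ownership is only one way of granting read and write access, so treat a failure as a hint rather than a definitive diagnosis.

```shell
# owner_of PATH: print the numeric "uid:gid" that owns PATH.
owner_of() {
    # -n prints numeric IDs; -d lists the directory itself, not its contents
    ls -ldn -- "$1" | awk '{ print $3 ":" $4 }'
}

# check_owner PATH [UID:GID]: succeed if PATH is owned by UID:GID (default 1000:1000).
check_owner() {
    expected=${2:-1000:1000}
    actual=$(owner_of "$1") || return 1
    if [ "$actual" != "$expected" ]; then
        printf '%s is owned by %s, expected %s\n' "$1" "$actual" "$expected" >&2
        return 1
    fi
}

# Example: inspect a scratch directory
demo_dir=$(mktemp -d)
echo "owner: $(owner_of "$demo_dir")"
check_owner "$demo_dir" || echo "fix with: chown -R 1000:1000 $demo_dir (may require root)"
```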

Adding custom filename parsers

You can define custom parsers to preserve specimen, block and slide numbers in your anonymized filenames. Custom parsers are plain text files mounted to the /app/parse/custom.d directory. By default, paw will search for the parse directory relative to where the docker-compose.yml file is located. This behaviour can be changed by defining PARSE_DEFS_CUSTOM_DIR in the .env file.

Each file consists of one or more lines, each containing three fields separated by literal tab characters (Unicode U+0009):

  • A regular expression matching the filename you want to parse, with parenthesized (capture) groups delimiting the relevant information
  • A comma-separated list to match groups with information categories, in order:
    • PREFIX: the accession prefix, often letters
    • YEAR: the year, in whichever format you wish to preserve in the accession number
    • CASE: the serial number of the case
    • SPECIMEN: the specimen identifier, often a letter
    • BLOCK: the block identifier, often a number
    • LEVEL: the level identifier, usually a number
    • SCAN_DATE: the date of scanning, ideally in YYYY-MM-DD format
    • SCAN_TIME: the time of scanning
    • EXTENSION: the file extension
  • A field that defines the "clean" accession number that will be logged in the database

For example, one may wish to parse the following file name: 00000AS20240987654;A;1;1 - 2024-01-01 00.00.00.ndpi.

The following text file would detect and parse similar filenames, assuming the desired output format for the accession number is AS24987654:

^0+([A-Z]{2})20([0-9]{2})[0-9]([0-9]{6});([A-Z]*);([0-9]*);([0-9]*) - ([0-9-]*) ([0-9.]+)\.(.*)$	\1,\2,\3,\4,\5,\6,\7,\8,\9	\1\2\3
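Because the tabs must be literal, it can help to generate the definition file with `printf` and to try the pattern before deploying it. The sketch below assumes `sed -E` (extended regular expressions) is a reasonable stand-in for paw's own matching, which may differ. The variable names and the temporary file are illustrative only.

```shell
# The three fields of the example definition line
regex='^0+([A-Z]{2})20([0-9]{2})[0-9]([0-9]{6});([A-Z]*);([0-9]*);([0-9]*) - ([0-9-]*) ([0-9.]+)\.(.*)$'
categories='\1,\2,\3,\4,\5,\6,\7,\8,\9'
accession='\1\2\3'

parse_file=$(mktemp)
# printf expands \t in the format string only, so the fields are written verbatim
printf '%s\t%s\t%s\n' "$regex" "$categories" "$accession" > "$parse_file"

# Every line should contain exactly three tab-separated fields
awk -F '\t' 'NF != 3 { bad = 1 } END { exit bad }' "$parse_file" && echo "format OK"

# Apply the pattern to the sample filename (sed -E is an assumption, see above)
name='00000AS20240987654;A;1;1 - 2024-01-01 00.00.00.ndpi'
parsed_accession=$(printf '%s\n' "$name" | sed -E "s/$regex/$accession/")
parsed_categories=$(printf '%s\n' "$name" | sed -E "s/$regex/$categories/")
printf 'accession: %s\ncategories: %s\n' "$parsed_accession" "$parsed_categories"
```

For the sample filename, this yields the accession number AS24987654 and the category values AS,24,987654,A,1,1,2024-01-01,00.00.00,ndpi.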

paw reads the files in alphabetical order, and each file from top to bottom, stopping at the first match (it logs which file matched). If no match is found, the default parser is used, which does not preserve specimen, block, and slide information:

^([A-Z0-9._-]{3,}[0-9]{2,})[^A-Z0-9].*\.(.*)$	,,,,,,,,\2	\1
.*\.(.*)$	,,,,,,,,\1

See the example parse definition file.
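The first-match-wins lookup described above can be illustrated with a short sketch. This is not paw's implementation: the `first_match` function and the two definition files are hypothetical, and `grep -E` is assumed as the matching engine. Numeric filename prefixes are one way to control the alphabetical order.

```shell
# first_match DIR NAME: print the first definition file in DIR with a line matching NAME.
first_match() {
    tab=$(printf '\t')
    for def in "$1"/*; do               # glob results are sorted
        [ -f "$def" ] || continue
        # Only the first field (the regular expression) is needed here
        while IFS="$tab" read -r regex _categories _accession || [ -n "$regex" ]; do
            [ -n "$regex" ] || continue
            if printf '%s\n' "$2" | grep -Eq -- "$regex"; then
                printf '%s\n' "$def"
                return 0
            fi
        done < "$def"
    done
    return 1                            # no match: a default parser would apply
}

# Example with two hypothetical definition files
defs=$(mktemp -d)
printf '%s\t%s\t%s\n' '^([A-Z]{2})([0-9]{2})-([0-9]+)\.(.*)$' '\1,\2,\3,,,,,,\4' '\1\2\3' > "$defs/10-lab-a"
printf '%s\t%s\t%s\n' '^([A-Z]+).*\.(.*)$' '\1,,,,,,,,\2' '\1' > "$defs/20-lab-b"
first_match "$defs" 'AS24-987654.ndpi'    # both files match; 10-lab-a comes first
first_match "$defs" 'SLIDE_7.svs'         # only 20-lab-b matches
```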

Running paw as a different user

Running paw with a user other than 1000:1000 requires build-time and run-time configuration.

On Linux-based systems, find your user and group IDs with the id command-line tool.

To build the docker image with a user id and group id of your choice, run:

# With explicit IDs (in this example, 1001:1001):
docker build --build-arg UID=1001 --build-arg GID=1001 -t paw:latest .
# Or, to automatically use your UID/GID:
docker build --build-arg UID="$(id -u)" --build-arg GID="$(id -g)" -t paw:latest .

Then, edit docker-compose.yml to run with the correct user (in this example, 1001:1001):

user: "1001:1001"

Finally, start up the project as usual with docker compose up -d. Make sure the anonymized and identified folders have the correct permissions.

Running paw with an external database

You can connect to an existing database and only run the paw service.

In docker-compose.yml, delete or comment out the db: and adminer: blocks by prepending # to each line.

Then, comment out the following lines:

depends_on:
  - db

And uncomment the following line: network_mode: host.

Provide the connection information to your database in .env:

# SET THIS
POSTGRES_PASSWORD=some-very-secure-password
PSQL_HOST=your.host.com
PSQL_USER=username
PSQL_DB=database-name
PSQL_PORT=9876

Then, start the project as usual with docker compose up -d.
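Before starting, you may want to confirm that the connection variables are all present. The sketch below uses a hypothetical `missing_env_keys` helper that is not part of paw. It only checks that each key has a non-empty value in the file and says nothing about whether the database is reachable.

```shell
# missing_env_keys FILE KEY...: print each KEY that has no non-empty value in FILE.
missing_env_keys() {
    file=$1
    shift
    for key in "$@"; do
        # A key counts as set if a line starts with KEY= followed by at least one character
        grep -Eq "^${key}=.+" "$file" || printf '%s\n' "$key"
    done
}

# Example with a deliberately incomplete file
env_file=$(mktemp)
printf '%s\n' 'POSTGRES_PASSWORD=some-very-secure-password' \
    'PSQL_HOST=your.host.com' 'PSQL_PORT=9876' > "$env_file"
missing=$(missing_env_keys "$env_file" POSTGRES_PASSWORD PSQL_HOST PSQL_USER PSQL_DB PSQL_PORT)
printf '%s\n' "$missing"
```

Here the helper reports PSQL_USER and PSQL_DB, the two keys left out of the sample file.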

Non-containerized use

All scripts aim to be POSIX-compliant. If wsi-anon is installed on your system, you can likely run paw/anonymize.sh with minimal configuration, as long as the PATH_TO_WSI_ANON environment variable is set. See anonymize.sh for the list of environment variables and their defaults.

Installation

If make fails, see wsi-anon's list of dependencies.

git clone --recurse-submodules https://codeberg.org/bertogatti/paw.git && cd paw
cd wsi-anon
make

Contributing

Pull requests, issues, and emails with patches are all welcome.

Known issues

Double quotes ("): wsi-anon cannot handle " in either input or output file names, even when properly escaped (it causes a segmentation fault). paw escapes double quotes correctly, but such files will still fail to process. Single quotes do not appear to be a problem.
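Until this is resolved, one workaround is to look for affected files before dropping them into the identified folder. The `list_quoted` helper below is a hypothetical sketch and not part of paw.

```shell
# list_quoted DIR: print files under DIR whose names contain a double quote.
list_quoted() {
    find "$1" -type f -name '*"*'
}

# Example: one safe and one problematic filename in a scratch directory
scan_dir=$(mktemp -d)
: > "$scan_dir/slide-1.ndpi"
: > "$scan_dir/slide \"2\".ndpi"
list_quoted "$scan_dir"
```

Only the second file is listed. Renaming such files before they are processed avoids the crash.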

License

See the license.