paw: Parse and anonymize WSIs
Prospectively parse and anonymize WSIs by dropping files into a folder.
André Lametti, 2025
Quick start
1. Make sure Docker is installed for your operating system. On some systems (e.g. Synology) it is already installed.
2. Download or clone this repository:
    git clone https://codeberg.org/bertogatti/paw.git && cd paw
3. Create an .env file to set the master password for the database (you can modify env.example and rename it to .env):
    # SET THIS
    POSTGRES_PASSWORD=some-very-secure-password
4. Create the default data directories:
    mkdir -p data/identified
    mkdir -p data/anonymized
5. Start your instance:
    docker compose up -d
    # Or, in db-less mode
    docker compose -f docker-compose-standalone.yml up -d
You can now drop WSIs into data/identified; every minute, new files will be
anonymized and output into data/anonymized.
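To confirm that the pipeline is running, one option is to watch the output folder after dropping a slide. The following is a minimal sketch and not part of paw: the function names `count_files` and `wait_for_output` are hypothetical, and the only assumption taken from the steps above is the default data/anonymized folder.

```shell
#!/bin/sh
# Hypothetical helper, not part of paw: wait until a new file shows up
# in the anonymized folder after dropping a slide into data/identified.

# Count regular files under a directory (recursively).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# wait_for_output DIR [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
# Returns 0 as soon as DIR holds more files than when the call started,
# or 1 if TIMEOUT_SECONDS elapse first.
wait_for_output() {
    dir=$1
    timeout=${2:-180}
    interval=${3:-5}
    before=$(count_files "$dir")
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ "$(count_files "$dir")" -gt "$before" ]; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Usage (run from the repository root, after copying a slide):
#   cp /path/to/slide.ndpi data/identified/
#   wait_for_output data/anonymized && echo "anonymized output detected"
```

The default timeout of 180 seconds is an arbitrary choice that leaves room for the one-minute scan interval plus processing time; large slides may need more.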
Database access
With the default configuration, this project exposes a web interface accessible at http://localhost:8080.
Log in with the following credentials (all fields except the password should be pre-filled):
- System: PostgreSQL
- Server: db
- Username: postgres
- Password: the password you set in step 3 above
- Database: wsianon
Advanced usage
paw works best under Docker. The most common modifications to the default
installation are detailed below, but further customization is possible.
Changing the location of the identified and anonymized folders
You can define custom directories for the identified and anonymized folders
through environment variables. As docker-compose.yml reads these from your
.env file, you can add the following definitions:
# SET THIS
POSTGRES_PASSWORD=some-very-secure-password
WSI_IDENTIFIED_FOLDER=/path/to/your/identified/folder
WSI_ANONYMIZED_FOLDER=/path/to/your/anonymized/folder
Make sure these folders have the right permissions (read and write for user 1000:1000, by default).
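Ownership can be checked before starting the containers. The sketch below is not part of paw; `dir_owner` and `check_owner` are hypothetical names, and matching ownership is used here only as a proxy for read and write access (it does not inspect the permission bits themselves).

```shell
#!/bin/sh
# Hypothetical helpers, not part of paw.

# Print the numeric owner of a directory as UID:GID.
dir_owner() {
    ls -ldn "$1" | awk '{ print $3 ":" $4 }'
}

# check_owner DIR [UID:GID]
# Succeeds when DIR is owned by the expected user and group
# (1000:1000 unless stated otherwise).
check_owner() {
    dir=$1
    expected=${2:-1000:1000}
    actual=$(dir_owner "$dir")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $dir is owned by $actual"
    else
        echo "mismatch: $dir is owned by $actual, expected $expected" >&2
        return 1
    fi
}

# Usage:
#   check_owner /path/to/your/identified/folder
#   check_owner /path/to/your/anonymized/folder
```

If a mismatch is reported, `chown -R 1000:1000` on the folder (usually as root) is one way to fix it.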
Adding custom filename parsers
You can define custom parsers to preserve specimen, block and slide numbers in
your anonymized filenames. Custom parsers are plain text files mounted to the
/app/parse/custom.d directory. By default, paw will search for the parse
directory relative to where the docker-compose.yml file is located. This
behaviour can be changed by defining PARSE_DEFS_CUSTOM_DIR in the .env file.
Each file consists of one or more lines, each containing three fields
separated by literal tab characters (Unicode U+0009):
- A regular expression matching the filename you want to parse, with parenthesized groups capturing the relevant information
- A comma-separated list mapping those groups to information categories, in this order:
  - PREFIX: the accession prefix, often letters
  - YEAR: the year, in whichever format you wish to preserve in the accession number
  - CASE: the serial number of the case
  - SPECIMEN: the specimen identifier, often a letter
  - BLOCK: the block identifier, often a number
  - LEVEL: the level identifier, usually a number
  - SCAN_DATE: the date of scanning, ideally in YYYY-MM-DD format
  - SCAN_TIME: the time of scanning
  - EXTENSION: the file extension
- A field that defines the "clean" accession number that will be logged in the database
For example, one may wish to parse the following file name:
00000AS20240987654;A;1;1 - 2024-01-01 00.00.00.ndpi
The following text file would detect and parse similar filenames, assuming the
desired output format for the accession number is AS24987654:
^0+([A-Z]{2})20([0-9]{2})[0-9]([0-9]{6});([A-Z]*);([0-9]*);([0-9]*) - ([0-9-]*) ([0-9.]+)\.(.*)$ \1,\2,\3,\4,\5,\6,\7,\8,\9 \1\2\3
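Because the three fields must be separated by literal tabs, which some editors silently convert to spaces, it can help to write the definition with `printf` and to try the regular expression on a sample name first. The following is a sketch under assumptions: the helper names and the target file name are hypothetical, and `sed -E` is used only as a convenient way to preview the match, which may differ in detail from how paw evaluates the expression.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical and not part of paw.

# The three fields of the example definition above.
regex='^0+([A-Z]{2})20([0-9]{2})[0-9]([0-9]{6});([A-Z]*);([0-9]*);([0-9]*) - ([0-9-]*) ([0-9.]+)\.(.*)$'
groups='\1,\2,\3,\4,\5,\6,\7,\8,\9'
accession='\1\2\3'

# write_parser_def FILE
# printf emits real U+0009 separators, whatever the editor settings.
write_parser_def() {
    printf '%s\t%s\t%s\n' "$regex" "$groups" "$accession" > "$1"
}

# preview_accession FILENAME
# Apply the regex with sed -E to preview the accession number.
# Names that do not match are printed unchanged.
preview_accession() {
    printf '%s\n' "$1" | sed -E "s/$regex/$accession/"
}

# Usage:
#   write_parser_def /path/to/your/custom/parsers/10-mylab   # hypothetical path
#   preview_accession '00000AS20240987654;A;1;1 - 2024-01-01 00.00.00.ndpi'
#   # prints AS24987654
```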
paw will read the files in alphabetical order, from top to bottom, and stop as
soon as a match is found (and will log in which file this occurred). If no match
is found, the default parser will be used, which does not preserve specimen,
block, and slide information:
^([A-Z0-9._-]{3,}[0-9]{2,})[^A-Z0-9].*\.(.*)$ ,,,,,,,,\2 \1
.*\.(.*)$ ,,,,,,,,\1
See parse.example for an example parse definition file.
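The lookup order described above (files in alphabetical order, lines from top to bottom, first match wins) can be illustrated with a short loop. This is a sketch of the described behaviour rather than paw's actual code: `first_matching_def` is a hypothetical name, and `grep -E` is assumed to be a close enough stand-in for the regular expression flavour paw uses.

```shell
#!/bin/sh
# Sketch of the lookup order described above; not paw's actual code.

# first_matching_def FILENAME DIR
# Prints "FILE:LINE" for the first definition whose regex (field 1)
# matches FILENAME, or returns 1 if none does.
first_matching_def() {
    name=$1
    dir=$2
    # Shell globs expand in sorted (collation) order, so definition
    # files are visited alphabetically.
    for def in "$dir"/*; do
        [ -f "$def" ] || continue
        lineno=0
        while IFS= read -r line || [ -n "$line" ]; do
            lineno=$((lineno + 1))
            # The regex is the first tab-separated field of the line.
            regex=$(printf '%s\n' "$line" | cut -f 1)
            [ -n "$regex" ] || continue
            if printf '%s\n' "$name" | grep -Eq -e "$regex"; then
                printf '%s:%s\n' "$def" "$lineno"
                return 0
            fi
        done < "$def"
    done
    return 1
}

# Usage:
#   first_matching_def 'AS24-987654 A1.ndpi' /path/to/your/custom/parsers
```

Prefixing definition file names with numbers (for example 10-, 20-) is one way to make the evaluation order explicit.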
Running paw as a different user
Running paw with a user other than 1000:1000 requires build-time and run-time
configuration.
On Linux-based systems, find out your user and group ID with the id
command-line tool.
To build the Docker image with a user ID and group ID of your choice, run the following, substituting the desired values for $UID and $GID:
docker build --build-arg UID=$UID --build-arg GID=$GID -t paw:latest .
# Or, to automatically use your UID/GID:
docker build --build-arg UID=$(id -u) --build-arg GID=$(id -g) -t paw:latest .
Then, edit docker-compose.yml to run with the correct user (in this example,
1001:1001):
user: "1001:1001"
Finally, start up the project as usual with docker compose up -d. Make sure
the anonymized and identified folders have the correct permissions.
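To keep the build arguments and the compose file consistent, the `user:` value can be generated from the same `id` calls used for the build. A minimal sketch; `compose_user_line` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the docker-compose "user:" entry for the current user, using the
# same id -u / id -g values passed as build arguments above.
compose_user_line() {
    printf 'user: "%s:%s"\n' "$(id -u)" "$(id -g)"
}

compose_user_line
```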
Running paw with an external database
You can connect to an existing database and only run the paw service.
In docker-compose.yml, delete or comment out the db: and adminer: blocks
by prepending # to each line.
Then, comment out the following lines:
depends_on:
- db
And uncomment the following line: network_mode: host.
Provide the connection information to your database in .env:
# SET THIS
POSTGRES_PASSWORD=some-very-secure-password
PSQL_HOST=your.host.com
PSQL_USER=username
PSQL_DB=database-name
PSQL_PORT=9876
Then, start the project as usual with docker compose up -d.
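Before starting the service, it may be worth checking that the connection settings are all present. The sketch below is not part of paw: `missing_db_vars` is a hypothetical helper, and it assumes that all five variables listed above must be set explicitly (some may in fact have defaults).

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of paw.

# missing_db_vars ENV_FILE
# Prints each connection variable that has no non-empty assignment in
# ENV_FILE; returns 1 if any is missing.
missing_db_vars() {
    env_file=$1
    status=0
    for var in POSTGRES_PASSWORD PSQL_HOST PSQL_USER PSQL_DB PSQL_PORT; do
        if ! grep -Eq "^${var}=.+" "$env_file"; then
            printf '%s\n' "$var"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   missing_db_vars .env && docker compose up -d
```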
Non-containerized use
All scripts aim to be POSIX-compliant. If wsi-anon
is installed on your system, you can likely run paw/anonymize.sh with minimal
configuration as long as the PATH_TO_WSI_ANON environment variable is set. See
anonymize.sh for the list of default environment variables.
Installation
See wsi-anon's dependencies for troubleshooting make.
git clone --recurse-submodules https://codeberg.org/bertogatti/paw.git && cd paw
cd wsi-anon
make
Contributing
Pull requests, issues, and emails with patches are all welcome.
Known issues
Double quotes ("): wsi-anon cannot handle " in either input or output
file names, even when properly escaped (it will cause a segmentation fault).
paw escapes double quotes correctly, but such file names will still cause
errors. Single quotes do not appear to be a problem.
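Affected files can be spotted before they reach the anonymizer. A minimal sketch, assuming the default data/identified folder; `list_double_quoted` is a hypothetical helper and not part of paw.

```shell
#!/bin/sh
# Hypothetical helper, not part of paw.

# list_double_quoted DIR
# Prints every file under DIR whose name contains a double quote.
list_double_quoted() {
    find "$1" -type f -name '*"*'
}

# Usage:
#   list_double_quoted data/identified
```

Renaming the listed files before dropping them into the identified folder avoids the problem.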
License
See the license.