This repository contains all the code for creating, visualizing and evaluating the REVEAL dataset!
Required Python version: 3.10
The dataset is available at https://doi.org/10.17026/PT/DITX0A, and the paper at https://doi.org/10.1016/j.fsidi.2025.302006.
> [!NOTE]
> The dataset is currently still being uploaded.
The steganography tools used to create this dataset are not included in this repository and can be downloaded separately. However, the apply_modifications_and_generate_stego_commands.py file does contain all commands used to create stego images with the command-line tools. To create stego images with GUI-based tools like Hide'N'Send, Microsoft Power Automate was used to automate clicking the various buttons. Unfortunately, this application did not provide a way to share the automation scripts, but the GIF below gives an impression of how this was done.
If you want to generate your own dataset, the scripts we used to create ours are included in this repository as well. The steps below explain the process in more detail.
The first step in creating a new dataset is to generate a database master file containing the ground-truth information and metadata for each image file in the dataset. This step also randomly generates the sequence of operations to apply to each image. The file can be generated by running:
python "creation/generate_database_master_file.py" -p "/path/to/picture/dir" -c "/path/to/camera/info" -t "/path/to/tools/info" -o "/path/to/output/dir"

- -p: the path to a directory containing all original images to include in the dataset. This directory should contain a separate folder of images for each distinct camera.
- -c: the path to a .csv file containing metadata on the cameras used to take the images (see 'CameraInfo.csv' for an example).
- -t: the path to a .xlsx file containing information on the stego tools to include in the dataset (see 'ToolsRunInfo.xlsx' for an example).
- -o: the path to the directory where the database master file will be saved.
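Because -p expects one subfolder per camera, it can help to sanity-check the directory layout before generating the master file. The sketch below is not part of this repository: summarize_camera_dirs and the IMAGE_EXTENSIONS set are hypothetical, and the code relies only on the one-folder-per-camera layout described above.

```python
from pathlib import Path

# Hypothetical extension list; adjust it to the formats in your collection.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".dng"}


def summarize_camera_dirs(picture_dir: str) -> dict[str, int]:
    """Return {camera_folder: image_count} for every subfolder of picture_dir.

    Raises ValueError when image files sit directly in picture_dir, because
    the expected layout is one folder per camera.
    """
    root = Path(picture_dir)
    if not root.is_dir():
        raise NotADirectoryError(picture_dir)

    def is_image(path: Path) -> bool:
        # Only regular files with a known image extension are counted.
        return path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS

    loose = sorted(p.name for p in root.iterdir() if is_image(p))
    if loose:
        raise ValueError(f"images outside camera folders: {loose[:5]}")

    return {
        camera_dir.name: sum(1 for f in camera_dir.iterdir() if is_image(f))
        for camera_dir in sorted(p for p in root.iterdir() if p.is_dir())
    }
```

For example, `summarize_camera_dirs("/path/to/picture/dir")` returns a per-camera image count, which makes an empty or misnamed camera folder easy to spot.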
The second step is to generate the modified images described in the database master file, along with the resources needed to run the stego tools. This can be done by running:
python "creation/apply_modifications_and_generate_stego_commands.py" \
-d "/path/to/database/master/file" \
-t "/path/to/tools/info" \
-or "/path/to/original/picture/dir" \
-m "/path/to/modified/picture/dir" \
-s "/path/to/stego/picture/dir" \
-mes "/path/to/message/dir" \
-l "/path/to/stego/tools/linux" \
-o "/path/to/output/dir"

- -d: the path to the database master file.
- -t: the path to a .xlsx file containing information on the stego tools to include in the dataset (see 'ToolsRunInfo.xlsx' for an example).
- -or: the path to the directory containing all original pictures. This directory should contain a separate folder of images for each distinct camera.
- -m: the path to the directory where all modified pictures will be stored.
- -s: the path to the directory where all stego pictures will be stored.
- -mes: the path to the directory where all messages will be stored.
- -l: the path to the directory containing all stego tools that run on Linux. Each tool should be stored in a separate folder in this directory; this is used to automatically format the commands that run each tool on the correct images.
- -o: the path to the directory where the database master file will be saved.
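Since -l expects each Linux tool in its own folder, a quick check like the hypothetical helper below can catch missing tools before commands are generated. It assumes each folder is named exactly after its tool. The tool names would come from ToolsRunInfo.xlsx, but reading that file is left out here because its column layout is not described in this README.

```python
from pathlib import Path


def missing_tool_folders(linux_tool_dir: str, tool_names: list[str]) -> list[str]:
    """Return the tool names that have no folder of the same name in linux_tool_dir.

    Hypothetical helper: assumes each tool folder is named exactly like the tool.
    Plain files with a matching name do not count as a tool folder.
    """
    root = Path(linux_tool_dir)
    present = {p.name for p in root.iterdir() if p.is_dir()}
    return [name for name in tool_names if name not in present]
```

An empty result means every listed tool has a folder; anything returned should be installed before running the command-generation script.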
If you want to add new stego tools, perform the following steps:
- Make sure the database master file was generated with an up-to-date tools info file that includes the new stego tools.
- (For Linux tools) Ensure the new stego tools are located in the Linux tool directory passed to apply_modifications_and_generate_stego_commands.py with -l.
- (For Linux tools) Add a template string for running the stego tool to the write_tool_command function in the "creation/apply_modifications_and_generate_stego_commands.py" file.
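The sketch below illustrates the idea of a per-tool command template filled in with concrete paths for one image. It is not the actual write_tool_command implementation: TOOL_TEMPLATES, format_tool_command, the tool name example_tool and its embed/-i/-m/-o arguments are all hypothetical placeholders. Paths are passed through shlex.quote so that file names containing spaces survive being written into a shell-style command string.

```python
import shlex

# Hypothetical template table for illustration only. The real templates live in
# write_tool_command and use their own tool names, arguments and placeholders.
TOOL_TEMPLATES = {
    "example_tool": "{tool_dir}/example_tool embed -i {cover} -m {message} -o {stego}",
}


def format_tool_command(tool: str, tool_dir: str, cover: str, message: str, stego: str) -> str:
    """Fill in the template for `tool` with shell-quoted paths for one image."""
    if tool not in TOOL_TEMPLATES:
        raise KeyError(f"no command template registered for {tool!r}")
    return TOOL_TEMPLATES[tool].format(
        tool_dir=shlex.quote(tool_dir),
        cover=shlex.quote(cover),
        message=shlex.quote(message),
        stego=shlex.quote(stego),
    )
```

Keeping one template per tool means adding a tool only requires a new entry; the per-image paths are filled in the same way for every tool.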
The apply_modifications_and_generate_stego_commands.py script also generates a script file that, when executed, runs all Linux stego tools on their corresponding images. For instance:
python commands_linux.py

The GUI-based tools need to be run manually on the corresponding images, or with GUI automation software. We did this with Microsoft Power Automate, but unfortunately the software provided no way to share the automation script.
After all stego images have been generated, several checks can be run to verify that dataset creation succeeded:
- All stego image hashes should be different from the hashes of their cover images.
- All hashes of images with preprocessing operations applied should be different from the hashes of their original images.
- All stego image hashes in the control condition should be equal to the hashes of their cover images.
- All hashes of modified images without any preprocessing operations should be equal to the hashes of their original images.
- All hashes in the dataset should be unique.
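The five checks above can be expressed as a small function over per-image hash records. This is a sketch under assumptions, not the logic of verify_stego_file_creation.py: the ImageRecord fields are hypothetical stand-ins for master-file columns, the cover image is taken to be the (possibly preprocessed) modified image, and the uniqueness check is applied to stego outputs only, because control images and images without preprocessing are expected to share hashes with their sources by design.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    # Hypothetical fields; the real master file may name and organize these differently.
    image_id: str
    original_hash: str       # hash of the untouched camera image
    cover_hash: str          # hash of the modified (possibly preprocessed) cover image
    stego_hash: str          # hash of the image written by the stego tool
    has_preprocessing: bool  # True if at least one preprocessing operation was applied
    is_control: bool         # True for the control condition (stego expected to equal cover)


def check_records(records: list[ImageRecord]) -> list[str]:
    """Return a description of every failed check (an empty list means all passed)."""
    problems = []
    for r in records:
        # Stego vs. cover: must differ, except in the control condition.
        if r.is_control:
            if r.stego_hash != r.cover_hash:
                problems.append(f"{r.image_id}: control stego differs from cover")
        elif r.stego_hash == r.cover_hash:
            problems.append(f"{r.image_id}: stego identical to cover")
        # Cover vs. original: must differ only if preprocessing was applied.
        if r.has_preprocessing:
            if r.cover_hash == r.original_hash:
                problems.append(f"{r.image_id}: preprocessing left image unchanged")
        elif r.cover_hash != r.original_hash:
            problems.append(f"{r.image_id}: unpreprocessed image differs from original")
    # Uniqueness (assumed scope): every stego output should have a distinct hash.
    for digest, count in Counter(r.stego_hash for r in records).items():
        if count > 1:
            problems.append(f"hash {digest} occurs {count} times")
    return problems
```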
To automatically perform these checks, you can use the following script:
python "creation/verify_stego_file_creation.py" -d "/path/to/database/master/file" -o "/path/to/original/picture/dir" -m "/path/to/modified/picture/dir" -s "/path/to/stego/picture/dir"

- -d: the path to the database master file.
- -o: the path to the directory containing all original pictures. This directory should contain a separate folder of images for each distinct camera.
- -m: the path to the directory where all modified pictures are stored.
- -s: the path to the directory where all stego pictures are stored.
CC BY-SA 4.0
