A Large-Scale Comprehensive Image Dataset for Steganalysis

NetherlandsForensicInstitute/REVEAL

REVEAL: A Large-Scale Comprehensive Dataset for Steganalysis

This repository contains all the code for creating, visualizing and evaluating the REVEAL dataset!

Required Python version == 3.10

Dataset

The dataset is available at https://doi.org/10.17026/PT/DITX0A. The accompanying paper is available at https://doi.org/10.1016/j.fsidi.2025.302006.

[!NOTE] The dataset is still being uploaded.

Dataset Creation

The steganography tools used to create this dataset are not included in this repository and can be downloaded separately. However, the apply_modifications_and_generate_stego_commands.py file contains all commands that were used to create stego images with the command-line tools. To create stego images with GUI-based tools such as Hide'N'Send, Microsoft Power Automate was used to automate the button clicks. Unfortunately, this application did not provide a way to share the automation scripts, but the GIF below gives an impression of how this was done.

HideNSendEmbedding.gif

Generating your own dataset

If you wish to generate your own dataset, the scripts we used to create ours are provided in this repository as well. The steps below describe the process in detail.

1. Generate a database master file

The first step when creating a new dataset is to generate a new database master file containing the ground truth information and metadata of each image file in the dataset. This step also randomly generates the sequence of operations to apply to each image. The file can be generated by running:

python "creation/generate_database_master_file.py" -p "/path/to/picture/dir" -c "/path/to/camera/info" -t "/path/to/tools/info" -o "/path/to/output/dir"
  • -p: the path to a directory containing all original images that should be included in the dataset. This directory should contain a separate folder with images per distinct camera.
  • -c: the path to a .csv file containing metadata on the cameras that were used to take the images (see 'CameraInfo.csv' for an example).
  • -t: the path to a .xlsx file containing information on the stego tools that should be included in the dataset (see 'ToolsRunInfo.xlsx' for an example).
  • -o: the path to the directory where the database master file will be saved.
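
Before running the script, it can help to confirm that the picture directory follows the one-folder-per-camera layout described for -p. The sketch below is not part of the repository; the function name summarize_camera_dirs and the set of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats in your collection.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def summarize_camera_dirs(picture_dir):
    """Return {camera_folder_name: image_count} for each subfolder of picture_dir.

    Raises ValueError if image files sit directly in picture_dir, since the
    expected layout places every image inside a per-camera folder.
    """
    root = Path(picture_dir)
    stray = [p.name for p in root.iterdir()
             if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS]
    if stray:
        raise ValueError(f"Images outside a camera folder: {sorted(stray)}")

    summary = {}
    for camera_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count only files with a recognized image extension (hypothetical filter).
        summary[camera_dir.name] = sum(
            1 for f in camera_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return summary
```

Calling summarize_camera_dirs("/path/to/picture/dir") returns a dictionary mapping each camera folder to its image count, which makes empty or misplaced folders easy to spot before the master file is generated.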

2. Apply modifications and generate stego commands

The second step is to generate the modified images as described in the database master file and generate the resources needed to run the stego tools. This can be done by running this script:

python "creation/apply_modifications_and_generate_stego_commands.py" \
        -d "/path/to/database/master/file" \
        -t "/path/to/tools/info" \
        -or "/path/to/original/picture/dir" \
        -m "/path/to/modified/picture/dir" \
        -s "/path/to/stego/picture/dir" \
        -mes "/path/to/message/dir" \
        -l "/path/to/stego/tools/linux" \
        -o "/path/to/output/dir" 
  • -d: the path to the database master file.
  • -t: the path to a .xlsx file containing information on the stego tools that should be included in the dataset (see 'ToolsRunInfo.xlsx' for an example).
  • -or: the path to the directory containing all original pictures. This directory should contain a separate folder with images per distinct camera.
  • -m: the path to the directory where all modified pictures will be stored.
  • -s: the path to the directory where all stego pictures will be stored.
  • -mes: the path to the directory where all messages will be stored.
  • -l: the path to the directory where all stego tools that run on Linux are stored. Each tool should be stored in a separate folder in this directory. This path is used to automatically format the commands that run each tool on the correct images.
  • -o: the path to the directory where the database master file will be saved.

2.1 Adding new stego tools

If you want to add new stego tools, perform the following steps:

  • Make sure the database master file was generated with an up-to-date tools info file that contains the new stego tools.
  • (For Linux tools) Ensure the new stego tools are located in the Linux tool directory that is passed to the script above.
  • (For Linux tools) Add a template string for running the stego tool to the write_tool_command function in the "creation/apply_modifications_and_generate_stego_commands.py" file.
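
To give an idea of what a command template string might look like, here is a minimal sketch. It is not taken from the repository: the constant STEGHIDE_TEMPLATE, the helper format_tool_command and the placeholder names are hypothetical, and the actual format expected by write_tool_command may differ. steghide is used only as a familiar command-line example (its embed flags -cf, -ef, -sf and -p are real); its use here does not imply that it is part of the dataset.

```python
import shlex

# Hypothetical template, for illustration only. The steghide flags are real:
# -cf cover file, -ef file to embed, -sf output stego file, -p passphrase.
STEGHIDE_TEMPLATE = "steghide embed -cf {cover} -ef {message} -sf {stego} -p {password}"


def format_tool_command(template, **values):
    """Fill a command template, shell-quoting every value."""
    quoted = {name: shlex.quote(str(value)) for name, value in values.items()}
    return template.format(**quoted)


# Example paths are made up; the space in the cover path shows why quoting matters.
command = format_tool_command(
    STEGHIDE_TEMPLATE,
    cover="/data/modified/cam 1/img.jpg",
    message="/data/messages/msg.txt",
    stego="/data/stego/img.jpg",
    password="secret",
)
print(command)
```

Quoting each value with shlex.quote keeps paths that contain spaces or shell metacharacters from breaking the generated command.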

3. Create stego images

The script above generates a script file for all Linux tools, which can be executed to run the stego tools on the corresponding images. For instance:

python commands_linux.py

The GUI-based tools need to be applied to the corresponding images manually or with GUI automation software. We implemented this using Microsoft Power Automate, but unfortunately the script could not be shared because the software did not provide this option.

4. Verify dataset creation

After all stego images have been generated, the following checks can be performed to verify that dataset creation was successful:

  • All stego image hashes should be different from the hashes of their cover images.
  • The hashes of all images with preprocessing operations should be different from the hashes of their original images.
  • All stego image hashes in the control condition should be equal to the hashes of their cover images.
  • The hashes of all images without preprocessing operations should be equal to the hashes of their original images.
  • All hashes in the dataset should be unique.

To automatically perform these checks, you can use the following script:

python "creation/verify_stego_file_creation.py" -d "/path/to/database/master/file" -o "/path/to/original/picture/dir" -m "/path/to/modified/picture/dir" -s "/path/to/stego/picture/dir"
  • -d: the path to the database master file.
  • -o: the path to the directory containing all original pictures. This directory should contain a separate folder with images per distinct camera.
  • -m: the path to the directory where all modified pictures are stored.
  • -s: the path to the directory where all stego pictures are stored.
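
For readers who want to see what the hash comparisons listed above amount to, the following is a minimal sketch and not the implementation in verify_stego_file_creation.py. The choice of SHA-256 and the helper names file_sha256, check_pair and find_duplicates are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_pair(source_path, derived_path, expect_changed):
    """Return True if the derived file differs from its source exactly when expected.

    Use expect_changed=True for stego or preprocessed images, and
    expect_changed=False for control-condition or unmodified images.
    """
    changed = file_sha256(source_path) != file_sha256(derived_path)
    return changed == expect_changed


def find_duplicates(paths):
    """Return {hash: [paths]} for every hash shared by more than one file."""
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[file_sha256(path)].append(str(path))
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}
```

For example, check_pair("cover.jpg", "stego.jpg", expect_changed=True) returns False when embedding silently failed and the stego file is byte-identical to its cover (the file names here are placeholders).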

Dataset License

CC BY-SA 4.0
