mdrpanwar/memorization-patterns

This repository contains the code and data for our NeurIPS 2025 paper:

For Better or for Worse, Transformers Seek Patterns for Memorization
Madhur Panwar, Gail Weiss, Navin Goyal, Antoine Bosselut

    @inproceedings{
        panwar2025for,
        title={For Better or for Worse, Transformers Seek Patterns for Memorization},
        author={Madhur Panwar and Gail Weiss and Navin Goyal and Antoine Bosselut},
        booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
        year={2025},
        url={https://openreview.net/forum?id=98NrkXPRZ9}
    }

Creating the environment

conda env create -f neurips.yml

Tokenizers

For synthetic data, we use custom tokenizers, depending on whether the data requires only digits or both digits and letters. The tokenizer files needed to use them in code, as well as the code to create them, are placed in the tokenizer_local directory.
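
The actual tokenizers live in tokenizer_local. As a rough illustration only, the sketch below shows what a character-level tokenizer over a digits-only alphabet versus a digits-plus-letters alphabet could look like. Everything here is hypothetical and not taken from the repository: the class name CharTokenizer, the special tokens `<pad>`, `<bos>` and `<eos>`, and the restriction to lowercase letters are all assumptions.

```python
import string

# Hypothetical special tokens; the repository's tokenizers may use different ones.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>"]


class CharTokenizer:
    """Illustrative character-level tokenizer over a fixed alphabet (not the repo's code)."""

    def __init__(self, alphabet):
        # Special tokens take the first ids; alphabet characters follow in order.
        self.itos = SPECIAL_TOKENS + list(alphabet)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    @property
    def vocab_size(self):
        return len(self.itos)

    def encode(self, text, add_special=True):
        # Map each character to its id, rejecting characters outside the alphabet.
        ids = []
        for ch in text:
            if ch not in self.stoi:
                raise ValueError(f"character {ch!r} not in vocabulary")
            ids.append(self.stoi[ch])
        if add_special:
            ids = [self.stoi["<bos>"]] + ids + [self.stoi["<eos>"]]
        return ids

    def decode(self, ids):
        # Drop special tokens and join the remaining characters.
        return "".join(self.itos[i] for i in ids if self.itos[i] not in SPECIAL_TOKENS)


# Digits-only vocabulary vs. digits plus (assumed lowercase) letters.
digit_tokenizer = CharTokenizer(string.digits)
alnum_tokenizer = CharTokenizer(string.digits + string.ascii_lowercase)
```

For example, digit_tokenizer.encode("42") gives [1, 7, 5, 2] (`<bos>`, '4', '2', `<eos>`). The digits-only vocabulary has 13 entries and the alphanumeric one has 39. Keeping the vocabulary this small means a digits-only task never sees letter tokens at all, which is presumably why a separate tokenizer is used per data type.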

Generating synthetic datasets

The synthetic datasets, as well as the code to generate them, are placed in the synthetic_data directory. To generate the synthetic datasets, run python generate_synthetic_data.py.

Running experiments

To run the WikiText experiments, set the appropriate parameters in run_wikitext.sh and then run it:

./run_wikitext.sh

To run the experiments on the synthetic datasets, set the appropriate parameters in run_synthetic.sh and then run it:

./run_synthetic.sh

Generating plots

Code for the plots in the paper is in the notebook plots.ipynb.

Maintainers