
moringfix/frame-representation-hypothesis

 
 


🎌 Frame Representation Hypothesis

Authors: Pedro Valois, Lincon Souza, Erica Kido Shimomoto, Kazuhiro Fukui

The Frame Representation Hypothesis is a robust framework for understanding and controlling LLMs. We use WordNet to generate concepts that can both guide the model's text generation and expose biases or vulnerabilities.


💡 Highlights

  • ♻️ Handles multi-token words.

  • 🎧 Can use the Open Multilingual Wordnet (OMW) 50M-word dataset to build 100,000 concepts.

  • 💪 Tested on Llama 3.1, Gemma 2, and Phi 3, ensuring high-quality responses.

  • 🚀 Fast, with low memory cost: computes all concepts in less than a second and fits both Llama 3.1 8B Instruct and the concepts on an RTX 4090 GPU.
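To make the multi-token idea concrete, here is a minimal sketch in plain Python. It assumes a word can be represented as a "frame" (the ordered list of its token vectors) and a concept as the position-wise average of the frames of related words. The toy vocabulary `TOKEN_VECTORS`, the helpers `word_frame` and `concept_frame`, and the word grouping are all hypothetical illustrations, not the repository's API; the notebooks may build and combine frames differently.

```python
from statistics import fmean

# Hypothetical toy vocabulary: token -> 3-dimensional vector.
# A real model would supply these vectors from its own weights.
TOKEN_VECTORS = {
    "sun": [1.0, 0.0, 0.0],
    "flower": [0.0, 1.0, 0.0],
    "day": [0.0, 0.0, 1.0],
    "light": [1.0, 1.0, 0.0],
}


def word_frame(tokens):
    """Return a word's frame: the ordered list of its token vectors."""
    return [TOKEN_VECTORS[t] for t in tokens]


def concept_frame(frames):
    """Average frames position by position (a simple centroid).

    Assumption of this sketch: all frames have the same number of tokens.
    """
    n_tokens = len(frames[0])
    if any(len(f) != n_tokens for f in frames):
        raise ValueError("this sketch only averages frames of equal length")
    return [
        [fmean(components) for components in zip(*(f[i] for f in frames))]
        for i in range(n_tokens)
    ]


# Two two-token words, grouped here purely for illustration.
sunflower = word_frame(["sun", "flower"])
daylight = word_frame(["day", "light"])
concept = concept_frame([sunflower, daylight])
print(concept)  # [[0.5, 0.0, 0.5], [0.5, 1.0, 0.0]]
```

Keeping each token vector separate, rather than summing them into a single word vector, is what lets a multi-token word retain its token order in this toy setup.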

Install

  1. Clone this repository.
     git clone https://github.com/Pedrexus/frame-representation-hypothesis
     cd frame-representation-hypothesis
  2. Install packages.
     pip install -U pip
     pip install uv
     uv sync
  • We also provide a Docker image (you may need to update its CUDA version to match yours).
  3. Add environment variables.
  4. Download models.

Run 01_START_HERE.ipynb to download all models.

Quick Start

Each experiment in the paper is in one of the Jupyter notebooks numbered 02 and up.

LICENSE

Our code is released under the MIT License.

Citation

If you have any questions, please feel free to submit an issue or contact pedro@cvlab.cs.tsukuba.ac.jp.

If our work is useful for you, please cite as:

@article{valois2025framerepresentationhypothesis,
      title={Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation},
      author={Pedro H. V. Valois and Lincon S. Souza and Erica K. Shimomoto and Kazuhiro Fukui},
      journal={Transactions of the Association for Computational Linguistics},
      year={2025},
      url={https://arxiv.org/abs/2412.07334},
}
