A modular pipeline for converting natural-language descriptions of OTC derivatives contracts into Common Domain Model (CDM) representations using LLMs and RAG.
IJCAI'25 Paper: https://arxiv.org/abs/2506.01063
AI4F Symposium'25 Paper: https://arxiv.org/abs/2510.23990
This repository enables an end-to-end automation flow for OTC derivatives processing:
- Natural Language contract description → CDM representation
It also supports synthetic data generation and contract template creation.
Recommended: Use Conda with Python 3.11.10
# Step 1: Clone the repository
git clone https://github.com/smart-derivative-contracts/automating-otc-derivatives.git
cd automating-otc-derivatives
# Step 2: Create and activate conda environment
conda create -n <your_env_name> python=3.11.10
conda activate <your_env_name>
# Step 3: Install dependencies
# If any package fails, remove it from requirements.txt and try again
pip install -r requirements.txt
- src/natural_language_to_cdm/: Contains experiments and models for converting contract descriptions to CDM format
- generate_text_from_cdm.py: Generates synthetic natural language descriptions from CDM data
- create_contract_templates.py: Generates six contract templates based on the CDM schema
The figure below illustrates the overall pipeline (CDMizer) used for populating CDM templates using RAG-enhanced prompting and LLM inference:
CDMizer Workflow: Recursive traversal, governed by a depth threshold (d), selects substructures (e.g., assignedIdentifier) where the deepest subtree has a depth ≤ d. This ensures manageable task sizes for efficiency and accuracy. Context-aware prompts, incorporating object definitions, traversal paths, and RAG-retrieved examples, guide the LLM in populating fields, which are then validated. Recursive traversal ensures the entire structure is systematically completed.
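The traversal described above can be illustrated with a minimal Python sketch. This is not the repository's implementation: the names depth, populate, same_keys, stub_fill and TEMPLATE are hypothetical, the template fields other than assignedIdentifier are made up for illustration, and the LLM call (with its prompt and RAG-retrieved examples) is replaced by a stub that only marks leaves as filled. Validation is reduced to a key check, which is an assumption about what a simple validator could look like.

```python
from typing import Any, Callable, Tuple

Path = Tuple[Any, ...]


def depth(node: Any) -> int:
    """Depth of the deepest subtree: 0 for a leaf, 1 + max child depth otherwise."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0


def same_keys(template: Any, filled: Any) -> bool:
    """Simplified validation (assumption): a filled dict keeps the template's keys."""
    if isinstance(template, dict):
        return isinstance(filled, dict) and set(filled) == set(template)
    return True


def populate(template: Any, d: int, fill: Callable[[Path, Any], Any],
             path: Path = ()) -> Any:
    """Walk top-down; hand every subtree whose depth is <= d to `fill`."""
    if d < 0:
        raise ValueError("depth threshold d must be non-negative")
    if depth(template) <= d:
        filled = fill(path, template)  # one task: small enough for one prompt
        if not same_keys(template, filled):
            raise ValueError(f"validation failed at path {path}")
        return filled
    # Too deep for one task: recurse into the children and rebuild the node.
    if isinstance(template, dict):
        return {k: populate(v, d, fill, path + (k,)) for k, v in template.items()}
    return [populate(v, d, fill, path + (i,)) for i, v in enumerate(template)]


def stub_fill(path: Path, subtree: Any) -> Any:
    """Stand-in for the LLM call: marks every leaf of the subtree as filled."""
    if isinstance(subtree, dict):
        return {k: stub_fill(path + (k,), v) for k, v in subtree.items()}
    if isinstance(subtree, list):
        return [stub_fill(path + (i,), v) for i, v in enumerate(subtree)]
    return "<filled:" + "/".join(map(str, path)) + ">"


# Hypothetical template fragment; only assignedIdentifier is named in the text.
TEMPLATE = {
    "tradeDate": None,
    "tradeIdentifier": {
        "assignedIdentifier": {"identifier": None, "version": None},
    },
}

if __name__ == "__main__":
    print(populate(TEMPLATE, 1, stub_fill))
```

With d = 1, the sketch issues one task for tradeDate and one for the whole assignedIdentifier object, since that object's deepest subtree has depth 1. Raising d groups more of the structure into a single task, and lowering it to 0 makes every leaf its own task. The traversal path passed to the fill function is the piece of context that a real prompt would combine with object definitions and retrieved examples.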
- Python version must be 3.11.10 for compatibility.
- If certain packages in requirements.txt fail to install, comment them out and retry.
- The repository is under active development.