Generating formative feedback with DSPy

A sample of some of the techniques used during my MSc project using small language models to generate formative writing feedback.


⚙️ Tools

  • Ollama: an open-source tool for running LLMs locally.
  • DSPy: a framework that automates prompt optimisation and orchestrates LLMs.
  • Langchain: a framework for developing LLM applications.
  • Chroma: an open-source vector database.
  • Pandas: a Python library for manipulation and analysis of structured data.
  • Streamlit: a framework for quickly building simple web apps.

The full list of packages is in requirements.txt.

▶️ Try it for yourself

Have a look at the results on your local machine.

  1. Create a venv
python3 -m venv .venv
source .venv/bin/activate
  2. Install requirements
pip install -r requirements.txt
  3. Run the app
cd app
streamlit run app.py

📝 Data and preprocessing

The dataset created for the dissertation contains materials licensed for educational purposes only, so the data used in this repo is adapted from the Persuade 2.0 Corpus instead. To emulate the examples of writing used in my dissertation (written by GCSE English students), I selected a random sample of 100 essays by tenth-grade students and, from this set, kept 100 paragraphs longer than six words.

The columns 'ID', 'text' and 'score' in data.csv come from the original corpus.
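
The sampling step can be reproduced with pandas along these lines (a minimal sketch only; the source file name, column names and random seed are assumptions, not the exact preprocessing code from the dissertation):

    import pandas as pd

    # Load the Persuade 2.0 corpus (file and column names assumed for illustration).
    corpus = pd.read_csv("persuade_corpus.csv")

    # Keep essays written by tenth-grade students and draw a random sample of 100.
    essays = corpus[corpus["grade_level"] == 10].sample(n=100, random_state=42)

    # Split the essays into paragraphs and keep those longer than six words.
    paragraphs = (
        essays.assign(text=essays["full_text"].str.split("\n\n"))
              .explode("text")
    )
    paragraphs = paragraphs[paragraphs["text"].str.split().str.len() > 6]

    # Keep 100 paragraphs for the demo dataset.
    paragraphs = paragraphs.sample(n=100, random_state=42).reset_index(drop=True)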

💬 Feedback in the dataset

For demonstration purposes, the feedback columns in data.csv are generated using the model developed during the project.

🤖 Models

🔁 Optimising the prompt

This notebook uses a DSPy Module (WriteFeedback), the LabeledFewShot optimizer (teleprompter) and Evaluate to create an optimised few-shot prompt.
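
In outline, the module and optimizer fit together roughly like this (a sketch only: the signature fields, model name and value of k are assumptions, and trainset stands for the project's labelled training examples):

    import dspy
    from dspy.teleprompt import LabeledFewShot

    # Point DSPy at a local model served by Ollama (model name assumed).
    dspy.configure(lm=dspy.LM("ollama_chat/llama3.2", api_base="http://localhost:11434"))

    class FeedbackSignature(dspy.Signature):
        """Write formative feedback on a paragraph of student writing."""
        text = dspy.InputField(desc="a paragraph written by a student")
        feedback = dspy.OutputField(desc="formative feedback on the paragraph")

    class WriteFeedback(dspy.Module):
        def __init__(self):
            super().__init__()
            self.generate = dspy.ChainOfThought(FeedbackSignature)

        def forward(self, text):
            return self.generate(text=text)

    # trainset is assumed to be a list like:
    # [dspy.Example(text=..., feedback=...).with_inputs("text"), ...]
    # LabeledFewShot picks k labelled examples from the training set
    # and bakes them into the prompt as demonstrations.
    optimizer = LabeledFewShot(k=4)
    compiled_writer = optimizer.compile(WriteFeedback(), trainset=trainset)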

🧠 The vector database

The training dataset is vectorised into documents with the following structure:

    from langchain_core.documents import Document

    # Each document holds the feedback as its content, with the original
    # student text kept in the metadata.
    feedback_doc = Document(
        page_content=data['feedback'],
        metadata={'text': data['text']}
    )

The documents are used as a knowledge base when the Evaluator is at work.
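
Building the vector store from those documents might look something like this (a sketch assuming the langchain-chroma and langchain-ollama integrations and an embedding model name; the project's actual settings may differ):

    from langchain_chroma import Chroma
    from langchain_ollama import OllamaEmbeddings

    # Embed the feedback documents locally and store them in Chroma.
    embeddings = OllamaEmbeddings(model="nomic-embed-text")
    vector_store = Chroma.from_documents(
        documents=feedback_docs,   # the Document objects built above
        embedding=embeddings,
        persist_directory="chroma_db",
    )

    # Later, retrieve feedback written for similar paragraphs.
    similar = vector_store.similarity_search("an example student paragraph", k=3)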

🦉 Evaluator

The evaluator scores the compiled program on a test set of unseen examples using a quality metric. The optimum prompt is chosen by iterating over combinations of examples from the training data and scoring the resulting feedback. The results of this optimisation:

Average Metric: 62.00 / 20 (310.0%): 100%|██████████| 20/20 [05:46<00:00, 17.33s/it]
2025/10/08 11:47:13 INFO dspy.evaluate.evaluate: Average Metric: 62 / 20 (310.0%)

Broadly, this means the judge's scores for the 20 test examples summed to 62, an average of 3.1 / 5 per example; DSPy reports the total as a percentage of the number of examples (hence 310%) because it assumes a 0-1 metric. Not a particularly great score, but I think this can be attributed to the artificial data.
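
For reference, the evaluation loop looks roughly like this (a sketch: the judge signature, devset and score parsing are assumptions standing in for the project's actual LLM-as-judge metric):

    import dspy
    from dspy.evaluate import Evaluate

    # Stand-in judge: rates a piece of feedback from 0 to 5.
    class JudgeFeedback(dspy.Signature):
        """Rate the quality of formative feedback on a scale of 0 to 5."""
        text = dspy.InputField(desc="the student's paragraph")
        feedback = dspy.InputField(desc="the generated feedback")
        score = dspy.OutputField(desc="an integer from 0 to 5")

    judge = dspy.Predict(JudgeFeedback)

    def quality_metric(example, prediction, trace=None):
        result = judge(text=example.text, feedback=prediction.feedback)
        try:
            return int(result.score)
        except (TypeError, ValueError):
            return 0

    # devset: unseen dspy.Example objects, built the same way as the training set.
    evaluator = Evaluate(devset=devset, metric=quality_metric,
                         num_threads=1, display_progress=True)
    evaluator(compiled_writer)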
