Please note: This repo is not ready for primetime and the gem it provides is not yet published on RubyGems.
In order to use it, you will need to check out the code and point your Gemfile at a vendored copy (or use the Gemfile configuration to point at the repo URL):

```ruby
gem "llm-workbench", path: "vendor/llm-workbench"
```

Then update your bundle:

```shell
bundle install
```
Once that’s complete, you can run it with:
```shell
bundle exec workbench start <my task or pipeline>
```

On the roadmap:

- Basic test coverage for core primitives, especially for filesystem lookups
- ERB support for Prompts (currently a no-op `file.read`)
- LLM Provider support and helpers (currently, call LLMs on your own within the Task)
- API publishing support: Likely using Roda
- Flow control support: Enhancing pipeline syntax to include control flow
- LLM Output caching: Likely integrating VCR to record and cache useful LLM outputs
- ActiveRecord Support: Focused initially on enabling state-dumping on errors for retries (avoid wasting expensive LLM outputs)
- MLflow Exporter: Allow monitoring of Pipeline & Task execution from MLflow
LLM Workbench is an opinionated framework designed to help you build flexible (and hopefully maintainable!) LLM-enabled pipelines in Ruby. You can run these pipelines one-off or automatically host them as individual API endpoints.
Install the Ruby gem:
```shell
gem install llm-workbench
```

Or add it to your bundle:

```shell
bundle add llm-workbench
```

Inside your project directory root, create a tasks/ directory and create a task:
```shell
mkdir tasks
vi tasks/my_task.rb
```

Create your Task as a subclass of the main Workbench::Task class, and start writing your code inside the #run method:
```ruby
require 'workbench'

class MyTask < Workbench::Task
  def run
    # Do something
  end
end
```

Next, define a new Pipeline in your pipelines directory using just your newly created Task:
```shell
mkdir pipelines
vi pipelines/my_pipeline.yaml
```

Inside my_pipeline.yaml:
```yaml
name: my_pipeline
description: |
  This pipeline runs a single task.
tasks:
  - name: my_task
```

Optional: Define a new Prompt with the same name, and access it in one line from your Task:
```shell
mkdir prompts
vi prompts/my_task.v1.prompt.erb
```

Inside my_task.rb:
```ruby
require 'workbench'

class MyTask < Workbench::Task
  def run
    # Do something
    @prompt.render # gets the latest Prompt called "my_task"
  end
end
```

Run it! Use the command-line tool to run your pipeline (launch from your project directory root):
```shell
workbench start my_pipeline
```

The framework defines a few primitives to help you get started, largely inspired by the separation of concerns encouraged by MVC frameworks like Rails. (At this time, the gem does not integrate directly with Rails.)
- Pipeline: A sequenced collection of Tasks, defined in YAML along with optional inputs and other run-time configurations for each Task
- Task: A self-contained set of Ruby code corresponding to a logical action, whether LLM-enabled or deterministic (e.g. simple parsing or processing)
- Prompt: An ERB file containing a prompt for use in a Task, identified by its filename, with optional properties encoded in its file extension, including:
  - Version: e.g. “v1”, identifying a working version or series of prompts
  - Model Provider: A model provider like “OpenAI”, “Anthropic”, or “Google”
  - Model: A specific model version like “gpt-4”, “claude-3.7-sonnet”, or “gemini-1.5-pro”
- Schema: A JSON schema that can be used with a Prompt inside a Task to constrain the output of an LLM response
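For illustration, the optional Prompt properties above might combine in a file name like this. Only `my_task.v1.prompt.erb` appears elsewhere in this README; the other names, and the ordering of the extension segments, are assumptions rather than a documented scheme:

```
prompts/my_task.v1.prompt.erb                              # version only
prompts/my_task.v1.openai.gpt-4.prompt.erb                 # version + provider + model
prompts/my_task.v1.anthropic.claude-3.7-sonnet.prompt.erb
```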
Workbench provides useful tools (e.g. lookup, execution, and logging support) for each primitive that you can leverage inside the code you write for each Task.
- Structured Logging: Workbench automatically maintains a structured logging hierarchy using OpenTelemetry’s spans to cover pipeline execution and nested Task execution. Helpers are exposed to Tasks to add events, properties, and embedded spans.
  - Because it uses OpenTelemetry under the hood, Workbench’s logging backend is pluggable, and log data can be pushed to observability platforms like Datadog or ML-native platforms like MLflow for monitoring and analysis
- State management: Pipelines and Tasks can accept inputs, and during a Pipeline’s lifecycle, Task outputs are automatically pushed to the default Pipeline context. Tasks can inspect the full stack of prior Task outputs, if needed, through a Pipeline object exposed at runtime.
- API publishing: Tasks and Pipelines can be published directly as API endpoints
  - Each Task may define inputs and outputs that make API definitions automatically discoverable
  - Inputs and outputs can cascade across a Pipeline, enabling publishing of complex workflows
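The state hand-off described above can be sketched with plain-Ruby stand-ins. The class and method names here are illustrative assumptions, not Workbench's actual API:

```ruby
# Stand-in sketch of a Pipeline pushing each Task's output into a shared
# context that later Tasks can inspect. Illustrative only -- not the real gem.
class StubTask
  attr_reader :name

  def initialize(name, &body)
    @name = name
    @body = body
  end

  # Each task receives the pipeline context and returns its output.
  def run(context)
    @body.call(context)
  end
end

class StubPipeline
  attr_reader :context

  def initialize(tasks)
    @tasks = tasks
    @context = {} # task name => task output
  end

  def start
    @tasks.each do |task|
      # Outputs are automatically pushed to the pipeline context,
      # so downstream tasks can read everything that ran before them.
      @context[task.name] = task.run(@context)
    end
    @context
  end
end

fetch  = StubTask.new(:fetch)  { |_ctx| "raw document text" }
digest = StubTask.new(:digest) { |ctx| "summary of: #{ctx[:fetch]}" }

result = StubPipeline.new([fetch, digest]).start
puts result[:digest] # => "summary of: raw document text"
```

The second task never receives the first task's output directly; it reads it out of the shared context, which mirrors the "outputs are automatically pushed to the default Pipeline context" behavior described above.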
LLM Workbench is licensed under the Apache 2.0 License.
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.