The opinionated, high-performance, professional-grade AI package for Go.
genai is intentional. Curious why it was created? See the release announcement at maruel.ca/post/genai-v0.1.0.
- Full functionality: Full access to each backend-specific functionality. Access the raw API if needed with full message schema as Go structs.
- Tool calling via reflection: Tell the LLM to call a tool directly, described as a Go struct. No need to manually fiddle with JSON.
- Native JSON struct serialization: Pass a struct to tell the LLM what to generate, decode the reply into your struct. No need to manually fiddle with JSON. Supports required fields, enums, descriptions, etc. You can still fiddle if you want to. :)
- Streaming: Streams the completion reply as the output is being generated, including thinking and tool calling, via Go 1.23 iterators.
- Multi-modal: Process images, PDFs and videos (!) as input or output.
- Web Search: Search the web to answer your question and cite documents passed in.
- Smoke testing friendly: record and play back API calls at the HTTP level to save 💰 and keep tests fast and reproducible, via the exposed HTTP transport. See example, and the sketch after this list.
- Rate limits and usage: Parse the provider-specific HTTP headers and JSON response to get the tokens usage and remaining quota.
- Provide access to HTTP headers to enable beta features.
- Safe and strict API implementation. All you love from a statically typed language. The library's smoke tests immediately fail on unknown RPC fields. Error code paths are properly implemented.
- Stateless. No global state, it is safe to use clients concurrently.
- Professional grade. Smoke tested on live services with recorded traces located in testdata/ directories, e.g. providers/anthropic/testdata/TestClient/Scoreboard/.
- Trust, but verify. It generates a scoreboard based on actual behavior from each provider.
- Optimized for speed. Minimizes memory allocations and compresses data at the transport layer when possible. Groq, Mistral and OpenAI use brotli for HTTP compression instead of gzip, and POST bodies to Google are gzip-compressed.
- Lean: Few dependencies. No unnecessary abstraction layer.
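As a taste of the smoke-testing hook mentioned above, here is a minimal sketch of wrapping the HTTP transport. It assumes the last argument to a provider's New accepts a func(http.RoundTripper) http.RoundTripper wrapper; recordingTransport is a hypothetical stand-in for a real record/replay transport.
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/maruel/genai"
	"github.com/maruel/genai/providers/anthropic"
)

// recordingTransport logs every request; a real implementation would save the
// responses under testdata/ and replay them in CI.
type recordingTransport struct{ base http.RoundTripper }

func (t *recordingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	log.Printf("%s %s", req.Method, req.URL)
	return t.base.RoundTrip(req)
}

func main() {
	ctx := context.Background()
	wrap := func(base http.RoundTripper) http.RoundTripper {
		return &recordingTransport{base: base}
	}
	c, err := anthropic.New(ctx, &genai.ProviderOptions{}, wrap)
	if err != nil {
		log.Fatal(err)
	}
	_ = c // Use c.GenSync or c.GenStream as in the examples below.
}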
| Provider | 🌐 | Mode | ➛In | Out➛ | Tool | JSON | Batch | File | Cite | Text | Probs | Limits | Usage | Finish |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| anthropic | 🇺🇸 | Sync, Stream🧠 | 💬📄📸 | 💬 | ✅🪨🕸️ | ❌ | ✅ | ❌ | ✅ | 🛑📏 | ❌ | ✅ | ✅ | ✅ |
| bfl | 🇩🇪 | Sync | 💬 | 📸 | ❌ | ❌ | ✅ | ❌ | ❌ | 🌱 | ❌ | ✅ | ❌ | ❌ |
| cerebras | 🇺🇸 | Sync, Stream🧠 | 💬 | 💬 | ✅🪨 | ✅ | ❌ | ❌ | ❌ | 🌱📏🛑 | ✅ | ❌ | ✅ | ✅ |
| cloudflare | 🇺🇸 | Sync, Stream🧠 | 💬 | 💬 | 💨 | ✅ | ❌ | ❌ | ❌ | 🌱📏 | ❌ | ❌ | ✅ | 💨 |
| cohere | 🇨🇦 | Sync, Stream🧠 | 💬📸 | 💬 | ✅🪨 | ✅ | ❌ | ❌ | ✅ | 🌱📏🛑 | ✅ | ❌ | ✅ | ✅ |
| deepseek | 🇨🇳 | Sync, Stream🧠 | 💬 | 💬 | ✅🪨 | ☁️ | ❌ | ❌ | ❌ | 📏🛑 | ✅ | ❌ | ✅ | ✅ |
| gemini | 🇺🇸 | Sync, Stream🧠 | 🎤🎥💬📄📸 | 💬📸 | ✅🪨🕸️ | ✅ | ❌ | ✅ | ❌ | 🌱📏🛑 | ❌ | ❌ | ✅ | ✅ |
| groq | 🇺🇸 | Sync, Stream🧠 | 💬📸 | 💬 | ✅🪨🕸️ | ☁️ | ❌ | ❌ | ❌ | 🌱📏🛑 | ❌ | ✅ | ✅ | ✅ |
| huggingface | 🇺🇸 | Sync, Stream🧠 | 💬 | 💬 | ❌ | ☁️ | ❌ | ❌ | ❌ | 🌱📏🛑 | ✅ | ✅ | ✅ | ✅ |
| llamacpp | 🏠 | Sync, Stream | 💬📸 | 💬 | ✅🪨 | ✅ | ❌ | ❌ | ❌ | 🌱📏🛑 | ✅ | ❌ | ✅ | ✅ |
| mistral | 🇫🇷 | Sync, Stream | 🎤💬📄📸 | 💬 | ✅🪨 | ✅ | ❌ | ❌ | ❌ | 🌱📏🛑 | ❌ | ✅ | ✅ | ✅ |
| ollama | 🏠 | Sync, Stream🧠 | 💬📸 | 💬 | 💨 | ✅ | ❌ | ❌ | ❌ | 🌱📏🛑 | ❌ | ❌ | ✅ | ✅ |
| openaichat | 🇺🇸 | Sync, Stream🧠 | 🎤💬📄📸 | 💬📸 | ✅🪨🕸️ | ✅ | ✅ | ✅ | ❌ | 🌱📏🛑 | ✅ | ✅ | ✅ | ✅ |
| openairesponses | 🇺🇸 | Sync, Stream🧠 | 💬📄📸 | 💬📸 | ✅🪨🕸️ | ✅ | ❌ | ❌ | ❌ | 🌱 | ❌ | ✅ | ✅ | ✅ |
| perplexity | 🇺🇸 | Sync, Stream🧠 | 💬📸 | 💬 | 🕸️ | 📐 | ❌ | ❌ | ✅ | 📏 | ❌ | ❌ | ✅ | ✅ |
| pollinations | 🇩🇪 | Sync, Stream | 💬📸 | 💬📸 | ✅🪨 | ☁️ | ❌ | ❌ | ❌ | 🌱 | ❌ | ❌ | ✅ | ✅ |
| togetherai | 🇺🇸 | Sync, Stream🧠 | 🎥💬📸 | 💬📸 | ✅🪨 | ✅ | ❌ | ❌ | ❌ | 🌱📏🛑 | ❌ | ✅ | ✅ | ✅ |
| openaicompatible | N/A | Sync, Stream | 💬 | 💬 | ❌ | ❌ | ❌ | ❌ | ❌ | 📏🛑 | ❌ | ❌ | ✅ | ✅ |
Legend of columns and symbols:
- 🏠: Runs locally.
- Sync: Runs synchronously; the reply is only returned once it is completely generated
- Stream: Streams the reply as it is generated. Occasionally fewer features are supported in this mode
- 🧠: Has a chain-of-thought thinking process
- Both redacted (Anthropic, Gemini, OpenAI) and explicit (Deepseek R1, Qwen3, etc.)
- Many models can be used in both modes. In this case they have two rows, one with thinking and one without. Certain functionalities, like tool calling, are frequently limited in thinking mode.
- ✅: Implemented and works great
- ❌: Not supported by genai. The provider may support it, but genai does not (yet). Please send a PR to add it!
- 💬: Text
- 📄: PDF: process a PDF as input, possibly with OCR
- 📸: Image: process an image as input; most providers support PNG, JPG, WEBP and non-animated GIF, or generate images
- 🎤: Audio: process an audio file (e.g. MP3, WAV, FLAC, Opus) as input, or generate audio
- 🎥: Video: process a video (e.g. MP4) as input, or generate a video (e.g. Veo 3)
- 💨: Feature is flaky (Tool calling) or inconsistent (Usage is not always reported)
- 🌐: Country where the company is located
- Tool: Tool calling, using genai.ToolDef; best is ✅🪨🕸️
  - 🪨: Tool calling can be forced, i.e. you can force the model to call a tool. This is great.
  - 🕸️: Web search
- JSON: ability to output JSON in free form, or with a forced schema specified as a Go struct
- ✅: Supports both free form and with a schema
- ☁️: Supports only free form
- 📐: Supports only a schema
- Batch: Process batches asynchronously during off-peak hours at a discount
- Text: Text features
- 🌱: Seed option for deterministic output
- 📏: MaxTokens option to cap the number of returned tokens
- 🛑: Stop sequences to stop generation when a specified sequence is generated
- File: Upload and store large files via a separate API
- Cite: Citation generation from a provided document, especially useful for RAG
- Probs: Return logprobs to analyze the probability of each token
- Limits: Returns the rate limits, including the remaining quota
The following examples intentionally use a variety of providers to show the extent to which you can pick and choose.
examples/txt_to_txt_sync/main.go: This selects a good default model based
on Anthropic's currently published models, sends a prompt and prints the response as a string.
💡 Set ANTHROPIC_API_KEY.
func main() {
ctx := context.Background()
c, err := anthropic.New(ctx, &genai.ProviderOptions{}, nil)
msgs := genai.Messages{
genai.NewTextMessage("Give me a life advice that sounds good but is a bad idea in practice. Answer succinctly."),
}
result, err := c.GenSync(ctx, msgs)
fmt.Println(result.String())
}
This may print:
"Follow your passion and the money will follow."
This ignores market realities, financial responsibilities, and the fact that passion alone doesn't guarantee income or career viability.
examples/txt_to_txt_sync_multi/main.go: This shows how to do multiple message
round trips by adding follow-up messages from the user.
💡 Set ANTHROPIC_API_KEY.
func main() {
ctx := context.Background()
c, err := anthropic.New(ctx, &genai.ProviderOptions{}, nil)
msgs := genai.Messages{
genai.NewTextMessage("Let's play a word association game. You pick a single word, then I pick the first word I think of, then you respond with a word, and so on.")
}
result, err := c.GenSync(ctx, msgs)
if err != nil {
panic(err)
}
// Show the model's reply
fmt.Println(result.String())
// Save the message in the collection of messages to build up context
msgs = append(msgs, result.Message)
// Add another user message
msgs = append(msgs, genai.NewTextMessage("nightwish"))
// Get another completion
result, err = c.GenSync(ctx, msgs)
// ...and so on.
}
examples/txt_to_txt_stream/main.go: This is the same example as above, with the output streamed as it is generated. It leverages Go 1.23 iterators. Notice how little difference there is between the two.
func main() {
ctx := context.Background()
c, err := anthropic.New(ctx, &genai.ProviderOptions{}, nil)
msgs := genai.Messages{
genai.NewTextMessage("Give me a life advice that sounds good but is a bad idea in practice."),
}
fragments, finish := c.GenStream(ctx, msgs)
for f := range fragments {
os.Stdout.WriteString(f.Text)
}
_, err = finish()
}
examples/txt_to_txt_thinking/main.go: genai supports both implicit
reasoning (e.g. Anthropic) and explicit reasoning (e.g. Deepseek). The adapters package provides logic to
automatically handle explicit Chain-of-Thought models, which generally use <think> and </think> tokens.
💡 Set DEEPSEEK_API_KEY.
Snippet:
c, _ := deepseek.New(ctx, &genai.ProviderOptions{Model: "deepseek-reasoner"}, nil)
msgs := genai.Messages{
genai.NewTextMessage("Give me a life advice that sounds good but is a bad idea in practice."),
}
fragments, finish := c.GenStream(ctx, msgs)
for f := range fragments {
if f.Reasoning != "" {
// ...
} else if f.Text != "" {
// ...
}
}
examples/txt_to_txt_citations/main.go: Send entire documents to
providers that support automatic citations (Cohere, Anthropic) and leverage their functionality for
supercharged RAG.
💡 Set COHERE_API_KEY.
Snippet:
const context = `...` // Introduction of On the Origin of Species by Charles Darwin...
msgs := genai.Messages{{
Requests: []genai.Request{
{
Doc: genai.Doc{
Filename: "On-the-Origin-of-Species-by-Charles-Darwin.txt",
Src: strings.NewReader(context),
},
},
{Text: "When did Darwin arrive home?"},
},
}}
res, _ := c.GenSync(ctx, msgs)
for _, r := range res.Replies {
if !r.Citation.IsZero() {
fmt.Printf("Citation:\n")
for _, src := range r.Citation.Sources {
fmt.Printf("- %q\n", src.Snippet)
}
}
}
fmt.Printf("\nAnswer: %s\n", res.String())When asked When did Darwin arrive home? with the introduction of On the Origin of Species by Charles Darwin passed in as a document, this may print:
Citation:
- "excerpt from Charles Darwin's work 'On the Origin of Species'"
- "returned home in 1837."
Answer: 1837 was when Darwin returned home and began to reflect on the facts he had gathered during his time on H.M.S. Beagle.
examples/txt_to_txt_websearch-sync/main.go: Searches the web
to answer your question.
💡 Set PERPLEXITY_API_KEY.
Snippet:
c, _ := perplexity.New(ctx, &genai.ProviderOptions{Model: genai.ModelCheap}, nil)
msgs := genai.Messages{{
Requests: []genai.Request{
{Text: "Who holds ultimate power of Canada? Answer succinctly."},
},
}}
// Perplexity has web search enabled by default, so this is a no-op.
// It is required to enable web search for anthropic, gemini and openai.
opts := genai.OptionsTools{WebSearch: true}
res, _ := c.GenSync(ctx, msgs, &opts)
for _, r := range res.Replies {
if !r.Citation.IsZero() {
fmt.Printf("Sources:\n")
for _, src := range r.Citation.Sources {
switch src.Type {
case genai.CitationWeb:
fmt.Printf("- %s / %s\n", src.Title, src.URL)
case genai.CitationWebImage:
fmt.Printf("- image: %s\n", src.URL)
}
}
}
}
fmt.Printf("\nAnswer: %s\n", res.String())Try it locally:
go run github.com/maruel/genai/examples/txt_to_txt_websearch-sync@latestWhen asked Who holds ultimate power of Canada?, this may print:
Sources:
- Prime Minister of Canada / https://en.wikipedia.org/wiki/Prime_Minister_of_Canada
- Canadian Parliamentary System - Our Procedure / https://www.ourcommons.ca/procedure/our-procedure/parliamentaryFramework/c_g_parliamentaryframework-e.html
(...)
(...)
Answer: The ultimate power in Canada constitutionally resides with the monarch (King Charles III) as the head of state, with executive authority formally vested in him. However, (...)
examples/txt_to_txt_websearch-stream/main.go: Searches the web
to answer your question and streams the output to the console.
💡 Set PERPLEXITY_API_KEY.
go run github.com/maruel/genai/examples/txt_to_txt_websearch-stream@latest
Same as above, but streaming.
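A minimal sketch of what the streaming variant might look like, reusing the API shown above; it assumes GenStream accepts the same optional options argument as GenSync:
c, _ := perplexity.New(ctx, &genai.ProviderOptions{Model: genai.ModelCheap}, nil)
msgs := genai.Messages{
	genai.NewTextMessage("Who holds ultimate power of Canada? Answer succinctly."),
}
opts := genai.OptionsTools{WebSearch: true}
fragments, finish := c.GenStream(ctx, msgs, &opts)
for f := range fragments {
	// Print the answer as it is generated.
	os.Stdout.WriteString(f.Text)
}
// finish() returns the complete result, including the citations.
res, _ := finish()
for _, r := range res.Replies {
	if !r.Citation.IsZero() {
		for _, src := range r.Citation.Sources {
			if src.Type == genai.CitationWeb {
				fmt.Printf("- %s / %s\n", src.Title, src.URL)
			}
		}
	}
}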
examples/txt_to_txt_logprobs/main.go: List the alternative tokens that were considered during generation. This helps tune Temperature, TopP or TopK.
Try it locally:
go run github.com/maruel/genai/examples/txt_to_txt_logprobs@latest
When asked Tell a joke, this may print:
Provider huggingface
Reply:
Why don't scientists trust atoms?
Because they make up everything!
Logprobs:
* -0.000082: "Why"
-9.625082: "Here"
-11.250082: "What"
-13.875082: "A"
-14.500082: "How"
* -0.000003: " don"
-14.125003: " do"
-14.625003: " did"
-14.625003: " dont"
-14.875003: " didn"
* -0.000001: "'t"
-14.000001: "’t"
-18.062500: "'"
-18.875000: "'T"
-19.812500: "'s"
* -0.000002: " scientists"
-14.250002: " Scientists"
-14.250002: " eggs"
-15.125002: " skeletons"
-16.125002: " programmers"
* -0.000000: " trust"
-16.250000: " trusts"
-16.250000: " Trust"
-17.250000: " like"
-18.000000: " trusted"
* -0.000006: " atoms"
-13.250006: "atoms"
-13.500006: " stairs"
-14.625006: " their"
-15.000006: " electrons"
* -0.000011: "?\n\n"
-12.125011: "?\n"
-12.125011: "?"
-14.750011: "?\n\n"
-16.500011: " anymore"
(...)
examples/txt_to_txt_tool-sync/main.go: An LLM can both retrieve
information and act on its environment through tool calling. This unlocks a whole realm of possibilities. Our
design enables dense, strongly typed code that compares favorably to Python.
💡 Set CEREBRAS_API_KEY.
Snippet:
type numbers struct {
A int `json:"a"`
B int `json:"b"`
}
msgs := genai.Messages{
genai.NewTextMessage("What is 3214 + 5632? Call the tool \"add\" to tell me the answer. Do not explain. Be terse. Include only the answer."),
}
opts := genai.OptionsTools{
Tools: []genai.ToolDef{
{
Name: "add",
Description: "Add two numbers together and provides the result",
Callback: func(ctx context.Context, input *numbers) (string, error) {
return fmt.Sprintf("%d", input.A+input.B), nil
},
},
},
// Force the LLM to do a tool call.
Force: genai.ToolCallRequired,
}
// Run the loop.
res, _, _ := adapters.GenSyncWithToolCallLoop(ctx, c, msgs, &opts)
// Print the answer which is the last message generated.
fmt.Println(res[len(res)-1].String())
When asked What is 3214 + 5632?, this may print:
8846
examples/txt_to_txt_tool-stream/main.go: Leverage a thinking
model to see the thinking process while it uses tool calls to answer the user's question. This lets you keep
the user updated on the progress.
💡 Set GROQ_API_KEY.
Snippet:
fragments, finish := adapters.GenStreamWithToolCallLoop(ctx, p, msgs, &opts)
for f := range fragments {
if f.Reasoning != "" {
// ...
} else if f.Text != "" {
// ...
} else if !f.ToolCall.IsZero() {
// ...
}
}
When asked What is 3214 + 5632?, this may print:
# Reasoning
User wants result of 3214+5632 using tool "add". Must be terse, only answer, no explanation. Need to call add function with a=3214, b=5632.
# Tool call
{fc_e9b9677b-898c-46df-9deb-39122bd6c69a add {"a":3214,"b":5632} map[] {}}
# Answer
8846
examples/txt_to_txt_tool-manual/main.go: Runs the tool call loop manually and
executes the tool calls directly.
💡 Set CEREBRAS_API_KEY.
Snippet:
res, _ := c.GenSync(ctx, msgs, &opts)
// Add the assistant's message to the messages list.
msgs = append(msgs, res.Message)
// Process the tool call from the assistant.
msg, _ := res.DoToolCalls(ctx, opts.Tools)
// Add the tool call response to the messages list.
msgs = append(msgs, msg)
// Follow up so the LLM can interpret the tool call response.
res, _ = c.GenSync(ctx, msgs, &opts)
examples/txt_to_txt_decode-json/main.go: Tell the LLM to generate a response matching the JSON schema derived from a Go struct. This is much more lightweight than tool calling!
It is very useful when you want the LLM to make a choice between values, or to return a number or a boolean
(true/false). Enums are supported.
💡 Set OPENAI_API_KEY.
Snippet:
msgs := genai.Messages{
genai.NewTextMessage("Is a circle round? Reply as JSON."),
}
var circle struct {
Round bool `json:"round"`
}
opts := genai.OptionsText{DecodeAs: &circle}
res, _ := c.GenSync(ctx, msgs, &opts)
res.Decode(&circle)
fmt.Printf("Round: %v\n", circle.Round)This will print:
Round: true
examples/txt_to_img/main.go: Use Together.AI's free (!) image generation, albeit with a low rate limit.
Some providers return a URL that must be fetched manually within a few minutes or hours, others return the data
inline. This example handles both cases.
💡 Set TOGETHER_API_KEY.
Snippet:
msgs := genai.Messages{
genai.NewTextMessage("Carton drawing of a husky playing on the beach."),
}
result, _ := c.GenSync(ctx, msgs)
for _, r := range result.Replies {
if r.Doc.IsZero() {
continue
}
// The image can be returned as a URL or inline, depending on the provider.
var src io.Reader
if r.Doc.URL != "" {
resp, _ := c.HTTPClient().Get(r.Doc.URL)
src = resp.Body
defer resp.Body.Close()
} else {
src = r.Doc.Src
}
b, _ := io.ReadAll(src)
os.WriteFile(r.Doc.GetFilename(), b, 0o644)
}
Try it locally:
go run github.com/maruel/genai/examples/txt_to_img@latest
This may generate:
The generated picture shows a fake signature. I decided to keep this example as a reminder that the result comes from harvested data created by real humans.
examples/img-txt_to_vid/main.go: Leverage the content.jpg file generated in the
txt_to_img example to ask Veo 3 from Google to generate a video based on the image.
💡 Set GEMINI_API_KEY.
Snippet:
// Warning: this is expensive.
c, _ := gemini.New(ctx, &genai.ProviderOptions{Model: "veo-3.0-fast-generate-preview"}, nil)
f, _ := os.Open("content.jpg")
defer f.Close()
msgs := genai.Messages{
genai.Message{Requests: []genai.Request{
{Text: "Carton drawing of a husky playing on the beach."},
{Doc: genai.Doc{Src: f}},
}},
}
res, _ := c.GenSync(ctx, msgs)
// Save the file in Replies like in the previous example ...
Try it locally:
go run github.com/maruel/genai/examples/img-txt_to_vid@latest
This may generate:
⚠ The MP4 has been recompressed to AVIF via compress.sh so GitHub can render it. The drawback is that audio is lost. View the original MP4 with audio (!) at content.mp4. May not work on Safari.
This is very impressive, but also very expensive.
examples/img-txt_to_img/main.go: Edit an image with a prompt. Leverage
the content.jpg file generated in the txt_to_img example.
💡 Set BFL_API_KEY.
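A minimal sketch of what this example might look like; it assumes bfl.New follows the same constructor pattern as the other providers, and the edit prompt is a hypothetical example:
c, _ := bfl.New(ctx, &genai.ProviderOptions{}, nil)
f, _ := os.Open("content.jpg")
defer f.Close()
msgs := genai.Messages{
	genai.Message{Requests: []genai.Request{
		{Text: "Add a red beach ball next to the husky."},
		{Doc: genai.Doc{Src: f}},
	}},
}
res, _ := c.GenSync(ctx, msgs)
// Save the image from res.Replies as in the txt_to_img example.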
go run github.com/maruel/genai/examples/img-txt_to_img@latest
This may generate:
examples/img-txt_to_img-txt/main.go: Leverage the
content.jpg file generated in the txt_to_img example to ask gemini-2.5-flash-image-preview to change the image
with a prompt, and ask the model to explain what it did.
💡 Set GEMINI_API_KEY.
Snippet:
// Warning: This is a bit expensive.
opts := genai.ProviderOptions{
Model: "gemini-2.5-flash-image-preview",
OutputModalities: genai.Modalities{genai.ModalityImage, genai.ModalityText},
}
c, _ := gemini.New(ctx, &opts, nil)
// ...
res, _ := c.GenSync(ctx, msgs, &gemini.Options{ReasoningBudget: 0})Try it locally:
go run github.com/maruel/genai/examples/img-txt_to_img-txt@latestThis may generate:
Of course! Here's an updated image with more animals. I added a playful dolphin jumping out of the water and a flock of seagulls flying overhead. I chose these animals to enhance the beach scene and create a more dynamic and lively atmosphere.
Wrote: content.png
This is quite impressive, but also quite expensive.
examples/img-txt_to_txt/main.go: Run vision to analyze a picture provided
as a URL (source: Wikipedia). The response is
streamed to the console as the reply is generated.
💡 Set MISTRAL_API_KEY.
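A minimal sketch of what this example might look like; it assumes mistral.New follows the same constructor pattern as the other providers and that genai.Doc accepts a URL on the request side (the image URL below is a placeholder, the real example points at an image hosted on Wikipedia):
c, _ := mistral.New(ctx, &genai.ProviderOptions{}, nil)
msgs := genai.Messages{
	genai.Message{Requests: []genai.Request{
		{Text: "Describe this image. Be succinct."},
		// Placeholder URL.
		{Doc: genai.Doc{URL: "https://example.com/banana.jpg"}},
	}},
}
fragments, finish := c.GenStream(ctx, msgs)
for f := range fragments {
	// Stream the description to the console.
	os.Stdout.WriteString(f.Text)
}
_, _ = finish()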
go run github.com/maruel/genai/examples/img-txt_to_txt@latest
This may generate:
The image depicts a single ripe banana. It has a bright yellow peel with a few small brown spots, indicating ripeness. The banana is curved, which is typical of its natural shape, and it has a stem at the top. The overall appearance suggests that it is ready to be eaten.
examples/img-txt_to_txt_local/main.go: is very similar to the previous example!
Use cmd/llama-serve to run an LLM locally, including tool calling and vision!
Start llama-server locally either by yourself or with this utility:
go run github.com/maruel/genai/cmd/llama-serve@latest \
-model ggml-org/gemma-3-4b-it-GGUF/gemma-3-4b-it-Q8_0.gguf#mmproj-model-f16.gguf -- \
--temp 1.0 --top-p 0.95 --top-k 64 \
--jinja -fa -c 0 --no-warmup
Run vision 100% locally on CPU with only 8GB of RAM. No GPU required!
go run github.com/maruel/genai/examples/img-txt_to_txt_local@latest
examples/vid-txt_to_txt/main.go: Run vision to analyze a video.
💡 Set GEMINI_API_KEY.
Using this video:
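A minimal sketch, reusing the document-as-input pattern from the Veo example above; the video filename and the use of the default model are assumptions:
c, _ := gemini.New(ctx, &genai.ProviderOptions{}, nil)
f, _ := os.Open("video.mp4") // Hypothetical filename.
defer f.Close()
msgs := genai.Messages{
	genai.Message{Requests: []genai.Request{
		{Text: "What is the word?"},
		{Doc: genai.Doc{Src: f}},
	}},
}
res, _ := c.GenSync(ctx, msgs)
fmt.Println(res.String())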
Try it locally:
go run github.com/maruel/genai/examples/vid-txt_to_txt@latest
When asked What is the word, this generates:
Banana
examples/aud-txt_to_txt/main.go: Analyze an audio file.
💡 Set OPENAI_API_KEY.
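A minimal sketch; it assumes the openaichat provider follows the same constructor pattern as the other providers, and the audio filename is a placeholder:
c, _ := openaichat.New(ctx, &genai.ProviderOptions{}, nil)
f, _ := os.Open("audio.mp3") // Hypothetical filename.
defer f.Close()
msgs := genai.Messages{
	genai.Message{Requests: []genai.Request{
		{Text: "What was the word?"},
		{Doc: genai.Doc{Src: f}},
	}},
}
res, _ := c.GenSync(ctx, msgs)
fmt.Println(res.String())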
Try it locally:
go run github.com/maruel/genai/examples/aud-txt_to_txt@latest
When asked What was the word?, this generates:
The word was "orange."
examples/txt_to_txt_quota/main.go: Prints the tokens processed and
generated for the request and the remaining quota if the provider supports it.
💡 Set GROQ_API_KEY.
Snippet:
msgs := genai.Messages{
genai.NewTextMessage("Describe poutine as a French person who just arrived in Québec"),
}
res, _ := c.GenSync(ctx, msgs)
fmt.Println(res.String())
fmt.Printf("\nTokens usage: %s\n", res.Usage.String())This may generate:
« Je viens tout juste d’arriver au Québec et, pour être honnête, je n’avais jamais entendu parler du fameux « poutine » avant de mettre le pied dans un petit resto du coin. »
(...)
Tokens usage: in: 83 (cached 0), reasoning: 0, out: 818, total: 901, requests/2025-08-29 15:58:13: 499999/500000, tokens/2025-08-29 15:58:12: 249916/250000
In addition to the token usage, the remaining quota is printed.
examples/txt_to_txt_any/main.go: Let the user choose the provider by name.
The relevant environment variable (e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY, etc) is used automatically for
authentication.
Automatically selects a model on behalf of the user. Wraps the explicit thinking tokens if needed.
Supports ollama and llama-server even if they run on a remote host or non-default port.
Snippet:
names := strings.Join(slices.Sorted(maps.Keys(providers.Available(ctx))), ", ")
provider := flag.String("provider", "", "provider to use, "+names)
flag.Parse()
cfg := providers.All[*provider]
c, _ := cfg.Factory(ctx, &genai.ProviderOptions{}, nil)
p := adapters.WrapReasoning(c)
res, _ := p.GenSync(...)
Try it locally:
go run github.com/maruel/genai/examples/txt_to_txt_any@latest \
-provider cerebras \
"Tell a good sounding advice that is a bad idea in practice."Snapshot of all the supported models at docs/MODELS.md is updated weekly.
Try it locally:
go install github.com/maruel/genai/cmd/...@latest
list-models -provider huggingface
As of August 2025, the following services offer a free tier (other limits apply):
- Cerebras has an unspecified "generous" free tier
- Cloudflare Workers AI about 10k tokens/day
- Cohere (1000 RPCs/month)
- Google's Gemini 0.25qps, 1m tokens/month
- Groq 0.5qps, 500k tokens/day
- HuggingFace 10¢/month
- Mistral 1qps, 1B tokens/month
- Pollinations.ai provides many models for free, including image generation
- Together.AI provides many models for free at 1qps, including image generation
- Running Ollama or llama.cpp locally is free. :)
PRs are appreciated for any of the following. No need to ask! Just send a PR and make it pass CI checks. ❤️
- Authentication: OAuth, service account, OIDC, GITHUB_TOKEN.
- Server-side MCP Client: OpenAI
- Anthropic raw API is implemented and smoke tested but there's no abstraction layer yet
- Real-time / Live: Gemini, OpenAI, TogetherAI, ...
- More comprehensive file/cache abstraction
- Tokens counting: Anthropic, Cohere, Gemini, ...
- Embeddings: Anthropic, Cohere, Gemini, OpenAI, TogetherAI, ...
- Image to 3D, e.g. github.com/Tencent-Hunyuan/Hunyuan3D-2
I'd be delighted if you want to contribute any missing provider. I'm particularly looking forward to these:
- Alibaba Cloud: Maker of Qwen models.
- AWS Bedrock
- Azure AI
- Fireworks responses
- GitHub inference API, which works on GitHub Actions (!)
- Google's Vertex AI: It supports much more features than Gemini API.
- Groq
- LM Studio: Easier way to run local models.
- Mistral
- Novita: Supports lots of modalities.
- Open Router
- OpenAI
- Runway: Specialized in images and videos.
- Synexai: It's very cheap.
- vLLM: The fastest way to run local models.
I'm also looking to further decouple the scoreboard from the Go code. I believe the scoreboard is useful in itself and is not Go-specific. I'd appreciate ideas towards achieving this; send them my way!
Thanks in advance! 🙏
Made with ❤️ by Marc-Antoine Ruel




