Copyright 2025 Ardan Labs
hello@ardanlabs.com
This project lets you use Go for hardware-accelerated local inference, with llama.cpp integrated directly into your applications via the yzma module. Kronk provides a high-level API that feels much like working with an OpenAI-compatible API.
This project also provides a model server for chat completions, responses, messages, embeddings, and reranking. The server is compatible with OpenWebUI, Cline, and the Claude Code project.
Here is the current catalog of models that have been verified to work with Kronk.
To see all the documentation, clone the project and run the Kronk Model Server:
$ make kronk-server
$ make website
You can also install Kronk, run the Kronk Model Server, and open your browser to localhost:8080:
$ go install github.com/ardanlabs/kronk/cmd/kronk@latest
$ kronk server start
Read the Manual to learn more about running the Kronk Model Server.
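Once the server is running, any OpenAI-style client can talk to it. Here is a minimal sketch in Go; the /v1/chat/completions path, the model id, and the payload shape are assumptions based on the server being OpenAI compatible, so verify them against the server documentation:

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    // NOTE: The endpoint path and model id are assumptions; check the
    // documentation served by the Kronk Model Server for exact values.
    body := []byte(`{
        "model": "Qwen3-8B-Q8_0",
        "messages": [{"role": "user", "content": "Hello model"}]
    }`)

    resp, err := http.Post("http://localhost:8080/v1/chat/completions", "application/json", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    out, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(string(out))
}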
Name: Bill Kennedy
Company: Ardan Labs
Title: Managing Partner
Email: bill@ardanlabs.com
BlueSky: https://bsky.app/profile/goinggo.net
LinkedIn: www.linkedin.com/in/william-kennedy-5b318778/
Twitter: https://x.com/goinggodotnet
To install the Kronk tool, run the following commands:
$ go install github.com/ardanlabs/kronk/cmd/kronk@latest
$ kronk --help
Here is the existing list of Issues/Features for the project: the things being worked on and the things that would be nice to have.
If you are interested in helping in any way, please send an email to Bill Kennedy.
The architecture of Kronk is designed to be simple and scalable.
Watch this video to learn more about the project and the architecture.
The Kronk SDK allows you to write applications that directly interact with local open source GGUF models (supported by llama.cpp), providing inference for text and media (vision and audio).
Check out the examples section below.
Kronk uses models in the GGUF format supported by llama.cpp. You can find many models in GGUF format on Hugging Face (over 147k at last count):
https://huggingface.co/models?library=gguf&sort=trending
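Kronk references a model by its Hugging Face owner/repo/file.gguf path. The full example later in this document downloads a model exactly this way; as a sketch, here is just the download step in isolation, with imports and error returns as in that example:

ctx, cancel := context.WithTimeout(context.Background(), 15*time.Minute)
defer cancel()

// Create the models manager and download the GGUF file by its
// Hugging Face owner/repo/file path.
mdls, err := models.New()
if err != nil {
    return models.Path{}, fmt.Errorf("unable to create models manager: %w", err)
}

mp, err := mdls.Download(ctx, kronk.FmtLogger, "Qwen/Qwen3-8B-GGUF/Qwen3-8B-Q8_0.gguf", "")
if err != nil {
    return models.Path{}, fmt.Errorf("unable to install model: %w", err)
}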
Kronk currently has support for over 94% of llama.cpp functionality thanks to yzma. See the yzma ROADMAP.md for the complete list.
You can use multimodal models (image/audio) and text language models with full hardware acceleration on Linux, on macOS, and on Windows.
| OS | CPU | GPU |
|---|---|---|
| Linux | amd64, arm64 | CUDA, Vulkan, HIP, ROCm, SYCL |
| macOS | arm64 | Metal |
| Windows | amd64 | CUDA, Vulkan, HIP, SYCL, OpenCL |
Whenever there is a new release of llama.cpp, the tests for yzma run automatically. Kronk also runs its tests once a day and checks for updates to llama.cpp. This helps us stay up to date with the latest code and models.
There are examples in the examples directory. The first time you run these programs, the system will download and install the model and libraries:
AUDIO - This example shows you how to execute a simple prompt against an audio model.
make example-audio
CHAT - This example shows you how to chat with the chat-completion api.
make example-chat
EMBEDDING - This example shows you a basic program using Kronk to perform an embedding operation.
make example-embedding
QUESTION - This example shows you how to ask a simple question with the chat-completion api.
make example-question
RERANK - This example shows you how to use a rerank model.
make example-rerank
RESPONSE - This example shows you how to chat with the response api.
make example-response
VISION - This example shows you how to execute a simple prompt against a vision model.
make example-vision
YZMA - This example shows you how to use the yzma api at its basic level.
make example-yzma
You can find more examples in the ArdanLabs AI training repo at Example13.
Here is the full source of the QUESTION example:
package main
import (
    "context"
    "fmt"
    "os"
    "time"

    "github.com/ardanlabs/kronk/sdk/kronk"
    "github.com/ardanlabs/kronk/sdk/kronk/model"
    "github.com/ardanlabs/kronk/sdk/tools/defaults"
    "github.com/ardanlabs/kronk/sdk/tools/libs"
    "github.com/ardanlabs/kronk/sdk/tools/models"
)

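// modelURL identifies the GGUF model to download from Hugging Face.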
const modelURL = "Qwen/Qwen3-8B-GGUF/Qwen3-8B-Q8_0.gguf"
func main() {
    if err := run(); err != nil {
        fmt.Printf("\nERROR: %s\n", err)
        os.Exit(1)
    }
}

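// run installs the system, loads the model, asks a question, and
// unloads the model on the way out.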
func run() error {
    mp, err := installSystem()
    if err != nil {
        return fmt.Errorf("unable to install system: %w", err)
    }

    krn, err := newKronk(mp)
    if err != nil {
        return fmt.Errorf("unable to init kronk: %w", err)
    }

    defer func() {
        fmt.Println("\nUnloading Kronk")
        if err := krn.Unload(context.Background()); err != nil {
            fmt.Printf("failed to unload model: %v", err)
        }
    }()

    if err := question(krn); err != nil {
        fmt.Println(err)
    }

    return nil
}

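// installSystem downloads the llama.cpp libraries and the model files
// if they are not already installed.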
func installSystem() (models.Path, error) {
    ctx, cancel := context.WithTimeout(context.Background(), 15*time.Minute)
    defer cancel()

    libs, err := libs.New(
        libs.WithVersion(defaults.LibVersion("")),
    )
    if err != nil {
        return models.Path{}, err
    }

    if _, err := libs.Download(ctx, kronk.FmtLogger); err != nil {
        return models.Path{}, fmt.Errorf("unable to install llama.cpp: %w", err)
    }

    // -------------------------------------------------------------------------

    mdls, err := models.New()
    if err != nil {
        return models.Path{}, fmt.Errorf("unable to create models manager: %w", err)
    }

    mp, err := mdls.Download(ctx, kronk.FmtLogger, modelURL, "")
    if err != nil {
        return models.Path{}, fmt.Errorf("unable to install model: %w", err)
    }

    // -------------------------------------------------------------------------
    // You could also download this model using the catalog system.
    // templates.Catalog().DownloadModel("Qwen3-8B-Q8_0")

    return mp, nil
}

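// newKronk initializes the Kronk runtime and loads the model into memory.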
func newKronk(mp models.Path) (*kronk.Kronk, error) {
    fmt.Println("loading model...")

    if err := kronk.Init(); err != nil {
        return nil, fmt.Errorf("unable to init kronk: %w", err)
    }

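    // Configure the model with a quantized (Q8_0) key/value cache and
    // support for up to two concurrent sequences.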
    cfg := model.Config{
        ModelFiles: mp.ModelFiles,
        CacheTypeK: model.GGMLTypeQ8_0,
        CacheTypeV: model.GGMLTypeQ8_0,
        NSeqMax:    2,
    }

    krn, err := kronk.New(cfg)
    if err != nil {
        return nil, fmt.Errorf("unable to create inference model: %w", err)
    }

    fmt.Print("- system info:\n\t")
    for k, v := range krn.SystemInfo() {
        fmt.Printf("%s:%v, ", k, v)
    }
    fmt.Println()

    fmt.Println(" - contextWindow:", krn.ModelConfig().ContextWindow)
    fmt.Println(" - embeddings   :", krn.ModelInfo().IsEmbedModel)
    fmt.Println(" - isGPT        :", krn.ModelInfo().IsGPTModel)

    return krn, nil
}

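// question asks the model a single question and streams the response.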
func question(krn *kronk.Kronk) error {
    ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
    defer cancel()

    question := "Hello model"

    fmt.Println()
    fmt.Println("QUESTION:", question)
    fmt.Println()

    d := model.D{
        "messages": model.DocumentArray(
            model.TextMessage(model.RoleUser, question),
        ),
        "temperature": 0.7,
        "top_p":       0.9,
        "top_k":       40,
        "max_tokens":  2048,
    }

    ch, err := krn.ChatStreaming(ctx, d)
    if err != nil {
        return fmt.Errorf("chat streaming: %w", err)
    }

    // -------------------------------------------------------------------------

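    // Reasoning tokens stream separately from content tokens. Print
    // reasoning in red, then add a newline once content begins.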
    var reasoning bool

    for resp := range ch {
        switch resp.Choice[0].FinishReason() {
        case model.FinishReasonError:
            return fmt.Errorf("error from model: %s", resp.Choice[0].Delta.Content)

        case model.FinishReasonStop:
            return nil

        default:
            if resp.Choice[0].Delta.Reasoning != "" {
                reasoning = true
                fmt.Printf("\u001b[91m%s\u001b[0m", resp.Choice[0].Delta.Reasoning)
                continue
            }

            if reasoning {
                reasoning = false
                fmt.Println()
                continue
            }

            fmt.Printf("%s", resp.Choice[0].Delta.Content)
        }
    }

    return nil
}

This example can produce the following output:
make example-question
CGO_ENABLED=0 go run examples/question/main.go
download-libraries: status[check libraries version information] arch[arm64] os[darwin] processor[cpu]
download-libraries: status[check llama.cpp installation] arch[arm64] os[darwin] processor[cpu] latest[b7406] current[b7406]
download-libraries: status[already installed] latest[b7406] current[b7406]
download-model: model-url[Qwen/Qwen3-8B-GGUF/Qwen3-8B-Q8_0.gguf] proj-url[] model-id[Qwen3-8B-Q8_0]:
download-model: waiting to check model status...:
download-model: status[already exists]:
loading model...
QUESTION: Hello model
Okay, the user said "Hello model." I need to respond appropriately. First, I should acknowledge their greeting. Since they mentioned "model," maybe they're referring to me as a language model. I should clarify that I'm Qwen, a large language model developed by Alibaba Cloud. I should keep the response friendly and open-ended, inviting them to ask questions or share topics they're interested in. Let me make sure the tone is welcoming and helpful. Also, check for any possible misunderstandings. They might be testing if I recognize the term "model," so confirming my identity as Qwen is important. Alright, time to put it all together in a natural, conversational way.
Hello! I'm Qwen, a large language model developed by Alibaba Cloud. How can I assist you today? 😊 Whether you have questions, need help with something, or just want to chat, feel free to let me know!
Unloading Kronk
