YALMR – Yet Another LLM Runtime
Run local GGUF models in .NET 10 via llama.cpp. Features streaming generation, tool calling, vision (multimodal), conversation compaction, MCP integration, and a multi-model server.
Requires .NET 10 SDK and a GGUF model file (e.g. from Hugging Face).
Install the runtime
YALMR downloads the correct llama.cpp native binaries automatically. Call once at startup:
using YALMR.LlamaCpp;
string backendDir = await LlamaRuntimeInstaller.EnsureInstalledAsync(LlamaBackend.Cpu);
// LlamaBackend.Cuda or LlamaBackend.Vulkan for GPU
Binaries are cached in %LOCALAPPDATA%\YALMR\llama-runtime (Windows) or ~/.local/share/YALMR/llama-runtime (Linux/macOS).
Start a session
using YALMR.LlamaCpp;
using YALMR.Runtime;
using YALMR.Utils;
string backendDir = await LlamaRuntimeInstaller.EnsureInstalledAsync(LlamaBackend.Cpu);
await using var session = await Session.CreateAsync(new SessionOptions
{
BackendDirectory = backendDir,
ModelPath = "path/to/model.gguf",
ToolRegistry = new ToolRegistry(),
Compaction = new ConversationCompactionOptions(MaxInputTokens: 8192),
});
Chat
await foreach (var chunk in session.GenerateAsync(new ChatMessage("user", "Hello!")))
Console.Write(chunk.Text);
Multi-turn conversation is tracked automatically. Each call to GenerateAsync appends to the session history.
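As a sketch of that multi-turn behaviour, using the session created above (the prompts here are illustrative):

```csharp
// First turn – establishes context that the session retains.
await foreach (var chunk in session.GenerateAsync(new ChatMessage("user", "My name is Ada.")))
    Console.Write(chunk.Text);
Console.WriteLine();

// Second turn – no need to re-send the first message; GenerateAsync
// appended it to the session history, so the model can refer back to it.
await foreach (var chunk in session.GenerateAsync(new ChatMessage("user", "What is my name?")))
    Console.Write(chunk.Text);
Console.WriteLine();
```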
Structured output
Use AskAsync<T>() to get a typed response. The model's sampling is constrained via a GBNF grammar, so the output is always valid JSON matching your type's schema – no prompt engineering or retry logic required.
public record MovieReview(string Title, int ReleaseYear, double Score, string Summary);
var review = await session.AskAsync<MovieReview>(
"Review the movie Interstellar.");
Console.WriteLine($"{review.Title} ({review.ReleaseYear}) – {review.Score}/10");
Console.WriteLine(review.Summary);
Any JSON-serializable type works: records, classes, and collections. Optional properties (nullable or with defaults) may be returned as null by the model.
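For example, a collection-typed call might look like the sketch below (the Track record and prompt are illustrative; the nullable Genre property shows an optional field the model may leave null):

```csharp
public record Track(string Title, int DurationSeconds, string? Genre);

// AskAsync<T> also accepts collection types: the grammar then constrains
// the output to a JSON array of Track objects.
var tracks = await session.AskAsync<List<Track>>(
    "List three well-known tracks by Daft Punk.");

foreach (var t in tracks)
    Console.WriteLine($"{t.Title} ({t.DurationSeconds}s) – {t.Genre ?? "unknown genre"}");
```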
Features
- Tool calling – register handlers in ToolRegistry; the model calls them automatically
- Streaming – IAsyncEnumerable<ChatResponseChunk> with text, reasoning, and tool-call chunks
- Structured output – AskAsync<T>() constrains sampling via GBNF grammar and deserializes the result
- Vision – attach images via ImagePart with a multimodal projector model
- Conversation compaction – automatic context-window management with pluggable strategies
- MCP integration – call tools from HTTP or stdio MCP servers
- Multi-model server – YALMRServer manages named engines and sessions for concurrent use
Tool calling
Decorate methods with [Tool] and parameters with [ToolParam], then register the instance:
public class AssistantTools
{
[Tool("Returns the current date and time.")]
public string GetDateTime() => DateTimeOffset.Now.ToString("R");
[Tool("Returns the weather for a city.")]
public string GetWeather(
[ToolParam("City name.")] string city,
[ToolParam("Temperature units: celsius or fahrenheit.")] string units = "celsius")
=> $"22 degrees, sunny in {city}.";
}
ToolRegistry registry = new();
registry.Register(new AssistantTools());
await using var session = await Session.CreateAsync(new SessionOptions
{
BackendDirectory = backendDir,
ModelPath = "path/to/model.gguf",
ToolRegistry = registry,
DefaultInference = new InferenceOptions { Tools = registry.ToToolDefinitions() },
Compaction = new ConversationCompactionOptions(MaxInputTokens: 8192),
});
The session calls matching handlers automatically. Method names become snake_case (GetWeather to get_weather). Override with [Tool("...", Name = "custom_name")].
With a DI container
registry.Register<AssistantTools>(serviceProvider);
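A minimal wiring sketch, assuming a standard Microsoft.Extensions.DependencyInjection container and that AssistantTools takes its dependencies via constructor injection (the IClock/SystemClock services here are hypothetical placeholders):

```csharp
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
// Hypothetical dependency consumed by AssistantTools via its constructor.
services.AddSingleton<IClock, SystemClock>();
services.AddTransient<AssistantTools>();
var serviceProvider = services.BuildServiceProvider();

// The registry resolves AssistantTools from the container,
// so its constructor dependencies are supplied by DI.
var registry = new ToolRegistry();
registry.Register<AssistantTools>(serviceProvider);
```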
With YALMRServer (per-request tools)
if (!server.TryGetModelOptions("my-model", out var baseOpts))
throw new InvalidOperationException("Model not loaded.");
var registry = new ToolRegistry();
registry.Register(new AssistantTools());
var sessionId = server.CreateSession("my-model", baseOpts with
{
ToolRegistry = registry,
DefaultInference = (baseOpts.DefaultInference ?? new InferenceOptions()) with
{
Tools = registry.ToToolDefinitions(),
},
});
try
{
var reply = await server.GetSession(sessionId)
.SendAsync(new ChatMessage("user", "What time is it?"));
Console.WriteLine(reply.Content);
}
finally { await server.RemoveSessionAsync(sessionId); }
Argument types
When writing handlers manually, argument values have these CLR types:
| JSON Schema type | CLR type | Note |
|---|---|---|
| "string" | string | |
| "integer" | long | |
| "number" | double | Cast if you need decimal: (decimal)(double)args["price"] |
| "boolean" | bool | |
| "array" | object?[] | Elements follow the same rules |
| "object" | Dictionary<string, object?> | |
The reflection-based Register coerces all types automatically. The table above only matters when writing AgentTool handlers directly.
Manual registration
For programmatic or stateful tools that don't fit the method-per-tool model:
registry.Register(new AgentTool(
"search_products",
"Searches the product catalogue.",
[
new ToolParameter("query", "string", "Search query."),
new ToolParameter("limit", "integer", "Max results.", Required: false),
],
args =>
{
string query = (string)args["query"];
int limit = args.TryGetValue("limit", out var l) && l is long n ? (int)n : 10;
return SearchProducts(query, limit);
}));
