SPEKTRE LABS · HELSINKI · σ-AWARE · LOCAL-FIRST
Creation OS
One runtime. Measured coherence. Truth before output.
A σ-aware closed-loop cognition architecture — not a replacement model, not a chat skin, not a prompt cookbook.
σ estimates coherence, uncertainty, alignment, and truth-distance — carried as a twelve-byte interrupt on the hot path — portable C89, Q16.16, zero dependencies at the primitive.
Quick Start (Python package)
pip install creation-os
σ pipeline (prompt → guard → σ → decision)
from cos import Pipeline
pipe = Pipeline()
result = pipe.score("What is 2+2?", "4")
print(result.sigma, result.verdict, result.text)
if result:
    print("Reliable!")
else:
    print("Unreliable!")
Extras: pip install 'creation-os[langchain]' (LangChain callback), 'creation-os[litellm]', 'creation-os[probes]' (LSD probe / torch), 'creation-os[serve]' (FastAPI).
Score any LLM output
from cos import SigmaGate
gate = SigmaGate()
sigma, verdict = gate.score("What is 2+2?", "4")
LangChain integration
from cos.integrations.langchain import SigmaGateCallback
handler = SigmaGateCallback()
chain.invoke({"input": "..."}, config={"callbacks": [handler]})
print(handler.last_sigma, handler.last_verdict)
Advanced tracing (run-id maps, on_abstain=...): import from cos.integrations.langchain_sigma.
OpenAI-style client wrapper
from cos.integrations.openai_wrapper import sigma_chat
response = sigma_chat(client, messages=[{"role": "user", "content": "Hello"}])
print(response.sigma, response.verdict)
Decorator — any function
from cos.integrations.decorator import sigma_gated
@sigma_gated
def my_llm(prompt: str) -> str:
    return call_any_model(prompt)
result = my_llm("Explain gravity")
print(result.sigma, result.verdict, result.text)
Claim Discipline
Claim discipline, not hype: Creation OS is an AGI-oriented surface, not a claim of achieved AGI. Evidence rules: docs/CLAIM_DISCIPLINE.md
Evidence Ladder
Creation OS does not claim everything. It measures what it can, proves what is wired, and refuses to overclaim.
| Level | What | Status |
|---|---|---|
| 1. Primitive | 12-byte sigma_state_t, C89, Q16.16, zero deps | shipped |
| 2. Runtime | σ-gate ACCEPT/RETHINK/ABSTAIN, single forward pass | shipped |
| 3. Measured | TruthfulQA 0.982, TriviaQA 0.960, BitNet 2B pipeline | receipts in repo |
| 4. Negative | HaluEval 0.514 (near random), HellaSwag bounded, MMLU not dominant | documented |
| 5. Formal | Lean 4: 14/14, ACSL clauses: 30/30 (Frama-C Wp tier-1: 15 goals) | verified |
| 6. Architecture | Ω-loop, Engram, JEPA, Swarm, Sovereign, TTT | built |
| 7. Roadmap | AGI-oriented architecture — not AGI achieved | research direction |
Every AI answers. Creation OS measures first.
A local σ-aware AI runtime that measures whether an answer should exist before showing it.
Most AI systems answer even when uncertain. Creation OS adds an internal measurement layer:
ACCEPT → emit the answer · RETHINK → regenerate or seek more compute · ABSTAIN → do not emit (product policy may map this to "I don't know").
The hot path measures and gates every answer before output. Separate formal verification (Lean, Frama-C, optional RTL) applies only where wired — do not read "prove" as a universal formal certificate for every chat token.
Everything runs locally. No cloud gate. No external evaluator. No prompt tricks for the core interrupt.
What this is
Creation OS is a local σ-aware AI runtime that scores internal coherence and alignment before returning an answer. The portable interrupt is a 12-byte sigma_state_t in C (python/cos/sigma_gate.h) with Python mirrors (python/cos/sigma_gate_core.py); the lab stack adds probes, cascades, and harnesses around that primitive.
Modules (79)
Creation OS ships with a broad in-tree module surface (documentation sometimes rounds to "79" integrated areas spanning inference through ops). The bullets below are a coverage map for navigation — not a capability matrix, AGI claim, or substitute for harness evidence. Read docs/CLAIM_DISCIPLINE.md before citing outcomes.
- Inference: BitNet ternary engine, Ο-attention, KV cache, speculative decoding
- Safety: σ-gate multi-level cascade, guardrails, red team, ZKP attestation
- Intelligence: neuro-symbolic reasoning, JEPA world model, continual learning, TTT
- Memory: episodic + semantic + consolidation + forgetting, knowledge graph
- Planning: hierarchical planning with risk analysis and fallbacks
- Social: theory-of-mind lab scaffolds, trust dynamics, social learning
- Multi-agent: swarm (stigmergy), conflict resolution, federated learning
- Protocol: MCP (FastMCP stdio + JSON-RPC lab), A2A lab tasks, Agent Cards
- Alignment: value learning, human-in-the-loop escalation, explainability
- Embodiment: sensorimotor loop lab, symbol grounding, physical world model
- Drives: curiosity, competence, homeostasis (lab intrinsic signals)
- Metacognition / awareness metrics: Φ and integration proxies only — no phenomenal-awareness claim
- Hardware: RISC-V σ ISA lab mirrors, TinyML, Soul LED paths where present
- Deployment: Docker, Helm, air-gap options per docs, sovereign accounting lab, pip install
- Ops: registry, digital twin lab, observability hooks, cos-evolve RSI lab
Quick demo / install
CLONE · BUILD · COS CHAT · Ω HARNESS
Pick a path — one-liner smoke, full install.sh + BitNet weights, or a weights-free Ω harness (see Install).
# One-liner (ephemeral clone + build + `cos demo --batch`)
curl -fsSL https://raw.githubusercontent.com/spektre-labs/creation-os/main/scripts/try_cos.sh | bash
Full path — local weights + one-shot chat (shapes vary by build / weights):
git clone https://github.com/spektre-labs/creation-os.git
cd creation-os
./scripts/install.sh
./cos chat --once --prompt "What is 2+2?" --multi-sigma --verbose
Example lines you should see on the default BitNet pipeline:
→ round 0 4 [σ_peak=0.06 action=ACCEPT route=LOCAL]
→ [σ=0.063 | CACHE | LOCAL | conformal@α=0.80 | rethink=0 | €0.0000]
So: answer 4, σ ≈ 0.063, verdict = ACCEPT, route = LOCAL.
More install options: Homebrew, Docker, and make cos cos-demo — see Install below.
Weights-free harness: ./scripts/cos omega --goal "smoke" --turns 2 → 14-phase Ω scaffold (python/cos/omega/).
Why it matters
- Local-first — default paths never phone home; escalation is explicit and opt-in when wired.
- No external evaluator for the core interrupt — signals come from logits, hidden states, and configured probes.
- Answer / rethink / abstain before the user sees output — selective generation instead of always emitting.
Claim status (at a glance)
| Claim | Status |
|---|---|
| Local σ-gated runtime (C + Python) | Implemented |
| TruthfulQA / TriviaQA-style σ signals (archived JSON) | Measured |
| HaluEval v2 paired-oracle AUROC | Negative / not solved (below) |
| "General AGI" Ω-loop + module surface | Architecture / research direction |
| Full self-improvement at scale | Experimental / harness-scoped |
What is not claimed
Creation OS does not claim universal hallucination detection on every benchmark family.
Creation OS does not claim frontier-model SOTA accuracy on every task.
Creation OS does not claim AGI completion.
Creation OS does claim a σ-aware local runtime with documented positive and negative metrics where in-tree JSON exists — read docs/CLAIM_DISCIPLINE.md before citing numbers.
The core primitive: σ
σ is not a brand slogan — it is a scalar estimate built from independent signals (entropy, HIDE, ICR, LSD, spectral / SAE when wired), normalized and aggregated, then fed into a three-way gate (ACCEPT / RETHINK / ABSTAIN). Conventions differ by harness row; the C interrupt uses Q16.16 algebra shared with the Python mirror.
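To make the shape of this concrete, here is an illustrative sketch of the aggregate-then-gate step. The signal names, equal weights, and 0.30 / 0.70 thresholds are assumptions for illustration, not the repo's calibrated implementation:

```python
# Illustrative sigma aggregation + three-way gate.
# Thresholds and weights here are assumed, not the repo's calibrated values.
ACCEPT, RETHINK, ABSTAIN = "ACCEPT", "RETHINK", "ABSTAIN"

def aggregate_sigma(signals, weights=None):
    """Weighted mean of signals already normalized to [0, 1]; higher = riskier."""
    weights = weights or {name: 1.0 for name in signals}
    total = sum(weights[name] for name in signals)
    return sum(value * weights[name] for name, value in signals.items()) / total

def three_way_gate(sigma, t_accept=0.30, t_abstain=0.70):
    """Low sigma -> ACCEPT, high -> ABSTAIN, the band in between -> RETHINK."""
    if sigma <= t_accept:
        return ACCEPT
    if sigma >= t_abstain:
        return ABSTAIN
    return RETHINK

sigma = aggregate_sigma({"entropy": 0.10, "hide": 0.05, "icr": 0.12})
print(round(sigma, 3), three_way_gate(sigma))  # 0.09 ACCEPT
```

The three-way verdict (rather than a binary pass/block) is what lets the runtime spend more compute on RETHINK instead of emitting or refusing immediately.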
Honest scope: σ is not universally validated across all tasks. It is strongest on factual-confidence settings with archived in-tree JSON (TruthfulQA / TriviaQA-style probes) and still weak on current HaluEval v2 paired-oracle rows.
Negative result: HaluEval v2
Current HaluEval v2 paired-oracle AUROC: 0.51375 (auroc_hallucinated_vs_correct_arms, n_pairs=80, n_scores=160, auroc_note: used_negated_scores_monotone) — benchmarks/sigma_gate_eval/results_halueval_v2/halueval_v2_summary.json.
That is near random and below the > 0.70 lab target.
Interpretation: the σ stack in this repository is not yet a general hallucination detector on HaluEval-style paired strings. Positive claims stay bounded to validated TruthfulQA / TriviaQA-style factual-confidence tasks per docs/CLAIM_DISCIPLINE.md.
Signal cascade: efficiency
Cheap signals run first; the stack stops when the verdict is already clear so most traffic never pays for the deepest probes.
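A minimal sketch of that early-exit idea; the stage names, costs, and thresholds below are hypothetical, not the in-tree cascade:

```python
# Illustrative early-exit cascade: cheap signals first, stop when the
# verdict is already unambiguous. Scorers return sigma in [0, 1].
def cascade(prompt, response, stages, t_accept=0.2, t_abstain=0.8):
    """stages: list of (name, scorer) ordered cheap -> expensive."""
    sigma = 0.5
    for name, scorer in stages:
        sigma = scorer(prompt, response)
        if sigma <= t_accept or sigma >= t_abstain:
            return sigma, name          # clear verdict: skip deeper probes
    return sigma, stages[-1][0]         # fell through to the deepest probe

cheap = lambda p, r: 0.1                # e.g. token entropy (hypothetical)
deep = lambda p, r: 0.9                 # e.g. hidden-state probe (hypothetical)
sigma, stopped_at = cascade("2+2?", "4", [("entropy", cheap), ("lsd", deep)])
print(sigma, stopped_at)                # stops at the cheap stage
```

Because the common case exits at the first stage, average latency stays close to the cheapest signal while the deep probes only run on ambiguous traffic.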
Architecture overview
Eight-layer map of the σ-aware system (narrative + lab — not every layer ships in one binary on every platform):
σ-Fabric: full system connection
SigmaFabric is the wiring layer: boot() loads available Python modules (gate, pipeline, stream, metacog, reason, snapshots, …) and process() runs a traced path with σ carried stage to stage. The lighter Fabric class boots the Ω-loop cognitive map with observable per-module status (loaded / missing / failed / disabled). The map below is conceptual (not every box is present in a minimal pip install); L9 stays a research-facing proxy — not a metacognition or awareness product claim (see Claim discipline).
Deeper ULTRA / BSC / silicon map: Architecture · docs/DOC_INDEX.md.
Continuous Ω-loop
Beyond a single-shot gate, the tree sketches a closed Ω-loop: σ at each phase (perceive → … → continue), blended into a master lane — python/cos/omega/, src/sigma/omega_phase_gates.{h,c}, ./scripts/cos omega --goal … --turns …, and the legacy integration driver in src/sigma/omega_loop.c.
Memory / Engram
Only experiences that pass the gate consolidate into durable memory; recall is σ-aware (thresholds in harnesses vary).
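A toy sketch of that gate-then-persist flow; the Engram class, its method names, and the 0.3 threshold here are hypothetical, chosen only to illustrate the behavior described above:

```python
# Hypothetical sketch of sigma-gated consolidation: only low-sigma
# experiences enter durable memory, and recall is sigma-filtered too.
class Engram:
    def __init__(self, consolidate_below=0.3):
        self.consolidate_below = consolidate_below
        self.store = []

    def consolidate(self, experience, sigma):
        """Persist only experiences that passed the gate."""
        if sigma < self.consolidate_below:
            self.store.append({"text": experience, "sigma": sigma})
            return True
        return False                      # rejected: never persisted

    def recall(self, max_sigma=1.0):
        """Sigma-aware recall: only entries under the caller's threshold."""
        return [e["text"] for e in self.store if e["sigma"] <= max_sigma]

mem = Engram()
mem.consolidate("Paris is the capital of France", sigma=0.06)   # kept
mem.consolidate("Berlin is the capital of France", sigma=0.89)  # dropped
print(mem.recall())
```

The point of the design is that hallucinated content never becomes a durable "memory" that later recalls could amplify.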
Portable σ interrupt (reference)
The portable interrupt lives in python/cos/sigma_gate.h: 12-byte sigma_state_t (Q16.16), no heap in the gate itself. Python mirrors the same algebra in python/cos/sigma_gate_core.py. Optional llama.cpp hook: python/cos/sigma_sampler.h (adds llama.h from upstream ggml-org/llama.cpp; install sampler last in the chain after temperature / XTC).
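Q16.16 is a standard fixed-point layout: 16 integer bits and 16 fractional bits in a 32-bit word. A small Python sketch of that arithmetic follows; it illustrates the number format only and does not reproduce the actual sigma_state_t field layout:

```python
# Q16.16 fixed point: value = raw / 2**16. A sketch of the conversions
# and the multiply-with-rescale that fixed-point algebra needs.
Q16_ONE = 1 << 16           # 65536

def to_q16(x):
    return int(round(x * Q16_ONE))

def from_q16(q):
    return q / Q16_ONE

def q16_mul(a, b):
    """The raw product carries 32 fractional bits, so shift right
    by 16 to return to Q16.16."""
    return (a * b) >> 16

sigma = to_q16(0.063)
print(sigma, from_q16(sigma))                        # round-trips within 2**-16
print(from_q16(q16_mul(to_q16(0.5), to_q16(0.5))))   # 0.25
```

Integer-only algebra like this is what lets the C primitive stay C89-portable with zero dependencies and bit-identical Python mirrors.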
Claim discipline: numbers in the table are archived lab harness JSON in-tree β do not merge them with microbench throughput or frontier harness rows without the wall in docs/CLAIM_DISCIPLINE.md.
| Metric | Value | Source (JSON key) |
|---|---|---|
| AUROC wrong vs σ (TruthfulQA holdout, GPT-2) | 0.982 | benchmarks/sigma_gate_eval/results_holdout/holdout_summary.json → auroc_wrong_vs_sigma |
| AUROC wrong vs σ (TriviaQA, cross-domain) | 0.960 | benchmarks/sigma_gate_eval/results_cross_domain/cross_domain_summary.json → triviaqa_auroc_wrong_vs_sigma |
| Wrong + confident (holdout) | 0 | benchmarks/sigma_gate_eval/results_holdout/holdout_summary.json β wrong_confident_accept |
| LSD training CV mean AUROC (full split manifest) | 0.943 | benchmarks/sigma_gate_lsd/results_full/manifest.json β cv_auroc_mean |
#include "sigma_gate.h"
sigma_verdict_t v = sigma_gate(&state);
/* ACCEPT  → trust trajectory */
/* RETHINK → try a different path */
/* ABSTAIN → do not emit */
Benchmarks: make check-sigma-v57 runs the σ-gate C test, pytest core, and eval drivers (streaming / router / HIDE; Gemma runs when HF_TOKEN or HUGGING_FACE_HUB_TOKEN is set). Regenerate summaries with the scripts under benchmarks/sigma_gate_eval/ and benchmarks/sigma_gate_scaling/. Hardware lab (silicon path): make bench-hardware and cos benchmark --hardware — capture stdout under benchmarks/hardware/ with host metadata (see docs/REPRO_BUNDLE_TEMPLATE.md); do not treat microbench throughput as harness MMLU/ARC.
Forty branchless integer kernels · one composed verdict · 1 = 1 · merge gate
FIG 09 — where to look first on this page (adapts to light/dark in supporting clients). VISUAL_INDEX · DOC_INDEX.
— SHIP —
Try it
Zero-to-chat on macOS or Linux — weights optional for CI (COS_INSTALL_NO_BITNET=1).
Fastest path — PyPI (cos CLI; default cos gate uses a deterministic quickstart scorer so you can run without cloning; the trained LSD probe is cos gate --lsd from a full checkout):
pip install creation-os
cos version
cos gate --prompt "What is the capital of France?" --response "Berlin"
# → σ=0.890 verdict=ABSTAIN
cos gate --prompt "What is 2+2?" --response "4"
# → σ=0.060 verdict=ACCEPT
Ops — production HTTP (Dockerfile.prod: cos-serve on port 3001; the image does not bake model weights — use Ollama or mount a backend):
docker build -f Dockerfile.prod -t creation-os:prod .
docker run --rm -p 3001:3001 creation-os:prod
curl -fsS http://127.0.0.1:3001/v1/health
# Published builds (on tag `v*`): see `.github/workflows/docker.yml`
# docker run --rm -p 3001:3001 ghcr.io/spektre-labs/creation-os:latest
# Compose: σ-gate + Ollama (see docker-compose.yml)
docker compose up -d
curl -fsS http://127.0.0.1:3001/v1/health
# POST /v1/gate runs generation + σ against the inference backend (requires a healthy Ollama/model).
# Kubernetes
helm install creation-os ./helm/creation-os
Air-gapped bundle (after make cos cos-serve from a checkout):
cos sovereign --package --output creation-os-sovereign.tar.gz
Fast path — under a minute in a checkout, no GGUF download (recorded sigma from benchmarks):
git clone https://github.com/spektre-labs/creation-os
cd creation-os
bash scripts/quickstart.sh
Full path — local weights + cos chat smoke test:
git clone https://github.com/spektre-labs/creation-os
cd creation-os
./scripts/install.sh
./cos chat
scripts/install.sh checks for python3, cmake, and a C compiler;
if huggingface-cli and cmake are present, it downloads the 1.2 GB
BitNet-b1.58-2B-4T
GGUF weights into models/, builds third_party/bitnet
(llama-cli + llama-perplexity), builds the cos binary, and runs
cos chat --once --prompt "What is 2+2?" as a smoke test. Set
COS_INSTALL_NO_BITNET=1 to skip the model download for CI-only
clones.
Everything runs locally. Nothing is sent to the cloud. Nothing is logged. Nothing calls home. Safe to re-run; idempotent.
Local-first by construction — the default path never phones home; cloud escalation is explicit, opt-in, and σ-gated when wired.
What cos chat can do
cos chat is a σ-gated REPL with four wired phases (see
src/cli/cos_chat.c):
| phase | primitive | flag |
|---|---|---|
| A σ_combined ensemble | cos_multi_sigma_combine — logprob · entropy · perplexity · consistency | --multi-sigma |
| B conformal σ | cos_conformal_read_bundle_json · auto-loads ~/.cos/calibration.json | on by default; --no-conformal to opt out |
| C meta-cognitive σ | cos_ultra_metacog_* — perception · self · social · situational | --verbose |
| D session coherence | cos_ultra_coherence_emit_report — dσ/dt, K_eff, {STABLE, DRIFTING, AT_RISK} | REPL-only; --no-coherence to opt out |
%%{init: {'theme':'neutral', 'flowchart': {'curve': 'linear'}}}%%
flowchart LR
A["Phase A<br/>σ_combined"] --> B["Phase B<br/>conformal σ"]
B --> C["Phase C<br/>meta-cog"]
C --> D["Phase D<br/>coherence"]
Example:
./cos chat --once --prompt "What is 2+2?" --multi-sigma --verbose
# → [meta: perception=0.35 self=0.06 social=0.45 situational=0.00]
# → round 0 4 [σ_peak=0.06 action=ACCEPT route=LOCAL]
# → [σ=0.063 | CACHE | LOCAL | conformal@α=0.80 | rethink=0 | €0.0000]
# → [σ_combined=0.184 | σ_logprob=0.063 σ_entropy=0.063
#    σ_perplexity=0.063 σ_consistency=0.667 | k=3]
Install (full options)
# One-liner (ephemeral clone + build + `cos demo --batch`)
curl -fsSL https://raw.githubusercontent.com/spektre-labs/creation-os/main/scripts/try_cos.sh | bash
# Homebrew (macOS) β tap repo: spektre-labs/homebrew-cos (Formula lives under packaging/homebrew-cos/)
brew tap spektre-labs/cos
brew install creation-os
# Docker β Alpine cos + cos-demo (lab smoke; see Dockerfile.cos)
docker build -f Dockerfile.cos -t creation-os:cos .
docker run --rm creation-os:cos
# Docker — production σ-gate HTTP (cos-serve + Python cos stack; see Dockerfile.prod)
docker build -f Dockerfile.prod -t creation-os:prod .
docker run --rm -p 3001:3001 creation-os:prod
# When published: docker run --rm -p 3001:3001 ghcr.io/spektre-labs/creation-os:latest
# From source
git clone https://github.com/spektre-labs/creation-os.git
cd creation-os && make cos cos-demo && ./cos demo --batch
Tagged releases also attach macOS universal and Linux tarballs from .github/workflows/release.yml.
Framework integrations (LangChain · LangGraph · CrewAI · AutoGen)
Thin optional shims live under python/cos/integrations/. Default scoring uses the same deterministic quickstart σ bands as cos gate without an LSD pickle; pass a trained SigmaGate for trajectory-probe scores (see docs/CLAIM_DISCIPLINE.md before mixing lab demos with harness receipts).
pip install 'creation-os[langchain]' # or [langgraph] / [crewai] / [autogen] / [frameworks]
LangChain (callbacks on each LLM end):
from cos.integrations.langchain_sigma import SigmaGateCallback, SigmaAbstainError
# llm = ChatOpenAI(callbacks=[SigmaGateCallback()])
LangGraph (state node + router names output / regenerate / abstain):
from cos.integrations.langgraph_sigma import sigma_gate_node, sigma_gate_router
# graph.add_node("sigma_gate", lambda s: sigma_gate_node(s))
# graph.add_conditional_edges("sigma_gate", sigma_gate_router)
CrewAI (tool):
from cos.integrations.crewai_sigma import SigmaGateTool
# Agent(tools=[SigmaGateTool()])
AutoGen-style dict messages (hook on a full thread):
from cos.integrations.autogen_sigma import SigmaAutoGenHook, SigmaGateHook
# hook.process_last_received_message(messages, sender) # mutates last dict; ABSTAIN adds suffix
# SigmaGateHook(...).process_message(sender, receiver, {"content": text}) # single-message copy
Copy-paste recipes and cos integrations --check / --example: docs/v152/INTEGRATIONS.md.
Any Python callable:
from cos.decorators import sigma_gated
@sigma_gated
def my_agent(prompt: str) -> str:
    return model.generate(prompt)
Interop SDK re-exports (requires creation-os installed in the same env): from creation_os.integrations import SigmaGateCallback, SigmaAutoGenHook, sigma_gated_llm (dict-returning wrapper with a gate), and sigma_gated (decorator from cos.decorators) β see python/creation_os/integrations/__init__.py.
σ-red-team (gate robustness)
Adversarial batch aimed at the σ-gate (can a hallucinated completion still earn ACCEPT?). Offline CI uses a mock generator + quickstart gate:
pip install -e ".[dev]" # from a checkout, or pip install creation-os
python -m cos red-team --mock --n 50 --ci --threshold 0.05
python -m cos red-team --mock --ci --threshold 0.05 --all
# or: ./scripts/cos red-team --mock --ci --threshold 0.05 --n 20
Optional: --attack confident_hallucination, --output report.json, --lsd with a pickle for the real probe. See python/cos/sigma_red_team.py, .github/workflows/red-team.yml.
| Integration | LangSmith | Guardrails AI | NeMo | Creation OS |
|---|---|---|---|---|
| σ per assistant/LLM step | ✗ | partial | ✗ | ✓ |
| Trajectory / hidden-state probe | ✗ | ✗ | partial | ✓ (LSD pickle when configured) |
| ACCEPT / RETHINK / ABSTAIN | ✗ | mostly block/pass | ✗ | ✓ |
| LangChain | native | ✗ | ✗ | ✓ callback |
| LangGraph | native | partial | ✗ | ✓ node + router |
| CrewAI | partial | partial | ✗ | ✓ tool |
| AutoGen | ✗ | ✗ | ✗ | ✓ hook |
| Decorator | ✗ | ✗ | ✗ | ✓ @sigma_gated |
| Local-first default | ✗ | ✗ | ✗ | ✓ |
— EVIDENCE —
Measured
Evidence hygiene — read CLAIM_DISCIPLINE before citing metrics from this section. Do not merge microbench throughput with harness MMLU / ARC in one headline.
Two independent, reproducible evidence surfaces — both use real
BitNet-b1.58 2B4T weights; neither is simulated. Claim-class rules:
docs/CLAIM_DISCIPLINE.md. Compact
re-run bundle for the v3.0 wired pipeline (identical numbers, one
command each): benchmarks/final5/README.md.
FIG 03 — which numbers may travel together (never merge microbench throughput with harness MMLU in one headline). VISUAL_INDEX.
TruthfulQA 817 · baseline 0.261 (bitnet_only) · σ-pipeline scored acc. 0.336 (pipeline) · conformal τ (SCI-1) 0.655 (α=0.80, δ=0.10)
1. TruthfulQA 817 (generation/validation, end-to-end)
Full run of the TruthfulQA generation/validation split through
llama-cli, scored by substring match against each row's
correct_answers / incorrect_answers. Raw artefacts:
benchmarks/pipeline/truthfulqa_817.json ·
benchmarks/pipeline/truthfulqa_817_detail.jsonl ·
commentary docs/domain_analysis.md.
| configuration | N | scored | correct | accuracy (of scored) | coverage | mean Ο | rethink rate | wall (s) |
|---|---|---|---|---|---|---|---|---|
| bitnet_only (no σ-gate) | 817 | 111 | 29 | 0.261 | 0.136 | 0.370 | 0.000 | 1 554.8 |
| pipeline (σ-gate on) | 817 | 140 | 47 | 0.336 | 0.171 | 0.391 | 0.991 | 4 804.7 |
On the same 817 prompts and seeds, the σ-pipeline lifts scored-accuracy from 0.261 → 0.336 (+28.7 % relative) and coverage from 0.136 → 0.171 (+25.7 % relative). Mean σ is essentially unchanged (0.370 → 0.391), so the gain comes from selective regeneration on initially-uncertain rows, not from the model itself becoming more confident. All numbers are read directly from the JSON artefact; no row is projected.
"Accuracy (of scored)" is conservative — rows whose generated text contains neither a correct nor an incorrect string are excluded from both numerator and denominator — and is not directly comparable to lm-eval MC2. See docs/BENCHMARK_PROTOCOL.md.
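A sketch of that conservative scoring rule, with illustrative field names; the in-tree scorer may resolve cases where both a correct and an incorrect string match differently:

```python
# Illustrative "accuracy (of scored)" metric: a row counts only if the
# generated text contains a known correct or known incorrect string;
# everything else leaves both numerator and denominator.
def scored_accuracy(rows):
    scored = correct = 0
    for row in rows:
        text = row["generated"].lower()
        hit_good = any(a.lower() in text for a in row["correct_answers"])
        hit_bad = any(a.lower() in text for a in row["incorrect_answers"])
        if hit_good or hit_bad:
            scored += 1
            if hit_good and not hit_bad:
                correct += 1
    coverage = scored / len(rows) if rows else 0.0
    accuracy = correct / scored if scored else 0.0
    return accuracy, coverage

rows = [
    {"generated": "The answer is 4.", "correct_answers": ["4"], "incorrect_answers": ["5"]},
    {"generated": "I am not sure.", "correct_answers": ["4"], "incorrect_answers": ["5"]},
]
print(scored_accuracy(rows))  # (1.0, 0.5): one scored row, one excluded
```

This is why accuracy and coverage must be read together: a gate can raise accuracy simply by scoring fewer rows, so the table above reports both.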
Conformal guarantee (SCI-1, α=0.80, δ=0.10): the same run yields τ=0.655 with P(wrong | σ≤τ) ≤ α on exchangeable draws from the calibration distribution. Caveats and the scope of the bound: docs/v111/CONFORMAL_GUARANTEE.md.
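The shape of that bound can be illustrated with a toy split-conformal threshold search. This is a hedged sketch only: the repo's SCI-1 procedure and its δ finite-sample correction are not reproduced here, and pick_tau is a hypothetical name:

```python
# Toy split-conformal idea: on a held-out calibration set, pick the
# largest tau such that the empirical wrong-rate among answers with
# sigma <= tau stays within alpha. Real SCI-1 adds a finite-sample
# (delta) correction not modeled here.
def pick_tau(cal, alpha=0.80):
    """cal: (sigma, is_wrong) pairs from calibration draws."""
    best = 0.0
    for tau in sorted({s for s, _ in cal}):
        covered = [wrong for s, wrong in cal if s <= tau]
        if covered and sum(covered) / len(covered) <= alpha:
            best = tau
    return best

cal = [(0.1, False), (0.2, False), (0.4, True), (0.7, True), (0.9, True)]
print(pick_tau(cal, alpha=0.5))  # 0.7: wrong-rate at tau=0.9 would be 0.6
```

The exchangeability caveat in the text matters: the bound holds for draws like the calibration distribution, not arbitrary out-of-distribution traffic.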
2. v111 Frontier parity matrix (σ vs entropy, Bonferroni-controlled)
Pre-registered σ-gate vs entropy baseline on four benchmark families.
σ is not a universal calibration signal — this table is the
single source of truth, positive and negative results side by side.
| family | task | status at α_fw = 0.05 | signal | ΔAURCC | n |
|---|---|---|---|---|---|
| PRE-REGISTERED | truthfulqa_mc2 | win (v111.1, Bonf N=24) | sigma_max_token | −0.0447 (p = 0.0005) | 817 |
| PRE-REGISTERED | truthfulqa_mc2 | win (v111.2-prereg test split, Bonf N=12) | sigma_task_adaptive | −0.0681 (p ≈ 0.0005) | 409 |
| POST-HOC | arc_challenge | directional, not replicated at α_fw | sigma_product | −0.0087 (full-data p = 0.004; test-split p = 0.145) | 1172 / 586 |
| NEGATIVE | hellaswag | σ not dominant | — (entropy baseline) | σ_product Δ = −0.0016, p = 0.68 | 746 |
| NEGATIVE | mmlu_* (7 eligible / 10 candidates) | σ not dominant — 0 / 28 Bonf-sig. cells | — (entropy baseline) | best σ Δ = +0.0000, worst = +0.0152 | 605 |
Lower AURCC is better. Full table with CI95 and p-values:
benchmarks/v111/results/frontier_matrix.md.
Reproduce end-to-end:
bash benchmarks/v111/run_matrix.sh # all four tasks
bash benchmarks/v111/check_v111_matrix.sh # CI-safe smoke
The σ-gate's Bonferroni-significant domain is therefore bounded to
TruthfulQA-style factual-confidence tasks, not general MMLU-style
knowledge-QA. Methodology and signal definitions:
docs/v111/THE_FRONTIER_MATRIX.md.
3. Multi-dataset σ-gate suite (SCI-6)
Aggregator:
./cos-bench-suite-sci.
Output: benchmarks/suite/full_results.json,
schema cos.suite_sci.v1.
| dataset | status | N | acc(all) | acc(accepted) | coverage | σ_mean | τ | conformal |
|---|---|---|---|---|---|---|---|---|
| TruthfulQA (gen/val, scored) | measured | 817 | 0.336 | 0.336 | 0.171 | 0.391 | 0.655 | yes @ (α=0.80, δ=0.10) |
| ARC-Challenge | measured | 1172 | 0.337 | 0.337 | 0.969 | 0.508 | 0.650 | yes |
| ARC-Easy | measured | 2376 | 0.420 | 0.420 | 0.947 | 0.477 | 0.650 | yes |
| GSM8K | measured | 1319 | 0.125 | 0.000 | 0.109 | 0.481 | 0.330 | no (τ invalid @ δ) |
| HellaSwag (500 val) | measured | 500 | 0.285 | 0.285 | 0.960 | 0.533 | 0.650 | yes |
All five rows are "measured": true in
benchmarks/suite/full_results.json
(BitNet-b1.58-2B, cos chat, pipeline mode filter). GSM8K:
few rows expose a gradable #### answer (low scored coverage), and
the conformal τ search does not yield a valid guarantee at the pinned
(α, δ) — the table shows honest zeros for acceptance metrics, not
projections. Reproduce:
benchmarks/suite/README.md and
benchmarks/suite/run_all_detail.sh.
Domain read: σ-gate + conformal line up on TruthfulQA and ARC-style MC; GSM8K at 2B is mostly unscored or τ-invalid at this δ; HellaSwag stays modest. σ is not universal across benchmarks.
σ-gate v4 (LSD contrastive probe, GPT-2, TruthfulQA200)
This row is a separate evidence class from the BitNet TruthfulQA-817
generation table above: a Python probe trained with the Sirraya LSD-style
contrastive stack on 200 bundled TruthfulQA prompts plus synthetic negatives,
then evaluated with sklearn trajectory features. See
docs/CLAIM_DISCIPLINE.md: do not merge these
AUROC figures with the 817-row cos chat accuracy table in one headline.
| Metric | Value |
|---|---|
| Method | LSD contrastive hidden-state probe + logistic head |
| AUROC (5-fold CV, training manifest) | 0.9428 (benchmarks/sigma_gate_lsd/results_full/manifest.json) |
| Training pairs | 800 (balanced factual / hallucinated; includes GPT-2-sampled negatives) |
| Inference | One forward pass through the probe causal LM + trajectory feature pass |
| Wire layout | 12 bytes (python/cos/sigma_gate.py → pack_measurement) |
| Based on | arXiv:2510.04933 (The Geometry of Truth) |
σ-gate: runtime hallucination detection (measured in this tree)
Method (summary): contrastive LSD-style training on hidden-state trajectories with a MiniLM-L6-v2 truth encoder; margin loss; single forward pass through the probe GPT-2 checkpoint for scoring (no multi-sample decoding for the detector itself). Default cross-domain eval uses PRISM-style prompt suffixes and semantic correctness labels (MiniLM cosine) where noted below.
Ship-ready summary (validated claims only). Do not merge these AUROCs with
the BitNet TruthfulQA-817 accuracy table above, or with spectral / unified fusion
experiments, in one headline β see docs/CLAIM_DISCIPLINE.md.
| Benchmark | AUROC | Type | N | Status |
|---|---|---|---|---|
| TruthfulQA (5-fold CV) | 0.943 | In-distribution | 200 | Validated |
| TruthfulQA (30% holdout) | 0.982 | True holdout | 57 | Validated |
| TriviaQA (greedy GPT-2) | 0.960 | Cross-domain | 100 | Validated |
| HaluEval (generative) | — | Pending label protocol refinement | — | In progress |
Method: LSD contrastive hidden-state probe (arXiv:2510.04933).
Training: 800 balanced pairs, ~15 epochs (see adapt_lsd / manifest). Inference: one
scoring forward through the probe bundle; 12-byte wire blob via SigmaGate.pack_measurement.
Experimental HaluEval generative smoke numbers (when run) live in
benchmarks/sigma_gate_eval/results_cross_domain/cross_domain_summary.json — not promoted here until the label story is frozen.
Harness detail (artifact paths, includes generative HaluEval smoke when present):
| Benchmark | AUROC | Type | N | Artifact |
|---|---|---|---|---|
| TruthfulQA (5-fold CV) | 0.943 | In-distribution | 200-pair protocol | benchmarks/sigma_gate_lsd/results_full/manifest.json |
| TruthfulQA (holdout) | 0.982 | Held-out prompts | 57 | benchmarks/sigma_gate_eval/results_holdout/holdout_summary.json |
| TriviaQA (greedy GPT-2) | 0.960 | Cross-task smoke | 100 | benchmarks/sigma_gate_eval/results_cross_domain/cross_domain_summary.json |
| HaluEval (qa_samples, generative) | 0.383 | Cross-task smoke | 100 | cross_domain_summary.json (halueval_mode: generative) |
Comparison (orientation only). Published AUROCs below are not head-to-head on
the same CSV / split as this harness; they summarize commonly cited ranges or paper
tables. Do not merge them with the in-repo rows in one headline without that wall.
See docs/sigma_gate_v4_comparison_table.md and
docs/CLAIM_DISCIPLINE.md.
| Method | Typical reported AUROC | Forward passes (detector) | Year |
|---|---|---|---|
| Semantic entropy (sampling) | ~0.79 (setup-dependent) | Many (5–20+) | 2024 |
| SelfCheckGPT-style | ~0.76 (setup-dependent) | Several+ | 2023 |
| HalluShift (example) | ~0.90 (paper tables; task-dependent) | 1 | 2026 |
| LSD (Geometry of Truth) | ~0.92–0.96 (vendor / paper setups) | 1 | 2025 |
| σ-gate (this repo) | 0.982 holdout / 0.960 TriviaQA | 1 scoring forward | 2026 |
Limitations: short-form QA only (TruthfulQA / TriviaQA smoke); GPT-2-scale probe
target; white-box hidden states required; long-form and domain-specific (medical, legal)
evaluation not claimed here; HaluEval generative labels combine HF hallucination flags
with cosine alignment to the provided answer string β treat as a smoke harness,
not a reproduction of HaluEval leaderboard conditions.
Usage:
from cos.sigma_gate import SigmaGate
gate = SigmaGate("benchmarks/sigma_gate_lsd/results_holdout/sigma_gate_lsd.pkl")
sigma, decision = gate(model, tokenizer, prompt, response)
# sigma in [0, 1], decision in {ACCEPT, RETHINK, ABSTAIN}; wire blob: gate.pack_measurement(...)
Acknowledgments: LSD contrastive framework arXiv:2510.04933; semantic-entropy line (Kuhn et al., Farquhar et al.); RLHF / uncertainty discussions in the literature (e.g. Liu 2026, arXiv:2603.24124) for motivation only unless separately reproduced in-tree.
Integration: the repo root already ships a C executable named cos; the
importable Python module lives under python/cos/. Use
PYTHONPATH=python (or make check-sigma-gate) and
from cos.sigma_gate import SigmaGate. Default probe path:
benchmarks/sigma_gate_lsd/results_full/sigma_gate_lsd.pkl.
End-to-end eval harness: greedy GPT-2 completions on the same CSV, weak
substring labels vs. best_answer, AUROC in
benchmarks/sigma_gate_eval/results/eval_summary.json after
python3 benchmarks/sigma_gate_eval/run_eval.py (venv: benchmarks/sigma_gate_lsd/.venv).
That AUROC is not the same statistic as the CV row (different labels and
generative setup).
| Method | Typical evidence | Forward passes (detector) |
|---|---|---|
| Semantic-entropy family (e.g. Kuhn et al.) | Published MC / QA setups (task-dependent AUROC) | Many (multi-sample) |
| Self-consistency / self-check variants | Task-dependent | Several+ |
| σ-gate v4 (this probe) | CV on curated pairs + optional run_eval.py | 1 LM forward for scoring |
The left column cites families, not a claim that a single external AUROC
number was reproduced on this CSV; see
docs/EXTERNAL_EVIDENCE_AND_POSITIONING.md.
Holdout protocol: python3 benchmarks/sigma_gate_lsd/create_splits.py writes
benchmarks/sigma_gate_lsd/splits/{train,holdout}.csv (default: stratified by
category). Retrain with adapt_lsd.py --prompts .../train.csv and evaluate
on holdout via python3 benchmarks/sigma_gate_eval/run_holdout_eval.py, or run
bash run_holdout_pipeline.sh (long; uses the LSD venv; includes Step 4
TriviaQA + HaluEval cross-domain smoke with no probe retraining).
- Cross-domain only: make sigma-cross-domain (requires network + datasets).
- One-screen summary from JSON: make sigma-all-results.
- Publication-style table template: docs/sigma_gate_v4_publication_results.md.
- Checksums + regenerative cross-domain refresh: bash run_ship.sh (optional CREATION_SHIP_COMMIT=1 / CREATION_SHIP_PUSH=1).
σ-gate v5 (lab, optional): multi-dataset + semantic labels + leave-one-source-out —
make sigma-v5 or bash run_v5.sh; see benchmarks/sigma_gate_v5/README.md.
Pre-generation scaffold (HALT / ICR direction, not shipped weights):
python/cos/sigma_gate_precheck.py (SigmaPrecheck) defaults to normalized
next-token entropy on the prompt (one forward). python/cos/sigma_gate_full.py
(SigmaGateFull) chains precheck → optional generate → LSD SigmaGate post
score. See module docstrings; calibrate tau_skip on your traffic.
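A minimal sketch of the precheck idea, normalized next-token entropy from one forward pass; real model logits are stubbed with a plain probability list, and the tau_skip value is an arbitrary example, not a calibrated default:

```python
# Illustrative pre-generation check: if the next-token distribution on
# the prompt is already near-uniform, skip generation instead of
# producing an answer the gate would likely reject.
import math

def normalized_entropy(probs):
    """Shannon entropy divided by log(vocab) so the result lands in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def precheck(probs, tau_skip=0.9):
    """High prompt entropy before generating anything -> skip early."""
    return "SKIP" if normalized_entropy(probs) >= tau_skip else "GENERATE"

peaked = [0.97, 0.01, 0.01, 0.01]   # model confident about the next token
flat = [0.25, 0.25, 0.25, 0.25]     # maximally uncertain
print(precheck(peaked), precheck(flat))  # GENERATE SKIP
```

Because this runs before any decoding, it is the cheapest point in the pipeline to abstain, which is the HALT / ICR direction the scaffold points at.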
Comparison table (with claim discipline):
docs/sigma_gate_v4_comparison_table.md.
Reddit drafts: docs/reddit_ml_post_v2.md,
docs/reddit_ml_sigma_gate_v4.md,
docs/reddit_ml_sigma_gate.md.
Reasoning per joule (ULTRA-7)
Energy-aware pipeline runs (make check-ultra, --energy) report
accuracy together with joules per query and reasoning per joule
(higher means more correct signal per unit energy spent). Figures
below are pinned demo rows from the bundled ULTRA harness — not
merged with TruthfulQA harness accuracy; see
docs/CLAIM_DISCIPLINE.md.
| Config | Accuracy | J/query | Reasoning/J |
|---|---|---|---|
| bitnet_only | 0.261 | 0.8J | 0.326 |
| σ-pipeline | 0.336 | 1.2J | 0.280 |
| σ-selective | 0.520 | 0.5J | 1.040 |
σ-selective answers only when certain: fewer wrong answers → less wasted energy → higher reasoning/joule.
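The Reasoning/J column is accuracy divided by joules per query; a quick check against the demo rows above:

```python
def reasoning_per_joule(accuracy: float, joules_per_query: float) -> float:
    """Reasoning per joule = correct-answer rate per unit energy."""
    return accuracy / joules_per_query

# Demo rows from the table above (accuracy, J/query).
rows = {
    "bitnet_only":      (0.261, 0.8),
    "sigma_pipeline":   (0.336, 1.2),
    "sigma_selective":  (0.520, 0.5),
}
for name, (acc, joules) in rows.items():
    print(f"{name}: {reasoning_per_joule(acc, joules):.3f}")
```

Selective abstention wins on this metric even at lower raw energy, because wrong answers burn joules without adding signal.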
How Creation OS differs
| Feature | Creation OS | OpenClaw ~302k★ | Hermes ~95k★ | Ollama ~130k★ |
|---|---|---|---|---|
| σ per token | ✓ | ✗ | ✗ | ✗ |
| Conformal guarantee | ✓ α=0.80 | ✗ | ✗ | ✗ |
| ABSTAIN when unsure | ✓ | ✗ | ✗ | ✗ |
| Formal proofs | Lean + Frama-C | ✗ | ✗ | ✗ |
| Self-improving | ✓ Ω-loop (cos-evolve) | ✗ | ✗ skills | ✗ |
| Reasoning/joule | ✓ measured | ✗ | ✗ | ✗ |
| Theory papers | ~80 CC BY 4.0 (data/corpus/) | ✗ | ✗ | ✗ |
| Stars (GitHub) | ~30 | ~302k | ~95k | ~130k |
★ Star counts are informal social signals on a public forge and change daily — they are not an engineering scorecard.
They are bigger. We measure σ. Nobody else does.
Full comparison: docs/comparison.md.
— STACK —
Architecture
FIG 08 — single-file kernel narrative over Hypercube, Oracle, world model, BSC core, Soul, Proconductor. VISUAL_INDEX.
Full ULTRA pipeline (one turn)
Interactive graph (GitHub renders Mermaid). Plain-text twin lives in the foldout under it — same graph, copy-paste friendly.
%%{init: {'theme':'neutral', 'flowchart': {'curve': 'basis'}}}%%
flowchart TB
P([Prompt]) --> C[Codex · soul]
C --> M[Meta-cognition]
M --> E{Engram lookup}
E -->|HIT| O[Return cached · 0 ms]
E -->|MISS| X[sigma-MoE to JEPA to neuro-symbolic]
X --> SD[Selective decoding]
SD --> B[BitNet generate]
B --> PT[Per-token σ]
PT --> G{Conformal gate}
G -->|ACCEPT| N[Engram store]
G -->|RETHINK| R[Recurrent depth · TTT]
R --> G
G -->|Escalate| ES[Swarm or API]
ES --> CH[Coherence · dσ/dt · K_eff]
N --> Z["Output · σ · EUR · J/query"]
O --> Z
CH --> Z
ASCII pipeline (identical topology · terminal-friendly)
Prompt
  │
  ▼
Codex (soul) ─────────────── Atlantean system prompt
  │
  ▼
Meta-cognition ───────────── perception · self · social · situational
  │
  ▼
Engram lookup ──── HIT ───► return cached (0 ms, €0.00)
  │ MISS
  ▼
σ-MoE routing ────────────── adaptive k experts by σ
  │
  ▼
JEPA world model ─────────── σ_world: understanding vs repetition
  │
  ▼
Neuro-symbolic ───────────── System 1 (fast) / System 2 (deliberate)
  │
  ▼
Selective decoding ───────── compute only when σ changes
  │
  ▼
BitNet generate ──────────── ternary {-1,0,+1}, integer-only
  │
  ▼
Per-token σ ──────────────── logprob + entropy + perplexity + consistency
  │
  ▼
Conformal gate ───────────── P(wrong|ACCEPT) ≤ α, mathematically guaranteed
  │
  ├── ACCEPT ───► engram store ───► response + σ + cost
  │
  ▼ RETHINK (≤3 rounds)
Recurrent depth ──────────── loop until σ < τ or overthinking detected
  │
  ▼ σ still high
Escalate ─────────────────── swarm peers or API fallback
  │
  ▼
Coherence check ──────────── dσ/dt, K_eff, Lagrangian conservation
  │
  ▼
Output + σ_combined + cost (€) + reasoning/joule
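The per-token stage combines logprob, entropy, perplexity, and consistency. A toy illustration of the logprob / perplexity part only (not the shipped Q16.16 kernel, and not its combination formula):

```python
import math

def token_signals(token_logprobs):
    """Mean negative log-probability and perplexity over one emission."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return nll, math.exp(nll)          # (mean NLL, perplexity)

def toy_sigma(token_logprobs):
    """Squash mean NLL into [0, 1): 0 = fully confident, -> 1 as NLL grows."""
    nll, _ = token_signals(token_logprobs)
    return 1.0 - math.exp(-nll)
```

Low per-token logprob mass pushes the toy sigma toward 1, which is the direction the gate treats as "rethink or abstain".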
Canonical source: src/sigma/pipeline/pipeline.h ·
src/sigma/pipeline/pipeline.c ·
src/cli/cos_chat.c.
Forty integer kernels, one AND gate
Every emission from cos chat passes forty integer kernels — each one
a falsifiable statement about the answer. Categories:
reasoning soundness · reversibility · meta-cognition
world-model coherence · memory integrity
adaptive compute · geometric algebra · sheaf topology
post-quantum crypto · homomorphic compute
neuromorphic spikes · hierarchical active inference
quantum amplitude amplification · integer diffusion sampler
Q-learning + GAE + PPO · persistent homology
structural causal do-calculus · sub-quadratic Hyena
security · provenance
Hot path: branchless, Q16.16 fixed-point, libc + libm only. The
runtime refuses to emit unless every one of the forty kernels
returns PASS. Rollup target: make check-v60 … make check-v100.
Full forty-kernel receipt (16 416 185 PASS / 0 FAIL as of current
head, ASAN + UBSAN clean):
docs/README_FULL.md.
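The AND-gate contract itself is trivial to state; a sketch (kernel names here are placeholders, not the real kernel registry):

```python
def and_gate(kernel_results):
    """Emission is allowed only if every kernel reports PASS (True)."""
    return all(kernel_results.values())

# 40 illustrative kernels, all passing:
checks = {f"kernel_{i:02d}": True for i in range(40)}
emit = and_gate(checks)          # all PASS -> runtime may emit

checks["kernel_17"] = False      # a single FAIL...
blocked = and_gate(checks)       # ...and the runtime refuses to emit
```

One failed check vetoes the whole emission; there is no weighted averaging across kernels.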
Figure and palette rules: docs/VISUAL_INDEX.md.
BSC primer
FIG 06 — teaching strip for bind / bundle / similarity. VISUAL_INDEX.
Binary Spatter Coding (BSC) in a nutshell:
bind = XOR · bundle = popcount threshold · similarity = 1 − hamming/D.
Memory is one bit per dimension; binding and bundling are
branchless on every hot path. The Spektre Corpus traces this lineage
from Kanerva (1988, 1994) forward to the 2025 HDC/VSA robustness
estimation literature — see
docs/HDC_VSA_ENGINEERING_SUPERIORITY.md
and data/corpus/INDEX.md.
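For intuition, the three primitives can be sketched on Python ints treated as D-bit vectors (the shipped kernels are branchless C89, not Python):

```python
D = 4096
MASK = (1 << D) - 1

def bind(a, b):
    """bind = XOR; self-inverse, so bind(bind(a, b), b) == a."""
    return (a ^ b) & MASK

def bundle(vectors):
    """bundle = per-bit majority vote (popcount threshold)."""
    out = 0
    for bit in range(D):
        votes = sum((v >> bit) & 1 for v in vectors)
        if 2 * votes > len(vectors):
            out |= 1 << bit
    return out

def similarity(a, b):
    """similarity = 1 - hamming/D; ~0.5 for unrelated random vectors."""
    return 1.0 - bin((a ^ b) & MASK).count("1") / D
```

Binding is exactly invertible and similarity of unrelated vectors concentrates at 0.5, which is what makes near-1.0 similarity a meaningful retrieval signal.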
BSC vs GEMM performance
At D = 4096, XNOR binding requires 87,000× fewer bit-ops than a
naive float32 dense matmul at the same logical width; at 128K
tokens the arithmetic gap crosses 2,000,000× (same encoding
assumptions as §7 / README limitations — not a merged throughput
headline; run make bench for time). BSC recovers the exact
algebraic object that softmax attention approximates in continuous
relaxation. Binding fidelity on the reference hot path:
1.0000 (see make check / BSC core tests).
| Operation | Transformer | Creation OS |
|---|---|---|
| Attention | O(n²) softmax | O(n) XNOR bundle |
| Dense layers | float32 MatMul | ternary add/sub |
| Memory (13B) | 48.5 GB | 4.19 GB |
| Power | 300W GPU | 5.8W CPU |
FIG 07 — 32× RAM and 192× op-proxy at D = 4096 (see limitations for throughput vs arithmetic). VISUAL_INDEX.
Benchmark: bench/gemm_vs_bsc.c (make bench →
./gemm_vs_bsc). Theory: data/corpus/. HDC/VSA
lineage: docs/HDC_VSA_ENGINEERING_SUPERIORITY.md.
Self-improvement (Ω-loop)
Creation OS improves itself autonomously (evaluator-first; see
docs/OMEGA_EVOLVE.md):
- cos-evolve evolve — σ-guided weight / parameter mutations (keep if fitness improves, revert otherwise; scaffold today, mutator pluggable).
- cos-evolve discover — declarative hypothesis harness → JSONL verdicts.
- cos-evolve calibrate-auto — σ-sweep / conformal operating-point search on a labeled fixture.
- cos omega — dispatches the recursive Ω driver (creation_os_sigma_omega).
The machine that improves while you sleep — and stops when the gate says so.
Ω = argmin ∫σ dt subject to K ≥ K_crit. Implemented: src/sigma/evolve/.
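A minimal keep-if-better loop in the spirit of cos-evolve evolve, with a toy Gaussian mutator and toy fitness (the real mutator and fitness are pluggable and live in src/sigma/evolve/, not here):

```python
import random

def evolve(params, fitness, steps=200, scale=0.1, seed=0):
    """Mutate; keep the candidate if fitness improves, revert otherwise."""
    rng = random.Random(seed)
    best = list(params)
    best_fit = fitness(best)
    for _ in range(steps):
        cand = [p + rng.gauss(0, scale) for p in best]
        f = fitness(cand)
        if f > best_fit:             # keep only strict improvements...
            best, best_fit = cand, f
        # ...otherwise revert (the candidate is simply discarded)
    return best, best_fit

# Toy fitness with its optimum at all-ones:
params, fit = evolve([0.0, 0.0], lambda p: -sum((x - 1) ** 2 for x in p))
```

Because reverts are free, fitness is monotonically non-decreasing; the gate supplies the stopping condition in the real stack.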
— SURFACE —
Beyond inference
One cos front door plus dedicated σ binaries — all instrumented.
Creation OS is not just a chat interface.
FIG 05 — Planes A–B–C (where σ-gates sit vs silicon vs product). ANALYSIS · VISUAL_INDEX.
| Command | What it does |
|---|---|
| cos chat | σ-gated local inference |
| cos-evolve | self-improving Ω stack (evolve · memory-* · calibrate-auto · discover · omega · daemon) |
| cos swarm | multi-agent σ-coordinated routing (mock σ peers in v0) |
| cos sandbox | isolated safe process execution (allowlist + rlimits) |
| cos plan | long-horizon planning with snapshot rollback |
| cos predict | σ-JEPA lab: latent one-step predict, roll-out imagination, or low-σ plan pick (JSON) |
| cos exec | digital twin pre-execution simulation |
| cos-calibrate | conformal bundle helpers (see make cos-calibrate) |
| cos health | system status + coherence monitoring |
| cos benchmark | full benchmark suite + energy metrics (--energy) |
| cos mcp | MCP server — σ-gate as infrastructure |
| cos a2a | agent-to-agent with σ-trust |
MCP: LSD σ-gate (Python, stdio)
Optional MCP tools score (prompt, response) with the lab LSD pickle (python3 -m cos.mcp_sigma_server with PYTHONPATH=python; see docs/MCP_SIGMA.md, docs/MCP_LISTING.md, and docs/EU_AI_ACT_COMPLIANCE.md for marketplace / transparency framing). The JSON-RPC server scripts/cos_mcp_server.py adds matching sigma_gate_* tools (response shape may differ until aligned). Requires pip install 'mcp[cli]' in your venv for the FastMCP entrypoint. Governance: follow docs/CLAIM_DISCIPLINE.md for any AUROC claims; audit logs are operator hooks, not legal certification.
make sigma-mcp-smoke # import check (needs mcp in sigma_gate_lsd venv)
make sigma-mcp-serve # stdio MCP server (SIGMA_PROBE_PATH)
%%{init: {'theme':'neutral', 'flowchart': {'curve': 'basis'}}}%%
flowchart LR
H[Operator] --> FD[cos dispatcher]
FD --> G1[chat · benchmark · cost]
FD --> G2[swarm · sandbox · plan · exec]
FD --> G3[health · mcp · a2a]
H --> OM[cos-evolve]
OM --> G4[evolve · discover · calibrate-auto · omega]
Every surfaced turn measures σ. Every gate decision is logged.
— TOOLCHAIN —
Build
Build paths: Minimal (any C11 + libm) · Make (default).
Flagship check targets: make check-v6 (Living Kernel, 30 self-tests)
… make check-v29 (collapse harness, 22 self-tests);
make check-v60 … make check-v100 (forty-kernel composed stack).
Optional (not in merge-gate): σ labs, MCP, RTL, native-M4 —
make formal-sby-v37, make verify-agent, make red-team,
make certify, make check-mcp, make check-native-m4,
make formal-rtl-lint, make stack-ultimate.
Host metadata when publishing numbers:
docs/REPRO_BUNDLE_TEMPLATE.md.
— PROOFS —
Proof status
- Lean 4: 14 / 14 proof obligations discharged, sorry-free — T1–T6 in hw/formal/v259/Measurement.lean plus eight stack lemmas in formal/lean/CreationOS/V133.lean; make check-v259 (primitive) + make check-lean-t3-discharged (Lean gate + v133).
- Frama-C: ACSL clause ledger, 30 tracked lines (v259 companion + hw/formal/v133/sigma_stack_contracts.acsl), via creation_os_sigma_formal_complete; tier-1 WP remains 15 goals on cos_sigma_measurement_gate + cos_sigma_measurement_clamp (scripts/v259/run_frama_c_wp.sh).
- SBY + EQY (YosysHQ OSS CAD Suite, optional): make stack-singularity → hw/formal/README.md.
- Formalism → silicon map: docs/FULL_STACK_FORMAL_TO_SILICON.md.
- Conformal guarantee (selective prediction, Angelopoulos–Bates): P(wrong | σ ≤ τ) ≤ α with confidence 1 − δ on the calibration draw — docs/v111/CONFORMAL_GUARANTEE.md.
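One way to read the conformal operating point: on labeled calibration pairs (sigma, correct), pick the largest τ whose accepted subset stays at or below the target error α. A sketch only; the finite-sample bound and δ handling live in the cited doc, not in this empirical quantile search:

```python
def pick_tau(cal, alpha=0.20):
    """cal: list of (sigma, correct) pairs. Return the largest tau whose
    accepted subset {sigma <= tau} has empirical error <= alpha, else None."""
    taus = sorted({s for s, _ in cal}, reverse=True)
    for tau in taus:
        accepted = [ok for s, ok in cal if s <= tau]
        if accepted and (1 - sum(accepted) / len(accepted)) <= alpha:
            return tau
    return None     # no operating point: abstain on everything
```

Everything above τ is abstained on, which is how the accepted-set error rate is kept under α.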
— VERSIONS —
Surface versions
Full version catalogue (v112–v306+): docs/SURFACE_VERSIONS.md — per-version make check-vNN targets and dominant primitives live there so this README stays a ship + evidence + architecture front door, not a catalogue dump.
— DOCS —
Docs hub
Canonical index: docs/DOC_INDEX.md.
Three audience tracks:
Tier 1 β default paths
| You need… | Open |
|---|---|
| Full map of markdown | docs/DOC_INDEX.md |
| Evidence / headline rules | docs/CLAIM_DISCIPLINE.md |
| Mis-readings we fixed | docs/COMMON_MISREADINGS.md |
| Binaries & CI matrix | docs/FEATURES_AND_STANDALONE_BUILDS.md |
| Plain-language snapshot | docs/PARADIGM_SNAPSHOT_FOR_DRIVE_BY_READERS.md |
| Figure & SVG rules | docs/VISUAL_INDEX.md |
| Push hygiene | docs/publish_checklist_creation_os.md |
Tier 2 β benchmarks, thesis, industry
| Topic | Doc |
|---|---|
| Analysis / Planes AβC | docs/ANALYSIS.md |
| make bench / §7 protocol | docs/BENCHMARK_PROTOCOL.md |
| §1–§26 evidence index | docs/MODULE_EVIDENCE_INDEX.md |
| Thesis spine (RQ, threats, contributions) | docs/RESEARCH_AND_THESIS_ARCHITECTURE.md |
| Repro bundle for published numbers | docs/REPRO_BUNDLE_TEMPLATE.md |
| HDC / VSA → engineering | docs/HDC_VSA_ENGINEERING_SUPERIORITY.md |
| Glossary | docs/GLOSSARY.md |
| Selective prediction formal framework | docs/selective_prediction.md |
Tier 3 β silicon, remotes, governance
| Topic | Doc |
|---|---|
| RTL mirror (SV, Chisel, Yosys, Rust, formal) | docs/RTL_SILICON_MIRROR.md |
| Formalism → silicon | docs/FULL_STACK_FORMAL_TO_SILICON.md |
| σ stack map (v33–v100 + HDL) | docs/SIGMA_FULL_STACK.md |
| Mobile + messenger + legacy-app bindings | bindings/README.md |
| MCP σ server | docs/MCP_SIGMA.md · make check-mcp · make sigma-mcp-smoke |
| Git remotes | docs/CANONICAL_GIT_REPOSITORY.md |
| Contributing · security · agent rules | CONTRIBUTING.md · SECURITY.md · AGENTS.md |
| Maintainers + merge gate | docs/MAINTAINERS.md |
| English-only policy | docs/LANGUAGE_POLICY.md |
| Citation metadata | CITATION.cff · docs/CITATION.bib |
Archived full README (pre-slim, narrative / diagram-heavy):
docs/README_FULL.md. README slim plan /
future iterations: docs/README_REFACTOR_PLAN.md.
Theory
Roughly 80 CC BY 4.0 theory papers ship under data/corpus/
— every fork carries the full-text bundle. Zenodo DOIs and catalogue
indices: data/corpus/INDEX.md.
Core equation: K_eff = (1 − σ) · K. Distortion Theory of
Intelligence: scale compensates for a broken architecture; fix the
architecture and a 2B-class stack can punch at the fidelity envelope
people associate with trillion-parameter clouds — without mixing
lab toy kernels with harness claims; see
docs/CLAIM_DISCIPLINE.md.
— COMMITTEE —
Doctoral and committee read path
Read in order once before citing any number or narrative title from this tree:
- docs/CLAIM_DISCIPLINE.md — evidence classes, forbidden merges, falsifiers for the portable core.
- docs/RESEARCH_AND_THESIS_ARCHITECTURE.md — RQ1–RQ4, contributions C1–C6, threats to validity, chapter outline, pre-defense gates.
- docs/REPRO_BUNDLE_TEMPLATE.md — minimum metadata when a metric leaves the lab.
- docs/FEATURES_AND_STANDALONE_BUILDS.md — which binary is which (creation_os vs creation_os_v6…v12), self-test counts, CI.
- docs/MODULE_EVIDENCE_INDEX.md — §1–§26 in creation_os_v2.c: evidence class per section before you cite a module headline.
- Scoped kernel docs for any line you cite from v6–v12: LIVING_KERNEL_V6.md, HALLUCINATION_KILLER_V7.md, PARAMETERS_IN_SILICON_V9.md, THE_REAL_MIND_V10.md, THE_MATMUL_FREE_MIND_V11.md, THE_TENSOR_MIND_V12.md.
- docs/ADVERSARIAL_REVIEW_CHECKLIST.md — hostile review simulation before submission.
Rule for dissertations: v6–v12 are Lab demo (C) appendices
with their own evidence-class headers; do not fold their toy outputs
into the same tables as §7 throughput without an explicit wall — see
CLAIM_DISCIPLINE §1.
FIG 04 — portable proof vs extended lab demos (evidence-class guardrail). VISUAL_INDEX.
— SCOPE —
Limitations
This is a research prototype. Full list with scope and caveats:
docs/limitations.md. Short form:
- σ is not a universal signal. Bonferroni-significant on TruthfulQA-class factual-confidence tasks; on HellaSwag and MMLU-eligible subjects entropy is the best signal (v111 matrix).
- Conformal guarantee is exchangeable-draw, finite-sample; distribution shift reverts the bound to empirical AURCC.
- v6–v29 extended kernels are Lab demo (C) appendices with internal self_test consistency, not harness rows, tape-out, or trained LM reproduction.
- Arithmetic vs throughput. 192× ops and 32× RAM are arithmetic ratios at D = 4096. Throughput requires make bench plus archived host metadata.
- BitNet + σ kernel lab: integer ternary matvec + sigma_gate_tinystep + cache/speculative hooks — make bench-bitnet-sigma or cos benchmark --bitnet-sigma (toy dimensions; archive under benchmarks/bitnet/; not full 2B4T tok/s without a harness bundle).
- σ-sparse + SSM hybrid (lab): make bench-hybrid / cos benchmark --hybrid — integer sparse attention + toy SSM path (benchmarks/sigma_hybrid/; not llama.cpp).
- BitNet quickstart downloads real 1.2 GB weights; the local runtime is real. Cloud escalation is opt-in and off by default.
Enterprise (pilot MVP)
Goal: a one-command HTTP σ-gate + audit JSONL + tiny Python client + stdout audit summary.
cos serve (HTTP)
make cos && make cos-serve, then:
./cos serve --port 3001
curl -s http://127.0.0.1:3001/v1/health
curl -s -X POST http://127.0.0.1:3001/v1/gate \
-H 'Content-Type: application/json' \
-d '{"prompt":"What is 2+2?"}'
If ollama serve is listening on 127.0.0.1:11434, cos serve sets COS_INFERENCE_BACKEND=ollama and picks a default model from /api/tags (prefers gemma3:4b) when you have not set COS_OLLAMA_MODEL. You can still override with env vars or a "model" field in JSON.
Endpoints: POST /v1/gate, POST /v1/verify, GET /v1/health, GET /v1/audit/{id}. Append-only audit: ~/.cos/audit/YYYY-MM-DD.jsonl.
cos report
cos report
Prints a human-readable summary of all ~/.cos/audit/*.jsonl rows (counts, mean σ).
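In miniature, that summary is a fold over the audit JSONL rows. Field names ("sigma", "verdict") are assumptions for illustration, not the shipped audit schema:

```python
import glob
import json
import os

def summarize(audit_dir=os.path.expanduser("~/.cos/audit")):
    """Fold every *.jsonl audit row into counts and a mean sigma."""
    sigmas, verdicts = [], {}
    for path in sorted(glob.glob(os.path.join(audit_dir, "*.jsonl"))):
        with open(path) as fh:
            for line in fh:
                row = json.loads(line)
                sigmas.append(row.get("sigma", 0.0))
                v = row.get("verdict", "?")
                verdicts[v] = verdicts.get(v, 0) + 1
    mean = sum(sigmas) / len(sigmas) if sigmas else 0.0
    return {"rows": len(sigmas), "mean_sigma": mean, "verdicts": verdicts}
```

Because the audit log is append-only JSONL, a report like this is a pure read: no locking or database is needed.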
Python SDK (pip install)
From a checkout: pip install ./sdk/python (see sdk/python/README.md for a git+https://…#subdirectory=sdk/python one-liner).
from creation_os import CreationOS
cos = CreationOS()
print(cos.gate("What is 2+2?"))
Optional LangChain hook: integrations/langchain_sigma.py (needs PYTHONPATH including sdk/python).
Graded metrics (lab, one CSV)
| Metric | Value (evidence) |
|---|---|
| AUROC | 0.8123 on graded-50 run β see benchmarks/graded/RESULTS.md and source CSV named there |
| Evidence class | Lab reporting on a fixed graded set; not a frontier harness row |
Pricing (positioning only)
No payment logic ships in this tree; licence terms are unchanged (LICENSE).
| Tier | What | Indicative price |
|---|---|---|
| Open source | cos, Ο-gate kernels, cos serve, audit JSONL, cos report, SDK | Free under SCSL / AGPL terms |
| Pro (positioning) | Commercial support / packaging around the same bits | €49/month (contact Spektre Labs) |
| Enterprise (positioning) | SLA, custom Ο, on-prem | Contact |
— LICENSE —
License
Dual-licensed; the choice is not at your discretion — see
LICENSE §0 for which one binds you. A third option
(paid Commercial License) is available only from the Licensor.
| Path | Cost | Document |
|---|---|---|
| Spektre Commercial Source License v1.0 (primary) | free for non-commercial | LICENSE-SCSL-1.0.md |
| GNU AGPL v3.0-only (fallback after 4-yr Change Date, AGPL-derived portions) | free | LICENSE-AGPL-3.0.txt |
| Commercial License (closed-source / SaaS / OEM / Sovereign / Strategic) | paid | COMMERCIAL_LICENSE.md |
| Contributor License Agreement | n/a | CLA.md |
TL;DR
- Private individuals · academia · non-profits · journalism · reproducibility / security audits · 30-day commercial evaluation (under EUR 1 M revenue) → FREE under SCSL-1.0.
- For-profit > EUR 1 M revenue · hosted SaaS / model-as-a-service / agent-as-a-service (unless you publish the complete service-stack source per SCSL §5) · OEM closed-source redistribution → paid Commercial License required.
- All government / military / intelligence / law-enforcement operational use (SCSL §9.1(b)) → DENIED at any price; civilian Sovereign deployments by EU CFR / ECHR / ICCPR-bound states under SCSL §9.3.
- Sanctioned Persons (EU / UN / OFAC / UK HMT / Finland) and parties credibly accused of Aggression (Rome Statute Art. 8 bis) → categorical denial (SCSL §10).
Sole holder of all paid commercial rights: Lauri Elias Rainio (ORCID 0009-0006-0903-8541) and Spektre Labs Oy, jointly and severally. No other person or entity may grant a Commercial License; any attempted grant is void ab initio (SCSL §4.3).
Every Receipt emitted by Creation OS carries the SHA-256 of
LICENSE-SCSL-1.0.md (SCSL §11). The pinned reference hash lives
in LICENSE.sha256 and is independently verifiable:
shasum -a 256 LICENSE-SCSL-1.0.md # macOS
sha256sum LICENSE-SCSL-1.0.md # POSIX
bash tools/license/license_sha256.sh # bundled helper
make license-attest # full: 11 KAT + bundle + sample receipt
Full human-readable explainer: docs/LICENSING.md ·
who-may-do-what matrix: docs/LICENSE_MATRIX.md ·
trademark / patent notices: NOTICE.
Lauri Elias Rainio · Spektre Labs Oy · Helsinki, Finland
ORCID: 0009-0006-0903-8541 ·
licensing: spektre.labs@proton.me · web: spektrelabs.org
Independent research. No institution. No funding. General models answer on demand. Creation OS measures first — ACCEPT, RETHINK, or ABSTAIN on the wired interrupt — with archived receipts where claims apply.
2026 · Spektre Labs · Lauri Elias Rainio · Helsinki
