SPEKTRE LABS · HELSINKI · σ-AWARE · LOCAL-FIRST
Creation OS
One runtime. Measured coherence. Truth before output.
A σ-aware closed-loop cognition architecture — not a replacement model, not a chat skin, not a prompt cookbook.
σ estimates coherence, uncertainty, alignment, and truth-distance — carried as a twelve-byte interrupt on the hot path — portable C89, Q16.16, zero dependencies at the primitive.
Quick Start (Python package)
pip install creation-os
σ pipeline (prompt → guard → σ → decision)
from cos import Pipeline
pipe = Pipeline()
result = pipe.score("What is 2+2?", "4")
print(result.sigma, result.verdict, result.text)
if result:
    print("Reliable!")
else:
    print("Unreliable!")
Extras: pip install 'creation-os[langchain]' (LangChain callback), 'creation-os[litellm]', 'creation-os[probes]' (LSD probe / torch), 'creation-os[serve]' (FastAPI).
Score any LLM output
from cos import SigmaGate
gate = SigmaGate()
sigma, verdict = gate.score("What is 2+2?", "4")
LangChain integration
from cos.integrations.langchain import SigmaGateCallback
handler = SigmaGateCallback()
chain.invoke({"input": "..."}, config={"callbacks": [handler]})
print(handler.last_sigma, handler.last_verdict)
Advanced tracing (run-id maps, on_abstain=...): import from cos.integrations.langchain_sigma.
OpenAI-style client wrapper
from cos.integrations.openai_wrapper import sigma_chat
response = sigma_chat(client, messages=[{"role": "user", "content": "Hello"}])
print(response.sigma, response.verdict)
Decorator — any function
from cos.integrations.decorator import sigma_gated
@sigma_gated
def my_llm(prompt: str) -> str:
    return call_any_model(prompt)
result = my_llm("Explain gravity")
print(result.sigma, result.verdict, result.text)
Claim Discipline
Claim discipline, not hype: Creation OS is an AGI-oriented surface, not a claim of achieved AGI. Evidence rules: docs/CLAIM_DISCIPLINE.md
Evidence Ladder
Creation OS does not claim everything. It measures what it can, proves what is wired, and refuses to overclaim.
| Level | What | Status |
|---|---|---|
| 1. Primitive | 12-byte sigma_state_t, C89, Q16.16, zero deps | shipped |
| 2. Runtime | σ-gate ACCEPT/RETHINK/ABSTAIN, single forward pass | shipped |
| 3. Measured | TruthfulQA 0.982, TriviaQA 0.960, BitNet 2B pipeline | receipts in repo |
| 4. Negative | HaluEval 0.514 (near random), HellaSwag bounded, MMLU not dominant | documented |
| 5. Formal | Lean 4: 14/14, ACSL clauses: 30/30 (Frama-C Wp tier-1: 15 goals) | verified |
| 6. Architecture | Ω-loop, Engram, JEPA, Swarm, Sovereign, TTT | built |
| 7. Roadmap | AGI-oriented architecture — not AGI achieved | research direction |
Every AI answers. Creation OS measures first.
A local σ-aware AI runtime that measures whether an answer should exist before showing it.
Most AI systems answer even when uncertain. Creation OS adds an internal measurement layer:
ACCEPT → emit the answer · RETHINK → regenerate or seek more compute · ABSTAIN → do not emit (product policy may map this to "I don't know").
The hot path measures and gates every answer before output. Separate formal verification (Lean, Frama-C, optional RTL) applies only where wired — do not read "prove" as a universal formal certificate for every chat token.
Everything runs locally. No cloud gate. No external evaluator. No prompt tricks for the core interrupt.
What this is
Creation OS is a local σ-aware AI runtime that scores internal coherence and alignment before returning an answer. The portable interrupt is a 12-byte sigma_state_t in C (python/cos/sigma_gate.h) with Python mirrors (python/cos/sigma_gate_core.py); the lab stack adds probes, cascades, and harnesses around that primitive.
Modules (79)
Creation OS ships with a broad in-tree module surface (documentation sometimes rounds to "79" integrated areas spanning inference through ops). The bullets below are a coverage map for navigation — not a capability matrix, AGI claim, or substitute for harness evidence. Read docs/CLAIM_DISCIPLINE.md before citing outcomes.
- Inference: BitNet ternary engine, Ο-attention, KV cache, speculative decoding
- Safety: σ-gate multi-level cascade, guardrails, red team, ZKP attestation
- Intelligence: neuro-symbolic reasoning, JEPA world model, continual learning, TTT
- Memory: episodic + semantic + consolidation + forgetting, knowledge graph
- Planning: hierarchical planning with risk analysis and fallbacks
- Social: theory-of-mind lab scaffolds, trust dynamics, social learning
- Multi-agent: swarm (stigmergy), conflict resolution, federated learning
- Protocol: MCP (FastMCP stdio + JSON-RPC lab), A2A lab tasks, Agent Cards
- Alignment: value learning, human-in-the-loop escalation, explainability
- Embodiment: sensorimotor loop lab, symbol grounding, physical world model
- Drives: curiosity, competence, homeostasis (lab intrinsic signals)
- Metacognition / awareness metrics: Φ and integration proxies only — no phenomenal-awareness claim
- Hardware: RISC-V σ ISA lab mirrors, TinyML, Soul LED paths where present
- Deployment: Docker, Helm, air-gap options per docs, sovereign accounting lab, pip install
- Ops: registry, digital twin lab, observability hooks, cos-evolve RSI lab
Quick demo / install
CLONE · BUILD · COS CHAT · Ω HARNESS
Pick a path — one-liner smoke, full install.sh + BitNet weights, or a weights-free Ω harness (see Install).
# One-liner (ephemeral clone + build + `cos demo --batch`)
curl -fsSL https://raw.githubusercontent.com/spektre-labs/creation-os/main/scripts/try_cos.sh | bash
Full path — local weights + one-shot chat (shapes vary by build / weights):
git clone https://github.com/spektre-labs/creation-os.git
cd creation-os
./scripts/install.sh
./cos chat --once --prompt "What is 2+2?" --multi-sigma --verbose
Example lines you should see on the default BitNet pipeline:
→ round 0 4 [σ_peak=0.06 action=ACCEPT route=LOCAL]
→ [σ=0.063 | CACHE | LOCAL | conformal@α=0.80 | rethink=0 | €0.0000]
So: answer 4, σ ≈ 0.063, verdict = ACCEPT, route = LOCAL.
More install options: Homebrew, Docker, and make cos cos-demo — see Install below.
Weights-free harness: ./scripts/cos omega --goal "smoke" --turns 2 → 14-phase Ω scaffold (python/cos/omega/).
Why it matters
- Local-first — default paths never phone home; escalation is explicit and opt-in when wired.
- No external evaluator for the core interrupt — signals come from logits, hidden states, and configured probes.
- Answer / rethink / abstain before the user sees output — selective generation instead of always emitting.
Claim status (at a glance)
| Claim | Status |
|---|---|
| Local σ-gated runtime (C + Python) | Implemented |
| TruthfulQA / TriviaQA-style σ signals (archived JSON) | Measured |
| HaluEval v2 paired-oracle AUROC | Negative / not solved (below) |
| "General AGI" Ω-loop + module surface | Architecture / research direction |
| Full self-improvement at scale | Experimental / harness-scoped |
What is not claimed
Creation OS does not claim universal hallucination detection on every benchmark family.
Creation OS does not claim frontier-model SOTA accuracy on every task.
Creation OS does not claim AGI completion.
Creation OS does claim a σ-aware local runtime with documented positive and negative metrics where in-tree JSON exists — read docs/CLAIM_DISCIPLINE.md before citing numbers.
The core primitive: σ
σ is not a brand slogan — it is a scalar estimate built from independent signals (entropy, HIDE, ICR, LSD, spectral / SAE when wired), normalized and aggregated, then fed into a three-way gate (ACCEPT / RETHINK / ABSTAIN). Conventions differ by harness row; the C interrupt uses Q16.16 algebra shared with the Python mirror.
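To make the shape of this concrete, here is an illustrative sketch of the aggregate-then-gate step. The signal names, equal weights, and 0.30 / 0.70 thresholds are assumptions for illustration, not the repo's calibrated implementation:

```python
# Illustrative sigma aggregation + three-way gate.
# Thresholds and weights here are assumed, not the repo's calibrated values.
ACCEPT, RETHINK, ABSTAIN = "ACCEPT", "RETHINK", "ABSTAIN"

def aggregate_sigma(signals, weights=None):
    """Weighted mean of signals already normalized to [0, 1]; higher = riskier."""
    weights = weights or {name: 1.0 for name in signals}
    total = sum(weights[name] for name in signals)
    return sum(value * weights[name] for name, value in signals.items()) / total

def three_way_gate(sigma, t_accept=0.30, t_abstain=0.70):
    """Low sigma -> ACCEPT, high -> ABSTAIN, the band in between -> RETHINK."""
    if sigma <= t_accept:
        return ACCEPT
    if sigma >= t_abstain:
        return ABSTAIN
    return RETHINK

sigma = aggregate_sigma({"entropy": 0.10, "hide": 0.05, "icr": 0.12})
print(round(sigma, 3), three_way_gate(sigma))  # 0.09 ACCEPT
```

The three-way verdict (rather than a binary pass/block) is what lets the runtime spend more compute on RETHINK instead of emitting or refusing immediately.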
Honest scope: σ is not universally validated across all tasks. It is strongest on factual-confidence settings with archived in-tree JSON (TruthfulQA / TriviaQA-style probes) and still weak on current HaluEval v2 paired-oracle rows.
Negative result: HaluEval v2
Current HaluEval v2 paired-oracle AUROC: 0.51375 (auroc_hallucinated_vs_correct_arms, n_pairs=80, n_scores=160, auroc_note: used_negated_scores_monotone) — benchmarks/sigma_gate_eval/results_halueval_v2/halueval_v2_summary.json.
That is near random and below the > 0.70 lab target.
Interpretation: the σ stack in this repository is not yet a general hallucination detector on HaluEval-style paired strings. Positive claims stay bounded to validated TruthfulQA / TriviaQA-style factual-confidence tasks per docs/CLAIM_DISCIPLINE.md.
Signal cascade: efficiency
Cheap signals run first; the stack stops when the verdict is already clear so most traffic never pays for the deepest probes.
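A minimal sketch of that early-exit idea; the stage names, costs, and thresholds below are hypothetical, not the in-tree cascade:

```python
# Illustrative early-exit cascade: cheap signals first, stop when the
# verdict is already unambiguous. Scorers return sigma in [0, 1].
def cascade(prompt, response, stages, t_accept=0.2, t_abstain=0.8):
    """stages: list of (name, scorer) ordered cheap -> expensive."""
    sigma = 0.5
    for name, scorer in stages:
        sigma = scorer(prompt, response)
        if sigma <= t_accept or sigma >= t_abstain:
            return sigma, name          # clear verdict: skip deeper probes
    return sigma, stages[-1][0]         # fell through to the deepest probe

cheap = lambda p, r: 0.1                # e.g. token entropy (hypothetical)
deep = lambda p, r: 0.9                 # e.g. hidden-state probe (hypothetical)
sigma, stopped_at = cascade("2+2?", "4", [("entropy", cheap), ("lsd", deep)])
print(sigma, stopped_at)                # stops at the cheap stage
```

Because the common case exits at the first stage, average latency stays close to the cheapest signal while the deep probes only run on ambiguous traffic.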
Architecture overview
Eight-layer map of the σ-aware system (narrative + lab — not every layer ships in one binary on every platform):
σ-Fabric: full system connection
SigmaFabric is the wiring layer: boot() loads available Python modules (gate, pipeline, stream, metacog, reason, snapshots, …) and process() runs a traced path with σ carried stage to stage. The lighter Fabric class boots the Ω-loop cognitive map with observable per-module status (loaded / missing / failed / disabled). The map below is conceptual (not every box is present in a minimal pip install); L9 stays a research-facing proxy — not a metacognition or awareness product claim (see Claim discipline).
Deeper ULTRA / BSC / silicon map: Architecture · docs/DOC_INDEX.md.
Continuous Ω-loop
Beyond a single-shot gate, the tree sketches a closed Ω-loop: σ at each phase (perceive → … → continue), blended into a master lane — python/cos/omega/, src/sigma/omega_phase_gates.{h,c}, ./scripts/cos omega --goal … --turns …, and the legacy integration driver in src/sigma/omega_loop.c.
Memory / Engram
Only experiences that pass the gate consolidate into durable memory; recall is σ-aware (thresholds in harnesses vary).
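A toy sketch of that gate-then-persist flow; the Engram class, its method names, and the 0.3 threshold here are hypothetical, chosen only to illustrate the behavior described above:

```python
# Hypothetical sketch of sigma-gated consolidation: only low-sigma
# experiences enter durable memory, and recall is sigma-filtered too.
class Engram:
    def __init__(self, consolidate_below=0.3):
        self.consolidate_below = consolidate_below
        self.store = []

    def consolidate(self, experience, sigma):
        """Persist only experiences that passed the gate."""
        if sigma < self.consolidate_below:
            self.store.append({"text": experience, "sigma": sigma})
            return True
        return False                      # rejected: never persisted

    def recall(self, max_sigma=1.0):
        """Sigma-aware recall: only entries under the caller's threshold."""
        return [e["text"] for e in self.store if e["sigma"] <= max_sigma]

mem = Engram()
mem.consolidate("Paris is the capital of France", sigma=0.06)   # kept
mem.consolidate("Berlin is the capital of France", sigma=0.89)  # dropped
print(mem.recall())
```

The point of the design is that hallucinated content never becomes a durable "memory" that later recalls could amplify.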
Portable σ interrupt (reference)
The portable interrupt lives in python/cos/sigma_gate.h: 12-byte sigma_state_t (Q16.16), no heap in the gate itself. Python mirrors the same algebra in python/cos/sigma_gate_core.py. Optional llama.cpp hook: python/cos/sigma_sampler.h (adds llama.h from upstream ggml-org/llama.cpp; install sampler last in the chain after temperature / XTC).
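Q16.16 is a standard fixed-point layout: 16 integer bits and 16 fractional bits in a 32-bit word. A small Python sketch of that arithmetic follows; it illustrates the number format only and does not reproduce the actual sigma_state_t field layout:

```python
# Q16.16 fixed point: value = raw / 2**16. A sketch of the conversions
# and the multiply-with-rescale that fixed-point algebra needs.
Q16_ONE = 1 << 16           # 65536

def to_q16(x):
    return int(round(x * Q16_ONE))

def from_q16(q):
    return q / Q16_ONE

def q16_mul(a, b):
    """The raw product carries 32 fractional bits, so shift right
    by 16 to return to Q16.16."""
    return (a * b) >> 16

sigma = to_q16(0.063)
print(sigma, from_q16(sigma))                        # round-trips within 2**-16
print(from_q16(q16_mul(to_q16(0.5), to_q16(0.5))))   # 0.25
```

Integer-only algebra like this is what lets the C primitive stay C89-portable with zero dependencies and bit-identical Python mirrors.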
Claim discipline: numbers in the table are archived lab harness JSON in-tree β do not merge them with microbench throughput or frontier harness rows without the wall in docs/CLAIM_DISCIPLINE.md.
| Metric | Value | Source (JSON key) |
|---|---|---|
| AUROC wrong vs σ (TruthfulQA holdout, GPT-2) | 0.982 | benchmarks/sigma_gate_eval/results_holdout/holdout_summary.json → auroc_wrong_vs_sigma |
| AUROC wrong vs σ (TriviaQA, cross-domain) | 0.960 | benchmarks/sigma_gate_eval/results_cross_domain/cross_domain_summary.json → triviaqa_auroc_wrong_vs_sigma |
| Wrong + confident (holdout) | 0 | benchmarks/sigma_gate_eval/results_holdout/holdout_summary.json β wrong_confident_accept |
| LSD training CV mean AUROC (full split manifest) | 0.943 | benchmarks/sigma_gate_lsd/results_full/manifest.json β cv_auroc_mean |
#include "sigma_gate.h"
sigma_verdict_t v = sigma_gate(&state);
/* ACCEPT  → trust trajectory */
/* RETHINK → try a different path */
/* ABSTAIN → do not emit */
Benchmarks: make check-sigma-v57 runs the σ-gate C test, pytest core, and eval drivers (streaming / router / HIDE; Gemma runs when HF_TOKEN or HUGGING_FACE_HUB_TOKEN is set). Regenerate summaries with the scripts under benchmarks/sigma_gate_eval/ and benchmarks/sigma_gate_scaling/. Hardware lab (silicon path): make bench-hardware and cos benchmark --hardware — capture stdout under benchmarks/hardware/ with host metadata (see docs/REPRO_BUNDLE_TEMPLATE.md); do not treat microbench throughput as harness MMLU/ARC.
Forty branchless integer kernels · one composed verdict · 1 = 1 · merge gate
FIG 09 — where to look first on this page (adapts to light/dark in supporting clients). VISUAL_INDEX · DOC_INDEX.
— SHIP —
Try it
Zero-to-chat on macOS or Linux — weights optional for CI (COS_INSTALL_NO_BITNET=1).
Fastest path — PyPI (cos CLI; default cos gate uses a deterministic quickstart scorer so you can run without cloning; the trained LSD probe is cos gate --lsd from a full checkout):
pip install creation-os
cos version
cos gate --prompt "What is the capital of France?" --response "Berlin"
# → σ=0.890 verdict=ABSTAIN
cos gate --prompt "What is 2+2?" --response "4"
# → σ=0.060 verdict=ACCEPT
Ops — production HTTP (Dockerfile.prod: cos-serve on port 3001; the image does not bake model weights — use Ollama or mount a backend):
docker build -f Dockerfile.prod -t creation-os:prod .
docker run --rm -p 3001:3001 creation-os:prod
curl -fsS http://127.0.0.1:3001/v1/health
# Published builds (on tag `v*`): see `.github/workflows/docker.yml`
# docker run --rm -p 3001:3001 ghcr.io/spektre-labs/creation-os:latest
# Compose: σ-gate + Ollama (see docker-compose.yml)
docker compose up -d
curl -fsS http://127.0.0.1:3001/v1/health
# POST /v1/gate runs generation + σ against the inference backend (requires a healthy Ollama/model).
# Kubernetes
helm install creation-os ./helm/creation-os
Air-gapped bundle (after make cos cos-serve from a checkout):
cos sovereign --package --output creation-os-sovereign.tar.gz
Fast path — under a minute in a checkout, no GGUF download (recorded sigma from benchmarks):
git clone https://github.com/spektre-labs/creation-os
cd creation-os
bash scripts/quickstart.sh
Full path — local weights + cos chat smoke test:
git clone https://github.com/spektre-labs/creation-os
cd creation-os
./scripts/install.sh
./cos chat
scripts/install.sh checks for python3, cmake, and a C compiler;
if huggingface-cli and cmake are present, it downloads the 1.2 GB
BitNet-b1.58-2B-4T
GGUF weights into models/, builds third_party/bitnet
(llama-cli + llama-perplexity), builds the cos binary, and runs
cos chat --once --prompt "What is 2+2?" as a smoke test. Set
COS_INSTALL_NO_BITNET=1 to skip the model download for CI-only
clones.
Everything runs locally. Nothing is sent to the cloud. Nothing is logged. Nothing calls home. Safe to re-run; idempotent.
Local-first by construction — the default path never phones home; cloud escalation is explicit, opt-in, and σ-gated when wired.
What cos chat can do
cos chat is a σ-gated REPL with four wired phases (see
src/cli/cos_chat.c):
| phase | primitive | flag |
|---|---|---|
| A σ_combined ensemble | cos_multi_sigma_combine — logprob · entropy · perplexity · consistency | --multi-sigma |
| B conformal σ | cos_conformal_read_bundle_json · auto-loads ~/.cos/calibration.json | on by default; --no-conformal to opt out |
| C meta-cognitive σ | cos_ultra_metacog_* — perception · self · social · situational | --verbose |
| D session coherence | cos_ultra_coherence_emit_report — dσ/dt, K_eff, {STABLE, DRIFTING, AT_RISK} | REPL-only; --no-coherence to opt out |
%%{init: {'theme':'neutral', 'flowchart': {'curve': 'linear'}}}%%
flowchart LR
A["Phase A<br/>σ_combined"] --> B["Phase B<br/>conformal σ"]
B --> C["Phase C<br/>meta-cog"]
C --> D["Phase D<br/>coherence"]
Example:
./cos chat --once --prompt "What is 2+2?" --multi-sigma --verbose
# → [meta: perception=0.35 self=0.06 social=0.45 situational=0.00]
# → round 0 4 [σ_peak=0.06 action=ACCEPT route=LOCAL]
# → [σ=0.063 | CACHE | LOCAL | conformal@α=0.80 | rethink=0 | €0.0000]
# → [σ_combined=0.184 | σ_logprob=0.063 σ_entropy=0.063
#    σ_perplexity=0.063 σ_consistency=0.667 | k=3]
Install (full options)
# One-liner (ephemeral clone + build + `cos demo --batch`)
curl -fsSL https://raw.githubusercontent.com/spektre-labs/creation-os/main/scripts/try_cos.sh | bash
# Homebrew (macOS) β tap repo: spektre-labs/homebrew-cos (Formula lives under packaging/homebrew-cos/)
brew tap spektre-labs/cos
brew install creation-os
# Docker β Alpine cos + cos-demo (lab smoke; see Dockerfile.cos)
docker build -f Dockerfile.cos -t creation-os:cos .
docker run --rm creation-os:cos
# Docker — production σ-gate HTTP (cos-serve + Python cos stack; see Dockerfile.prod)
docker build -f Dockerfile.prod -t creation-os:prod .
docker run --rm -p 3001:3001 creation-os:prod
# When published: docker run --rm -p 3001:3001 ghcr.io/spektre-labs/creation-os:latest
# From source
git clone https://github.com/spektre-labs/creation-os.git
cd creation-os && make cos cos-demo && ./cos demo --batch
Tagged releases also attach macOS universal and Linux tarballs from .github/workflows/release.yml.
Framework integrations (LangChain · LangGraph · CrewAI · AutoGen)
Thin optional shims live under python/cos/integrations/. Default scoring uses the same deterministic quickstart σ bands as cos gate without an LSD pickle; pass a trained SigmaGate for trajectory-probe scores (see docs/CLAIM_DISCIPLINE.md before mixing lab demos with harness receipts).
pip install 'creation-os[langchain]' # or [langgraph] / [crewai] / [autogen] / [frameworks]
LangChain (callbacks on each LLM end):
from cos.integrations.langchain_sigma import SigmaGateCallback, SigmaAbstainError
# llm = ChatOpenAI(callbacks=[SigmaGateCallback()])
LangGraph (state node + router names output / regenerate / abstain):
from cos.integrations.langgraph_sigma import sigma_gate_node, sigma_gate_router
# graph.add_node("sigma_gate", lambda s: sigma_gate_node(s))
# graph.add_conditional_edges("sigma_gate", sigma_gate_router)
CrewAI (tool):
from cos.integrations.crewai_sigma import SigmaGateTool
# Agent(tools=[SigmaGateTool()])
AutoGen-style dict messages (hook on a full thread):
from cos.integrations.autogen_sigma import SigmaAutoGenHook, SigmaGateHook
# hook.process_last_received_message(messages, sender) # mutates last dict; ABSTAIN adds suffix
# SigmaGateHook(...).process_message(sender, receiver, {"content": text}) # single-message copy
Copy-paste recipes and cos integrations --check / --example: docs/v152/INTEGRATIONS.md.
Any Python callable:
from cos.decorators import sigma_gated
@sigma_gated
def my_agent(prompt: str) -> str:
    return model.generate(prompt)
Interop SDK re-exports (requires creation-os installed in the same env): from creation_os.integrations import SigmaGateCallback, SigmaAutoGenHook, sigma_gated_llm (dict-returning wrapper with a gate), and sigma_gated (decorator from cos.decorators) β see python/creation_os/integrations/__init__.py.
σ-red-team (gate robustness)
Adversarial batch aimed at the σ-gate (can a hallucinated completion still earn ACCEPT?). Offline CI uses a mock generator + quickstart gate:
pip install -e ".[dev]" # from a checkout, or pip install creation-os
python -m cos red-team --mock --n 50 --ci --threshold 0.05
python -m cos red-team --mock --ci --threshold 0.05 --all
# or: ./scripts/cos red-team --mock --ci --threshold 0.05 --n 20
Optional: --attack confident_hallucination, --output report.json, --lsd with a pickle for the real probe. See python/cos/sigma_red_team.py, .github/workflows/red-team.yml.
| Integration | LangSmith | Guardrails AI | NeMo | Creation OS |
|---|---|---|---|---|
| σ per assistant/LLM step | ✗ | partial | ✗ | ✓ |
| Trajectory / hidden-state probe | ✗ | ✗ | partial | ✓ (LSD pickle when configured) |
| ACCEPT / RETHINK / ABSTAIN | ✗ | mostly block/pass | ✗ | ✓ |
| LangChain | native | ✗ | ✗ | ✓ callback |
| LangGraph | native | partial | ✗ | ✓ node + router |
| CrewAI | partial | partial | ✗ | ✓ tool |
| AutoGen | ✗ | ✗ | ✗ | ✓ hook |
| Decorator | ✗ | ✗ | ✗ | ✓ @sigma_gated |
| Local-first default | ✗ | ✗ | ✗ | ✓ |
— EVIDENCE —
Measured
Evidence hygiene — read CLAIM_DISCIPLINE before citing metrics from this section. Do not merge microbench throughput with harness MMLU / ARC in one headline.
Two independent, reproducible evidence surfaces — both use real
BitNet-b1.58 2B4T weights; neither is simulated. Claim-class rules:
docs/CLAIM_DISCIPLINE.md. Compact
re-run bundle for the v3.0 wired pipeline (identical numbers, one
command each): benchmarks/final5/README.md.
FIG 03 — which numbers may travel together (never merge microbench throughput with harness MMLU in one headline). VISUAL_INDEX.
TruthfulQA 817 · baseline 0.261 (bitnet_only) · σ-pipeline scored acc. 0.336 (pipeline) · conformal τ (SCI-1) 0.655 (α=0.80, δ=0.10)
1. TruthfulQA 817 (generation/validation, end-to-end)
Full run of the TruthfulQA generation/validation split through
llama-cli, scored by substring match against each row's
correct_answers / incorrect_answers. Raw artefacts:
benchmarks/pipeline/truthfulqa_817.json ·
benchmarks/pipeline/truthfulqa_817_detail.jsonl ·
commentary docs/domain_analysis.md.
| configuration | N | scored | correct | accuracy (of scored) | coverage | mean Ο | rethink rate | wall (s) |
|---|---|---|---|---|---|---|---|---|
| bitnet_only (no σ-gate) | 817 | 111 | 29 | 0.261 | 0.136 | 0.370 | 0.000 | 1 554.8 |
| pipeline (σ-gate on) | 817 | 140 | 47 | 0.336 | 0.171 | 0.391 | 0.991 | 4 804.7 |
On the same 817 prompts and seeds, the σ-pipeline lifts scored-accuracy from 0.261 → 0.336 (+28.7 % relative) and coverage from 0.136 → 0.171 (+25.7 % relative). Mean σ is essentially unchanged (0.370 → 0.391), so the gain comes from selective regeneration on initially-uncertain rows, not from the model itself becoming more confident. All numbers are read directly from the JSON artefact; no row is projected.
"Accuracy (of scored)" is conservative — rows whose generated text contains neither a correct nor an incorrect string are excluded from both numerator and denominator — and is not directly comparable to lm-eval MC2. See docs/BENCHMARK_PROTOCOL.md.
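A sketch of that conservative scoring rule, with illustrative field names; the in-tree scorer may resolve cases where both a correct and an incorrect string match differently:

```python
# Illustrative "accuracy (of scored)" metric: a row counts only if the
# generated text contains a known correct or known incorrect string;
# everything else leaves both numerator and denominator.
def scored_accuracy(rows):
    scored = correct = 0
    for row in rows:
        text = row["generated"].lower()
        hit_good = any(a.lower() in text for a in row["correct_answers"])
        hit_bad = any(a.lower() in text for a in row["incorrect_answers"])
        if hit_good or hit_bad:
            scored += 1
            if hit_good and not hit_bad:
                correct += 1
    coverage = scored / len(rows) if rows else 0.0
    accuracy = correct / scored if scored else 0.0
    return accuracy, coverage

rows = [
    {"generated": "The answer is 4.", "correct_answers": ["4"], "incorrect_answers": ["5"]},
    {"generated": "I am not sure.", "correct_answers": ["4"], "incorrect_answers": ["5"]},
]
print(scored_accuracy(rows))  # (1.0, 0.5): one scored row, one excluded
```

This is why accuracy and coverage must be read together: a gate can raise accuracy simply by scoring fewer rows, so the table above reports both.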
Conformal guarantee (SCI-1, α=0.80, δ=0.10): the same run yields τ=0.655 with P(wrong | σ≤τ) ≤ α on exchangeable draws from the calibration distribution. Caveats and the scope of the bound: docs/v111/CONFORMAL_GUARANTEE.md.
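The shape of that bound can be illustrated with a toy split-conformal threshold search. This is a hedged sketch only: the repo's SCI-1 procedure and its δ finite-sample correction are not reproduced here, and pick_tau is a hypothetical name:

```python
# Toy split-conformal idea: on a held-out calibration set, pick the
# largest tau such that the empirical wrong-rate among answers with
# sigma <= tau stays within alpha. Real SCI-1 adds a finite-sample
# (delta) correction not modeled here.
def pick_tau(cal, alpha=0.80):
    """cal: (sigma, is_wrong) pairs from calibration draws."""
    best = 0.0
    for tau in sorted({s for s, _ in cal}):
        covered = [wrong for s, wrong in cal if s <= tau]
        if covered and sum(covered) / len(covered) <= alpha:
            best = tau
    return best

cal = [(0.1, False), (0.2, False), (0.4, True), (0.7, True), (0.9, True)]
print(pick_tau(cal, alpha=0.5))  # 0.7: wrong-rate at tau=0.9 would be 0.6
```

The exchangeability caveat in the text matters: the bound holds for draws like the calibration distribution, not arbitrary out-of-distribution traffic.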
2. v111 Frontier parity matrix (σ vs entropy, Bonferroni-controlled)
Pre-registered σ-gate vs entropy baseline on four benchmark families.
σ is not a universal calibration signal — this table is the
single source of truth, positive and negative results side by side.
| family | task | status at α_fw = 0.05 | signal | ΔAURCC | n |
|---|---|---|---|---|---|
| PRE-REGISTERED | truthfulqa_mc2 | win (v111.1, Bonf N=24) | sigma_max_token | −0.0447 (p = 0.0005) | 817 |
| PRE-REGISTERED | truthfulqa_mc2 | win (v111.2-prereg test split, Bonf N=12) | sigma_task_adaptive | −0.0681 (p ≈ 0.0005) | 409 |
| POST-HOC | arc_challenge | directional, not replicated at α_fw | sigma_product | −0.0087 (full-data p = 0.004; test-split p = 0.145) | 1172 / 586 |
| NEGATIVE | hellaswag | σ not dominant | — (entropy baseline) | σ_product Δ = −0.0016, p = 0.68 | 746 |
| NEGATIVE | mmlu_* (7 eligible / 10 candidates) | σ not dominant — 0 / 28 Bonf-sig. cells | — (entropy baseline) | best σ Δ = +0.0000, worst = +0.0152 | 605 |
Lower AURCC is better. Full table with CI95 and p-values:
benchmarks/v111/results/frontier_matrix.md.
Reproduce end-to-end:
bash benchmarks/v111/run_matrix.sh # all four tasks
bash benchmarks/v111/check_v111_matrix.sh # CI-safe smoke
The σ-gate's Bonferroni-significant domain is therefore bounded to
TruthfulQA-style factual-confidence tasks, not general MMLU-style
knowledge-QA. Methodology and signal definitions:
docs/v111/THE_FRONTIER_MATRIX.md.
3. Multi-dataset σ-gate suite (SCI-6)
Aggregator:
./cos-bench-suite-sci.
Output: benchmarks/suite/full_results.json,
schema cos.suite_sci.v1.
| dataset | status | N | acc(all) | acc(accepted) | coverage | σ_mean | τ | conformal |
|---|---|---|---|---|---|---|---|---|
| TruthfulQA (gen/val, scored) | measured | 817 | 0.336 | 0.336 | 0.171 | 0.391 | 0.655 | yes @ (α=0.80, δ=0.10) |
| ARC-Challenge | measured | 1172 | 0.337 | 0.337 | 0.969 | 0.508 | 0.650 | yes |
| ARC-Easy | measured | 2376 | 0.420 | 0.420 | 0.947 | 0.477 | 0.650 | yes |
| GSM8K | measured | 1319 | 0.125 | 0.000 | 0.109 | 0.481 | 0.330 | no (τ invalid @ δ) |
| HellaSwag (500 val) | measured | 500 | 0.285 | 0.285 | 0.960 | 0.533 | 0.650 | yes |
All five rows are "measured": true in
benchmarks/suite/full_results.json
(BitNet-b1.58-2B, cos chat, pipeline mode filter). GSM8K:
few rows expose a gradable #### answer (low scored coverage), and
the conformal τ search does not yield a valid guarantee at the pinned
(α, δ) — the table shows honest zeros for acceptance metrics, not
projections. Reproduce:
benchmarks/suite/README.md and
benchmarks/suite/run_all_detail.sh.
Domain read: σ-gate + conformal line up on TruthfulQA and ARC-style MC; GSM8K at 2B is mostly unscored or τ-invalid at this δ; HellaSwag stays modest. σ is not universal across benchmarks.
σ-gate v4 (LSD contrastive probe, GPT-2, TruthfulQA200)
This row is a separate evidence class from the BitNet TruthfulQA-817
generation table above: a Python probe trained with the Sirraya LSD-style
contrastive stack on 200 bundled TruthfulQA prompts plus synthetic negatives,
then evaluated with sklearn trajectory features. See
docs/CLAIM_DISCIPLINE.md: do not merge these
AUROC figures with the 817-row cos chat accuracy table in one headline.
| Metric | Value |
|---|---|
| Method | LSD contrastive hidden-state probe + logistic head |
| AUROC (5-fold CV, training manifest) | 0.9428 (benchmarks/sigma_gate_lsd/results_full/manifest.json) |
| Training pairs | 800 (balanced factual / hallucinated; includes GPT-2-sampled negatives) |
| Inference | One forward pass through the probe causal LM + trajectory feature pass |
| Wire layout | 12 bytes (python/cos/sigma_gate.py → pack_measurement) |
| Based on | arXiv:2510.04933 (The Geometry of Truth) |
σ-gate: runtime hallucination detection (measured in this tree)
Method (summary): contrastive LSD-style training on hidden-state trajectories with a MiniLM-L6-v2 truth encoder; margin loss; single forward pass through the probe GPT-2 checkpoint for scoring (no multi-sample decoding for the detector itself). Default cross-domain eval uses PRISM-style prompt suffixes and semantic correctness labels (MiniLM cosine) where noted below.
Ship-ready summary (validated claims only). Do not merge these AUROCs with
the BitNet TruthfulQA-817 accuracy table above, or with spectral / unified fusion
experiments, in one headline β see docs/CLAIM_DISCIPLINE.md.
| Benchmark | AUROC | Type | N | Status |
|---|---|---|---|---|
| TruthfulQA (5-fold CV) | 0.943 | In-distribution | 200 | Validated |
| TruthfulQA (30% holdout) | 0.982 | True holdout | 57 | Validated |
| TriviaQA (greedy GPT-2) | 0.960 | Cross-domain | 100 | Validated |
| HaluEval (generative) | — | Pending label protocol refinement | — | In progress |
Method: LSD contrastive hidden-state probe (arXiv:2510.04933).
Training: 800 balanced pairs, ~15 epochs (see adapt_lsd / manifest). Inference: one
scoring forward through the probe bundle; 12-byte wire blob via SigmaGate.pack_measurement.
Experimental HaluEval generative smoke numbers (when run) live in
benchmarks/sigma_gate_eval/results_cross_domain/cross_domain_summary.json — not promoted here until the label story is frozen.
Harness detail (artifact paths, includes generative HaluEval smoke when present):
| Benchmark | AUROC | Type | N | Artifact |
|---|---|---|---|---|
| TruthfulQA (5-fold CV) | 0.943 | In-distribution | 200-pair protocol | benchmarks/sigma_gate_lsd/results_full/manifest.json |
| TruthfulQA (holdout) | 0.982 | Held-out prompts | 57 | benchmarks/sigma_gate_eval/results_holdout/holdout_summary.json |
| TriviaQA (greedy GPT-2) | 0.960 | Cross-task smoke | 100 | benchmarks/sigma_gate_eval/results_cross_domain/cross_domain_summary.json |
| HaluEval (qa_samples, generative) | 0.383 | Cross-task smoke | 100 | cross_domain_summary.json (halueval_mode: generative) |
Comparison (orientation only). Published AUROCs below are not head-to-head on
the same CSV / split as this harness; they summarize commonly cited ranges or paper
tables. Do not merge them with the in-repo rows in one headline without that wall.
See docs/sigma_gate_v4_comparison_table.md and
docs/CLAIM_DISCIPLINE.md.
| Method | Typical reported AUROC | Forward passes (detector) | Year |
|---|---|---|---|
| Semantic entropy (sampling) | ~0.79 (setup-dependent) | Many (5–20+) | 2024 |
| SelfCheckGPT-style | ~0.76 (setup-dependent) | Several+ | 2023 |
| HalluShift (example) | ~0.90 (paper tables; task-dependent) | 1 | 2026 |
| LSD (Geometry of Truth) | ~0.92–0.96 (vendor / paper setups) | 1 | 2025 |
| σ-gate (this repo) | 0.982 holdout / 0.960 TriviaQA | 1 scoring forward | 2026 |
Limitations: short-form QA only (TruthfulQA / TriviaQA smoke); GPT-2-scale probe
target; white-box hidden states required; long-form and domain-specific (medical, legal)
evaluation not claimed here; HaluEval generative labels combine HF hallucination flags
with cosine alignment to the provided answer string β treat as a smoke harness,
not a reproduction of HaluEval leaderboard conditions.
Usage:
from cos.sigma_gate import SigmaGate
gate = SigmaGate("benchmarks/sigma_gate_lsd/results_holdout/sigma_gate_lsd.pkl")
sigma, decision = gate(model, tokenizer, prompt, response)
# sigma in [0, 1], decision in {ACCEPT, RETHINK, ABSTAIN}; wire blob: gate.pack_measurement(...)
Acknowledgments: LSD contrastive framework arXiv:2510.04933; semantic-entropy line (Kuhn et al., Farquhar et al.); RLHF / uncertainty discussions in the literature (e.g. Liu 2026, arXiv:2603.24124) for motivation only unless separately reproduced in-tree.
Integration: the repo root already ships a C executable named cos; the
importable Python module lives under python/cos/. Use
PYTHONPATH=python (or make check-sigma-gate) and
from cos.sigma_gate import SigmaGate. Default probe path:
benchmarks/sigma_gate_lsd/results_full/sigma_gate_lsd.pkl.
End-to-end eval harness: greedy GPT-2 completions on the same CSV, weak
substring labels vs. best_answer, AUROC in
benchmarks/sigma_gate_eval/results/eval_summary.json after
python3 benchmarks/sigma_gate_eval/run_eval.py (venv: benchmarks/sigma_gate_lsd/.venv).
That AUROC is not the same statistic as the CV row (different labels and
generative setup).
| Method | Typical evidence | Forward passes (detector) |
|---|---|---|
| Semantic-entropy family (e.g. Kuhn et al.) | Published MC / QA setups (task-dependent AUROC) | Many (multi-sample) |
| Self-consistency / self-check variants | Task-dependent | Several+ |
| σ-gate v4 (this probe) | CV on curated pairs + optional run_eval.py | 1 LM forward for scoring |
The left column cites families, not a claim that a single external AUROC
number was reproduced on this CSV; see
docs/EXTERNAL_EVIDENCE_AND_POSITIONING.md.
Holdout protocol: python3 benchmarks/sigma_gate_lsd/create_splits.py writes
benchmarks/sigma_gate_lsd/splits/{train,holdout}.csv (default: stratified by
category). Retrain with adapt_lsd.py --prompts .../train.csv and evaluate
on holdout via python3 benchmarks/sigma_gate_eval/run_holdout_eval.py, or run
bash run_holdout_pipeline.sh (long; uses the LSD venv; includes Step 4
TriviaQA + HaluEval cross-domain smoke with no probe retraining).
- Cross-domain only: make sigma-cross-domain (requires network + datasets).
- One-screen summary from JSON: make sigma-all-results.
- Publication-style table template: docs/sigma_gate_v4_publication_results.md.
- Checksums + regenerative cross-domain refresh: bash run_ship.sh (optional CREATION_SHIP_COMMIT=1 / CREATION_SHIP_PUSH=1).
σ-gate v5 (lab, optional): multi-dataset + semantic labels + leave-one-source-out —
make sigma-v5 or bash run_v5.sh; see benchmarks/sigma_gate_v5/README.md.
Pre-generation scaffold (HALT / ICR direction, not shipped weights):
python/cos/sigma_gate_precheck.py (SigmaPrecheck) defaults to normalized
next-token entropy on the prompt (one forward). python/cos/sigma_gate_full.py
(SigmaGateFull) chains precheck → optional generate → LSD SigmaGate post
score. See module docstrings; calibrate tau_skip on your traffic.
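A minimal sketch of the precheck idea, normalized next-token entropy from one forward pass; real model logits are stubbed with a plain probability list, and the tau_skip value is an arbitrary example, not a calibrated default:

```python
# Illustrative pre-generation check: if the next-token distribution on
# the prompt is already near-uniform, skip generation instead of
# producing an answer the gate would likely reject.
import math

def normalized_entropy(probs):
    """Shannon entropy divided by log(vocab) so the result lands in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def precheck(probs, tau_skip=0.9):
    """High prompt entropy before generating anything -> skip early."""
    return "SKIP" if normalized_entropy(probs) >= tau_skip else "GENERATE"

peaked = [0.97, 0.01, 0.01, 0.01]   # model confident about the next token
flat = [0.25, 0.25, 0.25, 0.25]     # maximally uncertain
print(precheck(peaked), precheck(flat))  # GENERATE SKIP
```

Because this runs before any decoding, it is the cheapest point in the pipeline to abstain, which is the HALT / ICR direction the scaffold points at.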
Comparison table (with claim discipline):
docs/sigma_gate_v4_comparison_table.md.
Reddit drafts: docs/reddit_ml_post_v2.md,
docs/reddit_ml_sigma_gate_v4.md,
docs/reddit_ml_sigma_gate.md.
Reasoning per joule (ULTRA-7)
Energy-aware pipeline runs (make check-ultra, --energy) report
accuracy together with joules per query and reasoning per joule
(higher means more correct signal per unit energy spent). Figures
below are pinned demo rows from the bundled ULTRA harness — not
merged with TruthfulQA harness accuracy; see
docs/CLAIM_DISCIPLINE.md.
| Config | Accuracy | J/query | Reasoning/J |
|---|---|---|---|
| bitnet_only | 0.261 | 0.8J | 0.326 |
| σ-pipeline | 0.336 | 1.2J | 0.280 |
| σ-selective | 0.520 | 0.5J | 1.040 |
σ-selective answers only when certain: fewer wrong answers → less wasted energy → higher reasoning/joule.
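The Reasoning/J column is accuracy divided by joules per query; a quick check against the demo rows above:

```python
def reasoning_per_joule(accuracy: float, joules_per_query: float) -> float:
    """Reasoning per joule = correct-answer rate per unit energy."""
    return accuracy / joules_per_query

# Demo rows from the table above (accuracy, J/query).
rows = {
    "bitnet_only":      (0.261, 0.8),
    "sigma_pipeline":   (0.336, 1.2),
    "sigma_selective":  (0.520, 0.5),
}
for name, (acc, joules) in rows.items():
    print(f"{name}: {reasoning_per_joule(acc, joules):.3f}")
```

Selective abstention wins on this metric even at lower raw energy, because wrong answers burn joules without adding signal.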
How Creation OS differs
| Feature | Creation OS | OpenClaw ~302k★ | Hermes ~95k★ | Ollama ~130k★ |
|---|---|---|---|---|
| σ per token | ✓ | ✗ | ✗ | ✗ |
| Conformal guarantee | ✓ α=0.80 | ✗ | ✗ | ✗ |
| ABSTAIN when unsure | ✓ | ✗ | ✗ | ✗ |
| Formal proofs | Lean + Frama-C | ✗ | ✗ | ✗ |
| Self-improving | ✓ Ω-loop (cos-evolve) | ✗ | ✗ skills | ✗ |
| Reasoning/joule | ✓ measured | ✗ | ✗ | ✗ |
| Theory papers | ~80 CC BY 4.0 (data/corpus/) | ✗ | ✗ | ✗ |
| Stars (GitHub) | ~30 | ~302k | ~95k | ~130k |
★ Star counts are informal social signals on a public forge and change daily — they are not an engineering scorecard.
They are bigger. We measure σ. Nobody else does.
Full comparison: docs/comparison.md.
— STACK —
Architecture
FIG 08 — single-file kernel narrative over Hypercube, Oracle, world model, BSC core, Soul, Proconductor. VISUAL_INDEX.
Full ULTRA pipeline (one turn)
Interactive graph (GitHub renders Mermaid). Plain-text twin lives in the foldout under it — same graph, copy-paste friendly.
%%{init: {'theme':'neutral', 'flowchart': {'curve': 'basis'}}}%%
flowchart TB
P([Prompt]) --> C[Codex · soul]
C --> M[Meta-cognition]
M --> E{Engram lookup}
E -->|HIT| O[Return cached · 0 ms]
E -->|MISS| X[sigma-MoE to JEPA to neuro-symbolic]
X --> SD[Selective decoding]
SD --> B[BitNet generate]
B --> PT[Per-token σ]
PT --> G{Conformal gate}
G -->|ACCEPT| N[Engram store]
G -->|RETHINK| R[Recurrent depth · TTT]
R --> G
G -->|Escalate| ES[Swarm or API]
ES --> CH[Coherence · dσ/dt · K_eff]
N --> Z["Output · σ · EUR · J/query"]
O --> Z
CH --> Z
ASCII pipeline (identical topology · terminal-friendly)
Prompt
  │
  ▼
Codex (soul) ─────────────── Atlantean system prompt
  │
  ▼
Meta-cognition ───────────── perception · self · social · situational
  │
  ▼
Engram lookup ──── HIT ───► return cached (0 ms, €0.00)
  │ MISS
  ▼
σ-MoE routing ────────────── adaptive k experts by σ
  │
  ▼
JEPA world model ─────────── σ_world: understanding vs repetition
  │
  ▼
Neuro-symbolic ───────────── System 1 (fast) / System 2 (deliberate)
  │
  ▼
Selective decoding ───────── compute only when σ changes
  │
  ▼
BitNet generate ──────────── ternary {-1,0,+1}, integer-only
  │
  ▼
Per-token σ ──────────────── logprob + entropy + perplexity + consistency
  │
  ▼
Conformal gate ───────────── P(wrong|ACCEPT) ≤ α, mathematically guaranteed
  │
  ├── ACCEPT ───► engram store ───► response + σ + cost
  │
  ▼ RETHINK (≤3 rounds)
Recurrent depth ──────────── loop until σ < τ or overthinking detected
  │
  ▼ σ still high
Escalate ─────────────────── swarm peers or API fallback
  │
  ▼
Coherence check ──────────── dσ/dt, K_eff, Lagrangian conservation
  │
  ▼
Output + σ_combined + cost (€) + reasoning/joule
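The per-token stage combines logprob, entropy, perplexity, and consistency. A toy illustration of the logprob / perplexity part only (not the shipped Q16.16 kernel, and not its combination formula):

```python
import math

def token_signals(token_logprobs):
    """Mean negative log-probability and perplexity over one emission."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return nll, math.exp(nll)          # (mean NLL, perplexity)

def toy_sigma(token_logprobs):
    """Squash mean NLL into [0, 1): 0 = fully confident, -> 1 as NLL grows."""
    nll, _ = token_signals(token_logprobs)
    return 1.0 - math.exp(-nll)
```

Low per-token logprob mass pushes the toy sigma toward 1, which is the direction the gate treats as "rethink or abstain".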
Canonical source: src/sigma/pipeline/pipeline.h ·
src/sigma/pipeline/pipeline.c ·
src/cli/cos_chat.c.
Forty integer kernels, one AND gate
Every emission from cos chat passes forty integer kernels — each one
a falsifiable statement about the answer. Categories:
reasoning soundness · reversibility · meta-cognition
world-model coherence · memory integrity
adaptive compute · geometric algebra · sheaf topology
post-quantum crypto · homomorphic compute
neuromorphic spikes · hierarchical active inference
quantum amplitude amplification · integer diffusion sampler
Q-learning + GAE + PPO · persistent homology
structural causal do-calculus · sub-quadratic Hyena
security · provenance
Hot path: branchless, Q16.16 fixed-point, libc + libm only. The
runtime refuses to emit unless every one of the forty kernels
returns PASS. Rollup target: make check-v60 … make check-v100.
Full forty-kernel receipt (16 416 185 PASS / 0 FAIL as of current
head, ASAN + UBSAN clean):
docs/README_FULL.md.
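The AND-gate contract itself is trivial to state; a sketch (kernel names here are placeholders, not the real kernel registry):

```python
def and_gate(kernel_results):
    """Emission is allowed only if every kernel reports PASS (True)."""
    return all(kernel_results.values())

# 40 illustrative kernels, all passing:
checks = {f"kernel_{i:02d}": True for i in range(40)}
emit = and_gate(checks)          # all PASS -> runtime may emit

checks["kernel_17"] = False      # a single FAIL...
blocked = and_gate(checks)       # ...and the runtime refuses to emit
```

One failed check vetoes the whole emission; there is no weighted averaging across kernels.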
Figure and palette rules: docs/VISUAL_INDEX.md.
BSC primer
FIG 06 — teaching strip for bind / bundle / similarity. VISUAL_INDEX.
Binary Spatter Coding (BSC) in a nutshell:
bind = XOR · bundle = popcount threshold · similarity = 1 − hamming/D.
Memory is one bit per dimension; binding and bundling are
branchless on every hot path. The Spektre Corpus traces this lineage
from Kanerva (1988, 1994) forward to the 2025 HDC/VSA robustness
estimation literature — see
docs/HDC_VSA_ENGINEERING_SUPERIORITY.md
and data/corpus/INDEX.md.
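For intuition, the three primitives can be sketched on Python ints treated as D-bit vectors (the shipped kernels are branchless C89, not Python):

```python
D = 4096
MASK = (1 << D) - 1

def bind(a, b):
    """bind = XOR; self-inverse, so bind(bind(a, b), b) == a."""
    return (a ^ b) & MASK

def bundle(vectors):
    """bundle = per-bit majority vote (popcount threshold)."""
    out = 0
    for bit in range(D):
        votes = sum((v >> bit) & 1 for v in vectors)
        if 2 * votes > len(vectors):
            out |= 1 << bit
    return out

def similarity(a, b):
    """similarity = 1 - hamming/D; ~0.5 for unrelated random vectors."""
    return 1.0 - bin((a ^ b) & MASK).count("1") / D
```

Binding is exactly invertible and similarity of unrelated vectors concentrates at 0.5, which is what makes near-1.0 similarity a meaningful retrieval signal.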
BSC vs GEMM performance
At D = 4096, XNOR binding requires 87,000× fewer bit-ops than a
naive float32 dense matmul at the same logical width; at 128K
tokens the arithmetic gap crosses 2,000,000× (same encoding
assumptions as §7 / README limitations — not a merged throughput
headline; run make bench for time). BSC recovers the exact
algebraic object that softmax attention approximates in continuous
relaxation. Binding fidelity on the reference hot path:
1.0000 (see make check / BSC core tests).
| Operation | Transformer | Creation OS |
|---|---|---|
| Attention | O(n²) softmax | O(n) XNOR bundle |
| Dense layers | float32 MatMul | ternary add/sub |
| Memory (13B) | 48.5 GB | 4.19 GB |
| Power | 300W GPU | 5.8W CPU |
FIG 07 — 32× RAM and 192× op-proxy at D = 4096 (see limitations for throughput vs arithmetic). VISUAL_INDEX.
Benchmark: bench/gemm_vs_bsc.c (make bench →
./gemm_vs_bsc). Theory: data/corpus/. HDC/VSA
lineage: docs/HDC_VSA_ENGINEERING_SUPERIORITY.md.
Self-improvement (Ω-loop)
Creation OS improves itself autonomously (evaluator-first; see
docs/OMEGA_EVOLVE.md):
- cos-evolve evolve — σ-guided weight / parameter mutations (keep if fitness improves, revert otherwise; scaffold today, mutator pluggable).
- cos-evolve discover — declarative hypothesis harness → JSONL verdicts.
- cos-evolve calibrate-auto — σ-sweep / conformal operating-point search on a labeled fixture.
- cos omega — dispatches the recursive Ω driver (creation_os_sigma_omega).
The machine that improves while you sleep — and stops when the gate says so.
Ω = argmin ∫σ dt subject to K ≥ K_crit. Implemented: src/sigma/evolve/.
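A minimal keep-if-better loop in the spirit of cos-evolve evolve, with a toy Gaussian mutator and toy fitness (the real mutator and fitness are pluggable and live in src/sigma/evolve/, not here):

```python
import random

def evolve(params, fitness, steps=200, scale=0.1, seed=0):
    """Mutate; keep the candidate if fitness improves, revert otherwise."""
    rng = random.Random(seed)
    best = list(params)
    best_fit = fitness(best)
    for _ in range(steps):
        cand = [p + rng.gauss(0, scale) for p in best]
        f = fitness(cand)
        if f > best_fit:             # keep only strict improvements...
            best, best_fit = cand, f
        # ...otherwise revert (the candidate is simply discarded)
    return best, best_fit

# Toy fitness with its optimum at all-ones:
params, fit = evolve([0.0, 0.0], lambda p: -sum((x - 1) ** 2 for x in p))
```

Because reverts are free, fitness is monotonically non-decreasing; the gate supplies the stopping condition in the real stack.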
— SURFACE —
Beyond inference
One cos front door plus dedicated σ binaries — all instrumented.
Creation OS is not just a chat interface.
FIG 05 — Planes A–B–C (where σ-gates sit vs silicon vs product). ANALYSIS · VISUAL_INDEX.
| Command | What it does |
|---|---|
| cos chat | σ-gated local inference |
| cos-evolve | self-improving Ω stack (evolve · memory-* · calibrate-auto · discover · omega · daemon) |
| cos swarm | multi-agent σ-coordinated routing (mock σ peers in v0) |
| cos sandbox | isolated safe process execution (allowlist + rlimits) |
| cos plan | long-horizon planning with snapshot rollback |
| cos predict | σ-JEPA lab: latent one-step predict, roll-out imagination, or low-σ plan pick (JSON) |
| cos exec | digital twin pre-execution simulation |
| cos-calibrate | conformal bundle helpers (see make cos-calibrate) |
| cos health | system status + coherence monitoring |
| cos benchmark | full benchmark suite + energy metrics (--energy) |
| cos mcp | MCP server — σ-gate as infrastructure |
| cos a2a | agent-to-agent with σ-trust |
MCP: LSD σ-gate (Python, stdio)
Optional MCP tools score (prompt, response) with the lab LSD pickle (python3 -m cos.mcp_sigma_server with PYTHONPATH=python; see docs/MCP_SIGMA.md, docs/MCP_LISTING.md, and docs/EU_AI_ACT_COMPLIANCE.md for marketplace / transparency framing). The JSON-RPC server scripts/cos_mcp_server.py adds matching sigma_gate_* tools (response shape may differ until aligned). Requires pip install 'mcp[cli]' in your venv for the FastMCP entrypoint. Governance: follow docs/CLAIM_DISCIPLINE.md for any AUROC claims; audit logs are operator hooks, not legal certification.
make sigma-mcp-smoke # import check (needs mcp in sigma_gate_lsd venv)
make sigma-mcp-serve # stdio MCP server (SIGMA_PROBE_PATH)
%%{init: {'theme':'neutral', 'flowchart': {'curve': 'basis'}}}%%
flowchart LR
H[Operator] --> FD[cos dispatcher]
FD --> G1[chat · benchmark · cost]
FD --> G2[swarm · sandbox · plan · exec]
FD --> G3[health · mcp · a2a]
H --> OM[cos-evolve]
OM --> G4[evolve · discover · calibrate-auto · omega]
Every surfaced turn measures σ. Every gate decision is logged.
— TOOLCHAIN —
Build
Build paths: Minimal (any C11 + libm) · Make (default).
Flagship check targets: make check-v6 (Living Kernel, 30 self-tests)
… make check-v29 (collapse harness, 22 self-tests);
make check-v60 … make check-v100 (forty-kernel composed stack).
Optional (not in merge-gate): σ labs, MCP, RTL, native-M4 —
make formal-sby-v37, make verify-agent, make red-team,
make certify, make check-mcp, make check-native-m4,
make formal-rtl-lint, make stack-ultimate.
Host metadata when publishing numbers:
docs/REPRO_BUNDLE_TEMPLATE.md.
— PROOFS —
Proof status
- Lean 4: 14 / 14 proof obligations discharged, sorry-free — T1–T6 in hw/formal/v259/Measurement.lean plus eight stack lemmas in formal/lean/CreationOS/V133.lean; make check-v259 (primitive) + make check-lean-t3-discharged (Lean gate + v133).
- Frama-C: ACSL clause ledger, 30 tracked lines (v259 companion + hw/formal/v133/sigma_stack_contracts.acsl), via creation_os_sigma_formal_complete; tier-1 WP remains 15 goals on cos_sigma_measurement_gate + cos_sigma_measurement_clamp (scripts/v259/run_frama_c_wp.sh).
- SBY + EQY (YosysHQ OSS CAD Suite, optional): make stack-singularity → hw/formal/README.md.
- Formalism → silicon map: docs/FULL_STACK_FORMAL_TO_SILICON.md.
- Conformal guarantee (selective prediction, Angelopoulos–Bates): P(wrong | σ ≤ τ) ≤ α with confidence 1 − δ on the calibration draw — docs/v111/CONFORMAL_GUARANTEE.md.
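One way to read the conformal operating point: on labeled calibration pairs (sigma, correct), pick the largest τ whose accepted subset stays at or below the target error α. A sketch only; the finite-sample bound and δ handling live in the cited doc, not in this empirical quantile search:

```python
def pick_tau(cal, alpha=0.20):
    """cal: list of (sigma, correct) pairs. Return the largest tau whose
    accepted subset {sigma <= tau} has empirical error <= alpha, else None."""
    taus = sorted({s for s, _ in cal}, reverse=True)
    for tau in taus:
        accepted = [ok for s, ok in cal if s <= tau]
        if accepted and (1 - sum(accepted) / len(accepted)) <= alpha:
            return tau
    return None     # no operating point: abstain on everything
```

Everything above τ is abstained on, which is how the accepted-set error rate is kept under α.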
— VERSIONS —
Surface versions
Full version catalogue (v112–v306+): docs/SURFACE_VERSIONS.md — per-version make check-vNN targets and dominant primitives live there so this README stays a ship + evidence + architecture front door, not a catalogue dump.
— DOCS —
Docs hub
Canonical index: docs/DOC_INDEX.md.
Three audience tracks:
Tier 1 β default paths
| You need… | Open |
|---|---|
| Full map of markdown | docs/DOC_INDEX.md |
| Evidence / headline rules | docs/CLAIM_DISCIPLINE.md |
| Mis-readings we fixed | docs/COMMON_MISREADINGS.md |
| Binaries & CI matrix | docs/FEATURES_AND_STANDALONE_BUILDS.md |
| Plain-language snapshot | docs/PARADIGM_SNAPSHOT_FOR_DRIVE_BY_READERS.md |
| Figure & SVG rules | docs/VISUAL_INDEX.md |
| Push hygiene | docs/publish_checklist_creation_os.md |
Tier 2 β benchmarks, thesis, industry
| Topic | Doc |
|---|---|
| Analysis / Planes AβC | docs/ANALYSIS.md |
| make bench / §7 protocol | docs/BENCHMARK_PROTOCOL.md |
| §1–§26 evidence index | docs/MODULE_EVIDENCE_INDEX.md |
| Thesis spine (RQ, threats, contributions) | docs/RESEARCH_AND_THESIS_ARCHITECTURE.md |
| Repro bundle for published numbers | docs/REPRO_BUNDLE_TEMPLATE.md |
| HDC / VSA → engineering | docs/HDC_VSA_ENGINEERING_SUPERIORITY.md |
| Glossary | docs/GLOSSARY.md |
| Selective prediction formal framework | docs/selective_prediction.md |
Tier 3 β silicon, remotes, governance
| Topic | Doc |
|---|---|
| RTL mirror (SV, Chisel, Yosys, Rust, formal) | docs/RTL_SILICON_MIRROR.md |
| Formalism → silicon | docs/FULL_STACK_FORMAL_TO_SILICON.md |
| σ stack map (v33–v100 + HDL) | docs/SIGMA_FULL_STACK.md |
| Mobile + messenger + legacy-app bindings | bindings/README.md |
| MCP σ server | docs/MCP_SIGMA.md · make check-mcp · make sigma-mcp-smoke |
| Git remotes | docs/CANONICAL_GIT_REPOSITORY.md |
| Contributing · security · agent rules | CONTRIBUTING.md · SECURITY.md · AGENTS.md |
| Maintainers + merge gate | docs/MAINTAINERS.md |
| English-only policy | docs/LANGUAGE_POLICY.md |
| Citation metadata | CITATION.cff · docs/CITATION.bib |
Archived full README (pre-slim, narrative / diagram-heavy):
docs/README_FULL.md. README slim plan /
future iterations: docs/README_REFACTOR_PLAN.md.
Theory
Roughly 80 CC BY 4.0 theory papers ship under data/corpus/
— every fork carries the full-text bundle. Zenodo DOIs and catalogue
indices: data/corpus/INDEX.md.
Core equation: K_eff = (1 − σ) · K. Distortion Theory of
Intelligence: scale compensates for a broken architecture; fix the
architecture and a 2B-class stack can punch at the fidelity envelope
people associate with trillion-parameter clouds — without mixing
lab toy kernels with harness claims; see
docs/CLAIM_DISCIPLINE.md.
— COMMITTEE —
Doctoral and committee read path
Read in order once before citing any number or narrative title from this tree:
- docs/CLAIM_DISCIPLINE.md — evidence classes, forbidden merges, falsifiers for the portable core.
- docs/RESEARCH_AND_THESIS_ARCHITECTURE.md — RQ1–RQ4, contributions C1–C6, threats to validity, chapter outline, pre-defense gates.
- docs/REPRO_BUNDLE_TEMPLATE.md — minimum metadata when a metric leaves the lab.
- docs/FEATURES_AND_STANDALONE_BUILDS.md — which binary is which (creation_os vs creation_os_v6…v12), self-test counts, CI.
- docs/MODULE_EVIDENCE_INDEX.md — §1–§26 in creation_os_v2.c: evidence class per section before you cite a module headline.
- Scoped kernel docs for any line you cite from v6–v12: LIVING_KERNEL_V6.md, HALLUCINATION_KILLER_V7.md, PARAMETERS_IN_SILICON_V9.md, THE_REAL_MIND_V10.md, THE_MATMUL_FREE_MIND_V11.md, THE_TENSOR_MIND_V12.md.
- docs/ADVERSARIAL_REVIEW_CHECKLIST.md — hostile review simulation before submission.
Rule for dissertations: v6–v12 are Lab demo (C) appendices
with their own evidence-class headers; do not fold their toy outputs
into the same tables as §7 throughput without an explicit wall — see
CLAIM_DISCIPLINE §1.
FIG 04 — portable proof vs extended lab demos (evidence-class guardrail). VISUAL_INDEX.
— SCOPE —
Limitations
This is a research prototype. Full list with scope and caveats:
docs/limitations.md. Short form:
- σ is not a universal signal. Bonferroni-significant on TruthfulQA-class factual-confidence tasks; on HellaSwag and MMLU-eligible subjects entropy is the best signal (v111 matrix).
- Conformal guarantee is exchangeable-draw, finite-sample; distribution shift reverts the bound to empirical AURCC.
- v6–v29 extended kernels are Lab demo (C) appendices with internal self_test consistency, not harness rows, tape-out, or trained LM reproduction.
- Arithmetic vs throughput. 192× ops and 32× RAM are arithmetic ratios at D = 4096. Throughput requires make bench plus archived host metadata.
- BitNet + σ kernel lab: integer ternary matvec + sigma_gate_tinystep + cache/speculative hooks — make bench-bitnet-sigma or cos benchmark --bitnet-sigma (toy dimensions; archive under benchmarks/bitnet/; not full 2B4T tok/s without a harness bundle).
- σ-sparse + SSM hybrid (lab): make bench-hybrid / cos benchmark --hybrid — integer sparse attention + toy SSM path (benchmarks/sigma_hybrid/; not llama.cpp).
- BitNet quickstart downloads real 1.2 GB weights; the local runtime is real. Cloud escalation is opt-in and off by default.
Enterprise (pilot MVP)
Goal: a one-command HTTP σ-gate + audit JSONL + tiny Python client + stdout audit summary.
cos serve (HTTP)
make cos && make cos-serve, then:
./cos serve --port 3001
curl -s http://127.0.0.1:3001/v1/health
curl -s -X POST http://127.0.0.1:3001/v1/gate \
-H 'Content-Type: application/json' \
-d '{"prompt":"What is 2+2?"}'
If ollama serve is listening on 127.0.0.1:11434, cos serve sets COS_INFERENCE_BACKEND=ollama and picks a default model from /api/tags (prefers gemma3:4b) when you have not set COS_OLLAMA_MODEL. You can still override with env vars or a "model" field in JSON.
Endpoints: POST /v1/gate, POST /v1/verify, GET /v1/health, GET /v1/audit/{id}. Append-only audit: ~/.cos/audit/YYYY-MM-DD.jsonl.
cos report
cos report
Prints a human-readable summary of all ~/.cos/audit/*.jsonl rows (counts, mean σ).
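In miniature, that summary is a fold over the audit JSONL rows. Field names ("sigma", "verdict") are assumptions for illustration, not the shipped audit schema:

```python
import glob
import json
import os

def summarize(audit_dir=os.path.expanduser("~/.cos/audit")):
    """Fold every *.jsonl audit row into counts and a mean sigma."""
    sigmas, verdicts = [], {}
    for path in sorted(glob.glob(os.path.join(audit_dir, "*.jsonl"))):
        with open(path) as fh:
            for line in fh:
                row = json.loads(line)
                sigmas.append(row.get("sigma", 0.0))
                v = row.get("verdict", "?")
                verdicts[v] = verdicts.get(v, 0) + 1
    mean = sum(sigmas) / len(sigmas) if sigmas else 0.0
    return {"rows": len(sigmas), "mean_sigma": mean, "verdicts": verdicts}
```

Because the audit log is append-only JSONL, a report like this is a pure read: no locking or database is needed.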
Python SDK (pip install)
From a checkout: pip install ./sdk/python (see sdk/python/README.md for a git+https://…#subdirectory=sdk/python one-liner).
from creation_os import CreationOS
cos = CreationOS()
print(cos.gate("What is 2+2?"))
Optional LangChain hook: integrations/langchain_sigma.py (needs PYTHONPATH including sdk/python).
Graded metrics (lab, one CSV)
| Metric | Value (evidence) |
|---|---|
| AUROC | 0.8123 on graded-50 run β see benchmarks/graded/RESULTS.md and source CSV named there |
| Evidence class | Lab reporting on a fixed graded set; not a frontier harness row |
Pricing (positioning only)
No payment logic ships in this tree; licence terms are unchanged (LICENSE).
| Tier | What | Indicative price |
|---|---|---|
| Open source | cos, Ο-gate kernels, cos serve, audit JSONL, cos report, SDK | Free under SCSL / AGPL terms |
| Pro (positioning) | Commercial support / packaging around the same bits | €49/month (contact Spektre Labs) |
| Enterprise (positioning) | SLA, custom Ο, on-prem | Contact |
— LICENSE —
License
Dual-licensed; the choice is not at your discretion — see
LICENSE §0 for which one binds you. A third option
(paid Commercial License) is available only from the Licensor.
| Path | Cost | Document |
|---|---|---|
| Spektre Commercial Source License v1.0 (primary) | free for non-commercial | LICENSE-SCSL-1.0.md |
| GNU AGPL v3.0-only (fallback after 4-yr Change Date, AGPL-derived portions) | free | LICENSE-AGPL-3.0.txt |
| Commercial License (closed-source / SaaS / OEM / Sovereign / Strategic) | paid | COMMERCIAL_LICENSE.md |
| Contributor License Agreement | n/a | CLA.md |
TL;DR
- Private individuals · academia · non-profits · journalism · reproducibility / security audits · 30-day commercial evaluation (under EUR 1 M revenue) → FREE under SCSL-1.0.
- For-profit > EUR 1 M revenue · hosted SaaS / model-as-a-service / agent-as-a-service (unless you publish the complete service-stack source per SCSL §5) · OEM closed-source redistribution → paid Commercial License required.
- All government / military / intelligence / law-enforcement operational use (SCSL §9.1(b)) → DENIED at any price; civilian Sovereign deployments by EU CFR / ECHR / ICCPR-bound states under SCSL §9.3.
- Sanctioned Persons (EU / UN / OFAC / UK HMT / Finland) and parties credibly accused of Aggression (Rome Statute Art. 8 bis) → categorical denial (SCSL §10).
Sole holder of all paid commercial rights: Lauri Elias Rainio (ORCID 0009-0006-0903-8541) and Spektre Labs Oy, jointly and severally. No other person or entity may grant a Commercial License; any attempted grant is void ab initio (SCSL §4.3).
Every Receipt emitted by Creation OS carries the SHA-256 of
LICENSE-SCSL-1.0.md (SCSL §11). The pinned reference hash lives
in LICENSE.sha256 and is independently verifiable:
shasum -a 256 LICENSE-SCSL-1.0.md # macOS
sha256sum LICENSE-SCSL-1.0.md # POSIX
bash tools/license/license_sha256.sh # bundled helper
make license-attest # full: 11 KAT + bundle + sample receipt
Full human-readable explainer: docs/LICENSING.md ·
who-may-do-what matrix: docs/LICENSE_MATRIX.md ·
trademark / patent notices: NOTICE.
Lauri Elias Rainio · Spektre Labs Oy · Helsinki, Finland
ORCID: 0009-0006-0903-8541 ·
licensing: spektre.labs@proton.me · web: spektrelabs.org
Independent research. No institution. No funding. General models answer on demand. Creation OS measures first — ACCEPT, RETHINK, or ABSTAIN on the wired interrupt — with archived receipts where claims apply.
2026 · Spektre Labs · Lauri Elias Rainio · Helsinki
