GHOSTFACTORY

Every agent leaves a record. We keep it.

GhostFactory builds the infrastructure layer for production AI agents — deterministic planning, a supervised runtime, sandboxed execution, and a cryptographically verifiable record of everything. Four components, one contract. Starting with Span Chain — available now.

$ curl localhost:4001/api/runs/agent-run-7f3a/verify \
    -H "Authorization: Bearer <token>"
{
  "run_id": "agent-run-7f3a",
  "verified": true,
  "span_count": 42,
  "error": null
}

One platform. Four components.

THE PIPELINE — ONE TASK, FOUR STATIONS

01 · PLAN · Switchyard: An issue becomes a validated Task Manifest (context, permissions, budget).
02 · RUN · Murmur: The manifest runs as a supervised session (event-sourced, recoverable).
03 · CONTAIN · Shroud: Execution stays inside the box (allowlist, budget, Shield gates).
04 · RECORD · Span Chain: Every step lands in a hash-chained trail (verifiable, replayable, evaluable).

Logs are claims.
Not evidence.

If someone with database access rewrites your agent's prompt history, LangSmith won't catch it. Langfuse won't catch it. Nobody will.

That's why the audit layer ships first.

SPAN CHAIN

Logs you can verify.
Runs you can replay.

Span Chain is the audit layer. Every LLM call, tool use, and decision your agent makes lands in an append-only, SHA-256 hash-chained ledger — then becomes raw material for deterministic replay, structural comparison, and evals. Elixir/OTP underneath; OpenTelemetry on the wire.

HOW IT WORKS — FROM SPAN TO RECORD IN THREE STEPS

01 · INGEST

Every span arrives via OTLP HTTP.

Standard OpenTelemetry wire format. Python + TypeScript SDKs included. The SDK stays dumb — all logic lives in the backend.
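
To make "standard OpenTelemetry wire format" concrete, here is a rough sketch that builds a one-span OTLP/HTTP JSON export request using only the Python standard library. The `/v1/traces` path is the OTLP default; the host, port, service name, and the `agent.run_id` attribute are assumptions for illustration, not documented Span Chain values. In practice the included SDKs, or any OTel exporter, would handle this.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_payload(span_name: str, run_id: str) -> dict:
    """Build a one-span OTLP/JSON trace export request."""
    now_ns = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "demo-agent"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "example.manual"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now_ns),
                    "endTimeUnixNano": str(now_ns + 1_000_000),
                    # "agent.run_id" is a hypothetical attribute name.
                    "attributes": [
                        {"key": "agent.run_id", "value": {"stringValue": run_id}},
                    ],
                }],
            }],
        }]
    }


def send(payload: dict, endpoint: str = "http://localhost:4001/v1/traces") -> int:
    """POST the payload. The endpoint is an assumption, not a documented URL."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Nothing here computes a hash or links spans together, which matches the "SDK stays dumb" idea: the client only reports what happened.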

02 · CHAIN

SHA-256 hash links to the previous.

Each entry includes the hash of the one before it. Append-only — historic spans can be read but never edited or removed without detection.

03 · VERIFY

verify_ledger catches any tampered entry.

Instant chain re-walk. A single modified byte anywhere in history fails verification. Detection is not best-effort — it's mathematical.

span#33ce → span#71b8 → span#9e22
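
The chain and verify steps can be illustrated with a toy ledger. This is a minimal sketch of the general hash-chain technique, not Span Chain's implementation: the entry layout, the all-zero genesis hash, and the function names `append` and `verify_chain` are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with canonical JSON of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Add an entry that commits to everything before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_chain(ledger: list):
    """Re-walk the chain; return (ok, index of the first bad entry or None)."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False, i
        prev = entry["hash"]
    return True, None
```

Editing any stored payload changes its recomputed hash, so the walk stops at that entry. Deleting an entry from the middle breaks the `prev` link of the entry that followed it.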

SPAN CHAIN · FUNCTIONS

Built for evidence. Not just visibility.

A complete audit, replay, and eval layer — without giving up the OpenTelemetry ecosystem you already use.

Cryptographically verifiable audit trail

SHA-256 hash chain. verify_ledger detects any modification on a chain re-walk. When EU AI Act audits of Annex III high-risk systems begin (2027), a verifiable record like this is what auditors are likely to ask for.

For strict regulatory compliance, external Time Stamping Authority anchoring (RFC 3161) is available in the Enterprise tier.

VCR deterministic replay

Record any run as a cassette. Replay with a new model or prompt. Compare structural changes exactly.
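
As a rough picture of the cassette idea, the sketch below records each call's result under a hash of its request and serves it back in replay mode without invoking the live function. The `Cassette` class, its `mode` flag, and the request shape are hypothetical; the actual cassette format is not described here.

```python
import hashlib
import json


class Cassette:
    """Record call results by request hash; replay them without re-calling."""

    def __init__(self):
        self.tape = {}

    @staticmethod
    def key(request: dict) -> str:
        canon = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canon.encode()).hexdigest()

    def call(self, request: dict, live_fn, mode: str = "record"):
        k = self.key(request)
        if mode == "replay":
            # A KeyError here means the run asked for something never recorded,
            # i.e. it diverged from the original.
            return self.tape[k]
        result = live_fn(request)
        self.tape[k] = result
        return result
```

Because replay never reaches `live_fn`, a recorded step costs nothing to re-run, and a missing key points at the exact call where a new model or prompt took a different path.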

Structural run comparison

Span tree diff between any two runs. See the exact deviation point — not just which run was slower.
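
One simple way to find a deviation point is to walk two span trees in parallel and stop at the first mismatch. The sketch below assumes each span is a dict with a `name` and an optional `children` list, and compares names only; a real diff would likely look at attributes as well.

```python
def first_deviation(a: dict, b: dict, path=()):
    """Return the path to the first structural difference, or None if equal.

    Each span is assumed to look like {"name": str, "children": [span, ...]}.
    """
    if a["name"] != b["name"]:
        return path + (f'{a["name"]} != {b["name"]}',)
    here = path + (a["name"],)
    kids_a, kids_b = a.get("children", []), b.get("children", [])
    for child_a, child_b in zip(kids_a, kids_b):
        found = first_deviation(child_a, child_b, here)
        if found is not None:
            return found
    if len(kids_a) != len(kids_b):
        return here + (f"child count {len(kids_a)} != {len(kids_b)}",)
    return None
```

The returned tuple reads as a breadcrumb from the root span down to the first place the two runs disagree.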

LLM-as-judge evals

Score any run against a 24-dimension quality rubric. Evals read straight from traces — no separate harness — and every judgment is itself a traced run.

OTLP native

Standard OpenTelemetry protocol. Python SDK + TypeScript SDK. Any OTel-compatible client connects in minutes.

Real-time Trail UI

Watch spans arrive as they happen and drill from a full trace down to a single span. React front end on an Elixir backbone.

Built for sustained load

Broadway ingestion pipeline with back-pressure built in. Stress-tested at 571 spans/second — zero corrupted entries.

Per-run process isolation

Each run lives in its own SessionGenServer (~2 KB heap). A million concurrent runs without interference. Built on the OTP actor model.

POSITIONING

Not another observability tool.

Span Chain is the audit layer. Built for teams that need a verifiable record, not just visibility.

                    | LangSmith / Langfuse                   | GhostFactory Span Chain
Question answered   | What did the agent do?                 | What did the agent do? Verifiable.
Data guarantee      | None                                   | SHA-256 hash-chain
Replay              | Session replay                         | Deterministic VCR + structural diff
Debug cost          | Every retry = LLM call                 | Replay from cassette = $0
Buyer               | Developer                              | Developer without own framework + compliance
Security guarantee  | None: mutable traces can be rewritten  | Append-only, hash-chained, tamper-evident
Architecture        | Stateless API + DB                     | Elixir/OTP actor model (isolated per-run)

Span Chain isn't a LangSmith competitor. It's the tamper-evident record layer that sits beneath your observability tool.

ROADMAP

The rest of the platform.

Three stations, shipping in pipeline order. One contract — the Task Manifest — carries a task from issue to merged PR, and every station along the way emits spans into the chain.

SWITCHYARD

A ticket goes in.
A verifiable work order comes out.

Switchyard is the control plane — deterministic planning and dispatch. It turns an issue into a Task Manifest: a complete, content-addressed contract that says exactly what an agent may do, with what context, at what cost. Models propose; the plane validates.

F-01

Intake gate

A checklist, not a model. No definition of done, no acceptance criteria, no labels — the issue bounces back as "needs spec" before a single token is spent.
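
A checklist gate of this kind can be a few lines of plain code. In this sketch the field names, the "accepted" status, and the rule that any single missing field bounces the issue are assumptions; only the "needs spec" label comes from the text above.

```python
# Assumed issue field names for the three checklist items.
REQUIRED_FIELDS = ("definition_of_done", "acceptance_criteria", "labels")


def intake_gate(issue: dict) -> dict:
    """Pure checklist: no model call, and the same input always gives the same answer."""
    missing = [field for field in REQUIRED_FIELDS if not issue.get(field)]
    if missing:
        return {"status": "needs spec", "missing": missing}
    return {"status": "accepted", "missing": []}
```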

F-02

ROMA planner

An LLM drafts the task graph; Switchyard accepts only a validated artifact — schema-checked, acyclic, with file ownership assigned per node so parallel work can't collide.
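
The acyclicity and ownership checks can be sketched with the standard library's `graphlib`. The node shape (`deps`, `files`) and the rule that each file has exactly one owning node are assumptions for illustration; the real schema is not shown here.

```python
from graphlib import CycleError, TopologicalSorter


def validate_task_graph(nodes: dict) -> list:
    """Return a list of problems; an empty list means the graph is accepted.

    nodes maps node id -> {"deps": [ids], "files": [paths]} (assumed shape).
    """
    problems = []
    for node_id, node in nodes.items():
        for dep in node.get("deps", []):
            if dep not in nodes:
                problems.append(f"{node_id}: unknown dependency {dep}")
    try:
        graph = {n: set(v.get("deps", [])) for n, v in nodes.items()}
        TopologicalSorter(graph).prepare()  # raises CycleError on a cycle
    except CycleError:
        problems.append("graph contains a cycle")
    owner = {}
    for node_id, node in nodes.items():
        for path in node.get("files", []):
            if path in owner:
                problems.append(f"{path}: owned by both {owner[path]} and {node_id}")
            owner.setdefault(path, node_id)
    return problems
```

The point of the pattern is that the model's draft is treated as untrusted input: it is either accepted whole or rejected with a concrete list of reasons.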

F-03

Context assembler

Capabilities — skills, hooks, templates, commands — come from a structured registry: deterministic lookup, same issue, same tools. Project knowledge and issue history come from retrieval.

F-04

Prompt compiler

A versioned template plus assembled slots compile into the final prompt. Which template, which version, which context — all recorded, nothing improvised.
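
A minimal sketch of template-plus-slots compilation, using `string.Template` from the standard library. The registry, the template name and version, and the slot names are hypothetical; the idea shown is only that the output records which template, which version, and which slots produced the prompt.

```python
import hashlib
from string import Template

# Hypothetical template registry keyed by (name, version).
TEMPLATES = {
    ("fix-bug", "1.2.0"): Template("Issue: $issue\nContext:\n$context\nDone when: $dod"),
}


def compile_prompt(name: str, version: str, slots: dict) -> dict:
    """Fill a pinned template; substitute() raises KeyError on a missing slot."""
    prompt = TEMPLATES[(name, version)].substitute(slots)
    return {
        "template": name,
        "version": version,
        "slots": sorted(slots),
        "prompt": prompt,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
    }
```

Using `substitute` rather than `safe_substitute` means an unfilled slot fails loudly instead of leaking a literal `$context` into the prompt.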

F-05

Task Manifest

The contract between planes: issue ref, compiled prompt, pinned context (document IDs + hashes), write-allowlist, budget, model and effort, escalation policy. Content-addressed — so every run is reproducible.
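
Content addressing usually means deriving the identifier from the bytes of the content itself. The sketch below hashes a canonical JSON form of a manifest; the field names mirror the list above, but the exact schema, the values, and the `sha256:` prefix are illustrative assumptions.

```python
import hashlib
import json


def manifest_id(manifest: dict) -> str:
    """Content address: SHA-256 over canonical JSON (sorted keys, no whitespace)."""
    canon = json.dumps(manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canon.encode("utf-8")).hexdigest()


# Illustrative placeholder values only.
manifest = {
    "issue_ref": "example/repo#12",
    "compiled_prompt": "Fix the failing test in parser.py",
    "pinned_context": [{"doc_id": "doc-1", "sha256": "ab" * 32}],
    "write_allowlist": ["src/parser.py", "tests/*"],
    "budget": {"max_tokens": 50000},
    "model": "example-model",
    "effort": "medium",
    "escalation_policy": {"max_review_loops": 3},
}
```

Two manifests with the same fields get the same ID regardless of key order, and changing any field, even the effort level, yields a different ID.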

LLMs only at the leaves. Orchestration stays deterministic.

MURMUR

Thousands of agent sessions.
One supervision tree.

Murmur is the runtime — every agent session runs as an isolated, supervised process on the BEAM. Crashes are expected, contained, and recovered from. State lives in an event log, not inside the process that might die holding it.

F-01

Process per session

Each run gets its own lightweight process under a supervision tree. One session crashing takes down exactly one session — never the system.

F-02

Event-sourced state

Every state transition is persisted as it happens. A restart replays the log and picks up where work stopped — no orphaned runs, no lost progress.

F-03

Explicit session lifecycle

Pending, active, awaiting input, complete, stale — explicit states, not implied ones. Instant acknowledgment, asynchronous work, supervised timeouts.
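
Event-sourced state and an explicit lifecycle fit together naturally: the current state is whatever you get by replaying the log through a transition table. In this sketch the five state names come from the text above, while the event names and the allowed transitions are assumptions.

```python
# Assumed transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("pending", "started"): "active",
    ("active", "input_requested"): "awaiting_input",
    ("awaiting_input", "input_received"): "active",
    ("active", "finished"): "complete",
    ("active", "timed_out"): "stale",
    ("awaiting_input", "timed_out"): "stale",
}


def step(state: str, event: str) -> str:
    """Advance one step; an undefined transition is an error, not a silent no-op."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None


def rebuild(event_log: list) -> str:
    """Recover the current state after a restart by replaying the persisted log."""
    state = "pending"
    for event in event_log:
        state = step(state, event)
    return state
```

Since the state is derived from the log rather than held in memory, the process that was running the session can die without taking the session's progress with it.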

F-04

Human escalation built in

Review loops are bounded. When the cap is hit, the session moves to awaiting-input and a human decides. Escalation is a state in the machine — not an exception buried in a log.

F-05

Parallel fan-out

Independent tasks run concurrently. Tasks whose write sets overlap get serialized by file ownership — instead of merging into conflicts.
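
Serializing by write set can be sketched as a greedy grouping: tasks whose files do not overlap share a wave and run concurrently, and a task that touches an already-claimed file waits for a later wave. The task shape and the greedy strategy are assumptions, and the sketch ignores dependency ordering between tasks.

```python
def schedule_waves(tasks: dict) -> list:
    """Greedily group tasks into waves whose write sets are pairwise disjoint.

    tasks maps task id -> iterable of file paths it writes (assumed shape).
    Tasks in one wave can run concurrently; waves run one after another.
    """
    waves = []  # each wave is (task_ids, files_claimed_so_far)
    for task_id, files in tasks.items():
        for ids, claimed in waves:
            if claimed.isdisjoint(files):
                ids.append(task_id)
                claimed.update(files)
                break
        else:
            # No existing wave could take this task without a collision.
            waves.append(([task_id], set(files)))
    return [ids for ids, _ in waves]
```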

Let it crash. Never let it lose work.

SHROUD

The agent can do anything.
Inside the box.

Shroud is the sandbox — an isolated execution environment where an agent holds exactly the permissions its manifest grants. Read anything, write only the allowlist, spend only the budget. Nothing ships until the gates pass.

F-01

Isolated workspace per task

Every task executes in its own worktree, on its own branch. The blast radius of a bad run is one branch — and the road ahead is microVM-level isolation.

F-02

Manifest-enforced permissions

The Task Manifest is law at the boundary: write-allowlist, tool access, and token budget are enforced by the sandbox — not requested politely in the prompt.
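
Enforcement at the boundary means the check lives in code the agent cannot talk its way around. Below is a minimal sketch of an allowlist and budget check; the `Sandbox` class, the glob-style patterns, and the `BudgetExceeded` exception are hypothetical, and a real sandbox would enforce this at the filesystem or process level rather than in a helper class.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


class BudgetExceeded(Exception):
    pass


class Sandbox:
    """Enforce a write-allowlist and token budget taken from a manifest."""

    def __init__(self, write_allowlist: list, max_tokens: int):
        self.write_allowlist = write_allowlist
        self.tokens_left = max_tokens

    def may_write(self, path: str) -> bool:
        p = PurePosixPath(path)
        if p.is_absolute() or ".." in p.parts:
            return False  # refuse absolute paths and parent-directory escapes
        return any(fnmatch(path, pattern) for pattern in self.write_allowlist)

    def spend(self, tokens: int) -> None:
        if tokens > self.tokens_left:
            raise BudgetExceeded(f"need {tokens}, have {self.tokens_left}")
        self.tokens_left -= tokens
```

Note that `fnmatch`'s `*` also matches path separators, so a pattern like `tests/*` covers nested directories in this sketch.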

F-03

Shield — layer one

Deterministic gates run first: lint, typecheck, the test suite, AST and pattern rules. No model ever reviews code the machine already rejected.

F-04

Shield — layer two

An independent model reviews the change against the acceptance criteria — full files, not diff lines. The verdict is structured JSON ({verdict, findings, dod_check}), never free text, so reviews themselves can be evaluated.
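
A structured verdict is only useful if malformed replies are rejected. The sketch below parses a reviewer reply and checks the three keys named above; the allowed verdict values and the types of `findings` and `dod_check` are assumptions, since the text only gives the key names.

```python
import json

ALLOWED_VERDICTS = {"approve", "request_changes"}  # assumed values


def parse_verdict(raw: str) -> dict:
    """Parse a reviewer reply and reject anything that is not the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on free text
    if not isinstance(data, dict) or set(data) != {"verdict", "findings", "dod_check"}:
        raise ValueError("expected exactly the keys verdict, findings, dod_check")
    if data["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']!r}")
    if not isinstance(data["findings"], list):
        raise ValueError("findings must be a list")
    if not isinstance(data["dod_check"], dict):
        raise ValueError("dod_check must be an object")
    return data
```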

F-05

Every run is a cassette

Manifest in, result out — both on disk, both hashed. Any execution replays deterministically in Span Chain's VCR.

Autonomy inside the boundary. Determinism at it.

Every station emits spans. Span Chain ships first because the rest of the platform stands on a verifiable record.

plan → run → contain → record · one manifest, one chain

Ready to keep a record of what your agents did?

ghostfactory-art / spanchain · MIT · Built on the OTP actor model

Used by teams shipping AI agents in production — before they need a lawyer.