Runtime Hosting
Cartridge runtime, host runtime, relay topology, and capability routing
Architecture
graph TD
Engine["Engine / DAG Executor"]
Switch["RelaySwitch<br/>Routes REQ by cap URN<br/>Routes continuations by RID<br/>Tracks peer invocations"]
Engine --> Switch
subgraph Host0 ["Host 0"]
RM0["RelayMaster"] --> RS0["RelaySlave"]
RS0 --> PH0["CartridgeHostRuntime"]
PH0 --> P0A["Cartridge A"]
PH0 --> P0B["Cartridge B"]
end
subgraph Host1 ["Host 1"]
RM1["RelayMaster"] --> RS1["RelaySlave"]
RS1 --> PH1["CartridgeHostRuntime"]
PH1 --> P1A["Cartridge C"]
end
subgraph Host2 ["Host 2"]
RM2["RelayMaster"] --> RS2["RelaySlave"]
RS2 --> PH2["CartridgeHostRuntime"]
PH2 --> P2A["Cartridge D"]
PH2 --> P2B["Cartridge E"]
end
Switch --> RM0
Switch --> RM1
Switch --> RM2
- CartridgeRuntime runs inside each cartridge process — handles handler registration, handshake, and frame I/O.
- CartridgeHostRuntime manages one or more cartridge processes — spawning, routing, health monitoring.
- RelaySwitch routes requests across multiple hosts based on cap URN matching and specificity ranking.
- Relay pairs (RelaySlave/RelayMaster) bridge the Unix socket between host and switch, intercepting control frames.
Request Flow
A single capability invocation flows through the full stack:
sequenceDiagram
participant E as Engine
participant SW as RelaySwitch
participant RS as RelaySlave
participant PH as CartridgeHost
participant P as Cartridge
E->>SW: REQ(cap)
Note over SW: assign XID, match cap → master
SW->>RS: REQ(cap, xid)
RS->>PH: REQ(cap, xid)
Note over PH: match cap → cartridge
PH->>P: REQ(cap, xid)
Note over P: handler executes
P-->>PH: STREAM_START
P-->>PH: CHUNK(data)
P-->>PH: CHUNK(data)
P-->>PH: STREAM_END
P-->>PH: END
PH-->>RS: (forwarded)
RS-->>SW: (forwarded)
SW-->>E: (forwarded)
Peer Invocation Flow
When Cartridge A calls a capability provided by Cartridge B:
sequenceDiagram
participant PA as Cartridge A
participant PHA as CartridgeHost A
participant RSA as RelaySlave A
participant SW as RelaySwitch
participant PB as Cartridge B
PA->>PHA: REQ(cap) — no XID
Note over PHA: record RID in outgoing_rids
PHA->>RSA: REQ(cap)
RSA->>SW: REQ(cap)
Note over SW: assign XID, record peer
SW->>PB: REQ(cap, xid)
PB-->>SW: response frames
SW-->>RSA: response (by XID)
RSA-->>PHA: response
Note over PHA: route by RID to Cartridge A
PHA-->>PA: PeerResponse
CartridgeRuntime
CartridgeRuntime is the cartridge-side entry point. A cartridge creates a runtime with its manifest, registers handlers for each capability, and calls run(). The runtime handles everything else — handshake, frame parsing, request routing, response framing, heartbeat replies.
let manifest = build_manifest();
let mut runtime = CartridgeRuntime::with_manifest(manifest);
runtime.register_op("cap:in=...;out=...;op=generate", || {
Box::new(MyGenerateOp::default())
});
runtime.run().unwrap();
Mode Detection
flowchart TD
A["runtime.run()"] --> B{CLI arguments?}
B -->|No args| C["Cartridge CBOR Mode<br/>Binary frames on stdin/stdout"]
B -->|Has args| D{First arg?}
D -->|"manifest"| E["Print JSON manifest"]
D -->|"--help"| F["Print subcommands"]
D -->|slug match| G["CLI Mode<br/>Parse args, run handler,<br/>output NDJSON"]
D -->|no match| H["UnknownSubcommand error"]
A single binary serves as both a Bifaci cartridge (when launched by the host with no args) and a standalone CLI tool (when run directly by a user).
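The mode decision above can be sketched as a small dispatcher. The Mode enum, slug list, and function name below are illustrative, not the runtime's actual API; the real run() inspects the process arguments itself.

```rust
// Sketch of the mode decision (names illustrative; the real runtime
// inspects std::env::args() inside run()).
#[derive(Debug, PartialEq)]
enum Mode {
    CartridgeCbor,   // launched by the host: binary frames on stdin/stdout
    PrintManifest,
    Help,
    Cli(String),     // run a handler directly, output NDJSON
    Unknown(String),
}

fn detect_mode(args: &[&str], slugs: &[&str]) -> Mode {
    match args.first() {
        None => Mode::CartridgeCbor,
        Some(&"manifest") => Mode::PrintManifest,
        Some(&"--help") => Mode::Help,
        Some(a) if slugs.contains(a) => Mode::Cli(a.to_string()),
        Some(a) => Mode::Unknown(a.to_string()),
    }
}

fn main() {
    let slugs = ["generate"];
    assert_eq!(detect_mode(&[], &slugs), Mode::CartridgeCbor);
    assert_eq!(detect_mode(&["manifest"], &slugs), Mode::PrintManifest);
    assert_eq!(detect_mode(&["generate"], &slugs), Mode::Cli("generate".into()));
    assert_eq!(detect_mode(&["bogus"], &slugs), Mode::Unknown("bogus".into()));
}
```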
Handler Registration
Two methods register handlers:
- register_op(cap_urn, factory) — registers a factory closure that creates a fresh Op instance per request. Useful when handlers hold per-request state.
- register_op_type::<T>(cap_urn) — registers a type implementing Op + Default. Instances are created via T::default().
When a REQ arrives, the runtime finds the handler by matching the request’s cap URN against registered cap URNs using is_dispatchable (see Dispatch). When multiple handlers match, the runtime ranks by specificity.
Cartridge Mode Main Loop
In cartridge CBOR mode, run() executes:
- Handshake — exchange HELLO frames, negotiate limits.
- Spawn writer task — a tokio task drains an mpsc channel and writes frames to stdout. This is the only thing that touches stdout.
- Read loop — read frames from stdin:
  - REQ: Look up handler, spawn handler task with InputPackage and OutputStream.
  - Heartbeat: Respond inline with same ID.
  - STREAM_START / CHUNK / STREAM_END / END: Route to the handler task owning the request ID.
  - stdin EOF: Break and shut down.
graph LR
HA["Handler A"] --> CH["mpsc channel"]
HB["Handler B"] --> CH
HC["Handler C"] --> CH
CH --> WT["Writer Task"] --> OUT["stdout"]
All handlers send frames through the channel. The writer serializes each frame to CBOR, prepends the 4-byte length, and writes to stdout.
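The framing step can be sketched as follows. Big-endian is an assumption here, and the payload is opaque bytes standing in for serialized CBOR:

```rust
// Sketch of the writer's framing step: serialize, prepend a 4-byte length,
// write. Endianness is assumed; payload stands in for a CBOR-encoded frame.
fn frame_bytes(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Reader side: returns (payload, remaining buffer) once a frame is complete.
fn read_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if buf.len() < 4 + len {
        return None;
    }
    Some((&buf[4..4 + len], &buf[4 + len..]))
}

fn main() {
    let framed = frame_bytes(b"hello");
    assert_eq!(framed.len(), 9); // 4-byte prefix + 5 payload bytes
    let (payload, rest) = read_frame(&framed).unwrap();
    assert_eq!(payload, &b"hello"[..]);
    assert!(rest.is_empty());
    assert!(read_frame(&framed[..6]).is_none()); // partial frame: keep buffering
}
```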
Tokio Threading Model
graph LR
subgraph Tokio Workers
RT["Reader Task<br/>(stdin → dispatch)"]
WT["Writer Task<br/>(channel → stdout)"]
H1["Handler Task 1"]
H2["Handler Task 2"]
end
subgraph Blocking Pool
BL["spawn_blocking<br/>(FFI model load)"]
end
RT -->|route frames| H1
RT -->|route frames| H2
H1 -->|send frames| CH["mpsc channel"]
H2 -->|send frames| CH
BL -.->|ProgressSender| CH
CH --> WT
If a handler blocks a tokio worker (e.g., synchronous FFI model load), the writer task cannot flush frames and the engine triggers its 120-second activity timeout:
sequenceDiagram
participant H as Handler
participant CH as mpsc Channel
participant WT as Writer Task
participant E as Engine
H->>CH: progress(0.25, "Loading model...")
Note over H: llama_model_load() — blocks tokio worker
Note over WT: Needs tokio worker — none available
Note over CH: Frames stuck in channel
Note over E: 30s... 60s... 90s...
E->>E: 120s — Activity Timeout!
run_with_keepalive() solves this by moving blocking work to tokio::task::spawn_blocking (a dedicated thread pool):
flowchart TD
Q{"Blocking work needs<br/>granular progress?"}
Q -->|"No — just keep alive"| RWK["run_with_keepalive()<br/>Fixed value every 30s"]
Q -->|"Yes — per-token, per-step"| PS["ProgressSender<br/>Emit from inside closure"]
RWK --> SB1["spawn_blocking(closure)"]
PS --> SB2["spawn_blocking(move || {<br/> ps.progress(...) <br/>})"]
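The pattern can be sketched with plain std threads; the real run_with_keepalive pairs tokio::select! with spawn_blocking, and the names, the poll interval, and the tick callback below are illustrative:

```rust
// Sketch of the run_with_keepalive pattern using std threads. on_tick stands
// in for the fixed Progress frame emitted every 30s by the real runtime.
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

fn run_with_keepalive<T: Send + 'static>(
    work: impl FnOnce() -> T + Send + 'static,
    mut on_tick: impl FnMut(),
) -> T {
    let (done_tx, done_rx) = mpsc::channel();
    // Blocking work runs on its own thread, like spawn_blocking's pool.
    let handle = thread::spawn(move || {
        let result = work();
        let _ = done_tx.send(());
        result
    });
    // Keep emitting ticks until the blocking work signals completion.
    loop {
        match done_rx.recv_timeout(Duration::from_millis(10)) {
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            Err(RecvTimeoutError::Timeout) => on_tick(),
        }
    }
    handle.join().unwrap()
}

fn main() {
    let mut ticks = 0;
    let answer = run_with_keepalive(
        || {
            thread::sleep(Duration::from_millis(50)); // "FFI model load"
            42
        },
        || ticks += 1,
    );
    assert_eq!(answer, 42);
    assert!(ticks >= 1); // keepalive fired while the work blocked
}
```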
Op Trait
Handlers implement the Op<()> trait:
#[async_trait]
pub trait Op<T>: Send + Sync {
async fn perform(&self, dry: &mut DryContext, wet: &mut WetContext) -> OpResult<T>;
fn metadata(&self) -> OpMetadata;
}
flowchart LR
REQ["REQ frame"] --> FH["find_handler<br/>(is_dispatchable)"]
FH --> F["OpFactory"]
F --> OP["Op instance"]
OP --> P["perform()"]
P -->|Ok| CL["close() → STREAM_END + END"]
P -->|Err| ER["ERR frame"]
The runtime calls dispatch_op(), which creates a Request, inserts it into a WetContext, calls op.perform(), and on success closes the output stream (STREAM_END + END). On error, it sends an ERR frame.
The WetContext holds an Arc<Request> under the key "request". The Request struct bundles:
- take_input() → InputPackage: All input argument streams. Can only be called once.
- output() → &OutputStream: For emitting response data, progress, and log messages.
- peer() → &dyn PeerInvoker: For calling other cartridges’ capabilities.
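The take-once semantics of take_input() can be modeled with Option::take behind a Mutex; the types below are stand-ins for the real Request and InputPackage:

```rust
// Sketch of take-once input access. Request and InputPackage are stand-ins.
use std::sync::Mutex;

struct InputPackage {
    streams: Vec<(String, Vec<u8>)>, // (media_urn, bytes) pairs
}

struct Request {
    input: Mutex<Option<InputPackage>>,
}

impl Request {
    fn take_input(&self) -> Option<InputPackage> {
        // First call yields the package; every later call returns None.
        self.input.lock().unwrap().take()
    }
}

fn main() {
    let req = Request {
        input: Mutex::new(Some(InputPackage { streams: Vec::new() })),
    };
    assert!(req.take_input().is_some_and(|p| p.streams.is_empty()));
    assert!(req.take_input().is_none()); // second call: already taken
}
```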
Handler Data Access Pattern
flowchart LR
WET["WetContext"] -->|get_arc| REQ["Request"]
REQ -->|take_input| IP["InputPackage"]
IP -->|collect_streams| ST["Vec<(media_urn, bytes)>"]
ST -->|require_stream| DATA["Argument bytes"]
ST -->|find_stream_str| OPT["Optional text"]
REQ -->|output| OS["OutputStream"]
OS -->|emit_cbor| OUT["Response"]
OS -->|progress| LOG["LOG frame"]
OutputStream Auto-Framing
The OutputStream automatically injects STREAM_START before the first chunk and STREAM_END on close:
sequenceDiagram
participant H as Handler
participant OS as OutputStream
participant W as Wire
H->>OS: write(data) or emit_cbor(value)
Note over OS: ensure_started()
OS->>W: STREAM_START (auto)
OS->>W: CHUNK (data)
H->>OS: write(more)
OS->>W: CHUNK (more)
Note over H: handler returns Ok(())
Note over OS: runtime calls close()
OS->>W: STREAM_END
Note over OS: runtime sends
OS->>W: END
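The auto-framing rule above reduces to a lazy start flag. The Frame enum and field names below are illustrative, and since the END frame is sent by the runtime after close(), it is not modeled here:

```rust
// Sketch of OutputStream auto-framing: inject STREAM_START before the first
// chunk, STREAM_END on close. Types are illustrative.
#[derive(Debug, PartialEq)]
enum Frame {
    StreamStart,
    Chunk(Vec<u8>),
    StreamEnd,
}

struct OutputStream {
    started: bool,
    wire: Vec<Frame>, // stands in for the mpsc channel to the writer task
}

impl OutputStream {
    fn new() -> Self {
        Self { started: false, wire: Vec::new() }
    }

    fn ensure_started(&mut self) {
        if !self.started {
            self.wire.push(Frame::StreamStart);
            self.started = true;
        }
    }

    fn write(&mut self, data: &[u8]) {
        self.ensure_started();
        self.wire.push(Frame::Chunk(data.to_vec()));
    }

    fn close(&mut self) {
        // Never started means nothing to frame (this sketch's choice).
        if self.started {
            self.wire.push(Frame::StreamEnd);
        }
    }
}

fn main() {
    let mut os = OutputStream::new();
    os.write(b"a");
    os.write(b"b");
    os.close();
    assert_eq!(os.wire.first(), Some(&Frame::StreamStart));
    assert_eq!(os.wire.last(), Some(&Frame::StreamEnd));
    assert_eq!(os.wire.len(), 4); // START, CHUNK, CHUNK, STREAM_END
}
```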
Peer Invocation from Handler
When a handler calls another cartridge’s capability via peer():
sequenceDiagram
participant PA as Cartridge A
participant PHA as CartridgeHost A
participant RSA as RelaySlave A
participant SW as RelaySwitch
participant PB as Cartridge B
Note over PA,PB: Request Phase
PA->>PHA: REQ(cap) — no XID
Note over PHA: record RID in outgoing_rids
PHA->>RSA: REQ(cap)
RSA->>SW: REQ(cap)
Note over SW: assign XID, record peer
SW->>PB: REQ(cap, xid)
PA->>PHA: STREAM_START
PA->>PHA: CHUNK(arg)
PA->>PHA: STREAM_END
PA->>PHA: END
Note over PHA,PB: (forwarded through relay)
Note over PA,PB: Response Phase
PB-->>SW: LOG(progress)
SW-->>RSA: LOG(progress)
RSA-->>PHA: LOG(progress)
PHA-->>PA: LOG(progress)
PB-->>SW: CHUNK(result)
SW-->>RSA: CHUNK
RSA-->>PHA: CHUNK
PHA-->>PA: CHUNK
PB-->>SW: END
SW-->>RSA: END
RSA-->>PHA: END
PHA-->>PA: END
Progress Path
graph TD
PH["Cartridge handler<br/>output.progress(0.5, ...)"] --> LOG2["LOG frame on stdout"]
LOG2 --> PHR["CartridgeHostRuntime<br/>(pass-through)"]
PHR --> RS2["RelaySlave → RelayMaster<br/>(pass-through)"]
RS2 --> EF["execute_fanin<br/>→ CapProgressFn callback"]
EF --> CI["cap_interpreter<br/>→ map to step range"]
CI --> DB["SQLite database"]
DB --> UI["UI"]
Swift Equivalent
The Swift CartridgeRuntime in capdag-objc follows the same design. Key difference: Swift uses Task + DispatchSemaphore.wait() for handler dispatch — nesting another Task + semaphore inside perform() deadlocks the async executor. This constraint does not exist in Rust’s tokio model.
graph TD
subgraph Rust
R_CALL["run_with_keepalive()"] --> R_SB["spawn_blocking<br/>(blocking thread pool)"]
R_CALL --> R_SEL["tokio::select!<br/>interval(30s)"]
R_SEL -->|every 30s| R_PROG["Progress frame"]
R_SB -->|complete| R_DONE["select! exits"]
end
subgraph Swift
S_CALL["runWithKeepalive()"] --> S_TASK["Background Task<br/>loop { sleep(30s) }"]
S_CALL --> S_OP["Async operation"]
S_TASK -->|every 30s| S_PROG["Progress frame"]
S_OP -->|complete| S_CANCEL["Cancel background Task"]
end
CartridgeHostRuntime
CartridgeHostRuntime sits between a RelaySlave and one or more cartridge processes. It manages their full lifecycle — spawning, handshake, frame routing, health monitoring, and death handling.
Cartridge Registration
Two ways to connect cartridges:
- register_plugin(path, known_caps) — registers a binary for on-demand spawning. The cartridge is not started until a REQ arrives for one of its capabilities. known_caps enables provisional routing before the HELLO handshake reveals the actual manifest.
- attach_plugin(read, write) — attaches a pre-connected cartridge. Performs HELLO handshake and identity verification immediately.
Cartridge Spawning
sequenceDiagram
participant R as Relay
participant PH as CartridgeHostRuntime
participant P as Cartridge (new)
R->>PH: REQ(cap)
Note over PH: Cartridge not running — spawn on demand
PH->>P: spawn process (no args)
rect rgb(240, 248, 255)
Note over PH,P: Handshake
PH->>P: Hello (limits)
P->>PH: Hello (limits + manifest)
PH->>P: REQ(CAP_IDENTITY, nonce)
P->>PH: nonce echo
end
Note over PH: Update cap table from manifest
PH->>P: Forward original REQ(cap)
If HELLO fails, the cartridge is marked with hello_failed = true and will not be spawned again.
Frame Routing
flowchart TD
F["Frame from relay"] --> FT{frame_type?}
FT -->|REQ| CAP["Lookup cap_urn<br/>in cap_table"]
CAP --> RUN{Cartridge running?}
RUN -->|No| SP["Spawn on demand"]
SP --> FWD
RUN -->|Yes| FWD["Forward to cartridge<br/>Record (XID,RID) in<br/>incoming_rxids"]
FT -->|"STREAM_START<br/>CHUNK<br/>STREAM_END<br/>END<br/>ERR"| CONT["Lookup (XID,RID)<br/>in incoming_rxids"]
CONT --> FWD2["Forward to<br/>handling cartridge"]
Engine → Cartridge: REQ frames are routed by cap URN lookup in the cap table. Continuation frames (STREAM_START, CHUNK, etc.) are routed by (XID, RID) lookup.
Cartridge → Engine: HEARTBEAT is handled locally (never forwarded). REQ from a cartridge (peer invoke) is forwarded to the relay with the source cartridge recorded. Everything else is forwarded to the relay.
Self-Loop Routing
When a cartridge peer-invokes a cap on another cartridge on the same host, both incoming_rxids and outgoing_rids contain entries for the same request IDs. The host resolves this by frame type discrimination — request body frames route via incoming, peer response frames route via outgoing.
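One way to sketch this rule, assuming a frame-kind flag distinguishes request-body frames from peer responses (table shapes and names are simplified stand-ins for the real structures):

```rust
// Sketch of self-loop routing: the same RID may sit in both tables, so the
// host picks a table by frame kind, never by RID alone.
use std::collections::HashMap;

#[derive(Clone, Copy)]
enum Kind {
    RequestBody,  // STREAM_START / CHUNK / STREAM_END / END toward the handler
    PeerResponse, // response frames heading back to the peer caller
}

fn route<'a>(
    rid: u64,
    kind: Kind,
    incoming_rxids: &'a HashMap<u64, String>, // RID -> handling cartridge
    outgoing_rids: &'a HashMap<u64, String>,  // RID -> originating cartridge
) -> Option<&'a str> {
    match kind {
        Kind::RequestBody => incoming_rxids.get(&rid).map(String::as_str),
        Kind::PeerResponse => outgoing_rids.get(&rid).map(String::as_str),
    }
}

fn main() {
    // Same RID recorded in both tables during a same-host peer invocation.
    let incoming = HashMap::from([(7, "cartridge-b".to_string())]);
    let outgoing = HashMap::from([(7, "cartridge-a".to_string())]);
    assert_eq!(route(7, Kind::RequestBody, &incoming, &outgoing), Some("cartridge-b"));
    assert_eq!(route(7, Kind::PeerResponse, &incoming, &outgoing), Some("cartridge-a"));
}
```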
Heartbeat Health Monitoring
The host probes each running cartridge every 30 seconds. A heartbeat unanswered for 10 seconds marks the cartridge as unhealthy.
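The health rule can be sketched with plain second-resolution timestamps; the real host presumably tracks monotonic time, and only the 10-second constant comes from the text:

```rust
// Sketch of the heartbeat health rule: a probe that goes unanswered for more
// than 10s marks the cartridge unhealthy. Timestamps are plain seconds.
struct HeartbeatState {
    last_probe_at: u64, // seconds
    last_reply_at: u64, // seconds
}

const REPLY_TIMEOUT_SECS: u64 = 10;

fn is_unhealthy(hb: &HeartbeatState, now: u64) -> bool {
    // Unhealthy iff the latest probe is unanswered and older than the timeout.
    hb.last_reply_at < hb.last_probe_at
        && now.saturating_sub(hb.last_probe_at) > REPLY_TIMEOUT_SECS
}

fn main() {
    let waiting = HeartbeatState { last_probe_at: 100, last_reply_at: 95 };
    assert!(!is_unhealthy(&waiting, 105)); // still inside the 10s window
    assert!(is_unhealthy(&waiting, 111));  // unanswered past 10s
    let answered = HeartbeatState { last_probe_at: 100, last_reply_at: 100 };
    assert!(!is_unhealthy(&answered, 200)); // replied: healthy regardless of age
}
```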
Cartridge Death Handling
Three scenarios:
| Scenario | Behavior |
|---|---|
| Ordered shutdown | Clean up routing entries, no ERR frames |
| Unexpected death with pending requests | Send ERR frame for each pending request ID |
| Idle death | Clean up routing entries; next REQ triggers respawn (cartridge is re-spawned on demand) |
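The death paths in the table above can be sketched as one function: drain pending request IDs and, unless the shutdown was ordered, emit one ERR frame per pending request carrying the captured stderr message. All names here are illustrative:

```rust
// Sketch of death handling: ordered shutdown cleans up silently; unexpected
// death emits an ERR per pending request with the last stderr message.
use std::collections::HashSet;

#[derive(Debug)]
struct ErrFrame {
    rid: u64,
    message: String,
}

fn on_cartridge_death(
    pending: &mut HashSet<u64>, // pending request IDs for this cartridge
    last_death_message: &str,   // last stderr line captured by the host
    ordered_shutdown: bool,
) -> Vec<ErrFrame> {
    // Routing entries are cleaned up in every scenario.
    let rids: Vec<u64> = pending.drain().collect();
    if ordered_shutdown {
        return Vec::new(); // clean shutdown: no ERR frames
    }
    rids.into_iter()
        .map(|rid| ErrFrame { rid, message: last_death_message.to_string() })
        .collect()
}

fn main() {
    let mut pending = HashSet::from([1, 2]);
    let frames = on_cartridge_death(&mut pending, "segfault in model load", false);
    assert_eq!(frames.len(), 2);
    assert!(pending.is_empty());
    assert!(frames.iter().any(|f| f.rid == 1));
    assert!(frames.iter().all(|f| f.message == "segfault in model load"));

    let mut pending = HashSet::from([3]);
    assert!(on_cartridge_death(&mut pending, "", true).is_empty());
}
```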
stateDiagram-v2
[*] --> Registered : Discovery
Registered --> Spawning : First REQ for this cap
Spawning --> Handshaking : Process started
Handshaking --> Live : HELLO + identity OK
Handshaking --> Failed : HELLO failed
Live --> Live : Handle requests
Live --> Dead : Process died
Live --> Killed : Engine shutdown / idle timeout
Dead --> Spawning : Next REQ triggers respawn
Killed --> Registered : Available for respawn
Failed --> [*] : Permanent failure
On unexpected death:
sequenceDiagram
participant P as Cartridge
participant PH as CartridgeHostRuntime
participant E as Engine
Note over P: Process dies unexpectedly
P--xPH: stdout EOF
Note over PH: Detect death via reader task
PH->>PH: Read stderr (last_death_message)
loop For each pending request
PH->>E: ERR frame (death message)
end
Note over E: ExecutionError::CartridgeExecutionFailed<br/>or ExecutionError::HostError
Capability Advertisement
When capabilities change, the host rebuilds its aggregate manifest and sends a RelayNotify frame to the relay. This propagates through the switch so the engine’s view stays in sync.
Relay Topology
A RelaySlave/RelayMaster pair forms a transparent frame bridge over a Unix socket:
graph LR
SW["RelaySwitch"] <-->|Unix socket| RM["RelayMaster"]
RM <-->|Unix socket| RS["RelaySlave"]
RS <--> PH["CartridgeHostRuntime"]
style RM fill:#e8f4fd
style RS fill:#e8f4fd
All regular frames pass through unchanged. Only two frame types are intercepted:
sequenceDiagram
participant PH as CartridgeHostRuntime
participant RS as RelaySlave
participant RM as RelayMaster
participant SW as RelaySwitch
Note over PH,SW: RelayNotify (slave → master)
PH->>RS: Caps changed
RS->>RM: RelayNotify (manifest, limits)
Note over RM: Stores manifest — never forwarded
RM--xSW: (intercepted)
Note over PH,SW: RelayState (master → slave)
SW->>RM: RelayState (resource info)
RM->>RS: RelayState (payload)
Note over RS: Stores in resource_state — never forwarded
RS--xPH: (intercepted)
Note over PH,SW: Regular frames
PH->>RS: REQ / CHUNK / END / etc.
RS->>RM: (pass-through)
RM->>SW: (pass-through)
- RelayNotify (Slave → Master): Carries capability advertisements. Stored by the master, consumed by the switch for routing.
- RelayState (Master → Slave): Carries host resource information (model paths, GPU availability, security bookmarks). Stored by the slave, accessible by the cartridge host runtime.
Connection Lifecycle
sequenceDiagram
participant E as Engine
participant RM as RelayMaster
participant RS as RelaySlave
participant PH as CartridgeHostRuntime
Note over E,PH: 1. Socket creation
E->>RM: Create Unix socket pair
E->>RS: Other end of socket
Note over E,PH: 2. Initial advertisement
RS->>RM: RelayNotify (caps, limits)
Note over E,PH: 3. Identity verification
RM->>RS: REQ(CAP_IDENTITY)
RS->>PH: (forward)
PH-->>RS: nonce echo
RS-->>RM: (forward)
Note over E,PH: 4. Spawn reader tasks
Note over E,PH: 5. Bidirectional flow — relay is live
- Socket creation — the engine creates a Unix socket pair.
- Initial advertisement — the RelaySlave sends a RelayNotify on startup.
- Identity verification — REQ for CAP_IDENTITY flows through the relay chain; nonce echo response flows back.
- Reader tasks — both sides spawn reader tasks for ongoing frame forwarding.
- Bidirectional flow — the relay is live. Shutdown occurs when either side closes the socket.
RelaySwitch
The RelaySwitch is the central routing component. It connects to one or more relay masters and routes requests based on cap URN matching.
All methods take &self — interior mutability (RwLock, Mutex, AtomicU64) allows multiple concurrent DAG executions to route frames simultaneously.
Cap Routing
flowchart TD
REQ["REQ(cap_urn)"] --> PARSE["Parse cap URN"]
PARSE --> MATCH["Match against each<br/>master's cap set<br/>(is_dispatchable)"]
MATCH --> RANK{Multiple matches?}
RANK -->|Yes| SPEC["Rank by specificity"]
RANK -->|No| SKIP
SPEC --> SKIP{Master healthy?}
SKIP -->|No| ERR["RelaySwitchError::NoHandler"]
SKIP -->|Yes| XID["Assign XID<br/>(atomic increment)"]
XID --> REC["Record (XID, RID)<br/>in routing tables"]
REC --> FWD["Forward REQ to<br/>selected master"]
- Parse the cap field as a Cap URN.
- Match against each master’s capability set using is_dispatchable (contravariant input, covariant output).
- Rank by specificity if multiple masters match.
- Assign XID via atomic counter.
- Record routing in request_routing, rid_to_xid, and origin_map.
- Forward the REQ to the selected master.
XID Assignment
XIDs (routing IDs, key 13 in frames) are unsigned integers assigned by the switch at routing boundaries. Cartridges never see XIDs directly — they are purely infrastructure.
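Since all RelaySwitch methods take &self, XID assignment fits an atomic counter; a minimal sketch, with an illustrative field name:

```rust
// Sketch of XID assignment: an atomic counter behind &self, matching the
// switch's interior-mutability design.
use std::sync::atomic::{AtomicU64, Ordering};

struct RelaySwitch {
    next_xid: AtomicU64,
}

impl RelaySwitch {
    fn assign_xid(&self) -> u64 {
        // &self only: concurrent DAG executions can route simultaneously.
        self.next_xid.fetch_add(1, Ordering::Relaxed)
    }
}

fn main() {
    let sw = RelaySwitch { next_xid: AtomicU64::new(1) };
    assert_eq!(sw.assign_xid(), 1);
    assert_eq!(sw.assign_xid(), 2); // monotonically increasing, never reused
}
```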
Request Tracking
| Table | Key | Purpose |
|---|---|---|
| request_routing | (XID, RID) | Route continuation frames to the right master |
| rid_to_xid | RID | Look up XID from RID |
| peer_requests | (XID, RID) | Track peer invocations for cleanup |
| origin_map | (XID, RID) | Route responses back to source (engine or cartridge) |
Terminal frames (END, ERR) trigger cleanup of all tables.
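Cleanup can be sketched as a match on terminal frame types that removes the request's entries from all four tables; key and value types below are stand-ins, and only the cleanup rule itself comes from the text:

```rust
// Sketch of terminal-frame cleanup across the four tracking tables.
use std::collections::HashMap;

enum FrameType {
    Chunk,
    End,
    Err,
}

struct Switch {
    request_routing: HashMap<(u64, u64), usize>, // (XID, RID) -> master index
    rid_to_xid: HashMap<u64, u64>,
    peer_requests: HashMap<(u64, u64), ()>,
    origin_map: HashMap<(u64, u64), String>,
}

impl Switch {
    fn on_frame(&mut self, xid: u64, rid: u64, ft: &FrameType) {
        // END and ERR are terminal: tear down every entry for the request.
        if matches!(ft, FrameType::End | FrameType::Err) {
            self.request_routing.remove(&(xid, rid));
            self.rid_to_xid.remove(&rid);
            self.peer_requests.remove(&(xid, rid));
            self.origin_map.remove(&(xid, rid));
        }
    }
}

fn main() {
    let mut sw = Switch {
        request_routing: HashMap::from([((1, 9), 0)]),
        rid_to_xid: HashMap::from([(9, 1)]),
        peer_requests: HashMap::from([((1, 9), ())]),
        origin_map: HashMap::from([((1, 9), "engine".to_string())]),
    };
    sw.on_frame(1, 9, &FrameType::Chunk); // non-terminal: entries survive
    assert!(!sw.request_routing.is_empty());
    sw.on_frame(1, 9, &FrameType::End); // terminal: all tables cleaned
    assert!(sw.request_routing.is_empty() && sw.rid_to_xid.is_empty());
    assert!(sw.peer_requests.is_empty() && sw.origin_map.is_empty());
}
```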
Multiple Masters
graph TD
SW["RelaySwitch"]
SW --> RM0["Master 0"]
SW --> RM1["Master 1"]
SW --> RM2["Master 2"]
RM0 --> PH0["Host 0"] --> P0["Cartridges..."]
RM1 --> PH1["Host 1"] --> P1["Cartridges..."]
RM2 --> PH2["Host 2"] --> P2["Cartridges..."]
Masters are independent. A failure in one does not affect the others — the switch marks the failed master as unhealthy and continues routing to healthy masters. New masters can be added at runtime via add_master().
Capability Aggregation
capabilities() returns the union of all healthy masters’ capabilities. The switch also exposes graph queries for the planner:
- get_reachable_targets() — which output media URNs are reachable from a given input.
- find_paths_to_exact_target() — machines that transform input to output.