Mesh Architecture
The Rafka v2 mesh is the substrate every other layer rides on. Every node — gateway, broker, compute, registry, bridge — is a single binary that:
- Holds an iroh-net Endpoint keyed by its NodeId (Ed25519 public key)
- Discovers peers via mDNS (LAN) and DERP relays (WAN via iroh-relay)
- Maintains one QUIC connection per peer pair under a single ALPN
- Multiplexes two distinct planes over that connection
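Here is a minimal sketch of the "one connection per peer pair" rule. It assumes a registry keyed by NodeId that dials only when no connection exists yet. PeerRegistry, ConnHandle and get_or_connect are hypothetical names. NodeId is modeled as a bare 32-byte array rather than iroh's type, and no real networking happens; in practice iroh manages the connection itself.

```rust
use std::collections::HashMap;

/// Stand-in for iroh's NodeId (an Ed25519 public key), modeled as raw bytes.
type NodeId = [u8; 32];

/// Hypothetical handle standing in for an established QUIC connection.
#[derive(Debug)]
struct ConnHandle {
    peer: NodeId,
    alpn: &'static str,
}

/// Hypothetical registry enforcing one connection per peer pair.
#[derive(Default)]
struct PeerRegistry {
    conns: HashMap<NodeId, ConnHandle>,
    dials: usize, // how many new connections were created
}

impl PeerRegistry {
    /// Returns the existing connection to `peer`, "dialing" only on first use.
    fn get_or_connect(&mut self, peer: NodeId) -> &ConnHandle {
        let dials = &mut self.dials;
        self.conns.entry(peer).or_insert_with(|| {
            *dials += 1; // a real node would open the QUIC connection here
            ConnHandle { peer, alpn: "rafka-mesh-v1" }
        })
    }
}
```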
The two planes
| Plane | Mechanism | Reliability | What rides it |
|---|---|---|---|
| Control | iroh-gossip (HyParView for membership + Plumtree for broadcast) | Internal control messages reliable-ordered; application-level state digests broadcast lossily | Membership churn (join/leave), small state digests (CPU load, peer counts, mesh_id), auth deltas, routing-table updates |
| Data | connection.open_bi() (bidirectional QUIC streams) | Reliable + ordered per stream; independent flow control across streams; shared congestion control across the connection | Heavy request/response payloads (Kafka ops, batch fetches, large auth pushes), anything that exceeds the safe gossip MTU or requires acknowledgement |
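The table's routing rule can be sketched as a single decision. This assumes a fixed gossip ceiling of 1200 bytes, taken from the ~1200-byte figure cited under Pointer gossip. Plane, SAFE_GOSSIP_MTU and choose_plane are hypothetical names. The sketch ignores the Pointer Gossip path that oversize control deltas take.

```rust
/// Which plane a message rides.
#[derive(Debug, PartialEq)]
enum Plane {
    Control, // iroh-gossip broadcast
    Data,    // bidirectional QUIC stream (open_bi)
}

/// Assumed safe gossip payload ceiling in bytes (illustrative value).
const SAFE_GOSSIP_MTU: usize = 1200;

/// Hypothetical rule: anything large or needing an acknowledgement goes
/// over a stream; small broadcast-style state goes over gossip.
fn choose_plane(payload_len: usize, needs_ack: bool) -> Plane {
    if needs_ack || payload_len > SAFE_GOSSIP_MTU {
        Plane::Data
    } else {
        Plane::Control
    }
}
```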
Both planes ride the same QUIC connection, not separate connections. One ALPN = one TLS handshake per peer pair = one congestion-control context shared between control and data.
Opening a new stream on an established connection is effectively free (a local stream-ID allocation, no network round trip) as long as the peer's stream credit is not exhausted. QUIC grants that credit ahead of time via transport parameters and MAX_STREAMS frames.
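To illustrate why opening a stream costs no round trip, here is a sketch of the stream-ID rules from RFC 9000. The two low bits of a stream ID encode the initiator and the direction. The sender may open new streams of a given type only while it stays under the peer's MAX_STREAMS credit. StreamAllocator and its methods are hypothetical; iroh's QUIC stack does this bookkeeping internally.

```rust
/// Low bit of a QUIC stream ID: who opened it (RFC 9000, section 2.1).
#[derive(Clone, Copy)]
enum Initiator {
    Client = 0,
    Server = 1,
}

/// Second bit of a QUIC stream ID: bidirectional (0) or unidirectional (1).
#[derive(Clone, Copy)]
enum Direction {
    Bidi = 0,
    Uni = 2,
}

/// Hypothetical per-type allocator: opening a stream is a counter
/// increment plus a credit check, with no packet exchanged.
struct StreamAllocator {
    type_bits: u64,
    next_index: u64,
    max_streams: u64, // cumulative limit last granted by the peer
}

impl StreamAllocator {
    fn new(initiator: Initiator, dir: Direction, initial_max_streams: u64) -> Self {
        StreamAllocator {
            type_bits: (initiator as u64) | (dir as u64),
            next_index: 0,
            max_streams: initial_max_streams,
        }
    }

    /// Returns the next stream ID, or None when the peer's credit is used up.
    fn open(&mut self) -> Option<u64> {
        if self.next_index >= self.max_streams {
            return None;
        }
        let id = (self.next_index << 2) | self.type_bits;
        self.next_index += 1;
        Some(id)
    }

    /// Applies a MAX_STREAMS frame; values that do not raise the limit are ignored.
    fn on_max_streams(&mut self, new_max: u64) {
        self.max_streams = self.max_streams.max(new_max);
    }
}
```

When credit runs out, open() returns None. A real endpoint would send STREAMS_BLOCKED and wait for a new MAX_STREAMS frame rather than fail.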
Stream wire grammar
Every QUIC stream (bi or uni) follows the same per-stream framing:
stream = tag(u8) length(unsigned-varint) payload(postcard) [EOF]
- tag — 1 byte routing the stream to a handler. Lookup table below.
- length — LEB128 unsigned varint, byte length of the payload that follows.
- payload — postcard-encoded value of the type the tag's handler expects.
- [EOF] — streams are single-use. The sender writes one frame and calls finish(); the receiver reads exactly length bytes, deserializes the payload, and drops the stream.
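The framing above can be sketched with the standard library alone. write_frame, read_frame and the max_len guard are hypothetical names. The payload is treated as opaque bytes that are assumed to be postcard-encoded already, and finishing the send side of the stream is left to the caller.

```rust
use std::io::{self, Read, Write};

/// Appends `value` as an unsigned LEB128 varint: 7 bits per byte, low
/// group first, high bit set on every byte except the last.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Reads an unsigned LEB128 varint; a u64 needs at most 10 bytes.
fn read_varint<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut value = 0u64;
    for i in 0..10 {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        value |= u64::from(b[0] & 0x7f) << (7 * i);
        if (b[0] & 0x80) == 0 {
            return Ok(value);
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint longer than 10 bytes"))
}

/// Writes tag(u8) length(varint) payload; the caller then finishes the stream.
fn write_frame<W: Write>(w: &mut W, tag: u8, payload: &[u8]) -> io::Result<()> {
    let mut header = vec![tag];
    write_varint(payload.len() as u64, &mut header);
    w.write_all(&header)?;
    w.write_all(payload)
}

/// Reads exactly one frame and returns the tag and the still-encoded payload.
/// `max_len` is a hypothetical guard against oversized allocations.
fn read_frame<R: Read>(r: &mut R, max_len: u64) -> io::Result<(u8, Vec<u8>)> {
    let mut tag = [0u8; 1];
    r.read_exact(&mut tag)?;
    let len = read_varint(r)?;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok((tag[0], payload))
}
```

Because each stream carries exactly one frame, the reader needs no resynchronization logic. A short read or an oversize length is simply an error that drops the stream.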
Serialization choice: postcard
- postcard for mesh frames and control messages. LEB128 varints keep the wire size small, and as a plain serde format it keeps domain types free of codec-specific code.
- rkyv reserved for WAL records that are read many times (future storage layer), where zero-copy access pays off.
- serde_json for human-read external data only (REST APIs, audit logs). Never the internal wire format.
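A hand-rolled sketch of postcard's integer rules shows where the size savings come from. Unsigned integers wider than u8 are LEB128 varints, signed integers are ZigZag-mapped first, and struct fields are written in declaration order with no names or tags. LoadDigest is a hypothetical control message. This encoder only illustrates the wire rules; production code should use the postcard crate.

```rust
/// ZigZag maps signed to unsigned so small magnitudes stay small:
/// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_i32(v: i32) -> u32 {
    ((v << 1) ^ (v >> 31)) as u32
}

/// Unsigned LEB128, as postcard uses for u16 and wider integers.
fn varint_u32(mut v: u32, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push(((v & 0x7f) as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Hypothetical control message used only for illustration.
struct LoadDigest {
    peer_count: u16,
    cpu_permille: u16,
    delta: i32,
}

/// Fields in declaration order, no field names or type tags.
fn encode(d: &LoadDigest) -> Vec<u8> {
    let mut out = Vec::new();
    varint_u32(u32::from(d.peer_count), &mut out);
    varint_u32(u32::from(d.cpu_permille), &mut out);
    varint_u32(zigzag_i32(d.delta), &mut out);
    out
}
```

For LoadDigest { peer_count: 12, cpu_permille: 730, delta: -3 } the encoding is 4 bytes (0x0C, 0xDA 0x05, 0x05), where fixed-width fields would need 8.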
Tag namespace
| Range | Class | Use |
|---|---|---|
| 0x00 | RESERVED | Sentinel / null-detection. Never assigned. |
| 0x01–0x0F | Control plane | Pointer-gossip pulls (IHAVE/IWANT), auth-state pushes, control-plane request/response |
| 0x10–0x7F | Data plane | 0x10 = Ping/Pong/Hello (substrate); 0x11+ reserved for KafkaOp, compute, batch fetches |
| 0x80–0xFF | Extensions | Vendor / future / experimental. Unknown tags are dropped and recorded with an unsupported_tag span. |
The KafkaOp carrier uses KAFKA_OP_PRODUCE = 0x02 today, a tag in the control-plane range; it will move into the data-plane range (0x11+) when the full tag namespace ships in Phase 1.2.
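A dispatch sketch for the 1-byte tag follows. It assumes the handler set that exists today: 0x01 pointer pull, KAFKA_OP_PRODUCE = 0x02, and 0x10 Ping/Pong/Hello. TagClass, Route, classify, route and every constant name other than KAFKA_OP_PRODUCE are hypothetical.

```rust
/// Tag classes from the namespace table.
#[derive(Debug, PartialEq)]
enum TagClass {
    Reserved,
    Control,
    Data,
    Extension,
}

/// Tags with handlers in this sketch (names other than KAFKA_OP_PRODUCE are hypothetical).
const POINTER_PULL: u8 = 0x01;
const KAFKA_OP_PRODUCE: u8 = 0x02;
const SUBSTRATE_PING: u8 = 0x10; // Ping/Pong/Hello

fn classify(tag: u8) -> TagClass {
    match tag {
        0x00 => TagClass::Reserved,
        0x01..=0x0F => TagClass::Control,
        0x10..=0x7F => TagClass::Data,
        0x80..=0xFF => TagClass::Extension,
    }
}

/// Where an incoming stream goes after its first byte is read.
#[derive(Debug, PartialEq)]
enum Route {
    PointerPull,
    KafkaProduce,
    Substrate,
    /// No handler: drop the stream (and, per the table, emit an unsupported_tag span).
    Unsupported(TagClass),
}

fn route(tag: u8) -> Route {
    match tag {
        POINTER_PULL => Route::PointerPull,
        KAFKA_OP_PRODUCE => Route::KafkaProduce,
        SUBSTRATE_PING => Route::Substrate,
        other => Route::Unsupported(classify(other)),
    }
}
```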
Pointer gossip (oversize control deltas)
For control-plane deltas exceeding the QUIC datagram MTU ceiling (~1200 bytes), Rafka uses Pointer Gossip — the application-level name for a Plumtree-style lazy-push pattern (announce a small IHAVE, fetch the payload on demand):
- Source computes hash(payload), caches the payload locally, and broadcasts a tiny {hash, size, source_node_id} pointer datagram over gossip.
- Receivers see the pointer. Cache hit → done. Cache miss → open a bidirectional QUIC stream (tag 0x01) to source_node_id; the source responds with the payload on the same stream.
- Receiver decodes, caches, processes.
Sub-millisecond datagram dissemination for routing; reliable stream fallback for the actual transfer. No fragmentation at the gossip layer.
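A minimal in-memory sketch of the pointer flow follows. It assumes a 1200-byte threshold and models the stream pull as a closure instead of a real QUIC stream. Pointer, Node, publish, serve_pull and on_pointer are hypothetical names, and NodeId is a stand-in u64. DefaultHasher (SipHash) is only a placeholder for the content hash; a cache shared across nodes would want a cryptographic hash such as BLAKE3. The size and hash check before caching is an added assumption, so a stale or corrupted pull is rejected rather than stored.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const POINTER_THRESHOLD: usize = 1200; // assumed gossip ceiling in bytes
type NodeId = u64; // stand-in; real NodeIds are Ed25519 public keys

/// The tiny pointer that is broadcast over gossip.
#[derive(Clone, Debug, PartialEq)]
struct Pointer {
    hash: u64,
    size: usize,
    source_node_id: NodeId,
}

/// Placeholder content hash (SipHash); not collision-resistant.
fn content_hash(payload: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    payload.hash(&mut h);
    h.finish()
}

#[derive(Default)]
struct Node {
    id: NodeId,
    cache: HashMap<u64, Vec<u8>>,
}

impl Node {
    /// Source side: cache an oversize payload and return the pointer to gossip.
    /// Returns None when the payload is small enough to gossip inline.
    fn publish(&mut self, payload: Vec<u8>) -> Option<Pointer> {
        if payload.len() <= POINTER_THRESHOLD {
            return None;
        }
        let hash = content_hash(&payload);
        let ptr = Pointer { hash, size: payload.len(), source_node_id: self.id };
        self.cache.insert(hash, payload);
        Some(ptr)
    }

    /// Source side: answer a pull (tag 0x01) for a cached hash.
    fn serve_pull(&self, hash: u64) -> Option<Vec<u8>> {
        self.cache.get(&hash).cloned()
    }

    /// Receiver side: a cache hit is a no-op; on a miss, `pull` stands in for
    /// the stream request to `ptr.source_node_id`. Ok(true) means a pull happened.
    fn on_pointer<F>(&mut self, ptr: &Pointer, pull: F) -> Result<bool, &'static str>
    where
        F: FnOnce(NodeId, u64) -> Option<Vec<u8>>,
    {
        if self.cache.contains_key(&ptr.hash) {
            return Ok(false);
        }
        let payload = pull(ptr.source_node_id, ptr.hash).ok_or("source no longer has payload")?;
        if payload.len() != ptr.size || content_hash(&payload) != ptr.hash {
            return Err("payload does not match pointer");
        }
        self.cache.insert(ptr.hash, payload);
        Ok(true)
    }
}
```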
QoS properties
- Flow control is per-stream, with a connection-wide MAX_DATA ceiling on top. A stalled heavy consumer backpressures the producer on that stream only; a tiny pointer-pull on another stream of the same connection is unaffected as long as connection-level credit remains.
- Congestion control is per-connection. Network saturation throttles the shared QUIC connection cooperatively: streams share the path bandwidth rather than competing as separate per-ALPN connections would, each with its own congestion controller.
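A toy model of the two layers of limits follows, counting bytes only (no packets, acks or loss). Each stream has its own flow-control credit, the connection has an aggregate MAX_DATA headroom, and one congestion window is shared by every stream. Conn, send and grant_stream are hypothetical names. grant_stream simplifies MAX_STREAM_DATA, which actually advertises an absolute offset rather than a delta.

```rust
use std::collections::HashMap;

/// Toy model of QUIC send-side limits.
struct Conn {
    stream_credit: HashMap<u64, u64>, // per-stream flow-control credit
    conn_credit: u64,                 // connection-wide flow-control credit (MAX_DATA)
    cwnd_available: u64,              // congestion-window bytes shared by all streams
}

impl Conn {
    /// Sends up to `want` bytes on `stream`; returns how many the limits allow.
    fn send(&mut self, stream: u64, want: u64) -> u64 {
        let credit = self.stream_credit.entry(stream).or_insert(0);
        let n = want.min(*credit).min(self.conn_credit).min(self.cwnd_available);
        *credit -= n;
        self.conn_credit -= n;
        self.cwnd_available -= n;
        n
    }

    /// Adds stream credit (simplified: real MAX_STREAM_DATA carries an absolute offset).
    fn grant_stream(&mut self, stream: u64, bytes: u64) {
        *self.stream_credit.entry(stream).or_insert(0) += bytes;
    }
}
```

With 1000 bytes of credit per stream, a heavy sender on stream 0 stalls at 1000 bytes while a 200-byte pull on stream 4 still goes through. Once both streams have credit, the shared congestion window becomes the limit for both.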
What is built today
| Layer | Status | Notes |
|---|---|---|
| iroh-net Endpoint per node | Live | crates/rafka-mesh-transport |
| NodeId-keyed peer registry | Live | crates/rafka-node-base |
| mDNS peer discovery | Live | iroh built-in |
| Single QUIC connection per peer pair | Live | iroh manages |
| Single ALPN (rafka-mesh-v1) | Live | |
| Ping/Pong/Hello frames over QUIC streams | Live | rafka-mesh-ops |
| W3C trace-context in frame header | Live | 32-byte header; cross-process span propagation |
| postcard wire codec | Live | commit 24a19ee |
| KafkaOp correlated RPC carrier | Live | commit 40ba255; Produce op verified |
| 1-byte tag stream demux | Phase 1.2 | |
| Property-tested framer crate | Phase 1.1 | rafka-mesh-ops::framer |
| iroh-gossip wiring | Phase 1.3 | currently bootstrapped over mDNS |
| Pointer Gossip pattern | Phase 2 | needs 0x01 handler + payload cache |
Internal references
Architecture decisions, sprint configs, and implementation notes live in the repo but are not published to this site. Key references:
- docs/eng/rafka-golden-principles.md — engineering principles (#2 broker design, #7 per-message observability, #11 serialization, #12 election)
- docs/plans/mesh-v1/06-decisions.md — D-027: locks iroh-gossip + backpressure tests
- docs/features/frame-exchange/ — Ping/Pong/Hello implementation detail
- docs/features/peer-discovery/ — mDNS + DERP discovery