WAL Durability
Rafka's broker stores produced records in a write-ahead log (WAL) called SingleWal. Records are written to disk and survive broker restarts. When the broker starts it recovers the full virtual topic index from the WAL without any external coordination.
Storage layout
The WAL stores records in fixed-size 1 GB segment files. Each segment has a companion memory-mapped Dense Offset Index (<segment_id>.oidx) that maps global message IDs to physical byte locations.
data/
0.log — segment 0 (raw record bytes, length-prefixed)
0.oidx — memory-mapped offset index (40 bytes per entry)
1.log — segment 1 (created when segment 0 fills)
1.oidx
...
Each OffsetIndexEntry (40 bytes) stores:
| Field | Type | Purpose |
|---|---|---|
segment_id | u32 | Which .log file the record is in |
physical_offset | u32 | Byte offset within the segment |
length | u32 | Record byte length |
org_id | u64 | Tenant org identifier |
topic_hash | u128 | Blake3 org-salted topic hash |
partition | u32 | Partition index |
The (org_id, topic_hash, partition) tuple in every index entry is what makes the virtual topic index recoverable after a restart — the broker can rebuild it by scanning the index without reading the record data.
Virtual topic index
The virtual topic index maps a virtual topic key to a RoaringTreemap bitset of global message IDs.
Virtual topic key format: "{org_id}-{topic_hash_decimal}-{partition}"
The bitset's rank-of-a-global-ID is the virtual offset — the Kafka-visible offset clients use in ProduceResponse.base_offset and FetchRequest.fetch_offset. This means:
- Virtual offset 0 = the first record ever produced to this (org, topic, partition).
high_watermark = bitset.len()— the total record count.- Fetch from
fetch_offset = Nreads theNth record in the bitset.
Append path
store_records(wal, vt_key, records) in broker/src/produce.rs:
- Calls
wal.append_direct(vt_key, record_bytes, …)once per record. append_directwrites the record to the current segment and updates the offset index.- The virtual topic index is updated:
global_idis inserted intovt_key's bitset; the bitset length − 1 is the virtual offset returned asbase_offset.
Span: rafka.broker.produce.store — attributes: virtual_topic, record_count, base_offset.
Fetch path
handle_fetch in broker/src/produce.rs:
- Looks up
vt_key's bitset to readhigh_watermark. - Calls
wal.fetch_virtual(vt_key, virtual_offset, max_bytes)in a loop, incrementingvirtual_offseteach iteration. fetch_virtualtranslates a virtual offset → global message ID (bitset select) →OffsetIndexEntry→ reads raw bytes from the segment file.- Returns when
fetch_virtualreturnsNone(past HWM) ormax_bytesis exhausted.
Span: rafka.broker.fetch.read — attributes: virtual_topic, fetch_offset, returned_count, high_watermark.
Restart recovery
When SingleWal::new(data_dir) runs on startup it performs a recovery scan:
- Reads every
OffsetIndexEntryfrom all existing.oidxfiles in order. - For each entry, reconstructs the virtual topic key from
(org_id, topic_hash, partition)usingparse_virtual_topic_tuple. - Inserts the
global_idinto the corresponding bitset, restoring the full virtual topic index.
After recovery, high_watermark, virtual offsets, and fetch paths are identical to the pre-restart state. Records are not re-read from .log files during recovery — only the 40-byte index entries are scanned.
Configuration
| Environment variable | Default | Purpose |
|---|---|---|
RAFKA_DATA_DIR | ./data | Base directory for segment files and index files |
The WAL is initialized once at broker startup via a OnceLock<Arc<SingleWal>>. All produce and fetch handlers share the same WAL instance for the lifetime of the process.
What is NOT yet implemented
- Segment compaction / GC (old segments accumulate until disk pressure)
- Replication (records live on one broker; Phase 4 adds HA)
- Timestamp-exact offset resolution for
ListOffsetswithtimestamp >= 0
Related pages
- Concepts: Produce — how records reach the WAL
- Concepts: Fetch — how records are read from the WAL
- Quickstart: Produce → Fetch — restart durability demo