Consumer Groups

Rafka P3.B ships durable consumer-group offsets — committed offsets survive a broker restart via the broker's SingleWal. Group session state (members, generation) remains ephemeral by design; only offsets persist. Multi-member rebalance lands in P3.C.

Consumer group identity — `cgp_<ULID>`

Every consumer group is a first-class entity in its organisation. When a consumer first joins a group, the broker mints a stable ConsumerGroupId:

cgp_01ktb7wm5jvkbfgpzvfnk3kqqb

The prefix cgp_ (consumer-group-pod) prevents cross-entity confusion with topics (top_), organisations, and other entity classes. The ULID suffix is lexicographically sortable and globally unique.
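The broker most likely mints the suffix with a ULID library. The std-only sketch below shows what the 26 characters encode, assuming the standard ULID layout: a 48-bit millisecond timestamp followed by 80 random bits, written in lowercase Crockford base32 as in the example id. `encode_ulid` and `mint_group_id` are hypothetical names, and the caller supplies the clock reading and randomness.

```rust
/// Lowercase Crockford base32 alphabet (no i, l, o, u), as in the example id.
/// The bytes are in ascending ASCII order, so for fixed-width strings
/// lexicographic order matches numeric order.
const CROCKFORD: &[u8; 32] = b"0123456789abcdefghjkmnpqrstvwxyz";

/// Hypothetical: encodes a 48-bit millisecond timestamp and 80 random bits
/// as a 26-character ULID.
fn encode_ulid(timestamp_ms: u64, randomness: u128) -> String {
    let ts = (timestamp_ms as u128) & ((1u128 << 48) - 1);
    let rand = randomness & ((1u128 << 80) - 1);
    let value = (ts << 80) | rand;
    // 26 characters x 5 bits = 130 bits, so the first character only
    // carries the top 3 bits of the 128-bit value.
    (0..26u32)
        .map(|i| {
            let shift = 5 * (25 - i);
            CROCKFORD[((value >> shift) & 0x1f) as usize] as char
        })
        .collect()
}

/// Hypothetical: prefixes the ULID with the consumer-group tag.
fn mint_group_id(timestamp_ms: u64, randomness: u128) -> String {
    format!("cgp_{}", encode_ulid(timestamp_ms, randomness))
}
```

Because the encoding is fixed-width and the alphabet is in ascending ASCII order, comparing two ids as strings gives the same order as comparing their timestamps.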

The id is minted once and is stable. If member1 joins order-processor, commits offsets, leaves, and member2 later joins the same group (same group.id string, same org), the group keeps the same cgp_id. Offsets survive the leave.

Group entity tier

Consumer groups are org-tier entities. They have no environment or cluster qualifier — they belong to an organisation, not to a specific deployment. This matches how Kafka treats group.id: as a durable, cross-session identity.

The RRL (Rafka Resource Locator) for a group is:

<org>/consumer-groups/<slug>

where <slug> is derived from the group.id string by Slug::sanitize(group_id).

Note: RRLs contain slashes, not colons. They are never placed in a URL path segment.
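The exact rules of Slug::sanitize are not described on this page. The sketch below assumes a plausible policy (keep ASCII letters and digits in lowercase, collapse any other run of characters into one dash) purely to show how an RRL is assembled; `sanitize` and `group_rrl` are hypothetical names.

```rust
/// Hypothetical stand-in for Slug::sanitize; the real rules may differ.
/// Lowercases ASCII alphanumerics and collapses every other run of
/// characters into a single '-', with no leading or trailing dash.
fn sanitize(group_id: &str) -> String {
    let mut slug = String::new();
    let mut pending_dash = false;
    for c in group_id.chars() {
        if c.is_ascii_alphanumeric() {
            // Emit a separator only between two alphanumeric runs.
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.push(c.to_ascii_lowercase());
        } else {
            pending_dash = true;
        }
    }
    slug
}

/// Builds `<org>/consumer-groups/<slug>`: slashes, never colons.
fn group_rrl(org: &str, group_id: &str) -> String {
    format!("{}/consumer-groups/{}", org, sanitize(group_id))
}
```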

Offset semantics — P3.B (durable)

Committed offsets are stored in two places: the broker's SingleWal (durable, survives restart) and an in-memory HashMap (fast read path). The two copies are kept consistent by ordering: the WAL append completes before the OffsetCommit acknowledgement is returned to the client, and the in-memory cache is updated immediately after the append.

Committed on OffsetCommit: the broker appends a CoordinatorLogRecord::OffsetCommit to the __consumer_offsets WAL log before acking the client. The in-memory cache is updated atomically after the WAL append.
Resumed on rejoin: when a new consumer instance joins the same group, OffsetFetch reads from the in-memory cache. Because the cache is rebuilt from the WAL on every broker boot, the resume guarantee holds across broker restarts.
Survives broker restart: the broker's recover_offsets scan runs on every boot before any group-op handlers start serving. It replays WAL records in append order and applies last-write-wins per (org_id, group_id, topic, partition) key, rebuilding the in-memory cache.
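A minimal in-memory model of the commit, fetch, and recovery rules above, assuming a Vec stands in for the SingleWal and a small struct stands in for CoordinatorLogRecord::OffsetCommit. The `Offsets` type and its methods are hypothetical.

```rust
use std::collections::HashMap;

/// (org_id, group_id, topic, partition): the last-write-wins key.
type OffsetKey = (u64, String, String, i32);

/// Simplified stand-in for CoordinatorLogRecord::OffsetCommit.
#[derive(Clone)]
struct OffsetCommit {
    key: OffsetKey,
    offset: i64,
}

/// Hypothetical coordinator state: `wal` plays the role of the SingleWal,
/// `cache` the in-memory HashMap used by OffsetFetch.
struct Offsets {
    wal: Vec<OffsetCommit>,
    cache: HashMap<OffsetKey, i64>,
}

impl Offsets {
    /// Boot-time recovery: replay records in append order, so a later
    /// record for the same key overwrites an earlier one.
    fn recover(wal: Vec<OffsetCommit>) -> Self {
        let mut cache = HashMap::new();
        for rec in &wal {
            cache.insert(rec.key.clone(), rec.offset);
        }
        Offsets { wal, cache }
    }

    /// OffsetCommit: append to the log first (the ack waits on this),
    /// then update the cache.
    fn commit(&mut self, key: OffsetKey, offset: i64) {
        self.wal.push(OffsetCommit { key: key.clone(), offset });
        self.cache.insert(key, offset);
    }

    /// OffsetFetch: -1 when no offset was ever committed for the key.
    fn fetch(&self, key: &OffsetKey) -> i64 {
        self.cache.get(key).copied().unwrap_or(-1)
    }
}
```

Recovery needs nothing beyond the log: replaying records in append order and overwriting on each key is exactly last-write-wins, and keys that were never committed stay absent.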

Ephemeral session vs durable offsets

Group session state (members, generation ID, assignments, heartbeat timestamps) is still ephemeral: a restarted broker starts fresh with no members. Consumers must rejoin after a broker restart, which increments generation_id. The key invariant is that the rejoining consumer uses OffsetFetch to retrieve the recovered committed offset, so it resumes from exactly where it committed before the restart. This is standard Kafka behaviour.

Uncommitted partition

When a group has never committed an offset for a (topic, partition), OffsetFetch returns committed_offset=-1 with error_code=0. librdkafka interprets -1 as "no committed offset" and falls back to the auto.offset.reset policy (earliest = offset 0, latest = high-watermark). A recovered-but-never-committed partition also returns -1 — recovery does not fabricate entries.
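The client-side fallback can be modelled as a small function. This is a simplified sketch of the decision described above, not librdkafka's code; `OffsetReset` and `start_offset` are hypothetical names, and earliest is taken to mean offset 0 as stated here.

```rust
/// The two auto.offset.reset policies described above.
#[derive(Clone, Copy)]
enum OffsetReset {
    Earliest,
    Latest,
}

/// Hypothetical: picks the first offset to consume for a partition.
/// `committed` is the value returned by OffsetFetch; -1 means "no committed
/// offset", in which case the reset policy decides.
fn start_offset(committed: i64, reset: OffsetReset, high_watermark: i64) -> i64 {
    if committed >= 0 {
        return committed;
    }
    match reset {
        OffsetReset::Earliest => 0,
        OffsetReset::Latest => high_watermark,
    }
}
```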

The `__consumer_offsets` log

Rafka uses a single system WAL log — the __consumer_offsets log — for all orgs' offset records. This is the same model as Kafka's internal __consumer_offsets topic. Each CoordinatorLogRecord::OffsetCommit record is self-describing: it carries org_id, group_id, topic, partition, offset, and optional metadata. A single log serves all orgs; org_id disambiguates. Records are postcard-serialised and appended via append_direct. The virtual-topic key is make_vt_key(org_id=0, "__consumer_offsets", partition=0) — system org 0 is reserved and never allocated to a customer org.
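A sketch of the record shape, with fields taken from the list above. The real type is CoordinatorLogRecord::OffsetCommit and is postcard-serialised, which this std-only example omits; `consumer_offsets_key` is a hypothetical stand-in for make_vt_key, whose actual signature is not shown here, and the integer widths are assumptions.

```rust
/// Reserved system org; never allocated to a customer org.
const SYSTEM_ORG_ID: u64 = 0;

/// Fields as listed above for CoordinatorLogRecord::OffsetCommit.
#[derive(Debug, Clone, PartialEq)]
struct OffsetCommitRecord {
    org_id: u64,
    group_id: String,
    topic: String,
    partition: i32,
    offset: i64,
    metadata: Option<String>,
}

/// Hypothetical stand-in for make_vt_key(0, "__consumer_offsets", 0).
fn consumer_offsets_key() -> (u64, &'static str, i32) {
    (SYSTEM_ORG_ID, "__consumer_offsets", 0)
}

/// One shared log serves every org; each record's org_id scopes it,
/// so two orgs may reuse the same group.id without colliding.
fn records_for_org(log: &[OffsetCommitRecord], org_id: u64) -> Vec<&OffsetCommitRecord> {
    log.iter().filter(|r| r.org_id == org_id).collect()
}
```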

Rebalance — single-member one-shot (P3.A)

For single-member groups, the rebalance is immediate:

JoinGroup arrives. Broker inserts the member → state PreparingRebalance.
Because waiters.len() == members.len() (single member), complete_join_phase fires synchronously.
generation_id increments; member becomes leader; state → CompletingRebalance.
Broker responds to JoinGroup with the JoinResult (leader gets full member list).
SyncGroup arrives (leader assigns its own partitions). State → Stable.
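A synchronous model of the steps above, assuming hypothetical `Group` and `GroupState` types. The real broker delivers the JoinResult to the waiting JoinGroup handler over a channel; this sketch leaves the response out and only tracks state.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq)]
enum GroupState {
    Empty,
    PreparingRebalance,
    CompletingRebalance,
    Stable,
}

/// Hypothetical, synchronous model of one group on the single-member path.
struct Group {
    state: GroupState,
    generation_id: i32,
    members: BTreeSet<String>,
    waiters: BTreeSet<String>,
    leader: Option<String>,
}

impl Group {
    fn new() -> Self {
        Group {
            state: GroupState::Empty,
            generation_id: 0,
            members: BTreeSet::new(),
            waiters: BTreeSet::new(),
            leader: None,
        }
    }

    /// JoinGroup: insert the member, then complete at once if every member is waiting.
    fn join(&mut self, member: &str) {
        self.members.insert(member.to_string());
        self.waiters.insert(member.to_string());
        self.state = GroupState::PreparingRebalance;
        if self.waiters.len() == self.members.len() {
            self.complete_join_phase();
        }
    }

    /// Bump the generation and pick a leader (the only member, in this path).
    fn complete_join_phase(&mut self) {
        self.generation_id += 1;
        self.leader = self.members.iter().next().cloned();
        self.waiters.clear();
        self.state = GroupState::CompletingRebalance;
    }

    /// SyncGroup: only the leader's sync moves the group to Stable.
    fn sync(&mut self, member: &str) -> bool {
        let is_leader = self.leader.as_deref() == Some(member);
        if self.state == GroupState::CompletingRebalance && is_leader {
            self.state = GroupState::Stable;
        }
        self.state == GroupState::Stable
    }
}
```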

There is no timer or polling loop in this path — the transition is event-driven via a Tokio oneshot channel. The gateway-broker rail has no time cap (1 MiB size cap only), so the JoinGroup response can be held until rebalance completes without a timeout.
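The standard library has no oneshot type, so this sketch swaps in std::sync::mpsc with exactly one send to mimic tokio::sync::oneshot. The `JoinResult` fields and `join_and_wait` are hypothetical; the point is that the handler blocks on the receiver, with no timeout, until the join phase completes.

```rust
use std::sync::mpsc;
use std::thread;

/// Minimal stand-in for the JoinResult sent to the waiting JoinGroup handler.
#[derive(Debug, PartialEq)]
struct JoinResult {
    generation_id: i32,
    leader: String,
}

/// The handler parks on the receiver; the coordinator completes the join
/// phase by sending exactly one value.
fn join_and_wait() -> JoinResult {
    let (tx, rx) = mpsc::channel::<JoinResult>();
    // Hypothetical coordinator side: runs complete_join_phase and replies once.
    let coordinator = thread::spawn(move || {
        tx.send(JoinResult { generation_id: 1, leader: "member1".to_string() })
            .expect("handler dropped its receiver");
    });
    // recv() blocks until the send happens: no timer, no polling loop.
    let result = rx.recv().expect("coordinator dropped the sender");
    coordinator.join().unwrap();
    result
}
```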

Multi-member rebalance (P3.C) introduces a configurable debounce window, RAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS. In P3.A the delay is effectively 0, so the join phase completes immediately.
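How the setting might be read, as a sketch: the variable name comes from this page, but the parsing rules (milliseconds as an unsigned integer, unset or unparsable values treated as 0) and both function names are assumptions.

```rust
use std::time::Duration;

/// Hypothetical parser: milliseconds as an unsigned integer; unset or
/// unparsable values fall back to 0 (no debounce, the P3.A behaviour).
fn parse_initial_rebalance_delay(raw: Option<&str>) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(0);
    Duration::from_millis(ms)
}

/// Reads the variable from the process environment.
fn initial_rebalance_delay() -> Duration {
    let raw = std::env::var("RAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS").ok();
    parse_initial_rebalance_delay(raw.as_deref())
}
```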

Group state machine

                        ┌──────────────────────────────────────────┐
   join (single member) │                                          │
         ───────────────┼───▶  PreparingRebalance                  │
        ▲               │            │                             │
        │               │      complete_join_phase                 │
    Empty ◀─────────────┼──── CompletingRebalance                  │
  (last leave)          │            │ sync (leader)               │
                        │       ─────▼─────                        │
   cgp_id, offsets      │      │  Stable   │◀── heartbeat (ok)     │
   persist here ────────┼──▶  └─────┬─────┘                        │
                        │          │                               │
                        │     leave (was Stable, members > 0)      │
                        │          │                               │
                        │     PreparingRebalance (again)           │
                        └──────────────────────────────────────────┘

States:

Empty — group exists in the broker map; offsets are preserved; no members.
PreparingRebalance — rebalance in progress; members are waiting for JoinResult.
CompletingRebalance — JoinResult delivered; waiting for SyncGroup.
Stable — all members have received assignments; heartbeats are accepted.
Dead — reserved; not currently triggered in P3.A (no expiry path yet).
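The diagram and state list can be read as a transition table. In this sketch the event names are hypothetical, and the last-leave edge to Empty is assumed to apply from any state other than Empty and Dead.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum GroupState {
    Empty,
    PreparingRebalance,
    CompletingRebalance,
    Stable,
    Dead,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Join,
    JoinPhaseComplete,
    LeaderSync,
    Heartbeat,
    /// A member left; `remaining` is the member count afterwards.
    Leave { remaining: usize },
}

/// Hypothetical transition table; None means the event is not valid in that state here.
fn next_state(state: GroupState, event: Event) -> Option<GroupState> {
    use GroupState::*;
    match (state, event) {
        // Assumed: the last leave empties the group from any live state.
        (PreparingRebalance | CompletingRebalance | Stable, Event::Leave { remaining: 0 }) => {
            Some(Empty)
        }
        (Empty, Event::Join) => Some(PreparingRebalance),
        (PreparingRebalance, Event::JoinPhaseComplete) => Some(CompletingRebalance),
        (CompletingRebalance, Event::LeaderSync) => Some(Stable),
        (Stable, Event::Heartbeat) => Some(Stable),
        (Stable, Event::Leave { .. }) => Some(PreparingRebalance),
        // Dead has no incoming edge yet; every other pair is rejected.
        _ => None,
    }
}
```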

What survives a leave

When the last member calls LeaveGroup, the group's state transitions to Empty but the group entry and its offsets are NOT deleted from the broker's in-memory maps. A subsequent JoinGroup for the same (org_id, group.id) reuses the existing entry — same cgp_id, same committed offsets — and increments generation_id from where the previous session left off.

This is the "offset persistence across member leave" invariant that makes the P3.A rejoin guarantee possible.
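A sketch of the leave and rejoin behaviour on a single group entry, assuming a hypothetical `GroupEntry` type that the broker would key elsewhere by (org_id, group.id). Offsets here are keyed by (topic, partition) for brevity.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical group entry: session fields plus the parts that outlive a leave.
struct GroupEntry {
    cgp_id: String,
    generation_id: i32,
    members: HashSet<String>,
    offsets: HashMap<(String, i32), i64>,
}

impl GroupEntry {
    /// LeaveGroup: drops the member but never the entry, its id, or its offsets.
    /// Returns true when the group is now Empty.
    fn leave(&mut self, member: &str) -> bool {
        self.members.remove(member);
        self.members.is_empty()
    }

    /// JoinGroup on an existing entry: the generation continues from the
    /// previous session instead of restarting.
    fn rejoin(&mut self, member: &str) -> i32 {
        self.members.insert(member.to_string());
        self.generation_id += 1;
        self.generation_id
    }
}
```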
