Skip to content

ADR-004: DB Unique Constraint over Distributed Locks for Double-Booking Prevention

Status: Accepted


Context

The system must guarantee that a practitioner's time slot cannot be double-booked, even under concurrent load (90 bookings/second peak, with bursts). Two patients attempting to book the same (practitioner_id, slot_start) pair must result in exactly one confirmation and one conflict error.

The approaches evaluated were:

  1. PostgreSQL unique partial index — let the database engine enforce the constraint
  2. Redis distributed lock (SETNX / Redlock) — acquire a lock per (practitioner_id, slot_start) before writing to the DB
  3. Optimistic concurrency control — version column on the slot; retry on conflict
  4. Application-level serialisation — single-writer queue per practitioner

Decision

Enforce double-booking prevention with a PostgreSQL unique partial index on the bookings table:

CREATE UNIQUE INDEX idx_no_double_book
  ON bookings (practitioner_id, slot_start)
  WHERE status = 'confirmed';

This index makes it physically impossible for two rows with the same (practitioner_id, slot_start) to both have status = 'confirmed'. The partial index (WHERE status = 'confirmed') means cancelled bookings do not consume a slot — the same slot can be rebooked after a cancellation without any index cleanup.

A Redis pre-check is used as a fast path to reduce contention under load, but it is not the correctness gate:

1. Redis GETBIT availability:{practitioner_id}:{date} {slot_index}
   → If bit is 0 (slot taken), reject early — no DB round-trip needed
2. INSERT INTO bookings ... ON CONFLICT DO NOTHING RETURNING id
   → Unique index enforces correctness; zero rows returned = conflict

Consequences

Positive:

  • The correctness guarantee lives entirely within the database — no additional distributed infrastructure to deploy, monitor, or reason about.
  • No lock expiry edge cases. A distributed lock (Redlock) requires careful TTL tuning: too short and the lock expires during a slow DB write; too long and a crashed lock holder blocks the slot until the TTL expires.
  • The partial index is transparent in the query plan and auditable via EXPLAIN.
  • Cancellations are handled correctly by design: UPDATE SET status = 'cancelled' removes the row from the partial index, immediately freeing the slot for rebooking.
  • The Redis pre-check is additive — it reduces DB contention but the system is correct without it (fallback is automatic if Redis is unavailable).

Negative:

  • Under high contention on a popular slot, multiple concurrent INSERT attempts will compete at the DB level. The unique index resolves conflicts with a serialisation error, but all losing transactions have already consumed a DB round-trip. The Redis pre-check reduces (but does not eliminate) this wasted work.
  • The constraint is enforced at the DB layer, so application code must handle UniqueViolation (error code 23505) and translate it into a user-facing "slot no longer available" response.

Why not Redlock: Redis distributed locks introduce a failure mode that the unique index avoids entirely: if the Redis node holding a lock key fails before the TTL expires, no other process can acquire the lock until the TTL elapses. The unique index has no equivalent concept of a "stuck lock". Additionally, Redlock correctness under network partitions is a known area of debate — the unique index correctness is not.

Why not optimistic concurrency: An optimistic version column on the slot would require a read-modify-write cycle for every booking attempt, and retry logic in the application. The unique index provides equivalent safety with a single INSERT — no read is required before the write.

Trade-off accepted: The unique index is the simplest, most auditable, and most operationally robust approach at the projected throughput of 90 bookings/second. At significantly higher write rates (thousands per second to the same hot slot), a pre-serialisation layer would be warranted, but this is not a current requirement.