Photo Credit: https://martinfowler.com/articles/patterns-of-distributed-systems/write-ahead-log.html
Key Questions
- What is the primary purpose of a Write-Ahead Log (WAL)?
The primary purpose of a WAL is to ensure durability: every state change is persisted as a command in an append-only log before the actual data structures are updated. Once an entry has been flushed to disk, the change it describes can be recovered even if the system crashes.
- What problem does the WAL solve in systems requiring strong durability?
WAL solves the problem of preserving data integrity in the event of server crashes or failures. It ensures that once a server agrees to perform an action, the changes can be recovered and replayed to restore the system’s state.
- What is stored in each log entry of a WAL?
Each log entry stores a unique identifier, the data representing the state change (often serialized commands), the type of entry, and a timestamp.
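A minimal sketch of such an entry, assuming a hypothetical fixed-size binary header followed by the serialized command. The field names, the header layout, and the entry-type values are illustrative assumptions, not taken from any particular implementation.

```python
import struct
import time
from dataclasses import dataclass

# Assumed header layout (big-endian, no padding):
# 8-byte index, 1-byte entry type, 8-byte timestamp in ms, 4-byte payload length.
_HEADER = struct.Struct(">QBQI")


@dataclass(frozen=True)
class WALEntry:
    index: int         # unique identifier of the entry
    entry_type: int    # illustrative values: 0 = data, 1 = metadata
    timestamp_ms: int  # time the entry was created
    data: bytes        # serialized command describing the state change

    def serialize(self) -> bytes:
        header = _HEADER.pack(
            self.index, self.entry_type, self.timestamp_ms, len(self.data)
        )
        return header + self.data

    @classmethod
    def deserialize(cls, raw: bytes) -> "WALEntry":
        index, entry_type, timestamp_ms, length = _HEADER.unpack_from(raw, 0)
        data = raw[_HEADER.size:_HEADER.size + length]
        return cls(index, entry_type, timestamp_ms, data)


entry = WALEntry(1, 0, int(time.time() * 1000), b'{"op": "set", "key": "title"}')
restored = WALEntry.deserialize(entry.serialize())
```

Storing the payload length in the header lets a reader find where one entry ends and the next begins without parsing the command itself.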
- How are log entries uniquely identified, and why is this important?
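Each log entry is given a unique identifier that typically increases monotonically, so it doubles as the entry's position in the log. This is important for ordering entries during replay and for cleaning the log (e.g., discarding everything below a Low-Water Mark). Unique identifiers also help detect duplicates, although detecting a retried client request usually relies on an identifier attached to the request itself rather than on the log entry ID.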
- How does the system recover its state after a crash using the WAL?
Upon restart, the system reads all entries from the WAL file and replays them in sequence. This re-applies the logged state changes, restoring the system to its most recent consistent state.
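The replay step can be sketched as follows, assuming a hypothetical log format of one JSON object per line and a plain dictionary as the state being rebuilt. The `append` and `recover` helpers and the `set`/`delete` commands are assumptions made for illustration.

```python
import json
import os
import tempfile


def append(log_path, index, command):
    """Append one entry and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"index": index, "command": command}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def recover(log_path):
    """Rebuild the in-memory state by replaying every entry in order."""
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            command = json.loads(line)["command"]
            if command["op"] == "set":
                state[command["key"]] = command["value"]
            elif command["op"] == "delete":
                state.pop(command["key"], None)
    return state


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
append(log_path, 1, {"op": "set", "key": "a", "value": "1"})
append(log_path, 2, {"op": "set", "key": "b", "value": "2"})
append(log_path, 3, {"op": "delete", "key": "a"})

print(recover(log_path))  # {'b': '2'}
```

Because the entries are replayed in the order they were written, the delete of `a` takes effect after its earlier set, which is why ordering matters during recovery.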
- What are the tradeoffs between synchronous and asynchronous flushing of logs?
Synchronous flushing: Provides strong durability guarantees by ensuring that every log entry is written to the physical disk before the write is acknowledged, but the extra disk flush on every write significantly reduces throughput and increases latency.
Asynchronous flushing: Improves performance by batching log entries before writing them to disk but introduces the risk of data loss if the system crashes before the batch is flushed.
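The tradeoff can be illustrated with a small sketch. The `SimpleWAL` class and its `sync_every` parameter are hypothetical, and a count-based batch stands in here for asynchronous flushing, which real systems often drive from a timer or a background thread instead.

```python
import os
import tempfile


class SimpleWAL:
    """Append-only log that calls fsync once per `sync_every` appended records."""

    def __init__(self, path, sync_every=1):
        self._file = open(path, "ab")
        self._sync_every = sync_every
        self._pending = 0      # records written but not yet fsynced
        self.fsync_calls = 0   # counter kept only to make the cost visible

    def append(self, record: bytes) -> None:
        self._file.write(record + b"\n")
        self._pending += 1
        if self._pending >= self._sync_every:
            self.flush()

    def flush(self) -> None:
        if self._pending == 0:
            return
        self._file.flush()              # push Python's buffer to the OS
        os.fsync(self._file.fileno())   # ask the OS to write to the device
        self.fsync_calls += 1
        self._pending = 0

    def close(self) -> None:
        self.flush()
        self._file.close()


directory = tempfile.mkdtemp()
sync_wal = SimpleWAL(os.path.join(directory, "sync.log"), sync_every=1)
batched_wal = SimpleWAL(os.path.join(directory, "batched.log"), sync_every=5)
for i in range(10):
    sync_wal.append(b"entry %d" % i)
    batched_wal.append(b"entry %d" % i)

print(sync_wal.fsync_calls, batched_wal.fsync_calls)  # 10 2
sync_wal.close()
batched_wal.close()
```

With `sync_every=5`, up to four records can sit unflushed at any moment, and those are the ones at risk if the process or machine crashes before the next flush.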
- How can corrupted log entries be detected and handled?
Corrupted log entries can be detected using mechanisms like CRC (Cyclic Redundancy Check) records or end-of-entry markers. When detected during recovery, corrupted entries can be discarded to prevent inconsistencies.
- Why is duplicate handling necessary in a WAL, and how can it be addressed?
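Duplicates may occur when a client retries a request after a communication failure, so the same command can be appended to the log more than once. If applying the command is idempotent (e.g., setting the same key to the same value in a key-value store), no special handling is needed. Otherwise, each request can be marked with a unique identifier so that repeated requests are detected and ignored.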
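A sketch of request-ID-based duplicate detection, using a counter increment as an example of an operation that is not idempotent. The `DedupingStore` class and the request ID format are assumptions for illustration; a real system would also need to persist and eventually expire the set of seen IDs.

```python
class DedupingStore:
    """Applies each request at most once, keyed by a client-supplied request ID."""

    def __init__(self):
        self.counters = {}
        self._seen = {}  # request_id -> result returned the first time

    def increment(self, request_id: str, key: str, amount: int) -> int:
        if request_id in self._seen:
            # Retry of a request that was already applied: return the saved result.
            return self._seen[request_id]
        new_value = self.counters.get(key, 0) + amount
        self.counters[key] = new_value
        self._seen[request_id] = new_value
        return new_value


store = DedupingStore()
store.increment("req-1", "hits", 1)
store.increment("req-1", "hits", 1)  # client retry, ignored
store.increment("req-2", "hits", 1)

print(store.counters["hits"])  # 2
```

Without the request ID check, the retried increment would be applied twice and the counter would read 3.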
- How does the WAL differ from Event Sourcing in terms of purpose and usage?
WAL: Focuses on durability and recovery, storing changes needed for restoring the current state. Logs are discarded after their purpose is fulfilled (e.g., after a Low-Water Mark).
Event Sourcing: Logs serve as the system’s source of truth, allowing reconstruction of historical states. Logs are retained indefinitely for auditability and replaying events.
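The difference in retention can be sketched with a toy log of `(index, key, value)` tuples. The tuple format, the `apply` helper, and the snapshot taken at the low-water mark are assumptions made for illustration only.

```python
def apply(state, entries):
    """Return a new state with the given (index, key, value) entries applied in order."""
    new_state = dict(state)
    for _index, key, value in entries:
        new_state[key] = value
    return new_state


full_log = [(1, "a", "1"), (2, "b", "2"), (3, "a", "3"), (4, "c", "4")]

# WAL style: once a snapshot covers entries up to the low-water mark,
# those entries are no longer needed for recovery and can be discarded.
low_water_mark = 2
snapshot = apply({}, [e for e in full_log if e[0] <= low_water_mark])
wal = [e for e in full_log if e[0] > low_water_mark]

# Recovery from snapshot + truncated WAL yields the same current state.
assert apply(snapshot, wal) == apply({}, full_log)

# Event Sourcing style: the full log is retained, so any past state
# can be reconstructed, for example the state after the first event.
state_after_first_event = apply({}, full_log[:1])
print(state_after_first_event)  # {'a': '1'}
```

The truncated WAL is enough to restore the current state but can no longer answer what the state was after entry 1, which is exactly the capability Event Sourcing keeps.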
- How does a Write-Ahead Log (WAL) support transactional storage by ensuring ACID?
WAL ensures:
Atomicity: All changes within a transaction are written as a single batch, ensuring they are either fully applied or not applied at all.
Consistency: WAL ensures the system is restored to a consistent state by replaying log entries in the correct order.
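Isolation: The WAL does not provide isolation by itself; that typically comes from a concurrency-control mechanism such as locking or multi-versioning. The WAL supports it by recording each transaction's changes as a distinct unit, so that recovery does not mix the effects of complete and incomplete transactions.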
Durability: Changes are persisted in the WAL before updating the data store, guaranteeing recovery after a crash.
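The atomicity point can be sketched by writing all changes of a transaction as one checksummed log entry, so that recovery applies the whole batch or none of it. The `encode_batch` and `recover` helpers, the JSON payload, and the frame layout are assumptions for illustration, not a description of any specific storage engine.

```python
import json
import struct
import zlib

# Assumed frame header: 4-byte payload length, 4-byte CRC32 of the payload.
_FRAME = struct.Struct(">II")


def encode_batch(ops) -> bytes:
    """Serialize all writes of one transaction as a single log entry."""
    payload = json.dumps(ops).encode("utf-8")
    return _FRAME.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> dict:
    """Replay complete batches in order; stop at the first incomplete or damaged one."""
    state, offset = {}, 0
    while offset + _FRAME.size <= len(log):
        length, crc = _FRAME.unpack_from(log, offset)
        start = offset + _FRAME.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # the batch was never fully written, so none of it is applied
        for key, value in json.loads(payload):
            state[key] = value
        offset = start + length
    return state


# Two transfers between accounts, each written as one batch.
tx1 = encode_batch([["alice", 50], ["bob", 150]])
tx2 = encode_batch([["alice", 0], ["bob", 200]])
log = tx1 + tx2[:-3]  # simulate a crash while the second batch was being written

print(recover(log))  # {'alice': 50, 'bob': 150}
```

Because the second batch is incomplete, neither of its two updates is applied, and the recovered balances reflect only the first transfer.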