Denny Lee

How Delta Lake Handles Multiple Concurrent Reads and Writes

Delta Lake Optimistic Concurrency Control and Multi-Version Concurrency Control

Previously, we discussed how the Delta Lake transaction log works at a high level; now let’s talk about concurrency! In the previous posts, our examples covered scenarios in which users commit transactions linearly, or at least without conflict. But how does Delta Lake handle multiple concurrent reads and writes? Because Delta Lake is a lakehouse format, the expectation is that multiple users can seamlessly modify a single table at the same time. To do this, Delta Lake employs optimistic concurrency control.

What Is Optimistic Concurrency Control?

Optimistic concurrency control is a method of dealing with concurrent transactions. It makes the assumption that transactions made to a table by different users can complete without conflicting with one another. Typically, when dealing with petabytes of data, users will be working on different parts of the data altogether. This allows them to complete non-conflicting transactions simultaneously.

When working with tables, as long as different clients are modifying different parts of the table (e.g., different partitions) or performing actions that do not conflict (e.g., two clients reading from the table), we can optimistically let them all complete their work, as sketched below. But there are situations where clients modify the same parts of the table concurrently; in those cases, Delta Lake has a protocol to resolve the conflict.
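For example, here is a minimal sketch using the delta-spark Python API. The table path /tmp/delta/events, its date partitioning, and the status column are assumptions made for illustration; the point is that two clients issuing these operations concurrently would not conflict, because each carries a partition predicate over a different partition.

```python
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Build a Delta-enabled SparkSession (standard delta-spark setup).
builder = (
    SparkSession.builder.appName("occ-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Hypothetical Delta table partitioned by `date`.
events = DeltaTable.forPath(spark, "/tmp/delta/events")

# Client A deletes rows in one partition...
events.delete("date = '2024-01-01'")

# ...while client B updates rows in a different partition. Because the two
# operations touch disjoint partitions, Delta Lake can let both commits
# complete without raising a conflict.
events.update(
    condition="date = '2024-01-02'",
    set={"status": "'processed'"},
)
```

The later sketches in this post reuse this `spark` session and table path.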

Solving Conflicts Optimistically and Ensuring Serializability

Another key piece that Delta requires for its consistency guarantees is mutual exclusion: we need to agree on the order of changes, especially when there are multiple writers. This provides a guarantee known in relational databases as serializability; even with concurrent operations, it is possible to replay them as if they had happened in a synchronous, ordered manner.

For example, we have user 1 reading 00000.json:

User 1 reading Delta Lake transaction log

And we have user 2 reading 00001.json:

Both User 1 and User 2 reading Delta Lake transaction log

As both users try to commit 00002.json, user 2 wins and completes its commit.

User 2 commits to the transaction log first

When user 1 attempts to write 00002.json, the write will fail as it already exists.

User 1 fails to write to the transaction log because User 2 committed first

Due to the requirement of mutual exclusion, user 1’s write will fail. The Delta protocol recognizes that the commit failed and reattempts it as a new transaction instead (i.e., 00003.json).
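To make the mutual-exclusion rule concrete, here is a toy commit loop in plain Python. It is not Delta Lake’s actual implementation: it claims the next log version with an exclusive file create and simply retries against the following version when it loses the race, whereas a real writer would also check the winning commits for conflicts before retrying, and an object store would need an atomic put-if-absent primitive rather than local-file semantics.

```python
import json
import os

def commit(log_dir: str, actions: list) -> int:
    """Toy commit loop illustrating mutual exclusion on the transaction log."""
    while True:
        # Find the latest committed version (e.g. 00002.json -> 2).
        versions = [int(name.split(".")[0])
                    for name in os.listdir(log_dir) if name.endswith(".json")]
        next_version = max(versions, default=-1) + 1
        path = os.path.join(log_dir, f"{next_version:05d}.json")
        try:
            # "x" mode creates the file only if it does not already exist,
            # which is atomic on a local filesystem.
            with open(path, "x") as log_file:
                json.dump(actions, log_file)
            return next_version
        except FileExistsError:
            # Another writer won this version (e.g. 00002.json already
            # exists), so reconcile and retry as the next version instead.
            continue
```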

Applying Optimistic Concurrency Control in Delta

The Delta Lake protocol supports ACID transactions by defining how commits are ordered (i.e., serializability in relational databases). This specification defines what to do when two or more transactions are made at the same time. 

Delta Lake handles these cases by implementing a rule of mutual exclusion, then attempting to solve any conflict optimistically. This protocol allows Delta Lake to deliver on the ACID principle of isolation, which ensures that the resulting state of the table after multiple, concurrent writes is the same as if those writes had occurred serially, in isolation from one another.

Optimistic Concurrency Control Scenario

Putting this all together, let’s re-examine this with a slight modification where both users attempt to insert at the same time.

Optimistic concurrency control scenario where two users are concurrently writing to the same Delta table

Let’s break this down through the different time periods:

  • At t0
    • Both users read the schema of the Delta table before attempting to write to the Delta table.
  • At t1
    • Users 1 and 2 both attempt to append some data to the table at the same time. Here, we run into a conflict because only one commit can be recorded in 00001.json.
    • i.e., User 1’s commit succeeded while User 2’s commit failed.
  • At t2
    • Rather than throw an error for User 2, Delta Lake prefers to handle this conflict optimistically. It checks to see whether any new commits have been made to the table, silently updates its view of the table to reflect those changes, then simply retries User 2’s commit on the newly updated table (without any data processing), successfully committing 00002.json (see the sketch after this list).
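From either user’s point of view, both appends simply succeed. Below is a minimal sketch of that scenario, reusing the hypothetical table and Delta-enabled `spark` session from the earlier sketch; the two-column schema and the thread-based “users” are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

def append_rows(user: str) -> None:
    # Each "user" blindly appends new rows; an append does not depend on the
    # data it read, so Delta Lake can retry the losing commit transparently.
    df = spark.createDataFrame([(user, "2024-01-02")], ["status", "date"])
    df.write.format("delta").mode("append").save("/tmp/delta/events")

# Run the two appends concurrently. Only one commit can win the next log
# version; the other is silently retried against the updated table and lands
# in the version after that.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(append_rows, "user1"),
               pool.submit(append_rows, "user2")]
    for future in futures:
        future.result()  # raises only if an irreconcilable conflict occurred
```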

Discussion

In most cases, this reconciliation happens silently, seamlessly, and successfully. However, in the event that there is an irreconcilable problem that cannot be solved optimistically (e.g., if User 1 deletes a file that User 2 already deleted), the only option is to throw an error.
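That error surfaces to the application as a concurrent-modification exception it can catch and handle. Here is a minimal sketch, assuming a recent delta-spark release that exposes the conflict exception classes under `delta.exceptions`, and reusing the hypothetical table from above.

```python
from delta.exceptions import ConcurrentDeleteDeleteException
from delta.tables import DeltaTable

events = DeltaTable.forPath(spark, "/tmp/delta/events")

try:
    # If another writer has already deleted the very files this delete would
    # remove, Delta Lake cannot reconcile the two commits optimistically.
    events.delete("date = '2024-01-01'")
except ConcurrentDeleteDeleteException:
    # It is up to the application to recover, e.g. re-read the table and
    # decide whether the delete is still needed.
    pass
```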

As a final note, because all Delta transactions are written directly to storage, this process also satisfies the ACID property of durability, meaning the committed changes persist even in the event of system failure.

Multi-version Concurrency Control

Within the file system, Delta’s transactions use Multi-Version Concurrency Control (MVCC), a method commonly used by relational database management systems to provide concurrent access to the database (and its tables). Because Delta Lake’s data objects and log entries are immutable, Delta Lake uses MVCC both to protect existing data (i.e., to provide transactional guarantees between writes) and to speed up query and write performance. It also has the benefit of making it straightforward to query a past snapshot of the data, as is common in MVCC implementations.
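For instance, past snapshots remain addressable by version or by timestamp. A minimal sketch, again reusing the hypothetical table path from above (the version number and timestamp are placeholders):

```python
# Read the current snapshot of the table...
current = spark.read.format("delta").load("/tmp/delta/events")

# ...read it as of an earlier version of the transaction log...
as_of_v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/events")

# ...or as of a point in time.
as_of_ts = (
    spark.read.format("delta")
    .option("timestampAsOf", "2024-01-01 00:00:00")
    .load("/tmp/delta/events")
)
```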

Under this mechanism, writes operate in three stages: read, write, and validate/commit.

MVCC: Read

First, the system reads the latest available version [1] of the table to identify which rows (and their files) need to be modified [2].

Delta Lake MVCC: Read Stage

MVCC: Write

Next, Delta stages all of the changes by writing new data files (e.g., 3.parquet). All changes, whether inserts or modifications, take the form of new files.

Delta Lake MVCC: Write Stage

Note that while the data file (3.parquet) was created in this stage, the next transaction log entry (00001.json) has not been created yet. Therefore, if any user reads the table at this moment, because that log entry does not exist, a reader following the Delta protocol will not see 3.parquet.
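To see why, here is a toy reader in plain Python (ignoring checkpoints and many other details of the real protocol) that derives the set of live data files purely from the committed JSON entries in `_delta_log`. A staged file such as 3.parquet that no committed `add` action references is simply never returned.

```python
import json
import os

def visible_files(table_dir: str) -> set:
    """Toy reader: replay committed log entries to find the live data files."""
    log_dir = os.path.join(table_dir, "_delta_log")
    live = set()
    # Replay commits in version order (00000.json, 00001.json, ...).
    for name in sorted(f for f in os.listdir(log_dir) if f.endswith(".json")):
        with open(os.path.join(log_dir, name)) as log_file:
            for line in log_file:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live
```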

MVCC: Validate and Commit

Before committing the changes, Delta checks whether the proposed changes conflict with any other changes that may have been concurrently committed since the snapshot that was read.

Delta Lake MVCC: Validate and Commit

If there are no conflicts, all the staged changes are committed as a new versioned snapshot, and the write operation succeeds. That is, a new transaction log entry (00001.json) is created.

However, if there are conflicts, the write operation fails with a concurrent modification exception rather than corrupting the table, as concurrent writes would with a plain Parquet table.

