The previous post, What Is the Delta Lake Transaction Log?, described how Delta Lake provides atomicity at the conceptual level. In this post, we will dive deeper into how it works at the file level. This will provide a primer to better understand how Delta provides atomicity and multi-version concurrency control.
Delta Lake table file layout
When a Delta table is created, that table’s transaction log is automatically created in the _delta_log subdirectory. As changes are made to the table, the operations are recorded as ordered atomic commits in the transaction log. Each commit is written out as a JSON file, starting with 000...000000.json. Additional changes to the table generate subsequent JSON files in ascending numerical order so that the next commit is written out as 000...000001.json, 000...000002.json, and so on. Each numeric JSON file increment represents a new version of the table.
Note that the structure of the data files themselves has not changed; they remain ordinary parquet files generated by whichever query engine or language writes to the Delta table. If your table uses Hive-style partitioning, you retain the same directory structure.
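For concreteness, here is a minimal sketch of creating a Delta table and inspecting the layout on disk. It assumes an existing SparkSession with the Delta Lake extensions enabled and uses a hypothetical local path, my_table; the listings in the comments are illustrative.

```python
import os

# Assumes an existing SparkSession (`spark`) configured with Delta Lake.
# `my_table` is a hypothetical local path used only for illustration.
spark.range(0, 5).write.format("delta").save("my_table")

print(sorted(os.listdir("my_table")))
# e.g. ['_delta_log', 'part-00000-....snappy.parquet', ...]

print(sorted(os.listdir("my_table/_delta_log")))
# e.g. ['00000000000000000000.json']  <- the first commit of the table
```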
In a subsequent blog, we will discuss how Delta Lake provides liquid clustering to simplify table operations while improving query performance.

Implementing Atomicity
The Delta transaction log is the single source of truth for your Delta table, so any client that wants to read or write the table must first query the transaction log. For example, when we insert data while creating our Delta table, we initially generate two parquet files: 1.parquet and 2.parquet. This event is automatically recorded in the transaction log and saved to disk as commit 000...000000.json.
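Because each commit is just a newline-delimited JSON file, you can inspect it directly. Here is a minimal sketch, reusing the hypothetical paths from the example above; the action field names (protocol, metaData, add, and so on) follow the Delta Lake protocol, but the values shown in the comments are illustrative.

```python
import json

# Each line of a commit file is a single JSON action.
with open("my_table/_delta_log/00000000000000000000.json") as f:
    for line in f:
        print(json.loads(line))

# Abbreviated example of the actions recorded in commit 000...000000.json:
#   {"commitInfo": {"operation": "WRITE", ...}}
#   {"protocol":   {"minReaderVersion": ..., "minWriterVersion": ...}}
#   {"metaData":   {"id": "...", "schemaString": "...", ...}}
#   {"add": {"path": "1.parquet", "size": ..., "dataChange": true, ...}}
#   {"add": {"path": "2.parquet", "size": ..., "dataChange": true, ...}}
```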

In a subsequent command, we run a DELETE operation that removes rows from the table. Instead of modifying the existing parquet files (1.parquet, 2.parquet), Delta writes a third parquet file (3.parquet) containing the rows that remain.

It is often faster to write new files containing the unaffected rows than to modify the existing parquet files in place. This approach also enables multi-version concurrency control (MVCC), a database technique that keeps multiple copies of the data so it can be safely read and updated concurrently; it is also what allows Delta Lake to provide time travel. By writing new files for these actions, Delta Lake gains atomicity, MVCC, and speed.
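As a sketch of what this looks like in practice, the snippet below runs a DELETE through the DeltaTable API from the delta-spark package, using the same hypothetical table path as above; the predicate and the commit contents shown in the comments are illustrative.

```python
from delta.tables import DeltaTable

# Delete some rows; Delta rewrites the unaffected rows into a new parquet file
# (3.parquet in the example above) rather than editing 1.parquet / 2.parquet.
dt = DeltaTable.forPath(spark, "my_table")
dt.delete("id < 2")   # predicate is illustrative

# The resulting commit, 000...000001.json, records the whole change atomically.
# Abbreviated example of its actions:
#   {"commitInfo": {"operation": "DELETE", ...}}
#   {"remove": {"path": "1.parquet", "dataChange": true, ...}}
#   {"remove": {"path": "2.parquet", "dataChange": true, ...}}
#   {"add":    {"path": "3.parquet", "dataChange": true, ...}}
```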
MVCC file and data observations
The remove and add actions for these three parquet files are wrapped in a single transaction recorded in the Delta transaction log as 000...000001.json. Some important observations concerning atomicity:
- If a user were to read the parquet files without consulting the Delta transaction log, they would read duplicate rows, because the surviving rows exist in all three files (1.parquet, 2.parquet, 3.parquet).
- The remove and add actions are wrapped in the single commit 000...000001.json. When a client queries the Delta table at this point, it reads both of these actions and resolves the file paths for that snapshot; for this transaction, the file paths point only to 3.parquet.
- The remove action is a soft delete, or tombstone: the physical removal of the files (1.parquet, 2.parquet) has not happened yet. The files are physically removed when the VACUUM command is run.
- The previous commit, 000...000000.json, has file paths pointing to the original files (1.parquet, 2.parquet). Thus, when querying an older version of the Delta table via time travel, the transaction log points to the files that make up that older snapshot (see the sketch after this list).
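Putting the last two observations together, here is a minimal sketch of reading an older snapshot and then physically removing tombstoned files. It assumes Spark's versionAsOf time-travel option and the DeltaTable vacuum API, and reuses the hypothetical table path from the earlier examples.

```python
from delta.tables import DeltaTable

# Time travel: version 0 resolves to the files listed in 000...000000.json
# (1.parquet and 2.parquet in the example above).
v0 = spark.read.format("delta").option("versionAsOf", 0).load("my_table")
v0.show()

# VACUUM physically deletes files that were tombstoned by remove actions and
# fall outside the retention period (default 7 days; 168 hours shown here).
DeltaTable.forPath(spark, "my_table").vacuum(168)
```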
Delta Lake also has Deletion Vectors, which help make deletes/updates/merges even faster.
Addendum
To dive into this further, enjoy Diving into Delta Lake: Unpacking the Transaction Log v2. Burak and I also have a fun gag/blooper around time travel, since it was a virtual conference that year.
