The previous post, What Is the Delta Lake Transaction Log?, described how Delta Lake provides atomicity at the conceptual level. In this post, we will dive deeper into how it works at the file level. This will provide a primer to better understand how Delta provides atomicity and multi-version concurrency control.
Delta Lake table file layout
When a Delta table is created, that table’s transaction log is automatically created in the _delta_log subdirectory. As changes are made to the table, the operations are recorded as ordered atomic commits in the transaction log. Each commit is written out as a JSON file, starting with 000...000000.json. Additional changes to the table generate subsequent JSON files in ascending numerical order so that the next commit is written out as 000...000001.json, 000...000002.json, and so on. Each numeric JSON file increment represents a new version of the table.
Note that the structure of the data files themselves has not changed; they remain ordinary parquet files generated by whichever query engine or language writes to the Delta table. If your table uses Hive-style partitioning, you retain the same directory structure.
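For concreteness, here is a minimal sketch of creating a Delta table and inspecting the layout on disk. It assumes an existing SparkSession with the Delta Lake extensions enabled and uses a hypothetical local path, my_table; the listings in the comments are illustrative.

```python
import os

# Assumes an existing SparkSession (`spark`) configured with Delta Lake.
# `my_table` is a hypothetical local path used only for illustration.
spark.range(0, 5).write.format("delta").save("my_table")

print(sorted(os.listdir("my_table")))
# e.g. ['_delta_log', 'part-00000-....snappy.parquet', ...]

print(sorted(os.listdir("my_table/_delta_log")))
# e.g. ['00000000000000000000.json']  <- the first commit of the table
```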
In a subsequent blog, we will discuss how Delta Lake provides liquid clustering to simplify table operations while improving query performance.

Implementing Atomicity
The Delta transaction log is the single source of truth for your Delta table, so any client that wants to read or write the table must first query the transaction log. For example, when we insert data while creating our Delta table, we initially generate two parquet files: 1.parquet and 2.parquet. This event is automatically recorded in the transaction log and saved to disk as commit 000...000000.json.
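Because each commit is just a newline-delimited JSON file, you can inspect it directly. Here is a minimal sketch, reusing the hypothetical paths from the example above; the action field names (protocol, metaData, add, and so on) follow the Delta Lake protocol, but the values shown in the comments are illustrative.

```python
import json

# Each line of a commit file is a single JSON action.
with open("my_table/_delta_log/00000000000000000000.json") as f:
    for line in f:
        print(json.loads(line))

# Abbreviated example of the actions recorded in commit 000...000000.json:
#   {"commitInfo": {"operation": "WRITE", ...}}
#   {"protocol":   {"minReaderVersion": ..., "minWriterVersion": ...}}
#   {"metaData":   {"id": "...", "schemaString": "...", ...}}
#   {"add": {"path": "1.parquet", "size": ..., "dataChange": true, ...}}
#   {"add": {"path": "2.parquet", "size": ..., "dataChange": true, ...}}
```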

In a subsequent command, we run a DELETE operation that removes rows from the table. Instead of modifying the existing parquet files (1.parquet, 2.parquet), Delta writes a third parquet file (3.parquet) containing the rows that remain.

It is often faster to write new files containing the unaffected rows than to modify the existing parquet files in place. This approach also enables multi-version concurrency control (MVCC), a database technique that keeps multiple copies of the data so it can be safely read and updated concurrently; it is also what allows Delta Lake to provide time travel. By writing new files for these actions, Delta Lake gains atomicity, MVCC, and speed.
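As a sketch of what this looks like in practice, the snippet below runs a DELETE through the DeltaTable API from the delta-spark package, using the same hypothetical table path as above; the predicate and the commit contents shown in the comments are illustrative.

```python
from delta.tables import DeltaTable

# Delete some rows; Delta rewrites the unaffected rows into a new parquet file
# (3.parquet in the example above) rather than editing 1.parquet / 2.parquet.
dt = DeltaTable.forPath(spark, "my_table")
dt.delete("id < 2")   # predicate is illustrative

# The resulting commit, 000...000001.json, records the whole change atomically.
# Abbreviated example of its actions:
#   {"commitInfo": {"operation": "DELETE", ...}}
#   {"remove": {"path": "1.parquet", "dataChange": true, ...}}
#   {"remove": {"path": "2.parquet", "dataChange": true, ...}}
#   {"add":    {"path": "3.parquet", "dataChange": true, ...}}
```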
MVCC file and data observations
The remove and add actions for these three parquet files are wrapped in a single transaction recorded in the Delta transaction log as 000...000001.json. Some important observations concerning atomicity:
- If a user were to read the parquet files without consulting the Delta transaction log, they would read duplicate rows, because the surviving rows exist in all three files (1.parquet, 2.parquet, 3.parquet).
- The remove and add actions are wrapped in the single commit 000...000001.json. When a client queries the Delta table at this point, it reads both of these actions and resolves the file paths for that snapshot; for this transaction, the file paths point only to 3.parquet.
- The remove action is a soft delete, or tombstone: the physical removal of the files (1.parquet, 2.parquet) has not happened yet. The files are physically removed when the VACUUM command is run.
- The previous commit, 000...000000.json, has file paths pointing to the original files (1.parquet, 2.parquet). Thus, when querying an older version of the Delta table via time travel, the transaction log points to the files that make up that older snapshot (see the sketch after this list).
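Putting the last two observations together, here is a minimal sketch of reading an older snapshot and then physically removing tombstoned files. It assumes Spark's versionAsOf time-travel option and the DeltaTable vacuum API, and reuses the hypothetical table path from the earlier examples.

```python
from delta.tables import DeltaTable

# Time travel: version 0 resolves to the files listed in 000...000000.json
# (1.parquet and 2.parquet in the example above).
v0 = spark.read.format("delta").option("versionAsOf", 0).load("my_table")
v0.show()

# VACUUM physically deletes files that were tombstoned by remove actions and
# fall outside the retention period (default 7 days; 168 hours shown here).
DeltaTable.forPath(spark, "my_table").vacuum(168)
```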
Delta Lake also has Deletion Vectors, which help make deletes/updates/merges even faster.
Addendum
To dive into this further, enjoy Diving into Delta Lake: Unpacking the Transaction Log v2. Burak and I also have a fun gag/blooper around time travel, since it was a virtual conference that year.
