ARCHIVE NOTICE

My website can still be found at industrialcuriosity.com, but I have not been posting on this blog as I've been primarily focused on therightstuff.medium.com - please head over there and take a look!

Friday, 2 April 2021

Simple safe (atomic) writes in Python3

 In sensitive circumstances, trusting a traditional file write can be a costly mistake - a simple power cut before the write is completed and synced may at best leave you with some corrupt data, but depending on what that file is used for you could be in for some serious trouble.

While there are plenty of interesting, weird, or over-engineered solutions available to ensure safe writing, I struggled to find a solution online that was simple, correct and easy-to-read and that could be run without installing additional modules, so my teammates and i came up with the following solution:

Explanation:

temp_file = tempfile.NamedTemporaryFile(delete=False,
                                        dir=os.path.dirname(target_file_path))

The first thing to do is create a temporary file in the same directory as the file we're trying to create or update. We do this because move operations (which we'll need later) aren't guaranteed to be atomic when they're between different file systems. Additionally, it's import to set delete=False as the standard behaviour of the NamedTemporaryFile is to delete itself as soon as it's not in use.

# preserve file metadata if it already exists
if os.path.exists(target_file_path):
    copyWithMetaData(target_file_path, temp_file.name)

We needed to support both file creation and updates, so in the case that we’re overwriting or appending to an existing file, we initialize the temporary file with the target file’s contents and metadata.

with open(temp_file.name, mode) as f:
    f.write(file_contents)
    f.flush()
    os.fsync(f.fileno())

Here we write or append the given file contents to the temporary file, and we flush and sync to disk manually to prepare for the most critical step:

os.replace(temp_file.name, target_file_path)

This is where the magic happens: os.replace is an atomic operation (when the source and target are on the same file system), so we're now guaranteed that if this fails to complete, no harm will be done.

We use the finally clause to remove the temporary file in case something did go wrong along the way, but now the very worst thing that can happen is that we end up with a temporary file ¯\_(ツ)_/¯


No comments:

Post a Comment