Log-Structured File Systems

Computers Circa 1991

Disk bandwidth is improving rapidly, meaning the operating system can stream reads or writes to the disk faster.
Computers have more memory. Up to 128 MB! (Whoa.)
And, alas, disk seek times are … still dog slow!

Using What We Got

So, if we can solve this pesky seek issue we can utilize growing disk bandwidth to improve file system performance.

Oh, by the way: we’ve got a bunch of spare memory lying around. Might that be useful?

Use a Cache!

How do we make a big slow thing look faster?

Use a cache! Or, put a smaller, faster thing in front of it.

In the case of the file system the smaller, faster thing is memory.
We call the memory used to cache file system data the buffer cache.

A New Hope

With a large cache, we should be able to avoid doing almost any disk reads.
But we will still have to do disk writes, but the cache will still help collect small writes in memory until we can do one larger write.

Forget for a minute everything you’ve learned about file system design and answer the following question:

What’s the best way to avoid seeks when writing?

Write everything to the same place!

Log Structured File Systems

Invented and implemented at Stanford by then-faculty John Ousterhout and now-faculty Mendel Rosenblum.

Main idea: all writes go to an append-only log.

Great… um… how do we do that?

A Normal Write

Example: let’s say we want to change an existing byte in a file.

What would we normally do?

Seek to read the inode map.
Seek to read the inode.
Seek to write (modify) the data block.
Seek to write (update) the inode.

normalseeks 1

normalseeks 2

normalseeks 3

normalseeks 4

A Cached-Read Write

Let’s assume that our big friendly cache is going to soak up the reads.

Now what happens?

*Seek to read the inode map.*
*Seek to read the inode.*
Seek to write (modify) the data block.
Seek to write (update) the inode.

cachedseeks 1

cachedseeks 2

An LFS Write

lfsseeks 1

lfsseeks 2

lfsseeks 3

lfsseeks 4

lfsseeks 5

lfsseeks 6

I See What You Did There

Elegant solution! Reads are handled by the cache. Writes can stream to the disk at full bandwidth due to short seeks to append to the log.

But some thorny details to address.

When Do Writes Happen?

Want to stream as many writes to disk together as possible to maximize usage of disk bandwidth.

When do we write to the log?

When the user calls sync, fsync or when blocks are evicted from the buffer cache.

Locating LFS inodes

How did FFS translate an inode number to a disk block?

It stored the inode map in a fixed location on disk.

Why is this a problem for LFS?

inodes are just appended to the log and so they can move!

And so what do you think that LFS does about this?

Yep, it logs the inode map.

LFS metadata

What about file system metadata: inode and data block allocation bitmaps, etc.?

We can log that stuff too!

As the Log Turns

What happens when the log reaches the end of the disk?

Probably a lot of unused space earlier in the log due to overwritten inodes, data blocks, etc.

How do we reclaim this space?

Clean the log by identifying empty space and compacting used blocks.

Conceptually you can think of this happening across the entire disk all at once, but for performance reasons LFS divides the disk into segments which are cleaned separately.

LFS Cleaning

logcleaning 1

logcleaning 2

logcleaning 3

logcleaning 4

The Devil’s in the Cleaning

LFS seems like an incredibly great idea… until you start thinking about cleaning.
(Then it becomes a debatably great idea…)

Cleaning Questions

When should we run the cleaner?

Probably when the system is idle, which may be a problem on systems that don’t idle much.

What size segments should we clean?

Large segments amortize the cost to read and write all of the data necessary to clean the segment.
…but small segments increase the probability that all blocks in a segment will be "dead", making cleaning trivial.

What other effect does log cleaning have?

Cleaner overhead is very workload-dependent, making it difficult to reason about the performance of log-structure file system. (And easy to fight about their performance!)

Reading Questions

Let’s say that the cache does not soak up as many reads as we were hoping.

What problem can LFS create?

Block allocation is extremely discontiguous, meaning that reads may seek all over the disk.

People Care About This Stuff

1991: original LFS paper by Ousterhout and Rosenblum.

1993: reimplementation by Margo Seltzer questions performance improvements:

"Unfortunately, an enhanced version of FFS (with read and write clustering) provides comparable and sometimes superior performance to our LFS."

Ousterhout responds to the '93 critique and complains of "poor BSD-LFS implementation", "poor benchmark choice", and "poor analysis."

1995: a second paper by Margo Seltzer again questions LFS performance claims.

For small files, both systems provide comparable read performance, but LFS offers superior performance on writes. For large files (one megabyte and larger), the performance of the two file systems is comparable. When FFS is tuned for writing, its large-file write performance is approximately 15% better than LFS, but its read performance is 25% worse. When FFS is optimized for reading, its large-file read and write performance is comparable to LFS.

Ousterhout describes the '95 analysis as improved but states that "the new paper is still misleading in several ways".

Why Read A Research Paper?

Researchers have some great ideas about how to improve computer systems!
- Many times the design principles apply outside of the specific project described in the paper.
Both academic and industrial research labs publish papers.
- Frequently the best/only way to find out details about exciting production systems in use by companies like Google, Microsoft, Facebook, etc.
Because reading the code takes way too long!

Technology Trending

Forget the feature factory. A lot of research in computer systems is driven by hardware changes and other technology trends which both expose problems with existing systems and create new opportunities for better systems.

Significant technology trends over the past decade

End of Moore’s Law scaling and the rise of multicore processors.
Integration of Flash into the storage hierarchy.
Cloud computing!
Prevalent and powerful mobile battery-powered devices such as smartphones.
More users interacting with multiple devices: smartphones, tablets, laptops, desktops.
"Smart dust" and the "Internet of Things".
Fast networks everywhere.
1,000 things I’ve missed…