File System Caching and Consistency
Making File Systems Fast
-
Use a cache! (Put a smaller, faster thing in front of it.)
Putting Spare Memory To Work
-
as memory (duh), but also
-
to cache file data in order to improve performance.
-
Big buffer cache, small main memory:
-
Small buffer cache, large main memory:
Above the File System
-
Entire files and directories!
-
open, close, read, write. (Same as the file system call interface.)
Above the File System: Operations
open
-
Pass down to underlying file system.
read
-
If file is not in the buffer cache, pass down to underlying file system and load contents into the buffer cache.
-
If the file is in the cache, return the cached contents.
write
-
If file is not in the buffer cache, load contents into the buffer cache and then modify them.
-
If the file is in the cache, modify the cached contents.
close
-
Remove from the cache (if necessary) and flush contents through the file system.
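A minimal sketch of the four operations above, assuming a toy "file system" that is just a dictionary from path to bytes. The class name FileLevelCache, the dirty set, and the offset-based write signature are all hypothetical choices made for illustration, not part of any real interface.

```python
class FileLevelCache:
    """Caches whole files above a stand-in file system (a dict: path -> bytes)."""

    def __init__(self, fs):
        self.fs = fs          # hypothetical underlying file system
        self.cache = {}       # path -> bytearray holding the cached contents
        self.dirty = set()    # paths modified in the cache but not yet flushed

    def open(self, path):
        # Pass down to the underlying file system (here: create if missing).
        self.fs.setdefault(path, b"")

    def _load(self, path):
        # On a miss, pass down to the file system and load the contents.
        if path not in self.cache:
            self.cache[path] = bytearray(self.fs[path])
        return self.cache[path]

    def read(self, path):
        return bytes(self._load(path))

    def write(self, path, offset, data):
        buf = self._load(path)
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data        # modify the cached copy only
        self.dirty.add(path)

    def close(self, path):
        # Flush contents through the file system if modified, then drop the entry.
        if path in self.dirty:
            self.fs[path] = bytes(self.cache[path])
            self.dirty.discard(path)
        self.cache.pop(path, None)
```

Note that between write and close the underlying file system never sees the modification, which is exactly the hiding effect discussed under the pros and cons.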
Above the File System: Pros and Cons
-
Buffer cache sees file operations, may lead to better prediction or performance.
-
Hides many file operations from the file system, preventing it from providing consistency guarantees.
-
Can’t cache file system metadata: inodes, superblocks, etc.
Below the File System
-
Disk blocks!
-
readblock, writeblock. (Same as the disk interface.)
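A rough sketch of a cache that sits below the file system and exposes the same readblock/writeblock interface as the disk. The Disk and BlockCache classes, the 512-byte BLOCK_SIZE, the LRU eviction policy, and the choice to write dirty blocks only on eviction are all assumptions made for this example.

```python
from collections import OrderedDict

BLOCK_SIZE = 512  # assumed block size for this sketch


class Disk:
    """Stand-in disk that counts how often it is actually touched."""

    def __init__(self, nblocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(nblocks)]
        self.reads = 0
        self.writes = 0

    def readblock(self, n):
        self.reads += 1
        return self.blocks[n]

    def writeblock(self, n, data):
        assert len(data) == BLOCK_SIZE
        self.writes += 1
        self.blocks[n] = bytes(data)


class BlockCache:
    """Same interface as Disk, but serves hits from memory (LRU eviction)."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.entries = OrderedDict()  # block number -> (data, dirty flag)

    def _insert(self, n, data, dirty):
        self.entries[n] = (data, dirty)
        self.entries.move_to_end(n)  # mark as most recently used
        while len(self.entries) > self.capacity:
            victim, (vdata, vdirty) = self.entries.popitem(last=False)
            if vdirty:
                self.disk.writeblock(victim, vdata)  # flush dirty block on eviction

    def readblock(self, n):
        if n in self.entries:
            self.entries.move_to_end(n)
            return self.entries[n][0]
        data = self.disk.readblock(n)
        self._insert(n, data, dirty=False)
        return data

    def writeblock(self, n, data):
        self._insert(n, bytes(data), dirty=True)
```

Because the cache only ever sees block numbers, it works for inodes and superblocks just as well as for file data, but it has no way to tell which blocks belong to the same file.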
Below the File System: Pros and Cons
-
Can cache all blocks including file system data structures, inodes, superblocks, etc.
-
Allows file system to see all file operations even if they eventually hit the cache.
-
Cannot observe file semantics or relationships.
Caching and Consistency
-
Objects in the cache are lost on failures!
How Caching Exacerbates Consistency Problems
Observation: file system operations that modify multiple blocks may leave the file system in an inconsistent state if partially completed.
-
Caching may increase the time span between when the first write of an operation hits the disk and when the last is completed.
What Can Go Wrong?
What kinds of inconsistency can take place if the system is interrupted between the multiple operations necessary to complete a write?
-
Allocate an inode and mark it as used in the inode bitmap.
-
Allocate data blocks and mark them as used in the data block bitmap.
-
Associate data blocks with the file by modifying the inode.
-
Add inode to the given directory by modifying the directory file.
-
Write data blocks.
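To make the failure cases concrete, the sketch below applies the five steps in the order listed and "crashes" after the first k of them, then runs a simple checker. The dictionary-based disk layout, the function names, and the inode and block numbers (7 and 42) are invented for illustration; a real file system could order these writes differently.

```python
def empty_disk():
    return {
        "inode_bitmap": set(),  # inode numbers marked in use
        "block_bitmap": set(),  # data block numbers marked in use
        "inodes": {},           # inode number -> list of data block numbers
        "root": {},             # a single directory: name -> inode number
        "blocks": {},           # data block number -> contents
    }


def create_file(disk, name, data, crash_after):
    """Apply the five steps in order, stopping after `crash_after` of them."""
    ino, blk = 7, 42  # arbitrary numbers picked for this example
    steps = [
        lambda: disk["inode_bitmap"].add(ino),        # allocate inode
        lambda: disk["block_bitmap"].add(blk),        # allocate data block
        lambda: disk["inodes"].update({ino: [blk]}),  # attach block to inode
        lambda: disk["root"].update({name: ino}),     # add directory entry
        lambda: disk["blocks"].update({blk: data}),   # write the data block
    ]
    for step in steps[:crash_after]:
        step()


def check(disk):
    """Return a list of inconsistencies found on the simulated disk."""
    problems = []
    referenced = set(disk["root"].values())
    for ino in disk["inode_bitmap"] - referenced:
        problems.append(f"inode {ino} marked used but not in any directory")
    owned = {b for blks in disk["inodes"].values() for b in blks}
    for blk in disk["block_bitmap"] - owned:
        problems.append(f"block {blk} marked used but owned by no inode")
    for blk in owned - set(disk["blocks"]):
        problems.append(f"block {blk} belongs to a file but was never written")
    return problems


for k in range(6):
    d = empty_disk()
    create_file(d, "a.txt", b"hi", crash_after=k)
    print(k, check(d))
```

Only a crash before the first step or after the last one leaves this toy disk clean; every intermediate point leaks an inode, leaks a block, or exposes a file whose data was never written.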
Maintaining File System Consistency
-
Don’t buffer writes!
-
We call this a write-through cache because writes pass straight through the cache to the disk instead of being held in it.
-
Buffer all operations until blocks are evicted.
-
We call this a write back cache.
-
performance?
-
safety?
-
Write important file system metadata structures (superblock, inode maps, bitmaps, etc.) immediately, but delay data writes.
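The trade-off between the two policies can be seen in a small sketch. The PolicyCache class, its disk_writes counter, and the crash method are hypothetical and exist only to make the comparison measurable; the "disk" is a plain dictionary.

```python
class PolicyCache:
    """Toy cache over a dict-backed disk with a selectable write policy."""

    def __init__(self, disk, write_through):
        self.disk = disk                    # dict: block number -> data
        self.write_through = write_through  # True: write-through, False: write-back
        self.blocks = {}                    # cached copies
        self.dirty = set()                  # cached blocks newer than the disk
        self.disk_writes = 0

    def write(self, n, data):
        self.blocks[n] = data
        if self.write_through:
            self._flush(n)      # every write goes to disk immediately
        else:
            self.dirty.add(n)   # defer until sync (or eviction, not modeled here)

    def _flush(self, n):
        self.disk[n] = self.blocks[n]
        self.disk_writes += 1
        self.dirty.discard(n)

    def sync(self):
        for n in sorted(self.dirty):
            self._flush(n)

    def crash(self):
        # Everything held only in memory is lost.
        self.blocks.clear()
        self.dirty.clear()


wt = PolicyCache({}, write_through=True)
wb = PolicyCache({}, write_through=False)
for i in range(100):
    wt.write(0, i)
    wb.write(0, i)
wb.sync()
print(wt.disk_writes, wb.disk_writes)
```

Repeated writes to the same block cost one disk write each under write-through but collapse into a single disk write under write-back. The price is that a crash before sync loses every buffered update.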
Journaling
-
Track pending changes to the file system in a special area on disk called the journal.
-
Following a failure, replay the journal to bring the file system back to a consistent state.
Journaling: Checkpoints
-
Update the journal!
-
This is called a checkpoint.
Journaling: Recovery
-
Start at the last checkpoint and work forward, updating on-disk structures as needed.
-
Incomplete journal entries are ignored, as applying them may leave the file system in an inconsistent state.
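Checkpointing and recovery can be sketched together under some simplifying assumptions: the journal is an append-only list of tuples that survives crashes, each change belongs to a transaction bracketed by begin and commit records, and checkpoints are only taken when no transaction is open. The JournaledFS class and the record formats are invented for this example and do not describe any particular file system.

```python
class JournaledFS:
    def __init__(self):
        self.disk = {}      # on-disk structures: name -> value (survives crashes)
        self.journal = []   # append-only journal (assumed to survive crashes)
        self.cache = {}     # buffered updates (lost on a crash)
        self.next_tx = 0

    def begin(self):
        tx = self.next_tx
        self.next_tx += 1
        self.journal.append(("begin", tx))
        return tx

    def update(self, tx, key, value):
        self.journal.append(("update", tx, key, value))  # log the change first
        self.cache[key] = value                          # then buffer it in memory

    def commit(self, tx):
        self.journal.append(("commit", tx))

    def checkpoint(self):
        # Flush buffered updates to disk, then record that fact in the journal.
        self.disk.update(self.cache)
        self.cache.clear()
        self.journal.append(("checkpoint",))

    def crash(self):
        self.cache.clear()

    def recover(self):
        # Start just after the last checkpoint and work forward.
        last = max(
            (i for i, rec in enumerate(self.journal) if rec[0] == "checkpoint"),
            default=-1,
        )
        tail = self.journal[last + 1:]
        committed = {rec[1] for rec in tail if rec[0] == "commit"}
        for rec in tail:
            # Updates from transactions without a commit record are skipped.
            if rec[0] == "update" and rec[1] in committed:
                self.disk[rec[2]] = rec[3]


fs = JournaledFS()
t = fs.begin(); fs.update(t, "inode7", "allocated"); fs.commit(t)
fs.checkpoint()
t = fs.begin(); fs.update(t, "dir:/a.txt", 7); fs.commit(t)
t = fs.begin(); fs.update(t, "inode8", "allocated")  # crash before commit
fs.crash()
fs.recover()
print(fs.disk)
```

After recovery the committed directory update is replayed from the journal, while the half-finished allocation of inode8 is dropped because its transaction never reached a commit record.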
Journaling: Implications
Observation: metadata updates (allocate inode, free data block, add to directory, etc.) can be represented compactly and probably written to the journal atomically.
-
We could include data blocks in the journal, meaning that each data block would potentially be written twice (ugh).
-
We could exclude data blocks from the journal, meaning that file system structures are kept consistent but file data is not protected.