Today
-
Hierarchical file systems.
-
On-disk data structures.
ASST3.1
Checkpoint
-
If you have not finished
ASST3.1
, you’re behind.
ASST3.1
is due today at 5PM. No extensions.
UNIX File Interface
-
open("foo")
: "I’d like to use the file named foo." -
close("foo")
: "I’m finished with foo."
-
read(2)
: "I’d like to perform a read from file handle 2 at the current position." -
write(2)
: "I’d like to perform a write from file handle 2 at the current position."
-
lseek(2, 100)
: "Please move my saved position for file handle 2 to position 100.
What’s Missing?
-
dup2
is really about manipulating a processes names for files and doesn’t have much to do with the file system.
Files Together: File Organization
Each file has to have a unique name. No problem!
-
Letter to Mom:
LetterToMom.txt
-
Letter to Suzanna:
LetterToSuzanna.txt
-
Letter to Chuchu:
WoofWoofWagWag.txt
-
Another letter to Suzanna:
AnotherLetterToSuzanna.txt
-
A third letter to Suzanna:
LetterToSuzanna22Mar2012.txt
Flat name spaces were actually used by some early file systems but naming gets gross fast…
Hierarchical Implications
Big idea: don’t look at everything all at once. Allows users to store and examine related files together.
-
letters/Mom/Letter.txt
-
letters/Chuchu/WoofWoofWagWag.txt
-
letters/Suzanna/Letters/1.txt
-
letters/Suzanna/Letters/2.txt
-
letters/Suzanna/Letters/2.txt
Each file should be stored in one place. (Although we’ll discuss exceptions to this rule.)
Location Implications
-
Location requires navigation and relative navigation is useful, meaning that locations (directories) can include pointers to other locations (other directories).
-
Finally, location is only meaningful if it is tied to a files name, so hierarchical file systems implement name spaces, which require that a file’s name map to a single unique location within the file system.
Why Trees?
File systems usually require that files be organized into an acyclic graph with a single root, also known as a tree.
Why?
-
What is the name of the file in the diagram below?
-
OK, I picked a root. What is the name of the file now?
-
OK, I eliminated the cycles. What is the name of the file now?
Tree Naming
Trees produce a single canonical name for each file on the system as well as an infinite number of relative names:
-
Canonical name: /you/used/to/love/well
-
Relative name: /you/used/to/love/me/../well
-
Relative name: love/me/../../love/me/../well
Sestina, by Clara Shuttleworth
You used to love me well.
Well, you— me— used love to…
to… well… love. You used me.
Me, too, used… well… you. Love,
love me. You, too well used,
used love well. Me, Too. You!
You used to love me well.
Files: Questions?
File System Design Goals
-
Efficiently translate file names to file contents.
-
Allow files to move, grow, shrink and otherwise change.
-
Optimize access to single files.
-
Optimize access to multiple files, particularly related files.
-
Survive failures and maintain a consistent view of file names and contents.
Three of These Things Are All Like Each Other
The file systems we will discuss all support the following features:
-
Files, including some number of file attributes and permissions.
-
Names, organized into a hierarchical name space.
This is the file interface and feature set we are all used to. The difference lie in the implementations and what happens on disk.
Implementing Hierarchical File Systems
-
Data blocks: contain file data.
-
Index nodes (inodes): contain not file data.
One of These Things Is Not Like the Others
-
On-disk layout. How does the file system decide where to put data and metadata blocks in order to optimize file access?
-
Data structures. What data structures does the file system use to translate names and locate file data?
-
Crash recovery. How does the file system prepare for and recover from crashes?
File System Challenges
-
File systems are really maintaining a large and complex data structure using disk blocks as storage.
-
This is hard because making changes potentially requires updating many different structures.
Example write
Say a process wants to write
data to the end of a file. What does
the file system have to do?
-
Find empty disk blocks to use and mark them as in use.
-
Associate those blocks with the file that is being written to.
-
Adjust the size of the file that is being written to.
-
Actually copy the data to the disk blocks being used.
-
From the perspective of a process all of these things need to happen synchronously.
-
In reality, many different asynchronous operations are involved touching many different disk blocks. (Each operation above modifies at least one disk block.)
-
This creates both a consistency and a performance problem!
What Happens On Disk?
Let’s consider the on-disk structures used by modern file systems.
-
translate paths to file index nodes or inodes.
-
find data blocks associated with a given inode (file).
-
allocate and free inodes and data blocks.
We’re going to try and keep this at a relatively high level, but
examples are used for concreteness and drawn from the Linux ext4
file system.
Sectors, Blocks, Extents
-
Sector: the smallest unit that the disk allows to be written, usually 256 bytes.
-
Block: the smallest unit that the file system actually writes, usually 4K bytes.
-
Extent: a set of contiguous blocks used to hold part of a file. Described by a start and end block.
-
Because contiguous writes are good for disk head scheduling and 4K is the page size which affects in-memory file caching.
-
Because contiguous writes are good for disk head scheduling and many files are larger than 4K!
Next Time
-
File system structures.
-
File system operations.
-
Caching.