Introduction to File Systems
UNIX File Interface
-
open("foo")
: "I’d like to use the file named foo." -
close("foo")
: "I’m finished with foo."
-
read(2)
: "I’d like to perform a read from file handle 2 at the current position." -
write(2)
: "I’d like to perform a write from file handle 2 at the current position."
-
lseek(2, 100)
: "Please move my saved position for file handle 2 to position 100.
What’s Missing?
-
dup2
is really about manipulating a processes names for files and doesn’t have much to do with the file system.
Files Together: File Organization
Each file has to have a unique name. No problem!
-
Letter to Mom:
LetterToMom.txt
-
Letter to Suzanna:
LetterToSuzanna.txt
-
Letter to Chuchu:
WoofWoofWagWag.txt
-
Another letter to Suzanna:
AnotherLetterToSuzanna.txt
-
A third letter to Suzanna:
LetterToSuzanna22Mar2012.txt
Hierarchical Implications
Big idea: don’t look at everything all at once. Allows users to store and examine related files together.
-
letters/Mom/Letter.txt
-
letters/Chuchu/WoofWoofWagWag.txt
-
letters/Suzanna/Letters/1.txt
-
letters/Suzanna/Letters/2.txt
-
letters/Suzanna/Letters/2.txt
Location Implications
-
Location requires navigation and relative navigation is useful, meaning that locations (directories) can include pointers to other locations (other directories).
-
Finally, location is only meaningful if it is tied to a files name, so hierarchical file systems implement name spaces, which require that a file’s name map to a single unique location within the file system.
Why Trees?
File systems usually require that files be organized into an acyclic graph with a single root, also known as a tree.
Why?
Tree Naming
Trees produce a single canonical name for each file on the system as well as an infinite number of relative names:
-
Canonical name: /you/used/to/love/well
-
Relative name: /you/used/to/love/me/../well
-
Relative name: love/me/../../love/me/../well
File System Design Goals
-
Efficiently translate file names to file contents.
-
Allow files to move, grow, shrink and otherwise change.
-
Optimize access to single files.
-
Optimize access to multiple files, particularly related files.
-
Survive failures and maintain a consistent view of file names and contents.
Three of These Things Are All Like Each Other
The file systems we will discuss all support the following features:
-
Files, including some number of file attributes and permissions.
-
Names, organized into a hierarchical name space.
Implementing Hierarchical File Systems
-
Data blocks: contain file data.
-
Index nodes (inodes): contain not file data.
One of These Things Is Not Like the Others
-
On-disk layout. How does the file system decide where to put data and metadata blocks in order to optimize file access?
-
Data structures. What data structures does the file system use to translate names and locate file data?
-
Crash recovery. How does the file system prepare for and recover from crashes?
File System Challenges
-
File systems are really maintaining a large and complex data structure using disk blocks as storage.
-
This is hard because making changes potentially requires updating many different structures.
Example write
Say a process wants to write
data to the end of a file. What does
the file system have to do?
-
Find empty disk blocks to use and mark them as in use.
-
Associate those blocks with the file that is being written to.
-
Adjust the size of the file that is being written to.
-
Actually copy the data to the disk blocks being used.
-
From the perspective of a process all of these things need to happen synchronously.
-
In reality, many different asynchronous operations are involved touching many different disk blocks. (Each operation above modifies at least one disk block.)
-
This creates both a consistency and a performance problem!
What Happens On Disk?
Let’s consider the on-disk structures used by modern file systems.
-
translate paths to file index nodes or inodes.
-
find data blocks associated with a given inode (file).
-
allocate and free inodes and data blocks.
Sectors, Blocks, Extents
-
Sector: the smallest unit that the disk allows to be written, usually 256 bytes.
-
Block: the smallest unit that the file system actually writes, usually 4K bytes.
-
Extent: a set of contiguous blocks used to hold part of a file. Described by a start and end block.
-
Because contiguous writes are good for disk head scheduling and 4K is the page size which affects in-memory file caching.
-
Because contiguous writes are good for disk head scheduling and many files are larger than 4K!
ext4
inodes
-
1 inode per file.
-
256 bytes, so 1 per sector or 16 per block.
-
Location of file data blocks (contents).
-
Permissions including user, read/write/execute bits, etc.
-
Timestamps including creation (
crtime
), access (atime
), content modification (mtime
), attribute modification (ctime
) and delete (dtime
) times. -
Named and located by number.
Locating inodes
-
All inodes are created at format time at well-known locations.
-
inodes may not be located near file contents.
ext4
creates multiple blocks of inodes within the drive to reduce seek times between inodes and data. -
Fixed number of inodes for the file system. Can run out of inodes before we run out of data blocks!
ext4
creates approximately one inode per 16 KB of data blocks, but this can be configured at format time.
open
: Path Name Translation
open("/etc/default/keyboard")
must translate "/etc/default/keyboard"
to an inode number.
-
Get inode number for root directory. This is usually a fixed agreed-on inode number, like 2.
-
Open the directory with inode number 2. Look for
"etc"
. Find"etc"
with inode number 393218. -
Open the directory with inode number 393218. Look for
"default"
. Find"default"
with inode number 393247. -
Open the directory with inode number 393247. Look for
"keyboard"
. Find keyboard with inode number 394692. -
Open the file with inode number 394692.