
Files

File Systems To The Rescue

Low-level disk interface is messy and very limited:
  • Requires reading and writing entire 512-byte blocks.

  • No notion of files, directories, etc.
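To make the whole-block restriction concrete, here is a minimal sketch of changing a single byte on a block device. `ToyDisk` and `write_byte` are hypothetical names invented for illustration, and the "disk" is just a list in memory; the point is only that a one-byte update turns into reading, modifying, and rewriting a full 512-byte block.

```python
BLOCK_SIZE = 512  # the block size named in the text


class ToyDisk:
    """Hypothetical in-memory stand-in for a block device."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, index):
        return self.blocks[index]

    def write_block(self, index, data):
        # The device only accepts whole blocks.
        if len(data) != BLOCK_SIZE:
            raise ValueError("must write exactly one whole block")
        self.blocks[index] = bytes(data)


def write_byte(disk, offset, value):
    """Change one byte by read-modify-write of the enclosing block."""
    index, within = divmod(offset, BLOCK_SIZE)
    block = bytearray(disk.read_block(index))  # read the whole block
    block[within] = value                      # modify one byte in memory
    disk.write_block(index, block)             # write the whole block back


disk = ToyDisk(num_blocks=4)
write_byte(disk, 700, 0x41)  # byte 700 lives in block 1 at position 188
```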

File systems take this limited block-level device and create the file abstraction almost entirely in software.

  • Compared to the CPU and memory abstractions that we studied previously, more of the file abstraction is implemented in software.

  • This explains the plethora of available file systems: ext2, ext3, ext4, ReiserFS, NTFS, JFS, LFS, XFS, etc.

  • This is probably why many systems people have a soft spot for file systems even if they seem a bit outdated these days.

What About Flash?

No moving parts! Great! We can eliminate a lot of the complexity of modern file systems. Yippee!

Except that…
  • Have to erase an entire large chunk before we can rewrite it.

  • And it wears out faster than magnetic drives, and can wear unevenly if we are not careful.

Sigh… things are sounding complicated again.
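The two flash constraints can be sketched with a toy model. `ToyFlash` and its four-page erase block are assumptions made up for this example (real devices use much larger erase units); it only shows that rewriting one page forces an erase of the whole chunk, and that repeated rewrites pile wear onto one block.

```python
PAGES_PER_ERASE_BLOCK = 4  # hypothetical; chosen small to keep the example readable


class ToyFlash:
    """Hypothetical model of flash: pages are programmed, whole blocks are erased."""

    def __init__(self, num_erase_blocks):
        # None marks an erased (writable) page.
        self.pages = [[None] * PAGES_PER_ERASE_BLOCK for _ in range(num_erase_blocks)]
        self.erase_counts = [0] * num_erase_blocks

    def program(self, block, page, data):
        if self.pages[block][page] is not None:
            raise RuntimeError("page must be erased before it is rewritten")
        self.pages[block][page] = data

    def erase(self, block):
        self.pages[block] = [None] * PAGES_PER_ERASE_BLOCK
        self.erase_counts[block] += 1  # each erase wears the block a little

    def rewrite(self, block, page, data):
        """Rewrite one page: save the neighbors, erase the block, reprogram."""
        saved = list(self.pages[block])
        saved[page] = data
        self.erase(block)
        for i, contents in enumerate(saved):
            if contents is not None:
                self.program(block, i, contents)


flash = ToyFlash(num_erase_blocks=2)
flash.program(0, 0, b"a")
flash.program(0, 1, b"b")
for _ in range(3):
    flash.rewrite(0, 0, b"A")  # three rewrites of one page, three erases of block 0
```

After this runs, block 0 has been erased three times and block 1 not at all, which is the uneven wear the slide warns about.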

Clarifying the Concept of a File

Most of us are familiar with files, but the semantics of files have a variety of sources that are worth separating:

  • Just a file: the minimum it takes to be a file.

  • About a file: what other useful information do most file systems typically store about files?

  • Files and processes: what additional properties does the UNIX file system interface introduce to allow user processes to manipulate files?

  • Files together: given multiple files, how do we organize them in a useful way?

Just a File: The Minimum

What does a file have to do to be useful?
  • Reliably store data. (Duh.)

  • Be located! Usually via a name.


Basic File Expectations

At minimum we expect that
  • file contents should not change unexpectedly.

  • file contents should change when requested and as requested.

These requirements seem simple but many file systems do not meet them!

03 Mar 2012: Bug Report–Serious file system corruption and data loss caused to other NTFS drives by Windows 8 CP

Failures such as power outages and sudden ejects make file system design difficult and expose tradeoffs between durability and performance.

  • Memory: fast, transient. Disk: slow, stable.
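One place this tradeoff shows up for applications is in how far a write has travelled before the program moves on. The sketch below uses Python's standard `flush` and `os.fsync`; `durable_write` is a hypothetical helper name. Note that `fsync` only requests that the data reach the device, and whether it survives a power cut still depends on the file system and hardware.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data and ask the OS to push it to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)         # fast: lands in a user-space buffer (memory)
        f.flush()             # hands the data to the OS cache (still memory)
        os.fsync(f.fileno())  # slow: asks the OS to write it to the device


path = os.path.join(tempfile.mkdtemp(), "journal.bin")
durable_write(path, b"important")
```

Skipping the `fsync` is faster but leaves a window in which the data exists only in memory.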

About a File: File Metadata

What else might we want to know about a file?
  • When was the file created, last accessed, or last modified?

  • Who is allowed to do what to the file: read, write, rename, change other attributes, etc.?

  • Other file attributes?
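On a UNIX-like system, much of this metadata can be inspected with `stat`. The sketch below uses Python's `os.stat` on a temporary file; the keys of `summary` are names chosen for this example. Creation time is left out on purpose because it is not portably available: on UNIX, `st_ctime` records the last metadata change rather than creation.

```python
import os
import stat
import tempfile

# Create a small file to inspect.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

info = os.stat(path)
summary = {
    "size_bytes": info.st_size,
    "last_modified": info.st_mtime,  # seconds since the epoch
    "last_accessed": info.st_atime,  # seconds since the epoch
    "owner_uid": info.st_uid,
    "permissions": stat.filemode(info.st_mode),  # e.g. a string like "-rw-------"
}
```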

Where to Store File Metadata?

An MP3 file contains audio data. But it also has attributes such as:

  • title

  • artist

  • date

Where should these attributes be stored?
  • In the file itself.

  • In another file.

  • In attributes associated with the file and maintained by the file system.

In the file:
  • Example: MP3 ID3 tag, a data container stored within an MP3 file in a prescribed format.

  • Pro: travels along with the file from computer to computer.

  • Con: requires all programs that access the file to understand the format of the embedded metadata.
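As an illustration of metadata embedded in a prescribed format, here is a sketch that reads the older ID3v1 layout: a fixed 128-byte record at the end of the file that starts with the marker `TAG`. The helper names `make_tag` and `read_tag` are hypothetical, and the "MP3" is synthetic bytes rather than real audio. The newer ID3v2 format is structured differently and is not handled here.

```python
import struct

# ID3v1: "TAG", title, artist, album, year, comment, genre = 128 bytes.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")


def make_tag(title, artist, year):
    """Build an ID3v1 record (album and comment left empty, genre 255 as a placeholder)."""
    def pad(text, width):
        return text.encode("latin-1")[:width].ljust(width, b"\x00")
    return ID3V1.pack(b"TAG", pad(title, 30), pad(artist, 30),
                      pad("", 30), pad(year, 4), pad("", 30), 255)


def read_tag(file_bytes):
    """Return the embedded attributes, or None if no ID3v1 record is present."""
    if len(file_bytes) < ID3V1.size:
        return None
    record = file_bytes[-ID3V1.size:]
    if record[:3] != b"TAG":
        return None
    _, title, artist, _album, year, _comment, _genre = ID3V1.unpack(record)

    def clean(raw):
        return raw.rstrip(b"\x00 ").decode("latin-1")

    return {"title": clean(title), "artist": clean(artist), "date": clean(year)}


fake_mp3 = b"\xff\xfb" + bytes(100) + make_tag("Song", "Band", "2012")
tag = read_tag(fake_mp3)
```

Any program that does not know this layout sees the last 128 bytes as more file data, which is the "con" listed above.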

In another file:
  • Example: iTunes database.

  • Pro: can be maintained separately by each application.

  • Con: does not move with the file, and the separate file must be kept in sync whenever the files it describes change.

In attributes:
  • Example: attributes have been supported by a variety of file systems including prominently by BFS, the BeOS file system.

  • Pro: maintained by the file system so can be queried and queried quickly.

  • Con: does not move with the file, and creates compatibility problems with other file systems.

Processes and Files: UNIX Semantics

Many file systems provide an interface for establishing a relationship between a process and a file.

  • "I have the file open. I am using this file."

  • "I am finished using the file and will close it now."

Why does the file system want to establish these process-file relationships?
  • Knowing which files are actively in use lets the OS improve performance through caching or read-ahead.

  • The file system may provide guarantees to processes based on this relationship, such as exclusive access.

  • Some file systems, particularly networked file systems, don’t even bother to establish these relationships. (What happens if a networked client opens a file exclusively and then dies?)
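One way a local UNIX system can offer something like exclusive access is an advisory lock tied to an open file. The sketch below uses `fcntl.flock` from Python's standard library; it assumes a UNIX-like system and a local file system, and the variable names are invented for this example. Advisory locks only constrain programs that also ask for the lock, so this is a cooperative guarantee rather than an enforced one.

```python
import fcntl
import os
import tempfile

# Two independent opens of the same file, standing in for two users of it.
fd_holder, path = tempfile.mkstemp()
fd_other = os.open(path, os.O_RDWR)

fcntl.flock(fd_holder, fcntl.LOCK_EX)  # the first open takes the exclusive lock

try:
    # LOCK_NB makes the attempt fail immediately instead of waiting.
    fcntl.flock(fd_other, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second_got_lock = True
except BlockingIOError:
    second_got_lock = False

os.close(fd_holder)  # closing the file ends the relationship and drops the lock

try:
    fcntl.flock(fd_other, fcntl.LOCK_EX | fcntl.LOCK_NB)
    released_on_close = True
except BlockingIOError:
    released_on_close = False

os.close(fd_other)
```

Locally, the kernel closes a dead process's files and so releases its locks. A remote server has no equally direct way to learn that a client died, which is the difficulty the question above points at.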

File Location: UNIX Semantics

UNIX semantics simplify reads and writes to files by storing the file position for processes.

  • This is a convenience, not a requirement: processes could be required to provide a position with every read and write.
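Both styles exist on UNIX-like systems, which makes the difference easy to see. In the sketch below, `os.read` relies on the position the system stores for the open file, while `os.pread` takes an explicit offset and leaves the stored position alone. The variable names are chosen for this example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"abcdefghij")
os.lseek(fd, 0, os.SEEK_SET)  # move the stored position back to the start

# Implicit position: each read continues where the previous one stopped.
first = os.read(fd, 3)
second = os.read(fd, 3)

# Explicit position: the caller supplies the offset with every request.
explicit = os.pread(fd, 3, 0)

# The stored position was not disturbed by pread.
third = os.read(fd, 3)

os.close(fd)
```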

UNIX File Interface

Establishing relationships:
  • open("foo"): "I’d like to use the file named foo."

  • close(2): "I’m finished with file handle 2."

Reading and writing:
  • read(2): "I’d like to perform a read from file handle 2 at the current position."

  • write(2): "I’d like to perform a write to file handle 2 at the current position."

Positioning:
  • lseek(2, 100): "Please move my saved position for file handle 2 to position 100."
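The calls above can be tried almost verbatim through Python's `os` module, which wraps the corresponding system calls. This is a sketch rather than the slides' own code: the path and flags are choices made for the example, and the real calls take a few more arguments than the simplified forms above (a buffer or byte count, a `whence` value for `lseek`). The handle returned by `open` is what every later call, including `close`, refers to.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo")

# open: establish the relationship and receive a file handle (descriptor).
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

# write: 105 bytes at the current position, which starts at 0.
os.write(fd, b"x" * 100 + b"hello")

# lseek: move the saved position for this handle to byte 100.
os.lseek(fd, 100, os.SEEK_SET)

# read: up to 5 bytes from the current position.
data = os.read(fd, 5)

# close: end the relationship.
os.close(fd)
```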


Created 2/17/2017
Updated 9/18/2020
Built 3/31/2016 @ 20:00 EDT