RAID
How To Read A Research Paper
-
Don’t be afraid: you’re ready!
-
Read good papers from the top conferences
-
Skim to get the big picture, read for the details.
-
Most systems papers have one or two big ideas and a lot of implementation.
-
-
Understand the places where papers get published
-
Understand the kinds of papers
-
Understand the parts of a paper
Places Where Papers Get Published
-
Workshop papers: short (5-6 pages), usually containing only a provocative argument, system design, or very preliminary results.
-
Example: Hierarchical Filesystems Are Dead (PDF), by Margo Seltzer and Nicholas Murphy. Published at HotOS, a workshop on hot topics in operating systems.
-
-
Conference papers: long (12-14 pages), enough space to describe and evaluate a complete novel system.
-
Example: The Scalable Commutativity Rule (PDF), by Austin Clements, M. Frans Kaashoek, Nickolai Zeldovich, Robert Morris and Eddie Kohler. Published at SOSP, the top conference on computer systems.
-
-
Journal papers: longer than conference papers, usually a published conference paper with extra material (frequently all of the unnecessary results that were removed to meet the conference page limit).
-
My advice: read the conference paper.
-
Kinds of Papers
Clearly this is an incomplete list, but identifying the kind of paper can help you read it and understand its contributions.
-
(Big) idea papers: present a new approach to an existing problem or a new idea about how to build systems. Should convince you that the solution (1) is new, (2) works, and (3) is useful.
-
Problem papers: present a new problem and, usually, some ideas about how to solve it. Should convince you that the problem (1) is new, (2) matters, and (3) has some possible solutions.
-
Data papers: present novel analysis or analysis of a novel data set that produces interesting insights. Should convince you that the results are useful to the design of future systems.
-
New technology papers: describe some new hardware capability or device feature and why it’s interesting. Should convince you that the hardware can be used to build better systems.
-
Wrong way papers: argue that the community is solving an existing problem incorrectly. Frequently these are workshop-length papers and eventually lead to idea papers. Should be able to convince you that everyone else is confused and misguided. (Good luck!)
Parts of a Research Paper
Not all of these are included in every paper, and they are not always called the same thing. Many variants exist but these are the common elements.
-
Abstract: an overview of the paper and its contribution. Great place to get the big picture.
-
Introduction: an extension of the abstract. Usually contains:
-
A problem and solution statement (if appropriate)
-
Persuasive arguments to keep you reading
-
A preview of the interesting results ahead
-
Navigational information to guide you through the rest of the paper
-
-
Motivation:
-
More arguments about why this is a problem, why people have been solving this problem the wrong way, or why this data is interesting—depending on the type of paper.
-
-
Design: presents the design of the system, usually at as high a level as possible.
-
Implementation: presents details of the implementation and any interesting implementation challenges.
-
Related work: puts the work in context by comparing it to other systems. Important to establish novelty.
-
Results: for data analysis papers, most of the paper is spent analyzing the data set that was collected. This usually replaces a typical evaluation since there may not be a new idea to evaluate.
-
Evaluation: measures things about the prototype system intended to demonstrate that it works.
Redundant Arrays of Inexpensive Disks
-
Big idea paper!
-
Spawned a commonly-used technology, an entire industry, and lots of similar approaches.
-
Several cheap things can be better than one expensive thing!
-
Multicore processors.
-
Google.
-
Crowdsourcing.
RAID: Problems
-
Computer CPUs are getting faster…
-
Computer memory is getting faster…
-
Hard drives are not keeping up!
-
Many cheap things fail much more frequently than one expensive thing.
-
So we need a plan to handle failures.
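A back-of-the-envelope sketch of why many cheap disks fail so often: if disk failures are independent and happen at a constant rate, the failure rates add up, so the expected time to the first failure in an array is roughly the single-disk figure divided by the number of disks. The function name and the numbers below are illustrative assumptions, not measurements.

```python
def array_mttf_hours(disk_mttf_hours, num_disks):
    """Approximate mean time to first failure for an array with no redundancy.

    Assumes independent failures at a constant rate, so failure rates add.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


# Illustrative numbers only: one disk rated at 30,000 hours, array of 100.
single_disk_hours = 30_000.0
array_hours = array_mttf_hours(single_disk_hours, 100)

print(f"single disk: {single_disk_hours / 24 / 365:.1f} years")
print(f"100 disks:   {array_hours / 24:.1f} days")
```

Under these assumptions a disk that lasts years on its own turns into an array that loses a disk within weeks, which is why redundancy is part of the design rather than an extra.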
RAID 1
-
Two duplicate disks.
-
Writes must go to both disks, reads can come from either.
-
Performance: better for reads.
-
Capacity: unchanged!
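A minimal sketch of mirroring, using in-memory lists as stand-ins for disks. The class name `MirroredPair` and its methods are hypothetical, chosen only to illustrate the points above: writes go to both copies, reads can alternate, and usable capacity is one disk's worth.

```python
class MirroredPair:
    """Toy RAID 1: two in-memory 'disks' holding identical blocks."""

    def __init__(self, num_blocks):
        self.disks = [[b""] * num_blocks for _ in range(2)]
        self._next_reader = 0  # alternate reads between the two copies

    def write(self, block_no, data):
        # Every write must reach both disks to keep the copies identical.
        for disk in self.disks:
            disk[block_no] = data

    def read(self, block_no):
        # Either copy is valid, so reads can be spread across the disks.
        disk = self.disks[self._next_reader]
        self._next_reader = 1 - self._next_reader
        return disk[block_no]

    def usable_blocks(self):
        # Two disks, but only one disk's worth of capacity.
        return len(self.disks[0])


pair = MirroredPair(num_blocks=4)
pair.write(2, b"hello")
first = pair.read(2)   # served by disk 0
second = pair.read(2)  # served by disk 1
```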
RAID 2
-
Bit-level striping, with several check disks (enough to detect and correct a single error).
-
Hamming codes to detect failures and correct errors.
-
Most reads and writes require all disks.
-
Capacity: improved.
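To make the Hamming-code idea concrete, here is a sketch of the classic Hamming(7,4) code, with each position in the codeword standing for one disk (four data disks, three check disks). The choice of the (7,4) code and the function names are assumptions for illustration; the slides do not say which code size an array would use.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d3, d5, d6, d7 = data_bits
    p1 = d3 ^ d5 ^ d7  # check bit covering positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7  # check bit covering positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7  # check bit covering positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position), position 0 meaning none."""
    # XOR-ing the positions of all set bits yields 0 for a valid codeword,
    # or the 1-based position of a single flipped bit.
    syndrome = 0
    for position, bit in enumerate(codeword, start=1):
        if bit:
            syndrome ^= position
    fixed = list(codeword)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1  # flip the bit at position 5 (one "disk" returns bad data)
repaired, error_position = hamming74_correct(damaged)
```

The code locates the bad position on its own, which is the extra work that becomes unnecessary once disks can report their own failures.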
RAID 3
-
Only needs to correct errors, not locate them, since disks can detect when they fail.
-
Byte-level striping, single parity disk.
-
Most reads and writes require all disks.
-
Capacity: improved.
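A sketch of byte-level striping with a single parity disk, using XOR as the parity function. Because the array already knows which disk failed, XOR-ing the surviving disks with the parity disk recovers the missing bytes. The helper names and the sample data are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(chunks):
    """XOR equal-length chunks together to get the parity chunk."""
    return reduce(xor_bytes, chunks)


def stripe_bytes(data, num_data_disks):
    """Deal bytes round-robin across the data disks (byte-level striping)."""
    if len(data) % num_data_disks:
        raise ValueError("data length must be a multiple of num_data_disks")
    return [data[i::num_data_disks] for i in range(num_data_disks)]


data = b"RAIDRAID!"
disks = stripe_bytes(data, 3)
parity = parity_of(disks)

# Disk 1 reports that it has failed; rebuild it from the survivors and parity.
survivors = [disks[0], disks[2], parity]
rebuilt = parity_of(survivors)
```

Note that reading `data` back touches every data disk, which matches the point that most reads and writes require all disks.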
RAID 4
-
Block-level striping, single parity disk.
-
Better distribution of reads between disks due to larger stripe size,
-
but all writes must access the parity disk.
-
Performance: improved for reads.
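The parity bottleneck can be seen in a sketch of a small write. Updating one block uses the identity new parity = old parity XOR old data XOR new data, so the other data disks are left alone, but the dedicated parity disk is read and written every time. The `Raid4` class and its I/O counter are hypothetical, and the disks are plain in-memory lists.

```python
from functools import reduce


def xor_blocks(a, b):
    """XOR two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Raid4:
    def __init__(self, data_disks):
        # data_disks[d][s] is the block on data disk d in stripe s.
        self.data_disks = [list(disk) for disk in data_disks]
        num_stripes = len(self.data_disks[0])
        self.parity_disk = [
            reduce(xor_blocks, (disk[s] for disk in self.data_disks))
            for s in range(num_stripes)
        ]
        self.parity_ios = 0  # counts accesses to the parity disk

    def small_write(self, disk_no, stripe_no, new_block):
        old_block = self.data_disks[disk_no][stripe_no]  # read old data
        old_parity = self.parity_disk[stripe_no]         # read old parity
        self.parity_ios += 1
        self.data_disks[disk_no][stripe_no] = new_block  # write new data
        self.parity_disk[stripe_no] = xor_blocks(
            xor_blocks(old_parity, old_block), new_block
        )                                                # write new parity
        self.parity_ios += 1


array = Raid4([[b"aa", b"bb"], [b"cc", b"dd"], [b"ee", b"ff"]])
array.small_write(0, 0, b"zz")  # different data disk, different stripe...
array.small_write(2, 1, b"yy")  # ...but both writes hit the parity disk
```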
RAID 5 (Full Victory)
-
Block-level striping
-
Parity blocks distributed across all of the disks, with no dedicated parity disk.
-
Better distribution of writes between disks.
-
Performance: improved for writes.
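A sketch of one way parity placement can rotate from stripe to stripe so that writes to different stripes update parity on different disks. The rotation rule used here (parity moves one disk to the left each stripe) is an assumption for illustration; real implementations use several different layouts.

```python
def parity_disk_for(stripe_no, num_disks):
    """Pick the disk holding parity for a stripe (one possible rotation)."""
    return (num_disks - 1 - stripe_no) % num_disks


def layout(num_stripes, num_disks):
    """Return rows of labels: 'P' for parity, 'D<n>' for the n-th data block."""
    rows = []
    next_data = 0
    for stripe_no in range(num_stripes):
        parity_disk = parity_disk_for(stripe_no, num_disks)
        row = []
        for disk_no in range(num_disks):
            if disk_no == parity_disk:
                row.append("P")
            else:
                row.append(f"D{next_data}")
                next_data += 1
        rows.append(row)
    return rows


for row in layout(num_stripes=4, num_disks=4):
    print("  ".join(f"{label:>3}" for label in row))
```

With four disks, each disk holds the parity block for exactly one of every four stripes, so no single disk absorbs all of the parity traffic.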
RAID 0 (Non-RAID)
-
With two disks, each disk stores half of the data.
-
No error correction or redundancy.
-
Performance: fantastic!
-
Capacity: fantastic!
-
Redundancy: ZERO!
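Striping without redundancy comes down to an address calculation. The sketch below maps a logical block number to a disk and an offset on that disk using round-robin placement; the function name `locate` is hypothetical.

```python
def locate(logical_block, num_disks):
    """Map a logical block to (disk number, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


# With two disks, consecutive blocks alternate between the disks, so large
# sequential transfers can keep both disks busy at once.
placements = [locate(block, 2) for block in range(6)]
```

There is no second copy and no parity in this mapping, so losing either disk loses every other block of the data.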
RAID: Redundancy
-
RAID arrays can tolerate the failure of one (or more) disks.
-
Once a disk (or several) fails, the array is vulnerable to data loss.
-
An administrator must replace the disk(s) and then rebuild the array.
The RAID Aftermath
But perhaps our most enduring contribution is our experience demonstrating how a common intellectual framework and terminology, developed by researchers outside of the pressures and positioning of the marketplace, can allow engineers and technical developers to talk with each other, exchange ideas, and ultimately accelerate the development of what became a multibillion dollar industry sector.