Technical Women

Figure 1. Qiheng Hu

Today: Address Translation

  • Levels of indirection.

  • Physical and virtual addresses.

  • Virtual address properties.

ASST2 Checkpoint

At this point:
  • If you have not finished ASST2.1, you’re way, way behind.

  • If you have not finished the file system system calls sys_{open,close,lseek}…​, you’re behind.

  • If you finished one of sys_{fork,wait,exec,exit}, you’re OK.

Convention

  • Process layout is specified by the Executable and Linker Format (ELF) file. (Remember ELF?)

  • Some layout is a function of convention.

  • Example: why not load the code at 0x0?

    • To catch possibly the most common programmer error: NULL pointer problems!

    • Leaving a large portion of the process address space starting at 0x0 empty allows the kernel to catch these errors, including accesses at small offsets from NULL caused by dereferencing NULL structure pointers:

struct bar * foo = NULL;
foo->bar = 10;

Segmentation fault

Core dumped
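Why does the fault land near 0x0 rather than exactly at it? A minimal sketch, assuming a made-up layout for struct bar (the slides never define one): the address touched by foo->bar is just the pointer value plus the field's offset, so with a NULL pointer it is the offset itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; the slides never define struct bar. */
struct bar {
  char name[64];
  int bar;
};

/* The field sits well inside a 4 KB region starting at 0x0 (size assumed). */
static_assert(offsetof(struct bar, bar) < 4096,
              "field offset falls in the empty low region");

/*
 * Address that `base->bar` would touch, computed without dereferencing.
 * With base == 0 (NULL) the result is just the field offset.
 */
uintptr_t bar_field_address(uintptr_t base) {
  return base + offsetof(struct bar, bar);
}
```

With this layout, bar_field_address(0) is 64: not 0x0, but still inside the empty region, so the kernel still catches it.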

Aside: ASST2

int32_t sys_open(userptr_t pathname, ...) {
  if (pathname == NULL) {
    return EINVAL;
  }
  ...

This is also why you should not check userptr_t values for NULL in ASST2:

  • 0x0 can be a valid user address. ("Look, Mom, I did my own linking.")

  • There are 2^31 ways that address can be bogus…​and you just checked one of them.
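A range check is a slightly better test than a NULL check, though still not sufficient. The sketch below is a hypothetical helper, not an OS/161 function, and it assumes user space is the low 2 GB of a 32-bit address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed top of user space (exclusive): the low 2 GB belong to the process. */
#define USERSPACE_TOP 0x80000000u

static_assert(USERSPACE_TOP == (1u << 31), "user region spans 2^31 addresses");

/*
 * Hypothetical helper, not part of OS/161: true if [addr, addr + len)
 * lies entirely below USERSPACE_TOP. Written so that addr + len is never
 * computed directly, which avoids 32-bit overflow.
 */
bool user_range_in_bounds(uint32_t addr, uint32_t len) {
  if (addr >= USERSPACE_TOP) {
    return false;
  }
  return len <= USERSPACE_TOP - addr;
}
```

Even an in-bounds address may not be mapped in the calling process, so passing this check proves little. The safe approach is to let a fault-tolerant copy routine (in OS/161, the copyin()/copyinstr() family) try the access and report failure.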

Destined To Ever Meet?

  • The stack starts at the top of the address space and grows ↓.

  • The heap starts towards the bottom and grows ↑.

  • Will they ever meet?

    • Probably not! That would mean that either the stack or, more likely, the heap was huge.

Relocation

int data[128];
...
data[5] = 8; // Where the heck is data[5]?
...
result = foo(data[5]); // Where the heck is foo?

So given our address space model, no more problems with locating things, right?

Not quite! Dynamically-loaded libraries still need to be relocated at run time. Cool: but not something we’ll cover in this course.

Sounds great

What’s the catch?

Address Spaces: A Great Idea?

  • The address space abstraction sounds powerful and useful. (It would be better if it cooked breakfast.)

  • But can we implement it?

Your mission

Implement address spaces

Implementing Address Spaces

What’s required?
  • Address translation: 0x10000 to Process 1 is not the same as 0x10000 to Process 2 is not the same as…​

  • Protection: address spaces are intended to provide a private view of memory to each process.

  • Memory management: taken together, one or more processes may have more address space allocated than there is physical memory on the machine.

    • In a way, we are encouraging processes to spread out and let us handle the details.
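To make the translation requirement concrete, here is about the simplest possible sketch: each process gets its own base in physical memory, so the same virtual address lands somewhere different for each. The base values and the toy_translate() name are invented for illustration. Real systems use more flexible schemes.

```c
#include <assert.h>
#include <stdint.h>

#define NPROCS 2

/* Toy per-process bases in physical memory (values are made up). */
static const uint32_t proc_base[NPROCS] = { 0x00100000u, 0x00200000u };

/* Hypothetical helper: translate vaddr as seen by process `pid`. */
uint32_t toy_translate(int pid, uint32_t vaddr) {
  assert(pid >= 0 && pid < NPROCS);
  return proc_base[pid] + vaddr;
}
```

Here toy_translate(0, 0x10000) and toy_translate(1, 0x10000) return different physical addresses, and neither process can name the other's memory.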

Guess What?

  • Your entire (programming) life has been a lie.

  • You believe in things that are not actually true.

  • Today your view of the world will change forever.

0x10000

Also not real

Your Mission: Implement Address Spaces

  • Clearly implementing address spaces requires breaking the direct connection between a memory address and physical memory.

  • Introducing another level of indirection is a classic systems technique. We have seen it before. Where?

    • File handles!

Translation is Control

Forcing processes to translate a reference to gain access to the underlying object provides the kernel with a great deal of control.

References can be revoked, shared, moved, and altered.
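File handles make a handy miniature of the same idea. The sketch below is a toy handle table, with all names invented for illustration. Because every access goes through handle_lookup(), the "kernel" can move or revoke what a handle refers to without the holder's cooperation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 8

/* Kernel-side table: slot h holds the object behind handle h, or NULL. */
static const char *handle_table[MAX_HANDLES];

/* Hand out the lowest free handle for obj, or -1 if the table is full. */
int handle_grant(const char *obj) {
  assert(obj != NULL);
  for (int h = 0; h < MAX_HANDLES; h++) {
    if (handle_table[h] == NULL) {
      handle_table[h] = obj;
      return h;
    }
  }
  return -1;
}

/* Every access is translated here, so the kernel can refuse it. */
const char *handle_lookup(int h) {
  if (h < 0 || h >= MAX_HANDLES) {
    return NULL;
  }
  return handle_table[h];
}

/* Move: the holder keeps the same handle, but it now names new_obj. */
void handle_move(int h, const char *new_obj) {
  if (h >= 0 && h < MAX_HANDLES && handle_table[h] != NULL) {
    handle_table[h] = new_obj;
  }
}

/* Revoke: later lookups through h return NULL. */
void handle_revoke(int h) {
  if (h >= 0 && h < MAX_HANDLES) {
    handle_table[h] = NULL;
  }
}
```

Swap "handle" for "virtual address" and "object" for "physical memory" and you have the outline of what the rest of this lecture builds.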

Memory Interface

We don’t usually think about memory as having an interface, but it does:

  • load(address): load data from the given address, usually into a register or possibly into another memory location.

  • store(address, value): store value to the given address, where value may be in a register or another memory location.
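Written out as code, the interface really is this small. The sketch simulates memory as an array of 32-bit words. The array size and the word-addressed indexing are simplifying assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 1024

/* Simulated memory: 32-bit words, indexed by word address. */
static uint32_t memory[MEM_WORDS];

/* load(address): return the value currently stored at address. */
uint32_t load(uint32_t address) {
  assert(address < MEM_WORDS);
  return memory[address];
}

/* store(address, value): write value to address. */
void store(uint32_t address, uint32_t value) {
  assert(address < MEM_WORDS);
  memory[address] = value;
}
```

For plain memory, store(5, 8) followed by load(5) yields 8. Keep that expectation in mind: it stops being guaranteed once addresses can refer to things other than memory.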

Virtual v. Physical Addresses

  • The address space abstraction requires breaking the connection between a memory address and physical memory.

  • We refer to data accessed via the memory interface as using virtual addresses:

    • A physical address points to memory.

    • A virtual address points to something that acts like memory.

  • Virtual addresses have much richer semantics than physical addresses, encapsulating location, permanence and protection.

Welcome

To the real world

Virtual Addresses: Location

The data referenced by a virtual address might be:

  • in memory! (Duh.) But…​the kernel may have moved it to the disk.

Virtual Address → Physical Address

  • on disk, but…​the kernel may be caching it in memory.

Virtual Address → Disk, Block, Offset

  • in memory on another machine.

Virtual Address → IP Address, Physical Address

  • a port on a hardware device.

Virtual Address → Device, Port
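One way to picture the four cases is as a tagged union that records where the data behind a virtual address currently lives. This is only a sketch: the type and field names are invented here and do not come from any particular kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Where the data behind a virtual address currently lives. */
enum backing_kind { IN_MEMORY, ON_DISK, REMOTE_MEMORY, DEVICE_PORT };

/* Hypothetical descriptor mirroring the four mappings listed above. */
struct backing {
  enum backing_kind kind;
  union {
    struct { uint32_t paddr; } mem;                      /* physical address */
    struct { int disk_id; uint32_t block, offset; } disk; /* disk, block, offset */
    struct { uint32_t ip; uint32_t paddr; } remote;      /* IP address, physical address */
    struct { int device; int port; } dev;                /* device, port */
  } u;
};

/* True only if a load can be satisfied directly from local memory. */
bool backing_is_resident(const struct backing *b) {
  assert(b != NULL);
  return b->kind == IN_MEMORY;
}
```

The process never sees this descriptor. It issues the same load or store either way, and the kernel decides what that means.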

Virtual Addresses: Permanence

Processes expect data written to virtual addresses that point to physical memory to store values transiently.

Processes expect data written to virtual addresses that point to disk to store values permanently.

What about virtual addresses that point to device ports?
  • Hardware may change its registers independently, so a read will not necessarily return the last value written.

Virtual Addresses: Permissions and Protection

  • Some virtual addresses may only be used by the kernel while in kernel mode.

  • Virtual addresses may also be assigned read, write or execute permissions.

    • read/write: a process can load/store to this address.

    • execute: a process can load and execute instructions from this address.
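A permission check along these lines is easy to sketch with bit flags. The flag and function names below are illustrative assumptions, not taken from a real kernel or from the MIPS hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Permission bits for a range of virtual addresses (names are illustrative). */
enum { PERM_READ = 1 << 0, PERM_WRITE = 1 << 1, PERM_EXEC = 1 << 2 };

/* The three kinds of access a process can attempt. */
enum access { ACCESS_LOAD, ACCESS_STORE, ACCESS_FETCH };

/* True if an access of the given kind is allowed under perms. */
bool access_allowed(int perms, enum access kind) {
  /* Reject permission words containing bits we do not define. */
  assert((perms & ~(PERM_READ | PERM_WRITE | PERM_EXEC)) == 0);
  switch (kind) {
  case ACCESS_LOAD:  return (perms & PERM_READ) != 0;
  case ACCESS_STORE: return (perms & PERM_WRITE) != 0;
  case ACCESS_FETCH: return (perms & PERM_EXEC) != 0;
  }
  return false;
}
```

With code marked PERM_READ | PERM_EXEC and data marked PERM_READ | PERM_WRITE, a store into code and an instruction fetch from data are both refused.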

Creating Virtual Addresses: exec()

  • exec() uses a blueprint from an ELF file to determine how the address space should look when exec() completes.

  • Specifically, exec() creates and initializes virtual addresses that (mainly) point to memory:

    • code, usually marked read-only.

    • data, marked read-write, but not executable.

    • heap, an area used for dynamic allocations, marked read-write.

    • stack space for the first thread.

$ pmap # memory mappings

Creating Virtual Addresses: fork()

fork() copies the address space of the calling process.

The child has the same virtual addresses as the parent but they point to different memory locations.

int i = 2;
pid_t ret = fork();
if (ret != 0) {
  // Parent.
  printf("%p\n", (void *)&i); // prints virtual address 0x20010
  i = 4;
  printf("%d\n", i); // prints 4. virtual address points to private memory.
} else {
  // Child.
  printf("%p\n", (void *)&i); // prints virtual address 0x20010
  i = 3;
  printf("%d\n", i); // prints 3. virtual address points to private memory.
}
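If you want to check the private-memory claim rather than read it off printed output, here is a self-contained variant. The fork_private_copy() name is made up, and passing the child's value back through its exit status is just a convenient trick for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork; the child sets its copy of i to 3 and reports it via its exit
 * status. The parent sets its own copy to 4 before waiting. Stores the
 * child's final i in *child_i and returns the parent's final i, or -1
 * on error.
 */
int fork_private_copy(int *child_i) {
  assert(child_i != NULL);
  int i = 2;
  pid_t ret = fork();
  if (ret < 0) {
    return -1;
  }
  if (ret == 0) {
    i = 3;      /* changes only the child's copy */
    _exit(i);   /* exit status carries the child's view of i */
  }
  i = 4;        /* changes only the parent's copy */
  int status;
  if (waitpid(ret, &status, 0) < 0 || !WIFEXITED(status)) {
    return -1;
  }
  *child_i = WEXITSTATUS(status);
  return i;     /* still 4: the child's write never reached this copy */
}
```

The function returns 4 and stores 3 through child_i, even though both writes went to the same virtual address.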

Issues with fork()

Copying all that memory is expensive!

  • Especially when the next thing that a process frequently does is load a new binary, which destroys most of the state fork() has carefully copied!

  • We will come back to this problem next week when we talk about clever memory-management tricks.

Creating Virtual Addresses: sbrk()

  • Dynamic memory allocation is performed by the sbrk() system call.

  • sbrk() asks the kernel to move the break point, or the point at which the process heap ends.
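A toy model of the break may help. It is not the real system call: the heap base, the 1 MB ceiling, and the toy_sbrk() name are all assumptions made for this sketch, and failure is reported as UINT32_MAX where the real sbrk() returns (void *)-1.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_START 0x10000000u   /* assumed heap base for the toy model */
#define HEAP_LIMIT 0x10100000u   /* assumed ceiling: 1 MB of heap */

static_assert(HEAP_START < HEAP_LIMIT, "heap region must be non-empty");

static uint32_t brk_point = HEAP_START;   /* current end of the heap */

/*
 * Toy model of sbrk(): move the break by `increment` bytes and return
 * the old break, or UINT32_MAX if the new break would fall outside
 * [HEAP_START, HEAP_LIMIT]. A failed call leaves the break unchanged.
 */
uint32_t toy_sbrk(int32_t increment) {
  int64_t next = (int64_t)brk_point + increment;
  if (next < HEAP_START || next > HEAP_LIMIT) {
    return UINT32_MAX;
  }
  uint32_t old = brk_point;
  brk_point = (uint32_t)next;
  return old;
}
```

As with the real call, an increment of 0 simply reports the current break, a positive increment grows the heap, and a negative one shrinks it.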

Creating Virtual Addresses: mmap()

  • mmap() is a system call that creates virtual addresses that map to a portion of a file.
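Here is a small sketch using the POSIX calls. The mapped_byte_at() helper is invented for this example and maps the whole file read-only. After the mmap() call, reading the file is an ordinary load through a pointer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the file at `path` read-only and return the byte at `offset`,
 * or -1 on any error (including an offset at or past end of file).
 */
int mapped_byte_at(const char *path, size_t offset) {
  assert(path != NULL);
  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    return -1;
  }
  struct stat st;
  if (fstat(fd, &st) < 0 || st.st_size <= 0 || offset >= (size_t)st.st_size) {
    close(fd);
    return -1;
  }
  /* After this call, loads from `base` read the file's contents. */
  unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                             MAP_PRIVATE, fd, 0);
  close(fd);   /* the mapping stays valid after the descriptor is closed */
  if (base == MAP_FAILED) {
    return -1;
  }
  int value = base[offset];   /* a plain load, no read() call */
  munmap(base, (size_t)st.st_size);
  return value;
}
```

Note that no read() appears anywhere: the virtual addresses starting at base refer to the file, and the kernel fetches its contents as they are touched.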

Example Machine Memory Layout: System/161

  • System/161 emulates a 32-bit MIPS architecture.

  • Addresses are 32 bits wide: from 0x0 to 0xFFFFFFFF.

This MIPS architecture defines four address regions:
  • 0x0–0x7FFFFFFF: process virtual addresses. Accessible to user processes, translated by the kernel. 2 GB.

  • 0x80000000–0x9FFFFFFF: kernel direct-mapped addresses. Only accessible to the kernel, translated by subtracting 0x80000000. 512 MB. Cached.

  • 0xA0000000–0xBFFFFFFF: kernel direct-mapped addresses. Only accessible to the kernel, translated by subtracting 0xA0000000. 512 MB. Uncached.

  • 0xC0000000–0xFFFFFFFF: kernel virtual addresses. Only accessible to the kernel, translated by the kernel. 1 GB.
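The four regions are easy to express as a classifier. The enum and function names below are invented for this sketch (OS/161 has its own constants for these boundaries). The boundaries and the subtract-0x80000000 rule come straight from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* The four address regions, in ascending address order. */
enum region {
  USER_MAPPED,             /* 0x00000000-0x7FFFFFFF, translated by kernel */
  KERNEL_DIRECT_CACHED,    /* 0x80000000-0x9FFFFFFF, direct-mapped */
  KERNEL_DIRECT_UNCACHED,  /* 0xA0000000-0xBFFFFFFF, direct-mapped */
  KERNEL_MAPPED            /* 0xC0000000-0xFFFFFFFF, translated by kernel */
};

/* Classify a 32-bit virtual address by region. */
enum region region_of(uint32_t vaddr) {
  if (vaddr < 0x80000000u) return USER_MAPPED;
  if (vaddr < 0xA0000000u) return KERNEL_DIRECT_CACHED;
  if (vaddr < 0xC0000000u) return KERNEL_DIRECT_UNCACHED;
  return KERNEL_MAPPED;
}

/* Direct-mapped cached region: physical address = virtual - 0x80000000. */
uint32_t direct_cached_to_phys(uint32_t vaddr) {
  assert(region_of(vaddr) == KERNEL_DIRECT_CACHED);
  return vaddr - 0x80000000u;
}
```

For example, kernel virtual address 0x80001000 corresponds to physical address 0x1000 without consulting any translation table.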

Mechanism v. Policy

  • We will get to the details of virtual address translation next time.

However, it is important to note that both hardware and software are involved:
  • The hardware memory management unit (MMU) translates addresses quickly, either using translations the kernel has given it or following fixed architectural conventions. The MMU is the mechanism.

  • The operating system memory management subsystem manages translation policies by telling the MMU what to do.

  • Goal: the system follows policies established by the operating system while involving the operating system directly as rarely as possible.
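The split can be sketched with a toy simulation. Everything in it is an assumption made for illustration: the 4 KB translation granularity, the four-slot cache, and the "policy" of placing virtual page v at physical page v + 100. Real mechanisms are next lecture's topic. The point here is only the counter, which shows how rarely the "OS" gets involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed translation granularity */
#define MMU_SLOTS 4       /* assumed number of cached translations */

static_assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0, "page size is a power of two");

struct mmu_entry { bool valid; uint32_t vpage; uint32_t ppage; };

static struct mmu_entry mmu[MMU_SLOTS];   /* mechanism: cached translations */
static unsigned os_involvements;          /* times the OS had to step in */

/* Policy (toy): the OS places virtual page v at physical page v + 100. */
static uint32_t os_choose_ppage(uint32_t vpage) {
  os_involvements++;
  return vpage + 100;
}

/* Mechanism: translate using a cached entry, asking the OS only on a miss. */
uint32_t mmu_translate(uint32_t vaddr) {
  uint32_t vpage = vaddr / PAGE_SIZE;
  uint32_t offset = vaddr % PAGE_SIZE;
  struct mmu_entry *e = &mmu[vpage % MMU_SLOTS];
  if (!e->valid || e->vpage != vpage) {
    /* Miss: ask the OS what to do, then remember the answer. */
    e->ppage = os_choose_ppage(vpage);
    e->vpage = vpage;
    e->valid = true;
  }
  return e->ppage * PAGE_SIZE + offset;
}

/* How many translations needed the OS so far. */
unsigned mmu_os_involvements(void) {
  return os_involvements;
}
```

Two translations within the same page cost one OS involvement, not two: the second is handled entirely by the simulated hardware.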

Next Time: Address Translation

  • Multiple approaches to translating addresses.

  • How to do it fast.