Technical Women

Figure 1. Hessa Sultan Al Jaber

Today

  • Producer-consumer buffer problem.

  • exec()

  • wait()/exit()

$ cat announce.txt

  • Do ASST1!

Producer-Consumer

Problem description:
  • Producer and consumer share a fixed-size buffer.

  • Producer can add items to the buffer if it is not full.

  • Consumer can withdraw items from the buffer if it is not empty.

What do we want to ensure?
  • Producer must wait if the buffer is full.

  • Consumer must wait if the buffer is empty.

  • Producers should not be sleeping if there is room in the buffer.

  • Consumers should not be sleeping if there are items in the buffer.

Producer-Consumer

int count = 0;
void produce(item_t item) {
  while (count == FULL) {
    // do something
  }
  put(buffer, item);
  count++;
}

item_t consume() {
  while (count == 0) {
    // do something
  }
  item_t item = get(buffer);
  count--;
  return item;
}
What synchronization primitive is a good fit for this problem?
  • Condition variables: I have a variable (count) and conditions that require waiting (full, empty).

Producer-Consumer

int count = 0;
struct cv *countCV;
struct lock *countLock;
void produce(item_t item) {
  lock_acquire(countLock);
  while (count == FULL) {
    cv_wait(countCV, countLock);
  }
  put(buffer, item);
  count++;
  lock_release(countLock);
}

item_t consume() {
  lock_acquire(countLock);
  while (count == 0) {
    cv_wait(countCV, countLock);
  }
  item_t item = get(buffer);
  count--;
  lock_release(countLock);
  return item;
}

Looks good, right?

Any time you call cv_wait, some other thread must eventually call cv_signal or cv_broadcast on the same CV!
  • But where? Where does the condition change?

Producer-Consumer

int count = 0;
struct cv *countCV;
struct lock *countLock;
void produce(item_t item) {
  lock_acquire(countLock);
  while (count == FULL) {
    cv_wait(countCV, countLock);
  }
  put(buffer, item);
  count++;
  cv_broadcast(countCV, countLock);
  lock_release(countLock);
}

item_t consume() {
  lock_acquire(countLock);
  while (count == 0) {
    cv_wait(countCV, countLock);
  }
  item_t item = get(buffer);
  count--;
  cv_broadcast(countCV, countLock);
  lock_release(countLock);
  return item;
}

This works. But does it work well?

Producer-Consumer

int count = 0;
struct cv *countCV;
struct lock *countLock;
void produce(item_t item) {
  lock_acquire(countLock);
  while (count == FULL) {
    cv_wait(countCV, countLock);
  }
  put(buffer, item);
  count++;
  if (count == 1) {
    cv_broadcast(countCV, countLock);
  }
  lock_release(countLock);
}

item_t consume() {
  lock_acquire(countLock);
  while (count == 0) {
    cv_wait(countCV, countLock);
  }
  item_t item = get(buffer);
  count--;
  if (count == FULL - 1) {
    cv_broadcast(countCV, countLock);
  }
  lock_release(countLock);
  return item;
}

Approaching full victory. But why not use cv_signal?
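One possible answer, sketched under assumptions: with one CV shared by producers and consumers, cv_signal wakes only a single sleeper on each count == 1 or count == FULL - 1 transition. Other threads that could proceed may then sleep indefinitely, and the one wakeup may even go to the wrong kind of thread. If each kind of waiter gets its own condition variable, a plain signal after every change wakes exactly one thread that can make progress. The sketch uses POSIX threads (pthread_mutex_t, pthread_cond_t) in place of the OS/161 lock and cv API. BUF_SIZE, struct bbuf, bb_put, bb_get, and bb_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BUF_SIZE 4  /* hypothetical capacity, FULL in the slides */

/* Bounded ring buffer: one lock, plus one condition variable per kind of waiter. */
struct bbuf {
    int items[BUF_SIZE];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full;   /* only producers sleep here */
    pthread_cond_t not_empty;  /* only consumers sleep here */
};

void bb_init(struct bbuf *b) {
    b->head = b->tail = b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_full, NULL);
    pthread_cond_init(&b->not_empty, NULL);
}

void bb_put(struct bbuf *b, int item) {
    pthread_mutex_lock(&b->lock);
    while (b->count == BUF_SIZE)               /* re-check after every wakeup */
        pthread_cond_wait(&b->not_full, &b->lock);
    b->items[b->tail] = item;
    b->tail = (b->tail + 1) % BUF_SIZE;
    b->count++;
    pthread_cond_signal(&b->not_empty);        /* wakes one consumer, if any */
    pthread_mutex_unlock(&b->lock);
}

int bb_get(struct bbuf *b) {
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)
        pthread_cond_wait(&b->not_empty, &b->lock);
    int item = b->items[b->head];
    b->head = (b->head + 1) % BUF_SIZE;
    b->count--;
    pthread_cond_signal(&b->not_full);         /* wakes one producer, if any */
    pthread_mutex_unlock(&b->lock);
    return item;
}

/* Demo helper (hypothetical): one producer thread feeds 1..n through the buffer. */
struct demo_arg { struct bbuf *b; int n; };

static void *producer(void *p) {
    struct demo_arg *a = p;
    for (int i = 1; i <= a->n; i++)
        bb_put(a->b, i);
    return NULL;
}

/* Consumes n items on the calling thread; returns 1 if they arrived in FIFO order. */
int bb_demo(int n) {
    struct bbuf b;
    bb_init(&b);
    struct demo_arg a = { &b, n };
    pthread_t t;
    int rc = pthread_create(&t, NULL, producer, &a);
    assert(rc == 0);
    int in_order = 1;
    for (int i = 1; i <= n; i++)
        if (bb_get(&b) != i)
            in_order = 0;
    pthread_join(t, NULL);
    pthread_cond_destroy(&b.not_empty);
    pthread_cond_destroy(&b.not_full);
    pthread_mutex_destroy(&b.lock);
    return in_order;
}
```

With a buffer of only four slots and thousands of items, the producer repeatedly fills the buffer and sleeps on not_full, which exercises both wait paths.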

Using the Right Tool

  • Most problems can be solved with a variety of synchronization primitives.

  • However, there is usually one primitive that is more appropriate than the others.

  • You will have a chance to practice picking synchronization primitives for ASST1, and throughout the class.

Approaching Synchronization Problems

  1. Identify the constraints.

  2. Identify shared state.

  3. Choose a primitive.

  4. Pair waking and sleeping.

  5. Look out for multiple resource allocations: can lead to deadlock.

  6. Walk through simple examples and corner cases before beginning to code.
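Step 5 is easiest to see with two locks. If one thread acquires A then B while another acquires B then A, each can end up holding one lock and waiting forever for the other. One standard remedy is to acquire multiple locks in a single global order, sketched here with POSIX mutexes. struct account, transfer, and transfer_demo are hypothetical. Ordering by an id field is one choice among several (ordering by address is another).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical resource: each account carries its own lock. */
struct account {
    int id;               /* defines the global lock order */
    long balance;
    pthread_mutex_t lock;
};

/* Moves amount between accounts while holding both locks. Locks are always
 * taken lowest id first, so two transfers running in opposite directions
 * cannot each hold one lock while waiting for the other. */
void transfer(struct account *from, struct account *to, long amount) {
    assert(from != to);
    struct account *first = from->id < to->id ? from : to;
    struct account *second = from->id < to->id ? to : from;
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

/* Demo helper (hypothetical): repeat one transfer direction reps times. */
struct job { struct account *from, *to; int reps; };

static void *worker(void *p) {
    struct job *j = p;
    for (int i = 0; i < j->reps; i++)
        transfer(j->from, j->to, 1);
    return NULL;
}

/* Runs opposite-direction transfers concurrently; returns the total balance. */
long transfer_demo(int reps) {
    struct account a = { .id = 1, .balance = 1000 };
    struct account b = { .id = 2, .balance = 1000 };
    pthread_mutex_init(&a.lock, NULL);
    pthread_mutex_init(&b.lock, NULL);
    struct job ab = { &a, &b, reps }, ba = { &b, &a, reps };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &ab);
    pthread_create(&t2, NULL, worker, &ba);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&a.lock);
    pthread_mutex_destroy(&b.lock);
    return a.balance + b.balance;
}
```

If transfer() instead locked from and then to, the two workers could deadlock. With a fixed order, the demo always finishes and the total is conserved.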

Questions about Synchronization?

Now back to the process-related system calls!

Review: After fork()

returnCode = fork();
if (returnCode == 0) {
  // I am the child.
} else {
  // I am the parent.
}
  • The child begins executing at the exact same point where its parent called fork().

    • With one exception: fork() returns twice, returning the child’s PID to the parent and 0 to the child.

  • Immediately after fork(), all contents of memory in the parent and child are identical.

  • Both child and parent have the same files open at the same position.

    • But since they share file handles, changes to the file offset made by one process will be reflected in the other!
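This small POSIX sketch illustrates the bullets above. fork() returns 0 in the child and the child's PID in the parent. A file opened before fork() shares a single offset between the two processes. The /tmp path template and fork_offset_demo are illustrative assumptions. The parent waits with waitpid(), a form of wait() that is discussed later in this lecture.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: child writes "child\n", then parent writes "parent\n"
 * through the same descriptor. Returns 1 if the file reads "child\nparent\n". */
int fork_offset_demo(void) {
    char path[] = "/tmp/forkdemoXXXXXX";
    int fd = mkstemp(path);          /* opened before fork(): both processes share it */
    assert(fd >= 0);
    unlink(path);                    /* the open descriptor keeps the file alive */

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {                  /* fork() returned 0: we are the child */
        ssize_t n = write(fd, "child\n", 6);
        _exit(n == 6 ? 0 : 1);
    }

    /* fork() returned the child's PID: we are the parent */
    int status;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    ssize_t w = write(fd, "parent\n", 7);  /* no lseek(): the shared offset is already 6 */
    assert(w == 7);

    char buf[32] = {0};
    lseek(fd, 0, SEEK_SET);
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    return n == 13 && strcmp(buf, "child\nparent\n") == 0;
}
```

If the offsets were not shared, the parent's write would start at offset 0 and overwrite the child's output.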

Review: Issues with fork()

Copying all that state is expensive!
  • Especially when the next thing a process frequently does is load a new binary, which destroys most of the state fork() has carefully copied!

Several solutions to this problem:
  • Optimize existing semantics: through copy-on-write, a clever memory-management optimization we will discuss in several weeks.

  • Change the semantics: vfork(), whose behavior is undefined if the child does anything other than immediately load a new executable (or exit).

    • Does not copy the address space!

Review: The Tree of Life

  • fork() establishes a parent-child relationship between two processes at the point when the child is created.

  • The pstree utility allows you to visualize these relationships.
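pstree draws its tree from each process's parent link. That same link is visible from inside a process through getppid(). In this sketch, which assumes POSIX, the child compares getppid() against the PID its parent had before fork() and reports the result through its exit code. child_knows_parent is a hypothetical helper.

```c
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a freshly forked child sees the caller as its parent. */
int child_knows_parent(void) {
    pid_t me = getpid();
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)
        _exit(getppid() == me ? 0 : 1);   /* answer travels back in the exit code */
    int status;
    waitpid(pid, &status, 0);             /* parent stays alive until the child is done */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```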

pstree

Let’s Go On…

  • Questions about fork()?

$ wait %1 # Process lifecycle

  • Change: exec()

  • Death: exit()

  • The Afterlife: wait()

Groundhog Day

Is fork() enough?

Figure: initfork (4 frames)

Change: exec()

  • The exec() family of system calls replaces the program running in the calling process with a new one loaded from a file. The process ID stays the same.

  • The executable file must contain a complete blueprint indicating how the address space should look when exec() completes.

    • What should the contents of memory be?

    • Where should the first thread start executing?

  • Linux and other UNIX-like systems use ELF (Executable and Linkable Format) as the standard describing how the information in the executable file is structured.
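Every ELF file begins with a header, and the header's e_entry field answers the question "where should the first thread start executing?" The sketch below reads that header using the definitions in Linux's <elf.h>. It assumes a 64-bit ELF file, and elf64_entry is a hypothetical helper. readelf -h prints the same value as "Entry point address". Dynamically linked programs are a special case: the kernel first starts the interpreter named in the file, such as ld-linux.so.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper; assumes a 64-bit ELF file.
 * Returns 1 and stores the entry point if path is such a file,
 * 0 if it is not, and -1 if it cannot be opened. */
int elf64_entry(const char *path, unsigned long *entry) {
    assert(path != NULL && entry != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    Elf64_Ehdr hdr;
    size_t n = fread(&hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n != sizeof hdr)
        return 0;                                     /* too short to be ELF */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)    /* "\177ELF" magic bytes */
        return 0;
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;
    *entry = (unsigned long)hdr.e_entry;
    return 1;
}
```

On Linux, passing "/proc/self/exe" inspects the running program's own executable.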

$ readelf # display ELF information

readelf

$ /lib/ld-linux.so.2

ldlinux

exec() Argument Passing

  • The process calling exec() passes arguments, through the kernel, to the program that will replace it.

    • The kernel copies the arguments out of the calling process’s memory when exec() is called.

    • It then pushes them into the memory of the new program, where the replacement can find them when it starts executing.

    • This is where main gets argc and argv!

  • exec() also has an interesting return, almost the dual of fork(): exec() never returns!
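The following sketch, which assumes a POSIX system, illustrates both points. The argument vector handed to execv() is what the new program sees as argv, and its length is what it sees as argc. The line after execv() runs only if the call failed. run_program is a hypothetical helper. It also uses waitpid() to read the child's exit code, which the wait() discussion in this lecture covers.

```c
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with argument vector argv in a child
 * and returns its exit code, or 127 if exec failed. */
int run_program(char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        execv(argv[0], argv);   /* on success, this call never returns */
        _exit(127);             /* reached only if execv() failed */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running /bin/sh with the arguments -c, "exit $#", sh, a, and b makes the shell exit with status 2, the number of positional arguments it received.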

$ exec()

Figure: exec (7 frames)

exec() File Handle Semantics

  • By convention, exec() does not modify the file table of the calling process! Why not?

  • Remember pipes?

    • Don’t undo all the hard work that fork() put into duplicating the file table!
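This sketch shows why the rule matters for pipes, assuming a POSIX system. The child points its standard output (fd 1) at a pipe with dup2() and then calls execv(). Because exec() leaves the file table alone, the new program writes into the pipe without knowing it. capture_stdout is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv with stdout connected to a pipe and stores
 * what it prints in buf (NUL-terminated). Returns the number of bytes read. */
ssize_t capture_stdout(char *const argv[], char *buf, size_t len) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0 && len > 0);
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);  /* fd 1 now refers to the pipe */
        close(fds[0]);
        close(fds[1]);
        execv(argv[0], argv);         /* the new program inherits the file table */
        _exit(127);
    }
    close(fds[1]);                    /* so read() sees EOF once the child exits */
    size_t total = 0;
    ssize_t n;
    while (total < len - 1 && (n = read(fds[0], buf + total, len - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

This is essentially what a shell does for a pipeline such as `echo hello | wc`, with one child per stage.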

pipes example 3

Our Simple Shell

Disclaimer: this is C-like pseudo-code. It will not compile or run! (But it’s not far off.)

while (1) {
  input = readLine();
  returnCode = fork();
  if (returnCode == 0) {
    exec(input);
  }
}

exec() Challenges

  • The most challenging part of exec() is making sure that, on failure, exec() can return to the calling process!

    • Can’t make destructive changes to the calling process’s address space until we are sure that things will succeed.

    • Of course, the process is just an abstraction anyway, and that provides a lot of flexibility: the kernel can prepare a separate address space and just swap it in when it’s done.
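Here is a user-space model of the "prepare, then swap" approach, under heavily simplified assumptions. struct addrspace, struct proc, as_create_from_file, and proc_exec are hypothetical stand-ins for kernel structures, and reading a file into memory stands in for loading an executable. The key property is that nothing belonging to the caller is destroyed until the new address space is fully built, so a failure can still return -1.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for kernel structures; the "image" is just the
 * file's bytes, NUL-terminated for convenience. */
struct addrspace { char *image; long size; };
struct proc { struct addrspace *as; };

/* Builds a complete new address space from path, or returns NULL.
 * Nothing belonging to the calling process is touched here. */
static struct addrspace *as_create_from_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    struct addrspace *as = NULL;
    long size = fseek(f, 0, SEEK_END) == 0 ? ftell(f) : -1;
    char *image = size >= 0 ? malloc((size_t)size + 1) : NULL;
    rewind(f);
    if (image != NULL && fread(image, 1, (size_t)size, f) == (size_t)size) {
        image[size] = '\0';
        as = malloc(sizeof *as);
        if (as != NULL) {
            as->image = image;
            as->size = size;
        }
    }
    if (as == NULL)
        free(image);               /* free(NULL) is a no-op */
    fclose(f);
    return as;
}

/* exec()-style replacement (hypothetical): build first, swap only on success. */
int proc_exec(struct proc *p, const char *path) {
    assert(p != NULL);
    struct addrspace *new_as = as_create_from_file(path);
    if (new_as == NULL)
        return -1;                 /* old address space intact: we can still return */
    struct addrspace *old = p->as;
    p->as = new_as;                /* the point of no return */
    if (old != NULL) {
        free(old->image);
        free(old);
    }
    return 0;
}
```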

exit() # End of Life Issues

  • What’s missing here? Death!

  • Processes choose the moment of their own end by calling exit().

  • As we discussed earlier, a process passes an exit code to the exit() function.

  • What happens to this exit code?

wait() # The Afterlife

  • When a process calls exit(), the kernel holds on to the exit code, which can be retrieved by the exiting process’s parent.

  • The parent retrieves this exit code by calling wait(), the last of the primary process-related system calls.

    • And the one that stubbornly refuses to fit into my lifecycle metaphor.
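With exit() and wait() in place, the pseudo-code shell loop from earlier can be completed. This is a minimal sketch that assumes a POSIX system:

- execvp() searches PATH.
- A child whose exec fails calls _exit(127), a common shell convention, so it never falls back into the loop.
- The parent blocks in waitpid(), a more specific form of wait(), to collect the exit code the kernel is holding.

MAX_ARGS, shell_run, and shell_loop are hypothetical names. Splitting on whitespace with no quoting support is a simplifying assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 32   /* hypothetical limit on words per command line */

/* Splits line on whitespace, runs it in a child, and waits for it.
 * Returns the child's exit code (127 if exec failed), or -1 for an empty line. */
int shell_run(char *line) {
    char *argv[MAX_ARGS + 1];
    int argc = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;
    if (argc == 0)
        return -1;

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        execvp(argv[0], argv);   /* searches PATH; returns only on failure */
        _exit(127);              /* the child must not fall back into the loop */
    }
    int status;
    waitpid(pid, &status, 0);    /* collect the exit code the kernel is holding */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* The "Our Simple Shell" loop, reading commands from in until EOF. */
void shell_loop(FILE *in) {
    char line[1024];
    while (fgets(line, sizeof line, in) != NULL)
        shell_run(line);
}
```

Without the waitpid() call, the loop would immediately prompt again while the command was still running, and finished children would linger until someone collected their exit codes.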

Next Time

  • At this point we will move from outside looking in (or from the top of the world looking down) to deep inside the trenches, to examine the thread abstraction.

  • Tentatively:

    • One week on interrupts, kernel privileges, context switching and threading.

    • One week on scheduling.