Concurrency occurs when multiple processes or threads access a shared resource simultaneously (for simplicity, we will refer to them as processes below). Because resources are limited, problems arise if one process is modifying a shared resource while another process is using it. We call the section of code that accesses such shared resources the critical section.

Process Competition

Two or more processes need to access a critical section. For instance, suppose process A is driving the printer. If another process B also wants to print at this moment, without protective measures, the printer would mix outputs from process A and process B. This relationship is called process competition.

Solutions

Any solution to this problem must satisfy the following four requirements:

  1. Mutual Exclusion: Only one process is allowed access to the resource at a time.
  2. Progress (No Lockout): A process that is not currently using the resource cannot block other processes from accessing it.
  3. Bounded Waiting (Prevent Starvation): A process cannot repeatedly access a resource so quickly that other processes are left waiting indefinitely (starving).
  4. No Deadlock: When multiple processes attempt to access a resource simultaneously, they should not reach a deadlock state where they block each other indefinitely, preventing anyone from proceeding.

Example

Here is a critical section:

1
2
3
4
5
6
resource = 0;
def operation(x):
# In reality, this operation is not atomic.
# It involves reading resource, adding x, and writing back.
global resource
resource += x

Process 1 calls this critical section:

1
2
3
# process_1
operation(5)
print(resource)

Process 2 also calls this critical section:

1
2
3
# process_2
operation(3)
print(resource)

Suppose process 1 reaches the ready queue first and starts running. While it is executing operation(), process 2 enters the ready queue. For some reason, the scheduler decides to switch to process 2 for a while. Process 1 may have only calculated resource + x internally but has not yet updated the resource variable itself, so resource remains 0. Process 2 updates resource to 3, and only then does the scheduler resume process 1. Process 1 then adds 5 to its internal copy (which was based on 0), resulting in 8. For process 1, this result is clearly incorrect (it expects $0+5=5$ or $3+5=8$, but the race condition makes it non-deterministic).

Adhering to the principles above, let’s improve the code. should_wait is a global variable.

1
2
3
4
5
6
7
8
9
10
11
12
13
# Shared variables
p1_running = False
p2_running = False
should_wait = 0

# process_1
p1_running = True
should_wait = 1
while p2_running and (should_wait == 1):
pass # Busy wait
operation(5)
p1_running = False
print(resource)
1
2
3
4
5
6
7
8
# process_2
p2_running = True
should_wait = 2
while p1_running and (should_wait == 2):
pass # Busy wait
operation(3)
p2_running = False
print(resource)

Assume both processes run concurrently. When the processes reach the line setting should_wait, its value will be either 1 or 2 (whichever process sets it last). Thus, when they both enter the while loop, only one process will be able to proceed and call operation(). This achieves mutual exclusion and avoids deadlock. Starvation should not occur either because once process 1 finishes, it sets p1_running to False, allowing process 2 to break out of its while loop and execute.

Process Cooperation

Besides processes competing for a resource, there are situations where processes need to coordinate. Like a moving company, some people carry furniture into the house, while others arrange it. Furniture that hasn’t been brought inside cannot be arranged. This scenario is known in operating systems as the Producer-Consumer Problem, also called the Bounded-Buffer Problem[^1].

Semaphores

Solving concurrency issues purely through software code has several drawbacks:

  1. It is often limited to two processes (though algorithms for $N$ processes exist, they are complex).
  2. While one process is accessing the critical section, another process can only endlessly loop to wait, wasting allocated CPU time.
  3. It is not easily reusable.

Semaphores offer a unified solution to concurrency problems. A semaphore is a non-negative integer value. It only accepts two fundamental operations (originally denoted as P and V):

Operation S > 0 S = 0
a(S) (signal / V) S + 1 S + 1
s(S) (wait / P) S - 1 Wait until S > 0, then S - 1

Semaphores must guarantee:

  1. Atomicity: When multiple processes operate on a semaphore, the operations are executed sequentially without interruption.
  2. If multiple processes are waiting in s(S) when $S=0$, a a(S) operation must wake up at least one of them.

As shown, we have two processes sharing a resource and using semaphores to prevent issues caused by concurrency.

semaphores

At $t0$, process 2 uses the a() function to increment the semaphore to 1. At $t1$, both process 1 and 2 increment the semaphore simultaneously. These increments are performed atomically, avoiding conflicts. At $t8$, both processes decrement the semaphore simultaneously, but since there isn’t enough, the system blocks process 1 arbitrarily (assuming a simple semaphore implementation). Only after process 2 increments it at $t11$ can process 1 proceed.

Implementing Semaphores

Many modern computers implement semaphores at the hardware level, often using an atomic test-and-set instruction[^2] to lock resources. This model operates on semaphores via a function like L(R, x). L(R, x) executes the following operations atomically:

  1. Copy the value of $x$ to $R$.
  2. Set $x$ to False.

All processes requiring the same shared resource would include a block of code like this:

1
2
3
4
5
6
7
8
9
10
/* x is a shared boolean initialized to True */
boolean will_access;

do {
L(will_access, x); // Atomically get x and set it to False
} while (will_access != True); // Loop if x was already False

operation_on_critical_section();

x = True; // Release lock

Initially, $x$ is True. Multiple processes will compete to acquire $x$ via L(R, x), which sets it to False atomically. Only the process that read True will proceed; others will loop in the while loop until $x$ becomes True again.

Binary Semaphores

Many synchronization problems do not require a general semaphore (counting semaphore). A binary semaphore only switches between 0 and 1 and accepts the following operations:

Operation s = 1 s = 0
ab(s) (binary signal) s = 1 s = 1
sb(s) (binary wait) L(R, s); while(R == 0) { L(R, s); } L(R, s); while(R == 0) { L(R, s); }

If a process repeatedly checks if a semaphore is available, we say that process is in a busy waiting (busy-looping, spinning) state[^3]. Busy waiting consumes CPU time without productive output and should be avoided as much as possible. Similar to the function above, the L() function in this context (assuming atomic swap for simplicity) would effectively achieve mutual exclusion but still rely on spinning in the pseudo-code example for implementation logic.

Note: I use uppercase S for general semaphores and lowercase s for binary semaphores.

ab() and sb() Implementation

General semaphores can be implemented using a() and s() operations. These operations manipulate an integer value $S$. To ensure that only one operation can be executed at a time (atomicity), we use a binary semaphore to protect the modification of $S$.

This $S$ value serves two purposes:

  1. When $S \ge 0$, $S$ represents the available resources (general semaphore).
  2. When $S < 0$, its absolute value $|S|$ represents the number of processes currently in the waiting queue.

To avoid busy waiting, when $S$ decrements below 0, the process is blocked and moved to a waiting queue. Remember the short-term scheduler from our previous post? We focused on the ready queue and ignored the waiting queue; now is when this waiting queue comes into play[^4].

Here is the implementation code (pseudo-code):

1
2
3
4
5
6
7
8
9
10
11
12
/* Global variable S protected by binary semaphore s */
/* s initialized to 1 (unlocked), S initialized to available resources */

def s_op():
sb(s); // Lock access to S
S -= 1;
if (S < 0) { // No resources available, processes are already waiting
ab(s); // Unlock s before blocking
block(); // Block this process / put it into the waiting queue
} else {
ab(s); // Unlock s
}
1
2
3
4
5
6
7
8
9
def a_op():
sb(s); // Lock access to S
S += 1;
if (S <= 0) { // There are processes waiting
ab(s); // Unlock s
reactive(); // Wake up one arbitrary process from the waiting queue
} else {
ab(s); // Unlock s
}

Let’s see what this code achieves:

bs

Interpreting this diagram. Before starting, let me clarify the notations:

  1. Processes with red borders indicate that the process was context-switched by the scheduler before completing a full CPU burst.
  2. Green processes perform a(S), pale orange processes perform s(S).
  3. Process priorities are $p1 > p2 > p3$.

Now, in chronological order, I’ll explain the process:

  • Initially, only process p2 is in the ready list, so it executes. p2 performs the s_op() operation.
  • Assume that just as p2 finishes executing sb(s), p1 enters the ready list. Because its priority is higher, the scheduler decides to run it first.
  • Because p2 holds the lock s, p1 cannot proceed after calling sb(s) and will spin (busy wait in this implementation example). After its time slice, it’s put back on the ready list. Meanwhile, p3 is added to the ready queue.
  • After p1’s slice, since p2’s priority is higher than p3’s, p2 continues running.
  • p2 completes S -= 1, finds $S < 0$ (assuming S was 0 initially), is moved to the waiting list, and releases lock s.
  • At $t1$, in the ready list, p1 is executed because its priority is higher than p3.
  • Similarly, because $S < 0$ after executing S -= 1, p1 is also moved to the waiting list.
  • At $t2$, the scheduler chooses to run p3.
  • p3 executes a_op(), which triggers the reactive() function and puts p1 back into the ready state.
  • After p3 finishes its slice, p1 transitions from ready to running state, while p3 enters the ready list.
  • After p1 runs, p3 executes, and p2 is subsequently moved to the ready queue (by p3’s a_op() eventually).
  • p2 is executed.

Throughout this process, $S$ becoming 0 does not lead to a busy waiting state at the $S$ level. (However, at $t0$, p1’s run is a busy wait because s was preempted by process p2. Even so, this is much more efficient than spinning directly on $S$, as s allows $S$ to manage processes via waiting queues).

Monitors

a() and s() operations are sufficient to solve most synchronization problems, but they focus too much on low-level implementation details, making monitoring and debugging difficult.

A Monitor (or管程), implements a() and s() at a higher level. It is theoretically equivalent to semaphores[^5].

Condition variables come with their own waiting queues, and processes can wait for certain values within these queues to become true.

A monitor must satisfy the following conditions[^5]:

  1. Multiple threads that can interact and share resources.
  2. Multiple variables related to resource usage.
  3. A mutual exclusion lock.
  4. Invariants used to avoid race conditions.

Condition variables support two operations:

  1. wait: Stop the process, place it in the waiting queue, and associate it with the condition variable.
  2. signal: Re-activate the first process in the condition variable’s queue.

Here is how to use a monitor to handle the Producer-Consumer Problem:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Monitor ProducerConsumer {
let product = 0;
// Condition variables typically have associated waiting queues
// condition full, empty;

consume_product() {
// Implement mutual exclusion automatically
if (product <= 0) {
// wait(empty);
wait(); // Blocks on implicit condition variable
}
product -= 1;
// signal(full);
}

produce_product() {
// Implement mutual exclusion automatically
product += 1;
// signal(empty);
signal(); // Signals implicit condition variable
}
}

When consumer p1 executes Monitor.consume_product(), because no product is available, p1 blocks and is thrown into the waiting queue. Later, producer process p2 executes Monitor.produce_product(). This function sends a signal to the waiting queue associated with product, releasing p1 to run. Since the monitor implementation requires mutual exclusion, once p1 is released, p2 is typically blocked and placed into a high-priority waiting queue. After p1 finishes execution, p2 continues. Typically, the waiting queue follows a FIFO model, but other scheduling algorithms can be used.

Common Synchronization Problems

I’ll leave some placeholders here to write about how to solve these problems using semaphores and monitors when I get some time. However, there are many tutorials online, so these placeholders might remain forever 😄.

  1. Readers-Writers Problem
  2. Dining Philosophers Problem