Critique of 'The New 2.6 Scheduler' by Rick Lindsley

by Brian Vuyk, student at Redeemer University College

Introduction

In recent years, the adoption of the Linux operating system into the desktop and server markets has increased at a nearly exponential rate. Linux has begun to fill many of the roles previously occupied by other operating systems. Linux is now being installed on many servers and other utility machines performing such tasks as CGI or CAD. These systems, as well as many modern desktop systems, typically contain multiple processors or processors containing multiple cores. In a recent paper (Lindsley, Rick. The New 2.6 Scheduler, Linux Journal, March 2004.), Rick Lindsley examines the flaws in the 2.4 Linux kernel scheduler, and the differences between scheduler in the 2.4 series kernels, and the 2.6 series.

In the 2.6 series of kernels, there are some fundamental differences in the way the scheduler works, when compared to the 2.4 series predecessors. This is a result of Linux’s increasing usage in symmetric multi-processing (SMP) environments. Due to certain architectural flaws, the 2.4 series scheduler was unable to scale well to SMP environments.

The 2.4 Series Scheduler

To properly understand the problem which faced the kernel engineers, one needs to have an understanding of the operation of the scheduler in the 2.4 series kernels.

The scheduler of the 2.4 series kernels was of exceptionally simple design. The processes are all stored on a single list, termed the ‘tasklist’. The tasklist is an unsorted linked list containing process references. Each process reference contains relevant process description and control information, as well as what is termed the ‘goodness rating’. The goodness rating determined the priority of a process and was the criteria by which a process was selected for running.

There were four main factors influencing the goodness rating:

  1. The primary ‘goodness’ factor was the number of allocated clock ticks remaining to a process. Upon being given processor resources, a process is allocated a certain number of clock cycles for which it is given control of the resource. If a process gives up the processor to wait for I/O, or is interrupted involuntarily, the process is awarded with increased ‘goodness’ when it is again ready to be processed.
  2. By using a certain system call, processes could advise the scheduler that they would like to remain on the current processor, even if another should open up first. This is called CPU affinity. This is done because it is expensive to move large amounts of memory between processes.
  3. User priority can be set by a root user using the ‘nice’ command. This allows a process’ priority to be elevated or lowered by a large range.
  4. Whether or not the process is a real time process influenced the priority greatly. This ensured that real time tasks could consistently reach their deadlines.

In order to chose a process to run, the scheduler would examine the tasklist, visiting each process within the list, and select the process with the highest goodness rating. Because the list was not ordered in any way, this operation was always O(n) in the best case.

This system worked fairly well for single-cored single-processor machines, which were common during the majority of the life cycle of the 2.4 series kernels. In 2001 when work began on the 2.5 series of experimental kernels, the kernel engineering community began to notice that their kernel was now being run on a large percentage of multi-processor or multi-cored machines, and that the 2.4 series scheduler did not scale well to meet the increased complexities.