Real Time and Linux
Linux is well tuned for throughput-limited applications, but it is not well designed for deterministic response, though enhancements to the kernel are available to help or guarantee determinism. So-called real-time applications require, among other things, deterministic response. In this article I examine the nature of real-time applications and Linux's strengths and weaknesses in supporting such applications. In later articles I will examine various approaches to help real-time applications satisfy hard real-time requirements. Most of these issues are with respect to the Linux kernel, but the GNU C library, for example, plays a part in some of this.
Many definitions are available for the terms relating to real time. In fact, because they have different requirements, different applications demand different definitions. Some applications may be satisfied by some average response time while others may require that every deadline be met.
The response time of an application is that time interval from when it receives a stimulus, usually provided via a hardware interrupt, to when the application has produced a result based on that stimulus. These results may be such things as the opening of a valve in an industrial control application, drawing a frame of graphics in a visual simulation or processing a packet of data in a data-acquisition application.
Let's consider an opening of a valve scenario. Imagine we have a sensor next to a conveyor belt with parts that need to be painted that move past a paint nozzle. When the part is just in the right position the sensor alerts our system that the valve on the paint nozzle should open to allow paint to be sprayed onto the part. Do we need to have this valve open at just the right time on average, or every time? Every time would be nice.
We need to open the valve no later than the latest time that it's not too late to begin painting the part. We also need to close the valve no sooner than when we are finished with painting the part. Thus, it is desirable to not keep the valve open any longer than necessary, since we really don't want to be painting the rest of the conveyor belt.
We say that the latest possible time to open the valve and still accomplish the proper painting is the deadline. In this case if we miss the deadline, we won't paint the part properly. Let's say that our deadline is 1ms. That is the time from when the sensor alerts us until the time we must have begun painting. To be sure that we are never late, let's say we design our system to begin painting 950µs after we receive the sensor interrupt. In some cases we may begin painting a little before, and sometimes a little afterward.
Of course, it will never be the case that we start painting exactly 950µs after the interrupt, with infinite precision. For instance, we may be early by 10µs one time and late by 13µs the next. This variance is termed jitter. We can see from our example that if our system were to provide large jitter, we would have to move our target time significantly below 1ms to be assured of satisfying that deadline. This also would mean that we frequently would be opening the valve much sooner than actually required, which would waste paint. Thus, some real-time systems will have requirements in terms of both deadlines and jitter. We assume that our requirements say that any missed deadlines are failures.
The term operating environment means the operating system as well as the collection of processes running, the interrupt activity and the activity of hardware devices (like disks). We want to have an operating environment for our real-time application that is so robust we are free to run any number of any kind of applications concurrently with our real-time application and still have it perform acceptably.
An operating environment where we can determine the worst-case time for a given response time or requirement is deterministic. Operating environments that don't allow us to determine a worst-case time are called nondeterministic. Real-time applications require a deterministic operating environment, and real-time operating systems are capable of providing a deterministic operating environment.
Nondeterminism is frequently caused by algorithms that do not run in constant time; for example, if an operating system's scheduler must traverse its entire run list to be able to decide which process to run next. This algorithm is linear, sometimes notated as O(n), meaning read on the order of n. That is, as n (the number of processes on the run list) grows, the time to decide grows proportionally. With an O(n) algorithm there is no upper bound on the time that the algorithm will take. If your response time depends upon your sleeping process to be awakened and selected to run, and the scheduler is O(n), then you will not be able to determine the worst-case time. This is a property of the Linux scheduler.
This is important in an environment where the system designer cannot control the number of processes that users of the system may create, this is important. In an embedded system, where characteristics of the system, such as the user interface, make it impossible for there to be any more than a given number of processes, then the environment has been constrained sufficiently to bound this kind of scheduling delay. This is an example where determinism may be achievable by some aspect of the configuration of the operating environment. Notice that a priority system may be required, as well as other things, but as far as the scheduling time is concerned, the time is bounded.
A visual simulation may require an average target framerate, say 60 frames per second. As long as frames are dropped relatively infrequently, and that over a suitable period the framerate is 60 frames a second, the system may be performing acceptably.
The example of the paint nozzle and the average framerate are examples of what we call hard real-time and soft real-time constraints, respectively. Hard real-time applications must have their deadlines met, otherwise an unacceptable result occurs. Something blows up, something crashes, some operation fails, someone dies. Soft real-time applications usually must satisfy a deadline, but if a certain number of deadlines are missed by just a little bit, the system may still be considered to be operating acceptably.
Let's consider another example. Imagine we are building a penguin robot to aid scientists in studying animal behavior. Through careful observation we determine that upon the emergence of a seal from a hole in the ice, a penguin has 600ms to move away from the hole to avoid being eaten by the seal. Will our robot penguin survive if it moves back, on average, within 600ms? Perhaps, if the variance in the attack time of the seal varies synchronously with our penguin's response time. Are you going to build your penguin with that assumption? We also realize that there is a certain distance from the hole that the seal can reach. Our penguin must move farther than that distance within the 600ms. Some would call that line the deadline.
For an operating environment to accommodate a hard real-time application, it must be able to insure that the application's deadlines always can be satisfied. This implies that all actions within the operating system must be deterministic. If the operating environment accommodates a soft real-time application, this generally means that an occasional delay may occur, but such a delay will not be unduly long.
Requirements for an application may be quantitative or qualitative. A qualitative requirement for a visual simulation would be that the system needs to react quickly enough to seem natural. This reaction time would be quantified to measure compliance. For example, a frame of graphics based upon user input may have a requirement to be rendered within 33.3ms after the user's input. This means that if a pilot moves the control stick in the flight simulator to bank right, the out-of-the-window view should change to reflect the new flight path within 33.3ms. Where did the 33.3ms requirement come from? Human factors--that amount of time is fast enough so that humans perceive the visual simulation as sufficiently smooth.
It is not the value of the time requirement but rather that there is a time requirement that makes this a real-time requirement. If one changed the requirement to have the graphics drawn within 33.3 seconds, instead of 33.3ms, it would still be a real-time system. The difference may be in the means to satisfy the requirements. In a Linux system the 33.3ms may require the use of a special Linux kernel and its functions, whereas the 33.3 second requirement may be achievable by means available within a standard kernel.
This leads us to a tenet: fast does not imply real-time and vice versa. However, fast on a relative scale may imply the need for real-time operating system features. This leads us to the distinction between real-time operating systems and real-time applications. Real-time applications have time-related requirements. Real-time operating systems can guarantee performance to real-time applications.
In practice, a general-purpose operating system, such as Linux, provides sufficient means for an application with relatively long deadlines if the operating environment can be controlled suitably. It is because of this property that one frequently hears that there is no need for real-time operating systems because processors have become so fast. This is only true for relatively uninteresting projects.
One has to remember, though, that if the operating environment is not constrained suitably, deadlines may be missed even though extensive testing never found a case of a missed deadline. Linear-time algorithms, for example, may be lurking in the code.
Another issue to keep in mind is audience effect, often stated as "The more important the audience for one's demo, the more likely the demo will fail.'' While anecdotal evidence abounds for the audience effect, effects such as nonrepeatability are often due to race conditions. A race condition is a situation where the result depends upon the relative speeds of the tasks or the outside world. All real-time systems, by definition, have race conditions. Well-designed systems have race conditions only around required deadlines. Testing alone cannot prove the lack of race conditions.
Since an operating system is largely responsive instead of proactive, many activities that cause delay can be avoided. In order for a particular application (process) to be able to meet its deadlines, such things as CPU-bound competitors, disk I/O, system calls or interrupts may need to be controlled. These are the kinds of things that constitute a properly constrained operating environment. Characteristics of the operating system and drivers also may be of concern. The operating system may block interrupts or not allow system calls to be preempted. While these activities may be deterministic, they may cause delays that are longer than acceptable for a given application.
A real-time operating system requires simpler efforts to constrain the environment than does a general-purpose operating system.
Unless we state otherwise, assume we are talking about the 2.4.9 version of the Linux kernel. Version 2.4.9 was released in August 2001, although our statements, at least for the most part, are true for the last several years of kernel releases.
There are many qualities of an operating system that may be necessary or desirable for it to be appropriate for real-time applications. One list of features is included in the Comp.realtime FAQ at http://www.faqs.org/faqs/realtime-computing/faq/. That list contains such features as the OS being multithreaded and preemptible, and able to support thread priorities and provide predictable thread-synchronization mechanisms. Linux is certainly multithreaded, supports thread priorities and provides predictable thread-synchronization mechanisms. The Linux kernel is not preemptible.
The FAQ also says that one should know the OS behavior for interrupt latency, time for system calls and maximum time that interrupts are masked. Further, one should know the system-interrupt levels and device-driver IRQ (interrupt request line) levels, as well as the maximum time they take. We provide some timings for interrupt latency and interrupt masked ("interrupts blocked'') times in the benchmarking section below.
Many developers also are interested in having a deadline scheduler, a kernel that is preemptible in the millisecond range or better, more than 100 priority levels, user-space support for interrupt handlers and DMA, priority inheritance on synchronization mechanisms, microsecond timer resolution, the complete set of POSIX 1003.1b functionality and constant time algorithms for scheduling, exit(), etc. None of these capabilities are available in the standard kernel and the GNU C library.
Additionally, in practice, the magnitude of delays becomes important. The Linux kernel, in a relatively easily constrained environment, may be capable of worst-case response times of about 50ms, with the average being just a few milliseconds.
Andrew Morton suggests that one should not scroll the framebuffer, run hdparm, use blkdev_close or switch consoles (see http://www.uow.edu.au/~andrewm/linux/schedlat.html#ddt). These are examples of constraining the operating environment.
Some applications, however, may require response times on the order of 25µs. Such requirements are not satisfiable in applications that are making use of Linux kernel functionality. In such cases, some mechanism outside of the Linux kernel functions must be employed in order to assure such relatively short response-time deadlines.
We see, in practice, that a hard real-time operating system can assure deterministic response and provide response times that are significantly faster than those provided by general-purpose operating systems like Linux.
We see that Linux is not a real-time operating system because it always cannot assure deterministic performance and because its average and worst-case timing behavior is far worse than that required by many real-time applications. Remember that the timing behavior required by these many real-time applications generally is not a hardware limitation. For example, the Linux kernel response time may be on the order of a few milliseconds on a typical x86-based PC, while the same hardware may be capable of better than 20µs response times when running a real-time operating system.
The two reasons that the Linux kernel has such relatively poor performance on uniprocessor systems are because the kernel disables interrupts and because the kernel is not suitably preemptible. If interrupts are disabled, the system is not capable of responding to an incoming interrupt. The longer that interrupts are delayed, the longer the expected delay for an application's response to an interrupt. The lack of kernel preemptibility means that the kernel does not preempt itself, such as in a system call for a lower-priority process, in order to switch to a higher-priority process that has just been awakened. This may cause significant delay. On SMP systems, the Linux kernel also employs locks and semaphores that will cause delays.
User-space real-time applications require services of the Linux kernel. Such services, among other things, provide scheduling, interprocess communication and performance improvement. Let's examine a variety of system calls (the kernel's way of providing services to applications that are of special benefit to real-time application developers). These calls are used to constrain the operating environment.
There are 208 system calls in the Linux kernel. System calls usually are called indirectly through library routines. The library routines usually have the same name as the system call, and sometimes library routines map into alternative system calls. For example, on Linux, the signal library routine from the GNU C library, version 2.2.3, maps to the sigaction system call.
A real-time application may call nearly all of the set of system calls. The calls that are most interesting to us are exit(2), fork(2), exec(2), kill(2), pipe(2), brk(2), getrususage(2), mmap(2), setitimer(2), ipc(2) (in the form of semget(), shmget() and msgget()), clone(), mlockall(2) and sched_setscheduler(2). Most of these are described well in either Advanced Programming in the UNIX Environment, by W. Richard Stevens, or in POSIX.4: Programming for the Real World, by Bill O. Gallmeister. The clone() function is Linux-specific. The others, for the most part, are compatible with typical UNIX systems. However, read the man pages because there are some subtle differences at times.
Real-time applications on Linux also frequently are interested in the POSIX Threads calls, such as pthread_create() and pthread_mutex_lock(). Several implementations of these functions exist for Linux. The most commonly available of these is provided by the GNU C library. These so-called LinuxThreads are based on the clone() system call and are scheduled by the Linux scheduler. Some POSIX functions are available for POSIX Threads (e.g., sem_wait()) but not for Linux processes.
An application running on Linux ordinarily can be slowed down considerably, from its best case, by a number of factors. Essentially these are caused by contention for resources. Such resources include synchronization primitives, main memory, the CPU, a bus, the CPU cache and interrupt handling.
An application can reduce its resource contention for these resources in a number of ways. For synchronization mechanisms, e.g., mutexes and semaphores, an application can reduce their use, employ priority inheritance versions, employ relatively fast implementations, reduce the time in critical sections, etc. Contention for the CPU is affected by priorities. In this view, for example, lack of kernel preemption can be seen as a priority inversion. Contention for a bus is probably not significantly long to be of direct concern. However, know your hardware. Do you have a clock that takes 70µs to respond and holds the bus? Contention for a cache is affected by frequent context switches and by large or random data or instruction references.
Consequently, real-time applications usually give themselves a high priority, lock themselves in memory (and don't grow their memory usage), use lock-free communication whenever possible, use cache memory wisely, avoid nondeterministic I/O (e.g., sockets) and execute within a suitably constrained system. Suitable constraints include limiting hardware interrupts, limiting the number of processes, curtailing system call use by other processes and avoiding kernel problem areas, e.g., don't run hdparm.
Some of the system calls that should be made by a real-time application require special privileges. This usually is accomplished by having root be the owner of the process (having a shell owned by root run the program or having the executable file have the SUID bit set). A newer way is to make use of the capability mechanism. There are capabilities for locking down memory, such as CAP_IPC_LOCK (that "IPC'' is in the name is just something we need to accept), and for being able to set real-time priorities, which can be done with the capability CAP_SYS_NICE.
A real-time process sets its priority with sched_setscheduler(2). The current implementation provides the standard POSIX policies of SCHED_FIFO and SCHED_RR, along with priorities ranging in value from 1-99. Bigger is better. The POSIX function to check the maximum allowable priority value for a given policy is sched_get_priority_max(2).
A real-time process should lock down its memory and not grow. Locking memory is done in Linux with the POSIX standard function mlockall(2). Usually one uses the flags value of MCL_CURRENT | MCL_FUTURE to lock down current memory and any new memory if one's process grows in the future. While growing often is not acceptable, if you get lucky and survive the delay you might as well get the newly allocated memory locked down as well. Be careful to grow your stack and allocate all dynamic memory, and then call mlockall(2) before your process begins its time-critical phase. Note that you can check to see if your process had any page faults during a section of code by using getrusage(2). I show a code fragment below to illustrate the use of several functions. Note that one should check the return value from each of these calls and read the man pages for more details:
priority = sched_get_priority_max(SCHED_FIFO); sp . sched_priority = priority; sched_setscheduler(getpid(), SCHED_FIFO, &sp); mlockall(MCL_FUTURE | MCL_CURRENT); getrusage(RUSAGE_SELF,&ru_before); . . . // R E A L T I M E S E C T I O N getrusage(RUSAGE_SLEF,&ru_after); minorfaults = ru_after.ru_minflt - ru_before.ru_minflt; majorfaults = ru_after.ru_majflt - ru_before.ru_majflt;
There are a number of efforts to benchmark various aspects of Linux. Real-time application developers are most interested in interrupt latency, timer granularity, context-switch time, system call overhead and kernel preemptibility. Interrupt latency is the time from when a device asserts an interrupt until the time that the appropriate interrupt handler begins executing. This typically is delayed by the handling of other interrupts and by interrupts being disabled. Linux does not implement interrupt priorities. Most interrupts are blocked when Linux is handling an interrupt. This time typically is quite short, however, perhaps a few microseconds.
On the other hand, the kernel may block interrupts for a significantly longer time. The intlat program from Andrew Morton allows one to measure interrupt latencies (http://www.uow.edu.au/~andrewm/linux/#intlat/). Similarly, his schedlat shows scheduling latencies (http://www.uow.edu.au/~andrewm/linux/schedlat.html).
Context-switch time is included in the well-known benchmark harness LMbench (http://www.bitmover.com/lmbench/), as well as by others (http://www.atnf.csiro.au/~rgooch/benchmarks/linux-scheduler.html, http://math.nmu.edu/~benchmark/index.php?page=context). LMbench also provides information about system calls.
In Table 1 we show the results of LMbench. This table shows context-switch times. The benchmark program was run three times, and the lowest value for the context-switch time for each configuration is reported in the table as per the documentation for LMbench. The highest value, however, was no more than about 10% larger than the minimum. The size of the process is reported in kilobytes, the context-switch time is in microseconds. The context-switch time data indicate that substantial use of data in the cache causes significantly larger context-switch times. The context-switch time includes time to restore the cache state.
As an example of interrupt-off times, one can see some results at http://www.uow.edu.au/~andrewm/linux/intlat/intlat-disk.html. In one experiment with hdparm, the data show that interrupts can be disabled for over 2ms while hdparm runs. Developers can use the intlat mechanism to measure interrupt-off times for the system they are running. It is only under rare conditions that interrupt-off times will exceed 100µs. These conditions should be avoidable for most embedded systems. They are the areas that Morton warns against.
An area of more significant concern to most real-time developers is that of scheduling latency. That is, the delay in continuing a newly awakened high-priority task. A long delay is possible when the kernel is busy executing a system call. This is because the Linux kernel will not preempt a lower priority process in the midst of a system call in order to execute a newly awakened higher priority process. This is why the Linux kernel is termed non-preemptible.
The latency test from Benno Senoner shows that a delay of 100ms or more is possible (http://www.gardena.net/benno/linux/audio/). We can see that both interrupt blocking and scheduling latencies can be sufficiently long to prevent satisfactory performance for some applications.
Timing resolution is also of importance to many embedded Linux developers. For example, the setitimer(2) function is used to set a timer. This function, like other time functions in Linux, has a resolution of 10ms. Thus, if one sets a timer to expire in 15ms, it actually will expire in about 20ms. In a simple test measuring the time interval between 1,000 successive 15ms timers, we found that the average time interval was 19.99ms, the minimum time was 19.987ms and the maximum time was 20.042ms on a quiescent system.
Kevin Dankwardt is founder and CEO of K Computing, a training and consulting firm in Silicon Valley. In particular, his organization develops and delivers embedded and real-time Linux training worldwide.
email: k@kcomputing.com