A Taxonomy of Resources for Embedded Linux Developers
So you want to run Linux on the new-fangled, network-ready, Linux-based, "all homes will want one", revolutionary appliance you just developed? Congratulations. Now where will you get a version of Linux, device drivers, technical information or tools? We present a hierarchical system, a taxonomy, that describes the kinds of products, services and information that an embedded Linux developer (should we call them ELDers?) may need, or at least want. Our system is in the tradition of the Linnaean system: we classify the flora and fauna that inhabit the embedded Linux world. Whether a particular item is flora or fauna we leave up to the reader.
We describe characteristics of the offerings that are used on the host or target systems. Embedded system developers may do their development work on a system quite different from the one that will be deployed running their software. For example, a developer may use a Sun Microsystems workstation running Solaris to cross compile for a small device with a StrongARM processor and only 8MB of RAM. That said, developers benefit greatly from the fact that they can often pair a Linux-based development environment with a Linux-based target.
We use the term host to mean the computer system that the developer uses to interact with the development tools and the term target for the embedded system that will run the software that is being developed. The embedded Linux offerings supply development tools for the host system and drivers, special applications, or custom kernels for use on the target. When evaluating a supplier one needs to consider both their host and target support.
This article has the following parts:
Use and goals of the Taxonomy
Who should use the Taxonomy?
Defining embedded Linux system
Why can't a developer use a desktop or server distribution?
An overview of the Taxonomy
Using the Taxonomy--an example vendor
Conclusion
Use and Goals of the Taxonomy
The Taxonomy has been derived from existing products and services. There exists at least one product or service that fits into each category.
How well a supplier's offerings cover a developer's needs can guide the choice of suppliers. The Taxonomy can help a developer write a checklist of what is required. No single supplier is likely to provide all of the needed information, hardware, software and so on. Some suppliers are volunteers who provide, say, a single device driver, while others are commercial organizations that provide sophisticated Linux distributions, support and development help.
Our goal is to provide a systematic means for developers to characterize the offerings they see. The Taxonomy is intended to categorize any product or service from which an embedded Linux developer may benefit. It is also convenient for suppliers to characterize their own offerings. Instead of saying "we use a standard kernel", "using standard Linux" or some other undefined phrase, a supplier can now say "we do not patch the kernel". In this way the Taxonomy provides a common language with which developers and suppliers can speak to each other.
The Taxonomy does not provide a means to evaluate the quality of a supplier's offerings, only to record that the supplier has something available. The top two levels of the Taxonomy are shown in Figure 1.
Who Should Use the Taxonomy?
If you are an embedded Linux developer, you may find this classification system helpful in understanding the landscape. In addition, we provide real examples of the kinds of resources that are available. Vendors can also benefit by using this common means of categorizing their offerings instead of having to make up their own terminology.
Defining an Embedded Linux System
By embedded system we mean a computer system that operates inside a device whose main function is not computing. Examples include cellular telephones, PDAs and microwave ovens. We also concentrate on devices with relatively limited resources, such as no hard disk or about 16MB of RAM or less. This means that our discussion won't be as useful to someone building a 256-processor, 16GB telescope-directing system for the top of Mount Haleakala to view man-made satellites in orbit.
By Linux, we mean a system whose kernel is derived from a release of the Linux kernel and shares a great deal of source code with a released version. We don't mean, for example, a system that merely happens to share an API with Linux. By kernel we mean the software included in a kernel release tar file.
Whether a supplier provides a kernel unmodified from http://www.kernel.org/ or one that they have patched can be important. Some embedded Linux vendors have made significant changes to the kernels they distribute. These changes can be essential to a target application; for example, they may provide the required scheduling behavior. On the other hand, such behavioral changes can make the application depend heavily on the modifications. Kernel changes also cause versions to diverge into separate paths or fragments. This fragmentation is reminiscent of the differences that produced somewhat incompatible versions of UNIX such as Solaris, HP-UX and IRIX.
We prefer that suppliers make clear what changes they have made. A patch file is a standard way of documenting changes, and clear documentation is also valuable. If a vendor changes the kernel, then we prefer that they give their version a name, for example, Purple Chicken Linux 1.1. If a vendor does not make any changes, then merely stating the Linux version number is appropriate, say Linux 2.2.14. Similarly, if a vendor changes a tool or application, then it's appropriate to distinguish that, say, Purple Chicken gdb. If the vendor does not change the tool, then calling it "gdb" is appropriate. The situation has become confusing because some vendors label everything in their distribution with the distribution's name.
Our classification scheme, for example, allows one to note that a supplier has a non-standard version of a kernel using the category Software → Special Distribution → Patched Kernel (see Figure 2).
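Because every offering is filed under a path from a top-level category down to a leaf, the Taxonomy behaves like a tree. The C sketch below models a small, partial slice of that tree and looks up a path such as Software → Special Distribution → Patched Kernel. The names struct category and find_category are hypothetical, only subcategories named in this article are included, and the leaves "Fix Bugs" and "Reduce Size" are our own labels for the patch purposes listed later; the published figures may arrange things differently.

```c
#include <stddef.h>
#include <string.h>

/* One node of the Taxonomy. This tree is a partial, illustrative
 * reconstruction from the article's text, not the published figures. */
struct category {
    const char *name;
    const struct category *children;
    size_t n_children;
};

#define LEAF(n)  { (n), NULL, 0 }
#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* "Fix Bugs" and "Reduce Size" are assumed labels. */
static const struct category patched_kernel[] = {
    LEAF("Improve Performance"), LEAF("Fix Bugs"), LEAF("Reduce Size"),
};

static const struct category special_distribution[] = {
    LEAF("Drivers"),
    { "Patched Kernel", patched_kernel, COUNT(patched_kernel) },
};

static const struct category software[] = {
    { "Special Distribution", special_distribution, COUNT(special_distribution) },
};

static const struct category services[] = {
    LEAF("Support"), LEAF("Consulting"), LEAF("Training"), LEAF("Programming"),
};

static const struct category information[] = {
    LEAF("Conferences"), LEAF("Web Sites"), LEAF("Publications"),
};

/* Hardware/Software and Hardware are left as leaves here for brevity;
 * their subcategories are omitted. */
static const struct category top_level[] = {
    { "Information", information, COUNT(information) },
    LEAF("Hardware/Software"),
    { "Services", services, COUNT(services) },
    { "Software", software, COUNT(software) },
    LEAF("Hardware"),
};

static const struct category taxonomy = { "Taxonomy", top_level, COUNT(top_level) };

/* Follow path[0..depth-1] from the root, matching one child name per level.
 * Returns the node at the end of the path, or NULL if any step is missing. */
const struct category *find_category(const char *const path[], size_t depth)
{
    const struct category *node = &taxonomy;
    for (size_t i = 0; i < depth; i++) {
        const struct category *next = NULL;
        for (size_t j = 0; j < node->n_children; j++) {
            if (strcmp(node->children[j].name, path[i]) == 0) {
                next = &node->children[j];
                break;
            }
        }
        if (next == NULL)
            return NULL;
        node = next;
    }
    return node;
}
```

Looking up { "Software", "Patched Kernel" } returns NULL in this sketch: a category is identified by its full path, which is what lets a supplier's claim be stated unambiguously.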
Embedded Linux is the use of Linux as the operating system in an embedded system. Various references explain why this is a valuable idea; see, for example, http://www.linuxdevices.com/.
Why Can't a Developer Use a Desktop or Server Distribution?
Why can't a developer just use a standard desktop or server distribution? Our definition of embedded includes the characteristic that the target system has more limited resources than a standard desktop or server system. If your target doesn't, then you may well be able to use a standard distribution, as long as it satisfies your timing constraints. Since there is a lot of overlap between the needs of desktop or server application developers and those of embedded system developers, the Taxonomy does include some categories that would interest other developers, too. Its categories, however, are designed specifically to classify embedded Linux offerings.
These distinctions lead to the two primary characteristics of embedded Linux distributions: they can help conserve resources, and they can help with performance. By performance we usually mean special facilities for ensuring real-time behavior. This can mean the use of an adjunct real-time OS, such as RTLinux or RTAI. Another way to aid real-time, albeit "soft" real-time, is to change the Linux scheduler. If you need neither resource conservation nor improved performance, then you can use a desktop or server distribution, provided it supports your target.
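Even on an unmodified kernel, a process can ask for one of the POSIX real-time scheduling policies, which gives it soft real-time priority over ordinary processes. The minimal C sketch below, assuming a Linux/glibc system, requests SCHED_FIFO at the lowest real-time priority. The function name request_fifo_scheduling is hypothetical. The call normally requires root privileges (or the CAP_SYS_NICE capability), and the result is still only soft real-time; it does not provide the hard guarantees of RTLinux or RTAI.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Switch the calling process (pid 0 means "self") to SCHED_FIFO at the
 * lowest real-time priority. Returns 0 on success or the errno value on
 * failure; EPERM is the usual failure for an unprivileged process. */
int request_fifo_scheduling(void)
{
    struct sched_param param;

    memset(&param, 0, sizeof param);
    param.sched_priority = sched_get_priority_min(SCHED_FIFO);
    if (param.sched_priority == -1)
        return errno;

    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}
```

A program would typically call this once at start-up and carry on with normal scheduling if it gets EPERM back.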
In addition, embedded Linux distributions provide a kernel and application code for processors or boards that are characteristic of embedded systems and not just for desktop or server systems. Embedded Linux distributions may also provide special tools for aiding the embedded Linux developer in reducing the size of their target configuration.
Organizations that build a version of Linux for embedded systems may need to do quite a bit of kernel work. To get an idea of what a developer may be in for, consider TiVo Systems (www.tivo.com), which builds a set-top box that runs Linux. They started from kernel release 2.1.24 and added updated drivers; in addition, they added 29 files and modified 31 others. Their new code amounts to about 11,500 lines. They added a couple of system calls, modified several places in the kernel that checked for superuser permission so that any user can access the resource (it's an embedded system, after all), added extra kernel diagnostics, added some test code and created a new kind of filesystem.
Embedded Linux distributions often involve numerous modifications as well. As an example, Blue Cat Linux from LynuxWorks patches 193 files of kernel release 2.2.12. Note that Blue Cat Linux derives from Red Hat Linux, a distribution targeted at desktop and server applications. The kernel package that comes with Blue Cat Linux says "...the core of your Red Hat Linux Operating System...".
One usually cannot use a vendor's patches with a newer kernel version. For example, if you blindly apply the Blue Cat patches to kernel release 2.2.14, you will find that 15 files fail to patch. If you are really adventurous and try their patches on kernel release 2.4.0-test7, you will find that 59 files fail to patch. These counts include only failures reported by the patch command; just because patch succeeded on the other files doesn't mean that they will work as required. A developer who wants to take advantage of features available in a new kernel may therefore be in for a significant development effort.
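Tallies like these are easy to automate. The minimal C sketch below counts how many files failed in captured patch output, assuming the summary line GNU patch prints once for each file with rejected hunks ("N out of M hunks FAILED -- saving rejects to file F.rej"). The function name count_rejected_files is hypothetical, and other patch implementations may word the message differently.

```c
#include <stddef.h>
#include <string.h>

/* Count the files that patch reported as having rejected hunks.
 * Assumes GNU patch's per-file summary line, for example:
 *   "1 out of 3 hunks FAILED -- saving rejects to file kernel/sched.c.rej"
 * Other patch implementations may word this differently. */
size_t count_rejected_files(const char *patch_output)
{
    static const char marker[] = "saving rejects to file";
    size_t count = 0;
    const char *p = patch_output;

    /* Each occurrence of the marker corresponds to one file with a .rej. */
    while ((p = strstr(p, marker)) != NULL) {
        count++;
        p += sizeof marker - 1;   /* skip past this match */
    }
    return count;
}
```

Feeding it the captured output of a patch -p1 run against a newer kernel tree gives the number of files that need hand merging, which is only a lower bound on the work involved.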
The situation, however, is not as dire as the one faced by the typical Independent Software Vendor (ISV). When an ISV produces a version of its software, it usually depends on the version of the operating system that its customers and suppliers are using. Embedded system developers, by contrast, control the operating system their customers will run, since they ship it with their application.
In addition to kernel modifications, embedded Linux distributions may provide special tools to help reduce resource usage, custom kernel versions for particular targets, smaller versions of standard software, or proprietary software (e.g., a licensed binary driver). (For more information on busybox, a replacement for a wide range of standard Linux commands, see "Building Tiny Linux Systems with Busybox" by Bruce Perens.) The standard Linux kernel was optimized for desktop and server throughput, not for real-time response or for use on relatively resource-starved devices. As a result, an embedded Linux system is likely to use a modified kernel. The characteristics of special distributions are diagrammed in Figure 3.
An Overview of the Taxonomy
Let's take a tour of the Taxonomy, providing examples for many of the categories. We move over the figures from left to right, top to bottom. The examples chosen do not imply any kind of recommendation; they just serve to clarify the definition of each category. Where needed, we add comments to define a category precisely.
The Information category (see Figure 4) is for informational or reference products, which we divide into conferences, web sites and publications. Examples of these are the Embedded Linux Exposition and Conference, www.linuxdevices.com and Linux Journal, respectively. We treat e-mail lists as part of a web site's information offering.
The Hardware/Software category (see Figure 5) is for products that bundle hardware and software together. These bundled products are usually promoted as kits or systems for OEMs, so their focus is distinct from that of the separate Hardware and Software categories. Examples from the subcategories are, for Chips/Chip Sets: DiskOnChip along with its Linux driver; for Single Board Computers: the AVS Wireless LAN Developers Kit along with its version of Linux; and for Complete Systems: the Easy I/O™ Linux DAQ System, a complete system with a dual-processor computer and Red Hat Linux 6.2.
By Services (see Figure 6) we mean professional services. Our categories include Support, such as the subscription service from MontaVista; Consulting, such as by emLinux; Training, such as by K Computing; and Programming, such as by Echo Labs. Consulting is advice or aid provided in fixed-price pieces. Support is advice or aid aimed at remedying problems; it is also reactive, waiting for someone to call or e-mail. Programming is contract programming help. Training is either traditional instructor-led or computer/web-based.
Software (see Figures 7 and 8) related to embedded Linux systems can either be for use on the target or on the host. Target software involves special distributions, patched kernels and ports of Linux to new processors or systems. On the host side, the software consists of development tools.
[These figures are missing].
Special distributions of Linux for embedded systems include such things as smaller versions of standard software, new device drivers and special versions of applications. The special versions of standard software or applications are designed to use less memory. Common examples are busybox, tinylogin and the GoAhead web server. Busybox replaces a collection of standard Linux programs such as cp, init and tar, and tinylogin does the same for login-related programs such as login, su and passwd. The GoAhead web server is much smaller than the more commonly used Apache.
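Busybox saves space partly by being a single "multi-call" executable: each command name is installed as a link to the same binary, which decides what to do by looking at the name it was invoked under (argv[0]). The C sketch below shows only that dispatch idea, with three toy applets. The names multicall_main, applets and the applet functions are hypothetical; this is not busybox's actual source.

```c
#include <stdio.h>
#include <string.h>

typedef int (*applet_fn)(int argc, char **argv);

/* Toy applets standing in for real commands. */
static int true_main(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int false_main(int argc, char **argv) { (void)argc; (void)argv; return 1; }
static int echo_main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        printf("%s%s", argv[i], i + 1 < argc ? " " : "");
    putchar('\n');
    return 0;
}

static const struct { const char *name; applet_fn fn; } applets[] = {
    { "true", true_main }, { "false", false_main }, { "echo", echo_main },
};

/* Strip any directory prefix, as in /bin/echo -> echo. */
static const char *applet_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Run the applet named by argv[0]; return 127 if there is no such applet. */
int multicall_main(int argc, char **argv)
{
    const char *name = applet_name(argv[0]);
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(applets[i].name, name) == 0)
            return applets[i].fn(argc, argv);
    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}
```

With the binary installed once and each command name a symbolic link to it, the code the commands share is stored once on the target instead of being linked into every program.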
A well thought out distribution includes packages that you are likely to want to use on an embedded system. The distribution should also leave out packages that you won't need. Desktop and server distributions, such as Red Hat 6.2, contain many more packages than embedded distributions. For example, Red Hat 6.2 includes about 600 packages and Embedix includes 57.
Kernel patches are applied for three main reasons: to improve performance, to fix bugs or to reduce the kernel size. Improving performance frequently means ensuring deterministic, or real-time, response. Some embedded Linux distributions include the RTAI or RTLinux real-time enhancements. RTAI and RTLinux involve patches to the kernel as well as a collection of other software that lets one develop true real-time ("hard real-time") tasks and have those tasks communicate with Linux processes on the same computer. It is also common for embedded Linux distributions to provide improved scheduling for processes within Linux.
Related to kernel patching is the use of loadable kernel modules. We have found two kinds of loadable modules in use by embedded Linux developers. The first is device drivers; numerous drivers, such as Ethernet drivers, can be compiled as dynamically loaded modules. The second is modules that extend the kernel to aid real-time performance. Examples of this kind are RTAI and the scheduling support from TimeSys.
We do not have a separate category for loadable modules because the two types fit into other categories. The drivers go into Special Distribution → Drivers and the extensions go into Patched Kernel → Improve Performance. These loadable modules belong in a Patched Kernel category because the kernel had to be modified before they could be used; for example, new system calls are added.
Reducing the size of the kernel and the rest of the system is done through kernel reconfiguration, stripping executables and removing unneeded members from libraries. It is not uncommon for an embedded version of Linux to use all of these techniques, and some offerings provide special tools and instructions for doing so. Examples of special distributions for embedded Linux are Hard Hat Linux from MontaVista, Blue Cat Linux from LynuxWorks and Embedix from Lineo. There are also numerous smaller distributions that are designed for embedded developers but include far fewer components; one example is Cool Linux.
On the host side there is a variety of development tools. One expects to find the gcc compiler with all distributions of Linux, and you will likely need it to recompile the kernel. Other standard GPL tools include gdb, a debugger, and gprof, a performance profiler.
There are integrated development environments such as CodeWarrior, tool kits or libraries such as Qt/Embedded GUI, memory profilers such as the memory size benchmarking available in Blue Cat Linux, tracing tools such as Linux Trace Toolkit, test-coverage tools such as ATTOL Testware, and source browsers such as Source-Navigator. Many of these tools also require some components to run on the target system, although you likely won't include them in a deployed system. One advantage of using Linux for both host and target is that you can profile, test, debug or trace your new application on your powerful development host rather than on your tiny target platform.
While there are many advantages to using a Linux-based host, some software for embedded Linux developers is written for other platforms and does not run under Linux. For example, TimeWiz from TimeSys runs on Windows, not Linux.
Much of the work done by some vendors is to ensure that their distribution and tools work on a variety of targets. Such work may entail kernel modifications or simply rigorous testing. Our classification includes the targets for which we have found embedded Linux support. These include the following processors: Intel IA32/x86, MIPS, PowerPC, StrongARM, Cirrus Logic Maverick and PowerQUICC II. These processors may be used in a variety of boards or systems, and our classification includes a number of these. This part of the classification will likely expand quickly as we learn of more ports.
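Supporting several targets usually means building the same source with a different cross compiler for each processor. The small C sketch below, assuming GCC and its predefined architecture macros, reports which processor family a file was compiled for; the StrongARM and Cirrus Logic Maverick parts fall under ARM, and the PowerQUICC II under PowerPC. The function name target_family is hypothetical.

```c
/* Name the processor family this translation unit was compiled for, using
 * GCC's predefined architecture macros. Cross-compiling the same file with
 * a different toolchain selects a different branch at compile time. */
const char *target_family(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "IA32/x86";
#elif defined(__mips__)
    return "MIPS";
#elif defined(__powerpc__) || defined(__PPC__)
    return "PowerPC";   /* includes PowerQUICC II parts */
#elif defined(__arm__) || defined(__aarch64__)
    return "ARM";       /* includes StrongARM and Cirrus Logic Maverick */
#else
    return "unknown";
#endif
}
```

Building this once with the native host compiler and once with a target cross compiler yields two different answers from identical source, which is the property vendors rely on when porting a distribution.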
Our Hardware category (see Figure 9) refers to suppliers that supply hardware but not Linux software. A developer may obtain hardware components from one vendor and software from another; in fact, that's the usual case. Examples of hardware components include, for Chips/Chip Sets, an x86 microprocessor, which you can buy from lots of places; for Complete Systems, a Windows-based PC, which you must turn into a Linux-based system yourself; and for Boards, an Ethernet board.
Using the Taxonomy--An Example Vendor
MontaVista provides a distribution of Linux called Hard Hat Linux. Let's look at its offerings using the Taxonomy. We should note that determining the extent of MontaVista's offerings with respect to the Taxonomy is hindered because MontaVista does not present this kind of information in an organized manner. This is exactly the kind of situation in which the Taxonomy helps.
MontaVista provides information in all three Information subcategories. They have held seminars where members of their technical and marketing staff made presentations and answered questions. These seminars were naturally focused on both embedding and Linux.
MontaVista publishes a manual for their cross development kit, as well as providing PDF versions of manuals for some standard tools.
MontaVista provides a web site with information useful to embedded Linux developers (http://www.mvista.com/). MontaVista does not provide Hardware/Software products, or indeed any hardware at all.
MontaVista provides support for the device drivers, applications and kernel that it supplies. MontaVista also provides consulting. MontaVista does not currently offer organized training, but it does provide programming services for device drivers and the kernel.
MontaVista provides a special distribution of Linux intended for embedded systems--Hard Hat Linux. Hard Hat Linux provides smaller versions of standard software, including busybox. Hard Hat Linux also provides drivers for a wide range of devices. Hard Hat Linux provides special embedded applications such as the MicroWindows system.
Hard Hat Linux is patched to improve performance and to fix bugs; for example, it replaces the Linux scheduler. In addition, Hard Hat Linux provides a smaller kernel through support for kernel configuration. Hard Hat Linux comes with a compiler (gcc), tool kits and libraries (MicroWindows), a debugger (gdb) and performance profiling (gprof).
The MontaVista development tools run on Solaris, Red Hat Linux and Yellow Dog Linux. Hard Hat Linux has been ported to the systems in Table 1.
Table 1. Hard Hat Linux Ports
IA32/x86
    Ampro LB3-P5X
    Ziatech 5531
    Force Computers 730-731
    Intel Pica
    Motorola 5350/5360
    Radisys EMB-1
    Winsys LBCPlus
MIPS
    NEC Osprey 4181A
PowerPC
    Embedded Planet: 823 (RPX-Lite), CLLF 860T, 850 (RPX-Lite)
    Linux Planet
    EST SBC-8260
    Force Computers 6750/680 (G3)
    Motorola: MCP 750, Sandpoint/755, MCPN 750, Sandpoint/8240
    SBC Technologies SBS/K2
Conclusion
The Taxonomy presented here is designed to help embedded Linux developers understand the range of products and services available. It could be extended arbitrarily deeply to provide additional information. For example, it is valuable to know whether a special embedded application requires license fees; one could extend that part of the hierarchy to distinguish applications that require fees from those that do not. The classification scheme is designed to be comprehensive in breadth, not depth. We encourage everyone to add to the hierarchy to make it even more useful.
Kevin Dankwardt is founder and president of K Computing, a Silicon Valley training and consulting firm. He has spent most of the last nine years designing, developing and delivering technical training on such subjects as UNIX system programming, Linux device drivers, real-time programming and parallel programming for various organizations worldwide. He received his PhD in computer science in 1988. He may be contacted at k@kcomputing.com.
Matt Reilly has worked for K Computing for three years administering Linux servers and teaching UNIX and Linux classes. When not travelling, he enjoys playing Celtic harp.