The Past and Future of Linux Standards
Despite their well-earned reputation as a source of confusion, standards are one of the enabling factors behind the success of Linux. If it were not for the adoption of the right standards by Linus Torvalds and other developers, Linux would likely be a small footnote in the history of operating systems.
Some people believe the interest in Linux standards is very recent, precipitated by the upswing in commercial interest. However, before Linux was even named, conformance to an open standard was an important goal. Here is one of the first postings by Linus Torvalds about a project that would soon be named Linux:
From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: Gcc-1.40 and a POSIX-question
Date: 3 Jul 91 10:00:50 GMT

Due to a project I'm working on (in minix), I'm interested in the POSIX standard definition. Could somebody please point me to a (preferably) machine-readable format of the latest POSIX rules? FTP sites would be nice.
A month later, Linus posted:
As to POSIX, I'd be delighted to have it, but POSIX wants money for their papers, so that's not currently an option.

Despite the high cost of a copy of the POSIX standard in 1991, it became one of the primary standards for Linux. POSIX, the Portable Operating System Interface, is a standard application programming interface (API) used by Linux and many other operating systems (typically UNIX and UNIX-like systems). There are several major benefits to using the interface defined by POSIX. It makes it easier to write source code that can be compiled on different POSIX systems. It also gives Linux application developers and Linux kernel developers a well-defined API to share. That means application developers don't need to track most kernel changes as long as the kernel continues to behave as POSIX says it should.
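To make the source-compatibility point concrete, here is a minimal sketch in C. It is an illustration, not code from Linux or from any of the programs mentioned here, and the helper name copy_fd is hypothetical. It uses only POSIX.1 calls (read and write) plus the EINTR retry convention, so it should compile unchanged on Linux, the BSDs or any other POSIX system, even though each system produces its own, incompatible binary.

```c
/* Ask for the POSIX.1-2008 interfaces; any POSIX system should accept this. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd using only
   POSIX.1 calls. Returns the number of bytes copied, or -1 on error
   (errno is left set by the failing call). */
ssize_t copy_fd(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may accept fewer bytes than requested, so loop until
           the whole chunk has been written. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Nothing in the function refers to Linux. Recompiling it on another POSIX system is all that porting requires, which is the source-level guarantee POSIX offers and the reason it is called a source standard.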
In addition, using POSIX as the API for Linux enabled Linus and other early Linux developers to use existing free programs from the GNU Project and the BSD operating system, along with many other free programs that adhere to the POSIX specification.
It is important to note that POSIX does not guarantee that a precompiled binary application will run on every POSIX operating system. Because it provides source code compatibility but not binary compatibility, POSIX is often thought of as a source standard. Early development of Linux took place under the Minix operating system. In fact, Linus originally wanted to make Linux binary-compatible with Minix. That idea was dropped when the differences between Linux and Minix became too great, but some traces of the Minix heritage of Linux still linger here and there.
When asked about the importance of POSIX in early Linux development, Linus Torvalds said, “Linux started out very aware of POSIX, but even more so of the unofficial de facto standards.” He elaborated that, at the time, these de facto standards were approximately SunOS (the precursor to Solaris) behavior, somewhere between BSD and System V. “Basically, I wanted to not have to spend too much time porting user mode programs (it wasn't something I was all that interested in), and POSIX helped in that,” Torvalds said.
POSIX.1 (the POSIX kernel interface) is still considered by Linus Torvalds and other kernel developers to be the “base standard” for the kernel. Some later additions to the POSIX specification have not been as useful as POSIX.1, and design issues often have to be resolved before Linux can make use of a standard. Linus describes one of the best examples of such a design decision:
Often there are standards that are too generic to be very useful as a guide for the kernel. The “pthreads” POSIX threads standard is one example: some people have tried to implement their kernel threading model according to it, and the standard simply is not very well suited to that.
Even in that case, I still wanted to support the standard; I just did not want to natively implement the standard as-is. So Linux clone was created: the infrastructure to do threading under Linux, on top of which you can implement pthreads and also other threading models.
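As a rough sketch of the mechanism Linus describes, the following C fragment assumes Linux and glibc's clone() wrapper (exposed by defining _GNU_SOURCE). It starts a child task that shares the caller's address space because of the CLONE_VM flag. The names run_in_clone, child_fn and counter are hypothetical. A threads library built on clone() passes more flags, such as CLONE_FS, CLONE_FILES and CLONE_SIGHAND, and manages stacks and per-thread state itself; this only shows the underlying building block.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical shared variable: visible to the child because of CLONE_VM. */
static int counter;

/* Runs in the new task; it only touches shared memory, nothing else. */
static int child_fn(void *arg)
{
    counter += *(int *)arg;
    return 0;
}

/* Hypothetical helper: run child_fn in a task that shares our address
   space, wait for it, and return the updated counter (-1 on error). */
int run_in_clone(int increment)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures, so pass its top.
       SIGCHLD lets the parent reap the child with an ordinary waitpid(). */
    pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD,
                      &increment);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return counter;
}
```

Because the child writes to counter in the shared address space, the parent sees the update once waitpid() returns. Without CLONE_VM the child would modify its own copy and the parent's value would stay unchanged; choosing which resources to share, flag by flag, is what lets pthreads and other threading models be layered on top.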
It is difficult to list all of the other standards used today by Linux. TCP/IP, Ethernet and other formal and de facto standards form the basis for networking in Linux. The IBM PC is one of the best examples of a de facto standard. (This standard is now more formalized.) The PC allowed thousands, and then millions, of people to run Linux on a system typically used for Microsoft Windows. It is a lot easier to take over the world if you run on the standard hardware of the day. Just as significant, the GNU C compiler (without which there would be no Linux) is built on top of the K&R (Kernighan & Ritchie) and ANSI C standards.
In addition to formal standards like POSIX and ANSI C, one of the most prevalent types of standards used by Linux systems is a common implementation. The best example, of course, is the Linux kernel. No written specification is available for many of the interfaces provided by the kernel. However, no specification is needed because only one strain of kernel is accepted by everyone (not just all distributions or all developers, but truly everyone). Want to know the standard? Read the kernel source.
While not always quite as clear-cut, many more examples of common implementation standards are accepted by the entire Linux community, or at least a large chunk of it. Take the utilities and programs provided by the GNU Project. For example, it has long been accepted that the utility find is the enhanced GNU version, not the BSD one or some other version. If you are writing an application for Linux only, it is probably safe to rely on most of the functionality provided by GNU find. Because so many GNU utilities like this are used, Richard Stallman, founder of the GNU Project, insists that people refer to Linux as “GNU/Linux” or as a “Linux-based GNU system”. This tends to upset many people, but most developers seem to recognize Stallman's immense contribution and don't make a big fuss about it, even if most of them still say “Linux”.
The GNU project's share in the success of Linux does not end there. Linux has now standardized on the GNU C library (glibc) as the common implementation for future Linux systems. The glibc project aims for compliance with several standards. According to Ulrich Drepper, the GNU libc maintainer, all of the important standards are supported as long as they don't conflict with common sense. That list includes ISO C 90, ISO C 9x, POSIX and UNIX 98.
What if there is not a common implementation, a formal standard, or even an informal, de facto or ad hoc standard? One thing you can do is try to pick the best and most common practices and base a standard on them.
The Filesystem Hierarchy Standard (FHS) was among the first written standards developed specifically for Linux. The FHS aims to standardize the locations where files and directories are placed in the system. Standard locations are needed so that applications can be compiled and run well on different Linux distributions; they are also helpful for developers. If you want to write documentation for Linux, or if you need to work on more than one variety of Linux, it is invaluable to know that you can expect to find important files and directories in standard locations.
If you are wondering why an existing standard such as POSIX could not be adopted, the reason is that no multi-vendor standard for the layout of files and directories existed. Each UNIX vendor had its own solution, and there was a great deal of overlap, but the specifications and rationale for each vendor's layout were either lacking or nonexistent.
The FHS was started in 1993. At the time, the Linux OS was approximately two years old. As he does today, Linus dictated how things would work in the Linux kernel and did not exert much influence on other areas of the Linux operating system. With no central authority for those other areas, a number of Linux distributions had considerably different layouts. Some were like the BSD operating system, others were more like SunOS, and others were different still. (Incidentally, most of the names of the big distributions back then were different from the ones we hear today: Debian, Linux/PRO, MCC, Slackware, SLS, TAMU, Yggdrasil, etc.)
The first version of the FHS was based on ideas from SunOS 4, SVR4, 4.3BSD, 4.4BSD, HP-UX and many other UNIX systems, but it was also based as much as possible on common practices distilled from the various Linux distributions existing at the time. The completed specification attempted to take the best of each file-system layout and combine them into a solution that worked well.
Since then, subsequent releases of the FHS have been refinements of the original specification, importing a few additional ideas from other areas (including 4.4BSD and SVR4).
Today, most Linux distributions follow FSSTND 1.2 (the precursor to FHS 2.0). They do this not because they are forced to, but because it cleanly addresses a problem they were having. FHS 2.1 should be available by the time this article is published.
The widespread availability of third-party commercial software for Linux is a relatively new phenomenon. Many vendors are faced with a problem they have never seen before.
At one time, all the software on your Linux box generally came from one of two places. Either it came with the distribution, or someone (you, a friend, maybe even a system administrator) compiled it from the source and installed it locally. Distributions are very good at integrating software and making sure everything works together. Assuming someone had the expertise, it was possible to get even a substandard program compiled and installed on your Linux boxes.
What if someone else wants to give you a pre-compiled binary package to install on your system? They are forced to consider which distribution you run, what version of the distribution, your kernel version, and a slew of other considerations before they can be reasonably certain that the application will install correctly on your system and work. If the goal is to provide a binary package for everyone, it is even worse. You must tailor the application for any and all distributions, and then compile, build and test it on each one, too. Is it any surprise that few vendors support even three or four of the major Linux distributions for their applications?
One of the more common fears about Linux is that fragmentation will occur because it uses an open-development model. As in real life, fragmentation occurs when something breaks into many different pieces which are no longer connected. Fragmentation is not a new phenomenon in the free-software community. Sometimes it happens for good reasons, perhaps when a maintainer is not doing a good job. Sometimes it happens for less worthy reasons, such as a personality conflict between lead developers.
This has happened before. A graphically oriented version of the GNU Emacs editor called XEmacs split off from the version maintained by the Free Software Foundation (FSF) because of technical differences between the FSF and the developers of Lucid Emacs (the predecessor to XEmacs). The XEmacs FAQ does not take a rosy view of the situation:
There are currently irreconcilable differences in the views about technical, programming, design and organizational matters between RMS [Richard Stallman] and the XEmacs development team which provide little hope for a merge to take place in the short-term future.
If you have a comment to add regarding the merge, it is a good idea to avoid posting to the newsgroups, because of the very heated flame wars that often result.
When there is a split like this, it usually has the most impact on the end user. Add-on packages work with one variety of the software, but might not work with another. That is more or less what happened to GNU Emacs, although developers try to compensate for it as best they can.
A small amount of fragmentation, such as the difference between Linux distributions, is good because it allows them to cater to different segments of the community. Because Linux is Open Source, different distributions have the freedom to be unique. For example, the Extreme Linux distribution targets high-performance clusters of Linux PCs running the Beowulf clustering software. Rather than a single all-in-one distribution, each distribution can target a segment of the Linux community and try to meet that segment's needs better than any other distribution. Users benefit by getting a Linux distribution that more closely meets their needs than is possible under a single distribution model.
Also important is the attitude about fragmentation in the Linux community. Everyone is concerned that fragmentation could become a problem and wants to ensure that applications can run on any variety of Linux. This is where the Linux Standard Base (LSB) comes in. The LSB project is working to define a common subset of Linux that everyone can count on, independent of the distribution. By defining only what can be expected in a minimal base Linux system, the LSB is attempting to find a balance between stifling Linux development and the possibility of Linux fragmenting into several totally incompatible versions.
The main problem the LSB is addressing is that software vendors must port and test their software on multiple Linux distributions, because even a small difference between distributions can result in major problems for the software when the vendor's software has been based on the behavior of one distribution. The difficulties ultimately affect everyone: users, developers, application vendors, Linux companies, et al.
Aside from the danger of fragmentation (which, by the way, Linux has avoided for the last eight years without an LSB), there are secondary dangers, namely FUD—fear, uncertainty, and doubt. The LSB hopes that by making fragmentation a remote possibility, it will help bolster confidence and win support for Linux from even the most conservative sectors. How will it do that? The Linux Standard Base Goals are as follows:
Enable software applications to run on any LSB-compliant distribution.
Increase compatibility among Linux distributions.
Help software vendors and developers port and write software for Linux.
Those are the things the LSB developers want to do. On the other hand, we want to avoid some things. Our LSB Guidelines for Success include:
Don't tread on distributions—everyone wants them to be unique.
Do no more than what is required to solve the basic problem of application portability.
Don't break old systems or prevent future advances.
Is it enough for Linux developers to make their own way based on standards developed by outside groups such as the IEEE, The Open Group, ISO and ANSI? Probably not. Linux developers have been able to pick and choose which standards to adopt and how to implement them, but as standards are revised and extended, Linux developers want to ensure future standards also meet their needs.
One such revision in progress is a joint revision of the POSIX standards by the IEEE, The Open Group and ISO. The group revising the standard is known as the Austin Group. Unlike previous POSIX standards, the goal is a common set of documents shared by all three organizations. USENIX, the Advanced Computing Systems Association, is helping to fund two Linux developers to attend meetings and participate in the revision. The two developers are Ulrich Drepper, the glibc maintainer, and H. Peter Anvin, author of the kernel automounter and maintainer of the Linux device list. The POSIX revision, Ulrich says, will throw away or at least make optional some of the less wanted parts of the old standards (such as STREAMS). This is a good thing for Linux because those parts have not been adopted by the entire Linux community. The result is that fuller compliance with POSIX will become more likely.
In addition, Ulrich adds, there are functions he would like to see standardized in the new POSIX specification. Some of those function specifications may come directly from the glibc project. If that happens, maybe some future operating system can put some of the standardization blame on Linux.
Daniel Quinlan (quinlan@transmeta.com) is the chair (i.e., project leader) of the LSB, the editor of the Filesystem Hierarchy Standard and a member of the Linux International technical board. He is employed as a System Administrator at Transmeta Corporation. Outside work, he is currently getting into indoor rock climbing.