That First Gulp of Java
When we told a newbie friend that we had just co-authored a book on Java, he said, “You may be surprised to learn that I've heard of Java.” We weren't. In little more than a year Java has gone from an obscure back-room project at Sun to an all but universal topic of interest and speculation in the computing community—and we would be astonished if it hadn't.
If Java were just another powerful, expressive, object-oriented programming language, a few developers would have glanced at it, said, “that's nice.” and gone back to plowing their way through C++ code. If it merely offered a far more flexible, secure, and transparent way to enrich web page content with small programs, a few web surfers would have gotten about as excited as they did over “plug-ins”. If it only provided a more convenient way to distribute large applications across heterogeneous platforms, a few large companies would have begun investigating its potential.
Java combines all these features and many more, however, and it is this extraordinary combination that uniquely qualifies Java to capture the interest of an entire industry, and to demand significant investment by most of the big names in that industry: Borland, Intel, Microsoft, Novell, Oracle, SGI, and Symantec among many others. Even a fairly brief technical overview should suffice to explain the unprecedented enthusiasm that has greeted Java.
Like C++, Java capitalizes on the popularity of the C language, and preserves much of its compact, expressive character. Unlike C++, it makes no attempt to maintain backward compatibility and, for that reason and others, it has many advantages over its currently popular cousin.
While C++ provides syntax to achieve encapsulation, inheritance, polymorphism, and other features of object-oriented programming, it also supports traditional structured (and unstructured) programming. Java does not. Every Java program we create, even the first “hello world,” is object-oriented, preventing us from backsliding into old, familiar—and less productive—ways.
The need to support two utterly different types of programming makes C++ needlessly complicated. For example, a parameter can be of either a built-in type or a class type, and may be passed by value, by reference, or via a pointer: six possible combinations. In Java, a built-in parameter is always passed by value, a class-type parameter is always passed by reference: just two combinations, and no decision required. Much simpler. Furthermore, stronger typing eliminates many of the implicit type conversions that introduce unexpected ambiguity.
“Wait! No pointers?” Yup. No pointers. C needed them to support array processing and a primitive form of call by reference. Java supports true call by reference (as C++ does) and implements arrays as a built-in type. So, no pointers. No uninitialized pointers. No forgot-to-check-for-NULL pointers. No tears-at-midnight-trying-to-find-the-mis-aimed pointers.
Global variables also meet a cruel but well deserved fate. In Java, every datum is neatly encapsulated, accessible only by operations of the class of which it's a member. Thus another major source of programming mistakes disappears entirely. Automatic garbage collection gets rid of yet another: the “memory leaks” that result when we fail to release memory allocated dynamically.
Java also discards a mechanism that “seemed like a good idea at the time”: the preprocessor.
Early C compilers avoided multiple parsing passes by supporting forward declarations of functions and data. Programmers could ensure consistency among declarations, definitions, and uses by placing the declarations in separate header files, then directing the preprocessor to “#include” them in source files. C and C++ implementations have continued this tradition into an age in which systems are so large that we must pre-compile the header files themselves to get acceptable rebuilding times—and header-file maintenance becomes a major task in itself.
In Java, class definitions are not split between header and source files. Simple “import” statements tell the compiler which classes are needed by the current source file, and the compiler does the rest. Declaration order becomes unimportant. This scheme requires a bit more sophistication from the compiler, but one result of eliminating the preprocessor is that Java programs generally take less time to build than C++ programs. Another is that the developer's life becomes much easier.
The C/C++ preprocessor solves problems other than centralization of declarations, of course, but the Java compiler handles most of these other problems easily. And one other major problem disappears altogether:
The need to support multiple platforms obliges C and C++ developers to devote considerable effort to maintaining alternate blocks of code, and using conditional-compilation directives to ensure that only the correct blocks are compiled when they rebuild a particular system version.
Java makes all that tedium go away, because the language is independent of any particular machine architecture. Java programs are fully portable, not only the source code but the executable code as well, courtesy of the Java Virtual Machine (JVM).
Most modern languages are “fully compiled.” The compiler generates the “native code” of a specific target platform, i.e. the machine language appropriate to a particular operating system running in a particular processor. Once a program is installed on a user's machine, the operating system executes its instructions directly—an arrangement that achieves efficiency at the expense of portability. For example, if you are running Linux on a Pentium-based PC and creating a C program with GNU's gcc compiler, the resulting executable will run just fine on your machine and ones like it, but not on a Pentium running OS/2, and not on a DEC Alpha running Linux.
If you want to distribute your program widely, you will need to recompile it for a dismaying number of platforms, probably using a number of different development tools. Oh, and you want to keep supporting your software after sale, as well? Nice of you—start hiring. Experience has shown that the long-term effort of maintaining software products on multiple platforms far exceeds the effort of developing them in the first place. And the costs are proportional to the effort—better hunt up some heavy financing to pay all those people.
Java eliminates the complexity of cross-platform development and support through its reliance on a “virtual machine.”
As the word “virtual” implies, a Java compiler's target is a machine that does not actually exist. Instead of generating the native code of a particular platform, it produces “bytecode”, a sequence of 8-bit codes that no actual machine can execute directly. Your program will run, however, and not only on your Linux box, but on any platform that supports Java—and these days that's the same as saying “on any popular platform.”
To execute a Java program, a machine must have a Java Run-time System (JRTS), an implementation of the JVM for that platform—but that is all it needs to run any program written in Java. The JRTS executes the bytecode much as an operating system executes native machine code. Because the run-time system handles all those nasty machine-specific issues for every program, the program itself does not have to.
It is a common mistake to confuse the run-time system with the virtual machine. Even people who should know better sometimes refer to “a program running on [a particular computer's] virtual machine”—and thereby conceal a crucial distinction. Part of Java's unique character is that only one piece of software, the JRTS, knows anything about the particular platform. Programs themselves remain blissfully ignorant of hardware dependencies—and so do programmers. They write their code for a machine that does not exist, serenely confident that doing so makes it portable to any popular platform.
A JRTS loads compiled classes as needed, performs security checks, and dynamically binds calls to methods. At that point many run-time systems begin executing the bytecode, interpreting each one as it is encountered. This continuous interpretation limits execution speed, and is the source of many early complaints about poor performance. An increasing number of Java implementations solve this problem by performing a second compilation step, “just in time.”
Native-code compilers produce fast executables at the expense of portability. Java compilers that produce bytecode achieve portability at the expense of speed—if the JRTS interprets each instruction every time it is encountered.
Many run-time systems don't. In place of the interpreter they include a just-in-time (JIT) compiler. The first time the JRTS loads a portion of bytecode, the JIT compiler translates it into native code. Thereafter, the run-time system executes the native code instead of interpreting the bytecode; execution speed improves dramatically.
It is worth stressing that users get the speed of fully compiled programs without sacrificing portability. The JIT compiler is part of the JRTS, not the Java source-code compiler, so all platform-specific knowledge resides only in the run-time system, on the user's machine, where it belongs. Software developers continue to compile and distribute the same portable, architecture-neutral bytecode files.
A second compilation step is not as expensive as it might sound. JIT compilation is actually quite fast in practice, because the most time-consuming tasks are completed in the first translation, from original Java source code to bytecode. JIT-compiled code is currently running 20 to 30 times faster than interpreted bytecode; this level of performance compares favorably with that of object-oriented code written in C++. Future improvements could boost this ratio to 50 or more, which would put Java executables on a par with optimized C code.
Java's virtual-machine concept improves security as well as portability, and at several levels.
Because a traditional fully compiled program is in native code, it is in an uncomfortably good position to exploit weaknesses in the operating system or hardware, and do serious damage. By contrast, Java bytecode is architecture-neutral, and what it does not know about the platform it cannot exploit. This “passive protection” is only the beginning, however.
Strong typing, including the addition of a boolean type, the replacement of pointers with type-safe references, and the elimination of other troublesome features makes it possible to perform run-time checks that validate a program's correctness.
In addition, the run-time system's Bytecode Verifier validates each program at load time, in several ways: It simply rejects any file that does not adhere to the distinctive bytecode-file format, thus avoiding execution of what might appear to be valid Java instructions but are not. When satisfied that a file is in the proper form, the verifier examines the bytecode itself for ill-formed constructs. It then goes on to search for errors usually not detected before run time, such as stack overflow.
Another part of the JRTS, the Class Loader, further enhances security by isolating classes from each other in separate security domains. To guard against malicious code, it separates classes that are built into the run-time system itself from classes local to the user's account, and separates both of these from classes that come from other users and other systems. An ill-intentioned “foreign class” thus cannot disguise itself as a more trusted class.
Users are understandably concerned that a virus or a Trojan Horse will enter their systems by way of an applet downloaded from the Internet. To guard users' systems, run-time systems employ combinations of security features Java makes possible, above and beyond bytecode verification and class partitioning. A Web browser or other package typically enables users to select from among multiple security levels, so that they may deny or limit “untrusted” applets' access to network connections and local file stores. Clearly visible marks distinguish windows created by trusted and untrusted applets so that the latter cannot masquerade as the former.
Much has been made of the risks inherent in downloading executable code over the notoriously insecure Internet. Experience with “plug-ins” has created some justified worry, but it is important to learn from Mark Twain's proverbial cat, and not shy away from a cool stovetop just because we once jumped onto a hot one. Java is too new for us to dismiss all such concerns blithely, but its many security features make it much safer than comparable technologies.
Some will not be satisfied with any risk level above zero; for them the only counsel can be complete abstinence from the pleasures of the Internet. Others realize that some risk is an inevitable feature of life in this world, and they can protect themselves by obtaining a Java Run-time System from a reliable vendor, through means as secure as those they use to acquire other software. Doing so should bring risks down to a level acceptable by most.
The first uses of any new technology are often relentlessly trivial. If our only exposure to Java has been cutesy animations and downloaded calculators, it all too easy to underrate its potential. We hope this brief overview has shown that Java offers much more than bouncing heads—even though we didn't have the space to describe the neat way Java separates inheritance of implementation from inheritance of interface, and its built-in support for multi-threading, and....
Brian Christeson with John Mitchell, co-authored of Making Sense of Java. They are working on professional courses, other books, a compiler, and consulting/development projects related to Java, Tcl/Tk, and other languages. Brian lectures on OO analysis, design, and programming at major companies in the U.S. and abroad.
John Mitchell with Brian Christeson, Making Sense of Java. They are working on professional courses, other books, a compiler, and consulting/development projects related to Java, Tcl/Tk, and other languages. John developed PDA software in OO assembly language, and writes two columns for JavaWorld magazine.