Simplicity and Performance: JavaScript on the Server
For years, Douglas Crockford, the high priest of JavaScript (JS), has claimed that it is a powerful, flexible language suited to a multitude of tasks, especially if you can separate it from the ugly browser-side piece that is the Document Object Model, or DOM. Because of the browser, JavaScript is the most popular programming language around by number of users. Job sites dice.com and monster.com post more jobs for JavaScript than for any other language except Java. Of course, if JavaScript runs in a browser, or anywhere else, it must have an engine. Those engines have been around since the earliest JS-capable browsers, and they have been available as standalone entities for several years. Thus, the potential for running JS on its own always has existed. However, JavaScript always has lacked two critical elements that would make it worthwhile to run on the server side.
The first missing piece was a common set of libraries. Quite simply, because JS was so focused on the browser, it was missing basic I/O libraries, such as file reading and writing, network port creation and listening, and other elements that can be found in any decent standalone language. Ruby includes them natively; Java includes them in its java.io and java.net packages. Running JavaScript standalone, when all you could do was process text and data structures but not communicate with the outside world, was rather useless. Over the years, several attempts have been made to create some form of JS I/O and Net packages, mostly wrapped around native C calls if the JS engine was written in C (such as SpiderMonkey), or around java.io and java.net calls if the JS engine was written in Java (for example, Rhino).
This began to change in early 2009 with the creation of the CommonJS Project (which, for some mystical reason, stands for Common JavaScript), which unified these efforts under a common namespace, with JS-specific semantics and included a package-inclusion system to boot.
Using Rhino as an example, you could read from a file using:
defineClass("File");

var f = new File("myfile.txt"), line;
while ((line = f.readLine()) !== null) {
    // do some processing
}
// this example slightly modified and simplified
// from the Mozilla Rhino site
As you can see, this is not file processing in JavaScript; it is file processing in Java! All I have done is opened the Java API to JavaScript. It is great if you really intend to program in Java, but it's of limited help if you are trying to do pure JS, and especially if your engine is not Java-based.
With CommonJS, there emerged a standard JavaScript-native interface to include a package, for example an I/O package or http package, and define many of the standard functionalities. Under the covers, the implementation may be C, Java, Erlang or Gobbledygook. All that matters is that the interface to the developer is platform-agnostic and portable from one interpreter to another.
The second missing piece was a server, similar either to Tomcat/Jetty for Java or Mongrel/Thin for Ruby, that provides a real environment, includes the necessary modules and is easy to use. Most important, it needed to take advantage of JavaScript's strengths, rather than attempt to copy a system that works for Java or Ruby. The real breakthrough was Ryan Dahl's Node.JS. Ryan combined Google's highly performant V8 engine, JavaScript's natural asynchronous semantics, a module system and the basic modules to create a server that suits JavaScript to a tee.
Most Web servers have a primary process that receives each new request. It then either forks a new process to handle the specific request, while the parent listens for more requests, or creates a new thread to do the same, essentially the same method if somewhat more efficient. The problem with processes or threads is threefold. First, they require significant resource usage (memory and CPU) for a small amount of differing code. Second, these threads often will block on various activities, such as filesystem or network access, tying up precious resources. Finally, threads and processes require context switches in the CPU. As good as modern operating systems are, context switches still are expensive.
The alternative, gaining in popularity, is event-driven, asynchronous callbacks. In an event model, everything runs in one thread. However, each request does not have its own thread. Rather, each request has a callback that is invoked when an event—like a new connection request—occurs. Several products already take advantage of the event-driven model. Nginx is a Web server with CPU utilization characteristics similar to those of the dominant Apache, but with constant memory usage, no matter how many simultaneous requests it serves. The same model has been brought to Ruby with EventMachine.
As anyone who has programmed in JavaScript, and especially in asynchronous AJAX, knows, JS is extremely well suited to event-driven programming. Node.JS brilliantly combines packaging and an asynchronous event-driven model with a first-rate JS engine to create an incredibly lightweight, easy-to-use yet powerful server-side engine. Node has been in existence for less than two years and was first released to the world at large only at the end of May 2009, yet it has seen widespread adoption and has served as a catalyst for many other frameworks and projects. Quite simply, Node changes the way we write high-performance server-side nodes (pun intended) and opens up a whole new vista.
The rest of this article explores installing Node and creating two sample applications. One is the classic “hello world”, a starting point for every programming example, and the other is a simple static file Web server. More complex applications, Node-based development frameworks, package managers for Node, available hosting environments and how to host your own Node environment, will be subjects for future articles.
Node will install on almost any platform, but it is ideally suited to UNIX-like environments, such as Linux, UNIX and Mac OS X. It can be installed on Windows using Cygwin, but it is not as easy as on the other platforms, and there are plenty of gotchas. Like most server-side packages, if you want to do anything serious, do it on UNIX/Linux/BSD.
On Linux or UNIX, installation follows typical UNIX program installation: download, configure, make, make install.
First, download the latest package. At the time of this writing, the latest unstable version is 0.3.2, and the latest stable is 0.2.5. I recommend moving toward 0.3+ as soon as possible. Don't be fooled by the low version numbers; plenty of production sites are using Node right now for at least part of their environment, including github.com.
You can download the tarball directly from nodejs.org, or clone the git repository, my preferred method. If you don't have git installed already, install it via your preferred package manager or directly from source. Before you get started, make sure you have the prerequisites; the details of building git itself are beyond the scope of this article.
On Mac OS X:
# install XCode from the Apple developer Web site
$ brew install git
On Linux or similar with the apt packaging system:
$ sudo apt-get install g++ curl libssl-dev apache2-utils
$ sudo apt-get install git-core
Now, you are ready to download, compile and install Node. First, you need to cd to the appropriate directory. At that point, clone the git repository:
$ git clone git://github.com/ry/node.git

# if you have problems with git protocol, http works fine
$ git clone http://github.com/ry/node.git
Next, make sure you are in the right version. Because git clones the entire repository, make sure you switch to the correct version:
$ cd node
$ git checkout <version>
# version can be whichever you want,
# but I recommend v0.3.2 as of this writing
Then, run configure. As usual, configure will check whether you have all of the prerequisites installed. Configure also will determine where to install Node when it is ready. Unless you are working on a production machine, I highly recommend installing Node in a local writable directory under your home directory, such as ~/local/. Installing Node in the default /usr/local/ leads to all sorts of interesting permission issues when installing packages, and means running everything as sudo during installs. Unless it is going to be shared among everyone and used in production, installation makes a lot more sense in your own directory. It is also quite small. The entire installation on my laptop, including binaries, man pages and several add-on packages, is less than 50MB. The Node binary itself is less than 5MB:
# installing in the default
$ ./configure

# installing in your own local directory,
# my preferred method
$ ./configure --prefix=~/local
Then, compile and install:
$ make
$ make install
At this point, Node is installed and ready to run. If you installed Node in ~/local/, you need to add ~/local/bin to your path, which depends on your shell.
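For bash, for instance, a line like the following in ~/.bashrc or ~/.bash_profile does the trick (the exact startup file depends on your setup):

```shell
# put the locally installed node first in the search path
export PATH=~/local/bin:$PATH
```

Afterward, `which node` should report the binary under ~/local/bin.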
The critical thing to remember about Node development is that everything important is asynchronous. Sure, you could do many things synchronously, but why would you?
For example, a traditional Web programming model might look something like this:
// pseudo-code
conn = connection.waitForRequest();
if (conn != null) {
    request = conn.getRequest();
    response = conn.getResponse();
    data = database.getData(query);
    response.write(data);
}
In asynchronous Node, you would do something more like this:
server.handleRequest(function(request, response) {
    // we need some data from the database
    database.submitQuery(query, function(data) {
        response.write(data);
    });
});
Notice how everything is in callbacks, an event-driven asynchronous model.
Everything starts with hello world. This example demonstrates the basics of modules and asynchronous handling of requests.
First, include the necessary http module:
var http = require('http');
http is a standard module included with Node. If you wanted a module that was not in the standard path, you would preface it with ./, which tells Node to resolve it relative to the file doing the requiring. For example, require("./mymodule");.
Next, create the server, which is as simple as createServer(), as well as the callback function to handle each request:
http.createServer(function(request, response) {
    // handling code here
});
Next, put in the handling code. You know you want the response to be hello world and the http status code to be 200, which is basic success:
http.createServer(function(request, response) {
    // set your status code to 200 and content to plain text,
    // since "hello, world!" is as plain as it gets
    response.writeHead(200, {"Content-Type": "text/plain"});
    // write out our content
    response.write("Hello, world!\n");
    // indicate that we are done
    response.end();
});
The above is a callback function. It will be called each and every time a new connection request comes in.
Finally, you need to tell the server to listen and on which port. For now, let's put it on 8080 (just to annoy Tomcat):
http.createServer( callbackFunction ).listen(8080);
Pulling it all together, you get a very simple program:
var http = require('http');

http.createServer(function(request, response) {
    // set your status code to 200 and content to plain text,
    // since "hello, world!" is as plain as it gets
    response.writeHead(200, {"Content-Type": "text/plain"});
    // write out our content
    response.write("Hello, world!\n");
    // indicate that we are done
    response.end();
}).listen(8080);
Six lines of code, and a functioning Web server that says “Hello, world!” Save the file as app.js, and then run it:
# cd to your development directory
$ cd workingdir
$ node ./app.js
Connect your browser to http://localhost:8080, or use curl or wget, and you will see “Hello, world!”
For the next example, let's serve up files from the local filesystem. If the file is available in the document root, let's return it with a 200 response; if it is not, let's return a 404 status code and an error message.
Like last time, you need the http module. Unlike last time, you also need the modules to read from the filesystem, and an ability to process URLs:
var http = require('http'),
    fs = require('fs'),
    path = require('path'),
    url = require('url');
Create the server and its handler, and listen on port 8080 (just to annoy Tomcat) in the same way as last time:
http.createServer(function(request, response) {
    // handling code
}).listen(8080);
The difference is in the handling code. Now, when you get a request, you want to see whether it exists in the filesystem, and if so, return it:
http.createServer(function(request, response) {
    // __dirname is a special variable set by node;
    // use the url module to extract the path from the request
    var file = __dirname + url.parse(request.url).pathname;
    // check if the requested path exists
    path.exists(file, function(exists) {
        if (exists) {
        } else {
        }
    });
}).listen(8080);
You use the path module to check whether the file is available, but you do it asynchronously. Normally, file access is very slow, and everything in the thread or process blocks. With Node's event-driven model, nothing blocks; rather, the system continues to move and calls the function(exists) callback when it has an answer as to whether the file exists.
If the file does exist, you need to read it using the fs module and send it back. If it doesn't, you send back a 404 error. First, let's look at the simple file-not-found case:
if (exists) {
    // do some handling
} else {
    response.writeHead(404, {"Content-Type": "text/plain"});
    response.write("404 Not Found\n");
    response.end();
}
Now, let's look at reading the file and sending it back when it does exist. Once again, read the file asynchronously:
if (exists) {
    // read the file asynchronously
    fs.readFile(file, "binary", function(err, file) {
        if (err) {
            // we got some kind of error, report it
            response.writeHead(500, {"Content-Type": "text/plain"});
            response.write(err + "\n");
            response.end();
        } else {
            response.writeHead(200, {"Content-Type": "text/html"});
            response.write(file, "binary");
            response.end();
        }
    });
}
Tying it all together and cleaning it up a bit, you get a nice tidy, asynchronous, event-driven Web file server:
var http = require('http'),
    fs = require('fs'),
    path = require('path'),
    url = require('url');

http.createServer(function(request, response) {
    var file = __dirname + url.parse(request.url).pathname;
    // check if the requested path exists
    path.exists(file, function(exists) {
        if (exists) {
            fs.readFile(file, "binary", function(err, file) {
                if (err) {
                    response.writeHead(500, {"Content-Type": "text/plain"});
                    response.write(err + "\n");
                    response.end();
                } else {
                    response.writeHead(200, {"Content-Type": "text/html"});
                    response.write(file, "binary");
                    response.end();
                }
            });
        } else {
            response.writeHead(404, {"Content-Type": "text/plain"});
            response.write("404 Not Found\n");
            response.end();
        }
    });
}).listen(8080);
A static Web file server that will hold its own against most such servers on the market, in a couple dozen lines of code. It's a work of art.
Node.JS is an incredibly powerful, simple and elegant engine to run event-driven server-side JavaScript, and it has been a catalyst for an enormous amount of fermentation in the server-side world during the past year and a half.
Resources
Node.JS: nodejs.org
Node.JS Git Repo: github.com/ry/node
CommonJS: www.commonjs.org
Cygwin: www.cygwin.com
Nginx: nginx.org
Douglas Crockford: www.crockford.com
Language Popularity: www.webdirections.org/the-state-of-the-web-2008
Avi Deitcher is an operations and technology consultant based in New York and Israel who has been involved in technology since the days of the Z80 and Apple II. He has a BS in Electrical Engineering from Columbia University and an MBA from Duke University. He can be reached at avi@atomicinc.com.