GNU Bayonne Is for Telephony
Three years ago I came to realize that we had a serious need in free software. Although free software had expanded to fill almost every other void in the enterprise infrastructure, we had not addressed the needs of telecommunications. Telecommunications are not only a part of the infrastructure of every business, but they are also an often overlooked part of the desktop user's experience. At the same time, the hardware required to create telephony services for the public telephone network has become more widely available under commodity PC platforms and operating systems, including GNU/Linux.
In choosing to address telecommunications with free software, I and a few others decided to create a framework describing what all these services might be, ranging from the needs of desktop users and application programmers to the needs of the largest commercial carriers. This project later became known as GNUCOMM when it was officially folded into a GNU project working group.
One area we chose to define was the idea of a telephony application server. Such a server should make it both possible and easy to create and deploy new telephony application services. These would be applications specifically written to interact with real people that call the server over regular telephone lines and interact with the application with both a voice and a telephone keypad.
Applications of this nature typically include things like voice-mail systems or prepaid (debit card) calling platforms. All of these systems are complex and sometimes programmable systems and specialized computer telephony hardware are needed to provide an interface between the PC platform and the public telephone network. This can be hardware that talks to individual analog telephone lines or even hardware that provides multiport voice control over ISDN and T1 digital voice circuits, which larger enterprises can get directly from a local carrier's central office.
With full consideration that such systems in the past were generally very expensive, always proprietary and often hard to program, I chose to solve all of these problems at once by writing a server under the best supported free software platform available at the time: GNU/Linux.
When we started the project, few companies provided telephony hardware under GNU/Linux, so we used what was available. Even now, each telephony card is different from every other one and tends to include its own API. Since neither the hardware nor the APIs are in any manner standardized, most people that produce telephony applications do so for only a single vendor's card family, and they do so using exclusively the vendor's supplied API. This practice also means that any vendor in the computer telephony business has to provide a very broad family of hardware because one could not substitute easily other products to fit gaps in a product offering. All these things have made it difficult for new telephony card vendors to come into existence and easy for the limited vendors that do exist to maintain their markets without much change.
This is not to say that no efforts were made to standardize APIs. After all, there is the ECTF (European Community Telework/Telematics Forum). Being an industry consortium of proprietary vendors, they would have to come up with, through committees, a complicated set of standards and proposals for how proprietary vendors could develop and maintain computer telephony solutions. Furthermore, they would need to do so in ways that expand the need for specialized knowledge, increasing the stranglehold of their existing members on the computer telephony marketplace.
Another popular organization is the ITU (International Telecommunications Union), best known for the fact that appointment often is handled by national governments. In the US, for example, this is done as a political appointment by the state department, rather than from among the best and brightest minds.
Our goal was not only to produce a telephony server as free software, but we wanted also to make telephony application services as readily and easily approachable as creating and administering a web site. We also wanted to abstract the telephony driver and APIs to the point that they were both irrelevant and invisible in the development of application services. Doing so would mean anyone could substitute hardware as they wished, rather than being locked into the offering of a single vendor.
Since we wanted to abstract everything within the server at a low level, the first thing we needed was a portable class foundation written in C++. I wanted to use C++ for several reasons. First, it seemed natural to use class encapsulation for the driver interfaces because of their abstract nature. Second, I found I could write bug-free C++ code faster than I could write C code. In fact, this would become my first large-scale C++ project.
Why we chose not to use an existing framework is also simple to explain. We knew we needed threading, socket support and a few other elements. No single existing framework did all these things except a few that were larger and more complex than we needed. For example, we wanted a small footprint for a telephony server. The most adaptable framework at the time was ACE (Adaptive Communication Environment), which typically added several MBs of core image for the runtime library. Since we were looking at running on machines with as little as 8-12MBs of memory, this seemed an unacceptable overhead.
GNU Common C++ (originally APE) was created to provide an easy-to-comprehend and portable class abstraction for threads, sockets, semaphores, exceptions and so on. APE has since grown and is now used as a foundation for a number of projects in addition to being a part of GNU.
As to creating services themselves, we realized we needed a new way to create telephony applications—one that would make the process approachable for the average system administrator. For simplicity we choose to use a common scripting language, which later became known as GNU ccScript. By writing scripts and recording audio samples to create telephony application services, virtually anyone could participate without needing specialized knowledge or deep understanding of fantastically complex APIs like those promoted by the ECTF. Because the underlying telephony hardware is both invisible and abstracted away from the application scripting language, the cycle of dependence on using a single card family is also broken.
But what form should this new scripting language take? Many extension languages assume a separate execution instance (thread or process) for each interpreter instance, making them unsuitable for our project. Many extension languages assume expression parsing with nondeterministic runtime. An expression could invoke recursive functions or entire subprograms, for example. Again, we did not want to have a separate execution instance for each interpreter instance, and we did not want to have each instance respond to the leading edge of an event callback from the telephony driver as it steps through a state machine, so none of the existing common solutions like Tcl, Perl, Guile, etc., would immediately work for us. Instead, we created an entirely new nonblocking and deterministic scripting engine for our first server.
Our scripting language is unique in several ways. First of all, it is step executed and nonblocking. Statements can either execute and return immediately or schedule their completion for a later time with the executive. This allows a single thread to invoke and manage multiple interpreter instances. While a telephony server can potentially support interactions with hundreds of simultaneous telephone callers on high-density carrier scale hardware, we do not require hundreds of native “thread” instances running in the server, and we have a very modest CPU load.
Another way our scripting is unique is in support for memory-loaded scripts. To avoid delay or blocking while loading scripts, all scripts are loaded and parsed into a virtual machine (VM) structure in memory. When we wish to change scripts, a brand new VM instance is created to contain these scripts. Calls currently in progress continue under the old VM, and new callers are offered the new VM. When the last old call terminates, then the entire old VM is disposed of. This allows for 100% uptime, even while services are modified.
Finally, since we were building a C++ scripting system, we allowed direct class extensions of the script interpreter as a means to add new script functionality. This allows one to create a derived dialect specific to a given application or, if needed, specific to a given telephony driver, simply by deriving it from the core language through standard C++ class extension.
While the server scripting language can support the creation of complete telephony applications, it was not designed to be a general-purpose programming language or to integrate with external libraries the way traditional languages do. Nonblocking requires that any module extensions created for the server be highly customized. Instead, we wanted a general-purpose way to create script extensions that could interact with databases or other system resources. To that end we chose a model essentially similar to how a web server did this when our ACS (Adjunct Communication Server) Project was created.
The TGI model for our server is similar to how CGI works for a web server. In TGI, a separate process is started, then is passed information on the phone caller through environment variables. Environment variables rather than command-line arguments are used to prevent snooping of transactions that might include things such as credit-card information that could be visible with a simple ps command.
The TGI process is tethered to the server through stdout and any output the TGI application generates is used to invoke server commands. These commands can do things like set return values, such as the result of a database lookup, or they can do things like invoke new sessions to perform outbound dialing. Rather than creating a gateway for each concurrent call session, a pool of available processes are maintained for TGI gateways so it can be treated as a restricted resource. It is assumed that gateway execution time represents a small percentage of the total call time, so maintaining a small process pool always available for quick TGI startup is efficient. This helps to prevent stampeding if, say, all the callers hit a TGI at the same moment.
With these basic tools, it was possible to create interactive voice response applications. As soon as it was functional, our first telephony server was used commercially by Open Source Telecom and other companies. This wide adoption was a result in part of how simple it is to create new application services and to integrate telephony applications under this server with other aspects of a commercial enterprise. As noted, the only requirements are some skill in constructing a server-side script, the ability to play and record audio and some knowledge of common tools like Perl.
A typical application for our server might look like the one shown in Listing 1 [available at ftp.linuxjournal.com/pub/lj/listings/issue100/6077.tgz], the playrec script. This script demonstrates the different concepts in the current scripting language, including symbol scope and event trapping, which, used under named script references, form a chain of logic for processing an interactive telephony application. In Listing 2 [available at ftp.linuxjournal.com/pub/lj/listings/issue100/6077.tgz], we have an example of the server's use of Perl with the TGI.pm module and the tgigetdbval.pl Perl script.
As noted earlier, we achieved all these goals some two years ago with the first of these telephony application servers, which as previously stated, which was called Adjunct Communication Server or ACS, for short. Unfortunately, ACS suffered from a name problem, and I received many letters from different people pointing out that ACS was used by several other projects, including Al's Circuit Simulator. Clearly this was a problem.
Bayonne Internals
At the same time, the ACS architecture was showing its limitations. First, it was based on the idea of building a server directly bound to the telephony card, much the way XFree86 3 bound the X server to a given family of video hardware. This meant separate servers had to be compiled for each card family, and a lot of code was being duplicated needlessly.
I chose to rewrite the entire core server from scratch, and this was completed over a period of a few weeks. The first thing I did was create a concept for supporting plugins, using a somewhat different idea from how most people had done plugins in the past.
Typically, a plugin would be a small object file dynamically loaded with a known symbol name or structure that could be found easily once the loaded file was examined. Hence, one can use dlopen to open the plugin and dlsym to find a known symbol, thereby calling functions within the plugin.
I came up with a different method: I made the new server export its own dynamic symbols. The server then had a bunch of base classes with constructors that would initialize a registration structure. The plugins were written instead as C++ derived classes, where the base class was defined in the server and had static objects for these derived classes. When the plugin was loaded with dlopen, the constructors of these static objects would be invoked automatically, and the base class references to the server image resolved automatically. The base class held inside the server image would be invoked from the constructor, and it would register the plugin object. Hence, a single dlopen would both load the plugin and perform all initialization as a single operation.
Furthermore, things that were part of ACS got spun off as separate packages. This is when GNU ccScript and GNU ccAudio became separate class libraries, as these represented the already useful scripting engine and audio processing functionality found in ACS. In particular, we were looking at using the scripting language in other servers that would be part of GNUCOMM.
GNU ccAudio has proven to be a useful general-purpose audio processing library. It can be used to pregenerate single- and dual-frequency tones that can be played later from memory, and it can assemble audio from multiple input files into packed, fixed-sized frames with silence filling at the end, as is commonly needed for feeding DSPs (digital signal processing). This feature makes it a bit unique because other audio processing libraries typically do not concern themselves with these issues. Ideally, I would like to extend GNU ccAudio into a full, general-purpose audio processing framework that can also be used to provide host-based DSP-like processing.
So we had a new server, only it lacked a name. Since we wanted something distinct and unlikely to be used by someone else, we decided not to use yet another acronym. Instead, since the server was essentially a bridge between the computer and telephony world, it seemed natural to choose a bridge for a metaphor. But what bridge?
There of course is the Brooklyn Bridge. But overused and having bad connotations, it seemed a bad choice. Similarly, Golden Gate is extremely overused and, in any case, associated with IBM's Java initiative. Tacoma Narrows was a possibility, but considering it was famous for self-destructing, we thought we would leave that one alone, perhaps for a proprietary vendor in Washington.
There is a bridge not far from where we are in New Jersey: the Bayonne Bridge. Virtually nobody has heard of it, and in any case, the name is little-used.
Summer 2002 marks the introduction of the 1.0 release of GNU Bayonne. At present, GNU Bayonne is part of not only the GNU Project, but it has been packaged and distributed as a standard part of several GNU/Linux distributions, including GNU/Debian and Mandrake. In that we wish to make telephony application services universally available to free software developers, this is a positive development.
GNU Bayonne is widely used already in every part of the world. Users range from commercial carriers in Russia to state and federal government agencies in the US, and they include many enterprises that are looking for either specialized telephony-enabled web services or a platform for enterprise applications such as voice messaging.
GNU Bayonne does not exist alone but is part of a larger metaproject: GNUCOMM. The goal of GNUCOMM is to provide telephony services for both current and next-generation telephone networks using freely licensed software. These services can be defined as 1) services that interact with desktop users, such as address books that can dial phones and soft phone applications; 2) services for telephone switching, such as the IPSwitch GNU Softswitch Project and GNU oSIP proxy server; 3) services for gateways between current and next generation telephone networks, such as troll, and proxies between firewalled telephone networks such as Ogre; 4) real-time database transaction systems, such as preViking Infotel and BayonneDB; and 5) voice application services, such as those delivered through GNU Bayonne.
Even before GNU Bayonne 1.0 had been finalized, work started in late 2001 on a successor to GNU Bayonne. This successor attempts to simplify many of the architectural choices that were made early on in the project, with the hope of making it easier to adapt and integrate GNU Bayonne in new ways. The choice of design and much of the initial planning occurred during a two-day period late in 2001, while I was in London meeting with the people who developed the preViking telephony server. Some of these changes involved bringing the preViking Project directly into GNU Bayonne development.
One of the biggest challenges in the current GNU Bayonne server is creating telephony card plugins. These often involve the implementation of a complete state machine for each and every driver, and often the code is duplicated. GNU Bayonne 2 solves this by pushing the state machine into the core server and making it fully abstract through C++ class extension. This allows drivers to be simplified, but it also enables us to build multiple servers from a single code base.
Another key difference in GNU Bayonne 2 is much more direct support for carrier-grade Linux solutions. In particular, unlike its predecessor, this new server can take ports in and out of service on a live server, allowing for cards to be hot-plugged or hot-swapped. On a carrier-grade platform, the kernel will provide notification of changeover events, and application services can listen for and respond to these events. GNU Bayonne 2 is designed to support this concept of notification for management of resources it is controlling.
Finally, GNU Bayonne 2 is designed from the ground up to take advantage of XML in various ways. It uses a custom XML dialect as a configuration language. It also acts as a web service with both the ability to request XML content that describes the running state of GNU Bayonne services and the ability to support XMLRPC. This fits into our vision for making telephony servers integrate with web services, representing part of how we envision the project going forward.
David Sugar has been involved in developing free software over the last 20 years and is the principal author of a number of packages in the GNU Project, including GNU Bayonne. David Sugar is a founder of Open Source Telecom and chairs the DotGNU steering committee.