The Role of Linux in Grid Computing
Today, applications typically are developed for a specific platform or hosting environment, such as Linux, Windows 2000, various UNIX flavors, mainframes, J2EE, Microsoft .NET and so on. Such computing tends to operate within a monolithic framework in which applications contend for whatever resources that single platform makes available. On a platform with limited resources, availability drops as demand for service grows. If, at that point, resources from other systems could be used, or requests could be serviced by other systems, the strain on the native system would drop considerably and the quality of service would improve.
This is the objective grid computing aims to meet. Grid computing seeks to virtualize, manage and allocate distributed physical resources (processing power, memory, storage, networking) to applications and users on an as-needed (on-demand) basis, regardless of where those resources are located. Grid networks transcend physical components, organizational units, enterprise infrastructure and geographic boundaries. Naturally, software plays a vital role in determining the success of grid computing. In this article, we focus on the role of Linux in grid computing.
Four arguments can be made for Linux becoming the basis of grid computing:
1. Open Grid Services Architecture (OGSA) is a service architecture built on the open-source paradigm of community participation and code sharing. According to Ian Foster, a chief scientist at Argonne National Laboratory who is often called the father of the grid, the long-term success of grid computing depends on four issues: open standards, open software, open infrastructure and the commercialization of grid services to speed enterprise adoption. Linux has developed along similar lines.
The Globus Toolkit, which formed the basis of OGSA, is a community-based, open-architecture, open-source set of services and software libraries. Globus addresses security, information discovery, resource management, data management, communication, fault detection and portability, and its community-driven development mirrors the processes used to develop and evolve the Linux kernel. Any grid network must accommodate a heterogeneous mix of existing resources. However, future generations of grid networks likely will center on operating-system and development environments that support an open, collaborative community process whose infrastructure evolves through open-source development. Because Linux has evolved from the same open-source process, there is a high degree of affinity between Linux and grid-computing projects. Open standards and protocols lead to the building of services, and services are at the heart of the grid.
2. The grid concept is based on the management and allocation of distributed resources rather than on a vertically integrated, monolithic resource tightly coupled to the underlying operating-system architecture of the platform.
The move from single-platform architectures to grid computing will not happen overnight. Instead, computational units will be deployed in small, inexpensive increments. The performance of each increment will be measured against the expected results, and only if the gain is significant will the next round of deployment follow. This contrasts with the major investments needed for large-scale monolithic systems, which typically become obsolete within four or five years and thus drain capital and operating budgets.
Linux has gained a reputation as a highly efficient operating system for simpler application environments running on smaller hardware configurations, the kind that grid architecture enables. In such experimental deployments, the fact that Linux is free will play a crucial role by keeping the required investment low.
3. Computing grids are virtual, extensible and horizontally scalable, and they use open network protocols. Many of the early grid networks were developed in the scientific and technical computing environments of universities, technical laboratories, and health and bioinformatics consortia. Most of them have relied on native operating system processes for the hosting environment, typically UNIX and Linux. Their experience suggests that Linux is the best platform available for grid computing. There is little evidence of grid computing projects deployed on other operating systems, such as Windows 98 or Windows XP.
For example, the TeraGrid is designed to help the National Science Foundation pursue complex scientific research, including molecular modeling for disease detection, cures and drug discovery, automobile crash simulation and investigations of alternative energy sources. TeraGrid will use more than 3,000 Intel processors running Linux. The Grid Forum, a research consortium, focuses its research on applications in the oil industry, physical disaster prediction and simulation, biological and ocean modeling, industrial simulation, agriculture, health services and e-utilities. Many of these applications currently run on UNIX or Linux.
4. Vendor-specific initiatives are promoting Linux. Although IBM's grid architectural block diagrams show the OGSA framework supporting operating system heterogeneity, they also clearly point to the centrality of Linux in IBM's grid strategy. Sun Microsystems has written an edition of its Grid Engine 5.3 software for SuSE Linux, distributed by SuSE Linux AG. Other vendors are investing in grid computing as well. Hewlett-Packard has incorporated software specs for massive grids into the Utility Data Center, a computing power-on-demand product that supports Linux. An Oracle spokesman recently said Linux is "the smart option for grid computing". Oracle has also announced that Oracle 10g is grid-enabled and runs smoothly on Linux.
On the whole, Linux is the buzzword when it comes to platforms for grid computing. Nevertheless, there are arguments against the pervasive adoption of Linux. A few of them are listed below:
1. Grid computing is based on the principle of heterogeneity, where virtual organizations are formed with no discrimination between resources and systems, as long as the standard toolkit services are implemented.
The OGSA model does not specify an operating system. On the contrary, it was designed to welcome all computing architectures into the grid-computing family. Given grid computing's emphasis on resource virtualization and usage, and the heterogeneous nature of most enterprises' IT infrastructures, enterprises are under no pressure to change their hardware and software to participate in grids or to establish internal grids.
2. The grid philosophy does not specify implementations: its fundamental principle is to adapt to the operating system environment of specific hosts and exploit their native capabilities.
The grid architecture neither prescribes nor rules out any particular way of implementing it. Likewise, it says nothing about the platform to be used.
3. Grid computing addresses only a small part of the IT infrastructure.
Grid computing is exactly what the name implies: a mesh of distributed resources whose members share one another's resources through a specifically enforced set of protocols. The heterogeneity that OGSA attempts to address is itself a recognition that IT infrastructures will consist of a mix of computing architectures. The role Linux will play in such heterogeneous environments will depend largely on its performance, reliability and economics when running on such hosts.
Both grid computing and Linux are too immature for us to forecast that Linux will dominate commercial grid applications. Only the future will tell whether Linux rules the realm of grid computing. One thing is certain, however: Linux will claim a large share of the grid-computing platform market.
Aseem has been working at Cygnus Microsystems Pvt Ltd, in India, for the past 15 months. He has special interests in grid computing and supercomputing.