Embedded System àla Carte
How many times have you faced the problem that after the specification phase is complete you simply cannot find a matching microprocessor with the required type and number of peripherals? How many times did you find out that for the closest matching microprocessor no Linux port is available? How hard was it to put together all the tools you need to design and debug your system? How many hours have you spent working around all of these issues? In any case, relax! The Virtex-II Pro FPGA family and the corresponding tools combined with MontaVista Linux OS and development environment enables you to pick your system à la carte and saves you a lot of time and sweat.
The Virtex-II Pro FPGA family offers up to four hard-core PowerPC 405 processors, 16 Rocket I/O 3.125Gbps serial transceivers, 3.8Mb of block RAM (BRAM) and four million system gates on a single programmable platform chip. This rich set of features opens the field for a wide range of applications and provides high flexibility for system designers. Single-chip systems with an arbitrary number and type of I/O devices are possible. For example, you can have five UARTs, a PCI bus and several Gigabit Ethernet ports all controlled by the on-chip processors.
An Xilinx partnership with IBM introduces support for the hard-core PPC405, a robust busing standard and a device fabrication process. Xilinx offers a wide variety of intellectual property (IP) cores for the Virtex-II Pro FPGA, which are predefined hardware blocks that directly snap into IBM's CoreConnect bus technology (Figure 1).
Figure 1. IP Cores Connected to CoreConnect
CoreConnect has several features for interconnecting the processor to the FPGA fabric in order to construct system designs. The key CoreConnect features are the application-oriented system buses, which include the processor local bus (PLB), the on-chip peripheral bus (OPB) and the device control register (DCR) bus. Selected IP cores include many forms of I/O and external memory controllers. For I/O, the cores include 16450- and 16550-compatible UARTs, IIC controller and SPI controller. The memory controllers include support for SRAM/FLASH, SDRAM, DDR SDRAM and ZBT devices. More sophisticated, system-level I/O is also available as IP cores: PCI cores, 10/100 Mb Ethernet, Gb Ethernet and the System ACE configuration interface. Each core is delivered with a corresponding driver where appropriate. The drivers are shipped as source code.
The powerful software design tools that help the user make use of the processors, the IP cores and, more generally the FPGA, will be described here in more detail. The interaction of this hardware and software is unique for the Virtex-II Pro FPGA because it enables the design of a completely custom embedded system solution on a single chip. The issues of hardware/software codesign include use of powerful application software tools for embedded Linux and some unique tools for embedded debugging. Before delving into codesign issues, let's look a little deeper inside the Virtex-II Pro FPGA.
The PowerPC 405 processors run at 300+ MHz and each have 16KB of data and 16KB of instruction cache. The PLB and OPB buses run at 100+ MHz. A powerful feature of the Virtex-II Pro FPGA is the implementation of the on-chip memory (OCM). The OCM is memory that has similar access characteristics as the processor caches but is managed by the user. The Virtex-II Pro FPGA implements the OCM as dual-ported BRAM, allowing for an extremely fast data path from peripheral devices to the processor or as a communication buffer between processors.
The Rocket I/O 3.125Gbps serial transceivers support many different communication standards, listed in Table 1. Important features like buffers, 8B/10B encoding and decoding, and CRC calculation and checking are implemented on-chip and do not take space away from the FPGA fabric.
Table 1. Virtex-II Pro Platform FPGAs support these protocols and baud rates.
The BRAMs can be used as memory accessible by the processors and can be connected to CoreConnect or as OCM. While OCM offers better performance and a direct way to the processor, hooking up the BRAM to the PLB makes it available for DMA transfers where the processor is not involved. It is up to the system designer to decide how the BRAM is used best.
The FPGA fabric can be filled with IP cores provided from Xilinx or with user-specified designs. While some of the IP cores interact with each other, others work completely in parallel.
Because FPGAs can be reconfigured dynamically you can even switch the number and type of devices while the system is running. Such a flexible platform needs a powerful operating system and tools for hardware and software engineers to develop and debug their embedded applications.
Depending on their application, users may choose from a wide variety of COTS (commercially available off the shelf) real-time operating systems, proprietary RTOS or embedded Linux environments. In the Linux arena, Xilinx chose MontaVista Linux as a suitable embedded OS and development environment for the Virtex-II Pro FPGA family because of its extreme versatility. Linux, by default, supports a wide range of devices. Xilinx and MontaVista Software chose to use a layered device-driver approach. It is split into a low-level, OS-independent layer that directly sets up on the hardware and an OS specific adaptation layer that sits in the middle between the OS and the low-level drivers. Xilinx provides the low-level drivers written for optimal performance and best use of the functionality of the IP cores. MontaVista Software implements the adaptation layer for Linux and uses its expertise for a seamless integration of the drivers. Both driver layers are pushed into the open-source repository and are released under the GPL (General Public License). All device drivers can be compiled into the kernel or are available as loadable modules. Being able to load and unload driver modules supports hardware that can be reconfigured while the system is running. Replacing hardware in a running system is a well-known process for external devices like USB devices and PC cards, but replacing hardware on the chip and dynamically loading the corresponding Linux driver is something completely new.
The Linux port is first targeted toward the Virtex-II Pro ML300 Platform from Xilinx (shown in Figure 2), and it supports most of the IP cores and hardware on this board. Since the ML300 is an elaborate board with a lot of functionality, it will be easy for users and customers to adapt the port to their own specific hardware. MontaVista Software offers professional support for such projects.
Figure 2. Virtex-II Pro ML300 Platform—Bottom View
Still, hardware and software engineers need powerful tools to develop, boot and debug their systems. With System Generator for Processors, GDB/XMD, System ACE and ChipScope Pro, Xilinx has a complete tool suite for all aspects of hardware, software and systems engineering.
System Generator for Processors (SGP) helps you put together the design for the Virtex-II Pro FPGA both on the hardware and on the software sides. In user-friendly dialog boxes you can specify all the parameters for the system, like base addresses for the peripherals, interrupts to be used and amount of memory present. As a result, SGP emits the hardware design files that are ready for FPGA implementation or simulation and a parameter file that is used when the Linux kernel is built and the corresponding driver modules are created. Listing 1 shows an excerpt of the parameter file. In this case, the user assigned parameters to configure the interrupt controller in the system.
Listing 1. Excerpt of the Parameter File to Configure the Device Drivers
SGP gives system architects the flexibility to investigate different options and variations for their new embedded systems. Setting the parameters to different values tailors the hardware and software to the specific requirements. Only functionality that will be accessed and used is included in the system; other functionality is stripped out of the design. Additionally, devices are preconfigured with the default parameters. As a result the hardware and software use less space, offer better performance and the initialization process is much simpler (in some cases it is not even required). SGP and its companion tools use an open interface that makes it simple for users to add their own hardware functionality and software drivers. XMD is a debugging server using the on-chip debug (OCD) protocol to communicate with GDB on the host system. It controls the target system through the JTAG port of a processor in the Virtex-II Pro FPGA. At the same time, XMD serves multiple GDBs. Thus, it is possible to debug more than one processor at once. More specifically, all four processors within one Virtex-II Pro FPGA can be debugged simultaneously. On Linux, GDB runs on the command line or with one of the different front ends. It is compiled with support for the Insight GUI but can also be used with DDD and Emacs.
In GDB the PPC405 architecture was added and the “target ocd” command was extended to support all the features of the Virtex-II Pro FPGA. As a result all the registers of the PPC405, the caches, the TLB (translation look aside buffer) entries and the contents of the OCM cannot only be inspected and changed but also can be mapped into the memory space of the processor.
Debugging an embedded system traditionally has been a difficult process. You have to look at hardware and software at the same time. An external, non-intrusive software debugger like GDB combined with XMD is a big help. Additionally, the PPC405 supports hardware breakpoints and allows freezing the processor on exceptions. But especially when processors and peripherals are integrated on the same chip, it is difficult to see what transactions are executed and how (in which order) memory is accessed. All the important signals are buried within the chip, and there is normally no way to get access to these signals. The Virtex-II Pro FPGA does not have such a limitation because it integrates peripherals as soft hardware. All signals are visible and can be accessed with the appropriate tool.
ChipScope Pro is an integrated hardware logic analyzer. It consists of a logic analyzer running on the debugging host system and a set of trigger and data units compiled or inserted into the hardware design. The integrated logic analyzer (ILA) units can be hooked up to any number of signals inside the FPGA and can trigger on user-defined conditions or processor bus transactions. Multiple ILA units can be active at the same time. Sometimes it is useful to hook up multiple ILAs to the same signals. In one case, we hooked up two different ILA units to the PLB address and data lines to resolve a problem with corrupted memory. One ILA was hooked up to the PLB signals connected to the PPC405. We knew that the processor would do a memory access at the time when the corruption would occur. The other ILA was hooked up to the PLB signals connected to the DDR memory controller. By comparing the address and data lines reported by the two ILA units, we were able to isolate the problem and fix it. Having access to the hardware and being able to watch bus transactions is very useful, especially in Linux where the MMU is used. From a software perspective, the same physical block of RAM can be mapped into many different virtual address spaces. On the hardware level, all addresses are physical.
The combination of ChipScope Pro, GDB and XMD gives the developer extremely high visibility into the system. The software tools share a common cable and communicate through the JTAG port of the FPGA. The friendly cooperation between the tools reduces the number of cables and makes the setup of the debugging environment much easier.
The boot process is an imminently important phase when an embedded system is powered up. In a few steps the components on the board, the processor, the memory system and the communication infrastructure are brought up. On the Virtex-II Pro FPGA, the boot process happens in two steps. On one hand, the FPGA is configured, and on the other hand, the processor is started. The FPGA needs to be configured with its functionality in one of many different ways. We will show one recommended method later in this article. Often a specialized primary boot loader is used to start the processor, bring up the system, load the Linux kernel into memory and transfer control to the entry point of the kernel. The Virtex-II Pro FPGA supports this traditional boot method where the primary boot loader resides in external ROM or in internal BRAM. The latter case removes the need for external ROM in that the primary boot loader is included in the bit stream that configures the FPGA. Immediately after the FPGA is configured the processor is released from reset, starts reading instructions from the internal BRAM and executes the primary boot loader.
A new and, especially with MontaVista Linux, powerful solution to boot the system uses System ACE. System ACE is a companion chip external to the Virtex-II Pro FPGA and allows booting a system without having any ROM. It has two main functions. For one, it boots the system by configuring the FPGA, the processor and any device on the processor bus through the JTAG chain from a CompactFlash card or a Microdrive. And two, it uses the same storage device as a filesystem accessible by Linux.
The Microdrive contains a FAT12 or FAT16 and a Linux partition. The Linux kernel is configured with support for the System ACE device, compiled, converted into a System ACE-specific file format, concatenated with the configuration bit stream for the FPGA and stored on the FAT filesystem. On power-up, System ACE reads the configuration file from the FAT filesystem, configures the FPGA and boots the kernel. During the boot process, the Linux kernel mounts the Linux partition on the Microdrive as a root filesystem. The non-obvious advantages of booting with System ACE are that no memory at all is required at the reset vector of the processor, different boot configurations can be stored on the FAT partition and the boot configuration can be changed by normal file operations.
System ACE works on the JTAG chain like an external debugger through the debug port of the processor. Code, data and, if required, a RAM disk, for the Linux kernel are loaded through the JTAG chain and the processor bus into system memory. Configuration of any devices in the system accessible by the processor can be done in the same way before the kernel is loaded. And at the end, the program counter of the PPC405 is set to the start address of the Linux kernel and directly executed from this location.
A switch points System ACE to one of eight active configurations. The configuration to be loaded also can be set by software. A running Linux system selects the new configuration, resets the system and boots into this new configuration that may consist of a different set of peripherals.
The FAT filesystem allows Linux to update the System ACE file while the system is running—a very powerful solution to upgrade hardware and software in-system and on the fly.
The Virtex-II Pro Developer's Kit adds another dimension to a successful system design experience. The kit allows you to simulate your embedded system before you build real hardware and introduces another abstraction level during the debugging phase. Each component of the system can be simulated on its own. Once the whole system is put together, hardware and software can be run in the simulation to verify the functionality of the embedded system. Problems observed in real hardware can be taken back into simulation and tracked down. GDB/XMD can be configured to connect to HDL simulators and enables an engineer to follow the program execution step by step and watch the bus transactions and hardware state changes as they occur. The completeness of the combination of the Virtex-II Pro FPGA with MontaVista Linux makes it an ideal platform for many different applications. The Rocket I/O serial multigigabit transceivers make it interesting for telecommunications, for example, in base stations where complex calculations have to be combined with high bandwidth and enormous computing power. The same transceivers can also be used as a backplane interconnect between multiple devices. The available peripherals combined with up to four processors also make it an ideal platform for data and graphics terminals or even as a workstation.
The integration of processors into the FPGA fabric offers some opportunities for interesting system architectures and future development. On the architectural side, in a simple system, multiple processors can be interconnected by a shared PLB. A more complex system uses a switched approach to prevent congestion on the bus and gain better performance. Due to the nature of FPGAs, system designers might start with the simple approach and later change their strategy. In any case, Linux will have to support the architecture. Since semaphores and mutexes are easily implemented by dual-ported BRAM, resource management and access to shared memory are straightforward. Hardware/software coprocessing will improve system performance a great deal. While hardware is fast and can execute in parallel, software is much more flexible. Linux will call certain system functions that in reality are implemented in hardware. It will be a challenge for the system designer to find the functions for which it makes sense to off-load into hardware, but it pays off in a faster and more dynamic system. In an even more complicated system Linux will use dynamic coprocessing. It partially reconfigures the FPGA with the desired hardware functions optimized for the currently running applications. While one application calculates extensive FFT transformations, another application searches for patterns in a data stream. Whenever the scheduler transfers control to one of the two applications, it also replaces the corresponding IP. Based on statistical data, the scheduler decides whether the application will be hardware accelerated or whether a corresponding software function is used. The Virtex-II Pro FPGA and MontaVista Linux combined with the corresponding system generation, debugging and configuration tools is a powerful and flexible solution. It enables you to implement the design in your specification and not the one given by hardware and software limitations, increases the integration factor without losing observabilitiy, reduces time-to-market because of available IP cores and related software drivers, and finally, opens a new dimension to your creativity with respect to hardware/software codesign.
Michael Baxter has been working in computer technology since he was nine, imprinted by a 1969 viewing of 2001: A Space Odyssey. He is an experienced computer architect, system, board and FPGA logic designer. Michael holds ten US patents in computer architecture and logic, plus five patents as a co-inventor. His interests also include hiking, amateur radio and programming in Lisp.
Peter Ryser works as a systems design engineer for Xilinx, Inc. He is responsible for various embedded software-related projects for Virtex-II Pro and can be reached at peter.ryser@xilinx.com.