First Look at an Apple G4 with the Altivec Processor

by Matthew Fite

When I first read about Apple's plans to develop a G4-based personal computer, I didn't even know what a G4 was. Supercomputer performance? Processing in the GFlops? How could this be? G4, also known as the Motorola 7400, is the processor with the AltiVec unit. AltiVec is the trade name of the vector processing unit found in this new line of PowerPC processors. Motorola has also announced the 7410 and the 7450, which feature an L2 cache on the die, a large backside L3 cache, a faster processor core and a deeper, seven-stage pipeline.

The AltiVec unit is an enhanced integer or floating processing unit. It provides a new 128-bit processing unit, 32 vector registers and over 160 new instructions that allow for the processing of data in a pipeline. These provide a tremendous opportunity to move data through the processor.

After a description like that, who wouldn't want to have one of these at home? I'm not a Macintosh aficinado nor do I care for Windows very much. When I read about the work that Cort Dugan, Paul Mackerras, Ben Herrenschmidt and many others had performed porting Linux to PowerPC (PPC) I was sold. After all, this sounded like an opportunity to try something new and challenging, learn a little (or a lot) and get faster numbers from my distributed.net client (some of the reasons I started using Linux last millennium).

My hardware is an Apple dual-G4/450MHz PowerPC with 512MB RAM. It comes with a 30GB Quantum Fireball IDE drive, a CD/DVD-ROM, two IEEE-1394 (Firewire interfaces), 100Mbps Ethernet and more USB hubs than you can shake a stick at. The keyboard and mouse are both USB devices. Apple calls this a New World machine. Although this sounds like a marketing term, “New World” is used to describe Apple hardware where the boot ROM is stored in software (as opposed to “Old World” machines where boot management software was stored in a PROM).

The Linux distribution I chose to install was Yellow Dog Linux. I don't know what finally pushed me in that direction, given that there is more than one choice—SuSE, LinuxPPC and Yellow Dog Linux immediately come to mind. YDL is based on Red Hat, so it's not too unfamiliar.

YDL is provided by Terra Soft Solutions. While Terra Soft provides another distribution, Black Lab Linux, YDL is the entry point solution for the common user. I downloaded the two ISO images of YDL Champion Server 1.2.1 from one of Terra Soft's mirrors. The first is the installation CD; the second CD is known as “Tasty Morsels”. It provides a rescue image and some additional software for the PPC. I burned these images with cdrecord on my SuSE/i386 box and then wondered what I had.

After I read the YDL installation guide I had some idea. The guide suggested I use yaboot, “yet another boot loader”. yaboot needs to live on an HFS (native Mac) partition so I needed to create one using the Mac system software.

Here are the steps I used to reinstall Mac OS9 and then install Linux:

  • Create an HFS partition (4GB) for yaboot (and OS9).

  • Reinstall OS9 from distribution CDs.

  • From the CS 1.2.1 CD, copy yaboot, yaboot.conf and vmlinux.gz to the system folder.

yaboot.conf looks and feels a lot like lilo.conf. There are sections for each image with an area to provide a label so that when yaboot boots, the user can Tab to see the names of the kernel configurations and then select one at the prompt. Familiar stuff, but I had to modify yaboot.conf as shown below.

Here I should digress a bit about Open Firmware. Open Firmware, defined under IEEE 1275, is a specification for providing open support for firmware. This was one of the first interesting areas of my new Apple hardware explorations. I didn't see Open Firmware until I needed to. Upon booting the Mac an 880Hz tone sounds to indicates that your system just passed a hardware POST and is preparing to boot an operating system. At this point the booting process can be stopped by pressing and holding the Command-Opt-O-F keys. If all goes well the following greeting is displayed:

Apple PowerMac 3,3 3.4f1 BootROM built on
08/08/00 at 22:02:19
Copyright 1994-2000 Apple Computer, Inc.
Welcome to Open Firmware.
To continue booting, type "mac-boot"
and press return
To shut down, type "shut-down"
and press return
 ok
0 > _

The 0 > is a prompt. OF is, at its heart, a Forth interpreter. Forth is a stack-based language. To obtain a sense of this, type the following at the prompt:

0 > 3 [RETURN]
1 > 4 [RETURN]
2 > + [RETURN]
1 > . [RETURN]
and you will get the resulting:
0 > 7
The first command pushed “3” on the stack. The prompt displays the number of items on the stack before the “>”. Then I placed “4” on the stack and told the interpreter to add the results. Now there was only one item on the stack. The “.” operator pops the first value off of the stack and displays it.

You can see quite a bit about your hardware from here. For example, to see the default boot configuration of your machine type the following at the prompt:

0 > printenv

Listing 1 shows the built-in environment variables and their defaults.

Listing 1. Environment Variables and Their Defaults from the G4's Open Fireware

Another good bit of information from your PPC can be derived from the command devalias. Enter this command at the prompt and press the Return key. Pay attention to the value for hd. That is the hardware address of your first IDE hard drive. hd is an alias for the entire address displayed via printenv.

Saving This Information (More Fun with Forth)

If you are like me, you might get paranoid about changing any of these values. After a bit of research on http://developer.apple.com/, I came across some interesting technical notes. In particular were Technotes 2000-2004. Some of the benefit of having a full-featured interpreter with the power of an operating system is the ability to provide for viewing, running files and displaying hardware information for debugging. Much of this information is too detailed to write down, so there is the notion of a “two-machine” mode (TN 2004). In this mode, you can display the OF output on a serial port. The G4 PPC doesn't come with a serial port, but within Apple's OF there is a Telnet dæmon. I'm not entirely sure that you couldn't use the USB devices as output, after all, “serial” is in the acronym, but I do know that the Telnet dæmon works. Also, I don't know if minicom can be used with a USB port.

The dæmon is easily configured. First, from the OF prompt enter the following command:

0 > " enet:telnet, 192.168.2.20" io

Observe the spaces, press Return and now OF has created a Telnet dæmon awaiting a Telnet client. This command has configured the Ethernet interface to IP address 192.168.2.20. You may want to choose a different IP address depending on your own network configuration. You will need another machine on the same physical network segment as your PPC. If you don't have a segment, a crossover Ethernet cable will do.

From your client machine, Telnet to your target (PPC) machine. You should be presented with the same “0 >” prompt as displayed from the Mac. Now you have the ability to capture all of the output from printenv, devalias, etc., to a file. This helps if you screw things up so badly that you have to return to your default configuration.

Okay, let's install Linux. Insert the YDL CD into your DVD ROM and hold the C key down while you boot. This is the method to boot from the CD. You'll be presented with the installation screen for YDL. You can follow the YDL installation guide for the most part, but a word of caution about partitioning: unless you've installed Linux before on your Mac, you'll need to create some partitions. No longer are you creating ext2 partitions, now you'll be creating partitions of the type Apple_UNIX_SVR2. Also, you'll be using pdisk rather than fdisk to create your partitions. Use the p command to display the partitions. If you've followed my advice above, you should see nine partitions. These are created by default, and if you intend to leave some form of running system (recommended), leave them alone.

Now you need to create the partitions for your normal partitioning scheme, I've chosen to create partitions for the mount points /, /usr, /opt, /home and a swap partition. Yours may be different, but the scheme I've created is shown in Table 1.

Table 1. Partition Map (with 512 byte Blocks) on /dev/hda

After you write the partitions to the table using the w command, and you quit out of pdisk (q command), reboot the system. pdisk will not recognize the new partitions until a reboot. Begin the installation anew by holding down the C key; indicate your newly created mount points, and you can begin selecting packages as you would on a normal Red Hat Linux installation. After you have completed these steps, you're going to have to reboot again. This time, don't hold down any keys as you want to boot the Mac OS.

Now, back to the Mac OS. Open the yaboot.conf you copied to the system folder and take a look. Mine looks like Listing 2.

Listing 2. yaboot.conf

Notice the label for “linux”. The yaboot.conf that comes from the CD has an error; you need to prepend the extra “\\” to yaboot again. This time, use the command sequence Command-Opt-O-F to get to OF. When you again get the “0 >” prompt, enter the following:

0 > boot hd:,\\yaboot

After some flickering, you'll be presented with a LILO-like prompt. Linux should begin to boot. Success! You should now see the power of Open Firmware; the command above allows you to execute a file from your hard drive, and you haven't even booted an operating system yet!

After you log on as user root, you should edit the file /etc/modules.conf and add the following:

alias sound dmasound

This will allow you to use /dev/dsp to play audio. However, in its present form, dmasound supports write only—you can't use it to record data from an external microphone.

I configured X (XFree86 3.3.6) using the XConfigurator that runs during the Linux installation. I chose values for 1024 x 768 with a 24-bit color depth. In yaboot.conf I added the line:

append="video=aty128fb:vmode:17,cmode:24"

so that the kernel would correctly observe the ATI graphics card installed. Then I edited /etc/X11/XF86Config and added DefaultBitsPerPixel 24 in the “Screen” section so that I didn't have to pass the bits per pixel to startx when I ran it.

AltiVec Stuff

Now that Linux is installed the fun with AltiVec begins. As I already mentioned, the AltiVec unit is an additional processing unit, like the floating-point unit or the integer-unit, that processes data stored in 32 128-bit vector registers. The vector execution unit processes this vector data using the single instruction multiple data (SIMD) model. The processor, with one instruction, can operate on four, eight or 16 data units at once. Shortly I give an example to clarify this.

Motorola added 162 new assembler instructions to allow programmers to use the new functionality of the AltiVec-enabled processor. These instructions are detailed in the AltiVec Technology Programming Environments Manual (altivec_pem). The higher-level C instructions that use these new assembler instructions can be found in the AltiVec Technology Programming Interface Manual (altivec_pim). Both of these documents are available for download, in PDF format, from either Motorola's web site or from http://www.altivec.org/.

My next step was to download and install the AltiVec RPMs from http://www.altivec.org/. These RPMs provide a version of gcc (2.95.2) that has been modified to use these new directives. Installation is achieved by the following:

rpm -U binutils-2.9.5.0.22-6.vec.ppc.rpm
rpm -i gcc-altivec-2.95.2-1i.ppc.rpm
rpm -i gcc-altivec-c++-2.95.2-1i.ppc.rpm

After installation, I was able to use this new gcc as follows:

gcc-vec program.c -o program
gcc installs into /opt/bin so that it doesn't affect the default gcc. The RPM creates a link in /usr/bin, named gcc-vec, that points to the vectorized gcc in /opt.

To use the new vectorized commands, you have to write applications that use them and use a version of gcc that is aware of them. You cannot use this version of gcc on your standard C source code and expect to achieve a performance increase from the AltiVec unit. The AltiVec-enabled gcc is aware of new keywords and new functions. altivec_pim is the first step in learning the new commands provided for in gcc-vec. The new vector data types are seen in Table 2.

Table 2. AltiVec-enabled gcc: New Keywords and Functions

Notice the new keyword vector. This indicates that the following declaration is a 16-byte (128-bit) vector. Additionally, these types must be aligned on 16-byte boundaries for the vector execution unit to process the values suitably. A programmer must use caution when de-referencing data that is not aligned on a 16-byte boundary and typically will massage the data to be so aligned.

According to altivec_pim, compilers aware of the AltiVec-enabled processor should provide the following macro:

#define __VEC__

To build code that is capable of compiling on multiple architectures but is still capable of using the AltiVec instructions, you can do something like the following:

#ifdef __VEC__
       /* Put your vector code here */
       /* ... */
#else
       /* do it the old-fashioned way, here */
       /* ... */
#endif
To illustrate how to begin using the AltiVec-enabled gcc, I'll provide an example in Listing 3 [see FTP site at ftp.linuxjournal.com/pub/lj/listings/issue86].

First, notice the typedef union definitions. As previously discussed, the AltiVec registers are 128-bits. These definitions guarantee that the compiler will align the data declared by these types on 128-bit boundaries. Secondly, they provide a convenient method of accessing the individual elements of the vector data types. A final benefit of using the union data type is that now you are given a mechanism to look inside your register—by using printf( ). The altivec_pim provides for formatted input/output using scanf( )/printf( ). In theory, you should be able to print a vector float register using the following in your C code:

vector float f32 = (vector float)(1.1, 2.2, 3.3, 4.4);
printf( "%,vf\n", f32 );

To achieve this your C library (glibc*) must be aware of the vector format directives. The current implementation of the GNU C library (2.2) does not and probably never will. For this reason, I hope to modify a version of the GNU C library to serve this purpose. If you have any advice or interest, please feel free to contact me.

Next, notice the two different mechanisms for defining the vector types. The first declaration is for the vector constants stored in cVals, sVals, iVals and fVals where the vector data type is declared and defined in the same statement. This illustrates how to store constants (values that do not change at runtime) in vectors.

The next method declares a union type and assigns the vector values at runtime in an element-by-element fashion. This method would allow you to read in data from a buffer, copy it to a vector variable and pass it to your vector-aware functions.

Finally, notice the form of the vec_add( ) function. In all cases, I have used the same function, vec_add( ), and it provides the correct result, regardless of whether the arguments were vector shorts, vector ints or vector floats (the arguments must be of the same type). In this case, the compiler interpreted the data types I passed as arguments to vec_add( ) and generated the correct form of the assembler instruction vadd* for me. For example, in the following C code the compiler is able to generate the mapping below:

vector float a,b,c;
/* Assign a,b */
/*    ...     */
c = vec_add( a, b);

This translates to the following assembler instruction:

vaddfp c,a,b
This just keeps getting easier.

To compile this program, use the following command:

gcc-vec -fvec vecdemo1.c -o vecdemo1

The -fvec switch to the compiler tells it to interpret the vector commands. If you don't use the -fvec switch, the compiler will not recognize the vector data types or commands and will print error messages that will remind you to use the switch the next time.

The program produces the output shown in Listing 4.

Listing 4. Output of the Vecdemo1 Command

I've tried to provide an introduction to Linux on the PowerMac and to the AltiVec resources available to Linux programmers. I would like to do more. Other possible avenues would be to demonstrate how the AltiVec can be used as a platform for signal processing, how these processors can be used in place of special-purpose DSPs or to look at a common use for DSPs in signal processing, finite impulse response (FIR) filters.

I would like to thank the members of the AltiVec forum. The mail list has been an invaluable resource to get up and running. Also, thank you to all of the AltiVec developers that have provided such a rich set of tools to begin development on a platform as powerful as the G4.

Resources

First Look at an Apple G4 with the Altivec Processor
Matthew Fite and his wife live in northern Virginia where he works as an embedded software engineer. Although he uses a commercial RTOS during the day, he secretly dreams of replacing it with RTLinux. Matthew has a BS and MS in Electrical Engineering and is always looking for a new project, although his wife probably believes he has enough. You can contact him at mattfite@yahoo.com.
Load Disqus comments