MPEG Compression of 2-D Data Files
In the course of my research I have to deal with dozens of gigabytes of data produced by various 2-D computer simulations. Typically, such data comes from a rectangular grid and is stored in a file in raw byte format as a sequence of "frames" for consecutive time steps. Each frame represents a 2-D array and can be visualized on a computer screen by showing its values as pixel intensities. The entire data file can then be viewed as a movie. There are two practical problems with movie files in raw data format.
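To make the raw layout concrete, here is a minimal sketch of the bookkeeping such a file implies. It assumes one byte per grid point, row-major order within a frame and no header or padding between frames; these details, along with the names RawMovieLayout, frame_bytes, frame_count and sample_offset, are illustrative assumptions rather than a description of any particular simulation code.

```cpp
#include <cstddef>

// Hypothetical description of a raw movie file: one byte per grid point,
// frames stored back to back with no header (an assumption for illustration).
struct RawMovieLayout {
    std::size_t width;   // grid points per row
    std::size_t height;  // rows per frame
};

// Number of bytes occupied by a single frame.
std::size_t frame_bytes(const RawMovieLayout& m) {
    return m.width * m.height;
}

// Number of complete frames contained in a file of the given size.
std::size_t frame_count(const RawMovieLayout& m, std::size_t file_bytes) {
    const std::size_t fb = frame_bytes(m);
    return fb == 0 ? 0 : file_bytes / fb;
}

// Byte offset of grid point (x, y) in frame t, assuming row-major order.
std::size_t sample_offset(const RawMovieLayout& m, std::size_t t,
                          std::size_t x, std::size_t y) {
    return t * frame_bytes(m) + y * m.width + x;
}
```

With this layout, a 320x240 grid takes 76,800 bytes per frame, so a gigabyte of raw data holds roughly 14,000 frames.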
First, I am not aware of any free or open-source players for this format. Commercial players are available for a limited number of platforms, but typically they have too many unnecessary features and are expensive. I wrote my own simple Mesa 3-D-based player for Linux, but it is slow, and sharing my data files with people who work on other platforms is a painful experience.
The second problem is the enormous size of my movie files. I have to compress the files with gzip every time I want to store them or send them to my colleagues over the Internet, and then I have to uncompress them every time I want to play them.
I solved both the portability and compression problems by converting my data files into MPEG video format. It took me a while to figure out how to do this with the free, open-source tools available for Linux, and now I am happy to share this with you.
The first step is to install Mjpegtools, a package that contains an MPEG encoder. Although some versions are available at rpmfind.net, I downloaded the latest source code for mjpegtools-1.6.0 directly from mjpeg.sourceforge.net as a gzipped tarball (about 1MB). Then I did tar -xzvf mjpegtools-1.6.0.tar.gz; cd mjpegtools-1.6.0 on my Red Hat 6.2 Pentium box, but I should have read the prerequisites in the INSTALL file first. I ignored them initially and went on to ./configure and make, which resulted in a successful compilation but led to crashes at runtime. I had to go back, install the assembler package, nasm-0.98-2m, from rpmfind and then re-issue ./configure and make.
Mjpegtools cannot accept my raw data files directly, but it can accept a similar format, a PPM (Portable Pixel Map) stream. PPM is a still-image format that consists of a simple header followed by a sequence of raw bytes encoding the red, green and blue components of each pixel in the image. It is one of the simplest color image formats available (see the ppm man page or netpbm.sourceforge.net/doc/ppm.html for details). A PPM stream is obtained when the contents of several PPM image files are placed one after another without any special separators. I wrote a simple C++ program, sim2ppm, that converts my raw movie file into a PPM stream and sends it to the standard output. Sim2ppm adds an appropriate PPM header at the beginning of each frame and, based on a color map of my choice, expands each data byte into three color-component bytes.
Assuming that my original 2-D simulation data file movie.sim and my sim2ppm binary are in the current directory mjpegtools-1.6.0/, here is how I convert it to the MPEG format:
sim2ppm movie.sim | lavtools/ppmtoy4m | mpeg2enc/mpeg2enc -F 2 -q 10 -a 1 -o movie.mpg
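The ppmtoy4m filter converts a PPM stream into a YUV stream, which is the input format required by mpeg2enc, the actual MPEG encoder. In the command above, -F sets the frame-rate code, -q the quantization level, -a the aspect-ratio code and -o the name of the output file; the full explanation of these and other command-line options can be found in the mpeg2enc man page that comes with the distribution. This command produces a compressed MPEG file, movie.mpg.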
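To give an idea of what an RGB-to-YUV conversion involves, here is a rough per-pixel sketch. It uses the common full-range BT.601 (JPEG-style) coefficients, which is an assumption on my part: ppmtoy4m may scale the values differently, and it also reduces the resolution of the two chroma planes, which this sketch does not attempt. The function names clamp_to_byte and rgb_to_ycbcr are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Rounds to the nearest integer and clamps the result to the range 0..255.
std::uint8_t clamp_to_byte(double v) {
    const long rounded = std::lround(v);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, rounded)));
}

// Converts one RGB pixel to luma (Y) and two chroma components (Cb, Cr)
// using full-range BT.601 coefficients (assumed here for illustration).
std::array<std::uint8_t, 3> rgb_to_ycbcr(std::uint8_t r, std::uint8_t g,
                                         std::uint8_t b) {
    const double y  = 0.299 * r + 0.587 * g + 0.114 * b;
    const double cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b;
    const double cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b;
    return std::array<std::uint8_t, 3>{
        {clamp_to_byte(y), clamp_to_byte(cb), clamp_to_byte(cr)}};
}
```

For gray-scale data, where the three color components are equal, both chroma values stay at the neutral level of 128 and all of the information ends up in the luma plane.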
The resulting file can be played with any MPEG player. Depending on the degree of compression and the type of data, I usually get good-quality movies that are more than 200 times smaller than their raw originals; that is, a gigabyte of raw data may shrink to only a few megabytes. Such an MPEG file can be placed on the Web for quick downloads. It can be played directly from most browsers on most operating systems.
There are two popular MPEG players freely available for Linux: mpeg_play and xanim. They may already be included in your Linux distribution or can be found at rpmfind.net. Xanim can play a variety of video formats, but it skips certain MPEG frames (it shows only I-frames). This is why I use mpeg_play to view my movie files. For example, typing mpeg_play -framerate 5 movie.mpg & will play our movie at a rate of five frames per second.
RPMfind, a large on-line database of Linux packages.
Roman Zaritski is an Assistant Professor of Computer Science at Montclair State University in New Jersey. His interests include numerical modeling and cluster computing.
email: zaritski@roman.montclair.edu