Finding Linux Software
One of the most popular questions asked throughout the comp.os.linux Usenet hierarchy is “Where can I find a program to do [insert your favorite task here]?” About half of these questions result in a three-month thread concerning the merits of WYSIWYG versus markup-style text editing, while the other half slide into obscurity without any answers at all.
Finding software for Linux on the Internet is pretty easy, but there are three principles about free software you should accept before you start looking:
1) Something close to what you want exists. There are thousands of gigabytes of anonymous ftp space on the Internet and one of them will contain something that you'll consider useful—if you ever find it.
2) Whatever you find won't do everything you need. Most of the free programs you'll soon be wading through were written by one or two programmers whose needs are fully met by the programs they wrote.
3) Whatever you find will do more than you need. Unix software tends to accumulate obscure features at an astonishing rate.
Everyone's annual pilgrimage for an X-based Ami Pro clone illustrates these principles. While GUI-based, somewhat WYSIWYG word processing techniques for Linux do exist, none of them (ez, idoc, the TeX/LaTeX/xdvi/gs combination, or a groff/gs combination) will do exactly what you had in mind. Then again, you can balance your checkbook in the middle of ez and use TeX to get every other character to print upside down.
The first place to look for Linux software is the Linux Software Map (LSM), maintained by Lars Wirzenius (lars.wirzenius@helsinki.fi). It's a collection of lsm entries supplied by people who submit files to the primary Linux archives (sunsite.unc.edu and tsx-11.mit.edu). The LSM includes descriptions of packages and full path names to where packages can be found on those two sites.
The easiest way to search the LSM is via the World Wide Web. A number of Web search engines are available, such as siva.cshl.org/lsm/lsm.html. This site provides a searchable index, which most Web browsers support. Another Web search engine is available at harvest.cs.colorado.edu/brokers/lsm/query.html. This interface is based on forms, and not all browsers support it (though lynx, Mosaic, and Netscape all do). When a match is found, hypertext links let you download the files that matched without leaving the browser.
If you don't have Web access (or if you just like doing things on the command line) you can ftp the latest LSM as one large text file from sunsite.unc.edu:/pub/Linux/docs/LSM.gz. A few Linux-based tools to make searching this file easier are available for ftp from sunsite.unc.edu in the directory /pub/Linux/search.
Each package in the LSM has description and keyword fields which provide an excellent basis for searching. Looking for “sound mixer” via the Colorado Web site found 15 matches, showing how effective searching the LSM can be.
However, the LSM quickly gets out of date. The filenames and version numbers in it are often wrong, so don't think you're finished. The site names, directory paths, and package names can be used to find the software you're looking for even if the specific filename is wrong. Use ftp to connect to the site and cd to the directory the LSM mentions, and you'll probably find a few different packages that meet your needs.
If searching the LSM fails to find what you need, look at the sunsite.unc.edu and tsx-11.mit.edu ftp sites. Both of those sites have large Linux directory trees (in /pub/Linux and /pub/linux respectively) with both binaries and sources. Most CD-ROM distributors of Linux include a CD-ROM of one or both sites, which provides a fast alternative to ftp.
Sunsite provides a file that is especially handy for finding software—/pub/Linux/INDEX.whole.gz. It contains all of sunsite's text INDEX files concatenated together, meaning it has a description of almost all of the files in sunsite's Linux directory. Searching this file for keywords with grep or a text editor lets you skim all of sunsite's 2 gigabytes very quickly. Looking for the word “mixer” in this file found 10 lines that mention the word, one of which is the name of a directory full of audio mixers! If sunsite has what you're looking for, this method is sure to find it.
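For readers without grep handy, the same keyword search can be sketched in a few lines of Python. This assumes you have already fetched INDEX.whole.gz into the current directory; the function name grep_gz is just an illustration, and the latin-1 encoding is a guess chosen so that no byte in the file causes a decoding error.

```python
import gzip

def grep_gz(path, keyword):
    """Yield (line_number, line) for each line containing keyword, ignoring case."""
    needle = keyword.lower()
    # "rt" decompresses and decodes on the fly; latin-1 is an assumed encoding.
    with gzip.open(path, "rt", encoding="latin-1", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                yield number, line.rstrip("\n")
```

Looping over grep_gz("INDEX.whole.gz", "mixer") and printing each pair gives roughly what a case-insensitive grep with line numbers would show.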
Tsx-11.mit.edu's archive doesn't have as nice a way of searching it. It provides two files, ls-lR and find-ls.gz, that list all of the files in the system. If you know (or can guess) the name of the file you want, these files will help you find it.
If none of these techniques help, it's time to look outside of the Linux world. I've found that the Usenet FAQs provide an excellent source of information about free software. The comp.compilers FAQ has an extensive list of free language tools, including compilers and interpreters. Likewise, the rec.sport.football FAQ tells of an ftp site (ftp.vnet.com) that contains software for football fans, and the comp.db FAQ has a large list of free database packages.
The hardest part about finding an FAQ is finding the name of a newsgroup it's been posted to. If your newsreader lets you search for newsgroups via a regular expression, that's the best way. Both rn and trn let you do this with the “a” command at the newsgroup selection level. Typing “a” followed by a regular expression returns the names of all newsgroups you have access to which match that expression. For example, to look for newsgroups related to computer sound, “a comp.*sound” (which returns 6 matches) would be a good way to start. If you're not familiar with regular expressions, the GNU grep man page provides a good introduction, as did the introduction to grep in Linux Journal issue 18. If your newsreader doesn't let you search for newsgroups, try grepping the .newsrc file in your home directory (if you have one).
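The .newsrc approach can also be sketched in Python. The sketch assumes the usual .newsrc layout, in which each line begins with a newsgroup name followed by ":" for a subscribed group or "!" for an unsubscribed one; the function name matching_groups is invented for this example.

```python
import re

def matching_groups(newsrc_lines, pattern):
    """Return newsgroup names from .newsrc lines that match a regular expression."""
    regex = re.compile(pattern)
    groups = []
    for line in newsrc_lines:
        # The group name runs up to the first ':' (subscribed) or '!' (unsubscribed).
        match = re.match(r"([^\s:!]+)[:!]", line)
        if match and regex.search(match.group(1)):
            groups.append(match.group(1))
    return groups
```

Feeding it the lines of your own .newsrc with the pattern "comp.*sound" lists the same kind of groups the “a” command would, limited to those your .newsrc already knows about.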
Once you get the name of the newsgroup whose FAQ you'd like to peruse, the rest is easy. The site rtfm.mit.edu collects all of the FAQs that are posted to the net and automatically updates them as new versions are released. This site is very busy and supplies a list of mirrors if you can't log in anonymously.
The directory /pub/usenet contains a directory for each newsgroup which has one or more FAQs. Those directories contain the FAQs themselves.
If you couldn't find the name of a newsgroup that might be helpful, all is not lost. Rtfm has a separate FAQ directory tree identical to Usenet's hierarchy. If you ftp in and cd to /pub/usenet-by-hierarchy, you get a directory with a subdirectory for each top-level news domain (such as comp, alt, rec and talk). Those subdirectories contain further subdirectories. For example, the newsgroup comp.sys.ibm.pc.soundcard.misc would have its FAQ stored in the directory /pub/usenet-by-hierarchy/comp/sys/ibm/pc/soundcard/misc on rtfm.mit.edu.
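The mapping from a newsgroup name to its directory is mechanical: each dot becomes a slash. A small Python sketch, with the hypothetical function name faq_directory, makes the rule explicit.

```python
def faq_directory(newsgroup, root="/pub/usenet-by-hierarchy"):
    """Map a newsgroup name to its directory in the by-hierarchy FAQ tree."""
    # Each dot-separated component of the name becomes one directory level.
    parts = [part for part in newsgroup.strip().split(".") if part]
    if not parts:
        raise ValueError("empty newsgroup name")
    return root + "/" + "/".join(parts)
```

Trimming components from the right of the result gives the parent directories, which is handy for browsing a whole branch of the hierarchy when you only know the general topic.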
Another resource for finding programs is the comp.archives newsgroup. It contains regular postings from a variety of archive sites detailing their contents. Reading through those posts or searching through its archives (such as the one at ftp://wuarchive.wustl.edu/usenet/comp.archives) can help narrow down the sites to wade through.
The final choice (and the hardest to use) is the archie search service. While archie was designed for ftp searches, it is an old service and fairly hard to use. The various archie databases on the net each contain file lists from about 1200 ftp sites throughout the world, providing a total of around 2.5 million unique filenames that may be searched (the whole database is currently around 400MB). If a file can be ftped from the net, archie knows about it.
The World Wide Web provides the easiest way to conduct archie searches. A huge list of archie gateways on the web is available at web.nexor.co.uk/archie.html. While many of them support forms-based browsers, most also provide searchable indexes.
Several direct archie search clients are also available from archie.mcgill.ca in the /pub/archie-clients directory and come with some Linux distributions. They allow searching either from the command line or via an X search engine.
Finally, there are several telnet interfaces to archie databases. For example, telnetting to archie.mcgill.ca provides an interactive archie search session. While it can be somewhat cryptic, typing “help” should help you figure it out.
When you're searching archie, it is important to remember that it indexes only file paths. If you know the file name you're looking for, this isn't a problem; otherwise, you have to guess what a likely name is.
You'll find that when you search on a term that finds matches, the matches will come in a flood. Using one of the archie WWW interfaces to search for “mixer” as a substring found 95 matches, which is what I had the match limit set to. All of the sites it found were in Europe, suggesting that archie hadn't begun to search any of the other continents yet!
The final thing to know about archie is that it can be very slow. The sheer volume of information it contains makes it an extremely popular way of searching the net, and the resulting load means search times of many minutes, even at odd times of the night.
The Linux Software Map, the primary Linux ftp sites, Usenet FAQs, comp.archives, and archie all provide useful ways of finding freely available Linux software on the net.
Erik Troan (ewt@redhat.com) maintains the sunsite.unc.edu /pub/Linux directory tree and is a developer at Red Hat Software.