Hack and / - A Little Spring Cleaning
No matter how big your hard drives are, at some point you're going to look at your storage and wonder where all the space went. Your /home directory is probably a good example. If you are like me, you don't always clean up after yourself or organize immediately after you download a file. Sure, I have directories for organizing my ISOs, my documents and my videos, but more often than not, my home directory becomes the digital equivalent of a junk drawer with a few tarballs here, an old distribution ISO there and PDF specs for hardware I no longer own. Although some of these files don't really take up space on the disk—it's more a matter of clutter—when I'm running out of storage, I'd like to find the files that take up the most space and decide how to deal with them quickly. This month, I introduce some of my favorite commands for locating space-wasting files on my system and follow up with common ways to clear some space.
First, let's start with file clutter in your main home directory. Although all major GUI file managers these days make it easy to sort a directory by size, because I'm focusing on command-line tips, let's cover how to find the largest files in the current directory via the old standby, ls. If you type:
$ ls -lSh
you'll get a list of all the files in your current directory sorted by size, largest first. Of course, if you have a lot of files in the directory, the largest ones at the top of the list will scroll right off the screen, so I typically like to type:
$ ls -lSh | head
to see only the top ten largest files. Now, this is pretty basic, but it's worth reviewing, as you'll use these commands over and over again to track down space-wasting files. Depending on how you structure your home directory, you probably won't find all the large files together. It's more likely that they are scattered into different subdirectories, so you then need to scan through your directory structure recursively, tally up the disk space used in each directory, and sort the output. Luckily, you don't have to resort to ls for this; du does the job quite nicely. For instance, one common use for du that I see referenced a lot is the following:
$ du -sh *
This summarizes each file and subdirectory you list as arguments (in this case, everything within my current directory) and then lists them one by one with human-readable file sizes (the -h option converts the sizes into kilobytes, megabytes, gigabytes and so forth, so they are easier to read). Here's some example output from that command:
456K    bin
28K     Default-Compiz
16K     hl4070cdwcups-1.0.0-7.i386.deb
344K    hl4070cdwlpr-1.0.0-7.i386.deb
27M     images
60K     LexmarkC750.ppd
850M    mail
Although you certainly could work with this information, it would be much easier if it were sorted. To do that, replace the -h argument with -k, and then pipe the output to sort:
$ du -sk * | sort -n
16      hl4070cdwcups-1.0.0-7.i386.deb
28      Default-Compiz
60      LexmarkC750.ppd
344     hl4070cdwlpr-1.0.0-7.i386.deb
456     bin
10224   writing
26948   images
869588  mail
This works better, because now I can see that my local e-mail cache is taking up the bulk of the storage; however, next I would need to change to the mail directory and run the command again, over and over, until I narrow it down to the subdirectory that has the large files. That's why I normally skip the above commands and go straight for what I affectionately call the duck command:
$ du -ck | sort -n
. . .
87704   ./.mozilla
87704   ./.mozilla/firefox
119236  ./mail/example.net/sent-mail-2004
119236  ./mail/example.net/sent-mail-2004/cur
869852  ./mail
869852  ./mail/example.net
1064100 .
1064100 total
Without the -s option, du recurses into each subdirectory like before, except it reports a running tally of the space used by every directory down the tree, not just the first level; the -c option adds a grand total at the end. Because parent directories show up in the sorted output right next to the children responsible for their size, it's easy to drill down to the actual directory that consumes the most space, which in this example seems to be ./mail/example.net/sent-mail-2004/cur. If I wanted to clean up files there, I could cd to that directory and then run the ls commands I used above to see which files used the most space.
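For instance, the drill-down might look like this, reusing the path from the output above:
$ cd ./mail/example.net/sent-mail-2004/cur
$ ls -lSh | head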
The duck command works great to discover how the space is being used in your home directory, but if you are like me, your home directory is actually on a different partition from the root filesystem. If root is filling up, you still can use the duck command (with a slight tweak) to see which directories consume the most space. You need root privileges to scan all the directories in your root filesystem, so use either su or sudo -s (depending on how you get root permissions) before the duck command:
# cd /
# du -ckx | sort -n
. . .
243920  ./usr/lib/openoffice
277600  ./var/cache/apt
296376  ./var/cache
475144  ./var
952096  ./usr/share
1099264 ./usr/lib
2259332 ./usr
2908804 .
2908804 total
The extra -x argument I added above tells du to stay on one filesystem—in this case, the root filesystem. Otherwise, if you don't specify -x and you have /home or other directories on different filesystems, du will scan through those partitions as well, so you ultimately will have to weed them out as you sift through your results. As you can see from this output, the /usr directory takes up the bulk of the space on my system, with /usr/lib using almost half the space inside /usr. Also note that /var/cache/apt is listed here—more on how to deal with that below.
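Incidentally, if the sorted list gets long, you can tack one more pipe onto the duck command and see only the biggest offenders at the end; for example:
# du -ckx | sort -n | tail -n 15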
Now that you know how your storage is being used, here are a few common-sense ways to manage those files and free some space. If you do Linux programming, build software from source or regularly download tarballs, you probably have these tarballs lying around along with their extracted directories. One easy way to free up space is to delete either the tarball or the extracted directory. If you build your own kernels, you probably have a number of old kernel source trees in /usr/src that you won't ever use again and could stand to delete.
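If you aren't sure where all those old tarballs ended up, find can round them up for you. Here's a sketch assuming GNU find and a few common extensions (adjust the patterns to match what you actually download):
$ find ~ \( -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar.bz2' \) -exec ls -lSh {} +
The -exec ... + form hands the matches to ls in batches, so each batch comes back sorted largest first.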
Another common space-waster is old ISO files. Do you really still need that Red Hat 7.2 ISO? If so, burn an archive copy or two to CD, and then delete the image. Along those same lines, audio files often end up duplicated in a directory you made for a mix CD, and if you play with video conversion tools like I do, you probably have video files in various stages of being transcoded. If you are done with a project, why not delete them and save the space?
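The same trick helps track down forgotten ISOs and leftover video files. As a sketch, this lists every file over 500MB under your home directory (the threshold is arbitrary, and the M suffix is a GNU find extension):
$ find ~ -type f -size +500M -exec ls -lh {} +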
On desktops, but especially on servers, one of the most common places you will find wasted space is in log directories. Logs definitely can be useful, but some logs and some levels of debugging are useful only immediately after a bug is found; the rest of the time they can be truncated or archived safely. Take a look in /var/log/, and see how many large uncompressed log files you have. If the file is no longer being used, you should gzip it. You would be amazed how far you can compress incredibly large log files if you haven't tried it before. If you aren't sure whether a log file is still being written to, use lsof to check:
# lsof | grep "/path/to/filename"
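To spot compression candidates in the first place, find helps here too; this sketch lists files in /var/log over 10MB that aren't already gzipped (both the threshold and the pattern are just examples):
# find /var/log -type f -size +10M ! -name '*.gz' -exec ls -lh {} +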
If you regularly find yourself cleaning up or gzipping the rotated log files in /var/log (they append .0, .1 and so on as they are being rotated), then edit /etc/logrotate.conf and enable compression. Usually, this simply requires finding the commented line labeled #compress and uncommenting it.
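After the edit, that line should simply read:
compress
Note that per-log configuration files under /etc/logrotate.d can override the global setting, so if a particular log still isn't being compressed, check there as well.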
Another great place to free up space is in your package manager's local package cache. For instance, in the case of Debian-based systems, the packages apt downloads are cached in /var/cache/apt/archives. You could go to that directory and remove the files manually, or you simply could become root and type:
# apt-get autoclean
to remove all the cached packages you no longer need (apt-get autoclean deletes only package files that can no longer be downloaded; to empty the cache completely, use apt-get clean instead). If you have a distribution that uses yum, the following two commands will clear out the cached headers and packages from your system:
# yum clean headers
# yum clean packages
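Depending on your version of yum, you also may be able to combine both steps with:
# yum clean all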
Finally, archiving can be a good solution when cleaning your storage space. If you have a local file server or one machine with more storage than the rest, why not make sure all your large files exist only there and then access them over the network?
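rsync is well suited to that kind of move; as a sketch, with isos/ and fileserver standing in for your own directory and archive host, the --remove-source-files option deletes each local file only after it has transferred successfully:
$ rsync -av --remove-source-files isos/ fileserver:/archive/isos/
Alternatively, burn large files you want to keep but don't immediately need to CD or DVD. Once you are done, you'll have plenty of newly freed space—hopefully, enough to last you until next spring.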
Kyle Rankin is a Senior Systems Administrator in the San Francisco Bay Area and the author of a number of books, including Knoppix Hacks and Ubuntu Hacks for O'Reilly Media. He is currently the president of the North Bay Linux Users' Group.