Working with YouTube and Extracting Audio
In my last few articles, I've been exploring the capabilities of ImageMagick, showing that just because you're working on a command line doesn't mean you're stuck processing only text. As I explained, ImageMagick makes it easy to work with images, adding watermarks and analyzing content far more accurately than with the standard Linux file command, and much, much more.
Continuing in a similar vein, I want to look at audio and video in this article. Well, maybe "listen" to audio and "look" at video, but again, I'm still focusing on the command line, so in both instances, player/viewer apps are required.
YouTube to MP3 Audio
As someone who watches a lot of lectures online, I'm also intrigued by the online services that can extract just the audio portion of a YouTube or Vimeo video and save it as an MP3. Listening to a lecture while driving is far safer than trying not to watch a video on the move, for example.
Since there are so many live concert performances online, many people also like to use a video-to-MP3 service to add those songs to their music libraries.
Note: be leery of copyright issues with any download and conversion of content. Just because it's on Vimeo, YouTube or another online service doesn't mean you have permission to extract the audio or even download it and save it on your computer.
Let's start with the most basic functionality: downloading a video from YouTube so you can watch it on your Linux system. There are a lot of browser plugins and even websites devoted to this task, but who wants to risk malware or be plagued by porn site ads? Yech.
Fortunately, there's a terrific public domain program called youtube-dl on GitHub that covers all your needs. At its most basic, it lets you download video content from YouTube and a variety of other online video repositories, but as you'll learn, it can do quite a bit more.
You can grab a copy for your system from the project's GitHub page.
Let's start by downloading a copy of one of my own YouTube videos. It's a review of the splendid 1More quad-driver headphones, and its URL is https://www.youtube.com/watch?v=BFL1E77hTHQ.
As an aside: I have a YouTube channel where I review consumer electronics and gadgets. You should subscribe! Find all my videos at http://youtube.com/askdavetaylor.
YouTube has a bunch of ways it can assemble a URL, however, including using its URL-shortener youtu.be, but fortunately, youtube-dl can handle the variations.
Downloading a copy of the video to the current working directory is now as simple as:
youtube-dl 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
The full output of the command is a bit, um, hairy, however:
$ youtube-dl 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
WARNING: Requested formats are incompatible for merge and
will be merged into mkv.
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f137.mp4
[download] 100% of 118.74MiB in 02:49
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f251.webm
[download] 100% of 4.81MiB in 00:03
[ffmpeg] Merging formats into "1More Quad Driver In-Ear
Headphones Reviewed-BFL1E77hTHQ.mkv"
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f137.mp4 (pass -k to keep)
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f251.webm (pass -k to keep)
$
You can wade through the output messages, but it's the message from companion open-source program ffmpeg that's most important: "Merging formats into ... mkv".
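In other words, the downloaded video ends up as an MKV file in this case, because the video and audio streams arrived separately (one as MP4, the other as WebM) and had to be merged, as the warning in the output explains. MKV is the file extension of the increasingly popular Matroska Multimedia Container format, and it works with a lot of video players (including VLC from VideoLAN, my favorite cross-platform video player).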
A quick ls reveals the result and that the default filename is taken from the title of the video, something that might not be particularly desirable:
$ ls -lh *mkv
-rw-r--r-- 1 taylor staff 124M Jan 31 16:56 1More Quad
Driver In-Ear Headphones Reviewed-BFL1E77hTHQ.mkv
Do you prefer to specify the output name and have the output file in MP4 (MPEG4) format instead? That's doable:
$ youtube-dl -o 1more-review.mp4 -f mp4 \
'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
[download] Destination: 1more-review.mp4
[download] 100% of 57.63MiB in 00:27
As a bonus, you get less ominous informational messages from the program too, so it's cleaner. And the output, sure enough, is in MP4 format:
$ ls -lh *mp4
-rw-r--r--@ 1 taylor staff 58M Jan 31 16:57 1more-review.mp4
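As a second bonus, the MP4 version of the downloaded video is only 58M as opposed to the 124M of the MKV-merged version. That's most likely less about encoding efficiency than about -f mp4 picking a single ready-made MP4 file (note there's only one download and no merge step this time), which may well be a lower-resolution rendition than the separately downloaded streams.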
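If you're curious which renditions are on offer before picking one, here's a minimal sketch. The -F flag (short for --list-formats) and -f are real youtube-dl options; the function names and the YTDL variable are my own inventions for illustration, with YTDL there so you can preview the command with echo instead of hitting the network.

```shell
#!/bin/bash
# Assumption: YTDL is a made-up variable naming the downloader, so the
# command can be previewed with YTDL=echo instead of actually running it.
YTDL="${YTDL:-youtube-dl}"

# List every format the site offers for a URL (-F is --list-formats).
#   $1 = video URL
list_formats() {
  "$YTDL" -F "$1"
}

# Download one specific format code taken from that listing.
#   $1 = format code, $2 = output filename, $3 = video URL
get_format() {
  "$YTDL" -f "$1" -o "$2" "$3"
}

# Example (commented out so nothing is downloaded by accident):
#   list_formats 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
```

The format codes in the listing are the same ones that show up in the intermediate filenames above (f137 and f251), so the listing is a handy way to work out what each download actually contained.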
So how do you watch it? Most likely, do a double-click and it'll be up and running, as shown in Figure 1.
Figure 1. Downloaded YouTube Video Playing in Ubuntu Player
That's easy enough, but the original goal was to be able to extract just the audio component of a YouTube video, so let's look at that task.
Downloading Just the Audio Track
Since I've already started to delve into the command-line options for the youtube-dl program, it's not a leap to find out that there's yet another command-line option that lets you save just the audio portion of a video:
$ youtube-dl -x --audio-format mp3 \
'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.webm
[download] 100% of 4.81MiB in 00:07
[ffmpeg] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.mp3
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.webm (pass -k to keep)
$ ls -lh *mp3
-rw-r--r-- 1 taylor staff 4.0M Jan 31 18:22 1More Quad
Driver In-Ear Headphones Reviewed-BFL1E77hTHQ.mp3
That's easy enough, and the output is delightfully small: 4MB total. The problem is, there's the same awkward naming issue, so the addition of -o output-filename definitely will be a win. But, really, youtube-dl makes these tasks trivially easy, as long as you're willing to figure out all of its command-line options.
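Here's a rough sketch of combining audio extraction with a chosen output name. The get_audio function and the YTDL variable are hypothetical names of my own. The one real assumption worth flagging is the %(ext)s field: it's part of youtube-dl's output-template syntax, and I'm using it here so the program fills in the extension itself at each stage rather than having .mp3 hard-coded onto a file that starts life as something else.

```shell
#!/bin/bash
# Assumption: YTDL is a made-up variable naming the downloader, so the
# command can be previewed with YTDL=echo instead of actually running it.
YTDL="${YTDL:-youtube-dl}"

# Save just the audio track as MP3 under a chosen base name.
#   $1 = base name for the output file (no extension)
#   $2 = video URL
get_audio() {
  # %(ext)s is an output-template field that youtube-dl replaces with
  # the appropriate file extension.
  "$YTDL" -x --audio-format mp3 -o "$1.%(ext)s" "$2"
}

# Example (commented out so nothing is downloaded by accident):
#   get_audio 1more-review 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
```

With the example call, the finished file should end up as 1more-review.mp3 in the current directory.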
Instead of worrying about the obscure command-line flag notation, let's just write a script that does the heavy lifting for you. I'm going to call it ytdl for "youtube download", and by default, it'll accept just a URL and output an MP4 format video file that has the same name as the YouTube shortcut (for example, the above video would become BFL1E77hTHQ.mp4).
Add a second parameter, and that becomes the output filename. Specify the -a flag, and it saves audio output only, in MP3 format instead.
Let's start with a usage block if the user forgets to specify anything or just needs a simple reminder:
if [ $# -eq 0 ] ; then
  # no arguments at all: show a reminder and quit
  echo "Usage: $(basename "$0") {-a} YouTubeURL {outputfile}"
  echo " where -a extracts the audio portion in MP3 format"
  exit 1
fi
That's easy enough. The script is also going to use some predefined combinations of flags to make it easier to write:
youtubedl="/usr/local/bin/youtube-dl"
audioflags="-x --audio-format mp3"
videoflags="-f mp4"
flags=$videoflags # default set of command flags
audioonly=0 # default is audio + video
If the user specifies the -a flag, audioonly will be set to true (that is, 1), and the default flags will switch from video to audio:
if [ "$1" = "-a" ] ; then
  audioonly=1
  flags=$audioflags
  shift
fi
You'll recall that the shift command moves all the parameters "down" one to the left, so $2 becomes $1 and so on. It's an easy way to process and discard parameters in a script, of course.
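If shift is new to you, this tiny demonstration shows the effect. The show_args function is purely illustrative and isn't part of the ytdl script.

```shell
#!/bin/bash
# Illustrative only: show how shift discards a leading -a flag.
show_args() {
  if [ "$1" = "-a" ] ; then
    shift          # drop -a; what was $2 is now $1
  fi
  echo "count=$# first=$1"
}

show_args -a URL out.mp3   # prints: count=2 first=URL
show_args URL              # prints: count=1 first=URL
```

Either way, by the time the rest of the code runs, $1 is the URL, so nothing after this point needs to care whether -a was given.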
The biggest block of code creates a default output filename from the YouTube URL:
if [ $# -eq 1 ] ; then
  # no output filename specified
  outfile=$(echo "$1" | cut -d= -f2)
  if [ $audioonly -eq 1 ] ; then
    outfile="$outfile.mp3"
  else
    outfile="$outfile.mp4"
  fi
else
  outfile="$2"
fi
This isn't the most robust code, because it assumes that the URL specified is in a format like the examples used herein, youtube-yadda-yadda?value=shortcode. It extracts the shortcode and simply appends an appropriate filename suffix. There are better ways to do this, but that's okay; this'll work for now. Just realize that your output filename might be a bit weird if you have a very different type of YouTube URL or a URL from another site.
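For the curious, here's one possible way to make that step a bit more forgiving, using only shell pattern matching. It's a sketch, not a complete URL parser: the video_id function is a hypothetical helper of mine, and it assumes URLs look like watch?v=ID (possibly with extra parameters), youtu.be/ID or, failing that, something whose last path component makes a reasonable filename.

```shell
#!/bin/bash
# Hypothetical helper: derive a base filename from a video URL.
#   $1 = video URL
video_id() {
  local url="$1" id
  case "$url" in
    *"?v="* | *"&v="*)     # ...watch?v=ID or ...&v=ID
      id="${url#*[?&]v=}"
      ;;
    *youtu.be/*)           # short form: youtu.be/ID
      id="${url#*youtu.be/}"
      ;;
    *)                     # anything else: last path component
      id="${url##*/}"
      ;;
  esac
  id="${id%%[?&#]*}"       # strip trailing parameters or fragments
  echo "$id"
}

video_id 'https://www.youtube.com/watch?v=BFL1E77hTHQ&t=30'
video_id 'https://youtu.be/BFL1E77hTHQ'
```

Both calls print BFL1E77hTHQ. Another option is to sidestep the parsing entirely and let youtube-dl name the file via its %(id)s output-template field, at the cost of the script no longer knowing the filename in advance.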
And, finally, the actual invocation of the youtube-dl command:
$youtubedl $flags -o "$outfile" "$1"
That's it! Now you can download a video as simply as:
$ ytdl 'https://www.youtube.com/watch?v=5yXDzg_QDGw' wiper.mp4
And an audio portion with:
$ ytdl -a 'https://www.youtube.com/watch?v=5yXDzg_QDGw'
Nice, eh?
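In case it helps to see the pieces in one place, here's a sketch of how the fragments fit together. I've taken a few small liberties that aren't in the column itself: everything is wrapped in a function (so it returns rather than exits), the usage check runs after the -a handling so that ytdl -a with no URL also gets the reminder, and YTDL_CMD is a made-up override that lets you preview the command with echo.

```shell
#!/bin/bash
# ytdl: the fragments from this column gathered into one function.
# Assumption: YTDL_CMD is a made-up override for the downloader path,
# handy for previewing the command (YTDL_CMD=echo) without downloading.

ytdl() {
  local youtubedl="${YTDL_CMD:-/usr/local/bin/youtube-dl}"
  local audioflags="-x --audio-format mp3"
  local videoflags="-f mp4"
  local flags="$videoflags"   # default set of command flags
  local audioonly=0           # default is audio + video
  local outfile

  if [ "$1" = "-a" ] ; then
    audioonly=1
    flags="$audioflags"
    shift
  fi

  if [ $# -eq 0 ] ; then
    echo "Usage: ytdl {-a} YouTubeURL {outputfile}" >&2
    echo " where -a extracts the audio portion in MP3 format" >&2
    return 1
  fi

  if [ $# -eq 1 ] ; then
    # no output filename specified: use the text after the first "="
    outfile=$(echo "$1" | cut -d= -f2)
    if [ $audioonly -eq 1 ] ; then
      outfile="$outfile.mp3"
    else
      outfile="$outfile.mp4"
    fi
  else
    outfile="$2"
  fi

  # $flags is deliberately unquoted so it splits into separate words
  $youtubedl $flags -o "$outfile" "$1"
}

# To use this as a standalone script, add this as the last line:
#   ytdl "$@"
```

Setting YTDL_CMD=echo before calling ytdl prints the youtube-dl arguments instead of running them, which is a cheap way to check the flags and filename before committing to a download.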
I've way overrun my space for this column, but this is such a fun and simple script atop a terrific, powerful program that it's worth it, right? And now you know how to make YouTube work for you, rather than vice versa!