Geek Guide: Machine Learning with Python
I first heard the term “machine learning” a few years ago, and to be honest, I basically ignored it at the time. I knew that it was a powerful technique, and I knew that it was in vogue, but I didn’t know what it really was: what problems it was designed to solve, how it solved them and how it related to the other sorts of issues I was working on in my professional (consulting) life and in my graduate-school research.
But in the past few years, machine learning has become a topic that most will avoid at their professional peril. Despite the scary-sounding name, the ideas behind machine learning aren’t that difficult to understand. Moreover, a great deal of open-source software makes it possible for anyone to use machine learning in their own work or research. I don’t think it’s an overstatement to say that machine learning already is having a huge impact on the computer industry and on our day-to-day lives.
In this ebook, I introduce the basic ideas behind machine learning and show how you can use Python to apply machine learning ideas to a number of different problems. I hope by the time you finish reading this guide, you’ll not only understand what machine learning aims to do, but also how to apply it to your own work and research.
What Is Machine Learning?
Before doing anything else, let’s define the term: “machine learning” sounds somewhat ominous, suggesting a Matrix-like world in which the machines have taken over. But machine learning, at least as our current world sees it, is a mechanism by which computers can put inputs into categories.
Wait, that’s it? No, but that’s a very good starting point for thinking about machine learning.
Human minds basically are pattern-matching machines and excel at finding commonalities among different types of inputs. Getting a computer to perform such categorization tasks is more than just an impressive trick. It means that computers can look through a large number of inputs and assign each one to a category.
And, of course, if there’s something that computers do better than people, it’s look through large quantities of data.
A related use of machine learning is to predict outputs based on inputs with some degree of certainty. So if I present you with an input value—a child’s age, for example—then you can predict that child’s height. Will your prediction be exact? No, but that’s okay; machine learning uses statistical reasoning. Thus, you’re looking for likely outcomes, not definite outcomes.
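The age-to-height prediction described above can be sketched with a simple least-squares line fit. This is only a minimal illustration, not the method the guide itself goes on to use: the age and height values are invented for the example, and the function names `fit_line` and `predict_height` are hypothetical.

```python
# Illustrative only: these age/height pairs are invented for the example,
# not real measurements.
ages = [2, 4, 6, 8, 10]              # years
heights = [86, 102, 115, 128, 138]   # centimeters


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the shared 1/n cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, heights)


def predict_height(age):
    """Return the fitted line's estimate, a likely value rather than an exact one."""
    return slope * age + intercept


print(round(predict_height(7), 1))
```

With this invented data, the fitted line estimates about 120.3 cm for a seven-year-old. No child in the sample is exactly that tall at that age, which is the point: the model gives a likely outcome, not a definite one.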
Because this is something that statisticians have been doing for years, there definitely are people who ask how machine learning is different from just statistics. One possible answer is that regression, one of the cornerstones of statistics, is just one type of model used in machine learning.
For example, let’s say you’re a credit-card company and you’re trying to determine whether a purchase is legitimate or fraudulent. Too many false positives (legitimate purchases flagged as fraud), and your customers will be angry. Too many false negatives (fraudulent purchases that slip through), and you’ll soon be out of business. Machine learning makes it possible to analyze someone’s purchase history and determine whether a purchase is likely to be good or bad.
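To make the false-positive and false-negative trade-off concrete, here is a small sketch that counts each kind of outcome for a fraud detector. The labels are invented for illustration, and `confusion_counts` is a hypothetical helper name, not part of any library.

```python
# Hypothetical labels, invented for illustration. True means "fraudulent".
actual    = [False, False, True, False, True,  False, True, False]
predicted = [False, True,  True, False, False, False, True, False]


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for paired boolean labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["tp"] += 1   # fraud correctly flagged
        elif guess and not truth:
            counts["fp"] += 1   # legitimate purchase wrongly flagged
        elif not guess and truth:
            counts["fn"] += 1   # fraud that slipped through
        else:
            counts["tn"] += 1   # legitimate purchase correctly allowed
    return counts


counts = confusion_counts(actual, predicted)
print(counts)
```

Here the detector wrongly flags one legitimate purchase (an annoyed customer) and misses one fraudulent purchase (lost money). Tuning a model usually means deciding which of those two errors is more costly.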
Another common and famous example is that of identifying e-mail spam. It used to be that spam was not only obnoxious, but also easy to identify. Today, spammers use a variety of techniques to make their e-mail look legitimate. Machine learning allows a computer to accumulate information over time, getting an increasingly clear picture of what is considered a legitimate message.
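One way to picture a filter that "accumulates information over time" is a toy naive Bayes classifier that updates word counts with every labeled message it sees. This is a sketch under stated assumptions, not how any particular mail system works: the class name `TinySpamFilter`, the four training messages and the whitespace tokenization are all invented for illustration, and it expects at least one message of each label before classifying.

```python
import math
from collections import Counter


class TinySpamFilter:
    """A toy naive Bayes text classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        # Each newly labeled message updates the running counts.
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        # Prior: how common this label is among the messages seen so far.
        score = math.log(self.message_counts[label] / total_messages)
        for word in words:
            # Add-one smoothing so an unseen word doesn't zero out the score.
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def classify(self, text):
        words = text.lower().split()
        return max(("spam", "ham"),
                   key=lambda label: self._log_score(words, label))


# Invented training messages; "ham" is the usual nickname for legitimate mail.
spam_filter = TinySpamFilter()
spam_filter.learn("win a free prize now", "spam")
spam_filter.learn("free money click now", "spam")
spam_filter.learn("lunch meeting moved to noon", "ham")
spam_filter.learn("notes from the project meeting", "ham")

print(spam_filter.classify("free prize inside"))
print(spam_filter.classify("project meeting at noon"))
```

Calling `learn` again with more labeled messages shifts the counts, so the filter's picture of legitimate versus unwanted mail sharpens as it sees more examples.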
And of course, if you’ve bought anything on-line in the last decade, you’ve likely been told that “people who bought this product also bought...”, followed by a long list of things that, when you think about it, actually are of interest to you. This sort of categorization also can be attacked using machine learning. As more information is fed into the system, it can make increasingly accurate predictions of what someone is likely to want to buy (or already has bought from another store).
As you can see, the number and types of problems that can be solved using machine learning are large and varied. Imagine going back to when Claude Shannon and others first proposed that people could encode Boolean logic in electrical circuits. Would you have imagined that today we would be holding powerful computers (mobile phones) in our pockets, sharing videos and e-mail messages effortlessly and globally? In the same way, we’re only at the start of a revolution in machine learning, and it remains to be seen just how far this will go.