Explained: Prepare.ai Conference T-Shirts
When I wear my Prepare.ai Conference T-shirt, I get a lot of curious looks and explanation requests. So here’s a brief origin story.
You can see below left that the shirt lists the names of three of today’s most popular machine learning (ML) algorithms: neural networks, naive bayes, and boosted decision trees. The layout is modeled on a trending Beatles T-shirt with a simple design that we really liked (shown at right).
We chose the three algorithms for printing based on how much they are used in industry… and also on how cool they sound.
I’ll give a brief explanation of each, sorted from bottom to top because that’s probably the easiest order to understand them in.
But first, what is machine learning?…
In a nutshell, machine learning (ML) creates a mapping that remembers how to calculate a set of outputs when given a set of inputs. An ML algorithm “trains” or “learns” by looking at paired inputs and outputs over and over while gradually tweaking the internal mapping. Once it’s done training, it freezes that mapping so it can look at new inputs and calculate what the corresponding, unknown outputs should be. This is called making “predictions” or “inferences.”
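That training loop can be sketched in a few lines of Python. This is a toy illustration, not any particular library’s API: the “mapping” is a single weight `w`, and the paired examples and learning rate are invented numbers.

```python
# A minimal sketch of "training": repeatedly look at paired inputs and
# outputs, and tweak the internal mapping so its predictions move
# closer to the known outputs.

pairs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, output) examples
w = 0.0                                       # the internal mapping: predict y as w * x

for _ in range(200):                          # look at the examples over and over
    for x, y in pairs:
        prediction = w * x
        error = prediction - y
        w -= 0.01 * error * x                 # gradually tweak w to shrink the error

# Training done: the mapping is frozen, and we can make a
# "prediction" for a new, unseen input.
new_prediction = w * 4.0
```

After training, `w` settles near 2, so the frozen mapping predicts roughly double any new input, which is exactly the pattern hidden in the example pairs.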
So basically, ML algorithms learn by example. Let’s dig into our marquee algorithms now…
Boosted Decision Trees
A single decision tree navigates through a series of variables with if/then logic. For example, a decision tree for predicting home price might consider square footage, age, and number of bedrooms. Multiple homes are fed into this tree during training, and the paths, nodes, and threshold values are adjusted to maximize its ability to predict the price based on those three variables.
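A single tree’s if/then logic looks like this in code. The variables match the home-price example above, but the thresholds and dollar values here are made up for illustration; a real tree learns them from training data.

```python
# A hand-written decision tree for home-price prediction.
# In practice, training adjusts the paths, nodes, and thresholds;
# the values below are invented for illustration.

def predict_price(sqft, age, bedrooms):
    """Walk the tree with if/then logic and return a price estimate."""
    if sqft > 2000:
        if bedrooms >= 4:
            return 450_000
        return 380_000
    else:
        if age > 30:
            return 220_000
        return 290_000

big_new_home = predict_price(sqft=2400, age=10, bedrooms=4)
small_old_home = predict_price(sqft=1500, age=40, bedrooms=3)
```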
Once we understand a single decision tree, a whole family of more powerful, tree-based algorithms opens up. Possible enhancements include “bagging,” which trains multiple trees on different random sub-samples of the training data and combines them into a “forest” that votes on an answer; randomly sampling which variables each tree is allowed to consider; and “boosting,” which builds trees sequentially so that each new tree concentrates on the training examples the previous trees predicted poorly. We can push boosting further with gradient boosting, where each new tree is fit to the residual errors of the ensemble built so far, nudging the combined model step by step toward lower error. These versions of tree algorithms are illustrated in the diagram below.
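The boosting idea can be sketched compactly. This is a minimal, pure-Python illustration assuming squared-error loss, with one-split “stumps” standing in for full trees and invented toy data; real gradient boosting libraries add far more machinery.

```python
# A minimal sketch of gradient boosting for regression: each new
# "tree" (here, a one-split stump) is fit to the residual errors
# left behind by the ensemble so far.

def fit_stump(xs, ys):
    """Find the single split on x that best predicts y (least squares)."""
    best = None
    for split in xs:
        left = [y for x, y in zip(xs, ys) if x <= split]
        right = [y for x, y in zip(xs, ys) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lmean if x <= split else rmean)) ** 2
                  for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, rounds=20, lr=0.5):
    """Additive model: every stump corrects the previous residuals."""
    stumps = []
    residuals = list(ys)
    for _ in range(rounds):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - lr * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(lr * s(x) for s in stumps)

# Toy data: price (in $1000s) rising with square footage (in 100s of sqft)
xs = [10, 14, 18, 22, 26, 30]
ys = [200, 230, 260, 330, 400, 450]
model = boost(xs, ys)
```

Each round, the ensemble’s remaining error shrinks, which is exactly the “give more importance to what we got wrong” behavior described above.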
Naive Bayes

Naive Bayes is an esteemed veteran in the world of machine learning, built on venerable statistical principles. Typically used in classification problems, this algorithm makes a simplifying assumption that each input variable contributes an independent probability, making each variable “naive” to any dependencies or interaction effects between variables that might actually exist in the real world. The Bayes part comes from Bayes’ theorem, which lets us flip a conditional probability around: the probability of a class given the observed inputs is proportional to the probability of those inputs given the class, times the prior probability of the class. Thanks to the independence assumption, that input probability factors into a simple product of per-variable probabilities.
Sticking with our home price example, this means the score for a home landing in a particular price range equals the prior probability of that price range, multiplied by the independent probability of the observed square footage given that range, the independent probability of the observed home age given that range, and the independent probability of the observed number of bedrooms given that range. If you know each variable’s probability distribution, you simply multiply the terms together and pick the price range with the highest product.
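That multiplication is the entire prediction step. Here is a minimal sketch for classifying a home as a “high” or “low” price range; every probability below is invented for illustration, whereas in practice they would be estimated from training data.

```python
# A minimal naive Bayes sketch. Each feature gets its own independent
# probability table per class (the "naive" assumption), and prediction
# is just: prior * product of per-feature likelihoods.

likelihoods = {  # P(feature observed | price range), invented numbers
    "high": {"sqft_large": 0.8, "age_old": 0.3, "beds_many": 0.7},
    "low":  {"sqft_large": 0.2, "age_old": 0.7, "beds_many": 0.3},
}
priors = {"high": 0.4, "low": 0.6}  # P(price range) before seeing features

def posterior_scores(observed):
    """Multiply the prior by each independent likelihood."""
    scores = {}
    for label in priors:
        score = priors[label]
        for feature in observed:
            score *= likelihoods[label][feature]
        scores[label] = score
    return scores

scores = posterior_scores(["sqft_large", "beds_many"])
prediction = max(scores, key=scores.get)
```

A large home with many bedrooms scores 0.4 × 0.8 × 0.7 for “high” versus 0.6 × 0.2 × 0.3 for “low,” so the classifier picks “high.”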
Despite its age, this remains a powerful and very efficient algorithm, still used frequently in language-based tasks. It has the advantage of requiring very little explicit training relative to other methods and can perform well even with high-dimensional data.
Neural Networks

Neural networks (NNs) are composed of a series of interconnected neurons that receive input signals from the front of the network and pass weighted, modified signals along to subsequent neurons. The signals, essentially mathematical sums and products, can exhibit complex behavior because each neuron has an “activation” gate that can either stop or pass the signal depending on whether it falls below or exceeds a learned activation threshold. In this way, the network is loosely modeled on the functioning of the human brain, where a neuron only sends a nerve impulse down its axon if enough sensory stimuli cause it to “fire.”
If a NN has multiple layers between the input and output nodes, this is referred to as a “deep” neural network.
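A forward pass through a tiny “deep” network can be written out by hand. The weights and biases below are fixed, made-up numbers purely for illustration; training would adjust them to reduce prediction error.

```python
# A minimal forward pass through a tiny network with one hidden layer
# between the inputs and the output. Weights are invented; training
# would learn them.

def relu(x):
    """Activation gate: passes positive signals, stops negative ones."""
    return max(0.0, x)

def neuron(inputs, weights, bias):
    """Weighted sum of incoming signals, then the activation gate."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(inputs):
    # Hidden layer: two neurons, each seeing every input signal
    hidden = [
        neuron(inputs, [0.5, -0.2], bias=0.1),
        neuron(inputs, [0.3, 0.8], bias=-0.4),
    ]
    # Output layer: one neuron combining the hidden-layer signals
    return neuron(hidden, [1.0, 0.6], bias=0.0)

out = forward([1.0, 2.0])
```

Stacking more hidden layers between input and output is what makes the network “deep.”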
Deep NNs are the workhorses of the recent explosion in artificial intelligence, powering applications like computer vision for self-driving vehicles. There, the learned mapping takes camera images as input and converts them, through numerous layers of neurons, into output predictions for steering, accelerator, and braking levels. Another current example is the deep NNs digital assistants use to understand language: these models take your voice or text as input and learn the mapping that converts it into skills, actions, or commands. Exciting stuff!
So there you have it — three of the most important and fun-sounding machine learning algorithm names out there. Tell someone about your favorite one today!
Also, we are publicly releasing videos of many of the best presentations from our conference HERE!
Be on the lookout for them as we publish them over the coming weeks.