If you're new to Machine Learning algorithms, then you might feel a little bit overwhelmed by the large number of algorithms that you find while browsing the web for tutorials. You hear terms like regression, classification, supervised learning, unsupervised learning and so on, and it might be a little too difficult to concentrate on where to start. After all, the Machine Learning or Data Science fields are more than a little bit intimidating in the beginning.
To help you, I've compiled a little list of Machine Learning algorithms you can study as a beginner. I'll first explain in short the 3 main Machine Learning paradigms and then we'll go over the algorithms list. For some of them I'll even attach a small example for how you can learn more.
Interested in more stories like this? Follow me on Twitter at @b_dmarius and I'll post there every new article.
- Machine Learning paradigms: Supervised Learning vs Unsupervised Learning vs Reinforcement Learning
- Linear Regression
- Logistic Regression
- Naive Bayes
- Decision Trees
- Random forests
- K-Means Clustering
Machine Learning paradigms: Supervised Learning vs Unsupervised Learning vs Reinforcement Learning
There are 3 main paradigms for Machine Learning and each one applies to a different set of problems and to different types of datasets. So if you want to start a Machine Learning project and don't know which type of algorithm to go with, the best way is to think deeply about what you are trying to achieve, what data you have available and how your model is going to learn.
What is Supervised Learning
Supervised Learning is a category of Machine Learning algorithms where our dataset is a series of input-output pairs. The inputs can be one or more features describing our data, and the outputs can be a value or a category that corresponds to those features. It's called supervised learning because every example in the dataset has to be labelled with its correct output, which usually means a human has to do the labelling.
In Supervised Learning we build a model, we feed it examples of inputs and correct outputs and the model will figure out hidden patterns from the dataset. Then, in order to test our model, we provide new inputs and let the model decide on the output to see how it performs.
Mathematically speaking, let's say we have our input X, and Y as our output, then our supervised learning model would be a function f so that
f(X) ≈ Y
I've put "approximately equals" instead of "equals" because you'll see that 100% accuracy on a model is really difficult or next to impossible to obtain on real life use cases. So the function we obtain will be an approximation of the real function which we may never figure out 😁. But of course, our goal will always be to obtain an approximation that is as close as possible to the real function.
Now our X may contain one or more features, and our Y may be a real number (which turns our problem into a regression task) or a vector (in the case of classification tasks).
What is Unsupervised Learning
In Unsupervised Learning there are no pre-set labels. Unsupervised Learning algorithms look for previously undetected patterns in a dataset and use elements of statistics and probability theory to organise the data based on those patterns. It's called unsupervised learning because no human or manual labelling is required for these types of algorithms to work.
Unsupervised Learning algorithms are usually used to better understand or organise existing data. Their results may be fed into other types of algorithms, or they can be used to classify new, incoming data or to structure and explain existing datasets.
What is Reinforcement Learning
Reinforcement Learning is a type of Machine Learning task where we build agents that try to solve a problem step by step. They do this by looking at the current state and trying to find the best action to take, so that in the end the reward they receive for solving the problem is maximized.
Again, here we don't need any human interaction during the learning process and no labels are needed. We do need to establish a reward strategy though: the rules by which we determine whether the agent has solved the task or not, and how we reward it for solving or not solving it (rewards can be positive or negative).
Examples of Reinforcement Learning algorithms: Q-Learning, Tabular Q-Learning.
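To make the agent, action and reward ideas a bit more concrete, here is a minimal sketch of tabular Q-Learning. Everything about the environment is an assumption made up for illustration: a corridor of 5 states where the agent starts at state 0 and is rewarded for reaching state 4. The names `step` and `train` and the values of `alpha`, `gamma` and `epsilon` are my own choices, not a reference implementation.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = [0, 1]      # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True      # positive reward for reaching the goal
    return next_state, -0.01, False       # small negative reward for any other move

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q-table: q[state][action]
    for _episode in range(episodes):
        state = 0
        for _move in range(200):                 # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # explore: try a random action
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Q-Learning update: move the estimate towards
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy policy for the non-terminal states (1 means "move right").
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Notice that nobody tells the agent that moving right is correct. It only sees the rewards, and the preference for moving right emerges from the Q-table.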
Top Machine Learning Algorithms for Beginners
Linear Regression
The Linear Regression algorithm is used to estimate a real value based on one or more values (which might be continuous or discrete). The value to be estimated is called the dependent variable and the values used for estimation are called independent variables.
What this algorithm does is try to find correlations between the independent variables and the dependent variable. If we can figure out the function that describes how the dependent variable changes with respect to the independent variables, then we can estimate the dependent one whenever we have new entries for the independent variables.
Linear Regression is a type of Supervised Learning, because we need a properly prepared dataset from which the model can figure out the patterns and the correlations.
When we have only one independent variable, we say we perform a Simple Linear Regression. For more than one independent variable, we are performing Multiple Linear Regression.
Examples of problems in which you might use Linear Regression:
- estimating the correct price of a house based on a number of features (number of rooms, distance from city centre, year in which it was built)
- estimating the salary of a person based on a number of features (age, years of studies, country)
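As a rough illustration of Simple Linear Regression, here is a small sketch that fits a straight line with the classic least-squares formulas. The house data is made up, and the function name `fit_simple_linear_regression` is just my own label for this example.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one independent variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # slope = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up dataset: number of rooms -> house price (in thousands).
rooms = [1, 2, 3, 4, 5]
prices = [110, 205, 290, 410, 495]

slope, intercept = fit_simple_linear_regression(rooms, prices)

# Estimate the dependent variable for a new value of the independent variable.
predicted_price = intercept + slope * 6
print(slope, intercept, predicted_price)
```

The fitted line does not pass exactly through every point, which is the "approximately equals" from the supervised learning section in action.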
Logistic Regression
Logistic Regression is closely related to Linear Regression, but instead of estimating a real value, it estimates probabilities that we use to classify an item into one of multiple available classes, so it is used for classification tasks.
Let's say we want to classify an item in our dataset into one of n classes. By using Logistic Regression we obtain a vector like [p0, p1, p2, ..., pn-1], where pi is the probability that the item falls into the (i+1)-th category. Then we choose the highest probability and offer the corresponding category as our class prediction.
You may have figured out already that Logistic Regression is also a type of Supervised Machine Learning and that here we apply the same rule:
- Simple Logistic Regression: one independent variable
- Multiple Logistic Regression: multiple independent variables
Examples of problems in which you might use Logistic Regression:
- deciding whether or not to offer credit to a person based on some features (age, salary, previous debt)
- deciding whether or not to buy stocks in a trading algorithm
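Here is a minimal sketch of Simple Logistic Regression with two classes, trained with plain gradient descent. The credit data is invented, the learning rate and number of steps are arbitrary choices, and `fit_logistic` and `predict_proba` are hypothetical names used only for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Gradient descent on the average log loss for one independent variable."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up dataset: salary (in tens of thousands) -> credit approved (1) or not (0).
salaries = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
approved = [0, 0, 0, 1, 0, 1, 1, 1]

# Centre the feature so that gradient descent behaves well.
mean_salary = sum(salaries) / len(salaries)
centred = [s - mean_salary for s in salaries]
w, b = fit_logistic(centred, approved)

def predict_proba(salary):
    """Returns [p0, p1]: the probabilities of 'not approved' and 'approved'."""
    p1 = sigmoid(w * (salary - mean_salary) + b)
    return [1.0 - p1, p1]

probs = predict_proba(5.0)
predicted_class = probs.index(max(probs))   # choose the highest probability
print(probs, predicted_class)
```

With more than two classes the same idea applies, except the model produces one probability per class instead of a single p1.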
Naive Bayes
The Naive Bayes algorithm is commonly used as a classifier model and it is highly appreciated for its speed and good results. The classifier is based on Bayes' theorem.
The gist of the Naive Bayes algorithm is that it assumes the features of an object are independent of each other once the class is known. In reality that's of course not true (hence the name Naive), but this assumption makes for a simple model and the results are surprisingly good. The Naive Bayes algorithm is a Supervised Learning type of algorithm.
Examples of problems where you might use the Naive Bayes algorithm: any classification problem where the dataset is small or medium sized and the number of features is small.
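To show how the independence assumption turns into code, here is a small sketch of a Naive Bayes classifier for categorical features. The weather dataset is made up, the function names `train_nb` and `predict_nb` are my own, and the add-one (Laplace) smoothing is an extra assumption I've added so that unseen feature values don't produce a probability of zero.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> counts per value
    seen_values = defaultdict(set)        # feature index -> all values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values

def predict_nb(model, row):
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        # Start with the prior P(class), then multiply in P(feature | class) for
        # each feature as if the features were independent (sum of logs).
        log_p = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1     # add-one smoothing
            denominator = count + len(seen_values[i])
            log_p += math.log(numerator / denominator)
        scores[label] = log_p
    return max(scores, key=scores.get)

# Made-up dataset: (outlook, wind) -> what we did that day.
rows = [
    ("sunny", "calm"), ("sunny", "calm"), ("sunny", "windy"), ("rainy", "calm"),
    ("rainy", "windy"), ("rainy", "windy"), ("sunny", "windy"), ("rainy", "calm"),
]
labels = ["play", "play", "play", "play", "stay", "stay", "stay", "stay"]

model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "calm")))
print(predict_nb(model, ("rainy", "windy")))
```

Training is just counting, which is where the speed of the algorithm comes from.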
Decision Trees
The Decision Tree classifier is a classification model that works well when the data space is not huge and the number of features in the dataset is small.
Like the Naive Bayes classifier, it is also a simple model with surprisingly good results. It works based on the eponymous concept of Decision Trees: the model learns a tree of simple questions about the features, and following the answers from the root down to a leaf gives the predicted class.
The decision tree classifier is a Supervised Machine Learning algorithm and is used for classification tasks.
Examples of tasks in which you might use the decision tree classifier: any classification problem where the dataset is small or medium sized and the number of features is small.
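Here is a compact sketch of how a decision tree classifier can be built for numeric features. It picks splits using Gini impurity, which is one common criterion and an assumption on my part since other criteria exist. The credit dataset is invented and the function names are only for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when all labels are the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_split(rows, labels):
    """Try every feature and threshold; keep the split with the lowest impurity."""
    best = None   # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)                   # leaf: predicted class
    split = best_split(rows, labels)
    if split is None:
        return majority(labels)
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Made-up dataset: (age, salary in thousands) -> credit decision.
rows = [(25, 20), (30, 25), (45, 30), (35, 60), (50, 80), (23, 70), (40, 22), (60, 65)]
labels = ["reject", "reject", "reject", "approve", "approve", "approve", "reject", "approve"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (28, 75)))
```

A nice property of this model is that the learned tree can be printed and read directly as a set of if/else rules.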
Random forests
Random forests, often also called random decision forests, are a Machine Learning algorithm that can be used for classification and regression problems. They work by training a number of decision trees and combining the outputs of all the trees (for example by majority vote) to settle on a single result.
They work based on the principle of the wisdom of the crowd, meaning they are based on the assumption that a collection of decision trees outperforms a single decision tree if the forest is built correctly.
Random forests generally work better than decision trees because, with many trees, one decision tree can help correct another when the latter is wrong. So you might use random forests for any type of problem where you've used decision trees and you're not happy with the results.
As a general rule of thumb, I would recommend first employing decision trees and only then random forests, because the second option requires more processing power and more training time.
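The sketch below shows the core idea of a random forest for classification: train many trees on random resamples of the data and let them vote. To keep it short I've made some simplifying assumptions: each "tree" is only a one-level tree (a stump) that looks at one randomly chosen feature, the dataset is made up, and all names are hypothetical. Real random forests use deeper trees.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def train_stump(rows, labels, feature):
    """One-level decision tree on a single feature: (feature, threshold, left, right)."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    if best is None:   # no usable split: always predict the majority class
        m = majority(labels)
        return (feature, float("inf"), m, m)
    return (feature,) + best[1:]

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw as many rows as the dataset has, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))     # random feature for this tree
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    # Every tree votes, and the most common vote is the forest's answer.
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Made-up dataset: (hours studied, practice tests taken) -> exam result.
rows = [(1, 1), (2, 1), (1, 2), (2, 3), (3, 2), (7, 8), (8, 7), (9, 9), (8, 9), (7, 7)]
labels = ["fail"] * 5 + ["pass"] * 5

forest = train_forest(rows, labels)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))
```

Because each tree sees a slightly different dataset, an unlucky tree that learned a bad rule is simply outvoted by the others.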
K-Means Clustering
The K-Means algorithm is a clustering algorithm, meaning it is used for grouping data into two or more groups based on the properties of the data, and more precisely based on certain patterns which are more or less obvious in the data.
It is a type of Unsupervised Machine Learning task because you do not need a predefined list of possible categories. The categories will emerge from the algorithm analyzing the data. Because of that, we may call clustering an exploratory machine learning task.
The K-Means clustering algorithm tries to build clusters by assigning every item in our dataset to exactly one of K clusters. The number K can be predefined or can be found by trying out the model with different values.
As with any other clustering algorithm, it tries to make the items in one cluster as similar as possible, while also making the clusters as different from each other as possible.
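Here is a minimal sketch of the K-Means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. The points are made up, and using the first K points as the initial centroids is a simplification I've chosen so that the example is deterministic. Real implementations usually pick the initial centroids at random.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    # Simplifying assumption: the first k points are the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its cluster.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:   # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Made-up 2D points forming two visible groups.
points = [(1, 1), (8, 8), (1.5, 2), (9, 9.5), (1, 0.5), (8.5, 9)]

cluster_labels, centroids = kmeans(points, k=2)
print(cluster_labels)
print(centroids)
```

Note that we never told the algorithm which point belongs where. The two groups emerge from the distances alone.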
- K-Means Clustering Explained: Algorithm And Sklearn Implementation
- K-Means Clustering For Image Segmentation
In this article we took a look at some quick introductions to some of the most beginner-friendly Machine Learning algorithms. First we listed the 3 Machine Learning paradigms (Supervised, Unsupervised and Reinforcement Learning) and then we took a quick peek at some easy algorithms that you can begin with.
Thank you so much for reading this!