Choosing the right algorithm for your machine learning problem can be quite hard. I've had numerous questions about it during the machine learning course, and I've seen many people struggle to find more information about it on the internet. There are of course plenty of cheatsheets around, but all of them seem to be focused on the tools that the company behind the cheatsheet delivers.
In this second post of the series "Adventures in AI" I will show you how to pick the right algorithm for your machine learning problem.
To make things easier for you, I've put together a mind map with a comprehensive set of machine learning algorithms, which I will update regularly.
To find out more about the series, check out the first post, "Adventures in AI part 1 - What is a gradient descent algorithm?", for a table of contents.
How to pick an algorithm
There are many cheatsheets available online. None of them is very complete though, because each one focuses on the machine learning algorithms that a single product offers.
After struggling with this myself, I decided to make my own map. In it you can find the different kinds of algorithms that you can use for different problems.
You start in the middle and follow the lines outwards towards the algorithms. I've added notes to various algorithms with tips and tricks for that algorithm.
The map is based on a set of cheatsheets from others that I've found on the internet:
- Scikit: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
- Azure Machine Learning: https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-cheat-sheet
- SAS: http://blogs.sas.com/content/subconsciousmusings/2017/04/12/machine-learning-algorithm-use/
I've tried really hard to do justice to the work that the people behind these cheatsheets have put into them.
I may have put an algorithm in the wrong spot here and there, so if you have any suggestions, please feel free to drop me a note.
Tips and tricks
Choosing an algorithm based on the map is a little easier than just picking at random, but you're not done yet.
There are many algorithms for the same problem. That means that you need to follow a particular strategy to pick the right one.
For example, say you want to predict a quantity. You may be tempted to go straight for the most accurate algorithm, but I wouldn't do that myself. I would go for the fastest solution first.
Run your data through the algorithm and train your model. Next, evaluate how well it performs. Is it good enough? Then stop mucking about with it and go for it.
Only when you see that the fast algorithm isn't good enough should you move on to the other algorithms.
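To make the "fastest first" strategy concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the two hand-written models (`fit_mean` and `fit_line`), the helper `pick_first_good_enough` and the `max_error` threshold are hypothetical and not taken from the map. In practice you would plug in real algorithms from your library of choice.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Cheapest candidate: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Slightly more expensive candidate: a least-squares straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average distance between the predictions and the true values."""
    return mean([abs(model(x) - y) for x, y in zip(xs, ys)])


def pick_first_good_enough(candidates, train, test, max_error):
    """Try the candidates in order (fastest first) and stop at the first
    one whose error on the held-out data is at or below max_error."""
    tried = []
    for name, fit in candidates:
        model = fit(*train)
        error = mean_absolute_error(model, *test)
        tried.append((name, error))
        if error <= max_error:
            return name, error, tried
    return None, None, tried  # no candidate was good enough


# Toy data (assumption): the quantity follows y = 2x + 1.
train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])
test = ([6, 7], [13, 15])

# Ordered from fastest to slowest.
candidates = [("mean baseline", fit_mean), ("linear fit", fit_line)]

chosen, error, tried = pick_first_good_enough(candidates, train, test, max_error=0.5)
print(chosen, error)
```

The order of `candidates` is the whole point: the loop never trains a more expensive model once a cheaper one has met the threshold.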
If you really feel inclined for a round of benchmarking, you could of course pick a number of algorithms from the same category, train a model with each of them and evaluate the results.
In the end this usually gives the most accurate results, but it can be quite time consuming.
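A benchmarking round like this could be sketched as follows, again in plain Python. The three toy models, the `benchmark` helper and the data are hypothetical stand-ins for real algorithms from the same category. The sketch trains every candidate, measures the training time and ranks the candidates by their error on held-out data.

```python
import time
from statistics import mean


def fit_mean(xs, ys):
    """Always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_nearest(xs, ys):
    """1-nearest-neighbour: reuse the target of the closest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def fit_line(xs, ys):
    """Least-squares straight line through the training data."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def benchmark(candidates, train, test):
    """Train every candidate, time the training and rank by test error."""
    rows = []
    for name, fit in candidates:
        start = time.perf_counter()
        model = fit(*train)
        seconds = time.perf_counter() - start
        error = mean([abs(model(x) - y) for x, y in zip(*test)])
        rows.append((name, error, seconds))
    return sorted(rows, key=lambda row: row[1])  # lowest error first


# Toy data (assumption): the quantity follows y = 2x + 1.
train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])
test = ([6, 7], [13, 15])

candidates = [
    ("mean baseline", fit_mean),
    ("nearest neighbour", fit_nearest),
    ("linear fit", fit_line),
]

ranking = benchmark(candidates, train, test)
for name, error, seconds in ranking:
    print(f"{name}: error={error:.2f}, training time={seconds:.6f}s")
```

Unlike the "fastest first" approach, this trains every candidate no matter how well the earlier ones did, which is where the extra time goes.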
Want to learn more?
The map is useful for picking the right algorithm for your problem. It is also great if you haven't got much experience with machine learning.
Pick an algorithm from the map, find an open dataset and start experimenting. You will get a much better feel for the algorithm if you've tried it.
Enjoy!