Thursday, September 24, 2015

Who Knew Python Machine Learning Could Be So Easy?

I recently read "Python Machine Learning" by Sebastian Raschka. I loved it! (You can buy it at Packt or Amazon.)

Technical, but not too much. Let's face it, machine learning algorithms are technical in nature. However, this book lets you gloss over the technical details if you don't need to understand them right away and instead see the logic implemented in the code snippets. That said, the technical subjects are explained clearly, with supporting graphs and images to help visualize the concepts. It was a wonderful experience to be able to follow the code, with the theory given right alongside it. This allows most people to jump right in and start writing Python. For the mathematicians out there, you can take the equations and verify them if need be.

The ideas build upon each other, and just like teaching a child to talk, the quality of machine learning seems to come down to getting good training sets for your algorithms. As such, Sebastian is good about giving in-depth, best-practice steps on how to make sure your training data is clean and normalized and that your feature selection is relevant, which was great. You'll learn how to combine the results of multiple models into a more thorough one in order to filter out the weaknesses of individual algorithms. You'll be able to predict future outcomes with regression analysis, using techniques from statistics to look for patterns and anomalies, again all explained in very understandable words. Though the content and pace of the book are all very good and relevant, the icing on the cake is in the last two chapters (which the previous chapters work up to): understanding and then creating a layered neural network to solve complex problems like handwritten digit recognition. And to top it all off, he teaches us how to make it more powerful using the Theano library.
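
As a rough illustration of what "normalized" can mean here, the sketch below standardizes one feature column to zero mean and unit standard deviation using only the Python standard library. It is not taken from the book; the function name `standardize` and the sample values are made up for this example.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread to rescale; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column, e.g. ages in years (illustrative values only).
ages = [20, 30, 40, 50, 60]
scaled = standardize(ages)
print(scaled)
```

After this step the middle value sits at 0 and the rest are spread symmetrically around it, so a feature measured in years no longer outweighs one measured on a smaller scale.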

You also gain great insight into many uses of the Python language along the way: working with SQLite databases, developing a small web application, and understanding some parallel processing, from loading and processing large data sets to using math and science libraries to process the data, all without having to be a rocket scientist.

Another great benefit of this read is not just the programming and math techniques you'll learn, but also the right questions to ask about your data to make the results useful, as in unsupervised learning. The data sets used in the book range from breast cancer subjects to political science, and from movie reviews and processing topics in order to determine a particular bias to image processing.

The only downside (if you can even call it that) is that there is a lot of math involved, but to his credit, Sebastian teaches it in such a way that you don't really need to understand the math behind an equation to understand how the equation works. In essence he is saying, "You don't need to understand the laws of physics to build a house, but here is a set of tools to help you create a magnificent house."

The fundamental concepts I've learned have opened the door to an enormous number of possibilities I could not have even thought of had I not read this book. I used to think that true machine learning was only for super geniuses. But now I feel like I have another set of tools I can use to perform nearly superhero tasks. Python Machine Learning will be a reference book I use for many years to come.