# Posts

## Kaggle Competition – Prudential Life Insurance Assessment

First, instant gratification!! Continuing with the trials, I decided to get my toes wet with Kaggle - Prudential Life Insurance Assessment. To stay motivated, one needs to taste a bit of success, only to get addicted to it, so I tried running a starter script that uses scikit-learn's RandomForestClassifier. Wow!! The submission got accepted, with a ranking under 1000!! Serious stuff!! Having tasted a bit of success, though nowhere close to the top, I tried some pre-processing and feature engineering. The only pre-processing done in the starter script was to hash all strings, in fact a nice way to convert strings to numbers. But I noticed that there were NaNs in many columns, so I wrote the following method to fill all NaNs: `#hash strings def do_hashstr(df): for col in df: if df[col].dtype…`
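The two pre-processing steps mentioned in this excerpt (hashing string columns and filling NaNs) could look roughly like the sketch below. It is not the post's `do_hashstr`, which is cut off above; the function names `hash_string_columns` and `fill_missing`, the use of CRC32 as the hash, and the fill value of -1 are all assumptions made for illustration.

```python
import zlib

import pandas as pd


def hash_string_columns(df):
    """Return a copy where every non-numeric column holds a stable integer hash.

    CRC32 is an assumed choice: unlike Python's built-in hash(), it gives the
    same value on every run. Distinct strings can still collide.
    """
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].map(
                lambda v: zlib.crc32(str(v).encode("utf-8"))
                if pd.notna(v)
                else float("nan")  # keep missing values missing for now
            )
    return out


def fill_missing(df, fill_value=-1):
    """Replace every NaN with a sentinel value (-1 is an assumed default)."""
    return df.fillna(fill_value)


# Tiny made-up frame: one string column, one numeric column, both with gaps.
frame = pd.DataFrame({"a": ["x", "y", "x", None], "b": [1.0, None, 3.0, 4.0]})
clean = fill_missing(hash_string_columns(frame))
```

Hashing before filling keeps missing strings distinguishable from real values; a tree-based model such as a random forest can then treat the sentinel as just another level.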

## Connecting volunteers and the needy using Twitter

During the recent historic deluge, I watched helplessly on various social media networks as volunteers looked for victims and victims looked for essentials, from emergency rescue contacts to water, food and other supplies. On Twitter especially, both offers of help and requests for it poured in, and good Samaritans like @rj_balaji, @Actor_Siddharth, FM radio stations and many others pitched in to bridge the two. But it was not easy to run through a large list and keep it up to date when a prompt response was the need of the hour. I thought, why not automate it and help the two sides discover each other? That's how this twit_sos idea was born. I thought there could be one such app already and scoured the web, but did not find any. I'm sure this may not be a unique idea, nevertheless…

## Setting up Apache Spark

It's quite easy to set up Spark, but I figured that out the hard way (the usual way!). First, "without Hadoop" is a bummer! What it means is that Spark does not need a running instance of Hadoop but does need the Hadoop binaries.

Prerequisites: Java 8 or 9 (Oracle or OpenJDK) and Python 2.7 (I prefer Python for ML, so I need to run Python on Spark).

Download `spark-1.5.2-bin-without-hadoop.tgz` and Hadoop 2.7. If you want to set up on MS Windows, you must also download `winutils.exe`, the I/O binaries required by Hadoop, and unzip it into `HADOOP_HOME/bin`. Unzip Spark and Hadoop inside your home folder. Go to the `<spark home>` conf folder and add the following to `spark-env.sh`:

- `export JAVA_HOME=<your Java JRE>java-8-oracle`
- `export HADOOP_HOME=/home/<your>/hadoop-2.7.1` (edited: no quotes)
- `export PATH=$PATH:$HADOOP_HOME/bin`
- `export SPARK_DIST_CLASSPATH=$(hadoop classpath)`

Save and exit. Go to the `<spark home>/bin` folder, run pyspark or…
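If the settings above do not seem to take effect, a small pre-flight check can help spot which one is missing. The sketch below is an illustration, not part of Spark or Hadoop: `check_spark_env` is a hypothetical helper that only inspects a mapping of environment variables, and the paths in the example are made up.

```python
import os


def check_spark_env(env):
    """Return a list of problems found in an environment mapping.

    Only checks the four settings named in the excerpt above; it does not
    verify that the directories actually exist.
    """
    problems = []
    for name in ("JAVA_HOME", "HADOOP_HOME"):
        if not env.get(name):
            problems.append(name + " is not set")

    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        hadoop_bin = os.path.normpath(os.path.join(hadoop_home, "bin"))
        entries = [
            os.path.normpath(p)
            for p in env.get("PATH", "").split(os.pathsep)
            if p
        ]
        if hadoop_bin not in entries:
            problems.append(hadoop_bin + " is not on PATH")

    if not env.get("SPARK_DIST_CLASSPATH"):
        problems.append("SPARK_DIST_CLASSPATH is not set")
    return problems


# Hypothetical values, for illustration only.
example_env = {
    "JAVA_HOME": "/usr/lib/jvm/java-8-oracle",
    "HADOOP_HOME": "/home/me/hadoop-2.7.1",
    "PATH": os.pathsep.join(
        ["/usr/bin", os.path.join("/home/me/hadoop-2.7.1", "bin")]
    ),
    "SPARK_DIST_CLASSPATH": "/home/me/hadoop-2.7.1/etc/hadoop",
}
problems_found = check_spark_env(example_env)
missing_everything = check_spark_env({})
```

Calling `check_spark_env(os.environ)` from inside a pyspark session is probably the most useful place to run it, since variables exported in `spark-env.sh` are only visible to processes launched through Spark's own scripts.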

## Tools for ML Data-to-Dashboard

As widely? reported?! on the net, Python is one of the most preferred languages for ML. I too like to work in Python (an old-timer from C/C++), so I set about researching good ML libs. There are plenty! That's the problem. Even as I write this post, there is yet another announcement from Google that a new lib, TensorFlow, has been open-sourced!! That's great, but it only makes choosing ML tools that much more difficult. To set up a Data-to-Dashboard pipeline, we need at least 4 sets of libs/tools. ETL - Extract-Transform-Load: though a data-mining-related activity, ML too needs a set of tools to convert raw data into ML-usable data, typically vectors. We need to convert anything and everything, be it text or images, we need to convert it…
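As a concrete picture of "raw data to vectors", here is a minimal bag-of-words sketch in plain Python. It is an illustration only, not one of the tools the post goes on to discuss; the helper names `build_vocabulary` and `vectorize` and the sample sentences are assumptions.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase word to a fixed column index."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Turn one text into a list of word counts, one slot per vocabulary word."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for word, n in counts.items():
        index = vocabulary.get(word)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] = n
    return vector


# Made-up sample corpus.
docs = ["spark runs python", "python runs fast"]
vocab = build_vocabulary(docs)
vec = vectorize("python python spark", vocab)
```

Real pipelines usually lean on a library for this step, but the idea is the same: every input, whatever its original form, ends up as a fixed-length list of numbers.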

## Getting started!!

After nearly 2 years of trying to learn ML and applied maths, I finally decided to put my knowledge to the test, albeit gingerly. First of all came finding the right set of tools, ones that are good and, more importantly, that I'm comfortable with. I googled relentlessly, downloading all sorts of libs/SDKs, and stopped googling only when Google got tired and was showing the same set of results even after a year! Not only that, I ended up with a library of books on ML!! It's not that I got into ML all of a sudden; as a graduate in Mechanical Engineering, applied maths was always there. Probably it is the only engineering field that had maths throughout the course in various forms, like Thermodynamics, Mechanics of Machines, Operations Research and even a few subjects of the last semester…

## My trials with ML

Welcome to My trials with ML, just another WordPress site, where I intend to publish how I passionately try to learn and use ML tools. Thanks to Prof. Andrew Ng and his ML course at Coursera.org, where I got introduced to ML, as well as the MIT OpenCourseWare course MIT 18.02 Multivariable Calculus by Prof. Denis Auroux, where I redeemed my brain!! Thanks to them, I can fairly say I at least know what I'm doing!