Getting Started

This blog will probably use Python exclusively. I think R is great for statistical modeling and visualization, but I believe Python is the better language for data science in general because it also works well as an all-purpose language. So if you want to follow along with my blog, you will need Python.

I recommend installing the Anaconda distribution. It comes with most of the libraries you will need, including NumPy, SciPy, pandas, and matplotlib. It also includes IPython. Most of my work will be in IPython notebooks, which I will host on GitHub so you can download them yourself. If you are new to this, I wrote a post with a quick introduction to Python for data mining that should help get you started.
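
Once the install finishes, a quick sanity check is to confirm that Python can find each of the libraries mentioned above. The snippet below is only a minimal sketch and not part of any official setup procedure: the function name check_libraries and the REQUIRED list are made up for illustration, and it only checks that each package can be located, not that it is a particular version.

```python
import importlib.util

# Top-level import names for the libraries mentioned above.
# (Illustrative list; "IPython" is the import name of the IPython package.)
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "IPython"]


def check_libraries(names=REQUIRED):
    """Return a dict mapping each package name to True if Python can locate it."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_libraries().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

If any line reports MISSING, running `conda install` followed by the package name from a terminal should add it to an Anaconda environment.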

I will also write some posts on good books and other resources. I will try to limit my list to the resources I have found most useful, since too many options, even great ones, can make it hard to figure out where to start.
