The repository of the DataHack Jerusalem 2015 Cisco team
Where to start:
- Anaconda (http://docs.continuum.io/anaconda/index) is probably the safest way to install everything together.
- It includes conda, a full package and environment manager; read http://conda.pydata.org/docs/intro.html and http://conda.pydata.org/docs/test-drive.html
- Alternatively, install pip (http://pip.readthedocs.org/en/stable/installing/) and use it as your package manager, with the prebuilt whl files from http://www.lfd.uci.edu/~gohlke/pythonlibs/
- http://pydata.org/downloads/ - useful links to many of the libraries
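Whichever route you take, a quick import check can confirm that the install worked. This is a minimal sketch, assuming the usual import names for the libraries this README recommends (note that scikit-learn imports as `sklearn`). `check_packages` and `CORE_PACKAGES` are hypothetical names made up for this example:

```python
import importlib

# Import names of the core libraries recommended in this README.
# Assumption: the standard import names; scikit-learn is imported as "sklearn".
CORE_PACKAGES = ["numpy", "scipy", "pandas", "sklearn", "matplotlib", "IPython", "bokeh"]


def check_packages(names):
    """Return {name: version string, or None if the import fails}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(CORE_PACKAGES).items():
        print("{:<12} {}".format(name, version if version else "MISSING"))
```

Anything reported as MISSING can then be installed with conda or pip, whichever you chose above.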
Some must haves:
- NumPy
  - http://www.numpy.org/
- SciPy
  - http://scipy.org/
- Pandas
  - http://pandas.pydata.org/
  - http://pandas.pydata.org/pandas-docs/stable/10min.html
  - http://pandas.pydata.org/pandas-docs/stable/tutorials.html
- Scikit-learn
  - http://scikit-learn.org/stable/tutorial/basic/tutorial.html
  - http://scikit-learn.org/stable/user_guide.html
  - http://scikit-learn.org/stable/auto_examples/index.html
- Matplotlib
  - http://matplotlib.org/index.html
  - http://matplotlib.org/gallery.html
- IPython / Jupyter
  - http://jupyter.org/
  - http://nbviewer.ipython.org/github/ipython/ipython/blob/3.x/examples/Notebook/Index.ipynb
  - http://nbviewer.ipython.org/ - all sorts of example notebooks
- Bokeh
  - http://bokeh.pydata.org/en/latest/
  - http://bokeh.pydata.org/en/latest/docs/gallery.html
- Have a look at D3.js too
  - http://d3js.org/
  - https://github.com/mbostock/d3/wiki/Gallery
Datasets and challenges:
- http://datahack-il.com/data/
- http://datahack-il.com/challenges/
- http://archive.ics.uci.edu/ml/index.html - all sorts of classic datasets
- https://www.kaggle.com/competitions (look at the completed ones too!)
- https://github.com/caesar0301/awesome-public-datasets - as the name implies
- http://scikit-learn.org/stable/datasets/index.html#datasets
- http://mldata.org/repository/data/
- http://mlcomp.org/
Books:
- http://cisco.safaribooksonline.com/book/software-engineering-and-development/9781783988365 - scikit-learn focus, very good
- http://cisco.safaribooksonline.com/book/programming/python/9781783555130 - new one, a little advanced, but very practical
- http://cisco.safaribooksonline.com/book/programming/python/9781783981960 - pandas focus
- http://cisco.safaribooksonline.com/book/programming/python/9781449323592 - data analysis (pandas and more) focus
- http://cisco.safaribooksonline.com/book/programming/python/9781784392772 - 2nd edition of one of the more famous books, lots of examples
- http://cisco.safaribooksonline.com/book/programming/python/9781785280429 - broad array of data science topics
- http://cisco.safaribooksonline.com/book/programming/python/9781783284818 - IPython, visualizations
- http://cisco.safaribooksonline.com/book/information-technology-and-software-development/9781783989485 - somewhat more advanced scikit-learn ‘cookbook’
- http://cisco.safaribooksonline.com/book/programming/python/9781782161400 - yet another one, which I have as PDF
- http://cisco.safaribooksonline.com/book/programming/machine-learning/9781449330514 - ditto
Online courses:
- https://www.udacity.com/course/intro-to-descriptive-statistics--ud827 - covers the more basic material, which I think we probably know quite well
- https://www.udacity.com/course/intro-to-inferential-statistics--ud201 - a continuation of the previous course; covers more advanced material that I didn’t know too well, such as z-tests, t-tests and hypothesis testing
- https://www.udacity.com/course/how-to-use-git-and-github--ud775 - a very well done but rather basic introduction to Git and GitHub
- https://www.udacity.com/course/data-visualization-and-d3js--ud507 - I just started it, but the topic is IMHO very interesting and important
- https://www.udacity.com/course/intro-to-data-science--ud359 - the curriculum is pretty standard by now for this topic in MOOCs; haven’t really tried it out yet
- https://www.udacity.com/course/intro-to-machine-learning--ud120 - haven’t tried it out yet
And then there are the videos of the course that accompanies the excellent ISL book (An Introduction to Statistical Learning, http://www-bcf.usc.edu/~gareth/ISL/): http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
And this is a great post covering someone else’s favorite resources: http://www.dataschool.io/how-to-get-better-at-data-science/