A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.

If you want to contribute to this list, send me a pull request or contact me at [@josephmisiti ](https://www.twitter.com/josephmisiti ).
## Python
#### Natural Language Processing
* [NLTK ](http://www.nltk.org/ ) - A leading platform for building Python programs to work with human language data.
* [Pattern ](http://www.clips.ua.ac.be/pattern ) - A web mining module for the Python programming language. It has tools for natural language processing and machine learning, among others.
* [TextBlob ](http://textblob.readthedocs.org/ ) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
* [jieba ](https://github.com/fxsjy/jieba#jieba-1 ) - Chinese word segmentation utilities.
* [SnowNLP ](https://github.com/isnowfy/snownlp ) - A library for processing Chinese text.
* [loso ](https://github.com/victorlin/loso ) - Another Chinese segmentation library.
* [genius ](https://github.com/duanhongyi/genius ) - A Chinese segmentation library based on Conditional Random Fields.
#### General-Purpose Machine Learning
* [scikit-learn ](http://scikit-learn.org/ ) - A Python module for machine learning built on top of SciPy.
* [pattern ](https://github.com/clips/pattern ) - Web mining module for Python.
* [NuPIC ](https://github.com/numenta/nupic ) - Numenta Platform for Intelligent Computing.
* [Pylearn2 ](https://github.com/lisa-lab/pylearn2 ) - A Machine Learning library based on [Theano ](https://github.com/Theano/Theano ).
* [hebel ](https://github.com/hannes-brt/hebel ) - GPU-Accelerated Deep Learning Library in Python.
* [gensim ](https://github.com/piskvorky/gensim ) - Topic Modelling for Humans.
* [PyBrain ](https://github.com/pybrain/pybrain ) - Another Python Machine Learning Library.
* [Crab ](https://github.com/muricoca/crab ) - A flexible, fast recommender engine.
* [python-recsys ](https://github.com/ocelma/python-recsys ) - A Python library for implementing a Recommender System.
* [BayesPy ](https://github.com/maxsklar/BayesPy )
#### Data Analysis / Data Visualization
* [SciPy ](http://www.scipy.org/ ) - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
* [NumPy ](http://www.numpy.org/ ) - A fundamental package for scientific computing with Python.
* [Numba ](http://numba.pydata.org/ ) - A Python JIT (just-in-time) compiler to LLVM, aimed at scientific Python, by the developers of Cython and NumPy.
* [NetworkX ](https://networkx.github.io/ ) - High-productivity software for complex networks.
* [Pandas ](http://pandas.pydata.org/ ) - A library providing high-performance, easy-to-use data structures and data analysis tools.
* [Open Mining ](https://github.com/avelino/mining ) - Business Intelligence (BI) in Python (Pandas web interface)
* [PyMC ](https://github.com/pymc-devs/pymc ) - Markov Chain Monte Carlo sampling toolkit.
* [zipline ](https://github.com/quantopian/zipline ) - A Pythonic algorithmic trading library.
* [PyDy ](https://pydy.org/ ) - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
* [SymPy ](https://github.com/sympy/sympy ) - A Python library for symbolic mathematics.
* [statsmodels ](https://github.com/statsmodels/statsmodels ) - Statistical modeling and econometrics in Python.
* [astropy ](http://www.astropy.org/ ) - A community Python library for Astronomy.
* [matplotlib ](http://matplotlib.org/ ) - A Python 2D plotting library.
* [bokeh ](https://github.com/ContinuumIO/bokeh ) - Interactive Web Plotting for Python.
* [plotly ](https://plot.ly/python ) - Collaborative web plotting for Python and matplotlib.
* [vincent ](https://github.com/wrobstory/vincent ) - A Python to Vega translator.
* [d3py ](https://github.com/mikedewar/d3py ) - A plotting library for Python, based on [D3.js ](http://d3js.org/ ).
* [ggplot ](https://github.com/yhat/ggplot ) - Same API as ggplot2 for R.
* [Kartograph.py ](https://github.com/kartograph/kartograph.py ) - Rendering beautiful SVG maps in Python.
* [pygal ](http://pygal.org/ ) - A Python SVG Charts Creator.
* [pycascading ](https://github.com/twitter/pycascading )
#### Misc Scripts / iPython Notebooks
* [pattern_classification ](https://github.com/rasbt/pattern_classification )
* [Think Stats 2 ](https://github.com/Wavelets/ThinkStats2 )
* [hyperopt-sklearn ](https://github.com/hyperopt/hyperopt-sklearn )
* [nupic ](https://github.com/numenta/nupic )
* [2012-paper-diginorm ](https://github.com/ged-lab/2012-paper-diginorm )
* [ipython-notebooks ](https://github.com/ogrisel/notebooks )
* [decision-weights ](https://github.com/CamDavidsonPilon/decision-weights )
## Ruby
#### Natural Language Processing
* [Treat ](https://github.com/louismullie/treat ) - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby.
* [Ruby Linguistics ](http://www.deveiate.org/projects/Linguistics/ ) - NLTK for Ruby
* [Stemmer ](https://github.com/aurelian/ruby-stemmer )
* [Ruby Wordnet ](http://www.deveiate.org/projects/Ruby-WordNet/ )
* [Raspell ](http://sourceforge.net/projects/raspell/ )
* [UEA Stemmer ](https://github.com/ealdent/uea-stemmer )
* [Twitter-text-rb ](https://github.com/twitter/twitter-text-rb )
#### General-Purpose Machine Learning
* [Ruby Machine Learning ](https://github.com/tsycho/ruby-machine-learning )
* [Machine Learning Ruby ](https://github.com/mizoR/machine-learning-ruby )
* [jRuby Mahout ](https://github.com/vasinov/jruby_mahout )
* [CardMagic-Classifier ](https://github.com/cardmagic/classifier )
#### Data Analysis / Data Visualization
* [rsruby ](https://github.com/alexgutteridge/rsruby )
* [data-visualization-ruby ](https://github.com/chrislo/data_visualisation_ruby )
* [ruby-plot ](https://www.ruby-toolbox.com/projects/ruby-plot )
* [plot-rb ](https://github.com/zuhao/plotrb )
* [scruffy ](http://www.rubyinside.com/scruffy-a-beautiful-graphing-toolkit-for-ruby-194.html )
* [SciRuby ](http://sciruby.com/ )
## JavaScript
#### Natural Language Processing
* [Twitter-text-js ](https://github.com/twitter/twitter-text-js )
* [NLP.js ](https://github.com/nicktesla/nlpjs )
#### Data Analysis / Data Visualization
* [High Charts ](http://www.highcharts.com/ )
* [NVD3.js ](http://nvd3.org/ )
* [dc.js ](http://dc-js.github.io/dc.js/ )
* [chartjs ](http://www.chartjs.org/ )
* [dimple ](http://dimplejs.org/ )
* [amCharts ](http://www.amcharts.com/ )
#### General-Purpose Machine Learning
* [Convnet.js ](http://cs.stanford.edu/people/karpathy/convnetjs/ ) [DEEP LEARNING]
* [Clustering.js ](https://github.com/tixz/clustering.js )
* [Decision Trees ](https://github.com/serendipious/nodejs-decision-tree-id3 )
* [Node-fann ](https://github.com/rlidwka/node-fann )
* [Kmeans.js ](https://github.com/tixz/kmeans.js )
* [LDA.js ](https://github.com/primaryobjects/lda )
* [Learning.js ](https://github.com/yandongliu/learningjs )
* [Machine Learning ](http://joonku.com/project/machine_learning )
* [Node-SVM ](https://github.com/nicolaspanel/node-svm )
* [Brain ](https://github.com/harthur/brain )
## Scala
#### Natural Language Processing
* [ScalaNLP ](http://www.scalanlp.org/ ) - ScalaNLP is a suite of machine learning and numerical computing libraries.
* [Breeze ](https://github.com/scalanlp/breeze ) - Breeze is a numerical processing library for Scala.
* [Chalk ](https://github.com/scalanlp/chalk ) - Chalk is a natural language processing library.
* [FACTORIE ](https://github.com/factorie/factorie ) - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
#### Data Analysis / Data Visualization
* [Scalding ](https://github.com/twitter/scalding )
* [Summingbird ](https://github.com/twitter/summingbird )
* [Algebird ](https://github.com/twitter/algebird )
#### General-Purpose Machine Learning
* [Conjecture ](https://github.com/etsy/Conjecture )
## Java
#### Natural Language Processing
* [CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml)
* [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
* [Stanford POS Tagger](http://nlp.stanford.edu/software/tagger.shtml)
* [Stanford Named Entity Recognizer](http://nlp.stanford.edu/software/CRF-NER.shtml)
* [Stanford Word Segmenter](http://nlp.stanford.edu/software/segmenter.shtml)
* [Tregex, Tsurgeon and Semgrex ](http://nlp.stanford.edu/software/tregex.shtml )
* [Stanford Phrasal: A Phrase-Based Translation System ](http://nlp.stanford.edu/software/phrasal/ )
* [Stanford English Tokenizer ](http://nlp.stanford.edu/software/tokenizer.shtml )
* [Stanford Tokens Regex ](http://nlp.stanford.edu/software/tokensregex.shtml )
* [Stanford Temporal Tagger ](http://nlp.stanford.edu/software/sutime.shtml )
* [Stanford SPIED ](http://nlp.stanford.edu/software/patternslearning.shtml )
* [Stanford Topic Modeling Toolbox ](http://nlp.stanford.edu/software/tmt/tmt-0.4/ )
* [Twitter Text Java ](https://github.com/twitter/twitter-text-java )
#### General-Purpose Machine Learning
* [Mahout ](https://github.com/apache/mahout )
* [Stanford Classifier ](http://nlp.stanford.edu/software/classifier.shtml )
#### Data Analysis / Data Visualization
* [Hadoop ](https://github.com/apache/hadoop-mapreduce )
* [Spark ](https://github.com/apache/spark )
* [Impala ](https://github.com/cloudera/impala )
## Go
#### Natural Language Processing
* [go-porterstemmer ](https://github.com/reiver/go-porterstemmer )
* [paicehusk ](https://github.com/Rookii/paicehusk )
* [snowball ](https://bitbucket.org/tebeka/snowball )
#### General-Purpose Machine Learning
* [Go Learn ](https://github.com/sjwhitworth/golearn )
* [go-pr ](https://github.com/daviddengcn/go-pr )
* [bayesian ](https://github.com/jbrukh/bayesian )
* [go-galib ](https://github.com/thoj/go-galib )
#### Data Analysis / Data Visualization
* [go-graph ](https://github.com/StepLg/go-graph )
* [SVGo ](http://www.svgopen.org/2011/papers/34-SVGo_a_Go_Library_for_SVG_generation/ )
## Matlab
#### Natural Language Processing
* [NLP ](https://amplab.cs.berkeley.edu/2012/05/05/an-nlp-library-for-matlab/ )
#### General-Purpose Machine Learning
* [Training a deep autoencoder or a classifier on MNIST digits](http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html) [DEEP LEARNING]
* [t-Distributed Stochastic Neighbor Embedding ](http://homepage.tudelft.nl/19j49/t-SNE.html )
* [Spider ](http://people.kyb.tuebingen.mpg.de/spider/ )
* [LibSVM ](http://www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab )
* [LibLinear ](http://www.csie.ntu.edu.tw/~cjlin/liblinear/#download )
#### Data Analysis / Data Visualization
* [matlab_bgl ](https://www.cs.purdue.edu/homes/dgleich/packages/matlab_bgl/ )
* [gaimc ](http://www.mathworks.com/matlabcentral/fileexchange/24134-gaimc---graph-algorithms-in-matlab-code )
## Julia
#### General-Purpose Machine Learning
* [PGM ](https://github.com/JuliaStats/PGM.jl )
* [DA ](https://github.com/trthatcher/DA.jl )
* [Regression ](https://github.com/lindahua/Regression.jl )
* [Local Regression ](https://github.com/dcjones/Loess.jl )
* [Naive Bayes ](https://github.com/nutsiepully/NaiveBayes.jl )
* [Mixed Models ](https://github.com/dmbates/MixedModels.jl )
* [Simple MCMC ](https://github.com/fredo-dedup/SimpleMCMC.jl )
* [Distance ](https://github.com/JuliaStats/Distance.jl )
* [Decision Tree ](https://github.com/bensadeghi/DecisionTree.jl )
* [Neural ](https://github.com/compressed/neural.jl )
* [MCMC ](https://github.com/doobwa/MCMC.jl )
* [GLM ](https://github.com/JuliaStats/GLM.jl )
* [Online Learning ](https://github.com/lendle/OnlineLearning.jl )
* [GLMNet ](https://github.com/simonster/GLMNet.jl )
* [Clustering ](https://github.com/JuliaStats/Clustering.jl )
* [SVM ](https://github.com/JuliaStats/SVM.jl )
* [Kernel Density ](https://github.com/JuliaStats/KernelDensity.jl )
* [Dimensionality Reduction ](https://github.com/JuliaStats/DimensionalityReduction.jl )
* [NMF ](https://github.com/JuliaStats/NMF.jl )
#### Natural Language Processing
* [Topic Models ](https://github.com/slycoder/TopicModels.jl )
* [Text Analysis ](https://github.com/johnmyleswhite/TextAnalysis.jl )
#### Data Analysis / Data Visualization
* [Graph Layout ](https://github.com/IainNZ/GraphLayout.jl )
* [Data Frames Meta ](https://github.com/JuliaStats/DataFramesMeta.jl )
* [Julia Data ](https://github.com/nfoti/JuliaData )
* [Data Read ](https://github.com/WizardMac/DataRead.jl )
* [Hypothesis Tests ](https://github.com/JuliaStats/HypothesisTests.jl )
* [Gadfly ](https://github.com/dcjones/Gadfly.jl )
* [Stats ](https://github.com/johnmyleswhite/stats.jl )
* [RDataSets ](https://github.com/johnmyleswhite/RDatasets.jl )
* [DataFrames ](https://github.com/JuliaStats/DataFrames.jl )
* [Distributions ](https://github.com/JuliaStats/Distributions.jl )
* [Data Arrays ](https://github.com/JuliaStats/DataArrays.jl )
* [Time Series ](https://github.com/JuliaStats/TimeSeries.jl )
* [Sampling ](https://github.com/JuliaStats/Sampling.jl )
#### Misc Stuff / Presentations
* [JuliaCon Presentations ](https://github.com/JuliaCon/presentations )
* [SignalProcessing ](https://github.com/davidavdav/SignalProcessing )
* [Images ](https://github.com/timholy/Images.jl )
## Credits
* Some of the Python libraries were cut-and-pasted from [vinta ](https://github.com/vinta/awesome-python )
* The few Go references I found were pulled from [this page ](https://code.google.com/p/go-wiki/wiki/Projects#Machine_Learning )