Sunday, March 22, 2015

Quick links

What has caught my attention lately:
  • Distance and similarity in machine learning (in Chinese). ([1])
  • "How to Choose a Neural Network" from DL4J. ([2])
  • Winning solution at the BCI Challenge @ NER 2015. ([3] [4])
  • Winning solution of the National Data Science Bowl. Convolutional neural networks win again, with lots of techniques to prevent overfitting! ([5] [6])
  • Scikit-image is a collection of algorithms for image processing. ([7])
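To make the first link's topic a bit more concrete for readers who don't read Chinese, here is a minimal Python sketch of four common measures: Euclidean and Manhattan distance, cosine similarity, and Jaccard similarity. The choice of measures and the function names are mine, not taken from the linked article.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def jaccard_similarity(s, t):
    """Size of the intersection over size of the union of two sets."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t) if (s | t) else 1.0

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # b is a scaled copy of a
print(euclidean(a, b))          # sqrt(1 + 4 + 9), about 3.742
print(manhattan(a, b))          # 6.0
print(cosine_similarity(a, b))  # 1.0 up to floating-point rounding
print(jaccard_similarity("the cat sat".split(), "the cat ran".split()))  # 0.5
```

Cosine similarity ignores length, which is why the scaled copy scores 1.0 while its Euclidean and Manhattan distances to `a` are not zero.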

Tuesday, March 17, 2015

Takeaways from the Kaggle Tradeshift winner

The Tradeshift competition is about predicting the probability that a piece of text belongs to each of the 33 classes. The winning solutions are described in the forum thread, and the code is also in git.

  • The best solution is a weighted average of 14 two-stage models, 13 online models and 2 simple one-stage models. (Blending!!!)
  • Predictions for 32 of the labels are used as features for the second half of the data. (The labels are strongly inter-dependent!)
  • XGBoost is used as the single meta-stage classifier. (XGBoost wins again!)
  • Label analysis matters, not just feature analysis.
  • Feature selection for the online models.
  • Heavy reliance on cross-validation (CV) and grid search to fine-tune hyper-parameters.
Some other solutions were shared in the forum.
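The blending and grid-search takeaways can be illustrated with a toy sketch: pick the weights of a weighted-average blend by trying every weight vector on a grid and keeping the one with the lowest held-out log loss. This is not the winners' code; the log-loss metric, the 0.1 grid step, the function names and the data are assumptions for illustration.

```python
import itertools
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def blend(predictions, weights):
    """Weighted average of several models' predicted probabilities."""
    return [sum(w * preds[i] for w, preds in zip(weights, predictions))
            for i in range(len(predictions[0]))]

def grid_search_blend_weights(predictions, y_true, step=0.1):
    """Try every weight vector on a grid over the simplex (weights >= 0,
    summing to 1); return (best_loss, best_weights) on held-out data."""
    ticks = round(1 / step)
    best = None
    for head in itertools.product(range(ticks + 1), repeat=len(predictions) - 1):
        if sum(head) > ticks:
            continue  # the last weight would be negative
        weights = [h / ticks for h in head] + [(ticks - sum(head)) / ticks]
        loss = log_loss(y_true, blend(predictions, weights))
        if best is None or loss < best[0]:
            best = (loss, weights)
    return best

# Hypothetical held-out labels and predictions from two models that make
# different mistakes.
y_holdout = [1, 1, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.1]
model_b = [0.6, 0.9, 0.1, 0.4]
loss, weights = grid_search_blend_weights([model_a, model_b], y_holdout)
print(weights)  # [0.5, 0.5]: the even blend beats either model alone
```

Exhaustive search over blend weights grows exponentially with the number of models, so with dozens of models, as in the winning solution, an optimizer or a second-stage learner (the winners used XGBoost for their meta stage) is the more practical choice.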

Sunday, March 8, 2015

Quick links

What has caught my attention lately:
  • LIBFFM: A Library for Field-aware Factorization Machines. It has been used to win two recent click-through rate prediction competitions (Criteo's and Avazu's). ([1])
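  • Anscombe's quartet: four datasets with nearly identical summary statistics that look completely different when plotted. Hmm~~ :) ([2])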
  • Time series classification: KNN & DTW ([3] [4])
  • "Outlier and Anomaly Detection In Server Instances With Machine Learning At Netflix: Cody Rioux" (DBSCAN + MCMC) ([5])
  • Introduction to Python decorators and context managers ([6])
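For the LIBFFM link, here is a minimal sketch of how a field-aware factorization machine scores one example, written from my understanding of the FFM formulation: every feature keeps a separate latent vector per field, and a pair of features interacts through the vector each one keeps for the other's field. It leaves out training, bias and linear terms, and does not use LIBFFM's API; the fields, features and weights are hypothetical.

```python
import math

def ffm_score(features, weights):
    """Pairwise FFM interaction term for one example.

    features: list of (field, feature, value) triples.
    weights:  dict mapping (feature, field) -> latent vector (list of floats).
    """
    score = 0.0
    for a in range(len(features)):
        f1, j1, x1 = features[a]
        for b in range(a + 1, len(features)):
            f2, j2, x2 = features[b]
            v1 = weights[(j1, f2)]  # j1's vector for interacting with field f2
            v2 = weights[(j2, f1)]  # j2's vector for interacting with field f1
            score += sum(p * q for p, q in zip(v1, v2)) * x1 * x2
    return score

def ffm_predict(features, weights):
    """Click probability: logistic function of the FFM score."""
    return 1.0 / (1.0 + math.exp(-ffm_score(features, weights)))

# Hypothetical impression: one feature in each of three fields.
features = [("publisher", "ESPN", 1.0), ("advertiser", "Nike", 1.0), ("gender", "Male", 1.0)]
weights = {
    ("ESPN", "advertiser"): [0.1, 0.2], ("ESPN", "gender"): [0.3, -0.1],
    ("Nike", "publisher"): [0.5, 0.0], ("Nike", "gender"): [0.2, 0.2],
    ("Male", "publisher"): [-0.4, 0.1], ("Male", "advertiser"): [0.1, 0.5],
}
print(round(ffm_score(features, weights), 4))  # 0.05 - 0.13 + 0.12 = 0.04
```

With k-dimensional vectors and f fields, each feature stores f vectors instead of one as in a plain factorization machine, which is the price paid for field-specific interactions.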
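Anscombe's quartet is easy to check numerically. The four datasets below are Anscombe's 1973 values; the sketch prints nearly identical means, variances, correlation and least-squares line for all four, even though plotted they show a straight line, a curve, a line with one outlier, and a vertical stack with one far-away point. It uses only the standard library.

```python
import statistics

# Anscombe's quartet; the first three sets share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Means, sample variances, Pearson correlation and least-squares line."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": statistics.variance(xs), "var_y": statistics.variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }

for name, (xs, ys) in quartet.items():
    stats = summary(xs, ys)
    print(name, ", ".join(f"{key}={value:.2f}" for key, value in stats.items()))
```

The numbers alone cannot tell the four sets apart; a scatter plot of each pair does immediately.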
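For the KNN & DTW links, here is a minimal sketch of nearest-neighbour classification under dynamic time warping. The squared point-wise cost, the optional Sakoe-Chiba window and the toy series are my choices. The point of the example is that a bump shifted in time stays at DTW distance 0 from the training bump, while plain Euclidean distance between the two series is 2.

```python
from collections import Counter

def dtw_distance(s, t, window=None):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] is the cheapest alignment of s[:i] with t[:j]; each step
    advances s, t, or both. An optional Sakoe-Chiba window limits |i - j|.
    """
    n, m = len(s), len(t)
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (s[i - 1] - t[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5

def knn_classify(query, train, k=1, window=None):
    """Majority label among the k training series closest to query under DTW."""
    ranked = sorted(train, key=lambda item: dtw_distance(query, item[0], window))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled series.
train = [
    ([0, 0, 1, 2, 1, 0, 0], "bump"),
    ([0, 1, 2, 1, 0, 0, 0], "bump"),
    ([0, 0, 0, 0, 0, 0, 0], "flat"),
    ([1, 1, 1, 1, 1, 1, 1], "flat"),
]
query = [0, 0, 0, 1, 2, 1, 0]  # a bump that arrives later
print(knn_classify(query, train))  # bump
```

The full DTW table costs O(nm) per comparison, which is why a window (and, in practice, lower bounds to skip most comparisons) matters for larger datasets.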
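On the Netflix talk: DBSCAN treats points that belong to no dense region as noise, which is what makes it usable for spotting misbehaving servers. Below is a minimal from-scratch sketch of the DBSCAN half of the DBSCAN + MCMC pairing (MCMC is not covered). The two per-server metrics, `eps` and `min_pts` are hypothetical, and this is not the talk's implementation.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise.

    A point is a core point if at least min_pts points (itself included) lie
    within distance eps. Clusters grow outward from core points; points that
    no core point reaches remain noise.
    """
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later turn out to be a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:
                queue.extend(more)  # j is a core point: keep growing the cluster
        cluster += 1
    return labels

# Hypothetical normalised metrics per server, e.g. (CPU load, error rate).
servers = [(0.50, 0.52), (0.48, 0.50), (0.51, 0.49), (0.49, 0.51), (0.52, 0.50), (0.90, 0.95)]
labels = dbscan(servers, eps=0.05, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == -1]
print(outliers)  # [5]
```

The results are sensitive to `eps` and `min_pts`, so choosing them per server group is the hard part in practice.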
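For the decorators and context managers link, here is a small sketch of each. The first is a decorator that counts calls and uses `functools.wraps` so the wrapped function keeps its name. The second is a generator-based context manager, built with `contextlib.contextmanager`, that records how long a block took. The names `count_calls` and `timer` are made up for the example.

```python
import functools
import time
from contextlib import contextmanager

def count_calls(func):
    """Decorator: wrap func and record how many times it has been called."""
    @functools.wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def square(x):
    """Return x squared."""
    return x * x

@contextmanager
def timer(label, log):
    """Append (label, elapsed seconds) to log on exit, even if the block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((label, time.perf_counter() - start))

timings = []
with timer("squares", timings):
    total = sum(square(i) for i in range(10))

print(total)            # 285
print(square.calls)     # 10
print(square.__name__)  # square
print(timings[0][0])    # squares
```

The `try`/`finally` around `yield` is what makes the timing get recorded even when the `with` block raises an exception.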

Sunday, March 1, 2015

Quick links

What has caught my attention lately:
  • Dimensionality reduction techniques: "sparse random projections". ... "Again, random projections are not suitable for all datasets. There is no 'silver bullet' approach to dimensionality reduction." ([1] [2])
  • "Ten Lessons Learned from Building (real-life impactful) Machine Learning Systems" ([3])
  • NLP tool lists. ([4] [5] (can't open it right now...))
  • "Optimizing Python in the Real World" ([6])
  • HBase v1.0! ([7])
  • Data Science At Zillow ([8])
  • "The G-means algorithm takes a hierarchical approach to detecting the number of clusters." ([9] [10])
  • Comparing supervised learning algorithms ([11])
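For the sparse random projection links, here is a minimal sketch that assumes the "very sparse" scheme of Li, Hastie and Church. Each entry of the projection matrix is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each and 0 otherwise, with density 1/s = 1/sqrt(d). As far as I know, this density is also scikit-learn's default for `SparseRandomProjection`. Only the nonzero entries are stored, and the sizes and seeds are arbitrary.

```python
import math
import random

def sparse_random_projection_matrix(n_features, n_components, density=None, seed=0):
    """Nonzero entries of an (n_features x n_components) sparse random matrix.

    Returned as one list per output dimension of (feature_index, value) pairs.
    Each entry is +/- sqrt(s / n_components) with probability 1/(2s) each and
    0 otherwise, where s = 1 / density (default density: 1 / sqrt(n_features)).
    """
    if density is None:
        density = 1.0 / math.sqrt(n_features)
    scale = math.sqrt(1.0 / (density * n_components))
    rng = random.Random(seed)
    columns = []
    for _ in range(n_components):
        column = []
        for i in range(n_features):
            if rng.random() < density:  # nonzero with probability 1/s
                column.append((i, scale if rng.random() < 0.5 else -scale))
        columns.append(column)
    return columns

def project(x, columns):
    """Map a dense vector x (length n_features) down to n_components values."""
    return [sum(value * x[i] for i, value in column) for column in columns]

# Usage: squared lengths are preserved on average.
rng = random.Random(1)
n_features, n_components = 1000, 200
columns = sparse_random_projection_matrix(n_features, n_components)
points = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(20)]
ratios = [sum(v * v for v in project(p, columns)) / sum(v * v for v in p) for p in points]
mean_ratio = sum(ratios) / len(ratios)
print(round(mean_ratio, 2))  # close to 1
```

Because the projection is linear, preserving squared lengths on average also preserves pairwise distances on average. How tightly it does so depends on `n_components`, which is the usual Johnson-Lindenstrauss trade-off.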
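The quoted G-means idea can be sketched in one dimension. Start with one cluster, and split any cluster whose points fail an Anderson-Darling normality test, repeating until every cluster looks Gaussian. This is a simplified, hypothetical sketch with three simplifications. It works on 1-D data (in more dimensions, G-means projects each cluster onto the line joining its two child centers before testing). It refines each split locally instead of rerunning k-means over all centers, as I understand the original does. And it takes as assumptions the critical value 1.8692 (alpha = 0.0001) and the correction factor commonly quoted with G-means.

```python
import math
import random
import statistics

CRITICAL_VALUE = 1.8692  # assumed Anderson-Darling critical value, alpha = 0.0001

def anderson_darling(values):
    """Corrected A^2 statistic for 'values come from some normal distribution'."""
    n = len(values)
    mu, sd = statistics.fmean(values), statistics.stdev(values)
    z = sorted((v - mu) / sd for v in values)
    # Standard normal CDF, clipped away from 0 and 1 so the logs stay finite.
    cdf = [min(max(0.5 * (1 + math.erf(t / math.sqrt(2))), 1e-12), 1 - 1e-12) for t in z]
    a2 = -n - sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
                  for i in range(n)) / n
    return a2 * (1 + 4 / n - 25 / n ** 2)  # small-sample correction

def split_in_two(values, iters=100):
    """1-D 2-means started at mean +/- sd*sqrt(2/pi); returns (left, right) or None."""
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    offset = sd * math.sqrt(2 / math.pi)
    centers = [mu - offset, mu + offset]
    for _ in range(iters):
        left = [v for v in values if abs(v - centers[0]) <= abs(v - centers[1])]
        right = [v for v in values if abs(v - centers[0]) > abs(v - centers[1])]
        if not left or not right:
            return None
        new_centers = [statistics.fmean(left), statistics.fmean(right)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return left, right

def g_means_1d(values, min_size=8):
    """Keep splitting clusters until each one passes the normality test."""
    clusters, changed = [list(values)], True
    while changed:
        changed, next_round = False, []
        for cluster in clusters:
            looks_gaussian = (len(cluster) < min_size
                              or statistics.pstdev(cluster) == 0
                              or anderson_darling(cluster) <= CRITICAL_VALUE)
            halves = None if looks_gaussian else split_in_two(cluster)
            if halves is None:
                next_round.append(cluster)
            else:
                next_round.extend(halves)
                changed = True
        clusters = next_round
    return clusters

# Hypothetical data: three well-separated Gaussian groups.
rng = random.Random(0)
data = ([rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
        + [rng.gauss(40, 1) for _ in range(200)])
clusters = g_means_1d(data)
print(len(clusters), sorted(round(statistics.fmean(c)) for c in clusters))
```

On well-separated synthetic data like this, the sketch should end with one cluster per generated group without being told how many there are. Groups that overlap or are not Gaussian-shaped can end up over-split.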