Saturday, November 21, 2015

Quick links

What has caught my attention lately:
  • Difference between Machine Learning & Statistical Modeling... ([1])
  • Semi-supervised learning frameworks for Python. Interesting and worth a try. ([2])
  • The Present and the Future of the KDD Cup Competition. "Three main take-aways from the KDD Cup workshop presentations: XGBoost, Feature Engineering is the king and Team work is crucial." ([3])
  • Comparing 7 Python data visualization tools. A comparison with examples. :) ([4])
  • The Three Cultures of Machine Learning. ([5])
  • Slides and videos from MLconf 2015, San Francisco ([6])

Monday, June 22, 2015

Quick links

What has caught my attention lately:
  • Machine Learning Table of Elements Decoded. No xgboost?! ([1])
  • 6 tricks from the Otto challenge! ([2])
  • Machine learning evaluation metrics. ([3])
  • Ensembling! ([4])
  • Owen (Kaggle #1) talk! ([5])
  • Data science IPython notebooks. ([6])

Monday, May 25, 2015

Quick links

What has caught my attention lately:
  • A Benchmark Dataset for Time Series Anomaly Detection. ([1])
  • Python image processing libraries performance: OpenCV vs Scipy vs Scikit-Image. ([2])
  • Exploring Spark MLlib. ([3])
  • 7 Python libraries you should know about. ([4])
  • Benchmarking random forest implementations. ([5])
  • Statistical inference is only mostly wrong. (really?!) ([6])

Sunday, March 22, 2015

Quick links

What has caught my attention lately:
  • Distance and similarity in machine learning (in Chinese). ([1])
  • "How to Choose a Neural Network" from DL4J. ([2])
  • Winning solution at the BCI Challenge @ NER 2015. ([3] [4])
  • Winning solution of The National Data Science Bowl. Convolutional neural networks win again! A lot of techniques to prevent overfitting! ([5] [6])
  • Scikit-image is a collection of algorithms for image processing. ([7])

Tuesday, March 17, 2015

Take away from kaggle tradeshift winner

The Tradeshift competition was to predict the probability that a piece of text belongs to each of 33 classes. The winning solution is described in the forum thread; the code is also in git.

  • The best solution is a weighted average of 14 two-stage models, 13 online models, and 2 simple one-stage models. (Blending!!!)
  • Predictions of 32 labels are used as features for the 2nd half of the data. (Labels have strong inter-dependence!)
  • XGBoost is chosen as the single meta-stage classifier. (XGBoost wins again!)
  • Not only feature analysis but also label analysis is needed.
  • Feature selection for the online model.
  • Rely heavily on CV and grid search to fine-tune hyper-parameters.
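The two-stage and blending ideas can be sketched in a few lines. This is only an illustrative toy, not the winners' actual pipeline: logistic regression stands in for XGBoost and the online models, the data is synthetic, and the 0.4/0.6 blend weights are made up.

```python
# Toy sketch: (1) out-of-fold label predictions as extra second-stage
# features, (2) a weighted-average blend of the two stages.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: out-of-fold predicted probabilities (so the second stage
# never sees predictions made on its own training rows).
stage1 = LogisticRegression(max_iter=1000)
oof = cross_val_predict(stage1, X_tr, y_tr, cv=5,
                        method="predict_proba")[:, 1]
stage1.fit(X_tr, y_tr)
te_meta = stage1.predict_proba(X_te)[:, 1]

# Stage 2: original features plus the stage-1 prediction as a feature.
stage2 = LogisticRegression(max_iter=1000)
stage2.fit(np.column_stack([X_tr, oof]), y_tr)
p2 = stage2.predict_proba(np.column_stack([X_te, te_meta]))[:, 1]

# Blend: weighted average of stage-1 and stage-2 probabilities.
blend = 0.4 * te_meta + 0.6 * p2
print("blended accuracy:", ((blend > 0.5) == y_te).mean())
```

In the real solution the 14 two-stage models and 13 online models would each contribute a column to the blend, with weights tuned by CV.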
Some other solutions are shared in the forum.

Sunday, March 8, 2015

Quick links

What has caught my attention lately:
  • LIBFFM: A Library for Field-aware Factorization Machines. It has been used to win two recent click-through rate prediction competitions (Criteo's and Avazu's). ([1])
  • Anscombe's quartet hmm~~ :)  ([2])
  • Timeseries Classification: KNN & DTW ([3] [4])
  • "Outlier and Anomaly Detection In Server Instances With Machine Learning At Netflix: Cody Rioux" DBSCAN + MCMC ([5])
  • Introduction to Python decorators and context managers. ([6])
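The KNN & DTW links above pair nearest-neighbour classification with a dynamic-time-warping distance. A minimal pure-Python DTW (my own sketch, not taken from the linked code) looks like:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = min accumulated cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Warping absorbs the repeated 0, so these align perfectly:
print(dtw_distance([0, 0, 1], [0, 1]))  # 0.0
```

A 1-NN classifier then just labels a query series with the class of its nearest training series under this distance.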