Learning Curve
Saturday, November 21, 2015
Quick links
What has caught my attention lately:
- Difference between Machine Learning & Statistical Modeling... ([1])
- Semi-supervised learning frameworks for Python. Interesting and worth a try. ([2])
- The Present and the Future of the KDD Cup Competition. "Three main take-aways from the KDD Cup workshop presentations: XGBoost, Feature Engineering is the king and Team work is crucial." ([3])
- Comparing 7 Python data visualization tools. A comparison with examples. :) ([4])
- The Three Cultures of Machine Learning. ([5])
- Slides and videos from MLconf 2015, San Francisco. ([6])
Monday, May 25, 2015
Quick links
What has caught my attention lately:
- A Benchmark Dataset for Time Series Anomaly Detection. ([1])
- Python image processing libraries performance: OpenCV vs Scipy vs Scikit-Image. ([2])
- Exploring Spark MLlib. ([3])
- 7 Python Libraries you should know about ([4])
- Benchmarking random forest implementations. ([5])
- Statistical inference is only mostly wrong. (really?!) ([6])
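The image-processing benchmark in [2] times the same low-level kernels across OpenCV, SciPy, and scikit-image. As a rough illustration of what such a kernel does (not the benchmark's actual code), here is a naive 2-D convolution in plain NumPy with a Sobel filter; the image and kernel are made up for the example:

```python
import numpy as np

def convolve2d(img, kernel):
    """Naive 'valid' 2-D convolution (no padding, kernel flipped)."""
    kh, kw = kernel.shape
    k = kernel[::-1, ::-1]  # flip for true convolution (vs. correlation)
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# Horizontal Sobel kernel on a toy image with a constant horizontal gradient.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
img = np.tile(np.arange(5.0), (5, 1))
result = convolve2d(img, sobel_x)
print(result)  # constant response, since the gradient is uniform
```

The libraries being benchmarked replace these Python loops with optimized C/Fortran code, which is exactly where their performance differences come from.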
Sunday, March 22, 2015
Quick links
What has caught my attention lately:
- Distance and similarity in machine learning (in Chinese). ([1])
- "How to Choose a Neural Network" from DL4J. ([2])
- Winning solution at the BCI Challenge @ NER 2015. ([3] [4])
- Winning solution of The National Data Science Bowl. Convolutional neural networks win again! A lot of techniques to prevent overfitting! ([5] [6])
- Scikit-image is a collection of algorithms for image processing. ([7])
Tuesday, March 17, 2015
Takeaways from the Kaggle Tradeshift winner
The Tradeshift competition asked for the probability that a piece of text belongs to each of 33 classes. The winning solutions are described in the forum thread, and the code is also in git.
- The best solution is a weighted average of 14 two-stage models, 13 online models, and 2 simple one-stage models. (Blending!!!)
- Predictions of 32 labels are used as features for the 2nd half of the data. (The labels have strong inter-dependence!)
- XGBoost is chosen as the single meta-stage classifier. (XGBoost wins again!)
- Not only feature analysis but also label analysis is needed.
- Feature selection for the online models.
- Heavy reliance on CV and grid search to fine-tune hyper-parameters.
- Additional (100-300) decision-tree features, based on Criteo's winning solution.
- Post-processing on some labels. (o_O)
- Even a 3-stage solution!
- "sklearn RandomForestClassifier active paths or ended nodes" should be useful for generating tree-based features.
- Semi-supervised learning in the blending.
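The tree-based-feature idea above can be sketched with scikit-learn's `RandomForestClassifier.apply()`, which returns the leaf index each sample lands in for every tree; one-hot encoding those indices yields sparse categorical features, similar in spirit to the Criteo/Tradeshift trick. The dataset here is synthetic and the sizes are arbitrary, chosen only for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Toy data standing in for the real competition features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# apply() gives, for each sample, the leaf index reached in each of the
# 20 trees -- effectively one categorical feature per tree.
leaves = rf.apply(X)                                   # shape (200, 20)
leaf_features = OneHotEncoder().fit_transform(leaves)  # sparse one-hot matrix
print(leaves.shape, leaf_features.shape)
```

These sparse features can then be fed to a linear model or to the next stage of a stacked ensemble. (In a real pipeline the forest should be fit on a separate fold to avoid leaking the training labels into the stacked stage.)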
Sunday, March 8, 2015
Quick links
What has caught my attention lately:
- LIBFFM: A Library for Field-aware Factorization Machines. It has been used to win two recent click-through rate prediction competitions (Criteo's and Avazu's). ([1])
- Anscombe's quartet hmm~~ :) ([2])
- Timeseries Classification: KNN & DTW ([3] [4])
- "Outlier and Anomaly Detection In Server Instances With Machine Learning At Netflix: Cody Rioux". DBSCAN + MCMC. ([5])
- Introduction of python Decorators and Context Managers ([6])
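The KNN + DTW approach from [3] [4] rests on the dynamic-time-warping distance; a minimal pure-Python version of that distance (a sketch of the idea, not the linked implementation) looks like this:

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = cost of the best alignment of a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]

print(dtw_distance([0, 1, 2, 3], [0, 1, 2, 3]))  # identical series -> 0.0
print(dtw_distance([0, 0, 1, 2], [0, 1, 2]))     # time shift absorbed -> 0.0
```

For classification, a 1-NN rule with this distance in place of the Euclidean metric is the whole algorithm; in practice a Sakoe-Chiba band is usually added to bound the warping and speed things up.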