Tuesday, March 17, 2015

Takeaways from the Kaggle Tradeshift winner

The Tradeshift competition is about predicting the probability that a piece of text belongs to each of the 33 classes. The winning solution can be found in the forum thread. Code is also in git.

  • The best solution is a weighted average of 14 two-stage models, 13 online models and 2 simple one-stage models. (blending!!!)
  • Predictions of 32 labels are used as features for the 2nd half of the data. (Labels have strong inter-dependence!)
  • XGBoost is chosen as the single meta-stage classifier. (XGBoost wins again!)
  • Feature analysis alone is not enough; label analysis is needed too.
  • Feature selection is done for the online models.
  • They rely heavily on CV and grid search to fine-tune hyper-parameters.
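To make the blending point concrete, here is a minimal sketch of a weighted average over the predicted probabilities of several models. This is not the winners' code: the function names (blend, log_loss), the three toy prediction lists and the weights are all made up for illustration, and log loss is used here only as a convenient way to compare the blend against a single model.

```python
import math


def blend(predictions, weights):
    """Weighted average of per-model probability lists (hypothetical helper)."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)  # normalise so the weights need not sum to 1
    n_samples = len(predictions[0])
    return [
        sum(w * p[i] for p, w in zip(predictions, weights)) / total
        for i in range(n_samples)
    ]


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss, with probabilities clipped away from 0 and 1."""
    loss = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(y_true)


# Toy data for one label: true values and three models' predictions (invented).
y_true = [1, 0, 1, 0]
two_stage = [0.9, 0.2, 0.7, 0.1]
online = [0.8, 0.3, 0.6, 0.2]
one_stage = [0.6, 0.4, 0.5, 0.5]

# Illustrative weights only; in practice they would be tuned on held-out data.
blended = blend([two_stage, online, one_stage], [0.6, 0.3, 0.1])

print("blended:", [round(p, 3) for p in blended])
print("log loss, one-stage model:", round(log_loss(y_true, one_stage), 4))
print("log loss, blend:", round(log_loss(y_true, blended), 4))
```

In a real setup the weights would be picked with cross-validation, the same way the hyper-parameters are tuned, and the same blend would be applied to every label.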
Some other solutions are shared in the forum.
