The Tradeshift competition asks for the predicted probability that a piece of text belongs to each of 33 classes. The winning solution is described in the forum thread, and the code is also in git.
- The best solution is a weighted average of 14 two-stage models, 13 online models and 2 simple one-stage models. (Blending!)
- Predictions of 32 labels are used as features for the 2nd half of the data. (Labels have strong inter-dependence!)
- XGBoost is chosen as the single meta-stage classifier. (XGBoost wins again!)
- Label analysis is needed, not only feature analysis.
- Feature selection is used for the online model.
- The solution relies heavily on CV and grid search to fine-tune hyper-parameters.
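To make the first two points concrete, here is a minimal sketch in plain Python. It is not the winners' code: the function names `blend` and `augment_with_label_predictions`, the weights and all the numbers are made up for illustration. The first function takes a weighted average of per-model probability matrices, and the second appends first-stage label predictions to the raw features so a second-stage model could use them.

```python
def blend(predictions, weights):
    """Weighted average of several models' probability matrices.

    predictions: list of matrices, one per model, each shaped rows x labels.
    weights: one non-negative weight per model (normalized here to sum to 1).
    """
    total = float(sum(weights))
    n_rows = len(predictions[0])
    n_labels = len(predictions[0][0])
    blended = [[0.0] * n_labels for _ in range(n_rows)]
    for model_preds, weight in zip(predictions, weights):
        for i, row in enumerate(model_preds):
            for j, prob in enumerate(row):
                blended[i][j] += (weight / total) * prob
    return blended


def augment_with_label_predictions(features, label_probs):
    """Append first-stage label probabilities to each row of raw features."""
    return [list(f) + list(p) for f, p in zip(features, label_probs)]


# Hypothetical predictions from two models for 2 rows and 3 labels.
model_a = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.4]]
model_b = [[0.5, 0.3, 0.2], [0.4, 0.6, 0.0]]

# Illustrative weights only; in practice they would be tuned with CV.
blended = blend([model_a, model_b], weights=[3, 1])

# Second-stage input: raw features plus the blended label predictions.
raw_features = [[1.0, 2.0], [3.0, 4.0]]
stage_two_input = augment_with_label_predictions(raw_features, blended)
```

In a real two-stage setup the first-stage predictions would come from models trained on a different part of the data than the rows they are appended to, which is presumably why the source applies them to the 2nd half.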
Some other solutions are shared in the forum.