Entropy-based regularization of AdaBoost
Abstract
In this study, we introduce an entropy-based method to regularize the AdaBoost algorithm. The AdaBoost algorithm is a well-known algorithm used to create aggregated classifiers. In many real-world classification problems in addition to paying special attention classification accuracy of the final classifier, great focus is placed on tuning the number of the so-called weak learners, which are aggregated by the final (strong) classifier. The proposed method is able to improve the AdaBoost algorithm in terms of both criteria. While many approaches to the regularization of boosting algorithms can be complicated, the proposed method is straightforward and easy to implement. We compare the results of the proposed method (EntropyAdaBoost) with the original AdaBoost and also with its regularized version, ǫ-AdaBoost on several classification problems. It is shown that the proposed methods of EntropyAdaBoost and ǫ-AdaBoost are strongly complementary when the improvement of AdaBoost is considered.
Keywords
AdaBoost, regularization, entropy, EntropyAdaBoost,References
[1] T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, Springer Series in Statistics, 2009. http://www.springer.com/gp/book/9780387848570.[2] R. Meir, G. Ratsch. An introduction to boosting and leveraging. In: S. Mendelson, A.J. Smola [Eds.], Advanced Lectures on Machine Learning, Summer School 2002, Canberra, Australia, February 11–22, 2002, Revised Lectures, 118–183, Springer-Verlag New York, Inc., NY, USA, 2003. https://link.springer.com/chapter/10.1007%2F3-540-36434-X 4.
[3] R.E. Schapire. Theoretical views of boosting and applications. In: O.Watanabe, T. Yokomori [Eds.], Algorithmic Learning Theory: 10th International Conference, ALT99 Tokyo, Japan, December 6–8, 1999 Proceedings, pp. 13–25, Springer, Berlin/Heidelberg, 1999. https://link.springer.com/chapter/10.1007/3-540-46769-6 2.
[4] P. Viola, M.J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2): 137–154, 2004. https://link.springer.com/article/10.1023/B:VISI.0000013087.49260.fb.
[5] W. Jiang. Is regularization unnecessary for boosting? In: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2001. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.5229.
[6] Y. Xi, Z. Xiang, P. Ramadge, R. Schapire. Speed and sparsity of regularized boosting. In: D. van Dyk, M.Welling [Eds.], Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, Clearwater Beach, Florida, USA, Vol. 5 of JMLR:W&CP 5, pp. 615–622, 2009. http://proceedings.mlr.press/v5/xi09a.html.
[7] P. Buhlmann, T. Hothorn. Boosting algorithms: regularization, prediction and model fitting, Statistical Science, 22: 477–505, 2007. https://www.jstor.org/stable/27645854.
[8] C. Shen, H. Li, A. van den Hengel. Fully corrective boosting with arbitrary loss and regularization. Neural Networks, 48: 44–58, 2013. http://www.sciencedirect.com/science/article/pii/S0893608013001913.
[9] M.K. Warmuth, J. Liao, G. Ratsch. Totally corrective boosting algorithms that maximize the margin. In: Proceedings of the 23rd International Conference on Machine Learning, ACM, New York, NY, USA, pp. 1001–1008, 2006. https://users.soe.ucsc.edu/~manfred/pubs/C75.pdf.
[10] D.D. Le, S. Satoh. Ent-Boost: boosting using entropy measures for robust object detection. Pattern Recognition Letters, 28: 1083–1090, 2007. http://www.sciencedirect.com/science/article/pii/S0167865507000190.
[11] R.E. Schapire. A brief introduction to boosting. In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI), 2: 1401–1406, 1999. https://www.cs.utah.edu/~piyush/teaching/brief intro boosting.pdf.
[12] R.E. Schapire, Y. Freund. Boosting: Foundations and Algorithms. The MIT Press, 2012. https://mitpress.mit.edu/books/boosting.
[13] S. Rosset, J. Zhu, T. Hastie. Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5: 941–973, 2004. http://www.jmlr.org/papers/volume5/rosset04a/rosset04a.pdf.
[14] M. Lichman. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.
[15] J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7: 1–30, 2006. http://www.jmlr.org/papers/v7/demsar06a.html.
[16] J. Derrac, S. Garcıa, D. Molina, F. Herrera. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, 1: 3–18, 2011. http://www.sciencedirect.com/science/article/pii/S2210650211000034.
[17] J. Alcala-Fdez, L. Sanchez, S. Garcıa, M.J. del Jesus, S. Ventura, J.M. Garrell, J. Otero, C. Romero, J. Bacardit, V.M. Rivas, J.C. Fernandez, F. Herrera. KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Computing, 13: 307–318, 2008. https://link.springer.com/article/10.1007/s00500-008-0323-y.
[18] J. Alcala-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. Garcıa, L. Sanchez, F. Herrera. KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing, 17: 255–287, 2011. http://sci2s.ugr.es/keel/pdf/keel/articulo/2011-KEEL-dataset-MVLSC.pdf.