Nearest Neighbor Regression in the Presence of Bad Hubs

Prelegent: 

Krisztian Buza, Budapest University of Technology and Economics

Data: 

04/02/2016 - 12:00

Historia zmian

Data aktualizacji: 18/06/2018 - 09:34; autor zmian: ()
Data aktualizacji: 25/05/2016 - 15:13; autor zmian: (sebastian@iitis.pl)
Data aktualizacji: 17/02/2016 - 13:15; autor zmian: ()
Data aktualizacji: 08/02/2016 - 22:18; autor zmian: ()

Prediction on a numeric scale, i.e., regression, is one of the most prominent machine learning tasks with various applications in finance, medicine, social and natural sciences. Due to its simplicity, theoretical performance guarantees and successful real-world applications, one of the most popular regression techniques is the k nearest neighbor regression. However, k nearest neighbor approaches are affected by the presence of bad hubs, a recently observed phenomenon according to which some of the instances are similar to surprisingly many other instances and have a detrimental effect on the overall prediction performance. This paper is the first to study bad hubs in context of regression. We propose hubness-aware nearest neighbor regression schemes. We evaluate our approaches on publicly available real-world datasets from various domains. Our results show that the proposed approaches outperform various other regressions schemes such as kNN regression, regression trees and neural networks. We also evaluate the proposed approaches in the presence of label noise because tolerance to noise is one of the most relevant aspects from the point of view of real-world applications. In particular, we perform experiments under the assumption of conventional Gaussian label noise and an adapted version of the recently proposed hubness-proportional random label noise.

Data aktualizacji: 08/02/2016 - 21:57; autor zmian: ()
Data aktualizacji: 05/02/2016 - 09:23; autor zmian: ()
Prediction on a numeric scale, i.e., regression, is one of the most prominent machine learning tasks with various applications in finance, medicine, social and natural sciences. Due to its simplicity, theoretical performance guarantees and successful real-world applications, one of the most popular regression techniques is the k nearest neighbor regression. However, k nearest neighbor approaches are affected by the presence of bad hubs, a recently observed phenomenon according to which some of the instances are similar to surprisingly many other instances and have a detrimental effect on the overall prediction performance. This paper is the first to study bad hubs in context of regression. We propose hubness-aware nearest neighbor regression schemes. We evaluate our approaches on publicly available real-world datasets from various domains. Our results show that the proposed approaches outperform various other regressions schemes such as kNN regression, regression trees and neural networks. We also evaluate the proposed approaches in the presence of label noise because tolerance to noise is one of the most relevant aspects from the point of view of real-world applications. In particular, we perform experiments under the assumption of conventional Gaussian label noise and an adapted version of the recently proposed hubness-proportional random label noise.