KNN-Based Algorithm of Hard Case Detection in Datasets for Classification

dc.contributor.author: Okhrimenko, Anton
dc.contributor.author: Kussul, Nataliia
dc.date.accessioned: 2023-11-07T13:22:33Z
dc.date.available: 2023-11-07T13:22:33Z
dc.date.issued: 2023
dc.description.abstract: Machine learning models for classification are designed to find the best way to separate two or more classes. When classes overlap, the data cannot be separated cleanly: any ML algorithm will misclassify data points that are surrounded in the feature space by a significant number of data points from another class. However, being able to find such hard cases in a dataset makes it possible to apply a different set of rules to them than to normal data samples. In this work, we introduce a KNN-based algorithm for detecting data points and subspaces for which the classification decision is ambiguous. The algorithm is described in detail and demonstrated on an artificially generated dataset. Possible use cases are also discussed, including dataset quality assessment, a custom ensemble strategy, and modifications to data sampling. The proposed algorithm can be used throughout the full cycle of machine learning model development, from forming the training dataset to real-world model inference.
dc.format.pagerange: Pp. 113-118
dc.identifier.citation: Okhrimenko, A. KNN-Based Algorithm of Hard Case Detection in Datasets for Classification / Anton Okhrimenko, Nataliia Kussul // In Proceedings of International Conference on Applied Innovation in IT, (ICAIIT). – 2023. – Pp. 113-118. – Bibliogr.: 10 ref.
dc.identifier.doi: https://doi.org/10.25673/101926
dc.identifier.uri: https://ela.kpi.ua/handle/123456789/62044
dc.language.iso: en
dc.relation.ispartof: Proceedings of the 11th International Conference on Applied Innovations in IT, (ICAIIT)
dc.subject: KNN
dc.subject: Dataset Quality Assessment
dc.subject: Imbalanced Datasets
dc.subject: Hard Cases
dc.title: KNN-Based Algorithm of Hard Case Detection in Datasets for Classification
dc.type: Article
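
The abstract describes detecting data points that are surrounded in the feature space by points of another class. This record does not include the paper's actual procedure, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm: a point is scored by the fraction of its k nearest neighbors that carry a different label, and it is flagged as a hard case when that fraction exceeds a threshold. The function names, the Euclidean distance, `k=5`, and `threshold=0.5` are all illustrative choices.

```python
import math
import random


def hard_case_scores(points, labels, k=5):
    """For each point, return the fraction of its k nearest neighbors
    (excluding the point itself) whose label differs from its own.
    Assumption: this disagreement ratio stands in for 'hardness'."""
    scores = []
    for i, p in enumerate(points):
        # Brute-force Euclidean distances to every other point, O(n^2) overall.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        disagree = sum(1 for j in neighbors if labels[j] != labels[i])
        scores.append(disagree / len(neighbors))
    return scores


def find_hard_cases(points, labels, k=5, threshold=0.5):
    """Indices of points whose disagreement score exceeds the threshold."""
    scores = hard_case_scores(points, labels, k)
    return [i for i, s in enumerate(scores) if s > threshold]


def make_demo_data(seed=0):
    """Two well-separated 2-D Gaussian clusters (hypothetical toy data),
    plus one class-1 point planted in the middle of the class-0 cluster."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (10.0, 10.0))):
        for _ in range(20):
            points.append((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)))
            labels.append(label)
    points.append((0.0, 0.0))  # index 40: class 1 inside class-0 territory
    labels.append(1)
    return points, labels


points, labels = make_demo_data()
hard = find_hard_cases(points, labels, k=5, threshold=0.5)
print(hard)  # the planted point (index 40) is the only one flagged
```

On this toy data only the planted point is flagged, because all five of its nearest neighbors belong to the other class. For real datasets, a library nearest-neighbor index would replace the brute-force distance loop, and both `k` and the threshold would need tuning.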

Files

Name: KNN-Based_Algo.pdf
Size: 884.75 KB
Format: Adobe Portable Document Format