Analysis of methods of classification of electronic messages based on neural network models

Onishchenko, Volodymyr; Minochkin, Anatolii

dc.contributor.author	Onishchenko, Volodymyr
dc.contributor.author	Minochkin, Anatolii
dc.date.accessioned	2024-03-05T15:12:26Z
dc.date.available	2024-03-05T15:12:26Z
dc.date.issued	2023
dc.description.abstract	The article considers the creation of a mechanism for detecting and classifying messages and assesses how effectively different neural networks recognize and classify different types of electronic messages, including phishing attacks, spam, and legitimate messages. A preliminary analysis of incoming messages was performed, covering their headers, text, and other relevant attributes; for emails, for instance, these attributes could be the 'subject' and 'sender' of the message. Methods of data preparation and processing for training neural networks were reviewed, including text vectorization, noise removal, and normalization. Messages were tokenized and converted into a numerical format, taking feature selection into account; for text messages, both tokenization and text vectorization are essential. The model was trained on a data set split beforehand into two parts: 80% for training and 20% for testing. The training set is used to train the model, while the test set is used to evaluate its effectiveness. The class structure of the data, namely how uniformly the classes are distributed, was taken into account. In this case, spam occurs less frequently than legitimate messages, so class balancing techniques such as random deletion of redundant examples, upsampling, and subsampling were applied to ensure adequate model training. Network parameters were optimized by searching for the settings that give the best performance, such as the number and size of layers, activation functions, learning rate, and other hyperparameters.
Effectiveness was assessed by comparing the results and performance of various neural-network-based classification methods using metrics such as accuracy, recall, precision, and F1-score. It was determined how well the methods avoid misclassifications in which legitimate messages are mistakenly identified as spam, and vice versa. The methods were also compared by how effectively they process a large volume of messages in real time. Different architectures of neural network models were analyzed. The analysis showed how effectively different neural network models can recognize and classify messages as spam.
dc.description.abstractother	Розглянуто створення механізму виявлення та класифікації повідомлень з оцінкою, наскільки ефективно працюють різні нейронні мережі та можуть розпізнавати та класифікувати різні типи електронних повідомлень, включаючи фішингові атаки, спам, легітимні повідомлення. Виконано попередній аналіз вхідних повідомлень, включаючи їх заголовки, текст та будь-які інші релевантні атрибути. Розглянуто методи підготовки та обробки даних, включаючи векторизацію тексту, видалення шуму та нормалізацію, для використання в навчанні нейронних мереж. Проведена токенізація повідомлення шляхом перетворення на числовий формат з урахуванням виділення ознак. Для текстових повідомлень важливо виконати токенізацію та векторизацію тексту. Виконано навчання моделі на тестових даних з попереднім розбиттям на дві частини: 80% для навчання, 20% для тестування. Навчальний набір використовується для навчання моделі, а тестовий – для оцінки її ефективності. Враховано особливість класової структури даних, а саме рівномірність розподілу класів. В даному випадку спам зустрічається рідше за легітимні повідомлення, тому було застосовано техніки балансування класів для забезпечення адекватного навчання моделі. Для балансування класів було обрано такі техніки: випадкове видалення зайвих прикладів, апсемплінг, субдискретизація. Виконана оптимізація параметрів мереж шляхом дослідження оптимальних параметрів нейронних мереж, таких як кількість шарів, розмір шарів, функції активації, оптимізація гіперпараметрів для досягнення найкращої продуктивності. Оптимізація гіперпараметрів включає визначення оптимальних налаштувань для нейронних мереж, таких як розмір шарів, функції активації, швидкість навчання та інші параметри. Проведена оцінка ефективності шляхом порівняння результатів та продуктивності різних методів класифікації на основі нейронних мереж, використовуючи метрики, такі як точність, відзив, точність та F1-оцінку.
Визначено, наскільки методи здатні уникати помилкових класифікацій, коли легітимні повідомлення помилково визнаються спамом, і навпаки. Зроблено порівняння ефективності методів щодо обробки великої кількості повідомлень в реальному часі. На основі аналізу виявлено, наскільки ефективно різні моделі нейронних мереж можуть розпізнавати та класифікувати повідомлення як спам. Розроблено рекомендації на основі результатів аналізу.
dc.format.pagerange	Pp. 216-226
dc.identifier.citation	Onishchenko, V. Analysis of methods of classification of electronic messages based on neural network models / Onishchenko Volodymyr, Minochkin Anatolii // Information Technology and Security. – 2023. – Vol. 11, Iss. 2 (21). – Pp. 216–226. – Bibliogr.: 8 ref.
dc.identifier.doi	https://doi.org/10.20535/2411-1031.2023.11.2.293797
dc.identifier.issn	2411-1031
dc.identifier.orcid	0009-0000-1355-9178
dc.identifier.orcid	0000-0002-4123-604X
dc.identifier.uri	https://ela.kpi.ua/handle/123456789/65229
dc.language.iso	en
dc.publisher	National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
dc.publisher.place	Kyiv
dc.relation.ispartof	Information Technology and Security, Vol. 11, Iss. 2 (21)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	message classification
dc.subject	neural networks
dc.subject	natural language processing
dc.subject	spam filtering
dc.subject	text vectorization
dc.subject	email classification
dc.subject	text analysis
dc.subject	model quality evaluation
dc.subject	класифікація повідомлень
dc.subject	нейронні мережі
dc.subject	оброблення природної мови
dc.subject	фільтрація спаму
dc.subject	векторизація тексту
dc.subject	аналіз тексту
dc.subject	оцінювання якості моделі
dc.subject.udc	004.032.26
dc.title	Analysis of methods of classification of electronic messages based on neural network models
dc.title.alternative	Аналіз методів класифікації електронних повідомлень на основі моделей нейронних мереж
dc.type	Article
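
The abstract in this record mentions an 80/20 train/test split, class balancing by upsampling, and evaluation with precision and F1-score. The sketch below illustrates those three steps in plain Python on toy labels. It is not the authors' implementation: the function names, the "spam"/"ham" labels, the stratified split, and the choice to balance only the training part are all assumptions made for illustration.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split example indices into train/test, keeping class proportions (assumed design)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def upsample_minority(indices, labels, seed=0):
    """Randomly duplicate examples of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        # Draw the missing examples with replacement (zero draws for the largest class).
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy imbalanced data set: spam is rarer than legitimate ("ham") messages.
labels = ["spam"] * 20 + ["ham"] * 80
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
balanced_train = upsample_minority(train_idx, labels)
print(Counter(labels[i] for i in balanced_train))

# Hypothetical predictions, used only to exercise the metric function.
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(precision_recall_f1(y_true, y_pred))
```

Balancing is applied here only after the split, so that duplicated spam examples cannot leak into the test part; whether the article follows the same order is not stated in the abstract.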

Files

Name: 293797-681434-1-10-20231228.pdf
Size: 396.06 KB
Format: Adobe Portable Document Format

Collection

Information Technology and Security, Vol. 11, Iss. 2 (21)