Document Summarization Methods Based on Transformer Models

Новицький, Костянтин Віталійович

dc.contributor.advisor	Шушура, Олексій Миколайович
dc.contributor.author	Новицький, Костянтин Віталійович
dc.date.accessioned	2026-06-04T11:36:38Z
dc.date.available	2026-06-04T11:36:38Z
dc.date.issued	2026
dc.description.abstract	The thesis comprises 109 pages and contains 19 figures, 5 tables, 2 appendices, and 43 references. As the volume of textual information grows rapidly, methods for automatically summarizing document content are becoming increasingly important. This applies to scientific articles, government materials, and other long texts whose length exceeds the context window of many transformer models. Under such conditions, direct summarization often leads to loss of important information, reduced coherence, and lower quality of the final text. Developing methods aimed specifically at processing long documents is therefore a pressing task in modern natural language processing. The aim of the research is to improve the quality of long-document summarization by developing a hybrid transformer-based method that uses structural-semantic document segmentation, clustering of text blocks, extraction of key statements, local abstractive summarization, and subsequent global summarization. The object of the research is the process of automatic summarization of text documents. The subject of the research is methods and information technologies for automatic document summarization based on transformer models. The work uses methods of natural language processing, semantic vector representation of text, clustering, and generative summarization based on transformer models, as well as experimental comparison by ROUGE, BERTScore, and execution time. The practical implementation is a software system with a summarization module in Python, a server side in ASP.NET Core, and a client side in Angular.
The scientific novelty lies in a multi-stage method for automatic summarization of long documents that combines structural-semantic document segmentation, selection of informative text blocks, clustering of semantically close fragments, extraction of key statements, local summarization, and subsequent global summarization. The practical significance of the work lies in the developed document summarization software system, which supports the full cycle of user interaction: authentication, document upload, launching summary generation, and viewing the result through a web interface. On the BookSum sample, the proposed approach showed an increase in BERTScore of approximately 13% and a reduction in processing time of approximately 48% compared with using the model.
dc.description.abstractother	The thesis consists of 109 pages and includes 19 figures, 5 tables, 2 appendices, and 43 references. As the volume of textual information grows rapidly, methods for automatic summarization of document content are becoming increasingly important. This applies to scientific articles, government materials, and other long texts whose length exceeds the context window of many transformer models. Under such conditions, direct summarization often leads to loss of important information, reduced coherence, and lower quality of the final summary. The development of methods aimed specifically at processing long documents is therefore a pressing task in modern natural language processing. The purpose of the research is to improve the quality of long-document summarization by developing a hybrid method based on transformer models that uses structural-semantic document segmentation, clustering of text blocks, extraction of key statements, local abstractive summarization, and subsequent global summarization. The objectives of the work are to analyze existing approaches to document summarization; to develop a hybrid summarization method; to select tools and develop software implementing the method; and to compare the developed method experimentally with existing approaches to document summarization. The object of the research is the process of automatic summarization of text documents. The subject of the research is methods and information technologies for automatic document summarization based on transformer models. Methods and tools: the work uses methods of natural language processing, semantic vector representation of text, clustering, and generative summarization based on transformer models, as well as experimental comparison using ROUGE, BERTScore, and execution time.
The practical implementation is a software system with a summarization module in Python, a server-side component in ASP.NET Core, and a client-side component in Angular. The scientific novelty lies in a multi-stage method for automatic summarization of long documents that combines structural-semantic document segmentation, selection of informative text blocks, clustering of semantically related fragments, extraction of key statements, local summarization, and subsequent global summarization. Unlike single-level approaches, this method better preserves the content of a document when working with long texts and reduces the load on the final generative model. The practical significance of the work lies in the developed document summarization software system, which supports the full cycle of user interaction: authentication, document upload, summary generation, and viewing the result through a web interface. The practical value of the system is confirmed experimentally: it reduces the time required to generate a summary and improves the quality of the result. On the BookSum dataset, the proposed approach showed an approximately 13% increase in BERTScore and an approximately 48% reduction in processing time compared with using the model. Approbation of the research results: the main results of the work are presented in an article published in the first 2026 issue of the professional journal “Visnyk of Kherson National Technical University” and were also presented at the XXIII International Scientific and Practical Conference of Young Scientists and Students “Modern Problems of Scientific Support of Energy: Safety, Sustainability, IT and Environmental Monitoring (Dedicated to the 40th Anniversary of the Chornobyl Disaster)”.
dc.format.extent	109 pp.
dc.identifier.citation	Новицький, К. В. Методи резюмування документів на основі моделей-трансформерів : магістерська дис. : 122 Комп’ютерні науки / Новицький Костянтин Віталійович. – Київ, 2026. – 109 с.
dc.identifier.uri	https://ela.kpi.ua/handle/123456789/81472
dc.language.iso	uk
dc.publisher	КПІ ім. Ігоря Сікорського
dc.publisher.place	Київ
dc.subject	резюмування тексту
dc.subject	обробка природної мови
dc.subject	моделі-трансформери
dc.subject	Sentence-BERT
dc.subject	Qwen
dc.subject	автоматичне резюмування
dc.subject	програмна система
dc.subject	text summarization
dc.subject	natural language processing
dc.subject	transformer models
dc.subject	automatic summarization
dc.subject	software system
dc.title	Document Summarization Methods Based on Transformer Models
dc.type	Master Thesis

Files

File bundle

Name: Novytskyi_mahistr.pdf
Size: 2.84 MB
Format: Adobe Portable Document Format

Collections

Master's Theses (ЦТЕ)
Master's Theses