Information Technology for Processing Natural Language Texts Based on the Integrational Approach

Сергеєв, Данило Сергійович

dc.contributor.author	Сергеєв, Данило Сергійович
dc.date.accessioned	2019-09-13T14:52:53Z
dc.date.available	2019-09-13T14:52:53Z
dc.date.issued	2019
dc.description.abstracten	In recent years, research in natural language processing (NLP) has achieved significant practical results, including natural-language voice user interfaces for mobile devices and substantial progress in machine translation, handwriting recognition and voice recognition. At the same time, improving the performance of these systems remains a relevant task. This study develops an information technology for processing natural language texts based on the integrational approach, aimed at increasing the efficiency of natural language processing technologies. The subject of the research is models, methods, algorithms and information technologies for natural language processing. An analysis of current problems in the field shows that applied natural language processing technologies succeed at their intended specific tasks, but that there is room for improvement in solving complex problems, in particular machine translation and natural language search. Knowledge bases are identified as a necessary component of natural language processing technologies, enabling the interaction of different systems. Existing approaches to developing natural-language knowledge bases are characterized and analyzed. The conclusion is that the existing technologies behind natural-language knowledge bases can individually achieve high levels of completeness, consistency and flexibility for practical purposes, but no single technology combines high scores on all of these qualities. A formal model of knowledge representation for a natural-language knowledge base is created, including models of its main elements: the quantum of knowledge, which is the smallest element of knowledge, and the relation objects that describe connections between quanta of knowledge.
A method for processing natural language texts based on this knowledge model is developed, including procedures for using the information technology in applied natural language processing problems. On the basis of the created models and the method, procedures for writing and searching natural language knowledge are developed for natural language processing technologies; they make it possible to establish links at the structural level between the syntactic structure of the text and an arbitrary metadata structure. It is theoretically shown that the complexity of natural language search using the developed procedures does not exceed that of its analogues and is on average lower than that of the analogues for complex search queries. Examples are provided of using the developed information technology in practical problems, namely natural language search and machine translation. Writing and searching methods are created based on the knowledge representation model, making it possible to establish links at the structural level between the syntactic structure of the text and an arbitrary metadata structure in natural language processing technologies. An information technology for processing natural language texts based on the integrational approach is developed, for which it is theoretically proven that the search complexity does not exceed that of the existing alternatives and is on average 5-12% lower for complex search queries. The subsystems and operations of such a system are defined, and a database schema is developed. The computational complexity of natural language knowledge search in the information system is analyzed and compared with the existing alternatives. Experimental testing of the information system is conducted and the acquired data are analyzed, demonstrating increased relevance of natural language search results.
Within this work, an information technology for processing natural language texts has been developed on the basis of the integrational approach. Experimental measurements of the relevance of natural language search results show that the developed information technology can increase the relevance of search results. Specifically, relevance increased by 14% on average across the whole set of experimental queries and search results, with no significant increase detected for the top quartile of results sorted by original relevance and a major increase detected for the lowest quartile of original results. The information technology can be used to improve the performance of various natural language processing technologies, in particular natural language search systems, machine translation systems and natural language user interfaces.	uk
dc.description.abstractru	The dissertation is devoted to solving the relevant scientific and technical problem of developing an information technology for processing natural language texts based on the integrational approach. Based on an analysis of current problems in natural language processing, it is shown that applied natural language processing technologies accomplish their assigned tasks but can be improved for solving complex problems, in particular machine translation and natural language search. To this end, a formal model of knowledge representation in a natural-language knowledge base is created, together with models of its main elements, namely the quantum of knowledge, or the smallest element of knowledge, and the relation, which describes connections between quanta of knowledge. A method for processing natural language texts based on the proposed model is developed. On the basis of the created models and method, procedures for writing and searching natural language knowledge are developed for natural language processing technologies; they make it possible to establish links at the structural level between the syntactic structure of a text and an arbitrary metadata structure. It is theoretically shown that the complexity of natural language search using the developed procedures does not exceed that of its analogues and is on average lower than that of the analogues for complex search queries. Within the work, an information technology for processing natural language texts based on the integrational approach is developed, and it is shown experimentally that its use increases the average relevance of natural language search by 14%.	uk
dc.description.abstractuk	The dissertation is devoted to solving the relevant scientific and technical problem of developing an information technology for processing natural language texts based on the integrational approach. Based on an analysis of current problems in natural language processing, it is shown that applied natural language processing technologies accomplish their assigned tasks but can be improved for solving complex problems, in particular machine translation and natural language search. To this end, a formal model of knowledge representation in a natural-language knowledge base is created, together with models of its main elements, which are the quantum of knowledge, or the smallest element of knowledge, and the relation, which describes the connection between quanta of knowledge. A method for processing natural language texts based on the proposed model is developed. On the basis of the created models and method, procedures for writing and searching natural language knowledge are developed for natural language processing technologies; they make it possible to establish links at the structural level between the syntactic structure of a text and an arbitrary metadata structure. It is theoretically shown that the complexity of natural language search using the developed procedures does not exceed that of its analogues and is on average lower than that of the analogues for complex search queries. Within the work, an information technology for processing natural language texts based on the integrational approach is developed, and it is shown experimentally that its use increases the average relevance of natural language search by 14%.	uk
dc.format.page	23 pp.	uk
dc.identifier.citation	Сергеєв, Д. С. Інформаційна технологія обробки природномовних текстів на основі інтеграційного підходу : автореф. дис. … канд. техн. наук. : 05.13.06 – інформаційні технології технічні науки / Сергеєв Данило Сергійович. – Київ, 2019. – 23 с.	uk
dc.identifier.uri	https://ela.kpi.ua/handle/123456789/29285
dc.language.iso	uk	uk
dc.publisher	КПІ ім. Ігоря Сікорського	uk
dc.publisher.place	Київ	uk
dc.subject	інформаційна технологія	uk
dc.subject	природна мова	uk
dc.subject	обробка природної мови	uk
dc.subject	інтеграційний підхід	uk
dc.subject	база знань	uk
dc.subject	квант знань	uk
dc.subject	пошук	uk
dc.subject	машинний переклад	uk
dc.subject	information technology	uk
dc.subject	natural language	uk
dc.subject	natural language processing	uk
dc.subject	integrational approach	uk
dc.subject	knowledge base	uk
dc.subject	quantum of knowledge	uk
dc.subject	search	uk
dc.subject	machine translation	uk
dc.subject	информационная технология	uk
dc.subject	естественный язык	uk
dc.subject	обработка естественного языка	uk
dc.subject	интеграционный подход	uk
dc.subject	база знаний	uk
dc.subject	квант знаний	uk
dc.subject	поиск	uk
dc.subject	машинный перевод	uk
dc.subject.udc	004.93	uk
dc.title	Information Technology for Processing Natural Language Texts Based on the Integrational Approach	uk
dc.type	Thesis	uk

Files

File bundle

Name: Sergeiev_aref.pdf
Size: 567.5 KB
Format: Adobe Portable Document Format

Collections

Dissertation abstracts (КТК)
Dissertation abstracts (open access)