Information technology for building distributed hybrid data warehouses

Яцишин, Андрій Юрійович

dc.contributor.author	Яцишин, Андрій Юрійович
dc.contributor.degreedepartment	Department of Automated Information Processing and Control Systems	uk
dc.contributor.degreefaculty	Faculty of Informatics and Computer Engineering	uk
dc.contributor.degreegrantor	National Technical University of Ukraine "Kyiv Polytechnic Institute"	uk
dc.date.accessioned	2016-05-20T11:33:20Z
dc.date.available	2016-05-20T11:33:20Z
dc.date.issued	2016
dc.description.abstracten	This thesis is devoted to the development of methods and tools for designing hybrid distributed data warehouses. An analysis of existing research on this problem has shown a lack of approaches that combine logical and physical data distribution in the data warehouse based on both data characteristics and query statistics. The thesis introduces the concept of multibase data warehouses, together with their models and inter-level transitions. Logical data distribution is based on data structuredness and serves to choose the models used to represent the source data. Physical data distribution combines the placement of data among nodes with the routing of data replication; both are determined by the minimal cost criterion. A multibase data warehouse consists of three levels: the data level, the node level and the data store level. Data may be kept in data stores, which are subject to storage optimization, and in data sources, which hold the data loaded into the data stores and provide an extra level of redundancy that protects the warehouse from data retrieval failures. Nodes are classified into central nodes, which hold geo-independent data, and regional nodes, which store geo-dependent data. The data stores in a node are classified into primary stores, used for read and write access, and secondary stores, used for read-only access. Based on the concept of multibase data warehouses, such warehouses are built by applying an adaptive technology of hybrid distributed data warehousing. The process starts with source analysis and continues with the conceptual, logical and physical design phases. The conceptual design phase is based on the E/R model, extended with “split entity” and “merge entity” operations to support partly-structured data. 
To categorize the data into areas, the relational model has been extended with the “linked set building” operation. The logical design phase determines the logical model, which describes the correspondence of entities to data store elements and of links to data store relations. To decide which data store model to use, the structuredness of the data is analyzed, and the data are classified as structured, semi-structured or partly-structured. The storage models are chosen based on the structuredness class and the minimal cost criterion. To load the data properly, procedures are described for converting data from the source models to the warehouse model and vice versa. The physical design phase distributes data among the nodes and routes data replication, also based on the minimal cost criterion. The minimal cost criterion used in this thesis accounts for the costs of data placement, processing and replication. The data placement and replication configuration that yields the minimal cost is found using a modified genetic algorithm. This algorithm uses block-based crossover and mutation with ad hoc adjusted probabilities, as well as a preliminary stage that forms the initial population with the artificial bee colony (ABC) method, which brings the search closer to the sought solution and reduces the overall time needed to find it. An information technology for building distributed hybrid data warehouses that uses the proposed models and methods has been developed. To implement this technology in the public finance of Ukraine, a generic public finance management system is proposed. It has data storage, processing and delivery levels. The multibase data warehouse and its data access services are used for data storage. 
The data can be accessed using standard SQL notation regardless of the node where they are stored and the model in which they are represented. The distributed data services provide access with acceptable connection latency regardless of user location, which helps keep query processing time within the preset limit. Data processing is performed by applications that use the data in the warehouse. Different applications may use the same or different data depending on the business processes, and the data warehouse adapts to the resulting data usage. Because existing applications and new developments are diverse, the data are delivered through Web services and sites as well as non-Web thin and thick clients. The examples of the Public Finance Management System, the “Transparent Budget” information system, and several e-learning and scientific resources demonstrate the efficiency of the information technology through increased query performance and a decreased total cost of storing and processing data in the warehouse. The software that implements this information technology is the Multibase Data Warehouse Management System (MDWMS), which consists of querying agents, database management agents, data placement and data warehouse description modules, the MDWMS Configurator and client applications. These modules are designed to manage databases according to the proposed method and to support the environment for user and application access. User applications connect to the data through the software interface (MDWMS API), and end users use MDW Client for direct data access. With such access, user applications can use the data regardless of their location and the data model in which they are stored. Statistics are checked on a regular basis to determine whether the data warehouse performance and reliability requirements are met. 
If these requirements are not met, the data warehouse is rebuilt based on the current data in the warehouse and data sources and on query statistics, so that the set requirements are met once the rebuild is complete.	uk
dc.description.abstractru	The thesis solves the relevant scientific and practical problem of building distributed hybrid data warehouses, taking into account data properties and the statistics of queries executed against the warehouse. The problem of building distributed hybrid data warehouses is analyzed, and the relevance of solving it is substantiated. Requirements for an information technology for building distributed hybrid data warehouses are defined. The concept of multibase data warehouses is introduced, and conceptual, logical and physical models of such warehouses, together with inter-level transition procedures, are developed. Data integration into the warehouse is described by means of procedures for converting data and operations and for choosing the models used to represent the data. The placement of data among the warehouse nodes and the data replication route are determined using a cost criterion and a modified genetic algorithm. Based on the proposed models and methods, an information technology for building distributed hybrid data warehouses has been developed, which solves the stated scientific problem. This technology has been applied in the development of information and information-analytical systems for the Ministry of Finance of Ukraine. The implementation results confirmed that it meets the specified requirements.	uk
dc.description.abstractuk	The thesis solves the relevant scientific and practical problem of creating an information technology for building distributed hybrid data warehouses, taking into account data properties and the statistics of queries executed against the warehouse. The problem of building data warehouses with regard to data properties and executed queries is analyzed, and the relevance of solving it is substantiated. Requirements for an information technology for building distributed hybrid warehouses are defined. The concept of multibase data warehouses is introduced, and conceptual, logical and physical models of such warehouses, together with inter-level transition procedures, are developed. Data integration into the warehouse is described by means of procedures for converting data elements and operations and for choosing data representation models. Data placement across nodes and data replication routes are determined by the criterion of minimal total cost of storing and processing data, using a modified genetic algorithm. Based on the proposed models and methods, an information technology for building distributed hybrid warehouses has been created, which solves the stated scientific problem. This technology has been applied in the development of information and information-analytical systems of the Ministry of Finance of Ukraine. The implementation results confirmed its compliance with the stated requirements.	uk
dc.format.page	21 pp.	uk
dc.identifier.citation	Яцишин А.Ю. Інформаційна технологія побудови розподілених сховищ даних гібридного типу : автореф. дис. ... канд. техн. наук. : 05.13.06 – інформаційні технології / Андрій Юрійович Яцишин. – Київ, 2016. - 21 с.	uk
dc.identifier.uri	https://ela.kpi.ua/handle/123456789/15906
dc.language.iso	uk	uk
dc.publisher	NTUU "KPI"	uk
dc.publisher.place	Kyiv	uk
dc.status.pub	published	uk
dc.subject	сховища даних	uk
dc.subject	проектування	uk
dc.subject	оптимізація	uk
dc.subject	інтеграція	uk
dc.subject	структурованість даних	uk
dc.subject	генетичні алгоритми	uk
dc.subject	data warehousing	en
dc.subject	design	en
dc.subject	optimization	en
dc.subject	integration	en
dc.subject	data structuredness	en
dc.subject	genetic algorithms	en
dc.subject	хранилища данных	ru
dc.subject	проектирование	ru
dc.subject	оптимизация	ru
dc.subject	интеграция	ru
dc.subject	структурированность данных	ru
dc.subject	генетические алгоритмы	ru
dc.subject.udc	[004.65:004.415.2](043)	uk
dc.title	Information technology for building distributed hybrid data warehouses	uk
dc.type	Thesis	uk
thesis.degree.level	candidate	uk
thesis.degree.name	Candidate of Technical Sciences	uk
thesis.degree.speciality	05.13.06 – information technologies	uk

Files

Name: Yatsyshyn_aref.pdf
Size: 463.98 KB
Format: Adobe Portable Document Format

Collections

Dissertation abstracts (АСОІУ)
Dissertation abstracts (open access)