A Formal Model for Constructing Sensitive Data Graphs from Cyber Reports using Large Language Models

dc.contributor.author: Turskyi, Viktor
dc.date.accessioned: 2026-02-26T09:54:36Z
dc.date.available: 2026-02-26T09:54:36Z
dc.date.issued: 2025
dc.description.abstract: Unstructured cyber threat intelligence (CTI) reports present major challenges for systematic analysis, particularly when accuracy and reliability are critical. This paper introduces a formal, four-stage mathematical model for constructing canonical knowledge graphs from sensitive textual data. The model integrates the advanced extraction and reasoning capabilities of GPT-5 with deterministic rule-based inference and network analysis to bridge the “formalization gap” between probabilistic large language model (LLM) outputs and verifiable analytical structures. Using a corpus of 204 official CERT-UA incident reports as a test case, the methodology successfully normalized thousands of raw entities, identified central threat actors and high-value targets, and revealed distinct operational ecosystems within Ukraine’s cyber threat landscape. Theoretically, the study contributes a replicable and mathematically defined framework for integrating next-generation LLMs into formalized knowledge graph pipelines. Practically, it provides a scalable and reliable tool for analysts in cybersecurity, national security, and related fields, enabling the transformation of unstructured reports into actionable intelligence.
dc.format.pagerange: P. 98-107
dc.identifier.citation: Turskyi, V. A Formal Model for Constructing Sensitive Data Graphs from Cyber Reports using Large Language Models / Viktor Turskyi // Theoretical and Applied Cybersecurity: scientific journal. – 2025. – Vol. 7, No. 2. – P. 98-107. – Bibliogr.: 10 ref.
dc.identifier.doi: https://doi.org/10.20535/tacs.2664-29132025.2.338785
dc.identifier.uri: https://ela.kpi.ua/handle/123456789/79076
dc.language.iso: en
dc.publisher: Igor Sikorsky Kyiv Polytechnic Institute
dc.publisher.place: Kyiv
dc.relation.ispartof: Theoretical and Applied Cybersecurity: scientific journal, Vol. 7, No. 2
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/deed.uk
dc.subject: Large Language Models (LLM)
dc.subject: Cyber Threat Intelligence (CTI)
dc.subject: Sensitive Data Analysis
dc.subject: Network Analysis
dc.subject: Entity Resolution
dc.subject: CERT-UA
dc.subject.udc: 004.89
dc.title: A Formal Model for Constructing Sensitive Data Graphs from Cyber Reports using Large Language Models
dc.type: Article

Files

Name: 98-107.pdf
Size: 9.5 MB
Format: Adobe Portable Document Format