A Formal Model for Constructing Sensitive Data Graphs from Cyber Reports using Large Language Models
Вантажиться...
Файли
Дата
2025
Автори
Науковий керівник
Назва журналу
Номер ISSN
Назва тому
Видавець
Igor Sikorsky Kyiv Polytechnic Institute
Анотація
Unstructured cyber threat intelligence (CTI) reports present major challenges for systematic analysis, particularly when accuracy and reliability are critical. This paper introduces a formal, four-stage mathematical model for constructing canonical knowledge graphs from sensitive textual data. The model integrates the advanced extraction and reasoning capabilities of GPT-5 with deterministic rule-based inference and network analysis to bridge the “formalization gap” between probabilistic large language model (LLM) outputs and verifiable analytical structures. Using a corpus of 204 official CERT-UA incident reports as a test case, the methodology successfully normalized thousands of raw entities, identified central threat actors and high-value targets, and revealed distinct operational ecosystems within Ukraine’s cyber threat landscape. Theoretically, the study contributes a replicable and mathematically defined framework for integrating next-generation LLMs into formalized knowledge graph pipelines. Practically, it provides a scalable and reliable tool for analysts in cybersecurity, national security, and related fields, enabling the transformation of unstructured reports into actionable intelligence
Опис
Ключові слова
Large Language Models (LLM), Cyber Threat Intelligence (CTI), Sensitive Data Analysis, Network Analysis, Entity Resolution, CERT-UA
Бібліографічний опис
Turskyi, V. A Formal Model for Constructing Sensitive Data Graphs from Cyber Reports using Large Language Models / Viktor Turskyi // Theoretical and Applied Cybersecurity: scientific journal. – 2025. – Vol. 7, No. 2. – P. 98-107. – Bibliogr.: 10 ref.