A Formal Model for Constructing Sensitive Data Graphs from Cyber Reports using Large Language Models

Date

2025

Publisher

Igor Sikorsky Kyiv Polytechnic Institute

Abstract

Unstructured cyber threat intelligence (CTI) reports present major challenges for systematic analysis, particularly when accuracy and reliability are critical. This paper introduces a formal, four-stage mathematical model for constructing canonical knowledge graphs from sensitive textual data. The model integrates the extraction and reasoning capabilities of GPT-5 with deterministic rule-based inference and network analysis to bridge the “formalization gap” between probabilistic large language model (LLM) outputs and verifiable analytical structures. Using a corpus of 204 official CERT-UA incident reports as a test case, the methodology normalized thousands of raw entities, identified central threat actors and high-value targets, and revealed distinct operational ecosystems within Ukraine’s cyber threat landscape. Theoretically, the study contributes a replicable and mathematically defined framework for integrating next-generation LLMs into formalized knowledge graph pipelines. Practically, it provides a scalable and reliable tool for analysts in cybersecurity, national security, and related fields, enabling the transformation of unstructured reports into actionable intelligence.
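The abstract does not spell out the four stages, so the following is only a minimal sketch of the general idea of deterministic entity normalization followed by network analysis, not the paper's model. The alias table, the triple format, the function names, and the choice of degree centrality are all assumptions made for illustration, and the data are placeholders rather than CERT-UA content.

```python
from collections import defaultdict


def canonicalize(name, aliases):
    """Map a raw entity mention to a canonical form (case- and whitespace-insensitive)."""
    key = " ".join(name.lower().split())
    return aliases.get(key, key)


def build_graph(triples, aliases):
    """Build an undirected adjacency map from (source, relation, target) triples."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        s, t = canonicalize(source, aliases), canonicalize(target, aliases)
        if s != t:  # skip self-loops created by merging aliases
            graph[s].add(t)
            graph[t].add(s)
    return dict(graph)


def degree_centrality(graph):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical alias table and triples (placeholders, assumed to come from an
# earlier LLM extraction step that is not shown here).
ALIASES = {"actor a": "actor-a", "group-a": "actor-a"}
TRIPLES = [
    ("Actor A", "targets", "org-x"),
    ("GROUP-A", "targets", "org-y"),
    ("actor-b", "targets", "org-x"),
]

GRAPH = build_graph(TRIPLES, ALIASES)
CENTRALITY = degree_centrality(GRAPH)
```

In this toy input the two spellings "Actor A" and "GROUP-A" collapse into one node, so its degree reflects both reported targets. Keeping normalization as a plain lookup makes that step reproducible regardless of how the raw mentions were extracted.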

Keywords

Large Language Models (LLM), Cyber Threat Intelligence (CTI), Sensitive Data Analysis, Network Analysis, Entity Resolution, CERT-UA

Bibliographic description

Turskyi, V. A Formal Model for Constructing Sensitive Data Graphs from Cyber Reports using Large Language Models / Viktor Turskyi // Theoretical and Applied Cybersecurity: scientific journal. – 2025. – Vol. 7, No. 2. – P. 98-107. – Bibliogr.: 10 ref.
