Submitted:
04 January 2026
Posted:
06 January 2026
Abstract
Keywords:
I. Introduction
II. Literature Review and Research Gaps
2.1. Computational Content Analysis and ‘Text as Data’
2.2. Topic Models and Structured Discourse Analysis
2.3. Semantic Embeddings and Discourse Drift
2.4. Research Gaps
III. Research Questions and Hypotheses
IV. Data and Research Methodology
4.1. Data Sources
4.2. Analytical Units and Quasi-Temporal Construction
4.3. Methodological Combination
V. Empirical Findings
5.1. Descriptive Findings
5.2. Lagging Governance Pattern
5.3. Discourse Drift and Transitions
5.4. Formation of the Institutional Core
VI. Discussion: Institutionalised Absorption Discourse Model (IADM)
VII. Conclusions
Appendices
Appendix A: Variable Dictionary and Codebook (Codebook + Measures)
| Frame type | Core word family | Synonymy and Extension Rules |
| Technological disruption (AI) | ai, artificial intelligence, autonomous | machine learning, deep learning, algorithmic, agent, automation |
| Cyber intelligence (CyINT) | cyber, CyINT, cyber intelligence | cybersecurity, intrusion, malware, phishing |
| Threat Framework | threat, risk, security | adversary, danger, vulnerability, attack |
| Governance and Accountability | governance, accountability | oversight, responsibility, audit, compliance, ethics |
| Education and Institutionalisation | education, discipline, faculty | curriculum, training, institution, professionalization |
| Measurement and Performance | measure, effectiveness, accuracy | assess, evaluate, benchmark, probability, forecast |
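The codebook rows above can be read as a lexicon that maps each frame to its core and extension terms. The following is a minimal sketch of that reading, not the authors' implementation: the names `FRAME_LEXICON`, `compile_frame` and `count_frame_hits` are hypothetical, only two frames are encoded, and matching is assumed to be case-insensitive on exact word forms (no lemmatization, so plurals such as "threats" are not counted).

```python
import re
from collections import Counter

# Hypothetical encoding of two codebook rows: core word family plus extension terms.
FRAME_LEXICON = {
    "AI": ["ai", "artificial intelligence", "autonomous", "machine learning",
           "deep learning", "algorithmic", "agent", "automation"],
    "Threat": ["threat", "risk", "security", "adversary", "danger",
               "vulnerability", "attack"],
}

def compile_frame(terms):
    """Build one whole-word regex for a frame's terms."""
    # Longest terms first, so multi-word phrases are tried before shorter terms.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, flags=re.IGNORECASE)

def count_frame_hits(text, lexicon=FRAME_LEXICON):
    """Count non-overlapping term matches per frame (assumed matching rule)."""
    hits = Counter()
    for frame, terms in lexicon.items():
        hits[frame] = len(compile_frame(terms).findall(text))
    return hits

if __name__ == "__main__":
    sample = ("Artificial intelligence and machine learning create "
              "a new threat and a security risk.")
    print(count_frame_hits(sample))
```

Whole-word matching keeps the short term "ai" from firing inside words such as "maintain"; whether the original study handled this the same way is not stated in the codebook.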
| Variable name | Variable type | Calculation method | Note |
| AI_Intensity(t) | Continuous variable | AI word family frequency / number of words × 1000 | Indicates the strength of AI technology's impact |
| Cyber_Intensity(t) | Continuous variable | CyINT word family frequency / number of words × 1000 | Indicates the severity of cyber threats |
| Threat_Frame(t) | Continuous variable | Threat word family density | Risk perception in discourse |
| Governance_Density(t) | Continuous variable | Governance lexical density | Institutional and accountability responses |
| Education_Density(t) | Continuous variable | Education lexical density | Institutionalisation and disciplinary construction |
| Measurement_Signal(t) | Continuous variable | Measurement lexical density | Verifiable output |
| Institutionalization_Index(t) | Composite index | (Governance + Education + Measurement) / 3 | Institutionalisation level |
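The calculation methods in the table reduce to two small formulas, sketched here in Python. The per-1000-word normalisation is stated for the AI and CyINT variables; applying the same normalisation to the three density variables is an assumption of this sketch, and the function names and the example counts are hypothetical.

```python
def per_thousand(term_hits, n_words):
    """Word family frequency / number of words x 1000."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 1000 * term_hits / n_words

def institutionalization_index(governance, education, measurement):
    """Equal-weight composite: (Governance + Education + Measurement) / 3."""
    return (governance + education + measurement) / 3

if __name__ == "__main__":
    # Hypothetical counts for one discourse unit of 2000 words.
    n_words = 2000
    governance = per_thousand(6, n_words)
    education = per_thousand(4, n_words)
    measurement = per_thousand(2, n_words)
    print(governance, education, measurement)
    print(institutionalization_index(governance, education, measurement))
```

Because the composite is a plain mean, it is only meaningful when the three inputs share a scale, which is why the sketch normalises all of them the same way.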
Appendix B: Reproducible Workflow and Robustness Testing
| Check type | Procedure | Robustness criterion |
| Topic stability | Vary K from 6 to 9 and compare topic keyword overlap | Topics remain consistent |
| Lemmatization robustness | Enable/disable lemmatization | The basic ordering of variables remains unchanged |
| Drift threshold robustness | Similarity threshold 0.05–0.15 | High overlap rate of transition points |
| Embedding model robustness | TF-IDF+SVD vs SBERT | Drift direction is consistent |
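One way to run the drift threshold check is sketched below. It rests on several assumptions that the table does not confirm: the 0.05–0.15 range is read as a quantile of the adjacent-unit similarity series, a transition point is any position whose similarity is at or below that quantile, and the overlap rate is the Jaccard overlap of two transition sets. All function names are hypothetical.

```python
import math

def quantile_cutoff(values, q):
    """Nearest-rank quantile: the value at rank ceil(q * n) in sorted order."""
    ordered = sorted(values)
    # round() guards against floating-point noise such as 3.0000000000000004
    rank = max(1, math.ceil(round(q * len(ordered), 9)))
    return ordered[rank - 1]

def transition_points(similarities, q):
    """Positions whose adjacent-unit similarity is at or below the q-quantile."""
    cutoff = quantile_cutoff(similarities, q)
    return {i for i, s in enumerate(similarities) if s <= cutoff}

def overlap_rate(a, b):
    """Jaccard overlap of two transition sets (an assumed definition)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    # Hypothetical similarity series for 20 adjacent-unit pairs.
    sims = [0.50 + 0.02 * i for i in range(20)]
    baseline = transition_points(sims, 0.10)
    for q in (0.05, 0.10, 0.15):
        print(q, overlap_rate(baseline, transition_points(sims, q)))
```

With a short series the Jaccard overlap moves in coarse steps, so the criterion is easier to interpret on longer documents.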
Appendix C: GraphRAG Graph Structure and Retrieval Rules
| Element type | Name | Note |
| Node | DiscourseUnit | Text page or paragraph |
| Node | Topic | Topic model output |
| Node | Frame | Discourse frame |
| Node | Stance | Position and normativity |
| Edge | Unit→Topic | Topic weight |
| Edge | Unit→Frame | Frame density |
| Edge | Unit(t)→Unit(t+1) | Semantic drift |
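The node and edge types in the table can be held in a small in-memory graph, sketched below under stated assumptions. The class `DiscourseGraph`, the relation labels `HAS_TOPIC`, `HAS_FRAME` and `DRIFTS_TO`, and the input layout are all hypothetical. The drift edge stores the cosine similarity of adjacent unit vectors, so low weights indicate strong drift. Stance nodes are left out because the table lists no edge for them.

```python
import math
from dataclasses import dataclass, field

@dataclass
class DiscourseGraph:
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source, target, relation, weight)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, target, relation, weight):
        self.edges.append((source, target, relation, weight))

    def neighbours(self, node_id, relation):
        """Targets and weights reachable from node_id via one relation."""
        return [(t, w) for s, t, r, w in self.edges
                if s == node_id and r == relation]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(units):
    """units: ordered list of dicts with 'id', 'topics', 'frames', 'vector'."""
    graph = DiscourseGraph()
    for unit in units:
        graph.add_node(unit["id"], "DiscourseUnit")
        for topic, weight in unit["topics"].items():
            graph.add_node("topic:" + topic, "Topic")
            graph.add_edge(unit["id"], "topic:" + topic, "HAS_TOPIC", weight)
        for frame, density in unit["frames"].items():
            graph.add_node("frame:" + frame, "Frame")
            graph.add_edge(unit["id"], "frame:" + frame, "HAS_FRAME", density)
    # Unit(t) -> Unit(t+1): similarity of adjacent units in document order.
    for prev, nxt in zip(units, units[1:]):
        graph.add_edge(prev["id"], nxt["id"], "DRIFTS_TO",
                       cosine(prev["vector"], nxt["vector"]))
    return graph
```

A retrieval step could then start from a unit and follow `HAS_FRAME` or `DRIFTS_TO` edges through `neighbours`; the appendix does not spell out its retrieval rules, so none are encoded here.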
Appendix D: Measurement Validity and Robustness (Measures Table)
| Construct | Variable name | Operational definition | Scale | Model role | Validity statement |
| Technological disruption (AI) | AI_Intensity | Standardised density of AI-related word families in text (per 1000 words) | Continuous variable | Exogenous variable | Content validity: based on clearly defined AI terminology; construct validity: related to governance variables |
| Impact of Internet Technology | Cyber_Intensity | Cyber/CyINT lexical density | Continuous variable | Exogenous variable | Content validity: covers critical infrastructure and threat terminology |
| Threat Framework | Threat_Frame | threat/risk/security lexical density | Continuous variable | Mediating variable | Construct validity: significantly correlated with technological disruption |
| Governance and Accountability | Governance_Density | governance/accountability lexical density | Continuous variable | Mediating variable | Discriminant validity: distinct from the threat and technology constructs |
| Institutionalisation of education | Education_Density | education/discipline lexical density | Continuous variable | Outcome variable | Content validity: directly reflects institutionalised semantics |
| Measurability | Measurement_Signal | measure/effectiveness lexical density | Continuous variable | Outcome variable | External validity: corresponds to verifiable research designs |
| Institutionalisation Index | Institutionalization_Index | Standardised combination of governance, education and measurement | Index | Outcome variable | Robustness: PCA weights are consistent with equal weights |
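The robustness claim for the index (PCA consistent with equal weights) can be checked by correlating two versions of the index. The sketch below is one possible check, not the authors' procedure: it standardises the three series, builds an equal-weight index and a first-principal-component index, and returns their Pearson correlation. The first component is found by power iteration on the correlation matrix, which assumes the three variables are positively correlated and non-constant. All names and example values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def zscores(xs):
    """Standardise a series to mean 0 and sample standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

def first_component(columns, iterations=200):
    """Loadings of the first principal component via power iteration."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k  # start from equal loadings
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def compare_weightings(governance, education, measurement):
    """Correlation between the equal-weight index and the PCA-weight index."""
    cols = [zscores(c) for c in (governance, education, measurement)]
    n = len(cols[0])
    equal = [sum(c[t] for c in cols) / 3 for t in range(n)]
    loadings = first_component(cols)
    pca = [sum(l * c[t] for l, c in zip(loadings, cols)) for t in range(n)]
    return pearson(equal, pca)

if __name__ == "__main__":
    # Hypothetical densities for six discourse units.
    print(compare_weightings([1, 2, 3, 4, 5, 6],
                             [2, 2, 4, 5, 5, 7],
                             [0, 1, 1, 3, 4, 4]))
```

A correlation close to 1 would support reporting the simpler equal-weight index; the cut-off for "consistent" is a judgment the source does not state.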
Appendix E: Robustness Checks and Discourse Turning-Point Case Vignettes
Appendix F: Measures Validity & Robustness
| Construct | Content validity (why we measure it this way) | Construct validity (whether it behaves like the construct) | Robustness test | Potential bias and correction |
| AI_Intensity | The AI lexicon directly addresses the semantic core of technological disruption and is explicitly listed as an input element within the summit agenda. | Positively correlated with Cyber_Intensity and Threat_Frame; positioned close to the technology-institutional cluster within the semantic space. | After expanding or collapsing the AI synonym list, the peak positions and rank ordering remain consistent. | Implicit discussion of AI may be undercounted; system names and model names could be added to the lexicon. |
| Cyber_Intensity | Cyber/CyINT directly addresses the cyber intelligence threat domain and critical infrastructure risks. | Highly correlated with Threat_Frame, forming an independent domain semantic cluster. | The transition point remains stable after merging or splitting the cyber and CyINT vocabularies. | Differences in industry terminology can be addressed by refining subdomain glossaries hierarchically. |
| Threat_Frame | threat/risk/security is a classic threat-frame indicator in security research. | Moves in tandem with the technology variables but diverges from the governance variables in phases. | After removing "security", recalculation shows no change in the overall trend. | "security" is semantically broad; it can be cross-checked against adversary/attack. |
| Governance_Density | governance/accountability maps directly onto institutional governance and accountability frameworks. | Its lag relative to the AI peak in the sequence is consistent with an institutional-response logic. | A positive correlation is maintained at both lag = 1 and lag = 2. | Low-frequency words are noisy; sliding-window smoothing can reduce the noise. |
| Education_Density | education/faculty/discipline corresponds to institutionalised absorption and disciplinary construction. | Forms the institutional core together with the governance and measurement variables within the semantic space. | Results are consistent when the synonym list is extended to curriculum/training. | Metaphorical expressions may be missed; they can be supplemented using topic weights. |
| Measurement_Signal | measurement/accuracy/effectiveness corresponds to the verifiable-output end. | Together with Education and Governance, it forms the institutionalised end point. | When recalculated with the assess/compare subset, peak pages remain consistent. | Methodological metaphors may be missed; the task/forecast word family can fill the gap. |
| Drift | The cosine similarity between adjacent units serves as a standard measure of semantic variation. | Low values correspond to clear agenda transitions and align with thematic shifts. | After replacing the threshold quantiles, the overlap rate of transition points exceeds 70%. | Sensitive to the embedding method; verified with an alternative embedding. |
| Institutionalization_Index | Governance, education, and measurement constitute the three elements of institutionalised absorption. | The index rises markedly in the latter part of the text, consistent with theoretical expectations. | Equal weights and PCA weights are highly correlated. | Weighting is subjective; Bayesian weighting is a possible alternative. |
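The lag check reported for Governance_Density (positive correlation at lag = 1 and lag = 2) can be sketched as a lagged Pearson correlation between the AI series and the governance series. This is an illustrative reading, not the authors' code: the function names and the example series are hypothetical, and both series are assumed to be ordered by discourse unit.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a non-negative lag."""
    if len(leader) != len(follower):
        raise ValueError("series must have the same length")
    if lag < 0 or len(leader) - lag < 3:
        raise ValueError("lag leaves too few paired observations")
    return pearson(leader[:len(leader) - lag], follower[lag:])

if __name__ == "__main__":
    # Hypothetical series: governance repeats the AI pattern one unit later.
    ai = [0, 1, 4, 2, 0, 0, 0]
    governance = [0, 0, 1, 4, 2, 0, 0]
    for lag in (0, 1, 2):
        print(lag, lagged_correlation(ai, governance, lag))
```

Each extra lag drops one paired observation, so correlations at larger lags rest on less data and should be read with more caution on short documents.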
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).