Preprint
Article

This version is not peer-reviewed.

Semantic Analysis of Technical Documentation: Systematic Review, Formal Task Definition, and Transformer-Based NER Implementation

Submitted: 11 June 2026

Posted: 15 June 2026

Abstract
The increasing complexity and volume of technical documentation, including requirements specifications, patents, and engineering reports, create significant challenges for manual analysis and knowledge extraction. This paper presents a systematic review of methods for semantic content analysis of technical documents, with a particular focus on Natural Language Processing (NLP) techniques and Transformer-based models. The study formalizes the task of structured information extraction and provides a mathematical description of Named Entity Recognition (NER) as a core subtask. A practical case study demonstrates an end-to-end NER pipeline for Russian-language technical requirements, built on ruRoberta-large via spaCy-transformers. The results highlight both the potential and the limitations of current approaches, emphasizing the critical role of annotation consistency and document format normalization. This work contributes to the development of intelligent systems for engineering documentation analysis and outlines key directions for future research.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
