In this section, we describe the methodology for constructing input-output production networks. First, we outline the classification system used to categorize products (i.e., goods). Next, we explain the AI agent-based approach for constructing the networks. Finally, we detail the validation process, which involves using international trade data and correlation-based methods to assess the accuracy of the constructed network.
3.1. Product Classification
In our research, we use the Harmonized System (HS)¹ to classify goods, as it provides a standardized framework for categorizing products based on their economic function and material composition. This classification system is widely used in international trade and economic analysis, making it well suited to our objective of identifying input-output relationships between goods.
The HS classification is structured into four hierarchical levels: sections, chapters, headings, and subheadings. Sections and chapters define broad categories of goods, offering a high-level classification of product groups. Headings and subheadings provide more detailed categorizations, differentiating products based on specific characteristics. Each product within the HS system is assigned a six-digit code, where the first two digits represent the chapter under which the product is classified, the next two digits correspond to the heading within that chapter, and the final two digits indicate the subheading within the heading.
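The digit structure described above can be illustrated with a short sketch. The names `HSCode` and `split_hs_code` are hypothetical helpers introduced here for illustration and are not part of the described pipeline.

```python
from typing import NamedTuple


class HSCode(NamedTuple):
    chapter: str     # HS2: first two digits
    heading: str     # HS4: first four digits
    subheading: str  # HS6: all six digits


def split_hs_code(code: str) -> HSCode:
    """Split a six-digit HS code into its chapter, heading, and subheading."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a six-digit HS code, got {code!r}")
    return HSCode(chapter=code[:2], heading=code[:4], subheading=code)


# Example: subheading 090121 lies in heading 0901, which lies in chapter 09.
print(split_hs_code("090121"))
```

Codes are kept as strings rather than integers so that leading zeros, as in chapter 09, are preserved.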
For our study, we focus on the second and third hierarchical levels, chapters (HS2) and headings (HS4), which correspond to two-digit and four-digit HS codes, respectively. The HS2 classification provides a broad overview of product categories, allowing us to establish general input-output relationships, while the HS4 classification offers a more detailed representation, capturing product-level dependencies. We consider these levels to be the most appropriate for our analysis as they provide a sufficient level of detail while maintaining a clear and structured classification of goods.
3.2. Construction of the Input-Output Production Network
We define an input-output production network as $G = (V, E)$, where $V$ is the set of vertices: product categories at the two-digit level (HS2) or individual products at the four-digit level (HS4) of the Harmonized System, depending on the classification level used. The set $E$ consists of directed edges $(i, o)$, where $i, o \in V$. Each directed edge signifies that the input product or category $i$ is required for the production of the output product or category $o$. This structure represents production dependencies at different levels of granularity and shows how inputs and outputs relate across industries.
We employ an AI agent-based workflow consisting of three sequential steps to construct input-output production networks. First, the LLM generates an initial input-output network by identifying production relationships between product categories classified at the two-digit level of the Harmonized System. This step establishes a broad structural foundation and captures high-level interdependencies in production. Next, the LLM refines this network to the four-digit level of the Harmonized System. It maps each input-output relationship between HS2 product categories onto relationships between the HS4 products those categories contain, while maintaining the structure established at the HS2 level. This process enhances the network's granularity, capturing more specific production dependencies. Finally, the LLM evaluates and validates the input-output production network at the HS4 level. For each output good, it reviews all proposed input goods and determines whether they are essential for production. Only confirmed input-output relationships are retained, so that the final directed HS4-based network reflects real production dependencies as closely as possible. In Figure 1, we illustrate each stage of the process, from the initial network generation to the final validation step, highlighting the role of the LLM in identifying and refining the input-output relationships.
3.2.1. Development of the HS2-Based Input-Output Network
To construct the HS2-based input-output network, we first generate all possible ordered pairs of product categories classified under the Harmonized System at the two-digit level. In each pair, the first category represents the potential input, while the second represents the potential output. Next, the LLM systematically evaluates each ordered pair to determine whether any goods within the input category are necessary for producing goods in the output category. The LLM is explicitly instructed to provide a binary response, either "Yes" or "No", to indicate the presence or absence of a production relationship. Based on these responses, we construct a directed network where nodes represent HS2 product categories, and directed edges signify the determined input-output dependencies. This network provides a structured representation of production relationships between product categories based on the HS2 classification. In Figure 2 and Figure 3, we present the system prompt and user prompt, respectively.
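The pairwise querying procedure can be sketched as follows. This is a minimal illustration, not the actual implementation: `build_hs2_network` is a hypothetical name, `toy_ask` is a hard-coded stand-in for the LLM call, and the inclusion of self-pairs (a chapter paired with itself) is an assumed reading of "all possible ordered pairs".

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (input category, output category)


def build_hs2_network(chapters: Iterable[str],
                      ask: Callable[[str, str], str]) -> Set[Edge]:
    """Query every ordered pair of chapters and keep those answered "Yes"."""
    codes = sorted(set(chapters))
    edges: Set[Edge] = set()
    # Assumption: self-pairs (c, c) are included, since goods in a chapter
    # may be inputs to other goods in the same chapter.
    for i, o in product(codes, repeat=2):
        if ask(i, o).strip().lower() == "yes":
            edges.add((i, o))
    return edges


def toy_ask(i: str, o: str) -> str:
    """Hypothetical stand-in for the LLM; the answers are illustrative only."""
    known = {("72", "87"), ("39", "87")}
    return "Yes" if (i, o) in known else "No"


hs2_edges = build_hs2_network(["39", "72", "87"], toy_ask)
print(sorted(hs2_edges))
```

Because every ordered pair is queried, the number of LLM calls grows quadratically with the number of chapters.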
3.2.2. Refinement to an HS4-Based Input-Output Network
To construct the HS4-based input-output network, we examine each directed edge in the previously constructed HS2-based network. For each input category and its corresponding output category at the HS2 level, we identify all individual products classified under the Harmonized System at the four-digit level. We then generate all possible ordered pairs of input and output products within these HS2 categories. For each ordered pair, the LLM determines whether the input good is required for the production of the output good. The LLM is explicitly instructed to provide a binary response, either "Yes" or "No", to indicate the presence or absence of a production dependency. We provide the system prompt and user prompt used for prompting the LLM in Figure 4 and Figure 5, respectively. This process ensures that the HS4-based input-output network is constructed with a higher level of detail, capturing more precise production dependencies while maintaining consistency with the broader HS2-based structure.
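The refinement step restricts the HS4 queries to pairs whose chapters are already linked at the HS2 level. The sketch below assumes hypothetical names (`expand_to_hs4`, `toy_ask`) and a hard-coded stand-in for the LLM.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (input, output)


def expand_to_hs4(hs2_edges: Iterable[Edge],
                  hs4_codes: Iterable[str],
                  ask: Callable[[str, str], str]) -> Set[Edge]:
    """Query HS4 pairs only where the parent chapters form an HS2 edge."""
    by_chapter: Dict[str, List[str]] = defaultdict(list)
    for code in sorted(set(hs4_codes)):
        by_chapter[code[:2]].append(code)  # first two digits give the chapter

    edges: Set[Edge] = set()
    for i2, o2 in hs2_edges:
        for i4, o4 in product(by_chapter.get(i2, []), by_chapter.get(o2, [])):
            if ask(i4, o4).strip().lower() == "yes":
                edges.add((i4, o4))
    return edges


def toy_ask(i: str, o: str) -> str:
    """Hypothetical stand-in for the LLM; the answers are illustrative only."""
    return "Yes" if (i, o) in {("7208", "8703"), ("7208", "8708")} else "No"


hs4_edges = expand_to_hs4({("72", "87")}, ["7208", "7210", "8703", "8708"], toy_ask)
print(sorted(hs4_edges))
```

By construction, every HS4 edge produced this way has a parent edge in the HS2 network, which is how consistency with the HS2 structure is maintained.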
3.2.3. Validation of the HS4-Based Input-Output Network
In the final step, the LLM validates the HS4-based input-output network to ensure the accuracy of the identified production relationships. For each output good $o$ in the network, we retrieve the set of all associated input goods. Specifically, we identify all ordered pairs $(I_o, o)$, where $i$ belongs to the input set $I_o$ if and only if $(i, o) \in E$. The LLM then evaluates all ordered pairs $(I_o, o)$, determining which inputs are indeed required for producing the given output. This assessment is based on general knowledge, logical reasoning, and industry practices. Only validated input-output relationships are retained, so that the final HS4-based network reflects real-world production dependencies as closely as possible. The system and user prompts used in the validation process are provided in Figure 6 and Figure 7.
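The validation step groups the proposed inputs by output and reviews each group in a single call. A minimal sketch follows, in which `validate_network` is a hypothetical name and `toy_review` is a hard-coded stand-in for the LLM review.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (input, output)


def validate_network(edges: Iterable[Edge],
                     review: Callable[[str, List[str]], Iterable[str]]) -> Set[Edge]:
    """Keep only the edges whose input is confirmed for its output."""
    inputs_of: Dict[str, Set[str]] = defaultdict(set)
    for i, o in edges:
        inputs_of[o].add(i)  # i is in the input set of o iff (i, o) is an edge

    validated: Set[Edge] = set()
    for o, candidates in inputs_of.items():
        confirmed = set(review(o, sorted(candidates)))
        # Guard: ignore any returned input that was never proposed.
        for i in confirmed & candidates:
            validated.add((i, o))
    return validated


def toy_review(output: str, candidates: List[str]) -> List[str]:
    """Hypothetical stand-in for the LLM; rejects one illustrative input."""
    return [i for i in candidates if i != "0901"]


edges = {("7208", "8703"), ("4011", "8703"), ("0901", "8703")}
print(sorted(validate_network(edges, toy_review)))
```

Reviewing all inputs of an output together, rather than one pair at a time, lets the reviewer judge each candidate in the context of the others.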
3.3. Validation of the Constructed Networks
To validate the constructed input-output production networks, we conduct a structural and a statistical analysis.
The structural analysis validates the HS2-based network using WIOD [21] as a reference. Each HS2 product is mapped to an ISIC² industry, and WIOD is filtered to include only these mapped sectors. We then build an input-output table showing monetary flows between sectors. A directed link from sector $s_i$ to sector $s_j$ is inferred if the flow from $s_i$ to $s_j$ is greater than the average flow from $s_i$ to all other sectors. This WIOD-based network serves as a ground truth. We compare it to our constructed network to identify false positives (links in our network but not in WIOD) and false negatives (links in WIOD but missing from our network). Since WIOD covers 2000–2014, we also check whether each link is persistent over time, focusing on temporal coherence.
The statistical analysis, in turn, validates the final HS4-based network. It consists of two steps based on international trade data: flow analysis and correlation analysis. In the flow analysis, we assess whether a country that exports a given output also imports the necessary inputs identified in the network. This premise does not always hold, as some inputs may be produced domestically rather than imported. We nevertheless adopt it in order to evaluate the overall consistency of the input-output relationships. In the correlation analysis, we examine whether there is a moderate to strong positive Spearman correlation between the imported value of an input and the exported value of the corresponding output. This analysis is conducted for each input-output pair in the network across all countries in the dataset. Such a positive correlation further supports the validity of the identified production dependencies.
3.3.1. World Input-Output Database (WIOD)
For structural validation, we use data from the World Input-Output Database (WIOD)³. WIOD provides harmonized input-output tables across countries, capturing inter-industry monetary flows based on the ISIC classification. The dataset covers 43 countries and a "rest of the world" aggregate for 56 industry sectors over 15 years (2000–2014) with annual resolution. It includes both national and international production linkages between industry sectors, enabling the reconstruction of global value chains. This makes WIOD a suitable reference for evaluating the structural consistency of the HS2-based input-output network.
3.3.2. Structural Analysis
If the constructed input-output production network shows that product $p_i$ is used as an input to produce product $p_o$, and product $p_i$ is associated with industry $s_i$ while product $p_o$ is associated with industry $s_j$, then we can infer that industry $s_j$ relies on inputs from industry $s_i$. This forms the basis for identifying inter-industry linkages. The core idea behind structural analysis is that if industry $s_j$ depends on inputs from industry $s_i$, we should consistently observe significant monetary flows from $s_i$ to $s_j$ over time.
For our structural analysis, we use data from the WIOD, specifically the world input-output tables covering the years 2000 to 2014. These tables report monetary flows between industries, both within and across countries. To construct a global inter-industry transaction matrix, we aggregate the monetary flows from industry $s_i$ to industry $s_j$ across all regions $r$. This means we sum all flows from $s_i$ to $s_j$, regardless of the country or region. The result is a single-region (world) input-output model that captures industry-to-industry linkages without considering regional differences.
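The aggregation across regions can be sketched as follows. The flat dictionary keyed by (origin region, origin industry, destination region, destination industry) is an assumed data layout chosen for illustration, not the WIOD file format, and `aggregate_world` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Tuple

RegionalFlow = Tuple[str, str, str, str]  # (region_from, ind_from, region_to, ind_to)


def aggregate_world(flows: Dict[RegionalFlow, float]) -> Dict[Tuple[str, str], float]:
    """Sum flows between two industries over all origin and destination regions."""
    world: Dict[Tuple[str, str], float] = defaultdict(float)
    for (_r_from, s_from, _r_to, s_to), value in flows.items():
        world[(s_from, s_to)] += value
    return dict(world)


# Illustrative numbers only.
regional = {
    ("DEU", "metals", "DEU", "vehicles"): 5.0,
    ("CHN", "metals", "DEU", "vehicles"): 3.0,
    ("DEU", "metals", "FRA", "machinery"): 2.0,
}
print(aggregate_world(regional))
```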
We then map each HS2 product from the input-output production network to an ISIC industry from the aggregated world input-output table. This step converts the product-level input-output network into an industry-level network. Formally, this involves defining a many-to-one mapping from the set of HS2 products $P$ to the set of industries $S$ in the input-output table. Every product in $P$ is assigned to an industry in $S$, but not every industry in $S$ has a product mapped to it. In other words, the mapping is not surjective. We denote the subset of industries in $S$ that have at least one mapped product as $S' \subseteq S$.
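Translating the product-level network into an industry-level one amounts to applying the mapping to both endpoints of every edge. In the sketch below, the function names and the industry labels are hypothetical placeholders, not actual ISIC codes.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (input, output)


def translate_edges(product_edges: Iterable[Edge],
                    product_to_industry: Dict[str, str]) -> Set[Edge]:
    """Map each product edge to the edge between the corresponding industries."""
    # Several product edges may collapse onto one industry edge (many-to-one).
    return {(product_to_industry[i], product_to_industry[o])
            for i, o in product_edges}


def covered_industries(product_to_industry: Dict[str, str]) -> Set[str]:
    """Industries with at least one mapped product."""
    return set(product_to_industry.values())


# Placeholder labels for illustration only.
mapping = {"72": "metals", "73": "metals", "87": "vehicles"}
print(translate_edges({("72", "87"), ("73", "87")}, mapping))
print(covered_industries(mapping))
```

Because the mapping is many-to-one, the industry-level network can have fewer edges than the product-level network it is derived from.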
In addition to the translated HS2-based network, we construct a "ground-truth" structural network from the WIOD. To do this, we consider all pairs of industries $(s_i, s_j) \in S' \times S'$, and compute the Relative Input Intensity index $\mathrm{RII}_{ij}^{t}$ as defined in Equation 1. Here, $z_{ij}^{t}$ denotes the monetary flow from industry $s_i$ to industry $s_j$ in year $t$. The $\mathrm{RII}_{ij}^{t}$ index measures how large the flow from $s_i$ to $s_j$ is, relative to the average flow from $s_i$ to all industries in $S'$ in that same year. This gives us a normalized measure of the strength of the connection from $s_i$ to $s_j$ over time.
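Written out from the verbal definition above, the index can take the following form. The symbols $z_{ij}^{t}$ for the flow and $S'$ for the set of industries with at least one mapped product are notational assumptions, and letting the average run over every industry in $S'$, including $s_i$ and $s_j$ themselves, is an assumed reading of "all industries".

```latex
\mathrm{RII}_{ij}^{t}
  = \frac{z_{ij}^{t}}{\dfrac{1}{|S'|} \sum_{s_k \in S'} z_{ik}^{t}},
  \qquad s_i, s_j \in S' .
```

Under this form, a value above 1 means that the flow from $s_i$ to $s_j$ exceeds the average outgoing flow of $s_i$ in year $t$.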
Next, we construct a structural network for each year $t$ from 2000 to 2014. In each year's network, we include a directed edge from industry $s_i$ to industry $s_j$ if the Relative Input Intensity index $\mathrm{RII}_{ij}^{t} > 1$. This indicates that the flow from $s_i$ to $s_j$ is higher than the average flow from $s_i$ to all mapped industries in $S'$ for that year, suggesting a meaningful structural link. If $\mathrm{RII}_{ij}^{t} \le 1$, no edge is added.
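The thresholding rule for a single year can be sketched as follows. The function names are hypothetical, the flows are illustrative, and averaging over all listed industries (including the source industry itself) is an assumed reading of the definition.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source industry, destination industry)


def relative_input_intensity(flows: Dict[Pair, float],
                             industries: List[str]) -> Dict[Pair, float]:
    """Flow i -> j divided by the mean flow from i to all listed industries."""
    rii: Dict[Pair, float] = {}
    for i in industries:
        mean_out = sum(flows.get((i, k), 0.0) for k in industries) / len(industries)
        if mean_out == 0:
            continue  # no outgoing flows: the index is undefined, so skip
        for j in industries:
            rii[(i, j)] = flows.get((i, j), 0.0) / mean_out
    return rii


def structural_edges(flows: Dict[Pair, float], industries: List[str]) -> Set[Pair]:
    """Edges whose Relative Input Intensity exceeds 1 in the given year."""
    return {pair for pair, value in relative_input_intensity(flows, industries).items()
            if value > 1}


# Illustrative flows for one year: the mean outgoing flow of A is (1 + 6 + 2) / 3 = 3.
year_flows = {("A", "A"): 1.0, ("A", "B"): 6.0, ("A", "C"): 2.0}
print(structural_edges(year_flows, ["A", "B", "C"]))
```

Repeating this for each year from 2000 to 2014 yields one structural network per year.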
Then, we evaluate the "industry-translated" HS2-based input-output network by comparing it to each structural network for the years from 2000 to 2014. Specifically, we classify each ordered pair of industries as a true positive (a link present in both networks), a false positive (a link present in the HS2-based network but not in the structural network), a true negative (a link absent from both networks), or a false negative (a link missing from the HS2-based network but present in the structural network). Based on these classifications, we compute standard evaluation metrics: precision, recall, and F1 score. This allows us to assess how well the product-level network captures the underlying industry-level structure and its stability over time.
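With both networks represented as sets of directed edges, the metrics follow from set operations. The function name `evaluate_edges` and the example edges are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def evaluate_edges(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted edges against reference edges."""
    tp = len(predicted & reference)  # links present in both networks
    fp = len(predicted - reference)  # predicted but absent from the reference
    fn = len(reference - predicted)  # in the reference but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


predicted = {("A", "B"), ("A", "C"), ("B", "C")}
reference = {("A", "C"), ("B", "C"), ("C", "A"), ("C", "B")}
print(evaluate_edges(predicted, reference))
```

True negatives do not enter precision, recall, or F1, so they need not be counted for these metrics.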
As a final step, we assess the temporal consistency of each predicted edge in the HS2-based network. For each edge, we check whether its corresponding Relative Input Intensity (RII) is greater than 1 in a given percentage of years between 2000 and 2014. This allows us to identify edges that represent persistent structural links, rather than those that appear only sporadically. By incorporating this temporal dimension, we ensure that the evaluation accounts not only for accuracy in individual years but also for the stability of inter-industry relationships over time.
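The persistence check can be sketched as follows. The threshold `min_share` is a hypothetical parameter standing in for the "given percentage of years", and the yearly edge sets are illustrative.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def persistent_edges(predicted: Set[Pair],
                     yearly_edges: Dict[int, Set[Pair]],
                     min_share: float) -> Set[Pair]:
    """Predicted edges whose RII exceeds 1 in at least min_share of the years.

    yearly_edges[t] is the set of pairs with RII > 1 in year t.
    """
    n_years = len(yearly_edges)
    kept: Set[Pair] = set()
    for edge in predicted:
        hits = sum(1 for edges in yearly_edges.values() if edge in edges)
        if n_years and hits / n_years >= min_share:
            kept.add(edge)
    return kept


# Illustrative data: (A, B) holds in all three years, (A, C) in one only.
yearly = {
    2000: {("A", "B"), ("A", "C")},
    2001: {("A", "B")},
    2002: {("A", "B")},
}
print(persistent_edges({("A", "B"), ("A", "C")}, yearly, min_share=0.8))
```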
3.3.3. Global Trade Data
For statistical validation, we use international trade data from the United Nations (UN) Comtrade database⁴. This dataset contains detailed records of import and export values classified at the four-digit (HS4) level. It covers 243 countries over a 28-year period (1994–2021) with annual granularity. The dataset captures country-to-country trade flows for all HS4-classified products, allowing for a precise analysis of trade patterns. This level of detail provides a solid foundation for assessing global trade flows and validating the constructed HS4-based input-output production network.
3.3.4. Flow Analysis
The flow analysis aims to determine whether a country that exports a specific output also imports the necessary inputs identified in the input-output network. As previously noted, this assumption does not always hold since some inputs may be produced domestically rather than imported. However, for network validation, we proceed with this assumption.
For each output product in the network, we analyze all countries and years in which that product is exported. We then check whether the exporting country, in the corresponding year, imports the identified input products from the network. To account for the possibility of domestic production, we consider an input-output relationship as validated if the country imports either all or all but one of the required inputs.
To formalize the flow analysis, let $\mathcal{R}$ be the set of all ordered pairs $(I_o, o)$, where $o$ is an output good in the input-output production network $G = (V, E)$. The set $I_o$ represents the inputs required to produce $o$, meaning that $I_o \subseteq V$ and $I_o$ is a non-empty set containing all inputs $i$ such that $(i, o) \in E$.
For each ordered pair $(I_o, o) \in \mathcal{R}$, we identify all ordered pairs $(y, c) \in Y \times C$, where $Y$ is the set of years and $C$ is the set of countries in our dataset, for which country $c$ exports the output product $o$ in year $y$. Let this set be denoted as $S$. For each $(y, c) \in S$, we then determine whether country $c$ imports, in year $y$, all or all but one of the inputs $i \in I_o$. Let the subset of $S$ that satisfies this condition be denoted as $S^{*}$. As a result of this analysis, for each $(I_o, o) \in \mathcal{R}$, we compute the proportion $|S^{*}| / |S|$, which indicates the extent to which the production relationship between the output product $o$ and its associated inputs is validated.
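The flow analysis for a single output can be sketched as follows. The data layout (a dictionary from (year, country) to the set of traded HS4 products) and the name `flow_support` are assumptions made for illustration, and the trade records are invented.

```python
from typing import Dict, Optional, Set, Tuple

YearCountry = Tuple[int, str]


def flow_support(inputs: Set[str], output: str,
                 exports: Dict[YearCountry, Set[str]],
                 imports: Dict[YearCountry, Set[str]]) -> Optional[float]:
    """Share of exporting (year, country) pairs that import all or all but one input."""
    exporters = [yc for yc, goods in exports.items() if output in goods]
    if not exporters:
        return None  # the output is never exported, so the share is undefined
    # "All or all but one": at most one required input may be missing.
    # Read literally, an output with a single input is therefore always validated.
    needed = len(inputs) - 1
    validated = [yc for yc in exporters
                 if len(inputs & imports.get(yc, set())) >= needed]
    return len(validated) / len(exporters)


# Invented records for illustration.
exports = {(2020, "DEU"): {"8703"}, (2020, "FRA"): {"8703"}, (2020, "BRA"): {"0901"}}
imports = {(2020, "DEU"): {"7208", "4011", "8708"}, (2020, "FRA"): {"4011"}}
print(flow_support({"7208", "4011", "8708"}, "8703", exports, imports))
```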
3.3.5. Correlation Analysis
The correlation analysis examines whether the two ends of each input-output pair in the network are positively correlated in trade data. For each directed edge, we assess the monotonic relationship between the import value of the input product and the export value of the corresponding output product.
To quantify this relationship, we calculate the Spearman correlation coefficient between the imported value of the input product and the exported value of the output product for each country over the period from 1994 to 2021. A moderate to strong positive correlation provides additional evidence supporting the validity of the identified input-output dependencies.
Formally, for each edge $(i, o) \in E$ in the input-output production network $G = (V, E)$, we identify all countries $c \in C$ that import the input product $i$ and export the output product $o$ in each year $y \in Y$, where $Y$ is the set of years and $C$ is the set of countries in the dataset. Let $C_{io} \subseteq C$ denote this subset of countries. Then, for each $c \in C_{io}$, we check if the Spearman correlation coefficient $s$ between the imported value of $i$ and the exported value of $o$ over the time period $T$ is greater than 0.3. The time period $T$ is the ordered sequence of years $(1994, \dots, 2021)$. Let the subset of $C_{io}$ that satisfies this condition be denoted as $C_{io}^{*}$. As a result of this analysis, for each $(i, o) \in E$, we compute $|C_{io}^{*}| / |C_{io}|$, which represents the proportion of countries for which the input-output relationship is supported by a moderate to strong positive correlation.
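The correlation analysis for a single edge can be sketched in plain Python. The Spearman coefficient is computed here as the Pearson correlation of average ranks, which is the standard definition in the presence of ties. The function names and the data layout are assumptions, and the caller is assumed to have already restricted the input to countries that trade both products in every year.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman correlation; None if either series is constant or empty."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    if n == 0:
        return None
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def correlation_support(series: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                        threshold: float = 0.3) -> Optional[float]:
    """Share of countries whose import/export series correlate above threshold.

    series[c] = (import values of the input, export values of the output),
    aligned year by year.
    """
    scores = {c: spearman(imp, exp) for c, (imp, exp) in series.items()}
    scores = {c: s for c, s in scores.items() if s is not None}
    if not scores:
        return None
    return sum(1 for s in scores.values() if s > threshold) / len(scores)


# Invented series for illustration.
data = {"DEU": ([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]),
        "FRA": ([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])}
print(correlation_support(data))
```

Countries with a constant series are dropped from the denominator in this sketch, since the coefficient is undefined for them. This is a design choice of the example rather than something stated in the text.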