1. Introduction
The ability of artificial systems to replicate human-like emotional recall provides a new perspective to evaluate how closely a Large Language Model (LLM) can approximate complex psychological processes [
1]. Examples include the logical reasoning behind providing an argument [
2], the activation of emotion-laden words in clinical settings [
1], or the agreement of naming conventions in social systems [
3]. Humans and LLMs might differ significantly in how emotions and cognition interact with each other. In humans, emotions strongly shape cognitive processes, influencing how people perceive the world, retrieve memories, and approach everyday tasks in general[
4]. When negative emotional states intensify, leading to distress, anxiety or depressive disorders, human behavior and decision making can be further compromised. Anxiety, for instance, is known to negatively impact executive functioning [
5]; depression is closely intertwined with the cognitive structure that sustains it [
6] and stress interferes with processes such as attention, memory formation, and recall [
7]. These connections highlight how close bonds between emotions and cognition are, especially regarding memory. Cognitive network science offers an effective way to examine these connections, enabling researchers to map the network of associations that support or amplify negative emotional states ([
8,
9,
10]).
Cognitive network science conceptualizes knowledge as a network of interconnected concepts [
11,
12,
13], where associations reflect how information is stored and retrieved in a mental lexicon of knowledge [
9]. Within this framework, spreading activation models describe how the retrieval of one concept can trigger cascades of activity across related nodes, allowing emotionally charged words to become more accessible depending on their position and connectivity in the network [
8,
9,
10]. For example, highly central nodes such as "stress" or "anxiety" may receive disproportionate activation, amplifying their salience and sustaining negative emotional states through recurrent loops of activation. This mechanism provides a quantitative account of memory biases observed in psychological distress, where minor triggers can rapidly propagate toward clusters of negatively valenced concepts. By applying these tools, cognitive network science [
11] enables researchers to capture how structural and dynamical properties of semantic networks mirror individual differences in affect, personality, and vulnerability to mental disorders. This is the approach we follow in this work.
By integrating network metrics of concepts in a network of memory recalls between concepts, this work aims to understand how closely the activation of nodes related to mental distress (such as “anxiety”, “stress”, and “depression”) correlates with measurable psychological indicators of well-being, both at clinical and personality level. Applying network science and spreading activation [
10], we focus on the interpretation of network-driven results in the context of mental health. We also aim to highlight the difference between humans, whose cognitive representations are shaped by personal history and socio-cognitive factors [
13,
14], and LLMs, which rely on linguistic training data without direct experience or autobiographical context of events and concepts [
15].
2. Literature Review
Cognitive networks are representational models of cognition. Cognitive network science models human knowledge as a network of concepts linked by learned associations [
11], enabling formal study of how structure shapes retrieval and reasoning [
9,
10,
13]. Foundational spreading-activation theories [
8,
16] propose that cueing one concept propagates activation along associative links, increasing the accessibility of nearby nodes; these ideas are now operationalized in computational simulations that estimate how activation unfolds over time across a lexical network. Normative free-association resources such as the Small World of Words (SWOW) [
17] provide the backbone topology for such models, while affective lexica with Valence–Arousal–Dominance (VAD) norms [
18] annotate nodes with emotional properties, allowing joint analyses of semantics and affect. Within this framework, classic network measures (e.g., degree, shortest paths, clustering, centrality) predict which concepts act as hubs or bridges for activation flow, and hence which ideas are most likely to be retrieved under minimal input [
9,
10]. When negatively valenced clusters (e.g., anxiety, stress, depression) are densely interconnected, even mild cues can traverse short associative paths and disproportionately energize these hubs, sustaining recurrent activation loops consistent with rumination and other maladaptive dynamics observed in psychopathology.
Personality traits are linked with mental health. The Big Five personality traits (neuroticism, conscientiousness, extraversion, agreeableness, and openness) [
19] have long been linked to psychological well-being and increased risk for psychopathology. Among these, the neuroticism trait is most consistently associated with greater vulnerability to negative emotions and with an elevated risk of developing anxiety and depressive disorders [
14,
20]. Neurotic individuals tend to simulate past and future problems, therefore experiencing negative mood states in the apparent absence of a plausible cause [
21]. From a network science perspective, we can assume that these individuals possess cognitive networks where concepts linked with negative mood states (e.g. "anxiety" or "depression") also have an increased connectivity. This network aspect could lead to quicker or more persistent activation of anxious or depressive thoughts, characteristic of rumination. Rumination is defined as the repetitive and excessive focus on negative emotions, thoughts or events [
22]. It has been strongly linked to the development and maintenance of affective disorders such as depression and anxiety [
22], and commonly occurs in other mental disorders. In the context of spreading activation models [
16], rumination can be interpreted as a self-perpetuating activation loop: once a negative concept is triggered (e.g. concepts/nodes like depression, anxiety and stress), the structure and connectivity of the semantic network determine whether this activation dissipates or initiates a recursive cycle. This occurs because highly central and connected nodes (often representing concepts with a high valence for the individual) activate neighboring nodes in a cluster with similar affective valence, reinforcing the same emotional patterns. Moreover, these activated secondary nodes may have fewer connections and could return activation to the starting node as well, reinforcing a long-lasting emotional loop characteristic of rumination.
Unlike neuroticism, conscientiousness, extraversion, agreeableness, and openness may serve as protective factors against mental health disorders [
14,
20]. Conscientious individuals have a goal-focused mindset, mitigating rumination. Extraversion correlates with positive affect and sociability, agreeableness promotes harmonious social interactions, while openness encourages creativity and flexible thinking. Collectively, these behavioral traits and thinking tendencies can foster dense positive clusters of nodes and weaken direct connections to negative concepts, potentially protecting from stress factors. Personality traits are thus studied both as a structural and as a behavioral moderator of emotional activation, potentially influencing which concepts are accessible, how strongly they interlink, and how stimuli are elaborated.
Psychometric scales as proxies for assessing mental health. To assess psychological well-being and its relationship with cognitive networks dynamics, three validated psychometric scales were employed: the Depression, Anxiety and Stress Scales (DASS-21), the Life Satisfaction Scale, and the Positive and Negative Affect Schedule (PANAS).
Depression Anxiety Stress Scales (DASS-21). The DASS-21 [
23] is a widely used self-report scale designed to assess the severity of three distinct psychological constructs: depression, anxiety, and stress. Each subscale consists of seven items that capture core symptomatology: depression (e.g., anhedonia, hopelessness), anxiety (e.g., hyperarousal, excessive worry), and stress (e.g., tension, irritability). By conceptualizing these constructs dimensionally, this tool is especially suited to examine how the different manifestations of distress operate in associative networks. In this context, we hypothesize that individuals with higher DASS-21 subscale scores may also exhibit cognitive networks with denser and more interconnected negative associations among nodes representing depression, stress and anxiety, much as found in other studies about math anxiety [
10].
Life Satisfaction Scale. The Life Satisfaction Scale [
24] measures global well-being by assessing an individual’s overall perception of life quality. The scale consists of five items that evaluate the extent to which individuals perceive their life as fulfilling and meaningful; unlike the DASS-21, this scale provides a cognitive evaluation of well-being, focusing on the positive perception of well-being or its absence. In cognitive networks, individuals reporting greater life satisfaction may exhibit a network structure where positive concepts are more central or framed along other positive concepts, as found in past studies with the Emotional Recall Task [
25].
Positive and Negative Affect Schedule (PANAS). The PANAS [
26] is a psychometric scale that differentiates between positive affect (PA) and negative affect (NA). The PA subscale measures the extent to which an individual experiences positive states (such as attention, determination, enthusiasm), while the NA subscale captures distress-related emotions such as fear, hostility, and guilt. From a network science perspective, high NA scores could be linked to stronger and more persistent activation of negative semantic nodes, which in turn could reinforce distress, as found in past studies with the Emotional Recall Task [
25].
The Emotional Recall Task (ERT). The ERT is a free-association paradigm designed to capture how individuals spontaneously retrieve and verbalize emotional experiences [
27]. In its standard form, participants are asked to generate a fixed number of words—typically around ten—that describe how they have felt over a recent period (e.g., the past week or month). These self-reported words are then analyzed in terms of their affective properties (such as valence, arousal, and dominance) and their position within normative lexical networks. By mapping recalled words onto semantic networks, the ERT provides a window into the accessibility of emotion-laden concepts and how patterns of recall may reflect underlying psychological states, such as stress, anxiety, or depression. This approach has been shown to link individual differences in recall with validated psychometric measures, making it a useful tool for exploring the relationship between emotional memory, well-being, and personality [
28].
4. Materials and Methods
For Study 1, five distinct datasets were employed to examine the associations between words generated in the Emotional Recall Task and participants’ psychometric outcomes as measured by the PANAS, DASS-21, and Life Satisfaction scales. These datasets comprised two human participant datasets, which were sourced from previous studies [
25,
27,
29], as will be explained later, and three datasets of LLM-simulated artificial participants, specifically GPT-4, Claude Haiku, and Anthropic Opus. This diversity in sources enables comparison of association patterns across different populations and the assessment of how closely artificial models reflect human-like representations of emotion and well-being.
For Study 2, we re-used data from a large-scale online survey conducted in the United States between May and August 2024, collected and re-shared on an Open Science Repository by De Duro and colleagues [
29]. The survey was administered to a sample of 1,000 adult participants, who first completed a brief demographic questionnaire, followed by a series of psychometric assessments. Personality traits were measured using the IPIP-NEO Inventory (short form) [
30], which evaluates neuroticism, extraversion, openness, agreeableness, and conscientiousness through five items per trait. After completing the scale, participants took part in the Emotional Recall Task (ERT), requiring them to freely generate words describing how they had felt in recent weeks. The resulting data were used to construct individual activation trajectories, which were then analyzed to examine relationships between specific personality profiles and the activation strength of nodes associated with mental distress.
Human participant datasets. Two datasets were used to analyze the relationship between emotional word associations and psychological states in human subjects. The first dataset, derived from a study by Ying et al. [
27], contains data from 200 native English speakers recruited via Amazon Mechanical Turk. Participants provided ten emotional words describing their feelings over the past month, each accompanied by a self-rated frequency of experience, a valence rating (1–9 scale from unpleasant to pleasant), and an arousal rating (1–9 scale from calm to excited). These responses allowed for the computation of a valence-arousal emotional profile for each participant. After completing the emotional recall task, participants were administered several psychometric instruments: the Positive and Negative Affect Schedule (PANAS) [
26], the Depression Anxiety and Stress Scales (DASS-21) [
23], and the Satisfaction With Life Scale (SWLS) [
24].
The second human dataset consists of Emotional Recall Task responses from a larger sample of 1000 individuals, collected by De Duro [
29] to validate the association between personality traits and trust in LLMs using free-recall results. This dataset contains personality traits (the IPIP-NEO questionnaire for personality traits [
31]) and provides broader indicators of emotional and psychological functioning, allowing the activation results to be generalized to a larger human sample. This dataset is used in both Studies 1 and 2.
Artificial participant datasets. To simulate artificial participants, three distinct LLMs were employed: OpenAI’s GPT-4, Claude Haiku 3.5, and Anthropic Opus 3.5. For each model, artificial profiles were generated using a fixed, standardized prompt designed to elicit emotionally relevant content and simulate answers to a psychometric questionnaire:
Impersonate a [x] years old [male/female/person].
Please use 10 English words to describe feelings you have experienced during the past month. Reply only with 10 words separated by a comma.
Please read each numbered statement and indicate how much the statement applied to you over the past week. The rating scale is as follows: 0 indicates it did not apply to you at all, 1 indicates it applied to you to some degree, or some of the time, 2 indicates it applied to you to a considerable degree or a good part of time, 3 indicates it applied to you very much or most of the time. Reply only with the vector number corresponding to your answers.
[Statements from the psychometric questionnaire y are listed.]
Repeat the two tasks independently [z] times.
Here, x represented an age value chosen to match the age ranges in TILLMI by De Duro and colleagues [29], y was one questionnaire among DASS-21, Life Satisfaction and PANAS, and z set the number of repetitions of the task (10), facilitating the simulations. Each LLM was tasked with independently generating hundreds of artificial participants by repeating the prompt across different fictional profiles. For each simulated participant, ten emotional words and full psychometric vectors were collected. These data were treated identically to those of human participants in subsequent analyses, allowing for a direct comparison of emotional word associations and activation dynamics across human and LLM-generated datasets. We excluded the IPIP-NEO Inventory from LLMs’ simulations because the current literature indicates that LLMs without explicit prompting instructions might not possess clear, reliable and well-specified personality traits [
32,
33].
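The prompt template above can be filled programmatically for each fictional profile. The following is a minimal sketch rather than the code used in the study: the function names `build_prompt` and `parse_recalled_words` are hypothetical, the explicit numbering of the statements is an assumption, and the questionnaire items shown are placeholders, not real scale items.

```python
# Prompt text copied from the template above; the slots {age}, {gender},
# {statements} and {repetitions} stand for x, the profile, y and z.
PROMPT_TEMPLATE = """Impersonate a {age} years old {gender}.
Please use 10 English words to describe feelings you have experienced during the past month. Reply only with 10 words separated by a comma.
Please read each numbered statement and indicate how much the statement applied to you over the past week. The rating scale is as follows: 0 indicates it did not apply to you at all, 1 indicates it applied to you to some degree, or some of the time, 2 indicates it applied to you to a considerable degree or a good part of time, 3 indicates it applied to you very much or most of the time. Reply only with the vector number corresponding to your answers.
{statements}
Repeat the two tasks independently {repetitions} times."""


def build_prompt(age, gender, statements, repetitions=10):
    """Fill the template for one fictional profile (hypothetical helper)."""
    # Assumption: statements are numbered 1..n, one per line.
    numbered = "\n".join(
        f"{index}. {text}" for index, text in enumerate(statements, start=1)
    )
    return PROMPT_TEMPLATE.format(
        age=age, gender=gender, statements=numbered, repetitions=repetitions
    )


def parse_recalled_words(reply_line):
    """Split a comma-separated reply into lowercase words (hypothetical helper)."""
    return [word.strip().lower() for word in reply_line.split(",") if word.strip()]


# Placeholder items only; the real questionnaire statements are not reproduced here.
example_prompt = build_prompt(35, "female", ["<item 1>", "<item 2>"])
example_words = parse_recalled_words("Happy, tired ,anxious")
```

Keeping the template as a single constant makes it easy to check that every simulated participant received exactly the same wording, with only the age, profile and questionnaire varying.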
This design enabled us to assess the representational fidelity of LLMs in capturing emotional constructs and their relationship with mental health indicators. Furthermore, the use of three different models allowed for the identification of model-specific biases and performance differences in reproducing human-like semantic associations.
4.1. Preprocessing and Construction of the Network
Only participants whose entire set of recalled words from the Emotional Recall Task existed in the Small World of Words (SWOW) English dataset [
17] were included in the simulations. This filtering step ensured that each word had a corresponding node within the associative network, preventing bias introduced by missing data. When misspellings occurred and a valid corrected version existed in the SWOW, the misspelled words were manually corrected; otherwise, the participant’s full set was excluded to preserve consistency in network structure and input quality.
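The inclusion rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing script: the function name `filter_participants`, the lowercase normalization, and the toy vocabulary and correction table are all hypothetical, and in the study the corrections were made manually.

```python
def filter_participants(recalls, vocabulary, corrections=None):
    """Keep only participants whose every recalled word is a network node.

    recalls:     dict mapping participant id -> list of recalled words
    vocabulary:  set of words present as nodes in the associative network
    corrections: optional dict mapping a misspelling -> its corrected form
    """
    corrections = corrections or {}
    kept = {}
    for participant_id, words in recalls.items():
        cleaned = []
        for word in words:
            word = word.strip().lower()          # assumption: case-insensitive match
            word = corrections.get(word, word)   # apply a known correction, if any
            cleaned.append(word)
        # A single missing word excludes the participant's full set.
        if all(word in vocabulary for word in cleaned):
            kept[participant_id] = cleaned
    return kept


# Toy data for illustration only.
toy_vocabulary = {"happy", "stressed", "anxious", "calm"}
toy_recalls = {
    "p1": ["happy", "stressed"],
    "p2": ["anxius", "calm"],     # misspelling with a valid correction
    "p3": ["happy", "zzzz"],      # no matching node, so p3 is excluded
}
toy_kept = filter_participants(toy_recalls, toy_vocabulary, {"anxius": "anxious"})
```

Excluding the whole set, rather than dropping single words, keeps the number of simultaneously activated nodes equal across participants.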
The cognitive network was constructed from the SWOW dataset [
17], with nodes representing individual words and edges reflecting associative strengths based on lexical recall frequencies. This network was implemented using the NetworkX library (available at
https://networkx.org/, Last Accessed: 06/10/2025).
For the dataset including PANAS and DASS-21 scales, participants’ psychometric scores were linked to their total network activation outputs, i.e. the sum of all activation levels of specific nodes over time, to analyze possible relationships between psychological distress and emotional concepts spreading in memory.
4.2. Spreading Activation Dynamics
We adopted spreading activation as implemented in SpreadPy [10]. SpreadPy (https://github.com/dsalvaz/SpreadPy, last accessed: 26/09/2025) is a Python library for simulating spreading activation on single- or multi-layer cognitive networks. Each node $i$ at time $t$ carries an activation energy $E_i(t)$, which updates in discrete steps according to a retention parameter $r$ and a transfer function $f$ that distributes residual energy to neighbors. The general update rule is:
$$E_i(t+1) = r\,E_i(t) + (1-r)\sum_{j \in \mathcal{N}(i)} f_{j \to i}\,E_j(t),$$
where $\mathcal{N}(i)$ is the neighborhood of node $i$. In case of unweighted connections, SpreadPy assigns equal probability to all neighbors:
$$f_{i \to j} = \frac{1}{k_i},$$
where $k_i$ is the degree of node $i$. In the weighted case, transitions depend on normalized edge weights $w_{ij}$:
$$f_{i \to j} = \frac{w_{ij}}{\sum_{l \in \mathcal{N}(i)} w_{il}}.$$
By tuning $r$ and $f$, SpreadPy provides a flexible framework for modeling how activation flows through semantic networks, making it a suitable tool for investigating memory biases, rumination, and emotional recall.
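To make the update concrete, the sketch below implements one synchronous step in plain Python: each node retains a fraction of its energy and distributes the remainder to its neighbors in proportion to edge weight (equal weights reproduce the unweighted case). It does not call SpreadPy and is not its implementation; the function name `spread_step`, the dict-of-dicts graph format, and the handling of isolated nodes are assumptions made for illustration.

```python
def spread_step(graph, energy, retention):
    """One synchronous spreading-activation update.

    graph:     dict node -> dict neighbor -> edge weight (symmetric for
               an undirected network; all weights 1.0 for the unweighted case)
    energy:    dict node -> current activation energy
    retention: fraction of energy each node keeps for itself
    """
    updated = {node: retention * energy[node] for node in graph}
    for node, neighbors in graph.items():
        released = (1.0 - retention) * energy[node]
        total_weight = sum(neighbors.values())
        if total_weight == 0.0:
            # Assumption: a node without neighbors keeps all of its energy.
            updated[node] += released
            continue
        for neighbor, weight in neighbors.items():
            # Each neighbor receives a share proportional to the edge weight.
            updated[neighbor] += released * weight / total_weight
    return updated


# Toy path network a - b - c with unit weights; all energy starts on "a".
PATH = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
start = {"a": 1.0, "b": 0.0, "c": 0.0}
after_one = spread_step(PATH, start, retention=0.5)
print(after_one)  # {'a': 0.5, 'b': 0.5, 'c': 0.0}
```

Because every unit of released energy is handed to some node, the total energy in the network stays constant from one step to the next.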
Figure 1 contains an overview of the spreading activation dynamics on a cognitive network.
4.3. Spreading Activation Model Implementation
The simulation framework with SpreadPy [
10] was designed to trace the semantic activation of the emotional concepts stress, anxiety, and depression within a cognitive network, following the recall of emotional states. Initially, all ten emotional words recalled by each participant in the ERT were activated simultaneously. Then, activation spreading was run to simulate how other associated concepts mediated the flow of activation across the cognitive network of free associations, i.e. memory recall patterns.
This approach allowed the observation of the total activation level achieved by a given individual for a target concept related to mental health, i.e. one among "anxiety", "stress" and "depression". In other words, our aim was to identify how individual differences in reported emotions (and, in the case of the first dataset, psychological profiles) influence the dynamics of semantic activation toward negative emotional concepts.
In Study 2, the spreading activation framework described in Study 1 was applied to examine the relationship between personality traits and semantic activation dynamics. The goal was to assess whether individual differences in traits such as neuroticism, conscientiousness, or extraversion influenced semantic activation towards emotionally salient concepts like stress, anxiety, and depression following word recall. As in Study 1, participants’ recalled ERT words served as simultaneous activation inputs in the network, and activation levels for each of the three target nodes were tracked over 200 computational time steps.
4.4. Batch Simulations and Correlation Analysis
Simulations employed a batch processing approach adopting the implementation of spreading activation of SpreadPy [
10]. SpreadPy implements Collins and Loftus’ spreading activation in single-layer and multiplex networks. We here used the single-layer version. For each participant, all ten recall words were activated simultaneously in the semantic network, with activation spreading tracked over 200 time steps. Activation levels of the target nodes “stress,” “anxiety,” and “depression” were recorded at each timestep.
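The per-participant procedure (seed all recalled words at once, spread for a fixed number of steps, record the target nodes at every step) can be sketched without SpreadPy as follows. The function names, the unit of energy placed on each recalled word, the retention value of 0.5, and the six-node toy network are all assumptions for illustration; none of them is taken from the study's configuration.

```python
def spread_step(graph, energy, retention):
    """One synchronous update: keep `retention`, share the rest by edge weight."""
    updated = {node: retention * energy[node] for node in graph}
    for node, neighbors in graph.items():
        released = (1.0 - retention) * energy[node]
        total_weight = sum(neighbors.values())
        if total_weight == 0.0:
            updated[node] += released  # assumption: isolated nodes keep energy
            continue
        for neighbor, weight in neighbors.items():
            updated[neighbor] += released * weight / total_weight
    return updated


def simulate_participant(graph, recalled_words, targets, steps=200, retention=0.5):
    """Return the activation time series of each target node."""
    energy = {node: 0.0 for node in graph}
    for word in recalled_words:
        energy[word] += 1.0  # assumption: one unit of energy per recalled word
    trajectories = {target: [] for target in targets}
    for _ in range(steps):
        energy = spread_step(graph, energy, retention)
        for target in targets:
            trajectories[target].append(energy[target])
    return trajectories


# Toy symmetric network (hypothetical weights), standing in for SWOW.
TOY = {
    "tired": {"stress": 1.0, "sleep": 1.0},
    "worried": {"anxiety": 2.0, "stress": 1.0},
    "sleep": {"tired": 1.0},
    "stress": {"tired": 1.0, "worried": 1.0, "anxiety": 1.0},
    "anxiety": {"worried": 2.0, "stress": 1.0, "depression": 1.0},
    "depression": {"anxiety": 1.0},
}
TARGETS = ["stress", "anxiety", "depression"]

batch = {"p1": ["tired", "worried"], "p2": ["sleep", "tired"]}
results = {pid: simulate_participant(TOY, words, TARGETS) for pid, words in batch.items()}

# Total activation of a target = sum of its activation levels over time.
totals = {pid: {t: sum(series) for t, series in traj.items()} for pid, traj in results.items()}
final = {pid: {t: series[-1] for t, series in traj.items()} for pid, traj in results.items()}
```

Both summaries are computed here because the text refers to total activation over time as well as to final activation levels; which one enters a given analysis is a choice left to the analyst.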
The resulting time series of activation levels for each participant were visualized using log-log plots, and the final activation levels were extracted. For participants in the MTurk dataset, these final activation scores were statistically correlated with their psychometric results from the DASS-21 (Stress, Anxiety, Depression subscales) and PANAS (Positive and Negative Affect) scales, as well as their scores from the Life Satisfaction scale.
Kendall’s Tau coefficient was selected for correlation analysis for its suitability for detecting monotonic yet potentially nonlinear relationships between the shape of activation curves and psychological measures. The visual and statistical comparison aimed to investigate whether higher levels of emotional distress corresponded to faster or more intense activation of negative emotional concepts. All simulations and visualizations were implemented using SpreadPy, Seaborn, and Matplotlib, ensuring methodological continuity with Study 2.
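For readers unfamiliar with the statistic, the sketch below computes Kendall's tau-b (the tie-corrected variant) by direct pair counting. It is a dependency-free illustration, not the analysis code of the study: the function name `kendall_tau_b` is hypothetical, the numbers are made up, and no significance test is included. In practice a library routine such as `scipy.stats.kendalltau`, which also returns a p-value, would normally be used.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences (O(n^2) pair count)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied on both variables: ignored
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1       # pair ordered the same way on x and y
            else:
                discordant += 1       # pair ordered in opposite ways
    untied_x = concordant + discordant + tied_y_only   # pairs not tied on x
    untied_y = concordant + discordant + tied_x_only   # pairs not tied on y
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Made-up values: final activation of a target node vs. a questionnaire score.
activation = [0.10, 0.12, 0.15, 0.21]
scores = [4, 4, 9, 9]
tau = kendall_tau_b(activation, scores)
```

Because the statistic depends only on the ordering of pairs, it is unaffected by any monotonic rescaling of the activation values, such as taking logarithms.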
6. Discussion
This paper examined the relationship between the results from the Emotional Recall Task and mental health. By applying the spreading activation model to word lists generated through the ERT, we hypothesized that individuals scoring higher on mental distress scales (e.g., DASS-21) and those with higher neuroticism would exhibit stronger activation peaks of "cognitive energy" on nodes concerning mental distress (e.g., stress, depression, anxiety) following the initial activation of their recalled words. The findings supported these hypotheses, revealing stronger activation peaks in these groups, with comparable results across humans and LLMs regarding the relationship between mental health scale scores and words generated in the ERT. These results offer new insights into the structure and the functioning of emotional memory in its relationship with mental health.
Study 1 focused on the link between the word pairs generated from the Emotional Recall Task (ERT) and scores on psychometric scales such as the PANAS, DASS-21 and Life Satisfaction scales. We examined the possible correlation in both human participants and LLMs (Haiku, Opus and GPT-4) by generating simulated participants. This approach allowed us to investigate whether a relationship exists between psychometric scales and the ERT, and to what extent these associations correlate with clinical measures through simulations applied to ERT results and psychometric scales scores.
Study 1 also revealed recurrent patterns in activation trajectories among negative concepts, particularly in participants with higher distress scores. These loops likely reflect the ruminative circuits present in the individuals’ cognitive structure, moving from one negative concept to the following ones. This quantitative pattern indicates that there are semantic closures in the cognitive networks that could lead to negative activation zones where activation is trapped, limiting exits to other clusters. This structure is not only consistent with models of rumination sustaining negative emotional states and some affective disorders, but also aligns with the “attractor states” theory proposed to study dynamic systems and applied to psychology. According to this theory, certain psychological states, thoughts and behavior become stable over time due to repeated reinforcement [
35]. In the same way, negative clusters are automatically activated and reinforced, and this lexical activation is able to generalize to multiple situations over time, maintaining negative emotional states and thoughts and even supporting resistance to change and treatment in mental health disorders.
Additionally, Study 1 aimed at understanding whether LLMs are able to mirror the same emotional and mnemonic dynamics found in humans. The study’s results demonstrated that the associations emerging from the ERT in humans are positively correlated with the results obtained from psychometric scales. Participants showing higher and faster activation peaks following word recall also scored higher on scales measuring negative affect (such as the DASS-21 and PANAS Negative) and lower on scales measuring positive affect (such as the Life Satisfaction Scale and the PANAS Positive). As also shown in past studies [
1], LLMs showed widely varied performances: GPT-4 produced the most similar results to humans, though with lower activation peaks and less variability among simulated participants. Haiku and Opus, on the other hand, showed weaker and statistically nonsignificant correlations between ERT-generated activation levels and the scores of simulated participants across the different scales.
The findings from Study 1 indicate that lexical recall patterns, analyzed through the lens of associative networks and their cognitive activation dynamics, can be successfully compared to scales designed to assess mental distress (such as the DASS-21, PANAS and Life Satisfaction scale), with cognitive activation towards words indicating emotional distress (such as depression, anxiety and stress) varying linearly with the scores obtained in these scales. This association confirms that emotional states are tightly linked to lexical memory access, aligning with previous research showing that negative emotions bias memory recall and information processing ([
4,
6]). Individuals with higher DASS-21 anxiety, depression, or stress scores showed significantly stronger, faster and more persistent activation of negative concepts. These distress-concept activation patterns suggest that emotional recall, assessed through free associations, can identify mental distress and is sensitive to the cognitive changes in thoughts patterns found in mental disorders [
36,
37].
Study 2 explored the associations between the Big Five personality traits and the activation of negative emotional concepts, specifically anxiety, stress, and depression, during a spreading activation simulation in cognitive semantic networks. The results revealed that high neuroticism scores correlated significantly and positively with increased activation of negative concepts, particularly depression, while high extraversion scores correlated negatively with depression activation. These results are consistent with the psychological literature identifying neuroticism as a key vulnerability factor for affective disorders [38, 39] and extraversion as a potential protective trait from negative affective states [
40]. These findings have implications for psychological theories, as lexical recall [9, 10, 41] could potentially be integrated into assessments for at-risk individuals: high centrality scores for anxiety or depression, or high activation around certain nodes, could signal high trait neuroticism, which in turn is associated with increased emotional vulnerability.
6.1. Memory in Humans vs. LLMs: The Role of Episodic Memory
The LLMs tested in Study 1 partly failed to replicate human responses, as their outputs did not exhibit the same associative or emotional patterns observed in human participants. This discrepancy is in line with those recently reported in cognitive psychology [
42] and psycholinguistics [
41]. In our mental health case, this discrepancy likely stems from the lack of autobiographical memory [
1,
15]: Emotional states related to personal memory (as in the emotional recall task) also require autobiographical content, which is inherently absent in LLMs.
Study 1 involved the comparison of activation spreading dynamics between human participants and Large Language Models (LLMs). The comparison between the two groups highlighted both similarities and differences. While humans showed wide variability in activation levels across individuals, LLMs showed a sort of “compressed” distribution. This difference could be accounted for by the absence of episodic memory in LLMs, and also reflects the lack of autobiographical depth and emotional self-reference that characterizes human cognition and the memories that humans form. Janik [
43] claims that human memory integrates contextual, sensory and emotional features, resulting in richly personal and highly variable memory recall. In contrast, LLMs can only reproduce the statistical co-occurrence mechanisms and semantic regularities found within their learning corpora or due to their training processes [
15].
Nonetheless, despite their lack of personal experience and memories [
1], LLMs surprisingly displayed structured activation trajectories and certain regularities resembling those seen in human participants. This was especially true for GPT-4, whose spreading activation patterns most closely approximated those of humans, especially around the stress node activation. These similarities suggest that, although LLMs obviously lack the biological [
44] and emotional [
12,
23] foundation of human cognition, their internal representations may partially reflect aspects of human-like semantic organization.
We highlighted how negative concept activation in humans was not only stronger but also more varied, hinting at a role of individual differences, potentially including personality traits, trauma exposure, stress levels or even presence or absence of symptoms of mental distress in modulating the recall patterns in their cognitive network. LLMs, in contrast, showed minimal variability within each model, lacking the heterogeneity that characterizes human emotional cognition, influenced by lived experiences. This again underscores the structural limitations of artificial cognition compared to human mental life. Raz and colleagues [
45] report that LLMs show some similarities to human participants in terms of problem-solving and question complexity performance, while warning about distinctive differences in the underlying cognitive processes between humans and LLMs. Mahowald and colleagues [
46] emphasize that while humans and LLMs produce similar outputs in recall or language generation tasks, the underlying cognitive pipelines at work are still fundamentally different.
6.2. Limitations and Future Directions
Several limitations of this study must be addressed. First, while statistically significant, the correlations found were modest and should be interpreted cautiously; their effect sizes suggest that personality is merely one of many factors influencing semantic activation, and individual variability remains high. Second, the Emotional Recall Task [
27] itself may be more sensitive to emotional states than traits, introducing noise into personality associations. While neuroticism is a quite stable trait [
47], the activation of words concerning symptoms or negative concepts in one session may reflect momentary distress rather than trait-like tendencies. Therefore, repeated measurements over time from the same individuals would be necessary to better identify which results can be linked to personality traits and which to mental state influences. Third, as stated above, the link between mental health and personality traits is still unclear. While some patterns have already emerged in the literature linking different personality traits to attitudes [
48], thought patterns and emotional styles, researchers speak of tendencies rather than of clear, straight associations. Finally, although the study links personality types to semantic activation patterns, it does not examine the mechanisms through which this link occurs. Future studies could integrate measures of rumination [
49] and other cognitive styles, like mind-wandering [
44], to better specify these mediators and measure their impact on recall associations.
The strength of the correlations between activation and distress as measured by scales further highlights the possibility of using network-based tools in psychological assessment.
A major consideration emerging from this work concerns whether LLMs can effectively simulate patients with psychiatric symptoms. Despite LLMs being able to reproduce psycholinguistic data with some effectiveness [
50], the question of psychiatric symptoms is far more complex. Here, LLMs were able to partially generate responses that activated the same negative concepts targeted in human participants. However, we found that the depth and variability of these activations diverged significantly. As described earlier, GPT-4 showed moderate alignment with human-like activation paths, particularly for the stress node, suggesting some capacity of the model to mimic emotional associations. However, the inconsistent correlations with depression or anxiety across different models reveal a sort of ceiling effect in their ability to simulate distress dynamics. Nevertheless, there is growing evidence that LLMs, when prompted appropriately, can mirror certain styles of thought [
2,
3,
50] and, importantly, specific emotional tones. Both CounseLLMe by De Duro and colleagues [
29] and other recent work by Wang and colleagues [
51] demonstrated that GPT-generated patient narratives and interactions represent well the communicative and emotional characteristics of real patients when guided with clinically relevant prompts. These results suggest that while the internal processes of LLMs are not rooted in personal experience, their training on emotionally charged corpora and on patient dialogues allows them to reconstruct the form of emotional expression as found in humans. Therefore, although the underlying mechanisms are different, the outputs produced by LLMs in certain conditions resemble those of human patients. This opens the way to intriguing directions for future work, making LLMs an interesting addition to clinical research and practice [
52,
53]. By embedding LLMs into therapeutic simulations for mental health trainees or diagnostic interviews, practitioners could explore differential emotional responses, identify linguistic markers of distress, and test hypotheses about psychopathology dynamics, provided their limitations are clearly understood [
54].