Submitted:
05 July 2025
Posted:
07 July 2025
Abstract
Keywords:
1. Introduction
- How accurately can an LLM (Gemma3 12B) simulate individual responses to diverse opinion questions based on structured ANES backstory data?
- How does the type of information provided (Demographic, Attitudinal, Moral) influence the simulation accuracy? Which combinations of variable types are most effective?
- How does the LLM’s performance compare to a standard machine learning classifier (Random Forest) trained on the same data and task?
- How does the LLM’s ability to select relevant variables within a pool affect prediction accuracy?
- How does simulation accuracy vary across different socio-political topics?
- We propose a new framework for conditioning LLMs on structured backstory variables from public survey data to simulate individual and collective opinions.
- We systematically analyze the impact of different types and combinations of background information on simulation accuracy.
- We directly compare LLM performance against traditional machine learning methods on the same survey prediction tasks.
- We demonstrate the viability of LLMs as synthetic populations for simulating public opinion distributions, opening new avenues for low-cost and scalable survey research.
2. Related Work
2.1. Persona Simulation
2.2. LLMs in Social Science Experiments
2.3. Virtual Surveys and Synthetic Respondents
3. Materials and Methods
3.1. ANES 2020 dataset
3.2. Data Assembling and Splitting
3.3. Evaluation Metrics and Statistical Analysis
4. Results
4.1. Experimental Sequence
4.2. Experiment 1 - Random and Constant Model Baselines
4.3. Experiment 2 - Impact of Backstory Variable Categories on Simulation Accuracy
4.4. Experiment 3 - Impact of LLM-Driven Feature Selection on Simulation Accuracy
4.5. Experiment 4: Head-to-Head Model Performance Across All Topic and Variable Pool Configurations
5. Conclusions
6. Limitations and Future Research
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Prompts
Appendix A.1. Baseline Feature Selection
You are a reasoning assistant trained to understand how personal characteristics shape beliefs and opinions. Your task is to analyze a set of variables describing individuals in the U.S. and select the variables most predictive of a person’s response to a survey question on the topic of {TOPIC}.

Below are the variables grouped by theme and their descriptions.

Demographic
- race: Self-identified race
- age: Age
- gender: Male or female
- income: Annual family income
- education: Education level
- occupation: Employment type
- city_rural: Identifies as urban or rural
- children: Number of children
- health_insurance: Has health insurance or not

Attitudes & Political Orientation
- ideology: Political ideology (liberal, conservative, etc.)
- party: Political party affiliation
- interested_politics: Interested in politics or not
- trust_media: Trust in mainstream media
- science: Belief that people need help from experts to understand science
- vaccines_autism: Belief that vaccines cause autism
- religion_importance: How important religion is in the respondent’s life

Moral Compass & Social Values
- child_trait: Most important child trait (e.g., obedience vs. self-reliance)
- death_penalty: Support or opposition to the death penalty
- birth_citizenship: View on banning birthright citizenship for children of undocumented immigrants
- children_sent_back: View on deporting children of undocumented immigrants
- discrimination_woman: Perceived discrimination against women
- black_discrimination: Perceived discrimination against Black people
- black_hist: Belief that past racism still affects Black Americans today
- gays_discrimination: Perceived discrimination against gay people
- muslins_discrimination: Perceived discrimination against Muslims

Survey Question

We want to predict the respondent’s answer to the following question:

Question: {QUESTION}
Answer choices: {ANSWER CHOICES}

Instructions

Based on your understanding of human behavior, select exactly one variable from each of the three groups above that you believe is the most predictive of how someone would respond to the question above. Respond only in this JSON format:

{ "selected_variables": ["variable_1", "variable_2", "variable_3"] }
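The reply format requested by this prompt can be checked mechanically. The following is a minimal sketch, not the authors' code: the function name `parse_baseline_selection`, the group keys, and the tolerance for extra text around the JSON object are all assumptions made for illustration; only the variable names come from the prompt.

```python
import json

# Variable names copied from the prompt; the group keys are illustrative labels.
GROUPS = {
    "demographic": ["race", "age", "gender", "income", "education", "occupation",
                    "city_rural", "children", "health_insurance"],
    "attitudinal": ["ideology", "party", "interested_politics", "trust_media",
                    "science", "vaccines_autism", "religion_importance"],
    "moral": ["child_trait", "death_penalty", "birth_citizenship",
              "children_sent_back", "discrimination_woman", "black_discrimination",
              "black_hist", "gays_discrimination", "muslins_discrimination"],
}


def parse_baseline_selection(raw: str) -> list[str]:
    """Parse the model reply and check it names exactly one variable per group."""
    # Assumption: the model may wrap the JSON in extra text, so keep only
    # the span from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    selected = json.loads(raw[start:end + 1])["selected_variables"]
    if len(selected) != 3:
        raise ValueError(f"expected 3 variables, got {len(selected)}")
    for group, members in GROUPS.items():
        hits = [v for v in selected if v in members]
        if len(hits) != 1:
            raise ValueError(f"expected exactly one {group} variable, got {hits}")
    return selected
```

For example, `parse_baseline_selection('{"selected_variables": ["age", "party", "death_penalty"]}')` returns the three names unchanged, while a reply naming two demographic variables raises `ValueError`.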
Appendix A.2. Variables Pool Feature Selection
You are a reasoning assistant trained to understand how personal characteristics shape beliefs and opinions. Your task is to analyze a set of variables describing individuals in the U.S. and select the variables most predictive of a person’s response to a survey question on the topic of {TOPIC}.

Below are the variables grouped by theme and their descriptions.

Demographic
- race: Self-identified race
- age: Age
- gender: Male or female
- income: Annual family income
- education: Education level
- occupation: Employment type
- city_rural: Identifies as urban or rural
- children: Number of children
- health_insurance: Has health insurance or not

Attitudes & Political Orientation
- ideology: Political ideology (liberal, conservative, etc.)
- party: Political party affiliation
- interested_politics: Interested in politics or not
- trust_media: Trust in mainstream media
- science: Belief that people need help from experts to understand science
- vaccines_autism: Belief that vaccines cause autism
- religion_importance: How important religion is in the respondent’s life

Moral Compass & Social Values
- child_trait: Most important child trait (e.g., obedience vs. self-reliance)
- death_penalty: Support or opposition to the death penalty
- birth_citizenship: View on banning birthright citizenship for children of undocumented immigrants
- children_sent_back: View on deporting children of undocumented immigrants
- discrimination_woman: Perceived discrimination against women
- black_discrimination: Perceived discrimination against Black people
- black_hist: Belief that past racism still affects Black Americans today
- gays_discrimination: Perceived discrimination against gay people
- muslins_discrimination: Perceived discrimination against Muslims

Survey Question

We want to predict the respondent’s answer to the following question:

Question: {QUESTION}
Answer choices: {ANSWER CHOICES}

Instructions

Based on your understanding of human behavior, select only the variables that are most predictive of how someone would respond to the question above. Favor clarity and precision over length. Respond only in this JSON format:

{ "selected_variables": ["variable_1", "variable_2", "..."] }
Appendix A.3. Baseline Feature Selection: Gun Regulation Example
You are a reasoning assistant trained to understand how personal characteristics shape beliefs and opinions. Your task is to analyze a set of variables describing individuals in the U.S. and select the variables most predictive of a person’s response to a survey question on the topic of “Gun regulation”.

Below are the variables grouped by theme and their descriptions.

Demographic
- race: Self-identified race
- age: Age
- gender: Male or female
- income: Annual family income
- education: Education level
- occupation: Employment type
- city_rural: Identifies as urban or rural
- children: Number of children
- health_insurance: Has health insurance or not

Attitudes & Political Orientation
- ideology: Political ideology (liberal, conservative, etc.)
- party: Political party affiliation
- interested_politics: Interested in politics or not
- trust_media: Trust in mainstream media
- science: Belief that people need help from experts to understand science
- vaccines_autism: Belief that vaccines cause autism
- religion_importance: How important religion is in the respondent’s life

Moral Compass & Social Values
- child_trait: Most important child trait (e.g., obedience vs. self-reliance)
- death_penalty: Support or opposition to the death penalty
- birth_citizenship: View on banning birthright citizenship for children of undocumented immigrants
- children_sent_back: View on deporting children of undocumented immigrants
- discrimination_woman: Perceived discrimination against women
- black_discrimination: Perceived discrimination against Black people
- black_hist: Belief that past racism still affects Black Americans today
- gays_discrimination: Perceived discrimination against gay people
- muslins_discrimination: Perceived discrimination against Muslims

Survey Question

We want to predict the respondent’s answer to the following question:

Question: Do you think the federal government should make it more difficult for people to buy a gun than it is now, make it easier for people to buy a gun, or keep these rules about the same as they are now?
Answer choices: (1) More difficult (2) Easier (3) Keep these rules about the same

Instructions

Based on your understanding of human behavior, select exactly one variable from each of the three groups above that you believe is the most predictive of how someone would respond to the question above. Respond only in this JSON format:

{ "selected_variables": ["variable_1", "variable_2", "variable_3"] }
Appendix A.4. Variables Pool Feature Selection: Gun Regulation Example
You are a reasoning assistant trained to understand how personal characteristics shape beliefs and opinions. Your task is to analyze a set of variables describing individuals in the U.S. and select the variables most predictive of a person’s response to a survey question on the topic of “Gun regulation”.

Below are the variables grouped by theme and their descriptions.

Demographic
- race: Self-identified race
- age: Age
- gender: Male or female
- income: Annual family income
- education: Education level
- occupation: Employment type
- city_rural: Identifies as urban or rural
- children: Number of children
- health_insurance: Has health insurance or not

Attitudes & Political Orientation
- ideology: Political ideology (liberal, conservative, etc.)
- party: Political party affiliation
- interested_politics: Interested in politics or not
- trust_media: Trust in mainstream media
- science: Belief that people need help from experts to understand science
- vaccines_autism: Belief that vaccines cause autism
- religion_importance: How important religion is in the respondent’s life

Moral Compass & Social Values
- child_trait: Most important child trait (e.g., obedience vs. self-reliance)
- death_penalty: Support or opposition to the death penalty
- birth_citizenship: View on banning birthright citizenship for children of undocumented immigrants
- children_sent_back: View on deporting children of undocumented immigrants
- discrimination_woman: Perceived discrimination against women
- black_discrimination: Perceived discrimination against Black people
- black_hist: Belief that past racism still affects Black Americans today
- gays_discrimination: Perceived discrimination against gay people
- muslins_discrimination: Perceived discrimination against Muslims

Survey Question

We want to predict the respondent’s answer to the following question:

Question: Do you think the federal government should make it more difficult for people to buy a gun than it is now, make it easier for people to buy a gun, or keep these rules about the same as they are now?
Answer choices: (1) More difficult (2) Easier (3) Keep these rules about the same

Instructions

Based on your understanding of human behavior, select only the variables that are most predictive of how someone would respond to the question above. Favor clarity and precision over length. Respond only in this JSON format:

{ "selected_variables": ["variable_1", "variable_2", "..."] }
Appendix A.5. Survey
Roleplay the person below. The date is November 3, 2020. When questioned, answer just with the option number and nothing more.

{SELECTED BACKSTORY VARIABLE 1.} {SELECTED BACKSTORY VARIABLE 2.} {...} {SELECTED BACKSTORY VARIABLE N.}

Question: {QUESTION}
Answer choices: {ANSWER CHOICES}

My answer is
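Filling this template and reading back the answer can be sketched as follows. This is an illustrative sketch only: the helper names `build_survey_prompt` and `parse_option`, the line breaks inside the template, and the rule of taking the first integer in the reply are assumptions, and the call to the language model itself is left out.

```python
import re

# Template text follows the survey prompt; the line breaks are an assumption,
# since the original layout is not recoverable from the flattened text.
SURVEY_TEMPLATE = (
    "Roleplay the person below. The date is November 3, 2020. "
    "When questioned, answer just with the option number and nothing more.\n"
    "{backstory}\n"
    "Question: {question}\n"
    "Answer choices: {choices}\n"
    "My answer is"
)


def build_survey_prompt(backstory_sentences, question, choices):
    """Join the selected backstory sentences and number the answer choices."""
    choice_text = " ".join(f"({i}) {c}" for i, c in enumerate(choices, start=1))
    return SURVEY_TEMPLATE.format(
        backstory=" ".join(backstory_sentences),
        question=question,
        choices=choice_text,
    )


def parse_option(reply, n_choices):
    """Return the first integer in the reply if it is a valid option, else None."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    option = int(match.group())
    return option if 1 <= option <= n_choices else None
```

Returning `None` for an unparseable or out-of-range reply leaves it to the caller to decide whether to retry or to count the response as invalid.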
Appendix A.6. Survey: Gun Regulation Example with all backstory variables
Roleplay the person below. The date is November 3, 2020. When questioned, answer just with the option number and nothing more.

Racially, you are White. Ideologically, you are slightly conservative. Politically, you are not very strong Republican. Religion is moderately important in my life. You are 42 years old. You are a man. You are not at all interested in politics. Your total family income is $175,000-249,999. At the moment you do have health insurance. You have a Bachelor’s degree. Your employment is best described as working for a for-profit company or organization. You consider yourself a small-town person. You have two children. You believe that people need a lot of help from experts to understand science. You believe that most scientific evidence shows childhood vaccines do not cause autism. You have little trust in the media when it comes to reporting the news accurately and fairly. You believe that when it comes to obedience vs self-reliance, obedience is a more important trait for a child to have. You believe there is a moderate amount of discrimination against women in the US today. You favor a moderate amount that children of unauthorized immigrants do not automatically get citizenship if they are born in the US. You oppose a little that children of unauthorized immigrants who were brought to the US illegally and have lived here for at least 10 years should be deported. You disagree somewhat that generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class. You believe there is a moderate amount of discrimination against blacks in the US today. You believe there is a moderate amount of discrimination against gays and lesbians in the US today. You believe there is a lot of discrimination against muslins in the US today. You favor the death penalty.

Question: Do you think the federal government should make it more difficult for people to buy a gun than it is now, make it easier for people to buy a gun, or keep these rules about the same as they are now?
Answer choices: (1) More difficult (2) Easier (3) Keep these rules about the same

My answer is
Appendix B. 95% Bootstrap Confidence Intervals for Performance Metrics
Appendix B.1. Experiment 1: LLM Baseline
| Topic | F1-score | JS | Cramér’s V |
|---|---|---|---|
| Climate Change | [0.574, 0.604] | [0.201, 0.228] | [0.394, 0.435] |
| Current Economy | [0.5, 0.531] | [0.071, 0.1] | [0.266, 0.31] |
| Drug Addiction | [0.541, 0.573] | [0.187, 0.221] | [0.087, 0.133] |
| Gay Marriage | [0.665, 0.698] | [0.083, 0.119] | [0.278, 0.323] |
| Gender Role | [0.592, 0.623] | [0.1, 0.133] | [0.182, 0.227] |
| Gun Regulation | [0.569, 0.601] | [0.237, 0.269] | [0.276, 0.312] |
| Health Insurance Policy | [0.574, 0.608] | [0.218, 0.246] | [0.268, 0.313] |
| Income Inequality | [0.592, 0.622] | [0.049, 0.078] | [0.377, 0.419] |
| Race Diversity | [0.584, 0.617] | [0.175, 0.205] | [0.196, 0.254] |
| Refugee Allowing | [0.456, 0.491] | [0.35, 0.383] | [0.232, 0.275] |
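The intervals in this appendix are 95% bootstrap confidence intervals. A minimal sketch of one common way to obtain such an interval, the percentile bootstrap over resampled respondents, is given below. The resampling scheme, the number of resamples, and the names `bootstrap_ci` and `accuracy` are assumptions for illustration; the paper's exact procedure is not specified here.

```python
import random


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_pred).

    Respondents are resampled with replacement, keeping each true label
    paired with its prediction.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    # Approximate empirical quantiles of the resampled statistic.
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


def accuracy(y_true, y_pred):
    """Toy metric used only to demonstrate the call signature."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Any metric with the signature `metric(y_true, y_pred)` can be passed in place of `accuracy`. In the gain and difference tables, the asterisk appears to coincide with intervals that exclude zero; that reading is an inference from the tabulated values.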
Appendix B.2. Experiment 1: LLM Baseline Gains against Random model
| Topic | Gain F1-score | Gain JS | Gain Cramér’s V |
|---|---|---|---|
| Climate change | [0.201, 0.246]* | [0.055, 0.094]* | [0.336, 0.39]* |
| Current economy | [0.144, 0.188]* | [0.001, 0.055]* | [0.203, 0.263]* |
| Drug addiction | [0.134, 0.178]* | [0.153, 0.192]* | [0.028, 0.088]* |
| Gay marriage | [0.277, 0.319]* | [0.204, 0.255]* | [0.224, 0.284]* |
| Gender role | [0.201, 0.246]* | [0.201, 0.251]* | [0.117, 0.178]* |
| Gun regulation | [0.184, 0.23]* | [0.027, 0.077]* | [0.22, 0.271]* |
| Health insurance policy | [0.205, 0.248]* | [-0.016, 0.038] | [0.214, 0.271]* |
| Income inequality | [0.234, 0.276]* | [0.037, 0.083]* | [0.324, 0.377]* |
| Race diversity | [0.213, 0.254]* | [0.103, 0.159]* | [0.141, 0.212]* |
| Refugee allowing | [0.096, 0.14]* | [-0.169, -0.122]* | [0.181, 0.236]* |
Appendix B.3. Experiment 1: LLM Baseline Gains against Constant model
| Topic | Gain F1-score | Gain JS | Gain Cramér’s V |
|---|---|---|---|
| Climate change | [0.15, 0.198]* | [0.272, 0.313]* | - |
| Current economy | [0.206, 0.247]* | [0.488, 0.514]* | - |
| Drug addiction | [-0.036, -0.004]* | [0.188, 0.232]* | - |
| Gay marriage | [0.076, 0.105]* | [0.284, 0.322]* | - |
| Gender role | [0.066, 0.108]* | [0.308, 0.349]* | - |
| Gun regulation | [0.181, 0.23]* | [0.255, 0.299]* | - |
| Health insurance policy | [0.157, 0.191]* | [0.26, 0.289]* | - |
| Income inequality | [0.272, 0.308]* | [0.486, 0.523]* | - |
| Race diversity | [0.163, 0.199]* | [0.303, 0.329]* | - |
| Refugee allowing | [0.06, 0.086]* | [0.133, 0.165]* | - |
Appendix B.4. Experiment 2: LLM best performance
| Topic | F1-score | JS | Cramér’s V |
|---|---|---|---|
| Climate Change | [0.681, 0.709] | [0.131, 0.161] | [0.42, 0.461] |
| Current Economy | [0.52, 0.55] | [0.065, 0.096] | [0.33, 0.369] |
| Drug Addiction | [0.599, 0.638] | [0.1, 0.135] | [0.093, 0.161] |
| Gay Marriage | [0.649, 0.68] | [0.17, 0.206] | [0.295, 0.339] |
| Gender Role | [0.616, 0.647] | [0.034, 0.068] | [0.176, 0.223] |
| Gun Regulation | [0.608, 0.639] | [0.188, 0.219] | [0.326, 0.408] |
| Health Insurance Policy | [0.605, 0.636] | [0.047, 0.082] | [0.361, 0.415] |
| Income Inequality | [0.595, 0.624] | [0.088, 0.119] | [0.398, 0.435] |
| Race Diversity | [0.588, 0.621] | [0.062, 0.093] | [0.243, 0.285] |
| Refugee Allowing | [0.549, 0.582] | [0.093, 0.129] | [0.285, 0.327] |
Appendix B.5. Experiment 2: LLM performance gains against Experiment 1 LLM baseline
| Topic | Gain F1-score | Gain JS | Gain Cramér’s V |
|---|---|---|---|
| Climate Change | [0.092, 0.122]* | [0.051, 0.085]* | [0.006, 0.047]* |
| Current Economy | [0.001, 0.035]* | [-0.016, 0.025] | [0.04, 0.082]* |
| Drug Addiction | [0.046, 0.079]* | [0.059, 0.109]* | [-0.021, 0.054] |
| Gay Marriage | [-0.028, -0.005]* | [-0.104, -0.071]* | [-0.004, 0.033] |
| Gender Role | [0.01, 0.038]* | [0.037, 0.098]* | [-0.025, 0.018] |
| Gun Regulation | [0.026, 0.051]* | [0.038, 0.061]* | [0.038, 0.115]* |
| Health Insurance Policy | [0.015, 0.043]* | [0.139, 0.195]* | [0.068, 0.128]* |
| Income Inequality | [-0.01, 0.016] | [-0.053, -0.028]* | [0.004, 0.037]* |
| Race Diversity | [-0.01, 0.019] | [0.095, 0.13]* | [0.012, 0.069]* |
| Refugee Allowing | [0.076, 0.108]* | [0.237, 0.273]* | [0.032, 0.072]* |
Appendix B.6. Experiment 2: LLM performance diff against RF model in Experiment 2
| Topic | Diff F1-score | Diff JS | Diff Cramér’s V |
|---|---|---|---|
| Climate Change | [-0.038, -0.011]* | [-0.038, -0.002]* | [-0.041, 0.001] |
| Current Economy | [-0.043, -0.007]* | [0.04, 0.097]* | [-0.038, -0.006]* |
| Drug Addiction | [0.015, 0.036]* | [0.163, 0.196]* | [0.012, 0.104]* |
| Gay Marriage | [-0.04, -0.012]* | [-0.06, -0.017]* | [-0.01, 0.042] |
| Gender Role | [0.001, 0.029]* | [0.156, 0.203]* | [-0.056, 0.009] |
| Gun Regulation | [-0.091, -0.062]* | [-0.06, -0.027]* | [-0.212, -0.121]* |
| Health Insurance Policy | [-0.01, 0.017] | [0.093, 0.15]* | [0.025, 0.083]* |
| Income Inequality | [-0.014, 0.016] | [0.003, 0.061]* | [-0.008, 0.024] |
| Race Diversity | [-0.065, -0.035]* | [0.052, 0.101]* | [-0.192, -0.143]* |
| Refugee Allowing | [-0.077, -0.043]* | [-0.032, 0.022] | [-0.066, -0.019]* |
Appendix B.7. Experiment 3: LLM best performance
| Topic | F1-score | JS | Cramér’s V |
|---|---|---|---|
| Climate Change | [0.679, 0.708] | [0.069, 0.103] | [0.404, 0.447] |
| Current Economy | [0.533, 0.565] | [0.051, 0.081] | [0.339, 0.38] |
| Drug Addiction | [0.617, 0.65] | [0.052, 0.087] | [0.122, 0.189] |
| Gay Marriage | [0.647, 0.681] | [0.142, 0.175] | [0.295, 0.338] |
| Gender Role | [0.588, 0.618] | [0.067, 0.101] | [0.208, 0.25] |
| Gun Regulation | [0.616, 0.647] | [0.164, 0.194] | [0.338, 0.439] |
| Health Insurance Policy | [0.564, 0.597] | [0.152, 0.187] | [0.317, 0.403] |
| Income Inequality | [0.6, 0.63] | [0.013, 0.037] | [0.391, 0.43] |
| Race Diversity | [0.587, 0.619] | [0.022, 0.054] | [0.228, 0.271] |
| Refugee Allowing | [0.54, 0.573] | [0.126, 0.16] | [0.277, 0.318] |
Appendix B.8. Experiment 3: LLM performance diff against RF model in Experiment 2
| Topic | Diff F1-score | Diff JS | Diff Cramér’s V |
|---|---|---|---|
| Climate Change | [-0.04, -0.014]* | [0.021, 0.059]* | [-0.058, -0.016]* |
| Current Economy | [-0.027, 0.006] | [0.052, 0.111]* | [-0.029, 0.004] |
| Drug Addiction | [0.026, 0.051]* | [0.202, 0.251]* | [0.039, 0.128]* |
| Gay Marriage | [-0.04, -0.014]* | [-0.032, 0.018] | [-0.01, 0.04] |
| Gender Role | [-0.03, 0.005] | [0.121, 0.178]* | [-0.03, 0.039] |
| Gun Regulation | [-0.083, -0.054]* | [-0.037, 0.001] | [-0.2, -0.095]* |
| Health Insurance Policy | [-0.049, -0.024]* | [-0.013, 0.047] | [-0.013, 0.063] |
| Income Inequality | [-0.006, 0.021] | [0.088, 0.13]* | [-0.015, 0.016] |
| Race Diversity | [-0.065, -0.036]* | [0.094, 0.135]* | [-0.205, -0.154]* |
| Refugee Allowing | [-0.087, -0.052]* | [-0.061, -0.01]* | [-0.078, -0.029]* |
Appendix C. Computational Environment and Software
- scikit-learn==1.5.2
- scipy==1.14.1
- ollama==0.4.8
Appendix D. Random Forest Classifier Parameters
- n_estimators = 100: The number of trees in the forest.
- max_depth = 10: The maximum depth of each tree.
- min_samples_split = 2: The minimum number of samples required to split an internal node.
- min_samples_leaf = 2: The minimum number of samples required to be at a leaf node.
- random_state: A specific integer value was used for the random_state parameter in each experimental run to ensure reproducibility of the results.
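The parameters listed above map directly onto keyword arguments of scikit-learn's `RandomForestClassifier`. The sketch below shows that mapping; the names `RF_PARAMS` and `build_random_forest` and the idea of passing the seed per run are illustrative assumptions, and the actual seed values are not given in this appendix.

```python
# Hyperparameters as listed in this appendix.
RF_PARAMS = {
    "n_estimators": 100,
    "max_depth": 10,
    "min_samples_split": 2,
    "min_samples_leaf": 2,
}


def build_random_forest(seed):
    """Create a classifier with the listed parameters and a caller-supplied seed."""
    # Imported lazily so the parameter dictionary can be inspected
    # without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    return RandomForestClassifier(random_state=seed, **RF_PARAMS)
```

All remaining arguments are left at the library defaults in this sketch, which may or may not match the original runs.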
Appendix E. Mathematical Definitions of Metrics
Appendix E.1. F1-Score
- True Positives (TP): The number of positive instances correctly classified as positive.
- False Positives (FP): The number of negative instances incorrectly classified as positive (Type I error).
- False Negatives (FN): The number of positive instances incorrectly classified as negative (Type II error).
- True Negatives (TN): The number of negative instances correctly classified as negative.
- weighted average: Calculates the metric for each class and averages the per-class values, weighted by the number of true instances of each class (support), which accounts for class imbalance: $F1_{\text{weighted}} = \sum_{c} w_c \cdot F1_c$, where $w_c$ is the proportion of true instances belonging to class $c$.
- F1-score ranges from 0 to 1.
- $F1 = 1$ indicates perfect precision and recall.
- $F1 = 0$ if either precision or recall (or both) is 0.
- It is useful when there is an uneven class distribution as it considers both false positives and false negatives.
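The support-weighted F1-score described above can be computed directly from the TP, FP, and FN counts of each class. The following is a small pure-Python sketch written for illustration (the function name `weighted_f1` is an assumption); it is intended to mirror the usual weighted-average definition rather than reproduce the authors' code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of true instances.
        score += (n_c / total) * f1
    return score


# Worked example: per-class F1 is 2/3, 2/3 and 1 with weights 0.5, 0.25, 0.25.
print(weighted_f1([1, 1, 2, 3], [1, 2, 2, 3]))
```

In the worked example the weighted sum is 0.5 · 2/3 + 0.25 · 2/3 + 0.25 · 1 = 0.75.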
Appendix E.2. Jensen-Shannon Distance (JSD)
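- JSD is symmetric: $\mathrm{JSD}(P \parallel Q) = \mathrm{JSD}(Q \parallel P)$.
- JSD is non-negative: $\mathrm{JSD}(P \parallel Q) \geq 0$.
- JSD is bounded: $0 \leq \mathrm{JSD}(P \parallel Q) \leq \ln 2$ (if using natural log) or $0 \leq \mathrm{JSD}(P \parallel Q) \leq 1$ (if using $\log_2$).
- $\mathrm{JSD}(P \parallel Q) = 0$ if and only if $P = Q$.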
- The JS Distance (square root of JSD) is a true metric.
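The properties listed above can be checked numerically with a short implementation. The sketch below uses the standard definition, the average of the two Kullback-Leibler divergences to the mixture $M = (P + Q)/2$; the function names are illustrative, and whether the reported JS values are divergences or distances is not determined by this sketch.

```python
import math


def kl_divergence(p, q, base=2):
    """Kullback-Leibler divergence; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, base=2):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)


def js_distance(p, q, base=2):
    """Square root of the divergence, which is a true metric."""
    return math.sqrt(max(js_divergence(p, q, base), 0.0))
```

With `base=2`, two distributions with disjoint support reach the upper bound of 1, and identical distributions give 0. Note that `scipy.spatial.distance.jensenshannon` returns the distance (the square root), not the divergence.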
Appendix E.3. Cramér’s V
- V ranges from 0 to 1, inclusive.
- $V = 0$ indicates no association between the variables.
- $V = 1$ indicates a perfect association between the variables.
- It is a symmetric measure.
- 0.00 to < 0.10: Negligible or very weak association.
- 0.10 to < 0.20 (or < 0.30): Weak or small association.
- 0.20 (or 0.30) to < 0.40 (or < 0.50): Moderate association.
- 0.40 (or 0.50) and above: Strong or large association.
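For reference, Cramér's V can be computed from a contingency table of observed versus predicted answers through the chi-squared statistic. The sketch below uses the uncorrected textbook form $V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}$; the function name and the choice to return NaN for a degenerate table are assumptions, and no bias correction is applied.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    # Count only non-empty rows and columns when taking min(r, c) - 1.
    k = min(sum(t > 0 for t in row_tot), sum(t > 0 for t in col_tot)) - 1
    if k <= 0:
        # Undefined when one variable takes a single value,
        # e.g. a model that always predicts the same answer.
        return float("nan")
    return math.sqrt(chi2 / (n * k))
```

A diagonal table such as `[[10, 0], [0, 10]]` gives V = 1, and a table with identical rows gives V = 0. The undefined case is consistent with the "-" entries reported for the Constant model.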




| Variable | Choices |
|---|---|
| Race | 1. White 2. Black 3. Hispanic 4. Asian 5. Native American 6. Mixed |
| Age | [free form value] |
| Gender | 1. Male 2. Female |
| Income (all family) | 1. Under $9,999 2. $10,000-14,999 3. $15,000-19,999 … 15. $80,000-89,999 16. $90,000-99,999 17. $100,000-109,999 … 21. $175,000-249,999 22. $250,000 or more |
| Education | 1. Less than high school credential 2. High school credential 3. Some post-high school, no bachelor’s degree 4. Bachelor’s degree 5. Graduate degree |
| Occupation | 1. For-profit company or organization 2. Non-profit organization 3. Local government 4. State government 5. Military 6. Federal government, as a civilian employee 7. Owner of non-incorporated business 8. Owner of incorporated business 9. for-profit family business |
| City or rural | 1. City person 2. Suburb person 3. Small-town person 4. Country person 5. Neither a city nor rural person |
| Children | 0. No children 1. One child 2. Two children 3. Three children 4. Four or more children |
| Has health insurance | 1. Yes 2. No |

| Variable | Choices |
|---|---|
| Ideology | 1. Extremely liberal 2. Liberal 3. Slightly liberal 4. Moderate 5. Slightly conservative 6. Conservative 7. Extremely conservative |
| Party | 1. Strong Democrat 2. Not very strong Democrat 3. Independent who leans Democratic 4. Independent 5. Independent who leans Republican 6. Not very strong Republican 7. Strong Republican |
| Interested in politics | 1. Very interested 2. Somewhat interested 3. Not very interested 4. Not at all interested |
| Trust Media | 1. No 2. A little 3. A moderate amount 4. A lot 5. A great deal |
| Vaccines & Autism | 1. Most scientific evidence shows childhood vaccines cause autism 2. Most scientific evidence shows childhood vaccines do not cause autism |
| Science Experts Necessity | 1. Do not need 2. Need a little 3. Need a moderate amount 4. Need a lot 5. Need a great deal |
| Religion importance | 1. Extremely important 2. Very important 3. Moderately important 4. Little importance 5. Not important at all |

| Variable | Choices |
|---|---|
| Preferred Child Trait | 1. Obedience 2. Self-reliance 3. Both 4. Neither |
| Death Penalty | 1. Favor 2. Oppose |
| Birthright Citizenship End | 1. Favor a great deal 2. Favor a moderate amount 3. Favor a little 4. Neither favor nor oppose 5. Oppose a little 6. Oppose a moderate amount 7. Oppose a great deal |
| Children Deportation | 1. Favor a great deal 2. Favor a moderate amount 3. Favor a little 4. Oppose a little 5. Oppose a moderate amount 6. Oppose a great deal |
| Discrimination Women | 1. A great deal 2. A lot 3. A moderate amount 4. A little 5. None |
| Discrimination Black | 1. A great deal 2. A lot 3. A moderate amount 4. A little 5. None |
| Discrimination Gays | 1. A great deal 2. A lot 3. A moderate amount 4. A little 5. None |
| Discrimination Muslims | 1. A great deal 2. A lot 3. A moderate amount 4. A little 5. None |
| Historic Racism Impact | 1. Agree strongly 2. Agree somewhat 3. Neither agree nor disagree 4. Disagree somewhat 5. Disagree strongly |

| Variable | Question | Choices |
|---|---|---|
| Race diversity | Does the increasing number of people of many different races and ethnic groups in the United States make this country a better place to live, a worse place to live, or does it make no difference? | 1. Better 2. Worse 3. Makes no difference |
| Gender role | Do you think it is better, worse, or makes no difference for the family as a whole if the man works outside the home and the woman takes care of the home and family? | 1. Better 2. Worse 3. Makes no difference |
| Current Economy* | What do you think about the state of the economy these days in the United States? | 1. Good 2. Neither good nor bad 3. Bad |
| Drug addiction | Do you think the federal government should be doing more about the opioid drug addiction issue, should be doing less, or is it currently doing the right amount? | 1. Should be doing more 2. Should be doing less 3. Is doing the right amount |
| Climate change* | How much, if at all, do you think climate change is currently affecting severe weather events or temperature patterns in the United States? | 1. Not at all 2. A little 3. A lot |
| Gay marriage | Which comes closest to your view regarding gay and lesbian couples? | 1. They should be allowed to legally marry 2. They should be allowed to form civil unions but not legally marry 3. There should be no legal recognition of gay or lesbian couples relationship |
| Refugee allowing | Do you favor, oppose, or neither favor nor oppose allowing refugees who are fleeing war, persecution, or natural disasters in other countries to come to live in the U.S.? | 1. Favor 2. Oppose 3. Neither favor nor oppose |
| Health insurance | Do you favor an increase, decrease, or no change in government spending to help people pay for health insurance when people cannot pay for it all themselves? | 1. Increase 2. Decrease 3. No change |
| Gun regulation | Do you think the federal government should make it more difficult for people to buy a gun than it is now, make it easier for people to buy a gun, or keep these rules about the same as they are now? | 1. More difficult 2. Easier 3. Keep these rules about the same |
| Income inequality | Do you favor, oppose, or neither favor nor oppose the government trying to reduce the difference in incomes between the richest and poorest households? | 1. Favor 2. Oppose 3. Neither favor nor oppose |
| Topic | Model | F1-score | JSD | Cramér’s V |
|---|---|---|---|---|
| Climate Change | Gemma 3 12B | 0.59 | 0.21 | 0.41 |
| | Random | 0.36 | 0.29 | 0.04 |
| | Constant | 0.41 | 0.51 | - |
| Current Economy | Gemma 3 12B | 0.52 | 0.08 | 0.29 |
| | Random | 0.35 | 0.11 | 0.04 |
| | Constant | 0.29 | 0.59 | - |
| Drug Addiction | Gemma 3 12B | 0.56 | 0.20 | 0.10 |
| | Random | 0.40 | 0.37 | 0.04 |
| | Constant | 0.58 | 0.41 | - |
| Gay Marriage | Gemma 3 12B | 0.68 | 0.10 | 0.30 |
| | Random | 0.38 | 0.33 | 0.03 |
| | Constant | 0.59 | 0.40 | - |
| Gender Role | Gemma 3 12B | 0.61 | 0.12 | 0.20 |
| | Random | 0.38 | 0.34 | 0.04 |
| | Constant | 0.52 | 0.44 | - |
| Gun Regulation | Gemma 3 12B | 0.59 | 0.25 | 0.29 |
| | Random | 0.38 | 0.30 | 0.03 |
| | Constant | 0.38 | 0.53 | - |
| Health Insurance Policy | Gemma 3 12B | 0.59 | 0.23 | 0.29 |
| | Random | 0.36 | 0.24 | 0.03 |
| | Constant | 0.42 | 0.51 | - |
| Income Inequality | Gemma 3 12B | 0.61 | 0.06 | 0.40 |
| | Random | 0.35 | 0.12 | 0.03 |
| | Constant | 0.32 | 0.57 | - |
| Race Diversity | Gemma 3 12B | 0.60 | 0.19 | 0.21 |
| | Random | 0.37 | 0.32 | 0.03 |
| | Constant | 0.42 | 0.51 | - |
| Refugee Allowing | Gemma 3 12B | 0.47 | 0.37 | 0.25 |
| | Random | 0.36 | 0.22 | 0.03 |
| | Constant | 0.40 | 0.52 | - |
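
The three metrics reported in the table above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' evaluation code. It assumes a support-weighted F1 average, a base-2 Jensen-Shannon divergence between the true and predicted answer distributions, and an uncorrected Cramér's V; the averaging scheme, the logarithm base, and all function names are assumptions made for this example. Note also that some libraries report the square root of the divergence instead (SciPy's `jensenshannon`, for instance, returns the distance).

```python
import math
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged by class support (assumed averaging scheme)."""
    n = len(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * (tp + fn) / n  # weight by number of true instances
    return total


def js_divergence(y_true, y_pred):
    """Base-2 Jensen-Shannon divergence between the two answer distributions."""
    ct, cp = Counter(y_true), Counter(y_pred)
    jsd = 0.0
    for c in set(ct) | set(cp):
        p = ct[c] / len(y_true)
        q = cp[c] / len(y_pred)
        m = (p + q) / 2
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


def cramers_v(y_true, y_pred):
    """Uncorrected Cramér's V; None when either side takes a single value."""
    rows, cols = Counter(y_true), Counter(y_pred)
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return None  # association is undefined for a constant predictor
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))
```

Under these assumptions, a model that always predicts the same answer has no defined Cramér's V, which would be consistent with the "-" entries in the Constant rows.
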
| Topic | F1-score | JS | Cramér’s V | F1-score pool | JS pool | Cramér’s V pool |
|---|---|---|---|---|---|---|
| Climate Change | 0.70 | 0.15 | 0.44 | A+M | A+M | A+M |
| Current Economy | 0.53 | 0.08 | 0.35 | A | A | A+M |
| Drug Addiction | 0.62 | 0.12 | 0.12 | M | A | M |
| Gay Marriage | 0.66 | 0.19 | 0.31 | A | A | A |
| Gender Role | 0.63 | 0.05 | 0.20 | D+A+M | A | D+A |
| Gun Regulation | 0.62 | 0.20 | 0.35 | A | A | D+A |
| Health Insurance Policy | 0.62 | 0.06 | 0.39 | D+A+M | D+A | M |
| Income Inequality | 0.61 | 0.10 | 0.42 | A+M | A | A+M |
| Race Diversity | 0.60 | 0.08 | 0.26 | A | A | D+A |
| Refugee Allowing | 0.57 | 0.11 | 0.30 | A | A | D+A+M |
| Topic | F1-score | JS | Cramér’s V | F1-score pool | JS pool | Cramér’s V pool |
|---|---|---|---|---|---|---|
| Climate Change | 0.72 | 0.12 | 0.46 | A+M | M | A+M |
| Current Economy | 0.56 | 0.15 | 0.37 | A+M | M | A+M |
| Drug Addiction | 0.59 | 0.30 | 0.06 | A+M | M | D+A+M |
| Gay Marriage | 0.69 | 0.15 | 0.30 | D+A | A | A+M |
| Gender Role | 0.62 | 0.23 | 0.22 | A+M | A | D+A+M |
| Gun Regulation | 0.70 | 0.16 | 0.53 | D+A+M | M | D+A+M |
| Health Insurance Policy | 0.62 | 0.18 | 0.33 | A+M | M | A+M |
| Income Inequality | 0.61 | 0.13 | 0.41 | A+M | A | A+M |
| Race Diversity | 0.65 | 0.15 | 0.43 | D+A+M | M | D+A+M |
| Refugee Allowing | 0.62 | 0.11 | 0.35 | A+M | M | A+M |
| Topic | Gain F1-score | Gain JS | Gain Cramér’s V |
|---|---|---|---|
| Climate Change | 0.11* | 0.07* | 0.03* |
| Current Economy | 0.02* | 0.00 | 0.06* |
| Drug Addiction | 0.06* | 0.09* | 0.02 |
| Gay Marriage | -0.02* | -0.09* | 0.02 |
| Gender Role | 0.02* | 0.07* | -0.00 |
| Gun Regulation | 0.04* | 0.05* | 0.05* |
| Health Insurance Policy | 0.03* | 0.17* | 0.10* |
| Income Inequality | 0.00 | -0.04* | 0.02* |
| Race Diversity | 0.00 | 0.11* | 0.05* |
| Refugee Allowing | 0.09* | 0.26* | 0.05* |
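
The sign convention of the gain columns above can be illustrated with a short sketch. Higher is better for F1-score and Cramér's V, while lower is better for JS, so the JS gain appears to be computed as baseline minus new value, which makes a positive number an improvement on every metric. The helper name `gains` is hypothetical. The inputs are copied from the Climate Change rows of the baseline table and the first per-topic best-pool table in this section, and pairing those two tables is an assumption based on the numbers matching.

```python
def gains(baseline, improved):
    """Return (F1, JS, Cramér's V) gains where positive means `improved` is better.

    Each argument is a tuple (f1, js, cramers_v).
    """
    f1_b, js_b, v_b = baseline
    f1_i, js_i, v_i = improved
    return (
        round(f1_i - f1_b, 2),  # higher F1 is better
        round(js_b - js_i, 2),  # lower JS is better, so the sign is flipped
        round(v_i - v_b, 2),    # higher Cramér's V is better
    )


baseline = (0.59, 0.21, 0.41)   # Climate Change, Gemma 3 12B baseline row
best_pool = (0.70, 0.15, 0.44)  # Climate Change, best variable-pool row
print(gains(baseline, best_pool))
```

For this row the sketch gives gains of 0.11, 0.06 and 0.03, against reported values of 0.11, 0.07 and 0.03. The small JS discrepancy is presumably because the published gains were computed from unrounded metrics.
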
| Topic | Diff F1-score | Diff JS | Diff Cramér’s V |
|---|---|---|---|
| Climate Change | -0.02* | -0.02* | -0.02 |
| Current Economy | -0.03* | 0.07* | -0.02* |
| Drug Addiction | 0.02* | 0.18* | 0.06* |
| Gay Marriage | -0.03* | -0.04* | 0.02 |
| Gender Role | 0.01* | 0.18* | -0.03 |
| Gun Regulation | -0.08* | -0.04* | -0.19* |
| Health Insurance Policy | 0.00 | 0.12* | 0.06* |
| Income Inequality | 0.00 | 0.03* | 0.01 |
| Race Diversity | -0.05* | 0.08* | -0.17* |
| Refugee Allowing | -0.06* | -0.00 | -0.04* |
| Topic | F1-score | JS | Cramér’s V | F1-score pool | JS pool | Cramér’s V pool |
|---|---|---|---|---|---|---|
| Climate Change | 0.69 | 0.09 | 0.42 | D+A | D+A | D+A+M |
| Current Economy | 0.55 | 0.06 | 0.36 | A+M | A | A |
| Drug Addiction | 0.63 | 0.07 | 0.15 | D+M | D+A | D+M |
| Gay Marriage | 0.66 | 0.16 | 0.31 | D+A+M | A | D+A |
| Gender Role | 0.60 | 0.08 | 0.23 | D+A+M | D+A+M | D+A+M |
| Gun Regulation | 0.63 | 0.18 | 0.36 | D+A | A+M | A |
| Health Insurance Policy | 0.58 | 0.17 | 0.34 | D+M | A+M | D+M |
| Income Inequality | 0.62 | 0.02 | 0.41 | A | A | A |
| Race Diversity | 0.60 | 0.03 | 0.25 | D+A | D+A | D+A |
| Refugee Allowing | 0.56 | 0.14 | 0.29 | D+A | D | D+A |
| Topic | Diff F1-score | Diff JS | Diff Cramér’s V |
|---|---|---|---|
| Climate Change | -0.03* | 0.04* | -0.04* |
| Current Economy | -0.01 | 0.08* | -0.01 |
| Drug Addiction | 0.04* | 0.23* | 0.09* |
| Gay Marriage | -0.03* | -0.01 | 0.02 |
| Gender Role | -0.01 | 0.15* | 0.00 |
| Gun Regulation | -0.07* | -0.02 | -0.18* |
| Health Insurance Policy | -0.04* | 0.02 | 0.01 |
| Income Inequality | 0.01 | 0.12* | 0.00 |
| Race Diversity | -0.05* | 0.12* | -0.18* |
| Refugee Allowing | -0.07* | -0.04* | -0.05* |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).