Submitted: 31 October 2024
Posted: 01 November 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Validation of the Evaluation Instrument
2.3. Procedure
2.4. Data Analysis
3. Results
3.1. Subjective Judgment of the Prompt Written According to the Instructions for the Activity
| Variable | Conditions | f | Mean (SD) of prompt | Range (min-max) and median of prompt |
|---|---|---|---|---|
| Use of standards in writing χ²(3) = 1.43, p = .700 | No orthography or punctuation errors. | 11.8% | 2.00 (0.92) | 2 (1-3) 2 |
| | Orthography errors in the writing of the prompt. | 26.5% | 2.17 (0.78) | 2 (1-3) 2 |
| | Punctuation errors in the writing of the prompt. | 26.5% | 1.83 (0.92) | 2 (1-3) 1.5 |
| | Both types of errors in the writing of the prompt. | 35.3% | 1.96 (0.85) | 2 (1-3) 2 |
| Verbal moods or attitudes of the speaker χ²(3) = 13.4, p = .004 | Use of indicative mood | 51.5% | 2.23 (0.87) | 2 (1-3) 3 |
| | Use of subjunctive mood | 7.4% | 2.60 (0.54) | 1 (2-3) 3 |
| | Use of imperative mood | 0% | | |
| | Use of two verb moods | 36.8% | 1.64 (0.70) | 2 (1-3) 2 |
| | Use of three verb moods | 4.4% | 1.00 (0.00) | 0 (1-1) 1 |
| Sentence complexity χ²(4) = 17.4, p = .002 | Use of simple sentence | 29.4% | 2.45 (0.52) | 2 (1-3) 3 |
| | Use of coordinated sentences | 2.9% | 1.5 (1.00) | 1 (1-2) 1.5 |
| | Use of subordinate clauses | 8.8% | 2.67 (0.40) | 1 (2-3) 3 |
| | Use of two types of sentences | 38.2% | 1.81 (0.87) | 2 (1-3) 2 |
| | Use of three types of sentences | 20.6% | 1.43 (0.83) | 2 (1-3) 1 |
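
The p-values reported alongside each χ² statistic can be checked from the statistic and its degrees of freedom alone. The following is a minimal sketch using only the Python standard library; the function name `chi2_sf` is hypothetical, and the sketch only converts a statistic into an upper-tail probability. It makes no assumption about which test produced the statistics, since that is not stated in this excerpt.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    half = x / 2.0
    # Closed-form starting points: df = 1 (odd case) and df = 2 (even case).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# (statistic, df, reported p) triples copied from the table above.
reported = [(1.43, 3, 0.700), (13.4, 3, 0.004), (17.4, 4, 0.002)]
for stat, df, p in reported:
    print(f"chi2({df}) = {stat}: recomputed p = {chi2_sf(stat, df):.3f}, reported p = {p}")
```

Small differences in the third decimal are expected because the reported statistics are themselves rounded.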
3.2. Subjective Judgment of the Quality of the Response Given by the LLM
| Variable | Conditions | f | Mean (SD) of response | Range (min-max) and median of response |
|---|---|---|---|---|
| Use of standards in writing χ²(3) = 7.78, p = .051 | No orthography or punctuation errors. | 18.4% | 1.95 (0.84) | 2 (1-3) 2 |
| | Orthography errors in the writing of the prompt. | 30.1% | 1.81 (0.74) | 2 (1-3) 2 |
| | Punctuation errors in the writing of the prompt. | 21.4% | 1.41 (0.73) | 2 (1-3) 1 |
| | Both types of errors in the writing of the prompt. | 30.1% | 1.52 (0.62) | 2 (1-3) 1 |
| Verbal moods or attitudes of the speaker χ²(4) = 18.7, p < .001 | Use of indicative mood | 62.1% | 1.73 (0.78) | 2 (1-3) 2 |
| | Use of subjunctive mood | 6.8% | 2.0 (0.57) | 2 (1-3) 2 |
| | Use of imperative mood | 1.9% | 3.0 (0) | 0 (3-3) 3 |
| | Use of two verb moods | 26.2% | 1.37 (0.56) | 2 (1-3) 1 |
| | Use of three verb moods | 2.9% | 1.0 (0) | 0 (1-1) 1 |
| Sentence complexity χ²(4) = 18.7, p < .001 | Use of simple sentence | 44.7% | 1.91 (0.78) | 2 (1-3) 2 |
| | Use of coordinated sentences | 2.9% | 1.33 (0.57) | 1 (1-2) 1 |
| | Use of subordinate clauses | 10.7% | 2 (0.77) | 2 (1-3) 2 |
| | Use of two types of sentences | 27.2% | 1.43 (0.63) | 2 (1-3) 1 |
| | Use of three types of sentences | 14.6% | 1.13 (0.35) | 1 (1-2) 1 |
3.3. Length of the Utterances of the Prompt
4. Discussion
4.1. Punctuation and Orthography
4.2. Verbal Moods
4.3. Sentence Complexity
4.4. Implications for How to Objectively Evaluate a Prompt Written by an Adult
4.5. Implications of this Work in the Field of Higher Education
5. Conclusions
Limitations
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
| Item | Criterion | Mean | V | Vo = .50 | Lower | Upper | Vo = .70 | Decision |
|---|---|---|---|---|---|---|---|---|
| A1 The student's prompt demonstrates that he/she was able to follow the instructions given in the activity. | S | 3.75 | .92 | ✓ | .6461 | .9851 | X | Reassess sufficiency |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.25 | .75 | X | .4677 | .9111 | X | Rewrite |
| A2 The answer given by the AI is satisfactory according to the given prompt. | S | 3.75 | .92 | ✓ | .6461 | .9851 | X | Revise sufficiency |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.00 | .67 | X | .3906 | .8619 | X | Rewrite |
| B1 Record the number of words used in the question posed to the AI. | S | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| B2 Record the number of sentences used in the wording of the question posed to the AI. | S | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| C1 There are orthographical errors in the wording of the prompt. | S | 2.25 | .42 | X | .1933 | .6805 | X | Insufficiency |
| | R | 3.50 | .83 | ✓ | .552 | .953 | X | Revise relevance |
| | C | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| C2 There are punctuation errors in the wording of the prompt. | S | 3.25 | .75 | X | .4677 | .9111 | X | Insufficiency |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| C3 Students comply with colloquialisms and politeness in the writing of the prompt. | S | 2.75 | .58 | X | .3195 | .8067 | X | Insufficiency |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.75 | .92 | ✓ | .6461 | .9851 | X | Revise writing |
| D1 The indicative mood is present in the wording of the prompt. | S | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.00 | .67 | X | .3906 | .8619 | X | Rewrite |
| D2 The subjunctive mood is present in the wording of the prompt. | S | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.00 | .67 | X | .3906 | .8619 | X | Rewrite |
| D3 The imperative mood is present in the wording of the prompt. | S | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.00 | .67 | X | .3906 | .8619 | X | Rewrite |
| D4 There are dual combinations of verb moods, i.e. at least two types in the wording of the prompt. | S | 3.25 | .75 | X | .4677 | .9111 | X | Insufficiency |
| | R | 3.25 | .75 | X | .4677 | .9111 | X | Irrelevant |
| | C | 3.00 | .67 | X | .3906 | .8619 | X | Rewrite |
| D5 The prompt uses three types of combined verb moods. | S | 3.25 | .75 | X | .4677 | .9111 | X | Insufficiency |
| | R | 3.25 | .75 | X | .4677 | .9111 | X | Irrelevant |
| | C | 3.00 | .67 | X | .3906 | .8619 | X | Rewrite |
| E1 There is a simple sentence in the wording of the prompt. | S | 3.75 | .92 | ✓ | .6461 | 1.0000 | X | Revise sufficiency |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.25 | .75 | X | .4677 | .9111 | X | Rewrite |
| E2 There is a coordinated sentence in the wording of the prompt. | S | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.25 | .75 | X | .4677 | .9111 | X | Rewrite |
| E3 There is a subordinate sentence in the wording of the prompt. | S | 3.50 | .83 | ✓ | .552 | .953 | X | Revise sufficiency |
| | R | 3.50 | .83 | ✓ | .552 | .953 | X | Revise relevance |
| | C | 3.25 | .75 | X | .4677 | .9111 | X | Rewrite |
| E4 There are dual combinations of sentence types. | S | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | R | 4.00 | 1.00 | ✓ | .7575 | 1.0000 | ✓ | |
| | C | 3.75 | .92 | ✓ | .6461 | .9851 | X | Revise writing |
| E5 Three types of combined sentences are used in the prompt. | S | 3.50 | .83 | ✓ | .552 | .953 | X | Revise sufficiency |
| | R | 3.50 | .83 | ✓ | .552 | .953 | X | Revise relevance |
| | C | 3.00 | .67 | X | .3906 | .8619 | X | Rewrite |
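
The V, Lower and Upper columns above can be reproduced with Aiken's V and a Wilson-type score confidence interval. The sketch below is illustrative only and rests on assumptions inferred from the tabled values rather than stated in this excerpt: four judges, a 1 to 4 rating scale, and a 95% confidence level. The function names and the example ratings are hypothetical, as is the decision rule (lower bound at or above Vo), which merely matches the pattern of ✓ and X marks in the table.

```python
import math
from statistics import NormalDist


def aiken_v(ratings, low=1, high=4):
    """Aiken's V: summed distance above the lowest category, scaled to [0, 1]."""
    n = len(ratings)
    k = high - low  # number of steps on the rating scale
    return sum(r - low for r in ratings) / (n * k)


def score_interval(v, n, k, confidence=0.95):
    """Wilson-type score interval for V with n judges and k scale steps."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    nk = n * k
    centre = 2 * nk * v + z * z
    spread = z * math.sqrt(4 * nk * v * (1 - v) + z * z)
    denom = 2 * (nk + z * z)
    return (centre - spread) / denom, (centre + spread) / denom


def meets_criterion(lower, v0):
    """Assumed decision rule: accept when the lower bound reaches Vo."""
    return lower >= v0


# Hypothetical ratings from four judges with mean 3.75, as in item A1, criterion S.
ratings = [4, 4, 4, 3]
v = aiken_v(ratings)
lower, upper = score_interval(v, n=len(ratings), k=3)
print(f"V = {v:.2f}, interval = ({lower:.4f}, {upper:.4f})")
print("Vo = .50:", meets_criterion(lower, 0.50), "| Vo = .70:", meets_criterion(lower, 0.70))
```

Under these assumptions a mean of 3.75 gives V ≈ .92 with bounds close to .6461 and .9851, which passes the Vo = .50 check and fails the Vo = .70 check, in line with the first row of the table.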
| Variable measured | Indicator | Scores |
|---|---|---|
| A1. The subjective judgment of the prompt written according to the instructions for the activity proposed by the evaluators (mean prompt) | The prompt written by the participant shows that he/she was able to follow the instructions given in the activity. | 1 achieved; 2 moderately achieved; 3 not achieved |
| A2. The subjective judgment of the quality of the response given by the LLM | The answer given by the AI is satisfactory according to the prompt written by the participant. | 1 achieved; 2 moderately achieved; 3 not achieved |
| B. Length of the utterances | Number of words used in the wording of the prompt posed to the AI. | Index: Number of words / Number of utterances. |
| | Number of sentences used in the wording of the prompt posed to the AI. | |

| Variable measured | Indicator | Alternatives |
|---|---|---|
| C. Use of standards in writing (form). | No orthography or punctuation errors. | 1 |
| | Orthography errors in the writing of the prompt. | 2 |
| | Punctuation errors in the writing of the prompt. | 3 |
| | Both types of errors in the writing of the prompt. | 4 |
| D. Verbal moods or attitudes of the speaker: Use of indicative, subjunctive and imperative moods. | In prompt writing, indicative moods are identified. | 1 |
| | In prompt writing, subjunctive moods are identified. | 2 |
| | In prompt writing, imperative moods are identified. | 3 |
| | In the prompt, two types of verbal moods are identified. | 4 |
| | In the prompt, the three types of verbal moods are identified. | 5 |
| E. Sentence complexity in the prompt: This determines the type(s) of sentences the participant used in writing the prompt. | In prompt writing, only simple sentences are identified. | 1 |
| | In prompt writing, only coordinate sentences are identified. | 2 |
| | In prompt writing, only subordinate sentences are identified. | 3 |
| | In prompt writing, two types of sentences are identified. | 4 |
| | In prompt writing, the three types of sentences are identified. | 5 |
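
The length index for variable B (number of words divided by number of utterances) is simple arithmetic, and a minimal sketch may make it concrete. The function name `utterance_length_index` is hypothetical, and splitting utterances on sentence-final punctuation is an assumption of this sketch; the excerpt does not say how utterances were segmented.

```python
import re


def utterance_length_index(prompt: str) -> float:
    """Number of words divided by number of utterances in a prompt."""
    # Assumption: an utterance ends at '.', '!' or '?'; empty fragments are dropped.
    utterances = [u for u in re.split(r"[.!?]+", prompt) if u.split()]
    if not utterances:
        raise ValueError("prompt contains no words")
    n_words = sum(len(u.split()) for u in utterances)
    return n_words / len(utterances)


example = "Explain photosynthesis. Give two examples please."
print(utterance_length_index(example))
```

In this example the prompt has six words in two utterances, so the index is 3.0.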
| Variable | Conditions | f | Mean (SD) of length of utterances | Range (min-max) and median of length of utterances |
|---|---|---|---|---|
| Use of standards in writing χ²(3) = 6.19, p = .103 | No orthography or punctuation errors. | 18.4% | | |
| | Orthography errors in the writing of the prompt. | 30.1% | 9.5 (3.43) | 13 (4-17) 9 |
| | Punctuation errors in the writing of the prompt. | 21.4% | 11.69 (5.25) | 21 (3-24) 11.4 |
| | Both types of errors in the writing of the prompt. | 30.1% | 11.81 (4.73) | 19.7 (5-24) 11.5 |
| Verbal moods or attitudes of the speaker χ²(4) = 39.8, p < .001 | Use of indicative mood | 62.1% | 10.07 (4.10) | 21 (3-24) 9 |
| | Use of subjunctive mood | 6.8% | 9.29 (1.70) | 5 (8-13) 9 |
| | Use of imperative mood | 1.9% | 5 (2.82) | 4 (3-7) 5 |
| | Use of two verb moods | 26.2% | 13.96 (5.19) | 27 (8-35) 13.5 |
| | Use of three verb moods | 2.9% | 16.89 (7.83) | 16 (9-24) 17 |
| Sentence complexity χ²(4) = 39.8, p < .001 | Use of simple sentence | 44.7% | 8.46 (3.58) | 21 (3-24) 8 |
| | Use of coordinated sentences | 2.9% | 10 (4.50) | 9 (5.5-14) 10 |
| | Use of subordinate clauses | 10.7% | 11.5 (2.48) | 7 (9-16) 11 |
| | Use of two types of sentences | 27.2% | 14.66 (5.70) | 29 (6-35) 13.2 |
| | Use of three types of sentences | 14.6% | 12.72 (2.91) | 9 (8-17) 12.3 |
EFA: Exploratory Factor Analysis; PCA: Principal Component Analysis.

| Factor | | KMO = .65 | 1ᵃ | Uniqueness | % of variance | Eigenvaluesᵇ |
|---|---|---|---|---|---|---|
| Length of the utterances | .615 | .65 | .776* | .397 | 63.9 | 1.91 |
| Verbal moods | .614 | .68 | .776* | .398 | 20.9 | .62 |
| Sentence complexity | .815 | .62 | .844* | .287 | 15.2 | .45 |
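
The figures in this table are internally related, and the relations can be checked by hand. The sketch below assumes that the starred values (.776, .776, .844) are loadings on a single component extracted from a correlation matrix; under that assumption, uniqueness is one minus the squared loading, the first eigenvalue is the sum of squared loadings, and the percentage of variance is the eigenvalue divided by the number of variables. This reading is inferred from the numbers, not stated in the excerpt, and the variable names are illustrative.

```python
# Loadings copied from the starred column of the table (assumed single-component loadings).
loadings = {
    "Length of the utterances": 0.776,
    "Verbal moods": 0.776,
    "Sentence complexity": 0.844,
}

# Uniqueness: share of each variable's variance not explained by the component.
uniqueness = {name: 1 - value ** 2 for name, value in loadings.items()}

# Eigenvalue of the component: sum of squared loadings.
eigenvalue = sum(value ** 2 for value in loadings.values())

# With standardized variables, total variance equals the number of variables.
pct_variance = 100 * eigenvalue / len(loadings)

for name, u in uniqueness.items():
    print(f"{name}: uniqueness = {u:.3f}")
print(f"eigenvalue = {eigenvalue:.2f}, % of variance = {pct_variance:.1f}")
```

The recomputed values agree with the tabled uniqueness, eigenvalue and percentage of variance to within rounding.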
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).