Submitted:
19 August 2024
Posted:
19 August 2024
Abstract

Keywords:
1. Introduction
- To what extent can GPT-4o accurately assess and correct coloring activities completed by elementary school students, given instructions involving multiple logical quantifiers and spatial relationships?
- What are the visual, logical, and correction capabilities of GPT-4o, and what type of prompts are most effective for obtaining accurate and detailed GPT-4o corrections when assessing coloring activities involving logical quantifiers and spatial relationships?
- Can GPT-4o be effectively utilized as a teaching assistant to provide immediate feedback for class discussions on these exercises?
2. Related Work
3. Methodology
3.1. Box and Ball Coloring Activities
- Choose a color and in each box paint at least 2 balls of that color.
- In each box choose a color and paint at least 2 balls of that color.
- In at least two boxes choose a color and paint each ball of that color.
- Paint each box only if the adjacent boxes are not painted. Choose two other colors and in each box paint every ball such that each ball has a different color with the adjacent balls inside the box and the ball in the same position but in the adjacent boxes.
- Choose a color and paint each ball with the ball’s color that is just above in the left adjacent box. If there is no ball above in the left adjacent box, paint with the ball’s color at the bottom of the left adjacent box.
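The first three instructions differ mainly in the order and scope of their quantifiers, which is easy to see once they are written as explicit checks. The following is a minimal sketch, not the procedure used in this study: it assumes a hypothetical encoding in which each box is a list with one entry per ball (a color name, or `None` for an unpainted ball), and it reads the third instruction as allowing a different color in each fully painted box.

```python
from collections import Counter
from typing import Optional

# Hypothetical encoding: one entry per ball; None means the ball is unpainted.
Box = list[Optional[str]]


def check_instruction_1(boxes: list[Box]) -> bool:
    """One shared color must appear on at least 2 balls in every box."""
    colors = {c for box in boxes for c in box if c is not None}
    return any(all(Counter(box)[c] >= 2 for box in boxes) for c in colors)


def check_instruction_2(boxes: list[Box]) -> bool:
    """Every box needs some color of its own on at least 2 balls."""
    return all(
        any(n >= 2 for c, n in Counter(box).items() if c is not None)
        for box in boxes
    )


def check_instruction_3(boxes: list[Box]) -> bool:
    """At least two boxes must have all of their balls in a single color."""
    uniform = [
        box for box in boxes
        if box and box[0] is not None and len(set(box)) == 1
    ]
    return len(uniform) >= 2


solution = [
    ["red", "red", "blue", None, None],
    ["green", "green", None, None, None],
]
print(check_instruction_1(solution), check_instruction_2(solution))
```

The sample solution satisfies the second instruction but not the first, because no single color reaches two balls in both boxes.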
3.2. Prompting Techniques
3.2.1. Zero-Shot, Few-Shot, and Chain of Thought
- Your job is to verify if the proposed solution satisfies the instruction verbatim.
- Consider that there could be many possible solutions to the instruction.
- Conclude only at the end of your answer.
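As an illustration of how these three guidelines might be combined with an activity instruction into a zero-shot text prompt, consider the sketch below. The function name `build_zero_shot_prompt` and the prompt layout are assumptions made for this example; the student's solution image would be supplied to the model separately and is not handled here.

```python
# Guidelines quoted from the list above.
GUIDELINES = [
    "Your job is to verify if the proposed solution satisfies the instruction verbatim.",
    "Consider that there could be many possible solutions to the instruction.",
    "Conclude only at the end of your answer.",
]


def build_zero_shot_prompt(instruction: str) -> str:
    """Return the activity instruction followed by the verification guidelines."""
    lines = [f"Instruction: {instruction}", ""]
    lines += [f"- {guideline}" for guideline in GUIDELINES]
    return "\n".join(lines)


prompt = build_zero_shot_prompt(
    "Choose a color and in each box paint at least 2 balls of that color."
)
print(prompt)
```

Keeping the guidelines in a list makes it straightforward to add or remove one when comparing prompt variants.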
3.2.2. Visualization of Thought
3.2.3. Logic Prompting and Self-Consistency
3.2.4. Emotional Prompting
- "Write the instruction using math quantifiers and then solve the problem", which implements the use of logic;
- "Be careful to identify colors. It is different if a color is applied for all the balls in all boxes or each box has its own color for all its balls", preventing errors when interchanging quantifier positions;
- "Visualize each logical step", which implements the visualization of the steps;
- "If the proposed solution is incorrect, don’t propose a correct solution", which prevents unnecessarily long responses; and
- "Unless the instruction says otherwise, you have to check all balls in all boxes whether or not the boxes are painted or not", which prevents errors caused by ignoring uncolored boxes.
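To make the first two directives concrete, the block below sketches what writing an instruction "using math quantifiers" could look like for the first two activities. The notation is an assumption chosen for illustration: $b_j^i$ denotes the ball in position $j$ of box $i$, and $\mathrm{color}(\cdot)$ returns the color of a ball.

```latex
% Instruction 1: one color C is fixed first and must work for every box.
\exists C \;\forall i:\quad
  \bigl|\{\, j : \mathrm{color}(b_j^i) = C \,\}\bigr| \ge 2

% Instruction 2: each box i may have its own color C_i.
\forall i \;\exists C_i:\quad
  \bigl|\{\, j : \mathrm{color}(b_j^i) = C_i \,\}\bigr| \ge 2
```

Every solution of the first formula also satisfies the second, but not the other way around, which is the kind of quantifier-order error the second directive is meant to prevent.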
3.3. Experiments and Analysis
4. Results
4.1. Performance in Correcting Coloring Activities
4.2. Common Errors
4.2.1. Logic Errors
4.2.2. Spatial Errors
4.2.3. Other
5. Discussion
5.1. Limitations
5.2. Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Tested Problems
















| Quantifiers | Explanation |
|---|---|
| 1. $\exists\,\text{color}\ \forall\,\text{box}$: | The color C is applied for all the boxes. |
| 2. $\forall\,\text{box}\ \exists\,\text{color}$: | Each box has its own color to apply. |
| 3. $\exists_{\geq 2}\,\text{boxes}$: $\exists\,\text{color}$ | In two or more boxes a color is chosen and applied for every ball. |
| 4*. $(\exists C:\ \mathrm{color}(B_i) = C \Leftarrow \nexists C:\ \mathrm{color}(B_{i-1}) = C \lor \mathrm{color}(B_{i+1}) = C) \land (\forall b_j^i \in B_i:\ \mathrm{color}(b_j^i) \neq \mathrm{color}(b_{j+1}^i) \land \mathrm{color}(b_j^i) \neq \mathrm{color}(b_j^{i+1}))$ | A box must be painted if the left and right boxes are not. Each ball has a different color than the ball below and the ball in the right box. |
| 5*. $(\forall j \neq 1:\ \mathrm{color}(b_j^i) = \mathrm{color}(b_{j-1}^{i-1})) \land (\mathrm{color}(b_1^i) = \mathrm{color}(b_5^{i-1}))$ | A ball is painted with the color of the top-left ball or the bottom ball in the left box, if the ball to be painted is the first one. |

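
The ball-level conditions explained in rows 4* and 5* can also be written as direct checks. The sketch below is an illustration only, under stated assumptions: `boxes[i][j]` is a hypothetical encoding of the color of the ball in position `j` (0 is the top) of box `i` (0 is the leftmost), every ball is painted, and all boxes hold the same number of balls. The box-painting condition of row 4* is not covered.

```python
def check_instruction_5(boxes: list[list[str]]) -> bool:
    """Each ball copies the ball one position higher in the left adjacent box.

    The top ball copies the bottom ball of the left box. The leftmost box
    has no left neighbor and is therefore unconstrained.
    """
    for i in range(1, len(boxes)):
        left, box = boxes[i - 1], boxes[i]
        for j, color in enumerate(box):
            # For j == 0 the index j - 1 is -1, which in Python selects
            # the last (bottom) ball of the left box.
            if color != left[j - 1]:
                return False
    return True


def balls_differ_from_neighbors(boxes: list[list[str]]) -> bool:
    """Each ball differs from the ball below it and from the ball in the
    same position of the right adjacent box."""
    for i, box in enumerate(boxes):
        for j, color in enumerate(box):
            if j + 1 < len(box) and color == box[j + 1]:
                return False
            if i + 1 < len(boxes) and color == boxes[i + 1][j]:
                return False
    return True


shifted = [
    ["r", "g", "b", "y", "p"],
    ["p", "r", "g", "b", "y"],
    ["y", "p", "r", "g", "b"],
]
print(check_instruction_5(shifted))
```

Checking only the ball below and the ball to the right is enough, since the comparisons with the ball above and the ball to the left are made when those neighbors are visited.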
| Instruction | Probability |
|---|---|
| 1.- Choose a color and in each box paint at least 2 balls of that color. | |
| 2.- In each box choose a color and paint at least 2 balls of that color. | |
| 3.- In at least two boxes choose a color and paint each ball of that color. | |
| 4.- Paint each box only if the adjacent boxes are not painted. Choose two other colors and in each box paint every ball such that each ball has a different color with the adjacent balls inside the box and the ball in the same position but in the adjacent boxes. | |
| 5.- Choose a color and paint each ball with the ball’s color that is just above in the left adjacent box. If there is no ball above in the left adjacent box, paint with the ball’s color at the bottom of the left adjacent box. | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).