Submitted:
30 August 2023
Posted:
31 August 2023
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Automated Scoring of Self-Explanations: The Imperative for Rich Data
2.2. Augmenting Mathematical Self-Explanations Using Large Language Models
3. Problem Setting: The Learning Task
3.1. Collecting Self-Explanations
3.2. Assessment of Self-Explanation Quality
3.3. The Text Regression Model Description
4. The Proposed Method
4.1. Overview of Pseudo-Labeling
4.2. Pseudo-Labeling Training Algorithm: Dataset Categorization, Function Definitions, and Model Learning
- (1) Dataset Categorization:
- (2) Function Definitions:
  - Train(θ, D_L): A function that takes a set of parameters, denoted by θ, and a labeled dataset D_L to yield a learned model M.
  - Predict(M, D_T): A function that accepts a model M and an unlabeled test dataset D_T, subsequently outputting a labeled test dataset D̂_T.
  - Select(D, n): A function that takes in a dataset D and a numerical value n, where n ≤ |D| (|D| refers to the total number of data points in dataset D), outputting a selected subset D_n.
- (3) Model Learning and Final Test:
- (4) Parameter Setting in Our Study:
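The four-step procedure above can be sketched in code. Everything here is an illustrative stand-in under assumed notation: the function names mirror the Train/Predict/Select definitions in (2), and the trivial mean-predictor takes the place of the study's fine-tuned BERT-based text regression model.

```python
# Minimal sketch of the pseudo-labeling loop; the "model" is a trivial
# mean predictor used only for illustration, not the paper's BERT regressor.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Example:
    text: str
    score: Optional[float] = None  # None means unlabeled


def train(params: dict, labeled: List[Example]) -> Callable[[str], float]:
    """Train(theta, D_L): fit a model on a labeled dataset."""
    mean = sum(e.score for e in labeled) / len(labeled)
    return lambda text: mean  # stand-in model: predicts the mean score


def predict(model: Callable[[str], float],
            unlabeled: List[Example]) -> List[Example]:
    """Predict(M, D_T): label a dataset with the model's outputs."""
    return [Example(e.text, model(e.text)) for e in unlabeled]


def select(dataset: List[Example], n: int) -> List[Example]:
    """Select(D, n): choose n <= |D| examples (here simply the first n)."""
    assert n <= len(dataset)
    return dataset[:n]


def pseudo_label(params: dict, labeled: List[Example],
                 unlabeled: List[Example], n: int) -> Callable[[str], float]:
    model = train(params, labeled)           # (1) fit on human-labeled data
    pseudo = predict(model, unlabeled)       # (2) pseudo-label the pool
    augmented = labeled + select(pseudo, n)  # (3) add a selected subset
    return train(params, augmented)          # (4) retrain the final model
```

In practice, `select` would typically prefer high-confidence pseudo-labels rather than an arbitrary prefix.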
4.3. Pseudo Data Preparation: LLM Usage and Mathematical Material
- Random Data Selection: We began by randomly selecting 30% of our human-labeled training dataset to capitalize on the rich diversity of student-generated self-explanations.
- Keyword Extraction: Ten keywords were extracted from each selected self-explanation, encapsulating its essence and guiding the LLM to produce contextually relevant data.
- LLM Generation: Armed with the extracted keywords, we then prompted the LLM [45]. Specifically, each set of 10 keywords was used as seed input, directing the LLM to generate contextually coherent pseudo-self-explanation data. The model was given a directive to "elaborate based on the provided keywords," ensuring the generated content maintained relevance to the original self-explanation context.
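The three preparation steps can be sketched as follows. The frequency-based keyword extractor and the `llm_generate` callable are hypothetical placeholders: the paper does not specify its extraction method here, and the actual generation uses the cited LLM.

```python
# Sketch of the pseudo-data preparation pipeline: random selection,
# keyword extraction, and prompt-driven LLM generation.
import random
from collections import Counter


def select_subset(explanations, fraction=0.3, seed=0):
    """Randomly select a fraction (30% in the paper) of the explanations."""
    rng = random.Random(seed)
    k = max(1, int(len(explanations) * fraction))
    return rng.sample(explanations, k)


def extract_keywords(text, n=10):
    """Pick the n most frequent tokens as keywords (illustrative method)."""
    tokens = [t.strip(".,") for t in text.split()]
    return [word for word, _ in Counter(tokens).most_common(n)]


def build_prompt(keywords):
    """Directive prompt, as described in Section 4.3."""
    return "Elaborate based on the provided keywords: " + ", ".join(keywords)


def generate_pseudo_data(explanations, llm_generate):
    """Run the full pipeline; llm_generate is a placeholder for the LLM call."""
    selected = select_subset(explanations)
    prompts = [build_prompt(extract_keywords(text)) for text in selected]
    return [llm_generate(prompt) for prompt in prompts]
```

A real pipeline would likely use a morphological analyzer for Japanese tokenization and TF-IDF or similar weighting for keyword selection.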
4.4. Comparative Analysis of Original and LLM-Generated Dataset
| Data type | Counts | Mean Score | Std |
|---|---|---|---|
| Original | 2,205 | 2.91 | 1.33 |
| LLM | 4,096 | 3.39 | 1.44 |
| Math | 4,096 | 3.87 | 1.77 |
| LLM Generated Texts (Original) | LLM Generated Texts (English Translated) | Predicted Self-Explanation Score |
|---|---|---|
| その後、α+β=-a-2とαβ=2aの関係から解と係数の関係が分かる。次に、問題で言及されたαとβを用いて式を展開し整理し、右辺を0にする。さらに式を工夫して代入が可能な形にする。そして、関係式α+β=-a-2とαβ=2aを式に代入して簡略化し、a^2+a-6=0となる。これを因数分解してaの解を求めると、a=-3とa=2が得られる。その後、a=2を代入すると解が重解になり、解が一つしかなくなるため問題に合わない。したがって、正しい答えはa=-3である。その後、重積算や累乗などいろいろな計算が解に出てきて混乱するだろうから丁寧に説明していく。 | After that, α+β=-a-2 and αβ=2a can be understood from the relationship between the solutions and coefficients. Next, expand and organize the expressions using the α and β mentioned in the problem, and set the right side to 0. Further manipulate the equation to get it into a form that allows for substitution. Then, substitute the relationship α+β=-a-2 and αβ=2a into the equation and simplify it, resulting in a^2+a-6=0. Factorize this to find the solutions for a, which are a=-3 and a=2. Afterward, when a=2 is substituted, the answer becomes a repeated root, which means there is only one solution, making it inconsistent with the problem. Therefore, the correct answer is a=-3. Afterward, various calculations such as product of roots and exponentiation may appear in the solution, and it may be confusing, so I will explain it carefully. | 5.00 |
| 代入してみたら、平方根の中が計算できた。αとβを求めた。αに-2を代入した。解を吟味して、答えは-3だった。その後、a=-4.7x^5+7として計算したが誤差があった。そこで右側の5を左側の4に変形すると正しい値が求められた(ただし小数点以下を切り捨て)。この問題は高校数学の範囲外であるが、中学・高校で習った知識で解くことができたので紹介したい。 | Substituted it into the formula for solutions. Calculated the value inside the square root. Found α and β. Substituted α=-2. Examined the solution, and the answer was -3. Afterward, calculated it as a=-4.7x^5+7, but there was an error. So I rearranged the 5 on the right side to 4, and the correct value was obtained (with the decimal part truncated). This problem is beyond the scope of high school mathematics, but it could be solved using knowledge learned in middle and high school, so I want to introduce it. | 4.08 |
| グラフを書いた傾きがー4のy=x^2の接線を求めた。その後、接点を原点とし両辺に点a,bを代入すると「sin-cos」と出たので、「COS(1)」で求めた。これで式の意味が分かったと思う。-5は角度が変化しないからゼロでないといけないのだ! | Used the slope obtained from graphing to find the tangent of y=x^2 with slope -4. Took the point of tangency as the origin and substituted points a and b on both sides. 'Sin-cos' was obtained, so I calculated it using 'COS(1)'. I think I understood the meaning of the equation now. -5 does not change the angle, so it must be nonzero! | 3.02 |
| x=3を代入する。方程式を解き進める。解の吟味を書く。間違いに気づき、修正する。最終的な答えを書く。その後、再帰的に解く。 | Substituted x=3. Proceeded to solve the equation. Wrote the examination of the solutions. Noticed the mistake and corrected it. Wrote the final answer. Afterward, solve it recursively. | 2.18 |
| 前のは間違えたため、全部消した。その後、通分してみた。 | Since the previous one was incorrect, I deleted everything. Afterward, I tried reducing to a common denominator. | 1.23 |
| Math Texts | Predicted Self-Explanation Score |
|---|---|
| Angle bisector and ratio, using Ceva's theorem: Revised version Succeed Math A problem 349 △, let △ have the angle bisector of ∠ and the point where it intersects the side , and the point that divides the side in the ratio : . When the line intersects at point , find the length of the side . | 5.00 |
| Using Menelaus's theorem: Segment ratio and area ratio, Revised version Succeed Math A problem 350 △, let be the point where it divides the side in the ratio : , and the point where the segment is divided in the ratio : , and the point where the extension of the segment intersects the side . Find the following segment ratios and area ratios: : : △ : △ : : : | 4.93 |
| Using the relationship between sides and angles: Range of values for side length in a triangle, Revised version Succeed Math A problem 355, determine the range of values for so that a triangle with the following side lengths exists: , , , . | 3.84 |
| Using the relationship between the sizes of three sides: Proving inequalities related to segment lengths, Revised version Succeed Math A important example 66, take point inside △, and join , , and . Prove that . Abbreviated. | 3.13 |
| Examining the sizes of the three angles of a triangle, Revised version Succeed Math A important example 64, examine the sizes of the three interior angles of △ . | 2.66 |
5. Experiments and Evaluations
5.1. Exploring the Influence of Self-Explanation Augmentation on Model Efficiency
5.2. Evaluating Optimal Quantity of Pseudo-Self-Explanation Data


6. Discussion
6.1. Detailed Analysis of Results (RQ1)
6.2. Findings and Observations (RQ2)
6.3. Limitations and Future Research
- Subject Scope: Our dataset is restricted to mathematics, potentially constraining the generalizability of our findings to other subjects.
- Dependency on LLM: Our methodology hinges on the LLM's ability to generate pseudo-self-explanation data. This dependence may introduce noise and errors into our system.
- Data Quality and Representativeness: The performance of our approach is contingent on the quality and representativeness of labeled data. Poor or biased data could compromise model efficacy.
- Model Performance Variability: We identified noticeable disparities in our model's performance across various mathematical categories. For instance, it predicted 'Property of a Circle' (0.242) more accurately than 'Quadratic Functions' (0.419) within the validation datasets. These results indicate that self-explanation augmentation's effectiveness may be influenced by the inherent complexity of a topic and the linguistic nuances present within the self-explanations.
- Evaluation Dataset Categories and Size: The evaluation dataset for some categories is comparatively small, which poses challenges in drawing definitive conclusions. It's essential to consider the ease of inference as it pertains to various mathematical concepts, including linear functions, shapes, equations, and square roots. Certain subjects may be inherently more challenging for machine training due to their linguistic or conceptual intricacies.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Rittle-Johnson, B., Loehr, A.M., & Durkin, K. (2017). Promoting self-explanation to improve mathematics learning: A meta-analysis and instructional design principles. ZDM, 49, 599-611. [CrossRef]
- Rittle-Johnson, B. (2017). Developing Mathematics Knowledge. Child Development Perspectives, 11, 184-190. [CrossRef]
- Renkl, A. (2017). Learning from worked-examples in mathematics: students relate procedures to principles. ZDM, 49, 571-584. [CrossRef]
- Chi, M.T., Leeuw, N.D., Chiu, M., & LaVancher, C. (1994). Eliciting Self-Explanations Improves Understanding. Cogn. Sci., 18, 439-477. [CrossRef]
- Rittle-Johnson, B. (2006). Promoting transfer: effects of self-explanation and direct instruction. Child development, 77 1, 1-15. [CrossRef]
- Conati, C., VanLehn, K. (2000). Toward Computer-Based Support of Meta-Cognitive Skills: a Computational Framework to Coach Self-Explanation.
- Bisra, K., Liu, Q., Nesbit, J.C., Salimi, F., & Winne, P.H. (2018). Inducing Self-Explanation: a Meta-Analysis. Educational Psychology Review, 30, 703-725. [CrossRef]
- Crippen, K.J., Earl, B.L. (2007). The impact of web-based worked examples and self-explanation on performance, problem solving, and self-efficacy. Comput. Educ., 49, 809-821. [CrossRef]
- Nakamoto, R., Flanagan, B., Takami, K., Dai, Y., & Ogata, H. (2021). Identifying Students' Stuck Points Using Self-Explanations and Pen Stroke Data in a Mathematics Quiz. ICCE 2021, 22-26 November 2021.
- Nakamoto, R., Flanagan, B., Dai, Y., Takami, K., & Ogata, H. (2024). Unsupervised techniques for generating a standard sample self-explanation answer with knowledge components in a math quiz. Research and Practice in Technology Enhanced Learning, 19, 016. [CrossRef]
- Berthold, K., Eysink, T.H., & Renkl, A. (2009). Assisting self-explanation prompts are more effective than open prompts when learning with multiple representations. Instructional Science, 37, 345-363. [CrossRef]
- Berthold, K., Renkl, A. (2009). Instructional Aids to Support a Conceptual Understanding of Multiple Representations. Journal of Educational Psychology, 101, 70-87. [CrossRef]
- McEldoon, K. L., Durkin, K. L., & Rittle-Johnson, B. (2013). Is self-explanation worth the time? A comparison to additional practice. British Journal of Educational Psychology, 83(4), 615-632. [CrossRef]
- Panaite, M., Dascalu, M., Johnson, A.M., Balyan, R., Dai, J., McNamara, D.S., & Trausan-Matu, S. (2018). Bring It on! Challenges Encountered While Building a Comprehensive Tutoring System Using ReaderBench. International Conference on Artificial Intelligence in Education. [CrossRef]
- Hodds, M., Alcock, L., & Inglis, M. (2014). Self-explanation training improves proof comprehension. Journal for Research in Mathematics Education, 45, 62-101. [CrossRef]
- CyberAgent. (2023). Open-Calm-7B [Software]. Hugging Face. https://huggingface.co/cyberagent/open-calm-7b.
- Andonian, A., Anthony, Q., Biderman, S., Black, S., Gali, P., Gao, L., Hallahan, E., Levy-Kramer, J., Leahy, C., Nestler, L., Parker, K., Pieler, M., Purohit, S., Songz, T., Wang, P., & Weinbach, S. (2021). GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch (Version 0.0.1) [Computer software]. [CrossRef]
- Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T.J., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language Models are Few-Shot Learners. ArXiv, abs/2005.14165.
- McNamara, D.S., Levinstein, I.B., & Boonthum, C. (2004). iSTART: Interactive strategy training for active reading and thinking. Behavior Research Methods, Instruments, & Computers, 36, 222-233. [CrossRef]
- Funayama, H., Asazuma, Y., Matsubayashi, Y., Mizumoto, T., & Inui, K. (2023). Reducing the Cost: Cross-Prompt Pre-finetuning for Short Answer Scoring. International Conference on Artificial Intelligence in Education. [CrossRef]
- Crossley, S.A., Kim, M., Allen, L.K., & McNamara, D.S. (2019). Automated Summarization Evaluation (ASE) Using Natural Language Processing Tools. International Conference on Artificial Intelligence in Education. [CrossRef]
- Özsoy, M.G., Alpaslan, F.N., & Çiçekli, I. (2011). Text summarization using Latent Semantic Analysis. Journal of Information Science, 37, 405 - 417. [CrossRef]
- León, J.A., Olmos, R., Escudero, I., Cañas, J.J., & Salmerón, L. (2006). Assessing short summaries with human judgments procedure and latent semantic analysis in narrative and expository texts. Behavior Research Methods, 38, 616-627. [CrossRef]
- Panaite, M., Ruseti, S., Dascalu, M., Balyan, R., McNamara, D.S., & Trausan-Matu, S. (2019). Automated Scoring of Self-explanations Using Recurrent Neural Networks. European Conference on Technology Enhanced Learning. [CrossRef]
- Cascante-Bonilla, P., Tan, F., Qi, Y., & Ordonez, V. (2020). Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning. AAAI Conference on Artificial Intelligence. [CrossRef]
- Rubin, D.B. (1993). Statistical disclosure limitation. Journal of Official Statistics, 9(2), 461-468.
- Antulov-Fantulin, N., Bosnjak, M., Zlatic, V., Grcar, M., & Šmuc, T. (2012). Synthetic Sequence Generator for Recommender Systems - Memory Biased Random Walk on a Sequence Multilayer Network. IFIP Working Conference on Database Semantics. [CrossRef]
- Jelic, B., Grbić, R., Vranješ, M., & Mijić, D. (2021). Can we replace real-world with synthetic data in deep learning-based ADAS algorithm development? IEEE Consumer Electronics Magazine, 1-1.
- Chen, R.J., Lu, M.Y., Chen, T.Y., Williamson, D.F., & Mahmood, F. (2021). Synthetic data in machine learning for medicine and healthcare. Nature Biomedical Engineering, 5, 493 - 497. [CrossRef]
- El Emam, K. (2020). Seven Ways to Evaluate the Utility of Synthetic Data. IEEE Security & Privacy, 18, 56-59. [CrossRef]
- Ping, H., Stoyanovich, J., & Howe, B. (2017). DataSynthesizer: Privacy-Preserving Synthetic Datasets. Proceedings of the 29th International Conference on Scientific and Statistical Database Management.
- Dahmen, J., & Cook, D.J. (2019). SynSys: A Synthetic Data Generation System for Healthcare Applications. Sensors (Basel, Switzerland), 19.
- Berg, A., Mol, S.T., Kismihók, G., & Sclater, N. (2016). The Role of a Reference Synthetic Data Generator within the Field of Learning Analytics. J. Learn. Anal., 3. [CrossRef]
- Peña-Ayala, A. (2018). Learning analytics: A glance of evolution, status, and trends according to a proposed taxonomy. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8.
- Flanagan, B., Majumdar, R., & Ogata, H. (2022). Fine Grain Synthetic Educational Data: Challenges and Limitations of Collaborative Learning Analytics. IEEE Access, PP, 1-1. [CrossRef]
- Dai, H., Liu, Z., Liao, W., Huang, X., Cao, Y., Wu, Z., Zhao, L., Xu, S., Liu, W., Liu, N., Li, S., Zhu, D., Cai, H., Sun, L., Li, Q., Shen, D., Liu, T., & Li, X. (2023). AugGPT: Leveraging ChatGPT for Text Data Augmentation. arXiv preprint . arXiv:2302.13007.
- Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., & Cobbe, K. (2023). Let's Verify Step by Step. ArXiv, abs/2305.20050.
- Flanagan, B., & Ogata, H. (2018). Learning analytics platform in higher education in Japan. Knowledge Management & E-Learning: An International Journal.
- Thompson, D.R., & Senk, S.L. (1998). Using rubrics in high school mathematics courses. Mathematics Teacher: Learning and Teaching PK–12, 91, 786-793.
- Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20, 37 - 46. [CrossRef]
- Wang, T., Inoue, N., Ouchi, H., Mizumoto, T., & Inui, K. (2019). Inject Rubrics into Short Answer Grading System. Conference on Empirical Methods in Natural Language Processing.
- Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. NIPS.
- Suzuki, M. (2019). Pretrained Japanese BERT models, GitHub repository, https://github.com/cl-tohoku/bert-japanese.
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys, 55, 1 - 35. [CrossRef]
- Chai, T., & Draxler, R.R. (2014). Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geoscientific Model Development, 7, 1247-1250. [CrossRef]
- Hodson, T.O. (2022). Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not. Geoscientific Model Development. [CrossRef]





| Number | Rubric | Sample Answer of Self-explanations |
|---|---|---|
| Step 1 | Be able to find the equation of a linear function from two points. | Substituting the y-coordinate of p into the equation of the line AC. |
| Step 2 | Be able to find the equation of the line that bisects the area of a triangle. | Find the area of triangle ABC, then find the area of triangle OPC. |
| Step 3 | Be able to represent a point on a straight line using letters (P-coordinates). | With the line OC as the base, find the y-coordinate of p, which is the height. P’s coordinate is (t, -1/2t+4). |
| Step 4 | Be able to represent a point on a straight line using letters (Q-coordinate). | Since the coordinates of P are (3,5/2), the line OP is y=⅚x, and the coordinates of Q are (t,5/6). |
| Graded Score | Description |
|---|---|
| 1 (Unacceptable) | Self-explanations are filled in for only a minimal number of the steps required for the solution, and there were problematic expressions in the students' self-explanations (e.g., mistaken patterns, boredom). |
| 2 (Poor) | Self-explanations are mainly provided for the steps required for the solution, but they read more like bullet points than explanations. |
| 3 (Fair) | Self-explanations are mainly provided for the steps required for the answer; this is the average self-explanation level among all respondents. |
| 4 (Very Good) | Self-explanations are provided for most of the steps required for the answer, but there is room for improvement as an explanation (logic, expressions). |
| 5 (Excellent) | Self-explanations are mainly provided for the steps required for the answer, and the explanation is logical and well-written. |
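Agreement between human raters on a rubric scale like the 1-5 scale above is commonly quantified with Cohen's kappa, which the paper cites [113]. A minimal, self-contained computation is sketched below; this is an illustrative implementation of the standard formula, not the authors' exact evaluation procedure.

```python
# Cohen's kappa for two raters over nominal (or discretized) labels:
# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
# p_e is the agreement expected by chance from each rater's label marginals.
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label frequencies, summed.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k]
              for k in set(counts_a) | set(counts_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```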
| Data Type | Num of Quizzes | Variations of Math Units | Total Answers | Sentence Length Mean (Characters) | Sentence Length SD | Quality Score Mean | Quality Score SD |
|---|---|---|---|---|---|---|---|
| Train | 40 | 8 | 1,420 | 67.8 | 56.8 | 2.94 | 1.34 |
| Valid | 37 | 8 | 355 | 67.3 | 59.3 | 2.92 | 1.31 |
| Test | 8 | 3 | 431 | 63.7 | 53.2 | 2.81 | 1.25 |
| Dataset | base_line | LLM | math | mixed | only_LLM_math |
|---|---|---|---|---|---|
| Original (N=1,420) | 〇 | 〇 | 〇 | 〇 | |
| LLM-generated (N=4,096) | | 〇 | | 〇 | 〇 |
| Math texts (N=4,096) | | | 〇 | 〇 | 〇 |
| Total Number of Data | 1,420 | 5,516 | 5,516 | 9,612 | 8,192 |
| Data Type | base_line | LLM | math | mixed | only_LLM_math |
|---|---|---|---|---|---|
| Test | 0.749 | 0.699 | 0.646 | 0.692 | 1.135 |
| Val | 0.602 | 0.341 | 0.358 | 0.336 | 1.033 |
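The scores in these evaluation tables can be computed with a standard root-mean-square error routine (assuming RMSE is the reported metric, consistent with the RMSE/MAE references cited in this paper [119,120]):

```python
# Root-mean-square error between gold self-explanation scores and
# model predictions; lower values indicate more accurate predictions.
import math


def rmse(y_true, y_pred):
    assert len(y_true) == len(y_pred) and y_true
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))
```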
| Dataset | +128 | +256 | +512 | +1024 | +2048 | +4096 |
|---|---|---|---|---|---|---|
| base_line | 0.75 | | | | | |
| LLM | 0.67 | 0.63 | 0.72 | 0.72 | 0.71 | 0.70 |
| math | 0.64 | 0.66 | 0.67 | 0.64 | 0.65 | 0.65 |
| mixed | 0.68 | 0.66 | 0.71 | 0.68 | 0.73 | 0.69 |
| only_LLM_math | 1.19 | 0.96 | 1.02 | 0.89 | 1.15 | 1.14 |
| Dataset | +128 | +256 | +512 | +1024 | +2048 | +4096 |
|---|---|---|---|---|---|---|
| base_line | 0.60 | | | | | |
| LLM | 0.57 | 0.35 | 0.51 | 0.49 | 0.40 | 0.34 |
| math | 0.40 | 0.50 | 0.43 | 0.35 | 0.40 | 0.36 |
| mixed | 0.59 | 0.32 | 0.52 | 0.44 | 0.40 | 0.34 |
| only_LLM_math | 1.19 | 0.90 | 0.96 | 0.81 | 1.02 | 1.03 |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).