Submitted: 07 October 2023
Posted: 10 October 2023
Abstract
Keywords:
1. Introduction
- We propose a text parser and a diagram parser to extract a formal language set from the problem description. For text parsing, a rule-based parser converts the problem text into text logic forms and identifies the diagram elements mentioned in the text through regular expressions. For diagram parsing, we propose an enhanced RetinaNet for detecting diagram symbols. The improved model raises the accuracy of both diagram symbol detection and diagram formal language set extraction, fits faster, and has fewer parameters. Two auxiliary tasks are also designed to improve the parsing procedure.
- We use the extracted relationship set for theorem prediction. The formal language graph is constructed from the diagram framework in the relationship set. A graph convolutional network (GCN) encodes the structural information in the relationship set, and the node embeddings of the formal language set are used to predict theorems.
- We achieve explainable problem solving with the predicted theorems and the formal language set. The solver reasons step by step, applying one theorem at a time. In the experiments, our geometric solver achieves competitive accuracy and outperforms the compared baselines on several problem categories, and we also formally show the interpretable solution process.
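
To make the second contribution concrete, the following is a minimal sketch of how a graph convolution could turn a formal language graph into theorem scores. It uses the standard renormalised propagation rule of Kipf and Welling; the mean pooling, the sigmoid output head, the function names, and all toy numbers are assumptions for illustration and are not taken from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), with a self-loop added to every node."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[a_hat[i][j] * d_inv_sqrt[i] * d_inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, features, weight):
    """One graph-convolution layer with ReLU: relu(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, features), weight)
    return [[max(v, 0.0) for v in row] for row in out]


def theorem_scores(adj, features, w_gcn, w_out):
    """Score candidate theorems from mean-pooled node embeddings (assumed head)."""
    node_emb = gcn_layer(normalize_adjacency(adj), features, w_gcn)
    pooled = [sum(col) / len(node_emb) for col in zip(*node_emb)]
    logits = matmul([pooled], w_out)[0]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Toy graph: three formal-language nodes connected in a path 0 - 1 - 2.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up node features
w_gcn = [[0.5, -0.5], [0.25, 0.75]]               # untrained, illustrative weights
w_out = [[1.0, -1.0, 0.0], [0.5, 0.5, 0.0]]       # three hypothetical theorems
scores = theorem_scores(adj, features, w_gcn, w_out)
```

In a trained model the weights would be learned from annotated theorem sequences; here they only demonstrate the data flow from graph structure to per-theorem scores.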
2. Related Work
2.1. Methods for Math Problem Solving
2.2. Graph Neural Network
2.3. Methods for Diagram Processing
3. Methodology
3.1. Text Parser
| Algorithm 1: Parse text logic forms |
|---|
| Input: extracted text features stored in text_logic_forms_annot (TA) |
| Output: encoded text formal language logic forms (TF) |
| 1: Assign the text logic forms to text_logic_forms |
| 2: for each text logic form: |
| 2.1: if debug_mode() is true, output the text logic form |
| 2.2: if the form contains "Find" and is the last one, use the parsed result as the target of the solution |
| 2.3: otherwise, parse the form into res and traverse it by DFS |
| 3: return TF |
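
The control flow of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: the function names, the `Find(...)` string convention, and the single-capital-letter regular expression for point names are assumptions.

```python
import re


def split_text_logic_forms(text_logic_forms, debug=False):
    """Separate the conditions from the final 'Find(...)' target."""
    conditions, target = [], None
    last = len(text_logic_forms) - 1
    for i, form in enumerate(text_logic_forms):
        form = form.strip()
        if debug:
            print(form)
        # Only a 'Find' form in the last position is treated as the target.
        if i == last and form.startswith("Find(") and form.endswith(")"):
            target = form[len("Find("):-1]
        else:
            conditions.append(form)
    return conditions, target


def points_in(form):
    """Assumed convention: points are single capital letters, e.g. A, B."""
    return sorted(set(re.findall(r"\b[A-Z]\b", form)))


forms = ["Equals(LengthOf(Line(A, B)), 5)", "Find(LengthOf(Line(B, C)))"]
conditions, target = split_text_logic_forms(forms)
```

With the two forms above, `conditions` holds the equality and `target` holds the expression to be solved for; each condition would then be parsed into a tree and traversed by DFS as in step 2.3.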
3.2. Diagram Parser
| Algorithm 2: Parse diagram logic forms |
|---|
| Input: extracted image features stored in diagram_logic_forms_annot (DA) |
| Output: encoded diagram formal language logic forms (DF) |
| 1: Set up the logic parser LP according to the debug_mode of the data storage address |
| 2: if diagram_parser() then |
| 3: Define the points in the diagram parser DP |
| 3.1: using lambda(), define the variable ch as the points in the diagram |
| 3.2: traverse the points in DP; if debug_mode is true, output each point |
| 4: Define the lines in DP |
| 4.1: for each line segment Ls in the diagram, strip leading and trailing spaces |
| 4.2: if Ls has length 2 and both characters are letters, define the line by its two endpoint letters |
| 5: Define the circles in DP |
| 5.1: for each point in the circle list, take the point as the definition of a circle |
| 6: Assign the diagram logic forms to logic_forms |
| 7: Sort the forms in DF, placing the perpendicular conditions last |
| 8: for each logic form: |
| 8.1: if the logic form is present: |
| 8.2: if debug_mode() is true, output this logic form |
| 8.3: try: parse the form into a tree and traverse the parse tree by DFS |
| 8.4: except: output the error |
| 9: return DF |
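
Steps 4, 7 and 8 of Algorithm 2 can be illustrated with a small recursive-descent parser. The sketch below is a hypothetical reconstruction under stated assumptions: logic forms are strings of the shape `Pred(arg, ...)`, trees are nested lists, and all function names are invented for this example.

```python
def is_line_name(text):
    """Step 4: a line is named by exactly two letters, e.g. 'AB'."""
    name = text.strip()
    return len(name) == 2 and name.isalpha()


def parse_logic_form(text):
    """Parse 'Pred(arg, ...)' into [name, child, ...]; atoms stay strings."""
    pos = 0

    def parse_term():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        name = text[start:pos].strip()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = []
            while True:
                children.append(parse_term())
                while pos < len(text) and text[pos].isspace():
                    pos += 1
                if pos >= len(text) or text[pos] not in ",)":
                    raise ValueError("malformed logic form: %r" % text)
                pos += 1  # consume "," or ")"
                if text[pos - 1] == ")":
                    return [name] + children
        return name

    tree = parse_term()
    if text[pos:].strip():
        raise ValueError("trailing characters in: %r" % text)
    return tree


def dfs_leaves(tree):
    """Depth-first traversal that collects the atomic arguments."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree[1:]:
        leaves.extend(dfs_leaves(child))
    return leaves


def parse_diagram_logic_forms(forms):
    """Steps 7-8: move Perpendicular forms last, then parse each form."""
    ordered = sorted(forms, key=lambda f: f.strip().startswith("Perpendicular"))
    trees, errors = [], []
    for form in ordered:
        try:
            trees.append(parse_logic_form(form))
        except ValueError:
            errors.append(form)  # stands in for "output error"
    return trees, errors


trees, errors = parse_diagram_logic_forms([
    "Perpendicular(Line(A, B), Line(C, D))",
    "PointLiesOnLine(E, Line(A, B))",
    "Line(A, B",
])
```

The sort is stable, so non-perpendicular forms keep their original order; the malformed third form is collected in `errors` rather than aborting the whole parse.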
3.3. GCN-based Theorem Predictor
| Algorithm 3: Set up, initialize and run the logic solver |
|---|
| Input: diagram_logic_forms (DF) and text_logic_forms (TF) |
| Output: target of problem solving (T) |
| 1: Initialization: define the theorems used in the solution process |
| 2: Search initialization: |
| 2.1: remove unused points |
| 2.2: initialize the line segments Ls |
| 2.3: find all triangles and quadrilaterals |
| 2.4: solve the relationships between angles, line segments, and arcs |
| 3: Solution algorithm: |
| 3.1: if round_or_step is false, try to get the answer before using theorems |
| 3.2: otherwise, check order_lst |
| 4: return target T |
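
As a rough illustration of step 3, the sketch below applies theorems in a given order until the target becomes known and records each step so the solution stays interpretable. The fact dictionary, the two toy theorems, and the `solve` signature are assumptions made for this example; only the name `order_lst` comes from Algorithm 3.

```python
def isosceles_theorem(facts):
    """If AB = AC in triangle ABC, the base angles B and C are equal."""
    if facts.get("AB_equals_AC"):
        if "angle_B" in facts and "angle_C" not in facts:
            return {"angle_C": facts["angle_B"]}
        if "angle_C" in facts and "angle_B" not in facts:
            return {"angle_B": facts["angle_C"]}
    return {}


def angle_sum_theorem(facts):
    """If two angles of triangle ABC are known, the third is 180 minus their sum."""
    names = ["angle_A", "angle_B", "angle_C"]
    known = [n for n in names if n in facts]
    if len(known) == 2:
        missing = next(n for n in names if n not in facts)
        return {missing: 180 - sum(facts[n] for n in known)}
    return {}


THEOREMS = {"isosceles": isosceles_theorem, "angle_sum": angle_sum_theorem}


def solve(facts, target, order_lst, max_rounds=5):
    """Apply theorems in the given order until the target is derived."""
    facts = dict(facts)
    steps = []
    if target in facts:  # answer available before using any theorem
        return facts[target], steps
    for _ in range(max_rounds):
        progressed = False
        for name in order_lst:
            derived = THEOREMS[name](facts)
            if derived:
                facts.update(derived)
                steps.append((name, derived))
                progressed = True
            if target in facts:
                return facts[target], steps
        if not progressed:
            break  # no theorem fired, so further rounds cannot help
    return None, steps


answer, steps = solve(
    {"AB_equals_AC": True, "angle_B": 50},
    target="angle_A",
    order_lst=["isosceles", "angle_sum"],
)
```

The recorded `steps` list plays the role of the interpretable solution trace: each entry names the theorem that fired and the facts it produced.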
3.4. Solving for the Interpretability Geometry Problem
4. Experiment
4.1. Experimental Setup and Implementation Details
4.1.1. Experimental Setup
4.1.2. Implementation Details
4.2. Experimental Results
4.3. Interpretable Geometry Solution
4.4. Typical Case Analyzing


5. Conclusion
Acknowledgments

| Geometry | Unary geometric attribute | Other geometric attributes | Binary geometric relationship | Numeric properties |
|---|---|---|---|---|
| Point | Isosceles | AreaOf | PointLiesOnLine | HalfOf |
| Line | Equilateral | PerimeterOf | PointLiesOnCircle | RatioOf |
| Angle | Regular | RadiusOf | Parallel | SumOf |
| Triangle | | DiameterOf | Perpendicular | AverageOf |
| Quadrilateral | | CircumferenceOf | IsMidpointOf | Add |
| Polygon | | AltitudeOf | IsRadiusOf | Mul |
| Pentagon | | HypotenuseOf | IsChordOf | Sub |
| Circle | | MeasureOf | | Div |
| Arc | | LengthOf | | Pow |
| | | | | Equals |
| | | | | Find |

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Name | Text | Perpendicular | Bar | Parallel | Angle | Double bar |
| Number | 12365 | 1165 | 529 | 397 | 196 | 138 |

| Method | SSD | RetinaNet | Ours |
|---|---|---|---|
| mAP (%) | 76.46 | 79.56 | 83.83 |

| Method | All | Angle | Length | Area | Ratio | Line | Triangle | Quad | Circle | Other |
|---|---|---|---|---|---|---|---|---|---|---|
| RelNet‡ | 29.6 | 26.2 | 34.0 | 20.8 | 41.7 | 29.6 | 33.7 | 25.2 | 28.0 | 25.9 |
| FiLM-BART‡ | 33.0 | 32.1 | 33.0 | 35.8 | 50.0 | 34.6 | 32.6 | 37.1 | 30.1 | 37.0 |
| Inter-GPS‡ | 57.5 | 59.1 | 61.7 | 30.2 | 50.0 | 59.3 | 66.0 | 52.4 | 45.5 | 48.1 |
| Ours | 56.1 | 61.2 | 57.0 | 32.1 | 58.3 | 63.0 | 62.2 | 48.3 | 47.6 | 44.4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).