Submitted: 06 December 2024
Posted: 09 December 2024
Abstract
Keywords:
1. Introduction
- preprocessing (simplification, lemmatisation),
- using the Jaccard index to measure similarity,
- MultiLayer Perceptron (MLP) classifier.
2. Related Work
3. Methods
3.1. Exact and Approximate Text Matching Scoring Algorithm [1]
- Preprocess the students' answers: remove all punctuation, convert all to lowercase, and split them into a sequence of words.
- Apply lemmatisation to words from step 1 to convert to the basic form.
- Search in the dictionary for acronyms (words in all upper case) to convert them to their lemma.
- If the word is not found, the Levenshtein distance will be applied to find its nearest word.
- If a number exists inside the student's answer, the exact matching method will be applied.
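The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `LEMMAS` dictionary is a hypothetical toy stand-in for a full Polish dictionary, acronyms are simply stored in it in lowercase, and scaling the Jaccard index to a 0-5 score is an assumption inferred from the tabulated scores.

```python
import string

# Hypothetical toy lemma dictionary (inflected form -> base form).
# A real system would use a full Polish dictionary instead.
LEMMAS = {"kodow": "kod", "kod": "kod", "tablice": "tablica", "ascii": "ascii"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def preprocess(answer: str) -> list[str]:
    """Remove ASCII punctuation, convert to lowercase, split into words."""
    cleaned = answer.translate(str.maketrans("", "", string.punctuation))
    return cleaned.lower().split()


def lemmatise(word: str) -> str:
    """Return the base form of a word; numbers are kept for exact matching."""
    if word.isdigit():
        return word
    if word in LEMMAS:
        return LEMMAS[word]
    # Word not found: fall back to the nearest dictionary entry.
    nearest = min(LEMMAS, key=lambda known: levenshtein(word, known))
    return LEMMAS[nearest]


def jaccard_score(teacher: str, student: str, max_score: float = 5.0) -> float:
    """Jaccard index of the two lemma sets, scaled to 0..max_score (assumed)."""
    t = {lemmatise(w) for w in preprocess(teacher)}
    s = {lemmatise(w) for w in preprocess(student)}
    if not t and not s:
        return 0.0
    return max_score * len(t & s) / len(t | s)


score = jaccard_score("kod ASCII", "tablice kodow ASCII")
```

With the toy dictionary, the teacher answer "kod ASCII" and the student answer "tablice kodow ASCII" share two of three distinct lemmas, so the sketch yields 5 × 2/3.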

3.2. Revised Split Algorithm [3]
- Length of Data Sets
  - If the teacher's answer is longer than the student's answer, the Jaccard measure is calculated as in Section 3.1.
  - Otherwise, we expect the teacher's answer to be included in the student's answer. We use the split algorithm and then count the matching words to calculate the score.
    - If all the words from the teacher's answer are found inside the student's answer, the score is five.
    - If only some of the teacher's answer words are found inside the student's answer, the percentage of found words is calculated relative to all words in the teacher's answer. The score is five for 100% of the found words, and the score is 0 for less than 70% of the found words.
- Levenshtein Distance
- Calculate the Score
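The length-based branching can be sketched as below. The text fixes only the two end points (five for 100% of the teacher's words found, zero below 70%); the proportional score used between 70% and 100% is an assumption of this sketch, as are the function names. The Levenshtein step is not modelled here, and both inputs are assumed to be already preprocessed word lists.

```python
def jaccard(teacher_words: list[str], student_words: list[str]) -> float:
    """Plain Jaccard index of two word lists treated as sets."""
    t, s = set(teacher_words), set(student_words)
    return len(t & s) / len(t | s) if (t | s) else 0.0


def revised_split_score(
    teacher_words: list[str],
    student_words: list[str],
    max_score: float = 5.0,
    threshold: float = 0.7,
) -> float:
    """Score a student answer against one teacher answer."""
    # Teacher's answer longer than the student's: use the Jaccard measure.
    if len(teacher_words) > len(student_words):
        return max_score * jaccard(teacher_words, student_words)
    if not teacher_words:
        return 0.0
    # Otherwise the teacher's answer is expected inside the student's answer:
    # count how many teacher words occur among the student's words.
    student_set = set(student_words)
    found = sum(1 for word in teacher_words if word in student_set)
    fraction = found / len(teacher_words)
    if fraction < threshold:
        return 0.0
    # ASSUMPTION: proportional score between the 70% threshold and 100%.
    return max_score * fraction
```

For example, `revised_split_score(["plyta", "cd"], ["plyta", "cd", "rom"])` finds every teacher word in the student's answer and returns the full score.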

3.3. ANN-Based Method [2]
- The input set is filtered: all punctuation and special characters are removed, numbers are separated, each sentence is split into words, and all extra spaces are removed.
- Using the dictionary [47], an integer value is assigned to each word. The integer alone denotes a word in its base form; the same integer with a fractional extension from 0.1 to 0.5 denotes the word in a different form. If there is no exact match for a word in the dictionary, the search is repeated using the Levenshtein distance, assuming that this distance cannot exceed 1, as shown in Listing 3.
- After step 2, we obtain a vector of numbers, but its length varies. The structure of the input layer of the network requires that the dimension of the vector be N. If the sequence of numbers is longer, we take the first N numbers; if it is shorter, we pad it with zeros.
- The sequence of numbers obtained in step 3 can be very long, which would slow down the ANN training. The input is therefore rescaled. A widely used method is Min-Max normalisation, which maps the input set to the range from 0 to 1:

  $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

  where $x$ is an input value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the input set.
- The training set was divided into two parts: 80% for learning and 20% for testing. The ANN performance was evaluated using two accuracy measures: Sum Square Error (SSE) and Mean Square Error (MSE). SSE was applied during the learning process, and MSE was used to choose the best ANN structure for the testing data set.
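The numeric part of this pipeline (steps 3 to 5) can be sketched in plain Python. The sketch starts from word codes that have already been assigned, and it makes several assumptions that the text does not state: the minimum and maximum are taken over the whole input set rather than per input position, the 80/20 split is a simple cut without shuffling, and all function names are illustrative.

```python
def fix_length(codes: list[float], n: int) -> list[float]:
    """Keep the first n numbers, or pad with zeros up to length n."""
    return codes[:n] + [0.0] * max(0, n - len(codes))


def min_max_normalise(dataset: list[list[float]]) -> list[list[float]]:
    """Rescale every value to [0, 1] using the global minimum and maximum."""
    lo = min(v for row in dataset for v in row)
    hi = max(v for row in dataset for v in row)
    if hi == lo:  # degenerate case: all values are equal
        return [[0.0 for _ in row] for row in dataset]
    return [[(v - lo) / (hi - lo) for v in row] for row in dataset]


def split_80_20(samples: list) -> tuple[list, list]:
    """First 80% of the samples for learning, the remaining 20% for testing."""
    cut = (len(samples) * 4) // 5
    return samples[:cut], samples[cut:]


def sse(targets: list[float], outputs: list[float]) -> float:
    """Sum Square Error between expected and produced outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


def mse(targets: list[float], outputs: list[float]) -> float:
    """Mean Square Error: SSE divided by the number of outputs."""
    return sse(targets, outputs) / len(targets)


# Two encoded answers of different length, brought to N = 4 and rescaled.
vectors = [fix_length([3.0, 7.1], 4), fix_length([1.0, 2.0, 3.0, 4.0, 5.0], 4)]
scaled = min_max_normalise(vectors)
```

Padding with zeros before normalisation means that the global minimum is zero whenever at least one vector is padded, so padded positions stay at zero after rescaling.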

3.4. ANN-Based Method with Lemmatisation
3.5. Additional Reference Answers
4. Hybrid ANN-Based Method
5. Results
6. Discussion
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| TUL | Lodz University of Technology |
| ANN | Artificial Neural Network |
| MLP | MultiLayer Perceptron |
| STS | Short Text Similarity |
| GAS | Grader Assistance System |
| ASAG | Automatic Short Answer Grading |
| ML | Machine Learning |
| NB | Naive Bayes |
| DT | Decision Trees |
| DBN | Deep Belief Networks |
| NLP | Natural Language Processing |
| LSTM | Long Short-Term Memory |
| ATS | Automatic Text Scoring |
| RN | Relation Network |
| SFRN | Semantic Feature-wise transformation Relation Network |
| QRA | Question (Q), Reference answer (R) and student Answer (A) |
| BERT | Bidirectional Encoder Representations from Transformers |
| EFL | English as a Foreign Language |
| LLM | Large Language Models |
| HRM-SDT | Hierarchical Rater Model based on Signal Detection Theory |
| AES | Automated Essay Scoring |
| AI | Artificial Intelligence |
| GAI | Generative Artificial Intelligence |
| QAC | Question Answer Community |
| J | Jaccard index |
| SSE | Sum Square Error |
| MSE | Mean Square Error |
References
- Jackowska-Strumiłło, L.; Bieniecki, W.; Saad, M.B. A web system for assessment of students' knowledge. In Proceedings of the 8th International Conference on Human System Interaction (HSI 2015), Warszawa, Poland, 25-27 June 2015; p. 20. [Google Scholar] [CrossRef]
- Saad, M.B.; Jackowska-Strumiłło, L.; Bieniecki, W. ANN Based Evaluation of Student's Answers in E-tests. In Proceedings of the 11th International Conference on Human System Interaction (HSI 2018), Gdansk, Poland, 4-6 July 2018; pp. 155–161. [Google Scholar] [CrossRef]
- Saad, M.B.; Jackowska-Strumiłło, L.; Bieniecki, W. Algorithms for Automatic Open Questions Scoring in E-Learning Systems. In Proceedings of the International Interdisciplinary PhD Workshop 2017 (IIPhDW 2017), Lodz, Poland, 9-11 September 2017; pp. 176–182. [Google Scholar]
- Duch, P.; Jaworski, T. Dante - Automated Assessments Tool for Students' Programming Assignments. In Proceedings of the 11th International Conference on Human System Interaction (HSI 2018), Gdansk, Poland, 4-6 July 2018; pp. 162–168. [Google Scholar]
- Stoliński, S.; Bieniecki, W.; Stasiak-Bieniecka, M. Computer aided assessment of linear and quadratic function graphs using least-squares fitting. In Proceedings of the 2014 Federated Conference on Computer Science and Information Systems (FedCSIS 2014); pp. 651–658. [CrossRef]
- Jackowska-Strumiłło, L.; Nowakowski, J.; Strumiłło, P.; Tomczak, P. Interactive question based learning methodology and clickers: Fundamentals of computer science course case study. In Proceedings of the 6th International Conference on Human System Interactions (HSI 2013), Sopot, Poland, 6-8 June 2013; pp. 439–442. [Google Scholar]
- Dzikovska, M.O.; Nielsen, R.D.; Leacock, C. The Joint Student Response Analysis and Recognising Textual Entailment Challenge: Making Sense of Student Responses in Educational Applications. Language Resources and Evaluation 2016, 50, no–1. [Google Scholar] [CrossRef]
- McDonald, J.; Bird, R.J.; Zouaq, A.; Moskal, A.C.M. Short answers to deep questions: supporting teachers in large-class settings. Journal of Computer Assisted Learning, 4.
- Burrows, S.; Gurevych, I.; Stein, B. The Eras and Trends of Automatic Short Answer Grading. International Journal of Artificial Intelligence in Education 2015, 25, no–1. [Google Scholar] [CrossRef]
- Pulman, S.; Sukkarieh, J. Automatic short answer marking. In Proceedings of the second workshop on Building Educational Applications Using NLP, Ann Arbor, Michigan. Association for Computational Linguistics; 2005; p. 9. [Google Scholar]
- McClelland, J.L.; St. John, M.; Taraban, R. Sentence comprehension: A parallel distributed processing approach. Language and cognitive processes 1989, 4, no–3. [Google Scholar] [CrossRef]
- Elsayed, E.; Eldahshan, K.; Tawfeek, S. Automatic evaluation technique for certain types of open questions in semantic learning systems. Hum. Cent. Comput. Inf. Sci. 2013, 3, no–1. [Google Scholar] [CrossRef]
- Leacock, C.; Chodorow, M. C-rater: Automated scoring of short-answer questions. Computers and the Humanities 2003, 37, no–4. [Google Scholar] [CrossRef]
- Singthongchai, J.; Niwattanakul, S. A method for measuring keywords similarity by applying Jaccard's, n-gram and vector space. Lecture Notes on Information Theory 2013, 1, no–4. [Google Scholar] [CrossRef]
- Aghahoseini, P. Short Text Similarity: A Survey. 2019. Available online: https://www.researchgate.net/publication/337632914_Short_Text_Similarity_A_Survey (accessed on 22 November 2024).
- Pado, U.; Kiefer, C. Short answer grading: When sorting helps and when it doesn't. In Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning (NODALIDA 2015), Vilnius, Lithuania, 11 May 2015; p. 42.
- Magooda, A.; Zahran, M.A.; Rashwan, M.; Raafat, H.; Fayek, M.B. Vector based techniques for short answer grading. In Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference (FLAIRS), Florida, USA, May 2016; pp. 238–243.
- Zhang, Y.; Shah, R.; Chi, M. Deep Learning+ Student Modeling+ Clustering: A Recipe for Effective Automatic Short Answer Grading. In Proceedings of the 9th International Conference on Educational Data Mining (EDM), Raleigh, NC, USA, 29 June-2 July 2016; International Educational Data Mining Society; pp. 562–567.
- Sakaguchi, K.; Heilman, M.; Madnani, N. Effective feature integration for automated short answer scoring. In Proceedings of the 2015 conference of the North American Chapter of the Association for Computational Linguistics: Human language Technologies, Denver, Colorado, USA, 31 May-5 June 2015; pp. 1049–1054. [CrossRef]
- Jivani, A.G. A Comparative Study of Stemming Algorithms. International Journal of Computer Technology and Applications, 2016, 2, no–3. [Google Scholar]
- Heilman, M.; Madnani, N. The impact of training data on automated short answer scoring performance. In Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, Denver, Colorado, USA, 4 June 2015; p. 81.
- Kumar, S.; Chakrabarti, S.; Roy, S. Earth Mover's Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia 19-25 August 2017; pp. 2046–2052. [Google Scholar]
- Alikaniotis, D.; Yannakoudakis, H.; Rei, M. Automatic text scoring using neural networks. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, August 2016; pp. 715–725. [Google Scholar] [CrossRef]
- Nowak, J.; Taspinar, A.; Scherer, R. LSTM recurrent neural networks for short text and sentiment classification. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing ICAISC 2017. Lecture Notes in Computer Science, 2017, vol 10246. Springer, Cham. pp. 553–562. [CrossRef]
- Jing, R. A self-attention based LSTM network for text classification. Journal of Physics: Conference Series, 2019, vol. 1207. [CrossRef]
- Le, T. An attention-based deep learning method for text sentiment analysis. In Proceedings of the 2020 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 16-18 December 2020; pp. 282–286. [Google Scholar] [CrossRef]
- Li, Z.; Tomar, Y.; Passonneau, R.J. A semantic feature-wise transformation relation network for automatic short answer grading. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 7–11 November 2021; Association for Computational Linguistics; pp. 6030–6040.
- Oraif, I. Natural Language Processing (NLP) and EFL Learning: A Case Study Based on Deep Learning. Journal of Language Teaching and Research 2024, 15, no–1. [Google Scholar] [CrossRef]
- van Genugten, R.D.; Schacter, D.L. Automated scoring of the autobiographical interview with natural language processing. Behavior Research Methods 2024, 56, no–3. [Google Scholar] [CrossRef] [PubMed]
- Rujun, G.; Merzdorf, H.E.; Anwar, S.; Hipwell, M.C.; Srinivasa, A. Automatic assessment of text-based responses in post-secondary education: A systematic review. Computers and Education: Artificial Intelligence 2024, 6, 100206. [Google Scholar] [CrossRef]
- Liu, Q.; Hu, A.; Daniel, B. Online assessment in higher education: a mapping review and narrative synthesis. Research and Practice in Technology Enhanced Learning 2025, 20, 7. [Google Scholar] [CrossRef]
- Fink, A.; Gombert, S.; Liu, T.; Drachsler, H.; Frey, A. A hierarchical rater model approach for integrating automated essay scoring models. Zeitschrift für Psychologie, 2024, 232, 209–218. [Google Scholar] [CrossRef]
- Chang, L.-H.; Ginter, F. Automatic Short Answer Grading for Finnish with ChatGPT. In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-24), Vancouver, Canada, 20-27 February 2024; vol. 38, no. 21, pp. 23173–23181. [CrossRef]
- Haller, S.; Aldea, A.; Seifert, C.; Strisciuglio, N. Survey on automated short answer grading with deep learning: from word embeddings to transformers. arXiv:2204.03503 (2022).
- Funayama, H.; Sato, T.; Matsubayashi, Y.; Mizumoto, T.; Suzuki, J.; Inui, K. Balancing cost and quality: an exploration of human-in-the-loop frameworks for automated short answer scoring. In Proceedings of the International Conference on Artificial Intelligence in Education (AIED 2022), Lecture Notes in Computer Science (LNCS, vol. 13355), Cham: Springer International Publishing; 2022; pp. 465–476. [Google Scholar]
- Yang, K.; Raković, M.; Li, Y.; Guan, Q.; Gašević, D.; Chen, G. Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability. In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-24), Vancouver, Canada, 20-27 February 2024; vol. 38, no. 21, pp. 22466–22474. [CrossRef]
- Chang, L.-H.; Rastas, I.; Pyysalo, S.; Ginter, F. Deep learning for sentence clustering in essay grading support. In Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021); pp. 614–618.
- Lee, G.-G.; Latif, E.; Wu, X.; Liu, N.; Zhai, X. Applying large language models and chain-of-thought for automatic scoring. Computers and Education: Artificial Intelligence, 2024, 6, 100213. [Google Scholar] [CrossRef]
- Uto, M.; Itsuki, A.; Tsutsumi, E.; Ueno, M. Integration of prediction scores from various automated essay scoring models using item response theory. IEEE Transactions on Learning Technologies 2023, 16, no–6. [Google Scholar] [CrossRef]
- Shakeel, M.H.; Faizullah, S.; Alghamidi, T.; Khan, I. Language independent sentiment analysis. In Proceedings of the IEEE 2019 International Conference on Advances in the Emerging Computing Technologies (AECT), Al Madinah Al Munawwarah, Saudi Arabia; 2020; p. 1. [Google Scholar] [CrossRef]
- Lintean, M.; Rus, V. Measuring Semantic Similarity in Short Texts through Greedy Pairing and Word Semantics. In Proceedings of the Twenty-Fifth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2012), Marco Island, Florida, USA; 2012; pp. 244–249. [Google Scholar]
- Nakamura, T.; Shirakawa, M.; Hara, T.; Nishio, S. Semantic similarity measurements for multi-lingual short texts using Wikipedia. In Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Warsaw, Poland, vol. 2, 11-14 August 2014; p. 22. [Google Scholar] [CrossRef]
- Le, T. A hybrid method for text-based sentiment analysis. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI) Las Vegas, NV, USA; 2019; pp. 1392–1397. [Google Scholar] [CrossRef]
- Elalfy, D.; Gad, W.; Ismail, R. A hybrid model to predict best answers in question answering communities. Egypt. Inform. J., 2018, 19, 21–31. [Google Scholar] [CrossRef]
- Paluch, M.; Jackowska-Strumiłło, L. Hybrid Models Combining Technical and Fractal Analysis with ANN for Short-Term Prediction of Close Values on the Warsaw Stock Exchange. Applied Sciences, (MDPI), 2018, 8. [Google Scholar] [CrossRef]
- Jackowska-Strumiłło, L.; Cyniak, D.; Czekalski, J.; Jackowski, T. Neural model of the spinning process dedicated to predicting properties of cotton-polyester blended yarns on the basis of the characteristics of feeding streams. Fibres Text. East. Eur. 2008, 16, 28–36. [Google Scholar]
- Słownik języka polskiego (Polish language dictionary). Available online: https://sjp.pl/slownik/odmiany/.
- Fletcher, S.; Islam, M.Z. Comparing sets of patterns with the Jaccard index. Australasian Journal of Information Systems 2018, 22. [Google Scholar] [CrossRef]
- Graupe, D. Principles of artificial neural networks., 3rd ed.; Advanced Series in Circuits and Systems, 7; World Scientific: Singapore, 2013. [Google Scholar]


| Teacher answers | Student #1 answer | J | Student #2 answer | J |
|---|---|---|---|---|
| Question 1: Do zapisu znaków alfanumerycznych w komputerach PC stosuje się (Alphanumeric characters in PCs are stored using) | | | | |
| kod ASCII | tablice kodow ASCII | 3.33 | ASCII | 2.5 |
| ASCII | | 1.66 | | 5 |
| znaki ASCII | | 1.66 | | 2.5 |
| ASCI | | 0 | | 0 |
| Question 2: Dyski twarde zaliczamy do pamięci (Hard disks are classified as memory of the following type) | | | | |
| magnetycznych | elektryczny magnetyczny | 2.5 | trwałych | 0 |
| trwałych | | 0 | | 5 |
| zewnętrznych | | 0 | | 0 |
| masowych | | 0 | | 0 |
| Question 3: Pojęcie "hardware" określa: (The term "hardware" denotes:) | | | | |
| wszystkie elementy materialne komputera | Sprzęt który składa się na komputer podzespoły komputera | 0.5 | sprzęt | 0 |
| elementy materialne komputera | | 0.55 | | 0 |
| sprzęt | | 0.71 | | 5 |
| sprzęt komputera | | 1.42 | | 2.5 |
| Question 4: Budowa i zasada działania płyty DVD jest najbardziej zbliżona do budowy (The structure and operating principle of a DVD most closely resemble those of) | | | | |
| Płyty CD | płyty CD ROM | 3.33 | CD | 2.5 |
| CD | | 1.66 | | 5 |
| Dysku CD | | 1.25 | | 2.5 |
| Krążka CD | | 1.25 | | 2.5 |

| Question | Teacher 1 | Teacher 2 | Teacher 3 | Teacher 4 | New teacher |
|---|---|---|---|---|---|
| Dyski twarde zaliczamy do pamięci | magnetycznych | trwałych | zewnętrznych | masowych | nieulotnych |
| Budowa i zasada działania płyty DVD jest najbardziej zbliżona do budowy: | płyty CD | CD | dysku CD | krążka CD | płyty która ma pity i landy |

| Type | Expert | Split Algorithm: As Correct | Split Algorithm: As Incorrect | New Split Algorithm: As Correct | New Split Algorithm: As Incorrect |
|---|---|---|---|---|---|
| Correct | 1288 | True Positive: 749 (58.16%) | False Negative: 539 (41.84%) | True Positive: 958 (74.3%) | False Negative: 330 (25.6%) |
| Incorrect | 1766 | False Positive: 1 (0.1%) | True Negative: 1765 (99.9%) | False Positive: 0 (0%) | True Negative: 1766 (100%) |

| Type | Expert | ANN Algorithm: As Correct | ANN Algorithm: As Incorrect | New ANN Algorithm: As Correct | New ANN Algorithm: As Incorrect |
|---|---|---|---|---|---|
| Correct | 1288 | True Positive: 550 (42.71%) | False Negative: 738 (57.29%) | True Positive: 1111 (86.26%) | False Negative: 177 (13.74%) |
| Incorrect | 1766 | False Positive: 293 (16.59%) | True Negative: 1473 (83.41%) | False Positive: 329 (18.6%) | True Negative: 1437 (81.3%) |

| Type | Expert | New Split Algorithm: As Correct | New Split Algorithm: As Incorrect | New ANN Algorithm: As Correct | New ANN Algorithm: As Incorrect | Hybrid Method: As Correct | Hybrid Method: As Incorrect |
|---|---|---|---|---|---|---|---|
| Correct | 1288 | True Positive: 958 (74.3%) | False Negative: 330 (25.6%) | True Positive: 1111 (86.26%) | False Negative: 177 (13.74%) | True Positive: 1250 (97%) | False Negative: 38 (3%) |
| Incorrect | 1766 | False Positive: 0 (0%) | True Negative: 1766 (100%) | False Positive: 329 (18.6%) | True Negative: 1437 (81.3%) | False Positive: 0 (0%) | True Negative: 1766 (100%) |

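The percentages in these tables are row-wise shares of the expert's counts (1288 correct and 1766 incorrect answers). As a small arithmetic check, the sketch below recomputes them from the raw counts of the hybrid method; the function name and the extra overall accuracy figure are illustrative additions, not values reported in the tables.

```python
def confusion_rates(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Row-wise rates of a 2x2 confusion matrix, as fractions in [0, 1]."""
    return {
        # Share of expert-correct answers that the method marked as correct.
        "true_positive_rate": tp / (tp + fn),
        # Share of expert-incorrect answers that the method marked as incorrect.
        "true_negative_rate": tn / (tn + fp),
        # Share of all answers on which method and expert agree (not in the tables).
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts taken from the hybrid method columns of the table.
hybrid = confusion_rates(tp=1250, fn=38, fp=0, tn=1766)
for name, value in hybrid.items():
    print(f"{name}: {value:.2%}")
```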
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).