Submitted:
25 June 2024
Posted:
26 June 2024
Abstract
Keywords:
1. Introduction
- A customized BERT model architecture is proposed, adding an encoder/decoder module and a highway network module. The proposed model is designed to be integrated into an Italian-language chatbot that supports Public Administration (PA) staff by reducing the time and errors involved in processing and understanding documents.
- The proposed model is trained and tested on SQuAD-IT, the Italian version of the SQuAD dataset, to evaluate its ability to process the Italian language.
- The results of the experiments conducted on the SQuAD-IT dataset show that the proposed model has a good ability to provide exactly the expected answer. Moreover, a comparative analysis shows that the proposed model outperforms other NLP models, such as BIDAF.
2. Related Works
3. Materials and Methods
- The first module is a BERT architecture that encodes the input into a vector representation, which is then processed by the subsequent structures. BERT can be described as a multilayer bidirectional transformer encoder. BERT's pre-training is based on two unsupervised tasks: the Masked Language Model (MLM) and Next Sentence Prediction (NSP). In MLM, a portion of the words in the text is masked and the model must predict them; in NSP, the model must determine whether two sentences appear consecutively in the original text. This pre-training makes BERT adaptable (via fine-tuning) to different tasks, such as QA. BERT takes as input the question and the context combined into a single embedded sequence. The input embeddings are the sum of the token embeddings and the segment embeddings: the token embeddings encode each token as an embedding vector, while the segment embeddings indicate the segment to which each token belongs, distinguishing the question from the context in the input text. Let $X = \{x_1, \ldots, x_n\}$ be the input sequence, and let $e_i$ be the embedding obtained by combining the token and segment embeddings of each $x_i$. The sequence of embeddings $E = \{e_1, \ldots, e_n\}$ is the input of the BERT module, which processes it through its transformer layers to produce the output sequence $H = \{h_1, \ldots, h_n\}$, where $h_i$ is the hidden representation of $x_i$ at the last layer (a tokenization sketch is given after this list).
- The encoder/decoder module consists of two sequential BiLSTM layers. This module better captures the context and the temporal ordering of words, improving the model's overall understanding. Specifically, the BERT output $H$ is fed into the encoder/decoder module to produce a new sequence of hidden representations $H' = \{h'_1, \ldots, h'_n\}$.
- The highway network module filters out irrelevant information before the final dense layers. A highway transformation computes a linear combination of a non-linear transformation of the input and the original input, weighted by a gate function. The output of the highway network module is defined in Equation (1), where $\odot$ denotes element-wise multiplication:

  $$y = T(x) \odot G(x) + (1 - T(x)) \odot x \tag{1}$$

  The gate function is defined in Equation (2), where $W_T$ and $b_T$ are the weights and bias of the gate, respectively, and $\sigma$ is the sigmoid function:

  $$T(x) = \sigma(W_T x + b_T) \tag{2}$$

  The non-linear transformation is defined in Equation (3), where $W_G$ and $b_G$ are its weights and bias, respectively, and $f$ is the activation function:

  $$G(x) = f(W_G x + b_G) \tag{3}$$
- The output module consists of two fully connected layers with a softmax activation function. It predicts the start and end positions of the answer within the context following Equations (4) and (5), where $W_s$ and $b_s$ are the weights and bias of the fully connected layer that predicts the start token, and $W_e$ and $b_e$ are those of the layer that predicts the end token (an end-to-end sketch of the architecture follows this list):

  $$p_{start} = \mathrm{softmax}(W_s H' + b_s) \tag{4}$$

  $$p_{end} = \mathrm{softmax}(W_e H' + b_e) \tag{5}$$
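The following is a minimal sketch of how a question/context pair is packed into the single embedded sequence described above, using the Hugging Face `transformers` tokenizer. The Italian checkpoint name, the maximum length of 384, and the example strings are illustrative assumptions, not details taken from the paper:

```python
from transformers import AutoTokenizer

# Assumed Italian BERT checkpoint; the paper does not name one.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-cased")

question = "Chi ha scritto la Divina Commedia?"
context = "La Divina Commedia è un poema di Dante Alighieri, scritto tra il 1304 e il 1321."

enc = tokenizer(
    question,
    context,
    max_length=384,            # assumed maximum sequence length
    truncation="only_second",  # truncate only the context, never the question
    padding="max_length",
)

# input_ids index the token embeddings; token_type_ids are the segment ids
# (0 = question, 1 = context); attention_mask marks real tokens vs. padding.
print(enc["input_ids"][:20])
print(enc["token_type_ids"][:20])
```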
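Below is a minimal end-to-end sketch of the four modules under Equations (1)-(5). It assumes a TensorFlow 2 / Keras implementation (suggested by the loss and optimizer names in the parameter table, but not stated in the paper), a ReLU activation for $f$ in Equation (3), and the same assumed Italian checkpoint as above:

```python
import tensorflow as tf
from transformers import TFAutoModel

MAX_LEN = 384                 # assumed sequence length (not stated in the paper)
ENC_DIM, DEC_DIM = 128, 64    # encoding/decoding dimensions from the parameter table

def highway(x):
    # Highway block, Equations (1)-(3): y = T(x) * G(x) + (1 - T(x)) * x.
    dim = x.shape[-1]
    t = tf.keras.layers.Dense(dim, activation="sigmoid")(x)  # gate T(x), Eq. (2)
    g = tf.keras.layers.Dense(dim, activation="relu")(x)     # G(x), Eq. (3); ReLU assumed
    return t * g + (1.0 - t) * x                             # Eq. (1)

# Assumed checkpoint; add from_pt=True if only PyTorch weights are available.
bert = TFAutoModel.from_pretrained("dbmdz/bert-base-italian-cased")

ids  = tf.keras.Input((MAX_LEN,), dtype=tf.int32, name="input_ids")
segs = tf.keras.Input((MAX_LEN,), dtype=tf.int32, name="token_type_ids")
mask = tf.keras.Input((MAX_LEN,), dtype=tf.int32, name="attention_mask")

# 1) BERT module: contextual hidden states H = {h_1, ..., h_n}.
h = bert(input_ids=ids, token_type_ids=segs, attention_mask=mask)[0]

# 2) Encoder/decoder module: two sequential BiLSTM layers producing H'.
h = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(ENC_DIM, return_sequences=True))(h)
h = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(DEC_DIM, return_sequences=True))(h)

# 3) Highway network module: gated filtering before the output layers.
h = highway(h)

# 4) Output module: start/end distributions over token positions, Eqs. (4)-(5).
start = tf.keras.layers.Softmax(name="start")(
    tf.keras.layers.Flatten()(tf.keras.layers.Dense(1)(h)))
end = tf.keras.layers.Softmax(name="end")(
    tf.keras.layers.Flatten()(tf.keras.layers.Dense(1)(h)))

model = tf.keras.Model([ids, segs, mask], [start, end])
```

Each `Dense(1)` head scores every token position; flattening and applying the softmax over the sequence axis yields the per-position probability distributions of Equations (4) and (5).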
4. Dataset
5. Experimental Evaluation
6. Conclusion
Author Contributions
Funding
Conflicts of Interest
References
- M. Pislaru et al., "Citizen-Centric Governance: Enhancing Citizen Engagement through Artificial Intelligence Tools," Sustainability (Switzerland), vol. 16, no. 7, 2024. [CrossRef]
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint, 2018.
- A. Severyn and A. Moschitti, "Modeling relational information in question-answer pairs with convolutional neural networks," arXiv preprint, 2016.
- Z. Wang, H. Mi, and A. Ittycheriah, "Sentence similarity learning by lexical decomposition and composition," arXiv preprint, 2016.
- R. Sequiera, G. Baruah, Z. Tu, S. Mohammed, J. Rao, H. Zhang, and J. Lin, "Exploring the effectiveness of convolutional neural networks for answer selection in end-to-end question answering," arXiv preprint, 2017.
- Z. Wang, W. Hamza, and R. Florian, "Bilateral multi-perspective matching for natural language sentences," arXiv preprint, 2017.
- Y. Tay, M. C. Phan, L. A. Tuan, and S. C. Hui, "Learning to rank question answer pairs with holographic dual LSTM architecture," in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 695-704, 2017.
- M. M. Mishu et al., "Convolutional recurrent neural network for question answering," in 3rd International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 2017.
- W. Wang, N. Yang, F. Wei, B. Chang, and M. Zhou, "Gated Self-Matching Networks for Reading Comprehension and Question Answering," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017.
- T. Shao, Y. Guo, H. Chen, and Z. Hao, "Transformer-Based Neural Network for Answer Selection in Question Answering," IEEE Access, vol. 7, pp. 26146-26156, 2019. [CrossRef]
- M. Kamyab, G. Liu, and M. Adjeisah, "Attention-Based CNN and Bi-LSTM Model Based on TF-IDF and GloVe Word Embedding for Sentiment Analysis," Applied Sciences (Switzerland), vol. 11, no. 23, 2021. [CrossRef]
- M. Seo, A. Kembhavi, A. Farhadi, and H. Hajishirzi, "Bidirectional attention flow for machine comprehension," arXiv preprint, 2016.
- D. Croce, A. Zelenanska, and R. Basili, "Neural Learning for Question Answering in Italian," in Proceedings of the 17th Conference of the Italian Association for Artificial Intelligence (AI*IA 2018), 2018.
- P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, "SQuAD: 100,000+ Questions for Machine Comprehension of Text," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 2016.

**Table 1.** Training parameters of the proposed model.

| Parameter | Value |
|---|---|
| Encoding dimension | 128 |
| Decoding dimension | 64 |
| Loss | Sparse Categorical Cross-Entropy |
| Optimizer | Adam |
| Batch size | 8 |
| Learning rate | 5e-5 |
| Number of epochs | 6 |
| Dropout | None |

**Table 2.** Performance of the proposed model on SQuAD-IT.

| Metric | Score (%) |
|---|---|
| F1-score | 59.4087 |
| Exact Match (EM) | 46.2371 |

**Table 3.** Training parameters of the BIDAF model.

| Parameter | Value |
|---|---|
| Loss | Sparse Categorical Cross-Entropy |
| Optimizer | Adam |
| Batch size | 10 |
| Learning rate | 5e-4 |
| Number of epochs | 10 |
| Dropout | 0.2 |

**Table 4.** Comparison between the proposed model and BIDAF on SQuAD-IT (scores in %).

| Model | F1-score | EM |
|---|---|---|
| Proposed | 59.4087 | 46.2371 |
| BIDAF | 49.3504 | 38.4313 |
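Read together with Table 1, training the proposed model would look roughly as follows; this is a sketch only, where `model` is the network outlined in Section 3 and `train_ds` is an assumed `tf.data.Dataset` yielding model inputs paired with (start position, end position) labels:

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),  # Adam, lr = 5e-5 (Table 1)
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),    # labels are token indices
)
model.fit(train_ds.batch(8), epochs=6)  # batch size 8, 6 epochs, no dropout (Table 1)
```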
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).