Submitted:
05 December 2023
Posted:
06 December 2023
Abstract
Keywords:
1. Introduction
2. Probing Test and Study Design
2.1. Elaborating on Dependency Trees
2.2. Exploring Parallel Universal Dependencies
2.3. Insights into the Tree Distinguishing Polynomial
2.4. Generalizing Polynomial for Dependency Trees
2.5. Quantifying Polynomial Distance in Dependency Trees
2.6. Experimental Design
- The ENG dataset is the largest, comprising 750 sentences originally written in English. Each sentence is accompanied by 20 distinct dependency trees, one for each of the 20 languages included in our study, for a total of 15,000 dependency trees.
- The GER dataset comprises 100 sentences originally written in German, each with 20 dependency trees, for a total of 2,000 trees.
- The FRE, ITA, and SPA datasets each contain 50 sentences originally written in French, Italian, and Spanish, respectively. As in the GER dataset, each sentence is accompanied by 20 dependency trees, for a total of 1,000 dependency trees per dataset.
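
The tree counts quoted above follow directly from multiplying the number of original sentences by the 20 languages in the study. The short sketch below reproduces that arithmetic; the names `ORIGINAL_SENTENCES`, `TREES_PER_SENTENCE`, and `tree_counts` are illustrative only and do not come from the study's own code.

```python
# Number of original sentences per dataset, as stated in the list above.
ORIGINAL_SENTENCES = {"ENG": 750, "GER": 100, "FRE": 50, "ITA": 50, "SPA": 50}

# Each original sentence comes with one dependency tree per language in the study.
TREES_PER_SENTENCE = 20


def tree_counts(sentences=ORIGINAL_SENTENCES, per_sentence=TREES_PER_SENTENCE):
    """Return the number of dependency trees in each dataset."""
    return {name: n * per_sentence for name, n in sentences.items()}


counts = tree_counts()
total = sum(counts.values())

for name, n_trees in counts.items():
    print(f"{name}: {n_trees} trees")
print(f"All datasets combined: {total} trees")
```

Summing the five per-dataset counts gives the overall number of dependency trees across the datasets; that combined figure is derived here and is not itself quoted in the text.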
2.6.1. Dependency Tree Analysis Across Languages
2.6.2. Visualizations and Syntactic Typology Study
2.6.3. Corpus Analysis and Syntax Diversity Measurement
2.7. Syntax Comparison of Sentences
2.7.1. Detailed Syntax Analysis in Sentence Pairs
2.8. Syntactic Similarity of Languages
2.8.1. In-Depth Analysis of Language Distance Matrices
2.8.2. Comparative Study of Language Families
2.8.3. Global Syntactic Landscape
3. Discussions and Conclusions
| Index | Dependency Arc | Index | Dependency Arc |
|---|---|---|---|
| 1 | Adjectival clause modifier | 20 | Fixed multiword expression |
| 2 | Adverbial clause modifier | 21 | Flat multiword expression |
| 3 | Adverbial modifier | 22 | Goes with |
| 4 | Adjectival modifier | 23 | Indirect object |
| 5 | Appositional modifier | 24 | List |
| 6 | Auxiliary | 25 | Marker |
| 7 | Case marking | 26 | Nominal modifier |
| 8 | Coordinating conjunction | 27 | Nominal subject |
| 9 | Clausal complement | 28 | Numeric modifier |
| 10 | Classifier | 29 | Object |
| 11 | Compound | 30 | Oblique nominal |
| 12 | Conjunct | 31 | Orphan |
| 13 | Copula | 32 | Parataxis |
| 14 | Clausal subject | 33 | Punctuation |
| 15 | Unspecified dependency | 34 | Overridden disfluency |
| 16 | Determiner | 35 | Root |
| 17 | Discourse element | 36 | Vocative |
| 18 | Dislocated elements | 37 | Open clausal complement |
| 19 | Expletive | | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).