On the fractal patterns of language structures.
Journal: PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2023
History:
received: 2022-12-24
accepted: 2023-04-26
medline: 2023-05-22
pubmed: 2023-05-18
entrez: 2023-05-18
Status: epublish
Abstract
Natural Language Processing (NLP) uses Artificial Intelligence algorithms to extract meaningful information from unstructured texts, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. Its applications range from sentiment analysis and text summarization to automatic language translation. In this work, we use NLP to identify structural linguistic patterns shared by several different languages. We apply the word2vec algorithm, which creates a vector representation of words in a multidimensional space that preserves the semantic relationships between them. From a large corpus, we built this vector representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. We then calculated the fractal dimensions of the structure that represents each language. These structures are multifractals with two different dimensions, which we use, together with each language's token-to-dictionary size ratio, to represent the languages in a three-dimensional space. Finally, analyzing the distances between languages in this space, we conclude that closeness in this space tends to be related to the distance in the phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.
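The abstract says a fractal dimension was computed for the point cloud of word vectors of each language, but this record does not state which estimator was used. The sketch below assumes one common choice, the correlation dimension (Grassberger-Procaccia): count the fraction C(r) of point pairs closer than r and take the slope of log C(r) against log r. The function names, the radii, and the synthetic test clouds are hypothetical illustrations, not the authors' code or data.

```python
import math
import random


def pairwise_distances(points):
    """All distinct pairwise Euclidean distances of a point cloud."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def correlation_dimension(points, radii):
    """Estimate the correlation dimension as the least-squares slope of
    log C(r) versus log r, where C(r) is the fraction of point pairs
    closer than r (Grassberger-Procaccia correlation sum).
    Assumption: this is one possible estimator, not necessarily the paper's."""
    dists = pairwise_distances(points)
    n_pairs = len(dists)
    xs, ys = [], []
    for r in radii:
        c = sum(1 for d in dists if d < r) / n_pairs
        if c > 0:  # log C(r) is undefined when no pair is closer than r
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two radii with C(r) > 0")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def embed(coords, dim=100):
    """Pad low-dimensional coordinates with zeros up to `dim` components,
    mimicking points that live inside a 100-dimensional vector space."""
    return list(coords) + [0.0] * (dim - len(coords))


if __name__ == "__main__":
    rng = random.Random(0)
    radii = [0.02, 0.03, 0.05, 0.07, 0.1]
    # Synthetic stand-ins for word vectors (NOT real word2vec output):
    # points filling a 2-D square and a 1-D segment inside a 100-D space.
    square = [embed((rng.random(), rng.random())) for _ in range(600)]
    segment = [embed((rng.random(),)) for _ in range(600)]
    print(round(correlation_dimension(square, radii), 2))
    print(round(correlation_dimension(segment, radii), 2))
```

On these synthetic clouds the estimate should come out close to 2 for the square and close to 1 for the segment. Fitting the slope separately over a small-radius range and a large-radius range would yield two values, which is one way to probe a structure with two scaling regimes such as the "two different dimensions" mentioned above; whether the authors proceeded this way is not stated in this record. The pairwise computation is quadratic in the number of points, so a real vocabulary would need subsampling.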
Identifiers
pubmed: 37200318
doi: 10.1371/journal.pone.0285630
pii: PONE-D-22-35269
pmc: PMC10194960
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination: e0285630
Copyright information
Copyright: © 2023 Ribeiro et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Competing interests statement
The authors have declared that no competing interests exist.