Informational content of cosine and other similarities calculated from high-dimensional Conceptual Property Norm data.

Chebyshev distance; Clustering; Conceptual properties; Cosine similarity; Euclidean distance

Journal

Cognitive processing
ISSN: 1612-4790
Abbreviated title: Cogn Process
Country: Germany
NLM ID: 101177984

Publication information

Publication date:
Nov 2020
History:
received: 22 04 2019
accepted: 01 07 2020
pubmed: 11 7 2020
medline: 25 11 2020
entrez: 11 7 2020
Status: ppublish

Abstract

To study concepts that are coded in language, researchers often collect lists of conceptual properties produced by human subjects. From these data, different measures can be computed. In particular, inter-concept similarity is an important variable used in experimental studies. Among possible similarity measures, the cosine of conceptual property frequency vectors seems to be a de facto standard. However, there is a lack of comparative studies that test the merit of different similarity measures when computed from property frequency data. The current work compares four different similarity measures (cosine, correlation, Euclidean and Chebyshev) and five different types of data structures. To that end, we compared the informational content (i.e., entropy) delivered by each of those 4 × 5 = 20 combinations, and used a clustering procedure as a concrete example of how informational content affects statistical analyses. Our results lead us to conclude that similarity measures computed from lower-dimensional data fare better than those calculated from higher-dimensional data, and suggest that researchers should be more aware of data sparseness and dimensionality, and their consequences for statistical analyses.
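As a rough illustration of the quantities named in the abstract, the sketch below computes the four measures on toy property frequency vectors and then a Shannon entropy over the resulting values. Everything here is an assumption for illustration: the concept names, the frequency counts, the helper names, and in particular the histogram-based entropy estimate, since the abstract does not say how entropy was computed. Euclidean and Chebyshev are distances; how the authors turn them into similarities is not stated here, so they are left as distances.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def correlation(u, v):
    """Pearson correlation: cosine of the mean-centered vectors.

    Undefined (division by zero) if either vector is constant.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


def euclidean(u, v):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def chebyshev(u, v):
    """Chebyshev (L-infinity) distance: largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def shannon_entropy(values, bins=4):
    """Entropy in bits of an equal-width histogram of the values.

    Assumption: a simple binning estimate, not the paper's procedure.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: no information
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


# Hypothetical data: rows are concepts, columns are listed properties,
# entries are how many subjects produced that property.
norms = {
    "dog": [5, 3, 0, 0],
    "cat": [4, 3, 0, 1],
    "car": [0, 0, 6, 2],
    "bus": [0, 1, 5, 3],
}

if __name__ == "__main__":
    pairs = list(combinations(norms, 2))
    for name, measure in [("cosine", cosine), ("correlation", correlation),
                          ("euclidean", euclidean), ("chebyshev", chebyshev)]:
        values = [measure(norms[a], norms[b]) for a, b in pairs]
        print(name, round(shannon_entropy(values), 3))
```

With real norms the vectors are long and mostly zero, which is the sparseness the abstract warns about; many concept pairs then share the same value, and a measure whose values pile into few bins would score low on this kind of entropy estimate.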

Identifiers

pubmed: 32647948
doi: 10.1007/s10339-020-00985-5
pii: 10.1007/s10339-020-00985-5

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

601-614

Grants

Agency: Fondo de Fomento al Desarrollo Científico y Tecnológico
ID: 1200139

Authors

Enrique Canessa (E)

Center for Cognition Research (CINCO), School of Psychology, Universidad Adolfo Ibáñez, Av. Presidente Errázuriz 3328, Las Condes, Santiago, Chile. ecanessa@uai.cl.
Faculty of Engineering and Science, Universidad Adolfo Ibáñez, Av. Padre Hurtado 750, Lote H, Viña del Mar, Chile. ecanessa@uai.cl.

Sergio E Chaigneau (SE)

Center for Cognition Research (CINCO), School of Psychology, Universidad Adolfo Ibáñez, Av. Presidente Errázuriz 3328, Las Condes, Santiago, Chile.
Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Av. Presidente Errázuriz 3328, Las Condes, Santiago, Chile.

Sebastián Moreno (S)

Faculty of Engineering and Science, Universidad Adolfo Ibáñez, Av. Padre Hurtado 750, Lote H, Viña del Mar, Chile.

Rodrigo Lagos (R)

Programa Magister en Bioestadística, Universidad de Chile, Santiago, Chile.
