Cartolabe: A Web-Based Scalable Visualization of Large Document Collections.


Journal

IEEE computer graphics and applications
ISSN: 1558-1756
Abbreviated title: IEEE Comput Graph Appl
Country: United States
NLM ID: 9881869

Publication information

Publication date:
History:
pubmed: 24 10 2020
medline: 24 10 2020
entrez: 23 10 2020
Status: ppublish

Abstract

We describe Cartolabe, a web-based multiscale system for visualizing and exploring large textual corpora by topic, and introduce a novel mechanism for the progressive visualization of filtering queries. Initially designed to represent and navigate scientific publications across disciplines, Cartolabe has evolved into a generic framework that accommodates various corpora, ranging from Wikipedia (4.5M entries) to the French National Debate (4.3M entries). Cartolabe consists of two modules. The first relies on natural language processing methods to convert a corpus and its entities (documents, authors, and concepts) into high-dimensional vectors, compute their projection onto the two-dimensional plane, and extract meaningful labels for regions of the plane. The second module is a web-based visualization that displays tiles computed from the multidimensional projection of the corpus, which is obtained with the UMAP projection method. This visualization module aims to let users with no expertise in visualization or data analysis get an overview of their corpus and interact with it by exploring, querying, filtering, panning, and zooming in on regions of semantic interest. Three use cases are discussed to illustrate Cartolabe's versatility and its ability to bring large-scale textual corpus visualization and exploration to a wide audience.
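
The tile step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the 2D coordinates have already been computed, and it assumes a simple quadtree-style grid with 2^zoom tiles per axis. The abstract does not specify Cartolabe's actual tiling scheme, so the function names `tile_index` and `build_tiles`, the grid layout, and the sample points are all hypothetical.

```python
from collections import defaultdict


def tile_index(x, y, zoom, bounds):
    """Return the (column, row) of the tile containing point (x, y).

    Assumption: the plane is cut into 2**zoom by 2**zoom equal tiles
    covering `bounds` = (min_x, min_y, max_x, max_y).
    """
    min_x, min_y, max_x, max_y = bounds
    n = 2 ** zoom  # number of tiles along each axis
    width = (max_x - min_x) or 1.0   # avoid division by zero for flat data
    height = (max_y - min_y) or 1.0
    col = int((x - min_x) / width * n)
    row = int((y - min_y) / height * n)
    # Points lying exactly on the upper boundary fall into the last tile.
    return min(col, n - 1), min(row, n - 1)


def build_tiles(points, zoom):
    """Group point indices by the tile they fall into at a given zoom level."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    bounds = (min(xs), min(ys), max(xs), max(ys))
    tiles = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        tiles[tile_index(x, y, zoom, bounds)].append(idx)
    return dict(tiles)


# Hypothetical projected coordinates standing in for documents.
points = [(0.0, 0.0), (1.0, 1.0), (3.9, 0.2), (4.0, 4.0)]
tiles_z0 = build_tiles(points, zoom=0)  # one tile holding every point
tiles_z1 = build_tiles(points, zoom=1)  # four tiles, some possibly empty
```

Under this assumed scheme, a viewer would only need to request the tiles that intersect the current viewport at the current zoom level, which is one common way to keep panning and zooming responsive on corpora with millions of entries.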

Identifiers

pubmed: 33095705
doi: 10.1109/MCG.2020.3033401

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

76-88

Authors

MeSH classifications