Upscaling human activity data: A statistical ecology approach.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2021
History:
received: 2021-01-09
accepted: 2021-06-04
entrez: 2021-07-01
pubmed: 2021-07-02
medline: 2021-11-04
Status:
epublish
Abstract
Big data require new techniques to handle the information they carry. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, from a small sample of sent emails per sender, posts per hashtag and word occurrences, we infer the number of senders, hashtags and words in the whole dataset and how their abundances (e.g. the popularity of a hashtag) change across scales. Our approach is grounded in statistical ecology, as we map the inference of human activities onto the unseen species problem in biodiversity. Our findings may have applications to resource management in emails, collective attention monitoring on Twitter and language learning processes in word databases.
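The abstract does not spell out the estimator itself, so the following is only an illustration of the unseen species problem it refers to, not the authors' framework. It uses the classical bias-corrected Chao1 estimator, which guesses the total number of distinct types (here, hashtags) from how many types appear exactly once and exactly twice in a sample. The function name `chao1` and the toy hashtag sample are assumptions made for this sketch.

```python
from collections import Counter


def chao1(sample):
    """Bias-corrected Chao1 estimate of the total number of distinct types.

    Illustrative stand-in only: this is a classical unseen-species
    estimator, not the method proposed in the paper.
    """
    abundances = Counter(sample)                 # type -> occurrences in the sample
    freq_of_freq = Counter(abundances.values())  # k -> number of types seen k times
    s_obs = len(abundances)                      # distinct types actually observed
    f1 = freq_of_freq[1]                         # singletons
    f2 = freq_of_freq[2]                         # doubletons
    # The correction term grows with the number of rare types in the sample.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical sample of hashtag occurrences (made-up data).
sample = ["#a", "#a", "#a", "#b", "#b", "#c", "#d", "#e"]
estimate = chao1(sample)
print(f"observed types: {len(set(sample))}, estimated total: {estimate}")
```

In this toy sample five hashtags are observed, three of them only once, so the estimate exceeds the observed count; a sample with no singletons would return the observed count unchanged.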
Identifiers
pubmed: 34197484
doi: 10.1371/journal.pone.0253461
pii: PONE-D-21-00748
pmc: PMC8248688
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0253461
Conflict of interest statement
The authors have declared that no competing interests exist.