Collecting data on textiles from the internet using web crawling and web scraping tools.

Fiber; Fibre; Forensic; Interpretation; Market study; Population study

Journal

Forensic science international
ISSN: 1872-6283
Abbreviated title: Forensic Sci Int
Country: Ireland
NLM ID: 7902034

Publication information

Publication date:
May 2021
History:
received: 13 08 2020
revised: 09 03 2021
accepted: 11 03 2021
pubmed: 23 3 2021
medline: 23 3 2021
entrez: 22 3 2021
Status: ppublish

Abstract

Fibre population surveys are a necessary part of forensic fibre examination. They show which fibres are the most common and help estimate the likelihood of observing similar properties in a fibre unrelated to the event. The time needed to carry out these studies is, however, a major obstacle to their wider use. With the advent of e-commerce and digital computation, collecting information from digital sources and structuring it in a convenient way may provide meaningful information on fibre populations. Such data collection has become more affordable for researchers, who can now devote most of their time to extracting meaningful information from the structured data. In this article, we used Scrapy and a Kibana/Elasticsearch interface to crawl and scrape a major online clothes retailer. In less than 24 h we extracted 68 text-based fields describing a total of 24,701 garments to help provide precise estimates of fibre type and color frequencies. The data show that cotton, polyester, viscose and elastane are the 4 main types of fibres used in the textile industry. Elastane, while very common in garments, rarely accounts for more than 10% of the mass, whereas cotton accounts for up to 80% of content. The most common colors are white, black, and blue, with important dependencies on the fibre type. Through further statistics and examples we demonstrate that web scraping techniques have the potential to provide near real-time population studies that can greatly benefit forensic practitioners.
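As an illustration of the kind of post-processing the abstract describes, the following minimal sketch turns scraped composition strings into fibre frequencies. It is not the authors' pipeline: the sample labels, the label format ("95% cotton, 5% elastane"), and the function names are all assumptions made for the example, and the crawling step itself (done with Scrapy in the study) is left out.

```python
import re
from collections import Counter

# Hypothetical composition strings, as a retailer page might list them.
# These are illustrative values, not data from the study.
SAMPLE_LABELS = [
    "95% cotton, 5% elastane",
    "100% polyester",
    "60% cotton, 40% polyester",
    "92% viscose, 8% elastane",
]

# Matches "<number>% <fibre name>", e.g. "95% cotton".
COMPONENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*([A-Za-z][A-Za-z ]*)")


def parse_composition(label):
    """Return {fibre: percent} for one composition string."""
    return {
        name.strip().lower(): float(pct)
        for pct, name in COMPONENT_RE.findall(label)
    }


def fibre_frequencies(labels):
    """Fraction of garments that contain each fibre type."""
    counts = Counter()
    for label in labels:
        counts.update(parse_composition(label).keys())
    total = len(labels)
    return {fibre: n / total for fibre, n in counts.items()}


def max_share(labels):
    """Largest percentage observed for each fibre across garments."""
    best = {}
    for label in labels:
        for fibre, pct in parse_composition(label).items():
            best[fibre] = max(pct, best.get(fibre, 0.0))
    return best


if __name__ == "__main__":
    print(fibre_frequencies(SAMPLE_LABELS))
    print(max_share(SAMPLE_LABELS))
```

Separating "how often a fibre appears" from "how much of the garment it makes up" mirrors the distinction drawn in the abstract between elastane's frequency and its small share of the mass.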

Identifiers

pubmed: 33752084
pii: S0379-0738(21)00073-6
doi: 10.1016/j.forsciint.2021.110753

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

110753

Copyright information

Copyright © 2021 Elsevier B.V. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Cyril Muehlethaler (C)

University of Quebec at Trois-Rivières, Canada; Laboratoire de Recherche en Criminalistique, Trois-Rivières, Canada; Centre International de Criminologie Comparée, Montreal, Canada. Electronic address: Cyril.Muehlethaler@uqtr.ca.

René Albert (R)

Centre International de Criminologie Comparée, Montreal, Canada.
