
Collecting data on textiles from the internet using web crawling and web scraping tools

Forensic Sci Int. 2021 Mar 15;322:110753. doi: 10.1016/j.forsciint.2021.110753. Online ahead of print.

ABSTRACT

Fibre population surveys are a necessary part of forensic fibre examination. They show which fibres are the most common and help estimate the likelihood of observing similar properties in a fibre unrelated to the event. However, the time needed to carry out these studies is a major obstacle to their wider use. With the advent of e-commerce and digital computation, collecting information from online sources and structuring it in a convenient way may provide meaningful information on fibre populations. Such surveys have also become more affordable, since researchers can now devote most of their time to extracting meaningful information from the structured data. In this article, we used Scrapy together with a Kibana/Elasticsearch interface to crawl and scrape a major online clothing retailer. In less than 24 h we extracted 68 text-based fields describing a total of 24,701 clothing items, to help provide precise estimates of fibre type and color frequencies. The data show that cotton, polyester, viscose and elastane are the four main fibre types used in the textile industry. Elastane, while very common in garments, rarely accounts for more than 10% of the mass, whereas cotton accounts for up to 80% of the content. The most common colors are white, black and blue, with a strong dependence on fibre type. Through further statistics and examples, we demonstrate that web scraping techniques have the potential to provide near real-time population studies that can greatly benefit forensic practitioners.
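The frequency-estimation step summarized above can be sketched in Python. This is not the authors' pipeline, which used Scrapy with a Kibana/Elasticsearch interface. It is a minimal sketch that assumes each scraped garment has already been reduced to a record with a free-text `composition` label such as "80% Cotton, 20% Polyester" and a `color` field. The record layout and the helper names `parse_composition` and `summarize` are hypothetical, and the sample records are invented for illustration rather than taken from the study.

```python
import re
from collections import Counter
from statistics import median

_PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def parse_composition(text: str) -> dict[str, float]:
    """Turn a label like '80% Cotton, 20% Polyester' into {'cotton': 80.0, 'polyester': 20.0}.

    Assumes comma- or semicolon-separated fragments, each holding one percentage.
    """
    shares: dict[str, float] = {}
    for fragment in re.split(r"[,;]", text):
        match = _PERCENT.search(fragment)
        if not match:
            continue  # skip fragments without a percentage
        # Letters left after removing the percentage are taken as the fibre name.
        rest = (fragment[:match.start()] + " " + fragment[match.end():]).lower()
        name = " ".join(re.findall(r"[a-z]+", rest))
        if name:
            shares[name] = shares.get(name, 0.0) + float(match.group(1))
    return shares


def summarize(records: list[dict]) -> tuple[dict, dict, Counter]:
    """Return per-fibre occurrence frequency, per-fibre share stats, and color counts."""
    fibre_counts: Counter = Counter()
    fibre_shares: dict[str, list[float]] = {}
    color_counts: Counter = Counter()
    for rec in records:
        comp = parse_composition(rec.get("composition", ""))
        fibre_counts.update(comp.keys())  # count each fibre once per garment
        for fibre, pct in comp.items():
            fibre_shares.setdefault(fibre, []).append(pct)
        if rec.get("color"):
            color_counts[rec["color"].strip().lower()] += 1
    n = len(records)
    # Fraction of garments that contain each fibre at all.
    fibre_freq = {f: c / n for f, c in fibre_counts.items()}
    # Median and maximum mass share of each fibre, where present.
    share_stats = {f: {"median": median(v), "max": max(v)} for f, v in fibre_shares.items()}
    return fibre_freq, share_stats, color_counts


if __name__ == "__main__":
    sample = [  # invented records, for illustration only
        {"composition": "95% Cotton, 5% Elastane", "color": "Black"},
        {"composition": "Polyester 100%", "color": "white"},
        {"composition": "60% cotton, 40% polyester", "color": "Black "},
    ]
    freq, shares, colors = summarize(sample)
    print(freq)
    print(shares)
    print(colors.most_common())
```

Separating "how often a fibre appears" from "how much of the garment it makes up" mirrors the distinction the abstract draws for elastane, which is common but rarely a large share of the mass.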

PMID:33752084 | DOI:10.1016/j.forsciint.2021.110753

By Nevin Manimala
