Dataset for file fragment classification of textual file formats.


Journal

BMC research notes
ISSN: 1756-0500
Abbreviated title: BMC Res Notes
Country: England
NLM ID: 101462768

Publication information

Publication date:
11 Dec 2019
History:
received: 21 10 2019
accepted: 29 11 2019
entrez: 13 12 2019
pubmed: 13 12 2019
medline: 8 5 2020
Status: epublish

Abstract

Classification of textual file formats is a topic of interest in network forensics. A few datasets of files with textual formats are publicly available, but there is no public dataset of file fragments of textual file formats. As a result, a major research challenge in file fragment classification of textual file formats is comparing the performance of the developed methods on the same datasets. In this study, we present a dataset that contains file fragments of five textual file formats: the binary file format for Word 97-Word 2003, the Microsoft Word Open XML format, Portable Document Format, rich text format, and standard text document. The dataset contains file fragments in three languages: English, Persian, and Chinese. For each pair of file format and language, 1500 file fragments are provided, so the dataset contains 22,500 file fragments in total.
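The total stated in the abstract follows from the dataset layout: five formats times three languages gives 15 format-language pairs, each with 1500 fragments. A minimal sketch of that arithmetic follows; the short format labels are illustrative shorthand assumed here, not names taken from the dataset itself.

```python
from itertools import product

# Illustrative shorthand labels (assumed here, not defined by the dataset).
FORMATS = ["doc", "docx", "pdf", "rtf", "txt"]
LANGUAGES = ["English", "Persian", "Chinese"]
FRAGMENTS_PER_PAIR = 1500  # figure stated in the abstract

# Every (format, language) combination described in the abstract.
pairs = list(product(FORMATS, LANGUAGES))
total_fragments = len(pairs) * FRAGMENTS_PER_PAIR

print(f"{len(pairs)} pairs, {total_fragments} fragments")
```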

Identifiers

pubmed: 31829258
doi: 10.1186/s13104-019-4837-4
pii: 10.1186/s13104-019-4837-4
pmc: PMC6907108

Publication types

Dataset Journal Article

Languages

eng

Citation subsets

IM

Pagination

801

References

BMC Res Notes. 2019 Dec 11;12(1):801
pubmed: 31829258

Authors

Fatemeh Mansouri Hanis (F)

Information Theory and Coding Laboratory, University of Tehran, Tehran, Iran.

Mehdi Teimouri (M)

Information Theory and Coding Laboratory, University of Tehran, Tehran, Iran. mehditeimouri@ut.ac.ir.
