Unsupervised AI reveals insect species-specific genome signatures.

Chromatin; Genome signature; Insect genome; Oligonucleotide usage; Transcription factor binding motifs; Unsupervised machine learning

Journal

PeerJ
ISSN: 2167-8359
Abbreviated title: PeerJ
Country: United States
NLM ID: 101603425

Publication information

Publication date:
2024
History:
received: 17 11 2023
accepted: 07 02 2024
medline: 11 3 2024
pubmed: 11 3 2024
entrez: 11 3 2024
Status: epublish

Abstract

Insects are a highly diverse group with a wide variety of traits, including the presence or absence of wings and of metamorphosis. These diverse traits are of great interest for studying genome evolution, and numerous comparative genomic studies have examined a wide phylogenetic range of insects. Here, we analyzed 22 insects spanning a wide phylogenetic range (Endopterygota, Paraneoptera, Polyneoptera, Palaeoptera, and other insects) by applying a batch-learning self-organizing map (BLSOM) to the oligonucleotide compositions of their genomic fragments (100-kb or 1-Mb sequences). BLSOM is an unsupervised machine learning algorithm that can extract species-specific characteristics of oligonucleotide composition (genome signatures). The genome signature is of particular interest with respect to the mechanisms and biological significance underlying species-specific differences; it can serve as a powerful probe for exploring the roles of genome sequences other than protein coding and for uncovering features hidden in the genome sequence. Because BLSOM is an unsupervised clustering method, sequences were clustered on the basis of oligonucleotide composition alone, without any information about the species from which each fragment was derived. Therefore, not only interspecies separation but also intraspecies separation can be achieved. Here, we have revealed the specific genomic regions with oligonucleotide compositions distinct from the usual sequences of each insect genome,
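The input that the abstract describes, oligonucleotide composition per fixed-size genomic fragment, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `fragment_genome` and `oligo_composition` are hypothetical, the oligonucleotide length `k=4` is an assumption (the abstract does not state it), and the BLSOM training step itself is not shown.

```python
from collections import Counter
from itertools import product


def fragment_genome(seq, size):
    """Split a sequence into non-overlapping fragments of `size` bases.

    A trailing piece shorter than `size` is dropped.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def oligo_composition(fragment, k=4):
    """Return relative frequencies of all 4**k oligonucleotides.

    The vector follows a fixed lexicographic order (AAAA, AAAC, ...).
    Windows containing symbols other than A, C, G, T are ignored.
    k=4 is an illustrative assumption, not a value from the abstract.
    """
    fragment = fragment.upper()
    counts = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]
```

With a genome string in hand, `[oligo_composition(f) for f in fragment_genome(genome, 100_000)]` would give one 256-dimensional vector per 100-kb fragment. Vectors of this kind, carrying no species labels, are what an unsupervised method such as a self-organizing map would cluster.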

Identifiers

pubmed: 38464746
doi: 10.7717/peerj.17025
pii: 17025
pmc: PMC10924456

Databases

figshare
10.6084/m9.figshare.25036358.v1

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e17025

Copyright information

©2024 Sawada et al.

Conflict of interest statement

The authors declare there are no competing interests.

Authors

Yui Sawada (Y)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

Ryuhei Minei (R)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

Hiromasa Tabata (H)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

Toshimichi Ikemura (T)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

Kennosuke Wada (K)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

Yoshiko Wada (Y)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

Hiroshi Nagata (H)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

Yuki Iwasaki (Y)

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan.

MeSH classifications