Hierarchical Graph Convolutional Networks for Structured Long Document Classification.


Journal

IEEE transactions on neural networks and learning systems
ISSN: 2162-2388
Abbreviated title: IEEE Trans Neural Netw Learn Syst
Country: United States
NLM ID: 101616214

Publication information

Publication date:
Oct 2023
History:
medline: 30 6 2022
pubmed: 30 6 2022
entrez: 29 6 2022
Status: ppublish

Abstract

Long document classification (LDC) has attracted growing interest in natural language processing (NLP) as the number of publications increases exponentially. Many LDC methods built on pretrained language models have been proposed and have achieved considerable progress. However, most existing methods model long documents as sequences of text and omit the document structure, which limits their ability to represent long texts that carry structural information. To mitigate this limitation, we propose in this article a novel hierarchical graph convolutional network (HGCN) for structured LDC, in which a section graph network models the macrostructure of a document and a word graph network with a decoupled graph convolutional block extracts the fine-grained features of a document. In addition, an interaction strategy integrates the two networks by propagating features between them. To verify the effectiveness of the proposed model, four structured long document datasets are constructed, and extensive experiments on these datasets and on another unstructured dataset show that the proposed method outperforms state-of-the-art related classification methods.
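
As a rough illustration of the two-level idea described in the abstract, the following minimal sketch runs one graph convolution over a word graph, pools word features into sections, runs one graph convolution over a section graph, and then passes section features back to the words. It is not the authors' HGCN: the layer is the generic normalized-adjacency graph convolution rather than the decoupled block mentioned above, and the function names, the `membership` matrix, the mean pooling, and the additive feedback step are all assumptions made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees (>= 1 thanks to self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight):
    """One generic graph convolution followed by ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def hierarchical_forward(word_adj, section_adj, word_feats, membership, w_word, w_sec):
    """Hypothetical two-level forward pass (illustrative only).

    membership is a (num_sections, num_words) 0/1 matrix saying which
    section each word belongs to; this encoding is an assumption.
    """
    # Fine-grained level: convolve over the word graph.
    word_h = gcn_layer(normalize_adjacency(word_adj), word_feats, w_word)

    # Words -> sections: mean-pool the words of each section.
    counts = np.maximum(membership.sum(axis=1, keepdims=True), 1.0)
    sec_in = (membership @ word_h) / counts

    # Macrostructure level: convolve over the section graph.
    sec_h = gcn_layer(normalize_adjacency(section_adj), sec_in, w_sec)

    # Sections -> words: add each section's feature to its words.
    word_out = word_h + membership.T @ sec_h

    # Document vector for a downstream classifier: mean over sections.
    doc_vec = sec_h.mean(axis=0)
    return word_out, sec_h, doc_vec


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy document: 4 words in a chain, 2 sections of 2 words each.
    word_adj = np.array([[0, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 1, 0, 1],
                         [0, 0, 1, 0]], dtype=float)
    section_adj = np.array([[0, 1], [1, 0]], dtype=float)
    membership = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
    word_feats = rng.normal(size=(4, 3))
    w_word = rng.normal(size=(3, 5))
    w_sec = rng.normal(size=(5, 5))
    word_out, sec_h, doc_vec = hierarchical_forward(
        word_adj, section_adj, word_feats, membership, w_word, w_sec)
    print(word_out.shape, sec_h.shape, doc_vec.shape)
```

In this sketch the section-level weight matrix is square so that section features can be added directly to word features; a real model would typically learn the weights and stack several such layers.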

Identifiers

pubmed: 35767491
doi: 10.1109/TNNLS.2022.3185295

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

8071-8085

Authors

MeSH classifications