RGB-D scene analysis in the NICU.
Computer vision
Documentation
Image classification
Image processing
Knowledge transfer
Multimodal sensors
Neural networks
Patient monitoring
Scene analysis
Sensor fusion
Journal
Computers in Biology and Medicine
ISSN: 1879-0534
Abbreviated title: Comput Biol Med
Country: United States
ID NLM: 1250250
Publication information
Publication date: November 2021
History:
received: 2021-07-09
revised: 2021-09-09
accepted: 2021-09-13
pubmed: 2021-10-03
medline: 2021-11-05
entrez: 2021-10-02
Status:
ppublish
Abstract
Continuity of care is achieved in the neonatal intensive care unit (NICU) through careful documentation of all events of clinical significance, including clinical interventions and routine care events (e.g., feeding, diaper change, weighing, etc.). As a step towards automating this documentation process, we propose a scene recognition algorithm that can automatically identify key features in a single image of the patient environment, paired with a rule-based sentence generator to caption the scene. Color and depth video were obtained from 29 newborn patients from the Children's Hospital of Eastern Ontario (CHEO) using an Intel RealSense SR300 RGB-D camera and manual bedside event annotation. Image processing techniques are implemented to classify two lighting conditions: brightness level and phototherapy. A deep neural network is developed for three image classification tasks: on-going intervention, bed occupancy, and patient coverage. Transfer learning is leveraged in the feature extraction layers, such that weights learned from a generic data-rich task are applied to the clinical domain where data collection is complex and costly. Different depth fusion techniques are implemented and compared among classification tasks, where the depth and color data are fused as an RGB-D image (image fusion) or separately at various layers in the network (network fusion). Promising results were obtained with >84% sensitivity and >73% F1 measure across all context variables despite the large class imbalance. RGBD-based models are shown to outperform RGB models on most tasks. In general, a 4-channel image fusion and network fusion at the 11th layer of the VGG-16 architecture were preferred. Ultimately, achieving complete scene understanding through multimodal computer vision could form the basis for a semi-automated charting system to assist clinical staff.
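The abstract's description of 4-channel "image fusion", where the depth frame is stacked with the color channels and passed to a VGG-16 backbone with transferred ImageNet weights, can be illustrated with a short sketch. This is a minimal, hypothetical PyTorch example, not the authors' code: the framework, layer indices, depth-channel initialization, and the two-class head (e.g., bed occupied vs. empty) are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' implementation): 4-channel RGB-D "image fusion"
# fed to a VGG-16 backbone with ImageNet transfer learning.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # hypothetical binary task, e.g., bed occupied vs. empty


def build_rgbd_vgg16(num_classes: int = NUM_CLASSES) -> nn.Module:
    """VGG-16 whose first convolution accepts a 4-channel RGB-D image."""
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

    # Replace the 3-channel input conv with a 4-channel one, copying the
    # pretrained RGB filters and initializing the depth channel with their mean
    # (one common heuristic; the paper's exact scheme is not specified here).
    old_conv = model.features[0]
    new_conv = nn.Conv2d(4, old_conv.out_channels,
                         kernel_size=old_conv.kernel_size,
                         stride=old_conv.stride,
                         padding=old_conv.padding)
    with torch.no_grad():
        new_conv.weight[:, :3] = old_conv.weight
        new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)
        new_conv.bias.copy_(old_conv.bias)
    model.features[0] = new_conv

    # Replace the classifier head for the clinical classification task.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
    return model


if __name__ == "__main__":
    rgbd = torch.randn(1, 4, 224, 224)  # stacked RGB + depth frame
    logits = build_rgbd_vgg16()(rgbd)
    print(logits.shape)  # torch.Size([1, 2])
```

The alternative "network fusion" described in the abstract would instead keep separate color and depth branches and merge their feature maps at an intermediate layer (the 11th layer of VGG-16 is the variant the authors report as preferred); the sketch above only covers the simpler 4-channel image-fusion case.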
Identifiers
pubmed: 34600329
pii: S0010-4825(21)00667-3
doi: 10.1016/j.compbiomed.2021.104873
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
104873
Copyright information
Copyright © 2021 The Authors. Published by Elsevier Ltd. All rights reserved.