Enhancing Single-Frame Supervision for Better Temporal Action Localization.
Journal
IEEE transactions on visualization and computer graphics
ISSN: 1941-0506
Abbreviated title: IEEE Trans Vis Comput Graph
Country: United States
NLM ID: 9891704
Publication information
Publication date:
15 Apr 2024
History:
pubmed: 15 Apr 2024
medline: 15 Apr 2024
entrez: 15 Apr 2024
Status:
aheadofprint
Abstract
Temporal action localization aims to identify the boundaries and categories of actions in videos, such as scoring a goal in a football match. Single-frame supervision has emerged as a labor-efficient way to train action localizers, as it requires only one annotated frame per action. However, it often performs poorly because precise boundary annotations are missing. To address this issue, we propose a visual analysis method that aligns similar actions and then propagates a few user-provided annotations (e.g., boundaries, category labels) to similar actions via the generated alignments. Our method models the alignment between actions as a heaviest path problem and the annotation propagation as a quadratic optimization problem. Because the automatically generated alignments may not accurately match the associated actions and could produce inaccurate localization results, we develop a storyline visualization to explain the localization results of actions and their alignments. This visualization helps users correct wrong localization results and misalignments. The corrections are then used to improve the localization results of other actions. The effectiveness of our method in improving localization performance is demonstrated through quantitative evaluation and a case study.
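The abstract does not give the formulation of the heaviest path problem, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes two actions are represented as frame sequences with a precomputed frame-to-frame similarity matrix, and it finds the monotonic alignment of maximum total similarity by dynamic programming. The function name `heaviest_path_alignment` and the matrix `sim` are hypothetical names introduced for this example.

```python
from typing import List, Tuple


def heaviest_path_alignment(
    sim: List[List[float]],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Heaviest monotonic path through a similarity grid (illustrative only).

    sim[i][j] is an assumed similarity between frame i of action A and
    frame j of action B. Returns the total weight of the best alignment
    and the matched (i, j) frame pairs in increasing order.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0

    # best[i][j]: weight of the heaviest path that uses only the first
    # i frames of A and the first j frames of B.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],                          # leave frame i-1 of A unmatched
                best[i][j - 1],                          # leave frame j-1 of B unmatched
                best[i - 1][j - 1] + sim[i - 1][j - 1],  # match the two frames
            )

    # Walk back from the corner to recover which frames were matched.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Toy example: frame 0 of A resembles frame 0 of B, frame 1 of A resembles frame 2 of B.
weight, pairs = heaviest_path_alignment([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```

In a setting like the one the abstract describes, matched frame pairs of this kind could carry a user-provided boundary or label from one action to a similar one; how the paper actually performs that propagation (the quadratic optimization step) is not specified in this record.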
Identifiers
pubmed: 38619947
doi: 10.1109/TVCG.2024.3388521
Publication types
Journal Article
Languages
eng
Citation subsets
IM