A synthetic data set to benchmark anti-money laundering methods.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
28 09 2023
History:
received: 04 04 2023
accepted: 14 09 2023
medline: 2 10 2023
pubmed: 29 9 2023
entrez: 28 9 2023
Status:
epublish
Abstract
Bank transactions are highly confidential. As a result, there are no real public data sets that can be used to investigate and compare anti-money laundering (AML) methods in banks. This severely limits research on important AML problems such as efficiency, effectiveness, class imbalance, concept drift, and interpretability. To address the issue, we present SynthAML: a synthetic data set to benchmark statistical and machine learning methods for AML. The data set builds on real data from Spar Nord, a systemically important Danish bank, and contains 20,000 AML alerts and over 16 million transactions. Experimental results indicate that performance on SynthAML can be transferred to the real world. As use cases, we present and discuss open problems in the AML literature.
Identifiers
pubmed: 37770445
doi: 10.1038/s41597-023-02569-2
pii: 10.1038/s41597-023-02569-2
pmc: PMC10539331
Publication types
Dataset
Journal Article
Languages
eng
Citation subsets
IM
Pagination
661
Copyright information
© 2023. Springer Nature Limited.