Pushing ML Predictions Into DBMSs.

MLOps; SQL; machine learning

Journal

IEEE transactions on knowledge and data engineering
ISSN: 1041-4347
Abbreviated title: IEEE Trans Knowl Data Eng
Country: United States
NLM ID: 9887654

Publication information

Publication date:
01 Oct 2023
History:
received: 19 01 2022
revised: 12 02 2023
accepted: 08 04 2023
medline: 13 11 2023
pubmed: 13 11 2023
entrez: 13 11 2023
Status: epublish

Abstract

In the past decade, many approaches have been suggested to execute ML workloads on a DBMS. However, most of them have looked at in-DBMS ML from a training perspective, whereas ML inference has been largely overlooked. We think that this is an important gap to fill for two main reasons: (1) in the near future, every application will be infused with some sort of ML capability; (2) behind every web page, application, and enterprise there is a DBMS, which makes in-DBMS inference an appealing solution for efficiency (e.g., less data movement), performance (e.g., cross-optimizations between relational operators and ML), and governance. In this article, we study whether DBMSs are a good fit for prediction serving. We introduce a technique for translating trained ML pipelines containing both featurizers (e.g., one-hot encoding) and models (e.g., linear and tree-based models) into SQL queries, and we compare in-DBMS performance against popular ML frameworks such as scikit-learn and ML.NET. Our experiments show that, when pushed inside a DBMS, trained ML pipelines can have performance comparable to ML frameworks in several scenarios, while they perform quite poorly on text featurization and on (even simple) neural networks.
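To make the idea of translating a trained pipeline into SQL more concrete, here is a minimal sketch of how a one-hot encoder followed by a linear model could be folded into a single query. It is an illustration only, not the translator described in the article: the helper `linear_pipeline_to_sql`, the `customers` table, its columns, and all weights are hypothetical, and SQLite is used simply because it ships with Python.

```python
import sqlite3


def linear_pipeline_to_sql(table, numeric_weights, categorical_weights, intercept):
    """Build a SELECT that scores every row with a linear model.

    numeric_weights:     {column: weight}
    categorical_weights: {column: {category: weight}}
        A one-hot encoded column multiplied by its weights collapses into a
        CASE expression: exactly one category weight is added per row.

    Assumption: identifiers and category values are trusted (no escaping).
    """
    terms = [repr(float(intercept))]
    for col, w in numeric_weights.items():
        terms.append(f"{float(w)!r} * {col}")
    for col, cats in categorical_weights.items():
        whens = " ".join(
            f"WHEN '{cat}' THEN {float(w)!r}" for cat, w in cats.items()
        )
        # Unseen categories contribute 0, as an all-zero one-hot vector would.
        terms.append(f"CASE {col} {whens} ELSE 0.0 END")
    expr = " + ".join(terms)
    return f"SELECT id, {expr} AS prediction FROM {table} ORDER BY id"


def demo():
    """Score three hypothetical rows entirely inside SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, age REAL, plan TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 30.0, "basic"), (2, 40.0, "premium"), (3, 50.0, "unknown")],
    )
    query = linear_pipeline_to_sql(
        table="customers",
        numeric_weights={"age": 0.5},
        categorical_weights={"plan": {"basic": 1.0, "premium": 2.0}},
        intercept=1.0,
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return query, rows


if __name__ == "__main__":
    query, rows = demo()
    print(query)
    print(rows)
```

In this sketch the featurizer never materializes a one-hot vector; the encoding and the dot product are fused into one relational expression, which hints at the kind of cross-optimization between relational operators and ML that the abstract mentions.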

Identifiers

pubmed: 37954972
doi: 10.1109/TKDE.2023.3269592
pmc: PMC10620958

Publication types

Journal Article

Languages

eng

Pagination

10295-10308

Copyright information

© 2023 The Authors.

Authors

Matteo Paganelli (M)

University of Modena and Reggio Emilia 41121 Modena Italy.

Paolo Sottovia (P)

Huawei Research Munich 80992 München Germany.

Kwanghyun Park (K)

Yonsei University Seoul 03722 South Korea.

Matteo Interlandi (M)

Microsoft Research Redmond WA 98052 USA.

Francesco Guerra (F)

University of Modena and Reggio Emilia 41121 Modena Italy.

MeSH classifications