Tailoring the Nutritional Composition of Italian Foods to the US Nutrition5k Dataset for Food Image Recognition: Challenges and a Comparative Analysis.
database harmonization
dish images
food composition database
food matching
manual data curation
missing imputation
nutrition
nutritional composition of foods
“Nutrition5k” dataset
Journal
Nutrients
ISSN: 2072-6643
Abbreviated title: Nutrients
Country: Switzerland
NLM ID: 101521595
Publication information
Publication date:
01 Oct 2024
History:
received: 01 Aug 2024
revised: 23 Sep 2024
accepted: 27 Sep 2024
medline: 16 Oct 2024
pubmed: 16 Oct 2024
entrez: 16 Oct 2024
Status:
epublish
Abstract
Training machine learning algorithms on dish images collected in other countries requires addressing possible sources of systematic discrepancy, including country-specific food composition databases (FCDBs). The US Nutrition5k project provides ~5000 dish images with related dish- and ingredient-level information on mass, energy, and macronutrients from the US FCDB. The aim of this study is to (1) identify challenges and solutions in linking the nutritional composition of Italian foods with food images from Nutrition5k and (2) assess potential differences in nutrient content estimated across the Italian and US FCDBs and their determinants. After food matching, expert data curation, and handling of missing values, dish-level ingredients from Nutrition5k were integrated with the Italian-FCDB-specific nutritional composition (86 components); dish-specific nutrient content was calculated by summing the corresponding ingredient-specific nutritional values. Measures of agreement and difference were calculated between Italian- and US-FCDB-specific content of energy and macronutrients. Potential determinants of the identified differences were investigated with multiple robust regression models. Dishes had a median mass of 145 g and a median of three ingredients. Energy, proteins, fats, and carbohydrates showed moderate-to-strong agreement between Italian- and US-FCDB-specific content; carbohydrates showed the lowest agreement, with the Italian FCDB providing smaller median values (median raw difference between the Italian and US FCDBs: -2.10 g). Regression models on dishes suggested a role for mass, number of ingredients, and presence of recreated recipes, alone or jointly with differential use of raw/cooked ingredients across the two FCDBs. In the era of machine learning approaches for food image recognition, manual data curation in the alignment of FCDBs is worth the effort.
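The dish-level calculation and the raw-difference measure described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the food identifiers, the composition values, the per-100 g convention, and the function names `dish_nutrients` and `median_raw_difference` are hypothetical and are not taken from either FCDB or from the authors' code.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-100 g composition values (illustrative only,
# not taken from the Italian or US FCDB).
COMPOSITION_PER_100G = {
    "pasta_cooked": {"energy_kcal": 150.0, "protein_g": 5.0, "fat_g": 1.0, "carb_g": 30.0},
    "olive_oil": {"energy_kcal": 900.0, "protein_g": 0.0, "fat_g": 100.0, "carb_g": 0.0},
}


def dish_nutrients(ingredients, composition):
    """Sum ingredient-level nutrient amounts for one dish.

    ingredients: list of (food_id, mass_g) pairs.
    composition: mapping food_id -> {nutrient: amount per 100 g} (assumed convention).
    """
    totals = defaultdict(float)
    for food_id, mass_g in ingredients:
        for nutrient, per_100g in composition[food_id].items():
            # Scale the per-100 g value to the ingredient mass in the dish.
            totals[nutrient] += per_100g * mass_g / 100.0
    return dict(totals)


def median_raw_difference(italian_values, us_values):
    """Median of dish-wise differences, computed as Italian minus US."""
    return median(it - us for it, us in zip(italian_values, us_values))


# Hypothetical two-ingredient dish: 100 g cooked pasta and 10 g olive oil.
dish = [("pasta_cooked", 100.0), ("olive_oil", 10.0)]
totals = dish_nutrients(dish, COMPOSITION_PER_100G)
```

With this sign convention, a negative median raw difference means the Italian-FCDB estimate tends to be smaller than the US one, which matches the direction reported for carbohydrates.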
Identifiers
pubmed: 39408306
pii: nu16193339
doi: 10.3390/nu16193339
Publication types
Journal Article
Comparative Study
Languages
eng
Citation subsets
IM
Grants
Agency: Ministero dell'Istruzione e del Merito
ID: PRIN 20227YCB5P