Critical appraisal of two Box-Cox formulae for their utility in determining reference intervals by realistic simulation and extensive real-world data analyses.
Bias ratio
Clinical chemistry
Nonparametric method
Power transformation
Power-normal distribution
Reference values
Journal
Computer methods and programs in biomedicine
ISSN: 1872-7565
Abbreviated title: Comput Methods Programs Biomed
Country: Ireland
NLM ID: 8506513
Publication information
Publication date:
Dec 2023
History:
received: 14 Jan 2023
revised: 20 Jul 2023
accepted: 15 Sep 2023
medline: 14 Nov 2023
pubmed: 24 Oct 2023
entrez: 23 Oct 2023
Status:
ppublish
Abstract
BACKGROUND
The reference interval (RI) is defined as the central 95 % range of reference values (RVs) from healthy individuals. The ideal method for determining RIs is to transform the RV distribution into a Gaussian one and estimate its central 95 % range parametrically. The one-parameter Box-Cox formula (1pBC) is widely used to correct skewness (Sk) or kurtosis (Kt) in data distributions. However, 1pBC is seldom used to compute RIs because it does not reliably achieve a Gaussian transformation, and its two-parameter version (2pBC) is not used because of the difficulty of fitting the power (λ) and shift (α) parameters simultaneously. In this study, the technical issues in fitting both formulae are assessed, and an alternative algorithm for the successful use of 2pBC is proposed.
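For reference, the standard Box-Cox forms and the parametric RI they lead to can be written as below. The abstract does not give the authors' exact parameterization, so writing the shift as x − α (with α below the smallest value, which is consistent with keeping α close to the reference distribution) is an assumption; some texts write x + α instead. 1pBC is the special case α = 0.

```latex
% 1pBC: power parameter lambda only (x > 0)
y = \begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt]
  \ln x, & \lambda = 0
\end{cases}

% 2pBC: power lambda and shift alpha (x > alpha); shift convention assumed
y = \begin{cases}
  \dfrac{(x - \alpha)^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt]
  \ln (x - \alpha), & \lambda = 0
\end{cases}

% Parametric RI: central 95 % range of the (Gaussian) y, back-transformed
\mathrm{RI} = \left[\, g^{-1}(\bar{y} - 1.96\, s_y),\; g^{-1}(\bar{y} + 1.96\, s_y) \,\right],
\qquad
g^{-1}(y) = \begin{cases}
  (\lambda y + 1)^{1/\lambda} + \alpha, & \lambda \neq 0 \\[4pt]
  e^{y} + \alpha, & \lambda = 0
\end{cases}
```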
METHODS
To fit 1pBC, the optimal λ was determined by a stepwise linear search. For 2pBC, the optimal [λ, α] combination was sought in two ways: by a grid search over λ and α (2pBCgrid), or by the same grid search with the α range kept close to the reference distribution (2pBCopt). Their accuracy and precision in determining RIs were compared using generated power-normal distributions that simulated the RVs of 23 major chemistry analytes. Additionally, their practical utility was compared by analyzing 776 real-world datasets comprising test results for 25 analytes, obtained from the IFCC global multicenter RV study. For comparison, the performance of the nonparametric method was evaluated in both settings.
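The fitting steps above can be sketched in Python as a rough illustration, not the authors' implementation. Several choices here are assumptions: the goodness-of-fit criterion is taken to be the normal profile log-likelihood of the Box-Cox model (Jacobian included, so values are comparable across α); the λ range and step sizes are arbitrary; and the α grid rule in the hypothetical `fit_2pbc` (a `width` in SDs below min(x)) only loosely mimics the 2pBCgrid/2pBCopt distinction, whose exact rule the abstract does not give. All function names are hypothetical.

```python
import numpy as np


def box_cox(x, lam, alpha=0.0):
    """Box-Cox transform of (x - alpha); requires x - alpha > 0 (shift convention assumed)."""
    z = np.asarray(x, dtype=float) - alpha
    if abs(lam) < 1e-12:  # lambda = 0 is the log limit
        return np.log(z)
    return (z ** lam - 1.0) / lam


def inv_box_cox(y, lam, alpha=0.0):
    """Back-transform to the original scale; for lam != 0 it needs lam * y + 1 > 0."""
    y = np.asarray(y, dtype=float)
    if abs(lam) < 1e-12:
        return np.exp(y) + alpha
    return (lam * y + 1.0) ** (1.0 / lam) + alpha


def profile_loglik(x, lam, alpha=0.0):
    """Normal profile log-likelihood of the Box-Cox model (Jacobian included),
    used here as an ASSUMED goodness-of-fit criterion."""
    z = np.asarray(x, dtype=float) - alpha
    y = box_cox(z, lam)
    return -0.5 * z.size * np.log(y.var()) + (lam - 1.0) * np.log(z).sum()


def fit_1pbc(x, lams=np.arange(-2.0, 2.0001, 0.01)):
    """Stepwise linear search: scan lambda at a fixed step and keep the best fit."""
    scores = [profile_loglik(x, lam) for lam in lams]
    return float(lams[int(np.argmax(scores))])


def fit_2pbc(x, lams=np.arange(-2.0, 2.0001, 0.05), width=3.0, n_alpha=60):
    """Grid search over (lambda, alpha). HYPOTHETICAL alpha rule: from
    min(x) - width*SD up to just below min(x). A small width loosely mimics
    keeping alpha close to the data; a large width gives a wide grid."""
    x = np.asarray(x, dtype=float)
    alphas = x.min() - x.std() * np.linspace(width, 0.05, n_alpha)
    best = (-np.inf, 0.0, 0.0)
    for alpha in alphas:
        for lam in lams:
            ll = profile_loglik(x, lam, alpha)
            if ll > best[0]:
                best = (ll, float(lam), float(alpha))
    return best[1], best[2]


def parametric_ri(x, lam, alpha=0.0, z=1.96):
    """Mean +/- 1.96 SD on the transformed scale, back-transformed."""
    y = box_cox(x, lam, alpha)
    m, s = y.mean(), y.std(ddof=1)
    lo, hi = inv_box_cox([m - z * s, m + z * s], lam, alpha)
    return float(lo), float(hi)


def simulate_power_normal(n, lam, mu, sigma, alpha=0.0, rng=None):
    """Power-normal draws: N(mu, sigma^2) truncated to lam*y + 1 > 0,
    then back-transformed to the original scale."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(0)
    while out.size < n:
        y = rng.normal(mu, sigma, size=n)
        if abs(lam) >= 1e-12:
            y = y[lam * y + 1.0 > 0]
        out = np.concatenate([out, inv_box_cox(y, lam, alpha)])
    return out[:n]
```

On lognormal data (a power-normal distribution with λ = 0, μ = 0, σ = 0.5), for instance, the 1pBC search should return λ near 0 and an RI near exp(±1.96 × 0.5), roughly 0.38 to 2.66. Including the Jacobian term is what makes likelihoods at different α values comparable in the 2pBC grid.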
RESULTS
For analytes with mildly skewed distributions, all methods estimated RIs without bias. However, when reference distributions were located far from zero, the λ estimated by 1pBC and 2pBCgrid fluctuated widely, which was attributable to a virtually flat goodness-of-fit profile for [λ, α]. For highly skewed distributions, 1pBC produced biased estimates of the RI and λ because of a remotely peaked goodness-of-fit profile. Real-world data analyses revealed that 2pBCopt and 1pBC achieved a Gaussian transformation (|Sk| < 0.1 and |Kt| < 0.3) in 82.4 % and 66.9 % of the 776 datasets, respectively. Fitting bias, signified by Kt < -0.4, was more common with 1pBC. The practical utility of 2pBCopt lay in its unbiased prediction of the analyte-specific distribution shape (λ). In contrast, the nonparametric method gave highly variable RIs for both simulated and real-world datasets.
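The Gaussianity criteria quoted above (|Sk| < 0.1 and |Kt| < 0.3, with Kt < -0.4 read as a sign of fitting bias) and the nonparametric comparator can be sketched as follows. The abstract does not state which estimators were used, so the plain moment estimators below, with Kt as excess kurtosis (0 for a Gaussian), are an assumption, as is the use of NumPy's default linear-interpolation percentiles for the nonparametric 2.5th and 97.5th limits rather than any particular rank formula. Function names are hypothetical.

```python
import numpy as np


def skew_kurt(y):
    """Moment-based skewness and excess kurtosis (ASSUMED definitions)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    m2 = np.mean(d ** 2)
    sk = np.mean(d ** 3) / m2 ** 1.5
    kt = np.mean(d ** 4) / m2 ** 2 - 3.0  # 0 for a Gaussian
    return float(sk), float(kt)


def gaussianity_flags(y, sk_tol=0.1, kt_tol=0.3, kt_bias=-0.4):
    """Apply the thresholds quoted in the abstract to transformed data y."""
    sk, kt = skew_kurt(y)
    return {
        "gaussian": abs(sk) < sk_tol and abs(kt) < kt_tol,
        "fitting_bias": kt < kt_bias,  # flattened (platykurtic) shape
        "sk": sk,
        "kt": kt,
    }


def nonparametric_ri(x):
    """Central 95 % range as the 2.5th and 97.5th sample percentiles."""
    lo, hi = np.percentile(np.asarray(x, dtype=float), [2.5, 97.5])
    return float(lo), float(hi)
```

The flags would be applied to the transformed values (for example, `box_cox(x, lam, alpha)` from a fitted model), not to the raw data.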
CONCLUSIONS
2pBCopt is suitable for calculating RIs and for characterizing the distribution shape from the estimated λ.
Identifiers
pubmed: 37871480
pii: S0169-2607(23)00486-8
doi: 10.1016/j.cmpb.2023.107820
Publication types
Multicenter Study
Journal Article
Languages
eng
Citation subsets
IM
Pagination
107820
Copyright information
Copyright © 2023 The Authors. Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest No authors have any competing interests to declare in conducting this study and reporting the results.