How subjective CT image quality assessment becomes surprisingly reliable: pairwise comparisons instead of Likert scale.
Computed tomography (X-ray)
Interobserver variability
Intraobserver variability
Journal
European radiology
ISSN: 1432-1084
Abbreviated title: Eur Radiol
Country: Germany
NLM ID: 9114774
Publication information
Publication date:
02 Jan 2024
History:
received: 26 Jun 2023
accepted: 29 Oct 2023
revised: 22 Sep 2023
medline: 2 Jan 2024
pubmed: 2 Jan 2024
entrez: 2 Jan 2024
Status:
aheadofprint
Abstract
The aim of this study was to improve the reliability of subjective image quality (IQ) assessment in abdominal CT scans by using a pairwise comparison (PC) method instead of a Likert scale.
Abdominal CT scans (single-center) were retrospectively selected between September 2019 and February 2020 in a prior study. Sample variance in IQ was obtained by adding artificial noise using dedicated reconstruction software, including reconstructions with filtered backprojection and varying iterative reconstruction strengths. Two datasets (each n = 50) were composed with either higher or lower IQ variation, with the 25 original scans being part of both datasets. Using in-house developed software, six observers (five radiologists, one resident) rated both datasets with both the PC method (which forces observers to choose the preferred scan from each pair of scans, resulting in a ranking) and a 5-point Likert scale. The PC method was optimized using a sorting algorithm to minimize the number of comparisons needed. Inter- and intraobserver agreement were assessed for both methods with the intraclass correlation coefficient (ICC).
Twenty-five patients (mean age 61 years ± 15.5; 56% men) were evaluated. The ICC for interobserver agreement for the high-variation dataset increased from 0.665 (95% CI 0.396-0.814) to 0.785 (95% CI 0.676-0.867) when the PC method was used instead of a Likert scale. For the low-variation dataset, the ICC increased from 0.276 (95% CI 0.034-0.500) to 0.562 (95% CI 0.337-0.729). Intraobserver agreement increased for four out of six observers.
The PC method is more reliable for subjective IQ assessment, as indicated by improved inter- and intraobserver agreement.
This study shows that the pairwise comparison method is a more reliable method for subjective image quality assessment. Improved reliability is of key importance for optimization studies, validation of automatic image quality assessment algorithms, and training of AI algorithms.
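The abstract states that a sorting algorithm was used to reduce the number of pairwise comparisons, but it does not say which one. As a rough illustration only, the sketch below ranks scans by binary insertion, a simpler technique than the study's own (unspecified) procedure. The function name, the scan labels, and the simulated observer with hidden quality scores are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def rank_by_pairwise_choice(
    items: Sequence[str],
    prefers: Callable[[str, str], bool],
) -> Tuple[List[str], int]:
    """Order items from least to most preferred using forced pairwise choices.

    prefers(a, b) must return True when the observer prefers a over b.
    Returns the ranking and the number of comparisons that were asked.
    """
    ranking: List[str] = []
    asked = 0
    for item in items:
        # Binary search for the insertion position inside the current ranking.
        lo, hi = 0, len(ranking)
        while lo < hi:
            mid = (lo + hi) // 2
            asked += 1
            if prefers(item, ranking[mid]):
                lo = mid + 1  # item is better, so it belongs to the right
            else:
                hi = mid      # item is worse, so it belongs to the left
        ranking.insert(lo, item)
    return ranking, asked


# Hypothetical demo: an idealized observer who always picks the scan
# with the higher hidden quality score (made-up numbers).
hidden_quality = {"scan_a": 3.0, "scan_b": 1.0, "scan_c": 4.0, "scan_d": 2.0}


def simulated_observer(a: str, b: str) -> bool:
    return hidden_quality[a] > hidden_quality[b]


order, asked = rank_by_pairwise_choice(list(hidden_quality), simulated_observer)
print(order, asked)
```

For four scans this idealized observer is asked four questions, whereas comparing every possible pair would take six. Real observers are not perfectly consistent, so a practical tool would also need a policy for contradictory choices.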
• Subjective assessment of diagnostic image quality via Likert scale has limited reliability.
• A pairwise comparison method improves the inter- and intraobserver agreement.
• The pairwise comparison method is more reliable for CT optimization studies.
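Agreement in this record is reported as an intraclass correlation coefficient, but the abstract does not state which ICC form was used. The sketch below assumes one common choice, the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)), computed from a table with one row per scan and one column per observer. The function name and the example scores are illustrative and are not data from the study.

```python
from typing import List


def icc_two_way_single(scores: List[List[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score given to subject (scan) i by rater j.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative scores: three scans rated identically by two observers.
print(icc_two_way_single([[1, 1], [2, 2], [3, 3]]))
```

With identical ratings from all observers the coefficient equals 1. Systematic offsets between observers lower this absolute-agreement form, which is one reason the choice of ICC form matters when comparing values across studies.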
Identifiers
pubmed: 38165429
doi: 10.1007/s00330-023-10493-7
pii: 10.1007/s00330-023-10493-7
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© 2023. The Author(s).
References
Valentin J (2007) The 2007 Recommendations of the International Commission on Radiological Protection. Oxford: Elsevier 37(2-4):1-133
Valentin J (2007) International Commission on Radiation Protection. Managing patient dose in multi-detector computed tomography (MDCT). New York: Elsevier 1-79
Samei E, Bakalyar D, Boedeker KL et al (2019) Performance evaluation of computed tomography systems: summary of AAPM Task Group 233. Med Phys 46(11):e735–e756
doi: 10.1002/mp.13763
pubmed: 31408540
Likert R (1932) A technique for the measurement of attitudes. Arch Psychol 22(140):5–55
Zhang Z, Zhau J, Liu N, Gu X, Zhang Y (2017) An improved pairwise comparison scaling method for subjective image quality assessment. IEEE Int Symp Broadb Multimed Syst Broadcast (BMSB) 1-6
Leveque L, Liu H, Baraković S, et al (2018) On the subjective assessment of the perceived quality of medical images and videos. IEEE Tenth Int Conf Qual Multimed Exper (QoMEX) 1-6
Chow LS, Paramesran R (2016) Review of medical image quality assessment. Biomed Sig Process Contr 27:145–154
doi: 10.1016/j.bspc.2016.02.006
Mason A, Rioux J, Clarke SE (2020) Comparison of objective image quality metrics to expert radiologists’ scoring of diagnostic quality of MR images. IEEE Trans Med Imaging 39(4):1064–1072
doi: 10.1109/TMI.2019.2930338
pubmed: 31535985
Cheng Y, Abadi E, Smith TB (2019) Validation of algorithmic CT image quality metrics with preferences of radiologists. Med Phys 46(11):4837–4846
doi: 10.1002/mp.13795
pubmed: 31465538
Jeukens CRLPN, Brauer MTH, Mihl C et al (2023) A new algorithm for automatically calculating noise, spatial resolution, and contrast image quality metrics: proof-of-concept and agreement with subjective scores in phantom and clinical abdominal CT. Invest Radiol 10:1097
Vaishnav JY, Jung WC, Popescu LM, Zeng R, Myers KJ (2014) Objective assessment of image quality and dose reduction in CT iterative reconstruction. Med Phys 41(7):071904
doi: 10.1118/1.4881148
pubmed: 24989382
Thurstone LL (1927) A law of comparative judgment. Psychol Rev 34(4):273–286
doi: 10.1037/h0070288
Mantiuk RK, Tomaszewska A, Mantiuk R (2012) Comparison of four subjective methods for image quality assessment. Comput Graph Forum 31(8):2478–2491
doi: 10.1111/j.1467-8659.2012.03188.x
Phelps AS, Naeger DM, Courtier JL et al (2015) Pairwise comparison versus Likert scale for biomedical image assessment. AJR Am J Roentgenol 204(1):8–14
doi: 10.2214/AJR.14.13022
pubmed: 25539230
Kumcu A, Bombeke K, Platiša L, Jovanov L, Van Looy J, Philips W (2017) Performance of four subjective video quality assessment protocols and impact of different rating preprocessing and analysis method. IEEE J Sel Top Sig Process 11(1):48–63
doi: 10.1109/JSTSP.2016.2638681
Gur D, Rubin DA, Kart BH et al (1997) Forced choice and ordinal discrete rating assessment of image quality: a comparison. J Digit Imaging 10(3):103–107
doi: 10.1007/BF03168596
pubmed: 9268904
pmcid: 3452949
Saaty TL (2008) Relative measurement and its generalization in decision making why pairwise comparisons are central in mathematics for the measurement of intangible factors the analytic hierarchy/network process. RACSAM-Revista de la Real Academia de Ciencias Exactas, Fisicas y Naturales. Serie A. Matematicas 102:251–318
Martens B, Bosschee JGA, Van Kuijk SMJ et al (2022) Finding the optimal tube current and iterative reconstruction strength in liver imaging; two needles in one haystack. PLoS One 17(4):1–12
doi: 10.1371/journal.pone.0266194
Ford LR Jr, Johnson SM (1959) A tournament problem. Am Math Month 66(5):387–389
doi: 10.1080/00029890.1959.11989306
De Vet HCW, Terwee CB, Mokkink LB, Knol DL (2011) Measurement in medicine: a practical guide. Cambridge University Press
doi: 10.1017/CBO9780511996214
Sartoretti T, Landsmann A, Nakhostin D et al (2022) Quantum iterative reconstruction for abdominal photon-counting detector CT improves image quality. Radiology 303(2):339–348
doi: 10.1148/radiol.211931
pubmed: 35103540
Obuchowicz R, Oszust M, Piorkowski A (2020) Interobserver variability in quality assessment of magnetic resonance images. BMC Med Imaging 20(1):109
doi: 10.1186/s12880-020-00505-z
pubmed: 32962651
pmcid: 7509933
De Crop A, Smeets P, Van Hoof T et al (2015) Correlation of clinical and physical-technical image quality in chest CT: a human cadaver study applied on iterative reconstruction. BMC Med Imaging 15(1):1–9
Obuchowicz R, Oszust M, Bielecka M, Bielecki A, Piórkowski A (2020) Magnetic resonance image quality assessment by using non-maximum suppression and entropy analysis. Entropy 22(2):220
doi: 10.3390/e22020220
pubmed: 33285994
pmcid: 7516651
Chow LS, Rajagopal H, Paramesran R (2016) Alzheimer’s Disease Neuroimaging Initiative. Correlation between subjective and objective assessment of magnetic resonance (MR) images. Magn Reson Imaging 34(6):820–831
doi: 10.1016/j.mri.2016.03.006
pubmed: 26969762
Horehledova B, Mihl C, Milanese G et al (2018) CT angiography in the lower extremity peripheral artery disease feasibility of an ultra-low volume contrast media protocol. Cardiovasc Intervent Radiol 41(11):1751–1764
doi: 10.1007/s00270-018-1979-z
pubmed: 29789875
pmcid: 6182764
MacDougall RD, Zhang Y, Callahan MJ et al (2019) Improving low-dose pediatric abdominal CT by using convolutional neural networks. Radiol Artif Intell 1(6):e180087
doi: 10.1148/ryai.2019180087
pubmed: 32090205
pmcid: 6884028
Ellmann S, Kammerer F, Brand M et al (2016) A novel pairwise comparison-based method to determine radiation dose reduction potentials of iterative reconstruction algorithms, exemplified through circle of Willis computed tomography angiography. Invest Radiol 51(5):331–339
doi: 10.1097/RLI.0000000000000243
pubmed: 26741892