How subjective CT image quality assessment becomes surprisingly reliable: pairwise comparisons instead of Likert scale.

Computed tomography (X-ray); Interobserver variability; Intraobserver variability

Journal

European Radiology

ISSN: 1432-1084

Abbreviated title: Eur Radiol

Country: Germany

NLM ID: 9114774

Publication information

Publication date: 02 Jan 2024

History:

received: 26 Jun 2023

accepted: 29 Oct 2023

revised: 22 Sep 2023

medline: 2 Jan 2024

pubmed: 2 Jan 2024

entrez: 2 Jan 2024

Status: ahead of print

Abstract

The aim of this study was to improve the reliability of subjective image quality (IQ) assessment in abdominal CT scans by using a pairwise comparison (PC) method instead of a Likert scale method.

Abdominal CT scans (single-center) were retrospectively selected between September 2019 and February 2020 in a prior study. Sample variance in IQ was obtained by adding artificial noise using dedicated reconstruction software, including reconstructions with filtered backprojection and varying iterative reconstruction strengths. Two datasets (each n = 50) were composed with either higher or lower IQ variation, with the 25 original scans being part of both datasets. Using in-house developed software, six observers (five radiologists, one resident) rated both datasets via both the PC method (forcing observers to choose the preferred scan out of each pair of scans, resulting in a ranking) and a 5-point Likert scale. The PC method was optimized using a sorting algorithm to minimize the number of necessary comparisons. The inter- and intraobserver agreements were assessed for both methods with the intraclass correlation coefficient (ICC).

Twenty-five patients (mean age 61 ± 15.5 years; 56% men) were evaluated. The ICC for interobserver agreement for the high-variation dataset increased from 0.665 (95% CI 0.396–0.814) to 0.785 (95% CI 0.676–0.867) when the PC method was used instead of a Likert scale. For the low-variation dataset, the ICC increased from 0.276 (95% CI 0.034–0.500) to 0.562 (95% CI 0.337–0.729). Intraobserver agreement increased for four out of six observers.

The PC method is more reliable for subjective IQ assessment, as indicated by improved inter- and intraobserver agreement.

This study shows that the pairwise comparison method is a more reliable method for subjective image quality assessment. Improved reliability is of key importance for optimization studies, validation of automatic image quality assessment algorithms, and training of AI algorithms.
• Subjective assessment of diagnostic image quality via Likert scale has limited reliability.
• A pairwise comparison method improves the inter- and intraobserver agreement.
• The pairwise comparison method is more reliable for CT optimization studies.
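
The abstract states that a sorting algorithm was used to limit the number of pairs each observer has to judge, but it does not say which one. The sketch below is only an illustration of that idea, using binary insertion sort as a stand-in and not the authors' in-house software. The names `rank_by_pairwise_choice`, `prefers`, `simulated_observer`, and the noise values are hypothetical and exist only for this example.

```python
from typing import Callable, List, Sequence, Tuple


def rank_by_pairwise_choice(
    items: Sequence[str],
    prefers: Callable[[str, str], bool],
) -> Tuple[List[str], int]:
    """Rank items from best to worst with binary insertion sort.

    prefers(a, b) stands in for one forced-choice decision by an observer:
    it returns True if scan a is judged better than scan b.
    Returns the ranking and the number of decisions that were requested.
    """
    ranking: List[str] = []
    n_comparisons = 0
    for item in items:
        lo, hi = 0, len(ranking)
        while lo < hi:
            mid = (lo + hi) // 2
            n_comparisons += 1
            if prefers(item, ranking[mid]):
                hi = mid       # item is ranked above ranking[mid]
            else:
                lo = mid + 1   # item is ranked below ranking[mid]
        ranking.insert(lo, item)
    return ranking, n_comparisons


# Assumption for illustration: 50 scans with made-up, distinct noise levels,
# and a simulated observer who always prefers the less noisy scan.
noise = {f"scan_{i:02d}": (i * 37) % 50 for i in range(50)}


def simulated_observer(a: str, b: str) -> bool:
    return noise[a] < noise[b]


ranking, used = rank_by_pairwise_choice(list(noise), simulated_observer)
all_pairs = len(noise) * (len(noise) - 1) // 2  # 1225 pairs for 50 scans
print(f"{used} comparisons requested; judging every pair would need {all_pairs}")
```

With a perfectly consistent simulated observer, the ranking matches the ordering by noise level while requesting far fewer decisions than judging all 1225 possible pairs. Real observers are not perfectly consistent, so the ranking produced by any sort-based scheme depends on the individual decisions made along the way.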

Identifiers

DOI: 10.1007/s00330-023-10493-7

PMID: 38165429

PII: 10.1007/s00330-023-10493-7

Publication types

Journal Article

Languages

eng

Authors

Eva J I Hoeijmakers (EJI)

Bibi Martens (B)

Babs M F Hendriks (BMF)

Casper Mihl (C)

Razvan L Miclea (RL)

Walter H Backes (WH)

Joachim E Wildberger (JE)

Frank M Zijta (FM)

Hester A Gietema (HA)

Patricia J Nelemans (PJ)

Cécile R L P N Jeukens (CRLPN)

MeSH classifications