Abstract
Introduction: A recently developed deep-learning-based automatic evaluation model provides reliable and efficient Cobb angle measurements for scoliosis diagnosis. However, few studies have examined its clinical application, and external validation is lacking. This study therefore aimed to assess the value of automated assessment models in clinical practice by comparing deep-learning models with manual measurement.
Methods: A total of 481 spine radiographs from an open-source dataset were divided into training and validation sets, and 119 spine radiographs from a private dataset were used as the test set. The mean of the Cobb angles measured by three physicians in the hospital's PACS served as the reference standard. The results of Seg4Reg, VFLDN, and manual measurement were compared statistically. The intraclass correlation coefficient (ICC) and the Pearson correlation coefficient (PCC) were used to assess reliability and correlation, the Bland-Altman method was used to assess agreement, and the Kappa statistic was used to assess the consistency of Cobb angle severity classification.
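To make the agreement statistics named above concrete, the following is a minimal sketch of how the PCC, a mean absolute difference, and Bland-Altman limits of agreement could be computed for paired measurements. It is not the study's analysis code: the function names and the sample angles are hypothetical, and reading MAD as "mean absolute difference" is an assumption.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient between two paired samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_abs_diff(x, y):
    """Mean absolute difference between paired measurements (assumed meaning of MAD)."""
    return mean([abs(a - b) for a, b in zip(x, y)])


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) using bias ± 1.96 SD of the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical Cobb angles in degrees, for illustration only (not study data).
model_angles = [12.0, 25.5, 33.0, 41.2, 50.1]
manual_angles = [10.0, 27.0, 31.5, 43.0, 48.0]

r = pearson(model_angles, manual_angles)
mad = mean_abs_diff(model_angles, manual_angles)
bias, lower, upper = bland_altman(model_angles, manual_angles)
print(f"PCC={r:.3f}  MAD={mad:.2f}  bias={bias:.2f}  LoA=[{lower:.2f}, {upper:.2f}]")
```

In this sketch the manual measurement plays the role of the reference, so a positive bias would mean the model tends to overestimate the angle.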
Results: The mean Cobb angles were 35.89° ± 9.33° with Seg4Reg, 31.54° ± 9.78° with VFLDN, and 32.23° ± 9.28° with manual measurement. The ICCs for the reliability of Seg4Reg and VFLDN were 0.809 and 0.974, respectively. The PCC and MAD between Seg4Reg and manual measurements were 0.731 (p < 0.001) and 6.51°, while those between VFLDN and manual measurements were 0.952 (p < 0.001) and 2.36°. The Kappa statistic indicated that VFLDN (k = 0.686, p < 0.001) was superior to Seg4Reg and manual measurements for Cobb angle severity classification.
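As an illustration of the severity-level comparison, the sketch below bins Cobb angles into severity classes and computes an unweighted Cohen's kappa between two sets of labels. The cut-offs (25° and 45°), the choice of the unweighted variant, and all names and sample values are assumptions made for the example; the abstract does not state which thresholds or Kappa variant were used.

```python
from collections import Counter


def severity(angle, cutoffs=(25.0, 45.0)):
    """Map a Cobb angle to a class index 0, 1 or 2; the cut-offs are assumed, not from the study."""
    return sum(angle >= c for c in cutoffs)


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Undefined (division by zero) when both raters use a single identical class.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the product of the two raters' marginal frequencies.
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical Cobb angles in degrees, for illustration only (not study data).
model_angles = [12.0, 26.0, 33.0, 46.2, 50.1, 24.0]
manual_angles = [10.0, 24.0, 31.5, 47.0, 48.0, 23.0]

model_levels = [severity(a) for a in model_angles]
manual_levels = [severity(a) for a in manual_angles]
print(f"kappa = {cohen_kappa(model_levels, manual_levels):.3f}")
```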
Conclusion: The deep-learning-based automatic scoliosis Cobb angle assessment model is feasible in clinical practice. In particular, the keypoint-based VFLDN is more valuable in clinical work, offering higher accuracy, transparency, and interpretability.