Leveraging multimodal deep learning architecture with retina lesion information to detect diabetic retinopathy

Vincent S. Tseng, Ching Long Chen, Chang Min Liang, Ming Cheng Tai, Jung Tzu Liu, Po Yi Wu, Ming Shan Deng, Ya Wen Lee, Teng Yi Huang, Yi Hao Chen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Purpose: To improve disease severity classification from fundus images for diabetic retinopathy (DR) using a hybrid architecture with symptom awareness.

Methods: We used 26,699 fundus images of 17,834 diabetic patients, collected from three Taiwanese hospitals between 2007 and 2018, for DR severity classification. Thirty-seven ophthalmologists verified the images, and their lesion annotations and severity classifications served as the ground truth. Two deep learning fusion architectures were proposed: late fusion, which combines lesion and severity classification models in parallel using a postprocessing procedure, and two-stage early fusion, which combines lesion detection and classification models sequentially and mimics the decision-making process of ophthalmologists. The Messidor-2 dataset (1748 images) was used to evaluate and benchmark the performance of the architecture. The primary evaluation metrics were classification accuracy, weighted κ statistic, and area under the receiver operating characteristic curve (AUC).

Results: On the hospital data, the hybrid architecture achieved an accuracy of 84.29% and a weighted κ of 84.01% for five-class DR grading. It also classified images of early-stage DR more accurately than conventional algorithms. The Messidor-2 model achieved an AUC of 97.09% in referral DR detection, compared with AUCs of 85% to 99% for state-of-the-art algorithms that learned from a larger database.

Conclusions: Our hybrid architectures extracted and strengthened characteristics from DR images and improved DR grading performance, thereby increasing the robustness and confidence of the architectures for general use.

Translational Relevance: The proposed fusion architectures can enable faster and more accurate diagnosis of various DR pathologies than current manual clinical practice.
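As an illustration of the weighted κ statistic named among the evaluation metrics, the following is a minimal sketch for five ordinal DR grades (0 to 4). The abstract does not state which weighting scheme was used, so the quadratic weights here are an assumption, and the function name `weighted_kappa` and its parameters are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                   n_classes: int = 5, power: int = 2) -> float:
    """Cohen's weighted kappa for ordinal grades 0..n_classes-1.

    power=1 gives linear weights; power=2 gives quadratic weights
    (assumed here; the abstract does not specify the scheme).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Confusion matrix: rows are reference grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]

    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Penalty grows with the distance between the two grades.
            weight = (abs(i - j) / (n_classes - 1)) ** power
            disagreement_observed += weight * observed[i][j]
            # Expected count if the two raters were independent.
            disagreement_expected += weight * row_totals[i] * col_totals[j] / n

    if disagreement_expected == 0:
        raise ValueError("kappa is undefined when all grades fall in one class")
    return 1.0 - disagreement_observed / disagreement_expected


if __name__ == "__main__":
    reference = [0, 1, 2, 3, 4, 2]   # made-up ophthalmologist grades
    predicted = [0, 1, 2, 3, 4, 3]   # made-up model grades
    print(f"weighted kappa = {weighted_kappa(reference, predicted):.4f}")
```

Unlike plain accuracy, this statistic penalizes a prediction two grades away from the reference more heavily than one that is off by a single grade, which is why it suits ordinal severity scales.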

Original language: English
Article number: 41
Pages (from-to): 1-12
Number of pages: 12
Journal: Translational Vision Science and Technology
Issue number: 2
State: Published - 2020


  • Convolutional neural network
  • Diabetic retinopathy
  • Fundus image
  • Fusion architecture
  • Object detection
