Data from electronic medical records (EMRs) are now widely used, owing to rapid advances in disease assessment technologies. Accordingly, the challenge of how to effectively retrieve meaningful data from large-scale medical databases for disease assessment has arisen. A further concern is how early a disease risk assessment model can detect signs of disease, because early detection leads to early treatment. In this paper, with the aim of detecting diseases sooner and more effectively, a novel early disease risk assessment method is proposed, with type 2 diabetes mellitus (T2DM) as a case study. The proposed method improves the quality and informativeness of diagnostic data by using novel features and an early assessment strategy. To construct a relationship matrix between patients and diseases from EMRs, a retrieval method for generalized diagnostic code information with extracted occurrence counts is proposed. To identify the disease earlier, a risk assessment strategy is established that assesses risk 7, 60, and 120 days before the onset of T2DM. The experimental results show that the proposed method achieves high performance in terms of AUC-ROC and AUC-PR values. These results also demonstrate that EMR information retrieval methods play an important role in disease assessment, and that assessments can be performed at an earlier stage using large-scale diagnostic databases.
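
As a rough illustration of the two ideas named above (a patient-by-disease matrix of occurrence counts over generalized diagnostic codes, and assessment from 7, 60, and 120 days before onset), the following minimal Python sketch may help. It is not the paper's implementation. The function names `generalize` and `build_matrix`, the record layout, the toy data, and the choice to generalize a code by keeping its first three characters are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def generalize(code: str) -> str:
    """Collapse a diagnostic code to a coarser category.

    Assumption: the category is the first three characters of the code
    (e.g. "E78.5" becomes "E78"). The paper's actual rule may differ.
    """
    return code.replace(".", "")[:3].upper()


def build_matrix(records, onset_dates, lead_days):
    """Count generalized codes per patient, using only early records.

    records:     iterable of (patient_id, visit_date, diagnostic_code)
    onset_dates: dict mapping patient_id to the onset (or reference) date
    lead_days:   records dated later than onset minus lead_days are dropped
    """
    counts = {}
    for patient_id, visit_date, code in records:
        cutoff = onset_dates[patient_id] - timedelta(days=lead_days)
        if visit_date <= cutoff:
            counts.setdefault(patient_id, Counter())[generalize(code)] += 1

    # Columns are the generalized codes that survive the cutoff.
    columns = sorted({c for row in counts.values() for c in row})
    patients = sorted(onset_dates)
    # A missing code counts as zero occurrences for that patient.
    matrix = [[counts.get(p, Counter())[c] for c in columns] for p in patients]
    return patients, columns, matrix


# Hypothetical toy data, invented for this sketch only.
records = [
    ("p1", date(2020, 1, 10), "I10"),
    ("p1", date(2020, 3, 1), "E78.5"),
    ("p1", date(2020, 5, 25), "E78.0"),
    ("p2", date(2020, 2, 1), "I10"),
    ("p2", date(2020, 2, 20), "I10"),
]
onset_dates = {"p1": date(2020, 6, 1), "p2": date(2020, 6, 1)}

for lead in (7, 60, 120):
    patients, columns, matrix = build_matrix(records, onset_dates, lead)
    print(lead, columns, matrix)
```

Each resulting matrix could then serve as the feature input to a classifier evaluated with AUC-ROC and AUC-PR. Longer lead times leave fewer records per patient, which is the trade-off an early assessment strategy has to manage.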