To impute the missing mass values in the transiting exoplanet data, this paper uses the Frank copula to combine two Pareto marginal distributions and then proposes a Bayesian Markov chain Monte Carlo (MCMC) imputation method. The proposed Bayesian MCMC imputation method is found to outperform mean imputation. Because clustering analysis can shed light on the formation and evolution of exoplanets, the similarity-based clustering method (SCM) is then applied to the logarithms of mass and period in the data set completed by the proposed imputation. SCM identifies two clusters. The intracluster Spearman rank-order correlation coefficients between mass and period in these two clusters are 0.401 and (Formula presented.), respectively, at a significance level of 0.01. This result shows that mass and period are correlated in opposite directions in the two clusters, which implies that the two clusters have undergone different formation and evolution processes.
- hot Jupiters
- Metropolis–Hastings algorithm
- missing data
- transiting exoplanets
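A minimal sketch of the copula construction, assuming Pareto type I marginals with CDF F(x) = 1 - (x_min/x)^α and the Frank copula C_θ(u, v) = -(1/θ) log[1 + (e^{-θu} - 1)(e^{-θv} - 1)/(e^{-θ} - 1)]. The paper's exact Pareto parameterization and fitted values are not given here, so the parameters and the function names (`sample_mass_period` and helpers) are hypothetical. Dependent (mass, period) pairs are drawn by conditional inversion: draw u, then solve ∂C_θ/∂u = w for v with w uniform, and push both uniforms through the Pareto quantile function.

```python
import math
import random


def frank_conditional_v(u: float, w: float, theta: float) -> float:
    """Solve the Frank conditional CDF C(v | u) = dC/du = w for v (theta != 0)."""
    a = math.exp(-theta * u)
    c = math.exp(-theta)
    b = 1.0 + w * (c - 1.0) / (w + (1.0 - w) * a)
    # Guard against round-off pushing v to exactly 1 (Pareto quantile is infinite there).
    return min(-math.log(b) / theta, 1.0 - 1e-15)


def pareto_quantile(p: float, x_min: float, alpha: float) -> float:
    """Inverse of the Pareto type I CDF F(x) = 1 - (x_min / x) ** alpha."""
    return x_min * (1.0 - p) ** (-1.0 / alpha)


def sample_mass_period(n, theta, mass_params, period_params, seed=0):
    """Draw n (mass, period) pairs whose dependence is a Frank copula.

    mass_params and period_params are hypothetical (x_min, alpha) tuples.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.random()                                  # uniform for the mass margin
        v = frank_conditional_v(u, rng.random(), theta)   # dependent uniform for period
        pairs.append((pareto_quantile(u, *mass_params),
                      pareto_quantile(v, *period_params)))
    return pairs
```

A positive θ gives positively dependent pairs and a negative θ negatively dependent ones; because the Pareto quantile is increasing, the rank dependence of (mass, period) is exactly that of (u, v).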
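The MCMC imputation step can be illustrated, in simplified form, by a random-walk Metropolis–Hastings sampler for one missing log-mass given its observed period. This is not the paper's algorithm: the copula and marginal parameters are held fixed at hypothetical values, whereas a full Bayesian treatment would also sample them, and the Pareto type I marginals and function names are assumptions. The target is the conditional density f(m | p) ∝ c_θ(F_M(m), F_P(p)) f_M(m), with the Frank copula density c_θ(u, v) = θ(1 - e^{-θ}) e^{-θ(u+v)} / [(1 - e^{-θ}) - (1 - e^{-θu})(1 - e^{-θv})]^2, rewritten on y = log m with its Jacobian.

```python
import math
import random


def pareto_cdf(x, x_min, alpha):
    """Pareto type I CDF, F(x) = 1 - (x_min / x) ** alpha for x >= x_min."""
    return 0.0 if x <= x_min else 1.0 - (x_min / x) ** alpha


def frank_log_density(u, v, theta):
    """Log of the Frank copula density c_theta(u, v), theta != 0."""
    e = -math.expm1(-theta)  # 1 - exp(-theta); theta * e > 0 for either sign of theta
    den = e - (-math.expm1(-theta * u)) * (-math.expm1(-theta * v))
    return math.log(theta * e) - theta * (u + v) - 2.0 * math.log(abs(den))


def impute_log_mass(period, theta, mass_params, period_params,
                    n_iter=5000, burn_in=1000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings draws of y = log(mass) given one period."""
    x_min, alpha = mass_params
    v = pareto_cdf(period, *period_params)  # copula coordinate of the observed period
    y_low = math.log(x_min)

    def log_target(y):
        # Copula term + Pareto log-pdf of m = exp(y) + log-Jacobian y.
        if y < y_low:
            return -math.inf
        u = pareto_cdf(math.exp(y), x_min, alpha)
        return (frank_log_density(u, v, theta)
                + math.log(alpha) + alpha * y_low - alpha * y)

    rng = random.Random(seed)
    y = y_low + 1.0  # hypothetical starting value inside the support
    lp = log_target(y)
    draws = []
    for t in range(n_iter):
        y_new = y + rng.gauss(0.0, step)  # symmetric proposal: Hastings ratio cancels
        lp_new = log_target(y_new)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            y, lp = y_new, lp_new
        if t >= burn_in:
            draws.append(y)
    return draws
```

One possible point imputation is `math.exp(sum(draws) / len(draws))`, the exponential of the mean retained log-mass draw; drawing a single value from `draws` instead would preserve the spread of the conditional distribution. Either choice is illustrative rather than the paper's.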
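The intracluster Spearman coefficients can be computed as below once cluster labels are available. The labels are assumed to be given (SCM itself is not implemented here), and the function names are hypothetical. Since the logarithm preserves ranks, the coefficient is the same for raw and log-transformed mass and period. The sketch returns ρ only and omits the 0.01-level significance test.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def intracluster_spearman(log_mass, log_period, labels):
    """Map each cluster label to the Spearman rho of mass and period within it."""
    groups = defaultdict(lambda: ([], []))
    for m, p, k in zip(log_mass, log_period, labels):
        groups[k][0].append(m)
        groups[k][1].append(p)
    return {k: spearman_rho(ms, ps) for k, (ms, ps) in groups.items()}
```

For instance, `intracluster_spearman([1, 2, 3, 4, 5, 6], [1, 2, 3, 6, 5, 4], [0, 0, 0, 1, 1, 1])` yields approximately 1 for cluster 0 and approximately -1 for cluster 1, the kind of opposite-sign pattern the abstract reports.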