The major weakness of virtual screening, its inability to consistently identify true positives (leads), is likely due to imprecise scoring algorithms. Consensus scoring has been shown to improve enrichment of true positives, but previous studies have yet to provide a theoretical analysis that gives insight into the real features of combination and data fusion for virtual screening. In this paper, we demonstrate that combining multiple scoring functions improves enrichment of true positives only if (a) each individual scoring function has relatively high performance and (b) the individual scoring functions are distinctive. This work thus establishes a potential theoretical basis for the probable success of data fusion approaches in improving yields in in silico screening experiments. We provide initial validation of this theoretical approach using data from five scoring systems with two evolutionary docking algorithms on four targets: thymidine kinase, human dihydrofolate reductase, and the estrogen receptor in its antagonist and agonist forms. The results show a notable improvement over single scoring algorithms in several measures of scoring quality, specifically "goodness-of-hit" scores, false positive rate, and enrichment.
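As an informal illustration of conditions (a) and (b), the following minimal sketch combines two scoring functions by averaging ranks and compares enrichment before and after fusion. It is not the procedure used in this work: the function names (`ranks_from_scores`, `rank_combination`, `enrichment_factor`), the choice of mean-rank fusion, the convention that higher scores are better, and the ten-compound toy library are all assumptions made for illustration, and the scores are invented rather than taken from any docking run.

```python
from typing import List, Sequence


def ranks_from_scores(scores: Sequence[float], higher_is_better: bool = True) -> List[float]:
    """Return 1-based ranks (1 = best); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next compound has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_combination(score_lists: Sequence[Sequence[float]],
                     higher_is_better: bool = True) -> List[float]:
    """Consensus rank per compound: mean of its ranks over all scoring functions."""
    all_ranks = [ranks_from_scores(s, higher_is_better) for s in score_lists]
    n = len(score_lists[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]


def enrichment_factor(ranks: Sequence[float], is_active: Sequence[bool],
                      top_fraction: float) -> float:
    """Hit rate in the top-ranked fraction divided by the hit rate in the whole library."""
    n = len(ranks)
    n_top = max(1, round(n * top_fraction))
    top = sorted(range(n), key=lambda i: ranks[i])[:n_top]
    hits_top = sum(1 for i in top if is_active[i])
    total_actives = sum(1 for a in is_active if a)
    return hits_top * n / (n_top * total_actives)


# Hypothetical toy library: 10 compounds, of which compounds 0 and 1 are actives.
is_active = [True, True, False, False, False, False, False, False, False, False]
# Invented scores: each scorer ranks one active first and the other third,
# so both perform reasonably (a) but order the library differently (b).
score_a = [9.0, 5.0, 8.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.2, 0.1]
score_b = [5.0, 9.0, 1.0, 8.0, 4.0, 3.0, 2.0, 0.5, 0.2, 0.1]

ef_a = enrichment_factor(ranks_from_scores(score_a), is_active, 0.2)
ef_b = enrichment_factor(ranks_from_scores(score_b), is_active, 0.2)
ef_consensus = enrichment_factor(rank_combination([score_a, score_b]), is_active, 0.2)

print(ef_a, ef_b, ef_consensus)  # prints: 2.5 2.5 5.0
```

In this constructed case each scorer alone places one of the two actives in the top 20% of the library, while the mean-rank consensus places both there. The example is deliberately favorable; if either scorer performed poorly, or if the two ranked the library almost identically, the same fusion step would not be expected to help.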