Structural key bit occurrence frequencies and dependencies in PubChem and their effect on similarity searches

Nelson Chen*, Val Golovlev

*Corresponding author for this work

Research output: Article, peer-reviewed

2 Citations (Scopus)

Abstract

Little published literature exists on the 881-bit structural keys that PubChem uses to categorize and compare the compounds in its database. We characterized these structural keys by examining their frequencies of occurrence within the PubChem compound database. In addition, we determined bit dependencies, defined as the universal presence of one bit given the presence of another. We show that the vast majority of bits are rarely set and that substantial numbers of dependencies exist. Using the Tanimoto coefficient, we compared similarity searches with five drugs approved by the United States Food and Drug Administration as reference compounds, run with the full structural keys and with a variant in which all dependent bits were removed. These bit dependencies not only affect similarity scores but also alter the compounds returned in similarity searching. Judicious selection of bits is needed to maintain sufficient ability to differentiate related compounds.
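
To make the two central notions of the abstract concrete, the following minimal Python sketch finds bit dependencies in a toy set of fingerprints and compares a Tanimoto score before and after dependent bits are removed. It is an illustration under stated assumptions, not the authors' procedure: the function names, the four-bit toy fingerprints, and the rule for which bit of a dependent pair is dropped (the implied bit, or the higher index when two bits always co-occur) are all hypothetical. Fingerprints are represented as sets of set-bit indices rather than as 881-bit vectors.

```python
from itertools import permutations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared set bits divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_dependencies(fingerprints: list[set[int]]) -> set[tuple[int, int]]:
    """Return pairs (i, j) where every fingerprint with bit i set also has bit j set."""
    # Map each bit to the indices of the fingerprints in which it is set.
    holders: dict[int, set[int]] = {}
    for idx, fp in enumerate(fingerprints):
        for bit in fp:
            holders.setdefault(bit, set()).add(idx)
    # Bit i implies bit j when the holders of i are a subset of the holders of j.
    return {(i, j) for i, j in permutations(holders, 2) if holders[i] <= holders[j]}


def bits_to_drop(deps: set[tuple[int, int]]) -> set[int]:
    """Pick the bits to remove (assumed rule, not taken from the paper)."""
    drop = set()
    for i, j in deps:
        if (j, i) in deps:
            drop.add(max(i, j))  # bits that always co-occur: keep the lower index
        else:
            drop.add(j)          # j is implied whenever i is set
    return drop


# Toy data: four hypothetical compounds over four bits.
fps = [{0, 1, 2}, {0, 1}, {0, 3}, {0, 1, 2, 3}]
deps = find_dependencies(fps)
drop = bits_to_drop(deps)
reduced = [fp - drop for fp in fps]

print(sorted(deps))                      # [(1, 0), (2, 0), (2, 1), (3, 0)]
print(tanimoto(fps[0], fps[3]))          # 0.75 with the full keys
print(tanimoto(reduced[0], reduced[3]))  # 0.5 after dropping bits 0 and 1
```

In this toy case the same pair of compounds scores 0.75 with all bits and 0.5 once the dependent bits are removed, which mirrors the abstract's point that dependencies influence similarity scores.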

Original language: English
Pages (from-to): 355-361
Number of pages: 7
Journal: Molecular Informatics
Volume: 32
Issue number: 4
Publication status: Published - 1 April 2013
