Abstract
The fingerprints widely used for similarity-based virtual screening typically encode only the presence or absence of fragments, with no indication of their relative importance. This chapter discusses the use of weighted fingerprints, in which each fragment is assigned a weight reflecting its importance when quantifying the similarity between a reference structure and a database structure. Extensive studies using the World of Molecular Bioactivity and MDL Drug Data Report databases show that weighting fragments by their frequency of occurrence within a molecule can increase screening effectiveness, whereas weighting fragments by their frequency of occurrence within a database does not.
Keywords: Chemoinformatics, ECFC4 fingerprint, extended connectivity fingerprint counts fingerprint, fingerprint, fragment weighting scheme, frequency weighting, IDF weighting, information retrieval, inverse frequency weighting, ligand-based virtual screening, MDL Drug Data Report database, similarity-based virtual screening, similarity coefficient, similarity searching, TF weighting, virtual screening, weighting scheme, World of Molecular Bioactivity database.
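To make the contrast between the two kinds of weighting in the abstract concrete, the following is a minimal sketch, not the chapter's own implementation. It assumes that a fingerprint is a Python dict mapping a hypothetical integer fragment identifier to that fragment's occurrence count in one molecule (as in a count fingerprint such as ECFC4). It further assumes that within-molecule (TF) weighting uses the raw count, that within-database (IDF) weighting uses the common information-retrieval form log(N / n_f), and that similarity is measured with the continuous form of the Tanimoto coefficient. The chapter may use different TF, IDF or coefficient variants.

```python
import math
from typing import Dict, List

# Assumed representation: hypothetical fragment identifier -> occurrence count in one molecule.
Fingerprint = Dict[int, int]
Weights = Dict[int, float]


def binary_weights(fp: Fingerprint) -> Weights:
    """Conventional presence/absence fingerprint: every fragment present gets weight 1."""
    return {f: 1.0 for f, c in fp.items() if c > 0}


def tf_weights(fp: Fingerprint) -> Weights:
    """Within-molecule frequency weighting (assumed form: weight = raw occurrence count)."""
    return {f: float(c) for f, c in fp.items() if c > 0}


def document_frequencies(database: List[Fingerprint]) -> Dict[int, int]:
    """Count the database molecules that contain each fragment at least once."""
    df: Dict[int, int] = {}
    for fp in database:
        for f, c in fp.items():
            if c > 0:
                df[f] = df.get(f, 0) + 1
    return df


def idf_weights(fp: Fingerprint, df: Dict[int, int], n_molecules: int) -> Weights:
    """Within-database frequency weighting (assumed form: log(N / n_f)).

    Rare fragments receive large weights; a fragment present in every molecule gets 0.
    Fragments absent from the database statistics are skipped in this sketch.
    """
    return {
        f: math.log(n_molecules / df[f])
        for f, c in fp.items()
        if c > 0 and f in df
    }


def tanimoto(a: Weights, b: Weights) -> float:
    """Continuous Tanimoto coefficient: sum(ab) / (sum(a^2) + sum(b^2) - sum(ab))."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = sum(w * w for w in a.values())
    norm_b = sum(w * w for w in b.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy, made-up fingerprints for illustration only.
    reference = {101: 3, 202: 1, 303: 2}
    database = [
        {101: 1, 202: 1},
        {101: 2, 303: 2, 404: 1},
        {202: 4, 404: 1},
    ]
    df = document_frequencies(database)
    for i, mol in enumerate(database):
        print(
            i,
            round(tanimoto(binary_weights(reference), binary_weights(mol)), 3),
            round(tanimoto(tf_weights(reference), tf_weights(mol)), 3),
            round(tanimoto(idf_weights(reference, df, len(database)),
                           idf_weights(mol, df, len(database))), 3),
        )
```

For the second database molecule, the binary similarity is 2 / (3 + 3 - 2) = 0.5. With TF weights it becomes 10 / (14 + 9 - 10), about 0.769, because the shared fragments occur several times in both molecules. This difference is exactly the kind of information that a presence/absence fingerprint discards.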