Abstract
The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique for analyzing microarrays. In genomics, clustering applied to the genes in a set of microarray data groups together those genes whose expression levels behave similarly across the samples; applied to the samples, it offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it has remained highly problematic. The choice of a clustering algorithm and validation index is not trivial, particularly when they are applied to high-throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem relative to the available computational power. In some cases a very simple algorithm may be appropriate, but many situations require a more complex and powerful algorithm better suited to the job at hand. In this paper, we cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
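As a rough illustration of what clustering genes by expression behavior can look like, the sketch below runs a plain k-means (Lloyd's algorithm) on a tiny expression matrix. The choice of k-means, the gene names, the expression values, and the function name `kmeans` are all assumptions made for this example; the abstract does not single out any one algorithm, and real microarray data would need normalization and a more careful initialization.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance."""
    # Deterministic initialization for illustration only: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each gene goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: rows are genes, columns are samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],  # rising across samples
    "geneB": [4.0, 3.0, 2.0, 1.0],  # falling across samples
    "geneC": [1.1, 2.1, 2.9, 4.2],
    "geneD": [3.9, 3.1, 2.1, 0.8],
    "geneE": [0.9, 1.8, 3.2, 3.9],
    "geneF": [4.1, 2.8, 1.9, 1.2],
}

genes = list(expression)
labels = kmeans([expression[g] for g in genes], k=2)

# Collect gene names per cluster label.
clusters = {}
for gene, label in zip(genes, labels):
    clusters.setdefault(label, []).append(gene)
print(clusters)
```

Transposing the matrix so that each row is a sample rather than a gene would cluster samples instead, which corresponds to the second use described in the abstract.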
Current Genomics
Title: Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics
Volume: 10 Issue: 6
Author(s): Lori Dalton, Virginia Ballarin and Marcel Brun
Cite this article as:
Dalton Lori, Ballarin Virginia and Brun Marcel, Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics, Current Genomics 2009; 10(6). https://dx.doi.org/10.2174/138920209789177601
DOI: https://dx.doi.org/10.2174/138920209789177601
Print ISSN: 1389-2029
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5488