Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

Authors

  • Sorana D. BOLBOACĂ "Iuliu Hatieganu" University of Medicine and Pharmacy Cluj-Napoca

Keywords:

quantitative Structure-Activity Relationship (qSAR), Molecular Descriptors Family on Vertices (MDFV), Anticancer drug, Generalized Cluster Analysis.

Abstract

Aim: The appropriateness of the random assignment of compounds to training and validation sets was assessed using the generalized cluster analysis technique. Material and Method: A quantitative Structure-Activity Relationship model built with Molecular Descriptors Family on Vertices was evaluated with respect to the assignment of carboquinone derivatives to training and test sets during leave-many-out analysis. The assignment of compounds was investigated using five variables: the observed anticancer activity and four structure descriptors. Generalized cluster analysis with the K-means algorithm was applied to investigate whether or not the compounds had been properly assigned. The Euclidean distance, maximization of the initial between-cluster distances, and v-fold cross-validation with v = 10 were used. Results: All five variables included in the analysis made a statistically significant contribution to the identification of clusters. Three clusters were identified, each containing carboquinone derivatives from both the training and the test sets. The observed activity of the carboquinone derivatives proved to be normally distributed within every cluster. The presence of training and test compounds in all clusters identified by generalized cluster analysis with the K-means algorithm, together with the distribution of observed activity within clusters, supports a proper assignment of compounds to the training and test sets. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method for assessing the random assignment of carboquinone derivatives to training and test sets.
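The abstract describes the clustering check only at a high level. The following is a minimal sketch, not the software or settings the author used. It assumes a fixed number of clusters (k = 3, the number reported) instead of selecting k by 10-fold cross-validation. It also assumes z-score standardization of the variables, because the abstract does not say whether they were scaled. "Maximization of the initial between-cluster distances" is read here as farthest-point (maximin) selection of the starting centers. All function names and the toy data are hypothetical. The normality test of observed activity within clusters is not reproduced.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Z-score each column (assumption: scaling is not stated in the abstract)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Sample SD; a constant column (SD = 0) falls back to 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def maximin_init(points, k):
    """Start from the first point, then repeatedly add the point whose
    nearest already-chosen center is farthest away (maximin seeding)."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(nxt)
    return [list(c) for c in centers]


def kmeans(points, k, max_iter=100):
    """Plain K-means (Lloyd iterations) with Euclidean distance."""
    centers = maximin_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def sets_per_cluster(labels, assignment):
    """Count 'training' and 'test' compounds in each cluster."""
    return {j: Counter(a for lab, a in zip(labels, assignment) if lab == j)
            for j in sorted(set(labels))}


def every_cluster_mixed(labels, assignment):
    """True if every cluster holds at least one training and one test compound."""
    return all(c["training"] > 0 and c["test"] > 0
               for c in sets_per_cluster(labels, assignment).values())


if __name__ == "__main__":
    # Hypothetical toy data (two variables, three obvious groups), not the paper's data.
    data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
            [10.0, 0.0], [10.1, 0.2], [9.9, -0.1]]
    assignment = ["training", "test", "training",
                  "training", "training", "test",
                  "test", "training", "training"]
    labels, _ = kmeans(standardize(data), 3)
    print(sets_per_cluster(labels, assignment))
    print(every_cluster_mixed(labels, assignment))
```

In this reading, a proper random split is one in which no cluster is made up only of training compounds or only of test compounds. `every_cluster_mixed` encodes that criterion. With real data, the rows would hold the observed activity and the four descriptors for each carboquinone derivative.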

Author Biography

Sorana D. BOLBOACĂ, "Iuliu Hatieganu" University of Medicine and Pharmacy Cluj-Napoca

Department of Medical Informatics and Biostatistics

Assist. Prof., Ph.D., M.Sc., M.D.


Published

15.06.2011

How to Cite

BOLBOACĂ SD. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique. Appl Med Inform [Internet]. 2011 Jun. 15 [cited 2024 Apr. 20];28(2):9-14. Available from: https://ami.info.umfcluj.ro/index.php/AMI/article/view/225
