Evaluation of Classification in More Than Two Classes

Authors

  • Daniel Volovici, Lucian Blaga University of Sibiu

Abstract

Machine learning is the most important part of artificial intelligence, in the same sense that we cannot speak about intelligence without the capacity to learn. One of the basic types of learning is learning to classify objects, that is, to put labels on them. If you can recognize whether or not an object has the attributes of a class C (if not, it belongs to the class non-C), then you can also classify into more than two classes, using either the one-vs-all or the one-vs-one strategy. Classification as a learning task implies training with examples of objects labeled a priori with the class to which they belong. If the data come with no class definitions, splitting them into groups is called clustering. The idea behind clustering is that the data were probably produced by different processes, or that they belong naturally to different groups. So the best way to evaluate the quality of a clustering is to cluster data that were generated to belong to different classes. The most widely used tool for evaluating classification and clustering methods is the confusion matrix, defined for two classes. From this matrix the measures of Precision, Recall, and the F-measure are obtained. There is a generalization to n classes that uses an n x n matrix. When the number of clusters differs from the number of original classes, however, we must use an n x m contingency matrix, also called an association matrix. Because the degree of association is measured by the dominance of the principal diagonal, it is very important to use time-efficient methods for manipulating the rows and columns of these matrices.
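
The measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the convention that rows of the matrix are true classes and columns are predicted classes (or clusters), and it computes Precision, Recall and the F-measure per class in the one-vs-all sense. The second function is only one possible way to make the principal diagonal of an n x m contingency matrix dominant: a simple greedy pairing of rows with columns, chosen here for illustration and not necessarily the method used in the article. The function names `per_class_scores` and `greedy_diagonal_pairs` are hypothetical.

```python
def per_class_scores(matrix):
    """Return (precision, recall, f_measure) for each class of an n x n
    confusion matrix, where matrix[i][j] counts objects of true class i
    that were assigned to class j (assumed convention)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]                                   # correctly labeled as c
        predicted_c = sum(matrix[i][c] for i in range(n))   # column total
        actual_c = sum(matrix[c])                           # row total
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f_measure))
    return scores


def greedy_diagonal_pairs(table):
    """Pair classes (rows) with clusters (columns) of an n x m contingency
    table by repeatedly taking the largest remaining cell whose row and
    column are both still unused. This is an illustrative heuristic; it
    does not guarantee the optimal assignment."""
    n, m = len(table), len(table[0])
    cells = sorted(
        ((table[i][j], i, j) for i in range(n) for j in range(m)),
        reverse=True,
    )
    used_rows, used_cols, pairs = set(), set(), []
    for _count, i, j in cells:
        if i not in used_rows and j not in used_cols:
            pairs.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return sorted(pairs)


if __name__ == "__main__":
    # Made-up example data: 3 classes, and 3 classes vs. 4 clusters.
    confusion = [[5, 1, 0], [2, 6, 2], [0, 1, 8]]
    contingency = [[0, 9, 1, 0], [8, 0, 0, 2], [1, 0, 3, 6]]
    for c, (p, r, f) in enumerate(per_class_scores(confusion)):
        print(f"class {c}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
    print("class-to-cluster pairs:", greedy_diagonal_pairs(contingency))
```

Reordering the columns according to the returned pairs places the largest matched counts on the principal diagonal; clusters left unpaired (when m > n) stay as extra columns. An exact assignment method could replace the greedy step where optimality matters, at a higher computational cost.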

Published

2016-12-01

Section

Articles