THE WEKA MULTILAYER PERCEPTRON CLASSIFIER

Daniel Morariu, Radu Crețulescu, Macarie Breazu

Abstract


Automatic document classification is a necessity when dealing with large collections of documents. WEKA, and especially the Weka Knowledge Flow Environment, is a state-of-the-art tool for developing classification applications, even without programming skills. We continue the WEKA project presented in a previous paper, but change the classification step, which now uses the Multilayer Perceptron Classifier. The dataset is built from documents of the Reuters Corpus in vector space model representation, with the number of features reduced by the InformationGain method. The theoretical basis of Multilayer Perceptron neural networks is presented, covering both the architecture and the backpropagation learning algorithm. To evaluate the performance of the Multilayer Perceptron Classifier, experiments were run first with the default network architecture. The results are valuable, but performance decreases for a large number of features. To improve the results, we test different fine-tuned architectures by changing the number of neurons in the hidden layer. We conclude that the Weka Multilayer Perceptron Classifier deserves attention, but mainly when time requirements are not important.
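As an illustration of the backpropagation learning algorithm and of the hidden-layer size mentioned above, the following minimal Python sketch trains a network with one hidden layer of sigmoid units. It is not the Weka implementation and does not use the Reuters data: the class TinyMLP, the function train, the toy OR dataset, the learning rate and the epoch count are all assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron trained by online backpropagation."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = random.Random(seed)
        # Each row holds one unit's incoming weights; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                         for _ in range(n_outputs)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
                  for row in self.w_output]
        return hidden, output

    def train_step(self, x, target, lr):
        hidden, output = self.forward(x)
        # Output deltas: error times the sigmoid derivative o * (1 - o).
        delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
        # Hidden deltas: propagate the output deltas back through the
        # output weights (computed before those weights are updated).
        delta_hid = [h * (1.0 - h) * sum(d * self.w_output[k][j]
                                         for k, d in enumerate(delta_out))
                     for j, h in enumerate(hidden)]
        for k, d in enumerate(delta_out):
            for j, v in enumerate(hidden + [1.0]):
                self.w_output[k][j] += lr * d * v
        for j, d in enumerate(delta_hid):
            for i, v in enumerate(x + [1.0]):
                self.w_hidden[j][i] += lr * d * v

    def mean_squared_error(self, data):
        total = 0.0
        for x, target in data:
            _, output = self.forward(x)
            total += sum((t - o) ** 2 for t, o in zip(target, output))
        return total / len(data)


# Toy data (logical OR), an assumption standing in for document vectors.
DATA = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]


def train(n_hidden, epochs=500, lr=0.5):
    """Train on DATA and return the error before and after training."""
    net = TinyMLP(2, n_hidden, 1)
    before = net.mean_squared_error(DATA)
    for _ in range(epochs):
        for x, target in DATA:
            net.train_step(x, target, lr)
    return before, net.mean_squared_error(DATA)
```

Calling train with different values of n_hidden mirrors, on a very small scale, the idea of varying the number of hidden neurons. The weight matrices grow with both the number of input features and the number of hidden units, which is consistent with the training-time concern raised in the abstract.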


References


Breazu, M. Tehnici fractale şi neuronale în compresia de imagini, Editura Universitatii „Lucian Blaga” din Sibiu, ISBN 973-739-251-5, Sibiu, 2006.

Cretulescu, R., Morariu, D., Breazu, M. - Using WEKA framework in document classification, The 7th International Conference on Information Science and Information Literacy, ISSN 2067-9882, April 2016, Sibiu.

Han, J., Kamber, M. - Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001

Manning, C., - An Introduction to Information Retrieval, Cambridge University Press, 2009

Mitchell, T. M., Machine Learning, The McGraw-Hill Companies, 1997

Mitkov, R., The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005

Misha Wolf and Charles Wicksteed – Reuters Corpus: http://trec.nist.gov/data/reuters/reuters.html, accessed in 03.2016

Morariu D., Cretulescu R., Breazu, M. - Feature Selection in Document Classification, The fourth International Conference in Romania of Information Science and Information Literacy, ISSN-L 2247-0255, April 2013, Sibiu

Witten, I. H., Hall, E. F., Pal, C. J., Data Mining – Practical Machine Learning Tools and Techniques with Java Implementation, Morgan Kaufmann Press, 2000

http://www.cs.waikato.ac.nz/ml/weka/, accessed in 03.2016

http://www.cs.waikato.ac.nz/ml/weka/documentation.html, accessed 03.2016




IJSASITELS is indexed by PKP Index, ROAD, Google Scholar