CLASSIFICATION BASED ON DATA GRAVITATION AND POSTERIOR PROBABILITY
Abstract
Data gravitation-based classification (DGC) is a recent classification technique that uses data gravitation as its classification criterion. In DGC, an object is assigned to the class that exerts the largest gravitation on it. However, DGC may produce inaccurate results when the training data suffer from the class imbalance problem. A class with an excessively large mass produces a high degree of data gravitation, so an unknown object tends to be classified as a member of that class; conversely, a class with a very small mass tends to be overlooked.
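The imbalance effect described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each class is collapsed into a single particle located at the class centroid, that the particle's mass is the class prior probability, and that gravitation is mass divided by squared Euclidean distance. The function names and the toy data are hypothetical.

```python
from collections import Counter

def class_particles(X, y):
    """Collapse each class into one particle: (centroid, mass).

    Assumption: mass is the class prior probability, i.e. the
    fraction of training points that belong to the class.
    """
    n = len(y)
    particles = {}
    for label, count in Counter(y).items():
        members = [x for x, lab in zip(X, y) if lab == label]
        centroid = [sum(col) / count for col in zip(*members)]
        particles[label] = (centroid, count / n)
    return particles

def gravitation(x, centroid, mass, eps=1e-12):
    """Gravitation of one particle on x: mass / squared distance."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, centroid))
    return mass / (r2 + eps)  # eps avoids division by zero at the centroid

def dgc_predict(x, particles):
    """Assign x to the class whose particle exerts the largest gravitation."""
    return max(particles, key=lambda label: gravitation(x, *particles[label]))

# Toy imbalanced training set: nine points of class "A" centred on the
# origin and a single point of class "B" at (4, 0).
X = [[-1, 0], [1, 0], [0, -1], [0, 1], [0, 0],
     [-1, -1], [1, 1], [-1, 1], [1, -1], [4, 0]]
y = ["A"] * 9 + ["B"]
particles = class_particles(X, y)

# The query is closer to the "B" centroid (distance 1.5) than to the
# "A" centroid (distance 2.5), yet the heavier class wins.
label_mid = dgc_predict([2.5, 0.0], particles)
# Very close to "B", the distance term dominates the mass term.
label_near = dgc_predict([3.9, 0.0], particles)
```

In this toy setting the query at (2.5, 0) is assigned to the majority class even though the minority class is nearer, which is the kind of bias the abstract attributes to an excessively large class mass.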
In this research, the DGC method is modified by constructing a classification method based on both data gravitation and posterior probability (DGCPP). In DGCPP, the mass concept, which the DGC method defines as the prior probability, is replaced by the posterior probability. With this modification, the data gravitation calculation is expected to produce more accurate results than the DGC method. In addition, by improving the data gravitation calculation, the DGCPP method is expected to produce more accurate classification results than the DGC method for both normal datasets and datasets with class imbalance problems. Classification accuracy was evaluated with ten-fold cross-validation on several datasets, including both normal and imbalanced-class datasets. The results showed that the DGCPP method produced a positive average accuracy difference compared with the DGC method. In the tests on all normal datasets, the average accuracy difference was statistically significant at the 95% confidence level. The tests on the four imbalanced-class datasets likewise showed that the average accuracy difference was statistically significant at the 95% confidence level. Finally, tests of the computing time required by the classification program showed that, across all datasets used, the additional computing time needed by the DGCPP method relative to the DGC method is insignificant and less than the human response time.
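The replacement of the prior-probability mass by a posterior-probability mass can be sketched as follows. The abstract does not state how the posterior is estimated, so this example makes its own assumptions: one particle per class at a given centroid, gravitation equal to mass divided by squared distance, and a posterior obtained from Bayes' rule with an isotropic Gaussian likelihood of fixed width `sigma`. All names and numbers are hypothetical and chosen only for illustration.

```python
import math

def squared_distance(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

def posterior(x, centroids, priors, sigma=1.0):
    """P(class | x) from Bayes' rule.

    Assumption: the likelihood of x given a class is an isotropic
    Gaussian centred on the class centroid with width sigma.
    """
    scores = {
        label: priors[label]
        * math.exp(-squared_distance(x, c) / (2 * sigma ** 2))
        for label, c in centroids.items()
    }
    total = sum(scores.values())
    if total == 0.0:  # every likelihood underflowed; fall back to the priors
        return dict(priors)
    return {label: s / total for label, s in scores.items()}

def predict(x, centroids, masses, eps=1e-12):
    """Pick the class whose particle exerts the largest gravitation on x."""
    def force(label):
        return masses[label] / (squared_distance(x, centroids[label]) + eps)
    return max(centroids, key=force)

# Hypothetical imbalanced setting: class "A" holds 90% of the data.
centroids = {"A": [0.0, 0.0], "B": [4.0, 0.0]}
priors = {"A": 0.9, "B": 0.1}
x = [2.5, 0.0]  # nearer to the "B" centroid than to the "A" centroid

post = posterior(x, centroids, priors)
dgc_label = predict(x, centroids, priors)   # mass = prior probability
dgcpp_label = predict(x, centroids, post)   # mass = posterior probability
```

Under these assumptions the prior-mass rule assigns the query to the majority class, while the posterior-mass rule assigns it to the nearer minority class, because the posterior already discounts the majority class for being far from the query.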
Keywords: data gravitation-based classification, class imbalance problem, posterior probability
Full Text: pages 14-22 (in Indonesian)
DOI: http://dx.doi.org/10.53567/spirit.v7i1.23
Copyright (c) 2016 Jurnal SPIRIT
SPIRIT : Sarana Penunjang Informasi Terkini
Published by Information Technology, Institut Teknologi dan Bisnis Yadika Pasuruan
Editorial address: Jl. Bader No.9, Kwangsan, Kalirejo, Kec. Bangil, Pasuruan, Jawa Timur 67153
Tel/Fax: (0343) 742070, Email: lppm@stmik-yadika.ac.id
This work is licensed under a Creative Commons Attribution 4.0 International License.