A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
Abstract
Feature subset clustering is a powerful technique for reducing the dimensionality of feature vectors in text classification. In this paper, we propose a similarity-based self-constructing algorithm for feature clustering that builds on the K-Means strategy. The words in the feature vector of a document set are grouped into clusters based on a similarity test: words that are similar to one another are placed in the same cluster, and a representative word (head) is derived for each cluster.
With the FAST algorithm, the derived membership functions match closely with, and properly describe, the real distribution of the training data. Moreover, the user need not specify the number of extracted features in advance, so trial and error for determining the appropriate number of extracted features is avoided. Experimental results show that our FAST implementation runs faster and extracts better features than other methods.
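The self-constructing clustering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes words are represented as numeric vectors (e.g. per-class occurrence counts or TF-IDF statistics), uses cosine similarity with a hypothetical `threshold` parameter as the similarity test, and takes each cluster head to be the mean of its member vectors. A new cluster is created whenever a word is not similar enough to any existing head, which is why the number of extracted features need not be fixed in advance.

```python
import numpy as np

def self_constructing_cluster(word_vectors, threshold=0.8):
    """Group word vectors into clusters by a cosine-similarity test.

    A new cluster is opened whenever a vector is not similar enough
    to any existing cluster head, so the number of clusters (i.e.
    extracted features) is determined by the data, not given up front.
    Returns the cluster heads and the member indices of each cluster.
    """
    heads = []      # mean vector ("head") of each cluster
    members = []    # word indices assigned to each cluster
    for i, v in enumerate(word_vectors):
        v = np.asarray(v, dtype=float)
        best, best_sim = None, -1.0
        for c, h in enumerate(heads):
            # cosine similarity between the word and the cluster head
            sim = (v @ h) / (np.linalg.norm(v) * np.linalg.norm(h) + 1e-12)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # similar enough: join the best cluster and refresh its head
            members[best].append(i)
            heads[best] = np.mean([word_vectors[j] for j in members[best]],
                                  axis=0)
        else:
            # not similar to any head: start a new cluster
            heads.append(v.copy())
            members.append([i])
    return heads, members
```

Each cluster's head then serves as one extracted feature, so a document's reduced representation has one component per cluster instead of one per word.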
Copyright © 2013, All rights reserved.| ijseat.com
International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License, based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.