Feature Extraction and Duplicate Detection for Text Mining: A Survey

Keywords

text feature extraction
text mining
query search
text classification

How to Cite

Ramya R S, & Venugopal K R. (2017). Feature Extraction and Duplicate Detection for Text Mining: A Survey. Global Journal of Computer Science and Technology, 16(C5), 1–20. Retrieved from https://gjcst.com/index.php/gjcst/article/view/810

Abstract

Text mining, also known as Intelligent Text Analysis, is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of data. Feature extraction is one of the important techniques in data reduction to discover the most important features. Processing massive amounts of data stored in an unstructured form is a challenging task. Several pre-processing methods and algorithms are needed to extract useful features from huge amounts of data. The survey covers different text summarization, classification, and clustering methods to discover useful features, as well as the discovery of query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with a collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence, we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques to extract relevant features, detect duplicates, and replace the duplicate data, so as to provide fine-grained knowledge to the user.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2016 Authors and Global Journals Private Limited