Abstract
Text mining, also known as Intelligent Text Analysis, is an important research area. The high dimensionality of the data makes it very difficult to focus on the most appropriate information. Feature extraction is one of the important data reduction techniques for discovering the most important features. Processing massive amounts of data stored in an unstructured form is a challenging task. Several pre-processing methods and algorithms are needed to extract useful features from huge amounts of data. The survey covers different text summarization, classification, and clustering methods for discovering useful features, as well as the discovery of query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with a collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence, we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques for extracting relevant features, detecting duplicates, and replacing duplicate data so as to deliver fine-grained knowledge to the user.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2016 Authors and Global Journals Private Limited