First rule of document pre-processing: an improper pre-processing scheme may lead to loss of lexical content. Hence, pre-processing steps are specific to the problem. That said, a few pre-processing steps apply to most applications: a. Tokenization b. Normalization c. Substitution. Other well-known pre-processing steps: a. Case folding b. Stemming c. Lemmatization d. Removing misspellings e. Punctuation handling.

What is a stemming operation? - The process of reducing an inflected word to its root. An inflected word is a noun, verb, or adjective with one or more letters added to express a different grammatical form.

What is lemmatization? - Here, too, an inflected word is reduced to its root. The difference is that the result of stemming need not be a proper word in the vocabulary, whereas the result of lemmatization has to be part of the given language's vocabulary.

What is case folding and its usage? - Case-folding is a part of the Unicode s...
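
To make the difference between stemming and lemmatization concrete, here is a minimal sketch using only the Python standard library. The suffix list, the lemma table, and the helper names (tokenize, toy_stem, toy_lemmatize) are assumptions made up for illustration; they are not a real stemmer or lemmatizer, which would use far richer rules and a full vocabulary. Case folding uses Python's built-in str.casefold().

```python
import re

# Hypothetical, deliberately tiny suffix list (real stemmers use many more rules).
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Hypothetical hand-written lemma table standing in for a real vocabulary lookup.
LEMMAS = {"ponies": "pony", "running": "run", "ran": "run"}


def tokenize(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    return re.findall(r"\w+", text)


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the lemma table; the result is a vocabulary word."""
    return LEMMAS.get(word, word)


# Case folding first, so "Ponies" and "ponies" are treated as the same token.
tokens = [t.casefold() for t in tokenize("Ponies, running; Ran!")]
stems = [toy_stem(t) for t in tokens]
lemmas = [toy_lemmatize(t) for t in tokens]

print(tokens)  # ['ponies', 'running', 'ran']
print(stems)   # ['pon', 'runn', 'ran']
print(lemmas)  # ['pony', 'run', 'run']
```

In this sketch the stems "pon" and "runn" are not words in the vocabulary, while the lemmas "pony" and "run" are, which is the distinction drawn above.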