
Typical Document Processing Operations

First rule of document pre-processing: improper pre-processing schemes may lead to loss of lexical content.
Hence, pre-processing steps are specific to the problem. Having said that, there are a few pre-processing steps that apply to most applications.
They are: a. tokenization b. normalization c. substitution.
Other well-known pre-processing steps: a. case folding b. stemming c. lemmatization d. removing misspellings e. punctuation removal.
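
To make the three common steps concrete, here is a minimal sketch in Python using only the standard library. The function names, the URL pattern, and the "URL" placeholder are illustrative assumptions, not a prescribed pipeline; as noted above, the right choices depend on the problem.

```python
import re
import unicodedata

# Hypothetical patterns chosen for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"\w+")


def substitute(text):
    """Substitution: replace every URL with a fixed placeholder word."""
    return URL_PATTERN.sub("URL", text)


def normalize(text):
    """Normalization: unify Unicode forms (NFKC) and fold case."""
    return unicodedata.normalize("NFKC", text).casefold()


def tokenize(text):
    """Tokenization: split into runs of word characters, dropping punctuation."""
    return TOKEN_PATTERN.findall(text)


def preprocess(text):
    """Apply substitution, then normalization, then tokenization."""
    return tokenize(normalize(substitute(text)))


print(preprocess("Visit https://example.com NOW!"))
```

Note that this tokenizer silently discards punctuation, which is exactly the kind of lexical loss the first rule warns about; whether that is acceptable depends on the application.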

What is a stemming operation?
- The process of reducing an inflected word to its root (stem). An inflected word is a word with one or more letters added to a noun, verb, or adjective to express a different grammatical form.
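
As a rough illustration of the idea, the sketch below strips a few common English suffixes. It is a toy written for this post, not the Porter algorithm or any library's stemmer; the suffix list and the minimum stem length are assumptions.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ("jumping", "jumped", "cats", "studies"):
    print(w, "->", naive_stem(w))
```

The last case yields "studi", which is not an English word. That is typical of stemming: the output is a stem, not necessarily a vocabulary entry.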

What is lemmatization?
- Lemmatization also reduces an inflected word to its root. However, the result of stemming need not be a proper word in the vocabulary, whereas the result of lemmatization (the lemma) has to be part of the given language's vocabulary.
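
One simple way to picture lemmatization is as a vocabulary lookup. The sketch below uses a tiny hand-written table, which is purely hypothetical; real lemmatizers rely on a full dictionary and usually on part-of-speech information as well.

```python
# Hypothetical lemma table for illustration; a real one would be far larger.
LEMMA_TABLE = {
    "studies": "study",
    "ran": "run",
    "mice": "mouse",
    "went": "go",
}


def lemmatize(word):
    """Return the dictionary form of a word, or the lowercased word if unknown."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)


for w in ("studies", "Mice", "went", "table"):
    print(w, "->", lemmatize(w))
```

Unlike suffix stripping, every value the lookup returns from the table is a real word, and irregular forms such as "went" and "mice" can be handled.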

What is case folding and its usage?
- Case folding is part of the Unicode standard. It allows any two strings that differ from one another only by case to map to the same "case-folded" form, even when those strings include characters with complex case mappings. In simple terms: convert all letters to a single case, either upper case or lower case, whichever is chosen. It helps in normalization and in making text searches relevant.
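
In Python, the built-in `str.casefold()` method implements Unicode case folding. The short sketch below compares it with plain lowercasing; the helper name `same_ignoring_case` is made up for this example.

```python
def same_ignoring_case(a, b):
    """Compare two strings after Unicode case folding."""
    return a.casefold() == b.casefold()


# German sharp s is a character with a complex case mapping:
# case folding maps it to "ss", while lower() leaves it unchanged.
print(same_ignoring_case("Straße", "STRASSE"))
print("Straße".lower() == "STRASSE".lower())
```

The first comparison succeeds and the second does not, which is why case folding, rather than simple lowercasing, is the safer choice for caseless matching in search.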
