Posts

Showing posts from August 4, 2019

Typical Document Processing Operations

First rule of document pre-processing: an improper pre-processing scheme may lead to loss of lexical content. Hence, pre-processing steps are specific to the problem. Having said that, there are a few pre-processing steps that apply to most applications. They are: a. tokenization b. normalization c. substitution. Other well-known pre-processing steps: a. case folding b. stemming c. lemmatization d. removing misspellings e. handling punctuation.

What is a stemming operation? - The process of reducing an inflected word to its root, where an inflected word is a word with an extra letter or letters added to a noun, verb, or adjective to mark a different grammatical form.

What is lemmatization? - Here too an inflected word is reduced to its root. However, the result of stemming need not be a proper word in the vocabulary, whereas the result of lemmatization has to be part of the given language's vocabulary.

What is case folding and its usage? - Case-folding is a part of the Unicode standard t
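The steps named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the post's own implementation: the function names `tokenize`, `naive_stem`, and `preprocess` are hypothetical, and the suffix-stripping stemmer is a deliberately crude stand-in for a real algorithm such as Porter's.

```python
import re

def tokenize(text):
    """Tokenization: split text into runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text)

def naive_stem(token):
    """Toy stemmer (an assumption for illustration, not a real stemming algorithm).

    Strips one common English suffix if at least three characters remain.
    The result need not be a dictionary word, which is the point made above
    about stemming versus lemmatization.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, case-fold, and stem a piece of text."""
    tokens = tokenize(text)
    folded = [t.casefold() for t in tokens]  # Unicode-aware case folding
    return [naive_stem(t) for t in folded]

if __name__ == "__main__":
    print(preprocess("The Cats were RUNNING, quickly!"))
```

Here "running" becomes "runn", which is not an English word; a lemmatizer would return "run" instead. `str.casefold()` is used rather than `str.lower()` because it applies Unicode case folding, which is more aggressive for caseless matching.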

Applications of Vector Space Model

What does vector space model mean? - It's an algebraic representation of text documents.

Why do we need to represent text documents algebraically?
 - Typical algebraic operations get simpler to perform and visualize.
 - Information retrieval, filtering, indexing, and ranking can be done with standard procedures, which helps to have standardized metrics around them.

So how does the vector space model help in information retrieval?
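One common way to make this concrete, sketched here as an assumption rather than as the post's own answer, is to turn each document and the query into a term-frequency vector and rank documents by cosine similarity to the query. The names `to_vector`, `cosine_similarity`, and `rank` are hypothetical, and raw term counts are used instead of a weighting scheme such as TF-IDF to keep the example short.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent text as a sparse term-frequency vector (term -> count)."""
    return Counter(text.casefold().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs chase cats",
        "stock markets fell",
    ]
    for score, doc in rank("cat on mat", docs):
        print(f"{score:.3f}  {doc}")
```

Because every document lives in the same vector space, ranking reduces to one standard operation (a dot product with normalization), which is the standardization benefit described above.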

Free MPI platform for High Performance Computing

Here is the official website of the HPC @ Uni.lu platform, which assembles information about the computing clusters operated by the University of Luxembourg and the organization running them: https://hpc.uni.lu/ The page at https://ulhpc-tutorials.readthedocs.io/en/latest/parallel/basics/ helps you get started with the computing resources offered by the platform. For learning OpenMP from scratch, there is none better than this: https://nptel.ac.in/courses/106102163/