Natural Language Processing Interview Questions - 2

8. What major steps would you take in pre-processing text data?

While there can be many steps in pre-processing text data, they fall into three major categories:

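i.) Segmentation and Tokenization - Segmentation is the process of dividing large paragraphs into sentences, while tokenization means splitting each sentence into separate words (tokens). Segmentation may look easy, but it is actually a tough task. For example, a period can end a sentence, but it can also belong to an abbreviation such as "Dr." or a decimal number such as "3.14".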

ii.) Normalizing the data - Normalization involves several smaller steps, including:

a.) Converting all words to the same case (lower or upper)
b.) Removing punctuation
c.) Converting numbers into words
d.) Removing stop words
e.) Removing extra white space
f.) Performing stemming and lemmatization to reduce words to their base forms

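iii.) Removing the noise - Stripping content that carries no useful meaning for the task, such as HTML tags, URLs, or stray special characters.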
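The three categories above can be tied together in a small pipeline. The following is a minimal, standard-library-only sketch, not a production implementation: the stop-word list and digit table are hypothetical and tiny, the sentence splitter is a naive regex (it will break on "Dr."), and `naive_stem` is a crude suffix stripper, not the Porter stemmer or a real lemmatizer. Noise removal is run first here, since tags and URLs would otherwise confuse segmentation.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list; real projects usually use a
# curated list from a library such as NLTK or spaCy.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "i", "have",
              "of", "and", "to", "in", "it"}

# Hypothetical lookup table that only covers single digits.
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def remove_noise(text: str) -> str:
    """Replace HTML tags and URLs, two common kinds of noise, with spaces."""
    text = re.sub(r"<[^>]+>", " ", text)       # HTML tags
    return re.sub(r"https?://\S+", " ", text)  # URLs


def segment(text: str) -> List[str]:
    """Naive segmentation: split after '.', '!' or '?' followed by white space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Return word and punctuation tokens; white space is never matched, so it is dropped."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def naive_stem(word: str) -> str:
    """Crude suffix stripping (NOT Porter stemming), for illustration only."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(tokens: List[str]) -> List[str]:
    out = []
    for tok in tokens:
        tok = tok.lower()                      # a.) same case
        if re.fullmatch(r"[^\w\s]+", tok):
            continue                           # b.) drop punctuation tokens
        tok = DIGIT_WORDS.get(tok, tok)        # c.) digit -> word
        if tok in STOP_WORDS:
            continue                           # d.) drop stop words
        out.append(naive_stem(tok))            # f.) reduce to a base form
    return out


def preprocess(text: str) -> List[List[str]]:
    """Noise removal, then segmentation, tokenization and normalization."""
    return [normalize(tokenize(s)) for s in segment(remove_noise(text))]
```

For instance, `preprocess("<p>The cats were jumping quickly!</p> Visit https://example.com now. I have 3 dogs.")` returns one token list per sentence: `[["cat", "jump", "quick"], ["visit", "now"], ["three", "dog"]]`. In practice, a library tokenizer and a real stemmer or lemmatizer would replace the regex helpers.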

9. What are the different types of Artificial Neural Networks for Natural Language Processing?

An Artificial Neural Network (ANN) is a non-linear computational model loosely inspired by the biological neural networks of the human brain. It consists of artificial neurons, also called processing elements.
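To make the idea of a processing element concrete, here is a minimal sketch of one artificial neuron: it computes a weighted sum of its inputs plus a bias, then passes the result through a non-linear activation. The sigmoid activation and the `neuron`/`layer` helper names are illustrative choices, not a fixed standard; other activations such as tanh or ReLU are equally common.

```python
import math
from typing import List


def neuron(inputs: List[float], weights: List[float], bias: float) -> float:
    """One processing element: weighted sum plus bias, then a sigmoid non-linearity."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs: List[float], weight_rows: List[List[float]], biases: List[float]) -> List[float]:
    """A fully connected layer: several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]
```

Stacking several such layers, each feeding its outputs to the next, gives the Multilayer Perceptron named in the list below; the non-linear activation is what keeps the stacked model from collapsing into a single linear function.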

The major types of Artificial Neural Networks used in NLP are:

i.) Multilayer Perceptron (MLP)
ii.) Convolutional Neural Network (CNN)
iii.) Recursive Neural Network (RvNN)
iv.) Recurrent Neural Network (RNN)
v.) Long Short-Term Memory (LSTM)
vi.) Sequence-to-Sequence Models
vii.) Shallow Neural Networks
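What sets a recurrent network apart from an MLP is that it processes a sequence (for example, one word vector per token) while carrying a hidden state from one step to the next, following the standard recurrence h_t = tanh(W_x x_t + W_h h_(t-1) + b). The sketch below implements that recurrence in plain Python under illustrative assumptions: the helper names are hypothetical, the initial state is all zeros, and no training code is included. An LSTM keeps the same step-by-step structure but adds gates that control what the state keeps and forgets.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(x: Vector, h_prev: Vector, w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """One time step: combine the current input with the previous hidden state."""
    pre = [i + h + bias for i, h, bias in zip(matvec(w_x, x), matvec(w_h, h_prev), b)]
    return [math.tanh(p) for p in pre]


def rnn_forward(xs: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> List[Vector]:
    """Run the cell over a whole sequence, returning the hidden state after each step."""
    h = [0.0] * len(b)  # assumed all-zero initial state
    states = []
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states
```

Because each state depends on the previous one, an input seen early in the sentence can still influence the state several tokens later, which is why this family of models suits sequential text.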