Data Science Interview Questions and Answers

Technology keeps evolving, and professions are evolving with it. Analytics, which has grown into Data Science, is one of the most talked-about career fields these days.

This section covers Data Science interview questions and answers on general, conceptual and technical topics. Examples accompany the questions where they help.

Who are these Data Science Interview Questions useful for?

These Data Science Interview Questions will be useful for both beginners and experienced candidates interviewing for roles such as Junior Data Scientist, Senior Data Scientist, Data Science Intern etc.

1. What are the primary skills that a Data Scientist must possess?

This is quite a simple question to begin the interview with, but your answer can give the interviewer a deep insight into your understanding of the role.

We have divided this answer into three segments based on the seniority of the role. See which one suits you and use it to answer this question in your interview.

The primary skills that Data Scientists at an entry level are expected to possess are:

i.) Good knowledge of Statistics and Mathematics

ii.) Ability to think logically coupled with an analytical approach to things

iii.) A good understanding of Data Science models with an ability to rework the existing models

iv.) And, of course, an ability to code

However, Data Scientists are not expected to be high-level coders like Software Developers.

If you are a mid-level Data Scientist, in addition to all the above skills, you should:

i.) Be able to identify the flaws in a model and put it into production.

ii.) Know the details of the products you have worked on, though you are not expected to know the complete architecture

iii.) Understand the problems that the business is facing, even if you do not have the solutions

iv.) Possess good communication skills and be able to work as a bridge between the higher management and junior Data Scientists.

The Data Scientists at the next level are the actual brains behind every project. The most important skills they bring with them are:

i.) Business Acumen - They clearly understand the business problems and try to solve them

ii.) Ability to create high-end Data Science projects

iii.) Ability to lead a cross-functional team

iv.) Ideas about new products which Product managers adapt as per the market requirements

v.) Good communication and people skills.

2. What tips would you give a person to excel in a Data Science career?

Understand this question as, "What tips would YOU practice to succeed in this profession?"

So, some of the things you can talk about to answer this question effectively are:

i.) Strong foundation - In statistics and mathematics.

ii.) Quality of Data - Ensure that the quality of the data you are using is good. The size of the population and its authenticity are very important.

iii.) Be doubtful of your own assumptions - Question them even if you believe them to be correct. This becomes all the more important when you are dealing with human behavior and issues that cannot be predicted with a high level of certainty. Being overly confident can lead to failures.

iv.) Acknowledge if you are biased - It is possible for human beings to fall victim to their own biases. If the issue is close to your heart, you may not be able to make unbiased assumptions. Beware of this!

v.) Don't let your curiosity die down - Data Scientists are people who think from various angles and always keep an element of "what if". Ensure that it always stays alive.

vi.) Know the purpose of data - Before you start working on a data set, ask what it will be used for. This helps you take the right approach, because if your understanding and the intended use don't match, the data collected is not effective and costs time unnecessarily.

vii.) Dedicate time to master some tools - You won't be able to master all but some can definitely be mastered.

viii.) Learn new things - This is an evolving field. Those who keep themselves updated will rise faster than others.

ix.) Practice - Just reading about things or seeing your colleagues do them won't help. Get your hands dirty. Practice them.

3. What are the various types of analytics used in Data Science?

Data Scientists work to derive human-understandable meaning from data.

The four major types of analytics they carry out are:

i.) Descriptive Analytics - As the name suggests, Descriptive Analytics describes in layman's language what the raw data says about an event. It helps in identifying patterns that can inform future decisions.

ii.) Diagnostic Analytics - Here the Data Scientists dig deeper to find the source of a problem.

iii.) Predictive Analytics - Predictive Analytics models use various related factors or variables to estimate the probability or timing of a future event or trend. This helps businesses gear themselves up for the future.

iv.) Prescriptive Analytics - As the name suggests, this type of analytics prescribes the actions that can be taken in the future to get the desired results.

4. What are Predictor variables? Would you have too many or just a few of them in a model? Why?

Predictor variables are also referred to as independent variables or x-variables.

In a model, you try to see how the change in a predictor variable affects the outcome.

We would prefer to have only a few relevant predictor variables in a model because:

i. Having too many predictor variables might mean that some of them have a similar effect on the outcome. So, they unknowingly bring an element of redundancy into the model.

ii. It is also possible that not all the predictor variables are relevant to the model, which makes it less effective and more time-consuming to execute.

iii. Having too many predictor variables in the model may increase its complexity and ultimately hurt its performance in real-world scenarios.

So, to get a good model, it is advisable to select a limited number of the most relevant predictor variables.
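
One common way to spot the redundancy described above is to check how strongly pairs of predictors are correlated with each other. The following is a minimal sketch, not a definitive method: the data, the helper names pearson and redundant_pairs, and the 0.95 threshold are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(predictors, threshold=0.95):
    """Return pairs of predictor names whose absolute correlation reaches the threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(predictors.items(), 2):
        if abs(pearson(col_a, col_b)) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical data: height recorded in both centimetres and inches, plus age.
predictors = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [41, 23, 35, 52, 29],
}

# The two height columns carry the same information, so only one is needed.
flagged = redundant_pairs(predictors)
print(flagged)
```

A flagged pair is only a hint that one of the two variables may be dropped. Whether to drop it still depends on what the variables mean for the business problem.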

5. Explain False Positive and False Negative.

i. False Positive - When a test wrongly indicates the presence of a condition that is actually absent, it is called a False Positive.

For example, if a medical test indicates the presence of a medical condition when the patient does not actually have it, it is a False Positive. In such a case, the patient may unnecessarily take medicine or go through treatment that may cause further harm.

ii. False Negative - When a test wrongly indicates the absence of a condition when it is actually present, it is called a False Negative.

For example, if a medical test says that a person doesn't have a medical condition or disease when they actually have it, it is a False Negative.

This situation is bad because the patient will either go without the required treatment or have to take further tests, which cost more money.
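
The two error types can be counted directly by comparing test results with the true condition. Below is a small sketch under assumed data: the eight patient records and the function name confusion_counts are hypothetical and only serve to illustrate the definitions.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (True = condition present)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1  # test says present, condition is present
        elif guess and not truth:
            counts["FP"] += 1  # test says present, condition is absent
        elif not guess and truth:
            counts["FN"] += 1  # test says absent, condition is present
        else:
            counts["TN"] += 1  # test says absent, condition is absent
    return counts


# Hypothetical results for eight patients.
has_condition = [True, True, True, False, False, False, False, True]
test_positive = [True, False, True, True, False, False, False, True]

counts = confusion_counts(has_condition, test_positive)

# Share of healthy patients wrongly flagged, and of sick patients wrongly cleared.
false_positive_rate = counts["FP"] / (counts["FP"] + counts["TN"])
false_negative_rate = counts["FN"] / (counts["FN"] + counts["TP"])
print(counts, false_positive_rate, false_negative_rate)
```

In an interview, it helps to add which of the two errors is costlier for the problem at hand, since that decides which rate you would try to reduce first.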

6. What is the difference between Linear and Non-linear regression models?

A lot of students believe that linear equations are the ones that produce a straight line when plotted on a chart, while non-linear equations produce curves. But the difference between the two is not that simple.

The terms in a linear regression model will fall into one of the following categories:

i. The constant

ii. A parameter multiplied by an independent variable

The equation would be:

Y = a + b*X + c*X1

The function should be linear in the parameters, while the independent variables may be squared to form a curve. The model will still stay linear.

So, Y = a + b*X + c*X1^2

is linear.

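The presence of log terms or inverse terms of the independent variables can change the type of curve, but the model stays linear because it is still linear in the parameters.

Non-linear equations are not comprised of just addition and multiplication in this way; anything that does not fit the linear form described above is non-linear. For example, Y = a * exp(b*X) is non-linear because the parameter b sits inside the exponential.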
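
To see why a squared predictor keeps the model linear, note that ordinary least squares can still recover a, b and c, because squaring only changes a column of the input data. The sketch below assumes noise-free data generated from Y = 2 + 3*X + 0.5*X1^2; the values and the helper names solve and least_squares are chosen purely for illustration.

```python
def solve(matrix, rhs):
    """Solve a small linear system with Gaussian elimination and partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def least_squares(design, targets):
    """Ordinary least squares via the normal equations (D^T D) p = D^T y."""
    cols = len(design[0])
    gram = [[sum(r[i] * r[j] for r in design) for j in range(cols)] for i in range(cols)]
    moment = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(cols)]
    return solve(gram, moment)


# Hypothetical, noise-free data generated from Y = 2 + 3*X + 0.5*X1^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x1s = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]
ys = [2 + 3 * x + 0.5 * x1 ** 2 for x, x1 in zip(xs, x1s)]

# Squaring X1 only changes a data column; a, b and c still enter linearly.
design = [[1.0, x, x1 ** 2] for x, x1 in zip(xs, x1s)]
a, b, c = least_squares(design, ys)
print(round(a, 6), round(b, 6), round(c, 6))
```

A model such as Y = a * exp(b*X) cannot be fitted this way as written, because b does not multiply a data column. That is the practical meaning of "non-linear in the parameters".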

7. What is Regression Analysis? What are its major types?

Regression analysis is a predictive modelling technique that tries to find the relationship between dependent and independent variables, referred to as "target" and "predictor" respectively.

The technique is mainly used for forecasting and for estimating the relationship between variables.

Regression analysis is divided into various types depending upon the following factors:

i.) No. of Predictors i.e. Independent variables

ii.) Shape of Regression Line

iii.) Type of dependent variable

The various types of Regression Analysis are:

i. Linear Regression - Linear regression establishes the relation between a dependent variable and one or more independent variables. It is the most widely used form of regression. There are two types of linear regression - Simple linear regression (when there is only one predictor) and Multiple linear regression (when there are many predictors).

ii. Logistic Regression - When the dependent variable has a binary value, Logistic regression is used. Beyond the binary case, there are two further types of Logistic Regression - Ordinal and Multinomial.

iii. Polynomial Regression

iv. Stepwise Regression

v. Ridge Regression

vi. Lasso Regression

vii. ElasticNet Regression
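
As a rough illustration of two entries in the list above, the sketch below fits a simple linear regression with one predictor and then the same line with a ridge penalty, which shrinks the slope towards zero. The data, the function name fit_line and the penalty value of 10 are assumptions made for this example. The intercept is left unpenalized, which is a common convention rather than the only option.

```python
def fit_line(xs, ys, ridge_penalty=0.0):
    """Fit y = intercept + slope * x by least squares.

    With ridge_penalty > 0 the slope is shrunk towards zero (ridge regression
    with one predictor); the intercept is not penalized.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + ridge_penalty)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data that roughly follows y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

ols_intercept, ols_slope = fit_line(xs, ys)
ridge_intercept, ridge_slope = fit_line(xs, ys, ridge_penalty=10.0)
print(ols_slope, ridge_slope)
```

The ridge slope comes out smaller than the ordinary least squares slope. Lasso and ElasticNet follow the same idea with different penalty terms.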