# How Well Do You Know Data Science? Data Science Quiz

Written by Anil

Data science deals with the processes and systems used to extract knowledge or insights from large amounts of data. The data can be structured or unstructured, and the insights extracted from it can be used to draw conclusions. Test what you know about data science by taking the quiz below. All the best!

• 1.

### Which of the following models is usually considered the gold standard for data analysis?

• A.

Inferential

• B.

Descriptive

• C.

Causal

• D.

All of the mentioned

A. Inferential
Explanation
The inferential model is usually considered the gold standard for data analysis because it allows researchers to make predictions and draw conclusions about a population based on a sample. This model involves using statistical techniques to analyze data and make inferences about a larger population. Descriptive analysis, on the other hand, focuses on summarizing and describing the data without making any predictions or inferences. Causal analysis is used to determine cause-and-effect relationships between variables, but it is not typically considered the gold standard for data analysis. Therefore, the correct answer is inferential.

• 2.

### Which of the following are “Measures of Central Tendency”?

• A.

Mean, Range, Mode

• B.

Mean, Standard Deviation, Range

• C.

Mode, Mean, Median

• D.

Range, Standard Deviation, Variance

C. Mode, Mean, Median
Explanation
The measures of central tendency are statistical measures used to describe the center or average of a data set. The mode is the most frequently occurring value, the mean is the average of all values, and the median is the middle value when the data set is arranged in ascending or descending order. Therefore, the correct answer is mode, mean, and median as they are all measures of central tendency.
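
The three measures described above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is a hypothetical sample invented for illustration.

```python
import statistics

# Hypothetical sample: exam scores chosen only for illustration.
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # arithmetic average of all values
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)
```

Range, variance, and standard deviation are absent here because they describe spread rather than the center of the data.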

• 3.

### Who is a data scientist?

• A.

Mathematician

• B.

Statistician

• C.

Software programmer

• D.

All of the above

D. All of the above
Explanation
A data scientist is someone who possesses a combination of skills in mathematics, statistics, and software programming. They use these skills to analyze and interpret complex data sets, identify patterns and trends, and develop algorithms and models to solve problems and make data-driven decisions. By having expertise in all three areas, data scientists are able to handle the entire process of data analysis, from collecting and cleaning data to implementing and deploying analytical solutions. Therefore, the correct answer is "All of the above" as all three roles (mathematician, statistician, and software programmer) are encompassed within the field of data science.

• 4.

### Which of the following is performed by a data scientist?

• A.

Define the question

• B.

Create reproducible code

• C.

Challenge results

• D.

All of the Mentioned

D. All of the Mentioned
Explanation
Data scientists perform all three tasks. They define the question that the analysis should answer, write reproducible code so that others can rerun and verify the work, and challenge results by critically evaluating the outcomes of data analysis and machine learning models. Challenging results means assessing their reliability and accuracy, identifying any limitations or biases, and checking that the findings align with the initial research question or hypothesis. Therefore, the correct answer is "All of the Mentioned".

• 5.

### Which of the following is a key data science skill?

• A.

Statistics

• B.

Machine learning

• C.

Data visualization

• D.

All of the mentioned

D. All of the mentioned
Explanation
Statistics, machine learning, and data visualization are all key data science skills. Statistics provides the methods for summarizing data and drawing inferences from it. Machine learning uses algorithms and statistical models that let computers learn from data and make predictions or decisions, and it is widely used for tasks such as fraud detection, recommendation systems, image recognition, and natural language processing. Data visualization communicates patterns and results in a form that people can interpret. Therefore, the correct answer is "All of the mentioned".

• 6.

### Raw data should be processed only one time.

• A.

True

• B.

False

B. False
Explanation
Processing raw data multiple times can be necessary in certain situations. For example, if new information or updates are received, the raw data may need to be processed again to incorporate these changes. Additionally, different analyses or calculations may require different processing methods, leading to the need for multiple processing steps. Therefore, the statement that raw data should be processed only one time is incorrect.

• 7.

### Which of the following is a characteristic of processed data?

• A.

Data is not ready for analysis

• B.

All steps should be noted

• C.

Hard to use for data analysis

• D.

None of the mentioned

B. All steps should be noted
Explanation
Processed data refers to information that has been organized, structured, or manipulated in some way to make it more useful and meaningful for analysis. It is the opposite of raw data, which is unprocessed and typically not ready for analysis, so the options "Data is not ready for analysis" and "Hard to use for data analysis" do not describe it. Because processing changes the data, every processing step should be recorded so that the analysis can be reproduced. Therefore, the correct answer is "All steps should be noted".

• 8.

### Which of the following types of testing is concerned with making decisions using data?

• A.

Probability

• B.

Hypothesis

• C.

Causal

• D.

None of the mentioned

B. Hypothesis
Explanation
Hypothesis testing is concerned with making decisions using data. In hypothesis testing, a researcher formulates a hypothesis about a population parameter and collects data to determine whether the evidence supports or contradicts the hypothesis. The goal is to make an inference about the population based on the sample data. This involves making a decision, namely rejecting or failing to reject the null hypothesis, based on the evidence provided by the data. Therefore, hypothesis testing is the correct answer as it involves using data to make decisions.
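
As an illustration of making a decision from data, the sketch below runs an exact two-sided binomial test of the null hypothesis that a coin is fair. The function name, the observed counts, and the 0.05 significance level are assumptions chosen for this example, not part of the quiz.

```python
from math import comb

def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for a fair-coin null hypothesis (p = 0.5)."""
    # Under the null, outcome k has probability C(flips, k) / 2**flips.
    probs = [comb(flips, k) / 2**flips for k in range(flips + 1)]
    observed = probs[heads]
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    return sum(p for p in probs if p <= observed + 1e-12)

ALPHA = 0.05  # conventional significance level, assumed for this example

# Hypothetical observation: 9 heads in 10 flips.
p_value = two_sided_binomial_p_value(heads=9, flips=10)
reject_null = p_value < ALPHA
print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
```

The decision rule is the comparison `p_value < ALPHA`: a small p-value means the observed data would be unusual if the null hypothesis were true.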

• 9.

### Which of the following is a measure of the spread of a random variable?

• A.

Variance

• B.

Standard deviation

• C.

Empirical mean

• D.

All of the mentioned

B. Standard deviation
Explanation
Standard deviation is a measure of spread for a random variable. It quantifies the amount of dispersion or variability in the data set by summarizing how far the data points typically lie from the mean. A higher standard deviation indicates a greater spread, while a lower standard deviation indicates a narrower spread. Note that variance, the square of the standard deviation, also measures spread, whereas the empirical mean measures the center. The answer given here is standard deviation.
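
A short sketch using Python's standard `statistics` module shows how the standard deviation separates two data sets that share the same mean. Both lists are hypothetical and chosen only to make the contrast visible.

```python
import statistics

# Two hypothetical data sets with the same mean but different spread.
narrow = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

for name, data in (("narrow", narrow), ("wide", wide)):
    variance = statistics.pvariance(data)  # mean squared distance from the mean
    std_dev = statistics.pstdev(data)      # square root of the variance
    print(name, statistics.mean(data), variance, round(std_dev, 3))
```

The population versions (`pvariance`, `pstdev`) are used here; `variance` and `stdev` would give the sample estimates instead.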

• 10.

### Which of the following techniques comes under practical machine learning?

• A.

Decision Tree

• B.

Data Visualisation

• C.

Forecasting

• D.

None of the mentioned

A. Decision Tree
Explanation
Decision Tree is a technique that falls under practical machine learning. It is a supervised learning algorithm that is used for both classification and regression tasks. It is practical because it is easy to understand and interpret, and it can handle both categorical and numerical data. Decision Tree builds a model by learning simple decision rules inferred from the data features, making it a widely used technique in various industries and applications. Data visualization and forecasting, though related to machine learning, are not specific techniques but rather tools or methods that can be used in conjunction with different machine learning algorithms.

• 11.

### Which of the following is the definition of raw data?

• A.

Set of Measurement on Recorded Values

• B.

Processed Data

• C.

Easy to use for data analysis

• D.

None of the Mentioned

A. Set of Measurement on Recorded Values
Explanation
Raw data refers to unprocessed and unorganized data that is collected directly from various sources. It consists of measurements or recorded values in their original form, without any manipulation or analysis. Raw data serves as the foundation for data analysis and is typically transformed and processed to extract meaningful insights and patterns. Therefore, the definition "Set of Measurement on Recorded Values" accurately describes raw data.

• 12.

### __________ is the standard deviation of a sampling distribution.

• A.

Sample error

• B.

Sampling error

• C.

Simple error

• D.

Standard error

D. Standard error
Explanation
Standard error is the correct answer because it represents the standard deviation of a sampling distribution. A sampling distribution is a distribution of statistics obtained from multiple samples of the same population. The standard error measures the variability or spread of these statistics, indicating how much they differ from the true population parameter. It is an important measure in inferential statistics as it helps estimate the precision of sample statistics and make inferences about the population.
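
For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. The sketch below applies that formula; the function name and the four measurements are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical sample of four measurements.
sample = [2, 4, 4, 6]
se = standard_error_of_mean(sample)
print(round(se, 4))
```

Because the sample size appears under a square root in the denominator, quadrupling the number of observations roughly halves the standard error.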

• 13.

### Which of the following diagrams is used to view correlation?

• A.

Triangle

• B.

Boxplot

• C.

Corrgram

• D.

Histogram

C. Corrgram
Explanation
A corrgram is a diagram used to view correlation. It displays a matrix of correlation coefficients between variables, usually represented by a grid of squares. Each square represents the correlation between two variables, with the color or shading indicating the strength and direction of the correlation. This diagram is useful for visually understanding the relationships between variables and identifying patterns or trends in the data.
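
The numbers behind a corrgram form a correlation matrix. The sketch below computes one in plain Python and prints it as a text grid; a plotting library would shade each cell instead. The `pearson` helper and the three variables are hypothetical, written for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical variables, invented for this illustration.
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "hours_of_tv": [5, 4, 3, 2, 1],
}

names = list(data)
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

# Print the matrix as a text grid; a corrgram would color each cell instead.
for a in names:
    print(f"{a:>14}", " ".join(f"{matrix[a][b]:+.2f}" for b in names))
```

Values near +1 or -1 correspond to the strongly shaded cells of a corrgram, and the diagonal is always 1 because every variable correlates perfectly with itself.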

• 14.

### ____________ is a multidisciplinary field which involves the extraction of knowledge from large volumes of data that are structured or unstructured.

• A.

Data Science

• B.

Data Analysis

• C.

Descriptive Analysis

• D.

None of the mentioned

A. Data Science
Explanation
Data Science is the correct answer because it is a multidisciplinary field that involves the extraction of knowledge from large volumes of data, whether it is structured or unstructured. Data scientists use various techniques and tools to analyze and interpret data in order to gain insights and make informed decisions. This field combines elements of statistics, mathematics, computer science, and domain knowledge to extract valuable information from data.

• 15.

### Pick the lazy algorithm

• A.

K-Means

• B.

CNN

• C.

KNN

• D.

RNN

C. KNN
Explanation
KNN stands for K-Nearest Neighbors, which is a lazy algorithm used for classification and regression tasks. It works by finding the k nearest neighbors to a given data point in the feature space and making predictions based on the majority class or average value of those neighbors. KNN is a non-parametric algorithm, meaning it does not make any assumptions about the underlying data distribution. It is simple to implement and can be effective for small to medium-sized datasets. However, it can be computationally expensive for large datasets and may not perform well in the presence of irrelevant or noisy features.
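
The procedure described above can be sketched in a few lines of plain Python. The function name `knn_predict` and the toy data are hypothetical, and this version handles classification by majority vote only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Lazy": there is no training step; all work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D training data with two classes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, query=(2, 2)))
print(knn_predict(points, labels, query=(7, 9)))
```

Every prediction scans the full training set, which is why the method becomes expensive on large datasets.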

• 16.

### What are the 3 V’s of Big Data?

• A.

Velocity, Victory, Volume

• B.

Volume, Velocity, Variety

• C.

Volume, Viscous, Velocity

• D.

None of the above

B. Volume, Velocity, Variety
Explanation
The correct answer is Volume, Velocity, Variety. These are the three main characteristics of big data. Volume refers to the large amount of data being generated and collected. Velocity refers to the speed at which data is being generated and needs to be processed in real-time. Variety refers to the different types and formats of data, including structured, unstructured, and semi-structured data. These three V's are essential for understanding and analyzing big data effectively.

• 17.

### Which correlation coefficient indicates a strong positive correlation?

• A.

Above -0.8

• B.

Below -0.8

• C.

Above 0.8

• D.

Below 0.65

C. Above 0.8
Explanation
The correct answer is "Above 0.8". In statistics, a positive correlation indicates that as one variable increases, the other variable also tends to increase. The value of 0.8 indicates a strong positive correlation, meaning that there is a high degree of linear relationship between the two variables. Therefore, when the correlation coefficient is above 0.8, it suggests a strong positive correlation between the variables being studied.

• 18.

### Weighted Average is used in:

• A.

Classification

• B.

Regression

• C.

Forecasting

• D.

All of the above

C. Forecasting
Explanation
Weighted average is commonly used in forecasting to calculate a weighted average of historical data. This allows for the consideration of different weights or importance assigned to each data point, based on factors such as recency or reliability. By using a weighted average, the forecast can reflect the significance of each data point and provide a more accurate prediction of future trends or values. Therefore, forecasting is a specific application where weighted average is utilized.
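
One simple form of this idea is a weighted moving average, sketched below. The function name, the sales figures, and the weights are hypothetical; real forecasting methods choose weights more carefully.

```python
def weighted_average_forecast(history, weights):
    """Forecast the next value as a weighted average of the most recent values."""
    if len(weights) > len(history):
        raise ValueError("need at least as many observations as weights")
    recent = history[-len(weights):]
    # Weights are paired oldest-to-newest, so the last weight applies to the latest value.
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

# Hypothetical monthly sales; the weights favor recent months.
sales = [100, 110, 120, 130]
forecast = weighted_average_forecast(sales, weights=[1, 2, 3])
print(forecast)
```

Dividing by the sum of the weights keeps the forecast on the same scale as the data even when the weights do not add up to 1.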

• 19.

### Sequential modelling is done using:

• A.

CNN

• B.

KNN

• C.

RNN

• D.

ANN

C. RNN
Explanation
Sequential modeling is a technique used to analyze and predict sequential data, such as time series or natural language. Recurrent Neural Networks (RNN) are particularly suitable for sequential modeling as they have a feedback loop that allows information to persist and be processed over time. Therefore, RNN is the correct answer as it is specifically designed for sequential modeling tasks. CNN (Convolutional Neural Networks) are mainly used for image and video analysis, KNN (K-Nearest Neighbors) is a non-parametric algorithm for classification and regression, and ANN (Artificial Neural Networks) is a general term that can refer to any type of neural network model.
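
The feedback loop can be illustrated with a single recurrent unit in plain Python. This is only a sketch: the weights `w_x`, `w_h`, and `bias` are hypothetical fixed values rather than learned parameters, and real RNNs use vectors and matrices.

```python
import math

def run_rnn(inputs, w_x=0.5, w_h=0.8, bias=0.0):
    """Run a one-unit recurrent cell over a sequence and return all hidden states."""
    hidden = 0.0  # initial state before any input is seen
    states = []
    for x in inputs:
        # The previous hidden state feeds back in, carrying information forward.
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
        states.append(hidden)
    return states

# The same input values in a different order produce different hidden states.
print(run_rnn([1.0, 0.0, 0.0]))
print(run_rnn([0.0, 0.0, 1.0]))
```

In the first sequence the hidden state stays above zero after the input returns to 0, which is the persistence of information that the explanation refers to.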

• 20.

### Why is machine learning used in data science?

• A.

For Visualization

• B.

For Prediction

• C.

For Cleaning

• D.

All the above

B. For Prediction
Explanation
Machine learning is used in data science for prediction because it allows the development of models that can analyze patterns and make accurate predictions based on historical data. By training these models with known data, they can learn to recognize patterns and relationships, and then apply that knowledge to make predictions on new, unseen data. This prediction capability is valuable in various fields, such as finance, healthcare, and marketing, where accurate predictions can help in decision-making and improving outcomes.
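
The train-then-predict pattern can be sketched with a least-squares line fit, one of the simplest predictive models. The helper `fit_line` and the training values are hypothetical and chosen so the relationship is exactly linear.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: advertising spend vs. sales.
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

slope, intercept = fit_line(train_x, train_y)
prediction = slope * 5 + intercept  # predict for an unseen input, x = 5
print(prediction)
```

The model is fitted only on the known pairs and then applied to an input it has not seen, which mirrors the use of historical data to predict new cases.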

• 21.

### Tableau can create worksheet-specific filters.

• A.

True

• B.

False

A. True
Explanation
Tableau has the capability to create filters that are specific to individual worksheets. This means that users can apply filters to a particular worksheet without affecting the data displayed in other worksheets. By using worksheet-specific filters, users can easily analyze and visualize data based on specific criteria, allowing for more focused and targeted insights. This feature enhances the flexibility and customization options available to users when working with Tableau.

• 22.

### What is the order of execution of filters in Tableau? 1) Context 2) Traditional 3) Custom 4) Show Me

• A.

1,2,3,4

• B.

2,3,4,1

• C.

3,1,2,4

• D.

4,3,2,1

C. 3,1,2,4
Explanation
The order of execution of filters in Tableau is 3) Custom, 1) Context, 2) Traditional, and 4) Show Me. This means that custom filters are applied first, followed by context filters, then traditional filters, and finally the Show Me filters.

• 23.

### Will filters work when we do data blending?

• A.

True

• B.

False

A. True
Explanation
When we do data blending, filters will still work. Data blending is a technique used to combine data from multiple sources or tables into a single view. Filters are used to narrow down the data based on specific criteria. Even when data blending is performed, filters can still be applied to limit the data being displayed or analyzed. Thus, filters will continue to work effectively during data blending.

• 24.

### Point out the correct statement:

• A.

Machine learning focuses on prediction, based on known properties learned from the training data

• B.

Data Cleaning focuses on prediction, based on known properties learned from the training data.

• C.

Representing data in a form that mere mortals can understand and get valuable insights from is as much a science as it is an art

• D.

None of the Mentioned

A. Machine learning focuses on prediction, based on known properties learned from the training data
Explanation
The correct answer is "Machine learning focuses on prediction, based on known properties learned from the training data." This statement accurately describes the main objective of machine learning, which is to make predictions or decisions based on patterns and relationships learned from a set of training data. Machine learning algorithms analyze the training data to identify these patterns and use them to make predictions on new, unseen data.

• 25.

### Which of the following can be considered a random variable?

• A.

The outcome from the roll of a die

• B.

The outcome of flip of a coin

• C.

The outcome of exam

• D.

All of the Mentioned