Data Mining Trivia Quiz

By Cloud_XII, Community Contributor
Quizzes Created: 2 | Total Attempts: 15,749
Attempts: 14,368 | Questions: 50
1. A collection of attributes describes an object.

Explanation

A collection of attributes indeed describes an object. Attributes are characteristics or properties that define an object and provide information about it. They can include things like size, color, shape, and any other relevant details that help identify or differentiate the object. By having a collection of attributes, we can fully describe and understand the object in question.

About This Quiz

Do you know everything about data mining? Take this data mining trivia quiz to check your knowledge. In a world run and managed using soft technology, data has become an essential part of human existence. The quantity of data generated and stored every day makes it hard to retrieve when needed. How much do you know about data mining? The quiz results will show how well you know this topic. All the best!

2. Some purposes of Dimensionality Reduction include:
 -- Avoid the curse of dimensionality
 -- Reduce the amount of time and memory required by data mining algorithms
 -- Allow data to be more easily visualized
 -- May help to eliminate irrelevant features or reduce noise

Explanation

The statement is true. Dimensionality reduction techniques are used to address the curse of dimensionality, where the performance of data mining algorithms decreases as the number of dimensions increases. By reducing the number of dimensions, the amount of time and memory required by these algorithms is also reduced. Additionally, dimensionality reduction allows for easier visualization of data and can help eliminate irrelevant features or reduce noise.

3. Some techniques of Dimensionality Reduction:
 -- Principal Component Analysis
 -- Singular Value Decomposition
 -- Others: supervised and non-linear techniques

Explanation

The given answer is true because the statement accurately lists some common techniques of dimensionality reduction, including Principal Component Analysis, Singular Value Decomposition, and other supervised and non-linear techniques. These techniques are commonly used to reduce the number of features or variables in a dataset while preserving as much of the important information as possible.
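To make the idea of Principal Component Analysis concrete, here is a minimal sketch for the two-dimensional case only, where the leading eigenvector of the covariance matrix has a closed form. The data values and function names are illustrative assumptions, not part of the quiz; real data sets would normally use a numerical library.

```python
import math

def first_principal_axis(points):
    """Return (unit vector of the first principal component, mean) for 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix (population form).
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return (math.cos(theta), math.sin(theta)), (mean_x, mean_y)

def project(points):
    """Reduce each 2-D point to a single coordinate along the principal axis."""
    (ux, uy), (mx, my) = first_principal_axis(points)
    return [(x - mx) * ux + (y - my) * uy for x, y in points]

# Hypothetical data lying roughly along the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
axis, _ = first_principal_axis(data)
reduced = project(data)  # five 1-D values instead of five 2-D points
```

Because the points lie close to a line, one coordinate along that line keeps most of the variation, which is the sense in which the dimension is reduced.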

4. In Curse of Dimensionality, when dimensionality increases, data becomes increasingly sparse in the space that it occupies.

Explanation

As the dimensionality increases, the volume of the space also increases exponentially. However, the number of data points available remains limited. This leads to the data points being spread thinly across the high-dimensional space, resulting in sparsity. This phenomenon is known as the curse of dimensionality, where the data becomes increasingly sparse as the dimensionality increases. Therefore, the given statement is true.
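The sparsity effect can be sketched with a small simulation. The setup below is an assumption chosen for illustration: 1,000 uniformly random points in a unit hypercube, with each axis split into two bins, and we measure what fraction of the resulting cells contain at least one point.

```python
import random

def occupied_fraction(n_points, dims, bins_per_dim=2, seed=0):
    """Fraction of grid cells in the unit hypercube that contain a point."""
    rng = random.Random(seed)
    cells = set()
    for _ in range(n_points):
        # Map each coordinate in [0, 1) to the index of its bin.
        cell = tuple(int(rng.random() * bins_per_dim) for _ in range(dims))
        cells.add(cell)
    return len(cells) / bins_per_dim ** dims

# Same number of points, increasing dimensionality.
fractions = {d: occupied_fraction(1000, d) for d in (2, 5, 10, 20)}
```

With 20 dimensions there are 2^20 (over a million) cells, so 1,000 points can occupy fewer than 0.1% of them, however they fall.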

5. The same attribute can be mapped to different attribute values.

Explanation

The same attribute can be mapped to different attribute values because the values depend on the measurement scale or encoding that is used. For example, a person's height can be recorded in feet or in meters: the attribute is the same, but the numbers assigned to it differ. Therefore, the statement is true.

6. Under the curse of dimensionality, definitions of density and distance between points, which are critical for clustering and outlier detection, become less meaningful.

Explanation

Under the curse of dimensionality, as the number of dimensions increases, the definitions of density and distance between points become less meaningful. In high-dimensional spaces the data become sparse, and the distances from a point to its nearest and farthest neighbors tend to become nearly equal, so methods based on distance or density struggle to separate clusters or identify outliers. Therefore, the statement is true.
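One way to see distances losing their meaning is to measure the relative contrast, (farthest - nearest) / nearest, from a query point to a set of random points. The sketch below assumes uniformly random points in a unit hypercube; the function name and parameters are illustrative.

```python
import math
import random

def relative_contrast(dims, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

low_dim = relative_contrast(2)     # nearest and farthest neighbors differ a lot
high_dim = relative_contrast(200)  # all neighbors are at a similar distance
```

When the contrast is small, "nearest neighbor" and "farthest neighbor" are almost equally far away, which weakens clustering and outlier detection methods that rely on distance.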

7. Examples of Ratios can be:

Explanation

The given answer is correct because temperature in Kelvin, length, time, and counts are all ratio attributes. A ratio attribute has a true zero point, so both differences and ratios between values are meaningful: a length of 4 m is twice a length of 2 m, and 300 K is twice 150 K.

8. Examples of Nominal can be:

Explanation

The examples given in the question are all examples of nominal data. Nominal data is a type of categorical data where the values represent categories or groups. In this case, ID numbers, eye color, and zip codes are all examples of categories or groups that do not have any inherent order or numerical value associated with them. They simply represent different options or labels.

9. An _____ is a property or characteristic of an object. Example: eye color of a person, temperature, etc.

Explanation

The given correct answer is "attribute". An attribute refers to a property or characteristic of an object. It can be used to describe various features of an object, such as the eye color of a person or the temperature of an environment. Attributes provide information about the qualities or traits associated with an object, helping to define and understand its nature.

10. What are some examples of data quality problems:

Explanation

Noise and outliers, missing values, and duplicate data are all examples of data quality problems. Noise refers to irrelevant or random data that can distort the analysis or interpretation of the data. Outliers are extreme values that deviate significantly from the other data points and can skew the results. Missing values occur when data is incomplete or not recorded for certain variables, which can lead to biased or inaccurate analysis. Duplicate data refers to multiple instances of the same data, which can cause redundancy and inconsistencies in the analysis.

11. Examples of Ordinal can be:

Explanation

The given examples of ordinal data include rankings (such as the taste of potato chips rated on a scale), grades, and height expressed in categories such as tall, medium, and short. Ordinal data has a meaningful order, but the differences between values are not necessarily equal. Rankings run from first to last, taste ratings from least to most favorite, and grades from lowest to highest. Height is ordinal only when it is recorded in ordered categories; height measured in centimeters is a ratio attribute.

12. Generic graph and HTML Links are examples of _________ data

Explanation

Generic graph and HTML Links are examples of graph data. A graph is a data structure that consists of nodes (also known as vertices) and edges. Nodes represent entities, while edges represent the relationships or connections between those entities. In the case of a generic graph, the nodes and edges can represent any type of data or concept. HTML Links, on the other hand, are a specific type of graph data that represent connections between web pages or resources on the internet.

13. _______ refers to the modification of original values, such as distortion of a person's voice when talking on a poor phone and "snow" on the television screen.

Explanation

Noise refers to the modification of original values, such as distortion of a person's voice when talking on a poor phone and "snow" on the television screen. Noise can be defined as any unwanted or random variations that interfere with the desired signal or information. In the context of the given question, noise specifically refers to the disturbances or alterations that affect the clarity or quality of audio or visual signals, resulting in distorted or unclear output.

14. The Key principle for effective sampling is the following:

Explanation

The key principle for effective sampling is that using a sample will work almost as well as using the entire dataset if the sample is representative. This means that the sample should have approximately the same property of interest as the original dataset. If the mean is of interest, then the mean of the sample should be similar to the mean of the full data. Therefore, all of the statements mentioned above are correct and contribute to the principle of effective sampling.
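As a rough illustration of a representative sample, the sketch below draws a simple random sample from a synthetic population and compares the two means. The population (10,000 values drawn from a normal distribution) and the sample size are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 values centered near 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Simple random sample of 500 objects, drawn without replacement.
sample = rng.sample(population, 500)

full_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
difference = abs(full_mean - sample_mean)  # small if the sample is representative
```

The sample is twenty times smaller than the full data set, yet its mean stays close to the population mean, which is the property of interest here.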

15. Examples of intervals can be:

Explanation

The given answer is "Calendar dates, temperatures in Celsius or Fahrenheit". These are interval attributes: the differences between values are meaningful, but the zero point is arbitrary, so ratios are not. The difference between 10 °C and 20 °C is the same size as the difference between 20 °C and 30 °C, yet 20 °C is not twice as hot as 10 °C. Likewise, the number of days between two calendar dates is meaningful, while the ratio of two dates is not.
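The difference between an interval scale and a ratio scale can be checked with a short calculation. The sketch below uses two illustrative temperatures; only the standard Celsius to Kelvin conversion is assumed.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (a scale with a true zero)."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences agree on both scales, so subtraction is meaningful for intervals.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# The Celsius ratio is 2.0, but that number has no physical meaning
# because 0 degrees Celsius is an arbitrary zero point.
naive_ratio = warm_c / cool_c

# On the Kelvin scale the ratio is meaningful, and it is only about 1.035.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
```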

16. The record data set consists of: 

Explanation

The correct answer is Data Matrix, Document Data, Transaction Data. These are different types of data sets that can be found in the record data set. A data matrix is a two-dimensional array of data, where each row represents an individual and each column represents a variable. Document data refers to textual data, such as articles or reports. Transaction data refers to records of individual transactions, such as purchases or sales. These different types of data sets highlight the diverse nature of data that can be included in the record data set.

17. Data Matrix is:

Explanation

The given correct answer is "Both A and B." This means that both statement A and statement B are correct. Statement A states that if data objects have the same fixed set of numeric attributes, then the data objects can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute. Statement B states that such a data set can be represented by an m by n matrix, where there are m rows, one for each object, and n columns, one for each attribute. Both of these statements are true and provide an accurate description of a data matrix.
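A data matrix of this kind can be sketched in a few lines. The objects and attribute names below are hypothetical; the point is only that m objects sharing the same n numeric attributes form an m by n matrix.

```python
# Hypothetical data objects, each with the same fixed set of numeric attributes.
objects = [
    {"height_cm": 170.0, "weight_kg": 65.0, "age_years": 30.0},
    {"height_cm": 182.0, "weight_kg": 80.0, "age_years": 45.0},
    {"height_cm": 158.0, "weight_kg": 52.0, "age_years": 27.0},
    {"height_cm": 175.0, "weight_kg": 71.0, "age_years": 38.0},
]
attributes = ["height_cm", "weight_kg", "age_years"]

# One row per object, one column per attribute.
matrix = [[obj[name] for name in attributes] for obj in objects]

m, n = len(matrix), len(matrix[0])  # m objects, n attributes
```

Each row can also be read as a point in an n-dimensional space, with one dimension per attribute.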

18. Different attributes cannot be mapped to the same set of values.

Explanation

This statement is false because different attributes can be mapped to the same set of values. For example, in a database, two different attributes like "age" and "years of experience" can both be mapped to the same set of numerical values, such as 0-100. Therefore, it is possible for different attributes to have the same set of values.

19. What are the different types of attributes?

Explanation

The different types of attributes are nominal, ordinal, interval, and ratio. Nominal attributes are categorical and have no inherent order or ranking. Ordinal attributes have a natural order or ranking, but the differences between values may not be consistent. Interval attributes have a consistent difference between values, but there is no true zero point. Ratio attributes have a consistent difference between values and a true zero point.
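The four attribute types form a hierarchy in which each type keeps the properties of the one before it and adds one more. The sketch below encodes that hierarchy; the function name is an illustrative assumption, and the property names follow the standard textbook terminology (distinctness, order, addition, multiplication).

```python
# Each attribute type has every property up to and including its own position.
PROPERTIES = ["distinctness", "order", "addition", "multiplication"]
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

def properties_of(attribute_type):
    """Return the properties that are meaningful for the given attribute type."""
    return PROPERTIES[: LEVELS.index(attribute_type) + 1]

nominal_props = properties_of("nominal")    # equality tests only
ordinal_props = properties_of("ordinal")    # equality and ranking
interval_props = properties_of("interval")  # plus meaningful differences
ratio_props = properties_of("ratio")        # plus meaningful ratios
```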

20. The type of a Ratio attribute depends on which of the following properties:

Explanation

The type of a Ratio attribute depends on all four properties: distinctness, order, addition, and multiplication. Distinctness means values can be compared for equality (=, ≠); order means values can be ranked (<, >); addition means differences between values are meaningful (+, -); and multiplication means ratios between values are meaningful (*, /). A Ratio attribute has all four properties.

21. _______ values are numbers or symbols assigned to an attribute

Explanation

In the context of the question, an attribute refers to a characteristic or property of an object or entity. These attributes can be assigned values, which can be either numbers or symbols, to represent the specific characteristics or properties of the object or entity. Therefore, the correct answer is "Attribute."

22. The type of a Nominal attribute depends on which of the following properties:

Explanation

The type of a Nominal attribute depends on the property of distinctness. Nominal attributes represent categories or labels that do not have any inherent order or numerical value associated with them. They are used to classify data into distinct groups or classes. The other properties mentioned in the options, such as order and addition, are not relevant for determining the type of a Nominal attribute.

23. The type of an Ordinal attribute depends on which of the following properties:

Explanation

The type of an Ordinal attribute depends on distinctness and order. Distinctness means that values can be compared for equality, while order means that values can be ranked relative to one another. Together they allow the values to be arranged in a hierarchy or sequence. Addition does not apply to an ordinal attribute, because the differences between its values are not meaningful.

24. Graph data set consists of: 

Explanation

The graph data set consists of the World Wide Web and molecular structures. This means that the data set includes information related to the interconnectedness of web pages and the structures of molecules. The World Wide Web is a network of interlinked hypertext documents, while molecular structures refer to the arrangement of atoms and bonds in molecules. These two types of data are examples of the information that can be represented and analyzed using graph data structures.

25. An ordered data set consists of: 

Explanation

The correct answer consists of different types of ordered data sets. Spatial data refers to data that has a spatial component, such as geographic coordinates. Temporal data refers to data that is related to time, such as timestamps. Sequential data refers to data that has a specific order or sequence, such as a series of events. Genetic sequence data refers to data that represents the sequence of nucleotides in a DNA molecule. Therefore, the correct answer includes various types of ordered data sets.

26. _________ are data objects with characteristics that are considerably different than most of the other data objects in the data set.

Explanation

Outliers are data objects that have characteristics that are significantly different from the majority of the other data objects in the dataset. They are extreme values that lie far away from the other data points and can have a significant impact on statistical analysis and modeling. Outliers can occur due to various reasons such as measurement errors, data entry mistakes, or genuine unusual observations. Identifying and handling outliers is important in data analysis to ensure accurate and reliable results.
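One simple way to flag candidate outliers is a z-score rule, sketched below. The data values and the threshold of 2 standard deviations are assumptions chosen for illustration; this is one heuristic among many, not the only definition of an outlier.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical measurements with one value far from the rest.
readings = [10, 11, 9, 10, 12, 11, 10, 95]
flagged = zscore_outliers(readings)
```

Whether a flagged value is an error to remove or a genuine unusual observation to keep still has to be decided from the context of the data.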

27. Each document becomes a ___________ vector

Explanation

In the given question, the missing word is "term". The term "term" is commonly used in the context of document analysis or text mining. In this context, a document is typically represented as a vector, where each element of the vector corresponds to a specific term or word. This vector representation allows for various mathematical operations and analysis to be performed on the documents, such as clustering or classification. Therefore, the correct answer is "term".
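The term-vector representation can be sketched as follows. The two documents, the whitespace tokenization, and the function name are illustrative assumptions; real text mining pipelines usually add steps such as stop-word removal and stemming.

```python
from collections import Counter

def term_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical documents.
docs = [
    "data mining finds patterns in data",
    "graph data links web pages",
]

# One vector component per distinct term across all documents.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [term_vector(doc, vocabulary) for doc in docs]
```

Every document becomes a vector of the same length, so the collection as a whole forms a document-term matrix.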

28. _____ data set may include data objects that are duplicates or almost duplicates of one another.

Explanation

A duplicate data set may include data objects that are duplicates or almost duplicates of one another. This means that there are multiple instances of the same data or very similar data within the data set. These duplicates can be problematic as they can lead to redundancy, inefficiency, and inaccuracies in data analysis and processing. It is important to identify and handle duplicates appropriately to ensure the integrity and quality of the data.

29. _______ data is data that consists of a collection of records, each of which consists of a fixed set of attributes

Explanation

Record data consists of a collection of records, each of which has a fixed set of attributes. Each record represents a specific entity or object, and the set of attributes is the same for every record in the data set. This structure is commonly used in databases and flat files to store and retrieve information efficiently.

30. ________ is the closeness of measurements to the true value of the quantity being measured.

Explanation

Accuracy refers to the closeness of measurements to the true value of the quantity being measured. In other words, it is a measure of how well a measurement represents the actual value. A high level of accuracy means that the measurements are very close to the true value, while a low level of accuracy indicates a greater deviation from the true value. Accuracy is an important factor in many fields, such as science, engineering, and medicine, as it ensures reliable and trustworthy data for analysis and decision-making.

31. _______ is combining two or more attributes (or objects) into a single attribute (or object).

Explanation

Aggregation is the process of combining two or more attributes or objects into a single attribute or object. For example, cities can be aggregated into regions, states, or countries, and daily records can be aggregated into monthly totals. The result is a smaller, coarser-grained data set in which each aggregated object summarizes the individual objects it replaces.

32. Which of the following is NOT an example of sampling?

33. _______ Attribute has only a finite or countably infinite set of values, often represented as integer variables, Example: zip codes, counts, or the set of words in a collection of documents

Explanation

The attribute described in the question has a set of values that is either finite or countably infinite. This means that the values can be represented as integers and examples of such attributes include zip codes, counts, or the set of words in a collection of documents. Therefore, the correct answer is "Discrete".

34. The type of an Interval attribute depends on which of the following properties:

Explanation

The type of an Interval attribute depends on distinctness, order, and addition. Distinctness means that values can be compared for equality. Order means that values can be ranked. Addition means that differences between values are meaningful, so values can be added and subtracted. An Interval attribute has all three properties but not multiplication, because it lacks a true zero point.

35. ____________ data is sequences of transactions or genomic sequence data

36. __________ data is a special type of record data, where each record involves a set of items

Explanation

Transaction data is a special type of record data where each record consists of a set of items. This means that in transaction data, each entry represents a transaction or an event that involves multiple items. For example, in a retail sales dataset, each transaction record would contain the items purchased by a customer in a single transaction. Transaction data is commonly used in various fields such as market basket analysis, customer behavior analysis, and fraud detection.
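A minimal sketch of transaction data, using made-up market baskets: each record is a set of items, and a typical first step in market basket analysis is to compute the fraction of transactions that contain each item (its support).

```python
from collections import Counter

# Hypothetical transactions: each record is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

# Count in how many transactions each item appears.
item_counts = Counter(item for basket in transactions for item in basket)

# Support: fraction of all transactions that contain the item.
support = {item: count / len(transactions) for item, count in item_counts.items()}
```

Unlike a data matrix, the records here have different lengths, which is why transaction data is treated as a special type of record data.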

37. __________ is a systematic variation of measurements from the quantity being measured.

Explanation

Bias is a term used to describe a systematic variation of measurements from the actual quantity being measured. It refers to a consistent deviation in the measurements, which can be caused by various factors such as faulty instruments, human error, or a flawed experimental design. Bias can lead to inaccurate and unreliable results, as it introduces a consistent offset or distortion in the measurements. Therefore, it is important to identify and minimize bias in order to obtain accurate and valid data.

38. _______ Attribute has real numbers as attribute values. Practically, real values can only be measured and represented using a finite number of digits. It is typically represented as a floating-point variable.

Explanation

The given answer, "Continuous," is correct because a continuous attribute is one that can take on any real number value within a certain range. Real values, which include all rational and irrational numbers, can only be measured and represented with a finite number of digits due to the limitations of computational systems. Therefore, continuous attributes are typically represented using floating-point variables, which can accurately represent real numbers with a certain precision.

39. The average Monthly Temperature of land and ocean can be considered a(n) ______ data.

Explanation

The answer is "ordered". Average monthly temperatures of land and ocean are spatio-temporal data: each value is tied to a location and a month, so the records have a meaningful order in space and time. That ordering, rather than any sorting of the values themselves, is what makes the data set ordered, and it must be preserved during analysis.

40. What are the purposes of Aggregation:

Explanation

Aggregation serves multiple purposes, including data reduction, change of scale, and obtaining more "stable" data. Data reduction involves summarizing or consolidating large amounts of data into a more manageable form. Change of scale refers to converting data from one level of detail to another, such as aggregating daily data into monthly or yearly data. Aggregation can also help in obtaining more "stable" data by reducing noise or variability in the data, making it more reliable for analysis and decision-making.
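The change of scale from daily to monthly data can be sketched as below. The sales figures and the `YYYY-MM-DD` date strings are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical daily records: (ISO date, sales amount).
daily_sales = [
    ("2023-01-03", 120.0),
    ("2023-01-17", 80.0),
    ("2023-02-02", 50.0),
    ("2023-02-20", 70.0),
    ("2023-02-25", 30.0),
]

# Aggregate by month: the first seven characters of the date are "YYYY-MM".
monthly_sales = defaultdict(float)
for date, amount in daily_sales:
    monthly_sales[date[:7]] += amount
```

Five daily records are reduced to two monthly totals, which shows both data reduction and the change of scale; totals over longer periods also tend to vary less than individual daily values.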

41. _________ is the closeness of repeated measurements (of the same quantity) to other measurements.

Explanation

Precision refers to the closeness or consistency of repeated measurements of the same quantity to each other. It indicates how well the measurements agree with each other and how reproducible they are. A high level of precision means that the measurements are very close to each other, indicating a low level of random error. On the other hand, low precision suggests that the measurements are scattered and less consistent, indicating a higher level of random error. In summary, precision is a measure of the reliability and consistency of measurements.
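Precision and bias can both be estimated from repeated measurements of a known quantity. In the sketch below, the five readings of a 1 g reference weight are illustrative values; precision is taken as the standard deviation of the readings, and bias as the difference between their mean and the true value.

```python
import statistics

true_value = 1.0  # known reference weight in grams
readings = [1.015, 0.990, 1.013, 1.001, 0.986]  # illustrative repeated measurements

mean_reading = statistics.mean(readings)

# Bias: systematic offset of the measurements from the true value.
bias = mean_reading - true_value

# Precision: spread of the repeated measurements around their own mean.
precision = statistics.stdev(readings)
```

Here the bias is about 0.001 g and the standard deviation about 0.013 g, so the readings are close to the true value on average even though individual readings scatter by roughly a hundredth of a gram.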

42. Important Characteristics of Structured Data are:

Explanation

Structured data refers to data that is organized in a predefined format, such as a table with rows and columns. The characteristics mentioned in the question - dimensionality, resolution, and sparsity - are important aspects of structured data.

Dimensionality refers to the number of attributes or features present in the data. Higher dimensionality means there are more attributes, which can provide more detailed information but also increase complexity.

Resolution refers to the level of detail in the data. Higher resolution means more precise and fine-grained data, while lower resolution may result in aggregated or summarized information.

Sparsity refers to the proportion of values in the data that are zero or absent. In sparse data, such as transaction or document data, most attribute values are zero, so often only the non-zero values (presence) need to be stored and processed.

These characteristics are important to consider when working with structured data as they can impact data analysis and decision-making processes.

43. The data that helps to identify substructures is considered to be ___________ data

Explanation

The correct answer is "chemical" because substructures are typically identified based on the chemical composition and arrangement of atoms within a molecule. Chemical data refers to information about the properties, structures, and interactions of chemical compounds, which is crucial for identifying substructures in various chemical systems.

44. What are the methodologies of Feature Creation?

Explanation

The question asks for the methodologies of feature creation. The given answer options include "Feature Extraction," "Mapping Data to New Space," and "Feature Construction." These three options all represent different methods of creating features. Feature extraction involves extracting relevant information from raw data. Mapping data to a new space involves transforming the data into a different representation to create new features. Feature construction involves creating new features by combining existing features or generating new ones based on domain knowledge. Therefore, the correct answer is the combination of these three methodologies.

45. What are the types of sampling:

Explanation

The answer provided lists the types of sampling methods. Random sampling refers to the selection of individuals from a population in a way that each individual has an equal chance of being chosen. Without replacement means that once an individual is selected, they are not put back into the population for future selection. With replacement means that individuals are put back into the population after selection, allowing them to be chosen again. Stratified sampling involves dividing the population into subgroups or strata and then selecting individuals from each stratum. Simplified sampling is not a recognized type of sampling, so it is not included in the explanation.
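The sampling types can be compared side by side with Python's standard `random` module. The population below (90 objects in group "A" and 10 in group "B") and the helper `stratified_sample` are assumptions made for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical population with one large group and one small group.
population = [("A", i) for i in range(90)] + [("B", i) for i in range(10)]

# Sampling without replacement: each object can be picked at most once.
without_replacement = rng.sample(population, 10)

# Sampling with replacement: the same object may be picked more than once.
with_replacement = rng.choices(population, k=10)

def stratified_sample(items, key, per_stratum, rng):
    """Split items into strata by key, then sample randomly within each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(key(item), []).append(item)
    picked = []
    for group in strata.values():
        picked.extend(rng.sample(group, per_stratum))
    return picked

stratified = stratified_sample(population, key=lambda item: item[0], per_stratum=5, rng=rng)
```

Stratified sampling guarantees that the small group "B" is represented, whereas a simple random sample of 10 objects could easily contain none of its members.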

46. Types of data sets are: 

Explanation

The types of data sets are record, graph, and ordered, matching the record, graph, and ordered data sets covered in questions 16, 24, and 25. The remaining options, "Categorial", "Gyroscope", and "Counter", are not types of data sets.

47. What are some techniques and approaches to Feature Subset Selection:

Explanation

The brute force approach involves evaluating all possible combinations of features to find the best subset. This can be computationally expensive for large feature sets. The embedded approach incorporates feature selection as part of the model building process. It selects features based on their importance in the model. The filter approach selects features based on their statistical properties, such as correlation with the target variable. Wrapper approaches use a specific model and evaluate different subsets of features by training and testing the model. They select the subset that gives the best performance.
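As an illustration of the filter approach only, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top two. The feature names, data values, and the choice of correlation as the score are all assumptions; other statistical measures could be used in the same way.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical features and target (exam score).
features = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "shoe_size": [42, 38, 41, 39, 40, 41],
    "practice_tests": [0, 1, 1, 2, 3, 3],
}
target = [52, 58, 65, 70, 78, 85]

# Filter step: score each feature independently of any model.
scores = {name: abs(pearson(values, target)) for name, values in features.items()}
selected = sorted(scores, key=scores.get, reverse=True)[:2]
```

The irrelevant feature `shoe_size` scores close to zero and is dropped before any model is trained, which is what distinguishes a filter from the wrapper and embedded approaches.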

48. Proximity refers to a ____________ and ___________

Explanation

Proximity refers to the closeness or nearness of two or more things. In this context, it refers to the degree of similarity or dissimilarity between these things. When two things have a high degree of similarity, they are considered to be close or proximate to each other. On the other hand, when two things have a high degree of dissimilarity, they are considered to be far or distant from each other. Therefore, proximity can be understood as a measure of both similarity and dissimilarity between objects or concepts.
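Two common proximity measures make the distinction concrete: Euclidean distance is a dissimilarity (larger means less alike), and cosine similarity is a similarity (larger means more alike). The vectors below are illustrative.

```python
import math

def euclidean_distance(a, b):
    """Dissimilarity: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Similarity: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

p, q = (3.0, 4.0), (6.0, 8.0)

distance = euclidean_distance(p, q)   # 5.0: the points are not identical
similarity = cosine_similarity(p, q)  # 1.0: the vectors point the same way
```

The same pair of objects can therefore look far apart under one measure and maximally similar under another, so the choice of proximity measure should match what "alike" means for the data.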

49. Feature Subset Selection consists of which of the following:

Explanation

Feature Subset Selection is a technique used to reduce the dimensionality of data by selecting a subset of relevant features while discarding redundant and irrelevant ones. It involves identifying features that are highly correlated or redundant, as well as features that do not contribute significantly to the prediction or analysis. This process helps in improving the efficiency of machine learning models, reducing overfitting, and enhancing interpretability. Therefore, the correct answer includes "Another way to reduce the dimensionality of data," "Redundant features," "Irrelevant Features," and "Techniques."

50. Which seven of these are part of Data Preprocessing?


Quiz Review Timeline (Updated): May 26, 2023


  • Current Version
  • May 26, 2023
    Quiz Edited by
    ProProfs Editorial Team
  • Feb 16, 2016
    Quiz Created by
    Cloud_XII