Data Science Analysis Questions

14 Questions | Total Attempts: 2064

Questions and Answers
  • 1. 
    Data has been collected on visitors' viewing habits at a bank's website. Which technique is used to identify pages commonly viewed during the same visit to the website?
    • A. 

      Clustering

    • B. 

      Association Rules

    • C. 

      Classification

    • D. 

      Regression
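
The idea of "pages commonly viewed during the same visit" can be made concrete by counting how often pairs of pages co-occur. The following is only a minimal sketch: the `visits` data and the `pair_support` helper are invented for illustration and are not part of the quiz.

```python
from collections import Counter
from itertools import combinations

# Hypothetical visit logs: each inner list holds the pages seen in one visit.
visits = [
    ["home", "loans", "rates"],
    ["home", "rates"],
    ["home", "loans", "contact"],
    ["loans", "rates"],
]

def pair_support(visits):
    """Return the fraction of visits in which each pair of pages co-occurs."""
    counts = Counter()
    for pages in visits:
        # sorted(set(...)) gives each unordered pair one canonical form per visit
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    return {pair: n / len(visits) for pair, n in counts.items()}

support = pair_support(visits)
for pair, share in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, round(share, 2))
```

Pairs with a high share of visits are candidates for "viewed together" patterns; a real analysis would typically also apply minimum support and confidence thresholds.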

  • 2. 
    You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted, it is not completing. What should you do?
    • A. 

      Ensure that a DataNode is running

    • B. 

      Ensure that the JobTracker is running

    • C. 

      Ensure that the NameNode is running

    • D. 

      Ensure that the TaskTracker is running

  • 3. 
    You have been assigned to run a logistic regression model for each of 100 countries, and all the data is currently stored in a PostgreSQL database. Which tool/library would you use to produce these models with the least effort?
    • A. 

      MADlib

    • B. 

      Mahout

    • C. 

      RStudio

    • D. 

      HBase

  • 4. 
    What describes the use of the UNION clause in a SQL statement?
    • A. 

      Operates on queries and potentially increases the number of rows

    • B. 

      Operates on queries and potentially decreases the number of rows

    • C. 

      Operates on tables and potentially decreases the number of columns

    • D. 

      Operates on both tables and queries and potentially increases both the number of rows and columns
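
To see how UNION behaves in practice, the sketch below combines two queries in an in-memory SQLite database. The table names `east` and `west` and their contents are made up for this example; other database engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE east (customer TEXT);
    CREATE TABLE west (customer TEXT);
    INSERT INTO east VALUES ('ann'), ('bob');
    INSERT INTO west VALUES ('bob'), ('cy');
""")

# UNION stacks the rows of two SELECT results and drops duplicate rows.
union_rows = conn.execute(
    "SELECT customer FROM east UNION SELECT customer FROM west ORDER BY customer"
).fetchall()

# UNION ALL stacks the rows and keeps duplicates.
union_all_rows = conn.execute(
    "SELECT customer FROM east UNION ALL SELECT customer FROM west"
).fetchall()

print(union_rows)
print(len(union_all_rows))
conn.close()
```

In both forms the column list stays the same; only the number of rows in the combined result changes.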

  • 5. 
    When would you use a Wilcoxon rank-sum test?
    • A. 

      When you cannot make an assumption about the distribution of the populations

    • B. 

      When the data can easily be sorted

    • C. 

      When the populations represent the sums of other values

    • D. 

      When the data cannot easily be sorted
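
For readers unfamiliar with the test, the sketch below computes the Wilcoxon rank-sum statistic by hand: pool both samples, rank the pooled values (averaging ranks for ties), and sum the ranks of the first sample. The sample values and helper names are hypothetical, and the p-value step is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_statistic(sample_a, sample_b):
    """Sum of the pooled ranks that belong to sample_a."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    return sum(ranks[:len(sample_a)])

# Hypothetical measurements from two groups
group_a = [12, 15, 15]
group_b = [15, 20, 31]
w_a = rank_sum_statistic(group_a, group_b)
print(w_a)
```

Because the statistic depends only on ranks, it does not rely on the raw values following any particular distribution.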

  • 6. 
    In the MapReduce framework, what is the purpose of the Reduce function?
    • A. 

      It aggregates the results of the Map function and generates processed output

    • B. 

      It distributes the input to multiple nodes for processing

    • C. 

      It writes the output of the Map function to storage

    • D. 

      It breaks the input into smaller components and distributes to other nodes in the cluster
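
The roles of the Map and Reduce steps can be illustrated with a single-process word count. This is a toy sketch in plain Python, not Hadoop code; the function names `map_fn`, `shuffle`, and `reduce_fn` are chosen for this example only.

```python
from collections import defaultdict

def map_fn(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)

lines = ["big data big models", "data pipelines"]
mapped = [pair for line in lines for pair in map_fn(line)]
result = dict(reduce_fn(key, values) for key, values in shuffle(mapped).items())
print(result)
```

In a real cluster the map and reduce calls run on different nodes and the grouping step moves data across the network.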

  • 7. 
    A Data Scientist is assigned to build a model from a reporting data warehouse. The warehouse contains data collected from many sources and transformed through a complex, multi-stage ETL process. What is a concern the data scientist should have about the data?
    • A. 

      It is too processed

    • B. 

      It is not structured

    • C. 

      It is not normalized

    • D. 

      It is too centralized

  • 8. 
    You are given a list of pre-defined association rules:
      A) RENTER => BAD CREDIT
      B) RENTER => GOOD CREDIT
      C) HOME OWNER => BAD CREDIT
      D) HOME OWNER => GOOD CREDIT
      E) FREE HOUSING => BAD CREDIT
      F) FREE HOUSING => GOOD CREDIT
    For your next analysis, you must limit your dataset based on rules with confidence greater than 60%. Which of the rules will be kept in the analysis?
    • A. 

      Rules B and D

    • B. 

      Rules A and F

    • C. 

      Rules C and E

    • D. 

      Rules D and E
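
The confidence of a rule X => Y is the share of records containing X that also contain Y. The sketch below applies a 60% confidence filter to an unrelated, invented grocery data set; the counts behind the credit rules in this question are not reproduced here, so nothing below indicates the answer.

```python
def rule_confidence(transactions, antecedent, consequent):
    """confidence(X => Y) = count(X and Y) / count(X)."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)

# Hypothetical transactions, unrelated to the credit data in the question
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter"},
    {"bread", "butter"},
]
rules = [("bread", "butter"), ("bread", "jam"), ("butter", "jam")]

# Keep only the rules whose confidence exceeds the 60% threshold
kept = [rule for rule in rules if rule_confidence(transactions, *rule) > 0.6]
print(kept)
```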

  • 9. 
    You have run a linear regression model against your data, and have plotted true outcome versus predicted outcome. The R-squared of your model is 0.75. What is your assessment of the model?
    • A. 

      The R-squared may be biased upwards by the extreme-valued outcomes. Remove them and refit to get a better idea of the model’s quality over typical data.

    • B. 

      The R-squared is good. The model should perform well.

    • C. 

      The extreme-valued outliers may negatively affect the model’s performance. Remove them to see if the R-squared improves over typical data.

    • D. 

      The observations seem to come from two different populations, but this model fits them both equally well.
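
How strongly a few extreme points can move R-squared is easy to check numerically. The sketch below fits a least-squares line with and without one extreme observation; the data are invented, and the plot this question refers to is not reproduced here, so this shows only a general effect.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """R^2 = 1 - SS_residual / SS_total for the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: five typical points with no clear trend, plus one extreme point
typical_x, typical_y = [1, 2, 3, 4, 5], [2, 1, 3, 1, 2]
r2_typical = r_squared(typical_x, typical_y)
r2_all = r_squared(typical_x + [50], typical_y + [50])
print(round(r2_typical, 3), round(r2_all, 3))
```

With these numbers, the single extreme point accounts for almost all of the fit quality, which is why a model is often refit on typical data before its R-squared is trusted.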

  • 10. 
    A data scientist is asked to implement an article recommendation feature for an on-line magazine. The magazine does not want to use client tracking technologies such as cookies or reading history. Therefore, only the style and subject matter of the current article is available for making recommendations. All of the magazine’s articles are stored in a database in a format suitable for analytics. Which method should the data scientist try first?
    • A. 

      K Means Clustering

    • B. 

      K Means Clustering

    • C. 

      Logistic Regression

    • D. 

      Association Rules

  • 11. 
    Imagine you are trying to hire a Data Scientist for your team. In addition to technical ability and quantitative background, which additional essential trait would you look for in people applying for this position?
    • A. 

      Communication skill

    • B. 

      Scientific background

    • C. 

      Domain expertise

    • D. 

      Well Organized

  • 12. 
    An analyst is searching a corpus of documents for the topic “solid state disk”. In the Exhibit, Table A provides the inverse document frequency for each term across the corpus. Table B provides each term’s frequency in four documents selected from the corpus. Which of the four documents is most relevant to the analyst’s search?
    • A. 

      Document B

    • B. 

      Document A

    • C. 

      Document C

    • D. 

      Document D
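
A common way to score relevance from these two tables is to sum, over the query terms, each term's frequency in the document multiplied by its inverse document frequency. The sketch below uses invented numbers and document names (`doc1`, `doc2`); the exhibit's actual tables are not available here, so the result says nothing about Documents A to D.

```python
def relevance(query_terms, term_freqs, idf):
    """TF-IDF score: sum of term frequency times inverse document frequency."""
    return sum(term_freqs.get(t, 0) * idf.get(t, 0.0) for t in query_terms)

query = ["solid", "state", "disk"]

# Hypothetical IDF values and term counts, not the ones from the exhibit
idf = {"solid": 1.0, "state": 0.5, "disk": 2.0}
docs = {
    "doc1": {"solid": 3, "state": 4, "disk": 0},
    "doc2": {"solid": 1, "state": 1, "disk": 2},
}

scores = {name: relevance(query, tf, idf) for name, tf in docs.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Note that a document with fewer total matches can still score higher when it contains the rarer, higher-IDF terms.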

  • 13. 
    Consider the training data set shown in the exhibit. What are the classification (Y = 0 or 1) and the probability of the classification for the tuple X(1, 0, 0) using Naive Bayesian classifier?
    • A. 

      Classification Y = 0, Probability = 4/54

    • B. 

      Classification Y = 1, Probability = 4/54

    • C. 

      Classification Y = 0, Probability = 1/54

    • D. 

      Classification Y = 1, Probability = 1/54

  • 14. 
    The exhibit provides the decision tree for predicting whether someone is a good or bad credit risk. What would be the assigned probability, p(good), of a single male with no known savings?
    • A. 

      0.83

    • B. 

      0

    • C. 

      0.498

    • D. 

      0.6
