Big Data

106 Questions | Total Attempts: 1567

Questions and Answers
  • 1. 
    The number of maps is usually driven by the total size of
    • A. 

      Inputs

    • B. 

      Outputs

    • C. 

      Tasks

    • D. 

      None of the listed options

  • 2. 
    You want to count the number of occurrences of each unique word in the supplied input data. You have decided to implement this by having your mapper tokenize each word and emit the literal value 1, and then having your reducer increment a counter for each literal 1 it receives. After the successful implementation it occurs to you that you could optimise this by specifying a combiner. Will you be able to use your existing reducer as your combiner, and why or why not?
    • A. 

      Yes, because the sum operation is both associative and commutative, and the input and output types of the reduce method match

    • B. 

      No, because the sum operation in the reducer is incompatible with the operation of a combiner

    • C. 

      No, because combiners and reducers use different interfaces

    • D. 

      No, because mapper and combiner must use the same input data types.
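
The key fact behind option A can be checked directly: because integer addition is associative and commutative, running the reduce function as a local combiner before the shuffle cannot change the final counts. A minimal Python simulation of this (illustrative only; the function names are not Hadoop APIs):

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every token, as in the classic word count.
    return [(word, 1) for word in line.split()]

def reducer(key, values):
    # Sum is associative and commutative, so this same function
    # can safely double as a combiner.
    return key, sum(values)

def run(lines, use_combiner):
    groups = defaultdict(list)
    for line in lines:
        pairs = mapper(line)
        if use_combiner:
            # Local aggregation of this mapper's output before the "shuffle".
            local = defaultdict(list)
            for k, v in pairs:
                local[k].append(v)
            pairs = [reducer(k, vs) for k, vs in local.items()]
        for k, v in pairs:
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in groups.items())
```

Running `run` with and without the combiner step yields identical word counts, which is exactly why the existing reducer is reusable.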

  • 3. 
    ____ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution
    • A. 

      Map Parameters

    • B. 

      JobConf

    • C. 

      MemoryConf

    • D. 

      None of the listed options

  • 4. 
    _____ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer
    • A. 

      Partitioner

    • B. 

      OutputCollector

    • C. 

      Reporter

    • D. 

      All of the listed options

  • 5. 
    Which is the default InputFormat defined in Hadoop?
    • A. 

      SequenceFileInputFormat

    • B. 

      ByteInputFormat

    • C. 

      KeyValueInputFormat

    • D. 

      TextInputFormat
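
For context on why TextInputFormat is the default: it hands each map() call the line's starting byte offset as the key and the line's text as the value. A rough Python imitation of that record-reader behaviour (the function name here is made up, not a Hadoop API):

```python
def text_input_format(data: bytes):
    """Yield (byte_offset, line_text) pairs, mimicking TextInputFormat's
    record reader: key = offset of the line in the file, value = the line."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip(b"\r\n").decode()
        offset += len(line)
```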

  • 6. 
    Which daemon spawns child JVMs to perform MapReduce processing?
    • A. 

      JobTracker

    • B. 

      NameNode

    • C. 

      DataNode

    • D. 

      TaskTracker

    • E. 

      Secondary NameNode

  • 7. 
    Input to the _______ is the sorted output of the mappers
    • A. 

      Reducer

    • B. 

      Mapper

    • C. 

      Shuffle

    • D. 

      All of the listed options

  • 8. 
    A custom mapper class must extend which of the following classes?
    • A. 

      Mapper

    • B. 

      Reducer

    • C. 

      Partitioner

    • D. 

      Combiner

  • 9. 
    Map method will read how many records at a time?
    • A. 

      1

    • B. 

      2

    • C. 

      3

    • D. 

      4

  • 10. 
    How many type arguments does the Mapper class take?
    • A. 

      2

    • B. 

      3

    • C. 

      4

    • D. 

      5

  • 11. 
    ____ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer
    • A. 

      Hadoop Strdata

    • B. 

      Hadoop Streaming

    • C. 

      Hadoop Stream

    • D. 

      None of the listed options
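
Hadoop Streaming (the answer here) runs any executable that reads records from stdin and writes tab-separated key/value pairs to stdout. A word-count mapper one might submit with the `-mapper` flag could look like this sketch (illustrative, not taken from the Hadoop distribution):

```python
def stream_mapper(lines):
    """Word-count mapper for Hadoop Streaming: reads text lines
    (sys.stdin in a real job) and emits tab-separated key/value pairs."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"
```

In a real job this would be fed `sys.stdin`, print each emitted record, and be submitted along the lines of `hadoop jar hadoop-streaming.jar -input in -output out -mapper mapper.py -reducer reducer.py`.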

  • 12. 
    ____ is the default Partitioner for partitioning key space
    • A. 

      HashPar

    • B. 

      Partitioner

    • C. 

      HashPartitioner

    • D. 

      None of the listed options
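
HashPartitioner routes each record by computing `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below mimics that formula in Python, including Java's `String.hashCode`, so the arithmetic can be checked outside a cluster (assumes ASCII keys; function names are illustrative):

```python
def java_string_hash(key: str) -> int:
    # Java's String.hashCode(): h = 31*h + c, in 32-bit arithmetic.
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def hash_partition(key: str, num_reduce_tasks: int) -> int:
    # Hadoop's HashPartitioner: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    # Masking with 0x7FFFFFFF reproduces Java's `& Integer.MAX_VALUE` on the raw bits.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks
```

Because the partition depends only on the key's hash, every occurrence of the same key lands on the same reducer, which is what makes grouping by key work.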

  • 13. 
    A ____ service runs on the slave node and is responsible for executing a task assigned to it by the JobTracker
    • A. 

      MapReduce

    • B. 

      Mapper

    • C. 

      TaskTracker

    • D. 

      JobTracker

  • 14. 
    Point out the correct statement
    • A. 

      MapReduce tries to place the data and the compute as close as possible

    • B. 

      Map Task in MapReduce is performed using the Mapper() function

    • C. 

      Reduce Task in MapReduce is performed using the Map() function

    • D. 

      All of the listed options

  • 15. 
    How many records does a mapper receive as input?
    • A. 

      1

    • B. 

      64

    • C. 

      1 to n

    • D. 

      0 to n

  • 16. 
    The output of the reduce task is typically written to the FileSystem via _____________
    • A. 

      OutputCollector.collect

    • B. 

      OutputCollector.get

    • C. 

      OutputCollector.receive

    • D. 

      OutputCollector.put

  • 17. 
    How many instances of the JobTracker can run on a Hadoop cluster?
    • A. 

      1

    • B. 

      2

    • C. 

      3

    • D. 

      4

  • 18. 
    What is the default input format?
    • A. 

      TextInputFormat

    • B. 

      KeyValueInputFormat

    • C. 

      SequenceFileInputFormat

    • D. 

      Custom input format

  • 19. 
    In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?
    • A. 

      Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster

    • B. 

      Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run

    • C. 

      Because combiners perform local aggregation of word counts, and then transfer that data to the reducers without writing the intermediate data to disk

    • D. 

      Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers
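
Option D can be made concrete by counting the key-value pairs that would leave each mapper with and without local aggregation. A toy Python model (illustrative only; not Hadoop code):

```python
from collections import Counter

def shuffle_volume(map_outputs, use_combiner):
    """Count the key-value pairs the mappers would send across the network.
    map_outputs is one list of (word, 1) pairs per mapper."""
    total = 0
    for pairs in map_outputs:
        if use_combiner:
            # Local aggregation: one (word, partial_sum) pair per distinct word.
            pairs = list(Counter(w for w, _ in pairs).items())
        total += len(pairs)
    return total
```

With repeated words per mapper, the combined output is strictly smaller, and it is exactly this shuffle traffic that the combiner saves.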

  • 20. 
    What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?
    • A. 

      You will not be able to compress your intermediate data

    • B. 

      You will no longer be able to take advantage of a combiner

    • C. 

      The output files may not be in globally sorted order

    • D. 

      There is no problem
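
The usual answer is the sort-order option: each reduce task sorts only the keys in its own partition, so concatenating the per-reducer output files generally does not yield a globally sorted result. A small simulation (toy hash function, not Hadoop's):

```python
def reducer_outputs(keys, num_reducers):
    """Hash-partition keys, then sort within each partition, the way
    each reduce task sorts only its own input."""
    parts = [[] for _ in range(num_reducers)]
    for k in keys:
        parts[sum(map(ord, k)) % num_reducers].append(k)  # toy hash
    return [sorted(p) for p in parts]
```

Each output "file" comes back sorted, but their concatenation is usually not, which is why totally ordered output needs something like a TotalOrderPartitioner instead.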

  • 21. 
    Users can control which keys (and hence records) go to which Reducer by implementing a custom
    • A. 

      Partitioner

    • B. 

      OutputSplit

    • C. 

      Reporter

    • D. 

      All of the listed options

  • 22. 
    Two files need to be joined over a common column. Which technique is faster, and why?
    • A. 

      The reduce-side join is faster, as it receives the records sorted by key

    • B. 

      The reduce-side join is faster, as it uses a secondary sort

    • C. 

      The map-side join is faster, as it caches the data from one file in memory

    • D. 

      The map-side join is faster, as it writes the intermediate data to the local file system
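
The map-side join (option C) wins when one input fits in memory: the small file is loaded into a hash table on every mapper (typically via the distributed cache) and each record of the large file is joined as it streams past, with no shuffle at all. A hedged Python sketch of the idea (function and variable names are made up for illustration):

```python
def map_side_join(small_file_rows, big_file_rows, key_idx=0):
    """Join two row lists on a common column, hash-joining the smaller side."""
    # Build an in-memory lookup from the smaller input; in Hadoop this table
    # would typically be shipped to every mapper via the distributed cache.
    lookup = {row[key_idx]: row for row in small_file_rows}
    for row in big_file_rows:               # streamed record by record
        match = lookup.get(row[key_idx])
        if match is not None:
            yield row + match[1:]           # emit the joined record; no shuffle
```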

  • 23. 
    You are developing a combiner that takes as input Text keys and IntWritable values, and emits Text keys and IntWritable values. Which interface should your class implement?
    • A. 

      Combiner<Text, IntWritable, Text, IntWritable>

    • B. 

      Reducer<Text, Text, IntWritable, IntWritable>

    • C. 

      Reducer

    • D. 

      Reducer<Text, IntWritable, Text, IntWritable>

  • 24. 
    Which of the following phases occur simultaneously?
    • A. 

      Shuffle and Sort

    • B. 

      Reduce and Sort

    • C. 

      Shuffle and Map

    • D. 

      All of the listed options

  • 25. 
    Combiners increase the efficiency of a MapReduce program because
    • A. 

      They provide a mechanism for different mappers to communicate with each other, thereby reducing synchronization overhead

    • B. 

      They provide an optimization and reduce the total number of computations needed to execute an algorithm by a factor of n, where n is the number of reducers

    • C. 

      They aggregate map output locally on each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers

    • D. 

      They aggregate intermediate map output on a small number of nearby (i.e., rack-local) machines and therefore reduce the amount of data that needs to be shuffled across to the reducers