Big Data

106 Questions | Attempts: 1576


Questions and Answers
  • 1. 

    The number of maps is usually driven by the total size of

    • A.

      Inputs

    • B.

      Outputs

    • C.

      Tasks

    • D.

      None of the listed options

    Correct Answer
    A. Inputs
  • 2. 

    You want to count the number of occurrences of each unique word in the supplied input data. You have decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then having your reducer increment a counter for each literal 1 it receives. After a successful implementation, it occurs to you that you could optimize this by specifying a combiner. Will you be able to use your existing reducer as your combiner, and why or why not?

    • A.

      Yes, because the sum operation is both associative and commutative, and the input and output types of the reduce method match

    • B.

      No, because the sum operation in the reducer is incompatible with the operation of a combiner

    • C.

      No, because combiners and reducers use different interfaces

    • D.

      No, because mapper and combiner must use the same input data types.

    Correct Answer
    A. Yes, because the sum operation is both associative and commutative, and the input and output types of the reduce method match
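The associativity argument in answer A can be sanity-checked outside Hadoop. Below is a minimal Python simulation (not the Hadoop API): the same summing function is applied once per map split as a "combiner" and again globally as the "reducer", and the final counts come out identical while fewer pairs cross the simulated shuffle.

```python
from collections import defaultdict

def mapper(line):
    # tokenize and emit a (word, 1) pair for every word
    return [(word, 1) for word in line.split()]

def reduce_counts(pairs):
    # sum values per key; usable as reducer AND combiner because
    # addition is associative and commutative and the types match
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

splits = ["the cat sat on the mat", "the cat ran"]
# without a combiner: every raw pair is "shuffled"
raw = [p for s in splits for p in mapper(s)]
# with a combiner: each split is pre-aggregated locally first
combined = [kv for s in splits for kv in reduce_counts(mapper(s)).items()]

assert reduce_counts(raw) == reduce_counts(combined)  # identical final counts
```

The same function serves both roles only because the operation is order-insensitive; an average, for example, could not be combined this way.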
  • 3. 

    ____ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution

    • A.

      Map Parameters

    • B.

      JobConf

    • C.

      MemoryConf

    • D.

      None of the listed options

    Correct Answer
    B. JobConf
  • 4. 

    _____ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer

    • A.

      Partitioner

    • B.

      OutputCollector

    • C.

      Reporter

    • D.

      All of the listed options

    Correct Answer
    B. OutputCollector
  • 5. 

    Which is the default InputFormat defined in Hadoop?

    • A.

      SequenceFileInputFormat

    • B.

      ByteInputFormat

    • C.

      KeyValueInputFormat

    • D.

      TextInputFormat

    Correct Answer
    D. TextInputFormat
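For reference, TextInputFormat presents each line to the mapper as a (byte offset, line text) record. A rough Python sketch of that record-reader behaviour (simplified: it ignores split boundaries and assumes newline delimiters):

```python
def text_records(data: bytes):
    # yield (byte offset, line) pairs, the way TextInputFormat's
    # record reader presents input to the map() method
    offset = 0
    for line in data.split(b"\n"):
        yield offset, line.decode()
        offset += len(line) + 1  # +1 for the newline delimiter

data = b"first line\nsecond line"
records = list(text_records(data))
assert records == [(0, "first line"), (11, "second line")]
```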
  • 6. 

    Which daemon spawns child JVMs to perform MapReduce processing?

    • A.

      JobTracker

    • B.

      NameNode

    • C.

      DataNode

    • D.

      TaskTracker

    • E.

      Secondary NameNode

    Correct Answer
    D. TaskTracker
  • 7. 

    Input to the _______ is the sorted output of the mappers

    • A.

      Reducer

    • B.

      Mapper

    • C.

      Shuffle

    • D.

      All of the listed options

    Correct Answer
    A. Reducer
  • 8. 

    A mapper class must extend which of the following classes?

    • A.

      Mapper

    • B.

      Reducer

    • C.

      Partitioner

    • D.

      Combiner

    Correct Answer
    A. Mapper
  • 9. 

    How many records does the map method read at a time?

    • A.

      1

    • B.

      2

    • C.

      3

    • D.

      4

    Correct Answer
    A. 1
  • 10. 

    How many type arguments does the Mapper class take?

    • A.

      2

    • B.

      3

    • C.

      4

    • D.

      5

    Correct Answer
    C. 4
  • 11. 

    ____ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer

    • A.

      Hadoop Strdata

    • B.

      Hadoop Streaming

    • C.

      Hadoop Stream

    • D.

      None of the listed options

    Correct Answer
    B. Hadoop Streaming
  • 12. 

    ____ is the default Partitioner for partitioning key space

    • A.

      HashPar

    • B.

      Partitioner

    • C.

      HashPartitioner

    • D.

      None of the listed options

    Correct Answer
    C. HashPartitioner
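The HashPartitioner contract (hash of the key modulo the number of reduce tasks) can be illustrated without Hadoop. This Python sketch uses CRC32 as a reproducible stand-in for Java's key.hashCode(); the function name get_partition mirrors the Hadoop method, but the code is only an illustration:

```python
import zlib

def get_partition(key: str, num_reduce_tasks: int) -> int:
    # same contract as Hadoop's HashPartitioner: hash(key) mod numReduceTasks;
    # CRC32 stands in for Java's key.hashCode() so runs are reproducible
    return zlib.crc32(key.encode()) % num_reduce_tasks

partitions = [get_partition(k, 3) for k in ["apple", "banana", "apple", "cherry"]]
assert partitions[0] == partitions[2]        # equal keys -> same reducer
assert all(0 <= p < 3 for p in partitions)   # always a valid reducer index
```

The modulo guarantees every key lands on exactly one reducer, and equal keys always land on the same one.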
  • 13. 

    A ____ service runs on each slave node and is responsible for executing a task assigned to it by the JobTracker

    • A.

      MapReduce

    • B.

      Mapper

    • C.

      TaskTracker

    • D.

      JobTracker

    Correct Answer
    C. TaskTracker
  • 14. 

    Point out the correct statement

    • A.

      MapReduce tries to place the data and the compute as close as possible

    • B.

      Map Task in MapReduce is performed using the Mapper() function

    • C.

      Reduce Task in MapReduce is performed using the Map() function

    • D.

      All of the listed options

    Correct Answer
    A. MapReduce tries to place the data and the compute as close as possible
  • 15. 

    How many records does a mapper receive as input?

    • A.

      1

    • B.

      64

    • C.

      1 to n

    • D.

      0 to n

    Correct Answer
    C. 1 to n
  • 16. 

    The output of the reduce task is typically written to the FileSystem via _____________

    • A.

      OutputCollector.collect

    • B.

      OutputCollector.get

    • C.

      OutputCollector.receive

    • D.

      OutputCollector.put

    Correct Answer
    A. OutputCollector.collect
  • 17. 

    How many instances of JobTracker can run on a Hadoop cluster?

    • A.

      1

    • B.

      2

    • C.

      3

    • D.

      4

    Correct Answer
    A. 1
  • 18. 

    What is the default input format?

    • A.

      TextInputFormat

    • B.

      Testinputformat

    • C.

      Sequence file input format

    • D.

      Custom input format

    Correct Answer
    A. TextInputFormat
  • 19. 

    In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?

    • A.

      Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster

    • B.

      Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run

    • C.

      Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk

    • D.

      Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers

    Correct Answer
    D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers
  • 20. 

    What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?

    • A.

      You will not be able to compress your intermediate data

    • B.

      You will no longer be able to take advantage of a combiner

    • C.

      The output files may not be in globally sorted order

    • D.

      There is no problem

    Correct Answer
    C. The output files may not be in globally sorted order
  • 21. 

    Users can control which keys (and hence records) go to which Reducer by implementing a custom

    • A.

      Partitioner

    • B.

      OutputSplit

    • C.

      Reporter

    • D.

      All of the listed options

    Correct Answer
    A. Partitioner
  • 22. 

    Two files need to be joined over a common column. Which technique is faster, and why?

    • A.

      The reduce-side join is faster, as it receives the records sorted by key

    • B.

      The reduce-side join is faster, as it uses a secondary sort

    • C.

      The map-side join is faster, as it caches the data from one file in memory

    • D.

      The map-side join is faster, as it writes the intermediate data to the local file system

    Correct Answer
    C. The map-side join is faster, as it caches the data from one file in memory
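The map-side join idea from answer C, sketched in plain Python: the smaller data set is held in memory as a dict, and each record of the larger set is joined with a single lookup, so no shuffle or reduce phase is needed. The sample records are made up for the demo:

```python
# smaller data set (e.g. a users file) -- cached in each mapper's memory
users = {"u1": "Alice", "u2": "Bob"}

# larger data set (e.g. an events log), streamed record by record
events = [("u1", "login"), ("u2", "logout"), ("u1", "click")]

# the "map-side join": one dict lookup per record, no shuffle/reduce needed
joined = [(uid, users[uid], action) for uid, action in events if uid in users]

assert joined == [("u1", "Alice", "login"),
                  ("u2", "Bob", "logout"),
                  ("u1", "Alice", "click")]
```

This only works when one side of the join fits in each mapper's memory, which is exactly the precondition for Hadoop's replicated (map-side) join.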
  • 23. 

    You are developing a combiner that takes as input Text keys and IntWritable values, and emits Text keys and IntWritable values. Which interface should your class implement?

    • A.

      Combiner<Text, IntWritable, Text, IntWritable>

    • B.

      Reducer<Text, Text, IntWritable, IntWritable>

    • C.

      Reducer

    • D.

      Reducer<Text, IntWritable, Text, IntWritable>

    Correct Answer
    D. Reducer<Text, IntWritable, Text, IntWritable>
  • 24. 

    Which of the following phases occur simultaneously?

    • A.

      Shuffle and Sort

    • B.

      Reduce and Sort

    • C.

      Shuffle and Map

    • D.

      All of the listed options

    Correct Answer
    A. Shuffle and Sort
  • 25. 

    Combiners increase the efficiency of a MapReduce program because

    • A.

      They provide a mechanism for different mappers to communicate with each other, thereby reducing synchronization overhead

    • B.

      They provide an optimization and reduce the total number of computations that are needed to execute an algorithm by a factor of n; where n is the number of reducers

    • C.

      They aggregate map output locally in each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers

    • D.

      They aggregate intermediate map output to a small number of nearby (i.e., rack-local) machines and therefore reduce the amount of data that needs to be shuffled across to the reducers

    Correct Answer
    C. They aggregate map output locally in each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers
  • 26. 

    You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat and the IdentityReducer. The mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. Determine the difference between setting the number of reducers to zero and setting it to one.

    • A.

      There is no difference in output between the two settings

    • B.

      With zero reducers, no reducer runs and the job throws an exception; with one reducer, instances of matching patterns are stored in a single file on HDFS

    • C.

      With zero reducers, all instances of matching patterns are stored in multiple files on HDFS

    • D.

      With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are collected in one file on HDFS

    Correct Answer
    D. With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are collected in one file on HDFS
  • 27. 

    In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

    • A.

      Increase the parameter that controls minimum splits size in the job configuration

    • B.

      Write a custom MapRunner that iterates over all key-value pairs in the entire file

    • C.

      Set the number of mappers equal to the number of input files you want to process

    • D.

      Write a custom FileInputFormat and override the isSplitable method to always return false

    Correct Answer
    D. Write a custom FileInputFormat and override the isSplitable method to always return false
  • 28. 

    In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

    • A.

      Increase the parameter that controls minimum split size in the job configuration

    • B.

      Write a custom MapRunner that iterates over all key-value pairs in the entire file

    • C.

      Set the number of mappers equal to the number of input files you want to process

    • D.

      Write a custom FileInputFormat and override the isSplitable method to always return false

    Correct Answer
    D. Write a custom FileInputFormat and override the isSplitable method to always return false
  • 29. 

    Point out the correct statement

    • A.

      Applications can use the Reporter to report progress

    • B.

      The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job

    • C.

      The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format

    • D.

      All of the listed options

    Correct Answer
    D. All of the listed options
  • 30. 

    Which of the following is not a daemon process that runs on a Hadoop cluster?

    • A.

      JobTracker

    • B.

      DataNode

    • C.

      TaskTracker

    • D.

      TaskNode

    Correct Answer
    D. TaskNode
  • 31. 

    When does the combiner run?

    • A.

      Before mapper phase

    • B.

      After mapper phase

    • C.

      After reducer phase

    • D.

      None of the listed options

    Correct Answer
    B. After mapper phase
  • 32. 

    Which of the following takes the form of a key with a list of values?

    • A.

      Mapper input

    • B.

      Mapper output

    • C.

      Reducer input

    • D.

      Reducer output

    Correct Answer
    C. Reducer input
  • 33. 

    Point out the wrong statement

    • A.

      A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner

    • B.

      The MapReduce framework operates exclusively on <key, value> pairs

    • C.

      Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods

    • D.

      None of the listed options

    Correct Answer
    D. None of the listed options
  • 34. 

    What is the command you will use to run a driver named "SalesAnalysis" whose compiled code is available in the jar file "SalesAnalytics.jar", with input data in the directory "/sales/data" and output in the directory "/sales/analytics"?

    • A.

      hadoop fs -jar SalesAnalytics.jar SalesAnalysis -input /sales/data -output /sales/analysis

    • B.

      hadoop fs jar SalesAnalytics.jar -input /sales/data -output /sales/analysis

    • C.

      hadoop -jar SalesAnalytics.jar SalesAnalysis -input /sales/data -output /sales/analysis

    • D.

      hadoop jar SalesAnalytics.jar SalesAnalysis /sales/data /sales/analysis

    Correct Answer
    D. hadoop jar SalesAnalytics.jar SalesAnalysis /sales/data /sales/analysis
  • 35. 

    In a MapReduce job, the reducer receives all values associated with the same key. Which statement is most accurate about the ordering of these values?

    • A.

      The values are in sorted order

    • B.

      The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job

    • C.

      The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering

    • D.

      Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values

    Correct Answer
    B. The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job
  • 36. 

    What is the maximum limit on the key-value pairs that a mapper can emit?

    • A.

      It is equivalent to the number of lines in the input files

    • B.

      It is equivalent to the number of times the map() method is called in the mapper task

    • C.

      There is no such restriction. It depends on the use case and logic

    • D.

      10000

    Correct Answer
    C. There is no such restriction. It depends on the use case and logic
  • 37. 

    Which of the following is not a phase of the Reducer?

    • A.

      Map

    • B.

      Reduce

    • C.

      Shuffle

    • D.

      Sort

    Correct Answer
    A. Map
  • 38. 

    A MapReduce program takes a text file where each line break marks one complete record, and the line offset is the key. The map method parses each record into words, and for each word it creates multiple key-value pairs where the keys are the words themselves and the values are the characters in the word. The reducer finds the characters used in each unique word. This program may not be a perfect program, but it works correctly. The problem is that it creates more key-value pairs in the intermediate output of the mappers from a single input key-value pair. This leads to an increase in which of the following? (Select the correct answer)

    • A.

      Disk I/O and network traffic

    • B.

      Memory footprint of mappers and network traffic

    • C.

      Disk I/O and memory footprint of mappers

    • D.

      Block size and disk I/O

    Correct Answer
    A. Disk I/O and network traffic
  • 39. 

    In one job, how many combiner tasks will run?

    • A.

      Equal to block size

    • B.

      Equal to number of mapper tasks

    • C.

      Equal to number of reducer tasks

    • D.

      Equal to number of replications

    Correct Answer
    B. Equal to number of mapper tasks
  • 40. 

    During the standard sort and shuffle phase of MapReduce, keys and values are passed to reducers. Which of the following is true?

    • A.

      Keys are presented to a reducer in sorted order; values for a given key are not sorted

    • B.

      Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order

    • C.

      Keys are presented to a reducer in random order; values for a given key are not sorted

    • D.

      Keys are presented to a reducer in random order; values for a given key are sorted in ascending order

    Correct Answer
    A. Keys are presented to a reducer in sorted order; values for a given key are not sorted
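That contract -- sorted keys, unsorted values -- can be mimicked in a few lines of Python (a simulation of the shuffle's observable behaviour, not Hadoop's actual implementation):

```python
from collections import defaultdict

map_output = [("b", 3), ("a", 1), ("b", 1), ("a", 2)]

groups = defaultdict(list)
for key, value in map_output:
    groups[key].append(value)        # values kept in arrival order, NOT sorted

reducer_input = sorted(groups.items())  # the framework sorts by key only

assert [k for k, _ in reducer_input] == ["a", "b"]  # keys arrive sorted
assert dict(reducer_input)["b"] == [3, 1]           # value order is incidental
```

If sorted values are needed, a secondary sort (composite key plus a custom comparator) has to be implemented explicitly.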
  • 41. 

    What is a Sequence File?

    • A.

      A Sequence File contains a binary encoding of an arbitrary number of homogeneous writable objects

    • B.

      A Sequence File contains a binary encoding of an arbitrary number of heterogeneous writable objects

    • C.

      A Sequence File contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order

    • D.

      A Sequence File contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type, and each value must be the same type

    Correct Answer
    D. A Sequence File contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type, and each value must be the same type
  • 42. 

    One large data set has few distinct keys, but each key occurs many times in the data. A single reducer may not be able to process the whole data set, so you decide to create one reducer task per key range. Which component will you use to ensure each key is processed by the appropriate reducer?

    • A.

      Combiner

    • B.

      OOZIE

    • C.

      PIG

    • D.

      TotalOrderPartitioner

    Correct Answer
    D. TotalOrderPartitioner
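The range-partitioning idea behind TotalOrderPartitioner can be sketched in Python with made-up split points: each reducer owns one contiguous key range, so concatenating the reducers' sorted outputs yields one globally sorted result.

```python
import bisect

# hypothetical split points dividing the key space into 3 reducer ranges:
# keys < "g" -> reducer 0, "g" <= keys < "p" -> reducer 1, keys >= "p" -> reducer 2
split_points = ["g", "p"]

def range_partition(key: str) -> int:
    # find which range the key falls into
    return bisect.bisect_right(split_points, key)

assert range_partition("apple") == 0
assert range_partition("grape") == 1
assert range_partition("zebra") == 2
```

In Hadoop, the split points are usually derived by sampling the input so the ranges carry roughly equal load.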
  • 43. 

    Which of the following statements best describes how a large (100 GB) file is stored in HDFS?

    • A.

      The file is divided into variable-size blocks, which are stored on multiple data nodes. Each block is replicated three times by default

    • B.

      The file is replicated three times by default. Each copy of the file is stored on a separate data nodes

    • C.

      The master copy of the file is stored on a single data node. The replica copies are divided into fixed-size blocks, which are stored on multiple data nodes

    • D.

      The file is divided into fixed-size blocks, which are stored on multiple data nodes. Each block is replicated three times by default. Multiple blocks from the same file might reside on the same data node

    • E.

      The file is divided into fixed-size blocks, which are stored on multiple data nodes. Each block is replicated three times by default. HDFS guarantees that different blocks from the same file are never on the same data node

    Correct Answer
    D. The file is divided into fixed-size blocks, which are stored on multiple data nodes. Each block is replicated three times by default. Multiple blocks from the same file might reside on the same data node
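A quick back-of-the-envelope check of what fixed-size blocks and 3x replication mean for a 100 GB file, assuming the common 128 MB default block size (older clusters used 64 MB):

```python
import math

file_size_mb = 100 * 1024   # 100 GB expressed in MB
block_size_mb = 128         # assumed HDFS default block size
replication = 3             # HDFS default replication factor

num_blocks = math.ceil(file_size_mb / block_size_mb)
total_replicas = num_blocks * replication

assert num_blocks == 800
assert total_replicas == 2400  # block replicas spread across the datanodes
```

With 800 blocks on a cluster of, say, 50 data nodes, it is unavoidable that many blocks of the same file share a node, which is why no guarantee to the contrary exists.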
  • 44. 

    Which of the following is wrong?

    • A.

      Number of mapper tasks is equal to input splits

    • B.

      Number of mapper tasks is equal to number of combiner tasks

    • C.

      Number of mapper tasks is equal to number of reducer tasks

    • D.

      Number of input splits depends on block size

    Correct Answer
    C. Number of mapper tasks is equal to number of reducer tasks
  • 45. 

    Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.

    • A.

      TaskTracker

    • B.

      NameNode

    • C.

      DataNode

    • D.

      JobTracker

    Correct Answer
    D. JobTracker
  • 46. 

    The number of reduces for the job is set by the user via

    • A.

      JobConf.setNumTasks(int)

    • B.

      JobConf.setNumReduceTasks(int)

    • C.

      JobConf.setNumMapTasks(int)

    • D.

      All of the listed options

    Correct Answer
    B. JobConf.setNumReduceTasks(int)
  • 47. 

    Which of the following best describes the workings of TextInputFormat?

    • A.

      Input file splits may cross line boundaries. A line that crosses file splits is ignored

    • B.

      The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines

    • C.

      Input file splits may cross line boundaries. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line

    • D.

      Input file splits may cross line

    • E.

      Input file splits may cross line boundary. A line that crosses file splits is read by the RecordReader of split that contains the beginning of the broken line

    Correct Answer
    E. Input file splits may cross line boundary. A line that crosses file splits is read by the RecordReader of split that contains the beginning of the broken line
  • 48. 

    Which of the following is a valid flow in Hadoop?

    • A.

      Input -> Reducer -> Mapper -> Combiner -> Output

    • B.

      Input -> Mapper -> Reducer -> Combiner -> Output

    • C.

      Input -> Mapper -> Combiner -> Reducer -> Output

    • D.

      Input -> Reducer -> Combiner -> Mapper -> Output

    Correct Answer
    C. Input -> Mapper -> Combiner -> Reducer -> Output
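The winning flow can be traced end to end in plain Python for word count, with each stage a separate function (a conceptual simulation, not Hadoop code):

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def run_mapper(line):
    return [(word, 1) for word in line.split()]

def run_combiner(pairs):
    # local aggregation of one mapper's output
    local = defaultdict(int)
    for k, v in pairs:
        local[k] += v
    return list(local.items())

def shuffle(pair_lists):
    # merge all map outputs, grouped and sorted by key
    merged = sorted(p for pairs in pair_lists for p in pairs)
    return [(k, [v for _, v in grp]) for k, grp in groupby(merged, key=itemgetter(0))]

def run_reducer(key, values):
    return key, sum(values)

splits = ["to be or", "not to be"]                                # Input
map_outputs = [run_mapper(s) for s in splits]                     # Mapper
combined = [run_combiner(m) for m in map_outputs]                 # Combiner
result = dict(run_reducer(k, vs) for k, vs in shuffle(combined))  # Reducer -> Output

assert result == {"to": 2, "be": 2, "or": 1, "not": 1}
```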
  • 49. 

    MapReduce was devised by

    • A.

      Apple

    • B.

      Google

    • C.

      Microsoft

    • D.

      Samsung

    Correct Answer
    B. Google
  • 50. 

    Which of the following should be used when possible to improve performance?

    • A.

      Combiner

    • B.

      Partitioner

    • C.

      Comparator

    • D.

      Reducer

    • E.

      All of the listed options

    Correct Answer
    A. Combiner

Quiz Review Timeline

Our quizzes are rigorously reviewed, monitored and continuously updated by our expert board to maintain accuracy, relevance, and timeliness.

  • Current Version
  • Mar 21, 2022
    Quiz Edited by
    ProProfs Editorial Team
  • Jan 13, 2017
    Quiz Created by
    Datawh023