Certified Developer Quiz: How Much Do You Know About Apache Hadoop?

Approved & Edited by ProProfs Editorial Team
By Ssashi
Community Contributor
Quizzes Created: 1 | Total Attempts: 518
Questions: 10 | Attempts: 518


Are you a developer? Why not take this fun and informative Certified Developer quiz? All of the questions are designed to test your knowledge of Apache Hadoop and to help you learn something new about it. Please read each question carefully before answering. You can also use this quiz to refresh your knowledge before an upcoming exam. There is no time limit, so feel free to retake the quiz as many times as you like. Keep learning and have fun!


Questions and Answers
  • 1. 

    Data locality is considered when scheduling

    • A.

      Job tracker

    • B.

      Map task

    • C.

      Reduce task

    • D.

      Task tracker

    Correct Answer
    B. Map task
    Explanation
    The Job tracker and Task tracker are daemons, not units of work, so they are not themselves scheduled. Data locality cannot be taken into account for reduce tasks because they depend on the output of the map tasks, which may have been produced on many different nodes. Map tasks, by contrast, are scheduled with data locality in mind, ideally on the node (or rack) that holds the HDFS block they read.
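
    As a small illustration of where that locality information comes from (the class name, host names, and the 128 MB length below are hypothetical examples): every InputSplit reports the hosts holding its data via getLocations(), and the scheduler uses that list to try to run the corresponding map task on, or close to, one of those nodes.

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.lib.input.FileSplit;

        public class LocalityDemo {
            public static void main(String[] args) throws Exception {
                // A FileSplit records the file, offset, length and the hosts that
                // hold the underlying HDFS block replicas (host names are made up).
                InputSplit split = new FileSplit(
                        new Path("/data/input/part-00000"), 0L, 128L * 1024 * 1024,
                        new String[] {"worker-node-01", "worker-node-02", "worker-node-03"});
                for (String host : split.getLocations()) {
                    System.out.println("Preferred host for this map task: " + host);
                }
            }
        }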


  • 2. 

    Task scheduling is handled by

    • A.

      Reduce task

    • B.

      Task tracker

    • C.

      Map task

    • D.

      Job tracker

    Correct Answer
    D. Job tracker
    Explanation
    In Hadoop, task scheduling is handled by the Job tracker. The Job tracker assigns map and reduce tasks to the available Task trackers in the cluster, keeps track of the overall progress of the job, and coordinates their execution. It also handles task failures and reschedules failed tasks when necessary.


  • 3. 

    Input splits are created by

    • A.

      Driver program

    • B.

      Job tracker

    • C.

      Map task

    • D.

      Reduce task

    Correct Answer
    A. Driver program
    Explanation
    The correct answer is the Driver program because it is responsible for dividing the input data into smaller chunks called input splits. These input splits are then assigned to the map tasks for processing. The Driver program determines the number and size of the input splits based on the input data size and the configuration settings.
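
    As an illustration, here is a minimal, hypothetical driver sketch using the org.apache.hadoop.mapreduce API (input and output paths come from the command line; the default identity Mapper and Reducer are used so the sketch stays self-contained). The split calculation is triggered on the client side as part of job submission, before any map task runs on the cluster.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class SplitDemoDriver {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "input split demo");
                job.setJarByClass(SplitDemoDriver.class);
                // With the default TextInputFormat and identity Mapper/Reducer, the
                // job's output key/value types are LongWritable offsets and Text lines.
                job.setOutputKeyClass(LongWritable.class);
                job.setOutputValueClass(Text.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                // Submitting from the driver is where the InputFormat's getSplits()
                // computes the input splits that are later handed to the map tasks.
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }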


  • 4. 

    When is the earliest point at which the reduce method of a given Reducer can be called?

    • A.

      As soon as at least one mapper has finished processing its input split.

    • B.

      As soon as a mapper has emitted at least one record.

    • C.

      Not until all mappers have finished processing all records.

    • D.

      It depends on the InputFormat used for the job.

    Correct Answer
    C. Not until all mappers have finished processing all records.
    Explanation
    In a MapReduce job, reducers do not start executing the reduce method until all of the map tasks have completed. Reducers begin copying intermediate key-value pairs from the mappers as soon as they are available, but the programmer-defined reduce method is called only after every mapper has finished.
    Note: The reduce phase has 3 steps: shuffle, sort, and reduce. Shuffle is where the data is collected by the reducer from each mapper; this can happen while mappers are still generating data, since it is only a data transfer.
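
    For completeness, here is a small sketch of the related tuning knob, assuming the Hadoop 2.x property name mapreduce.job.reduce.slowstart.completedmaps (older releases used mapred.reduce.slowstart.completed.maps, so check your distribution). It controls how early reducers may start the shuffle copy; the reduce() calls themselves still wait until every map task has finished.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class SlowStartDemo {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Let reducers begin copying map output once 80% of the map tasks
                // have completed (the value is a fraction between 0.0 and 1.0).
                conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);
                Job job = Job.getInstance(conf, "slow-start demo");
                // ... mapper, reducer, input and output settings would follow here ...
            }
        }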


  • 5. 

    Which describes how a client reads a file from HDFS?

    • A.

      The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).

    • B.

      The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.

    • C.

      The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.

    • D.

      The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.

    Correct Answer
    A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
    Explanation
    Client communication with HDFS happens through the Hadoop HDFS API. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file on HDFS. The NameNode responds to successful requests by returning a list of the relevant DataNode servers where the data lives. Once the NameNode has provided those locations, the client talks directly to the DataNodes to read the data; the file contents themselves never pass through the NameNode.
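
    A minimal client-side read sketch using the HDFS FileSystem API (the path below is hypothetical, and fs.defaultFS is assumed to point at the cluster's NameNode): the block-location lookup against the NameNode and the streaming of bytes directly from the DataNodes both happen inside open() and the subsequent reads.

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.nio.charset.StandardCharsets;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HdfsReadExample {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);
                try (FSDataInputStream in = fs.open(new Path("/data/input/part-00000"));
                     BufferedReader reader = new BufferedReader(
                             new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }
        }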


  • 6. 

    You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?

    • A.

      Combiner (Text, IntWritable, Text, IntWritable)

    • B.

      Mapper (Text, IntWritable, Text, IntWritable)

    • C.

      Reducer (Text, Text, IntWritable, IntWritable)

    • D.

      Combiner (Text, Text, IntWritable, IntWritable)

    Correct Answer
    D. Combiner (Text, Text, IntWritable, IntWritable)
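
    For reference, here is a minimal combiner sketch for this key/value signature, assuming the newer org.apache.hadoop.mapreduce API: in that API a combiner is written with the Reducer class, with input types matching the mapper's output and output types matching the reducer's input, and it is registered on the (hypothetical) job object with job.setCombinerClass(SumCombiner.class).

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        // Hypothetical combiner: locally sums the IntWritable counts emitted by the
        // mapper for each Text key before the data is shuffled to the reducers.
        public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable sum = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable value : values) {
                    total += value.get();
                }
                sum.set(total);
                context.write(key, sum);
            }
        }
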
  • 7. 

    How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?

    • A.

      Keys are presented to a reducer in sorted order; values for a given key are not sorted.

    • B.

      Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.

    • C.

      Keys are presented to a reducer in random order; values for a given key are not sorted.

    • D.

      Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.

    Correct Answer
    A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
    Explanation
    The Reducer has 3 primary phases:
    1. Shuffle
    The Reducer copies the sorted output from each Mapper across the network using HTTP.
    2. Sort
    The framework merge-sorts the Reducer inputs by key (since different Mappers may have emitted the same key).
    The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
    Secondary sort
    To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce.
    3. Reduce
    In this phase the reduce(Object, Iterable, Context) method is called once for each key and its collection of values in the sorted inputs.
    The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object).
    The output of the Reducer is not re-sorted.
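
    A minimal reducer sketch (the class name is hypothetical) that makes these guarantees visible: reduce() is invoked once per key, keys arrive in sorted order, and the values iterable for each key reflects no particular order unless a secondary sort has been configured as described above.

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        public class OrderDemoReducer extends Reducer<Text, IntWritable, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // The framework has already merge-sorted the keys; this method simply
                // records the values in the (arbitrary) order they were handed to us.
                StringBuilder seen = new StringBuilder();
                for (IntWritable value : values) {
                    seen.append(value.get()).append(' ');
                }
                context.write(key, new Text(seen.toString().trim()));
            }
        }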


  • 8. 

    Assuming default settings, which best describes the order of data provided to a reducer’s reduce method:

    • A.

      The keys given to a reducer aren’t in a predictable order, but the values associated with those keys always are.

    • B.

      Both the keys and values passed to a reducer always appear in sorted order.

    • C.

      Neither keys nor values are in any predictable order.

    • D.

      The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.

    Correct Answer
    D. The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.
    Explanation
    The reasoning is the same as for question 7: during the shuffle and sort phases the framework merge-sorts the map outputs by key, so the reduce method sees keys in sorted order, while the values grouped under each key arrive in no predictable order unless a secondary sort (a composite key plus a grouping comparator) is configured.


  • 9. 

    You’ve built a MapReduce job that denormalizes a very large table, resulting in an extremely large amount of output data. Which two cluster resources will your job stress? (Choose two).

    • A.

      Processor

    • B.

      RAM

    • C.

      Network I/O

    • D.

      Disk I/O

    Correct Answer(s)
    C. Network I/O
    D. Disk I/O
    Explanation
    Denormalizing a very large table produces an extremely large amount of intermediate and output data. All of that data has to be spilled and written to disk and shuffled from the mappers to the reducers across the network, so the job primarily stresses Disk I/O and Network I/O rather than processor or RAM.
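
    One common mitigation, sketched here assuming the Hadoop 2.x property names mapreduce.map.output.compress and mapreduce.map.output.compress.codec (MRv1 used mapred.compress.map.output), is to compress the intermediate map output so that less data is written to local disk and transferred across the network during the shuffle.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class MapOutputCompressionDemo {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Compress intermediate map output before it is spilled to local disk
                // and shuffled to the reducers.
                conf.setBoolean("mapreduce.map.output.compress", true);
                conf.set("mapreduce.map.output.compress.codec",
                         "org.apache.hadoop.io.compress.SnappyCodec");
                Job job = Job.getInstance(conf, "map output compression demo");
                // ... the rest of the job configuration would follow here ...
            }
        }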


  • 10. 

    In the execution of a MapReduce job, where does the Mapper place the intermediate data of each Map task?

    • A.

      The Hadoop framework holds the intermediate data in the TaskTracker's memory

    • B.

      The Mapper transfers the intermediate data to the JobTracker, which then sends it to the Reducers

    • C.

      The Mapper stores the intermediate data on the underlying filesystem of the local disk of the machine that ran the Map task

    • D.

      The Mapper transfers the intermediate data to the reducers as soon as it is generated by the Map task

    Correct Answer
    C. The Mapper stores the intermediate data on the underlying filesystem of the local disk of the machine that ran the Map task
    Explanation
    The Mapper stores the intermediate data on the underlying filesystem of the local disk of the machine which ran the Map task. This is because the intermediate data is generated by the Mapper and needs to be stored temporarily before being transferred to the Reducers. Storing the data on the local disk allows for efficient access and retrieval when it is needed for the Reducers to process and combine the data.
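
    A small illustration, assuming classic MRv1 property names (mapred.local.dir for the TaskTracker's local directories; on YARN the analogous setting is yarn.nodemanager.local-dirs) and hypothetical paths: the map-side spill files live under these local-filesystem directories on the node that ran the map task, not in HDFS.

        import org.apache.hadoop.conf.Configuration;

        public class LocalDirDemo {
            public static void main(String[] args) {
                Configuration conf = new Configuration();
                // Intermediate (spill) files from map tasks are written under these
                // local-filesystem directories, not into HDFS.
                conf.set("mapred.local.dir", "/disk1/mapred/local,/disk2/mapred/local");
                System.out.println("Map-side intermediate data directories: "
                        + conf.get("mapred.local.dir"));
            }
        }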


Quiz Review Timeline


  • Current Version
  • Aug 21, 2023
    Quiz Edited by
    ProProfs Editorial Team
  • Feb 25, 2014
    Quiz Created by
    Ssashi