Inputs
Outputs
Tasks
None of the listed options
Yes, because the sum operation is both associative and commutative, and the input and output types of the reduce method match
No, because the sum operation in the reducer is incompatible with the operation of a combiner
No, because combiners and reducers use different interfaces
No, because the mapper and combiner must use the same input data types.
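For context, a minimal sketch of such a reducer, assuming the classic WordCount job and the older org.apache.hadoop.mapred API that these questions reference (class names are illustrative, not from the quiz). Its input types (Text, IntWritable) match its output types, and summation is associative and commutative, so the same class can also be registered as the job's combiner:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the counts for one word; because input and output types match,
// this class works both as the reducer and as the combiner.
public class SumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();   // associative and commutative, so safe to apply locally
    }
    output.collect(key, new IntWritable(sum));
  }
}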
Map Parameters
JobConf
MemoryConf
None of the listed options
Partitioner
OutputCollector
Reporter
All of the listed options
SequenceFileInputFormat
ByteInputFormat
KeyValueInputFormat
TextInputFormat
JobTracker
NameNode
DataNode
TaskTracker
Secondary NameNode
Reducer
Mapper
Shuffle
All of the listed options
Mapper
Reducer
Partitioner
Combiner
1
2
3
4
2
3
4
5
Hadoop Strdata
Hadoop Streaming
Hadoop Stream
None of the listed options
HashPar
Partitioner
HashPartitioner
None of the listed options
MapReduce
Mapper
TaskTracker
JobTracker
MapReduce tries to place the data and the compute as close as possible
Map Task in MapReduce is performed using the Mapper() function
Reduce Task in MapReduce is performed using the Map() function
All of the listed options
1
64
1 to n
0 to n
OutputCollector.collect
OutputCollector.get
OutputCollector.receive
OutputCollector.put
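For reference, a minimal mapper sketch (a hypothetical WordCount tokenizer on the old mapred API) showing OutputCollector.collect emitting each intermediate key-value pair:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TokenizerMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      output.collect(word, ONE);   // collect(...) emits one intermediate pair
    }
  }
}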
1
2
3
4
TextInputFormat
SequenceFileInputFormat
Custom input format
Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster
Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run
Because combiners perform local aggregation of word counts, and then transfer that data to the reducers without writing the intermediate data to disk
Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers
You will not be able to compress your intermediate data
You will no longer be able to take advantage of a Combiner
The output files may not be in globally sorted order
There is no problem
Partitioner
OutputSplit
Reporter
All of the listed options
The reduce-side join is faster as it receives the records sorted by keys
The reduce-side join is faster as it uses a secondary sort
The map-side join is faster as it caches the data from one file in memory
The map-side join is faster as it writes the intermediate data to the local file system
Combiner<Text, IntWritable, Text, IntWritable>
Reducer<Text, Text, IntWritable, IntWritable>
Reducer
Reducer<Text, IntWritable, Text, IntWritable>
Shuffle and Sort
Reduce and Sort
Shuffle and Map
All of the listed options
They provide a mechanism for different mappers to communicate with each other, thereby reducing synchronization overhead
They provide an optimization and reduce the total number of computations needed to execute an algorithm by a factor of n, where n is the number of reducers
They aggregate map output locally on each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers
They aggregate intermediate map output on a small number of nearby (i.e., rack-local) machines and therefore reduce the amount of data that needs to be shuffled across the network to the reducers
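As a concrete illustration (hypothetical WordCount data), a combiner collapses one mapper's local output before the shuffle:

map output on one node:     ("the", 1), ("the", 1), ("the", 1), ("cat", 1)
shipped after the combiner: ("the", 3), ("cat", 1)

Four pairs become two, so less data crosses the network to the reducers.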
There is no difference in output between the two settings
With zero reducers, no reducer runs and the job throws an exception; with one reducer, instances of matching patterns are stored in a single file on HDFS
With zero reducers, all instances of matching patterns are stored in multiple files on HDFS
With zero reducers, instances of matching patterns are stored in multiple files on HDFS; with one reducer, all instances of matching patterns are collected in one file on HDFS
Increase the parameter that controls minimum split size in the job configuration
Write a custom MapRunner that iterates over all key-value pairs in the entire file
Set the number of mappers equal to the number of input files you want to process
Write a custom FileInputFormat and override the method isSplitable to always return false
Increase the parameter that controls minimum split size in the job configuration
Write a custom MapRunner that iterates over all key-value pairs in the entire file
Set the number of mappers equal to the number of input files you want to process
Write a custom FileInputFormat and override the method isSplitable to always return false
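A minimal sketch of that last approach on the old mapred API (the subclass name is illustrative); note the framework spells the method isSplitable, with one 't':

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// With isSplitable returning false, each input file becomes a single
// InputSplit (and therefore a single map task), no matter how many
// HDFS blocks the file spans.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}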
Applications can use the Reporter to report progress
The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job
The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
All of the listed options
JobTracker
DataNode
TaskTracker
TaskNode
Before mapper phase
After mapper phase
After reducer phase
None of the listed options
Mapper input
Mapper output
Reducer input
Reducer output
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
The MapReduce framework operates exclusively on <key, value> pairs
Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
None of the listed options
hadoop fs -jar SalesAnalytics.jar SalesAnalysis -input /sales/data -output /sales/analysis
hadoop fs jar SalesAnalytics.jar -input /sales/data -output /sales/analysis
hadoop -jar SalesAnalytics.jar SalesAnalysis -input /sales/data -output /sales/analysis
hadoop jar SalesAnalytics.jar SalesAnalysis /sales/data /sales/analysis
The values are in sorted order
The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job
The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering
Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values
It's equivalent to the number of lines in the input files
It's equivalent to the number of times the map() method is called in the mapper task
There is no such restriction. It depends on the use case and logic
10000
Map
Reduce
Shuffle
Sort
Disk I/O and network traffic
Memory footprint of mappers and network traffic
Disk I/O and memory footprint of mappers
Block size and disk I/O
Equal to block size
Equal to number of mapper tasks
Equal to number of reducer tasks
Equal to number of replications
Keys are presented to a reducer in sorted order; values for a given key are not sorted
Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order
Keys are presented to a reducer in random order; values for a given key are not sorted
Keys are presented to a reducer in random order; values for a given key are sorted in ascending order
A Sequence File contains a binary encoding of an arbitrary number of homogeneous writable objects
A Sequence File contains a binary encoding of an arbitrary number of heterogeneous writable objects
A Sequence File contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order
A Sequence File contains a binary encoding of an arbitrary number of key-value pairs. Each key must be of the same type; each value must be of the same type
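For context, a minimal sketch of writing such a file (hypothetical path, classic SequenceFile.createWriter overload). The key class and value class are declared once when the writer is created, which is why every key must share one type and every value another:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/counts.seq");   // hypothetical output path

    // Key and value types are fixed up front; every appended record must match.
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
    try {
      writer.append(new Text("the"), new IntWritable(3));
      writer.append(new Text("cat"), new IntWritable(1));
    } finally {
      writer.close();
    }
  }
}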
Combiner
OOZIE
PIG
TotalOrderPartitioner
The file is divided into variable-size blocks, which are stored on multiple data nodes. Each block is replicated three times by default
The file is replicated three times by default. Each copy of the file is stored on a separate data node
The master copy of the file is stored on a single data node. The replica copies are divided into fixed-size blocks, which are stored on multiple data nodes
The file is divided into fixed-size blocks, which are stored on multiple data nodes. Each block is replicated three times by default. Multiple blocks from the same file might reside on the same data node
The file is divided into fixed-size blocks, which are stored on multiple data nodes. Each block is replicated three times by default. HDFS guarantees that different blocks from the same file are never on the same data node
The number of mapper tasks is equal to the number of input splits
The number of mapper tasks is equal to the number of combiner tasks
The number of mapper tasks is equal to the number of reducer tasks
The number of input splits depends on the block size
TaskTracker
NameNode
DataNode
JobTracker
JobConf.setNumTasks(int)
JobConf.setNumReduceTasks(int)
JobConf.setNumMapTasks(int)
All of the listed options
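A minimal driver sketch on the old mapred API (it reuses the illustrative TokenizerMapper and SumReducer classes from the sketches above). setNumReduceTasks sets the actual number of reduce tasks, while setNumMapTasks is only a hint, since the real map count follows the number of input splits; the wiring also makes the Input -> Mapper -> Combiner -> Reducer -> Output flow explicit:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCountDriver.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(TokenizerMapper.class);   // map phase
    conf.setCombinerClass(SumReducer.class);      // local aggregation after the map
    conf.setReducerClass(SumReducer.class);       // reduce phase

    conf.setNumReduceTasks(2);   // authoritative: exactly two reduce tasks
    conf.setNumMapTasks(4);      // a hint only; splits decide the real map count

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}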
Input file splits may cross line boundaries. A line that crosses file splits is ignored
The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines
Input file splits may cross line boundaries. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line
Input file splits may cross line boundaries
Input file splits may cross line boundaries. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line
Input -> Reducer -> Mapper -> Combiner -> Output
Input -> Mapper -> Reducer -> Combiner -> Output
Input -> Mapper -> Combiner -> Reducer -> Output
Input -> Reducer -> Combiner -> Mapper -> Output
Apple
Microsoft
Samsung
Combiner
Partitioner
Comparator
Reducer
All of the listed options