GATE - CS - Computer Organisation & Architecture - Test 1

1. A computer has a 256 KByte, 4-way set associative, write back data cache with block size of 32 Bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit. The number of bits in the tag field of an address is

11

14

16

17

Explanation

The block size is 32 bytes, so the block offset needs 5 bits (2^5 = 32). The cache is 4-way set associative, so each set holds 4 blocks and the number of sets is 256 KB / (4 blocks/set × 32 bytes/block) = 2048. The set index therefore needs log2(2048) = 11 bits. The tag is whatever remains of the 32-bit address after the block offset and the set index: 32 - 5 - 11 = 16 bits.
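The same split can be checked with a few lines of Python. This is a minimal sketch; the function name `cache_address_fields` is made up for illustration, and it assumes every size is a power of two.

```python
def cache_address_fields(cache_bytes, ways, block_bytes, address_bits):
    """Return (tag_bits, set_index_bits, block_offset_bits).

    Assumes cache_bytes, ways and block_bytes are all powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1      # log2(block size)
    num_sets = cache_bytes // (ways * block_bytes)  # blocks grouped into sets
    index_bits = num_sets.bit_length() - 1          # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 256 KB, 4-way, 32-byte blocks, 32-bit addresses
print(cache_address_fields(256 * 1024, 4, 32, 32))  # (16, 11, 5)
```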

2. The minimum number of D flip-flops needed to design a mod-258 counter is

9

8

512

258

Explanation

A mod-258 counter cycles through 258 distinct states (0 to 257). Each D flip-flop stores one bit, so n flip-flops provide 2^n states, and we need the smallest n with 2^n >= 258. Since 2^8 = 256 is too few and 2^9 = 512 is enough, n = 9. Therefore, the minimum number of D flip-flops needed to design a mod-258 counter is 9.
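As a quick check, the smallest n with 2^n >= modulus can be computed directly. The helper name `min_flip_flops` is an illustrative choice, not something from the question.

```python
def min_flip_flops(modulus):
    """Smallest n such that 2**n >= modulus (one flip-flop per state bit)."""
    return max(1, (modulus - 1).bit_length())


print(min_flip_flops(258))  # 9
print(min_flip_flops(256))  # 8
```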

3. Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I₁, I₂, I₃, …, I₁₂ is executed in this pipelined processor. Instruction I₄ is the only branch instruction and its branch target is I₉. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

132

165

176

328

Explanation

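The clock period is set by the slowest stage plus its buffer: 10 + 1 = 11 ns. Because the branch is taken, the instructions that complete are I₁ to I₄ and I₉ to I₁₂, eight in all. Assuming the branch outcome is known only when I₄ finishes its EI stage, the three instructions fetched behind it (I₅, I₆, I₇) are discarded, which costs 3 stall cycles. The first instruction takes 5 cycles, each of the remaining 7 adds one cycle, and the stalls add 3: 5 + 7 + 3 = 15 cycles. The total time is 15 × 11 = 165 ns.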
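A short Python sketch of the usual way to work this out is given here. It assumes the branch is resolved at the end of the EI stage, so three wrongly fetched instructions are discarded; the function name and parameters are illustrative only.

```python
def pipelined_time_ns(stage_delays, buffer_delay, completed, stall_cycles):
    """Total time for `completed` instructions on a linear pipeline.

    Clock period = slowest stage + buffer delay.
    Cycles = pipeline depth + (completed - 1) + stall cycles.
    """
    cycle_ns = max(stage_delays) + buffer_delay
    cycles = len(stage_delays) + (completed - 1) + stall_cycles
    return cycles * cycle_ns


# I1..I4 and I9..I12 complete (8 instructions).
# Assumption: the branch in I4 resolves after EI, costing 3 stall cycles.
print(pipelined_time_ns([5, 7, 10, 8, 6], 1, completed=8, stall_cycles=3))  # 165
```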

4. Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure. What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

4

2.5

1.1

3

Explanation

Under ideal steady-state conditions the pipeline completes one instruction per clock, and the clock period is set by the slowest stage plus one pipeline-register delay. The non-pipelined implementation needs no pipeline registers, so it takes the sum of the four stage delays per instruction. The speedup is the ratio of the two: (sum of stage delays) / (largest stage delay + register delay). With the delays given in the figure, this ratio is 2.5.

5. Consider a hypothetical processor with an instruction of type LW R1, 20 (R2), which during execution reads a 32-bit word from memory and stores it in a 32-bit register R1. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register R2. Which of the following best reflects the addressing mode implemented by this instruction for the operand in memory?

Immediate Addressing

Register Addressing

Register Indirect Scaled Addressing

Base Indexed Addressing

Explanation

The instruction LW R1, 20(R2) forms its effective address by adding a constant displacement (20) to the contents of a base register (R2), then loads the 32-bit word at that address into R1. The operand is not embedded in the instruction as an immediate value, it is not itself held in a register, and no scale factor multiplies the register contents, so of the listed options Base Indexed Addressing is the best match.

6. The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have at most two source operands and one destination operand. Assume that all variables are dead after this code segment.

c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
    y = a * a;
} else {
    d = d * d;
    e = e * e;
}

What is the minimum number of registers needed in the instruction set architecture of the processor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation.

3

4

5

6

Explanation

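The code uses seven variables (a, b, c, d, e, x, y), but they are never all live at once, so the register count is set by the largest number of values that are live simultaneously. Start with a and b in two registers. After c = a + b, b is dead and c can take its register. d = c * a needs a third register and e = c + a needs a fourth, because a, c, d and e are all still needed. x = c * c is the last use of c, so x can reuse c's register, leaving a, d, e and x live at the comparison. In the taken branch y can reuse a freed register; in the else branch d and e are updated in place. Four values are live together just after e = c + a and never more than four, so the minimum number of registers needed is 4.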
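The live-value count can be reproduced with a small backward liveness pass. This is only a sketch: the statement encoding and the helper `backward` are invented for illustration, and it relies on the peak number of simultaneously live values being the register requirement for this code shape.

```python
def backward(stmts, live_out):
    """Walk statements backwards; return (live_in, peak live count).

    Each statement is (dest, sources); dest is None for a pure comparison.
    """
    live = set(live_out)
    peak = len(live)
    for dest, srcs in reversed(stmts):
        # Just after the statement: everything live later, plus the result.
        after = live | ({dest} if dest else set())
        peak = max(peak, len(after))
        # Just before the statement: drop the result, add the operands.
        live = (live - {dest}) | set(srcs)
        peak = max(peak, len(live))
    return live, peak


then_arm = [("y", ["a", "a"])]
else_arm = [("d", ["d", "d"]), ("e", ["e", "e"])]
prefix = [("c", ["a", "b"]), ("d", ["c", "a"]), ("e", ["c", "a"]),
          ("x", ["c", "c"]), (None, ["x", "a"])]  # last entry: if (x > a)

then_in, then_peak = backward(then_arm, set())   # all variables dead at the end
else_in, else_peak = backward(else_arm, set())
_, prefix_peak = backward(prefix, then_in | else_in)

registers_needed = max(then_peak, else_peak, prefix_peak)
print(registers_needed)  # 4
```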

7. Consider evaluating the following expression tree on a machine with load-store architecture in which memory can be accessed only through load and store instructions. The variables a, b, c, d and e are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers. The instructions produce result only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression?

2

9

5

3

Explanation

The five variables do not all have to sit in registers at the same time: each one is loaded from memory just before it is needed, and a register is freed as soon as its value has been consumed. What matters is how many values must be held at once. A leaf needs one register. For an operator node, if its two subtrees need different numbers of registers, the larger number suffices (evaluate the more demanding subtree first and hold its result in one register); if both need the same number n, the node needs n + 1, because one register holds the first result while the second subtree is evaluated. Applying this rule bottom-up to the expression tree of the question (the figure is not reproduced here) gives 3 at the root, so the minimum number of registers needed is 3.
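The bottom-up register-labelling rule (often called Sethi-Ullman numbering) is easy to sketch in Python. The tree used here is an assumption: the figure is missing from this page, so `(a - b) + (e - (c + d))` is only an illustrative five-leaf tree, and the tuple encoding is made up for this example.

```python
def registers_needed(node):
    """Registers needed to evaluate an expression tree on a load-store machine.

    A leaf is a variable name (it must be loaded, so it costs 1 register).
    An inner node is a tuple (operator, left_subtree, right_subtree).
    """
    if isinstance(node, str):
        return 1
    _, left, right = node
    l, r = registers_needed(left), registers_needed(right)
    # Equal demand on both sides costs one extra register to hold a result.
    return l + 1 if l == r else max(l, r)


# Assumed tree shape, for illustration only: (a - b) + (e - (c + d))
tree = ("+", ("-", "a", "b"), ("-", "e", ("+", "c", "d")))
print(registers_needed(tree))  # 3
```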

8. A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

Instruction	Meaning of instruction
I₀: MUL R₂, R₀, R₁	R₂ ← R₀ * R₁
I₁: DIV R₅, R₃, R₄	R₅ ← R₃ / R₄
I₂: ADD R₂, R₅, R₂	R₂ ← R₅ + R₂
I₃: SUB R₅, R₂, R₆	R₅ ← R₂ - R₆

13

15

17

19

Explanation

The instructions enter the pipeline one per cycle, but they share a single PO stage, so each must wait until the previous instruction has left it. With operand forwarding, a result is available to the next instruction as soon as PO finishes, so the data dependences (I₂ on I₀ and I₁, I₃ on I₂) cause no delay beyond that wait. The schedule is: I₀ uses IF, ID, OF in cycles 1 to 3, PO in cycles 4 to 6 and WO in cycle 7. I₁ is fetched in cycle 2, waits for the PO stage, occupies it for cycles 7 to 12 and writes in cycle 13. I₂ performs PO in cycle 13 and WO in cycle 14. I₃ performs PO in cycle 14 and WO in cycle 15. The sequence therefore needs 15 clock cycles.
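A tiny in-order model reproduces the cycle count. It is a sketch under stated assumptions: one instruction is fetched per cycle, there is a single PO unit, instructions may wait in a buffer in front of PO, and forwarding makes every result available as soon as PO ends. The function name `total_cycles` is illustrative.

```python
def total_cycles(po_latencies):
    """Cycle in which the last instruction completes WO.

    po_latencies[i] is the number of cycles instruction i spends in PO.
    """
    po_busy_until = 0   # last cycle in which the PO unit is occupied
    last_wo = 0         # cycle of the most recent write-back
    for i, latency in enumerate(po_latencies):
        of_cycle = i + 3                       # IF at i+1, ID at i+2, OF at i+3
        po_start = max(of_cycle, po_busy_until) + 1
        po_busy_until = po_start + latency - 1
        last_wo = max(po_busy_until, last_wo) + 1
    return last_wo


# MUL (3 cycles), DIV (6 cycles), ADD (1 cycle), SUB (1 cycle)
print(total_cycles([3, 6, 1, 1]))  # 15
```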

9. An 8KB direct-mapped write-back cache is organized as multiple blocks, each of size 32 bytes. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block, comprising the following:

1 valid bit
1 modified bit
As many bits as the minimum needed to identify the memory block mapped in the cache

What is the total size of memory needed at the cache controller to store meta-data (tags) for the cache?

4864 bits

6144 bits

6656 bits

5376 bits

Explanation

The cache has 8 KB / 32 bytes = 256 blocks, so a direct-mapped lookup uses log2(256) = 8 index bits, and the 32-byte block uses 5 offset bits. The bits needed to identify which memory block occupies a cache block are the remaining tag bits of the 32-bit address: 32 - 8 - 5 = 19. Each block's metadata is therefore 19 tag bits + 1 valid bit + 1 modified bit = 21 bits, and the total size of memory needed is 256 × 21 = 5376 bits.
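The metadata size follows mechanically from the cache geometry. A minimal sketch, with an illustrative function name and the assumption that all sizes are powers of two:

```python
def tag_store_bits(cache_bytes, block_bytes, address_bits, status_bits, ways=1):
    """Total metadata bits: one (tag + status) entry per cache block."""
    blocks = cache_bytes // block_bytes
    sets = blocks // ways                          # direct-mapped when ways == 1
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return blocks * (tag_bits + status_bits)


# 8 KB direct-mapped, 32-byte blocks, 32-bit addresses, valid + modified bits
print(tag_store_bits(8 * 1024, 32, 32, status_bits=2))  # 5376
```

Passing `ways=4` and the appropriate status-bit count handles a set-associative cache in the same way.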

10. A computer has a 256 KByte, 4-way set associative, write back data cache with block size of 32 Bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit. The size of the cache tag directory is

160 Kbits

136 Kbits

40 Kbits

32 Kbits

Explanation

The cache is 256 KB with 32-byte blocks, so it holds 256 KB / 32 bytes = 8 K blocks. With 4 blocks per set there are 8 K / 4 = 2 K sets.

The 32-bit address splits into 5 block-offset bits, log2(2 K) = 11 set-index bits and 32 - 5 - 11 = 16 tag bits. Each tag directory entry holds the 16-bit tag plus 2 valid bits, 1 modified bit and 1 replacement bit, for 20 bits in all.

There is one entry per cache block, not one per set, so the directory needs 8 K × 20 bits = 160 Kbits.

Therefore, the size of the cache tag directory is 160 Kbits.

11. Register renaming is done in pipelined processors

As an alternative to register allocation at compile time

For efficient access to function parameters and local variables

To handle certain kinds of hazards

As part of address translation

Explanation

Register renaming is used in pipelined processors to handle certain kinds of hazards, namely the name dependences: write-after-read (WAR) and write-after-write (WAW). These arise when instructions reuse the same architectural register without any real flow of data between them. By mapping each new write onto a different physical register, the hardware removes these false dependences, so the instructions involved no longer have to wait on one another. True read-after-write (RAW) dependences remain. Renaming is done by hardware at run time; it is not an alternative to register allocation at compile time, it is not a mechanism for accessing function parameters or local variables, and it plays no part in address translation.

12. Consider the following sequence of micro-operations.

MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR

Which one of the following is a possible operation performed by this sequence?

Instruction fetch

Operand fetch

Conditional branch

Initiation of interrupt service

Explanation

Written out, the sequence is MBR ← PC, MAR ← X, PC ← Y, Memory ← MBR. The current program counter is copied into the MBR and then written to memory at address X, which saves the return address. The PC is loaded with a new address Y, so execution continues from there. Saving the PC to memory and then jumping to a fixed address is what happens when an interrupt is taken: X is where the return address is stored and Y is the start of the service routine. An instruction fetch or operand fetch would read from memory rather than write to it, and a conditional branch would not save the PC. The sequence therefore corresponds to the initiation of interrupt service.

13. A RAM chip has a capacity of 1024 words of 8 bits each (1K × 8). The number of 2 × 4 decoders with enable line needed to construct a 16K × 16 RAM from 1K × 8 RAM is

4

5

6

7

Explanation

A 16K × 16 memory built from 1K × 8 chips needs (16K / 1K) × (16 / 8) = 16 × 2 = 32 chips, arranged as 16 rows of 2 chips each. The 10 low-order address lines go to every chip; the 4 high-order address lines must select one of the 16 rows, which calls for a 4 × 16 decoder.

A 4 × 16 decoder is built from 2 × 4 decoders in two levels: one first-level decoder takes the top 2 address bits and drives the enable inputs of four second-level decoders, each of which decodes the other 2 bits into 4 row-select lines. That gives 1 + 4 = 5 decoders.

Therefore, the correct answer is 5.
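The count can be reproduced by building the decoder tree level by level. This is a sketch with an illustrative function name; it assumes a plain tree in which each 2 × 4 decoder's outputs feed the enable inputs of the next level.

```python
def decoders_2to4_needed(select_lines):
    """Number of 2-to-4 decoders (with enable) in a tree giving select_lines outputs."""
    total = 0
    outputs = select_lines
    while outputs > 1:
        level = -(-outputs // 4)   # ceil(outputs / 4) decoders at this level
        total += level
        outputs = level            # each decoder needs one enable from the level above
    return total


rows = (16 * 1024) // 1024         # 16 rows of 1K-word chips
chips_per_row = 16 // 8            # 2 chips side by side for a 16-bit word
print(rows * chips_per_row, decoders_2to4_needed(rows))  # 32 5
```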

14. The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have at most two source operands and one destination operand. Assume that all variables are dead after this code segment.

c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
    y = a * a;
} else {
    d = d * d;
    e = e * e;
}

Suppose the instruction set architecture of the processor has only two registers. The only allowed compiler optimization is code motion, which moves statements from one place to another while preserving correctness. What is the minimum number of spills to memory in the compiled code?

0

1

2

3

Explanation

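With only two registers, the aim is to keep as few values live at once as possible. Code motion helps: d = c * a and e = c + a are used only in the else branch, so both can be moved into it, next to d = d * d and e = e * e.

After that move, the code before the branch is c = a + b followed by x = c * c. With a and b in the two registers, c overwrites b. Computing x then needs a register, but a is needed for the comparison and c is needed again in the else branch, so three values (a, c and x) are live and only two registers exist. One value must go to memory: c is spilled and x takes its register.

In the else branch c is reloaded, d = c * a and d = d * d are computed in the register that held c, and c is reloaded once more for e = c + a and e = e * e. Reloading a value that is already in memory is not a further spill, and zero spills is impossible because a, c and x are live together.

Therefore, the minimum number of spills to memory in the compiled code is 1.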

15. On a non-pipelined sequential processor, a program segment, which is a part of the interrupt service routine, is given to transfer 500 bytes from an I/O device to memory.

Initialize the address register
Initialize the count to 500
LOOP: Load a byte from device
Store in memory at address given by address register
Increment the address register
Decrement the count
If count != 0 go to LOOP

Assume that each statement in this program is equivalent to a machine instruction which takes one clock cycle to execute if it is a non-load/store instruction. The load-store instructions take two clock cycles to execute. The designer of the system also has an alternate approach of using the DMA controller to implement the same transfer. The DMA controller requires 20 clock cycles for initialization and other overheads. Each DMA transfer cycle takes two clock cycles to transfer one byte of data from the device to the memory. What is the approximate speedup when the DMA controller based design is used in place of the interrupt driven program based input-output?

3.4

4.4

5.1

6.7

Explanation

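The program version executes 2 initialisation instructions at 1 cycle each, then 500 iterations of the loop. One iteration costs 2 (load) + 2 (store) + 1 (increment) + 1 (decrement) + 1 (branch) = 7 cycles, so the total is 2 + 500 × 7 = 3502 cycles. The DMA version costs 20 cycles of overhead plus 2 cycles per byte: 20 + 500 × 2 = 1020 cycles. The speedup is 3502 / 1020 ≈ 3.4, so the DMA controller based design is approximately 3.4 times faster than the interrupt driven program.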
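The cycle counts behind this ratio can be tallied in a few lines. The function names are illustrative; the per-instruction costs are the ones stated in the question.

```python
def interrupt_driven_cycles(n_bytes):
    """Cycles for the programmed loop: 2 init instructions + 7 cycles per byte."""
    init = 2 * 1                         # two non-load/store instructions
    per_iteration = 2 + 2 + 1 + 1 + 1    # load, store, increment, decrement, branch
    return init + n_bytes * per_iteration


def dma_cycles(n_bytes, overhead=20, cycles_per_byte=2):
    """Cycles for the DMA transfer: fixed overhead + 2 cycles per byte."""
    return overhead + n_bytes * cycles_per_byte


speedup = interrupt_driven_cycles(500) / dma_cycles(500)
print(interrupt_driven_cycles(500), dma_cycles(500), round(speedup, 1))  # 3502 1020 3.4
```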

16. A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds, 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and main memory unit respectively. When there is a miss in both L1 cache and L2 cache, first a block is transferred from main memory to L2 cache, and then a block is transferred from L2 cache to L1 cache. What is the total time taken for these transfers?

222 nanoseconds

888 nanoseconds

902 nanoseconds

968 nanoseconds

Explanation

When there is a miss in both L1 cache and L2 cache, two transfers occur in sequence: a block moves from main memory to L2 cache, and then a block moves from L2 cache to L1 cache. Counting one access at each level, the main memory access takes 200 nanoseconds, the L2 cache access takes 20 nanoseconds and the L1 cache access takes 2 nanoseconds, for a total of 200 + 20 + 2 = 222 nanoseconds. This treats each block as moving in a single transfer; the bus widths that determine whether that holds come from the figure, which is not reproduced here.

17. The amount of ROM needed to implement a 4 bit multiplier is

64 bits

128 bits

1 Kbits

2 Kbits

Explanation

A ROM-based multiplier is a lookup table addressed by the two operands. Two 4-bit inputs together form an 8-bit address, so the ROM has 2^8 = 256 locations. The largest possible product is 15 × 15 = 225, which needs 8 bits, so each location stores 8 bits. The ROM size is therefore 256 × 8 = 2048 bits = 2 Kbits.
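The lookup-table size generalises to any operand width n: 2^(2n) locations of 2n bits each. A minimal sketch, with an illustrative function name:

```python
def multiplier_rom_bits(n):
    """ROM bits for an n-bit by n-bit multiplier implemented as a lookup table."""
    locations = 2 ** (2 * n)   # one location per pair of n-bit operands
    word_bits = 2 * n          # the product of two n-bit numbers fits in 2n bits
    return locations * word_bits


print(multiplier_rom_bits(4))  # 2048, i.e. 2 Kbits
```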

18. A main memory unit with a capacity of 4 megabytes is built using 1M × 1-bit DRAM chips. Each DRAM chip has 1K rows of cells with 1K cells in each row. The time taken for a single refresh operation is 100 nanoseconds. The time required to perform one refresh operation on all the cells in the memory unit is

100 nanoseconds

100*2^10 nanoseconds

100*2^20 nanoseconds

3200*2^20 nanoseconds

Explanation

The memory unit holds 4 MB = 32 Mbits, so it is built from 32 chips of 1M × 1 bit. Each chip is organised as 1K rows of 1K cells, and a single refresh operation restores one entire row. Refreshing every cell of a chip therefore takes 1K refresh operations.

All 32 chips can be refreshed in parallel, row by row, so the number of chips does not add to the time.

The total time required is 1K × 100 nanoseconds = 100*2^10 nanoseconds.

Hence, the correct answer is 100*2^10 nanoseconds.
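The arithmetic can be laid out as a short Python sketch. It assumes row-at-a-time refresh with all chips refreshed in parallel, which is the usual reading of this question; the function name is illustrative.

```python
def refresh_time_ns(rows_per_chip, refresh_op_ns):
    """Time to refresh every row once; chips are refreshed in parallel."""
    return rows_per_chip * refresh_op_ns


chips = (4 * 2**20 * 8) // (2**20 * 1)    # 4 MB built from 1M x 1-bit chips
total_ns = refresh_time_ns(2**10, 100)    # 1K rows, 100 ns per row refresh
print(chips, total_ns)                    # 32 102400
```

Here 102400 ns is 100 × 2^10 ns.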

19. A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds, 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and main memory unit respectively. When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache to L1 cache. What is the time taken for this transfer?

2 nanoseconds

20 nanoseconds

22 nanoseconds

88 nanoseconds

Explanation

When there is a miss in the L1 cache and a hit in the L2 cache, one L1 block (4 words) is transferred from the L2 cache to the L1 cache. Assuming the data bus between the two caches carries a full 4-word L1 block at once (the figure that fixes this is not reproduced here), the block moves in a single transfer. That transfer consists of one L2 cache access to read the block, 20 nanoseconds, and one L1 cache access to write it, 2 nanoseconds. The time taken for the transfer is 20 + 2 = 22 nanoseconds.