Parallel Processing Quiz 2

Questions and Answers
  • 1. 

    Assume that a kernel is launched with 1000 thread blocks each of which has 512 threads. If a variable is declared as a shared memory variable, how many versions of the variable will be created through the lifetime of the execution of the kernel?

    • A.

      1

    • B.

      1000

    • C.

      512

    • D.

      512000

    Correct Answer
    B. 1000
    Explanation
    A variable declared in shared memory is allocated once per thread block and shared by all threads of that block. With 1000 thread blocks, 1000 instances of the variable are therefore created over the lifetime of the kernel.
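
    A minimal sketch (the kernel and variable names are illustrative, not from the quiz) of how a __shared__ declaration yields one copy per block:

        // Launched as SharedCounterKernel<<<1000, 512>>>(d_out):
        // 1000 blocks means 1000 instances of "counter" over the kernel's lifetime.
        __global__ void SharedCounterKernel(int* out) {
            __shared__ int counter;                 // one copy per block, shared by its 512 threads
            if (threadIdx.x == 0) counter = 0;      // initialized once per block
            __syncthreads();
            atomicAdd(&counter, 1);                 // all threads of the block update the same copy
            __syncthreads();
            if (threadIdx.x == 0) out[blockIdx.x] = counter;  // each block reports its own value
        }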


  • 2. 

    For the tiled single-precision matrix multiplication kernel, assume that the tile size is 32x32 and the system has a DRAM burst size of 128 bytes. How many DRAM bursts will be delivered to the processor as a result of loading one A-matrix tile by a thread block?

    • A.

      16

    • B.

      32

    • C.

      64

    • D.

      128

    Correct Answer
    B. 32
    Explanation
    When loading one A-matrix tile by a thread block, each element in the tile is a single-precision value, which requires 4 bytes of memory. The tile size is 32x32, so there are a total of 32*32 = 1024 elements in the tile. Multiplying the number of elements by the size of each element gives 1024 * 4 = 4096 bytes. Since the DRAM burst size is 128 bytes, dividing the total size of the tile (4096 bytes) by the burst size (128 bytes) gives 4096 / 128 = 32 bursts. Therefore, 32 DRAM bursts will be delivered to the processor as a result of loading one A-matrix tile by a thread block.
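
    A quick check of the arithmetic, assuming each tile row is aligned so that one 32-float row fills exactly one 128-byte burst:

        #include <cstdio>
        int main() {
            const int tileDim = 32;
            const int elemBytes = sizeof(float);                  // 4 bytes, single precision
            const int tileBytes = tileDim * tileDim * elemBytes;  // 1024 elements * 4 = 4096 bytes
            const int burstBytes = 128;
            printf("bursts per tile = %d\n", tileBytes / burstBytes);  // 4096 / 128 = 32
            return 0;
        }

    Equivalently, each of the 32 tile rows is 32 floats = 128 bytes = exactly one burst.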


  • 3. 

    We want to use each thread to calculate two (adjacent) output elements of a vector addition. Assume that variable i should be the index for the first element to be processed by a thread. What would be the expression for mapping the thread/block indices to data index of the first element?

    • A.

      i = blockIdx.x*blockDim.x + threadIdx.x + 2;

    • B.

      i = blockIdx.x*threadIdx.x*2;

    • C.

      i = (blockIdx.x*blockDim.x + threadIdx.x)*2;

    • D.

      i = blockIdx.x*blockDim.x*2 + threadIdx.x;

    Correct Answer
    C. i = (blockIdx.x*blockDim.x + threadIdx.x)*2;
    Explanation
    The expression i = (blockIdx.x*blockDim.x + threadIdx.x)*2 first computes the global thread index, blockIdx.x*blockDim.x + threadIdx.x, and then multiplies it by 2. Thread 0 therefore starts at element 0, thread 1 at element 2, thread 2 at element 4, and so on: each thread's pair of adjacent elements begins at an even index and no two threads overlap.
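
    A minimal sketch (array and kernel names are illustrative) of a vector addition in which each thread produces two adjacent outputs:

        __global__ void VecAddTwoPerThread(const float* A, const float* B, float* C, int n) {
            int i = (blockIdx.x * blockDim.x + threadIdx.x) * 2;  // first element for this thread
            if (i < n)     C[i]     = A[i]     + B[i];            // first of the adjacent pair
            if (i + 1 < n) C[i + 1] = A[i + 1] + B[i + 1];        // second of the adjacent pair
        }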


  • 4. 

    We are to process a 600x800 picture (800 pixels in the x, or horizontal, direction and 600 pixels in the y, or vertical, direction) with PictureKernel(). That is, m's value is 600 and n's value is 800.

        __global__ void PictureKernel(float* d_Pin, float* d_Pout, int n, int m)
        {
            // Calculate the row # of the d_Pin and d_Pout element to process
            int Row = blockIdx.y*blockDim.y + threadIdx.y;
            // Calculate the column # of the d_Pin and d_Pout element to process
            int Col = blockIdx.x*blockDim.x + threadIdx.x;
            // Each thread computes one element of d_Pout if in range
            if ((Row < m) && (Col < n)) {
                d_Pout[Row*n+Col] = 2*d_Pin[Row*n+Col];
            }
        }

    Assume that we decided to use 16x16 thread blocks; that is, each block is organized as a 2D 16x16 array of threads. How many warps will be generated during the execution of the kernel?

    • A.

      37*16

    • B.

      38*50

    • C.

      38*8*50

    • D.

      38*50*2

    Correct Answer
    C. 38*8*50
    Explanation
    Each 16x16 block contains 256 threads, which is 256 / 32 = 8 warps. Covering 800 pixels in x takes 800/16 = 50 blocks, and covering 600 pixels in y takes ceil(600/16) = 38 blocks (the 38th row of blocks is only partially used, but its warps are still generated). The grid thus has 38 * 50 = 1,900 blocks, each generating 8 warps, for a total of 38*8*50 = 15,200 warps.
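
    A small host-side sketch of the launch arithmetic (names are illustrative):

        #include <cstdio>
        int main() {
            const int m = 600, n = 800;            // picture height and width
            const int bx = 16, by = 16;            // 16x16 threads per block
            int gridX = (n + bx - 1) / bx;         // 50 blocks across
            int gridY = (m + by - 1) / by;         // 38 blocks down: ceil(600/16)
            int warpsPerBlock = (bx * by) / 32;    // 256 threads -> 8 warps
            printf("warps = %d\n", gridY * warpsPerBlock * gridX);  // 38*8*50 = 15200
            return 0;
        }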


  • 5. 

    A CUDA device's SM (streaming multiprocessor) can take up to 1,536 threads and up to 8 thread blocks. Which of the following block configurations would result in the most threads in each SM?

    • A.

      64 threads per block

    • B.

      128 threads per block

    • C.

      512 threads per block

    • D.

      1,024 threads per block

    Correct Answer
    C. 512 threads per block
    Explanation
    An SM is limited both to 1,536 resident threads and to 8 resident blocks, so each configuration yields min(1,536 / threads per block, 8) blocks per SM:

    With 64 threads per block, the 8-block limit applies first: 8 * 64 = 512 threads.

    With 128 threads per block, the 8-block limit again applies first: 8 * 128 = 1,024 threads.

    With 512 threads per block, 1,536 / 512 = 3 blocks fit, giving 3 * 512 = 1,536 threads, the SM's full thread capacity.

    With 1,024 threads per block, only one block fits (a second would exceed 1,536 threads), giving 1,024 threads.

    Therefore, the block configuration that results in the most threads in each SM is 512 threads per block.
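
    The same arithmetic as a small sketch (limits taken from the question):

        #include <cstdio>
        int main() {
            const int maxThreads = 1536, maxBlocks = 8;
            const int sizes[] = {64, 128, 512, 1024};
            for (int i = 0; i < 4; ++i) {
                int blocks = maxThreads / sizes[i];           // blocks that fit by thread count
                if (blocks > maxBlocks) blocks = maxBlocks;   // ...capped by the 8-block limit
                printf("%4d threads/block -> %d blocks -> %4d threads\n",
                       sizes[i], blocks, blocks * sizes[i]);
            }
            return 0;
        }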


  • 6. 

    Assume the following simple matrix multiplication kernel:

        __global__ void MatrixMulKernel(float* M, float* N, float* P, int Width)
        {
            int Row = blockIdx.y*blockDim.y + threadIdx.y;
            int Col = blockIdx.x*blockDim.x + threadIdx.x;
            if ((Row < Width) && (Col < Width)) {
                float Pvalue = 0;
                for (int k = 0; k < Width; ++k) {
                    Pvalue += M[Row*Width+k] * N[k*Width+Col];
                }
                P[Row*Width+Col] = Pvalue;
            }
        }

    Which of the following is true?

    • A.

      M[Row*Width+k] and N[k*Width+Col] are coalesced but P[Row*Width+Col] is not

    • B.

      M[Row*Width+k], N[k*Width+Col] and P[Row*Width+Col] are all coalesced

    • C.

      M[Row*Width+k] is not coalesced but N[k*Width+Col] and P[Row*Width+Col] both are

    • D.

      M[Row*Width+k] is coalesced but N[k*Width+Col] and P[Row*Width+Col] are not

    Correct Answer
    C. M[Row*Width+k] is not coalesced but N[k*Width+Col] and P[Row*Width+Col] both are
    Explanation
    Adjacent threads in a warp differ in threadIdx.x and therefore in Col, while Row and k are identical across the warp in any given loop iteration. For N[k*Width+Col] and P[Row*Width+Col], consecutive threads thus access consecutive memory addresses, so those accesses are coalesced. The index of M[Row*Width+k] does not depend on Col at all, so the threads of a warp do not access consecutive M elements; those accesses are not coalesced.
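
    An illustrative sketch (hypothetical kernel) annotating the three patterns for a warp in which threadIdx.x runs 0..31:

        __global__ void AccessPatterns(const float* M, const float* N, float* P, int Width, int k) {
            int Row = blockIdx.y*blockDim.y + threadIdx.y;   // same value across a warp
            int Col = blockIdx.x*blockDim.x + threadIdx.x;   // consecutive values across a warp
            if ((Row < Width) && (Col < Width)) {
                float a = M[Row*Width + k];    // index ignores Col: not coalesced
                float b = N[k*Width + Col];    // consecutive addresses: coalesced
                P[Row*Width + Col] = a + b;    // consecutive addresses: coalesced
            }
        }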


  • 7. 

    For the simple reduction kernel, if the block size is 1,024 and the warp size is 32, how many warps in a block will have divergence during the 5th iteration? 

    • A.

      0

    • B.

      1

    • C.

      16

    • D.

      32

    Correct Answer
    D. 32
    Explanation
    In the simple reduction kernel the stride doubles every iteration, so in the 5th iteration the stride is 16 and only threads whose index is a multiple of 16 perform an addition. Every 32-thread warp then contains exactly two active threads (e.g., threads 0 and 16) and 30 inactive ones, so every warp takes both paths of the branch and diverges. A 1,024-thread block has 1,024 / 32 = 32 warps, so all 32 warps have divergence.
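
    A sketch of the reduction step in question (it mirrors the code fragment in question 8 below; partialSum is assumed to hold 2*blockDim.x elements):

        __global__ void SimpleReduce(float* partialSum) {
            unsigned int t = threadIdx.x;  // block size assumed to be 1,024
            for (unsigned int stride = 1; stride <= blockDim.x; stride *= 2) {
                __syncthreads();
                // 5th iteration: stride == 16, so only t = 0, 16, 32, ... take the branch.
                // Two active threads per 32-thread warp means every warp diverges.
                if (t % stride == 0)
                    partialSum[2*t] += partialSum[2*t + stride];
            }
        }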


  • 8. 

    For the following basic reduction kernel code fragment, if the block size is 1024 and the warp size is 32, how many warps in a block will have divergence during the iteration where stride is equal to 1?

        unsigned int t = threadIdx.x;
        unsigned int start = 2*blockIdx.x*blockDim.x;
        partialSum[t] = input[start + t];
        partialSum[blockDim.x+t] = input[start + blockDim.x+t];
        for (unsigned int stride = 1; stride <= blockDim.x; stride *= 2)
        {
            __syncthreads();
            if (t % stride == 0) {
                partialSum[2*t] += partialSum[2*t+stride];
            }
        }

    • A.

      0

    • B.

      1

    • C.

      16

    • D.

      32

    Correct Answer
    A. 0
    Explanation
    When the stride is 1, the condition t % stride == 0 is true for every thread (every integer is divisible by 1), so all 1024 threads in the block take the same path through the if statement. No warp contains a mix of active and inactive threads, so no warp diverges, and the answer is 0.


  • 9. 

    An SM implements zero-overhead scheduling because –

    • A.

      A warp whose next instruction has its operands ready for consumption is eligible for execution.

    • B.

      All threads in a warp execute the same instruction when selected.

    • C.

      Both are correct

    • D.

      None

    Correct Answer
    C. Both are correct
    Explanation
    Both statements are correct. An SM keeps many warps resident and, on each cycle, its scheduler can select any warp whose next instruction has its operands ready for consumption, so no cycles are wasted waiting on stalled warps or swapping contexts in and out. And because all threads in a warp execute the same instruction when the warp is selected, a single instruction issue serves the whole warp. Together these properties let the SM hide latency with zero scheduling overhead.


  • 10. 

    __device__ __constant__ int mask = 10; will have its memory type, scope, and lifetime defined as

    • A.

      Register, thread and thread

    • B.

      Global, grid and application

    • C.

      Shared, grid and application

    • D.

      Constant, grid and application

    Correct Answer
    D. Constant, grid and application
    Explanation
    The __constant__ qualifier places mask in the device's constant memory, a read-only region that is cached and optimized for the case where all threads of a warp read the same address; it does not reside in registers, shared memory, or ordinary global memory. The variable is visible to every thread of every kernel launch (grid scope), and it persists for the entire run of the program (application lifetime). Therefore, the correct answer is constant, grid, and application.
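
    A minimal sketch (kernel name and host code are illustrative) of declaring and using a constant-memory variable:

        __device__ __constant__ int mask = 10;  // constant memory, grid scope, application lifetime
        __global__ void ApplyMask(const int* in, int* out, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = in[i] & mask;    // every thread reads the same cached value
        }
        // The host can overwrite the value before a launch with cudaMemcpyToSymbol:
        //   int h_mask = 0xFF;
        //   cudaMemcpyToSymbol(mask, &h_mask, sizeof(int));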

