Parallel Programming Exam Quiz!

10 Questions | Total Attempts: 409



Questions and Answers
  • 1. 
    We want each thread to calculate two adjacent elements of a vector addition. Let the variable I be the index of the first element processed by a thread. Which expression maps the thread and block indices to the data index?
    • A. 

      I = blockIdx.x*blockDim.x + threadIdx.x + 2

    • B. 

      I = blockIdx.x*threadIdx.x*2

    • C. 

      I = (blockIdx.x*blockDim.x + threadIdx.x)*2

    • D. 

      I = blockIdx.x*threadIdx.x*2 + threadIdx.x
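
A minimal host-side sketch of the pairing described in this question, written in plain C++ so it runs without a GPU. The struct `ThreadCoord` and the function `firstElementIndex` are hypothetical names introduced here; their fields stand in for the CUDA built-ins `blockIdx.x`, `blockDim.x`, and `threadIdx.x`. The idea is that a thread's global index is computed first and then doubled, so consecutive threads start two elements apart and each owns elements `I` and `I + 1`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the CUDA built-ins blockIdx.x, blockDim.x, threadIdx.x.
struct ThreadCoord {
    unsigned blockIdx;
    unsigned blockDim;
    unsigned threadIdx;
};

// Index of the first of the two adjacent elements handled by this thread.
unsigned firstElementIndex(const ThreadCoord& t) {
    assert(t.threadIdx < t.blockDim);  // a thread index is always below the block size
    unsigned globalThread = t.blockIdx * t.blockDim + t.threadIdx;
    return globalThread * 2;  // each thread covers elements I and I + 1
}
```

With a block size of 4, for example, thread 3 of block 2 is global thread 11 and would start at element 22.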

  • 2. 
    For a vector addition, assume that the vector length is 4000, each thread calculates one output element, and the thread block size is 1024 threads. How many threads will be in the grid?
    • A. 

      2000

    • B. 

      3000

    • C. 

      1024

    • D. 

      4096
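
The arithmetic behind this question can be sketched in plain C++, assuming the usual launch pattern in which the number of blocks is the vector length divided by the block size, rounded up. The function names `blocksNeeded` and `threadsInGrid` are hypothetical and introduced only for illustration.

```cpp
#include <cassert>

// Number of blocks needed to cover n elements with one thread per element (ceiling division).
unsigned blocksNeeded(unsigned n, unsigned blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Total threads launched: whole blocks only, so this can exceed n.
unsigned threadsInGrid(unsigned n, unsigned blockSize) {
    return blocksNeeded(n, blockSize) * blockSize;
}
```

Because whole blocks are launched, the threads beyond the vector length normally need a bounds check such as `if (i < n)` inside the kernel.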

  • 3. 
    Suppose a CUDA device’s SM (streaming multiprocessor) can hold up to 1536 threads and up to 6 thread blocks. Which of the following block configurations would result in the largest number of threads in the SM?
    • A. 

      128 threads per block

    • B. 

      256 threads per block

    • C. 

      512 threads per block

    • D. 

      1024 threads per block
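
One way to compare the configurations is to compute how many whole blocks fit under both limits. The sketch below is plain C++ and considers only the two limits named in the question (threads per SM and blocks per SM); real devices also limit residency by registers and shared memory, which are ignored here. The function name `residentThreads` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Threads resident on one SM, given only a thread limit and a block limit.
unsigned residentThreads(unsigned threadsPerBlock,
                         unsigned maxThreadsPerSM,
                         unsigned maxBlocksPerSM) {
    assert(threadsPerBlock > 0);
    // Whole blocks that fit under the thread limit.
    unsigned blocksByThreadLimit = maxThreadsPerSM / threadsPerBlock;
    // The block limit may cut this further.
    unsigned blocks = std::min(blocksByThreadLimit, maxBlocksPerSM);
    return blocks * threadsPerBlock;
}
```

Calling `residentThreads(tpb, 1536, 6)` for each listed block size gives the thread count to compare.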

  • 4. 
    At which level does the __syncthreads() function apply?
    • A. 

      Thread level

    • B. 

      Block level

    • C. 

      Grid level

    • D. 

      All of the options

  • 5. 
    For a tiled matrix-matrix multiplication kernel that uses 64x64 tiles, by how much is memory bandwidth usage for the input matrices M and N reduced?
    • A. 

      1/8 of the original usage

    • B. 

      1/16 of the original usage

    • C. 

      1/32 of the original usage

    • D. 

      1/64 of the original usage
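
The reduction can be estimated by counting global loads, as in the plain C++ sketch below. It assumes square n x n matrices, a naive kernel in which every output element reads a full row of M and a full column of N, and a tiled kernel in which each T x T thread block loads one T x T tile of M and of N per phase. The function names `naiveLoads` and `tiledLoads` are hypothetical.

```cpp
#include <cassert>

// Naive kernel: each of the n*n outputs reads n elements of M and n elements of N.
unsigned long long naiveLoads(unsigned long long n) {
    return 2ULL * n * n * n;
}

// Tiled kernel: (n/t)^2 blocks, n/t phases each, 2*t*t loads per phase.
// This simplifies to 2 * n * n * (n / t).
unsigned long long tiledLoads(unsigned long long n, unsigned long long t) {
    assert(t > 0 && n % t == 0);  // assume the tile size divides n evenly
    return 2ULL * n * n * (n / t);
}
```

Under these assumptions the ratio `naiveLoads(n) / tiledLoads(n, t)` equals the tile width `t`, because each element brought into shared memory is reused by `t` threads.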

  • 6. 
    Assume that a kernel is launched with 1000 thread blocks, each of which has 512 threads. If a variable is declared as a local variable in the kernel, how many versions of the variable will be created over the lifetime of the kernel's execution?
    • A. 

      1

    • B. 

      1000

    • C. 

      51200

    • D. 

      512000
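
Since a local (automatic) variable in a kernel is private to each thread, the count follows from the launch dimensions. A minimal sketch in plain C++, with the hypothetical function name `privateCopies`:

```cpp
#include <cassert>

// One private copy of a local variable exists per launched thread.
unsigned long long privateCopies(unsigned long long blocks,
                                 unsigned long long threadsPerBlock) {
    assert(blocks > 0 && threadsPerBlock > 0);
    return blocks * threadsPerBlock;
}
```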

  • 7. 
    Consider performing a matrix multiplication of two input matrices with dimensions NxN. How many times is each element in the input matrices requested from global memory when tiles of size TxT are used?
    • A. 

      T/N

    • B. 

      N/T

    • C. 

      T*N

    • D. 

      (N*N)/(T*T)
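
The per-element request count can be checked with a small host-side simulation. The plain C++ sketch below walks the loop structure of a typical shared-memory tiled kernel and counts how often each element of the left input matrix would be fetched from global memory. It assumes a square n x n matrix, a tile width t that divides n, and one load per thread per phase; the function name `loadsPerElementOfM` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Simulate the global loads of the left input matrix M in a tiled kernel
// and return how many times each element is loaded (the count is uniform).
unsigned loadsPerElementOfM(unsigned n, unsigned t) {
    assert(t > 0 && n % t == 0);  // assume the tile width divides n evenly
    std::vector<unsigned> loads(n * n, 0);
    unsigned tiles = n / t;
    for (unsigned by = 0; by < tiles; ++by)                // block row
        for (unsigned bx = 0; bx < tiles; ++bx)            // block column
            for (unsigned phase = 0; phase < tiles; ++phase)
                for (unsigned ty = 0; ty < t; ++ty)
                    for (unsigned tx = 0; tx < t; ++tx) {
                        unsigned row = by * t + ty;
                        unsigned col = phase * t + tx;
                        ++loads[row * n + col];            // one global load of M[row][col]
                    }
    for (unsigned count : loads)
        assert(count == loads[0]);  // every element is loaded equally often
    return loads[0];
}
```

Each element of M is loaded once by every block in its block row, so the count grows with the number of tiles per row rather than with n itself. The same argument applies to the right input matrix with block columns.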

  • 8. 
    For shared-memory-based tiled matrix multiplication (MxN) with a row-major layout, which input matrix will have coalesced accesses?
    • A. 

      M

    • B. 

      N

    • C. 

      Both

    • D. 

      None

  • 9. 
    Which of the following is a qualifier keyword for function declarations in CUDA?
    • A. 

      __graphic__

    • B. 

      __global__

    • C. 

      __Kernel__

    • D. 

      All of the options

  • 10. 
    How many configuration parameters are there in a CUDA kernel function call?
    • A. 

      2

    • B. 

      1

    • C. 

      3

    • D. 

      5
