Parallel Processing Quiz 3

Approved & Edited by ProProfs Editorial Team
By Rahul08
Questions: 10 | Attempts: 566



Questions and Answers
  • 1. 

    For a tiled 1D convolution, if the output tile width is 250 elements and the mask width is 7 elements, what is the input tile width loaded into shared memory?

    • A.

      250

    • B.

      254

    • C.

      256

    • D.

      7

    Correct Answer
    C. 256
    Explanation
    In a tiled 1D convolution, each block loads an input tile that covers its output tile plus the halo elements needed at both ends. With a mask width of 7, every output element needs (7-1)/2 = 3 neighbours on each side, so the input tile width is 250 + 7 - 1 = 256 elements.


  • 2. 

    For the work-inefficient scan kernel based on reduction trees, assume that we have 1024 elements. Which of the following gives the closest approximation of the number of add operations performed?

    • A.

      (1024-1) *2

    • B.

      (512-1) *2

    • C.

      1024*1024

    • D.

      1024*10

    Correct Answer
    D. 1024*10
    Explanation
    The work-inefficient scan kernel performs roughly n*log2(n) additions: it runs log2(1024) = 10 stride-doubling steps, and in each step up to 1024 elements each perform one addition. The exact count is the sum of (1024 - 2^d) for d = 0..9, which is 9217, and among the options 1024*10 = 10240 is the closest approximation. By contrast, (1024-1)*2 is roughly the operation count of the work-efficient scan.


  • 3. 

    Barrier synchronizations should be used whenever we want to ensure all threads have completed a common phase of their execution_____________

    • A.

      Before any of them start the next phase

    • B.

      After any of them start the next phase

    • C.

      Before any of them start the previous phase

    • D.

      After any of them start the previous phase

    Correct Answer
    A. Before any of them start the next phase
    Explanation
    Barrier synchronizations should be used whenever we want to ensure all threads have completed a common phase of their execution before any of them start the next phase. This means that the barrier synchronization will block the threads until all of them have reached the barrier, ensuring that they all finish the current phase before moving on to the next one. This helps in coordinating the execution of multiple threads and ensures that they all reach a specific point before proceeding further.


  • 4. 

    Each time a DRAM location is accessed, __________

    • A.

      Many consecutive locations that include the requested location are actually accessed

    • B.

      Only the requested location is actually accessed

    • C.

      All the locations that include the requested location are actually accessed

    • D.

      Many consecutive locations excluding the requested location are actually accessed

    Correct Answer
    A. Many consecutive locations that include the requested location are actually accessed
    Explanation
    DRAM is accessed in bursts. When one location is requested, the chip senses an entire row into its row buffer and delivers a burst of consecutive locations that includes the requested one. It is these many consecutive locations, not every location in the device, that are actually accessed; this is why coalesced access patterns matter so much for memory bandwidth.


  • 5. 

    Consider performing a 1D convolution on array N = {4,1,3,2,3} with mask M = {2,1,4}, treating out-of-range (ghost) elements as zero. What is the resulting output array?

    • A.

      {8,21,13,21,8}

    • B.

      {8,21,13,20,7}

    • C.

      {9,21,14,20,7}

    • D.

      {9,21,14,21,7}

    Correct Answer
    B. {8,21,13,20,7}
  • 6. 

    The correct syntax to copy data to constant memory is:

    • A.

      cudaMemcpyToSymbol(dest, src, size)

    • B.

      cudaMemcpy(dest, src, size)

    • C.

      cudaMemcpyToSymbol(src, dest, size)

    • D.

      cudaMemcpySymbol(dest, src, size)

    Correct Answer
    A. cudaMemcpyToSymbol(dest, src, size)
    Explanation
    Constant memory itself is declared with the __constant__ qualifier; cudaMemcpyToSymbol(dest, src, size) is the runtime call that copies data from host memory into that constant-memory symbol on the device. The "dest" parameter specifies the destination symbol in constant memory, "src" specifies the source data in host memory, and "size" specifies the number of bytes to copy.


  • 7. 

    Consider performing a 1D convolution on an array of size n with a mask of size m. How many halo cells are there in total?

    • A.

      m+n-1

    • B.

      m-1

    • C.

      n-1

    • D.

      m+n

    Correct Answer
    B. m-1
    Explanation
    When performing a 1D convolution on an array of size n with a mask of size m, the halo (ghost) cells are the zero-valued cells assumed beyond the edges of the array so that the convolution is defined at the boundaries. The mask extends (m-1)/2 cells beyond each end of the array, giving m-1 halo cells in total.


  • 8. 

    How many multiplications are performed if halo cells are treated as multiplications (by 0) for an array of size n and mask of size m in case of 1-D convolution?

    • A.

      m*n+1

    • B.

      m*n-1

    • C.

      m*n

    • D.

      n*n

    Correct Answer
    C. m*n
    Explanation
    In 1-D convolution, each output element is computed by multiplying each of the m mask elements with the corresponding input element and summing the products. If out-of-range halo (ghost) cells are still counted as multiplications by 0, every one of the n output elements costs exactly m multiplications, giving m*n in total.


  • 9. 

    Which memory is referred to as “scratchpad memory”?

    • A.

      Constant memory

    • B.

      Global memory

    • C.

      Shared memory

    • D.

      Registers

    Correct Answer
    C. Shared memory
    Explanation
    Shared memory refers to a type of memory that is shared among multiple threads within a block in a GPU. It is referred to as "scratchpad memory" because it can be used as a temporary storage space for threads to quickly exchange data and communicate with each other. This type of memory is faster to access compared to global memory, making it ideal for frequently accessed data that needs to be shared among threads. Constant memory, global memory, and registers are different types of memory in a GPU, but they do not specifically serve the purpose of scratchpad memory.


  • 10. 

    Does the CUDA memory architecture have caches and cache levels?

    • A.

      True

    • B.

      False

    Correct Answer
    A. True
    Explanation
    CUDA memory architecture does have cache and cache levels. The GPU's memory hierarchy includes multiple levels of cache, such as L1 and L2 caches, which are used to store frequently accessed data and improve memory access latency. These caches help to reduce the time it takes to fetch data from the main memory, thereby improving overall performance.


Quiz Review Timeline


  • Current Version
  • Mar 21, 2023
    Quiz Edited by
    ProProfs Editorial Team
  • Mar 07, 2017
    Quiz Created by
    Rahul08