

# DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS6801 MULTI-CORE ARCHITECTURES AND PROGRAMMING

**SEM:08** 

## YEAR:04

**BATCH 2015-2019** 

## **Vision of Institution**

To build Jeppiaar Engineering College as an Institution of Academic Excellence in Technical education and Management education and to become a World Class University.

## **Mission of Institution**

| M1 | To excel in teaching and <b>learning, research and innovation</b> by promoting the principles of scientific analysis and creative thinking                                                            |
|----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| M2 | To participate in the production, <b>development and dissemination of knowledge</b> and interact with <b>national and international communities</b>                                                   |
| M3 | To equip students with <b>values, ethics and life skills</b> needed to enrich their lives and enable them to meaningfully contribute to the <b>progress of society</b>                                |
| M4 | To prepare students for higher studies and lifelong learning, enrich them with the practical and entrepreneurial skills necessary to excel as future professionals and contribute to Nation's economy |

## Program Outcomes (POs)

| 0                                                                                                                                                                                    |                                                                                                                                                                                                                                                                                                  |  |  |  |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|
| PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems. |                                                                                                                                                                                                                                                                                                  |  |  |  |  |
| PO2                                                                                                                                                                                  | <b>Problem analysis</b> : Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.                                                        |  |  |  |  |
| PO3                                                                                                                                                                                  | <b>Design/development of solutions</b> : Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations |  |  |  |  |
| PO4                                                                                                                                                                                  | <b>Conduct investigations of complex problems</b> : Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.                                                       |  |  |  |  |
| PO5                                                                                                                                                                                  | <b>Modern tool usage</b> : Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.                                                        |  |  |  |  |
| PO6                                                                                                                                                                                  | <b>The engineer and society</b> : Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.                                                      |  |  |  |  |
| PO7                                                                                                                                                                                  | <b>Environment and sustainability</b> : Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.                                                                          |  |  |  |  |

| PO8  | <b>Ethics</b> : Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.                                                                                                                                                                    |  |  |  |  |
|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|
| PO9  | <b>Individual and team work</b> : Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.                                                                                                                                                   |  |  |  |  |
| PO10 | <b>Communication</b> : Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions. |  |  |  |  |
| PO11 | <b>Project management and finance</b> : Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.                                               |  |  |  |  |
| PO12 | <b>Life-long learning</b> : Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.                                                                                                                 |  |  |  |  |

## **Vision of Department**

To emerge as a globally prominent department, developing ethical computer professionals, innovators and entrepreneurs with academic excellence through quality education and research.

## **Mission of Department**

| M1 | To create <b>computer professionals</b> with an ability to identify and <b>formulate</b><br><b>the engineering problems</b> and also to provide <b>innovative solutions</b> through<br><b>effective teaching learning process.</b> |
|----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| M2 | To strengthen the core-competence in computer science and engineering and to create an ability to interact effectively with industries.                                                                                            |
| M3 | To produce engineers with good professional skills, <b>ethical values</b> and life skills for the <b>betterment of the society</b> .                                                                                               |
| M4 | To encourage students towards <b>continuous and higher level learning</b> on technological advancements and provide a platform for <b>employment and self-employment</b> .                                                         |

## **Program Educational Objectives (PEOs)**

| PEO1 | To address the real time complex engineering problems using innovative approach with strong core computing skills.                         |  |  |  |  |  |
|------|--------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|--|
| PEO2 | To apply core-analytical knowledge and appropriate techniques and provide solutions to real time challenges of national and global society |  |  |  |  |  |
| PEO3 | Apply ethical knowledge for professional excellence and leadership for the betterment of the society.                                      |  |  |  |  |  |
| PEO4 | Develop life-long learning skills needed for better employment and entrepreneurship                                                        |  |  |  |  |  |

## **Program Specific Outcomes (PSOs)**

## Students will be able to

| PSO1 | An ability to understand the core concepts of computer science and engineering and to<br>enrich problem solving skills to analyze, design and implement software and hardware<br>based systems of varying complexity. |
|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|      | To interpret real-time problems with analytical skills and to arrive at cost effective and optimal solution using advanced tools and techniques.                                                                      |
| PSO3 | An understanding of social awareness and professional ethics with practical proficiency in<br>the broad area of programming concepts by lifelong learning to inculcate employment and<br>entrepreneurship skills.     |

## **BLOOM TAXANOMY LEVELS(BTL)**

BTL1: Remembering BTL 2: Understanding., BTL 3: Applying., BTL 4: Analyzing., BTL 5: Evaluating., BTL 6: Creating.,

#### JEPPIAAR ENGINEERING COLLEGE DEPARTMENT OF CSE QUESTION BANK

#### CS6801 MULTI-CORE ARCHITECTURES AND PROGRAMMING L T P C3 0 0 3

#### **OBJECTIVES:**

#### The student should be made to:

□ Understand the challenges in parallel and multi-threaded programming.

□ Learn about the various parallel programming paradigms, and solutions.

#### **UNIT I MULTI-CORE PROCESSORS**

Single core to Multi-core architectures – SIMD and MIMD systems – Interconnection networks -Symmetric and Distributed Shared Memory Architectures – Cache coherence - Performance Issues –Parallel program design.

#### UNIT II PARALLEL PROGRAM CHALLENGES

Performance – Scalability – Synchronization and data sharing – Data races – Synchronization primitives (mutexes, locks, semaphores, barriers) – deadlocks and livelocks – communication between threads (condition variables, signals, message queues and pipes).

#### UNIT III SHARED MEMORY PROGRAMMING WITH OpenMP

OpenMP Execution Model – Memory Model – OpenMP Directives – Work-sharing Constructs – Library functions – Handling Data and Functional Parallelism – Handling Loops -Performance

Considerations.

#### UNIT IV DISTRIBUTED MEMORY PROGRAMMING WITH MPI

MPI program execution – MPI constructs – libraries – MPI send and receive – Point-topoint and Collective communication – MPI derived datatypes – Performance evaluation

#### UNIT V PARALLEL PROGRAM DEVELOPMENT

Case studies - n-Body solvers - Tree Search - OpenMP and MPI implementations and comparison.

## **TOTAL: 45 PERIODS**

#### **OUTCOMES:**

#### At the end of the course, the student should be able to:

□ Program Parallel Processors.

□ Develop programs using OpenMP and MPI.

□ Compare and contrast programming for serial processors and programming for parallel processors.

#### 9

9

9

**9** Poi

9

## **TEXT BOOKS:**

1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011.

2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 (unit 2)

## **REFERENCES:**

Т

Г

1. Michael J Quinn, "Parallel programming in C with MPI and OpenMP", Tata McGraw Hill

## **COURSE OUTCOME**

| C409.1 | Illustrate the challenges in parallel and multi threaded programming            |
|--------|---------------------------------------------------------------------------------|
| C409.2 | Explain the various parallel programming paradigms and solutions.               |
| C409.3 | Develop shared memory programs using OpenMP                                     |
| C409.4 | Develop Distributed memory programs using MPI                                   |
| C409.5 | Compare and contrast programming for serial processors and parallel processors. |

| S.No<br>1 | UNIT<br>UNIT I | REF.BOOK<br>1. Peter S. Pacheco, "An Introduction to<br>Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011.          | PAGE.NO<br>1-8 |
|-----------|----------------|-----------------------------------------------------------------------------------------------------------------------------|----------------|
| 2         | UNIT II        | 2. Darryl Gove, "Multicore Application<br>Programming for Windows, Linux, and<br>Oracle Solaris",<br>Pearson, 2011 (unit 2) | 8-14           |
| 3         | UNIT III       | 1. Peter S. Pacheco, "An Introduction to<br>Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011.                      | 14-22          |
| 4         | UNIT IV        | 1. Peter S. Pacheco, "An Introduction to<br>Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011.                      | 22-28          |
| 5         | UNIT V         | 1. Peter S. Pacheco, "An Introduction to<br>Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011.                      | 28-36          |

## UNIT –I

### **MULTI-CORE PROCESSORS**

Single core to Multi-core architectures – SIMD and MIMD systems – Interconnection networks - Symmetric and Distributed Shared Memory Architectures – Cache coherence - Performance Issues –Parallel program design.

| Q. No. | Questions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                          |                                                                 |        | CO     | Bloom'<br>s Level |
|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|-----------------------------------------------------------------|--------|--------|-------------------|
| 1.     | What is Single core processors?<br>Single core processors have only one processor in die to process<br>instructions.                                                                                                                                                                                                                                                                                                                                                                                                                         |                                          |                                                                 | C409.1 | BTL1   |                   |
| 2.     | What are the Problems of Single Core Processors:<br>As we try to increase the clock speed of this processor, the amount<br>of heat produced by the chip also increases. It is a big hindrance in the<br>way of single core processors to continue evolving                                                                                                                                                                                                                                                                                   |                                          |                                                                 |        | C409.1 | BTL1              |
| 3.     | What is Multicore processor?<br>A multi-core processor is a single computing component with two or<br>more independent actual processing units (called "cores"), which are units<br>that read and execute program instructions. The multiple cores are embedded<br>in the same die. The multicore processor may looks like a single processor<br>but actuall y it contains two (dual - core), three (tri - core), four (quad -<br>core), six(hexa-core), eight(octa-core)or ten (deca-core) cores.Some<br>processor even have 22 or 32 cores |                                          |                                                                 |        | C409.1 | BTL1              |
| 4.     | What are the Problems with multicore processors.<br>According to Amdahl's law, the performance of parallel computing<br>is limited by its serial components. So, increasing the number of cores may<br>not be the best solution .There is need to increase the clock speed of<br>individual cores.                                                                                                                                                                                                                                           |                                          |                                                                 |        | C409.1 | BTL1              |
|        | Comparison Of Single-Core processor And Multi-Core Processor.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                          |                                                                 |        |        | BTL1              |
|        | Parameter                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Single-Core<br>Processor                 | Multi-Core<br>Processor                                         |        |        |                   |
| 5.     | Number of cores on a die                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Single                                   | Multiple                                                        |        |        |                   |
|        | Instruction Execution                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | Can execute Single instruction at a time | Can execute multiple<br>instructions by using<br>multiple cores |        |        |                   |

|   | Gain                                                                                                | Speed up every<br>program or software<br>being executed                                                                                       | 1 0                                                                                                                                                               |                            |        |      |
|---|-----------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------|--------|------|
|   | Performance                                                                                         | Dependent on the<br>clock frequency of<br>the core                                                                                            | · · ·                                                                                                                                                             |                            |        |      |
|   | Examples                                                                                            | Processor launched<br>before 2005 like<br>80386,486, AMD<br>29000, AMD K6,<br>Pentium I,II,III etc.                                           | Processor launched<br>after 2005 like Core -<br>2-Duo,Athlon 64<br>X2,I3,I5 and I7 etc                                                                            |                            |        |      |
| 6 | in <u>Flynn's taxonomy</u> .<br><u>elements</u> that perform<br>simultaneously. Thus, s             | <b>Itiple data (SIMD)</b> , is a<br>It describes computers<br>the same operation<br>uch machines exploit <u>da</u><br>simultaneous (parallel) | e data (SIMD)<br>a class of parallel computes<br>s with <u>multiple process</u><br>on multiple data pot<br>ata level parallelism, but<br>) computations, but only | <u>sing</u><br>ints<br>not | C409.1 | BTL1 |
| 7 | technique employed to<br>number of <u>processors</u> t<br>any time, different pro-                  | <b>MIMD</b> (multiple instr<br>achieve parallelism. Ma<br>hat function <u>asynchrono</u><br>cessors may be executi                            | uction, multiple data) is<br>achines using MIMD hav<br>ously and independently.<br>ng different instructions                                                      | ve a<br>At<br>on           | C409.1 | BTL1 |
|   | application areas su<br><u>manufacturing</u> , <u>simulat</u><br>MIMD machines can b<br>categories. | ich as <u>computer-aid</u><br><u>ion, modeling</u> , and as<br>be of either <u>shared men</u>                                                 | nay be used in a number<br>led design/computer-aid<br>s communication switch<br>nory or distributed mem                                                           | <u>ded</u><br>nes.         | C400.1 | BTL1 |
| 8 | speed computer networ                                                                               | connection networks<br>the usually composed of<br>fork and memory eleme                                                                       | (MINs) are a class of hi<br>f processing elements (P<br>ents (MEs) on the other e                                                                                 | 'Es)                       | C409.1 | BILI |
| 9 | What is meant by Rou<br>-How does a mean-Static or adaptive                                         | ssage get from source to                                                                                                                      | destination.                                                                                                                                                      |                            | C409.1 | BTL1 |

| 10 | What is Network interface?                                                                               | C409.1 | BTL1 |
|----|----------------------------------------------------------------------------------------------------------|--------|------|
| 10 | -Connects endpoints (e.g. cores) to network.                                                             |        |      |
|    | -Decouples computation/communication                                                                     |        |      |
|    | .What is Centralized shared-memory multiprocessor                                                        | C409.1 | BTL1 |
|    | It share a single centralized memory, interconnect processors and                                        |        |      |
| 11 | memory by a bus                                                                                          |        |      |
| 11 | • also known as "uniform memory access" (UMA) or "symmetric (shared-                                     |        |      |
|    | memory) multiprocessor" (SMP)                                                                            |        |      |
|    | <ul> <li>A symmetric relationship to all processors.</li> </ul>                                          |        |      |
|    | – A uniform memory access time from any processor.                                                       |        |      |
|    | What is the concept of Caching in shared-memory machines.                                                | C409.1 | BTL1 |
|    | • private data: used by a single processor                                                               |        |      |
|    | – When a private item is cached, its location is <i>migrated</i> to the cache                            |        |      |
| 12 | - Since no other processor uses the data, the program behavior is                                        |        |      |
|    | identical to that in a uniprocessor                                                                      |        |      |
|    | • shared data: used by multiple processor                                                                |        |      |
|    | – When shared data are cached, the shared value may be <i>replicated</i>                                 |        |      |
|    | in multiple caches.                                                                                      |        |      |
|    | What is Cache Coherence .                                                                                | C409.1 | BTL1 |
| 13 | • migration: a data item can be moved to a local cache and used                                          |        |      |
| 15 | there in a transparent fashion                                                                           |        |      |
|    | • replication for shared data that are being simultaneously read both                                    |        |      |
|    | are critical to performance in accessing shared data.                                                    |        |      |
|    | What is meant by Snooping Solution (Snoopy Bus).                                                         | C409.1 | BTL1 |
|    | - Send all requests for data to all processors                                                           |        |      |
| 14 | - Processors snoop to see if they have a copy and respond                                                |        |      |
|    | accordingly                                                                                              |        |      |
|    | - Requires broadcast, since caching information is at processors                                         |        |      |
|    | - Works well with bus (natural broadcast medium)                                                         |        |      |
|    | - Dominates for small scale machines (most of the market)                                                | C409.1 | BTL1 |
|    | What is meant by Directory-Based Schemes.                                                                | C409.1 | DILI |
|    | <ul> <li>Directory keeps track of what is being shared in a centralized place<br/>(logically)</li> </ul> |        |      |
|    | <ul> <li>Distributed memory =&gt; distributed directory for scalability (avoids</li> </ul>               |        |      |
| 15 | bottlenecks)                                                                                             |        |      |
|    | – Send point-to-point requests to processors via network                                                 |        |      |
|    | - Scales better than Snooping                                                                            |        |      |
|    | <ul> <li>Actually existed BEFORE Snooping-based schemes</li> </ul>                                       |        |      |
|    | - Actuary existed DEFORE Shooping-based schemes                                                          |        |      |
|    | When a memory system is coherent ?                                                                       | C409.1 | BTL1 |
|    | A memory system is coherent if:                                                                          |        |      |
| 10 | • P writes to X; no other processor writes to X; P reads X and                                           |        |      |
| 16 | receives the value previously written by P                                                               |        |      |
|    | • P1 writes to X; no other processor writes to X; sufficient time                                        |        |      |
|    | lapses; P2 reads X and receives value written by P1                                                      |        |      |
|    | • Two writes to the same location by two processors are seen in the                                      |        |      |

|    | <ul> <li>same order by all processors – wr</li> <li>The memory consistency mode effect of a processor is seen by other second se</li></ul> | 2                                                                       |        |      |
|----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|--------|------|
| 17 | What is meant by a distributed-memory system?A distributed-memory system (often called a multicomputer) consist of<br>multiple independent processing nodes with local memory modules which is<br>connected by a general interconnection network. Software DSM systems<br>can be implemented in an operating system, or as a programming library and<br>can be thought of as extensions of the underlying virtual memory<br>architecture.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                         |        | BTL1 |
|    | What the difference between Message p                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | bassing vs. DSM                                                         | C409.1 | BTL1 |
|    | Message passing                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Distributed shared memory                                               |        |      |
|    | Variables have to be marshalled                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Variables are shared directly                                           |        |      |
| 18 | Cost of communication is obvious                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | Cost of communication is invisible                                      |        |      |
|    | Processes are protected by having private address space                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Processes could cause error by altering data                            |        |      |
|    | Processes should execute at the same time                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Executing the processes may<br>happen with non-overlapping<br>lifetimes |        |      |
|    | What are the Advantages of DSM. (Appl                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | r/May 2018)                                                             | C409.1 | BTL1 |
| 19 | <ul> <li>System scalable</li> <li>Hides the message passing</li> <li>Can handle complex and large data bases without replication or sending the data to processes</li> <li>DSM is usually cheaper than using multiprocessor system</li> <li>No memory access bottleneck, as no single bus</li> <li>DSM provides large virtual memory space</li> <li>DSM programs portable as they use common DSM programming interface</li> <li>Shields programmer from sending or receive primitives</li> <li>DSM can (possibly) improve performance by speeding up data access</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                                                                         |        |      |

| 21<br>22<br>23    | <ul> <li>What are the Disadvantages of DSM (Apr/May 2018)</li> <li>Could cause a performance penalty</li> <li>Should provide for protection against simultaneous access to shared data such as lock</li> <li>Performance of irregular problems could be difficult</li> <li>What are the Methods of achieving DSM.</li> <li>There are usually two methods of achieving distributed shared memory: <ul> <li>hardware, such as cache coherence circuits and network interfaces;</li> <li>software.We can use this method in different ways such as modifying the operating system kernel.</li> </ul> </li> <li>What is meant by Consistency models</li> <li>Memory system tries to behave based on certain rules in the system, which is called system's <i>consistency model</i>.</li> </ul> | C409.1<br>C409.1<br>C409.1 | BTL1<br>BTL1<br>BTL1 |
|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------|----------------------|
| 22 7<br>23 N<br>i | <ul> <li>There are usually two methods of achieving distributed shared memory:</li> <li>hardware, such as cache coherence circuits and network interfaces;</li> <li>software.We can use this method in different ways such as modifying the operating system kernel.</li> </ul> What is meant by Consistency models Memory system tries to behave based on certain rules in the system, which                                                                                                                                                                                                                                                                                                                                                                                              |                            |                      |
| 23 N<br>i         | Memory system tries to behave based on certain rules in the system, which                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | C409.1                     | BTL1                 |
|                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | 1                          |                      |
| C                 | Define Vector Instruction?(Apr/May2017)<br>A vector processor or array processor is a central processing unit<br>(CPU) that implements an instruction set containing instructions that<br>operate on one-dimensional arrays of data called vectors, compared to scalar<br>processors, whose instructions operate on single data items.                                                                                                                                                                                                                                                                                                                                                                                                                                                     | C409.1                     | BTL1                 |
| 25 I<br>I<br>c    | What is meant by Snooping cache coherence? (Apr/May 2017)<br>Also referred to as a bus-snooping protocol, a protocol for<br>maintaining cache coherency in symmetric multiprocessing environments.<br>In a snooping system, all caches on the bus monitor (or snoop) the bus to<br>determine if they have a copy of the block of data that is requested on the<br>bus.                                                                                                                                                                                                                                                                                                                                                                                                                     | C409.1                     | BTL1                 |
|                   | Compare Symmetric memory architecture and distributed memory<br>architecture. (Nov/Dec 2017)SnoSymmetric memory<br>architecturedistributed memory architecture.<br>architecture                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | C409.1                     | BTL1                 |

|    | 1        | sharedmemory cache-          | distributed memory refers to a           |        |      |
|----|----------|------------------------------|------------------------------------------|--------|------|
|    | -        | coherent multiprocessor      | multiprocessor computer system in        |        |      |
|    |          | systems. The systems         | which each processor has its own         |        |      |
|    |          | communicated with            | private memory. Computational tasks      |        |      |
|    |          |                              |                                          |        |      |
|    |          | each other and with          | can only operate on local data, and if   |        |      |
|    |          | shared main memory           | remote data is required, the             |        |      |
|    |          | over a shared bus.           | computational task must                  |        |      |
|    |          |                              | communicate with one or more             |        |      |
|    |          |                              | remote processors                        |        |      |
|    | 2        | any access from any          | any access from any processor to         |        |      |
|    |          | processor to main            | main memory would have different         |        |      |
|    |          | memory would have            | latency                                  |        |      |
|    |          | equal latency                |                                          |        |      |
|    | What ar  | re multiprocessor systems    | and give their advantages? (Nov/Dec      | C409.1 | BTL1 |
|    | 2017)    | - ·                          |                                          |        |      |
|    |          |                              |                                          |        |      |
|    |          |                              | known as parallel systems or tightly     |        |      |
| 27 | -        |                              | we more than one processor in close      |        |      |
|    |          | & peripheral devices.        | er bus, the clock and sometimes          |        |      |
|    | -        | Their main advantages are    |                                          |        |      |
|    |          | Increased throughput         |                                          |        |      |
|    |          | Economy of scale             |                                          |        |      |
|    |          | Increased reliability        |                                          |        |      |
| 28 | Define C | Channel?                     |                                          | C409.1 | BTL1 |
|    |          | single logical connection    |                                          |        |      |
|    | List the | pros and cons of distribut   | ted system (Apr/May 2018)                | C409.1 | BTL1 |
|    |          | ystem scalable               |                                          |        |      |
|    |          | lides the message passing    |                                          |        |      |
|    |          |                              | large data bases without replication or  |        |      |
|    |          | ending the data to processes | s<br>1 using multiprocessor system       |        |      |
|    |          | To memory access bottlened   | • • •                                    |        |      |
| 29 |          | OSM provides large virtual   |                                          |        |      |
|    |          |                              | they use common DSM programming          |        |      |
|    |          | nterface                     | and all common Don't programming         |        |      |
|    | • S      | hields programmer from se    | anding or receive primitives             |        |      |
|    |          |                              | mance by speeding up data access         |        |      |
|    | • C      | Could cause a performance j  | penalty                                  |        |      |
|    |          |                              | on against simultaneous access to shared |        |      |
|    | d        | ata such as lock             |                                          |        |      |

|    | Performance of irregular problems could be difficult                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |        |      |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | Define the symmetric shared memory (Apr/May 2018, Nov/Dec 2018)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | C409.1 | BTL1 |
| 30 | <b>Symmetric Shared Memory Architecture</b> consists of several processors with a single physical <b>memory shared</b> by all processors through a <b>shared</b> bus                                                                                                                                                                                                                                                                                                                                                                                                       |        |      |
|    | List out the advantages of multicore CPU                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | C409.1 | BTL1 |
| 31 | <ul> <li>The largest boost in performance will likely be noticed in improved response time while running CPU intensive processes, like anti-virus scans, ripping/burning media.</li> <li>Assuming that the die can fit into the package, physically, the multi-core CPU designs require much less printed Circuit Board(PCB) space than multichip SMP designs.</li> <li>Also, a dual core processor uses slightly less power than two coupled single core processors, principally because of the decreased power required to drive signals external to the chip</li> </ul> |        |      |
|    | Define Vector Registers                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | C409.1 | BTL1 |
| 32 | These are registers capable of storing a vector of operands and operating simultaneously on their contents. The vector length is fixed by the system, and can range from 4 to 128 64-bit elements. Vectorized and pipelined functional units.                                                                                                                                                                                                                                                                                                                              |        |      |
|    | Define Latency and Bandwidth                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | C409.1 | BTL1 |
| 33 | <ul> <li>The latency is the time that elapses between the source's beginning to transmit the data and the destination's starting to receive the first byte.</li> <li>The bandwidth is the rate at which the destination receives data after it has started to receive the first byte. So if the latency of an interconnect is 1 seconds and the bandwidth is b bytes per second, then the time it takes to transmit a message of n bytes is</li> <li>Message transmission time= l+n/b</li> </ul>                                                                           |        |      |
|    | List out the approaches in cache coherence                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | C409.1 | BTL1 |
| 34 | <ul><li>Snooping cache coherence</li><li>Directory-based cache coherence.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |        |      |
|    | List the steps involved in Parallel Program design                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | C409.1 | BTL1 |
| 35 | 1.Partitioning                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |        |      |

|    | 2.Aggregating                                                                                                                                                                                                                                                                                                                                                                                  |        |      |
|----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | 3.Communication                                                                                                                                                                                                                                                                                                                                                                                |        |      |
|    | 4.Mapping                                                                                                                                                                                                                                                                                                                                                                                      |        |      |
|    | Define Directory based cache coherence                                                                                                                                                                                                                                                                                                                                                         | C409.1 | BTL1 |
| 36 | It is protocols attempt to solve this problem through the use of a data<br>structure called a <b>directory</b> . The directory stores the status of each cache<br>line. Typically, this data structure is distributed; in our example, each<br>core/memory pair might be responsible for storing the part of the structure<br>that specifies the status of the cache lines in its local memory |        |      |
|    | Define Parallel Overhead                                                                                                                                                                                                                                                                                                                                                                       | C409.1 | BTL1 |
| 37 | Tparallel = Tserial/p + Toverhead.                                                                                                                                                                                                                                                                                                                                                             |        |      |
|    | Define Scalability                                                                                                                                                                                                                                                                                                                                                                             | C409.1 | BTL1 |
| 38 | The number of processes/threads that are used by the program. If we can find a corresponding rate of increase in the problem size so that the program always has efficiency E, then the program is scalable.                                                                                                                                                                                   |        |      |
|    | Define Amdhal's Law (Nov/Dec 2018)                                                                                                                                                                                                                                                                                                                                                             | C409.1 | BTL1 |
| 39 | Amdahl made an observation that's become known as <i>Amdahl's law</i> . It says, roughly, that unless virtually all of a serial program is parallelized, the possible speedup is going to be very limited—regardless of the number of cores available.                                                                                                                                         |        |      |
|    | Define Speedup and Efficiency                                                                                                                                                                                                                                                                                                                                                                  | C409.1 | BTL1 |
| 40 | The Serial run-time <i>T</i> serial and our parallel run-time <i>T</i> parallel, then the best we can hope for is <i>T</i> parallel = <i>T</i> serial/ $p$ .                                                                                                                                                                                                                                   |        |      |
|    | Parallel program has <b>linear speedup</b> . So if we define the <b>speedup</b> of a parallel program to be linear speedup has $S = p$ , which is unusual. Furthermore, as $p$ increases, we expect $S$ to become a smaller and smaller fraction of the ideal, linear speedup $p$ .                                                                                                            |        |      |

|    |                                                                                                                                    | C400.1  | DTI 1 |
|----|------------------------------------------------------------------------------------------------------------------------------------|---------|-------|
|    | How to parallelize the serial program                                                                                              | C409.1  | BTL1  |
|    | • For the first step we might identify two types of tasks: finding the                                                             |         |       |
| 41 | bin to which an element of data belongs and incrementing the                                                                       |         |       |
|    | appropriate entry in bin counts.                                                                                                   |         |       |
|    | • For the second step, there must be a communication between the                                                                   |         |       |
|    | computation of the appropriate bin and incrementing an element of                                                                  |         |       |
|    | bin counts.                                                                                                                        |         |       |
|    | List out the different distributed memory interconnects                                                                            | C409.1  | BTL1  |
| 42 |                                                                                                                                    |         |       |
| 72 | Distributed-memory interconnects are often divided into two groups:                                                                |         |       |
|    | Direct interconnects and Indirect interconnects                                                                                    |         |       |
|    | Define Direct Interconnects                                                                                                        | C409.1  | BTL1  |
|    | In a direct interconnect each switch is directly connected to processor                                                            |         |       |
| 43 | memory pair, and the switches are connected to each other.                                                                         |         |       |
|    | As before, the <i>circles</i> are <i>switches</i> , the <i>squares</i> are <i>processors</i> , and the <i>lines</i>                |         |       |
|    | are <i>bidirectional links</i> .                                                                                                   |         |       |
|    | A <i>ring</i> is superior to a simple bus since it allows multiple simultaneous                                                    |         |       |
|    | communications.                                                                                                                    | C 400 1 |       |
|    | Define Indirect Interconnects                                                                                                      | C409.1  | BTL1  |
|    | • They provide an alternative to direct interconnects. In an indirect                                                              |         |       |
| 44 | interconnect the switches may not be directly connected to a                                                                       |         |       |
| 44 | processor.                                                                                                                         |         |       |
|    | • They're often shown with unidirectional links and a collection of                                                                |         |       |
|    | processors, each of which has an outgoing and an incoming link, and                                                                |         |       |
|    | a switching network.                                                                                                               |         |       |
|    | Define Ideal Direct interconnect                                                                                                   | C409.1  | BTL1  |
| 45 | The ideal direct interconnect is a fully connected activery in which we have                                                       |         |       |
|    | The ideal direct interconnect is a <i>fully connected network</i> in which each switch is directly connected to every other switch |         |       |
|    | Define Hypercube                                                                                                                   | C409.1  | BTL1  |
|    | 2 come repercuse                                                                                                                   |         |       |
|    | • It is a highly connected direct interconnect that has been used in                                                               |         |       |
| 46 | actual systems. Hypercubes are built inductively:                                                                                  |         |       |
|    | • A one-dimensional hypercube is a fullyconnected system with two                                                                  |         |       |
|    | processors.                                                                                                                        |         |       |
|    | • A two-dimensional hypercube is built from two one-dimensional                                                                    |         |       |
|    | hypercubes by joining "corresponding" switches                                                                                     |         |       |
|    | Define Interleaved memory                                                                                                          | C409.1  | BTL1  |
| 47 | The memory system consists of multiple "banks" of memory, which can be                                                             |         |       |
|    | accessed more or less independently. After accessing one bank, there will be                                                       |         |       |
|    | accessed note of less independently. After accessing one bank, there will be                                                       | 1       | I     |

|    | a delay before it can be reassessed, but a different bank can be accessed<br>much sooner. So if the elements of a vector are distributed across multiple<br>banks, there can be little to no delay in loading/storing successive elements.<br><b>Define Strided memory</b>                             | C409.1 | BTL1 |
|----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
| 48 | In strided memory access, the program accesses elements of a vector located<br>at fixed intervals.For example, accessing the first element, the fifth element,<br>the ninth element, and so on, would be strided access with a stride of four                                                          |        |      |
|    | Define Graphics Processor Pipeline                                                                                                                                                                                                                                                                     | C409.1 | BTL1 |
| 49 | Real-time graphics application programming interfaces, or APIs, use points, lines, and triangles to internally represent the surface of an object. They use a <b>graphics processing pipeline</b> to convert the internal representation into an array of pixels that can be sent to a computer screen |        |      |
|    | List out the two principal types of MIMD system                                                                                                                                                                                                                                                        | C409.1 | BTL1 |
| 50 | <ul><li>Shared Memory System</li><li>Distributed Memory system</li></ul>                                                                                                                                                                                                                               |        |      |

## PART B

| Q. No. | Questions                                                                                                        | СО      | Bloom's<br>Level |
|--------|------------------------------------------------------------------------------------------------------------------|---------|------------------|
|        | Explain Single core to Multi-core Architectures .                                                                | C409.1  |                  |
| 1      | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:15-20 |         | BTL5             |
|        | Explain SIMD and MIMD systems (Apr/May2017,Nov/Dec 2017)                                                         | C409.1  |                  |
| 2      |                                                                                                                  |         | BTL5             |
|        | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-                                          |         |                  |
|        | Kauffman/Elsevier, 2011 Page No:29-34                                                                            | ~       |                  |
|        | Explain about Interconnection networks? (Apr/May 2017)                                                           | C409.1  | BTL5             |
| 3      | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-                                          |         |                  |
|        | Kauffman/Elsevier, 2011 Page No:35-44                                                                            |         |                  |
|        | Explain with neat diagram Symmetric Shared Memory Architectures                                                  | C409.1  | BTL5             |
| 4      | Explain when near diagram Symmetric Shared Memory Arcinectures                                                   | 2.107.1 | 2120             |
|        | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:35-42 |         |                  |

|    | Explain with neat diagram Distributed Shared Memory Architectures                                                                                                                                                                                                                                                        | C409.1 | BTL5 |
|----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | (Nov/Dec 2018)                                                                                                                                                                                                                                                                                                           |        |      |
| 5  | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:43-45                                                                                                                                                                                                         |        |      |
|    | Explain Cache coherence in Symmetric Shared and Distributed Shared<br>Memory Architectures                                                                                                                                                                                                                               | C409.1 | BTL5 |
| 6  | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No: 50-55                                                                                                                                                                                                        |        |      |
|    | Explain the performance issues of multicore processor(Nov/Dec 2017)                                                                                                                                                                                                                                                      | C409.1 | BTL5 |
| 7  | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:58-64                                                                                                                                                                                                         |        |      |
| 8  | Define cache coherence problem. What are the 2 main approaches to<br>cache coherence? Describe working of snooping cache coherence and<br>explain describe directory based coherence. (Nov/Dec 2017)<br>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:43-45 | C409.1 | BTL5 |
|    | Explain parallel program design (Apr/May 2017,Nov/Dec 2018)                                                                                                                                                                                                                                                              | C409.1 | BTL5 |
| 9  | <ol> <li>Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br/>Kauffman/Elsevier, 2011 Page No:65-69</li> </ol>                                                                                                                                                                                       |        |      |
|    | State and explain Amdahl's law Outline the steps in designing and                                                                                                                                                                                                                                                        | C409.1 | BTL5 |
| 10 | building parallel program. Give example (Apr/May 2018)                                                                                                                                                                                                                                                                   |        |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011                                                                                                                                                                                                                           |        |      |
|    | Elaborate the classification of computer architecture in parallel                                                                                                                                                                                                                                                        | C409.1 | BTL5 |
| 11 | computing system (Apr/May 2018)                                                                                                                                                                                                                                                                                          |        |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011                                                                                                                                                                                                                       |        |      |
| 12 | Explain Directory Based cache coherence protocol                                                                                                                                                                                                                                                                         | C409.1 | BTL4 |
| 12 | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011                                                                                                                                                                                                                           |        |      |

|    | Generalize the snooping protocol briefly                                | C409.1 | BTL6 |
|----|-------------------------------------------------------------------------|--------|------|
| 13 |                                                                         |        |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |
|    | Summarize the Parallelizing the serial program                          | C409.1 | BTL5 |
| 14 |                                                                         |        |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |
|    | Explain the Shared memory interconnect                                  | C409.1 | BTL3 |
| 15 |                                                                         |        |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |
|    | Highlight the limitations of single core processors and outline how     | C409.1 | BTL3 |
|    | multicore architecture overcome the limitations                         |        |      |
| 16 |                                                                         |        |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |
|    |                                                                         |        |      |
|    |                                                                         |        |      |

#### UNIT II PARALLEL PROGRAM CHALLENGES

Performance – Scalability – Synchronization and data sharing – Data races – Synchronization primitives (mutexes, locks, semaphores, barriers) – deadlocks and livelocks – communication between threads (condition variables, signals, message queues and pipes).

|        |                                                                                                                                        |        | Bloom |
|--------|----------------------------------------------------------------------------------------------------------------------------------------|--------|-------|
| Q. No. | Questions                                                                                                                              | CO     | 's    |
|        |                                                                                                                                        |        | Level |
| 4      | What is data race?                                                                                                                     | C409.2 | BTL1  |
| 1      | A data race occurs when multiple threads use the same data item and one or                                                             |        |       |
|        | more of those threads are updating it.                                                                                                 |        |       |
| 2      | How to avoid data races .                                                                                                              | C409.2 | BTL1  |
| Z      | One way to avoid data race is by utilizing proper synchronization                                                                      |        |       |
|        | between threads.                                                                                                                       |        |       |
|        | Hoe to Avoid Data Races.                                                                                                               | C409.2 | BTL1  |
|        | Although it can be hard to identify data races, avoiding them can be                                                                   |        |       |
| 3      | very simple: Make sure that only one thread can update the variable at a                                                               |        |       |
|        | time. The easiest way to do this is to place a <i>synchronization lock</i> around all                                                  |        |       |
|        | accesses to that variable and ensure that before referencing the variable, the                                                         |        |       |
|        | thread must acquire the lock.                                                                                                          |        |       |
|        | What is the use of Synchronization Primitives? List out the various                                                                    | C409.2 | BTL1  |
|        | synchronization primitive in parallel programming (Nov/Dec 2018)                                                                       |        |       |
|        | Synchronization is used to coordinate the activity of multiple                                                                         |        |       |
|        | threads. There are various situations where it is necessary; this might be to                                                          |        |       |
| 4      | ensure that shared resources are not accessed by multiple threads                                                                      |        |       |
|        | simultaneously or that all work on those resources is complete before new                                                              |        |       |
|        | work starts.                                                                                                                           |        |       |
|        | Process Synchronization                                                                                                                |        |       |
|        | Memory Synchronization                                                                                                                 |        |       |
|        | Thread Synchronization                                                                                                                 | C409.2 | BTL1  |
|        | What is the simplest form of synchronization?                                                                                          | C409.2 | DILI  |
| 5      | The simplest form of synchronization is a mutually exclusive (mutar) lock. Only one thread at a time can acquire a mutar lock, so they |        |       |
|        | ( <i>mutex</i> ) lock. Only one thread at a time can acquire a mutex lock, so they                                                     |        |       |
|        | can be placed around a data structure to ensure that the data structure is modified by only one thread at a time.                      |        |       |
|        | How to Place Mutex Locks Around Accesses to Variables.                                                                                 | C409.2 | BTL1  |
| 6      | int counter;                                                                                                                           | 0409.2 | DILI  |
|        | int counter,                                                                                                                           |        |       |

mutex\_lock mutex;

## PART A

|    | void Increment()                                                                                                                                     |         |      |
|----|------------------------------------------------------------------------------------------------------------------------------------------------------|---------|------|
|    |                                                                                                                                                      |         |      |
|    | acquire( &mutex );                                                                                                                                   |         |      |
|    | counter++;                                                                                                                                           |         |      |
|    | release( &mutex );}                                                                                                                                  |         |      |
|    | void Decrement()                                                                                                                                     |         |      |
|    |                                                                                                                                                      |         |      |
|    | acquire( &mutex );                                                                                                                                   |         |      |
|    | counter;                                                                                                                                             |         |      |
|    | release( &mutex );}                                                                                                                                  |         |      |
|    | What is meant by <i>contended</i> mutex.                                                                                                             | C409.2  | BTL1 |
| 7  | If multiple threads are attempting to acquire the same mutex at the                                                                                  |         |      |
|    | same time, then only one thread will succeed, and the other threads will                                                                             |         |      |
|    | have to wait. This situation is known as a <i>contended</i> mutex.                                                                                   |         |      |
|    | What is critical region.                                                                                                                             | C409.2  | BTL1 |
| 8  | The region of code between the acquisition and release of a mutex                                                                                    |         |      |
|    | lock is called a <i>critical section</i> , or <i>critical region</i> . Code in this region will be                                                   |         |      |
|    | executed by only one thread at a time.                                                                                                               |         |      |
|    | Develop the code for Placing a Mutex Lock Around a Region of Code                                                                                    | C409.2  |      |
|    |                                                                                                                                                      |         |      |
|    | <pre>void * threadSafeMalloc( size_t size )</pre>                                                                                                    |         |      |
| 9  | {                                                                                                                                                    |         | BTL6 |
| 9  | acquire( &mallocMutex );                                                                                                                             |         | DILO |
|    | <pre>void * memory = malloc( size );</pre>                                                                                                           |         |      |
|    | release( &mallocMutex );                                                                                                                             |         |      |
|    | return memory;                                                                                                                                       |         |      |
|    | }                                                                                                                                                    | ~       |      |
| 10 | What is meant by Spin Locks.                                                                                                                         | C409.2  | BTL1 |
|    | Spin locks are essentially mutex locks. The thread waiting to acquire a spin                                                                         |         | 2121 |
|    | lock will keep trying to acquire the lock without sleeping                                                                                           | G 400 Q |      |
|    | Compare spin lock and mutex lock. (Apr/May 2018)                                                                                                     | C409.2  |      |
| 11 | The difference between a mutex lock and a spin lock is that a thread                                                                                 |         | BTL2 |
|    | waiting to acquire a spin lock will keep trying to acquire the lock without                                                                          |         |      |
|    | sleeping .In comparison; a mutex lock may sleep if it is unable to acquire                                                                           |         |      |
|    | the lock.                                                                                                                                            | C409.2  | BTL1 |
| 10 | Write the advantage of spin locks.                                                                                                                   | 0409.2  | DILI |
| 12 | The advantage of using spin locks is that they will acquire the lock as soon<br>as it is released, whereas a mutex lock will need to be woken by the |         |      |
|    |                                                                                                                                                      |         |      |
|    | operating system before it can get the lock<br>What is the disadvantage of spin locks                                                                | C409.2  | BTL1 |
|    | What is the disadvantage of spin locks.<br>The disadvantage is that a spin lock will spin on a virtual CPU                                           | 0709.2  |      |
| 10 | monopolizing that resource. In comparison, a mutex lock will sleep and free                                                                          |         |      |
| 13 | the virtual CPU for another thread to use.                                                                                                           |         |      |
|    |                                                                                                                                                      |         |      |
|    |                                                                                                                                                      |         |      |
|    |                                                                                                                                                      |         |      |

| 14 | What is Semaphores?<br>Semaphores are counters that can be either incremented or<br>decremented. They can be used in situations where there is a finite limit to a<br>resource and a mechanism is needed to impose that limit. Semaphores will<br>also signal or wake up threads that are waiting on them to use available                                                                                                                               | C409.2  | BTL1   |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|--------|
|    | resources                                                                                                                                                                                                                                                                                                                                                                                                                                                | G 400 A | DITY 4 |
| 15 | What is an example for semaphore.<br>An example might be a buffer that has a fixed size. Every time an element<br>is added to a buffer, the number of available positions is decreased. Every<br>time an element is removed, the number available is increased                                                                                                                                                                                           | C409.2  | BTL1   |
|    | What is wait and release in semaphore.                                                                                                                                                                                                                                                                                                                                                                                                                   | C409.2  | BTL1   |
| 16 | the method that acquires a semaphore might be called <i>wait</i> , <i>down</i> , or <i>acquire</i> , and the method to release a semaphore might be called <i>post,up</i> , <i>signal</i> , or <i>release</i> . When the semaphore no longer has resources available, the threads requesting resources will block until resources are available.                                                                                                         |         |        |
|    | Define readerswriter lock.                                                                                                                                                                                                                                                                                                                                                                                                                               | C409.2  | BTL1   |
| 17 | A <i>readerswriter lock</i> (or <i>multiple-reader lock</i> ) allows many threads to read the shared data but can then lock the readers threads out to allow one thread to acquire a writer lock to modify the data.                                                                                                                                                                                                                                     |         |        |
| 18 | <pre>Show an example for Readers-Writer Lock.<br/>int readData( int cell1, int cell2 ) {     acquireReaderLock( &amp;lock );     int result = data[cell] + data[cell2];     releaseReaderLock( &amp;lock );     return result;     }     void writeData( int cell1, int cell2, int value )     {         acquireWriterLock( &amp;lock );         data[cell1] += value;         data[cell2] -= value;         releaseWriterLock( &amp;lock);     } </pre> | C409.2  | BTL1   |
| 19 | <ul> <li>What are the use of Barriers.</li> <li>There are situations where a number of threads have to all complete their work before any of the threads can start on the next task. In these situations, it is useful to have a barrier where the threads will wait until all are present</li> </ul>                                                                                                                                                    | C409.2  | BTL1   |
|    |                                                                                                                                                                                                                                                                                                                                                                                                                                                          |         |        |
| 20 | Show One common example of using a barrier.<br>One common example of using a barrier arises when there is a<br>dependence between different sections of code. For example, suppose a<br>number of threads compute the values stored in a matrix. The variable total<br>needs to be calculated using the values stored in the matrix. A barrier can be<br>used to ensure that all the threads complete their computation of the matrix                    | C409.2  | BTL1   |

| 21 | <ul> <li>before the variable total is calculated .ZThe following example shows a situation using a barrier to separate the calculation of a variable from its use.</li> <li>Compute_values_held_in_matrix();</li> <li>Barrier();</li> <li>total = Calculate_value_from_matrix();</li> <li>What is data sharing. (Apr/May 2017)</li> <li>Sharing data between multiple threads is called data sharing.</li> <li>What are the difference between deadlock and livelock. (Apr/May 2017)(Nov/Dec 2018)</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | C409.2<br>C409.2 | BTL1<br>BTL1 |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|--------------|
| 22 | The <i>deadlock occurs</i> where two or more threads cannot make<br>progress because the resources that they needare held by the other threads.<br>Example:Suppose two threads need to acquire mutex locks A and B to<br>complete some task. If thread 1 has already acquired lock A and thread 2<br>has already acquired lock B, then A cannot make forward progress because<br>it is waiting for lock B, and thread 2 cannot make progress because it is<br>waiting for lock A. The two threads are <i>deadlocked</i> .<br>Listing 4.13 Two Threads in a Deadlock                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                  |              |
|    | <pre>Thread 1 Thread 2 void update1() void update2() {     acquire(A);     acquire(B); &lt;&lt;&lt; Thread 1 acquire(B);     acquire(B); &lt;&lt;&lt; Thread 1 acquire(A); &lt;&lt;&lt; Thread 2     waits here variable1++;     release(B);     release(A); } </pre>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                  |              |
| 23 | <ul> <li>What are conditions under which a deadlock situation may arise?</li> <li>(Nov/Dec 2017)</li> <li>Mutual Exclusion: At least one resource is held in a non-sharable mode that is only one process at a time can use the resource. If another process requests that resource, the requesting process must be delayed until the resource has been released.</li> <li>Hold and Wait: There must exist a process that is holding at least one resource and is waiting to acquire additional resources that are currently being held by other processes.</li> <li>No Preemption: Resouces cannot be preempted; that is, a resource can only be released voluntarily by the process holding it, after the process has completed its task.</li> <li>Circular Wait: There must exist a set {p<sub>0</sub>, p<sub>1</sub>,,p<sub>n</sub>} of waiting processes such that p<sub>0</sub> is waiting for a resource which is held by p<sub>1</sub>, p<sub>1</sub> is waiting for a resource which is held by p<sub>n</sub> and p<sub>n</sub> is waiting for a resource which is held by p<sub>0</sub>.</li> </ul> | C409.2           | BTL1         |

|     | <b>Define thread. mention the uses of swapping. (Nov/Dec 2017)</b><br>A thread is the smallest unit of processing that can be performed in an OS. | C409.2  |      |
|-----|---------------------------------------------------------------------------------------------------------------------------------------------------|---------|------|
| 24  | In most modern operating systems, a thread exists within a process - that is,                                                                     |         | BTL1 |
|     | a single process may contain multiple threads                                                                                                     |         |      |
|     | Define deadlock.                                                                                                                                  | C409.2  | BTL1 |
|     | <i>Deadlock</i> is the situation where two or more threads cannot make progress                                                                   |         |      |
|     | because the resources that they need are held by the other threads. It is                                                                         |         |      |
|     | easiest to explain this with an example. Suppose two threads need to acquire                                                                      |         |      |
| 25  | mutex locks A and B to complete some task. If thread 1 has already                                                                                |         |      |
| 23  | acquired lock A and thread 2 has already acquired lock B, then A cannot                                                                           |         |      |
|     | make forward progress because it is waiting for lock B, and thread 2 cannot                                                                       |         |      |
|     | make progress because it is waiting for lock A. The two threads are                                                                               |         |      |
|     | deadlocked                                                                                                                                        |         |      |
|     | How to communicate multiple threads.                                                                                                              | C409.2  | BTL1 |
|     | The easiest way for multiple threads to communicate is through                                                                                    |         |      |
|     | memory. If two threads can access the same memory location, the cost of                                                                           |         |      |
|     | that access is little more than the memory latency of the system. Of course,                                                                      |         |      |
| 26  | memory accesses still need to be controlled to ensure that only one thread                                                                        |         |      |
|     | writes to the same memory location at a time. A multithreaded application                                                                         |         |      |
|     | will share memory between the threads by default, so this can be a very                                                                           |         |      |
|     | low-cost approach. The only things that are not shared between threads are                                                                        |         |      |
|     | variables on the stack of each thread (local variables) and thread-local variables.                                                               |         |      |
|     | Variables.           Illustrate an example which Use Multiple Barriers.                                                                           | C409.2  |      |
|     | Compute_values_held_in_matrisx();                                                                                                                 |         |      |
| 27  | Barrier();                                                                                                                                        |         | BTL2 |
|     | total = Calculate_value_from_matrix();                                                                                                            |         |      |
|     | Barrier();                                                                                                                                        |         |      |
|     | Perform_next_calculation( total );                                                                                                                |         |      |
| 28  | Define live lock.                                                                                                                                 | C409.2  | BTL1 |
| 20  | A <i>livelock</i> traps threads in an unending loop releasing and acquiring locks.                                                                |         |      |
|     | Livelocks can be caused by code to back out of deadlocks.                                                                                         |         |      |
|     | What is An atomic operation                                                                                                                       | C409.2  |      |
| 29. | An <i>atomic operation</i> is one that will either successfully complete or                                                                       |         | BTL1 |
|     | fail; it is not possible for the operation to either result in a "bad" value or                                                                   |         |      |
|     | allow other threads on the system to observe a transient value.                                                                                   | G 400 2 |      |
| 30  | Write down the performance metrics (Apr/May 2018)                                                                                                 | C409.2  | BTL1 |

|     | Define Massage Queue                                                                                                                             | C409.2  |      |
|-----|--------------------------------------------------------------------------------------------------------------------------------------------------|---------|------|
|     | Define Message Queue                                                                                                                             | C409.2  |      |
| 31  | A message queue is a structure that can be shared between multiple                                                                               |         | BTL1 |
|     | processes. Messages can be placed into the queue and will be removed in<br>the same order in which they were odded. Constructing a message group |         |      |
|     | the same order in which they were added. Constructing a message queue                                                                            |         |      |
|     | looks rather like constructing a shared memory segment.                                                                                          | C409.2  |      |
|     | Define Named Pipes                                                                                                                               | C409.2  |      |
| 32  | UNIX uses pipes to pass data from one process to another. For example, the                                                                       |         | BTL1 |
|     | output from the command ls, which lists all the files in a directory, could be                                                                   |         |      |
|     | piped into the wc command, which counts the number of lines, words, and                                                                          |         |      |
|     | characters in the input                                                                                                                          | C409.2  |      |
|     | Mention the mechanism associated with Named Pipes                                                                                                | C409.2  |      |
|     | Setting Up and Writing into a Pipe Make Pipe( Descriptor ); ID = Open                                                                            |         |      |
| 33  | Pipe(Descriptor); Write Pipe(ID, Message, sizeof(Message)); Close                                                                                |         | BTL1 |
|     | Pipe(ID); Delete Pipe(Descriptor);                                                                                                               |         |      |
|     | Opening an Existing Pipe to Receive Messages ID=Open Pipe( Descriptor );                                                                         |         |      |
|     | Read Pipe( ID, buffer, sizeof(buffer) ); Close Pipe( ID );                                                                                       | C409.2  |      |
|     | How to create and Place Message Queues                                                                                                           | C409.2  |      |
|     | Creating and Placing Messages into a Queue ID = Open Message Queue                                                                               |         |      |
|     | Queue(Descriptor ); Put Message in Queue( ID, Message ); Close Message                                                                           |         |      |
| ~ . | Queue(ID);<br>Delate Massage Queue(Description))                                                                                                 |         |      |
| 34  | Delete Message Queue( Description );                                                                                                             |         | BTL1 |
|     | Using the descriptor for an existing message queue enables two processes to                                                                      |         |      |
|     | communicate by sending and receiving messages through the queue.<br>Opening a Queue and Receiving Messages ID=Open Message Queue                 |         |      |
|     | ID(Descriptor);                                                                                                                                  |         |      |
|     | Message=Remove Message from Queue(ID); Close Message Queue(ID);                                                                                  |         |      |
|     | What is the fundamental way to share access to resources between                                                                                 | C409.2  |      |
| 35  | threads                                                                                                                                          | 0.107.2 | BTL1 |
| 33  | Deadlock                                                                                                                                         |         | DILI |
|     | Livelock                                                                                                                                         |         |      |
|     | Give an example of critical regions                                                                                                              | C409.2  |      |
|     | An operating system does not have an implementation of malloc() that is                                                                          | 0.107.2 |      |
| 36  | thread-safe, or safe for multiple threads to call at the same time. One way to                                                                   |         | BTL1 |
|     | fix this is to place the call to malloc() in a critical section by surrounding it                                                                |         |      |
|     | with a mutex lock                                                                                                                                |         |      |
|     | List out the issues in shared caches                                                                                                             | C409.2  |      |
| 37  | Capacity misses                                                                                                                                  |         | BTL1 |
|     | Conflict misses                                                                                                                                  |         |      |
|     | Define False sharing                                                                                                                             | C409.2  |      |
|     | <i>False sharing</i> is the situation where multiple threads are accessing items of                                                              |         |      |
| 38  | data held on a single cache line.                                                                                                                |         | BTL1 |
|     | Although the threads are all using separate items of data, the cache line                                                                        |         | ~    |
|     | itself is shared between them so only a single thread can write to it at any                                                                     |         |      |
|     | one time.                                                                                                                                        |         |      |
| 20  | List out the Memory Bandwidth Measured on a System with Four                                                                                     | C409.2  | BTL1 |
| 39  |                                                                                                                                                  |         |      |

| Threads Time 15.238317 s Bandwidth 2.57 GB/s<br>Threads Time 24.580981 s Bandwidth 2.39 GB/s       C409.2         What are the measures to be taken when bandwidth size reduces<br>The threads are interfering on the processor.       C409.2         • A second interaction effect is if the threads start interfering in the<br>caches, such as multiple threads attempting to load data to the same<br>set of cache lines.       C409.2         • One other effect is the behavior of memory chips when they become<br>saturated. The chips start experiencing queuing latencies where the<br>response time for each request increases. Memory chips are arranged<br>in banks.       BTL1         • Accessing a particular address will lead to a request to a particular<br>bank of memory. Each bank needs a gap between returning two<br>responses. If multiple threads happen to hit the same bank, then the<br>response. If multiple threads happen to hit he same bank, then the<br>response. If multiple threads happen to hit he same bank, then the<br>response. If multiple threads happen to hit he same bank, then the<br>response. If multiple threads happen to hit he same bank, then the<br>response. If multiple threads happen to hit he same bank, then the<br>response time becomes governed by the rate at which the bank can<br>returm memory       C409.2         #include <stdio.h><br/>#include <stdio.< th=""><th></th><th>Threads Time 7.437563 s Bandwidth 2.63 GB/s</th><th></th><th>T</th></stdio.<></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |     | Threads Time 7.437563 s Bandwidth 2.63 GB/s                             |         | T     |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|-------------------------------------------------------------------------|---------|-------|
| Threads Time 24.580981 s Bandwidth 2.39 GB/s<br>Threads Time 37.457352 s Bandwidth 2.09 GB/s       C409.2         What are the measures to be taken when bandwidth size reduces<br>The threads are interfering on the processor.       C409.2         40       A second interaction effect is if the threads start interfering in the<br>caches, such as multiple threads attempting to load data to the same<br>set of cache lines.       BTL1         40       One other effect is the behavior of memory chips when they become<br>saturated. The chips start experiencing queuing latencies where the<br>response time for each request increases. Memory chips are arranged<br>in banks.       BTL1         40       Accessing a particular address will lead to a request to a particular<br>bank of memory. Each bank needs a gap between returning two<br>responses. If multiple threads happen to hit the same bank, then the<br>response time becomes governed by the rate at which the bank can<br>return memory       BTL1         41       Write a code to measure memory bandwidth using Memset<br>#include <strings.h><br/>#include <s< td=""><td></td><td></td><td></td><td></td></s<></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |     |                                                                         |         |       |
| Threads Time 37.457352 s Bandwidth 2.09 GB/s       C409.2         What are the measures to be taken when bandwidth size reduces       C409.2         The threads are interfering on the processor.       • A second interaction effect is if the threads start interfering in the caches, such as multiple threads attempting to load data to the same set of cache lines.       • One other effect is the behavior of memory chips when they become saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.       • Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory       E409.2         Write a code to measure memory bandwidth using Memset       #include <stdib.h>         #include <stdib.h>       #include <stdib.h< td="">         #include <stdib.h>       #include <stdib.h< td="">         #include <stdib.h< td="">       #include <stdib.h< td="">         #include <stdib.h< td="">       #include <stdib.h< td="">         #includ</stdib.h<></stdib.h<></stdib.h<></stdib.h<></stdib.h<></stdib.h></stdib.h<></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |     |                                                                         |         |       |
| What are the measures to be taken when bandwidth size reduces       C409.2         The threads are interfering on the processor.       • A second interaction effect is if the threads start interfering in the caches, such as multiple threads attempting to load data to the same set of cache lines.       • One other effect is the behavior of memory chips when they become saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.       • Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory       BTL1         Write a code to measure memory bandwidth using Memset       C409.2         #include <stdiio.h>       #include <stdiio.h>         winclud</stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h></stdiio.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |     |                                                                         |         |       |
| 40       The threads are interfering on the processor.       • A second interaction effect is if the threads start interfering in the caches, such as multiple threads attempting to load data to the same set of cache lines.       • One other effect is the behavior of memory chips when they become saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.       • Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory       BTL1 <b>Write a code to measure memory bandwidth using Memset</b> #include <stdib.h>       #include <stdib.h>         #include <stdib.h>       #include <stdib.h>         #include <stdib.h>       #include <stdib.h>         #include <stdib.ho< td="">       #include <stdib.h>         #include <stdib.ho< td="">       #include <stdib.ho< td="">         #include</stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.ho<></stdib.h></stdib.ho<></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |     |                                                                         | C 400 Q |       |
| <ul> <li>A second interaction effect is if the threads start interfering in the caches, such as multiple threads attempting to load data to the same set of cache lines.</li> <li>One other effect is the behavior of memory chips when they become saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.</li> <li>Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory</li> <li>Write a code to measure memory bandwidth using Memset #include <stifus.h></stifus.h></li> <li>#include <stifus.h></stifus.h></li> <li>#in</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |     |                                                                         | C409.2  |       |
| <ul> <li>acaches, such as multiple threads attempting to load data to the same set of cache lines.</li> <li>One other effect is the behavior of memory chips when they become saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.</li> <li>Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory</li> <li>Write a code to measure memory bandwidth using Memset #include <sttillb.h></sttillb.h></li> <li>#include <sttillb.h></sttillb.h></li> <li></li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |     | •                                                                       |         |       |
| <ul> <li>set of cache lines.</li> <li>One other effect is the behavior of memory chips when they become saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.</li> <li>Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory</li> <li>Write a code to measure memory bandwidth using Memset #include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <strings.h></strings.h></li> <li>#include <strings.h></strings.h></li> <li>#include <strings.h></strings.h></li> <li>#include <strings.h></strings.h></li> <li>#include <strings.h></strings.h></li> <li>#include <stylime.h></stylime.h></li> <li>#define BLOCKSIZE 1024*1025 int nthreads = 8; char * memory; double now() {</li> <li>{</li> <li>struct timeval time; gettimeofday( &amp;time, 0 ); return (double)time.tv_usec / 1000000.0; } void *experiment( void *id ) {</li> <li>unsigned int seed = 0; int count = 20000; for (int i=0; i<count; )="" i++="" li="" {<=""> <li>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE ); }</li> </count;></li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |     | •                                                                       |         |       |
| <ul> <li>One other effect is the behavior of memory chips when they become saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.</li> <li>Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory</li> <li>Write a code to measure memory bandwidth using Memset #include <stdio.h></stdio.h></li> <li>#include <stdio.h></stdio.h></li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |     |                                                                         |         |       |
| <ul> <li>40 saturated. The chips start experiencing queuing latencies where the response time for each request increases. Memory chips are arranged in banks.</li> <li>Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory</li> <li>Write a code to measure memory bandwidth using Memset #include <stdilb.h></stdilb.h></li> <li>#include <stdilb.h></stdilb.h></li> <li>#include <stdilb.h></stdilb.h></li> <li>#include <strings.h></strings.h></li> <li>#include <strings.h></strings.h></li> <li>#include <string.h></string.h></li> <li>#define BLOCKSIZE 1024*1025</li> <li>int nthreads = 8;</li> <li>char * memory;</li> <li>double now() {</li> <li>{</li> <li>struct timeval time;</li> <li>gettimeofday( &amp;time, 0 );</li> <li>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;</li> <li>void *experiment( void *id ) {</li> <li>{</li> <li>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );</li> <li>How Bandwidth share between cores</li> <li>Bandwidth is another resource shared between threads. The bandwidth capacity of a system depends on the design of the processor and the memory</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |     | set of cache lines.                                                     |         |       |
| 41       struct timeval time;<br>gettimeofday( &time, 0 );<br>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br>}<br>void *experiment (void *id )<br>{<br>unsigned int seed = 0;<br>int count = 20000;<br>for(int i=0; i <count; )<br="" i++="">{<br/>How Bandwidth share between cores<br/>Bandwidth is sanother resource shared between threads. The bandwidth<br/>capacity of a system depends on the design of the processor and the memory       C409.2         42       How Bandwidth share between cores       C409.2</count;>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |     | • One other effect is the behavior of memory chips when they become     |         |       |
| <ul> <li>in banks.</li> <li>Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory</li> <li>Write a code to measure memory bandwidth using Memset #include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <strings.h></strings.h></li> <li>#include <strinde <strings.h=""></strinde></li> <li>#include <strings.h></strings.h></li> <li></li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | 40  | saturated. The chips start experiencing queuing latencies where the     |         | BTL1  |
| <ul> <li>in banks.</li> <li>Accessing a particular address will lead to a request to a particular bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory</li> <li>Write a code to measure memory bandwidth using Memset #include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <stdib.h></stdib.h></li> <li>#include <strings.h></strings.h></li> <li>#include <strinde <strings.h=""></strinde></li> <li>#include <strings.h></strings.h></li> <li></li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |     | response time for each request increases. Memory chips are arranged     |         |       |
| bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the responses time becomes governed by the rate at which the bank can return memory       C409.2         Write a code to measure memory bandwidth using Memset       C409.2         #include <stdio.h>       #include <stdio.h>         #include <stdis.h>       #include <stdio.h>         #include <stdis.h>       #include <stdis.h>         #include <storing.h>       #include <storing.h>         #define BLOCKSIZE 1024*1025       int nthreads = 8;         char * memory;       double now() {       {         {       struct timeval time;       gettimeofday( &amp;time, 0 );       BTL1         gettimeofday( &amp;time, 0 );       int count = 20000;       for( int i=0; i<count; )="" i++="" td="" {<="">       #incount = 20000;         for( int i=0; i<count; )="" i++="" td="" {<="">       #memeset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKS</count;></count;></storing.h></storing.h></stdis.h></stdis.h></stdio.h></stdis.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |     |                                                                         |         |       |
| bank of memory. Each bank needs a gap between returning two responses. If multiple threads happen to hit the same bank, then the responses time becomes governed by the rate at which the bank can return memory       C409.2         Write a code to measure memory bandwidth using Memset       C409.2         #include <stdio.h>       #include <stdio.h>         #include <stdis.h>       #include <stdio.h>         #include <stdis.h>       #include <stdis.h>         #include <storing.h>       #include <storing.h>         #define BLOCKSIZE 1024*1025       int nthreads = 8;         char * memory;       double now() {       {         {       struct timeval time;       gettimeofday( &amp;time, 0 );       BTL1         gettimeofday( &amp;time, 0 );       int count = 20000;       for( int i=0; i<count; )="" i++="" td="" {<="">       #incount = 20000;         for( int i=0; i<count; )="" i++="" td="" {<="">       #memeset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKS</count;></count;></storing.h></storing.h></stdis.h></stdis.h></stdio.h></stdis.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |     | • Accessing a particular address will lead to a request to a particular |         |       |
| responses. If multiple threads happen to hit the same bank, then the response time becomes governed by the rate at which the bank can return memory       C409.2         Write a code to measure memory bandwidth using Memset       C409.2         #include <stdib.h>       #include <strings.h>         #include <strings.h>       #include <strings.h>         #include <strings.h< td="">       BTL1         gettimeofday( &amp; time, 0 );       #include <strings.h< td="">      &lt;</strings.h<></strings.h<></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></stdib.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |     |                                                                         |         |       |
| response time becomes governed by the rate at which the bank can<br>return memoryC409.2Write a code to measure memory bandwidth using Memset<br>#include <stdio.h><br/>#include <stdio.h><br/>#include <stdib.h><br/>#include <strings.h><br/>#include <str< td=""><td></td><td></td><td></td><td></td></str<></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></stdib.h></stdio.h></stdio.h> |     |                                                                         |         |       |
| return memoryWrite a code to measure memory bandwidth using Memset<br>#include <stdio.h><br/>#include <stdio.h><br>#include <stdio.h< td="">BTL141#unsign</stdio.h<></br></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h></stdio.h>                                                                                                                                           |     |                                                                         |         |       |
| Write a code to measure memory bandwidth using Memset       C409.2         #include <stdib.h>       #include <stdib.h>         #include <strings.h>       #include <strings.h>         #define BLOCKSIZE 1024*1025       int nthreads = 8;         char * memory;       double now()         {       {         gettimeofday(&amp;time, 0);       return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;         }       void *experiment(void *id )         {       unsigned int seed = 0;         int count = 20000;       for( int i=0; i<count; )<="" i++="" td="">         {       memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );         }       How Bandwidth share between cores         Bandwidth is another resource shared between threads. The bandwidth capacity of a system depends on the des</count;></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></strings.h></stdib.h></stdib.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |     |                                                                         |         |       |
| #include <stdio.h><br/>#include <stdib.h><br/>#include <stdib.h< td="">BTL141#include <stdib.h< td="">#include <stdib.h< td="">#include <stdib.h< td="">#include <stdib.h< td="">#include <stdib.h< td="">41#include <br/>#include <br/>#include <br/>#include <br/>#include <br/>#in</stdib.h<></stdib.h<></stdib.h<></stdib.h<></stdib.h<></stdib.h<></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdib.h></stdio.h>                                                                                                                                  |     |                                                                         | C409.2  |       |
| #include <stdlib.h><br/>#include <strings.h><br/>#include <strings.h><br/>#include <strings.h><br/>#include <sys time.h=""><br/>#define BLOCKSIZE 1024*1025<br/>int nthreads = 8;<br/>char * memory;<br/>double now()<br/>{<br/>struct timeval time;<br/>gettimeofday( &amp;time, 0 );<br/>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br/>}<br/>void *experiment( void *id )<br/>{<br/>unsigned int seed = 0;<br/>int count = 20000;<br/>for( int i=0; i<count; )<br="" i++=""></count;>{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>Bandwidth share between coresC409.242How Bandwidth share between cores<br/>Bandwidth is another resource shared between threads. The bandwidth<br/>capacity of a system depends on the design of the processor and the memoryBTL1</sys></strings.h></strings.h></strings.h></stdlib.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |     |                                                                         | 0.107.2 |       |
| #include <strings.h><br/>#include <phread.h><br/>#include <sys time.h=""><br/>#define BLOCKSIZE 1024*1025<br/>int nthreads = 8;<br/>char * memory;<br/>double now()<br/>{BTL141struct timeval time;<br/>gettimeofday( &amp;time, 0 );<br/>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br/>}<br/>void *experiment( void *id )<br/>{<br/>unsigned int seed = 0;<br/>int count = 20000;<br/>for( int i=0; i<count; )<br="" i++=""></count;>{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>}BTL142How Bandwidth share between cores<br/>Bandwidth is another resource shared between threads. The bandwidth<br/>capacity of a system depends on the design of the processor and the memoryC409.2<br/>BTL1</sys></phread.h></strings.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |     |                                                                         |         |       |
| #include <pthread.h><br/>#include <sys time.h=""><br/>#define BLOCKSIZE 1024*1025<br/>int nthreads = 8;<br/>char * memory;<br/>double now()<br/>{BTL141struct timeval time;<br/>gettimeofday( &amp;time, 0 );<br/>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br/>}<br/>void *experiment( void *id )<br/>{<br/>unsigned int seed = 0;<br/>int count = 20000;<br/>for(int i=0; i<count; )<br="" i++=""></count;>{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>}BTL142How Bandwidth share between cores<br/>Bandwidth is another resource shared between threads. The bandwidth<br/>capacity of a system depends on the design of the processor and the memoryC409.2</sys></pthread.h>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |     |                                                                         |         |       |
| #include <sys time.h=""><br/>#define BLOCKSIZE 1024*1025<br/>int nthreads = 8;<br/>char * memory;<br/>double now()<br/>{<br/>struct timeval time;<br/>gettimeofday( &amp;time, 0 );<br/>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br/>}<br/>void *experiment( void *id )<br/>{<br/>unsigned int seed = 0;<br/>int count = 20000;<br/>for( int i=0; i<count; )<br="" i++=""></count;>{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>}BTL142How Bandwidth share between cores<br/>Bandwidth is another resource shared between threads. The bandwidth<br/>capacity of a system depends on the design of the processor and the memoryC409.2</sys>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |     | -                                                                       |         |       |
| #define BLOCKSIZE 1024*1025<br>int nthreads = 8;<br>char * memory;<br>double now()<br>{BTL141struct timeval time;<br>gettimeofday( &time, 0 );<br>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br>}<br>void *experiment( void *id )<br>{<br>unsigned int seed = 0;<br>int count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}BTL142How Bandwidth share between cores<br>Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryC409.2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |     | 1                                                                       |         |       |
| 41int nthreads = 8;<br>char * memory;<br>double now()<br>{BTL141struct timeval time;<br>gettimeofday( &time, 0 );<br>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br>}<br>void *experiment( void *id )<br>{<br>unsigned int seed = 0;<br>int count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}BTL142How Bandwidth share between cores<br>Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryC409.2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |     |                                                                         |         |       |
| 41char * memory;<br>double now()<br>{<br>struct timeval time;<br>gettimeofday( & time, 0 );<br>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br>}<br>void *experiment( void *id )<br>{<br>unsigned int seed = 0;<br>int count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( & memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}BTL142How Bandwidth share between cores<br>Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryC409.2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |     |                                                                         |         |       |
| 41double now()<br>{<br>struct timeval time;<br>gettimeofday(&time, 0);<br>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br>}<br>void *experiment( void *id )<br>{<br>unsigned int seed = 0;<br>int count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}BTL142How Bandwidth share between cores<br>Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryC409.242BTL1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |     | ,                                                                       |         |       |
| 41          {         {         {                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |     |                                                                         |         |       |
| gettimeofday( &time, 0 );<br>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br>}<br>void *experiment( void *id )<br>{<br>unsigned int seed = 0;<br>int count = 20000;<br>for( int i=0; i <count; )<br="" i++="">{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>}       Image: Count = 0;<br/>int count = 20000;<br/>for( int i=0; i<count; )<br="" i++="">{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>}       Example 1 = 0;<br/>int count = 20000;<br/>for( int i=0; i<count; )<="" i++="" td="">         42       How Bandwidth share between cores<br/>Bandwidth is another resource shared between threads. The bandwidth<br/>capacity of a system depends on the design of the processor and the memory       Example 2<br/>BTL1</count;></count;></count;>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |     |                                                                         |         |       |
| gettimeofday( &time, 0 );<br>return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;<br>}<br>void *experiment( void *id )<br>{<br>unsigned int seed = 0;<br>int count = 20000;<br>for( int i=0; i <count; )<br="" i++="">{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>}       Image: Count = 0;<br/>int count = 20000;<br/>for( int i=0; i<count; )<br="" i++="">{<br/>memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br/>}       Example 1 = 0;<br/>int count = 20000;<br/>for( int i=0; i<count; )<="" i++="" td="">         42       How Bandwidth share between cores<br/>Bandwidth is another resource shared between threads. The bandwidth<br/>capacity of a system depends on the design of the processor and the memory       Example 2<br/>BTL1</count;></count;></count;>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 4.1 | l<br>struct timoval timo:                                               |         | DTI 1 |
| return (double)time.tv_sec + (double)time.tv_usec / 1000000.0;                 void *experiment( void *id )                 unsigned int seed = 0;                 int count = 20000;       for( int i=0; i <count; )<="" i++="" td="">         for( int i=0; i<count; )<="" i++="" td="">                 memset( &amp;memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );       C409.2         42       Bandwidth share between cores       C409.2         apacity of a system depends on the design of the processor and the memory       BTL1</count;></count;>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 41  |                                                                         |         | DILI  |
| <ul> <li>} void *experiment( void *id )         {             unsigned int seed = 0;             int count = 20000;             for( int i=0; i<count; &memory[blocksize="" (int)id],="" )="" );="" *="" 0,="" <="" blocksize="" i++="" li="" memset(="" {="" }=""> <li>How Bandwidth share between cores             Bandwidth is another resource shared between threads. The bandwidth             capacity of a system depends on the design of the processor and the memory         </li> </count;></li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |     |                                                                         |         |       |
| 42 A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |     |                                                                         |         |       |
| 42 A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory A state of a system depends on the design of the processor and the memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |     | void *experiment( void *id)                                             |         |       |
| int count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>for( int i=0; i <count; )<br="" i++=""></count;> for( int i=0; i <count; )<br="" i++=""></count;> {<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>for( int i=0; i <count; )<br="" i++=""></count;> for( int i=0; i <count; )<="" i++="" th="">42How Bandwidth is another resource shared between threads. The bandwidth<br/>count; i++ )<br/>for( int i=0; i<count; )<br="" i++=""></count;>for( int i=0; i<count; )<="" i++="" td="">BTL1<br/>BTL1</count;></count;>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |     |                                                                         |         |       |
| int count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Image: Count = 20000;<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>for( int i=0; i <count; )<br="" i++=""></count;> {<br>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |     | l unsigned int seed $-0$ :                                              |         |       |
| for( int i=0; i <count; )<br="" i++=""></count;> {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Immemset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>(int)id], 0, BLOCKSIZE );C409.242How Bandwidth share between cores<br>Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryC409.242Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryBTL1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |     |                                                                         |         |       |
| {<br>memset( &memory[BLOCKSIZE * (int)id], 0, BLOCKSIZE );<br>}Locksize (int)id], 0, BLOCKSIZE );<br>>C409.242How Bandwidth share between cores<br>Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryC409.242Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memoryBTL1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |     |                                                                         |         |       |
| <ul> <li>}</li> <li>How Bandwidth share between cores</li> <li>Bandwidth is another resource shared between threads. The bandwidth</li> <li>Bandwidth is another resource shared between threads. The bandwidth</li> <li>BTL1</li> <li>BTL1</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |     |                                                                         |         |       |
| <ul> <li>}</li> <li>How Bandwidth share between cores</li> <li>Bandwidth is another resource shared between threads. The bandwidth</li> <li>Bandwidth is another resource shared between threads. The bandwidth</li> <li>BTL1</li> <li>BTL1</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |     | momsot( & momory[BLOCKSIZE * (int)id] 0 BLOCKSIZE );                    |         |       |
| 42 Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |     | }                                                                       |         |       |
| 42 Bandwidth is another resource shared between threads. The bandwidth<br>capacity of a system depends on the design of the processor and the memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |     | How Bandwidth share between cores                                       | C409.2  |       |
| capacity of a system depends on the design of the processor and the memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 42  |                                                                         |         | BTL1  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |     |                                                                         |         |       |
| system as well as the memory chips and their location in the system                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |     | system as well as the memory chips and their location in the system     |         |       |

|     |                                                                               | 1       |      |
|-----|-------------------------------------------------------------------------------|---------|------|
|     | List out the three critical areas to address large difference scaling         | C409.2  |      |
|     | • The amount of bandwidth to cache and the memory will be divided             |         |      |
|     | among the active threads on the system.                                       |         |      |
| 43  | • The design of the caches will determine how much time is lost               |         | BTL1 |
| _   | because of capacity and conflict-induced cache misses.                        |         |      |
|     | • The way that the processor core pipelines are shared between active         |         |      |
|     | software threads will determine how instruction issue rates change            |         |      |
|     | as the number of active threads increases.                                    |         |      |
|     | What is the role default malloc( )                                            | C409.2  |      |
|     | The default malloc() provides better performance than the alternative         | 0.107.2 |      |
| 44  | implementation. The algorithm that provides improved scaling also adds a      |         | BTL1 |
| 44  | cost to the single-threaded situation; it can be hard to produce an algorithm |         | DILI |
|     | that is fast for the single-threaded case and scales well with multiple       |         |      |
|     | threads.                                                                      |         |      |
|     | Define an idea to choose the appropriate data structures                      | C409.2  |      |
|     | Choosing the best structure to hold data, such as choosing an algorithm of    | 0107.2  |      |
| 45  | the appropriate complexity, can have a major impact on overall                |         | BTL1 |
|     | performance.                                                                  |         |      |
|     | Some structures will be efficient when data is accessed in one pattern, while |         |      |
|     | other structures will be more efficient if the access pattern is changed.     |         |      |
|     | Define Column major order                                                     | C409.2  |      |
|     | The opposite ordering is followed, so adjacent elements of the first index    | 0.07.12 |      |
| 46  | are adjacent in memory. This is called <i>column-major</i> order. Accessing   |         | BTL1 |
| 40  | elements by a stride is a common error in codes translated from Fortran into  |         | DILI |
|     | C. It shows how memory is addressed in C, where adjacent elements in a        |         |      |
|     | row are adjacent in memory.                                                   |         |      |
|     | How to select the appropriate array access pattern                            | C409.2  |      |
| 47  | One common data access pattern is striding through elements of an array.      |         | BTL1 |
| -77 | The performance of the application would be better if the array could be      |         | DILI |
|     | arranged so that the selected elements were contiguous.                       |         |      |
|     | List out the techniques to reduce the latency                                 | C409.2  |      |
| 48  | <ul> <li>Out of order execution</li> </ul>                                    |         | BTL1 |
| 40  | <ul> <li>Hardware prefetching</li> </ul>                                      |         | DILI |
|     | <ul> <li>Software prefetching</li> </ul>                                      |         |      |
|     | List out the non technical reasons why functionality get placed in            | C409.2  |      |
|     | libraries                                                                     | 0707.2  |      |
|     | Libraries often represent a convenient product for an organizational          |         |      |
|     | unit. One group of developers might be responsible for a particular           |         |      |
|     | library of code, but that does not automatically imply that a single          |         |      |
| 10  | library represents the best way for that code to be delivered to the          |         | BTL1 |
| 49  | end users.                                                                    |         | DILI |
|     | • Libraries are also used to group related functionality. For example,        |         |      |
|     | an application might contain a library of string-handling functions.          |         |      |
|     | Such a library might be appropriate if it contains a large body of            |         |      |
|     | code. On the other hand, if it contains only a few small routines, it         |         |      |
|     | might be more appropriate to combine it with another library.                 |         |      |
|     | inglit de more appropriate to combine it with another norally.                |         |      |

|    | Why Algorithm complexity is important                                                                                                                                                                                                             | C409.2 |      |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
| 50 | Algorithmic complexity represents the expected performance of a section of code as the number of elements being processed increases. In the limit, the code with the greatest algorithmic complexity will dominate the runtime of the application |        | BTL1 |

•

## PART B

| Q. No. | Questions                                                                                                                                                                                                                         | СО     | Bloom'<br>s Level |
|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------------------|
|        |                                                                                                                                                                                                                                   |        | 5 Level           |
| 1.     | <ul> <li>Explain about Synchronization and data sharing in detail.</li> <li>2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:121-126</li> </ul>                    | C409.2 | BTL5              |
| 2.     | <ul> <li>Explain Synchronization primitives mutexes and locks.</li> <li>2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:126-128</li> </ul>                        | C409.2 | BTL5              |
|        | Explain Synchronization primitives in semaphores and barriers in                                                                                                                                                                  | C409.2 | BTL5              |
| 3.     | detail.                                                                                                                                                                                                                           |        |                   |
|        | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:128-129                                                                                                           |        |                   |
|        | Explain the concepts of deadlocks and live locks                                                                                                                                                                                  | C409.2 | BTL5              |
| 4.     | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:132-133                                                                                                           |        |                   |
| 5.     | <ul> <li>Explain communication between threads using condition variables and signals.</li> <li>2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:133-139</li> </ul> | C409.2 | BTL5              |
|        | Explain communication between threads using message queues and                                                                                                                                                                    | C409.2 | BTL5              |
| 6.     | <ul><li>pipes.</li><li>2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:138-139</li></ul>                                                                          |        |                   |
|        | Explain data races and scalability in parallel program. (apr/may2017)                                                                                                                                                             | C409.2 | BTL5              |
| 7.     | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:121-126                                                                                                           |        |                   |

|     | Explain Synchronization primitives in parallel program challenges.                                                      | C409.2 | BTL5 |
|-----|-------------------------------------------------------------------------------------------------------------------------|--------|------|
| 8.  | (Apr/may2017)                                                                                                           |        |      |
|     | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:126-130 |        |      |
|     | Explain the various approaches to parallel programming(Nov/Dec                                                          | C409.2 | BTL5 |
| 9   | 2017)                                                                                                                   |        |      |
| 2   | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:43-47        |        |      |
|     | What is a data race? What are the tools used for detecting data races?                                                  | C409.2 | BTL5 |
| 10. | How to avoid races? (Nov/Dec 2017) (Apr/May 2018)                                                                       |        |      |
|     | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011 Pg.No:121-125 |        |      |
|     | (i)Discuss in detail about producer consumer synchronization (ii)Write                                                  | C409.2 | BTL5 |
| 11  | a simple semaphore to send a message (Apr/May 2018)                                                                     |        |      |
| 11  | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011               |        |      |
|     | Write a short notes on deadlocks, livelocks and named pipes                                                             | C409.2 | BTL5 |
| 12  | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011               |        |      |
|     | Discuss in detail about the importance of algorithmic complexity                                                        | C409.2 | BTL2 |
| 13  | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011               |        |      |
|     | Explain the outline about necessity of structure reflects in performance                                                | C409.2 | BTL4 |
| 14  | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011               |        |      |
|     | Write in detail and summarize about hardware constraints applicable                                                     | C409.2 | BTL5 |
| 4 5 | in improving scaling                                                                                                    |        |      |
| 15  | 2. Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Pearson, 2011               |        |      |

## UNIT III

## SHARED MEMORY PROGRAMMING WITH OpenMP

OpenMP Execution Model – Memory Model – OpenMP Directives – Work-sharing Constructs – Library functions – Handling Data and Functional Parallelism – Handling Loops – Performance Considerations.

| Q. No. | Questions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | СО     | Bloom'<br>s Level |
|--------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------------------|
| 1.     | What is OpenMP?<br>Like Pthreads, OpenMP is an API for shared-memory parallel<br>programming. The "MP" in OpenMP stands for "multiprocessing," a term<br>that is synonymous with shared-memory parallel computing. Thus, OpenMP<br>is designed for systems in which each thread or process can potentially have<br>access to all available memory, and, when we're programming with<br>OpenMP, we view our system as a collection of cores or CPUs, all of which<br>have access to main memory.                                                                                                                                                                                                                                                                                                                                                                                                                         | C409.3 | BTL1              |
| 2.     | What is Pthread.<br>Pthreads is lower level and provides us with the power to program<br>virtually any conceivable thread behavior. This power, however, comes with<br>some associated cost—it's up to us to specify every detail of the behavior of<br>each thread.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | C409.3 | BTL1              |
| 3.     | What the difference between Pthreads and OpenMP.<br>Pthreads requires that the programmer explicitly specify the<br>behavior of each thread. OpenMP, on the other hand, sometimes allows the<br>programmer to simply state that a block of code should be executed in<br>parallel, and the precise determination of the tasks and which thread should<br>execute them is left to the compiler and the run-time system. This suggests a<br>further difference between OpenMP and Pthreads, that is, that Pthreads (like<br>MPI) is a library of functions that can be linked to a C program, so any<br>Pthreads program can be used with any C compiler, provided the system has<br>a Pthreads library. OpenMP, on the other hand, requires compiler support<br>for some operations, and hence it's entirely possible that you may run across<br>a C compiler that can't compile OpenMP programs into parallel programs. | C409.3 | BTL1              |
| 4.     | What is the Execution Model of OpenMp.<br>The OpenMP API uses the fork-join model of parallel execution.<br>Multiple threads of execution perform tasks defined implicitly or explicitly<br>by OpenMP directives. The OpenMP API is intended to support programs<br>that will execute correctly both as parallel programs (multiple threads of                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | C409.3 | BTL1              |

|    |                                                                                                                                                                                                                                                                                                                                                     | 1      | -    |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | execution and a full OpenMP support library) and as<br>sequential programs (directives ignored and a simple OpenMP stubs<br>library).                                                                                                                                                                                                               |        |      |
| 5. | What is an initial thread in OpenMP?<br>An OpenMP program begins as a single thread of execution, called<br>an initial thread.                                                                                                                                                                                                                      | C409.3 | BTL1 |
|    | How to compile and running OpenMP programs                                                                                                                                                                                                                                                                                                          |        |      |
|    | To compile this with gcc we need to include the -fopenmp option                                                                                                                                                                                                                                                                                     |        |      |
| 6. | <pre>\$ gcc -g -Wall -fopenmp -o omp_hello omp_hello.c To run the program, we specify the number of threads on the command line. For example, we might run the program with four threads and type \$ ./omp_hello 4</pre>                                                                                                                            |        |      |
|    | What is termed as initial task region(Nov/Dec 2017)                                                                                                                                                                                                                                                                                                 | C409.3 | BTL1 |
| 7  | The initial task region, that is defined by an implicit inactive parallel<br>region surrounding the whole program. When any thread encounters a<br>parallel construct, the thread creates a team of itself and zero or<br>more additional threads and becomes the master of the new team. A set<br>of implicit tasks, one per thread, is generated  |        |      |
|    | Define odd even transportation sort? . (Apr/May 2017)                                                                                                                                                                                                                                                                                               | C409.3 | BTL1 |
|    | <ul> <li>Requires N passes through the array.</li> <li>Each pass through the array analyzes either: <ul> <li>Every pair of odd indexed elements and the preceding element, or</li> <li>Every pair of even indexed elements and the preceding element.</li> </ul> </li> <li>Within each pass, elements that are not in order are swapped.</li> </ul> |        |      |
| 8  | <pre>Sort local keys;<br/>for (phase - 0; phase &lt; comm_sz; phase++) {<br/>partner - Compute_partner(phase, my_rank);<br/>if (I'm not idle) {<br/>Send my keys to partner;<br/>Receive keys from partner;<br/>if (my_rank &lt; partner)<br/>Keep smaller keys;<br/>else<br/>Keep larger keys;<br/>}<br/>}</pre>                                   |        |      |
| 9. | Develop a"hello word" program inthat uses open MP (Apr/May 2017)<br>Hello world program in C using MPI:<br>#include <stdio.h></stdio.h>                                                                                                                                                                                                             | C409.3 | BTL1 |

|     | <pre>#include <mpi.h></mpi.h></pre>                                                                          |         |       |
|-----|--------------------------------------------------------------------------------------------------------------|---------|-------|
|     | <pre>main(int argc, char **argv)</pre>                                                                       |         |       |
|     |                                                                                                              |         |       |
|     | int ierr;                                                                                                    |         |       |
|     | <pre>ierr = MPI_Init(&amp;argc, &amp;argv); printf("Hello world\n");</pre>                                   |         |       |
|     | <pre>ierr = MPI_Finalize(); }</pre>                                                                          |         |       |
|     | Define Message Queue.(Nov/Dec 2017)                                                                          | C409.3  | BTL1  |
| 10  | Message queues provide an asynchronous communications protocol,                                              |         |       |
| 10  | meaning that the sender and receiver of the message do not need to interact                                  |         |       |
|     | with the message queue at the same time. Messages placed onto the queue                                      |         |       |
|     | are stored until the recipient retrieves them.                                                               |         |       |
|     | What is an initial task region?                                                                              | C409.3  | BTL1  |
| 11. | An initial thread executes sequentially, as if enclosed in an implicit                                       |         |       |
|     | task region, called an initial task region, that is defined by the implicit                                  |         |       |
|     | parallel region surrounding the whole program.                                                               | C 100 2 | DTI 1 |
|     | Discuss the Structure of the OpenMP Memory Model                                                             | C409.3  | BTL1  |
| 12  | The OpenMP API provides a relaxed-consistency, shared-memory                                                 |         |       |
|     | model. All OpenMP threads have access to a place to store and to retrieve                                    |         |       |
|     | variables, called the <i>memory</i> . In addition, each thread is allowed to have its                        |         |       |
|     | own <i>temporary view</i> of the memory.                                                                     | C409.3  | BTL1  |
| 13  | What is <i>threadprivate memory</i> ?<br>Each thread also has access to another type of memory that must not | C409.3  | DILI  |
|     | be accessed by other threads, called <i>threadprivate memory</i> .                                           |         |       |
|     | Show the format of directive in OpenMP.                                                                      | C409.3  |       |
|     | #pragma omp directive-name [clause[ [,] clause]] new-line                                                    | C+07.5  |       |
| 14  | Each directive starts with <b>#pragma omp</b> . The remainder of the                                         |         | BTL2  |
| 14  | directive follows the conventions of the C and C++ standards for compiler                                    |         | DILL  |
|     | directives. In particular, white space can be used before and after the #, and                               |         |       |
|     | sometimes white space must be used to separate the words in a directive.                                     |         |       |
|     | What is meant by Stand-alone directives?                                                                     | C409.3  | BTL1  |
|     | Stand-alone directives do not have any associated executable user code.                                      |         |       |
|     | Instead, they represent executable statements that typically do not have                                     |         |       |
| 15  | succinct equivalent statements in the base languages. There are some                                         |         |       |
| 15  | restrictions on the placement of a stand-alone directive within a program. A                                 |         |       |
|     |                                                                                                              |         |       |
|     | stand-alone directive may be placed only at a point where a base language                                    |         |       |
|     | executable statement is allowed                                                                              |         |       |
|     | What is the use of parallel construct?                                                                       | C409.3  | BTL1  |
|     | This fundamental construct starts parallel execution.                                                        |         |       |
| 16  | <b>#pragma omp parallel</b> [clause[ [, ]clause]] new-line                                                   |         |       |
|     |                                                                                                              |         |       |
|     | structured-block                                                                                             |         |       |

|          | <b>#pragma omp parallel</b> [clause[ [, ]clause]] new-line                   |         |      |
|----------|------------------------------------------------------------------------------|---------|------|
|          | structured-block                                                             |         |      |
|          | if(scalar-expression)                                                        |         |      |
|          | num_threads(integer-expression)                                              |         |      |
|          | default(shared   none)                                                       |         |      |
|          | private(list)                                                                |         |      |
|          | firstprivate(list)                                                           |         |      |
|          | shared(list)                                                                 |         |      |
|          | copyin(list)                                                                 |         |      |
|          | <pre>reduction(redution-identifier :list)</pre>                              |         |      |
|          | proc_bind(master   close   spread)                                           |         |      |
|          | What is Worksharing Constructs?                                              | C409.3  |      |
|          | A worksharing construct distributes the execution of the associated          |         |      |
| 17       | region among the members of the team that encounters it. Threads execute     |         | BTL1 |
|          | portions of the region in the context of the implicit tasks each one is      |         |      |
|          | executing. If the team consists of only one thread then the worksharing      |         |      |
|          | region is not executed in parallel.                                          | ~ · · · |      |
|          | List some worksharing constructs.                                            | C409.3  |      |
|          | The OpenMP API defines the following worksharing constructs.                 |         |      |
| 18       | loop construct                                                               |         | BTL4 |
|          | • sections construct                                                         |         |      |
|          | • single construct                                                           |         |      |
|          | • workshare construct                                                        | G 400 Q |      |
|          | List the the Restrictions apply to worksharing constructs.                   | C409.3  |      |
|          | The following restrictions apply to worksharing constructs:                  |         |      |
| 19       | • Each worksharing region must be encountered by all threads in a team or    |         | BTL4 |
|          | by none at all, unless cancellation has been requested for the innermost     |         |      |
|          | enclosing parallel region.                                                   |         |      |
|          | • The sequence of worksharing regions and <b>barrier</b> regions encountered |         |      |
|          | must be the same for every thread in a team.                                 | C409.3  | BTL1 |
|          | Show the syntax of the loop construct.                                       | 0409.3  | DILI |
|          | The syntax of the loop construct is as follows:                              |         |      |
|          | #pragma omp for [clause[[,] clause] ] new-line                               |         |      |
|          | for-loops                                                                    |         |      |
|          | where <i>clause</i> is one of the following:                                 |         |      |
| 20       | <b>private</b> ( <i>list</i> )                                               |         |      |
| 20       | firstprivate( <i>list</i> )                                                  |         |      |
|          | lastprivate( <i>list</i> )                                                   |         |      |
|          | <b>reduction</b> (reduction-identifier: list)                                |         |      |
|          | schedule(kind[, chunk_size])                                                 |         |      |
|          | collapse(n)                                                                  |         |      |
|          | ordered                                                                      |         |      |
|          | nowait                                                                       |         |      |
| 21       |                                                                              | C409.3  | BTL1 |
| <u> </u> | The binding thread set for a loop region is the current team. A loop region  |         |      |
| 21       | What is meant by binding?                                                    | C409.3  | BTL1 |

|    | The <b>sections</b> construct is a non-iterative work sharing construct that                                                             |         |       |
|----|------------------------------------------------------------------------------------------------------------------------------------------|---------|-------|
| 25 | What is The sections construct.                                                                                                          | C409.3  | BTL1  |
| 25 |                                                                                                                                          | C400.2  | DTI 1 |
|    | schedule clause determines the schedule.                                                                                                 |         |       |
|    | <i>run-sched-var</i> ICV determines the schedule. Otherwise, the value of the                                                            |         |       |
|    | clause that specifies the <b>runtime</b> schedule kind then the current value of the                                                     |         |       |
| 24 | <i>sched-var</i> ICV determines the schedule. If the loop directive has a <b>schedule</b>                                                |         |       |
|    | directive does not have a <b>schedule</b> clause then the current value of the <i>def</i> -                                              |         |       |
|    | used to determine how loop iterations are assigned to threads If the loop                                                                |         |       |
|    | any) on the directive, and the <i>run-sched-var</i> and <i>def-sched-var</i> ICVs are                                                    |         |       |
|    | When execution encounters a loop directive, the <b>schedule</b> clause (if                                                               | 0.07.5  | 2121  |
|    | How to Determine the Schedule of a Worksharing Loop?                                                                                     | C409.3  | BTL1  |
|    | • The loop iteration variable may not appear in a <b>threadprivate</b> directive                                                         |         |       |
|    | construct.                                                                                                                               |         |       |
|    | ordered region ever binds to a loop region arising from the loop                                                                         |         |       |
|    | • The <b>ordered</b> clause must be present on the loop construct if any                                                                 |         |       |
|    | • Only one <b>ordered</b> clause can appear on a loop directive.                                                                         |         |       |
|    | <i>chunk_size</i> must not be specified.                                                                                                 |         |       |
|    | • When schedule(runtime) or schedule(auto) is specified,                                                                                 |         |       |
|    | threads in the team.                                                                                                                     |         |       |
|    | • The value of the <i>run-sched-var</i> ICV must be the same for all                                                                     |         |       |
|    | threads in the team.                                                                                                                     |         |       |
|    | • The value of the <i>chunk_size</i> expression must be the same for all                                                                 |         |       |
| 23 | positive value.                                                                                                                          |         | BTL4  |
|    | • <i>chunk_size</i> must be a loop invariant integer expression with a                                                                   |         |       |
|    | • Only one <b>collapse</b> clause can appear on a loop directive.                                                                        |         |       |
|    | • Only one <b>schedule</b> clause can appear on a loop directive.                                                                        |         |       |
|    | team.                                                                                                                                    |         |       |
|    | with the loop construct must be the same for all the threads in the                                                                      |         |       |
|    | • The values of the loop control expressions of the loops associated with the loop construct must be the same for all the threads in the |         |       |
|    | directive between any two loops.                                                                                                         |         |       |
|    | nested; that is, there must be no intervening code nor any OpenMP                                                                        |         |       |
|    | • All loops associated with the loop construct must be perfectly nested; that is, there must be no intervening code nor any OpenMP       |         |       |
|    | Restrictions to the loop construct are as follows:                                                                                       |         |       |
|    | List the Restrictions to the loop construct.                                                                                             | C409.3  |       |
|    | that immediately follows the loop directive.                                                                                             | C409.3  |       |
|    | present, the only loop that is associated with the loop construct is the one                                                             |         |       |
|    | must be a constant positive integer expression. If no <b>collapse</b> clause is                                                          |         |       |
| 22 | associated with the loop construct. The parameter of the <b>collapse</b> clause                                                          |         | BTL1  |
|    | The <b>collapse</b> clause may be used to specify how many loops are                                                                     |         |       |
|    | What is the use of collapse clause.                                                                                                      | C409.3  |       |
|    |                                                                                                                                          | C 400.2 |       |
|    | not eliminated by a <b>nowait</b> clause                                                                                                 |         |       |
|    | the loop iterations and the implied barrier of the loop region if the barrier is                                                         |         |       |
|    | team executing the binding <b>parallel</b> region participate in the execution of                                                        |         |       |
|    |                                                                                                                                          |         |       |
|    | binds to the innermost enclosing <b>parallel</b> region. Only the threads of the                                                         |         |       |

|    | , , , , , , , , , , , , , , , , , , ,                                           |        | [    |
|----|---------------------------------------------------------------------------------|--------|------|
|    | contains a set of structured blocks that are to be distributed among and        |        |      |
|    | executed by the threads in a team. Each structured block is executed once by    |        |      |
|    | one of the threads in the team in the context of its implicit task.             | ~      |      |
|    | Show the the syntax of the sections construct.                                  | C409.3 |      |
|    | The syntax of the <b>sections</b> construct is as follows:                      |        |      |
|    |                                                                                 |        |      |
|    | <b>#pragma omp sections</b> [clause[[,] clause]] new-line                       |        |      |
|    | {[                                                                              |        |      |
|    | <pre>#pragma omp section new-line]</pre>                                        |        |      |
|    | structured-block                                                                |        |      |
| 20 | [#pragma omp section new-line                                                   |        | BTL1 |
| 26 | structured-block ]                                                              |        | BILI |
|    |                                                                                 |        |      |
|    | }                                                                               |        |      |
|    | where <i>clause</i> is one of the following                                     |        |      |
|    | private(list)                                                                   |        |      |
|    | firstprivate(list)                                                              |        |      |
|    | lastprivate(list)                                                               |        |      |
|    | reduction(reduction-identifier:list)                                            |        |      |
|    | nowait                                                                          |        |      |
|    | List the Restrictions to the sections construct.                                | C409.3 |      |
|    | • Orphaned section directives are prohibited. That is, the section              |        |      |
|    | directives must appear within the <b>sections</b> construct and must not be     |        |      |
|    | encountered elsewhere in the <b>sections</b> region.                            |        |      |
|    | • The code enclosed in a <b>sections</b> construct must be a structured         |        |      |
|    | block.                                                                          |        |      |
|    | • Only a single <b>nowait</b> clause can appear on a <b>sections</b> directive. |        |      |
| 27 | firstprivate( <i>list</i> )                                                     |        | BTL4 |
|    | lastprivate( <i>list</i> )                                                      |        |      |
|    | reduction(reduction-identifier:list)                                            |        |      |
|    | • A throw executed inside a sections region must cause execution to             |        |      |
|    | resume within                                                                   |        |      |
|    | the same section of the sections region, and the same thread that               |        |      |
|    | the same section of the sections region, and the same thread that threw the     |        |      |
|    | exception must catch it.                                                        |        |      |
|    | ▲                                                                               | C409.3 | BTL1 |
|    | Show The syntax of the single construct .                                       | C+09.3 | DILI |
|    | <b>!\$omp single</b> [clause[[,] clause]]                                       |        |      |
|    | structured-block                                                                |        |      |
|    |                                                                                 |        |      |
| 28 | <b>!\$omp end single</b> [end_clause[[,] end_clause]]                           |        |      |
|    | where algues is one of the following:                                           |        |      |
|    | where <i>clause</i> is one of the following:                                    |        |      |
|    | private(list)<br>firstprivate(list)                                             |        |      |
|    | firstprivate(list)                                                              |        |      |
|    | and and clause is one of the following:                                         |        |      |
|    | and <i>end_clause</i> is one of the following:                                  | l      |      |

|    | copyprivate(list)                                                                |         |      |
|----|----------------------------------------------------------------------------------|---------|------|
|    | nowait                                                                           |         |      |
| 29 | Show the syntax of the workshare construct.                                      | C409.3  | BTL1 |
|    | !\$omp workshare                                                                 |         |      |
|    | structured-block                                                                 |         |      |
|    | !\$omp end workshare [nowait]                                                    |         |      |
|    | The enclosed structured block must consist of only the following:                |         |      |
|    | • array assignments                                                              |         |      |
|    | • scalar assignments                                                             |         |      |
|    | • FORALL statements                                                              |         |      |
|    | • FORALL constructs                                                              |         |      |
|    | • WHERE statements                                                               |         |      |
|    | • WHERE constructs                                                               |         |      |
|    | atomic constructs                                                                | G 400 Q |      |
|    | List the restrictions apply to the workshare construct:                          | C409.3  |      |
| 30 | • All array assignments, scalar assignments, and masked array                    |         | BTL4 |
|    | assignments must be intrinsic assignments.                                       |         |      |
|    | • The construct must not contain any user defined function calls                 |         |      |
|    | unless the function is <b>ELEMENTAL</b>                                          | C409.3  |      |
|    | What is the use of schedule clause.                                              | C409.5  |      |
|    | If more than one loop is associated with the loop construct, then the            |         |      |
| 31 | iterations of all associated loops are collapsed into one larger iteration space |         | BTL1 |
|    | that is then divided according to the schedule clause. The sequential            |         | DILI |
|    | execution of the iterations in all associated loops determines the order of the  |         |      |
|    | iterations in the collapsed iteration space                                      |         |      |
|    | What is the use of single construct.                                             | C409.3  |      |
|    | The <b>single</b> construct specifies that the associated structured block is    |         |      |
|    | executed by only one of the threads in the team (not necessarily the master      |         |      |
| 32 | thread), in the context of its implicit task. The other threads in the team,     |         | BTL1 |
|    | which do not execute the block, wait at an implicit barrier at the end of the    |         |      |
|    |                                                                                  |         |      |
|    | single construct unless a nowait clause is specified                             |         |      |
|    | List the Restrictions to the single construct are as follows:                    | C409.3  |      |
| 33 |                                                                                  |         | BTL4 |
|    | • The <b>copyprivate</b> clause must not be used with the <b>nowait</b> clause.  |         |      |
|    | • At most one <b>nowait</b> clause can appear on a <b>single</b> construct.      |         |      |
|    | What is workshare construct?                                                     | C409.3  |      |
| 34 | The workshare construct divides the execution of the enclosed                    |         | BTL1 |
| 5- | structured block into separate units of work, and causes the threads of the      |         |      |
|    | team to share the work such that each unit is executed only once by one          |         |      |
|    | thread, in the context of its implicit task.                                     |         |      |
| 35 | Explain Scope of a variable (Apr/May 2018)                                       | C409.3  | BTL1 |
|    |                                                                                  |         |      |

| 36  | Define Race Condition (Apr/May 2018)                                                                                                       | C409.3 | BTL1  |
|-----|--------------------------------------------------------------------------------------------------------------------------------------------|--------|-------|
|     | State the trapezoidal rule in OpenMP (Nov/Dec 2018)                                                                                        | C409.3 |       |
|     | for $(i = 1; i \le n-1; i++)$                                                                                                              |        |       |
| 37  | $\mathbf{x}_{i} = \mathbf{a} + \mathbf{i}^{*} \mathbf{d} \mathbf{x};$                                                                      |        | BTL1  |
|     | approx $+= f(x_i);$                                                                                                                        |        |       |
|     | approx = dx*approx;                                                                                                                        |        |       |
|     | Define Coherence and consistency                                                                                                           | C409.3 |       |
|     | • Coherence refers to the behavior of the memory system when a                                                                             |        | DTI 1 |
| 38  | single memory location is accessed by multiple threads.                                                                                    |        | BTL1  |
|     | • Consistency refers to the ordering of accesses to different memory                                                                       |        |       |
|     | locations, observable from various threads in the system                                                                                   | C409.3 |       |
|     | Define data race                                                                                                                           | C409.3 |       |
| 39  | • A data race is defined to be accesses to a single variable by at least two threads, at least one of which is a write, not separated by a |        | BTL1  |
| 29  | synchronization operation.                                                                                                                 |        | DILI  |
|     | • OpenMP does guarantee certain consistency behavior, however.                                                                             |        |       |
|     | That behavior is based on the OpenMP flush operation.                                                                                      |        |       |
|     | Define OpenMP flush operation                                                                                                              |        |       |
|     | The OpenMP flush operation is applied to a set of variables                                                                                |        |       |
|     | called the flush set. Memory operations for variables in the flush                                                                         |        |       |
| 40  | set that precede the flush in program execution order must be                                                                              |        |       |
|     | firmly lodged in memory and available to all threads before the                                                                            |        |       |
|     | flush completes, and memory operations for variables in the flush                                                                          |        |       |
|     | set, that follow a flush in program order cannot start until the                                                                           |        |       |
|     | flush completes.                                                                                                                           |        |       |
|     | Mention the actions to be taken to move the value from one thread to<br>another thread                                                     | C409.3 |       |
| 4.1 | • The first thread writes the value to the shared variable,                                                                                |        | DTI 1 |
| 41  | <ul> <li>The first thread flushes the variable</li> </ul>                                                                                  |        | BTL1  |
|     | • The second thread flushes the variable and                                                                                               |        |       |
|     | <ul> <li>The second thread reads the variable</li> </ul>                                                                                   |        |       |
|     | List out the two methods available for enabling nested parallel regions                                                                    | C409.3 |       |
| 42  | 1.The <b>omp_set_nested</b> () library routine                                                                                             |        | BTL1  |
|     | 2. Setting of the <b>OMP_NESTED</b> environment variable to TRUE                                                                           |        |       |
| 43  | What is data dependences                                                                                                                   | C409.3 |       |
|     | 1. OpenMP compilers don't check for dependences among iterations in a                                                                      |        | BTL1  |
|     | loop that's being parallelized with a parallel <b>for</b> directive. It's up to us, the programmers, to identify these dependences.        |        |       |
|     | 2. A loop in which the results of one or more iterations depend on other                                                                   |        |       |
|     | 1 - 1 - 100 p m which the results of one of more hermions depend on other                                                                  | I I    |       |

|    | iterations <i>cannot</i> , in general, be correctly parallelized by OpenMP.                                                                                                                                                                                                                              |         |       |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|-------|
| 44 | <ul> <li>Define Atomic Directive</li> <li>The atomic directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.</li> <li>The atomic directive supports no OpenMP clauses.</li> <li>Syntax</li> </ul> | C409.3  | BTL1  |
|    | #pragma omp atomic expression                                                                                                                                                                                                                                                                            | <i></i> |       |
| 45 | <ul> <li>Define Critical Directive</li> <li>Specifies that code is only executed on one thread at a time. OpenMP <i>does</i> provide the option of adding a name to a critical directive:</li> <li>Syntax</li> <li># pragma omp critical(name)</li> </ul>                                                | C409.3  | BTL1  |
|    | List out the two types of locks in OpenMP to destroy lock data                                                                                                                                                                                                                                           | C409.3  |       |
| 46 | <ul> <li>structures</li> <li>Simple locks</li> <li>Nested Locks</li> </ul>                                                                                                                                                                                                                               |         | BTL1  |
|    | Define Barrier Directive and Master Directive                                                                                                                                                                                                                                                            | C409.3  |       |
| 47 | A barrier directive will cause the threads in a team to block until all the threads have reached the directive. <b>Syntax</b>                                                                                                                                                                            |         | BTL1  |
|    | <ul><li>#pragma omp barrier</li><li>Specifies that only the master thread should execute a section of the program.</li><li>Syntax</li></ul>                                                                                                                                                              |         |       |
|    | # pragma omp master                                                                                                                                                                                                                                                                                      |         |       |
|    | Which are the OpenMP clauses supported by Single directive                                                                                                                                                                                                                                               | C409.3  |       |
| 40 | • copyprivate                                                                                                                                                                                                                                                                                            |         | DTL 1 |
| 48 | • firstprivate                                                                                                                                                                                                                                                                                           |         | BTL1  |
|    | • nowait                                                                                                                                                                                                                                                                                                 |         |       |
|    | • private                                                                                                                                                                                                                                                                                                |         |       |
|    | Define Synchronization Clauses                                                                                                                                                                                                                                                                           | C409.3  |       |
|    | • Critical                                                                                                                                                                                                                                                                                               |         |       |
| 49 | Atomic     Ordered                                                                                                                                                                                                                                                                                       |         | BTL1  |
|    | <ul><li>Ordered</li><li>Barrier</li></ul>                                                                                                                                                                                                                                                                |         |       |
|    | Nowait                                                                                                                                                                                                                                                                                                   |         |       |
|    | List out the three types of Scheduling?                                                                                                                                                                                                                                                                  | C409.3  |       |
| 50 | • Static                                                                                                                                                                                                                                                                                                 |         | BTL1  |
|    | • Dynamic                                                                                                                                                                                                                                                                                                |         |       |
|    | • Guided                                                                                                                                                                                                                                                                                                 |         |       |
| 51 | Define Data parallelism                                                                                                                                                                                                                                                                                  | C409.3  | BTL1  |
|    | <b>Data parallelism</b> is a form of parallelization across multiple                                                                                                                                                                                                                                     |         |       |

|    | distributing the data in parallarrays and markets | n parallel computing environments. It focuses on<br>he data across different nodes, which operate on the<br>lel. It can be applied on regular data structures like<br>atrices by working on each element in parallel. |        |      |
|----|---------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | List out the for<br>Type                          | ar discrete steps to parallelization Description                                                                                                                                                                      | C409.3 |      |
| 52 | Decomposition                                     | The program is broken down into tasks, the smallest exploitable<br>unit of concurrence.                                                                                                                               |        | BTL1 |
| 52 | Assignment                                        | Tasks are assigned to processes.                                                                                                                                                                                      |        | DILI |
|    | Orchestration                                     | Data access, communication, and synchronization of processes.                                                                                                                                                         |        |      |
|    | Mapping                                           | Processes are bound to processors.                                                                                                                                                                                    |        |      |
| 53 | Loop-Carried                                      | <b>Dependence</b> Verification in <b>OpenMP</b> . Data <b>dependence</b> ery difficult task, mainly due to the limitations imposed by                                                                                 | C409.3 | BTL1 |
|    | •                                                 | , and by the overhead of dynamic data <b>dependence</b> analysis.                                                                                                                                                     |        |      |

# ` PART B

|        | . <b>PART B</b>                                                                                                                                                                                                                 |        |                  |  |
|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------------------|--|
| Q. No. | Questions                                                                                                                                                                                                                       | СО     | Bloom's<br>Level |  |
| 1.     | <ul> <li>Explain OpenMP Execution Model in detail with example.(Nov/Dec 2017) (Apr/May 2018)</li> <li>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:210-215</li> </ul> | C409.3 | BTL5             |  |
|        | Explain the Memory Model of OpenMP. (Apr/May 2018)                                                                                                                                                                              | C409.3 | BTL5             |  |
| 2.     | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:213-215                                                                                                              |        |                  |  |
|        | Explain the OpenMP Directives . (Apr/may2017)                                                                                                                                                                                   | C409.3 | BTL5             |  |
| 3.     | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:224-231                                                                                                                  |        |                  |  |
|        | Explain the Library functions used in OpenMP.                                                                                                                                                                                   | C409.3 | BTL5             |  |
| 4.     |                                                                                                                                                                                                                                 |        |                  |  |
|        | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:                                                                                                                         |        |                  |  |

|    | Explain in detail how to Handle Loops in OpenMP (Nov/Dec 2017)                                                      | C409.3 | BTL5 |
|----|---------------------------------------------------------------------------------------------------------------------|--------|------|
| 5. | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No: 236-240 |        |      |
|    | Explain OpenMP directives (Apr/may 2017)                                                                            | C409.3 | BTL5 |
| 6. | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:224-231  |        |      |
|    | How data and functional parallelism are handled in shared memory                                                    | C409.3 | BTL5 |
|    | programming with open MP? (Apr/may2017)                                                                             |        |      |
| 7. | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:212-215  |        |      |
|    | Explain in detail about the handling loops in parallel operations                                                   | C409.3 | BTL5 |
|    | (Nov/Dec 2017)                                                                                                      |        |      |
| 8. | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:236-240  |        |      |
|    | Write an example program for shared memory programming with                                                         | C409.3 | BTL5 |
|    | Pthread (Apr/May 2018)                                                                                              |        |      |
| 9  | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:236-240  |        |      |
|    | Explain in detail about the synchronization primitives in parallel                                                  | C409.3 | BTL5 |
|    | program challenges (Apr/May 2017)                                                                                   |        |      |
| 10 | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:236-240  |        |      |
|    |                                                                                                                     |        |      |

|    | Explain the type shared memory model                                    | BTL5 |
|----|-------------------------------------------------------------------------|------|
| 11 | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |      |
|    | Kauffman/Elsevier, 2011                                                 |      |
|    |                                                                         | BTL1 |
|    | Collect the all information about internal control variable             | DILI |
| 12 | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |      |
|    | Kauffman/Elsevier, 2011                                                 |      |
|    | Explain briefly about General Data Parallelism                          | BTL4 |
| 13 | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |      |
| _  | Kauffman/Elsevier, 2011                                                 |      |
|    |                                                                         |      |
|    | (i)Explain the NoWait Clause                                            | BTL4 |
| 14 | (ii)Explain the single pragma                                           |      |
| 14 | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |      |
|    | Kauffman/Elsevier, 2011                                                 |      |
|    | (i)Illustrate the runtime library definitions                           | BTL3 |
|    | (ii)llustrate the execution environment routines                        |      |
| 15 |                                                                         |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |      |
|    | Kauffman/Elsevier, 2011                                                 |      |
|    |                                                                         |      |

### UNIT IV

#### DISTRIBUTED MEMORY PROGRAMMING WITH MPI

MPI program execution – MPI constructs – libraries – MPI send and receive – Point-to-point and Collective communication – MPI derived datatypes – Performance evaluation.

|        |                                                                                                                                                                                                                                                                                                                                                              |        | Bloom       |
|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------------|
| Q. No. | Questions                                                                                                                                                                                                                                                                                                                                                    | СО     | 's<br>Level |
| 1.     | What is MPI?<br>One process calls a <i>send</i> function and the other calls a <i>receive</i> function. The<br>implementation of message-passing that will be using is called MPI<br>(Message-Passing Interface). MPI is not a new programming language. It<br>defines a <i>library</i> of functions that can be called from C, C++, and<br>FORTRAN Programs | C409.4 | BTL1        |
| 2.     | How to execute MPI programs?<br>Many systems use the command called mpicc to compile and run MPI<br>programs:<br>mpicc -g -Wall -o mpi hello mpi hello.c                                                                                                                                                                                                     | C409.4 | BTL1        |
| 3.     | What is the use of Wrapper Script. (Apr/May 2017)         A wrapper script is a script whose main purpose is to run some program. In this case, the program is the C compiler. The wrapper simplifies the running of the compiler by telling it where to find the necessary header files and which libraries to link with the object file.                   | C409.4 | BTL1        |
| 4.     | Show the syntax of MPI_init and MPI_finalize<br>The syntax of MPI_ Init ( ):<br>int MPI_Init(<br>int* argc p /* in/out */,<br>char*** argv p /* in/out */);<br>The syntax of MPI_ Finalize( ):<br>int MPI_ Finalize(void);                                                                                                                                   | C409.4 | BTL1        |
| 5.     | What is Communicator in MPI?<br>In MPI a communicator is a collection of processes that can send<br>messages to each other. One of the purposes of MPI Init is to define a<br>communicator that consists of all of the processes started by the user<br>when she started the program. This communicator is called MPI_<br>COMM_WORLD.                        | C409.4 | BTL1        |

#### PART A

|          | 1                                                                                                      | 1       |      |
|----------|--------------------------------------------------------------------------------------------------------|---------|------|
|          | What is a SPMD program?<br>A single program is written so that different processes carry out different |         |      |
| <i>.</i> | actions, and this is achieved by simply having the processes branch on the                             | C 400 4 |      |
| 6.       | basis of their process rank. This approach to parallel programming is called                           | C409.4  | BTL1 |
|          | single program, multiple data, or SPMD                                                                 |         |      |
|          | Give the syntax of MPI_Get_count                                                                       |         | BTL1 |
| 7.       | The syntax of MPI_Get_count is<br>int MPI_Get_count(                                                   | C409.4  |      |
|          | MPI_Status* status_p /* in */,                                                                         |         |      |
|          | MPI_Datatype type /* in */,<br>int* count p /*out */);                                                 |         |      |
|          | What is Non-overtaking?                                                                                |         | BTL1 |
|          | If process q sends two messages to process r, then the first                                           |         |      |
|          | message sent by q must be available to r before the second message.                                    |         |      |
| 8.       | There is no restriction on the arrival of messages sent from different                                 | C409.4  |      |
|          | processes. ie., if q and t both send messages to r, then even if q sends its                           |         |      |
|          | message before t sends its message, there is no requirement that q's                                   |         |      |
|          | message become available to r before t's message. This is essentially                                  |         |      |
|          | because MPI can't impose performance on a network.                                                     |         |      |
|          | Evaluate the performance evaluation methods in distributed memory programming(Nov/Dec 2017,            |         |      |
|          | 1. Timing performance                                                                                  |         |      |
| 9        | 2.performance Results                                                                                  | C409.4  | BTL5 |
|          | 3.Speedup performance                                                                                  |         |      |
|          | 4.efficiency performance                                                                               |         |      |
|          | 5.Scalability performance                                                                              |         |      |
|          | What is wildcard argument?                                                                             |         | BTL1 |
|          | The wildcard arguments:                                                                                |         |      |
|          | • Only a receiver can use a wildcard argument. Senders must specify a                                  |         |      |
| 10       | process rank and a nonnegative tag. Thus, MPI uses a "push"                                            | C409.4  |      |
|          | communication mechanism rather than a "pull" mechanism.                                                |         |      |
|          | There is no wildcard for communicator arguments; both senders and                                      |         |      |
|          | receivers must always specify communicators                                                            |         |      |
| 11       | Show the area of the trapezoid.                                                                        | C409.4  | BTL1 |
|          | Area of one trapezoid = $h/2[f(x_i) + (f(x_{i+1}))]$                                                   |         |      |
| 12       | Define collective communications in MPI.                                                               | C409.4  | BTL1 |
|          | In MPI parlance, communication functions that involve all the                                          |         |      |
|          | processes in a communicator are called collective communications.                                      |         | BTL1 |
| 13       | What is Local Variables? Give Examples.                                                                | C409.4  | DILI |
| 13       | Local variables are variables whose contents are significant only on                                   | C-107.4 |      |
|          | the process that's using them. Some examples are: local a, local b and                                 |         |      |
|          |                                                                                                        |         |      |

|    | local n.                                                                                                                                      |        |       |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------|--------|-------|
|    |                                                                                                                                               |        | BTL1  |
| 14 | What is Global Variables? Give Examples.                                                                                                      | C409.4 |       |
|    | Variables whose contents are significant to all the processes are                                                                             |        |       |
|    | sometimes called global variables. Some examples are: <b>a</b> , <b>b</b> and <b>n</b> .                                                      |        | DTI 1 |
| 15 | What is point to point communications in MPI?                                                                                                 | C409.4 | BTL1  |
|    | <b>MPI_Send</b> and <b>MPI_Recv</b> are often called point-to-point communications.                                                           |        |       |
|    | List any two points stating that how collective communication differ                                                                          |        |       |
|    | from point to point communication.                                                                                                            |        |       |
| 16 | • All the processes in the communicator must call the same collective                                                                         | C409.4 | BTL4  |
| 10 | function. Example, a program that attempts to match a call to <b>MPI_Reduce</b>                                                               | 0.000  | 2121  |
|    | on one process with a call to MPI_ Recv on another process is erroneous,                                                                      |        |       |
|    | and, in all likelihood, the program will hang or crash.                                                                                       |        |       |
|    | What is a butterfly-structured global sum?                                                                                                    |        | BTL1  |
| 17 | The processes exchange partial results instead of using one-way                                                                               | C409.4 |       |
|    | communications. Such a communication pattern is sometimes called a                                                                            |        |       |
|    | butterfly-structured global sum.                                                                                                              |        | DTL 1 |
| 18 | What is broadcast?                                                                                                                            | C409.4 | BTL1  |
|    | A collective communication in which data belonging to a single process                                                                        |        |       |
|    | is sent to all of the processes in the communicator is called a <b>broadcast</b><br>Show a broadcast function.                                |        | BTL1  |
|    | A broadcast function:                                                                                                                         |        | DILI  |
|    | int MPI_Bcast (                                                                                                                               |        |       |
| 19 | void* data p /* in/out */,                                                                                                                    | C409.4 |       |
| 19 | int count /* in */,                                                                                                                           | 010711 |       |
|    | MPI_Datatype datatype /* in */,                                                                                                               |        |       |
|    | int source proc /* in */,                                                                                                                     |        |       |
|    | MPI_Comm comm /* in */);                                                                                                                      |        |       |
|    | Define block partition of the vector.                                                                                                         |        | BTL1  |
|    | The work consists of adding the individual components of the vectors,                                                                         |        |       |
|    | so we might specify that the tasks are just the additions of corresponding                                                                    |        |       |
|    | components. Then there is no communication between the tasks, and the                                                                         |        |       |
| 20 | problem of parallelizing vector addition boils down to aggregating the                                                                        | C409.4 |       |
|    | tasks and assigning them to the cores. If the number of components is n<br>and we have comm_sz cores or processes, let's assume that n evenly |        |       |
|    | divides comm_sz and define local n D n=comm sz. Then we can simply                                                                            |        |       |
|    | assign blocks of local n consecutive                                                                                                          |        |       |
|    | components to each process. This is often called a <b>block partition</b> of                                                                  |        |       |
|    | the vector.                                                                                                                                   |        |       |
|    | Show MPI Scatter function.                                                                                                                    |        | BTL1  |
|    | The communication MPI provides a function:                                                                                                    |        |       |
| 21 | int MPI Scatter(                                                                                                                              | C409.4 |       |
|    | void* send_buf_p /* in */,                                                                                                                    |        |       |
|    | int send_count /* in */,                                                                                                                      |        |       |
| 1  | MPI_Datatype send_type /* in */,                                                                                                              |        |       |

|    | void*                      | recv_buf_p              | /* out */,                   |         |       |
|----|----------------------------|-------------------------|------------------------------|---------|-------|
|    | int                        | recv_count              | /* in */,                    |         |       |
|    | MPI_Datatype               | recv_type               | /* in */,                    |         |       |
|    | int                        | src_proc                | /* in */,                    |         |       |
|    | MPI_Comm                   | comm                    | /* in */);                   |         |       |
|    | Give MPI Gather functio    |                         | / m /),                      |         | BTL1  |
|    |                            |                         | rried out by MPI Gather,     |         | DILI  |
|    | int MPI_Gather(            |                         | inted out by Wirr Gather,    |         |       |
|    | void*                      | send_buf_p              | /* in */,                    |         |       |
|    | int                        | send_count              | /* in */,                    |         |       |
| 22 | MPI_Datatype               | send_type               | /* in */,                    | C409.4  |       |
|    | void*                      | • 1                     | /* out */,                   |         |       |
|    | int                        | recv_buf_p              | /* in */,                    |         |       |
|    |                            | recv_count              | · ·                          |         |       |
|    | MPI_Datatype               | recv_type               | ·                            |         |       |
|    | int<br>MDL Comm            | dest_proc               | /* in */,                    |         |       |
|    | MPI_Comm                   | comm                    | /* in */);                   |         | DTI 1 |
| 22 | What is Derived data typ   |                         |                              | G 400 4 | BTL1  |
| 23 |                            |                         | bresent any collection of    | C409.4  |       |
|    | -                          |                         | es of the items and their    |         |       |
|    | relative locations in me   | ,                       |                              |         |       |
|    | Show the use MPI_Type_     |                         | 1 • 11// / // /              |         | BTL1  |
|    | We can use MPI Type        |                         |                              |         |       |
|    | consists of individual e   | lements that have diffe | erent basic types:           |         |       |
|    |                            |                         |                              |         |       |
|    | int MPI_Type_crea          |                         | (24                          |         |       |
|    | int                        | count                   | /*                           |         |       |
| 24 | in */,                     | 6 1 1 1 1               | <b>41 F1 /</b> \\$           | C409.4  |       |
|    | int                        | array_of_blockleng      | ths[] /*                     |         |       |
|    | in */,                     | C 1' 1                  |                              |         |       |
|    | MPI_Aint                   | array_of_displacen      | nents[] /*                   |         |       |
|    | in */,                     |                         | / Ja                         |         |       |
|    | MPI_Datatype               | array_of_types[]        | /*                           |         |       |
|    | in */,                     |                         | (24                          |         |       |
|    |                            | new_type_p              | /*                           |         |       |
|    | out */);                   | 1 1 11                  |                              |         |       |
|    | Show the ratio of the seri |                         |                              |         | BTL1  |
|    |                            |                         | between the serial and the   |         |       |
| 25 | 1                          | 1 1 0                   | ratio of the serial run-time | C409.4  |       |
| 25 | to the parallel run-time   |                         |                              | C409.4  |       |
|    |                            | T <sub>serial</sub> (n) |                              |         |       |
|    | S (n                       | <i>h</i> , <i>p</i> ) = |                              |         |       |
|    | $T_{parallel}(n, p)$       |                         |                              |         |       |
|    | Show "per process" speed   | łup?                    |                              |         | BTL1  |
| 26 | The "Per Process speed     |                         |                              | C409.4  |       |
|    |                            | ···r                    |                              |         |       |
| l  |                            |                         |                              |         |       |

|    | $S(n, p)$ $T_{serial}(n)$                                                                                                                                                                                                                                                                                                                                                                                       |        |      |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | E(n, p) = =                                                                                                                                                                                                                                                                                                                                                                                                     |        |      |
| 27 | p       p * T parallel (n, p)         what is a wrapper script? . (apr/may2017)         A wrapper script is a script whose main purpose is to run         wrapper the running of the compiler by telling it where to find the necess         which libraries to link with the object file .                                                                                                                     | C409.4 | BTL1 |
| 28 | <ul> <li>How to design a parallel program?</li> <li>A parallel program can be designed using four basic steps:</li> <li>1. Partition the problem solution into tasks.</li> <li>2. Identify the communication channels between the tasks.</li> <li>3. Aggregate the tasks into composite tasks.</li> <li>4. Map the composite tasks to cores.</li> </ul>                                                         | C409.4 | BTL1 |
| 29 | What is linear speedup?<br>The ideal value for S (n,p) is p. If S (n,p) = p, then our parallel program<br>with comm_sz = p processes is running p times faster than the serial<br>program. This speedup, sometimes called linear speedup.                                                                                                                                                                       | C409.4 | BTL1 |
| 30 | What is block-cyclic partition?<br>Instead of using a cyclic distribution of individual components, use a cyclic distribution of blocks of components, so a block-cyclic distribution isn't fully specified until we decide how large the blocks are                                                                                                                                                            | C409.4 | BTL1 |
| 31 | What are the possibilities for choosing a destination when sending<br>requests for work with MPIMPI is designed to allow users to create programs that can run efficiently on most<br>parallel architectures. The design process included vendors (such as IBM, Intel,<br>TMC, Cray, Convex, etc.), parallel library authors (involved in the development of<br>PVM, Linda, etc.), and applications specialists | C409.4 | BTL1 |
| 32 | List the restrictions work sharing constructs         The sequence of <i>work-sharing constructs</i> and barrier directives encountered must be the same for every thread in a team                                                                                                                                                                                                                             | C409.4 | BTL1 |
| 33 | Write the evaluation methods is distributed memory programming                                                                                                                                                                                                                                                                                                                                                  | C409.4 | BTL1 |
| 34 | Give the commands for MPI                                                                                                                                                                                                                                                                                                                                                                                       | C409.4 | BTL1 |
| 35 | Define and broadcast and butterfly MPI                                                                                                                                                                                                                                                                                                                                                                          | C409.4 | BTL1 |
| 36 | What is MPI W_Time         MPI provides a function, MPI_Wtime, that returns the number of seconds         that have elapsed since some time in the past:         Double MPI_Wtime(void);                                                                                                                                                                                                                        | C409.4 | BTL1 |
| 37 | Define MPI_Barrier         The MPI collective communication function MPI_Barrier insures that no process will return from calling it until every process in the communicator has started calling it.                                                                                                                                                                                                            | C409.4 | BTL1 |

|    | Define Speed-Up and Efficiency                                                           |         | BTL1  |
|----|------------------------------------------------------------------------------------------|---------|-------|
| 20 | The most widely used measure of the relation between the serial and the                  | <i></i> |       |
| 38 | parallel run-times is the <b>speedup</b> . It's just the ratio of the serial run-time to | C409.4  |       |
|    | the parallel run-time:                                                                   |         |       |
|    | $S(n,p)=T_{Serial(n)}/T_{Serial(n,p)}$                                                   |         |       |
|    | How the T <sub>overhead</sub> is represented                                             |         | BTL1  |
|    | The parallel run-time is denoted by <i>T</i> parallel. Since it depends on both the      |         |       |
| 39 | input size, $n$ , and the number of processes, commsz= $p$ , we'll frequently            | C409.4  |       |
| 39 | denote it as $T$ parallel( $n,p$ ). The parallel program will divide the work of the     | C409.4  |       |
|    | serial program among the processes, and add in some overhead time, which                 |         |       |
|    | we denoted Toverhead:                                                                    |         |       |
|    | T <sub>parallel</sub> (n,p)=T <sub>serial</sub> (n/p)+T <sub>overhead</sub>              |         |       |
|    | Define Speed up                                                                          |         | BTL1  |
| 40 | The most widely used measure of the relation between the serial and the                  | C409.4  |       |
| 40 | parallel run-times is the <b>speedup</b> . It's just the ratio of the serial run-time to | C409.4  |       |
|    | the parallel run-time:                                                                   |         |       |
|    | $S(n,p)=T_{serial}(n)/T_{parallel(n,p)}$                                                 |         |       |
|    | Define Efficiency                                                                        |         | BTL1  |
| 41 | This speedup, sometimes called linear speedup. Another widely used                       | C409.4  |       |
|    | measure of parallel performance is parallel efficiency.                                  |         |       |
|    | $E(n,p) = S(n,p)/p = T_{serial}(n)/P^* T_{parallel(n,p)}$                                |         |       |
|    | What is MPI derived data types                                                           |         | BTL1  |
| 42 | In MPI, a <b>derived datatype</b> can be used to represent any collection of data        | C409.4  |       |
|    | items in memory by storing both the types of the items and their relative                |         |       |
|    | locations in memory.                                                                     |         |       |
|    | Define Gather                                                                            |         | BTL1  |
|    | Gathers distinct messages from each task in the group to a single                        |         |       |
| 43 | destination task. This routine is the reverse operation of MPI_Scatter. The              | C409.4  |       |
| 10 | data stored in the memory referred to by send_buf_p on process 0 is stored               | 0707.4  |       |
|    | in the first block in recv_buf p, the data stored in the memory referred to by           |         |       |
|    | send buf_p on process 1 is stored in the second block referred to by                     |         |       |
|    | recv_buf_p, and so on.                                                                   |         | DTI 1 |
| 44 | Define Scatter                                                                           | C409.4  | BTL1  |
|    | Distributes distinct messages from a single source task to each task in the              |         |       |
|    | group.                                                                                   |         |       |

|    | scatter                                                                                                                                                                                                                                                                                                                                                                                                                                           |        |      |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
| 45 | <ul> <li>List out the types of collective operations</li> <li>Synchronization - processes wait until all members of the group have reached the synchronization point.</li> <li>Data Movement - broadcast, scatter/gather, all to all</li> <li>Collective Computation (reductions) - one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.</li> </ul>               | C409.4 | BTL1 |
| 46 | Define MPI_Allreduce<br>Consider a situation in which <i>all</i> of the processes need the result of a global<br>sum in order to complete some larger computation.<br>For example, if we use a tree to compute a global sum, we might —reverse!<br>the branches to distribute the global sum.                                                                                                                                                     | C409.4 | BTL1 |
| 47 | List out the two possibilities when the message are assembled     the sending process can buffer the message or     it can block                                                                                                                                                                                                                                                                                                                  | C409.4 | BTL1 |
| 48 | What are the types of MPI type         The MPI type MPI_Status is a struct with at least the three members MPI_         SOURCE, MPI_TAG, and MPI_ERROR.         Suppose our program contains the definition MPI_Status status; Then, after         a call to MPI Recv in which & status is passed as the last argument, we can         determine the sender and tag by examining the two members         status.MPI SOURCE         status.MPI TAG | C409.4 | BTL1 |
| 49 | <ul> <li>What is status_p argument</li> <li>The Amount of Data In The Message</li> </ul>                                                                                                                                                                                                                                                                                                                                                          | C409.4 | BTL1 |

|    | • The Sender of The M                                                   | lessage, Or                                                                       |                                   |        |      |  |
|----|-------------------------------------------------------------------------|-----------------------------------------------------------------------------------|-----------------------------------|--------|------|--|
|    | • The Tag of The Message.                                               |                                                                                   |                                   |        |      |  |
|    | Mention the syntax for MI                                               | U U                                                                               |                                   |        | BTL1 |  |
|    | int MPI_Send(<br>void*                                                  | msg_buf_p                                                                         | /* in */.                         |        |      |  |
| 50 |                                                                         | msg_size                                                                          |                                   | C409.4 |      |  |
|    | MPI_Datatype                                                            | msg_type                                                                          | /* in */,                         |        |      |  |
|    | int                                                                     | dest                                                                              | /* in */,                         |        |      |  |
|    | int                                                                     |                                                                                   | /* in */,                         |        |      |  |
|    | MPI_Comm                                                                | communicator                                                                      | /* in */);                        |        |      |  |
|    | Write a note on Distributed memory machines (Nov/Dec 2018)              |                                                                                   |                                   |        | BTL1 |  |
|    | Programming on a distributed memory machine is a matter of organizing a |                                                                                   |                                   |        |      |  |
|    | program as a set of indepen                                             |                                                                                   |                                   |        |      |  |
| 51 | messages. In addition, program                                          |                                                                                   |                                   |        |      |  |
|    |                                                                         | ntroduces the concept of locality in parallel algorithm design. An algorithm that |                                   |        |      |  |
|    |                                                                         |                                                                                   | s and then runs with minimal      |        |      |  |
|    |                                                                         |                                                                                   | t than an algorithm that requires |        |      |  |
|    | random access to global struct                                          |                                                                                   |                                   |        |      |  |
| 52 | How to compile an MPI P                                                 |                                                                                   |                                   | C409.4 | BTL1 |  |
| 52 | Many systems use a comma                                                | _                                                                                 | r compilation mp icc is a         | C-07.4 |      |  |
|    | script that's a <b>wrapper</b> for t                                    | he C compiler.                                                                    |                                   |        |      |  |

## PART B

•

| Q. No. | Questions                                                                                                                                                                                                                                                              | СО     | Bloom<br>'s<br>Level |
|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|----------------------|
| 1.     | <ul> <li>What is MPI? Write a program"hello,world" that makes some use of MPI. how to compile and execute MPI programs(Nov/Dec 2017).</li> <li>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:86-90</li> </ul> | C409.4 | BTL1                 |
| 2.     | <b>Explain about Trapezoidal rule in MPI in detail</b><br>1.Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:94-96                                                                                                  | C409.4 | BTL1                 |
| 3.     | <b>Give in detail about Collective Communication.</b><br>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:101-115                                                                                                | C409.4 | BTL1                 |

|    | Explain the following MPI functions:                                                                                                                         |         | BTL1  |
|----|--------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|-------|
|    | <ul><li>MPI_Reduce</li><li>MPI_Allreduce</li></ul>                                                                                                           |         |       |
|    | <ul> <li>MPI_Scatter</li> </ul>                                                                                                                              |         |       |
| 4. | <ul> <li>MPI_Gather</li> </ul>                                                                                                                               | C409.4  |       |
|    | • MPI_Allgather                                                                                                                                              |         |       |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-                                                                                      |         |       |
|    | Kauffman/Elsevier, 2011 Page No:88-91                                                                                                                        |         |       |
|    | •                                                                                                                                                            |         |       |
|    | Compare and contrast Collective communication Vs Point to point                                                                                              |         |       |
| 5. | <b>communication.(16)</b> (Nov/Dec 2017).<br>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-                                         | C409.4  | BTL2  |
|    | Kauffman/Elsevier, 2011 Page No:105-106                                                                                                                      |         |       |
|    |                                                                                                                                                              |         |       |
|    | Disscus about MPI Derived data types with example programs. (10)                                                                                             |         |       |
| c  | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-                                                                                      | C409.4  | BTL6  |
| 6. | Kauffman/Elsevier, 2011 Page No:116-118                                                                                                                      | C409.4  | DILO  |
|    |                                                                                                                                                              |         |       |
|    |                                                                                                                                                              |         |       |
|    | i.Explain about Performance Evaluation of MPI Programs in                                                                                                    |         | BTL5  |
| -  | detail.(Apr/May 2017)                                                                                                                                        | C 400 4 |       |
| 7. | <ul><li>ii.What are the performance issues in multicore processor?</li><li>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-</li></ul> | C409.4  |       |
|    | Kauffman/Elsevier, 2011 Page No:119-126                                                                                                                      |         |       |
|    |                                                                                                                                                              |         |       |
|    | i.Explain tree structured communication                                                                                                                      |         | BTL5  |
|    | ii.What are the differences between point to point and collective                                                                                            |         |       |
| 8. | communication? (Apr/May 2017)                                                                                                                                | C409.4  |       |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:102-103                                           |         |       |
|    | Kauffilall/Elsevier, 2011 Lage N0.102-105                                                                                                                    |         |       |
|    | (i)Explain Loop Handling in detail                                                                                                                           |         | BTL5  |
| 0  | (ii)Describe about MPI programs execution with example (Apr/May                                                                                              | C409.4  |       |
| 9  | 2018)                                                                                                                                                        | C409.4  |       |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-                                                                                      |         |       |
|    | Kauffman/Elsevier, 2011                                                                                                                                      |         | DTI 5 |
| 10 | Explain the virtual memory in detail (Apr/May 2018)                                                                                                          | C409.4  | BTL5  |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011                                                           |         |       |
|    | (i)Describe the Attribute Caching                                                                                                                            |         | BTL2  |
| 11 | (ii)Discuss about the communicators                                                                                                                          | C409.4  |       |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-                                                                                      |         |       |
|    | Kauffman/Elsevier, 2011                                                                                                                                      |         |       |

|    | (i)Describe the Distributed array datatype constructor                  |        | BTL5 |
|----|-------------------------------------------------------------------------|--------|------|
| 12 | (ii)Explain the Cartesian constructor                                   | C409.4 |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |
|    | (i) Generalize the group of process                                     |        | BTL6 |
| 13 | (ii) Explain the virtual topology                                       | C409.4 |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |
|    | (i)Describe the MPI program execution                                   |        | BTL1 |
| 14 | (ii)Describe about MPI Init and Finalize                                | C409.4 |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |
|    | (i)Describe about the Datatype constructor                              |        | BTL5 |
| 15 | (ii)Discuss about the subarray datatype constructor                     | C409.4 |      |
|    | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|    | Kauffman/Elsevier, 2011                                                 |        |      |

## UNIT V

## PARALLEL PROGRAM DEVELOPMENT

Case studies - n-Body solvers – Tree Search – OpenMP and MPI implementations and comparison.

# PART A

| Q. No. | Questions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | CO     | Bloom'<br>s Level |
|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------------------|
| 1.     | What is an n-body problem?<br>In an n-body problem, it needs to find the positions and velocities of a collection of interacting particles over a period of time. An n-body solver is a program that finds the solution to an n-body problem by simulating the behavior of the particles. The input to the problem is the mass, position, and velocity of each particle at the start of the simulation, and the output is the position and velocity of each particle at a sequence of user-specified times, or simply the position and velocity of each particle at the end of a user-specified time period. | C409.5 | BTL1              |

|    | Show the pseudocode of a serial n-body solver.                                                                                                                                                                              | C409.5 | BTL1 |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | 1 Get input data;                                                                                                                                                                                                           |        |      |
|    | 2 for each timestep {                                                                                                                                                                                                       |        |      |
|    | 3 if (timestep output) Print positions and velocities of particles;                                                                                                                                                         |        |      |
| 2. | 4 for each particle q                                                                                                                                                                                                       |        |      |
|    | 5 Compute total force on q;                                                                                                                                                                                                 |        |      |
|    | 6 for each particle q                                                                                                                                                                                                       |        |      |
|    | 7 Compute position and velocity of q;                                                                                                                                                                                       |        |      |
|    | 8 }                                                                                                                                                                                                                         |        |      |
|    | 9 Print positions and velocities of particles;                                                                                                                                                                              |        |      |
| 2  | List the two algorithms in the n-body solver.                                                                                                                                                                               | C409.5 | BTL1 |
| 3. | The n-body solver with the original force calculation, the <b>basic</b> algorithm,                                                                                                                                          |        |      |
|    | and the solver with the number of calculations reduced the <b>reduced</b> algorithm                                                                                                                                         |        |      |
|    | List the two algorithms in the n-body solver.                                                                                                                                                                               | C409.5 | BTL1 |
| 4. | The n-body solver with the original force calculation, the basic                                                                                                                                                            |        |      |
|    | algorithm, and the solver with the number of calculations reduced the reduced                                                                                                                                               |        |      |
|    | algorithm                                                                                                                                                                                                                   |        |      |
| 5  | Differentiate between two algorithms in n-body solver                                                                                                                                                                       |        |      |
| 5. | The <i>n</i> -body solver with the original force calculation, the <i>basic</i> algorithm, and the                                                                                                                          |        |      |
|    | solver with the number of calculations reduced, the <i>reduced</i> algorithm.                                                                                                                                               |        |      |
| 6. | Define Graph.                                                                                                                                                                                                               | C409.5 | BTL1 |
| 0. | A graph is a collection of vertices and edges or line segments joining pairs of                                                                                                                                             |        |      |
|    | vertices.                                                                                                                                                                                                                   |        |      |
|    | List the use of Recursive depth-first search algorithm.                                                                                                                                                                     | C409.5 | BTL1 |
|    | The algorithm makes use of several global variables:                                                                                                                                                                        |        |      |
| 7. | • <b>n</b> : the total number of cities in the problem                                                                                                                                                                      |        |      |
| /. | • <b>digraph:</b> a data structure representing the input digraph                                                                                                                                                           |        |      |
|    | • hometown: a data structure representing vertex or city 0, the salesperson's                                                                                                                                               |        |      |
|    | hometown                                                                                                                                                                                                                    |        |      |
|    | • best tour: a data structure representing the best tour so far                                                                                                                                                             |        |      |
|    |                                                                                                                                                                                                                             |        |      |
|    | What are the disadvantages of recursive depth first stage                                                                                                                                                                   |        |      |
| 8  | It also had the disadvantage that at any given instant of time or la                                                                                                                                                        |        |      |
|    | It also has the disadvantage that at any given instant of time only<br>This could be a problem when we try to parallelize tree search by                                                                                    |        |      |
|    | This could be a problem when we try to parallelize tree search by<br>threads or processes                                                                                                                                   |        |      |
|    | List the similarities parallelizing the solvers using pthreads.                                                                                                                                                             | C409.5 | BTL1 |
|    | The more important similarities are:                                                                                                                                                                                        | 0707.5 |      |
|    | <ul> <li>By default local variables in Pthreads are private, so all shared variables are</li> </ul>                                                                                                                         |        |      |
|    | global in the Pthreads version.                                                                                                                                                                                             |        |      |
| 8. | <ul> <li>The principal data structures in the Pthreads version are identical to those in</li> </ul>                                                                                                                         |        |      |
| 0. | the OpenMP version: vectors are two-dimensional arrays of doubles, and the                                                                                                                                                  |        |      |
|    | mass, position, and velocity of a single particle are stored in a struct. The                                                                                                                                               |        |      |
|    |                                                                                                                                                                                                                             |        | 1    |
| 1  | forces are stored in an array of vectors.                                                                                                                                                                                   |        |      |
|    | <ul><li>forces are stored in an array of vectors.</li><li>Startup for Pthreads is basically the same as the startup for OpenMP: the</li></ul>                                                                               |        |      |
|    | <ul> <li>forces are stored in an array of vectors.</li> <li>Startup for Pthreads is basically the same as the startup for OpenMP: the main thread gets the command-line arguments, and allocates and initializes</li> </ul> |        |      |

|    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | 1      |          |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|----------|
|    | <ul> <li>the principal data structures.</li> <li>The main difference between the Pthreads and the OpenMP implementations is in the details of parallelizing the inner loops. Since Pthreads has nothing analogous to a parallel for directive, we must explicitly determine which values of the loop variables correspond to each thread's calculations. To facilitate this, a function Loop schedule, which determines</li> <li>the initial value of the loop variable,</li> <li>the final value of the loop variable, and</li> <li>the increment for the loop variable.</li> <li>The input to the function is <ul> <li>the calling thread's rank,</li> <li>the total number of iterations, and an argument indicating whether the partitioning should be block or cyclic</li> </ul> </li> </ul> |        |          |
|    | Define communication structure                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | C409.5 |          |
| 9. | A communication structure is called a <b>ring pass</b> . In a ring pass, imagine the processes as being interconnected in a ring. Process 0 communicates directly with processes 1 and comm sz-1, process 1 communicates with processes 0 and 2, and so on. The communication in a ring pass takes place in phases, and during each phase each process sends data to its "lower-ranked" neighbor, and receives data from its "higher-ranked" neighbor. Thus, 0 will send to commsz -1 and receive from 1. 1 will send to 0 and receive from 2, and so on. In general, process q will send to process (q-1+commsz)%comm_sz and receive from process (q+1)%comm_sz.                                                                                                                                 |        | BTL1     |
|    | 0 $(1)$ $(3)$ $(2)$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |        |          |
|    | List the properties of function First_index.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | C409.5 | BTL1     |
| ٥  | The function First index should determine a global index glb part2 with the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |        |          |
| 9  | following properties:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |        |          |
|    | 1. The particle glb_part2 is assigned to the process with rank owner.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |        |          |
|    | 2. glb_part1 < glb_part2 < glb_part1 + comm_sz.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |        |          |
|    | What is a word about I/O.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | C409.5 | BTL1     |
| 10 | The basic I/O was designed for use by single-process, single-threaded programs, and when multiple processes or multiple threads attempt to access the I/O buffers; the system makes no attempt to schedule their access.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |        |          |
|    | What is race condition? .(Nov/Dec 2017)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | C409.5 | BTL1     |
| 11 | A race condition is an undesirable situation that occurs when a device or system                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | C+09.3 |          |
| 11 | attempts to perform two or more operations at the same time, but because of the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |        |          |
|    | nature of the device or system, the operations must be done in the proper                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |        |          |
|    | I nature of the device of system, the operations must be done in the proper                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |        | <u> </u> |

|    | sequence to be done correctly.                                                   |        |      |
|----|----------------------------------------------------------------------------------|--------|------|
|    | Show the Pseudocode for the MPI implementation of the reduced n-body             | C409.5 | BTL1 |
|    | solver.                                                                          |        |      |
|    | 1 source = $(my_rank + 1)$ % comm_sz;                                            |        |      |
|    | 2 $dest = (my_rank - 1 + comm_sz) \% comm_sz;$                                   |        |      |
|    | 3 Copy loc_pos into tmp_pos;                                                     |        |      |
|    | 4 $loc_forces = tmp_forces = 0;$                                                 |        |      |
|    | 5                                                                                |        |      |
|    | 6 Compute forces due to interactions among local particles;                      |        |      |
| 12 | 7 for (phase = 1; phase < commsz; phase++) {                                     |        |      |
|    | 8 Send current tmp_pos and tmp_forces to dest;                                   |        |      |
|    | 9 Receive new tmp_pos and tmp_forces from source;                                |        |      |
|    | 10 /* Owner of the positions and forces we're receiving */                       |        |      |
|    | 11 $owner = (my_rank + phase) \% comm_sz;$                                       |        |      |
|    | 12 Compute forces due to interactions among my particles                         |        |      |
|    | 13 and owner's particles;                                                        |        |      |
|    | 14 }                                                                             |        |      |
|    | 15 Send current tmp_pos and tmp_forces to dest;                                  |        |      |
|    | 16Receive new tmp_pos and tmp_forces from source;                                |        |      |
|    | What is NP-complete problem? .(Apr/May 2017)                                     | C409.5 | BTL1 |
|    | In computational complexity theory, a decision problem is NP-complete            |        |      |
| 13 | when it is both in NP and NP-hard. The set of NP-complete problems is            |        |      |
|    | often denoted by NP-C or NPC. The abbreviation NP refers to                      |        |      |
|    | "nondeterministic polynomial time".                                              |        |      |
|    | What is Depth-first search?                                                      | C409.5 | BTL1 |
|    | Depth-first search (DFS) is an algorithm .for traversing or searching tree or    |        |      |
|    | graph data structures. One starts at the root (selecting some arbitrary node as  |        |      |
| 14 | the root in the case of a graph) and explores as far as possible along each      |        |      |
|    | branch before backtracking.                                                      |        |      |
|    |                                                                                  |        |      |
|    | What is Recursive depth-first search?                                            | C409.5 | BTL1 |
|    | Depth-first search (DFS) is an algorithm that traverses a graph in search of     | C+07.5 | DILI |
|    | one or more goal nodes. The defining characteristic of this search is that,      |        |      |
|    | whenever DFS visits a maze cell c, it recursively searches the sub-maze          |        |      |
|    | whose origin is c. This recursive behaviour can be simulated by an iterative     |        |      |
| 15 | algorithm using a stack. A cell can have three states:                           |        |      |
|    | <b>Unvisited.</b> The cell has not yet been visited by DFS.                      |        |      |
|    | • Visit In Progress. The cell has been discovered, but not yet finished. Ie, the |        |      |
|    | recursive search which begins at this node has not yet terminated.               |        |      |
|    | • Visited. The cell has been discovered, and the submazes which start at this    |        |      |
|    | node have been completely visited also.                                          |        |      |
| 16 | What problem occurs when test lock condition and update lock condition           | C409.5 | BTL1 |
| 10 | combined?                                                                        |        |      |
|    | The combination of "test lock condition" and "update lock condition" can         |        |      |

|    |                                                                                                                                                                                                                                                                                                                                                                                               | 1      |      |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | cause a problem: the lock condition (e.g. the cost of the best tour) can<br>change between the time of the first test and the time that the lock is<br>acquired. Thus, the threads also need to check the lock condition after they<br>acquire the lock.                                                                                                                                      |        |      |
| 17 | Show the pseudocode for updating the best tour.         The pseudocode for updating the best tour should look something like this:         if (new tour cost < best tour cost) {                                                                                                                                                                                                              | C409.5 | BTL1 |
| 18 | Show the syntax for mutex checking.<br>Pthreads has a <i>nonblocking</i> version of <i>pthreads_mutex_lock</i> called<br><i>pthread_mutex_trylock</i> . This function checks to see if the mutex is<br>available. If it is, it acquires the mutex and returns the value 0. If the mutex<br>isn't available, instead of waiting for it to become available, it will return a<br>nonzero value. | C409.5 | BTL1 |
| 19 | Define MPI.           The message passing interface (MPI) is a standardized means of exchanging messages between multiple computers running a parallel program across distributed memory                                                                                                                                                                                                      | C409.5 | BTL1 |
| 20 | Show a pseudocode for a recursive solution to TSP using depth firs<br>search.(Apr/May2017) <pre></pre>                                                                                                                                                                                                                                                                                        | C409.5 | BTL1 |
|    |                                                                                                                                                                                                                                                                                                                                                                                               |        |      |

|                | with one or more remote processors.                                                    |        | 1        |
|----------------|----------------------------------------------------------------------------------------|--------|----------|
|                | Show the syntax of MPI_Pack and MPI_Unpack.                                            | C409.5 | +        |
|                | int MPI_Pack(                                                                          |        |          |
|                | void* data to be packed /* in */                                                       |        |          |
|                | int to be packed count / _ in _/,                                                      |        |          |
|                | MPI_Datatype datatype /_ in _/,                                                        |        |          |
|                | void_ contig buf /_ out _/,                                                            |        |          |
|                | int contig buf size /_ in _/,                                                          |        |          |
|                | int_position p /_ in/out _/,                                                           |        |          |
| 22             | MPI Comm comm /_ in _/);                                                               |        | BTL1     |
| 22             |                                                                                        |        | <b>D</b> |
|                | int MPI Unpack(                                                                        |        |          |
|                | void_ contig buf /_ in _/,                                                             |        |          |
|                | int contig buf size /_ in _/,                                                          |        |          |
|                | int_position p /_ in/out _/,                                                           |        |          |
|                | void_ unpacked data /_ out _/,                                                         |        |          |
|                | int unpack count /_ in _/,                                                             |        |          |
| ı              | MPI Datatype datatype /_ in _/,                                                        |        |          |
| I              | MPI Comm comm /_ in _/);                                                               |        |          |
|                | List the use of MPI_IN_PLACE argument.                                                 | C409.5 | BTL1     |
| 23             | The use of argument <i>MPI_IN_PLACE</i> is that the input and output buffers           | 0702.2 | D12.     |
| 23             | are the same. This can save on memory and the implementation may be able               |        |          |
| 1              | to avoid copying from the input buffer to the output buffer                            |        |          |
|                | What the use of functions MPI Scatter and MPI Gather.                                  | C409.5 | BTL1     |
| 24             | The functions <i>MPI_Scatter</i> and <i>MPI_Gather</i> can be use to split an array of | 0.02.2 |          |
| 2 <del>4</del> | data among processes and collect distributed data into a single array,                 |        |          |
| i              | respectively                                                                           |        |          |
|                | What are the use of functions MPI_Scattery and MPI_Gathery.                            | C409.5 | BTL1     |
| ı              | When the amount of data going to or coming from each process is the                    |        |          |
| 25             | same for each process. If we need to assign different amounts of data to each          |        |          |
| ı              | process, or to collect different amounts of data from each process we can use          |        |          |
| 1              | MPI_Scatterv and MPI_Gatherv, respectively.                                            |        |          |
| . <u></u> i    | Show the syntax of MPI_Scatterv.                                                       | C409.5 | BTL1     |
| ı              |                                                                                        |        | _        |
| ı              | int MPI Scatterv(                                                                      |        |          |
| ı              | void* sendbuf /* in */,                                                                |        |          |
| ı              | int* sendcounts /* in */,                                                              |        |          |
| ·              | int* displacements /* in */,                                                           |        |          |
| 26             | MPI_Datatype sendtype /* in */,                                                        |        |          |
| ı              | Void* recvbuf /* out */,                                                               |        |          |
| ı              | int recvcount /* in */,                                                                |        |          |
| ı              | MPI_Datatype recvtype /* in */,                                                        |        |          |
| ı              | int root /* in */,                                                                     |        |          |
| ı              | MPI_Comm comm /* in */);                                                               |        |          |
| ı              |                                                                                        |        |          |
| 27             | Show the syntax of MPI_ Gatherv.                                                       | C409.5 | BTL1     |
| ı <u> </u>     |                                                                                        |        |          |
|                |                                                                                        |        |          |

|    |                                                              |                     |                |             |                                              | ,       | r    |
|----|--------------------------------------------------------------|---------------------|----------------|-------------|----------------------------------------------|---------|------|
|    | int MPI_Gatherv(                                             | JLf                 | /*             | */          |                                              |         |      |
|    |                                                              | dbuf                | /* in<br>/* in | */,<br>*/   |                                              |         |      |
|    |                                                              | dcount              | /* in          | */,         | */                                           |         |      |
|    | MPI_Datatype                                                 | sendtype            | /              | /* in<br>*/ | */,                                          |         |      |
| 1  |                                                              | vbuf                | /* out         | ,           | l                                            |         |      |
|    |                                                              | vcounts             | /* in          | */,         |                                              |         |      |
|    | -                                                            | olacements          | /* in          | */,         | ste /                                        |         |      |
|    | MPI_Datatype                                                 | recvtype            | /. <b>.</b>    | /* in       | */,                                          |         |      |
|    | int roo                                                      |                     | /* in          | */,         |                                              |         |      |
|    | MPI_Comm comm                                                | /* in               | */);           |             |                                              | C 400 T |      |
| 28 | What are the three modes provided by                         |                     |                | , <u>-</u>  | <b>,</b> , , , , , , , , , , , , , , , , , , | C409.5  | BTL1 |
|    | MPI provides three other modes for                           | or sending: synchi  | ronous,        | standa      | ard, and                                     |         |      |
|    | ready.                                                       | 11/777              |                |             |                                              | 0.125   | D    |
|    | List the use of functions MPI_Pack and                       |                     |                |             | •                                            | C409.5  | BTL1 |
| 29 | MPI provides the function <b>MPI_</b>                        | •                   |                |             | •                                            |         |      |
| 22 | contiguous buffer before sending                             |                     |                |             |                                              |         |      |
|    | be used to take data that's been r                           |                     | gle cont       | iguous      | buffer and                                   |         | 1    |
| ļ  | unpack it into a local data structur                         | re                  |                |             |                                              |         |      |
|    | List the use of ready mode.                                  |                     | -              | _           |                                              | C409.5  | BTL1 |
| 30 | Ready sends (MPI_Rsend                                       |                     |                |             | -                                            |         |      |
| 1  | has already been started when                                |                     | alled. T       | he ord      | linary send                                  |         |      |
|    | <i>MPI_Send</i> is called the standard                       |                     |                |             |                                              | L       | L    |
|    | List the use of Synchronous mode.                            |                     |                |             |                                              | C409.5  | BTL1 |
| 31 | Synchronous sends won't buffer                               |                     |                | •           |                                              |         |      |
|    | function MPI_Ssend won't return                              | n until the receive | er has be      | gun re      | ceiving the                                  |         |      |
|    | data.                                                        |                     |                |             |                                              | L       |      |
|    | How to compute n-body forces                                 |                     | _              | _           |                                              | C409.5  | BTL1 |
|    |                                                              |                     |                |             | I                                            |         |      |
|    | <b>for</b> each particle q                                   |                     |                |             | I                                            |         |      |
| [  | forces[q] = 0;                                               |                     |                |             | I                                            |         |      |
|    | <b>for</b> each particle q {                                 |                     |                |             | I                                            |         |      |
|    | <b>for</b> each particle $k > q$                             |                     |                |             | I                                            |         |      |
| 1  | {                                                            |                     |                |             | I                                            |         |      |
|    | $x_diff = pos[q][X] - pos[k][X];$                            |                     |                |             | I                                            |         |      |
| _  | $y_{diff} = pos[q][Y] - pos[k][Y];$                          |                     |                |             | I                                            |         |      |
| 32 | dist = sqrt(x diff_x diff + y diff_y diff)                   | ;                   |                |             | I                                            |         |      |
|    | dist_cubed = dist_dist_dist;                                 | ,                   |                |             | l                                            |         |      |
|    | force qk[X] = G_masses[q]_masses[k],                         | /dist cubed v d     | liff:          |             |                                              |         |      |
| 1  | force qk[Y] = G_masses[q]_masses[k],                         |                     |                |             |                                              |         |      |
| 1  | forces[q][X] += force qk[X];                                 |                     | A111           |             | l                                            |         |      |
|    | forces[q][X] += force $qk[X]$ ;                              |                     |                |             |                                              |         |      |
|    | forces[q][1] += force qk[1];<br>forces[k][X] -= force qk[X]; |                     |                |             |                                              |         |      |
| 1  | 101000  K   A  = 10000  K   A ;                              |                     |                |             |                                              | 1       |      |
|    |                                                              |                     |                |             |                                              |         |      |
|    | forces[k][Y] -= force qk[Y];                                 |                     |                |             |                                              |         |      |
|    |                                                              |                     |                |             |                                              |         |      |

|    | List the data structures used for serial implementation                                                                                                                                                                                                                                                                                                                                                                                                                               | C409.5 | BTL1 |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | List the tutu structures used for seriar implementation                                                                                                                                                                                                                                                                                                                                                                                                                               | 2.07.0 |      |
| 33 | The data structures are the tour, the digraph, and, in the iterative implementations, the sta<br>and the stack are essentially list structures. For tour instead of array structure with three is<br>the array storing the cities, the number of cities, and the cost of the partial tour.<br>When the digraph are represented using List.                                                                                                                                            |        |      |
|    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | C409.5 | BTL1 |
|    | Difference between Parallelizing the two <i>n</i> -body solvers using pthread and OpenMP.                                                                                                                                                                                                                                                                                                                                                                                             |        |      |
| 34 | The main difference between the Pthreads and the OpenMP implementations is in the<br>parallelizing the inner loops. Since Pthreads has nothing analogous to a parallel for<br>must explicitly determine which values of the loop variables correspond to each three<br>calculations. To facilitate this a function Loop schedule which contains<br>. the initial value of the loop variable,<br>. the final value of the loop variable, and<br>. the increment for the loop variable. |        |      |
|    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | C409.5 | BTL1 |
| 35 | How the performance of the reduced solver is much superior to the performance of the basic solver?                                                                                                                                                                                                                                                                                                                                                                                    |        |      |
| 55 | The efficiency of the basic solver on 16 nodes is about 0.95, while the efficiency of the reduced nodes is only about 0.70. A point to stress here is that the reduced MPI solver makes mu efficient use of memory than the basic MPI solver the basic solver must provide storage for on each process, while the reduced solver only needs extra storage for $n=\text{comm}_{sz}$ position $n=\text{comm}_{sz}$ forces                                                               |        |      |
|    | How can we use OpenMP to map tasks/particles to cores in the basic version on <i>n-body solver</i> ?                                                                                                                                                                                                                                                                                                                                                                                  | C409.5 | BTL1 |
|    | <b>for</b> each timestep {                                                                                                                                                                                                                                                                                                                                                                                                                                                            |        |      |
|    | if (timestep output) Print positions and velocities of particles;<br><b>for</b> each particle q                                                                                                                                                                                                                                                                                                                                                                                       |        |      |
| 36 | Compute total force on q;                                                                                                                                                                                                                                                                                                                                                                                                                                                             |        |      |
|    | <b>for</b> each particle q<br>Compute position and velocity of q;                                                                                                                                                                                                                                                                                                                                                                                                                     |        |      |
|    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |        |      |
|    | The two inner loops are both iterating over particles. So, in principle, parallelizin<br>the two inner <b>for</b> loops will map tasks/particles to cores, and we might try some<br>like this:                                                                                                                                                                                                                                                                                        |        |      |
|    | for each timestep                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |        |      |
|    | {     if (timestep output) Print positions and velocities of     particles;                                                                                                                                                                                                                                                                                                                                                                                                           |        |      |

|    | # pragma omp parallel <b>for</b>                                                    |        |      |
|----|-------------------------------------------------------------------------------------|--------|------|
|    | for each particle q                                                                 |        |      |
|    | Compute total force on q;                                                           |        |      |
|    | # pragma omp parallel <b>for</b>                                                    |        |      |
|    | <b>for</b> each particle q                                                          |        |      |
|    | Compute position and velocity of q;                                                 |        |      |
|    | }                                                                                   |        |      |
|    |                                                                                     |        |      |
|    |                                                                                     |        |      |
|    |                                                                                     |        |      |
|    |                                                                                     |        |      |
|    |                                                                                     |        |      |
|    |                                                                                     |        |      |
|    |                                                                                     | C409.5 | BTL1 |
|    | Write the pseudocode for the MPI version of the basic <i>n</i> -body solver?        |        |      |
|    |                                                                                     |        |      |
|    | Get input data;                                                                     |        |      |
|    | 2 <b>for</b> each timestep {                                                        |        |      |
|    | 3 if (timestep output)                                                              |        |      |
| 27 | 4 Print positions and velocities of particles;                                      |        |      |
| 37 | 5 <b>for</b> each local particle loc q                                              |        |      |
|    | 6 Compute total force on loc q;                                                     |        |      |
|    | 7 <b>for</b> each local particle loc q                                              |        |      |
|    | 8 Compute position and velocity of loc q;                                           |        |      |
|    | 9 Allgather local positions into global pos                                         |        |      |
|    | array;                                                                              |        |      |
|    | 10}                                                                                 |        |      |
|    | 11 Print positions and velocities of particles;                                     |        |      |
|    |                                                                                     | C409.5 | BTL1 |
|    | Write the pseudocode for the MPI implementation of the reduced <i>n</i> -body solv  |        | DILI |
|    | while the pseudocode for the will implementation of the reduced <i>n</i> -body solv |        |      |
|    | 1 source = (my_rank + 1) % comm_sz;                                                 |        |      |
|    | $2 \text{ dest} = (\text{my}_{rank} - 1 + \text{comm}_{sz}) \% \text{ comm}_{sz};$  |        |      |
|    | 3 Copy_loc_pos_into_tmp pos;                                                        |        |      |
|    | 4 loc_forces = tmp_forces = 0;                                                      |        |      |
|    | 5                                                                                   |        |      |
| 38 | 6 Compute forces due to interactions among local particles;                         |        |      |
|    | 1 0 1                                                                               |        |      |
|    | 7 <b>for</b> (phase = 1; phase < comm sz; phase++) {                                |        |      |
|    | 8 Send current tmp_pos and tmp_forces to dest;                                      |        |      |
|    | 9 Receive new tmp_pos and tmp_forces from source;                                   |        |      |
|    | $10 / _Owner of the positions and forces we're receiving _/$                        |        |      |
|    | 11 owner = (my_rank + phase) % comm_sz;                                             |        |      |
|    | 12 Compute forces due to interactions among my particles                            |        |      |
|    | 13 and owner's particles;                                                           |        |      |
|    | 14 }                                                                                |        |      |

| · · · · · · · · · · · · · · · · · · · |                                                                                               |        | ,    |
|---------------------------------------|-----------------------------------------------------------------------------------------------|--------|------|
|                                       | 15 Send current tmp_pos and tmp_forces to dest;                                               |        |      |
|                                       | 16 Receive new tmp_pos and tmp_forces from source;                                            |        |      |
|                                       |                                                                                               |        |      |
|                                       |                                                                                               |        |      |
|                                       |                                                                                               |        |      |
|                                       |                                                                                               | l I    |      |
|                                       |                                                                                               |        |      |
|                                       |                                                                                               |        |      |
|                                       |                                                                                               |        |      |
|                                       |                                                                                               |        |      |
|                                       |                                                                                               | C409.5 | BTL1 |
|                                       |                                                                                               | C+07.5 | DILL |
|                                       | What are the two phases for computation of forces?                                            |        |      |
| 39                                    |                                                                                               |        |      |
| 55                                    | The following choices with respect to the data structures:                                    |        |      |
|                                       | Each process stores the entire global array of particle masses.                               |        |      |
|                                       | Each process only uses a single n-element array for the positions.                            |        |      |
|                                       | Each process uses a pointer loc_pos that refers to the start of its block of pos.             |        |      |
|                                       |                                                                                               | C409.5 | BTL1 |
|                                       | Multiple the needed of an implementation of a depth-first solution to TSP                     | 0.07.2 | D    |
|                                       | Write the pseudo code for an implementation of a depth-first solution to TSP using requision? |        |      |
|                                       | using recursion?                                                                              |        |      |
|                                       |                                                                                               |        |      |
|                                       | <b>for</b> (city = n-1; city >= 1; city)                                                      |        |      |
|                                       | Push(stack, city);                                                                            |        |      |
|                                       | while (!Empty(stack)) {                                                                       |        |      |
|                                       | city = Pop(stack);                                                                            |        |      |
|                                       | if (city == NO CITY) // End of child list, back up                                            |        |      |
|                                       | Remove last city(curr tour);}                                                                 |        |      |
|                                       | else {                                                                                        | ļ      |      |
|                                       | ι ·                                                                                           |        |      |
| 40                                    | Add city(curr tour, city);                                                                    |        |      |
|                                       | if (City count(curr tour) == n) {                                                             |        |      |
|                                       | if (Best tour(curr tour)) {                                                                   |        |      |
|                                       | Update best tour(curr tour);                                                                  |        |      |
|                                       | Remove last city(curr tour);                                                                  |        |      |
|                                       | } else {                                                                                      |        |      |
|                                       | Push(stack, NO CITY);                                                                         |        |      |
|                                       | <b>for</b> (nbr = $n-1$ ; nbr >= 1; nbr)                                                      |        |      |
|                                       | if (Feasible(curr tour, nbr))                                                                 |        |      |
|                                       | Push(stack, nbr);                                                                             |        |      |
|                                       | 1 USI(SIACK, 1101),                                                                           |        |      |
|                                       |                                                                                               |        |      |
|                                       | }/_if Feasible _/                                                                             |        |      |
|                                       | }/_while !Empty _/                                                                            |        |      |
| 41                                    | How the function Push_copy is used in TSP                                                     | C409.5 | BTL1 |
|                                       |                                                                                               |        |      |

|    | It is necessary to push onto the stack to create a copy of the tour before actually pushing it or<br>using the function Push _copy. The extra memory is required to allocating storage for a new<br>copying the existing tour is time-consuming. Reduce the costs by saving freed tours in our o<br>structure, and when a freed tour is available we can use it in the Push _copy function instead<br>malloc.                                                                                                                                                                                            | tour and<br>wn data | k    |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|------|
|    | What are the algorithms for identifying which subtrees we assign to the p<br>threads                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | C409.5              | BTL1 |
| 42 | > depth-first                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                     |      |
|    | search<br>≻ breadth-first                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                     |      |
|    | search                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | C 100 F             |      |
|    | Define the term POSIX or PThreads                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | C409.5              | BTL1 |
| 43 | Pthreads are libraries of type definitions, functions, and macros that can be used in C program<br>a standard for Unix-like operating systems—for example, Linux and Mac OS X. It specifies<br>facilities that should be available in such systems. In particular, it specifies an application pro-<br>interface (API) for <i>multithreaded</i> programming. Pthreads is not a programming language (su<br>Java). Rather, like MPI, Pthreads specifies a <i>library</i> that can be linked with C programs. Unl<br>Pthreads API is only available on POSIX systems—Linux, Mac OS X, Solaris, HPUX, and s |                     |      |
| _  | What are the reason for parameter threads_in_cond_wait used in Tree search?                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | C409.5              | BTL1 |
| 44 | There are also two cases to consider:<br>o threads _in_cond_wait < thread_count, it tells us how many threads are wait<br>o threads_in_cond_wait == thread count, all the treads are out of work, and its                                                                                                                                                                                                                                                                                                                                                                                                |                     |      |
|    | What are the global variables for Recursive depth first search?                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | C409.5              | BTL1 |
| 45 | n: the total number of cities in the problem .<br>digraph: a data structure representing the input digraph .<br>hometown: a data structure representing vertex or city 0, the salesperson's home<br>tour: a data structure representing the best tour so far.                                                                                                                                                                                                                                                                                                                                            |                     |      |
| 46 | Mention the performance of MPI solvers<br>The performance of the reduced solver is much superior to the performance of the<br>basic solver, although the basic solver achieves higher efficiencies.<br>A point to stress here is that the reduced MPI solver makes much more efficient<br>use of memory than the basic MPI solver; the basic solver must provide storage for<br>all <i>n</i> positions on each process, while the reduced solver only needs extra storage                                                                                                                                | C409.5              | BTL1 |
| 47 | for n/commsz positions and n/commsz forces.Mention the principal data structures on pthread<br>The vectors are two-dimensional arrays of doubles, and the mass, position, and                                                                                                                                                                                                                                                                                                                                                                                                                            | C409.5              | BTL1 |

| -  |                                                                                                                                                                                                                                                           |        |      |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
|    | velocity of a single particle are stored in a struct. The forces are stored in an array of vectors.                                                                                                                                                       |        |      |
| 48 | Name any two OpenMp environment variables (Nov/Dec 2018)         omp_set_num_threads(num_threads)         omp_get_num_threads()         omp_get_max_threads()         omp_get_thread_num()                                                                | C409.5 | BTL1 |
| 49 | List any two scoping clauses in OpenMP (Nov/Dec 2018) <ul> <li>Shared Variables</li> <li>Private Variables</li> </ul>                                                                                                                                     | C409.5 | BTL1 |
|    | What are the reason for parameter threads_in_cond_wait used in Tree search?                                                                                                                                                                               | C409.5 | BTL1 |
| 50 | <ul> <li>there are also two cases to consider:</li> <li>threads _in_cond_wait &lt; thread_count, it tells us how many threads are waiting</li> <li>threads_in_cond_wait == thread count, all the treads are out of work, and its time to quit.</li> </ul> |        |      |

#### PART-B

| Q. No. | Questions                                                                                                                                                                        | СО     | Bloom<br>'s<br>Level |
|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|----------------------|
| 1.     | <ul> <li>1.Explain n-Body solvers in OpenMP.</li> <li>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:271-297</li> </ul>  | C409.5 | BTL5                 |
| 2.     | <b>Explain about Tree Search.</b><br>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:229-318                              | C409.5 | BTL5                 |
| 3.     | <ul><li>Explain OpenMP implementations in detail.</li><li>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No:15-20</li></ul> | C409.5 | BTL5                 |

| 4. | <b>Explain MPI implementations in detail.</b><br>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No: 86-88                                                                                                                      | C409.5 | BTL4 |
|----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------|
| 5. | Compare OpenMP and MPI implementations.         Refer Notes                                                                                                                                                                                                                         | C409.5 | BTL4 |
| 6. | Explin how to Parallelizing the tree search in detail.         1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No: 306-308                                                                                                      | C409.5 | BTL5 |
| 7. | i.How to parallelize the basic solver using MPI(Apr/May2017)ii.Explain Non recursive depth first search(Apr/May2017)1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No: 316-317 & 303-304                                       | C409.5 | BTL5 |
| 8. | Explain the implementation of tree search using MPI and dynamic<br>partitioning(Apr/May2017)1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No:229-318                                                                      | C409.5 | BTL5 |
| 9  | What does the n-body problem do/give the pseudocode for serial n-<br>body solver and for computing n-body forces(Nov/Dec 2017).1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011 Page No: 271-297                                  | C409.5 | BTL1 |
| 10 | <ul> <li>How will you parallelize the reduced solver using Open Mp? How will you parallelize the reduced solver using Open MP? .(Nov/Dec 2017).</li> <li>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-Kauffman/Elsevier, 2011 Page No: 271-297</li> </ul> | C409.5 | BTL1 |
| 11 | Generalize about the two Serial program<br>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011                                                                                                                                       | C409.5 | BTL6 |
| 12 | Describe about the Parallelizing the reduced solver using OpenMP<br>1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan-<br>Kauffman/Elsevier, 2011                                                                                                              | C409.5 | BTL2 |

| 10  | Describe about the Recursive and non recursive DFS                      | C409.5 | BTL2 |
|-----|-------------------------------------------------------------------------|--------|------|
| 13  | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|     | Kauffman/Elsevier, 2011                                                 |        |      |
| 1.4 | Examine the Data structures for the serial implementation               | C409.5 | BTL3 |
| 14  | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|     | Kauffman/Elsevier, 2011                                                 |        |      |
|     | Express detail about the static parallelizing of tree search using      | C409.5 | BTL1 |
| 15  | PThreads                                                                |        |      |
|     | 1. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan- |        |      |
|     | Kauffman/Elsevier, 2011                                                 |        |      |