

Prefix Sum Using OpenMP

OpenMP is an API for shared-memory parallel programming. That is, the threads, running on various cores or processors, share a memory space that every thread can access, in addition to each thread's own private memory. It is available for C/C++ and Fortran.

Arguably, it makes parallel development easier than MPI, Pthreads, etc. (Moreover, it is the one we'll start our Parallel and Distributed Computing lab with.)

Some Basic Constructs

  • parallel : the most basic construct. Creates a team of threads, and every thread in the team executes the block of code that follows the directive.
  • for : used just before a ‘for’ loop. Splits the iterations of the loop among the threads of the team, so that they run in parallel.
  • barrier : provides barrier synchronization. Each thread waits here until every thread in the team has reached the barrier.
  • single : only one thread executes the following block of code. The rest of the threads wait at the end of the construct (an implicit barrier).
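And yeah, just in case this is your first time: don't build these programs the way you normally would. Compile from the command line with the -fopenmp flag (GCC), e.g. gcc -fopenmp prefix.c -o prefix. Without the flag, GCC ignores the OpenMP pragmas and does not link the OpenMP runtime.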

First Non-Trivial Program

Formal Statement:

Given an array A[1..n], return an array S[1..n] such that S[i] = A[1] + A[2] + … + A[i] for all 1 ≤ i ≤ n.

For example:

Given the array [5,8,7,2,3], output will be [5,13,20,22,25]
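For reference, the sequential definition is just a running sum. Here is a minimal Python check of the example above. prefix_sum is my own helper name; the standard library's itertools.accumulate does the actual work.

```python
from itertools import accumulate

def prefix_sum(a):
    """Sequential inclusive prefix sum: S[i] = A[0] + ... + A[i] (0-indexed)."""
    return list(accumulate(a))

print(prefix_sum([5, 8, 7, 2, 3]))  # prints [5, 13, 20, 22, 25]
```

This is handy as a baseline when checking the output of the parallel versions.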

Basic Parallel Algorithm:

  1. for j = 1, 2, 4, … while j < n
  2.     for i in (1 to n), in parallel
  3.         S[i] = A[i] + A[i – j] if i > j, else S[i] = A[i]
  4.     A = S (the sums from this round are the input to the next round)
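The algorithm above is usually called the Hillis-Steele scan. Here is a minimal sequential Python simulation of it, assuming 0-based indexing, so the condition i > j becomes i >= j. The function name hillis_steele_scan is hypothetical. Each round builds a fresh list using only the previous round's values, which is exactly why the inner loop can be split among threads.

```python
def hillis_steele_scan(a):
    """Sequential simulation of the stride-doubling parallel prefix sum.

    Each round reads only from `src` and writes only to a new list, so the
    iterations of the inner loop are independent of each other.
    """
    src = list(a)
    n = len(src)
    j = 1
    while j < n:
        # One "parallel" round: every element depends only on the previous round.
        src = [src[i] + src[i - j] if i >= j else src[i] for i in range(n)]
        j *= 2  # the stride doubles: 1, 2, 4, ...
    return src

print(hillis_steele_scan([5, 8, 7, 2, 3]))  # prints [5, 13, 20, 22, 25]
```

With n = 5 this takes three rounds (j = 1, 2, 4), i.e. ⌈log2 n⌉ rounds in general.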

First attempt:

This code simply exploits the most obvious parallelism in the algorithm: the inner loop (steps 2 and 3) is split between threads, each thread working on a different part of the array. And yeah, to prevent race conditions, we use an auxiliary array. In each round, the threads write into one array while reading from the other, and the two arrays swap roles in the next round.
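To see why the auxiliary array matters, here is a hypothetical Python sketch of a single-array variant. Running the inner loop in place and in ascending order models just one possible interleaving of the threads: A[i - j] has already been overwritten in the current round before A[i] reads it. With real threads, the result could also vary from run to run.

```python
def in_place_scan_ascending(a):
    """Hypothetical variant without the auxiliary array (do not use).

    Updating in place lets a round read values written earlier in the
    same round, which is what a data race between threads can cause.
    """
    a = list(a)
    n = len(a)
    j = 1
    while j < n:
        for i in range(j, n):
            a[i] = a[i] + a[i - j]  # a[i - j] may already hold this round's value
        j *= 2
    return a

print(in_place_scan_ascending([5, 8, 7, 2, 3]))  # prints [5, 13, 25, 35, 55], not [5, 13, 20, 22, 25]
```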

#include <stdio.h>
#include <omp.h>

#define MAXN 100

int main(void)
{
  int n, ar[2][MAXN], *t, i, toread, size, j;
  printf("Enter length:");
  if(scanf("%d", &n) != 1 || n < 1 || n > MAXN) return -1;
  printf("Enter numbers:\n");
  for(i = 0; i < n; i++)
    if(scanf("%d", &ar[0][i]) != 1) return -1;
  /*ar[0] and ar[1] take turns as the array being read and the array being written*/
  toread = 1;
  /*copy first value, since the loop below starts at index 1 and never copies it*/
  ar[1][0] = ar[0][0];
  /*size = number of bits in n, i.e. enough rounds for strides 1, 2, 4, ... to cover the array
    (computing it can be avoided, as in the 2nd program)*/
  size = 0;
  for(i = n; i; i >>= 1)
    size++;
  /*following code implements the algorithm; round j uses stride (1 << j)*/
  for(j = 0; j < size; j++) {
    toread = !toread;
    t = ar[!toread]; /*write into the array that is not being read*/
#pragma omp parallel for default(none) private(i) shared(n, j, t, ar, toread)
    for(i = 1; i < n; i++) {
      if(i - (1 << j) >= 0)
        t[i] = ar[toread][i] + ar[toread][i - (1 << j)];
      else
        t[i] = ar[toread][i];
    }
  }
  /*the result is in the array written during the last round*/
  toread = !toread;
  for(i = 0; i < n; i++)
    printf("%d\n", ar[toread][i]);
  return 0;
}

Second attempt:

Here, we'll divide the original array into sub-arrays, as many as there are threads provided by the environment. Each thread linearly calculates the prefix sum of its assigned sub-array. These prefix sums fall short of the actual ones by the sum of all elements before the start of that sub-array, since those elements are ignored.

Next, the last element of each sub-array (that is, the sub-array's total) is stored in another array. The prefix sum of this array is calculated with the same stride-doubling scheme as in the first attempt. The values in the original array are then updated by adding to each element the amount that was missing earlier: the sum of all elements in the preceding sub-arrays.
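Here is a sequential Python simulation of this two-phase scheme, offered as a sketch. blocked_scan and its parameters are hypothetical names, and the "threads" are just loop iterations over block indices. The scan over the block totals is done sequentially for brevity, whereas the program does that step in parallel. Block lengths are rounded up, so some trailing blocks can be empty, and an empty block must contribute 0.

```python
def blocked_scan(a, num_threads):
    """Sequential simulation of the blocked (two-phase) prefix sum."""
    n = len(a)
    arr = list(a)
    work = (n + num_threads - 1) // num_threads  # block length, rounded up
    # [start, end) bounds of each simulated thread's block; trailing ones may be empty
    bounds = [(min(t * work, n), min(t * work + work, n)) for t in range(num_threads)]

    # Phase 1: each "thread" scans its own block sequentially.
    for start, end in bounds:
        for i in range(start + 1, end):
            arr[i] += arr[i - 1]

    # Phase 2: inclusive prefix sum over the block totals (empty block -> 0).
    partial = [arr[end - 1] if start < end else 0 for start, end in bounds]
    for t in range(1, num_threads):
        partial[t] += partial[t - 1]

    # Phase 3: add to each block the sum of all elements before it.
    for t, (start, end) in enumerate(bounds):
        offset = partial[t - 1] if t > 0 else 0
        for i in range(start, end):
            arr[i] += offset
    return arr

print(blocked_scan([5, 8, 7, 2, 3], 4))  # prints [5, 13, 20, 22, 25]
```

For example, with 4 threads and 5 elements, the blocks are [0,2), [2,4), [4,5) and an empty one.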

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <string.h>

int main(void)
{
  int *arr, *partial = NULL, *temp = NULL;
  int num_threads = 1, work = 0, n;
  int i;
  printf("Enter length:");
  if(scanf("%d", &n) != 1 || n <= 0) return -1;
  if(!(arr = malloc(sizeof(int) * n))) return -1;
  printf("Enter numbers:\n");
  for(i = 0; i < n; i++)
    if(scanf("%d", arr + i) != 1) return -1;
#pragma omp parallel default(none) private(i) shared(arr, partial, temp, num_threads, work, n)
  {
    int mynum, start, end, offset; /*declared inside the region, so private*/
#pragma omp single
    {
      num_threads = omp_get_num_threads();
      if(!(partial = malloc(sizeof(int) * num_threads))) exit(-1);
      if(!(temp = malloc(sizeof(int) * num_threads))) exit(-1);
      work = (n + num_threads - 1) / num_threads; /*length of sub-arrays, rounded up*/
    } /*implicit barrier: every thread now sees num_threads, work and the buffers*/
    mynum = omp_get_thread_num();
    /*bounds of this thread's sub-array; it may be empty when n is small*/
    start = work * mynum < n ? work * mynum : n;
    end = start + work < n ? start + work : n;
    /*calculate prefix-sum for each subarray*/
    for(i = start + 1; i < end; i++)
      arr[i] += arr[i - 1];
    /*an empty sub-array contributes 0*/
    partial[mynum] = start < end ? arr[end - 1] : 0;
#pragma omp barrier
    /*calculate prefix sum for the array that was made from last elements of each of the previous sub-arrays*/
    for(i = 1; i < num_threads; i <<= 1) {
      if(mynum >= i)
        temp[mynum] = partial[mynum] + partial[mynum - i];
      else
        temp[mynum] = partial[mynum];
#pragma omp barrier
#pragma omp single
      memcpy(partial, temp, sizeof(int) * num_threads);
      /*implicit barrier at the end of single: nobody reads partial mid-copy*/
    }
    /*update original array: add the sum of all elements in earlier sub-arrays*/
    offset = mynum > 0 ? partial[mynum - 1] : 0;
    for(i = start; i < end; i++)
      arr[i] += offset;
  }
  for(i = 0; i < n; i++)
    printf("%d\n", arr[i]);
  free(partial);
  free(temp);
  free(arr);
  return 0;
}

My First Lexer (in Python)…

There is a course we have, Theory of Computations (actually more about formal languages), which uses the Cinderella Book as one of the recommended readings. One assignment for this course involved designing a program to separate the tokens of a C++ program. After a little search-engining, I found out three things: such a program is actually a lexer (lexical analyzer), it uses regular expressions, and there are various automated tools to generate lexers.

But I decided to design one on my own before looking into lexer implementation strategies in detail. So I began by reading some material from “Engineering a Compiler”, and got something running. Here is the crude working code. It's of very limited practical use, but I was behind schedule on submissions, so I hacked my way through. As soon as possible, I'll work on a better implementation.

Source Code:


import re
from deftok import keywords,expressions,separators
import os

def matchNow(line,pos,id):
    """Checks for the regular expression of the given 'id' if it is present in the rulebase;
    otherwise appends it to the token list.
    Also adds the type of the token to the tokens list."""
    global tokens
    if id in expressions.keys():
        ...
    if tok in keywords:
        ...
    elif id=='i':
        ...
    elif tok in separators:
        ...
    elif id=='n':
        ...
    return ret

def checkLine(line,pos):
    """Removes whitespace, checks the first character of the line at 'pos',
    and calls matchNow() appropriately."""
    global tokens
    if found:
        ...
    if pos<len(line):
        ...
    if curr.isalpha() or curr=='_':
        ...
    elif curr.isdigit():
        ...
    elif curr in ['@','$']:
        ...

def main(name):
    """Responsible for file handling, and for joining lines that end in backslash-newline."""
    for line in fp:
        if len(line)>1 and line[-2]=="\\":
            ...
    for token in tokens:
        ...

if __name__=="__main__":
    import sys
    ...

Here is the definitions file, deftok.py. It has a list of keywords, a list of separators, and a dictionary mapping “ids” to regular expressions.

#list of keywords
keywords=['auto', 'const', 'double', 'float', 'int', 'short', 'struct', 'unsigned', 'break', 'continue', 'else', 'for', 'long', 'signed', 'switch', 'void', 'case', 'default', 'enum', 'goto', 'register', 'sizeof', 'typedef', 'volatile', 'char', 'do', 'extern', 'if', 'return', 'static', 'union', 'while', 'asm', 'dynamic_cast', 'namespace', 'reinterpret_cast', 'try', 'bool', 'explicit', 'new', 'static_cast', 'typeid', 'catch', 'false', 'operator', 'template', 'typename', 'class', 'friend', 'private', 'this', 'using', 'const_cast', 'inline', 'public', 'throw', 'virtual', 'delete', 'mutable', 'protected', 'true', 'wchar_t']

#list of separators
separators=['(', ')', '[', ']', '{', '}', ',', ';', ':']

#dictionary mapping ids to regular expressions
expressions={'+':r'\+[+=]|\+',
             '<':r'<<=|<[<=]|<', '>':r'>>=|>[>=]|>',
             # ... (remaining entries not shown)
            }
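Note that the order of the alternatives inside each pattern matters. Python's re tries them left to right and takes the first one that matches, not the longest. Here is a small sketch. match_operator, SHIFT_LT and PLUS are hypothetical names; the two patterns are copied from the dictionary above.

```python
import re

# Two patterns taken from the expressions dictionary.
SHIFT_LT = r'<<=|<[<=]|<'
PLUS = r'\+[+=]|\+'

def match_operator(pattern, text, pos=0):
    """Return the text matched by `pattern` starting exactly at text[pos], or None."""
    m = re.compile(pattern).match(text, pos)
    return m.group() if m else None

print(match_operator(SHIFT_LT, "x <<= 2", 2))   # prints <<=
print(match_operator(SHIFT_LT, "a <= b", 2))    # prints <=
print(match_operator(PLUS, "i++;", 1))          # prints ++
# Same operators, shortest alternative first: the shortest one always wins.
print(match_operator(r'<|<=|<<=', "x <<= 2", 2))  # prints <
```

This is why each rule lists the longest operator first, e.g. <<= before << and <= before <.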

Problems with this program:

  1. It is ugly.
  2. Tokens like “9xyz” are accepted as valid identifiers. The regular expressions need to change.
  3. Apart from the regular expression rulebase, not much of the code is customizable. E.g., the classification of identifiers is rigid and the formatting of the output file is hard-coded; in fact, even the rulebase is forced to work according to the built-in functions checkLine() and matchNow().
  4. Invalid tokens should raise an exception; this isn't implemented here.
  5. Possibly, the entire lexer should be packaged in a class that is its own iterator. Then its instance could be used as <instancename>.next().
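Problems 2, 4 and 5 can be tackled together. Below is a minimal Python 3 sketch under my own assumptions: the class name Lexer, the exception LexError, the token kinds and the small rule set are all hypothetical, not the rulebase from deftok.py. Only a handful of keywords and operators are included, and only integer literals are recognized. In Python 3 the iterator method is __next__ and you call next(lexer); Python 2 spelled it lexer.next().

```python
import re


class LexError(Exception):
    """Raised when no rule matches at the current position."""


class Lexer:
    """Iterable lexer sketch: for kind, text in Lexer(source): ...

    Rules are tried in order at each position and the first match wins,
    so longer operators are listed before their prefixes.
    """

    # Small keyword subset for illustration only.
    KEYWORDS = {'int', 'char', 'if', 'else', 'while', 'return'}

    RULES = [
        ('ws', r'\s+'),
        ('number', r'\d+(?!\w)'),          # digits NOT followed by a word char: rejects "9xyz"
        ('identifier', r'[A-Za-z_]\w*'),   # must start with a letter or underscore
        ('operator', r'<<=|>>=|<<|>>|\+\+|--|[<>=!+\-*/]=?'),
        ('separator', r'[()\[\]{},;:]'),
    ]

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self._rules = [(kind, re.compile(p)) for kind, p in self.RULES]

    def __iter__(self):
        return self

    def __next__(self):
        while self.pos < len(self.source):
            for kind, pattern in self._rules:
                m = pattern.match(self.source, self.pos)
                if m:
                    break
            else:
                # No rule matched: report the offending text instead of skipping it.
                bad = self.source[self.pos:self.pos + 10]
                raise LexError(f"invalid token at position {self.pos}: {bad!r}")
            self.pos = m.end()
            if kind == 'ws':
                continue  # whitespace separates tokens but is not one
            text = m.group()
            if kind == 'identifier' and text in self.KEYWORDS:
                kind = 'keyword'
            return kind, text
        raise StopIteration


print(list(Lexer("int x = 42;")))
# [('keyword', 'int'), ('identifier', 'x'), ('operator', '='), ('number', '42'), ('separator', ';')]
```

Feeding it "9xyz" raises LexError. The number rule refuses digits followed by a letter, and no other rule can start with a digit.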

There might be other issues as well. After my exams, let's see if I can get a better version of this rudimentary lexer up and running.

