Optimization in Computer Algorithms

In simple words, an algorithm is a step-by-step procedure for performing a given task. The goal of an algorithm is to perform that task efficiently, so that it consumes as few resources as possible.

In the context of computer engineering, the resources are time and memory. So the goal of a computer algorithm is to perform the task as quickly as possible while using as little memory as possible. Usually there is a trade-off between these two resources (memory and time). Since memory is relatively abundant today, we mainly focus on reducing the running time when developing a computer algorithm.

After the design is complete, a computer algorithm is then implemented in a programming language such as C/C++, Java, Python, etc.

To demonstrate how choosing an algorithm intelligently can affect the outcome of a program, let's look at an example problem. We have an integer array called ‘nums’ which holds n integers (as array indexing in C/C++ starts at 0, the first integer is stored at nums[0] and the last one at nums[n-1]). Next, we are given a number, and we have to tell whether that number is present in the array or not. And we are given MANY such queries (that is, we need to check the presence of many numbers in the array).

One very simple approach that comes to mind is, for each query, to start at the beginning of the array and keep searching until either the required integer is found or the end of the array is reached (i.e. the integer is not present). In C, an implementation of this algorithm looks something like this:

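/* let the integer that we need to find be 'tofind' (printf needs <stdio.h>) */

int i = 0;
while (i < n)
{
  if (nums[i] == tofind)
    break;   // found it: stop scanning
  i++;
}
if (i == n)  // the loop ran off the end of the array
  printf("Number not found\n");
else
  printf("Number found\n");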

This approach is called LINEAR SEARCH; it is very straightforward and easy to understand and implement. But its obvious drawback is that, in the worst case (when the number is near the end of the array or absent altogether), we scan the entire array for every query. This is not the most efficient way to do this task. Let's analyze this algorithm from a real-world perspective. Most software deals with large amounts of data. If there are 10^5 integers in the array and 10^5 queries, the body of the while loop may be executed up to 10^5 × 10^5 = 10^10 times. As a rough rule of thumb, on a typical machine (say 4 GB RAM, 2.5 GHz), 10^8 simple loop iterations take about 1-2 seconds and 10^9 iterations take about 10 seconds, so 10^10 iterations are bound to take a very long time. Time that we cannot afford to lose!

Now let's study an alternative approach. It is called ‘Binary Search’ and is widely used for this kind of problem.

The first step of this algorithm is to sort the array. At first glance, it may seem that sorting will itself consume a lot of time and add overhead. But remember that we have to run MANY queries on the array. We pay for sorting only once, and EACH of the many queries benefits from it. So, ultimately, the overhead of sorting is much, MUCH smaller than the advantage it gives.

Now, when we start searching, the required integer can be anywhere in the sorted array. We look at the middle integer. If it equals our target, the search is over. If the middle integer is greater than our target, then the target, if present at all, must lie in the first half of the array (because the array is sorted, everything after the middle is at least as large as the middle). And if the middle integer is less than our target, the target can only be in the second half of the array.

The potential space (i.e. the part of the array where the required integer can still be) is, therefore, cut in half.

We repeat this process until we either find the integer or the potential space becomes empty. If the potential space runs out without a match, we can conclude that the integer is not present in the array.

In the form of C code, this algorithm looks something like this:

int start = 0, end = n - 1, mid;  // nums must already be sorted in ascending order
int foundflag = 0;
while (start <= end)  // <= so that a potential space of size 1 is still checked
{
  mid = start + (end - start) / 2;  // same as (start+end)/2, but cannot overflow
  if (nums[mid] == tofind) // if the middle element is the required number, then search terminates
  {
    foundflag = 1;
    break;
  }
  else if (nums[mid] > tofind)
    end = mid - 1; // restrict search to first half
  else
    start = mid + 1; // restrict search to second half
}
if (foundflag == 1)
  printf("Number found\n");
else
  printf("Number not found\n");

This algorithm halves the potential space in every iteration. So it takes at most about log2(n) iterations (precisely, floor(log2(n)) + 1) to search for one element, as opposed to up to n iterations per search with the previous algorithm. For n = 10^5, that is at most 17 iterations instead of up to 100,000.

During the CEV Talk, I demonstrated the effect of this algorithm on execution time. I took some sample arrays and sample queries and ran both algorithms on the test data. The first approach took over 1 minute to complete the queries, whereas the second approach took less than 0.5 seconds. The effects were clearly visible!

Note that if there are only a few queries, the overhead of sorting will be greater than the advantage it gives. So BINARY SEARCH is desirable only when there are many queries; otherwise, simple LINEAR SEARCH works fine.
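To make "a few" more concrete, here is a rough cost comparison that counts only loop iterations and ignores constant factors. That is a simplifying assumption: sorting has a noticeably larger constant per step, so the real break-even point is higher. With n array elements and q queries:

$$
\text{linear search: } q\,n
\qquad
\text{sort + binary search: } n\log_2 n + q\log_2 n
$$

$$
q\,n > n\log_2 n + q\log_2 n
\iff q\,(n - \log_2 n) > n\log_2 n
\iff q > \frac{n\log_2 n}{n - \log_2 n} \approx \log_2 n
$$

For n = q = 10^5 (log2 n ≈ 16.6), linear search needs about 10^10 iterations, while sorting plus binary search needs roughly 2 × 10^5 × 16.6 ≈ 3.3 × 10^6.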
