Sunday, September 1, 2013

KMP algorithm


Below there is the C# program which implements the Knuth-Morris-Pratt (KMP) string searching algorithm.
The KMP algorithm searches for occurrences of a pattern P within a text string by employing the observation that when a mismatch occurs, the pattern itself embodies sufficient information to determine where the next match could begin.






KMP spends a little time precomputing a partial match table(on the order of the size of P[], O(p),
and then it uses that table to do an efficient search of the string in O(s), where s is a size of the string.

Sunday, April 14, 2013

Just for fun

The following MMIX program prints the snowing on a console.




The snowing
 
 

Saturday, February 16, 2013

The longest increasing subsequence problem

There was the following question on the RSDN site.
I have an array of size N. This array consists of numbers from 1 to N, and they are not repeated, and they are at random order. How can I determine the length of an maximum increasing sequence?

The longest increasing subsequence (LIS) problem is to find a subsequence of a given sequence in which the subsequence elements are in sorted order, lowest to highest, and in which the subsequence is as long as possible. This subsequence is not necessarily contiguous, or unique. LIS is solvable in time O(n log n), where n is the length of the input sequence.

As a base, I took the algorithm from GeeksforGeeks site. I fixed few bugs in this algorithm and programmed it in C#.

  1.   class LongestIncr_ingSubseq
  2.   {
  3.     public Int32 LongestIncrSubseq(Int32[] sourceArray)
  4.     {
  5.       Int32  lengthSourceArray = sourceArray.Length;
  6.       Int32[] tailIndices = new Int32[lengthSourceArray];
  7.       Int32[] prevIndices = new Int32[lengthSourceArray];
  8.       Int32  maxLength = 1;
  9.  
  10.       tailIndices[0] = 0;
  11.       prevIndices[0] = -1;
  12.  
  13.       for (Int32 i = 1; i < lengthSourceArray; i++)
  14.       {
  15.         if (sourceArray[i] < sourceArray[tailIndices[0]])
  16.           tailIndices[0] = i;
  17.         else if (sourceArray[i] > sourceArray[tailIndices[maxLength - 1]])
  18.         {
  19.           prevIndices[i] = tailIndices[maxLength - 1];
  20.           tailIndices[maxLength++] = i;
  21.         }
  22.         else
  23.         { 
  24.           Int32 pos =
  25.            GetCeilIndex(sourceArray, tailIndices, -1, maxLength - 1, sourceArray[i]);
  26.           prevIndices[i] = tailIndices[pos - 1 < 0 ? 0 : pos - 1];
  27.           tailIndices[pos] = i;
  28.         }
  29.       }
  30.  
  31.       for (Int32 i = tailIndices[maxLength - 1],
  32.        j = maxLength, l = lengthSourceArray - 1; j > 0; i = prevIndices[i], j--)
  33.         tailIndices[l--] = sourceArray[i];
  34.       for (Int32 i = lengthSourceArray - maxLength, j = maxLength; j > 0; j--)
  35.         Console.WriteLine("LIS " + String.Format("{0} ", tailIndices[i++]));
  36.       return maxLength;
  37.     }
  38.  
  39.     Int32 GetCeilIndex(Int32[] sourceArray, Int32[] tailIndices, Int32 l, Int32 r, Int32 key)
  40.     { 
  41.       Int32 m;  
  42.       while( r - l > 1 )
  43.       {   
  44.         m = l + (r - l)/2;
  45.         if (sourceArray[tailIndices[m]] >= key) r = m;   
  46.         else l = m; 
  47.       }
  48.       return r;
  49.     }
  50.   }

Friday, January 11, 2013

The analysis of algorithms and MMIXAL.NET

Let's discuss how MMIXAL.NET can help to analyze algorithms. In particular, we'll consider the sorting by counting algorithm. This method is based on the idea that the jth key in the final sorted sequence is greater than exactly j - 1 of the other keys.
We need to compare K(j) key with K(i) key for 1 < j < i, 1 < i < N. MMIXAL code that implements the algorithm is given below (to enlarge the picture click on it).

InputBuf contains the keys to sort. This algorithm involves no movement of records. CountBuf[j] tells us where to move InputBuf[j] record. For our program, CountBuf will contain the following data: 6,1,8,0,15,3,14,4,10,5,2,7,9,11,13,12
It means InputBuf[4] is the smallest key and InputBuf[5] contains the biggest key.

At the end of the program's execution MMIXAL.NET prints
_ the total number of instructions executed;
_ the number of units that represents the clock cycle time or oops in a pipelined implementation;
_ the number of memory references or mems that the program uses;
_ the profile of the program.
See picture below.

Program's profile contains a frequency (fr) of instruction's execution. The running time of this program is 7N+7A+3B+1 units,where
_ N is the number of records to sort;
_ A is the number of choices of two things from a set of N objects (N! / 2!(N-2)!);
_B is the number of pairs of indices for which j < i and Kj > Ki.
Thus, B is the number of inversions of the permutation K1 K2 ... KN. The min value of B is 0 and max value is (N**2-N)/2.

So our program requires between 3.5N**2 + 3.5N + 1 and 5N**2 + 2N + 1 units of a time. The factor N**2 dominates this running time. It shows that the sorting by counting algorithm is not an efficient way to sort when N is large. Doubling the number of records increases the running time fourfold.