Quicksilver's Programming Stuff: September 2009

Thursday, September 17, 2009

11 Ways to Sort an Array!

sort.cpp: http://pastebin.com/f1b85d015

This program contains functions to sort an array of integers in descending order in 11 different ways :-

- Simple sort (Compare every element to every other element)

- Quicksort

- Dual pivot quicksort (The fastest general purpose sort)

- Insertion sort

- Selection sort

- Bubble sort

- Merge sort

- Heap sort

- Count sort (sorts in O(n) time)

- Bucket sort (works on data distributed uniformly across a range)

- Radix sort

The Dual Pivot Quicksort I wrote isn't as efficient as the one written by Vladimir Yaroslavskiy for the java.util.Arrays class, but the performance is better than the other sorting algorithms for large array sizes.

You can modify the code to sort in increasing order by interchanging > and <.

Wednesday, September 16, 2009

HashTable Template

Arrays, linked lists and other simple data structures do not allow us to search for an object in constant time. One way to solve this problem is using a Hash Table. It uses a hash function to reduce a key identifying the object to it's hash value in O(1) time, which is used to look up the object.

hashtable.cpp: http://pastebin.com/f66301bdd

(Tested on Mingw)

The hash table I have written here is a template. It will work with any of your classes, as long as it has a public field name and pointer to the next node. Collision resolution is done by simple chaining. I used char name[NAME] as the key, but you can use anything you want; just replace all the strcmp() calls by the appropriate comparison function.

I used my own hashing function, so replace it if you must:

hash(): Converts a char array into a hash

key(): Converts the hash into a integer key

hash() gives almost no collisions while key() has less than 10% collision rate. Because the hashtable only stores 1000 pointers, collisions will occur quite often in this particular program (when 1000 strings are entered 40% register as collisions). Obviously you'd want to use a better hashing function before plugging this into any real application.

You may also want to read up on salting the input.

Saturday, September 12, 2009

Statistical Estimator - German Tank Problem

This was a little program I wrote to test the accuracy of statistical estimation in solving the German Tank Problem:

"Suppose one is an Allied intelligence analyst during World War II, and one has some serial numbers of captured German tanks. Further, assume that the tanks are numbered sequentially from 1 to N. How does one estimate the total number of tanks?"

The results were surprisingly accurate, giving a standard deviation of less than 10%, when the upper limit was calculated from 10 random numbers within the range.

estimator.cpp : http://pastebin.com/f79223e97

The program was originally written for GCC, but when I tried running it on Mingw (Windows) the rand() function returned only a 16-bit result. So I went for the Boost random number library to get sufficiently large random numbers for the test.

Tuesday, September 8, 2009

Partitioned Memory Manager

Hello world!

This blog will hopefully be a place where I can share code with friends, talk about programming and technology related stuff, and serve as a repository for my past stuff when I need it later...

To start off, here's something I worked on a while back: A memory allocator that implements the C functions nmalloc and nfree (which basically do the same thing as malloc and free in malloc.h).

nmalloc.c : http://pastebin.com/f72460c04

nmalloc.h : http://pastebin.com/f4a8b27bc

Memory is divided into many partitions by the function nmalloc_init. Each partition has an associated block size. All blocks within one partition have the same length. While this leads to internal fragmentation, it avoids the overhead involved in managing blocks of variable length. The blocks within a partition are chained into 2 linked lists – the Used list and the Free list. Each partition has it’s own pair of linked lists. (The layout of the partitions is provided by the calling program)

If you already know the sizes of objects that will be used in your program, it speeds up allocation considerably. Unfortunately it has considerable space overhead, and isn't exactly thread friendly. I haven't optimised it because the performance was good enough for my purposes :-

O(1) allocation and deallocation when memory partitions are empty, since operations are more or less independent of number of blocks etc. (Typical delay: 2-6 µs)

O(n) allocation and deallocation when partitions start filling up, since smaller blocks must be merged. Time taken becomes proportional to size requested.

Quicksilver's Programming Stuff