Thursday, July 30, 2009

Project Euler question 24 in C and Python

In my spare time after work, I've been going through the questions found on ProjectEuler. I'm doing it as a way of learning both Python and the Vim editor. Plus, it's just fun to solve these problems. I found problem 24 to be considerably challenging.

This is the problem:

A permutation is an ordered arrangement of objects. For example, 3124 is one possible permutation of the digits 1, 2, 3 and 4. If all of the permutations are listed numerically or alphabetically, we call it lexicographic order. The lexicographic permutations of 0, 1 and 2 are:

012 021 102 120 201 210

What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?


I managed to finish it in Python and the code I wrote executes in about 36 seconds. Below is a brief description of how my code works:

In a word, recursion.

At the top level, I loop through an index which goes from zero to the amount of digits in the number (in this case, 10). For each iteration, I remove the digit which has the same value as the current index. Then I sort that number without the removed digit. Then I insert that digit back into the sorted number at position 0. So without doing any recursion, the permutations would be:

0123456789
1023456789
2013456789
3012456789
4012356789
etc...

Once I have that number, I recursively call that function again using that newly generated number instead of the original number. I also, increase an integer argument "start" by 1 each time. This argument allows me to only resort the digits to the right of the "start" value. For example, iterating into the next level down using the first number would give these results:

0123456789
0 123456789
0 213456789
0 312456879
0 412356789
etc...

The "start" variable allows me to assume the value to the left of "start" and simply tack that value on. The recursion continues down this path until start == the length of the string. At that point I commit the derived value to a master list of permutations.


So that got me to thinking. How fast would similar code execute if written in C instead? Now I'm no expert in C. It's a language I know even less well than Python. I am a .Net developer by day. But I wanted the challenge of making it work in C. Furthermore, I was curious to see how fast it would run in C compared to Python. I was also curious to see how much more code I would have to write in C versus Python and how much more difficult that code would be to write.

Small disclaimer here. I make no claims as to whether or not my solution for this particular problem is the best possible solution or if the Python version is in any way "Pythonic". So if anyone has a better solution, I would love to see it. Also, I obviously wrote the Python solution first and then merely engineered the C solution to run the same way as the Python solution. So I did not solve the C solution differently from the Python solution.

Onto the code. Again, the purpose of this is to compare the speed of execution, speed of development, and lines of code between the two languages.

First the Python:


import time

DIGITS = '0123456789'
MAX_PERM = 1000000

def get_resorted_digits(numstr, remove_digit):
removestr = str(remove_digit)
if numstr.find(removestr) == -1: return False
new_numstr = numstr.replace(removestr, '')
new_sorted_digits = sorted(new_numstr)
new_sorted_digits.insert(0, removestr)
return ''.join(new_sorted_digits)

def list_permutations(start, strlen, numstr):
for remove_digit in range(strlen):
resorted = get_resorted_digits(numstr[start:strlen], remove_digit)
if not resorted: continue
new_numstr = numstr[0:start] + resorted
if start == strlen-1: yield new_numstr
if start < strlen:
for recursive_perm in list_permutations(start+1, strlen, new_numstr):
yield recursive_perm


numstr = DIGITS
strlen = len(DIGITS)

start = time.time()
count = 0
for perm in list_permutations(0, strlen, numstr):
count += 1
if count == MAX_PERM:
print perm
break

print time.time() - start



Then the C code (utilities functions omitted for brevity):


#include stdio.h
#include stdlib.h
#include string.h
#include time.h
#include "utilities.h"

#define DIGITS "0123456789"
#define MAX_PERM 1000000

typedef struct
{
int list_counter;
int current_index;
char **list;
} List;

int resort_digits(char *, char *, char);
void append_permutations(List *, int, int, char *);
void append_to_list(List *, char *);
void expand_list(List *);
void runtests(void);

int main (int argc, char *argv[])
{
char numstr[] = DIGITS;
int len = strlen(numstr);
List *list;
list = malloc(sizeof(List));
list->list_counter = 1;
list->current_index = 0;
list->list = malloc(sizeof(list->list));

time_t t1, t2;
time(&t1);
append_permutations(list, 0, len, numstr);
int i;
for (i = 0; i <>current_index; i++)
if (i == MAX_PERM-1) {
printf("%s\n", list->list[i]);
break;
}
time(&t2);
int total_time = difftime(t2, t1);

printf("%d\n", total_time);

free(list);
return 0;
}

void expand_list(List *list) {
list->list_counter *= 2;
list->list = realloc(list->list, (list->list_counter * sizeof(list->list)));
}

void append_to_list(List *list, char *string) {
if (list->current_index == list->list_counter)
expand_list(list);

list->list[list->current_index] = malloc(strlen(string) + 1);
strcpy(list->list[list->current_index], string);
list->current_index++;
}

int resort_digits(char *source, char *dest, char cremove_digit) {
int len = strlen(source);
if (find(source, cremove_digit) == -1)
return -1;
char removed[len];
strremove(removed, source, cremove_digit);
qsort(removed, strlen(removed), sizeof(char), (void *) strcomp);
strinsert(dest, removed, 0, cremove_digit);

return 0;
}

void append_permutations(List *list, int start, int strlen, char *numstr) {
int remove_digit;
char cremove_digit;

for (remove_digit = 0; remove_digit < strlen; remove_digit++) {
cremove_digit = cchar(remove_digit);

char partial_end[strlen];
strcopy(partial_end, numstr, start, strlen);
char resorted[strlen];
if (resort_digits(partial_end, resorted, cremove_digit) == -1)
continue;

char partial_start[start];
strcopy(partial_start, numstr, 0, start);
char new_numstr[strlen];
strcpy(new_numstr, partial_start);
strcat(new_numstr, resorted);

if (start == (strlen-1))
append_to_list(list, new_numstr);

if (start < strlen)
append_permutations(list, start+1, strlen, new_numstr);

}

}



Here are the results:
Python: 38 Lines, takes about 36 seconds to execute, took about 8 hours to figure out / write
C: 110 Lines, takes about 9 seconds to execute, took about 8 hours to figure out / write

Summary:
The Python development time is only as long as the C time because I had to figure out the solution in Python and then implement it. In C, I only had to simply implement the solution I had already figured out in Python. Most of my C time was sucked up on me figuring out how malloc / realloc work. Certainly a good C programmer could have done this in much less time. However, the fact remains that the C solution takes almost 3 times as many lines of code as the Python solution. The cool thing is that the C solution executes almost 4 times as fast as the Python solution. So all of the stereotypical reasons for why someone would develop a solution in C or Python held true in this exercise. Python for developer speed, code brevity and understand-ability. C for execution speed and performance.