Bubblesort - it has a cool name, so it must be good... right?

Well, its been a while since I did any algorithm stuff, but I decided to give myself a refresher in sorting since, unlike Barack Obama, I still like Bubblesort (oops, did I just say that out loud?).  I like it because it has a cool name, and because it was the first sorting method I ever learned, and because for many years it was the only one I knew.  I like it because it was about the first program I wrote by myself at university (it used string manipulation to sort the characters of a string in Java, and I still recall how proud I was showing it to my lecturer - I also recall how he just smiled at my efforts).  I also like it because I deliberately snuck a C implementation into an embedded assignment recently just to see if anyone was watching - and I got full marks!  (Although to be fair it was an Engineering assignment, not a Computer Science one)... So anyway, whats not to like?

I also can't remember a lot of the Big-O complexity stuff I learned, and it might be good to get some of that back in the brain.  What I do remember is that O(1) is totally stunning, O(log n) is amazing, O(n) is exceptionally good, O(n log n) is pretty good, and O(n2) is not so flash.

Well, some simple timing on a simple Bubblesort implementation might be in order, just to prove to myself that a cool name is not enough to make it a cool algorithm.  I know I did this years ago, but I don't recall the nitty gritty details, so here we go again, just for fun Cool.

Just in case you don't know how it works, you simply go through the list of items you want to sort, checking pairs and swapping them if they are not right... so for a small list, bubblesort would sort it like this:

[5, 3, 7, 1] (This is the example list we feed to bubblesort)

 Start with the first pair: 5, 3.  They are out of order, so swap them

[3, 5, 7, 1]

Now move forward one: 5, 7.  They are in order, leave them

[3, 5, 7, 1]

Now move forward one: 7, 1.  They are out of order, so swap them

[3, 5, 1, 7]

We're at the end, start again because in the last pass we did a swap

Start with 3, 5. Its in order, move on

Next is 5, 1.  Out of order, swap them

Next is 5, 7.  In order, move on.

Start again because a swap was made in that pass

[3, 1, 5, 7] is how it now looks

3, 1 gets swapped

3, 5 doesn't

5, 7 doesn't

Start again because a swap was made in that pass

[1, 3, 5, 7] is how it now looks

1, 3 is good

3, 5 is good

5, 7 is good

We're at the end, no swaps were made in that pass.  Return the sorted list.

To analyse that a little bit more scientifically:

The worst case for Bubblesort is a reversed list.  On the first pass, the algorithm makes n-1 comparisons where n is the number of elements being sorted (if the list was [4,3,2,1] then it compares 4 with 3, 3 with 2, and finally 2 with 1).  On the second pass it makes... well, n-1 comparisons again.  And it keeps making n-1 comparisons until its completed a pass with no swaps.  So a reverse-ordered list, if n = 4, will make n*n-1 (4 * 3) comparisons.  Since we're interested in the big values, not the little ones we can simplify n*n-1 to just n*n - which is the same as saying n2.

The reason we can simplify it is that if you have 1000000 items, a runtime trying to do 1000000000000 comparisons is going to be much the same as doing 999999000000 comparisons - you're going to wait a while.  Subtracting one before you multiply has little real effect in the grand scheme of things.

In the worst implementation of Bubblesort it completes n passes every time to ensure its sorted everything - the version I've used below stops once its made a clean pass - meaning it could actually complete in n-1 comparisons if the list was sorted - so the best case for this implementation is O(n) - but best cases are like unicorns... and its the same as buying a Ferrari on credit when you have no money to pay for it because, best case, you'll win lotto this weekend.

So, knowing that, heres my test bubblesort code (not the fastest implementation, but it will suffice):

 

def bubblesort(my_list):
    flag = True
    while (flag):
        flag = False # Clear the swap-found flag
        for i in range(len(my_list)-1):
            if my_list[i] > my_list[i+1]:
                flag = True # We found a swap in this pass, raise the flag
                my_list[i], my_list[i+1] = my_list[i+1], my_list[i]
    return my_list

 

And if I time that on a short list of, lets say, 10 items - I get a grand total of 0.0 seconds (give or take a very small amount).

Pump that list up to 1000 items by making an unsorted list:

 

from random import sample
from time import time

random_list = sample(xrange(1000), 1000)
start = time()
print bubblesort(random_list)
print 'time taken = {0} seconds'.format(time()-start)

 

and I get around 0.13 seconds on my dev machine.  Still not too bad really.  Running it a few times gives a range of times from around 0.128s to 0.134s.

So lets add a 0 - how long will it take on 10,000 items... thats just making it 10 times bigger, right?  So surely it will be around 1.3 seconds... won't it?

Well, according to my timing, its more like 13 seconds.  So making it ten times bigger resulted in a time that's one hundred times longer!   So the complexity of Bubblesort, which is considered to be an n2 algorithm is probably right (that is (loosely), if we double the items, the increase in time will be squared).  Here is a plot of lists from 1000 to 11000 items:

Bubblesort Timing

As you can see, its not a straight line, its curving up - and it will continue to do so as the amount of data it needs to process gets larger and larger.  So while it is perfectly fine sorting a small data set, it just doesn't scale well... except for the aforementioned special case... the sorted list:

 

sorted_list = [x for x in range(10000)]
bubblesort(sorted_list)

 

Well, it takes 0.0 seconds, give or take a very small amount!  This is because all it has to do is race through the list once, making sure everything is in order.  The flag will never be raised, so it only makes n-1 comparisons - suddenly its an O(n) algorithm!  This is good... very good!...  But.  Its only good if it gets a sorted, or very-nearly sorted list.  And since this is a 'sorting' algorithm, not a 'checking of sorted lists' algorithm, its most likely going to be fed an unsorted list, meaning its average case is going to be O(n2).  Not good.

Oh well, I still like the name Tongue out

Stay tuned - I'm going to attack a better algorithm with a less cool name, just to see how it performs.