Counting sort

From Algorithmist

The counting sort algorithm is not based on comparisons, unlike most other sorting methods, so its running time is not subject to the Ω(n log n) lower bound that applies to all comparison sorts. It is used when all elements to be sorted fall in a known, finite and reasonably small range.

Before we discuss the counting sort, let's describe a slightly simpler sort: the rapid sort.

Rapid Sort

"Rapid sort only sorts keys. In other words, items to be sorted consist solely of the key; there is no additional data in items." -- NIST

For example, we might need to sort a billion numbers which all lie between 1 and 1000. In this case, even the best comparison sort would take O(n log n) time, which is rather slow for so many inputs. Instead, we can create an array with 1000 elements, one per possible value. As we read in each number, we increment its corresponding array element by 1. When all inputs have been read in, we iterate through the 1000 elements in the array and print each number as many times as it was counted, which yields the numbers in order.

In the following pseudocode, there are N input numbers which all lie between 0 and (K-1), inclusive.

for i in 0 to (K - 1)
    counts[i] = 0

for each input number n
    counts[n] = counts[n] + 1

for i in 0 to (K - 1)
    assert( 0 <= counts[i] )
    for j in 1 to counts[i] // runs exactly counts[i] times
         output i

The time complexity is as follows. It takes O(K) time to initialize the array, O(N) time to read in the numbers and increment the appropriate element of counts, and another O(N+K) time to scan the counts and output the sorted list, for a total runtime of O(N+K).
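The rapid sort pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `rapid_sort` and its parameters `numbers` and `k` are assumptions made for this example, and it assumes every input is an integer between 0 and k-1.

```python
def rapid_sort(numbers, k):
    """Sort integers in the range 0..k-1 by counting how often each occurs."""
    counts = [0] * k                  # O(K): one counter per possible key
    for n in numbers:                 # O(N): first and only pass over the input
        counts[n] += 1
    output = []
    for value in range(k):            # O(N + K): emit each key counts[value] times
        output.extend([value] * counts[value])
    return output


print(rapid_sort([3, 0, 2, 3, 1, 0], 4))  # [0, 0, 1, 2, 3, 3]
```

Because the output is rebuilt purely from the counts, this approach only works when the items carry no data besides the key.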

Counting sort

The initial setup and first pass through the data in the counting sort are exactly the same as the rapid sort. However, the counting sort also keeps track of additional data associated with each item.

The counting sort is a stable sort. If, for example, we have a list of students sorted by last name and then use a counting sort to sort them by grade (A, B, C, D, F), the final list will be sorted primarily by grade, but all the students who got the same grade will still be sorted by last name.

In the following pseudocode, there are N students, each of whom has a grade between 0 and (K-1), inclusive.

for i in 0 to (K - 1)
    counts[i] = 0

for each input student s
    counts[s.grade] = counts[s.grade] + 1

// up to this line, it is identical to the rapid sort.

sum = 0
for i in 0 to (K - 1)
    sum = sum + counts[i] // accumulate sum
    counts[i] = sum // convert counts[] array to a "cumulative sum"

// second pass through the original data, from last to first to keep the sort stable
for each input student s starting from the last student
    assert( 0 < counts[s.grade] )
    counts[s.grade] = counts[s.grade] - 1
    assert( 0 <= counts[s.grade] )
    index = counts[s.grade] // 0-based position of s in the output
    sorted_students[index] = s


The time complexity is as follows. It takes O(K) time to initialize the array, O(N) time to read in the students and increment the appropriate element of counts, another O(K) to create the cumulative sum array, and another O(N) time to scan through the list of students again, for a total runtime of O(N+K).
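A Python sketch of the counting sort may help show how the cumulative sums place each record and why the sort is stable. The `Student` record type, the function name `counting_sort`, and the sample roster are hypothetical, introduced only for illustration. Since Python lists are 0-based, the sketch decrements the cumulative count before using it as an index.

```python
from collections import namedtuple

# Hypothetical record type: grade is an integer between 0 and k-1.
Student = namedtuple("Student", ["name", "grade"])


def counting_sort(students, k):
    """Stable sort of students by grade, where grades lie in 0..k-1."""
    counts = [0] * k
    for s in students:                    # first pass: count each grade
        counts[s.grade] += 1

    total = 0
    for i in range(k):                    # turn counts into cumulative sums
        total += counts[i]
        counts[i] = total                 # number of students with grade <= i

    sorted_students = [None] * len(students)
    for s in reversed(students):          # second pass, last student first
        counts[s.grade] -= 1              # decrement first: 0-based slot
        sorted_students[counts[s.grade]] = s
    return sorted_students


# Input is already ordered by last name.
roster = [Student("Adams", 1), Student("Baker", 0),
          Student("Clark", 1), Student("Davis", 0)]
result = counting_sort(roster, 2)
print([s.name for s in result])  # ['Baker', 'Davis', 'Adams', 'Clark']
```

Within each grade the names remain in their original alphabetical order, which is the stability property described above. Walking the input from the last student backwards is what preserves that order, because each grade's slots are filled from the highest index down.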