Merge sort
From Algorithmist
Merge sort is a sorting algorithm invented by John von Neumann based on the divide and conquer technique. It always runs in
time, but requires
space. The general concept is that we first break the list into two smaller lists of roughly the same size, and then use merge sort recursively on the subproblems, until they cannot subdivide anymore (i.e. when they contain zero or one elements). Then, we can merge by stepping through the lists in linear time. The recurrence is thus:
which solves to:
T(n) = Θ(nlogn)
Contents |
[edit] Pseudo-code
- a is a array containing n elements.
func mergesort( var a as array )
if ( n = 1 ) return a
var l1 as array = a[0] ... a[n/2]
var l2 as array = a[n/2+1] ... a[n]
l1 = mergesort( l1 )
l2 = mergesort( l2 )
return merge( l1, l2 )
end func
func merge( var a as array, var b as array )
var c as array
while ( a and b have elements )
if ( a[0] > b[0] )
add b[0] to the end of c
remove b[0] from b
else
add a[0] to the end of c
remove a[0] from a
while ( a has elements )
add a[0] to the end of c
remove a[0] from a
while ( b has elements )
add b[0] to the end of c
remove b[0] from b
return c
end func
[edit] Bottom-up merge sort
Bottom-up merge sort is a non-recursive variant of the merge sort, in which the array is sorted by a sequence of passes. During each pass, the array is divided into blocks of size
. (Initially,
). Every two adjacent blocks are merged (as in normal merge sort), and the next pass is made with a twice larger value of
.
In pseudo-code:
Input: array a[] indexed from 0 to n-1.
m = 1
while m <= n do
i = 0
while i < n-m do
merge subarrays a[i..i+m-1] and a[i+m .. min(i+2*m-1,n-1)] in-place.
i = i + 2 * m
m = m * 2
[edit] natural mergesort
For almost-sorted data on tape, a bottom-up "natural mergesort" variant of this algorithm is popular.
The bottom-up "natural mergesort" merges whatever "chunks" of in-order records are already in the data. In the worst case (reversed data), "natural mergesort" performs the same as the above -- it merges individual records into 2-record chunks, then 2-record chunks into 4-record chunks, etc. In the best case (already mostly-sorted data), "natural mergesort" merges large already-sorted chunks into even larger chunks, hopefully finishing in fewer than log n passes.
In pseudocode, the "natural mergesort" algorithm could look something like this:
# Original data is on the input tape; the other tapes are blank
function mergesort(input_tape, output_tape, scratch_tape_C, scratch_tape_D)
while any records remain on the input_tape
while any records remain on the input_tape
merge( input_tape, output_tape, scratch_tape_C)
merge( input_tape, output_tape, scratch_tape_D)
while any records remain on C or D
merge( scratch_tape_C, scratch_tape_D, output_tape)
merge( scratch_tape_C, scratch_tape_D, input_tape)
# take the next sorted chunk from the input tapes, and merge into the single given output_tape.
# tapes are scanned linearly.
# tape[next] gives the record currently under the read head of that tape.
# tape[current] gives the record previously under the read head of that tape.
# (Generally both tape[current] and tape[previous] are buffered in RAM ...)
function merge(left[], right[], output_tape[])
do
if left[current] ≤ right[current]
append left[current] to output_tape
read next record from left tape
else
append right[current] to output_tape
read next record from right tape
while left[current] < left[next] and right[current] < right[next]
if left[current] < left[next]
append current_left_record to output_tape
if right[current] < right[next]
append current_right_record to output_tape
return

