Sorting algorithms

Jimmy (xiaoke) Shen
5 min readFeb 15, 2020

--

Sorting is important and if is important without further explanation.

Nice visualizations can be found here.

Important sorting algorithms are:

O(N²) insertion, bubble, selection sort.

O(nlogn) algorithms are quick, merge, heap.

If algorithms are based on the swap, the complexity will be more than O(nlogn), however, we do have some O(n+k) algorithms in the special condition such as bucket sort.

More content can be found from Wikipedia.

In terms of best-case complexity:

For average O(N²) sorting algorithms, bubble sort and insertion sort in the best case complexity is O(n). However, the selection sort in the best case is still O(n²).

In terms of stable or not:

Bubble and insertion algorithms are stable, while selection sort is not stable.

Merge sort is stable, quicksort can be stable, heap sort is not stable.

Why stable sort and when do we need stable sort? The answers can be found here:

https://www.drdobbs.com/cpp/why-stable-sorting-is-important/231001864

A problem related to sorting can be found here

179. Largest Number

Solution

A reference solution can be found HERE.

215. Kth Largest Element in an Array

Different Python2 solution from HERE by Cai Kehe

# O(nlgn) time
def findKthLargest1(self, nums, k):
return sorted(nums, reverse=True)[k-1]

# O(nk) time, bubble sort idea, TLE
def findKthLargest2(self, nums, k):
for i in xrange(k):
for j in xrange(len(nums)-i-1):
if nums[j] > nums[j+1]:
# exchange elements, time consuming
nums[j], nums[j+1] = nums[j+1], nums[j]
return nums[len(nums)-k]

# O(nk) time, selection sort idea
def findKthLargest3(self, nums, k):
for i in xrange(len(nums), len(nums)-k, -1):
tmp = 0
for j in xrange(i):
if nums[j] > nums[tmp]:
tmp = j
nums[tmp], nums[i-1] = nums[i-1], nums[tmp]
return nums[len(nums)-k]

# O(k+(n-k)lgk) time, min-heap
def findKthLargest4(self, nums, k):
heap = []
for num in nums:
heapq.heappush(heap, num)
for _ in xrange(len(nums)-k):
heapq.heappop(heap)
return heapq.heappop(heap)

# O(k+(n-k)lgk) time, min-heap
def findKthLargest5(self, nums, k):
return heapq.nlargest(k, nums)[k-1]

# O(n) time, quick selection
def findKthLargest(self, nums, k):
# convert the kth largest to smallest
return self.findKthSmallest(nums, len(nums)+1-k)

def findKthSmallest(self, nums, k):
if nums:
pos = self.partition(nums, 0, len(nums)-1)
if k > pos+1:
return self.findKthSmallest(nums[pos+1:], k-pos-1)
elif k < pos+1:
return self.findKthSmallest(nums[:pos], k)
else:
return nums[pos]

# choose the right-most element as pivot
def partition(self, nums, l, r):
low = l
while l < r:
if nums[l] < nums[r]:
nums[l], nums[low] = nums[low], nums[l]
low += 1
l += 1
nums[low], nums[r] = nums[r], nums[low]
return low

Quickselect

Similar to quicksort, the quick select algorithm can be used here to solve the problem with the complexity of:

Quickselect uses the same overall approach as quicksort, choosing one element as a pivot and partitioning the data in two based on the pivot, accordingly as less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side — the side with the element it is searching for. This reduces the average complexity from O(n log n) to O(n), with a worst case of O(n^2). -wikipedia

Why quick select is O(n) if we have a perfect pivot chosen?

Because, if it is perfectly chosen, the data size will follow a 1,1/2, 1/4, …pattern. The sum of this geometric series is 1+1 = 2. So the complexity is O(2n)=O(n).

1/2+1/4+… = 1

When will we get the worst-case complexity for quickselect algorithm

Like quicksort, the quickselect has good average performance, but is sensitive to the pivot that is chosen. If good pivots are chosen, meaning ones that consistently decrease the search set by a given fraction, then the search set decreases in size exponentially and by induction (or summing the geometric series) one sees that performance is linear, as each step is linear and the overall time is a constant times this (depending on how quickly the search set reduces). However, if bad pivots are consistently chosen, such as decreasing by only a single element each time, then worst-case performance is quadratic: O(n2). This occurs for example in searching for the maximum element of a set, using the first element as the pivot, and having sorted data. — wikipedia

Pseudocode from Wikipedia

function partition(list, left, right, pivotIndex) is
pivotValue := list[pivotIndex]
swap list[pivotIndex] and list[right] // Move pivot to end
storeIndex := left
for i from left to right − 1 do
if list[i] < pivotValue then
swap list[storeIndex] and list[i]
increment storeIndex
swap list[right] and list[storeIndex] // Move pivot to its final place
return storeIndex
// Returns the k-th smallest element of list within left..right inclusive
// (i.e. left <= k <= right).
function select(list, left, right, k) is
if left = right then // If the list contains only one element,
return list[left] // return that element
pivotIndex := ... // select a pivotIndex between left and right,
// e.g., left + floor(rand() % (right − left + 1))
pivotIndex := partition(list, left, right, pivotIndex)
// The pivot is in its final sorted position
if k = pivotIndex then
return list[k]
else if k < pivotIndex then
return select(list, left, pivotIndex − 1, k)
else
return select(list, pivotIndex + 1, right, k)

Direct translate pseudocode to C++ code

A direct translated C++ code from the pseudocode of the [quick select](https://en.wikipedia.org/wiki/Quickselect) algorithm mentioned from Wikipedia.
It is pretty fast and the run time is

/*
Runtime: 16 ms, faster than 95.11% of C++ online submissions for Kth Largest Element in an Array.
Memory Usage: 10 MB, less than 96.23% of C++ online submissions for Kth Largest Element in an Array.
*/

class Solution {
public:
int partition(vector<int>& list, int left, int right,int pivot_index)
{
auto pivot_value = list[pivot_index];
swap(list[pivot_index], list[right]);
auto store_index = left;
for(int i=left; i<=right;++i)
{
if (list[i] < pivot_value)
{
swap(list[i], list[store_index++]);
}
}
swap(list[store_index], list[right]);
return store_index;
}

int select(vector<int>& list, int left, int right,int k)
{
if (left==right)return list[left];
auto pivot_index = left + rand()%(right - left + 1);
pivot_index = partition(list, left, right, pivot_index);
if (k==pivot_index)return list[pivot_index];
else if(k<pivot_index)
{
return select(list, left, pivot_index-1, k);
}
else
{
return select(list, pivot_index+1, right, k);
}

}
int findKthLargest(vector<int>& nums, int k)
{
return select(nums, 0, nums.size()-1, nums.size()-k);
}
};

148. Sort List

A nice video about this problem can be found in [2]

Reference

[1]https://v.douyu.com/show/za4Jj7llk1B7Dk01

[2] https://youtu.be/M1TwY0nsTZA

--

--

No responses yet