Introduction
Sorting is the process of arranging a collection of items (numbers, names, etc.) into a specific order, like ascending or descending.
Comparison-Based Sorting Algorithms
These algorithms sort elements by comparing them pairwise. The order of elements is determined solely based on these comparisons.
1. Bubble Sort
- How it works: Imagine bubbles rising in water. The heavier (larger) elements "sink" to the bottom of the list while the lighter (smaller) ones "bubble" to the top. It works by repeatedly stepping through the list, comparing adjacent elements and swapping them if they are in the wrong order.
- Analogy: Picture a line of people where each person compares their height with the one next to them. If they are taller than the next one, they swap places.
- Process:
- Start at the beginning of the list.
- Compare the first two elements. If the first is bigger, swap them.
- Move to the next pair, compare and swap if needed.
- Keep going until the end of the list, doing one sweep.
- Repeat the sweeps until no swaps happen.
- Time Complexity:
- Best Case: O(n) - when the list is already sorted
- Average Case: O(n²) - when the list is unsorted or only partially sorted
- Worst Case: O(n²) - when the list is reverse sorted
- Space Complexity: O(1)
```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        swapped = False
        # Each pass sinks the largest remaining element to the end,
        # so the sweep can stop i positions earlier every time.
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        # A pass with no swaps means the list is sorted; stopping here
        # is what gives the O(n) best case on already-sorted input.
        if not swapped:
            break
```
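A quick usage note: bubble_sort sorts the list in place rather than returning a new one.

```python
data = [5, 1, 4, 2, 8]
bubble_sort(data)
print(data)  # [1, 2, 4, 5, 8]
```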
2. Insertion Sort
- How it works: Think of how you might sort a hand of cards. You pick one card and insert it into its correct place in the already sorted cards. In the context of the array, we assume that the left part of the array is always sorted and we are inserting each new element into the sorted portion.
- Analogy: Like sorting cards in your hand. You take one card at a time and put it into the right position in the sorted part of your hand.
- Process:
- Assume the first element is sorted.
- Take the next element and compare it with the previous ones (move to the left).
- Insert the current element into the right spot among the already sorted elements.
- Repeat this process for each remaining element.
- Time Complexity:
- Best Case: O(n) - when the list is already sorted
- Average Case: O(n²) - when the list is unsorted or only partially sorted
- Worst Case: O(n²) - when the list is reverse sorted
- Space Complexity: O(1)
```python
def insertion_sort(arr):
    n = len(arr)
    for i in range(1, n):
        key = arr[i]  # the element to insert into the sorted prefix
        j = i - 1
        # Shift elements larger than key one slot to the right.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key  # drop key into its correct spot
```
3. Selection Sort
- How it works: Imagine you are trying to find the smallest item from a pile, put it to the front, and then repeat with the remaining items. It works by repeatedly finding the minimum element from the unsorted part of the list and placing it at the beginning of the sorted part.
- Analogy: Find the smallest person, place them in front of the line, then repeat to sort the remaining people.
- Process:
- Find the smallest element in the unsorted list.
- Swap it with the first element of the unsorted list.
- Repeat for the remaining unsorted elements, each time finding the smallest and swapping it into the first unsorted position.
- Time Complexity:
- Best Case: O(n²)
- Average Case: O(n²)
- Worst Case: O(n²) - performance doesn't change much even if the list is partially sorted.
- Space Complexity: O(1)
```python
def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        # Find the smallest element in the unsorted part arr[i:].
        min_index = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        # Swap it into the first unsorted position.
        arr[i], arr[min_index] = arr[min_index], arr[i]
```
4. Merge Sort
- How it works: This algorithm is based on the "divide and conquer" principle. You keep splitting the list into halves until you get to individual elements. Then you merge those sorted parts together until you have the final sorted list.
- Analogy: Imagine sorting a deck of cards. You divide the deck in half, then in halves again, keep doing it until you have individual cards (which are already sorted). Then, you combine these individual cards to form sorted groups until you have a single sorted deck.
- Process:
- Recursively divide the list into two halves until you have lists of size 1.
- Merge the sorted halves into new, sorted lists.
- Repeat until you have one fully sorted list.
- Time Complexity:
- Best Case: O(n log n)
- Average Case: O(n log n)
- Worst Case: O(n log n)
- Space Complexity: O(n) - for the temporary lists created during merging
```python
def merge_sort(arr):
    # Base case: a list of 0 or 1 elements is already sorted.
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left_arr = merge_sort(arr[:mid])
    right_arr = merge_sort(arr[mid:])
    return merge(left_arr, right_arr)

def merge(left, right):
    # Merge two sorted lists into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side may still have leftovers; they are already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```
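Note that, unlike the in-place sorts above, this implementation returns a new sorted list and leaves the input untouched, so the result has to be captured:

```python
data = [5, 1, 4, 2, 8]
data = merge_sort(data)  # reassign; the original list is not modified
print(data)  # [1, 2, 4, 5, 8]
```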
5. Quick Sort
- How it works: It's another "divide and conquer" method, but instead of always dividing in half, it chooses a "pivot" element and partitions the list around that pivot. It then sorts the sublists recursively.
- Analogy: Imagine picking someone as a "leader". Everyone smaller stands on one side, and everyone larger stands on the other side. Repeat the process on each side using a new leader each time until sorted.
- Process:
- Choose a pivot element.
- Partition the list into two sublists, smaller and larger than pivot.
- Recursively sort each sublist using quicksort.
- Time Complexity:
- Best Case: O(n log n)
- Average Case: O(n log n)
- Worst Case: O(n²) - occurs if the pivot is always the smallest or largest element. This can be avoided if the pivot is chosen randomly or with a different method (see the sketch after the code).
- Space Complexity: O(log n) on average, due to the recursion stack; can be O(n) in the worst case
```python
def quick_sort(arr, low, high):
    # Call as quick_sort(arr, 0, len(arr) - 1).
    if low < high:
        pivot_index = partition(arr, low, high)
        quick_sort(arr, low, pivot_index - 1)   # sort the left side
        quick_sort(arr, pivot_index + 1, high)  # sort the right side

def partition(arr, low, high):
    # Lomuto partition scheme: the last element is the pivot.
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Place the pivot between the smaller and larger elements.
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1
```
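To make the O(n²) worst case unlikely in practice, the pivot can be chosen at random before partitioning. A minimal sketch (randomized_partition is our own helper name, not part of the original code):

```python
import random

def randomized_partition(arr, low, high):
    # Swap a randomly chosen element into the pivot slot, then
    # partition exactly as before.
    rand_index = random.randint(low, high)
    arr[rand_index], arr[high] = arr[high], arr[rand_index]
    return partition(arr, low, high)
```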
6. Heap Sort
- How it works: This uses a special tree-based data structure called a "heap." A max-heap ensures the largest element is always at the root, which makes sorting efficient. It repeatedly extracts the largest element from the heap.
- Analogy: Imagine a priority queue or a competition where the "winner" (largest element) is always at the top, then we remove them and sort the rest.
- Process:
- Build a max-heap from the list.
- Swap the root (largest element) with the last element.
- Reduce the heap size and "heapify" the remaining heap to maintain the max-heap property.
- Repeat from step 2 until the heap is empty.
- Time Complexity:
- Best Case: O(n log n)
- Average Case: O(n log n)
- Worst Case: O(n log n)
- Space Complexity: O(1) - the heap is built inside the input array
```python
def heap_sort(arr):
    n = len(arr)
    # Build a max-heap by heapifying every internal node, bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Repeatedly move the root (the maximum) to the end, shrink the
    # heap by one, and restore the heap property.
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        heapify(arr, i, 0)

def heapify(arr, n, i):
    # Sift the element at index i down until the max-heap property holds.
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)
```
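In everyday Python, the standard library's heapq module (a min-heap) gives the same effect with much less code. A minimal sketch (heap_sort_with_heapq is our own name):

```python
import heapq

def heap_sort_with_heapq(arr):
    # heapq maintains a min-heap, so popping repeatedly yields
    # the elements in ascending order. Note: this consumes arr.
    heapq.heapify(arr)
    return [heapq.heappop(arr) for _ in range(len(arr))]
```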
7. Shell Sort
- How it works: Shell Sort is an improvement over Insertion Sort. It sorts elements that are far apart from each other and gradually reduces the gap between elements to be compared. It uses a gap sequence that gets smaller over time, eventually behaving like insertion sort.
- Analogy: Like sorting a deck of cards where you initially compare and sort cards that are several positions apart, and gradually reduce the spacing.
- Time Complexity: Depends on the gap sequence. The simple halving sequence used below is O(n²) in the worst case, but sequences such as Knuth's can achieve O(n^(3/2)) or better in certain cases.
- Space Complexity: O(1)
```python
def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # Gapped insertion sort: compare elements gap positions apart.
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        # Halve the gap; the final pass (gap = 1) is plain insertion sort.
        gap //= 2
```
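Other gap sequences usually beat simple halving. A sketch using Knuth's sequence (1, 4, 13, 40, ..., where each gap is 3 × the previous + 1), which is known to give roughly O(n^(3/2)) behavior; the function name is our own:

```python
def shell_sort_knuth(arr):
    n = len(arr)
    # Build the largest Knuth gap that fits: 1, 4, 13, 40, ...
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 3  # step back down through the sequence: 40, 13, 4, 1
```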
Non-Comparison Sorting Algorithms
These algorithms sort elements without comparing them to each other directly. They make use of other properties of the elements, like their values.
8. Counting Sort
- How it works: Counting Sort is used to sort integers within a specific range. It counts the occurrences of each integer, and then uses the counts to place the elements in their correct sorted positions.
- Analogy: Imagine counting the number of each colored ball, and then rearranging them based on their count to produce an ordered list of colors.
- Time Complexity: O(n + k), where n is the number of elements and k is the range of the input.
- Space Complexity: O(n + k)
```python
def counting_sort(arr):
    # Assumes non-negative integers.
    max_val = max(arr)
    count = [0] * (max_val + 1)
    for num in arr:
        count[num] += 1  # tally how many times each value occurs
    sorted_arr = []
    for i in range(len(count)):
        sorted_arr.extend([i] * count[i])  # emit value i count[i] times
    return sorted_arr
```
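The version above assumes non-negative integers. One way to handle negative values is to shift everything by the minimum; a minimal sketch (counting_sort_with_offset is our own name):

```python
def counting_sort_with_offset(arr):
    # Shift values so the smallest one maps to index 0.
    min_val = min(arr)
    max_val = max(arr)
    count = [0] * (max_val - min_val + 1)
    for num in arr:
        count[num - min_val] += 1
    sorted_arr = []
    for i, c in enumerate(count):
        sorted_arr.extend([i + min_val] * c)  # shift back on output
    return sorted_arr
```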
9. Radix Sort
- How it works: Radix Sort sorts integers digit by digit. It sorts the elements based on their least significant digit first, and then sorts based on more significant digits using a stable sorting algorithm (like Counting Sort).
- Analogy: Imagine sorting a pile of papers by the last digit of the number written on each, then by the second-to-last digit, and so on, until the numbers are fully ordered.
- Time Complexity: O(k · n), where n is the number of elements and k is the number of digits in the largest number.
- Space Complexity: O(n)
```python
def radix_sort(arr):
    # Assumes non-negative integers; processes one decimal digit per pass.
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort_radix(arr, exp)
        exp *= 10

def counting_sort_radix(arr, exp):
    # Stable counting sort keyed on the digit selected by exp (1, 10, 100, ...).
    n = len(arr)
    output = [0] * n
    count = [0] * 10
    for i in range(n):
        index = arr[i] // exp
        count[index % 10] += 1
    # Prefix sums turn digit counts into final positions.
    for i in range(1, 10):
        count[i] += count[i - 1]
    # Walk backwards so equal digits keep their order (stability).
    i = n - 1
    while i >= 0:
        index = arr[i] // exp
        output[count[index % 10] - 1] = arr[i]
        count[index % 10] -= 1
        i -= 1
    for i in range(n):
        arr[i] = output[i]
```
10. Bucket Sort
- How it works: Bucket Sort distributes the elements of a list into a number of buckets. Each bucket is then sorted individually (using insertion sort or another suitable algorithm), and the buckets are concatenated. It works well when the input is uniformly distributed.
- Analogy: Imagine sorting 100 numbered cards by dealing them into 10 buckets by range (1-10, 11-20, etc.); each bucket is sorted on its own and the buckets are then put together.
- Time Complexity: O(n) on average when the input is uniformly distributed; otherwise it can be O(n²) in the worst case, where many elements end up in one bucket.
- Space Complexity: O(n)
```python
def bucket_sort(arr):
    n = len(arr)
    if n == 0:
        return arr
    max_val = max(arr)
    min_val = min(arr)
    if max_val == min_val:
        return arr  # all elements equal; also avoids dividing by zero below
    bucket_count = n
    bucket_range = (max_val - min_val) / bucket_count
    buckets = [[] for _ in range(bucket_count)]
    for x in arr:
        # Map each value to a bucket; clamp the maximum into the last one.
        index = int((x - min_val) / bucket_range)
        if index >= bucket_count:
            index = bucket_count - 1
        buckets[index].append(x)
    sorted_arr = []
    for bucket in buckets:
        insertion_sort(bucket)  # reuses the insertion_sort defined earlier
        sorted_arr.extend(bucket)
    return sorted_arr
```
Sorting Lower Bounds
- Comparison-Based Sorts: Comparison-based sorting algorithms have a theoretical lower bound of Ω(n log n) comparisons in the worst case. This means that no comparison-based algorithm can guarantee sorting every possible input list in fewer than on the order of n log n comparisons. Merge Sort and Heap Sort achieve this lower bound.
- Non-Comparison Sorts: Non-comparison sorting algorithms can beat the Ω(n log n) bound in certain cases because they don't rely on pairwise comparisons. Counting Sort, Radix Sort, and Bucket Sort can achieve linear time complexity under the right conditions. However, they place requirements on the data being sorted (e.g., Radix Sort is for integers, Counting Sort needs a bounded range).
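Where the bound comes from (a brief sketch of the standard decision-tree argument): a comparison sort must distinguish between all n! possible orderings of its input, so, viewed as a binary decision tree, it needs at least n! leaves and therefore a height of at least log₂(n!). Since n! ≥ (n/2)^(n/2), we get log₂(n!) ≥ (n/2) · log₂(n/2), which is Ω(n log n).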
Summary Table
| Algorithm | Type | Best Case | Average Case | Worst Case | Space Complexity | Stable | In-Place |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Bubble Sort | Comparison-Based | O(n) | O(n²) | O(n²) | O(1) | Yes | Yes |
| Insertion Sort | Comparison-Based | O(n) | O(n²) | O(n²) | O(1) | Yes | Yes |
| Selection Sort | Comparison-Based | O(n²) | O(n²) | O(n²) | O(1) | No | Yes |
| Merge Sort | Comparison-Based | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes | No |
| Quick Sort | Comparison-Based | O(n log n) | O(n log n) | O(n²) | O(log n) | No | Yes |
| Heap Sort | Comparison-Based | O(n log n) | O(n log n) | O(n log n) | O(1) | No | Yes |
| Shell Sort | Comparison-Based | O(n log n) | depends on gap sequence | O(n²) | O(1) | No | Yes |
| Counting Sort | Non-Comparison | O(n + k) | O(n + k) | O(n + k) | O(n + k) | Yes | No |
| Radix Sort | Non-Comparison | O(k · n) | O(k · n) | O(k · n) | O(n) | Yes | No |
| Bucket Sort | Non-Comparison | O(n) | O(n) | O(n²) | O(n) | Yes | No |
Key Definitions:
- Time Complexity: Represents how the algorithm's runtime scales with the input size.
- Space Complexity: Represents how the algorithm's memory usage scales with the input size.
- Stable: A sorting algorithm is stable if elements with the same value maintain their relative order in the sorted output (a short example follows this list).
- In-place: An algorithm is in-place if it uses only a constant amount of extra memory, regardless of the input size (excluding the space for storing the input itself). In the table, Yes means only O(1) extra space is used, and No means more than O(1).
- n: Represents the number of elements to be sorted.
- k: Represents the range of the input values for non-comparison sorts (e.g., the maximum value for Counting Sort, the number of digits in Radix Sort).
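To illustrate stability, here is a small example using Python's built-in sorted(), which is guaranteed stable:

```python
# Sort (name, count) records by count; ties keep their original order.
records = [("apple", 3), ("banana", 1), ("cherry", 3), ("date", 1)]
by_count = sorted(records, key=lambda r: r[1])
print(by_count)
# [('banana', 1), ('date', 1), ('apple', 3), ('cherry', 3)]
# 'banana' stays before 'date' and 'apple' before 'cherry', as in the input.
```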
Important Considerations:
- Best, Average, and Worst Cases: The time complexity can differ based on the input data. Best-case is the ideal, worst-case is the most inefficient scenario, and average-case is the typical performance of an algorithm.
- Practical Usage:
- Simple Sorts (Bubble, Insertion, Selection): Good for small lists or nearly sorted data due to simplicity but inefficient for large datasets.
- Efficient Sorts (Merge, Quick, Heap): These provide better performance for large datasets. They have a time complexity of O(n log n) (Merge Sort, Heap Sort) or an average of O(n log n) for Quick Sort.
- Shell Sort: Can be a good compromise: an in-place sort that is usually much faster than the simple quadratic sorts.
- Non-Comparison Sorts: Highly efficient under certain conditions, particularly for integer data (Counting, Radix). Bucket Sort is efficient for uniformly distributed data. A rough timing sketch using the implementations above follows.
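To get a concrete feel for these differences, here is a rough benchmarking sketch built on the implementations above (time_sort is our own helper; absolute timings will vary by machine):

```python
import random
import time

def time_sort(sort_fn, data):
    # Time one run on a copy, so every algorithm sees the same input.
    arr = list(data)
    start = time.perf_counter()
    sort_fn(arr)
    return time.perf_counter() - start

data = [random.randint(0, 10_000) for _ in range(5_000)]
for name, fn in [("bubble_sort", bubble_sort),
                 ("insertion_sort", insertion_sort),
                 ("merge_sort", merge_sort),
                 ("heap_sort", heap_sort)]:
    print(f"{name}: {time_sort(fn, data):.4f}s")
```

On inputs of this size the quadratic sorts should already be visibly slower than the O(n log n) ones.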