W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion)

Video: https://youtu.be/gR6nycuZKlM

#CS50x #C #Computer_Science #Algorithms

Read more here
Related: W0 - 👨‍🏫 Lecture - Computer Science from Scratch

🔙 Previous Part | Next Part 🔜

↩️ Go Back

Table of Contents:

↩️ Go Back


A) Lecture Goals: Big O Notation and Algorithm Efficiency 🎯

This week builds on the concepts introduced in previous weeks, with a focus on algorithm efficiency and the use of Big O notation to describe the running time of algorithms.

Practical Implications:

This week's focus is on applying these concepts to design more efficient search and sorting algorithms while developing a deeper understanding of how computers process data.

Recap: Debug Tools

Below is a summary of Debugging Tools we have learnt so far:

  1. Help50: Assists with interpreting cryptic compiler error messages.
  2. Style50: Checks the aesthetic quality of your code.
  3. Check50: Tests your code for correctness against problem specifications.
  4. printf: Outputs data to help debug by inspecting variable states.
  5. Debug50: An interactive debugger that allows you to:
    • View the stack of functions called during execution.
    • Inspect local variables during runtime.
  6. Duck Debugger: A playful tool to help articulate problems by explaining them out loud.

A.1) Recap: Arrays and Their Limitations ⚠️

Arrays allow for contiguous memory storage of elements, simplifying data management.

Limitations of arrays include:


B) Recap: "Search Problems" and their algorithms 📖🔍

Remember: [[W0 - 👨‍🏫 Lecture - Computer Science from Scratch]]
A search problem involves finding a value within a dataset, like a phone book or a webpage list.

The goal is to return:


C) Intro to Algorithm Efficiency: Running Time and its Notations ⏱️⚙️

[[Running Time]]: Describes how long an algorithm takes to execute, often measured in:

C.1) Big O Notation: The Worst-Case Running Time (The Upper Bound) 😱⚠️

[[Big O Notation]]: The Upper Bound on Running Time

Big O Definition: It describes the worst-case running time of an algorithm. It is the pessimistic estimate of how long the algorithm will take to solve the problem.

Week Zero Algorithm Recap with [[Big O]]

  1. Linear Search (checking each element sequentially):
    • Its possible worst-case execution: O(n). (terrible)
  2. Double-Speed Search (skipping every second page in a phone book):
    • Its possible worst-case execution: O(n/2).
    • Simplified to O(n) because constants are ignored. (still terrible)
  3. [[Binary Search]] (divide and conquer):
    • Its possible worst-case execution: O(log n). (good!)


C.1.1) Simplifying Big O (ignore constants and non-dominant terms)

Ignore constant factors. When n (the size of the problem) grows to infinity, the constants become irrelevant:

Also, focus only on dominant terms,
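For instance, both simplification rules at work (a worked example of my own, not from the lecture):

```latex
\begin{aligned}
O\!\left(\tfrac{n}{2}\right) &= O(n) && \text{(constant factor dropped)} \\
O(n^2 + 100n + 5) &= O(n^2) && \text{(only the dominant term kept)}
\end{aligned}
```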

Read more here: [[Big O Notation by Colt Steele]]


C.1.2) Algorithm Categories by Running Time (using Big O)

  1. O(1): Constant time, regardless of input size.
  2. O(log n): Logarithmic, efficient for large datasets.
  3. O(n): Linear, proportional to the size of the input.
  4. O(n log n): Log-linear, common in efficient sorting algorithms.
  5. O(n²): Quadratic, can become impractical with large inputs.

Common Examples with Algorithms:


C.2) Omega Notation: The Best-Case Running Time (The Lower Bound) 😄❗

Omega (Ω): the opposite of [[Big O]]; [[Omega Notation]] represents the Lower Bound.

It describes the best-case running time of an algorithm. We can use the same formulas in n, but keep in mind that they now describe the best-case scenario.

Example:

C.3) Theta Notation: The Tight Bound (the upper and lower bounds are the same) ⚖️

Theta (Θ): Represents the tight bound on an algorithm's running time.
Theta notation is used when the upper bound (Big O) and the lower bound (Omega) are the same.

In other words, Θ gives a more precise measurement of an algorithm's running time—it describes the exact growth rate of the running time as the input size n increases.

Definition of Θ:
An algorithm is Θ(f(n)) if its running time is:

This can be expressed mathematically as:
c₁ * f(n) ≤ T(n) ≤ c₂ * f(n) for all n ≥ n₀, where c₁, c₂ > 0.
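For instance (the constants here are my own): if an algorithm's running time is T(n) = 3n + 2, then

```latex
3n \;\le\; 3n + 2 \;\le\; 5n \quad \text{for all } n \ge 1
```

so the definition holds with c₁ = 3, c₂ = 5, n₀ = 1, and therefore T(n) = Θ(n).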

Examples of Θ:

  1. Θ(1): The algorithm always runs in constant time, regardless of input size.
    • Example: Accessing a specific array element by index.
  2. Θ(log n): The algorithm runs logarithmically.
    • Example: Binary search.
  3. Θ(n): The algorithm runs linearly with the size of the input.
    • Example: Iterating through an array.
  4. Θ(n²): The algorithm has quadratic growth.
    • Example: Nested loops over the same dataset.

C.4) Note: Comparing Big O, Ω, and Θ

To summarize:

Example Comparison:

By understanding Θ, you gain clarity about when an algorithm consistently performs in a predictable time frame for all inputs, unlike Big O or Ω, which only describe one side of the spectrum.

Read more in: [[Big O Notation by Colt Steele]]



______________________________________________________________



D) Linear Search: In-Depth Explanation (Random Dataset) 🎲🔍

Linear search is a straightforward algorithm for finding a specific value in a dataset, such as searching for a number in an array.

Below is a detailed explanation of the process:

Conceptual Overview:
Linear search involves inspecting each element of a dataset sequentially until the desired value is found or all elements have been checked.

  1. Starting Point:
    • Begin at the first element in the dataset.
  2. Step-by-Step Inspection:
    • Check if the current element matches the desired value.
    • If it matches, the search is successful, and the algorithm ends.
    • If it doesn't match, move to the next element.
  3. Exhaustion:
    • If all elements are checked and no match is found, the search fails.

D.1) Example: Searching Behind Doors 🚪

In the provided example, the goal is to find the number 0 hidden behind one of the seven doors.

Since the numbers are randomly arranged, a systematic approach like [[linear search]] is appropriate:

  1. Start from the Left:
    • Check the first door. If the number isn’t 0, move to the next door.

  2. Continue Sequentially:
    • Check each subsequent door one at a time.

  3. Find or Exhaust:
    • If 0 is found, the search ends successfully.
    • If all doors are checked and 0 isn’t found, return a failure result.


Here’s how linear search can be expressed in pseudocode:

for i from 0 to n - 1:
    if number is behind the i-th door:
        return true  # Found the number
return false  # Number not found after checking all doors

D.3.1) Worst-Case Scenario: O(n) 😱⚠️ 🐢

Big O Definition: The worst-case running time is the maximum number of steps the algorithm might take.

Example:

Result: The running time in this case is proportional to n, denoted as O(n).

D.3.2) Best-Case Scenario: Ω(1) 😄❗ 🚀

Example:

Result: The running time in this case is constant, denoted as Ω(1).


  1. Big-O Notation (O(n)): the algorithm’s worst-case running time.

    • Linear search requires n steps in the worst case, when the desired value is at the end of the dataset or not present at all.
  2. Omega Notation (Ω(1)): the algorithm’s best-case running time.

    • In the best case, the value is found in the first step, requiring constant time.
  3. Range of Efficiency:

    • [[Linear search]] performance ranges from Ω(1) (best-case) to O(n) (worst-case).

Practical Implications of [[Linear Search]]:

By understanding the structure, pseudocode, and efficiency analysis of linear search, you can implement and optimize it in practical scenarios effectively.


E) Binary Search: Detailed Explanation (Sorted Dataset)📏🔍

Binary search is an efficient algorithm for finding a target value in a sorted dataset.

It operates by repeatedly dividing the dataset into halves, allowing it to find the target in logarithmic time.

Here's an in-depth breakdown:

Conceptual Overview: Binary search relies on the dataset being sorted. This key property enables the algorithm to eliminate half of the remaining search space with each step.

  1. Start in the Middle:
    • Check the middle element of the dataset.
    • If the middle element matches the target, the search is complete.
  2. Narrow Down the Search:
    • If the target is smaller than the middle element, search the left half.
    • If the target is larger, search the right half.
  3. Repeat Until Found or Exhausted:
    • Continue halving the search space until the target is found or the search space is empty.

E.1) Example 1: Searching Behind Doors 🚪

In the example:

Steps:

  1. Start at the Middle:
    • Check the middle door (door 3). The number is 5, which is smaller than 6.
    • Narrow the search to the doors on the right. ---->

  2. Repeat with the New Middle:
    • Check the middle door of the remaining right half (door 5). The number is 7, which is larger than 6.
    • Narrow the search to the doors on the left. <-------

  3. Reach the Target:
    • Only one door remains (door 4), which contains 6. The search is successful.

By leveraging the sorted order, only 3 doors were checked, compared to potentially 7 doors in a [[linear search]].


[[Binary search]] can be expressed in pseudocode as follows:

Beginner Pseudo Code Explanation: This pseudo code uses conceptual language ("doors") to simplify understanding.

If no doors
    Return false
If numberX is behind doors[middle]
    Return true
Else if numberX < doors[middle]
    Search doors[0] through doors[middle - 1] //left half
Else if numberX > doors[middle]
    Search doors[middle + 1] through doors[n - 1] //right half

Explanation:

Advanced Pseudo Code Explanation: This version of binary search is closer to implementation. It defines the logic using variables and loops that can directly translate into programming languages:

function binary_search(array, target):
    start = 0
    end = array.length - 1

    while start <= end:
        middle = (start + end) / 2

        if array[middle] == target:
            return true  // Target found
        else if array[middle] < target:
            start = middle + 1  // Narrow to right half
        else:
            end = middle - 1  // Narrow to left half

    return false  // Target not found

Explanation:

  1. Initial Variables:

    • start and end define the search boundaries. Initially, start is the first index, and end is the last index of the list.
  2. While Loop:

    • The loop runs as long as the search range is valid (start <= end).
    • Each iteration calculates the middle index: (start + end) / 2 (integer division).
  3. Comparison:

    • If the middle element matches the target, the function immediately returns true.
    • If the target is smaller, it adjusts the end boundary to search the left half (end = middle - 1).
    • If the target is larger, it adjusts the start boundary to search the right half (start = middle + 1).
  4. Return Value:

    • If the loop finishes without finding the target, the function returns false.

E.3.1) Worst-Case Scenario: O(log n) 😱⚠️ 🚀

E.3.2) Best-Case Scenario: Ω(1) 😄❗ 🚀

| Algorithm     | Best Case (Ω) | Worst Case (O) |
| ------------- | ------------- | -------------- |
| Linear Search | Ω(1)          | O(n)           |
| Binary Search | Ω(1)          | O(log n)       |

[[Binary search]] is dramatically faster for large datasets because it reduces the search space by half with each step.

Example 2: Demonstration with Light Bulbs


E.4) CPU Performance and Real-World Implications

For example, a CPU operating at 1 GHz can process binary search for very large datasets almost instantaneously compared to linear search.


E.5) Key Takeaways: using Binary Search for Sorted Datasets

  1. Precondition: Binary search requires a sorted dataset.
  2. Efficiency: It reduces the search space by half at each step, achieving logarithmic time complexity.
  3. Best Use Case: Ideal for large, sorted datasets where efficient searching is critical.
  4. Limitations: Ineffective on unsorted data, requiring preprocessing (sorting) before the search.

Binary search exemplifies the power of algorithms to optimize performance, making it a cornerstone in computer science.



______________________________________________________________



F) Practice: Translating Linear Search to C Code 👨‍💻 - (Random Dataset) 🎲🔍

F.1) Example: Implementing Linear Search with Numbers (numbers.c)

The goal here is to implement linear search in C to locate a specific number in an array.

  1. Code Explanation:

    • The program uses an array of integers.
    • Linear search iterates through the array, comparing each element to the target value.
    • If the value is found, it prints "Found" and exits with a success code (0).
    • If the loop finishes without finding the value, it prints "Not Found" and exits with an error code (1).
  2. Code for Numbers:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Array of numbers
    int numbers[] = {4, 6, 8, 2, 7, 5, 0};
    int n = 7; // Size of the array

    // Target number to search for
    int target = 0;

    // Linear search
    for (int i = 0; i < n; i++)
    {
        if (numbers[i] == target)
        {
            printf("Found\n");
            return 0; // Exit with success
        }
    }

    // If we reach here, the number was not found
    printf("Not Found\n");
    return 1; // Exit with failure
}

Sample Outputs - Numbers Program:


F.2) Example: Implementing Linear Search with Strings (names.c)

The second program extends linear search to work with strings.

It also highlights the use of strcmp to handle string comparisons in C, as direct == comparisons do not work for strings.

  1. Key Concepts: strcmp()
    • The strcmp function from string.h compares two strings:
      • Returns 0 if the strings are equal.
      • Returns a negative value if the first string is lexicographically less.
      • Returns a positive value if the first string is lexicographically greater.
    • The comparison is case-sensitive, meaning lower and upper case matter when comparing.
      • Searching for "RON" in an array containing "Ron" will return "Not Found."
  2. Code for Names:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    // Array of names
    string names[] = {"Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"};
    int n = 7; // Size of the array

    // Target name to search for
    string target = "Ron";

    // Linear search
    for (int i = 0; i < n; i++)
    {
        if (strcmp(names[i], target) == 0)
        {
            printf("Found\n");
            return 0; // Exit with success
        }
    }

    // If we reach here, the name was not found
    printf("Not Found\n");
    return 1; // Exit with failure
}

Sample Outputs - Names Program:



______________________________________________________________



G) 🧰 Implementing Custom Data Types in C: using Structs 🗃️

Creating Custom Data Types in C:
In C, you can create your own [[custom data types]] using struct and typedef.

This allows you to group different types of variables into a single entity, making your programs more efficient and organized.

Introduction to Custom Data Types: working with a phone book
For example, we can build a phone book using custom data types in C to store both names and phone numbers.

So, if we want to represent a person in a phonebook with both a name and a phone number, we can create a custom data type called person.

G.1) Step-by-Step Example: Building a Phonebook 📔👤📞

Motivation: The goal here is to represent a phonebook where each entry stores a person's name and phone number.

G.1.1) Concept 1: Why Use Strings for Phone Numbers?


G.1.2) Concept 2: Noob Approach - Using Parallel Arrays

Before creating custom data types, we can represent a phonebook using two parallel arrays:

Code: Using Parallel Arrays

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    // Two parallel arrays
    string names[] = {"Brian", "David"};
    string numbers[] = {"+1-617-495-1000", "+1-949-468-2750"};

    // Searching for "David"
    for (int i = 0; i < 2; i++)
    {
        if (strcmp(names[i], "David") == 0)
        {
            printf("Found: %s\n", numbers[i]);
            return 0; // Exit with success
        }
    }

    // If not found
    printf("Not found\n");
    return 1; // Exit with failure
}

It works but...

Drawbacks of Using Parallel Arrays:


G.1.3) Step 1: Using Custom Data Types with typedef struct 🗃️

The better approach is to use structures to group related data together. This approach helps avoid the problems associated with parallel arrays.

Using struct and typedef:

Define a person Struct:

Creating the Custom Data Type:

typedef struct
{
    string name;
    string number;
} person;

G.1.4) Step 2: Refactoring to Improve the Code and use an Array of person

We replace parallel arrays with a single array of person structures.

Code: Using person Struct

#include <cs50.h>
#include <stdio.h>
#include <string.h>

// Define the person structure
typedef struct
{
    string name;
    string number;
} person;

int main(void)
{
    // Array of person structures
    person people[2];

    // Populate the array
    people[0].name = "Brian";
    people[0].number = "+1-617-495-1000";

    people[1].name = "David";
    people[1].number = "+1-949-468-2750";

    // Search for "David"
    for (int i = 0; i < 2; i++)
    {
        if (strcmp(people[i].name, "David") == 0)
        {
            printf("Found: %s\n", people[i].number);
            return 0; // Exit with success
        }
    }

    // If not found
    printf("Not found\n");
    return 1; // Exit with failure
}

It also works!

Advantages of Using Structs:

  1. Cleaner Code: Instead of managing two separate arrays, you now have one array of person structs, each holding both the name and number together.
  2. Scalability: This approach scales well when the dataset grows, and it avoids the risks of mismatched data from parallel arrays.
  3. Ease of Access: To access a person's name or number, you just use the dot notation (e.g., people[i].name).

G.1.5) Step 3: Scaling the Program (Dynamic Array Size)

Dynamic Array Size:
Instead of hardcoding the size (e.g., 2 people), we can use a constant or even dynamic memory allocation later.

For example:

const int MAX_PEOPLE = 10;  // Defining the maximum number of people

Alternatively, you could dynamically allocate memory for the array of person structs based on user input or other factors.

Dynamic Population Example:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    string name;
    string number;
} person;

int main(void)
{
    const int MAX_PEOPLE = 10;
    person people[MAX_PEOPLE];

    int n = get_int("Number of people: ");
    if (n > MAX_PEOPLE)
    {
        printf("Too many people!\n");
        return 1;
    }

    // Populate the array
    for (int i = 0; i < n; i++)
    {
        people[i].name = get_string("Name: ");
        people[i].number = get_string("Number: ");
    }

    // Search for a name
    string target = get_string("Who are you looking for?: ");
    for (int i = 0; i < n; i++)
    {
        if (strcmp(people[i].name, target) == 0)
        {
            printf("Found: %s\n", people[i].number);
            return 0; // Exit with success
        }
    }

    printf("Not found\n");
    return 1; // Exit with failure
}

G.2) Key Learnings: Custom Data Types, Dynamic Arrays and Scalability

  1. Custom Data Types:
    • typedef struct enables encapsulation, improving program design.
  2. Dynamic Arrays:
    • Using loops and user input allows the program to handle dynamic data sizes.
  3. Scalability:
    • The person struct simplifies the addition of new fields like email or address.

Examples: Real-World Applications:



______________________________________________________________



H) Selection Sort: Swapping the Smallest Found 🧮📊

How does [[Selection Sort]] work?
[[Selection sort]] repeatedly selects the smallest (or largest) element from the unsorted portion of an array and places it in the correct position.

Step-by-Step Explanation:

  1. Start with the first element of the array and keep track of the smallest element as you explore the full array.
  2. Once you are done exploring, you will now know which is the smallest element in the array.
  3. Swap the smallest element with the current first element.
  4. Repeat the process for the remaining unsorted elements, ignoring the already sorted portion.


H.1) Step-by-Step Example and Pseudo Code: Sorting an Array

For an unsorted array:
[6, 3, 8, 5, 2, 7, 4, 1]

Pseudo Code:

for i from 0 to n - 1:
    find the smallest item between numbers[i] and numbers[n - 1]
    swap that smallest item with numbers[i]

H.2) Example: Implementation in C - Selection Sort

#include <stdio.h>

void selection_sort(int array[], int size)
{
    for (int i = 0; i < size - 1; i++)
    {
        int min_index = i;
        
        // Find the smallest element in the unsorted portion
        for (int j = i + 1; j < size; j++)
        {
            if (array[j] < array[min_index])
            {
                min_index = j;
            }
        }
        
        // Swap the found smallest element with the first unsorted element
        int temp = array[min_index];
        array[min_index] = array[i];
        array[i] = temp;
    }
}

void print_array(int array[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d ", array[i]);
    }
    printf("\n");
}

int main(void)
{
    int numbers[] = {6, 3, 8, 5, 2, 7, 4, 1};
    int size = sizeof(numbers) / sizeof(numbers[0]);

    printf("Unsorted array: ");
    print_array(numbers, size);

    selection_sort(numbers, size);

    printf("Sorted array: ");
    print_array(numbers, size);

    return 0;
}

H.3) Efficiency of Selection Sort

H.3.1) Big O: Time Complexity - O(n^2) 😱⚠️ 🐢

  1. First Pass: Inspect all n elements to find the smallest.
  2. Second Pass: Inspect n-1 elements.
  3. Continue until only one element remains.

If you notice, this is the [[Gaussian Sum]] of the natural numbers,
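Written out, those pass-by-pass inspection counts sum to:

```latex
(n-1) + (n-2) + \cdots + 1 \;=\; \frac{n(n-1)}{2} \;=\; \frac{n^2}{2} - \frac{n}{2} \;=\; O(n^2)
```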

Note: Space Complexity:


H.4) Visualizing Selection Sort 📊

Visualize the process as repeatedly narrowing the unsorted portion of the array while extending the sorted portion.

Key Observations:

  1. The algorithm always finds the smallest element in the unsorted portion.
  2. Each pass reduces the size of the unsorted portion by one.
  3. The sorted portion grows one element at a time.

H.5) Drawbacks of Selection Sort (Big-O Comparison Chart)

  1. Inefficient for Large Data: O(n²) is too slow for large datasets.
  2. Fixed Number of Comparisons: Regardless of the initial order (e.g., fully sorted or reversed), the algorithm always performs the same number of comparisons.

Big-O Comparison Chart:

| Algorithm      | Best Case | Worst Case | Space Complexity |
| -------------- | --------- | ---------- | ---------------- |
| Selection Sort | O(n²)     | O(n²)      | O(1)             |
| Linear Search  | O(1)      | O(n)       | O(1)             |
| Binary Search  | O(1)      | O(log n)   | O(1)             |

When to Use Selection Sort:

Future Directions:
Selection sort is a stepping stone to more advanced sorting algorithms like bubble sort, merge sort, and quick sort, which improve efficiency significantly for large datasets.


I) Bubble Sort: Sort numbers by pairs (Swapping by Pairs) 🧮📊

Bubble sort is an intuitive sorting algorithm that "bubbles" the largest elements to the end of the array by repeatedly swapping adjacent elements if they are out of order. It focuses on local corrections to gradually fix the entire array.

Step-by-Step Explanation

  1. Compare Adjacent Pairs:

    • Start with the first two elements.
    • If the first is greater than the second, swap them.
    • Move to the next pair and repeat.
  2. Iterate Through the Array:

    • Continue until reaching the end of the array.
    • After one complete pass, the largest element will have "bubbled" to its correct position at the end.
  3. Repeat Until Sorted:

    • Repeat the process, reducing the range of unsorted elements with each pass.


I.1) Step-by-Step Example and Pseudo Code: Sorting an Array

For an unsorted array:
[6, 3, 8, 5, 2, 7, 4, 1]

Pseudocode:

repeat n - 1 times:
    for i from 0 to n - 2:
        if numbers[i] > numbers[i + 1]: // true if out of order
            swap(numbers[i], numbers[i + 1])
    if no swaps were made:
        quit // already sorted

I.2) Example: Implementation in C - Bubble Sort

#include <stdio.h>

void bubble_sort(int array[], int size)
{
    int swapped;
    do
    {
        swapped = 0;
        for (int i = 0; i < size - 1; i++)
        {
            if (array[i] > array[i + 1])
            {
                // Swap elements
                int temp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = temp;
                swapped = 1; // Mark as swapped
            }
        }
    } while (swapped); // Continue until no swaps are made
}

void print_array(int array[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d ", array[i]);
    }
    printf("\n");
}

int main(void)
{
    int numbers[] = {6, 3, 8, 5, 2, 7, 4, 1};
    int size = sizeof(numbers) / sizeof(numbers[0]);

    printf("Unsorted array: ");
    print_array(numbers, size);

    bubble_sort(numbers, size);

    printf("Sorted array: ");
    print_array(numbers, size);

    return 0;
}

I.3) Efficiency of Bubble Sort

I.3.1) Big O: Time Complexity - O(n^2) 😱⚠️ 🐢

  1. Worst Case (Reversed Order):
    • Requires n passes through the array.
    • Each pass involves n-1, n-2, ..., 1 comparisons.
    • Total comparisons: (n-1) + (n-2) + ... + 1 = n(n-1)/2.

We only care about the dominant term in Big O, so this simplifies to O(n²).

Note: Space Complexity


I.4) Key Observations: Comparing Bubble Sort vs Selection Sort

  1. Optimization:

    • If no swaps are made during a pass, the array is already sorted, and the algorithm can terminate early.
  2. Lower Bound:

    • In the best case (sorted input), bubble sort has a lower bound of Ω(n), since every element must be checked at least once.
  3. Comparison to Selection Sort:

    • Selection Sort:
      • Always O(n²), regardless of input.
    • Bubble Sort:
      • Performs better with already sorted data (O(n)) but has the same worst-case performance (O(n²)).

I.5) Visualizing Bubble Sort 📊

Using a bar visualization:


I.6) Advantages And Disadvantages of Bubble Sort

Advantages of Bubble Sort

  1. Simple to implement.
  2. Efficient for nearly sorted data.

Disadvantages of Bubble Sort

  1. Inefficient for large datasets (O(n²)).
  2. Often overshadowed by more efficient algorithms like merge sort or quick sort.

Conclusion:
Bubble sort introduces the idea of optimizing the sorting process by fixing local problems (adjacent swaps) rather than selecting the smallest element each time (as in selection sort). While its best-case performance is better than selection sort's, it is still impractical for large datasets due to its O(n²) worst-case performance.



______________________________________________________________



J) 🧰 Understanding Recursion (an elegant way to create loops) 🌀

Definition: [[Recursion]] is a programming technique where a function calls itself to solve a problem.

Example1:

Each recursive call works on a smaller instance of the problem until it reaches a [[base case]], which stops the recursion.

Example2: The base case here is when we reach 1; once we hit that case, we stop breaking down the factorial.


J.1) Building Binary Search with Recursion (Pseudo-code Example) 📏🔍🌀

Let's revisit the phone book problem.

Previously, we wrote the algorithm to work like this, using only loops:

But we can replace the loop calls with recursion, calling ourselves on a smaller range until the search is finished:


J.2) Example: Building a Mario Pyramid with Recursion

Imagine you’re tasked with building a pyramid with blocks, layer by layer.

Steps to Build a Pyramid of Height 4

But how do you build a pyramid of height 3?

And to build a pyramid of height 2?

Finally, to build a pyramid of height 1:

This approach defines the solution in terms of smaller versions of the same problem. The process ends when it hits the base case, where no further recursion is needed.

Recursive Pyramid Building in Code

#include <stdio.h>

void build_pyramid(int height) {
    if (height == 0) {  // Base Case
        return;
    }
    build_pyramid(height - 1);  // Recursive Case
    for (int i = 0; i < height; i++) {
        printf("#");
    }
    printf("\n");
}

int main() {
    build_pyramid(4);  // Build a pyramid of height 4
    return 0;
}

J.3) Relating Recursion to Merge Sort

Merge sort uses recursion to efficiently sort arrays by dividing the problem into smaller parts.

  1. Divide: Split the array into two halves.
  2. Conquer: Recursively sort each half.
  3. Combine: Merge the sorted halves into a single sorted array.

This approach divides the problem until the base case is reached (an array of size 1, which is inherently sorted). It then combines the smaller sorted arrays into a larger sorted array.



K) Merge Sort Algorithm - Divide-and-conquer strategy 🧮📊

The merge sort algorithm is a recursive divide-and-conquer method.

Here's its flow:

  1. Divide: Split the array into two halves.
  2. Conquer: Recursively sort each half.
  3. Combine: Merge the two sorted halves into one sorted array.

The base case occurs when the array has one element (or no elements), as a single-element array is already sorted.

Example:


K.1) Pre-requisite: Let's understand the "Merging" helper function first

Imagine two sorted arrays:

To merge them:

  1. Compare the smallest elements (3 and 1). Take 1 (smaller).

  2. Compare 3 (left) and 2 (right). Take 2.

  3. Compare 3 (left) and 4 (right). Take 3.

  4. Continue until all elements are merged.

Result: [1, 2, 3, 4, 5, 6, 7, 8]


K.2) Example: Recursive Breakdown - Merge Sort in Action

Now let's try this algorithm, step by step:

Input Array:
[6, 3, 8, 5, 2, 7, 4, 1]

Recursion in Action (Array Division): Sort Left and Sort Right

Merge Sort "Basic" Pseudocode

function mergeSort(array):
    if array has one element:
        return array
    split array into left and right halves
    leftSorted = mergeSort(left)
    rightSorted = mergeSort(right)
    return merge(leftSorted, rightSorted)

The merge function combines two sorted arrays into a single sorted array by comparing elements one at a time.

Merge Sort "Detailed" Pseudocode

merge_sort(array):
    if size of array <= 1:
        return array
    mid = size of array / 2
    left = merge_sort(array[0:mid])
    right = merge_sort(array[mid:end])
    return merge(left, right)

merge(left, right):
    result = []
    while left and right:
        if left[0] <= right[0]:
            append left[0] to result
            remove left[0]
        else:
            append right[0] to result
            remove right[0]
    append remaining elements of left or right to result
    return result

K.3) Implementation in C

#include <stdio.h>
#include <stdlib.h>

// Merge two subarrays into a sorted array
void merge(int array[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;

    // Temporary arrays
    int L[n1], R[n2];

    // Copy data to temp arrays
    for (int i = 0; i < n1; i++) L[i] = array[left + i];
    for (int i = 0; i < n2; i++) R[i] = array[mid + 1 + i];

    int i = 0, j = 0, k = left;

    // Merge the temp arrays back into the main array
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            array[k] = L[i];
            i++;
        } else {
            array[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy remaining elements of L[], if any
    while (i < n1) {
        array[k] = L[i];
        i++;
        k++;
    }

    // Copy remaining elements of R[], if any
    while (j < n2) {
        array[k] = R[j];
        j++;
        k++;
    }
}

// Recursive merge sort function
void merge_sort(int array[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;

        // Recursively sort both halves
        merge_sort(array, left, mid);
        merge_sort(array, mid + 1, right);

        // Merge the sorted halves
        merge(array, left, mid, right);
    }
}

void print_array(int array[], int size) {
    for (int i = 0; i < size; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");
}

int main() {
    int numbers[] = {6, 3, 8, 5, 2, 7, 4, 1};
    int size = sizeof(numbers) / sizeof(numbers[0]);

    printf("Unsorted array: ");
    print_array(numbers, size);

    merge_sort(numbers, 0, size - 1);

    printf("Sorted array: ");
    print_array(numbers, size);

    return 0;
}

K.4) Visualization (divide and conquer strategy) 📊

K.5) Efficiency of Merge Sort

K.5.1) Big O: Time Complexity - O(n log n) 😱⚠️ 🚀

  1. Worst Case (and in fact every case, since merge sort's running time does not depend on the input order):
    • Divide: Repeatedly halving the array produces O(log(n)) levels of recursion.
    • Merge: At each level, all n elements are compared and merged, costing O(n) per level.
    • Overall: O(n log(n))
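The same bound can be derived from the recurrence merge sort satisfies (a standard analysis sketch, not taken verbatim from the lecture), where cn is the linear cost of one round of merging:

```latex
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + cn
     = 4\,T\!\left(\tfrac{n}{4}\right) + 2cn
     = \cdots
     = 2^{k}\,T\!\left(\tfrac{n}{2^{k}}\right) + k\,cn
```

Setting k = log₂(n), so the subproblems reach size 1, gives T(n) = n·T(1) + cn·log₂(n) = O(n log n).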

Trade-offs
  • Space Complexity: merge sort needs O(n) extra memory for the temporary arrays used during merging; unlike selection sort and bubble sort, it does not sort in place.
  • Weakness: that extra memory can be a drawback for very large datasets or memory-constrained environments.

K.6) Comparison with Other Sorting Algorithms

| Algorithm | Worst case | Best case | Extra space |
| --- | --- | --- | --- |
| Selection sort | O(n²) | Ω(n²) | O(1) |
| Bubble sort | O(n²) | Ω(n) | O(1) |
| Merge sort | O(n log n) | Ω(n log n) | O(n) |

Merge sort's consistent O(n log n) performance makes it the better choice for large datasets, as long as the extra O(n) memory it requires is acceptable.


🔙 Previous Part | Next Part 🔜

↩️ Go Back


Z) 🗃️ Glossary

| File | Definition |
| --- | --- |
| runtime | - |

| Uncreated files | Origin Note |
| --- | --- |
| algorithm | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| base case | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Big O | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Big O Notation | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| binary search | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Bubble sort algorithm | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| custom data types | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Gaussian Sum | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Lexicographical Order | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| linear search | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Merge Sort algorithm | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Omega Notation | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Recursion | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| recursive | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Running Time | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |
| Selection sort | W3 - 👨‍🏫 Lecture - Learn to use Algorithms in C (Linear and Binary Search; Selection, Bubble and Merge Sort; Recursion) |