W5 - Shorts 📚🔍- Data Structures in C (Linked Lists, BST, Hash Tables, Tries, Stacks, Queues, and Dictionaries)

Read more here
CS50 IDE: https://cs50.dev/ (Online VSCode)
CS50 Library Documentation: https://manual.cs50.io/#cs50.h

#CS50x #C #Computer_Science #VisualStudioCode

🔙 Previous Part | Next Part 🔜

↩️ Go Back

Table of Contents:

🔙 Previous Part | Next Part 🔜

↩️ Go Back


A) Comparing Key Data Structures: Arrays, Linked Lists, Hash Tables, and Tries 🎦

In CS50, we’ve covered various data structures—arrays, linked lists, hash tables, and tries—that serve as the foundation for most implementations in C. Each has its strengths and weaknesses, depending on your data and use case. Let’s consolidate their properties to evaluate which structure might suit specific situations.

A.1) Arrays

Strengths:

Weaknesses:


A.2) Linked Lists

Strengths:

Weaknesses:


A.3) Hash Tables

Strengths:

Weaknesses:


A.4) Tries

Strengths:

Weaknesses:


A.5) Summary of Strengths and Weaknesses

Structure Insertion Deletion Lookup Sorting Size
Array O(1) (end) / O(n) (middle) O(1) (end) / O(n) (middle) O(1) (random access) Easy (O(nlog⁡n) Compact
Linked List O(1) O(1) O(n) Hard Small
Hash Table O(1) (average) O(1) O(1) (average) Hard Flexible
Trie O(k) O(k) O(k) Inherent Large

Final Thoughts: how to chose
Choosing the right data structure depends on your specific needs:

Ultimately, all data structures come with trade-offs. The best choice depends on balancing speed, memory usage, and functionality for your application.


B) Structures in C: Creating Custom Data Types 🎦

C provides basic data types like int, float, char, and double, and CS50 introduces additional types like string and bool.

But what if we want to combine multiple variables, even of different types, into a single cohesive unit?

Enter structures, a powerful feature of C that allows us to define custom data types that unify multiple fields with logical connections.

Example:

B.1) Introduction to Structures

[[Structures]] allow us to group different variables (potentially of different types) into a "super variable."

Unlike arrays, which can only hold values of the same type, structures can mix and match data types.

B.1.0) Why Use Structures?

Logical Grouping
Structures allow us to group logically related data into a single unit, making code more readable and maintainable.

Custom Data Types
You can define types that represent real-world entities like cars, students, or books.

Example: Student Structure

struct student {
    int id;
    char name[50];
    float gpa;
};

// Create a student variable
struct student s1;
s1.id = 1001;
strcpy(s1.name, "Alice");
s1.gpa = 3.9;

B.1.1) Example: Defining a Structure

To define a structure, we use the struct keyword, followed by a name and a set of fields.

Example: Car Structure

struct car {
    int year;              // Year of manufacture
    char model[20];        // Model name
    char plate[8];         // License plate
    int odometer;          // Mileage in miles
    float engine_size;     // Engine capacity in liters
};

Important: Don’t forget the semicolon ; at the end of the structure definition!


B.1.2) Example: Using Structures

Declaring a Variable:
To create a variable of the struct car type:

struct car my_car;

Accessing Fields:
We use the [[dot operator]] (.) to access individual fields.

my_car.year = 2011;
strcpy(my_car.model, "Civic");  // Copy string into `model`
my_car.odometer = 50505;
my_car.engine_size = 1.8;

B.2) Dynamic Allocation of Structures (malloc and linked lists)

When we don’t know how many structures we’ll need at runtime, we can dynamically allocate memory using malloc.

Example: Allocating a struct car

Example: Allocating a struct car

struct car *my_car = malloc(sizeof(struct car));
my_car->year = 2015;  // Use the arrow operator (->)
strcpy(my_car->model, "Camry");
my_car->odometer = 75000;
my_car->engine_size = 2.5;

Explanation:

  1. malloc allocates enough memory for a struct car.
  2. my_car is a pointer to that memory.
  3. We use the arrow operator (->) to dereference the pointer and access fields.

B.2.1) Dot Operator vs. Arrow Operator

The arrow operator is a shorthand for:

(*my_car).year = 2011;

B.3) Note: Structures in Functions (passed by value or reference)

Structures can be passed to functions by value or reference.

By Value
Passing a copy of the structure:

void print_student(struct student s) {
    printf("ID: %d, Name: %s, GPA: %.2f\n", s.id, s.name, s.gpa);
}

print_student(s1);

By Reference
Passing a pointer to avoid copying:

void update_gpa(struct student *s, float new_gpa) {
    s->gpa = new_gpa;  // Use arrow operator for pointers
}

update_gpa(&s1, 4.0);

B.4) Key Takeaways: struct

  1. Combining Data: Structures unify variables of different types into one logical unit.

  2. Dynamic Allocation: Use malloc and the arrow operator for dynamically allocated structures.

  3. Shortcuts: The arrow operator (->) simplifies dereferencing and accessing fields.

  4. Real-World Applications: Structures can model complex data like cars, students, or even more abstract concepts.


C) Singly-Linked Lists in C 🎦

Introduction: Linked Lists in C

What is a linked list?


Defining a linked list node

typedef struct sllist {
    int value; // Data stored in the node
    struct sllist *next; // Pointer to the next node
} sll_node;

Operations on linked lists

Creating a linked list node

Function prototype:

sll_node *create(int value);

Steps:

  1. Dynamically allocate memory for a new node using malloc.
  2. Initialize fields:
    • Set value to the provided data.
    • Set next to NULL (no next node yet).
  3. Return a pointer to the new node.

Example implementation:

sll_node *create(int value) {
    sll_node *new_node = malloc(sizeof(sll_node));
    if (new_node == NULL) {
        return NULL; // Memory allocation failed
    }
    new_node->value = value;
    new_node->next = NULL;
    return new_node;
}

Searching a linked list

Function prototype:

bool find(sll_node *head, int value);

Steps:

  1. Start at the head of the list.
  2. Traverse the list using a traversal pointer.
  3. Check each node's value:
    • If found, return true.
    • If NULL is reached, return false.

Example implementation:

bool find(sll_node *head, int value) {
    sll_node *trav = head;
    while (trav != NULL) {
        if (trav->value == value) {
            return true;
        }
        trav = trav->next;
    }
    return false;
}

Inserting at the beginning

Function prototype:

sll_node *insert(sll_node *head, int value);

Steps:

  1. Allocate a new node with the given value.
  2. Point the new node's next to the current head of the list.
  3. Move the head pointer to the new node.
  4. Return the updated head pointer.

Example implementation:

sll_node *insert(sll_node *head, int value) {
    sll_node *new_node = create(value);
    if (new_node == NULL) {
        return NULL; // Memory allocation failed
    }
    new_node->next = head;
    return new_node;
}

Deleting the entire list

Function prototype:

void destroy(sll_node *head);

Steps:

  1. Use recursion to delete the rest of the list before freeing the current node.
  2. Base case: Stop when head is NULL.

Example implementation:

void destroy(sll_node *head) {
    if (head == NULL) {
        return; // Base case: end of the list
    }
    destroy(head->next); // Recursively delete the rest of the list
    free(head); // Free the current node
}

Deleting a single element

Challenges:


Advantages and disadvantages of linked lists

Advantages:

  1. Dynamic size: Can grow or shrink as needed.
  2. Efficient insertion/deletion: At the beginning of the list, operations are O(1).

Disadvantages:

  1. Sequential access: No random access like arrays.
  2. Memory overhead: Each node requires extra memory for the pointer.
  3. Complexity: Manipulating pointers can be error-prone.

Next steps: doubly linked lists


D) Doubly-Linked Lists in C 🎦

Introduction to Doubly-Linked Lists


Structure of a Doubly-Linked List Node

typedef struct dllist {
    int value; // Data stored in the node
    struct dllist *next; // Pointer to the next node
    struct dllist *prev; // Pointer to the previous node
} dll_node;

Operations on Doubly-Linked Lists

Inserting a Node at the Beginning

Function prototype:

dll_node *insert(dll_node *head, int value);

Steps:

  1. Dynamically allocate memory for a new node.
  2. Set the node's value.
  3. Point the new node's next to the current head of the list.
  4. Set the prev pointer of the old head to the new node.
  5. Update the list's head pointer to point to the new node.
  6. Return the updated head pointer.

Example implementation:

dll_node *insert(dll_node *head, int value) {
    dll_node *new_node = malloc(sizeof(dll_node));
    if (new_node == NULL) {
        return NULL; // Memory allocation failed
    }
    new_node->value = value;
    new_node->next = head;
    new_node->prev = NULL; // New node has no previous node

    if (head != NULL) {
        head->prev = new_node; // Update the old head's previous pointer
    }
    return new_node;
}

Deleting a Node

Function prototype:

void delete_node(dll_node **head_ref, dll_node *node_to_delete);

Steps:

  1. Update the next pointer of the previous node (if it exists) to skip the node being deleted.
  2. Update the prev pointer of the next node (if it exists) to skip the node being deleted.
  3. Free the memory occupied by the node.
  4. If the node being deleted is the head, update the head pointer.

Example implementation:

void delete_node(dll_node **head_ref, dll_node *node_to_delete) {
    if (node_to_delete == NULL) return;

    // Update the next pointer of the previous node
    if (node_to_delete->prev != NULL) {
        node_to_delete->prev->next = node_to_delete->next;
    } else {
        // Node to delete is the head
        *head_ref = node_to_delete->next;
    }

    // Update the prev pointer of the next node
    if (node_to_delete->next != NULL) {
        node_to_delete->next->prev = node_to_delete->prev;
    }

    free(node_to_delete); // Free the memory occupied by the node
}

Advantages of Doubly-Linked Lists

  1. Bidirectional traversal:
    • Nodes can be traversed both forwards and backwards.
  2. Efficient deletion:
    • Deleting a node requires fewer steps since we can move backward.
  3. Insertion flexibility:
    • New nodes can be inserted at the beginning, middle, or end with minimal pointer adjustments.

Trade-offs of Linked Lists

Advantages

Disadvantages


Comparison: Singly vs. Doubly-Linked Lists

Feature Singly-Linked List Doubly-Linked List
Traversal Forward only Forward and backward
Memory usage Less (one pointer per node) More (two pointers per node)
Insertion Simple at the beginning Slightly more complex
Deletion Requires extra traversal More efficient

Applications of Doubly-Linked Lists


Summary: Doubly-Linked Lists


E) Hash Tables in C 🎦

Introduction to Hash Tables


Core Components of a Hash Table

  1. Hash Function:
    • Takes input data and generates a non-negative integer (hashcode).
    • Determines where data is stored in the array.
  2. Array:
    • Stores pointers to data (or linked lists in advanced implementations).

How a Hash Table Works

  1. Data is passed through a hash function.
  2. The hash function returns a hashcode.
  3. The hashcode is used as an index to store the data in the array.
  4. To retrieve the data:
    • Pass the input through the hash function.
    • Use the resulting hashcode to access the data directly.

Example:


Hash Functions: Characteristics of a Good Hash Function

  1. Uses only the input data:
    • Should not rely on external factors.
  2. Uses all of the input data:
    • Incorporates the entire dataset.
  3. Deterministic:
    • For the same input, always produces the same output.
  4. Uniform distribution:
    • Evenly distributes hashcodes across the table.
  5. Minimizes collisions:
    • Avoids assigning the same hashcode to different inputs.

Example of a Simple Hash Function

int hash(char *str) {
    int sum = 0;
    for (int j = 0; str[j] != '\0'; j++) {
        sum += str[j]; // Add ASCII values of characters
    }
    return sum % HASH_MAX; // Ensure the hashcode fits in the array
}

Explanation:


Collisions and Solutions

Collision: Two pieces of data generate the same hashcode.

Collision Resolution Methods

  1. Linear Probing:

    • If the desired index is occupied, check the next available index (e.g., hashcode + 1).
    • Problem: Clustering—groups of occupied indices grow, increasing collisions.
  2. Separate Chaining:

    • Store multiple elements at the same index using a linked list.
    • Each array index points to the head of a linked list.
    • Efficient for dynamic growth and avoids clustering.

Implementation of Separate Chaining:


Example: Separate Chaining

  1. Initial State:

    • Array of size 10, all elements set to NULL.
  2. Insert Joey:

    • Hash Joey6.
    • Create a linked list at index 6, starting with Joey.
  3. Insert Phoebe:

    • Hash Phoebe6.
    • Add Phoebe to the linked list at index 6.
  4. Insert Ross:

    • Hash Ross2.
    • Create a linked list at index 2, starting with Ross.
  5. Lookup Joey:

    • Hash Joey6.
    • Traverse the linked list at index 6 to find Joey.

Advantages of Separate Chaining

  1. Avoids Clustering:
    • Multiple elements can exist at the same hashcode without performance degradation.
  2. Dynamic Growth:
    • Linked lists allow arbitrary growth at each index.

Advantages and Limitations of Hash Tables

Advantages

  1. Fast Operations:
    • Average case for insertion, deletion, and lookup is O(1).
  2. Flexible Growth:
    • Separate chaining avoids fixed size limitations of arrays.

Limitations

  1. Collisions:
    • Although reduced, collisions still occur.
  2. Inefficient Sorting:
    • Hash tables are unsuitable for sorting or ordering data.
  3. Memory Overhead:
    • Requires additional space for pointers (linked lists).

Key Takeaways: Hash Tables


F) Tries in C 🎦

Introduction to Tries


Comparison with Other Data Structures


Structure of a Trie

Node Definition:

typedef struct trie_node {
    char *data;             // Value associated with the key (e.g., university name)
    struct trie_node *next[10]; // Array of 10 pointers for digits 0-9
} trie_node;

Insertion in a Trie

  1. Start at the Root:
    • Begin navigation from the root node.
  2. Navigate the Key:
    • For each digit/character in the key, move to the corresponding branch.
    • If the branch doesn’t exist, create a new node.
  3. Insert Data:
    • Once the entire key is processed, store the associated value (e.g., the university name) in the final node.

Example: Insert "Harvard" with key 1636.


Searching in a Trie

  1. Start at the Root:
    • Use a traversal pointer starting at the root.
  2. Navigate the Key:
    • For each digit/character in the key, follow the corresponding branch.
    • If a branch doesn’t exist, the data isn’t in the trie.
  3. Check Data:
    • If the entire key is processed and a node exists, check its data.

Example: Search for "Harvard" with key 1636.


Efficiency of Tries


Tradeoffs of Tries

  1. Advantages:

    • No collisions: Keys guarantee unique paths.
    • Fast operations: Constant time for insertion, lookup, and deletion for fixed-length keys.
    • Scalable with existing branches: As more data is added, fewer new nodes are needed.
  2. Disadvantages:

    • Memory-intensive: Each node contains multiple pointers (e.g., 10 for digits, 26 for letters), even if most are NULL.
    • Complexity: Implementation requires careful management of pointers and memory allocation.

Example Use Cases

  1. Storing Dictionaries:
    • Keys: Words (e.g., "bat", "batch").
    • Values: Definitions or metadata.
  2. Mapping Data:
    • Keys: Years (e.g., 1636 for Harvard, 1701 for Yale).
    • Values: Names of universities.

Key Takeaways


G) Stacks in C 🎦

Introduction to Stacks


Array-Based Stacks

Operations

Push
  1. Check if the stack is full (top == CAPACITY).
  2. Add the element at data[top].
  3. Increment top.

Example:

void push(Stack *s, int value) {
    if (s->top == CAPACITY) {
        printf("Stack overflow\n");
        return;
    }
    s->data[s->top++] = value;
}
Pop
  1. Check if the stack is empty (top == 0).
  2. Decrement top.
  3. Return the element.

Example:

int pop(Stack *s) {
    if (s->top == 0) {
        printf("Stack underflow\n");
        return -1; // Sentinel value
    }
    return s->data[--s->top];
}

Linked-List-Based Stacks

Operations

Push
  1. Allocate memory for a new node.
  2. Assign the value to the node.
  3. Update the next pointer to point to the current head.
  4. Move the head to the new node.

Example:

void push(Stack *s, int value) {
    Node *newNode = malloc(sizeof(Node));
    if (!newNode) {
        printf("Memory allocation failed\n");
        return;
    }
    newNode->data = value;
    newNode->next = s->head;
    s->head = newNode;
}
Pop
  1. Check if the stack is empty (head == NULL).
  2. Save a reference to the current head.
  3. Update the head to the next node.
  4. Free the old head.
  5. Return the data.

Example:

int pop(Stack *s) {
    if (!s->head) {
        printf("Stack underflow\n");
        return -1; // Sentinel value
    }
    Node *temp = s->head;
    int value = temp->data;
    s->head = temp->next;
    free(temp);
    return value;
}

Comparing Implementations

Feature Array-Based Linked-List-Based
Memory Fixed size, pre-allocated. Dynamic size, memory allocated per node.
Efficiency Faster due to contiguous memory. Slower due to pointer operations.
Flexibility Limited by initial capacity. Grows/shrinks as needed.
Ease of Use Simpler to implement. More complex due to memory management.

Applications of Stacks


H) Queues in C 🎦

Introduction to Queues

Key Operations:

  1. Enqueue: Add an element to the end of the queue.
  2. Dequeue: Remove an element from the front of the queue.

Implementations of Queues

Queues can be implemented using:

  1. Arrays
  2. Linked Lists

Array-Based Queue

Structure Definition

Example:

typedef struct {
    int data[CAPACITY]; // Array to hold queue elements
    int front;          // Index of the oldest element
    int size;           // Number of elements in the queue
} queue;

Enqueue Operation

Purpose: Add an element to the end of the queue.

Steps:

  1. Check if the queue is full (size == CAPACITY).
  2. Place the new element at the position front + size (mod CAPACITY for wrapping around).
  3. Increment the size of the queue.

Visualization:

  1. Start with front = 0 and size = 0.
  2. Add elements:
    • Enqueue 28: Place at data[0], increment size to 1.
    • Enqueue 33: Place at data[1], increment size to 2.
    • Enqueue 19: Place at data[2], increment size to 3.

Dequeue Operation

Purpose: Remove the oldest element (at the front of the queue).

Steps:

  1. Check if the queue is empty (size == 0).
  2. Retrieve the element at front.
  3. Increment front (mod CAPACITY for wrapping around).
  4. Decrement the size.

Visualization:

  1. Start with front = 0, size = 3, and elements [28, 33, 19].
  2. Dequeue:
    • Remove 28: Update front = 1, decrement size to 2.
    • Remove 33: Update front = 2, decrement size to 1.

Linked-List-Based Queue

Structure Definition

Example:

typedef struct node {
    int data;
    struct node *next;
    struct node *prev; // Optional for singly linked lists
} node;

typedef struct {
    node *head; // Pointer to the first element
    node *tail; // Pointer to the last element
} queue;

Enqueue Operation

Purpose: Add a new element to the end of the queue.

Steps:

  1. Dynamically allocate memory for the new node.
  2. Set the node's data to the new value.
  3. Update pointers:
    • Point the current tail’s next to the new node.
    • Set the new node’s prev to the current tail.
    • Update tail to point to the new node.

Visualization:

  1. Start with head pointing to 12 and tail pointing to 13.
  2. Enqueue 10:
    • Allocate memory for the new node.
    • Set 10.prev to 13 and 13.next to 10.
    • Update tail to point to 10.

Dequeue Operation

Purpose: Remove the oldest element (at the head of the queue).

Steps:

  1. Retrieve the element at the head.
  2. Update pointers:
    • Move head to the next node.
    • Set the new head's prev to NULL.
  3. Free the memory of the removed node.

Visualization:

  1. Start with head pointing to 12 and tail pointing to 10.
  2. Dequeue:
    • Free the node containing 12.
    • Update head to point to 15.

Comparison of Array vs. Linked List Implementation

Aspect Array Linked List
Memory Usage Fixed size (CAPACITY). Dynamically allocated, flexible size.
Speed Fast access, potential wraparound. Slightly slower due to pointer manipulation.
Ease of Use Simple for small, static queues. More complex but scales dynamically.

Applications of Queues

  1. Scheduling:
    • CPU task scheduling.
    • Printer job scheduling.
  2. Data Streams:
    • Buffering in communication protocols.
  3. Breadth-First Search (BFS):
    • Used to explore nodes level by level in graphs or trees.

Queues provide a fair, FIFO-based mechanism for managing data and are critical in a variety of computational tasks.


🔙 Previous Part | Next Part 🔜

↩️ Go Back


Z) 🗃️ Glossary

File Definition
Uncreated files Origin Note
dot operator W5 - Shorts 📚🔍- Data Structures in C (Linked Lists, BST, Hash Tables, Tries, Stacks, Queues, and Dictionaries)
Structures W5 - Shorts 📚🔍- Data Structures in C (Linked Lists, BST, Hash Tables, Tries, Stacks, Queues, and Dictionaries)