W4 - 👨🏫 Lecture - Understanding Hexadecimal Memory Addresses, Pointers, and Strings in C
Video: https://youtu.be/F9-yqoS7b8w
A) Intro - Removing Training Wheels: The CS50 Library
Welcome to week four of CS50!
For the past few weeks, we've been using the CS50 library as a form of "training wheels" for programming in C.
Read more at https://manual.cs50.io/#cs50.h
This library simplifies tasks by providing functions like `get_string` and `get_int`, and you've been using it by including `cs50.h` at the top of your programs.
When compiling with `clang` directly, the library also has to be linked with `-lcs50`, but `make` has automated all of this for you.
Today, we’ll start taking those training wheels off, diving into the mechanics of computers and memory.
While C may seem complicated at first, there are only a few key concepts you need to understand to unlock far more sophisticated and exciting problems.
Let’s begin by "relearning how to count."
B) Counting in Hexadecimal
Think about your computer’s memory as a grid of bytes, each with its own address.
While humans typically count in decimal (base-10) and computers store everything in [[binary]] (base-2),
computer scientists often write values in [[hexadecimal]] (base-16).
Hexadecimal allows us to represent values using 16 symbols: 0-9 followed by A-F.
For example:
- Decimal `0`-`9` are the same in hexadecimal.
- Decimal `10` becomes `A`, `11` becomes `B`, and so on up to `F`, which equals decimal `15`.
Hexadecimal simplifies memory addressing because each digit represents four bits (a "nibble").
This means you can efficiently represent a byte (eight bits) with just two hexadecimal digits.
For example:
- Binary `11111111` (eight 1's) equals decimal `255` and hexadecimal `FF`.
With hexadecimal, the bytes in our memory grid can be labeled with addresses such as `0x0`, `0x1`, `0x2`, and so on.
Note: By convention, hexadecimal numbers are often prefixed with `0x` to distinguish them from decimal numbers. For example, `0xFF` clearly indicates a hexadecimal value.
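To make the two-digits-per-byte idea concrete, here is a minimal sketch that formats one byte as hexadecimal using the standard `%02X` conversion. The helper name `byte_to_hex` is made up for this note.

```c
#include <stdio.h>

// Hypothetical helper: writes the two hex digits of `byte` into `out`.
// `out` needs room for 3 chars: two digits plus the terminating '\0'.
void byte_to_hex(unsigned char byte, char out[3])
{
    // %02X prints at least two uppercase hex digits, padding with a zero.
    snprintf(out, 3, "%02X", (unsigned int) byte);
}
```

Calling `byte_to_hex(255, buf)` leaves `FF` in `buf`, and `byte_to_hex(10, buf)` leaves `0A`: one hex digit per four bits.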
B.1) Example: Hexadecimal in RGB Colors
If you've worked with graphics software or web design, you've likely encountered hexadecimal in color codes.
For instance:
- `#000000` represents black (no red, green, or blue).
- `#FFFFFF` represents white (full red, green, and blue).
- `#FF0000` is pure red, `#00FF00` is green, and `#0000FF` is blue.
This system allows programmers and artists to represent colors efficiently using hexadecimal values for red, green, and blue intensities.
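As a rough sketch of how such a color code breaks down, the functions below pull the red, green, and blue bytes out of a 24-bit value with shifts and masks. The function names are hypothetical, chosen only for this example.

```c
// Hypothetical helpers: each returns one channel (0-255) of a 24-bit
// color written as 0xRRGGBB.
unsigned int red_of(unsigned int color)
{
    return (color >> 16) & 0xFF; // first two hex digits
}

unsigned int green_of(unsigned int color)
{
    return (color >> 8) & 0xFF;  // middle two hex digits
}

unsigned int blue_of(unsigned int color)
{
    return color & 0xFF;         // last two hex digits
}
```

For `0xFF0000`, `red_of` gives 255 while `green_of` and `blue_of` give 0, matching "pure red".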
C) Computer Memory Addresses Explained
C.1) Exploring Memory with Code: using the "address-of" operator (&)
Let’s see how these concepts apply to your computer's memory.
Consider this simple C program:
#include <stdio.h>
int main(void) {
int n = 50;
printf("%i\n", n);
}
This code declares an integer variable `n` with a value of `50` and prints it.
When executed, this value is stored in a specific memory location.
To see where `n` resides in memory, we can modify the program to print its address:
#include <stdio.h>
int main(void) {
int n = 50;
printf("%p\n", &n); // Print the address of n
}
Here, `&` is the [[address-of operator]], which tells C to retrieve the memory address of `n`.
When compiled and run, this program outputs something like `0x7ffd80792f7c`.
This hexadecimal value is the memory address where `n` is stored.
C.2) Introducing Pointers - a variable that stores memory addresses ( * )
A [[pointer]] is a variable that stores the address of another variable.
In C, pointers are declared using the `*` symbol, like this:
int *p = &n; // Pointer p stores the address of n
Here's what's happening:
- `int *p` declares `p` as a pointer to an integer.
- `&n` retrieves the address of `n`.
- `p` now holds the memory address of `n`.
We can see that address again by printing the [[pointer]] itself with `%p`.
The two operators undo each other: `*&n` takes the address of `n` and then goes straight back to the value stored there, so it is the same as `n`.
To access the value stored at the address `p` points to, use the dereference operator (`*`):
printf("%i\n", *p); // Prints the value at the address stored in p
C.3) Visualizing Pointers in the Memory Grid
Imagine your computer's memory as a grid:
- `n` is stored at address `0x123`, with a value of `50`.
- `p` is stored elsewhere and holds the value `0x123` (the address of `n`).
Instead of focusing on raw addresses, we often abstract this relationship as `p` pointing to `n`.
This abstraction simplifies thinking about pointers without worrying about specific addresses.
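Because `p` points to `n`, dereferencing works in both directions: you can read through the pointer and also write through it. A minimal sketch, with the function name `write_through_pointer` invented for illustration:

```c
#include <stdio.h>

// Hypothetical example: changes n without ever naming n in the assignment.
int write_through_pointer(void)
{
    int n = 50;
    int *p = &n;       // p holds the address of n

    *p = 51;           // go to that address and store 51 there

    printf("%i\n", n); // n itself changed: *p and n are the same int
    return n;
}
```

The function returns 51, since the assignment through `*p` landed in the memory that belongs to `n`.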
D) Strings Are Pointers
Strings in C are [[arrays]] of [[characters]] stored in contiguous memory, ending with a special `\0`
([[null terminator (NUL)]]) to indicate the end.
For example, the string `"HI!"` is stored as an [[array]] of four characters: `'H'`, `'I'`, `'!'`, and `'\0'`,
which is equivalent to having four [[bytes]], each with its own contiguous [[memory addresses]] (say `0x123`, `0x124`, `0x125`, and `0x126`).
When you declare a string like this:
string s = "HI!";
Internally, `s` is just a [[pointer]] to the first character (`'H'`) at address `0x123`.
This means that the variable `s` holds the address `0x123`,
and you can traverse the rest of the string by accessing subsequent memory locations.
All we need to be careful about is finding the [[null terminator (NUL)]], which tells us that the string has ended.
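Walking from the first character until the null terminator is how a string's length can be found. The sketch below is a hand-written stand-in for the standard `strlen`; the name `count_chars` is hypothetical.

```c
#include <stddef.h>

// Hypothetical re-implementation of strlen, for illustration only.
// Counts characters starting at s until the '\0' terminator is reached.
size_t count_chars(const char *s)
{
    size_t length = 0;
    while (s[length] != '\0')
    {
        length++;
    }
    return length; // the terminator itself is not counted
}
```

For `"HI!"` this returns 3: the loop stops at the fourth byte, which holds `\0`.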
D.1) Example: Removing the Training Wheels and Creating Strings as Pointers
Until now, the CS50 library has abstracted [[strings]] for you by introducing the `string` data type.
However, in raw C, there's no `string` type; it's actually a `char *` (a pointer to a character).
For example:
char *s = "HI!";
The CS50 library uses `typedef` to create `string` as a synonym for `char *` (in `cs50.h`: `typedef char *string;`).
This simplification makes it easier to work with strings early in the course but isn’t necessary as you progress and understand pointers more deeply.
- Hexadecimal is a base-16 system used for memory addressing and color representation.
- Pointers store addresses of variables and allow you to directly manipulate memory.
- Strings in C are pointers to the first character of a null-terminated array of characters.
Understanding these building blocks will allow you to write more sophisticated programs and manipulate data at a lower level, giving you greater control over how your code interacts with the machine.
D.2) Using Pointer Arithmetic with Strings
Recap: Strings in C are technically [[pointers]] to the first character of a null-terminated sequence of characters.
For example, `"HI!"` is stored contiguously in memory, and the variable `s` holds the address of the first character (`'H'`).
Using `*s` dereferences the pointer, allowing you to access the value (`'H'`) stored at the address.
Pointer arithmetic lets you move through memory:
- `*(s + 1)` retrieves the second character (`'I'`).
- `*(s + 2)` retrieves the third character (`'!'`).
This mechanism underlies the common square bracket notation: `s[1]` is syntactic sugar for `*(s + 1)`.
Pointer arithmetic allows you to traverse memory by directly manipulating the pointer:
char *s = "HI!";
printf("%c\n", *s); // Outputs: H
printf("%c\n", *(s + 1)); // Outputs: I
printf("%c\n", *(s + 2)); // Outputs: !
This behavior highlights that strings in C are addresses, and the square bracket notation is simply a user-friendly abstraction.
- For `"HI!"`, `*(s + 3)` is still valid: it reads the null terminator (`\0`). Going further (e.g., `*(s + 4)` and beyond) reads memory that is not part of the string, which is undefined behavior: you may see garbage values or get a segmentation fault.
- A segmentation fault occurs when you attempt to access memory outside the bounds allocated to your program.
D.2.1) Note: Comparing Strings (understanding the strcmp function)
Comparing strings with `==` doesn't check their content; it compares their memory addresses.
Two strings with the same characters are usually stored at different addresses, so `==` reports them as different.
To compare the actual characters, we can use a loop to compare each character, but it is easier to just `#include <string.h>` and use the `strcmp` function:
#include <string.h>
if (strcmp(s, t) == 0) {
printf("Strings are the same\n");
} else {
printf("Strings are different\n");
}
`strcmp` compares strings lexicographically:
- Returns `0` if the strings are the same.
- Returns a negative value if the first string is "less than" the second.
- Returns a positive value if the first string is "greater than" the second.
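For comparison, the loop-based approach mentioned earlier might look like the sketch below. It only answers "same or different" and is not how `strcmp` is actually implemented; the name `same_text` is hypothetical.

```c
#include <stdbool.h>

// Hypothetical helper: true when s and t contain the same characters.
bool same_text(const char *s, const char *t)
{
    int i = 0;
    // Advance while both strings agree and s has not ended.
    while (s[i] != '\0' && s[i] == t[i])
    {
        i++;
    }
    // Equal only if both strings ended at the same position.
    return s[i] == t[i];
}
```

Unlike `s == t`, this returns true for two separate copies of `"HI!"` stored at different addresses.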
D.3) Copying Strings using malloc
Using `=` to assign one string to another copies the address, not the contents.
To copy the contents, we need the function `malloc`, which is declared in `stdlib.h`.
`malloc` allows us to allocate memory for the new string. Then, with a `for` loop, we go through each of the characters in `s` and copy it into `t`, after which we can capitalize the first letter of `t` without changing `s`.
Note: to prevent bugs when the input has no characters, use an `if` to check the length before touching `t[0]`.
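A minimal sketch of that copy-then-capitalize idea follows. The function name `copy_capitalized` is an assumption for this note, and the caller is responsible for calling `free` on the result.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a heap-allocated copy of s with its first
// letter capitalized, or NULL if malloc fails. The caller must free it.
char *copy_capitalized(const char *s)
{
    size_t n = strlen(s);
    char *t = malloc(n + 1); // +1 for the '\0' terminator
    if (t == NULL)
    {
        return NULL;
    }

    // i <= n so that the terminator is copied as well
    for (size_t i = 0; i <= n; i++)
    {
        t[i] = s[i];
    }

    // Only touch t[0] when the string is not empty
    if (n > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;
}
```

Because `t` lives in its own block of memory, capitalizing it leaves the original `s` untouched.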
So, in general, to copy a string:
- Allocate memory for the new string using `malloc`.
- Check that you don't get a [[null pointer (NULL)]].
- Copy the characters, including the [[null terminator (NUL)]] (`\0`), using a loop or `strcpy`:

      char *t = malloc(strlen(s) + 1); // Allocate memory for the new string
      if (t == NULL)                   // malloc returns NULL when it fails
      {
          printf("Memory allocation failed\n");
          return 1;
      }
      strcpy(t, s);                    // Copy the string
This creates an independent copy of the string in memory.
strcpy
`strcpy` simplifies string copying:
char *t = malloc(strlen(s) + 1);
if (t != NULL) {
    strcpy(t, s);
}
`strcpy` automatically handles the loop and null terminator for you.
NULL
`NULL` (all caps) is a special pointer value that represents "no address."
- It's different from `\0` (the null character) used to terminate strings.
- Always check the result of `malloc`:
if (t == NULL) {
printf("Memory allocation failed\n");
return 1;
}
malloc
`malloc` dynamically allocates memory:
- Syntax: `void *malloc(size_t size)`
- `size_t size`: Number of bytes to allocate.
- Returns the address of the allocated memory or `NULL` if the allocation fails.
- Always add 1 to account for the null terminator when copying strings.
Example:
char *s = "HI!";
char *t = malloc(strlen(s) + 1);
if (t != NULL) {
strcpy(t, s);
}
D.3.1) Releasing Memory with `free` after using `malloc`
When using `malloc`, you must release the memory with `free` to avoid memory leaks.
Failure to call `free` can cause your program to consume increasing amounts of memory over time, leading to crashes or performance issues.
Every time you call `malloc`, make sure a matching `free` runs once the memory is no longer needed, at the latest before your program ends.
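The allocate, use, free pattern can be sketched as follows. The function `sum_with_heap_array` is a made-up example, not something from the lecture.

```c
#include <stdlib.h>

// Hypothetical example: stores 1..count in a heap array and sums it.
// Returns -1 if count is not positive or if malloc fails.
int sum_with_heap_array(int count)
{
    if (count <= 0)
    {
        return -1;
    }

    int *numbers = malloc((size_t) count * sizeof(int));
    if (numbers == NULL)
    {
        return -1;
    }

    int total = 0;
    for (int i = 0; i < count; i++)
    {
        numbers[i] = i + 1;
        total += numbers[i];
    }

    free(numbers);  // one free for the one malloc above
    numbers = NULL; // the old address must not be used again
    return total;
}
```

Every path that reaches `malloc` successfully also reaches `free`, so a tool such as Valgrind should report no leak for this function.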
D.3.2) Note: CS50's `get_string` (it uses `malloc`)
- The `get_string` function internally uses `malloc` to allocate enough memory to store user input.
- The CS50 library automatically frees the allocated memory at the end of the program, but when you use `malloc`, you must manage memory manually.
- Strings are pointers to the first character in a null-terminated sequence.
- Use `strcmp` to compare strings by content, not addresses.
- Use `malloc` to allocate memory for string copies and `free` to release it.
- Understand the difference between `\0` (null character) and `NULL` (null pointer).
- Avoid segmentation faults by staying within bounds, and avoid memory leaks by freeing what you allocate.
By mastering pointers, memory allocation, and string operations, you gain control over low-level memory management, enabling more efficient and powerful C programming.
E) Debugging Memory-Related Bugs with Valgrind (Proper Pointer Usage)
E.1) What is Valgrind?
[[Valgrind]] is a powerful tool for detecting memory-related bugs in C programs.
It helps identify issues such as:
- Invalid memory accesses: Reading or writing memory that hasn't been allocated or is out of bounds.
- Memory leaks: Forgetting to free allocated memory, leading to gradual depletion of available memory.
To run Valgrind:
valgrind ./program_name
It outputs detailed information about memory usage, errors, and leaks.
While the output can seem overwhelming, focus on:
- Invalid reads/writes: Indicates out-of-bounds memory access.
- Memory leaks: Reports memory that was allocated but not freed.
E.1.1) Example: Troubleshooting with Valgrind
Code with Mistakes:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
char *s = malloc(3); // Mistake: Only 3 bytes allocated for 4 needed
s[0] = 'H';
s[1] = 'I';
s[2] = '!';
s[3] = '\0'; // Writing out of bounds
printf("%s\n", s);
// Missing free(s); // Memory is not freed
return 0;
}
Errors in the Code:
- Allocating only 3 bytes when 4 are needed.
- Writing to the 4th byte without allocating enough memory.
- Not freeing memory allocated with `malloc`.
Notice the error is not obvious, since the program compiles and prints the expected output.
Run Valgrind (note: you can use help50 with [[valgrind]] to get some help).
The output looks overwhelming at first, so pay attention to these lines:
==12345== Invalid write of size 1
==12345== at 0x4005FD: main (memory.c:10)
==12345== Address 0x5203040 is 0 bytes after a block of size 3 alloc'd
==12345== Invalid read of size 1
==12345== at 0x40063B: main (memory.c:11)
==12345== Address 0x5203043 is 0 bytes after a block of size 3 alloc'd
==12345== LEAK SUMMARY:
==12345== definitely lost: 3 bytes in 1 blocks
Key Insights - Valgrind Report:
- Invalid Write: Writing to the 4th byte (`s[3]`) is not allowed because only 3 bytes were allocated.
- Invalid Read: Printing the string with `printf` reads that same 4th byte, which lies outside the allocated block.
- Memory Leak: `malloc` allocated 3 bytes, but `free` was not called.
Fixing the Issues:
- Allocate Enough Memory:
  - Correctly account for the null terminator (`\0`): `char *s = malloc(4);`
- Free Allocated Memory:
  - Call `free(s)` when the memory is no longer needed: `free(s);`
Corrected Code:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
char *s = malloc(4); // Allocate 4 bytes for "HI!\0"
if (s == NULL) {
printf("Memory allocation failed\n");
return 1;
}
s[0] = 'H';
s[1] = 'I';
s[2] = '!';
s[3] = '\0'; // Properly terminate the string
printf("%s\n", s);
free(s); // Free allocated memory
return 0;
}
Running Valgrind on the corrected program confirms we did a good job: it no longer reports invalid reads, invalid writes, or leaks.
E.1.2) Example: Proper Pointer Initialization - use `sizeof()` with `malloc()` and avoid Garbage Values
This program has a logical flaw because of improper pointer usage, particularly involving the variable `y`.
Code Walkthrough:
- Pointer Declarations:
  - `int *x;` and `int *y;` are declared, which are pointers to integers.
- Memory Allocation: `x = malloc(sizeof(int));`
  - Dynamic memory is allocated for `x`, enough to store an `int`. At this point, `x` is a valid pointer.
- Assigning Values via Dereferencing: `*x = 42;`
  - The value `42` is stored in the memory location allocated to `x`.
- Undefined Behavior - Dereferencing an Uninitialized Pointer (`y`): `*y = 13;`
  - `y` has not been initialized or allocated memory. Attempting to dereference it (i.e., `*y = 13`) leads to undefined behavior. This could cause:
    - A segmentation fault.
    - Corruption of memory or crashing the program.
Garbage Values and Uninitialized Pointers
- When you declare a pointer without initializing it, it may contain a garbage value, which could point to an invalid memory address.
- Example of dangerous behavior:

      int *y;  // Uninitialized pointer.
      *y = 13; // Undefined behavior: dereferencing an invalid address.

- This can cause a segmentation fault, as you're accessing memory that does not belong to your program.
Memory Reuse and Garbage Values
- Memory from freed pointers or uninitialized variables might contain garbage values:
- Freed memory is not cleared; it's marked as reusable but still contains old data.
- Always assume uninitialized memory contains random values until explicitly set.
How to Fix It: Properly allocate memory for `y` before dereferencing.
Code:
#include <stdlib.h>
int main(void)
{
    int *x;
    int *y;
    x = malloc(sizeof(int));
    if (x == NULL) return 1; // Handle malloc failure
    y = malloc(sizeof(int)); // Allocate memory for y
    if (y == NULL)
    {
        free(x); // Don't leak x when the second allocation fails
        return 1;
    }
    *x = 42;
    *y = 13;
    free(x); // Free memory to avoid leaks
    free(y);
    return 0;
}
Memory Management: Always free memory allocated with `malloc` to avoid memory leaks:
```c
free(x); // Releases memory allocated to x.
x = NULL; // Prevents dangling pointer issues.
```
> [!summary] Note: Takeaways from Binky
> - **Pointers need pointees**: A pointer must point to a valid memory location before being used.
>- **Pointers must point to valid memory before dereferencing**.
>- Using an uninitialized pointer (like `int *y`) leads to undefined behavior.
>- Allocate memory before dereferencing pointers:
>
>```c
> int *x = malloc(sizeof(int)); // Allocate memory for an integer
> *x = 42; // Assign a value
> free(x); // Free memory
> ```
>
>
>By understanding and applying these practices, you can effectively manage memory in C, reducing errors and improving program stability.
- - - -
#### E.1.3) Summary: Key Concepts and Best Practices for Using Pointers
1. **`malloc` and `free`:**
- `malloc` allocates memory, and you must pair it with `free` to release the memory.
- Example:
```c
char *ptr = malloc(10); // Allocate 10 bytes
free(ptr); // Release memory
```
2. **Invalid Reads/Writes:**
- Writing outside the bounds of allocated memory is an invalid write.
- Reading unallocated memory is an invalid read.
3. **Memory Leaks:**
- Forgetting to call `free` leads to memory leaks, where the program holds onto memory it no longer uses.
**Best Practices for Using Pointers and Memory:**
1. **Always check `malloc` return value**:
```c
char *ptr = malloc(size);
if (ptr == NULL) {
// Handle memory allocation failure
}
```
2. **Initialize pointers**:
- Avoid garbage values by setting pointers to `NULL` or valid memory.
- Example:
```c
int *ptr = NULL; // Initialize pointer
```
3. **Avoid dangling pointers**:
- After freeing memory, set the pointer to `NULL`:
```c
free(ptr);
ptr = NULL;
```
4. **Use tools like Valgrind**:
- Regularly run Valgrind to check for memory leaks or invalid accesses:
```bash
valgrind ./program_name
```
- - -
#### E.1.4) Example: Uninitialized Variables and Garbage Values with Arrays
- **Issue**: Using uninitialized variables (e.g., an array) leads to unpredictable values, known as _garbage values_.
**Example**: printing garbage values
```c
int scores[3];
for (int i = 0; i < 3; i++) {
printf("%d\n", scores[i]); // Prints garbage values
}
```

- **Reason**: Memory is not automatically cleared when allocated. It retains whatever was previously stored there.
**Fix**: Always initialize variables before use:
```c
int scores[3] = {0}; // All elements set to 0
```
F) Example: Swapping Values with Temporary Variables and Pointers
- Swapping values requires a temporary variable:

      int temp = a;
      a = b;
      b = temp;

- When swapping values in functions, remember:
  - [[Pass-by-value]]: Copies values, so changes inside the function do not affect the original variables.
  - [[Pass-by-reference]] (via pointers): Allows changes to the original variables by passing their addresses.
Example Using Pointers: pass addresses to the function, not just values
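#include <stdio.h>
void swap(int *a, int *b) {
    int temp = *a; // Remember the value a points to
    *a = *b;       // Overwrite it with the value b points to
    *b = temp;     // Store the remembered value where b points
}
int main(void) {
    int x = 1, y = 2;
    swap(&x, &y);  // Pass addresses, not values
    printf("x: %d, y: %d\n", x, y); // Outputs: x: 2, y: 1
}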
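For contrast, here is a sketch of the pass-by-value version, which does not work. The name `swap_copies` is hypothetical; it exists only to show that the function operates on copies.

```c
// Hypothetical counter-example: a and b are copies of the caller's values.
void swap_copies(int a, int b)
{
    int temp = a;
    a = b;    // changes only the local copy
    b = temp; // changes only the local copy
    // When the function returns, its copies disappear with its stack frame.
}
```

After `int x = 1, y = 2; swap_copies(x, y);` the caller still has `x == 1` and `y == 2`, which is why the pointer version is needed.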
G) Memory Layout: understanding Heap and Stack
- [[Heap]]: Memory dynamically allocated using `malloc`. Memory is manually managed and must be freed with `free`.
- [[Stack]]: Automatically managed memory for function calls, including local variables, the `main` function, etc.
- Key Concepts:
  - The stack grows downward (towards lower memory addresses).
  - The heap grows upward (towards higher memory addresses).
  - Stack overflow occurs when too many function calls are made, consuming all available stack space.
G.1) Recursive Functions and Stack Overflows
- Recursive functions call themselves and use stack memory for each call.
- Base Case: Prevents infinite recursion.
- Risk: Excessive recursion can lead to [[stack overflow]].
Example:
#include <stdio.h>
void draw(int height) {
if (height <= 0) return; // Base case: without it the recursion never stops and the stack overflows
draw(height - 1); // recursion
for (int i = 0; i < height; i++) {
printf("#");
}
printf("\n");
}
G.2) Buffer Overflows
- Definition: Accessing memory outside the allocated boundaries of an array or [[buffer]].
Example:
char buffer[5];
strcpy(buffer, "Hello, world!"); // Overflows buffer
- Implications:
  - [[Buffer overflow]] can corrupt memory or cause program crashes.
  - Exploitable in security vulnerabilities.
- Prevention:
  - Use bounded functions such as `strncpy` or `snprintf` for strings, and make sure the result is still null-terminated (`strncpy` does not add `\0` when the source fills the whole buffer).
  - Always validate input sizes before writing to buffers.
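One way to validate sizes before writing is to compare the source length with the buffer size and refuse to copy when it does not fit. This is a sketch under that assumption; `copy_checked` is a hypothetical helper, not a standard function.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: copies src into dest only if it fits, terminator
// included. On failure dest is left as an empty string.
bool copy_checked(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
    {
        return false; // no room even for '\0'
    }

    size_t needed = strlen(src) + 1; // characters plus '\0'
    if (needed > dest_size)
    {
        dest[0] = '\0';
        return false;
    }

    memcpy(dest, src, needed);
    return true;
}
```

With `char buffer[5];`, copying `"Hello, world!"` is rejected instead of overflowing, while `"Hi!"` is copied normally.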
G.3) Key Takeaways for Robust Memory Management
- Always initialize variables: Avoid garbage values.
- Manage memory explicitly: Free memory allocated with `malloc`.
- Use pointers carefully: Ensure they are initialized before dereferencing.
- Test thoroughly: Use tools like Valgrind to catch hidden bugs.
- Design cautiously:
  - Limit recursion depth to avoid stack overflow.
  - Validate inputs to prevent buffer overflows.
- Debug systematically:
  - Use `printf` for quick checks.
  - Leverage debuggers and tools like Valgrind for deeper issues.
By applying these principles, you can write safer, more efficient C programs while navigating the complexities of memory management.
H) Transitioning from Training Wheels (Advanced C Concepts)
- CS50 Library:
  - Abstracts complex operations for simplicity (e.g., `get_int`, `get_string`).
  - Prevents common errors (e.g., invalid input handling).
- Moving Forward:
  - Rely less on CS50's abstractions.
  - Use standard C functions (`scanf`, `fopen`, `fread`, `fwrite`, etc.).
H.1) Dangers of scanf
- Problem: `scanf` is not error-robust.
  - Fails silently when input doesn't match the expected format.
  - No built-in re-prompting for invalid input (unlike CS50's `get_int`).
Example:
int x;
printf("x: ");
scanf("%d", &x);
printf("x: %d\n", x);
- If the user types "cat", `scanf` stores nothing in `x` and returns `0`. Because `x` was never initialized, the program then prints a garbage value (often `0`), with no error message.
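The failure can be detected by checking the return value, which is the number of items successfully converted. The sketch below uses `sscanf` (same format rules and return convention as `scanf`, but reading from a string) so that it can be tried without typing input; `read_int_from` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: tries to parse an int from text.
// Returns true and stores the value in *out only when a number was found.
bool read_int_from(const char *text, int *out)
{
    // sscanf returns how many conversions succeeded: 1 here means %d matched.
    return sscanf(text, "%d", out) == 1;
}
```

With `"cat"` the function returns false and leaves `*out` untouched; with `"42"` it returns true and stores 42. The same `== 1` check applies to `scanf("%d", &x)`.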
H.2) Handling Strings with scanf
- Challenge: Requires pre-allocated memory for strings.
Example:
char s[50]; // Allocates 50 bytes on the stack
printf("Name: ");
scanf("%49s", s); // Reads at most 49 characters, leaving room for \0
- Risk: Without a width limit like `%49s` (that is, with a plain `%s`), input exceeding the allocated size causes a buffer overflow.
Alternative: Dynamic memory allocation:
char *s = malloc(100); // Allocates 100 bytes on the heap
if (s == NULL) return 1; // Handle malloc failure
scanf("%99s", s); // Reads at most 99 characters into the allocated space
free(s); // Free memory after use
I) File Manipulation with C
I.1) File Input/Output (Write and Read from Files) - FILE Data Type
- Key Functions:
  - `fopen`: Open a file.
  - `fprintf`: Write to a file.
  - `fread`: Read from a file.
  - `fclose`: Close a file.
Example: Writing to a File:
FILE *file = fopen("phonebook.csv", "a");
if (file == NULL) {
printf("Error opening file.\n");
return 1;
}
fprintf(file, "%s,%s\n", "David", "123-456-7890");
fclose(file);
Example: Reading from a File:
FILE *file = fopen("phonebook.csv", "r");
if (file == NULL) {
printf("Error opening file.\n");
return 1;
}
char name[50], phone[15];
while (fscanf(file, "%49[^,],%14s\n", name, phone) == 2) { // Stop at end of file or on a malformed line
printf("Name: %s, Phone: %s\n", name, phone);
}
fclose(file);
I.2) Working with Images
I.2.1) File Formats and Byte-Level Manipulation (Identify JPEG Files)
- Magic Numbers:
  - Specific sequences of bytes at the start of a file indicate its type.
  - Example: JPEG files begin with `0xFF`, `0xD8`, `0xFF`.
Reading File Headers:
unsigned char buffer[3];
// Only trust the buffer if all 3 bytes were actually read from the open file
if (fread(buffer, sizeof(buffer), 1, file) == 1 &&
    buffer[0] == 0xFF && buffer[1] == 0xD8 && buffer[2] == 0xFF) {
    printf("This is a JPEG file.\n");
}
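Put together with opening and closing the file, the header check might look like this sketch. The function name `looks_like_jpeg` is an assumption, and it tests only the three bytes listed above.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: true if the file at path starts with FF D8 FF.
bool looks_like_jpeg(const char *path)
{
    FILE *file = fopen(path, "rb"); // "b": read raw bytes
    if (file == NULL)
    {
        return false; // file missing or unreadable
    }

    unsigned char header[3];
    size_t items = fread(header, sizeof(header), 1, file);
    fclose(file);

    // items is 1 only when all three bytes were read
    return items == 1 &&
           header[0] == 0xFF && header[1] == 0xD8 && header[2] == 0xFF;
}
```

A file shorter than three bytes, or one that cannot be opened, is simply reported as "not a JPEG".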
I.2.2) Example: Copying Files (`cp` Implementation)
- Concept:
- Open a source file for reading.
- Open a destination file for writing.
- Read and write byte-by-byte.
Example:
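FILE *src = fopen("source.jpg", "rb"); // "b": binary mode, bytes are copied untouched
if (src == NULL) return 1;
FILE *dest = fopen("copy.jpg", "wb");
if (dest == NULL) {
    fclose(src); // Don't leave the source open on failure
    return 1;
}
unsigned char buffer[512];
size_t bytesRead;
// fread returns how many bytes it actually read; 0 means end of file
while ((bytesRead = fread(buffer, 1, sizeof(buffer), src)) > 0) {
    fwrite(buffer, 1, bytesRead, dest);
}
fclose(src);
fclose(dest);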
I.2.3) Example: Manipulating Image Files
- Bitmaps:
- Represent images as grids of pixels.
- Each pixel's color is defined by RGB (Red, Green, Blue) values.
Example: Grayscale Filter:
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
RGBTRIPLE pixel = image[i][j];
int grayscale = (pixel.rgbtRed + pixel.rgbtGreen + pixel.rgbtBlue) / 3;
pixel.rgbtRed = pixel.rgbtGreen = pixel.rgbtBlue = grayscale;
image[i][j] = pixel;
}
}
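Integer division in the loop above discards the fractional part, so an average of 254.67 becomes 254. If rounding to the nearest value is wanted, one option is sketched below. The `Pixel` struct is a simplified stand-in invented for this example, not the `RGBTRIPLE` type used above.

```c
// Hypothetical pixel type, standing in for RGBTRIPLE.
typedef struct
{
    unsigned char red;
    unsigned char green;
    unsigned char blue;
} Pixel;

// Returns a gray pixel whose channels are the rounded average of p's channels.
Pixel to_gray(Pixel p)
{
    int sum = p.red + p.green + p.blue;

    // Adding 1 before dividing by 3 rounds to the nearest integer:
    // a remainder of 2 rounds up, a remainder of 0 or 1 rounds down.
    unsigned char average = (unsigned char) ((sum + 1) / 3);

    Pixel gray = {average, average, average};
    return gray;
}
```

For the pixel (255, 255, 254) this gives 255 per channel, where plain `sum / 3` would give 254.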
I.3) Summary: Low Level C and File Manipulation
- From Abstractions to Low-Level Code:
- Transitioning from CS50 library functions to standard C requires handling memory and file operations directly.
- Memory Safety:
- Allocate sufficient space for user input and file reads/writes.
- Always free dynamically allocated memory.
- Practical Applications:
- File I/O introduces real-world programming capabilities (e.g., saving, copying, and editing files).
- Image manipulation builds foundational knowledge for graphics and multimedia programming.
By mastering these topics, you gain a deeper understanding of how programs interact with memory, files, and data at the byte level. These skills are essential for writing efficient, low-level code.