W2 - 👨🏫 Lecture - Compiling, Debugging, Arrays, Command Line Arguments and Exit Statuses in C
Video: https://youtu.be/4vU4aEFmTSo
Video: https://youtu.be/tI_tIZFyKBw
#CS50x #C #Computer_Science
A) Introduction to Memory, Compiling, and Cryptography in C
In this CS50 lecture, we’re diving into some fundamental concepts in programming, starting with memory and the compilation process. We’re also touching on cryptography, which shows the practical application of programming fundamentals.
Introduction to Memory and Compilation 💻💾
- Up until now, we’ve mostly covered basic building blocks, like loops, conditionals, and Boolean expressions. These form the foundation of any program.
- This week, we’re expanding on these basics and exploring applications, including memory management and compiling code. Understanding these lower-level concepts helps with debugging and problem-solving.
Cryptography as an Application of Programming 🔐
- Cryptography is the practice of securing information by transforming messages so that they are unreadable to unintended recipients.
- Programs that can encrypt and decrypt messages keep communication secure, even if someone intercepts a message during transmission.
B) Compilation Process in C 📝➡️💻
In C programming, compiling your code involves turning it into machine-readable instructions that your computer can execute.
We usually use the `make` command to handle this, just as we did in the previous lesson.
But it’s helpful to understand what’s happening behind the scenes:
- `make` is a tool that automates compilation but is not a compiler itself. It runs a compiler called [[Clang]] (short for "C language"), which is what actually converts your code into binary (0s and 1s) for the CPU.
- [[Clang]] is the program that performs the actual translation of your code, handling complex tasks that `make` simplifies for you.
B.1) Example: Compiling Manually with Clang (hello.c)
To understand how `make` works, let’s break down the steps of compiling manually:
- Write Your Program: First, write your code as usual, like in `hello.c`.
- Compile Using Clang:
  - Instead of running `make hello`, you can use Clang directly with `clang -o hello hello.c`.
  - Here, the [[command line argument]] `-o hello` tells Clang to name the executable `hello`. Without it, the compiler defaults the executable name to `a.out`.
- Adding Libraries:
  - If your code needs additional libraries (e.g., the CS50 library for `get_string`), add the `-lcs50` flag to your Clang command: `clang -o hello hello.c -lcs50`.
B.2) Why Use make? (Learn about Makefiles) 🛠️📄
- `make` simplifies everything by automatically generating the right compilation command, saving you from manually typing out all the options each time.
- However, understanding how to compile with Clang directly helps you troubleshoot and gives you more control when needed.
- Remember: in [[Clang]], compiler flags (like `-o` and `-l`) facilitate the compilation of more complex programs.
- `make` automatically includes such flags, but manually adding them with [[Clang]] is useful when you want finer control.
B.3) The Four Steps of Compilation
When `make` or Clang compiles code, it runs through four essential steps:
- Preprocessing
  - Preprocessing handles the lines that start with `#`. For example, each `#include` line is replaced with the contents of the header file it names.
- Compiling
  - Compiling translates the preprocessed C code into [[assembly language]], which is a lower-level language specific to each computer architecture.
- Assembling
  - The assembler converts the assembly language into machine code (0s and 1s) that the computer can understand.
- Linking
  - Linking combines machine code from multiple source files (like `hello.c`, `cs50.c`, and `stdio.c`) into one executable file.
  - Without linking, these pieces of machine code would remain disconnected.
Video: https://youtu.be/2YfM-HxQd_8
B.4) Note: Reverse Engineering and Decompilation 📝⬅️💻
Now that we know about the compiler, let me mention [[decompilation]], where [[machine code]] is converted back into [[source code]].
While technically possible, decompiling often produces code that’s challenging to interpret, lacking helpful names or comments.
This difficulty provides some protection for proprietary code, though languages like [[Python]] or [[JavaScript]] offer less protection, since their programs are usually distributed as readable source code.
Summary:
- Reverse Engineering: Converting compiled machine code back into source code is challenging because it lacks original variable names and function names, making it hard to interpret.
- Intellectual Property: Reverse engineering poses a risk to intellectual property for proprietary software. However, the difficulty of reading decompiled code can deter unauthorized reproduction.
C) Pro Tips when coding
C.1) Practical Use of Command Line Tools
Along with compiling, working in Linux introduces several essential command-line utilities.
Here’s a quick list of basic commands:
- `ls`: Lists files in the current directory.
- `cd`: Changes the current directory.
- `pwd`: Shows the present working directory.
- `mkdir`: Creates a new directory.
- `rm`: Deletes files. (For directories, use `rm -r`.)
- `cp`: Copies files, and with `-r`, copies entire directories.
- `mv`: Renames or moves files to a different directory.
C.2) Avoiding "Magic Numbers" in Code 🧙♂️✨7️⃣
"Magic numbers" are hard-coded constants in code that lack an intuitive or obvious meaning, like setting a max value of 23 for a variable without context.
Instead, it is recommended to use `#define` to replace such numbers with named constants (e.g., `#define MAX_HEIGHT 23`), making the code more readable and easier to maintain.
This helps avoid issues if someone wants to adjust that number later, as updating one line is simpler than tracking down every instance in the code.
______________________________________________________________
D) Debugging in CS50: Techniques and Tools 💻 🐛
Why Debugging is Essential
Debugging is integral to programming, as code rarely works perfectly on the first try. Even experienced programmers frequently encounter bugs, especially when adding new features. Therefore, learning and applying debugging techniques to eliminate errors is crucial for effective programming.
D.1) Historical Context of "Bugs"
The term “bug” in computing, popularized by Admiral Grace Hopper,
dates back to a literal bug—a moth—that got stuck in a relay in the Harvard Mark II computer, disrupting its function.
This story has since symbolized the common presence of bugs in programming.
D.2) Example: Debugging Techniques and Tools (buggy.c)
To illustrate debugging, let's write a basic program (`buggy.c`) that should print a vertical column of three hashes (#) but contains a bug causing it to print four hashes instead.
This example sets up a context to introduce effective debugging tools, like:
- Printf: Useful for quick, simple checks.
- Debug50: Ideal for stepping through complex code and examining logic.
- Rubber Duck Debugging: Helps clarify thinking and locate logical errors.
Learning to use these tools effectively can save you time and effort, making debugging less frustrating and more systematic.
D.2.1) Using printf()
- Description: `printf` statements allow you to print the value of variables to track what’s happening in each part of the program.
- Example: By adding `printf("i is %i\n", i);` within the loop, the program will output the value of `i` at each iteration, helping pinpoint why an extra hash is printed.
- Limitations: Overusing `printf` can clutter code and requires repeated recompilation. When debugging becomes trial and error, it’s better to switch to more advanced tools.
D.2.2) Using debug50 👨💻
- What is a Debugger? A debugger allows you to examine the flow of your program, track variable values in real time, and identify logical errors. CS50’s `debug50` is a command for launching a graphical debugger in the VS Code environment.
- Setting a Breakpoint: By setting breakpoints (clicking next to line numbers), you can pause program execution at specific lines and examine variable states and control flow step-by-step.
- Variables and Call Stack: Once the debugger is running, a panel on the left-hand side shows variable values and the sequence of function calls. For now, we will focus on the debugger controls and monitor the variable `h`.
- Debugger Controls:
  - Step Over: Executes the current line without going into function calls.
  - Step Into: Moves into a function’s code to examine its inner workings.
- Example: By stepping through the code line by line, it’s possible to see why the `i <= 3` condition allows an extra loop iteration. The solution is to change the condition to `i < 3`.
Example: https://youtu.be/tk3cl8hyfqM?si=b9PIdxFWOY3htqYH
Spoiler (takeaway message): in C, you cannot compare two strings with `==`. Compare them with `strcmp()` instead; if the strings are equal, `strcmp()` returns 0.
https://manual.cs50.io/3/strcmp
D.2.3) Using the Rubber Duck 🐤🦆
- Concept: Talking through your code line-by-line to an inanimate object, such as a rubber duck, can help you identify errors in your logic.
- Process: Explaining each part of your code forces you to think critically about its behavior, often helping you catch mistakes.
- CS50 Duck: CS50 has its own digital duck at `cs50.ai` or in VS Code’s pane at `cs50.dev`, where students can ask questions about programming concepts or get code insights.
_______________________________________________________________
E) Intro to Data Types, Memory, and Arrays in C 💾🧠
In programming, we rely on data types to store different kinds of data in ways the computer understands, like integers, characters, or even entire strings.
Last week, we introduced a few of these types, but today, we’ll dive deeper to understand more precisely how data is represented at the hardware level and the limitations that come with it.
E.1) Data Types and Their Memory Limits
Recall that, at the end of the day, everything in a computer is just binary: a series of 0s and 1s. How the computer interprets a specific sequence of bits depends on the context we set for it—whether it’s an integer, a character, or even a color in an image.
This is determined by the data type we specify. But each of these types has a finite amount of space it can occupy:
For example:
- int: 4 bytes (32 bits), which can hold up to roughly ±2 billion.
- long: 8 bytes, allowing for a much larger range of integers.
- bool: Only needs 1 bit (0 or 1), but typically occupies a full byte to simplify processing.
- char: 1 byte per character.
- float: 4 bytes.
- double: 8 bytes for higher precision.
Each type only has a set amount of space in memory, which sometimes affects how much information can be stored or processed accurately.
E.2) Understanding Memory and Addresses
Let’s take a closer look at how memory actually works.
Inside your computer, memory is essentially a large grid of bytes, each with its own unique address or "location."
Each byte, being the smallest unit, can store part of a data type:
- A single `char` would occupy 1 byte,
- while an `int` might take up 4 bytes in a row.
- Each variable occupies a specific number of bytes and could theoretically be stored anywhere within this grid.
When we store variables in memory, the computer organizes them efficiently, often placing them contiguously (back-to-back) if there’s enough space available.
E.3) Example: Calculating an Average Score (scores.c) 💯🏆
Let’s see this in action with a basic example that calculates the average of three test scores.
Imagine a program where you declare three integer variables `score1`, `score2`, and `score3` and assign them values like 72, 73, and 33.
Here’s what the code might look like:
int score1 = 72;
int score2 = 73;
int score3 = 33;
Next, to calculate the average, you would sum the three scores and divide by 3.
However, if you do this with integer math, you’ll only get an integer result (no decimal points), which isn’t ideal for an average.
One quick way to fix this is by using a float in the calculation:
Now the division includes a decimal, so you’ll get a floating-point result, giving you the full precision of the average score.
E.3.1) Understanding Memory Allocation (integers example)
To break down what’s happening in memory when we store these variables,
let’s visualize it as a grid or a “canvas” of memory.
This grid helps illustrate where data might be placed, though in practice, the exact location is handled by the computer.
For illustration, let’s place our variables at the top-left corner of this canvas.
Since an integer takes up 4 bytes, each of these numbers occupies 4 “squares” on our grid. So, score1 would fill the first 4 bytes, score2 the next 4, and score3 the following 4.
By convention, the computer often stores related variables close together (contiguously), which makes it easier for the CPU to retrieve them when needed.
However, in cases of heavy usage, memory could become fragmented, meaning values might get stored in separate, non-contiguous locations—something we’ll explore later.
Now, let’s look inside these memory slots. Each integer like 72 or 73 is stored as a pattern of 0s and 1s. Though we typically see these as whole numbers, the computer reads them in binary.
E.3.2) Moving Toward a Better Solution: Introducing Arrays 🔗🔟
This approach, with individually named variables like `score1`, `score2`, and `score3`, technically works, but it doesn’t scale well.
Imagine if we needed to store many more scores: `score4`, `score5`, and so on. We’d quickly end up with a long list of nearly identical variables, creating messy, repetitive code. It’s not only inefficient but also prone to errors from simple typos or copy-paste mistakes.
To manage this better, we can use an [[array]], which lets us store multiple values under one variable name.
An [[array]] is a sequence of values stored back to back in one chunk of memory, with no gaps and no fragmentation, laid out from left to right, top to bottom on our grid.
So, instead of separate variables, an [[array]] like `int scores[3];` allows us to store all three scores in one place, accessible by [[index]]: `scores[0]`, `scores[1]`, and `scores[2]`.
Just like before, these integers are stored in memory back to back.
Note: Indexing starts at 0, a standard in C and many other languages, so the valid indices of `scores` are 0 through 2. If you try accessing `scores[3]`, you’d be accessing an area beyond the array’s allocated memory, which can lead to errors.
This setup allows us to reference each score through scores[index]
, which keeps our code organized and reduces redundancy. It also becomes much easier to scale and manage, especially if the number of scores grows.
Arrays, therefore, are a more structured and efficient solution for storing multiple values of the same type, setting us up for cleaner, more manageable code as we scale.
We could declare an array like this too:
E.3.3) Improving Code with Loops and Arrays 🔄🔗
Hardcoding each assignment separately is still a bit repetitive. Since we’re essentially doing the same thing three times,
we can make it dynamic with a loop:
Here, we use a loop to prompt the user for three scores. Each input is stored in the respective index of `scores`, making the code more efficient and easier to modify.
E.4) Addressing Design Issues and using Functions with Arrays
We can improve our code further by avoiding repeated numbers. For example, the length of the array (`3` in this case) appears in multiple places, which could lead to errors if we want to adjust it later.
To handle this better, let’s store this size in a constant:
#define NUM_SCORES 3
int scores[NUM_SCORES];
By defining the array size as a constant (`NUM_SCORES`), we can adjust the value in one place, ensuring that any reference to the array size throughout the program remains consistent.
Also, if we want to calculate the average, we can write a function that takes in the array and its length:
float average(int array[], int length)
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }
    // Cast to float so the division keeps its fractional part
    return sum / (float) length;
}
This function iterates over the array, sums up its elements, and calculates the average.
Notice that in C, we pass the array's length explicitly, because an array handed to a function does not carry its size with it and the function has no built-in way to look it up.
Full example:
________________________________________________________
F) Working with Characters and Arrays (hi.c) 🧵🔗🔤
Up until now, we’ve mostly worked with numbers. But wouldn’t it be nice to work with letters, words, or even full paragraphs?
That’s where [[strings]] come in, though we can first look at a simpler setup with individual characters to understand how characters are stored in memory.
Let's say we want to store individual characters like “H,” “I,” and “!”. We can use the char type, which represents a single character.
Here’s how we’d set this up:
char c1 = 'H';
char c2 = 'I';
char c3 = '!';
This setup is simple but becomes cumbersome if we have to deal with many characters.
Yet, this gives a glimpse of how characters are stored in memory. Each character, when stored, corresponds to a specific number due to the [[ASCII encoding]].
For instance:
- H corresponds to 72
- I corresponds to 73
- ! corresponds to 33
So if we were to print each character as an integer, we’d see 72, 73, and 33.
Just like numbers, this is how the characters are stored in memory: they are simply numbers, like the integers, which the computer in turn stores in binary.
F.1) Strings: A Collection of Characters and the Null Terminator 🧵🔚
A [[string]] in C is really an array of characters stored one after the other in memory.
Rather than defining each character individually, we can use an array structure to hold multiple characters together.
For example, the string “HI!” could be stored as a single variable (using the CS50 library) with `string s = "HI!";`.
This variable `s` automatically includes not only the characters H, I, and !, but also a special character at the end called the [[null terminator (NUL)]] (represented as `\0`), which tells the computer, “This is the end of the string.”
Note: The [[null terminator (NUL)]], `\0`, is a byte of all zeroes that indicates the end of a string in memory. Without it, the computer wouldn’t know where a string ends, which could lead it to read unintended parts of memory.
The backslash in `\0` distinguishes this zero-valued byte from the printable character `'0'` (which is 48 in ASCII). Its name in [[ASCII]] is [[NUL]].
F.2) Example: Exploring the Memory Layout of a String (using it as an Array)
When you declare a string like `string s = "HI!";`, it might look something like this in memory:
Index | Character |
---|---|
0 | H |
1 | I |
2 | ! |
3 | \0 |
Each character takes up one byte, and the [[null terminator (NUL)]] adds one more byte, making this string a total of 4 bytes.
Every [[string]] actually uses n + 1 bytes, where n is the length a human would care about (the number of readable characters). The one extra byte always holds the zero-valued [[null terminator (NUL)]] at the end.
Since a [[string]] is just an array of characters, we can access each character in `s` using square bracket notation, similar to how we access elements in an integer array:
We can even print them as the numbers they really are (including the [[null terminator (NUL)]]):
This approach gives us full control over each character in the string.
NULL vs. NUL
- `NULL`: A null pointer macro, used to indicate a pointer with no valid memory location.
- NUL or `'\0'`: A null terminator for strings, marking the end of a string in C.
Don’t confuse `NULL` (pointer) with NUL (`'\0'` for strings); they serve different purposes!
F.3) Example: Creating an Array of Strings (hi.c)
Let's start with a small example and use two strings, `s` and `t`, to store two different words.
In memory, each string is stored as its characters followed by a [[null terminator (NUL)]], which is how we know where each of the strings ends.
We can even create an [[array]] of [[strings]]. Here, `words` is an array of strings, where each string is itself an array of characters.
Note: You can think of it as a 2D array, so we can also print the words as individual characters.
Can you have an array with multiple different data types?
- Short answer, no; longer answer, sort of, but not in nearly the same user-friendly way as with languages like Python or JavaScript or others. So assume for now arrays should be the same type in C.
__________________________________________________________________
G) Exploring methods to find the String Length (length.c) 🧵📏
To start exploring the length of a string, let’s build a program called `length.c` that will count how many characters are in a user’s input.
First, we’ll count this manually, character by character, and then we’ll look into a more efficient way to do this using C’s `strlen` function.
G.1) Counting Characters Manually
To determine the length of a string ourselves, let’s prompt the user for their name, then count each character by moving through each position until we hit the null terminator (`\0`), which signals the end of the string.
#include <cs50.h>
#include <stdio.h>
int main(void)
{
    string name = get_string("Name: ");
    // Count each character in 'name' until reaching null terminator
    int n = 0;
    while (name[n] != '\0')
    {
        n++;
    }
    printf("Length: %i\n", n);
}
When run, if you type “David,” the program will output 5, since “David” has five characters. The while loop stops when it encounters the null terminator, ensuring we only count the characters in the name.
G.2) Creating a Helper Function: string_length
We can make this counting process reusable by creating a helper function. Instead of repeating this logic each time we need the length of a string, we can modularize it.
#include <cs50.h>
#include <stdio.h>
// Prototype: lets main call string_length before its definition
int string_length(string s);
int main(void)
{
    string name = get_string("Name: ");
    int length = string_length(name);
    printf("Length: %i\n", length);
}
// Function that returns the length of a string
int string_length(string s)
{
    int n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
Here, `string_length` takes a string as input and returns its length. Now, by calling `string_length(name)`, we get the length without embedding the loop directly in `main`.
G.3) Using the strlen Function from string.h 👍
Counting the length of a string manually works, but C’s `strlen` function in the `string.h` library already accomplishes this.
Link: https://manual.cs50.io/#string.h
Let’s use `strlen` to simplify our code further:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    string name = get_string("Name: ");
    int length = strlen(name);
    printf("Length: %i\n", length);
}
Including `string.h` lets us use `strlen`, which returns the length of any string passed to it, saving us from writing our own counting loop.
Note: in C, you cannot compare two strings with `==`. Compare them with `strcmp()` instead; if the strings are equal, `strcmp()` returns 0.
https://manual.cs50.io/3/strcmp
H) Practice Efficient For Loop Usage with String Lengths in C 🔁🧵
In this section, we’ll create a C program to print out each character of a string, but more importantly, we’ll explore efficient design. We’ll look at how repeated function calls inside loops can lead to inefficiencies and introduce an improved method for calling `strlen` once in a `for` loop. Let’s get started!
H.1) Looping Through Each Character (string.c) - inefficient example
Our goal is to print each character in the string individually, almost as if we were making our own version of `printf()` for `string` data types.
We’ll start by including the necessary libraries and setting up our main function to capture user input and display an "Output" label that keeps our output organized.
We can then print each character by iterating over each position in the string up to its length.
Here’s how it might look initially:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    // Prompt the user for a string input
    string s = get_string("Input: ");
    // Print "Output:" for readability
    printf("Output: ");
    for (int i = 0; i < strlen(s); i++)
    {
        printf("%c", s[i]);
    }
    printf("\n");
}
This code uses `strlen(s)` to get the length of the string and loops through each character using `printf("%c", s[i]);`.
However, this version has an inefficiency due to the repeated calls to `strlen(s)` within the loop.
H.2) Identifying Inefficiencies and Optimizing the for Loop
In the above code, calling `strlen(s)` every time the loop iterates is unnecessary since `strlen(s)` will always return the same value for `s`. Calling `strlen` within the loop condition leads to recalculating the string length on every iteration, which adds up computationally, especially for larger strings.
To solve this, we can store the result of `strlen(s)` in a variable outside the loop.
Let’s adjust the code to call `strlen` only once, storing the result in a variable `length`. This small change reduces redundant calls, making the program more efficient.
Optimizing with a Length Variable:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    // Prompt the user for a string input
    string s = get_string("Input: ");
    // Print "Output:" for readability
    printf("Output: ");
    // Compute the length once, before the loop starts
    int length = strlen(s);
    for (int i = 0; i < length; i++)
    {
        printf("%c", s[i]);
    }
    printf("\n");
}
Further Simplification Using for Loop Initialization:
In C, we can declare multiple variables of the same type in the `for` loop initialization. This allows us to declare both `i` and `n` (for length) in the `for` statement itself, reducing clutter:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    // Prompt the user for a string input
    string s = get_string("Input: ");
    // Print "Output:" for readability
    printf("Output: ");
    // n is set once in the initializer, so strlen runs a single time
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c", s[i]);
    }
    printf("\n");
}
By initializing `n` as `strlen(s)` in the `for` loop, we avoid repeated `strlen` calls and keep our code concise.
In this example, I'm declaring `i` as an `int`, but by way of the comma, I am also declaring `n` as an `int`. So they've got to be the same type for this trick to work.
This final version provides both clarity and efficiency, demonstrating how small adjustments in function calls and loop design can improve performance. Using these techniques, you’ll avoid unnecessary computations and write cleaner, more effective code!
____________________________________________________________
I) Exploring more Libraries: Using ctype.h for Uppercase Conversion in C (uppercase.c) 🔡🔠
In addition to the standard libraries we've seen, there are other libraries and header files you might find useful in C.
One such library is `ctype`, which relates to character types and provides various functions that we can use in our programs.
https://manual.cs50.io/#ctype.h
Let’s take a look at this by writing a small program to convert a user’s input string to uppercase.
I.1) Version1: Converting to Upper Case Manually (using the ASCII table)
Step 1: Setting Up uppercase.c
I'll start by creating a new file, `uppercase.c`, and including the necessary header files.
We’ll also include `ctype.h` later, which contains functions to manipulate character types.
Step 2: Implementing the Uppercase Conversion Logic:
To begin, I’ll create a `main` function and prompt the user for a string with a `Before:` prompt.
The `After:` prompt will then show the same word, but in uppercase letters.
Step 3: Iterating Through the String:
I’ll use a `for` loop to iterate through each character in the string. The loop will look like this:
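#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    // Prompt the user for a string input
    string s = get_string("Before: ");
    printf("After: ");
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        // Lowercase letters sit 32 above their uppercase forms in ASCII
        if (s[i] >= 'a' && s[i] <= 'z')
        {
            printf("%c", s[i] - 32);
        }
        else
        {
            printf("%c", s[i]);
        }
    }
    printf("\n");
}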
Here’s the breakdown:
- Looping Through Characters: The loop iterates through each character in the string.
- Condition to Check Lowercase Letters: If the character is between `'a'` and `'z'`, we assume it’s lowercase.
- Converting to Uppercase: To convert a lowercase letter to uppercase, we subtract `32` from its ASCII value. This approach leverages the consistent ASCII difference between lowercase and uppercase letters.
- Printing Characters: We print each character, converted or unchanged.
Step 4: Running the Program:
Now, let's compile and test the program. Typing "david" converts it to "DAVID,"
and mixed-case input, like "David," also converts the lowercase letters to uppercase.
We can also avoid hard-coding 32 by doing the math with `char` values themselves, for example subtracting `'a' - 'A'` instead.
I.2) Version2: Refactoring our program with ctype.h
Instead of handling ASCII conversions manually, let’s simplify this using the `ctype` library.
The library provides a function, `toupper`, that converts a character to uppercase if it’s lowercase, or leaves it unchanged if it’s already uppercase.
Updating the Code:
First, I’ll add `#include <ctype.h>`. Then, I’ll replace the ASCII-based conversion with `toupper`:
#include <ctype.h>
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    // Prompt the user for a string input
    string s = get_string("Before: ");
    // Print "After:" for readability
    printf("After: ");
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        // toupper handles both lowercase and uppercase
        printf("%c", toupper(s[i]));
    }
    printf("\n");
}
This approach not only simplifies the code but also enhances readability by removing the need for explicit ASCII math.
In case we need to know whether a letter is lowercase or uppercase, `ctype.h` also provides the functions `islower()` and `isupper()`.
I.2.1) Note: Benefits of using Libraries
Using `toupper` improves our program because:
- Simplicity: We no longer need manual checks for each character’s ASCII range.
- Error Reduction: The function handles both uppercase and non-alphabetic characters seamlessly.
Libraries like `ctype` and `string` allow us to perform common operations in C efficiently, building on decades of work by other programmers. By using these libraries, we make our code more readable, reliable, and efficient.
Video: https://youtu.be/JbHmin2Wtmc
________________________________________________________________
J) Exploring Command Line Arguments in C 🖐️⌨️📥
In this segment, we dive into command line arguments in C, a feature commonly used in Linux commands, such as `cd` or `rm`, that allows users to specify input right when running a program.
Let’s explore how command line arguments work and how we can implement them in our own C programs.
J.1) Command Line Arguments - Getting User Input with main
Command line arguments allow a user to provide inputs as words directly in the command line, which a program can then read and process.
This is different from using functions like get_string
, which prompt the user for input during the program’s execution.
Using command line arguments can make the process quicker and more efficient by letting users input everything at once.
For instance:
- Running clang uses command line arguments to specify files for compilation.
- Similarly, cd pset1 passes “pset1” to cd as a command line argument, naming the directory to change into.
In C, we can set up our programs to take these arguments by modifying the main function to accept two parameters instead of void:
int main(int argc, string argv[])
- argc is the [[argument count]]: the number of words provided in the command.
- argv[] is the [[argument vector]]: an array of strings containing each word typed in.
J.2) Example: Creating a Simple C Program with Command Line Arguments (greet.c)
Let’s create a program called greet.c that greets a user based on their name provided as a command line argument.
Here’s the old method of doing this:
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name? ");
    printf("Hello, %s\n", name);
}
This code uses get_string, which requires the user to enter their name after starting the program.
Now, let's modify it to accept the name directly as a command line argument.
Updating main to Use Command Line Arguments:
To use command line arguments, we replace void with int argc, string argv[]:
#include <cs50.h> // Needed for the string type
#include <stdio.h>

int main(int argc, string argv[])
{
    if (argc == 2)
    {
        printf("Hello, %s\n", argv[1]);
    }
    else
    {
        printf("Hello, world\n");
    }
}
In this code:
- We check if argc is 2 (meaning the user provided one additional word after the program’s name at argv[0]).
- If so, we print "Hello, " followed by argv[1], the user’s name.
- Otherwise, we default to "Hello, world" if no name is given.
Note: if we leave out the if conditional and the user provides no name, argv[1] is not a valid string (it is a null pointer), so printing it is a bug. Checking argc first ensures we only use elements of argv[] that exist.
J.2.1) Note: Exploring argv[0]
Let’s experiment with argv[0] by printing it in place of argv[1].
Here, argv[0] refers to the program’s name itself. Running ./greet David would output Hello, ./greet because argv[0] stores the name of the executable.
J.2.2) How to handle Multiple Arguments with argc
We can also add logic to handle additional arguments, using a loop that counts from 0 up to argc.
This loop iterates over all command line arguments, printing each one on a new line.
Video: https://youtu.be/6Dk8s0F2gow
J.3) Extra: Fun with ASCII Art and Command Line Arguments (cowsay) 🎨
To demonstrate command line arguments creatively, let’s explore ASCII art using a pre-installed program called cowsay.
ASCII art programs like cowsay allow you to create images using text characters, and they take command line arguments to customize the output.
For example: cowsay "moo"
This command prints a cow saying "moo."
You can also add flags (options that start with a dash) to change the style:
cowsay -f duck "quack"
cowsay -f dragon "RAWR"
These flags allow us to switch from a cow to a duck or dragon, demonstrating how command line arguments modify the behavior of the program.
Command line arguments are a powerful feature that, once mastered, provide flexibility and efficiency, allowing users to input multiple parameters at the program start. This makes them essential for programs requiring varied inputs.
K) Introduction to Program Exit Statuses in C 🛑📤✉️
Today, we’ll explore a feature of C programs that works behind the scenes: exit statuses.
These are the status codes that programs return to indicate if they ran successfully or encountered an issue.
Though most programs we’ve written end with a 0 status, which signifies success, we can also assign other exit statuses to signify different errors or outcomes.
Let's break down this concept with examples.
K.1) Understanding Exit Statuses
Every C program returns an integer when it finishes running, called an exit status:
- 0: Represents success (the program ran as expected).
- Non-zero values: Indicate various errors. For example, 1 could mean a missing argument, and 1132 could be a specific code for a bug in a larger program like Zoom.
This concept is widespread. For instance, HTTP status codes like 404 (Not Found) on the web serve a similar purpose to exit statuses.
These numbers often serve to help developers understand what went wrong.
K.2) Setting Up Exit Statuses in C Programs (status.c)
In C, our main function can return an integer, allowing us to control the program's exit status. For example:
#include <cs50.h>
#include <stdio.h>

int main(int argc, string argv[])
{
    if (argc != 2)
    {
        printf("Missing command-line argument\n");
        return 1; // Exit with status 1 to indicate an error
    }
    else
    {
        printf("Hello, %s\n", argv[1]);
        return 0; // Exit with status 0 to indicate success
    }
}
In this example:
- If the user provides an incorrect number of arguments, we display an error message and return 1.
- If the argument count (argc) is correct, we greet the user and return 0.
Let’s compile the program and test it with and without an argument.
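K.2.1) Checking the Exit Status (echo $?)
We can view the exit status of the last-run command using echo $?. With the program above, it prints 1 after running ./status with no argument and 0 after running ./status David.
This echo $? command displays the exit status for debugging or automated testing.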
Practical Use of Exit Statuses in Testing:
In development, exit statuses are used to verify program correctness automatically. In classes or real-world software development, unit tests and tools like check50 can detect exit statuses and confirm if a program ran correctly.
Non-zero statuses help pinpoint the nature of the failure, allowing for more targeted debugging.
Using exit statuses effectively in C programming provides a standardized way to communicate program outcomes and errors, making your code more robust and easier to test.
__________________________________________________________________
L) Introduction to Cryptography and Caesar Cipher 🔑✉️🔒
Today, we’ll explore a crucial aspect of modern computing: cryptography, the practice of encoding information to ensure only the intended recipient can read it.
In this overview, we’ll cover fundamental concepts like:
- plaintext,
- ciphertext,
- and ciphers,
and then dive into a classic example: the Caesar cipher.
L.1) Cryptography: The Basics
Cryptography allows us to send secure information, transforming it in such a way that, even if intercepted, unauthorized viewers cannot understand it.
Here are some key terms:
- Plaintext: The original, readable message.
- Ciphertext: The scrambled, encrypted version of the message, unreadable to unauthorized viewers.
- Cipher: An algorithm for converting plaintext into ciphertext and vice versa, using a key.
Most ciphers rely on a key—a piece of information that guides the encryption process and allows only those with the correct key to decrypt the ciphertext back to readable plaintext.
L.2) Example: The Caesar Cipher 🔑📜
One of the simplest ciphers is the Caesar cipher, famously used by Julius Caesar:
- Caesar’s cipher shifts each letter in the plaintext by a set number of positions in the alphabet.
- For example, if the key is 1, each letter in "HI!" is shifted to the next letter in the alphabet, becoming "IJ!".
Let’s walk through how this works:
- Choose a Key: Suppose we choose the key 1. This means each letter in the plaintext is shifted forward by one.
- Encrypting: Take "HI!" and shift the H to I and the I to J, leaving the "!" as it is because it is not a letter. Thus, "HI!" becomes "IJ!". (Remember, strings are arrays of char, and chars are just numbers, so we can do math to shift them.)
- Decrypting: To reverse the process, subtract the key from each letter’s position. If our key is 1, shifting "IJ!" back by one recovers "HI!".
L.2.1) Variations of Caesar Cipher (ROT13)
This cipher isn’t limited to a shift of 1:
- ROT13: This variant shifts each letter by 13 places. ROT13 is commonly used online for obfuscation.
  - Applying it twice restores the original text (e.g., ROT13(ROT13(text)) = text).
  - This makes it self-reversible, eliminating the need for a separate decryption key or algorithm.
  - ROT13 is not intended for secure communication; it is merely a tool for casual obfuscation.
  - It avoids the complexities of stronger encryption schemes like RSA or AES when the goal is simply to obscure text rather than secure it.
- ROT26: Shifting by 26 letters brings each letter back to its original position, making it useless as an encryption method.
While Caesar’s cipher is simple, it isn’t secure by modern standards. With only 25 possible keys, it’s easy to brute-force the solution by trying all possible shifts.
L.3) Decryption Example 📩
If you receive a message encrypted with a Caesar cipher, you can decrypt it by reversing the shift:
- Identify the key (for example, 1).
- Shift each letter in the ciphertext back by this key to reveal the original message.
L.4) Conclusion and Class Recap
Today’s class introduced basic cryptography concepts, specifically the Caesar cipher, highlighting the essential role of encryption in protecting information.
As we move forward, we’ll explore more sophisticated ciphers that leverage complex algorithms, providing stronger security.
Z) 🗃️ Glossary
File | Definition |
---|---|
Unit Tests | - |