12 Control Structures
Control structures are statements in a program that determine when code is evaluated (and how many times it might be evaluated). There are two main types of control structures: if-statements and loops.
12.1 Objectives
- Understand how to use conditional statements
- Understand how conditional statements are evaluated by a program
- Use program flow diagrams to break a problem into parts and evaluate how a program will execute
- Understand how to use loops
- Select the appropriate type of loop for a problem
12.2 Mindset
Before we start on the types of control structures, let’s get in the right mindset. We’re all used to “if-then” logic, and use it in everyday conversation, but computers require another level of specificity when you’re trying to provide instructions.
Check out this video of the classic “make a peanut butter sandwich instructions challenge”:
Here’s another example:
The key takeaways from these bits of media are that you should read this section with a focus on exact precision - state exactly what you mean, and the computer will do what you say. If you instead expect the computer to get what you mean, you’re going to have a bad time.
12.3 Conditional Statements
Conditional statements determine if code is evaluated.
They look like this:
if (condition)
then
(thing to do)
else
(other thing to do)
The else (other thing to do) part may be omitted.
When this statement is read by the computer, the computer checks to see if condition is true or false. If the condition is true, then (thing to do) is also run. If the condition is false, then (other thing to do) is run instead.
Let’s try this out:
In R, the logical condition after if
must be in parentheses. It is common to then enclose the statement to be run if the condition is true in {}
so that it is clear what code matches the if statement. You can technically put the condition on the line after the if (x > 2)
line, and everything will still work, but then it gets hard to figure out what to do with the else statement - it technically would also go on the same line, and that gets hard to read.
So while the 2nd version of the code technically works, the first version with the brackets is much easier to read and understand. Please try to emulate the first version!
= 3
x = 1
y
if x > 2:
= 8
y else:
= 4
y
print("x =", x, "; y =", y)
## x = 3 ; y = 8
In python, all code grouping is accomplished with spaces instead of with brackets. So in python, we write our if statement as if x > 2:
with the colon indicating that what follows is the code to evaluate. The next line is indented with 2 spaces to show that the code on those lines belongs to that if statement. Then, we use the else: statement to provide an alternative set of code to run if the logical condition in the if statement is false. Again, we indent the code under the else statement to show where it “belongs”.
Python will throw errors if you mess up the spacing. This is one thing that is very annoying about Python… but it’s a consequence of trying to make the code more readable.
12.3.1 Representing Conditional Statements as Diagrams
A common way to represent conditional logic is to draw a flow chart diagram.
In a flow chart, conditional statements are represented as diamonds, and other code is represented as a rectangle. Yes/no or True/False branches are labeled. Typically, after a conditional statement, the program flow returns to a single point.
12.3.2 Chaining Conditional Statements: Else-If
In many cases, it can be helpful to have a long chain of conditional statements describing a sequence of alternative statements.
Suppose I want to determine what categorical age bracket someone falls into based on their numerical age. All of the bins are mutually exclusive - you can’t be in the 25-40 bracket and the 41-55 bracket.
The important thing to realize when examining this program flow map is that if age <= 18
is true, then none of the other conditional statements even get evaluated. That is, once a statement is true, none of the other statements matter. Because of this, it is important to place the most restrictive statement first.
If for some reason you wrote your conditional statements in the wrong order, the wrong label would get assigned:
In code, we would write this statement using else-if
(or elif
) statements.
age <- 40 # change this as you will to see how the code works
if (age < 18) {
bracket <- "<18"
} else if (age <= 25) {
bracket <- "18-25"
} else if (age <= 40) {
bracket <- "26-40"
} else if (age <= 55) {
bracket <- "41-55"
} else if (age <= 65) {
bracket <- "56-65"
} else {
bracket <- ">65"
}
bracket
## [1] "26-40"
Python uses elif
as a shorthand for else if
statements. As always, indentation/white space in python matters. If you put an extra blank line between two elif statements, then the interpreter will complain. If you don’t indent properly, the interpreter will complain.
= 40 # change this to see how the code works
age
if age < 18:
= "<18"
bracket elif age <= 25:
= "18-25"
bracket elif age <= 40:
= "26-40"
bracket elif age <= 55:
= "41-55"
bracket elif age <= 65:
= "56-65"
bracket else:
= ">65"
bracket
bracket## '26-40'
The US Tax code has brackets, such that the first $10,275 of your income is taxed at 10%, anything between $10,275 and $41,775 is taxed at 12%, and so on.
Here is the table of tax brackets for single filers in 2022:
rate | Income |
---|---|
10% | $0 to $10,275 |
12% | $10,275 to $41,775 |
22% | $41,775 to $89,075 |
24% | $89,075 to $170,050 |
32% | $170,050 to $215,950 |
35% | $215,950 to $539,900 |
37% | $539,900 or more |
Note: For the purposes of this problem, we’re ignoring the personal exemption and the standard deduction, so we’re already simplifying the tax code.
Write a set of if statements that assess someone’s income and determine what their overall tax rate is.
Hint: You may want to keep track of how much of the income has already been taxed in a variable and what the total tax accumulation is in another variable.
Control flow diagrams can be extremely helpful when figuring out how programs work (and where gaps in your logic are when you’re debugging). It can be very helpful to map out your program flow as you’re untangling a problem.
# Start with total income
income <- 200000
# x will hold income that hasn't been taxed yet
x <- income
# y will hold taxes paid
y <- 0
if (x <= 10275) {
y <- x*.1 # tax paid
x <- 0 # All money has been taxed
} else {
y <- y + 10275 * .1
x <- x - 10275 # Money remaining that hasn't been taxed
}
if (x <= (41775 - 10275)) {
y <- y + x * .12
x <- 0
} else {
y <- y + (41775 - 10275) * .12
x <- x - (41775 - 10275)
}
if (x <= (89075 - 41775)) {
y <- y + x * .22
x <- 0
} else {
y <- y + (89075 - 41775) * .22
x <- x - (89075 - 41775)
}
if (x <= (170050 - 89075)) {
y <- y + x * .24
x <- 0
} else {
y <- y + (170050 - 89075) * .24
x <- x - (170050 - 89075)
}
if (x <= (215950 - 170050)) {
y <- y + x * .32
x <- 0
} else {
y <- y + (215950 - 170050) * .32
x <- x - (215950 - 170050)
}
if (x <= (539900 - 215950)) {
y <- y + x * .35
x <- 0
} else {
y <- y + (539900 - 215950) * .35
x <- x - (539900 - 215950)
}
if (x > 0) {
y <- y + x * .37
}
print(paste("Total Tax Rate on $", income, " in income = ", round(y/income, 4)*100, "%"))
## [1] "Total Tax Rate on $ 2e+05 in income = 22.12 %"
# Start with total income
= 200000
income
# untaxed will hold income that hasn't been taxed yet
= income
untaxed # taxed will hold taxes paid
= 0
taxes
if untaxed <= 10275:
= untaxed*.1 # tax paid
taxes = 0 # All money has been taxed
untaxed else:
= taxes + 10275 * .1
taxes = untaxed - 10275 # money remaining that hasn't been taxed
untaxed
if untaxed <= (41775 - 10275):
= taxes + untaxed * .12
taxes = 0
untaxed else:
= taxes + (41775 - 10275) * .12
taxes = untaxed - (41775 - 10275)
untaxed
if untaxed <= (89075 - 41775):
= taxes + untaxed * .22
taxes = 0
untaxed else:
= taxes + (89075 - 41775) * .22
taxes = untaxed - (89075 - 41775)
untaxed
if untaxed <= (170050 - 89075):
= taxes + untaxed * .24
taxes = 0
untaxed else:
= taxes + (170050 - 89075) * .24
taxes = untaxed - (170050 - 89075)
untaxed
if untaxed <= (215950 - 170050):
= taxes + untaxed * .32
taxes = 0
untaxed else:
= taxes + (215950 - 170050) * .32
taxes = untaxed - (215950 - 170050)
untaxed
if untaxed <= (539900 - 215950):
= taxes + untaxed * .35
taxes = 0
untaxed else:
= taxes + (539900 - 215950) * .35
taxes = untaxed - (539900 - 215950)
untaxed
if untaxed > 0:
= taxes + untaxed * .37
taxes
print("Total Tauntaxed Rate on $", income, " in income = ", round(taxes/income, 4)*100, "%")
## Total Tauntaxed Rate on $ 200000 in income = 22.12 %
We will find a better way to represent this calculation once we discuss loops - we can store each bracket’s start and end point in a vector and loop through them. Any time you find yourself copy-pasting code and changing values, you should consider using a loop (or eventually a function) instead.
12.4 Loops
Often, we write programs which update a variable in a way that the new value of the variable depends on the old value:
x = x + 1
This means that we add one to the current value of x
.
Before we write a statement like this, we have to initialize the value of x
because otherwise, we don’t know what value to add one to.
x = 0
x = x + 1
We sometimes use the word increment to talk about adding one to the value of x
; decrement means subtracting one from the value of x
.
A particularly powerful tool for making these types of repetitive changes in programming is the loop, which executes statements a certain number of times. Loops can be written in several different ways, but all loops allow for executing a block of code a variable number of times.
12.4.1 While Loops
In the previous section, we discussed conditional statements, where a block of code is only executed if a logical statement is true. The simplest type of loop is the while loop, which executes a block of code until a statement is no longer true.
x <- 0
while (x < 10) {
# Everything in here is executed
# during each iteration of the loop
print(x)
x <- x + 1
}
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
= 0
x
while x < 10:
print(x)
= x + 1
x ## 0
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
Write a while loop that verifies that \[\lim_{N \rightarrow \infty} \prod_{k=1}^N \left(1 + \frac{1}{k^2}\right) = \frac{e^\pi - e^{-\pi}}{2\pi}.\]
Terminate your loop when you get within 0.0001 of \(\frac{e^\pi - e^{-\pi}}{2\pi}\). At what value of \(k\) is this point reached?
Breaking down math notation for code:
If you are unfamiliar with the notation \(\prod_{k=1}^N f(k)\), this is the product of \(f(k)\) for \(k = 1, 2, ..., N\), \[f(1)\cdot f(2)\cdot ... \cdot f(N)\]
To evaluate a limit, we just keep increasing \(N\) until we get arbitrarily close to the right hand side of the equation.
In this problem, we can just keep increasing \(k\) and keep track of the cumulative product. So we define k=1
, prod = 1
, and ans
before the loop starts. Then, we loop over k
, multiplying prod
by \((1 + 1/k^2)\) and then incrementing \(k\) by one each time. At each iteration, we test whether prod
is close enough to ans
to stop the loop.
In R, you will use pi
and exp()
- these are available by default without any additional libraries or packages.
Note that in python, you will have to import the math library to get the values of pi and the exp
function. You can refer to these as math.pi
and math.exp()
respectively.
import math
= 1
k = 1
prod = (math.exp(math.pi) - math.exp(-math.pi))/(2*math.pi)
ans = 0.0001
delta
while abs(prod - ans) >= 0.0001:
= prod * (1 + k**-2)
prod = k + 1
k if k > 500000:
break
print("At ", k, " iterations, the product is ", prod, "compared to the limit ", ans,".")
## At 36761 iterations, the product is 3.675977910975878 compared to the limit 3.676077910374978 .
It is very easy to create an infinite loop when you are working with while loops. Infinite loops never exit, because the condition is always true. If in the while loop example we decrement x instead of incrementing x, the loop will run forever.
You want to try very hard to avoid ever creating an infinite loop - it can cause your session to crash.
One common way to avoid infinite loops is to create a second variable that just counts how many times the loop has run. If that variable gets over a certain threshold, you exit the loop.
This while loop runs until either x < 10 or n > 50 - so it will run an indeterminate number of times and depends on the random values added to x. Since this process (a ‘random walk’) could theoretically continue forever, we add the n>50 check to the loop so that we don’t tie up the computer for eternity.
x <- 0
n <- 0 # count the number of times the loop runs
while (x < 10) {
print(x)
x <- x + rnorm(1) # add a random normal (0, 1) draw each time
n <- n + 1
if (n > 50)
break # this stops the loop if n > 50
}
## [1] 0
## [1] 0.3517088
## [1] 1.517403
## [1] 2.810551
## [1] 2.443797
## [1] 2.813846
## [1] 0.9007025
## [1] 1.011412
## [1] 0.961636
## [1] 2.379953
## [1] 2.714089
## [1] 2.94099
## [1] 2.227345
## [1] 2.672468
## [1] 4.240402
## [1] 5.615584
## [1] 5.204176
## [1] 4.706667
## [1] 5.582507
## [1] 7.051302
## [1] 8.839158
import numpy as np; # for the random normal draw
= 0
x = 0 # count the number of times the loop runs
n
while x < 10:
print(x)
= x + np.random.normal(0, 1, 1) # add a random normal (0, 1) draw each time
x = n + 1
n if n > 50:
break # this stops the loop if n > 50
## 0
## [-0.88771665]
## [-0.02517791]
## [-0.55818398]
## [0.7410756]
## [1.0105944]
## [0.70525995]
## [1.18138021]
## [0.6155277]
## [-0.92180249]
## [-0.44298115]
## [0.04365226]
## [-0.41922168]
## [-0.33223168]
## [0.04617224]
## [-0.20752893]
## [-0.46245587]
## [-0.22026173]
## [0.64728956]
## [-1.70667016]
## [-2.82031746]
## [-3.76019061]
## [-4.70790018]
## [-5.83278641]
## [-5.34132197]
## [-7.60843189]
## [-6.03492823]
## [-6.06545976]
## [-5.03359451]
## [-3.01396455]
## [-1.05511401]
## [-0.5347353]
## [-2.91148155]
## [-1.854246]
## [-2.00071402]
## [-2.5173893]
## [-3.66543687]
## [-2.75343879]
## [-1.94007636]
## [-2.03480331]
## [-1.4759154]
## [-2.68606795]
## [-3.26983128]
## [-4.46714622]
## [-3.52795086]
## [-2.69546197]
## [-1.72338727]
## [-1.35948517]
## [-2.15545937]
## [0.47031341]
## [-0.59488528]
In both of the examples above, there are more efficient ways to write a random walk, but we will get to that later. The important thing here is that we want to make sure that our loops don’t run for all eternity.
12.4.2 For Loops
Another common type of loop is a for loop. In a for loop, we run the block of code, iterating through a series of values (commonly, one to N, but not always). Generally speaking, for loops are known as definite loops because the code inside a for loop is executed a specific number of times. While loops are known as indefinite loops because the code within a while loop is evaluated until the condition is falsified, which is not always a known number of times.
for (i in 1:5 ) {
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
for i in range(5):
print(i)
## 0
## 1
## 2
## 3
## 4
By default range
(5) goes from 0 to 5, the upper bound. When i = 5
the loop exits. This is because range(5)
creates a vector [0, 1, 2, 3, 4]
.
For loops are often run from 1 to N (or 0 to N-1 in python) but in essence, a for loop is very commonly used to do a task for every value of a vector.
12.4.2.1 Example - For Loops
For instance, in R, there is a built-in variable called month.name
. Type month.name
into your R console to see what it looks like. If we want to iterate along the values of month.name, we can:
for (i in month.name)
print(i)
## [1] "January"
## [1] "February"
## [1] "March"
## [1] "April"
## [1] "May"
## [1] "June"
## [1] "July"
## [1] "August"
## [1] "September"
## [1] "October"
## [1] "November"
## [1] "December"
We can even pick out the first 3 letters of each month name and store them into a vector called abbr3
# Create new vector of the correct length
abbr3 <- rep("", length(month.name))
# We have to iterate along the index (1 to length) instead of the name
# in this case because we want to store the result in a corresponding
# row of a new vector
for (i in 1:length(month.name))
abbr3[i] <- substr(month.name[i], 1, 3)
# We can combine the two vectors into a data frame
# so that each row corresponds to a month and there are two columns:
# full month name, and abbreviation
data.frame(full_name = month.name, abbrev = abbr3)
## full_name abbrev
## 1 January Jan
## 2 February Feb
## 3 March Mar
## 4 April Apr
## 5 May May
## 6 June Jun
## 7 July Jul
## 8 August Aug
## 9 September Sep
## 10 October Oct
## 11 November Nov
## 12 December Dec
In python, we have to define our vector or list to start out with, but that’s easy enough:
import calendar
# Create a list with month names. For some reason, by default there's a "" as
# the first entry, so we'll get rid of that
= list(calendar.month_name)[1:13]
month_name
for i in month_name:
print(i)
## January
## February
## March
## April
## May
## June
## July
## August
## September
## October
## November
## December
We can even pick out the first 3 letters of each month name and store them into a vector called abbr3
.
Python handles lists best when you use pythonic expressions. The linked post has an excellent explanation of why enumerate works best here.
# Create new vector of the correct length
= [""] * len(month_name)
abbr3
# We have to iterate along the index because we want to
# store the result in a corresponding row of a new vector
# Python allows us to iterate along both the index i and the value val
# at the same time, which is convenient.
for i, val in enumerate(month_name):
= val[0:3:] # Strings have indexes by character, so this gets
abbr3[i] # characters 0, 1, and 2.
abbr3## ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
12.5 Other Control Structures
12.5.1 Conditional Statements
case statements, e.g. case_when
in tidyverse
12.5.2 Loops
12.5.2.1 Controlling Loops
While I do not often use break, next, and continue statements, they do exist in both languages and can be useful for controlling the flow of program execution. I have moved the section on this to Section 31.2 for the sake of brevity and to reduce the amount of new material those without programming experience are being exposed to in this section.
12.5.2.2 Other Types of Loops
There are other types of loops in most languages, such as the do-while loop, which runs the code first and then evaluates the logical condition to determine whether the loop will be run again.
In R, do-while loops are most naturally implemented using a very primitive type of iteration: a repeat
statement.
repeat {
# statements go here
if (condition)
break # this exits the repeat statement
}
In python, do-while loops are most naturally implemented using a while loop with condition TRUE:
while TRUE:
# statements go here
if condition:
break
An additional means of running code an indeterminate number of times is the use of recursion, which we cannot cover until we learn about functions. I have added an additional section, Section 31.3, to cover this topic, but it is not essential to being able to complete most basic data programming tasks. Recursion is useful when working with structures such as trees (including phylogenetic trees) and nested lists.