8  Variables and Basic Data Types

Published

November 4, 2024

8.1 Objectives

  • Know the basic data types and what their restrictions are
  • Know how to test to see if a variable is a given data type
  • Understand the basics of implicit and explicit type conversion
  • Write code that assigns values to variables

8.2 Basic Definitions

For a general overview, [1] is an excellent introduction to data types.

Let’s start this section with some basic vocabulary.

  • a value is a basic unit of stuff that a program works with, like 1, 2, "Hello, World", and so on.
  • values have types - 2 is an integer, "Hello, World" is a string (it contains a “string” of letters). Strings are in quotation marks to let us know that they are not variable names.

In most programming languages (including R and python), there are some very basic data types:

  • logical or boolean - FALSE/TRUE or 0/1 values. Sometimes, boolean is shortened to bool

  • integer - whole numbers (positive or negative)

  • double or float or numeric - decimal numbers.

    • float is short for floating-point value.
    • double is a floating-point value with more precision (“double precision”).1
    • R uses the name numeric to indicate a decimal value, regardless of precision.
  • character or string - holds text, usually enclosed in quotes.

Capitalization matters!

In R, boolean values are TRUE and FALSE, but in Python they are True and False. Capitalization matters a LOT.

Other things matter too: to write one million, we write 1000000 instead of 1,000,000 (in both languages). In code, commas separate values (for example, the arguments to a function), so they cannot also be used to group the digits of a number. This is a hard thing to get used to but very important, especially when we start reading in data.
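Here is a minimal Python sketch of both points. The variable names are made up for illustration, and the underscore digit separator at the end is a Python-specific aside rather than something that works in both languages.

```python
# Capitalization: True is a boolean value, but lowercase true is an undefined name.
try:
    flag = true  # deliberate mistake
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)

# Commas separate values, so this is a tuple of three numbers, not one million.
not_a_million = 1,000,000
print(not_a_million)        # (1, 0, 0)

# Write the digits with no commas instead.
a_million = 1000000

# Python-only aside: underscores may be used to group digits for readability.
also_a_million = 1_000_000
print(a_million == also_a_million)  # True
```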

8.3 Variables

Programming languages use variables - names that refer to values. Think of a variable as a container that holds something - instead of referring to the value, you can refer to the container and you will get whatever is stored inside.

8.3.1 Assignment

We assign variables values using the syntax object_name <- value (R) or object_name = value (python). You can read this as “object name gets value” in your head.

In R, <- is used for assigning a value to a variable. So x <- "R is awesome" is read “x gets ‘R is awesome’” or “x is assigned the value ‘R is awesome’”. Technically, you can also use = to assign things to variables in R, but most style guides consider this to be poor programming practice, so seriously consider defaulting to <-.

In Python, = is used for assigning a value to a variable. This tends to be much easier to say out loud, but lacks any indication of directionality.

8.3.1.1 Demo: Assignment

message <- "So long and thanks for all the fish"
year <- 2025
the_answer <- 42L
earth_demolished <- FALSE
message = "So long and thanks for all the fish"
year = 2025
the_answer = 42
earth_demolished = False
Note

Note that in R, we assign variables values using the <- operator, where in Python, we assign variables values using the = operator. Technically, = will work for assignment in both languages, but <- is more common than = in R by convention.

We can then use the variables - do numerical computations, evaluate whether a proposition is true or false, and even manipulate the content of strings, all by referencing the variable by name.
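As a small Python sketch of what "using the variables" can look like, the code below reuses the variables from the demo. The new names (next_year, is_answer_even, and so on) are invented for this example.

```python
message = "So long and thanks for all the fish"
year = 2025
the_answer = 42
earth_demolished = False

# Numerical computation
next_year = year + 1
print(next_year)                    # 2026

# Evaluate whether a proposition is true or false
is_answer_even = the_answer % 2 == 0
print(is_answer_even)               # True
print(not earth_demolished)         # True

# Manipulate the content of a string
shouted = message.upper()
word_count = len(message.split())
print(shouted)
print(word_count)                   # 8
```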

8.3.2 Naming Variables

There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton

In R, object names must start with a letter (or with a . that is not followed by a number) and can only contain letters, numbers, _, and . characters. In Python, object names must start with a letter or _ and can consist of letters, numbers, and _ (that is, . is not a valid character in a Python variable name). While it is technically fine to use uppercase variable names in Python, it’s recommended that you use lowercase names for variables (you’ll see why later).

What happens if we try to create a variable name that isn’t valid?

In both languages, starting a variable name with a number will get you an error message that lets you know that something isn’t right - “unexpected symbol” in R and “invalid syntax” in python.

8.3.2.1 Invalid Names

1st_thing <- "check your variable names!"
## Error: <text>:1:2: unexpected symbol
## 1: 1st_thing
##      ^
1st_thing = "check your variable names!"

Note: Run the above chunk in your python window - the book won’t compile if I set it to evaluate 😥. It generates an error of SyntaxError: invalid syntax (<string>, line 1)

second.thing = "this isn't valid"
## name 'second' is not defined

In python, trying to have a . in a variable name gets a more interesting error: “name 'second' is not defined”. This is because in python, some objects have components and methods that can be accessed with a . character, so python reads second.thing as “the thing component of an object named second” and then cannot find anything named second. We’ll get into this more later, but there is a good reason for python’s restriction about not using . in variable names.
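If you want to check a name without triggering an error, Python strings have an isidentifier() method, and the standard keyword module can tell you whether a word is reserved by the language. The sketch below wraps both in a helper; is_valid_name is a hypothetical function written for this example, and reserved words such as class are an extra restriction not discussed above.

```python
import keyword

def is_valid_name(name):
    """Hypothetical helper: True if `name` could be used as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["first_thing", "1st_thing", "second.thing", "_hidden", "class"]
results = {name: is_valid_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {'valid' if ok else 'not valid'}")
```

Only first_thing and _hidden pass: 1st_thing starts with a number, second.thing contains a ., and class is a reserved word.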

Naming things is difficult! When you name variables, try to make the names descriptive - what does the variable hold? What are you going to do with it? The more (concise) information you can pack into your variable names, the more readable your code will be.

8.3.2.2 Learn More

Why is naming things hard? - Blog post by Neil Kakkar

There are a few different conventions for naming things that may be useful:

  • some_people_use_snake_case, where words are separated by underscores
  • somePeopleUseCamelCase, where words are appended but anything after the first word is capitalized (leading to words with humps like a camel).
  • some.people.use.periods (in R, obviously this doesn’t work in python)
  • A few people mix conventions with variables_thatLookLike.this and they are almost universally hated 👿

As long as you pick ONE naming convention and don’t mix-and-match, you’ll be fine. It will be easier to remember what you named your variables (or at least guess) and you’ll have fewer moments where you have to go scrolling through your script file looking for a variable you named.

8.4 Types

8.4.1 Testing Types

You can use different functions to test whether a variable has a specific type.

is.logical(FALSE)
is.integer(2L) # by default, R treats all numbers as numeric/decimal values. 
          # The L indicates that we're talking about an integer. 
is.integer(2)
is.numeric(2)
is.character("Hello, programmer!")
is.function(print)
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] TRUE

In R, you use is.xxx functions, where xxx is the name of the type in question.

isinstance(False, bool)
isinstance(2, int)
isinstance(2, (int, float)) # Test for one of multiple types
isinstance(3.1415, float)
isinstance("This is python code", str)
## True
## True
## True
## True
## True

In python, test for types using the isinstance function. Its second argument is either a single data type or a tuple of several data types ((int, float) is an example of a tuple - a fixed, ordered collection of values).

If we want to test for whether something is callable (can be used like a function), we have to get slightly more complicated:

callable(print)
## True

This is glossing over some much more technical information about differences between functions and classes (that we haven’t covered) [2].
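To give a rough sense of what callable() reports, here is a short sketch. The greet function is defined only for this example. Notice that the type str counts as callable too, which hints at the function/class distinction mentioned above.

```python
def greet():
    """A small function defined only for this example."""
    return "hello"

checks = {
    "print": callable(print),      # built-in function
    "greet": callable(greet),      # function we defined ourselves
    "str": callable(str),          # a type can be called too, e.g. str(3)
    "42": callable(42),            # a number cannot be called
    "'print'": callable("print"),  # a string holding a function's name is still a string
}

for label, result in checks.items():
    print(f"callable({label}) -> {result}")
```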

Example: Assignment and Testing Types
x <- "R is awesome"
typeof(x)
## [1] "character"
is.character(x)
## [1] TRUE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] FALSE
x = "python is awesome"
type(x)
## <class 'str'>
isinstance(x, str)
## True
isinstance(x, bool)
## False
isinstance(x, int)
## False
isinstance(x, float)
## False
x <- FALSE
typeof(x)
## [1] "logical"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] TRUE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] FALSE

In R, it is possible to use the shorthand F and T, but be careful with this, because F and T are not reserved, and other information can be stored within them. See this discussion for pros and cons of using F and T as variables vs. shorthand for true and false. 2

x = False
type(x)
## <class 'bool'>
isinstance(x, str)
## False
isinstance(x, bool)
## True
isinstance(x, int)
## True
isinstance(x, float)
## False

Note that in python, boolean variables are also integers, which is why isinstance(x, int) returns True above. If your goal is to test whether something is a T/F value, you may want to test whether its value is 0 or 1, rather than testing whether it is a boolean variable directly, since integers can also function directly as bools in Python.
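The sketch below shows this relationship and two possible ways of testing for a T/F value. Both helper functions (is_strict_bool and is_zero_or_one) are hypothetical names invented here; which one is appropriate depends on whether you want to accept plain 0 and 1.

```python
x = False

print(isinstance(x, bool))    # True
print(isinstance(x, int))     # True, because bool is a subclass of int
print(issubclass(bool, int))  # True
print(True + True)            # 2

def is_strict_bool(value):
    """Hypothetical helper: True only for the actual values True and False."""
    return type(value) is bool

def is_zero_or_one(value):
    """Hypothetical helper: True for 0, 1, False, or True."""
    return isinstance(value, int) and value in (0, 1)

print(is_strict_bool(1))   # False
print(is_zero_or_one(1))   # True
print(is_zero_or_one(2))   # False
```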

x <- 2
typeof(x)
## [1] "double"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] TRUE

Wait, 2 is an integer, right?

2 is an integer, but in R, values are assumed to be doubles unless specified. So if we want R to treat 2 as an integer, we need to specify that it is an integer specifically.

x <- 2L # The L immediately after the 2 indicates that it is an integer.
typeof(x)
## [1] "integer"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] TRUE
is.double(x)
## [1] FALSE
is.numeric(x)
## [1] TRUE
x = 2
type(x)
## <class 'int'>
isinstance(x, str)
## False
isinstance(x, bool)
## False
isinstance(x, int)
## True
isinstance(x, float)
## False
x <- 2.45
typeof(x)
## [1] "double"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] TRUE
is.numeric(x)
## [1] TRUE
x = 2.45
type(x)
## <class 'float'>
isinstance(x, str)
## False
isinstance(x, bool)
## False
isinstance(x, int)
## False
isinstance(x, float)
## True

A fifth common “type”3 in R, numeric, is really the union of two types: integer and double. You may come across it when using str() or mode(), which are similar to typeof() but do not do quite the same thing.

The numeric category exists because when doing math, we can add an integer and a double, but adding an integer and a string is … trickier. Testing for numeric variables guarantees that we’ll be able to do math with those variables. is.numeric() and as.numeric() work as you would expect them to work.

The general case of this property of a language is called implicit type conversion - that is, R will implicitly (behind the scenes) convert your integer to a double and then add the other double, so that the result is unambiguously a double.
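Python behaves in a similar way. As a quick sketch (the variable names are made up for this example), we can ask for the type of the result to see which conversion happened behind the scenes:

```python
int_plus_float = 2 + 3.1415   # the integer is converted to a float before adding
bool_plus_int = True + 2      # True is treated as the integer 1

print(type(int_plus_float).__name__)  # float
print(type(bool_plus_int).__name__)   # int
print(bool_plus_int)                  # 3
```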

8.5 Type Conversions

Programming languages will generally work hard to seamlessly convert variables to different types. This is called implicit type conversion (also known as implicit type casting) - the computer implicitly changes the variable type to avoid a conflict.

8.5.1 Implicit Type Conversion

TRUE + 2
## [1] 3

2L + 3.1415
## [1] 5.1415

"abcd" + 3
## Error in "abcd" + 3: non-numeric argument to binary operator
True + 2
## 3

int(2) + 3.1415
## 5.141500000000001

"abcd" + 3
## can only concatenate str (not "int") to str

This conversion doesn’t always work - there’s no clear way to make “abcd” into a number we could use in addition. So instead, R or python will issue an error. This error pops up frequently when something went wrong with data import and all of a sudden you just tried to take the mean of a set of string/character variables. Whoops.
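Here is a sketch of that data import situation in Python. The list raw_values is a hypothetical stand-in for a column that was read in as text instead of numbers.

```python
# Hypothetical column of measurements that was imported as strings
raw_values = ["1.5", "2.5", "3.0"]

# Trying to average the strings fails, because Python will not
# implicitly convert text to numbers.
try:
    mean_value = sum(raw_values) / len(raw_values)
except TypeError as err:
    print("TypeError:", err)

# Converting each value to a float first makes the arithmetic work.
numbers = [float(value) for value in raw_values]
mean_value = sum(numbers) / len(numbers)
print(mean_value)
```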

When you want to, you can also make the type conversion explicit, using as.xxx() functions in R or functions such as int() and float() in Python. So, the analogue of the code above, with explicit conversions, would be:

8.5.2 Explicit Type Conversion

as.double(TRUE) + 2
## [1] 3

as.double(2L) + 3.1415
## [1] 5.1415

as.numeric("abcd") + 3
## [1] NA
int(True) + 2
## 3

float(2) + 3.1415
## 5.141500000000001

float("abcd") + 3
## could not convert string to float: 'abcd'

import pandas as pd # Load pandas library
pd.to_numeric("abcd", errors = 'coerce') + 3
## nan

When we make our intent explicit (convert “abcd” to a numeric variable) we get an NA - a missing value - in R. In Python, we get a more descriptive error by default, but we can use the pandas library (which adds some statistical functionality) to get a similar result to the result we get in R.

There’s still no easy way to figure out where “abcd” is on a number line, but our math will still have a result - NA + 3 is NA.
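If you would like the R-like behavior in Python without loading pandas, one possible approach is to catch the error yourself and return nan (Python's "not a number" value). The helper to_number below is a hypothetical function written for this sketch, not part of Python or pandas.

```python
import math

def to_number(text):
    """Hypothetical helper: convert text to a float, or nan if it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return math.nan

result = to_number("abcd") + 3
print(result)                # nan
print(math.isnan(result))    # True, the missing value carries through the addition

print(to_number("2.5") + 3)  # 5.5
```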

8.6 What Type is it?

If you don’t know what type a value is, both R and python have functions to help you with that.

8.6.1 Determining Variable Types

In R, if you are unsure what the type of a variable is, use the typeof() function to find out.

w <- "a string"
x <- 3L
y <- 3.1415
z <- FALSE

typeof(w)
## [1] "character"
typeof(x)
## [1] "integer"
typeof(y)
## [1] "double"
typeof(z)
## [1] "logical"

In Python, if you are unsure what the type of a variable is, use the type() function to find out.

w = "a string"
x = 3
y = 3.1415
z = False

type(w)
## <class 'str'>
type(x)
## <class 'int'>
type(y)
## <class 'float'>
type(z)
## <class 'bool'>
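When there are several values to check, a loop (or list comprehension) saves some typing. This is a small sketch; it uses the __name__ attribute of a type to get just the short name, such as 'str', instead of the full <class 'str'> text.

```python
values = ["a string", 3, 3.1415, False]

# Collect the short type name of each value
type_names = [type(value).__name__ for value in values]
print(type_names)  # ['str', 'int', 'float', 'bool']
```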
Try It Out: Variables and Types
  1. Create variables string, integer, decimal, and logical, with types that match the relevant variable names.
string <- 
integer <- 
decimal <- 
logical <- 
  2. Can you get rid of the error that occurs when this chunk is run?
logical + decimal
integer + decimal
string + integer
  3. What happens when you add string to string? logical to logical?
  1. Create variables string, integer, decimal, and logical, with types that match the relevant variable names.
string = 
integer = 
decimal = 
logical = 
  2. Can you get rid of the error that occurs when this chunk is run?
logical + decimal
integer + decimal
string + integer
  3. What happens when you add string to string? logical to logical?
string <- "hi, I'm a string"
integer <- 4L
decimal <- 5.412
logical <- TRUE

logical + decimal
## [1] 6.412
integer + decimal
## [1] 9.412
as.numeric(string) + integer
## [1] NA

"abcd" + "efgh"
## Error in "abcd" + "efgh": non-numeric argument to binary operator
TRUE + TRUE
## [1] 2

In R, adding a string to a string creates an error (“non-numeric argument to binary operator”). Adding a logical to a logical, e.g. TRUE + TRUE, results in 2, which is a numeric value.

To concatenate strings in R (like the default behavior in python), we would use the paste0 function: paste0("abcd", "efgh"), which returns abcdefgh.

import pandas as pd

string = "hi, I'm a string"
integer = 4
decimal = 5.412
logical = True

logical + decimal
## 6.412
integer + decimal
## 9.411999999999999
pd.to_numeric(string, errors='coerce') + integer
## nan

"abcd" + "efgh"
## 'abcdefgh'
True + True
## 2

In Python, when a string is added to another string, the two strings are concatenated. This differs from the result in R, which is a “non-numeric argument to binary operator” error.
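Concatenation also suggests another way to deal with string + integer in Python: instead of turning the string into a number, turn the number into a string. The sketch below shows two options; the names labelled and formatted are made up for this example.

```python
string = "hi, I'm a string"
integer = 4

combined = "abcd" + "efgh"
print(combined)   # abcdefgh

# Explicitly convert the integer to a string before concatenating
labelled = string + " #" + str(integer)
print(labelled)   # hi, I'm a string #4

# An f-string does the conversion for us
formatted = f"{string} #{integer}"
print(formatted)  # hi, I'm a string #4
```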

8.7 References

[1]
Why TRUE + TRUE = 2: Data Types. (Feb. 03, 2020) [Online]. Available: https://www.youtube.com/watch?v=6otW6OXjR8c. [Accessed: May 18, 2022]
[2]
Ryan, “Answer to ‘How do I detect whether a variable is a function?’,” Stack Overflow, Mar. 09, 2009. [Online]. Available: https://stackoverflow.com/a/624948/2859168. [Accessed: Jan. 10, 2023]

  1. This means that doubles take up more memory but can store more decimal places. You don’t need to worry about this much in R, and only a little in Python, but in older and more precise languages such as C/C++/Java, the difference between floats and doubles can be important.↩︎

  2. There is also an R package dedicated to pure evil that will set F and T randomly on startup. Use this information wisely.↩︎

  3. numeric is not really a type, it’s a mode. Run ?mode for more information.↩︎