message <- "So long and thanks for all the fish"
year <- 2025
the_answer <- 42L
earth_demolished <- FALSE
8 Variables and Basic Data Types
8.1 Objectives
- Know the basic data types and what their restrictions are
- Know how to test to see if a variable is a given data type
- Understand the basics of implicit and explicit type conversion
- Write code that assigns values to variables
8.2 Basic Definitions
For a general overview, [1] is an excellent introduction to data types.
Let’s start this section with some basic vocabulary.
- A value is a basic unit of stuff that a program works with, like 1, 2, "Hello, World", and so on.
- Values have types: 2 is an integer, and "Hello, World" is a string (it contains a “string” of letters). Strings are in quotation marks to let us know that they are not variable names.
In most programming languages (including R and python), there are some very basic data types:
- logical or boolean - FALSE/TRUE or 0/1 values. Sometimes, boolean is shortened to bool.
- integer - whole numbers (positive or negative).
- double or float or numeric - decimal numbers.
  - float is short for floating-point value.
  - double is a floating-point value with more precision (“double precision”).1
  - R uses the name numeric to indicate a decimal value, regardless of precision.
- character or string - holds text, usually enclosed in quotes.
In R, boolean values are TRUE and FALSE, but in Python they are True and False. Capitalization matters a LOT.
Other things matter too: if we try to write a million, we would write it 1000000 instead of 1,000,000 (in both languages). In code, commas separate one value from the next; they are not used to group the digits within a number. This is a hard thing to get used to but very important, especially when we start reading in data.
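To see why these details matter, here is a minimal Python sketch. The variable names are made up for illustration, and the underscore digit separator is an extra convenience not covered above.

```python
# Capitalization matters: True and False are Python's boolean values.
is_raining = True

# Lowercase `true` is just an undefined name, so using it raises a NameError.
try:
    is_raining = true
except NameError as err:
    lowercase_error = str(err)

# A million is written without commas.
million = 1000000

# Commas separate values, so this creates a tuple of three numbers instead.
not_a_million = 1,000,000

# Underscores may be used to group digits visually; the value is unchanged.
readable_million = 1_000_000

print(lowercase_error)
print(not_a_million)
print(million == readable_million)
```

Python does not complain about the commas. It quietly builds a collection of three separate values, which is why this mistake can be hard to spot.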
8.3 Variables
Programming languages use variables - names that refer to values. Think of a variable as a container that holds something - instead of referring to the value, you can refer to the container and you will get whatever is stored inside.
8.3.1 Assignment
We assign variables values using the syntax object_name <- value (R) or object_name = value (Python). You can read this as “object name gets value” in your head.
- DataCamp Introduction to R Chapter 1: Intro to basics
- DataCamp Introduction to Python for Data Science Chapter 1: Python Basics
In R, <- is used for assigning a value to a variable. So x <- "R is awesome" is read “x gets ‘R is awesome’” or “x is assigned the value ‘R is awesome’”. Technically, you can also use = to assign things to variables in R, but most style guides consider this to be poor programming practice, so seriously consider defaulting to <-.
In Python, = is used for assigning a value to a variable. This tends to be much easier to say out loud, but lacks any indication of directionality.
8.3.1.1 Demo: Assignment
message = "So long and thanks for all the fish"
year = 2025
the_answer = 42
earth_demolished = False
Note that in R, we assign variables values using the <- operator, whereas in Python, we assign variables values using the = operator. Technically, = will work for assignment in both languages, but <- is more common than = in R by convention.
We can then use the variables - do numerical computations, evaluate whether a proposition is true or false, and even manipulate the content of strings, all by referencing the variable by name.
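As a small illustration of using variables by name, the Python sketch below reuses the values from the assignment demo. The names of the derived variables (next_year, answer_is_even, and so on) are invented for this example.

```python
# Assign values to variables, as in the demo.
message = "So long and thanks for all the fish"
year = 2025
the_answer = 42
earth_demolished = False

# Numerical computation: refer to the variable instead of typing 2025 again.
next_year = year + 1

# Evaluate whether a proposition is true or false.
answer_is_even = the_answer % 2 == 0
earth_is_safe = not earth_demolished

# Manipulate the content of a string.
shouted_message = message.upper()

print(next_year, answer_is_even, earth_is_safe)
print(shouted_message)
```

None of these operations change the original variables; each one computes a new value from what is stored in the container.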
8.3.2 Naming Variables
There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton
In R, object names must start with a letter and can only contain letters, numbers, underscores (_), and periods (.). In Python, object names must start with a letter (or an underscore) and can consist of letters, numbers, and underscores (that is, a period is not a valid character in a Python variable name). While it is technically fine to use uppercase variable names in Python, it’s recommended that you use lowercase names for variables (you’ll see why later).
What happens if we try to create a variable name that isn’t valid?
In both languages, starting a variable name with a number will get you an error message that lets you know that something isn’t right - “unexpected symbol” in R and “invalid syntax” in python.
8.3.2.1 Invalid Names
1st_thing <- "check your variable names!"
## Error: <text>:1:2: unexpected symbol
## 1: 1st_thing
## ^
1st_thing = "check your variable names!"
Note: Run the above chunk in your Python window - the book won’t compile if I set it to evaluate 😥. It generates an error of SyntaxError: invalid syntax (<string>, line 1); the exact wording may differ between Python versions.
second.thing = "this isn't valid"
## name 'second' is not defined
In Python, trying to have a . in a variable name gets a more interesting error: “name 'second' is not defined”. We’ll get into this more later, but there is a good reason for Python’s restriction about not using . in variable names.
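If you would rather check a name before using it, Python can do that for you. The sketch below is one possible approach: is_valid_variable_name is a hypothetical helper written for this example, built on the standard str.isidentifier method and the keyword module. It also rejects reserved words such as class, which are a restriction not discussed above.

```python
import keyword

# Names to check; the last one is a reserved word in Python.
candidate_names = ["first_thing", "1st_thing", "second.thing", "class"]


def is_valid_variable_name(name):
    """Return True if `name` could be used as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Build a lookup of name -> whether it is usable as a variable name.
results = {name: is_valid_variable_name(name) for name in candidate_names}

for name, is_ok in results.items():
    print(name, is_ok)
```

Only first_thing passes: one name starts with a number, one contains a period, and one is reserved by the language.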
Naming things is difficult! When you name variables, try to make the names descriptive - what does the variable hold? What are you going to do with it? The more (concise) information you can pack into your variable names, the more readable your code will be.
8.3.2.2 Learn More
Why is naming things hard? - Blog post by Neil Kakkar
There are a few different conventions for naming things that may be useful:
- some_people_use_snake_case, where words are separated by underscores
- somePeopleUseCamelCase, where words are appended but anything after the first word is capitalized (leading to words with humps like a camel)
- some.people.use.periods (in R; obviously this doesn’t work in Python)
- A few people mix conventions with variables_thatLookLike.this and they are almost universally hated 👿
As long as you pick ONE naming convention and don’t mix-and-match, you’ll be fine. It will be easier to remember what you named your variables (or at least guess) and you’ll have fewer moments where you have to go scrolling through your script file looking for a variable you named.
8.4 Types
8.4.1 Testing Types
You can use different functions to test whether a variable has a specific type.
is.logical(FALSE)
is.integer(2L) # by default, R treats all numbers as numeric/decimal values.
# The L indicates that we're talking about an integer.
is.integer(2)
is.numeric(2)
is.character("Hello, programmer!")
is.function(print)
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] TRUE
In R, you use is.xxx functions, where xxx is the name of the type in question.
isinstance(False, bool)
isinstance(2, int)
isinstance(2, (int, float)) # Test for one of multiple types
isinstance(3.1415, float)
isinstance("This is python code", str)
## True
## True
## True
## True
## True
In Python, test for types using the isinstance function. Its second argument is either a single data type or a tuple containing several data types ((int, float) is an example of a tuple - a static set of multiple values).
If we want to test for whether something is callable (can be used like a function), we have to get slightly more complicated:
callable(print)
## True
This is glossing over some much more technical information about differences between functions and classes (that we haven’t covered) [2].
x <- "R is awesome"
typeof(x)
## [1] "character"
is.character(x)
## [1] TRUE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] FALSE
x = "python is awesome"
type(x)
## <class 'str'>
isinstance(x, str)
## True
isinstance(x, bool)
## False
isinstance(x, int)
## False
isinstance(x, float)
## False
x <- FALSE
typeof(x)
## [1] "logical"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] TRUE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] FALSE
In R, it is possible to use the shorthand F and T, but be careful with this, because F and T are not reserved, and other information can be stored within them. See this discussion for pros and cons of using F and T as variables vs. shorthand for true and false. 2
x = False
type(x)
## <class 'bool'>
isinstance(x, str)
## False
isinstance(x, bool)
## True
isinstance(x, int)
## True
isinstance(x, float)
## False
Note that in python, boolean variables are also integers. If your goal is to test whether something is a T/F value, you may want to e.g. test whether its value is one of 0 or 1, rather than testing whether it is a boolean variable directly, since integers can also function directly as bools in Python.
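Because every Python bool also passes an int test, it can help to write the check you actually mean. The following is a minimal sketch; is_bool and is_int_not_bool are hypothetical helper names, not built-in functions.

```python
def is_bool(value):
    """Return True only for the boolean values True and False."""
    return isinstance(value, bool)


def is_int_not_bool(value):
    """Return True for whole numbers, but not for True or False."""
    return isinstance(value, int) and not isinstance(value, bool)


# Booleans pass the plain integer test...
bool_passes_int_test = isinstance(True, int)

# ...and behave like 1 and 0 in comparisons.
true_equals_one = True == 1

print(bool_passes_int_test, true_equals_one)
print(is_bool(1), is_int_not_bool(1))
print(is_bool(False), is_int_not_bool(False))
```

Which check is appropriate depends on the goal: use is_bool when only True and False should be accepted, and is_int_not_bool when a count or an index is expected.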
x <- 2
typeof(x)
## [1] "double"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] TRUE
Wait, 2 is an integer, right?
2 is an integer, but in R, values are assumed to be doubles unless specified. So if we want R to treat 2 as an integer, we need to specify that it is an integer specifically.
x <- 2L # The L immediately after the 2 indicates that it is an integer.
typeof(x)
## [1] "integer"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] TRUE
is.double(x)
## [1] FALSE
is.numeric(x)
## [1] TRUE
x = 2
type(x)
## <class 'int'>
isinstance(x, str)
## False
isinstance(x, bool)
## False
isinstance(x, int)
## True
isinstance(x, float)
## False
x <- 2.45
typeof(x)
## [1] "double"
is.character(x)
## [1] FALSE
is.logical(x)
## [1] FALSE
is.integer(x)
## [1] FALSE
is.double(x)
## [1] TRUE
is.numeric(x)
## [1] TRUE
x = 2.45
type(x)
## <class 'float'>
isinstance(x, str)
## False
isinstance(x, bool)
## False
isinstance(x, int)
## False
isinstance(x, float)
## True
A fifth common “type”3, numeric, is really the union of two types: integer and double. You may come across it when using str() or mode(), which are similar to typeof() but do not quite do the same thing.
The numeric category exists because when doing math, we can add an integer and a double, but adding an integer and a string is … trickier. Testing for numeric variables guarantees that we’ll be able to do math with those variables. is.numeric() and as.numeric() work as you would expect them to work.
The general case of this property of a language is called implicit type conversion - that is, R will implicitly (behind the scenes) convert your integer to a double and then add the other double, so that the result is unambiguously a double.
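Python has no built-in is.numeric(), but a rough analogue can be sketched with the standard numbers module. The is_numeric helper below is a hypothetical name chosen for this example, and it is only an approximation of R's behavior: it also accepts booleans and complex numbers.

```python
import numbers


def is_numeric(value):
    """Rough analogue of R's is.numeric(): True for ints and floats."""
    return isinstance(value, numbers.Number)


# Adding an integer and a float implicitly converts the integer first,
# so the result is a float.
total = 2 + 3.1415
total_type = type(total)

print(is_numeric(2), is_numeric(3.1415), is_numeric("2"))
print(total_type)
```

As in R, the string "2" is not numeric even though it looks like a number, so it would need an explicit conversion before doing math with it.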
8.5 Type Conversions
Programming languages will generally work hard to seamlessly convert variables to different types. This is called implicit type casting - the computer implicitly changes the variable type to avoid a conflict.
8.5.1 Implicit Type Conversion
TRUE + 2
## [1] 3
2L + 3.1415
## [1] 5.1415
"abcd" + 3
## Error in "abcd" + 3: non-numeric argument to binary operator
True + 2
## 3
int(2) + 3.1415
## 5.141500000000001
"abcd" + 3
## can only concatenate str (not "int") to str
This conversion doesn’t always work - there’s no clear way to make “abcd” into a number we could use in addition. So instead, R or python will issue an error. This error pops up frequently when something went wrong with data import and all of a sudden you just tried to take the mean of a set of string/character variables. Whoops.
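When a column of imported data might contain text, it can be useful to see which values fail instead of stopping at the first error. The sketch below is one way to do that in Python with try and except; the list of values and the outcomes variable are invented for illustration.

```python
# A mix of values, as might appear after a messy data import.
values = [True, 2, "abcd"]

outcomes = []
for value in values:
    try:
        # Implicit conversion works for the boolean and the integer.
        result = value + 3
    except TypeError as err:
        # Strings cannot be implicitly converted to numbers.
        print("Could not add 3 to", repr(value), "-", err)
        result = None
    outcomes.append((value, result))

print(outcomes)
```

Values that could not be used in the addition are recorded as None, so the loop finishes and the problem entries are easy to find afterwards.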
When you want to, you can also use as.xxx() functions in R (or functions such as int() and float() in Python) to make the type conversion explicit. So, the analogue of the code above, with explicit conversions, would be:
8.5.2 Explicit Type Conversion
as.double(TRUE) + 2
## [1] 3
as.double(2L) + 3.1415
## [1] 5.1415
as.numeric("abcd") + 3
## [1] NA
int(True) + 2
## 3
float(2) + 3.1415
## 5.141500000000001
float("abcd") + 3
## could not convert string to float: 'abcd'
import pandas as pd # Load pandas library
pd.to_numeric("abcd", errors = 'coerce') + 3
## nan
When we make our intent explicit (convert “abcd” to a numeric variable) we get an NA - a missing value - in R. In Python, we get a more descriptive error by default, but we can use the pandas library (which adds some statistical functionality) to get a similar result to the result we get in R.
There’s still no easy way to figure out where “abcd” is on a number line, but our math will still have a result - NA + 3 is NA.
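The same "convert or give me a missing value" behavior can be sketched without pandas, using only the Python standard library. The function name to_float_or_nan is hypothetical, and this is a simplified stand-in for the pandas call, not a replacement for it.

```python
import math


def to_float_or_nan(text):
    """Convert `text` to a float, or return nan if that is not possible."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


# A string that looks like a number converts cleanly.
good_sum = to_float_or_nan("2.5") + 3

# A string that does not look like a number becomes nan, and nan + 3 is nan.
bad_sum = to_float_or_nan("abcd") + 3

print(good_sum)
print(bad_sum, math.isnan(bad_sum))
```

Note that nan is never equal to anything, including itself, so use math.isnan() to test for it.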
8.6 What Type is it?
If you don’t know what type a value is, both R and python have functions to help you with that.
8.6.1 Determining Variable Types
In R, if you are unsure what the type of a variable is, use the typeof() function to find out.
In Python, if you are unsure what the type of a variable is, use the type() function to find out.
w = "a string"
x = 3
y = 3.1415
z = False
type(w)
## <class 'str'>
type(x)
## <class 'int'>
type(y)
## <class 'float'>
type(z)
## <class 'bool'>
- Create variables string, integer, decimal, and logical, with types that match the relevant variable names.
string <-
integer <-
decimal <-
logical <-
- Can you get rid of the error that occurs when this chunk is run?
logical + decimal
integer + decimal
string + integer
- What happens when you add string to string? logical to logical?
- Create variables string, integer, decimal, and logical, with types that match the relevant variable names.
string =
integer =
decimal =
logical =
- Can you get rid of the error that occurs when this chunk is run?
logical + decimal
integer + decimal
string + integer
- What happens when you add string to string? logical to logical?
string <- "hi, I'm a string"
integer <- 4L
decimal <- 5.412
logical <- TRUE
logical + decimal
## [1] 6.412
integer + decimal
## [1] 9.412
as.numeric(string) + integer
## [1] NA
"abcd" + "efgh"
## Error in "abcd" + "efgh": non-numeric argument to binary operator
TRUE + TRUE
## [1] 2
In R, adding a string to a string creates an error (“non-numeric argument to binary operator”). Adding a logical to a logical, e.g. TRUE + TRUE, results in 2, which is a numeric value.
To concatenate strings in R (like the default behavior in Python), we would use the paste0 function: paste0("abcd", "efgh"), which returns "abcdefgh".
import pandas as pd
string = "hi, I'm a string"
integer = 4
decimal = 5.412
logical = True

logical + decimal
## 6.412
integer + decimal
## 9.411999999999999
pd.to_numeric(string, errors='coerce') + integer
## nan
"abcd" + "efgh"
## 'abcdefgh'
True + True
## 2
In Python, when a string is added to another string, the two strings are concatenated. This differs from the result in R, which is a “non-numeric argument to binary operator” error.
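Since + means concatenation for strings and addition for numbers, mixing the two requires deciding which one you want and converting explicitly. Here is a minimal sketch; the variable names are made up for this example.

```python
label = "abcd"
count = 3

# Want text? Convert the number to a string, then concatenate.
joined = label + str(count)

# Want math? Convert the text to a number, then add.
number_text = "4"
added = int(number_text) + count

# Strings concatenate with strings; logicals add like 1 and 0.
both_strings = "abcd" + "efgh"
both_logicals = True + True

print(joined, added, both_strings, both_logicals)
```

Leaving out the explicit str() or int() call would raise the same "can only concatenate str" error seen earlier, because Python will not guess which meaning of + was intended.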
8.7 References
1. This means that doubles take up more memory but can store more decimal places. You don’t need to worry about this much in R, and only a little in Python, but in older and more precise languages such as C/C++/Java, the difference between floats and doubles can be important.
2. There is also an R package dedicated to pure evil that will set F and T randomly on startup. Use this information wisely.
3. numeric is not really a type, it’s a mode. Run ?mode for more information.