A Short Overview of Statistics

What is Statistics?

The science of developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data

  • How should I conduct my experiment?
  • What is the best way to test my hypothesis?
  • What is the true value of ____
    (and how precisely do I know that value)?
  • What does my experiment/sample tell me about my data?
  • What is the future value of ____ given what I know now?

Statistical Tasks

Description: What does the data say?

Experimental Design: What’s the best way to collect data?

Inference: What does the data tell us (about the population)?

Prediction: What will happen next?

An Historical Example

The Logic of Hypothesis Testing

Can someone tell whether tea or milk is added first to a cup?

  • 4 cups of tea with milk added first, 4 cups of tea with tea added first

  • Randomize the order

  • Taste the cups and make predictions for all 8 cups

What is the probability that someone gets all 8 correct?

A Lady Tasting Tea

  • If the 4 milk-first cups are correctly identified, so are the 4 tea-first cups

  • If we assume the taster is just guessing, we could just as easily flip 4 coins
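The probability asked about above can be worked out by counting. This is a minimal sketch, assuming the taster knows that exactly 4 of the 8 cups are milk-first and simply picks which 4; the variable names are illustrative only.

```python
from math import comb

# Number of distinct ways to pick which 4 of the 8 cups are "milk first"
n_ways = comb(8, 4)

# Exactly one of those picks matches the truth, so a pure guesser
# labels all 8 cups correctly with probability 1 / n_ways
p_all_correct = 1 / n_ways

print(n_ways)                   # 70
print(round(p_all_correct, 4))  # 0.0143
```

Under this assumption, a perfect score happens by chance only about 1.4% of the time.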

Statistical evaluation

  • Null hypothesis: Taster is guessing

    • If our experimental results are likely to occur by random chance, we can’t really say whether the taster is guessing or not:
      we fail to reject the null hypothesis

    • If our experimental results are not likely to occur by random chance, we may decide it’s more likely that there is another explanation… the taster knows their stuff!

Try it out!

Go to https://shiny.srvanderplas.com/APL and start with the Tea Tasting tab.

  • What effect does the # simulations have on the results?

  • What effect does the # test cups have on the results?

    • Assuming the number of observed cups is the same as the number of test cups
    • Assuming the number of observed cups is less than the number of test cups

Hypothesis Testing Logic

  1. Run an experiment and generate an observed value

  2. Simulate a large number of experiments under random chance (the null hypothesis)

  3. Compare the observed value to the results of the simulated experiments

  4. Decide whether the observed value is plausible under random chance, or whether the results are better explained by the null hypothesis being wrong
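The four steps above can be sketched for the tea-tasting experiment. This is an illustration under stated assumptions, not the code behind the Shiny app: the guessing taster is assumed to pick 4 of the 8 cups at random, and `simulate_guessing` is a hypothetical helper name.

```python
import random

def simulate_guessing(n_sims, n_cups=8, n_milk=4, seed=1):
    """Simulate a guessing taster; return the number of milk-first
    cups correctly identified in each simulated experiment."""
    rng = random.Random(seed)
    truth = set(range(n_milk))  # by convention, cups 0..3 are milk-first
    results = []
    for _ in range(n_sims):
        guess = set(rng.sample(range(n_cups), n_milk))
        results.append(len(guess & truth))
    return results

# Step 1: observed value (all 4 milk-first cups identified)
observed = 4

# Step 2: many experiments under the null hypothesis (guessing)
sims = simulate_guessing(10_000)

# Step 3: how often is a simulated result at least as good as the observed one?
p_value = sum(r >= observed for r in sims) / len(sims)

# Step 4: a small proportion means the observed value is implausible under guessing
print(p_value)
```

With more simulations, the proportion settles near the exact value of 1/70.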

Theory-based Statistics

  1. Run an experiment and generate a test statistic (t, z, F, \(\chi^2\))

  2. Compare the observed value to the theoretical distribution

  3. Decide whether the observed value is plausible under random chance, or whether the results are better explained by the null hypothesis being wrong
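As a rough illustration of the theory-based steps, the sketch below computes a z statistic and compares it to the standard normal distribution. All numbers are made up, and the population standard deviation is assumed known so that a z test applies.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical experiment: sample mean, hypothesized mean,
# known population standard deviation, and sample size
x_bar, mu_0, sigma, n = 52.0, 50.0, 6.0, 36

# Step 1: test statistic
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 2: compare to the theoretical distribution (area above z)
p_value = 1 - NormalDist().cdf(z)

# Step 3: decide, using the p-value
print(z)                  # 2.0
print(round(p_value, 4))  # 0.0228
```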

Try it out

Go to https://shiny.srvanderplas.com/APL and start with the Distributions tab.

  • What changes when you change distribution?

  • How many samples do you need for the simulation histogram to look similar to the theoretical distribution?

  • What effect does setting your observed value to be larger have on the p-value?
    Note: At this point, the tests only examine values greater than the observed value (a one-sided test). This will not always be the case.

  • How different is the simulation p-value from the theoretical p-value? Does this change when you increase the number of samples?

Statistical Test Logic

  • Goal: Are the experimental results compatible with the null hypothesis?

  • If the region that is “more extreme” than the observed value is very small, then the experimental results are surprising

    • This suggests the null hypothesis might not hold
    • Reject \(H_0\) in favor of the alternative

  • The region that is “more extreme” than the observed value is summarized by the p-value: the area of that region.

    • p-values lower than a pre-specified cutoff \(\alpha\) (commonly \(\alpha = 0.05\)) are considered “statistically significant”;
      that is, they lead to a rejection of the null hypothesis

Two Sided Tests

  • If we don’t know/care whether \(x < a\) or \(x > a\), we use a two-sided test

You can experiment with two-sided tests here:

https://shiny.srvanderplas.com/APL/ and click on “One Continuous Variable”
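To make the one-sided versus two-sided distinction concrete, here is a small sketch for a statistic that follows a standard normal distribution under the null hypothesis. The function name `normal_p_values` and the value z = 2.0 are illustrative assumptions.

```python
from statistics import NormalDist

def normal_p_values(z):
    """Return (upper, lower, two-sided) p-values for a statistic z
    that is standard normal under the null hypothesis."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)  # area above z
    lower = std_normal.cdf(z)      # area below z
    # Two-sided: count both tails, since either direction is "extreme"
    two_sided = 2 * min(upper, lower)
    return upper, lower, two_sided

upper, lower, two_sided = normal_p_values(2.0)
alpha = 0.05
print(round(upper, 4))      # 0.0228
print(round(two_sided, 4))  # 0.0455
print(two_sided < alpha)    # True
```

The two-sided p-value is twice the smaller tail area, so the same observed value is less surprising when either direction counts as extreme.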

Confidence Intervals

  • Another way to use statistics is to get a range of “plausible” values based on the estimate + variability

  • This is called a confidence interval

  • Every confidence interval has a “level” of \(1-\alpha\), just like every hypothesis test has a significance level \(\alpha\)

  • Confidence intervals with higher levels (e.g., 0.99 instead of 0.95) are wider

  • Interval width depends on

    • sample size
    • variability
    • confidence level
  • A 95% CI of (A, B) is read as “We are 95% confident that the true value of _________ lies between A and B”
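The effect of the confidence level on interval width can be checked with a short sketch. It uses a normal critical value as an approximation (a \(t\) critical value is more appropriate for small samples), and both the data and the helper name `mean_ci` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(data, level=0.95):
    """Approximate confidence interval for a mean: estimate +/- z * standard error."""
    n = len(data)
    center = mean(data)
    std_error = stdev(data) / sqrt(n)  # shrinks as n grows
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # grows with the level
    return center - z_crit * std_error, center + z_crit * std_error

# Made-up measurements
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 4.9]

lo95, hi95 = mean_ci(sample, 0.95)
lo99, hi99 = mean_ci(sample, 0.99)
print(hi99 - lo99 > hi95 - lo95)  # True
```

The three ingredients listed above all appear in the formula: sample size and variability enter through the standard error, and the confidence level enters through the critical value.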

Experimental Design

One Categorical Variable

  • Statistic: # Successes (out of # Trials)

  • Simulation method: Flip coins \((p = 0.5)\), weighted spinners \((p\neq 0.5)\)

  • Theoretical distribution: Binomial

Go to https://shiny.srvanderplas.com/APL/ and click on “One Categorical Variable”
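For a count of successes, the theoretical p-value comes directly from the binomial distribution. A minimal sketch, with a hypothetical result of 9 successes in 10 trials and a hypothetical helper name `binomial_upper_p`:

```python
from math import comb

def binomial_upper_p(successes, trials, p=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical experiment: 9 successes out of 10 trials, null value p = 0.5
print(round(binomial_upper_p(9, 10), 4))  # 0.0107
```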

One Continuous Variable

  • Statistic: \(\displaystyle t = \frac{\overline x - \mu}{s/\sqrt{n}}\) where
    • \(\overline x\) is the sample mean,
    • \(s\) is the sample standard deviation,
    • \(\mu\) is the hypothesized mean, and
    • \(n\) is the sample size
  • Simulation method: none
  • Theoretical distribution: \(t\) with \(n-1\) degrees of freedom

Go to https://shiny.srvanderplas.com/APL/ and click on “One Continuous Variable”
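The \(t\) statistic above can be computed by hand from a sample. The sketch below uses made-up data and a hypothetical helper `t_statistic`; it stops at the statistic and degrees of freedom, since turning them into a p-value requires the \(t\) distribution, which is not in Python's standard library.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(data, mu):
    """Return (t, degrees of freedom) for a one-sample t test of mean mu."""
    n = len(data)
    x_bar = mean(data)  # sample mean
    s = stdev(data)     # sample standard deviation
    t = (x_bar - mu) / (s / sqrt(n))
    return t, n - 1

# Made-up measurements, tested against a hypothesized mean of 5.0
sample = [5.2, 4.8, 5.5, 5.1, 4.9, 5.3]
t, df = t_statistic(sample, mu=5.0)
print(round(t, 3), df)
```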

Two-group Tests

  • Categorical variable: Group 1 or Group 2?

  • Continuous variable: Some measurement

  • Statistic: \(\overline x_1 - \overline x_2\)

  • Simulation method: shuffle group labels

  • Theoretical distribution: \(t\)
    (degrees of freedom are a bit complicated)

Go to https://shiny.srvanderplas.com/APL/ and click on “Categorical + Continuous Variables”
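The "shuffle group labels" simulation can be sketched as a permutation test. This is one possible version with made-up data, a two-sided comparison, and a hypothetical function name; it is not necessarily how the Shiny app does it.

```python
import random

def permutation_test(group1, group2, n_sims=5000, seed=1):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    n1 = len(group1)

    def mean_diff(values):
        first, second = values[:n1], values[n1:]
        return sum(first) / len(first) - sum(second) / len(second)

    pooled = list(group1) + list(group2)
    observed = mean_diff(pooled)

    count = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)  # shuffling values is equivalent to shuffling labels
        # Small tolerance guards against floating-point ties
        if abs(mean_diff(pooled)) >= abs(observed) - 1e-12:
            count += 1
    return observed, count / n_sims

# Made-up measurements for two groups
group1 = [12.1, 11.8, 12.5, 12.9, 12.3]
group2 = [11.2, 11.5, 10.9, 11.7, 11.4]

observed, p_value = permutation_test(group1, group2)
print(round(observed, 2))  # 0.98
print(p_value)
```

If the group labels did not matter, shuffled differences as large as the observed one would be common; here they are rare.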

A two-sample experiment randomly divides up a sample of experimental units into two groups and calculates the sample mean for each group.

We compare \(\overline{X}_A\) and \(\overline{X}_B\) using their difference, \(\overline{X}_A - \overline{X}_B\).

The standard deviation of \(\overline{X}_A - \overline{X}_B\) has to be calculated from both groups, so we use a two-sample test.

Repeated Measures

Matched Pairs

Analysis of Variance

Used to compare means across multiple groups

Suppose we have a group of schoolchildren separated by grade, and we want to examine the relationship between grade and height.

If grade is related to height, students within a single grade should be more similar in height than students across different grades.

Goal: compare similarity within groups to differences between groups

  • within-groups sum-of-squares
    Square each student's deviation from their group mean and add them up

  • between-groups sum-of-squares
    For each student, square the difference between their group mean and the overall mean, and add them up
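Both sums of squares can be computed directly from their definitions. The heights below are made up for illustration, and `sums_of_squares` is a hypothetical helper name.

```python
def sums_of_squares(groups):
    """Return (within-groups SS, between-groups SS) for a dict of groups."""
    all_values = [v for values in groups.values() for v in values]
    overall_mean = sum(all_values) / len(all_values)

    within = 0.0
    between = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Deviation of each student from their own group mean
        within += sum((v - group_mean) ** 2 for v in values)
        # Group mean vs. overall mean, counted once per student in the group
        between += len(values) * (group_mean - overall_mean) ** 2
    return within, between

# Made-up heights (inches) for three grades
heights = {
    "grade 1": [46, 48, 47],
    "grade 2": [50, 51, 49],
    "grade 3": [53, 55, 54],
}

within, between = sums_of_squares(heights)
print(round(within, 6), round(between, 6))  # 6.0 74.0
```

Here the between-groups sum of squares is much larger than the within-groups one, which is the pattern expected when grade is related to height.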

Results from ANOVA are shown in tables like this:

Factor      Df   Sum Sq     Mean Sq   F value     Pr(>F)
grade        2      112   56.000000     26.25   1.26e-05
Residuals   15       32    2.133333
Total       17      144

The F value is the test statistic. It is compared to an F(df1, df2) distribution, in this case F(2, 15), to get a p-value.
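The entries in the table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. A quick check using the numbers from the table (variable names are illustrative):

```python
# Values copied from the ANOVA table above
ss_grade, df_grade = 112, 2
ss_resid, df_resid = 32, 15

ms_grade = ss_grade / df_grade  # Mean Sq for grade
ms_resid = ss_resid / df_resid  # Mean Sq for residuals
f_value = ms_grade / ms_resid   # F statistic

print(ms_grade)            # 56.0
print(round(ms_resid, 6))  # 2.133333
print(round(f_value, 2))   # 26.25
```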

Two Continuous Variables

We want to know if there is a linear association between x and y

If the slope of the line is nonzero, there is a linear association

We need to test whether that slope is significantly different from 0

  • Continuous variables: \(x\) and \(y\)

  • Statistic: \(a\), the sample slope

  • Simulation method: shuffle values of \(y\) relative to \(x\)

  • Theoretical distribution: \(t_{n-2}\), where \(n\) is the number of observations
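The shuffling approach for the slope can be sketched as follows. The data are made up, the slope is the usual least-squares slope, and the function names are hypothetical; this is an illustration of the idea rather than the app's implementation.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def slope_permutation_test(x, y, n_sims=5000, seed=1):
    """Two-sided permutation test of the null hypothesis of zero slope."""
    rng = random.Random(seed)
    observed = slope(x, y)
    shuffled = list(y)
    count = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)  # break any link between x and y
        # Small tolerance guards against floating-point ties
        if abs(slope(x, shuffled)) >= abs(observed) - 1e-12:
            count += 1
    return observed, count / n_sims

# Made-up data with a clear upward trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

observed, p_value = slope_permutation_test(x, y)
print(round(observed, 3))  # 1.998
print(p_value)
```

When shuffled slopes almost never reach the observed slope, the slope is judged significantly different from 0.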