Statistics: the science of developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data
Description: What does the data say?
Experimental Design: What’s the best way to collect data?
Inference: What does the data tell us (about the population)?
Prediction: What will happen next?
Can someone tell whether tea or milk is added first to a cup?
4 cups of tea with milk first, 4 cups of tea with tea first
Randomize the order
Test the cups and make predictions for all 8 cups
What is the probability that someone gets all 8 correct?
If the 4 milk-first cups are correctly identified, so are the 4 tea-first cups
If we assume the taster is just guessing, we could just as easily flip 4 coins
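Since a guessing taster is just choosing which 4 of the 8 cups to call milk-first, the chance of getting all 8 correct can be computed directly. A minimal Python sketch (an illustration only, not the app's implementation):

```python
from math import comb

# Under the null, a guessing taster picks which 4 of the 8 cups
# are milk-first; only 1 of the C(8, 4) possible choices is fully correct.
p_all_correct = 1 / comb(8, 4)
print(p_all_correct)  # 1/70, about 0.014
```

So even pure guessing gets all 8 cups right about 1.4% of the time.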
Statistical evaluation
Null hypothesis: Taster is guessing
If our experimental results are likely to occur by random chance, we can’t really say whether the taster is guessing or not
We fail to reject the null hypothesis
If our experimental results are not likely to occur by random chance, we may decide it’s more likely that there is another explanation… the taster knows their stuff!
Go to https://shiny.srvanderplas.com/APL and start with the Tea Tasting tab.
What effect does the # simulations have on the results?
What effect does the # test cups have on the results?
Run an experiment and generate an observed value
Simulate a large number of experiments under random chance (the null hypothesis)
Compare the observed value to the results of the simulated experiments
Decide whether the observed value is plausible under random chance, or whether the results are more likely if the null hypothesis is wrong
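These steps can be sketched in Python for the tea-tasting experiment, using a made-up observed value and 10,000 simulated experiments under the null:

```python
import random

random.seed(1)

observed = 8  # hypothetical result: the taster labelled all 8 cups correctly
cups = ["milk"] * 4 + ["tea"] * 4  # the true preparation of the 8 cups

def guess_once():
    # One experiment under the null: a guessing taster randomly assigns
    # the labels "milk" (x4) and "tea" (x4) to the cups
    guesses = random.sample(cups, len(cups))
    return sum(g == c for g, c in zip(guesses, cups))

# Simulate many experiments under random chance
sims = [guess_once() for _ in range(10_000)]
# Compare: how often does chance alone do at least as well as the taster?
p_value = sum(s >= observed for s in sims) / len(sims)
print(p_value)
```

The simulated p-value should hover near 1/70, matching the counting argument above.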
Run an experiment and generate a test statistic (t, z, F, \(\chi^2\))
Compare the observed value to the theoretical distribution
Decide whether the observed value is plausible under random chance, or whether the results are more likely if the null hypothesis is wrong
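For the theoretical route, the binomial distribution from the coin-flip framing gives an exact tail area with no simulation at all. A small Python sketch:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    # P(X >= k) when X ~ Binomial(n, p): an exact theoretical p-value
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. the chance of 4 or more heads in 4 fair coin flips
print(binom_tail(4, 4))  # (1/2)^4 = 0.0625
```

(The exact tea-tasting statistic actually follows a hypergeometric distribution, not a binomial; the binomial here just illustrates the exact-tail-area idea behind t, z, F, and \(\chi^2\) tests.)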
Go to https://shiny.srvanderplas.com/APL and start with the Distributions tab.
What changes when you change distribution?
How many samples do you need for the simulation histogram to look similar to the theoretical distribution?
What effect does setting your observed value to be larger have on the p-value?
Note: At this point, we are doing one-sided tests that examine only values greater than the observed value. This will not always be the case.
How different is the simulation p-value from the theoretical p-value? Does this change when you increase the number of samples?
Goal: Are the experimental results compatible with the null hypothesis?
If the region that is “more extreme” than the observed value is very small, then the experimental results are surprising
The region that is “more extreme” than the observed value is summarized as the p-value: the area of that region.
You can experiment with two-sided tests here:
https://shiny.srvanderplas.com/APL/ and click on “One Continuous Variable”
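The one-sided vs. two-sided distinction can be illustrated with a standard normal statistic, using only Python's standard library:

```python
from math import erf, sqrt

def norm_sf(z):
    # P(Z > z) for a standard normal: the upper-tail area
    return 0.5 * (1 - erf(z / sqrt(2)))

z_obs = 1.96  # hypothetical observed z statistic
one_sided = norm_sf(z_obs)           # "more extreme" = above z_obs only
two_sided = 2 * norm_sf(abs(z_obs))  # "more extreme" in either direction
print(one_sided, two_sided)  # about 0.025 and 0.05
```

A two-sided test counts both tails as "more extreme," so its p-value is double the one-sided value for a symmetric distribution.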
Another way to use statistics is to get a range of “plausible” values based on the estimate + variability
This is called a confidence interval
Every confidence interval has a “level” of \(1-\alpha\), just like every hypothesis test has a significance level \(\alpha\)
Confidence intervals with higher levels (e.g. .99 instead of .95) are wider
Interval width depends on the confidence level, the variability of the data, and the sample size
A CI of (A, B) is read as “We are 95% confident that the true value of _________ lies between A and B”
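A sketch of how the level affects the width, using made-up data and normal-approximation critical values (with small samples, a t critical value would be used instead):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical sample of measurements (made-up data for illustration)
x = [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2, 5.0, 4.6, 5.4]
se = stdev(x) / sqrt(len(x))  # standard error of the sample mean

def ci(z):
    # interval: estimate +/- (critical value) * (standard error)
    return (mean(x) - z * se, mean(x) + z * se)

ci95 = ci(1.96)   # 95% level
ci99 = ci(2.576)  # 99% level: higher level, wider interval
print(ci95, ci99)
```

The 99% interval is wider than the 95% interval because a larger critical value multiplies the same standard error.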
Statistic: # Successes (out of # Trials)
Simulation method: Flip coins \((p = 0.5)\), weighted spinners \((p\neq 0.5)\)
Theoretical distribution: Binomial
https://shiny.srvanderplas.com/APL/ and click on “One Categorical Variable”
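A simulation sketch for one categorical variable, with made-up numbers (40 trials, 28 observed successes) and a fair-coin null:

```python
import random

random.seed(2)

n, p0 = 40, 0.5   # 40 trials; null hypothesis: success probability 0.5
observed = 28     # hypothetical observed number of successes

# Each simulated experiment "flips" n fair coins; using p0 != 0.5
# in (random.random() < p0) would act as a weighted spinner instead
sims = [sum(random.random() < p0 for _ in range(n)) for _ in range(10_000)]
p_value = sum(s >= observed for s in sims) / len(sims)
print(p_value)
```

The same tail area could be computed exactly from the Binomial(40, 0.5) distribution; the simulation approximates it.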
https://shiny.srvanderplas.com/APL/ and click on “One Continuous Variable”
Categorical variable: Group 1 or Group 2?
Continuous variable: Some measurement
Statistic: \(\overline x_1 - \overline x_2\)
Simulation method: shuffle group labels
Theoretical distribution: \(t\)
(degrees of freedom are a bit complicated)
https://shiny.srvanderplas.com/APL/ and click on “Categorical + Continuous Variables”
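The shuffle-the-group-labels simulation can be sketched as follows, with made-up measurements for two hypothetical groups:

```python
import random
from statistics import mean

random.seed(3)

# Hypothetical measurements for two groups (made-up data)
group1 = [5.2, 4.9, 5.8, 5.5, 5.1, 5.6]
group2 = [4.6, 4.8, 4.3, 4.9, 4.5, 4.7]
observed = mean(group1) - mean(group2)

pooled = group1 + group2
n1 = len(group1)
diffs = []
for _ in range(10_000):
    # Under the null, the group labels are arbitrary: shuffle and resplit
    random.shuffle(pooled)
    diffs.append(mean(pooled[:n1]) - mean(pooled[n1:]))

# Two-sided p-value: shuffled differences at least as extreme as observed
p_value = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
print(observed, p_value)
```

This permutation approach avoids the complicated degrees-of-freedom calculation that the theoretical t-based test requires.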
We compare \(\overline{X}_A\) and \(\overline {X}_B\): \(\overline{X}_A - \overline{X}_B\).
The standard deviation of \(\overline{X}_A - \overline{X}_B\) must account for the variability in both samples: use a two-sample test.
Used for multiple groups
Suppose we have a group of schoolchildren separated by grade, and we want to examine the relationship between grade and height.
If grade is related to height, students in a single grade should have more similar heights than students in different grades.
Goal: compare variability within groups to variability between groups
within-groups sum-of-squares
Square the deviations from the group mean and add them up
between-groups sum-of-squares
Sum of squared differences of the class average and the overall average for each student
Results from ANOVA are shown in tables like this:
Factor | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
grade | 2 | 112 | 56.000000 | 26.25 | 1.26e-05 |
Residuals | 15 | 32 | 2.133333 | | |
Total | 17 | 144 | | | |
The F-value is the statistic, and is compared to an F(df1, df2) distribution - in this case, F(2, 15) to get a p-value.
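The sums of squares and the F statistic can be computed by hand. A Python sketch with made-up heights for three grades of six students each (so the degrees of freedom are 2 and 15, matching the table's structure, though the numbers differ):

```python
from statistics import mean

# Hypothetical heights (inches) for three grades (made-up data)
groups = {
    "grade1": [44, 45, 46, 44, 47, 45],
    "grade2": [47, 49, 48, 50, 48, 49],
    "grade3": [51, 52, 50, 53, 52, 51],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)

# Within-groups SS: squared deviations from each group's own mean
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
# Between-groups SS: squared deviation of the group mean from the
# grand mean, counted once per student in the group
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

df1, df2 = len(groups) - 1, len(all_values) - len(groups)  # 2 and 15
f_stat = (ss_between / df1) / (ss_within / df2)             # Mean Sq ratio
print(ss_between, ss_within, f_stat)  # compare f_stat to F(2, 15)
```

Note that the two sums of squares add up to the total sum of squares, just as 112 + 32 = 144 in the table above.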
Continuous variables: \(x\) and \(y\)
Statistic: \(a\), the sample slope
Simulation method: shuffle values of \(y\) relative to \(x\)
Theoretical distribution: \(t_{n-2}\), where \(n\) is the number of observations
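The shuffle-\(y\)-relative-to-\(x\) simulation for the slope can be sketched as follows (made-up paired data):

```python
import random
from statistics import mean

random.seed(4)

def slope(x, y):
    # Least-squares sample slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical paired continuous data (made-up)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.5]
observed = slope(x, y)

y_shuffled = y[:]
slopes = []
for _ in range(10_000):
    # Under the null (no relationship), any pairing of y with x is
    # equally likely: shuffle y and recompute the slope
    random.shuffle(y_shuffled)
    slopes.append(slope(x, y_shuffled))

p_value = sum(abs(s) >= abs(observed) for s in slopes) / len(slopes)
print(observed, p_value)
```

A strongly linear pattern like this one is almost never matched by a shuffled pairing, so the simulated p-value is tiny.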