Null hypothesis (H0):
1. H0: m = μ
2. H0: m \(\leq\) μ
3. H0: m \(\geq\) μ
Alternative hypotheses (Ha):
1. Ha: m ≠ μ (different)
2. Ha: m > μ (greater)
3. Ha: m < μ (less)
Note: Hypothesis 1 corresponds to a two-tailed test; hypotheses 2 and 3 correspond to one-tailed tests.
The p-value is the probability of observing data at least as extreme as the observed data, under the condition that the null hypothesis is true.
Note: p-value is not the probability that the null hypothesis is true.
Note: Absence of evidence ≠ evidence of absence.
Cutoffs for hypothesis testing: *p < 0.05, **p < 0.01, ***p < 0.001. If the p-value is less than the significance level α (e.g., 0.05), the null hypothesis is rejected.
 | not rejected (‘negative’) | rejected (‘positive’) |
---|---|---|
H0 true | True negative (specificity) | False positive (Type I error) |
H0 false | False negative (Type II error) | True positive (sensitivity) |
Type II errors are usually more dangerous.
t-test
t-test and normal distribution
The t-distribution assumes that the observations are independent and that they follow a normal distribution. If the data are dependent, then the p-values will likely be badly wrong (e.g., for positive correlation, too optimistic, which inflates the Type I error rate).
It is good practice to test whether the observations are normally distributed; otherwise normality remains an untested assumption.
Independence of observations is usually not testable, but can be reasonably assumed if the data collection process was random without replacement.
FIXME: I do not understand this. Deviations from normality will lead to Type I errors. If the data deviate from a normal distribution, use the Wilcoxon test or permutation tests.
One-sample t-test
One-sample t-test is used to compare the mean of one sample to a known standard (or theoretical/hypothetical) mean (μ).
t-statistic: \(t = \frac{m - \mu}{s/\sqrt{n}}\), where
m is the sample mean
n is the sample size
s is the sample standard deviation with n−1 degrees of freedom
μ is the theoretical value
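As a sanity check, the t-statistic can be computed by hand and compared with the value reported by `t.test` (using the sample N from the example further below):

```r
# data from the one-sample example below
N <- c(-0.01, 0.65, -0.17, 1.77, 0.76, -0.16, 0.88, 1.09, 0.96, 0.25)
mu <- 0
m <- mean(N); s <- sd(N); n <- length(N)
t_stat <- (m - mu) / (s / sqrt(n))
t_stat                            # 3.0483, matching t.test(N, mu = 0)
2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value, 0.01383
```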
Q: And what should I do with this t-statistic?
Q: What is the difference between t-test and ANOVA?
Q: What is the smallest sample size which can be tested by t-test?
Q: Show diagrams explaining why p-value of one-sided is smaller than two-sided tests.
R example:
We want to test if the mean of N is different from a given mean μ = 0:
N = c(-0.01, 0.65, -0.17, 1.77, 0.76, -0.16, 0.88, 1.09, 0.96, 0.25)
t.test(N, mu = 0, alternative = "less")
##
## One Sample t-test
##
## data: N
## t = 3.0483, df = 9, p-value = 0.9931
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
## -Inf 0.964019
## sample estimates:
## mean of x
## 0.602
t.test(N, mu = 0, alternative = "two.sided")
##
## One Sample t-test
##
## data: N
## t = 3.0483, df = 9, p-value = 0.01383
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.1552496 1.0487504
## sample estimates:
## mean of x
## 0.602
t.test(N, mu = 0, alternative = "greater")
##
## One Sample t-test
##
## data: N
## t = 3.0483, df = 9, p-value = 0.006916
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 0.239981 Inf
## sample estimates:
## mean of x
## 0.602
Note: the test does not accept all alternatives at the same time. With alternative = "less" the p-value is 0.9931, so H0 is not rejected; only the "two.sided" and "greater" alternatives give small p-values, which is consistent with the sample mean 0.602 lying above 0.
Two samples t-test
Do two different samples have the same mean?
H0:
1. H0: m1 - m2 = 0
2. H0: m1 - m2 \(\leq\) 0
3. H0: m1 - m2 \(\geq\) 0
Ha:
1. Ha: m1 - m2 ≠ 0 (different)
2. Ha: m1 - m2 > 0 (greater)
3. Ha: m1 - m2 < 0 (less)
The two-sample t-test has four main assumptions:
- The dependent variable must be continuous (interval/ratio).
- The observations are independent of one another.
- The dependent variable should be approximately normally distributed.
- The dependent variable should not contain any outliers.
Continuous data can take on any value within a range (income, height, weight, etc.). The opposite of continuous data is discrete data, which can only take on a few values (Low, Medium, High, etc.). Occasionally, discrete data can be used to approximate a continuous scale, such as with Likert-type scales.
t-statistic: \(t=\frac{\bar{y} - \bar{x}}{SE}\), where \(\bar{y}\) and \(\bar{x}\) are the sample means and SE is the standard error of the difference. If H0 is correct, the test statistic follows a t-distribution with n+m−2 degrees of freedom (n and m are the numbers of observations in each sample).
Before applying the two-sample t-test, check whether the samples have equal variance: with equal variances (homoscedastic) the pooled t-test (var.equal = TRUE) can be used; with unequal variances (heteroscedastic), use Welch’s t-test (R’s default, var.equal = FALSE).
### t-test
a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
# test homogeneity of variances using Fisher’s F-test
var.test(a,b)
##
## F test to compare two variances
##
## data: a and b
## F = 2.1028, num df = 9, denom df = 9, p-value = 0.2834
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.5223017 8.4657950
## sample estimates:
## ratio of variances
## 2.102784
# variances are homogeneous (can use var.equal=TRUE in t.test)
# t-test
t.test(a,b,
var.equal=TRUE, # variances are homogeneous (tested by var.test(a,b))
paired=FALSE) # samples are independent
##
## Two Sample t-test
##
## data: a and b
## t = -0.94737, df = 18, p-value = 0.356
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -10.93994 4.13994
## sample estimates:
## mean of x mean of y
## 174.8 178.2
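If var.test had instead rejected equal variances, one would use Welch’s t-test, which is R’s default when var.equal is not set:

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
t.test(a, b)  # Welch two-sample t-test (var.equal = FALSE by default)
```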
Summary of R functions for t-tests
One-sample t-test
t.test(x, mu = 0, alternative = c("two.sided", "less", "greater"), paired = FALSE, var.equal = FALSE, conf.level = 0.95)
Q: What are one-way ANOVA and two-way ANOVA with Bonferroni multiple comparison or Dunnett’s post-test?
Non-parametric tests
Mann-Whitney U Rank Sum Test
- The dependent variable is ordinal or continuous.
- The data consist of a randomly selected sample of independent observations from two independent groups.
- The dependent variables for the two independent groups share a similar shape.
Wilcoxon test
The Wilcoxon test is a non-parametric test which works on both normal and non-normal data. However, we usually prefer not to use it if we can assume that the data are normally distributed: the non-parametric test comes with less statistical power, the price one pays for more flexible assumptions.
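A minimal sketch using the height samples from the two-sample t-test example above (with ties present, R falls back to a normal approximation and warns that exact p-values cannot be computed):

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
wilcox.test(a, b)         # Wilcoxon rank-sum (= Mann-Whitney U) test
wilcox.test(a, mu = 170)  # one-sample Wilcoxon signed-rank test
```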
Tests for categorical variables
A categorical variable can take a fixed number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
Chi-squared tests
The chi-squared test is most suited to large datasets. As a general rule, the chi-squared test is appropriate if at least 80% of the cells have an expected frequency of 5 or greater. In addition, none of the cells should have an expected frequency less than 1. If the expected values are very small, categories may be combined (if it makes sense to do so) to create fewer larger categories. Alternatively, Fisher’s exact test can be used.
data = rbind(c(83,35), c(92,43))
data
## [,1] [,2]
## [1,] 83 35
## [2,] 92 43
chisq.test(data, correct=F)
##
## Pearson's Chi-squared test
##
## data: data
## X-squared = 0.14172, df = 1, p-value = 0.7066
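The rule of thumb above (expected frequencies of 5 or greater) can be checked directly via the `expected` component of the test object:

```r
data <- rbind(c(83, 35), c(92, 43))
test <- chisq.test(data, correct = FALSE)
test$expected  # all expected counts are well above 5 here
```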
Fisher’s Exact test R example:
Group | Tumour shrinkage: No | Tumour shrinkage: Yes | Total |
---|---|---|---|
Treatment | 8 | 3 | 11 |
Placebo | 9 | 4 | 13 |
Total | 17 | 7 | 24 |
The null hypothesis is that there is no association between treatment and tumour shrinkage.
The alternative hypothesis is that there is some association between treatment group and tumour shrinkage.
data = rbind(c(8,3), c(9,4))
data
## [,1] [,2]
## [1,] 8 3
## [2,] 9 4
fisher.test(data)
##
## Fisher's Exact Test for Count Data
##
## data: data
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.1456912 10.6433317
## sample estimates:
## odds ratio
## 1.176844
The output of Fisher’s exact test tells us that the probability of observing such an extreme combination of frequencies is high: the p-value is 1.000, clearly greater than 0.05. In this case, there is no evidence of an association between treatment group and tumour shrinkage.
Multiple testing
When performing a large number of tests, the Type I error is inflated: for α = 0.05 and n tests, the probability of no false positive result is 0.95 × 0.95 × … (n times) = 0.95^n, which is much smaller than 0.95 for large n.
The larger the number of tests performed, the higher the probability of a false rejection!
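The inflation can be made concrete with a one-liner giving the probability of at least one false positive among n independent tests at α = 0.05:

```r
alpha <- 0.05
n <- c(1, 10, 100)
1 - (1 - alpha)^n  # probability of >= 1 false positive: 0.05, ~0.40, ~0.99
```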
Many data analysis approaches in genomics rely on item-by-item (i.e., multiple) testing:
Microarray or RNA-Seq expression profiles of “normal” vs “perturbed” samples: gene-by-gene
ChIP-chip: locus-by-locus
RNAi and chemical compound screens
Genome-wide association studies: marker-by-marker
QTL analysis: marker-by-marker and trait-by-trait
False positive rate (FPR) - the proportion of false positives among all results.
False discovery rate (FDR) - the proportion of false positives among all significant results.
Example: 20,000 genes tested, 100 significant hits, 10 of them wrong.
FPR: 10/20,000 = 0.05%
FDR: 10/100 = 10%
The Bonferroni correction
The Bonferroni correction sets the significance cut-off at α/n.
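In R, the Bonferroni correction (and the less conservative Benjamini-Hochberg FDR correction) can be applied with `p.adjust`; the raw p-values below are made up for illustration:

```r
p <- c(0.001, 0.01, 0.02, 0.04, 0.2)  # hypothetical raw p-values
p.adjust(p, method = "bonferroni")    # each p multiplied by n = 5, capped at 1
p.adjust(p, method = "BH")            # Benjamini-Hochberg, controls the FDR
```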