Notes on Causal Inference following the course by Brady Neal.
This post gives a brief overview of the main concepts from the course.
Causal inference is about inferring the effects of any treatment/policy/intervention/etc.
Examples:
Suppose we have some disease we are trying to treat with treatment A or treatment B, and our only goal is minimizing death. Suppose also that treatment B is much scarcer than treatment A.
Treatment | Total |
---|---|
A | 240/1500 (16%) |
B | 105/550 (19%) |
Note this column is just \(\mathbb{E}[Y \vert T]\)
On average 16% (240 out of 1500) died after receiving treatment A.
On average 19% (105 out of 550) died after receiving treatment B.
It appears treatment A is the better option, because a smaller fraction of the patients who received treatment A died.
Treatment | Mild | Severe | Total |
---|---|---|---|
A | 210/1400 (15%) | 30/100 (30%) | 240/1500 (16%) |
B | 5/50 (10%) | 100/500 (20%) | 105/550 (19%) |
Note the two new columns are \(\mathbb{E}[Y \vert T,C]\).
Now that we have conditioned on the patient's condition, our conclusion flips: among both mild and severe cases, a smaller fraction of the patients who received treatment B died.
This is Simpson’s Paradox: Our conclusions seem to depend on how we partition our data.
We can think of these numbers the following way:
For treatment A: \[ \frac{1400}{1500}(0.15) + \frac{100}{1500}(0.30) = 0.16 \]
For treatment B: \[ \frac{50}{550}(0.10) + \frac{500}{550}(0.19) \approx \frac{50}{550}(0.10) + \frac{500}{550}(0.20) = 0.19 \]
Now the fractions act as weights on the subgroup percentages. Most of the patients who received treatment B (500 of 550) had a severe condition, while most of the patients who received treatment A (1400 of 1500) had only a mild condition.
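The weighted averages above can be checked with a short script. The counts come straight from the table; the data layout and function name are my own:

```python
# Counts from the table: (deaths, patients) per (treatment, condition).
counts = {
    ("A", "mild"):   (210, 1400),
    ("A", "severe"): (30, 100),
    ("B", "mild"):   (5, 50),
    ("B", "severe"): (100, 500),
}

def death_rate(treatment):
    """E[Y | T]: pool both conditions for one treatment."""
    deaths = sum(d for (t, _), (d, n) in counts.items() if t == treatment)
    patients = sum(n for (t, _), (d, n) in counts.items() if t == treatment)
    return deaths / patients

print(f"A: {death_rate('A'):.0%}")  # 16% overall, despite being worse in each subgroup
print(f"B: {death_rate('B'):.0%}")  # 19% overall, despite being better in each subgroup
```

Pooling reweights each subgroup rate by how many patients landed in it, which is exactly how Simpson's paradox arises.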
Which analysis gives the right answer depends on the causal structure of the problem.
Two variables can be associated by chance or because there is a common cause of both.
Total association (e.g. correlation) is a mixture of causal and confounding association.
Suppose individual \(i\) does not take the pill; then \(Y_i(0)\) is the factual outcome, which we observe.
The problem is that we can never observe the counterfactual, \(Y_i(1)\).
Therefore, we cannot compute the individual causal effect.
We can, however, leverage linearity of expectation to work with averages instead.
Denote the individual treatment effect (ITE) by \(Y_i(1)-Y_i(0)\).
The ATE is \(\mathbb{E}[Y_i(1) - Y_i(0)] = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)]\).
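A minimal sketch on a made-up potential-outcomes table (the values of \(Y_i(0)\) and \(Y_i(1)\) below are hypothetical) shows the linearity step:

```python
# Hypothetical potential outcomes Y_i(0), Y_i(1) for four individuals.
y0 = [0, 1, 1, 0]
y1 = [0, 0, 1, 0]

ite = [b - a for a, b in zip(y0, y1)]           # individual treatment effects Y_i(1) - Y_i(0)
ate = sum(ite) / len(ite)                        # E[Y(1) - Y(0)]
ate_linear = sum(y1) / len(y1) - sum(y0) / len(y0)  # E[Y(1)] - E[Y(0)]
assert ate == ate_linear  # linearity of expectation
```

In reality we only ever see one entry of each row, which is why the right-hand form (two separate means) is the useful one.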
Note, potential outcomes are not actual outcomes, so
\[ \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)] \neq \mathbb{E}[Y\vert T=1] - \mathbb{E}[Y\vert T = 0] \]
The left-hand side is purely causal, while the right-hand side is a mixture of causal and confounding association.
This is where randomized controlled trials come in: randomizing treatment removes any causal effect of condition on treatment.
When there is no confounding: \[ \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)] = \mathbb{E}[Y\vert T=1] - \mathbb{E}[Y\vert T = 0] \]
Randomization is very powerful because it also removes confounding from unobserved variables.
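A quick simulation sketch, with entirely made-up probabilities, contrasts the two worlds: in the observational arm, doctors reserve the scarce treatment B for severe cases; in the randomized arm, treatment is assigned by coin flip:

```python
import random

random.seed(0)

def death_prob(t, severe):
    # Hypothetical ground truth: treatment B halves the risk of death.
    base = 0.4 if severe else 0.1
    return base * (0.5 if t == "B" else 1.0)

def trial(randomized, n=100_000):
    stats = {"A": [0, 0], "B": [0, 0]}  # treatment -> [deaths, patients]
    for _ in range(n):
        severe = random.random() < 0.3
        if randomized:
            t = random.choice("AB")
        else:
            # Observational world: scarce B mostly goes to severe patients.
            t = "B" if severe and random.random() < 0.8 else "A"
        stats[t][1] += 1
        stats[t][0] += random.random() < death_prob(t, severe)
    return {t: deaths / patients for t, (deaths, patients) in stats.items()}

obs, rct = trial(randomized=False), trial(randomized=True)
# Under confounding, B looks worse because it gets the sickest patients;
# under randomization, its true benefit shows up in the raw rates.
```

With these assumed numbers, the observational death rates flip the ranking (B looks worse), while the randomized trial recovers the true halving of risk.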
Treatment can't always be randomized, though, for ethical or practical reasons.
How do we measure causal effect in observational studies?
We adjust/control for the right variables \(W\).
If \(W\) is a sufficient adjustment set, we have
\[ \mathbb{E}[Y(t) \vert W = w] := \mathbb{E}[Y\vert do(T=t), W =w] = \mathbb{E}[Y\vert t,w] \]
This still depends on \(w\), so we marginalize over \(W\):
\[ \mathbb{E}[Y(t)] := \mathbb{E}[Y\vert do(T=t)] = \mathbb{E}_W\mathbb{E}[Y\vert t, W] \]
Treatment | Mild | Severe | Total |
---|---|---|---|
A | 210/1400 (15%) | 30/100 (30%) | 240/1500 (16%) |
B | 5/50 (10%) | 100/500 (20%) | 105/550 (19%) |
\[ \mathbb{E}[Y\vert do(T=t)] = \mathbb{E}_C\,\mathbb{E}[Y\vert t, C] = \sum_{c\in C} \mathbb{E}[Y\vert t,c]P(c) \]
For treatment A:
\[ \frac{1450}{2050}(0.15) + \frac{600}{2050}(0.30) \approx 0.194 \]
For treatment B:
\[ \frac{1450}{2050}(0.10) + \frac{600}{2050}(0.20) \approx 0.129 \]
After adjusting for condition, treatment B again comes out better: a 12.9% death rate versus 19.4% for treatment A.
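The adjustment formula can be sketched in code using the subgroup rates and condition counts from the table above (variable names are my own):

```python
# P(C = c) from the pooled table: 1400 + 50 mild, 100 + 500 severe, 2050 total.
p_mild = 1450 / 2050
p_severe = 600 / 2050

# E[Y | t, c]: subgroup death rates from the table.
rate = {("A", "mild"): 0.15, ("A", "severe"): 0.30,
        ("B", "mild"): 0.10, ("B", "severe"): 0.20}

def adjusted(t):
    """E[Y | do(T=t)] = sum_c E[Y | t, c] P(c)."""
    return rate[(t, "mild")] * p_mild + rate[(t, "severe")] * p_severe

print(f"E[Y|do(T=A)] = {adjusted('A'):.3f}")  # 0.194
print(f"E[Y|do(T=B)] = {adjusted('B'):.3f}")  # 0.129
```

The key difference from the naive comparison is the weights: both treatments are averaged against the *same* condition distribution \(P(c)\), instead of each treatment's own skewed patient mix.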