## Temporal Difference Learning – Linear Function Approximation

For a Markov chain $\hat{x} = (\hat x_t : t\in\mathbb Z_+)$, consider the reward function $R(x)$, $x\in\mathcal X$, associated with rewards given by $r = (r(x) : x\in\mathcal X)$. We approximate the reward function $R(x)$ with a linear approximation,