The multiple linear regression model was trained on case data from local jurisdictions, with DC COVID cases per 100,000 people as the target variable. To predict yesterday’s COVID level in DC, the model needs yesterday’s data from those local areas.
Fortunately, the NYTimes usually updates these numbers early in the day (DC typically reports its own numbers around 1 p.m. Eastern):
The model easily makes a prediction for 9/6 numbers in DC using the .predict() method in statsmodels:
The actual DC numbers for 9/6 are:
The model is off by almost 10 cases per 100,000 people, which is not a great result. The next step will be to explore other machine learning methods, starting with a simple and interpretable model: the decision tree.
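For reference, the error quoted above is just the absolute difference between the predicted and reported values. A tiny sketch, using placeholder numbers rather than the real 9/6 figures (the function name `absolute_error` is also only illustrative):

```python
def absolute_error(predicted: float, actual: float) -> float:
    """Return the absolute gap between a prediction and the observed value."""
    return abs(predicted - actual)


# Placeholder values for illustration only, not the actual 9/6 numbers.
example_predicted = 30.0
example_actual = 39.5

example_error = absolute_error(example_predicted, example_actual)
print(f"Off by {example_error:.1f} cases per 100,000")
```

Tracking this same number for the decision tree would make the two models directly comparable on the same day.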