Forecasting an election is never an easy task. I like to think of it as part art, part science. There are many ways to go about it, depending on the type of outcome you are looking to predict. Ultimately though, a good election forecast will correctly predict the likeliest outcome of an election by incorporating the right mix of variables in the right proportions.
Our 2020 election forecast builds on our highly successful model from 2016. Four years ago we accurately predicted Nana Addo would be the winner of the election, as well as the winner of 9 out of the 10 regions in Ghana. The Central Region was the one region we got wrong. Four years later, we’ve rebuilt the model from the ground up, taking the same fundamental principles from 2016’s model and extending them to better capture the changes in this campaign cycle.
To understand our forecast model, it’s best to think of it as an aggregate of 3 different models instead of one. This aggregation is by far the biggest change since our 2016 forecast, where we used one model to generate our predictions. The three components of our 2020 forecast are: a Historic Model, a Polls-based Model and a Composite Economic Index Model. Each of these 3 models generates a forecast that is largely independent of the other two. The final model assembles the 3 forecasts, applies variable weights to them and combines them into the final prediction.
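To illustrate the idea of weighting and combining three separate forecasts, here is a minimal sketch in Python. The function name, the sub-model labels, the probabilities and the weights are all hypothetical; the actual weights the model uses are not published here, and the real model may combine richer outputs than a single number per sub-model.

```python
def combine_forecasts(forecasts, weights):
    """Blend per-model win probabilities into one number using normalized weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Each model contributes in proportion to its share of the total weight.
    return sum(forecasts[name] * weights[name] / total for name in forecasts)


# Hypothetical win probabilities for one candidate from each sub-model.
forecasts = {"historic": 0.55, "polls": 0.62, "economic": 0.48}
# Hypothetical weights (assumption: polls count most, the economic index least).
weights = {"historic": 0.4, "polls": 0.5, "economic": 0.1}

combined = combine_forecasts(forecasts, weights)
print(f"Combined win probability: {combined:.3f}")
```

Normalizing by the total weight means the weights can vary over time (for example, leaning more on polls as election day approaches) without having to sum to exactly 1.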
Before diving further into the inner workings of the three models, it’s important to call out that our 2020 forecast, like the 2016 model, is probabilistic and not deterministic. This simply means that we are forecasting the chance of a candidate winning and not the exact percentage of votes they will get on election day.
Let’s start our deep dive by looking at the Historic Model.
The Historic Model
The first of our 3 models is a forecast based on all past election results. In many ways, this serves as the base layer for our final forecast. The Historic Model makes its predictions based on history alone, and does not factor in anything that has changed since the last election. The first step in our Historic Model is to take the results of all past elections in all 275 constituencies and feed them into a machine learning model. Our neural networks are fed more than 2,000 data points per political party. Once the model learns from the data, it then makes a prediction for what will happen in 2020. Keep in mind that this prediction is entirely based on everything that has happened up until 2016. The results of this forecast are then passed on to the final model.
Regional Partisan Lean
In addition to generating predictions based on past elections, the Historic Model also computes something called “Regional Partisan Leans”. The concept of a partisan lean is simple: it is an estimate of the level of support a party has in a region. It gives the model a sense of how much a given region leans towards each political party. For example, the NDC’s partisan lean in the Volta Region is calculated to be +81. This means that in any generic election, the NDC’s candidate for president should win the region by about 81 points. The model treats the lean as a benchmark or expectation for how a given party should be performing. As such, if we run a poll that shows the NDC’s candidate winning the Volta Region by 90 points, the model concludes that the candidate is over-performing against expectations and therefore assigns a bounce to the forecast. Conversely, if a party has a lean of -10 in a given region, but a poll shows the party trailing by 20 points, then the model concludes that the candidate is under-performing and therefore deducts points from the final forecast. This concept of partisan lean is a crucial component of our forecast. Its real value is reflected in how the Polls Model uses it to decide whether to award a bounce or deduct points.
The Polls-based Model
The great thing about the Historic Model is that it accurately captures the deep structural patterns that shape the outcome of our elections over time. The Historic Model has a strong notion of patterns such as swing versus stronghold regions, and the cyclical way in which the 2 major parties take turns winning every 8 years. However, one big drawback to the Historic Model is its inability to capture what has changed since the last election. By design, the Historic Model is ignorant of shifts in voter attitudes and changes in electoral dynamics since 2016. This is where the Polls-based Model steps in. With this model, we are capturing those variables that the Historic Model is designed to ignore. Here’s how the Polls-based Model works.
Say we poll 1,000 likely voters in the Ashanti Region and the results show the NPP’s candidate winning 65% of the vote, which in a two-way race amounts to a 30-point margin. A naive model will simply assign a bonus to the candidate’s forecast because of that 30-point margin. What our model does is compare the 30-point margin to the NPP’s partisan lean for the Ashanti Region (which is +42). The Polls Model’s interpretation then is that the candidate is under-performing relative to the party’s lean by 12 points. As such, the model deducts points from the forecast even though the candidate is in the lead per the poll. This is a simplified overview of how the Polls Model works, but it shows how our model assesses the results of a poll and interprets them before deciding how best to incorporate them into the forecast.
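The comparison between a poll margin and a partisan lean can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the margin calculation assumes a two-candidate race, and how the model converts the resulting gap into an actual bounce or deduction is not described here.

```python
def margin_from_share(share_pct):
    """Two-candidate margin in points: a 65% share implies 65 - 35 = 30."""
    return share_pct - (100 - share_pct)


def performance_vs_lean(poll_margin, partisan_lean):
    """Positive means over-performing the lean (bounce); negative means under-performing."""
    return poll_margin - partisan_lean


# Ashanti example: 65% in the poll against a lean of +42.
ashanti = performance_vs_lean(margin_from_share(65), 42)
# Volta example: a 90-point poll lead against a lean of +81.
volta = performance_vs_lean(90, 81)
# A party with a lean of -10 that is trailing by 20 points in a poll.
trailing = performance_vs_lean(-20, -10)

print(ashanti, volta, trailing)  # -12 9 -10
```

The sign of the result is what matters: the Ashanti candidate leads the poll but falls 12 points short of expectations, while the Volta candidate beats expectations by 9.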
One last note about the Polls Model. There are a number of pre-processing steps the model takes before incorporating poll results. One step is to balance out the polling data to make sure it is as representative of the electorate as possible. It does this by applying weights to different attributes of each survey respondent. Another step is to consider the newest poll in the context of all prior polls, assign weights to account for recency and then aggregate the results. The goal is to build a sense of how the polls are trending overall instead of simply looking at the current poll as a snapshot. By taking these steps, our Polls-based Model is able to look ahead all the way to Election Day and judge how best to incorporate polling data while avoiding the pitfalls a naive model will typically fall into.
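As an illustration of weighting polls by recency before aggregating them, here is a minimal Python sketch. The exponential decay, the 14-day half-life and the poll numbers are assumptions made for the example; the scheme the model actually uses is not specified here.

```python
def recency_weighted_margin(polls, half_life_days=14.0):
    """Average poll margins, halving a poll's weight every `half_life_days`.

    `polls` is a list of (days_old, margin_in_points) pairs.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for days_old, margin in polls:
        weight = 0.5 ** (days_old / half_life_days)
        weighted_sum += weight * margin
        weight_total += weight
    if weight_total == 0:
        raise ValueError("at least one poll is required")
    return weighted_sum / weight_total


# Hypothetical polls: today, two weeks ago and four weeks ago.
polls = [(0, 30.0), (14, 34.0), (28, 38.0)]
avg = recency_weighted_margin(polls)
print(f"Recency-weighted margin: {avg:.1f} points")
```

The newest poll pulls the average towards 30, but the older polls still contribute, so a single new poll shifts the trend rather than replacing it.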
The Composite Economic Index
The very last component of our forecast is a small model that looks exclusively at certain leading indicators to determine how the state of the economy will affect the elections. The model uses an index of multiple indicators to calibrate an internal variable that measures how sensitive voter preferences are to changes in economic outlook. This model is highly dynamic as there is at times a difference between how the economy is doing on paper and the perception voters have. Part of how we capture voter perception of the performance of the economy is through polling.
How The Model Generates The Final Forecast
To generate the actual forecast, our final model takes inputs from the 3 sub-models discussed above and then simulates the elections thousands of times to determine the likeliest scenarios and their associated probabilities. These scenarios and their probabilities are what inform our prediction of what will happen on election day. This idea of simulating the elections is key to understanding why our forecast is probabilistic and not deterministic. It’s also key to correctly interpreting our forecast. A 60% chance of winning the election means that out of the thousands of simulations we ran, the candidate won 60% of them. To make it a bit more concrete, suppose we simulate the elections 10,000 times. A 60% chance for a given candidate means that the candidate won 6,000 of those 10,000 simulated elections. That makes a win the likeliest outcome on election day. It also means that there is a 40% chance that the candidate won’t win. Obviously on election day only 1 scenario will play out. It could turn out to be one of the 4,000 simulations in which the candidate lost. Even a candidate with a 1% chance of winning could end up being the winner on election day, even though that is very unlikely under normal electoral conditions.
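The link between simulations and probabilities can be shown with a small Python sketch. It is a simplified stand-in, not our actual simulation: it assumes a single national margin drawn from a normal distribution, and the expected margin, the uncertainty and the function name are all hypothetical.

```python
import random


def simulate_win_probability(expected_margin, uncertainty, n_sims=10_000, seed=0):
    """Share of simulated elections in which the candidate's margin is positive."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(n_sims):
        # Assumption: election-day margin varies normally around the expected margin.
        simulated_margin = rng.gauss(expected_margin, uncertainty)
        if simulated_margin > 0:
            wins += 1
    return wins / n_sims


# Hypothetical inputs: a 2-point expected lead with 8 points of uncertainty.
p = simulate_win_probability(expected_margin=2.0, uncertainty=8.0)
print(f"Won {p:.0%} of 10,000 simulated elections")
```

With these inputs the candidate wins roughly 6 out of every 10 simulated elections, which is what a forecast of about 60% means, and loses in the rest.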
To recap, our 3 models combine to yield a very robust forecast that tries to capture as much of the multi-dimensional nature of modern elections as is possible. The Historic Model captures the deep-seated structural patterns that hold true every election cycle while the Polls Model does an excellent job capturing the latest shift in voter support. Our CEI Model rounds out the forecast by contextualizing the sensitivity that voter preferences have toward changes in economic conditions. As we get more polling data over the next few months, we will update the forecast on an ongoing basis, sometimes multiple times in 1 day. So check back here often to see the latest changes in our forecast.