What the A Level Grade Scandal Can Show Us about Algorithmic Bias

Recently, we have seen a particularly visible example of artificial intelligence make the news. UK A level grades were calculated using an algorithm because students were unable to sit exams due to the Covid-19 pandemic. There was outrage when many students received grades substantially lower than those they had been predicted, were expecting, or needed to confirm their university places. After the backlash, the national governments each reversed their position and allowed students to be awarded their predicted grades instead.

Artificial intelligence (AI) is a term used for a huge range of automated decision-making techniques based on data rather than human evaluation. These techniques are being used in more and more sectors, from healthcare to social media and personalised advertising, and their uptake is accelerating thanks to improvements in, and easier access to, the necessary technology.

While AI can improve efficiency, and sometimes accuracy, for many tasks, there are many ways in which it can be unfair. This is usually referred to as algorithmic bias. Examples include treating arbitrary population groups preferentially or unfavourably, such as discriminating based on gender or race.

Algorithmic bias can be introduced at several stages: during data collection, model development, and the deployment of results and insights. A common problem is the data underpinning the models. Biases creep in when the data is not properly representative of the population you are interested in. If you build an algorithm on a data source that is 90% men, features more closely associated with men than women will carry more weight in the outcome, even though the algorithm may be used to make decisions about men and women equally. This may sound like an exaggerated example, but because data collection can be difficult, time-consuming and expensive (often due to ethical and privacy considerations), it is far simpler to use existing data, even when it is not ideal.
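To see this effect concretely, here is a minimal sketch using simulated data and scikit-learn. Everything in it is hypothetical: the groups, the single "score" feature, and the 90/10 split are invented purely to illustrate how an unrepresentative training sample skews a model against the under-represented group.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_group(n, mean):
        # One "score" feature whose distribution differs by group; the true
        # outcome depends on the score relative to the group's own mean,
        # so no single global threshold is correct for both groups.
        x = rng.normal(mean, 1.0, size=(n, 1))
        y = (x[:, 0] > mean).astype(int)
        return x, y

    # Training data: 90% group A, 10% group B.
    xa, ya = make_group(900, mean=1.0)
    xb, yb = make_group(100, mean=-1.0)
    model = LogisticRegression().fit(np.vstack([xa, xb]),
                                     np.concatenate([ya, yb]))

    # On balanced test data, the learned decision threshold sits near
    # group A's mean, so group B is classified far less accurately.
    for name, mean in [("A", 1.0), ("B", -1.0)]:
        x_test, y_test = make_group(1000, mean)
        print(f"group {name} accuracy: {model.score(x_test, y_test):.2f}")

The model performs well for the majority group simply because its examples dominated the training data, while the minority group's accuracy is little better than a coin flip.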

Even if your data is perfectly representative, biases can be introduced by historical trends. An algorithm can only “know” what it has learned from the data used to build it. Some criticisms of the A level results algorithm concerned its reliance on a narrow set of historical data: it placed most weight on a student’s school’s performance over the past three years. This gave preferential treatment to private and high-achieving state schools, while students from lower-achieving schools had a restricted chance of achieving the highest grades, a cap set by their school’s historical performance rather than their own attainment. While this method kept this year’s results in line with those of the past few years, it meant that many individual students were treated unfairly.
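A deliberately simplified sketch of this mechanism (not Ofqual's actual model, and all names and grades are invented) makes the cap easy to see: if this year's students are simply assigned their school's historical grades in rank order, the best grade available to any student is the best grade the school recently produced.

    def assign_grades(ranked_students, historical_grades):
        """Pair the i-th ranked student with the i-th best grade the
        school awarded historically (both lists ordered best to worst)."""
        return dict(zip(ranked_students, historical_grades))

    # A hypothetical school whose best grade in recent years was a B:
    print(assign_grades(
        ["Asha", "Ben", "Cara", "Dev"],   # teacher's ranking, best first
        ["B", "C", "C", "D"],             # past distribution, best first
    ))
    # {'Asha': 'B', 'Ben': 'C', 'Cara': 'C', 'Dev': 'D'}
    # Asha is capped at a B regardless of her individual attainment.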

The A level results algorithm is an example of a case where individuals are acutely aware of how fairly a model has treated them: grades are a very visible outcome, and students not getting what they expected can have a lasting impact on their futures, from university admission to the start of their careers. This makes it noteworthy, as AI decisions are often buried within large systems where biases can go unchecked. The public engagement with the idea of an algorithm that can be misused and unfair will hopefully encourage much greater care in how we use AI going forward. We should not blindly accept the outcomes of algorithms, since there are many ways biases can skew their results unfairly.

The outcry over the results did lead to the governments’ U-turn: students were allowed to use their teacher-predicted grades instead. This will cause some grade inflation (when asked, teachers said their predicted grades reflected what they thought a student could achieve on a good day), but because each prediction was made by a teacher with personal knowledge of the individual student, there is less chance of systematic bias based on school.