Examining judicial sentencing using court transcripts and natural language processing techniques

Project Details

Lead Supervisor: Jose Pina-Sánchez (University of Leeds)
Other Supervisors: Eric Atwell (School of Computing)
Contact Email:
Partners: University of Leeds
External Partners: Sentencing Council for England and Wales

Start Date: October 2017

The project entails the use of natural language processing techniques to study sentencing records from Her Majesty's Courts and Tribunals Service, made available online at www.thelawpages.com. These records present certain details of each case in a systematic fashion (for example, the type of offence, the sentence outcome, or the court location), making them easy to retrieve. However, other relevant case characteristics (such as the number of previous convictions, or the presence of personal mitigating factors such as remorse) are reported less systematically. When present, they are embedded in different parts of the judge's statement, hence the need to rely on natural language processing techniques to extract them.
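To illustrate what extracting such embedded characteristics can involve, here is a minimal rule-based sketch in Python. It assumes the judge's statement is available as plain text; the regular expressions, cue words and function names (`extract_previous_convictions`, `mentions_remorse`) are hypothetical and are not the project's method. A real pipeline would likely need richer techniques (for example, parsing or trained classifiers) to cope with the variety of phrasings judges use.

```python
import re
from typing import Optional

# Hypothetical map for small counts written out in words ("no" means zero).
NUMBER_WORDS = {
    "no": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Assumed phrasing: "<number> previous/prior conviction(s)".
PREV_CONV = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+(?:previous|prior)\s+convictions?\b",
    re.IGNORECASE,
)

# Illustrative cue words for remorse as a personal mitigating factor.
REMORSE = re.compile(r"\b(?:remorse|remorseful|genuinely sorry)\b", re.IGNORECASE)

# Simple negation cues searched for earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no|not|never|without|lack of)\b", re.IGNORECASE)


def extract_previous_convictions(text: str) -> Optional[int]:
    """Return the first stated number of previous convictions, or None if absent."""
    match = PREV_CONV.search(text)
    if match is None:
        return None
    token = match.group(1).lower()
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def mentions_remorse(text: str) -> Optional[bool]:
    """True if remorse is mentioned, False if it appears negated, None if not mentioned."""
    match = REMORSE.search(text)
    if match is None:
        return None
    # Only look back to the start of the sentence containing the cue word.
    sentence_start = text.rfind(".", 0, match.start()) + 1
    preceding = text[sentence_start:match.start()]
    return NEGATION.search(preceding) is None


remarks = "You are 34 and have two previous convictions. You have shown no remorse for what you did."
print(extract_previous_convictions(remarks), mentions_remorse(remarks))
```

Returning `None` rather than `False` when nothing is found keeps "not reported" distinct from "reported as absent", which matters because, as noted above, these characteristics are only sometimes present in the statement.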

The website captures a sample of 14,736 sentences passed by the Crown Court from 2000 to the present. Focusing on homicide offences within the 2009-2018 period, we will seek to answer the following research questions:

RQ1: What aggravating and mitigating factors do judges take into account when sentencing cases of homicide?

RQ2: What effect do these factors have on the sentence outcome?

RQ3: What will be the impact of the new homicide guideline?

RQ4: How reliably can aggravating and mitigating factors in sentence transcripts be recorded using natural language processing compared to a supervised process of content analysis?
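One common way to quantify how reliably two coding processes agree on a categorical factor (for example, whether remorse is present) is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes the NLP output and the content-analysis coding are aligned label lists for the same transcripts; the choice of kappa and the function name `cohens_kappa` are assumptions for illustration, not a measure the project has committed to.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed proportion of items on which the two coders agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined,
        # so treat this perfect agreement as 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codings of one binary factor across ten transcripts.
manual = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
nlp = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(manual, nlp), 3))
```

Here raw agreement is 0.8, but chance agreement is 0.52, so kappa is about 0.58; this gap is why a chance-corrected measure is usually preferred over simple percentage agreement.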

Part of the substantive originality of this project stems from the dearth of empirical analyses of homicide cases, which is mainly due to the lack of adequate data. The importance of identifying the main factors and how they are weighted is further reinforced by the current absence of a sentencing guideline covering homicide.

The methodological contribution of the project is even clearer. Natural language processing is a very rarely used research tool in the disciplines of Criminal Justice and Law. Creating a dataset that covers the main factual elements of criminal sentences could open important avenues for future research. In particular, the information contained in sentence transcripts would allow a deeper study of consistency in sentencing. This data will also make it possible to investigate new forms of potential discrimination in sentencing, such as those based on gender or religion.

The non-academic impact of the project is equally significant. The Sentencing Council has a statutory duty to monitor the application of its guidelines. As part of that duty it undertakes different types of research: qualitative interviews with judges, time-series analysis of official statistics from the Ministry of Justice, and content analysis of court records. This last type of research is carried out by social researchers one transcript at a time, which makes it very time-consuming. The research proposed here could show the Sentencing Council (and interested parties at the Ministry of Justice) how to improve the generalisability of their findings by covering far larger samples than any single researcher could handle, at a fraction of the current cost of research projects based on content analysis.

Reference number LE05

Deadline for applications – 6th June 2017

Apply online here
