“What is the difference between reasonable endeavours and best endeavours?” “Can a pile of bricks be protected by copyright?” These are just two of the questions you might ask your lawyer, expecting an accurate answer in return. But what happens if you turn to AI for legal advice?

We have created the LinksAI English Law Benchmark to test how much we can rely on AI tools to provide English law advice.

The benchmark

The benchmark comprises 50 questions from 10 different practice areas: contract, intellectual property, data privacy, employment, real estate, dispute resolution, corporate, competition, tax and banking. 

The questions are hard. They are the sorts of questions that would require advice from a competent mid-level lawyer, specialised in that practice area.

The answers were marked by expert lawyers from each practice area. Each answer was given a mark out of 10, made up of 5 marks for substance (is the answer right?), 3 for citations (is the answer supported by relevant statute, case law or regulations?) and 2 for clarity.
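
As a rough illustration of how the three components combine into a mark out of 10, the scheme can be sketched in a few lines of Python. The `Mark` class and its field names are hypothetical, as is the assumption that fractional marks are allowed; the example values are invented and are not figures from the report.

```python
from dataclasses import dataclass

# Maximum marks per criterion, as described in the marking scheme.
MAX_MARKS = {"substance": 5, "citations": 3, "clarity": 2}


@dataclass(frozen=True)
class Mark:
    """Marks awarded to one answer (hypothetical structure, for illustration)."""

    substance: float
    citations: float
    clarity: float

    def __post_init__(self):
        # Reject marks outside the range allowed for each criterion.
        for criterion, maximum in MAX_MARKS.items():
            value = getattr(self, criterion)
            if not 0 <= value <= maximum:
                raise ValueError(
                    f"{criterion} must be between 0 and {maximum}, got {value}"
                )

    @property
    def total(self) -> float:
        """Overall mark out of 10."""
        return self.substance + self.citations + self.clarity


# Illustrative values only; these are not figures from the report.
example = Mark(substance=3, citations=1, clarity=2)
print(example.total)  # 6
```

An answer that is correct and clearly written can still lose up to 3 of its 10 marks if it cites no supporting authority.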

The results

Artificial intelligence has advanced significantly in the last 18 months. Large Language Models (LLMs) can provide superficially convincing answers to legal questions. 

However, the models we tested – GPT 2, GPT 3, GPT 4 and Bard – provide inaccurate advice on English law (scoring 1.3 out of 5 for substance) and “hallucinate” citations (scoring 0.8 out of 3 for citations). Other LLMs were not tested and may perform better.

LLMs do, nonetheless, have an important role in supporting the provision of legal services, e.g. conducting legal research, extracting data from contracts and summarising documents.

The future

The next 18 months could be interesting. If AI continues its current rate of advance, one of these LLMs might well pass the benchmark next time round. That could irrevocably change the nature of legal services.

We will use the benchmark to test future LLMs to see if this is the case.

 

See our report, the questions and answers, and other supporting material.