A leaked memo, published in May and allegedly originating from a researcher within Google, opens with the line “We Have No Moat, And Neither Does OpenAI”. The memo predicts unstoppable growth in generative AI and its widespread adoption.

In this post, we consider what an oncoming wave of AI-generated evidence might mean for businesses, courts and anyone else engaged in litigation or investigations.

The horse has bolted

The leaked memo explains how, in the author’s view, both Google and OpenAI are at risk of falling behind the open source community in terms of generative AI developments, following a key inflection point when the “LLaMA” large language model (LLM) was leaked to the internet at large in March.

What then followed was an explosion in open source innovation that enabled anyone with sufficient knowledge to run and fine-tune LLaMA-derivatives on consumer hardware. Previously, LLMs had been the (near-)exclusive domain of large technology companies with the knowledge and compute capabilities to train models with hundreds of billions of parameters – the competitive advantage or “moat” referred to in the memo.

Almost overnight, individuals could run open source LLMs independently on laptops (or even mobile phones!) and fine-tune them to achieve results approaching those of ChatGPT.

Deepfakes and dark AI

While the democratisation of LLMs and other generative AI promises many benefits, the technology – like any tool – is also likely to be used for illicit purposes.

For example, given the rise of deepfake technology (already available on an open source basis), the prospect of authentic-seeming – but false – AI-generated media has been with us for some time. However, the text manipulation abilities of open source LLMs, combined with their accessibility, raise the prospect of a wave of AI-generated text content that threatens to overwhelm content prepared by humans and increase the potency of disinformation campaigns.

“Bot” activity on the internet is already widespread, with suggestions that as much as 47% of internet traffic in 2022 was generated by bots. The widespread adoption of generative AI is likely to exacerbate that trend and make it increasingly difficult to identify whether content is solely human-authored, created by generative AI, or a mix of the two.

Enterprise generative AI

The explosion in LLM usage is not limited to illicit use cases, however; it has also forced the hand of major technology companies, which are rapidly incorporating the technology into commodity office software.

Google Docs already offers an AI assistant, while Microsoft is in the process of rolling out Copilot as an add-on to Word, Excel, PowerPoint, Outlook, Teams and more. The technology promises a range of benefits, including the ability to automatically generate emails, memoranda, presentations and other documents.

Again, this will lead to a proliferation of content that is partly or wholly AI-authored across commercial documentation and business sectors, as well as on the internet at large. Generative AI (and the related question of who or what is the true author of a document) will increasingly affect anyone who uses core office software for their work.

How is all of this relevant to litigation and investigations? 

For some time now, the English courts have given primacy to contemporaneous documents over witness evidence. As Leggatt J famously observed in Gestmin SGPS v Credit Suisse [2013] EWHC 3560 (Comm):

“… the best approach for a judge to adopt … is, in my view, to place little if any reliance at all on witnesses’ recollections of what was said in meetings and conversations, and to base factual findings on inferences drawn from the documentary evidence and known or probable facts”.

This approach is based partly on the practical challenges of oral testimony but also reflects the growing (sometimes overwhelming) wealth of contemporaneous electronic evidence arising from “digital conversations” by email, WhatsApp and other means.

Such evidence is often central in contentious proceedings because it enables courts and regulatory authorities to infer the contemporaneous intentions and knowledge of the author. However, the ease with which generative AI produces new content will raise some thorny issues as to the credibility and authorship of such digital evidence:

  • Whose evidence is it anyway? – Open-source generative AI can already create convincing deepfakes. Combined with the increasing use of LLMs to draft emails and other documents, the English courts may well see witnesses in the witness box who claim they are not responsible for video, audio or correspondence that appears to portray their views or knowledge (a point that has already arisen in the US case of Huang v Tesla (19cv346663)). Future witnesses may well seek to disavow unfavourable evidence on the basis that it was prepared by generative AI and therefore does not reflect their views or beliefs.

  • An onslaught of generated content – Disclosure is becoming an ever-larger burden for courts, litigants, regulators and prosecutors. As with the internet more generally, the ease with which current tools can churn out documents risks deepening existing difficulties with deriving meaningful information from a sea of data, particularly where humans are increasingly removed from the drafting.

  • Legal innovation – On the flip side, generative AI applications may also enable dramatically accelerated document review, even in the face of ever-increasing volumes of evidence. Technology-assisted review (TAR), such as predictive coding, is already growing in prevalence, and the ability to frame review queries in natural language (as with ChatGPT and other LLMs) may further spur widespread adoption (a purely illustrative sketch of such a query follows this list). The question then is how comfortable businesses, courts and regulators will be placing reliance on “black box” machine learning models, particularly given the prevalence of “hallucinatory” outputs from current generative AI.

  • A self-imposed “moat”? – More generally, the courts will increasingly need to grapple with pleadings and witness statements prepared using LLMs. LLMs could reduce the time taken to prepare such documents (particularly with legal fine-tuning), but their output would need to comply with the stringent requirements that apply, including ensuring that a witness statement accurately reflects the voice and beliefs of the witness (rather than the increasingly recognisable “tone” of, for example, ChatGPT). A recent US case attests to the care required when using ChatGPT to prepare pleadings.
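
To make the “natural language” point above concrete, the sketch below is a purely illustrative (and heavily simplified) example of how a review query might be put to a general-purpose LLM to triage documents for relevance. The model name, prompt wording and sample documents are assumptions for illustration only and do not describe any particular review platform; in practice any such output would require human verification, not least because of the hallucination risk noted above.

# Purely illustrative sketch: putting a natural-language relevance query to a
# general-purpose LLM, in the spirit of LLM-assisted document review (TAR).
# The model name, prompt and sample documents below are assumptions, not a
# description of any existing review tool. Requires the "openai" Python package
# and an API key configured in the environment.
from openai import OpenAI

client = OpenAI()

REVIEW_QUERY = (
    "Is this document relevant to what the parties knew about the alleged "
    "defect before 1 January 2022? Answer RELEVANT or NOT RELEVANT, then give "
    "a one-sentence reason."
)

documents = [
    "Email, 14 Dec 2021: engineering flagged the valve issue again and asked for a decision.",
    "Canteen menu for the week commencing 3 January 2022.",
]

for doc in documents:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name for illustration
        messages=[
            {"role": "system", "content": "You are assisting with a disclosure review."},
            {"role": "user", "content": f"{REVIEW_QUERY}\n\nDocument:\n{doc}"},
        ],
    )
    print(response.choices[0].message.content)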

Nevertheless, we anticipate that the English courts are unlikely to impose a blanket ban on the use of generative AI, particularly given the potential time and cost savings. Instead, we expect the onus to fall on self-regulation, with litigants either applying for permission to use the technology or voluntarily declaring where they have used it and how.

Looking ahead

The wave of open source LLMs and the imminent incorporation of generative AI tools into office software mean that in-house counsel will not only need to consider how to deal with the risks and opportunities associated with generative AI in the here and now (see our recent Tech Mid-Year Update on generative AI for more), but will increasingly need to address the issues that may arise when AI-generated documents fall to be considered in litigation and regulatory proceedings.