The study, conducted by the AI evaluation platform Vals AI in collaboration with Legaltech Hub, evaluated five leading legal generative AI tools across seven tasks. Vals AI says its auto-evaluation framework produces blind assessments to evaluate the accuracy of AI models.
Vals AI tested the AI tools against a control group of independent lawyers, called the Lawyer Baseline, supplied by Cognia Law, an alternative legal service provider. The results showed that AI does deliver some value in legal work.
"Generative AI has reshaped the legal landscape, but not all tools are created equal," Rayan Krishnan, co-founder of Vals AI, said in a statement. "Our study not only measures performance, but also establishes first-ever standards that legal professionals and developers can rely on to understand the technology's impact, but most importantly, its limitations."
Harvey, the fast-growing legal tech startup that just raised a $300 million Series D round, entered its AI assistant in six of the seven tasks in the study. It received the top score among AI tools on five tasks and outperformed the Lawyer Baseline on four. It also tied the Lawyer Baseline in generating a chronology, but did not take part in the EDGAR research task.
This marked the first public benchmarking evaluation of Harvey's AI assistant.
Thomson Reuters is the only other vendor whose AI tool received a top score in the study, which its CoCounsel earned for summarizing documents. Thomson Reuters submitted its product in four of the seven task areas for the study, surpassing the Lawyer Baseline in those four and achieving the highest average score across them.
Vincent AI, the AI assistant from vLex, participated in six of the seven tasks. It performed better than the Lawyer Baseline in document question-answering, document summarization and transcript analysis.
Oliver, the AI assistant from Vecflow, the newest company in the study, opted into five tasks. It outperformed the Lawyer Baseline in document question-answering and document summarization, and was the only AI tool to opt into the EDGAR research category.
Lexis+ AI, the AI platform from LexisNexis, was originally part of the study, but withdrew from all tasks except for legal research. Vals AI plans to release its study on legal research soon.
The Lawyer Baseline topped the AI tools in the tasks of EDGAR research and redlining, which refers to editing contracts.
A consortium of law firms, including Reed Smith LLP, Fisher Phillips, McDermott Will & Emery LLP, Ogletree Deakins Nash Smoak & Stewart PC and four anonymous firms, contributed sample questions and documents for the study.
Vals AI found that AI tools outperformed the human lawyers in easy cases, but fell short in complex and reasoning-intensive tasks.
"These results offer a balanced perspective for the legal community," Langston Nashold, co-founder of Vals AI, said in a statement. "For developers, it's a roadmap to prioritize innovation in underperforming areas. For law firms, it's a guide to making strategic investments in AI that enhance both client service and operational ROI."
--Editing by Adam LoBelia.
Law360 is owned by LexisNexis Legal & Professional, a RELX company.