Legal Research Services Vary Widely in Results, Study Finds

In a comparison of six leading research providers, there was hardly any overlap in the top 10 results from searches.

Does the legal research platform you use matter to the results you receive? Turns out, it very much matters. Different legal research platforms deliver surprisingly different results. In fact, in a comparison of six leading research providers, there was hardly any overlap in the cases that appeared in the top-10 results returned by each database.

This startling finding is the result of research performed by Susan Nevelow Mart, director of the law library and associate professor at the University of Colorado Law School, where she teaches advanced legal research and analysis and environmental legal research. Mart has published a draft of her research paper, The Algorithm as a Human Artifact: Implications for Legal {Re}Search, and she presented some of her findings in a program I attended at the recent annual meeting of the American Association of Law Libraries.

Mart’s exploration of the differences among research services was spurred in part by an email she received from Mike Dahn, senior vice president for Westlaw product management at Thomson Reuters, in which he noted that “all of our algorithms are created by humans.” Why is that statement significant? Because if search algorithms are built by humans, then those humans made choices about how the algorithm would work. And those choices, Mart says, become the biases and assumptions that get coded into each system and that have implications for the results they deliver.

So Mart set out to study how hidden biases and assumptions affect the results provided by some of the major legal research providers. She chose six to study: Casetext, Fastcase, Google Scholar, Lexis Advance, Ravel and Westlaw. The results, Mart writes, “are a remarkable testament to the variability of human problem solving.”

Little Overlap in Results

“Remarkable” is the right word. Mart found that there is hardly any overlap in the cases that appear in the top 10 results returned by each database. An average of 40 percent of the cases were unique to one database, and only about 7 percent of the cases were returned in search results in all six databases.

This startled me. Of course I would have expected there to be differences among the top results delivered by different research platforms. But for there to be “hardly any overlap”? I mean, the most relevant cases are the most relevant cases, aren’t they?


Mart first compared the algorithms of different legal research services in 2013, when she entered the same query into four databases – Lexis Advance, Fastcase, WestlawNext and Google Scholar – and examined the results. That study produced three notable takeaways:

  • There were irrelevant results in the top 10 results for all four databases.
  • Seventy percent of the cases were unique to one database.
  • Of those unique cases, slightly over half were relevant.

In 2016, she repeated the same test against the same four databases. Again she found very different results among the four. In addition, she found that each service returned notably different results than it had three years earlier, even though many of the new cases existed in the database when the first search was performed.

Expanding Her Scope

For her latest research, Mart expanded her experiment to 50 different searches across six research databases, examining only the top 10 results for each. In order to make the sets of information being searched as identical as possible, searches were limited to subsets of reported cases within specific jurisdictions.


One hypothesis Mart wanted to test was whether each algorithm would find the same cases. After all, the search algorithm for each database was trying to achieve the same result in the same pool of information by finding relevant cases. Even though her earlier studies using one query had not found this to be true, there was the possibility that a large number of searches would produce more consistent results.

To the contrary, the different algorithms found very different cases. In fact, each research platform returned an average of 40 percent unique cases. On average, 25 percent of the cases were in only two databases, 15 percent appeared in three databases, 9.5 percent appeared in four databases, just under 7 percent appeared in five databases, and just under 7 percent appeared in all six databases.
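To make the overlap arithmetic concrete, here is a minimal sketch in Python of how one might tally the share of cases that appear in exactly one, two, or more of the six result lists. The result sets and case identifiers below are invented for illustration; this is not Mart’s data or her methodology, only the basic counting behind figures like “40 percent unique.”

```python
from collections import Counter

# Hypothetical top-10 result sets (case IDs) for each platform -- illustrative only.
top_ten = {
    "Westlaw":        {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"},
    "Lexis Advance":  {"A", "B", "K", "L", "M", "N", "O", "P", "Q", "R"},
    "Fastcase":       {"A", "C", "K", "S", "T", "U", "V", "W", "X", "Y"},
    "Google Scholar": {"A", "D", "L", "S", "Z", "AA", "AB", "AC", "AD", "AE"},
    "Ravel":          {"A", "E", "M", "T", "Z", "AF", "AG", "AH", "AI", "AJ"},
    "Casetext":       {"A", "F", "N", "U", "AA", "AF", "AK", "AL", "AM", "AN"},
}

# Count how many platforms returned each case.
appearances = Counter(case for results in top_ten.values() for case in results)

total_cases = len(appearances)
for k in range(1, len(top_ten) + 1):
    share = sum(1 for n in appearances.values() if n == k) / total_cases
    print(f"Cases appearing in exactly {k} database(s): {share:.1%}")
```

A case counted under “exactly 1” is unique to a single platform; a case counted under “exactly 6” was returned by every service.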

Another hypothesis was that, because the algorithms all rank results by relevance and the goal is to return relevant cases, the top 10 cases would all be relevant. Here again, Mart’s study found otherwise. The more-established research providers, Westlaw and Lexis Advance, did better at delivering relevant cases, but no service was perfect. Here are the percentages of relevant cases they delivered:

  • Westlaw: 67 percent.
  • Lexis Advance: 57 percent.
  • Fastcase: 44.7 percent.
  • Google Scholar: 44.6 percent.
  • Ravel: 40.5 percent.
  • Casetext: 39.7 percent.

One further factor Mart examined was the number of unique cases delivered by each platform that were also relevant. In other words, given that each platform returned an average of 40 percent unique cases, how many of those unique cases were relevant? Westlaw did the best, she found, with 33.1 percent of its cases both unique and relevant. Casetext fared the worst, with just 8.2 percent of its unique cases also found to be relevant.

What It All Means

In her research paper, Mart examines other differences in the results among research services. For example, newer services tended to deliver newer cases. She also discusses some of the factors that may explain, at least in part, why those differences occurred. But the bigger question for the rest of us is what we should make of this. Mart has several suggestions.

For one, we as researchers should have at least some understanding of the biases inherent in different systems and should not limit our research to any single system.

“Legal research has always been an endeavor that required redundancy in searching; one resource does not usually provide a full answer, just as one search will not provide every necessary result,” Mart writes. “This study clearly demonstrates that the need for redundancy in searches and resources has not faded with the rise of the algorithm.”

For another, Mart believes we should all challenge legal research companies to be much more transparent about the biases in their algorithms.

“Algorithmic accountability in legal databases will help assure researchers of the reliability of their search results and will allow researchers greater flexibility in mining the rich information in legal databases,” Mart argues. “If researchers know generally what a search algorithm is privileging in its results, they will be better researchers.”

Two research platforms that I am aware of provide some degree of transparency into their search algorithms by allowing users to customize the relevance factors their algorithms use. In the desktop version of CaseFinder, a Virginia legal research service, users can adjust sliders corresponding to four relevance-ranking factors: hit density, age, precedence, and enhanced precedence.

Recently, Fastcase added a similar feature to its advanced search in Fastcase 7, allowing users to customize the algorithm by adjusting sliders for seven relevance factors: search relevance score, large document relevance, small document relevance, authoritativeness, frequently read, frequently printed, and frequently emailed.
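Conceptually, sliders like these adjust the weights in a weighted-sum relevance score, so the same documents can rank differently under different settings. The sketch below illustrates only that general idea; the factor names, weights, and scoring formula are assumptions for illustration, not CaseFinder’s or Fastcase’s actual algorithms.

```python
from dataclasses import dataclass

@dataclass
class Document:
    # Illustrative per-document factors, each normalized to the 0..1 range.
    # The real factors and scales used by any platform are not public.
    title: str
    text_relevance: float
    authoritativeness: float
    recency: float
    read_frequency: float

def score(doc: Document, weights: dict) -> float:
    """Weighted sum of relevance factors; slider positions become the weights."""
    return (
        weights["text_relevance"] * doc.text_relevance
        + weights["authoritativeness"] * doc.authoritativeness
        + weights["recency"] * doc.recency
        + weights["read_frequency"] * doc.read_frequency
    )

# Two different "slider" settings rank the same documents differently.
docs = [
    Document("Case X", 0.9, 0.2, 0.8, 0.1),
    Document("Case Y", 0.6, 0.9, 0.3, 0.7),
]
text_heavy = {"text_relevance": 1.0, "authoritativeness": 0.2, "recency": 0.2, "read_frequency": 0.2}
authority_heavy = {"text_relevance": 0.2, "authoritativeness": 1.0, "recency": 0.2, "read_frequency": 0.2}

for label, w in [("text-heavy", text_heavy), ("authority-heavy", authority_heavy)]:
    ranked = sorted(docs, key=lambda d: score(d, w), reverse=True)
    print(label, "->", [d.title for d in ranked])
```

The point of the exercise is simply that moving a slider changes which factors the ranking privileges, which is exactly the kind of choice that otherwise stays hidden inside a proprietary algorithm.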

The bottom line is to be cognizant of the fact that different research services have different biases and that those biases affect the search results they deliver. We can push companies to be more transparent about their algorithms, but in the meantime, we should remember that time-worn piece of advice: Consider the source.


Robert Ambrogi is a Massachusetts lawyer and journalist who has been covering legal technology and the web for more than 20 years, primarily through his blog LawSites.com. Former editor-in-chief of several legal newspapers, he is a fellow of the College of Law Practice Management and an inaugural Fastcase 50 honoree. He can be reached by email at ambrogi@gmail.com, and you can follow him on Twitter (@BobAmbrogi).
