Results may vary in legal research databases

When a lawyer runs a search in a legal database, that single search box works like a fishing lure: Drop in your search terms and rely on the excellence of the search algorithms to catch the right fish.

At first glance, the various legal research databases seem similar. All of them promote natural language searching, so when the keywords go into the search box, researchers expect relevant results. A lawyer would also expect the results to be roughly similar no matter which database is used. After all, the algorithms are all trying to solve the same problem: translating a specific query into relevant results.

The reality is much different. In a comparison of six legal databases—Casetext, Fastcase, Google Scholar, Lexis Advance, Ravel and Westlaw—when researchers entered the identical search in the same jurisdictional database of reported cases, there was hardly any overlap in the top 10 cases returned in the results. Only 7 percent of the cases were in all six databases, and 40 percent of the cases each database returned in the results set were unique to that database. It turns out that when you give six groups of humans the same problem to solve, the results are a testament to the variability of human problem-solving. If your starting point for research is a keyword search, the divergent results in each of these six databases will frame the rest of your research in a very different way.
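
To see what that comparison measures, consider a minimal Python sketch. The databases, case names, and result sets below are invented placeholders for illustration, not the study's data:

    # Toy comparison: the same search run in three hypothetical databases.
    # Result sets are invented placeholders, not the study's data.
    results = {
        "db_a": {"case01", "case02", "case03", "case04", "case05"},
        "db_b": {"case01", "case06", "case07", "case08", "case09"},
        "db_c": {"case01", "case02", "case10", "case11", "case12"},
    }

    all_cases = set.union(*results.values())
    shared = set.intersection(*results.values())  # returned by every database

    # Cases that only a single database returned.
    unique = {
        db: {c for c in cases
             if sum(c in other for other in results.values()) == 1}
        for db, cases in results.items()
    }

    print(f"returned by all three databases: {len(shared) / len(all_cases):.0%}")
    for db, cases in results.items():
        print(f"unique to {db}: {len(unique[db]) / len(cases):.0%}")

Run on these toy sets, only one case of 12 appears in all three databases, and most of each result list is unique to its database, the same pattern the six-database study found at larger scale.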

SEEING IS BELIEVING

It is easy to forget that the algorithms returning search results are entirely human constructs. Humans made choices about how those algorithms would work, and those choices become the biases and assumptions built into each research system. Bias, in the context of algorithms, simply means a preference built into a computer system. While researchers can't know exactly which choices the developers made, we do know the variables at work in creating legal research algorithms.

Search grammar: Which terms are automatically stemmed (reduced to their root form) and which are not; which synonyms are automatically added; which legal phrases are recognized without quotation marks; how numbers are treated; and how the number of times a word occurs in a document determines results. These are all examples of search grammar.

Term count: If your search has six words and a document contains only five of them, the algorithm can be set either to include or to exclude that five-term document (see the scoring sketch after this list).

Proximity: The algorithm is preset to determine how close search terms must be to one another for a document to appear in the top results.

Machine learning: The programmers decide whether to include instructions that allow the algorithm to “learn” from the data in the database and make predictions.

Prioritization: Relevance ranking is one form of prioritizing that emphasizes certain things at the expense of others. U.S. Supreme Court cases, newer cases or well-cited cases may get a relevance boost.

Network analysis: The extent to which the algorithm uses citation analysis to find and order results is a human choice.

Classification and content analysis: Database providers with full classification systems and access to secondary sources to mine may be programming their algorithms to utilize that value-added content.

Filtering: Decisions about what content to include and exclude from a database affect results. These decisions may be based on copyright or other access issues.
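
Taken together, these variables amount to a scoring recipe. The following Python sketch shows how a deliberately simplified ranking function might combine a term-count threshold, a proximity window, and prioritization boosts. Every constant, weight, and parameter name here is an invented assumption chosen for illustration; no real database publishes its values:

    # Toy relevance scorer. Each constant below stands in for one human
    # choice of the kind described above; the values are invented.
    MIN_TERM_FRACTION = 0.8    # term count: keep a doc matching 5 of 6 terms
    PROXIMITY_WINDOW = 25      # proximity: full credit if matches fall close
    HIGH_COURT_BOOST = 2.0     # prioritization: favor higher-court cases
    CITATION_WEIGHT = 0.01     # network analysis: reward well-cited cases

    def score(query_terms, doc_words, high_court=False, citations=0):
        """Return a relevance score for one document, or None to exclude it."""
        positions = [i for i, w in enumerate(doc_words) if w in query_terms]
        matched = set(doc_words) & set(query_terms)

        # Term count: drop documents matching too few of the query terms.
        if len(matched) < MIN_TERM_FRACTION * len(set(query_terms)):
            return None

        # Proximity: matches spread far apart earn less credit.
        span = max(positions) - min(positions)
        proximity = 1.0 if span <= PROXIMITY_WINDOW else PROXIMITY_WINDOW / span

        base = len(positions) * proximity  # occurrence count still matters

        # Prioritization and citation analysis: explicit human-chosen boosts.
        if high_court:
            base *= HIGH_COURT_BOOST
        return base + CITATION_WEIGHT * citations

Change any one of those constants and the same query returns a different top 10, which is exactly the kind of divergence the six-database study observed.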

Once these decisions have been made and the code has been implemented, legal researchers have no way of knowing how those human choices are affecting search results. But the choices shape what a researcher sees in the results set. Code is law, as Lawrence Lessig famously said in his 1999 book, Code and Other Laws of Cyberspace.



Susan Nevelow Mart is an associate professor and the director of the law library at the University of Colorado Law School in Boulder.

This article was published in the March 2018 issue of the ABA Journal with the title “Results May Vary: Which database a researcher uses makes a difference.”

