The term “analytics” may be the most ubiquitous in today’s legal tech landscape. Of the 91 exhibitors present at the American Association of Law Libraries Annual Meeting and Conference earlier this month, it’s safe to assume that virtually all of them were armed with marketing material alluding to “analytics.”
The pervasiveness of the term raises a few basic questions. What exactly are analytics? Are analytics and artificial intelligence the same thing? And why, exactly, should law firms and lawyers care about analytics in the first place?
The answers to those questions (and more) are the focus of this month’s column.
Definitions
Analytics can be simply defined as information resulting from the systematic analysis of data or statistics. In a legal context, analytics comprise empirically rooted information that attorneys can leverage to help their clients make better decisions.
Analytics augmented by machine learning offer profound new possibilities across the legal spectrum – from developing a litigation strategy based on the historical outcomes of cases with similar fact patterns, to planning and executing a negotiating strategy based on the contractual terms of similar deals. There is an abundance of data, in the form of case law and transaction documents, available for analysis. We’ll tackle exactly how one gets data out of unstructured text in a separate column.
Types and Range of Analytics
The first type of analytics we’ll look at is rooted in statistical analysis and doesn’t require the use of artificial intelligence capabilities like machine learning. The objective is to harness descriptive and predictive statistics to solve a particular problem. Relevant current and historical statistics are compiled on a wide array of attributes related to the problem at issue, and key summary statistics are then computed. These summary statistics are then used as guides in making better decisions amongst a set of available choices.
Using baseball as an example, assume that (1) at a key point in the game, a manager must decide whether to leave his pitcher in to face the next batter, and (2) the manager has statistics showing that the upcoming batter has hit a home run in 50% of his at-bats against the current pitcher. Knowing that history, the manager may choose to replace the current pitcher with a reliever who has fared better against the batter.
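For readers who want to see the mechanics, here is a minimal Python sketch of that decision; the pitcher labels and at-bat records are invented purely for illustration.

```python
# Hypothetical head-to-head records: the outcome of each at-bat the upcoming
# batter has had against each available pitcher ("HR" marks a home run).
historical_at_bats = {
    "current_pitcher": ["HR", "out", "HR", "out", "single", "HR"],
    "reliever_a":      ["out", "out", "groundout", "single", "out"],
    "reliever_b":      ["out", "HR", "out", "out"],
}

def home_run_rate(at_bats):
    """Descriptive statistic: the share of at-bats ending in a home run."""
    return sum(1 for result in at_bats if result == "HR") / len(at_bats)

# Compute the key summary statistic for each available choice.
rates = {pitcher: home_run_rate(results)
         for pitcher, results in historical_at_bats.items()}

# Evidence-based choice: the pitcher this batter has hurt the least.
for pitcher, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{pitcher}: {rate:.0%} home run rate against this batter")
print("Suggested pitcher:", min(rates, key=rates.get))
```

The point isn’t the code; it’s that the decision reduces to a comparison of summary statistics computed from historical data.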
Of course, there’s no guarantee that past performance will recur. The key is that with the data at his disposal, the manager – or CEO, retailer, law firm managing partner, etc. – has quantitative weapons that allow him to make evidence-based assessments covering a wide range of relevant data, and those assessments should perform better over time.
The second major category of analytics centers on sophisticated, machine learning-enhanced pattern matching. The sophistication of the pattern matching relies on the ability of machines to do what they do best – process large amounts of data that a human simply could not consume in a reasonable period of time. These algorithms can detect patterns in data sets that are then used to make exceptionally fast and accurate predictions about outcomes associated with a new set of data.
A study recently published in the journal Nature documents how Stanford researchers developed a machine learning algorithm to detect potential cases of skin cancer. The team fed the algorithm 130,000 pictures of moles, along with outcome information indicating whether each mole tested positive for skin cancer. The results were extraordinary: the machine was able to accurately assess cancerous moles about 96% of the time, slightly exceeding the accuracy rate of a team of 21 trained dermatologists.
If there are patterns to be found amongst the images, the machine will “learn” to predict, with a high degree of accuracy, whether a newly added picture of a mole is cancerous. Machine-enhanced pattern matching is a game changer, but it’s important to be aware of what even the smartest machine cannot do: it cannot articulate its thought process, or explain why it has decided that a particular image is one of a cancerous mole. The machine can find patterns that a human cannot, but such patterns are not transparent to the humans harnessing the power of the machine.
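As a toy illustration of this kind of pattern matching – emphatically not the Stanford model itself – the sketch below trains an off-the-shelf classifier on labeled examples and asks it to score a new one; scikit-learn and the synthetic “image features” are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data: each row stands in for numeric features extracted
# from an image (size, asymmetry, color variance); labels are the known outcomes.
rng = np.random.default_rng(0)
benign = rng.normal(loc=[2.0, 0.2, 0.1], scale=0.5, size=(200, 3))
malignant = rng.normal(loc=[5.0, 0.8, 0.6], scale=0.5, size=(200, 3))
X = np.vstack([benign, malignant])
y = np.array([0] * 200 + [1] * 200)  # 0 = benign, 1 = malignant

# The machine "learns" whatever patterns separate the two labeled groups.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# A new, unseen example gets a fast probabilistic prediction...
new_case = [[4.6, 0.7, 0.5]]
print(model.predict_proba(new_case))  # e.g. [[0.02, 0.98]]

# ...but the model offers no human-readable account of *why* it reached that
# conclusion; the learned patterns remain opaque to the people using it.
```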
Caveat Emptor: Potential Trouble Spots for Analytics
Analytics will play an increasingly important role in legal practice. Yet an algorithm’s output is only as good as the inputs humans provide in the first place. Humans, who also bear responsibility for interpreting the machine’s output, should watch for some common pitfalls.
The Law of Large Numbers
Even though machines can process far more data than humans could ever wrestle with, the law of large numbers remains intact. Put simply, good predictive analytics require a large sample of data to work well. Recall the baseball example from above. If the batter has faced the pitcher only twice, his 50% home run rate is probably noise; but if he has sustained it over 50 at-bats, there may well be significant predictive value. A rough rule of thumb when it comes to pattern matching is that you need at least 100 examples of the fact pattern you’re seeking to analyze if you want legitimate predictive value. Too small a sample equals lousy predictive capability. A quick simulation makes the point concrete.
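The 10% “true” home run rate and the sample sizes in this sketch are assumptions, chosen only to show how unstable small samples are.

```python
import numpy as np

rng = np.random.default_rng(42)
true_hr_rate = 0.10   # assume the batter's real home run rate is 10%
trials = 10_000       # simulate many alternate histories for each sample size

for sample_size in (2, 50, 500):
    # Observed home run rates computed from samples of this size.
    observed = rng.binomial(sample_size, true_hr_rate, trials) / sample_size
    spread = observed.std()
    lucky_fifty = (observed >= 0.5).mean()
    print(f"n={sample_size:>3}: typical error ±{spread:.2f}, "
          f"chance of a 50%+ observed rate by luck alone: {lucky_fifty:.1%}")
```

With only two at-bats, a batter with a modest 10% true rate will look like a 50% slugger nearly one time in five; over 50 at-bats, essentially never.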
Ensuring a sufficiently large sample in certain legal contexts can be challenging – but it’s no less important. For instance, if one is trying to make predictions about the likely outcome of a specific ERISA claim filed in one particular federal district court and heard by one particular judge, one needs to look at whether the judge has heard enough of these types of claims previously to allow for any significant predictive value.
Cause and Effect
You may have heard the expression: correlation isn’t the same as causation. That’s true and it’s of crucial relevance when considering analytics.
Sometimes data elements cause behavior, and sometimes they are the effect of behavior – and analytics requires differentiating between the two. A report, for example, claiming that people who run five or more times a week are 80% less likely to have joint problems than those who run less frequently is insufficient grounds for one to employ running frequency as a predictor of joint health. One might sensibly argue that people with joint pain aren’t able to run very frequently and hence that the cause of the observed relationship is joint pain, not frequency of running. Additional investigation is needed to accurately distinguish cause and effect.
Assessing causation is typically the realm of the data scientist and techniques such as multivariate regression analysis. Fortunately, most reputable firms employ data scientists for exactly this work.
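The sketch below shows, in miniature, the kind of check a data scientist might run. The data are synthetic and built so that a hypothetical confounder – prior joint injury – drives both running frequency and joint problems; statsmodels and every number here are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2_000

# Synthetic world: prior injury lowers running frequency AND raises joint problems.
prior_injury = rng.binomial(1, 0.3, n)
runs_per_week = np.clip(5 - 3 * prior_injury + rng.normal(0, 1, n), 0, None)
joint_problems = 2.0 * prior_injury + rng.normal(0, 1, n)

# Naive model: joint problems explained by running alone -> a spurious negative effect.
naive = sm.OLS(joint_problems, sm.add_constant(runs_per_week)).fit()

# Controlled model: add the confounder -> running's apparent effect shrinks toward zero.
controlled = sm.OLS(
    joint_problems,
    sm.add_constant(np.column_stack([runs_per_week, prior_injury])),
).fit()

print("running coefficient, naive model:     ", round(naive.params[1], 2))
print("running coefficient, controlled model:", round(controlled.params[1], 2))
```

The raw correlation makes running look protective; once the confounder enters the model, the effect largely disappears – which is exactly the distinction between correlation and causation that the machine, on its own, will not draw for you.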
Constancy
Analyzing historic data to predict future outcomes only works if the past is truly representative of the future. Can we depend on history repeating itself? In those cases where all environmental factors remain constant over time, the answer is yes. But when changes occur in any relevant environmental factor, the past can quickly become a lousy predictor of the future.
Consider the abundance of occasions on which the United States Supreme Court has decided the merits of a legal claim differently than it had in the past after the composition of the court had changed. One might indeed be able to make a very good prediction about the future if the data analyzed is based on the holdings of the same judges who currently sit on the court. But using the activities of a past bench to predict the holdings of a future bench isn’t likely to do nearly as well.
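A simple, practical safeguard is to ask how much of your historical data actually comes from the court as it is composed today. Here is a minimal pandas sketch; the case names, bench labels, and outcomes are invented for illustration.

```python
import pandas as pd

# Hypothetical historical decisions, tagged by the era of the bench that heard them.
decisions = pd.DataFrame({
    "case":    ["A v. B", "C v. D", "E v. F", "G v. H", "I v. J"],
    "bench":   ["2010 bench", "2010 bench", "2016 bench", "2016 bench", "2018 bench"],
    "outcome": ["affirmed", "reversed", "affirmed", "reversed", "affirmed"],
})

current_bench = "2018 bench"   # the court as constituted today (hypothetical label)

# What share of the history was produced by the current composition of the court?
share = (decisions["bench"] == current_bench).mean()
print(f"{share:.0%} of the historical decisions come from the current bench")
# A low share is a warning that the past data may describe a different court.
```

If the answer is “not much,” the prediction deserves a correspondingly heavy discount.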
Lack of Transparency: Humans, Stick Around
As noted above, a pattern-matching machine essentially makes its predictions in a black box. Attorneys will often be unable to explain to their clients why the machine has offered the prediction it has. In combination with some of the other cautions just discussed, it’s best to think of these algorithms as exceptionally useful starting points for further research rather than as conclusive.
Hence the rising popularity of another term: “augmented intelligence.” The concept envisions the machine as assisting but not replacing the human; machine and human play on the same team, but people remain the ultimate decision-makers. The hope is that by correctly deploying analytics, we humans – and we legal professionals – will make better choices. But we’ll save that for next month’s column.
Dean Sonderegger is Vice President & General Manager, Legal Markets and Innovation at Wolters Kluwer Legal & Regulatory U.S., a leading provider of information, business intelligence, regulatory and legal workflow solutions. Dean has more than two decades of experience at the cutting edge of technology across industries. He can be reached at [email protected].