Hidden Figures: Monetizing Your DMS

Machine learning may be a path to unlock the value of a law firm's document management system.

Are firms missing the true value of AI-driven document analysis tools?

In last month’s article, we covered how legal professionals can best leverage analytics. This month, we’ll look at how analytics can be used to monetize data that most lawyers already have at their fingertips.

The rich set of contracts that resides in a law firm’s document management system (DMS) contains a wealth of intellectual property that the firm has built for clients over the years. And while firms pay publishers and vendors to create standards for their contracts, chances are they already have the raw material to do it themselves. Unfortunately, this information is hard to find while it sits in unstructured form in the DMS. Richard Robbins, Director of Knowledge Management at Sidley, sums it up nicely:

Searching for a firm’s model form of a particular document isn’t hard.  But searching for an example of previously completed work that fits a specific set of circumstances can be very difficult – especially if the person doing the search isn’t the person who did the prior work.  Searching for a merger agreement involving a pair of semiconductor manufacturing companies where shareholders are able to elect stock or cash consideration, or searching for a brief on a particular antitrust issue as argued before a given judge can be like looking for a needle in a haystack. Traditional DMS search systems just aren’t up to the task.

The ability to mine this information can bring a number of benefits, including fewer unbilled hours, more consistent contracts, better deal-level profitability, better outcomes for clients, and higher rates of retention and referrals. Of course, if this were a simple task, firms would have done it by now. How does one turn contracts into structured (read: searchable) data?

Theoretically, an attorney could manually sift through contracts to find and extract usable data, but in reality this is not a cost-effective process for most firms. This is a problem space where machine learning really shines.

Before we delve into how this works, let’s briefly review the term. Machine learning (ML), a branch of artificial intelligence, is a technique in which an algorithm is trained on examples to recognize patterns in data. In the context of contracts, machine learning can be used to identify key terms or clauses within unstructured text.

How does this work? Think of the popular music service Pandora, where users create their own stations based on a favorite artist. The service uses an algorithm to find similar music, and the user trains the algorithm with a thumbs up or a thumbs down so that it finds more songs the user will like.

The concept is the same when applied to machine learning for contracts. Say one starts with a training set of 100 different kinds of contracts and, within each, identifies the Governing Law clause, such as the one below:

This Agreement will be governed in all respects by the laws of Illinois, without regard to any conflicts of law principles, decisional law, or statutory provision which would require or permit the application of another jurisdiction’s substantive law.

To train an ML algorithm to extract this clause, one would highlight the clause in every contract in the training set and then feed those annotated contracts into the algorithm. One would then run additional contracts through the (now trained) algorithm and provide feedback on whether it correctly identified the clause, which in turn trains it further.
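To make the training loop concrete, here is a minimal sketch in Python. It assumes a simple naive Bayes text classifier, which is not necessarily the model any commercial product uses. The training passages, the labels `governing_law` and `other`, and the `ClauseClassifier` class are all hypothetical; a real system would also need to split whole contracts into candidate clauses and would train on far more examples.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (deliberately simplistic)."""
    return re.findall(r"[a-z]+", text.lower())


class ClauseClassifier:
    """Hypothetical clause classifier: multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.examples = []  # (text, label) pairs a reviewer has highlighted or corrected

    def add_examples(self, labeled):
        """Add reviewer-labeled passages and retrain from scratch."""
        self.examples.extend(labeled)
        self.label_counts = Counter(label for _, label in self.examples)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in self.examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)

    def predict(self, text):
        """Return the label with the highest log-probability for this passage."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training set: passages a reviewer highlighted in sample contracts.
training = [
    ("This Agreement will be governed by the laws of Illinois.", "governing_law"),
    ("This Agreement shall be governed by and construed in accordance with "
     "the laws of the State of New York.", "governing_law"),
    ("The laws of Delaware govern this Agreement without regard to conflicts "
     "of law principles.", "governing_law"),
    ("Either party may terminate this Agreement upon thirty days written notice.", "other"),
    ("The Seller shall indemnify the Buyer against all losses arising from any breach.", "other"),
    ("All notices must be delivered in writing to the addresses set forth above.", "other"),
]

clf = ClauseClassifier()
clf.add_examples(training)

# Run new, unseen passages through the trained model.
print(clf.predict("This Agreement is governed by the laws of California."))
print(clf.predict("The Buyer may terminate this Agreement if the Seller breaches."))

# Reviewer feedback (the "thumbs up / thumbs down" step): a confirmed or
# corrected passage is simply added to the training data and the model refits.
clf.add_examples([("This Agreement is governed by the laws of California.", "governing_law")])
```

With this toy data the first passage comes back as `governing_law` and the second as `other`. Retraining from scratch on every batch of feedback keeps the sketch simple; a production tool would likely update incrementally.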

How accurate are these algorithms? It depends on the nature of the data itself. A Governing Law clause is fairly simple, whereas others, such as indemnification clauses, may be more difficult to extract accurately. Accuracy is commonly described with two measurements: recall is the percentage of the target clauses actually present in the documents that the algorithm finds, and precision is the percentage of the algorithm’s results that really are the target clause. Precision matters because, using the example from above, training the algorithm doesn’t guarantee it won’t pull back a passage that fits the pattern but is not actually a governing law clause (a forum selection clause naming the courts of Illinois, for instance).
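A minimal sketch of how the two measurements are computed, assuming a reviewer has already confirmed which passages truly are governing law clauses. The passage IDs and counts below are hypothetical.

```python
def precision_recall(predicted, actual):
    """predicted: set of passages the algorithm flagged as governing law clauses.
    actual: set of passages a reviewer confirmed really are governing law clauses."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0  # share of results that are correct
    recall = true_positives / len(actual) if actual else 0.0           # share of real clauses that were found
    return precision, recall


# Hypothetical review of 10 contracts, each containing one governing law clause.
actual = {f"contract_{i}:governing_law" for i in range(10)}

# The algorithm found 9 of the 10 clauses, and also flagged one forum
# selection clause that merely looks like a governing law clause.
predicted = {f"contract_{i}:governing_law" for i in range(9)} | {"contract_3:forum_selection"}

p, r = precision_recall(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.90
```

The missed clause lowers recall; the false match lowers precision. The two errors are independent, which is why both numbers are reported.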

Data scientists combine the two into a single number called the F score (most commonly F1), the harmonic mean of precision and recall, which is high only when both measurements are high. As a rule, you should be able to achieve an F score of about 90%; when precision and recall are roughly balanced, that means the algorithm finds about 9 out of 10 of the target clauses and about 9 out of 10 of the clauses it returns are correct. Getting 100% of the desired clauses requires a subject matter expert (SME) to review the algorithm’s results. While this is still labor intensive, using the ML algorithm can reduce the labor required by an order of magnitude.
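The F score formula itself is short. This sketch uses hypothetical precision and recall values to show the balanced F1 and why it differs from a simple average or a product: the harmonic mean pulls the score toward the weaker of the two measurements.

```python
def f1_score(precision, recall):
    """Balanced F score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Balanced performance: F1 equals the shared value.
print(round(f1_score(0.90, 0.90), 3))  # 0.9

# Lopsided performance: high precision, but half the clauses are missed.
p, r = 0.95, 0.50
print(round(f1_score(p, r), 3))  # 0.655
print(round((p + r) / 2, 3))     # 0.725, an arithmetic mean hides the gap
print(round(p * r, 3))           # 0.475, a plain product is not the F score
```

An algorithm that is very precise but misses half the clauses still scores poorly, which matches the practical concern: every missed clause is one an SME has to find by hand.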

If you’re interested in performing this type of contract analysis on your own DMS, there are a number of options — each of which has trade-offs in time to value, upfront cost, and recurring cost.

Do It Yourself (DIY)

Thanks to a recent boom in the commercial availability of ML algorithms, Amazon, Google, and others offer several services and toolkits for building your own machine learning software.

Pros: By building your own solution, you have complete flexibility to tailor it to your firm’s needs. This option has the lowest cost up front, as the toolkits are relatively inexpensive or free, and you may be able to leverage existing development resources.

Cons: While the upfront cost is lower, this approach carries the highest ongoing cost. It’s up to you to build (and maintain) all functionality, which effectively takes your firm in the direction of being a (small) software company.

Toolkit

The most prevalent approach in the legal market today is to take an existing tool built for contract review or due diligence, whether commercial or open source (such as LexPredict’s open source tool), and apply it to contracts from the firm’s own database.

Pros: A firm can save time by leveraging a ready-made tool. This reduces the amount of software you need to build and maintain on your own, which reduces your ongoing cost and improves your time to value.

Cons: While this approach reduces upfront labor, you still need to train these tools to recognize your contracts. The tools also focus more on processing the contracts and less on making the data actionable (e.g., integrating the data into your infrastructure or enabling discovery and comparison). For those aspects of the solution, you will still need to build and maintain your own code.

The Fully Formed (SaaS) Solution

In this approach, a law firm adopts a software as a service (SaaS) product that the vendor has already trained on relevant documents, and the firm adds its own data to train the algorithm further.

Pros: This approach is the most time-efficient and requires the least maintenance by the law firm. In terms of function, these solutions can typically be integrated directly into the DMS, allowing for a level of discoverability that the other two approaches do not provide.

Cons: Purchasing a solution requires the highest upfront investment, but it is usually the least expensive option when measured by the total cost of getting to deployment. This option also offers the least flexibility, since the tool is built and maintained by a vendor.

The infographics above show some of these tradeoffs when choosing a path to implement an ML approach to contract analysis. A common mistake that many firms make is underestimating the commitment (and amount of ongoing investment) the firm needs to make to fully leverage DIY or tool-based solutions. To be sure, flexibility can be worth the tradeoff, but firms should consider carefully what type of business they want to be in.

Irrespective of the approach taken, one thing is certain — there’s value in the content sitting in the DMS. Machine learning may be a path to unlock that value for the firm.



Dean Sonderegger is Vice President & General Manager, Legal Markets and Innovation at Wolters Kluwer Legal & Regulatory U.S., a leading provider of information, business intelligence, regulatory and legal workflow solutions. Dean has more than two decades of experience at the cutting edge of technology across industries. He can be reached at Dean.Sonderegger@wolterskluwer.com.
