Virtual Canary in the Digital Mine #5: Total Pre-Cull (Part 3 of 3), or, How I Learned to Stop Worrying and Love Predictive Coding

In a land that is right here and in a time that is right now, a technology has arisen so powerful that it can replace basic human document review. Is it time to bow down before our new robot overlords?

For the last two installments, we’ve been talking about the coming empire, the reign of the machine, and anything else I can think of to hold whatever attention you may have for this topic. And, in a questionable move, I promised two things for this installment: “just how the predictive coding engines work” and “some recent case law not only validating but also mandating” their use. In retrospect, I realize that I could not possibly have promised you two less exciting cliffhangers. So, with apologies to anyone who is here anyway, I will be brief.

1) How predictive coding works (generally):

Predictive coding applications employ one of two methods:

a. SAMPLING AND CONVERGENCE

A subject matter expert (attorney) sits down with a random sampling of documents. She “codes” while the computer “watches”, building a model according to the attorney’s choices. Then, the computer “predicts” the coding of another set. When the computer’s predictions sufficiently match the attorney’s choices, it has learned all it needs to complete the batch itself.
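
For the curious, that loop can be sketched in a few lines of Python. This is only a toy under loud assumptions: the word-count “model”, the `toy_expert` function standing in for the attorney, the 90% agreement target, and the twelve-document corpus are all invented for illustration and are not how any particular product works.

```python
import random
from collections import Counter


def train(coded):
    """Turn (text, is_responsive) pairs into per-word weights."""
    weights = Counter()
    for text, responsive in coded:
        for word in set(text.lower().split()):
            weights[word] += 1 if responsive else -1
    return weights


def predict(weights, text):
    """Call a document responsive when its word weights sum above zero."""
    return sum(weights[word] for word in set(text.lower().split())) > 0


def sample_until_converged(documents, expert, batch_size=4, target=0.9, seed=0):
    """Alternate expert coding and machine prediction until they agree."""
    pool = list(documents)
    random.Random(seed).shuffle(pool)  # the "random sampling" step
    coded, weights, agreement = [], Counter(), 0.0
    while pool:
        batch, pool = pool[:batch_size], pool[batch_size:]
        calls = [expert(doc) for doc in batch]  # the attorney codes the batch
        if coded:  # round one is training only; there is nothing to score yet
            hits = sum(predict(weights, doc) == call
                       for doc, call in zip(batch, calls))
            agreement = hits / len(batch)
        coded.extend(zip(batch, calls))
        weights = train(coded)  # the computer "watches" and rebuilds its model
        if agreement >= target:
            break  # predictions match the expert closely enough
    return weights, pool, agreement


# Hypothetical toy corpus: every responsive document mentions "merger".
DOCS = (
    [f"merger {w}" for w in ("alpha", "bravo", "charlie", "delta", "echo", "foxtrot")]
    + [f"lunch {w}" for w in ("golf", "hotel", "india", "juliet", "kilo", "lima")]
)


def toy_expert(text):
    """Stand-in for the attorney's judgment (an assumption, not a real rule)."""
    return "merger" in text


weights, uncoded, agreement = sample_until_converged(DOCS, toy_expert)
# Whatever the expert never looked at is coded by the machine.
machine_calls = {doc: predict(weights, doc) for doc in uncoded}
```

The point of the sketch is the shape of the loop, not the model: the expert codes a batch, the machine is scored against her on the next batch before it sees her answers, and it takes over the uncoded remainder only once the agreement target is met. A real engine would use a far richer model and a statistically defensible sample size.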

b. KNOWLEDGE GATHERING

A team of reviewers (not experts) begins to code while the computer “watches” and compares each response with all of the other responses. It makes its own predictions at the same time, and when its predictions match the reviewers’, it has learned all it needs to complete the batch itself.
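
Here is an equally rough Python sketch of this second approach. Everything in it is an assumption made for illustration: the reviewer names, the sample documents, the word-weight “model”, the five-document window, and the rule that a call is flagged when it contradicts at least two points of evidence from earlier coding. Note also that it compares each call only against the coding that came before it, which is a simplification of “all of the other responses”.

```python
from collections import Counter, deque


def words(text):
    return set(text.lower().split())


def watch_review(stream, window=5, target=1.0, min_evidence=2):
    """Learn from a live review; stop once recent predictions match reviewers.

    stream yields (reviewer, text, is_responsive) in the order documents are coded.
    """
    weights = Counter()             # evidence pooled from everyone's coding so far
    recent = deque(maxlen=window)   # did the machine agree on the last few docs?
    outliers = []                   # calls that contradict the pooled evidence
    seen = 0
    for reviewer, text, call in stream:
        seen += 1
        score = sum(weights[w] for w in words(text))
        guess = score > 0           # machine predicts before looking at the call
        recent.append(guess == call)
        if guess != call and abs(score) >= min_evidence:
            # Reviewer disagrees with strong evidence from the other responses:
            # hold the call back for a second look instead of learning from it.
            outliers.append((reviewer, text))
        else:
            for w in words(text):
                weights[w] += 1 if call else -1
        if len(recent) == window and sum(recent) / window >= target:
            break                   # the machine can finish the batch itself
    return weights, seen, outliers


# Hypothetical review session; the sixth call is deliberately inconsistent.
STREAM = [
    ("ann", "merger term sheet", True),
    ("bob", "lunch order friday", False),
    ("cy", "merger pricing memo", True),
    ("ann", "lunch menu", False),
    ("bob", "merger draft", True),
    ("cy", "merger pricing update", False),
    ("ann", "lunch friday", False),
    ("bob", "merger memo", True),
    ("cy", "lunch order", False),
    ("ann", "merger term update", True),
    ("bob", "merger pricing", True),
    ("cy", "lunch notes", False),
]

weights, seen, outliers = watch_review(STREAM)
```

On this made-up session the machine stops watching after the eleventh document and flags the one call that disagrees with the rest of the team. The design choice worth noticing is that nobody has to be the expert: the pooled coding is the standard, and any single reviewer is measured against it.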

As you can see, the differences are in details so fine that you and I do not really need to know about them. What’s important to understand is that, through the miracles of parallel processing, machines can now watch, practice and eventually mimic the choices made by humans as the humans are making the choices. This means a real-time, actionable feedback cycle that makes it possible for us to trust that, yes, we can program a machine to think like us. As I’ve said before, for now, all we’re asking the machine to do is work that humans were never really well-suited to do in the first place: digest and make binary choices about non-binary information at speeds not humanly possible.

2) Some important case law to guide you into the future:

2012 was a big year for machines in the courts. Either the judiciary is tired of going without more assistance during discovery, or the attorneys have become more eloquent about explaining how the various technologies work, or both. The result is this: today, there is really no excuse for not pursuing all available means of technological assistance in document review. Predictive coding is now precedent.

Here are three important cases:

• February, 2012: “Da Silva Moore”: Da Silva Moore v. Publicis Groupe (S.D.N.Y. Feb. 24, 2012).

The first federal case to recognize Computer-Assisted Review as “an acceptable way to search for relevant ESI in appropriate cases.”

• July, 2012: “NDLON”: National Day Laborer Organizing Network v. U.S. Immigration and Customs Enforcement Agency (S.D.N.Y. July 13, 2012).

FOIA case in which a District Judge held that “most custodians cannot be ‘trusted’ to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities” and that “[b]eyond the use of keyword search, parties can (and frequently should) rely on . . . machine learning to find responsive documents. Through iterative learning, these methods (known as ‘computer-assisted’ or ‘predictive’ coding) allow humans to teach computers what documents are and are not responsive to a particular . . . discovery request and . . . significantly increase the effectiveness and efficiency of searches.”

• October, 2012: “The Hooters Case”: EORHB v. HOA Holdings LLC (Del. Ch. 10/19/12).

The first case in which a court directed the parties to use Predictive Coding as a replacement for Manual Review (or to show cause why this was not an appropriate case for Predictive Coding), absent either party’s request to employ Predictive Coding.

THE BIG CONCLUSION:

Predictive coding, or TAR, or CAR, or whatever, is here. You should use it. If you do not, you should have a good argument for why not. (Hint: “because I want to run up the cost of discovery in hopes that my opponent will settle out of court” is not a good argument.)

THE EVEN BIGGER PREDICTION:

We’re just beginning. For now, we will let the algorithms decide Responsive and Non-responsive, just like we let our email decide junk and not-junk.

But, I also let my email decide “not-junk and important” because it comes from my wife or my boss or my boss’s boss. I allow it to show me these emails prioritized over others that are, while not junk, not from anyone with whom I have a particularly special relationship.

Not yet, but soon, maybe by the time this post goes live, we will also have the choice to let our algorithms decide “relevant to this issue”, “probably privileged”, or even “hot”. And, if we train them well, we will find that their decisions are correct. At least, we’ll find that their decisions are the same as ours.

A FINAL PROMISE BEFORE MOVING ON TO OTHER TOPICS:

With the next installment, we will move on to another topic. I’m thinking visual analytics, but I make no guarantees. If you have suggestions or requests, please feel free to contact me here.

Until then, please review my company’s entry into the predictive coding arena. We’re very excited about it and hope that you will be too. So far, our new machine overlord has been quite the benevolent ruler. Sign up here for a free live webinar and introduce yourself to your future.

Eric Killough is the virtual canary AccessData has released into your digital mine. He is a JD, a CEDS, and a librarian. He thinks about electronic discovery probably more than he should. Please join him here, at Twitter, at LinkedIn, and at his own blog. He’ll be happy to meet you.