Assisted Review – Case Study 2

The Dispute

The dispute related to drawings made by a partner that were alleged to be in excess of what was permitted by the partnership agreement.

The Evidence

The bulk of the evidence consisted of email traffic between the parties, together with associated documents evidencing the parties' conduct during the period in which the drawings were taken.

The Challenges

The parties could not agree a sufficient set of keywords because the issues did not lend themselves to obvious, specific search terms.

Ultimately, the keywords agreed were of a sufficiently general nature (so as to be sure of finding all relevant documents) that they also returned a substantial proportion of irrelevant documents (i.e. keyword hits relating to the day-to-day running of the business without being materially relevant to the narrow issue of drawings).

The total number of keyword-responsive documents exceeded 150,000, and the genuinely relevant material was expected to be a very small percentage of that volume.

Agreement could not be reached on how to cull the data to a smaller, more manageable pot by adopting a narrower set of keywords, so the parties faced the prospect of a lengthy manual relevance review. A reviewer will do well to get through 100 documents per hour, so 150,000 documents equates to roughly 1,500 reviewer hours – each party paying for some 30 reviewer-weeks of material to be waded through.

Decision to use Assisted Review

Keyword searching was clearly an inefficient means of identifying relevant documents – the majority of the documents containing the keywords were not materially relevant to the issues in dispute.

It was felt that Assisted Review had the potential to save a substantial amount of time and money if it could be used to help identify the material that was directly relevant to the issues in dispute.

How Assisted Review works

The approach differs depending on the product; in this instance the software used was kCura's Relativity Assisted Review.

  1. A universe of documents to put into assisted review is defined.  In this instance all keyword responsive documents, within the relevant date range, were included. An alternative to this could have been to put all documents into Assisted Review.  Often this is an appropriate option but in this instance the parties agreed that the keywords had been effective in stripping out the completely irrelevant material – where they had failed was in identifying the material that was of real significance and directly relevant to the issues in dispute.
  2. The “universe” is analysed to exclude documents that are not compatible with the Assisted Review software.  These files include spreadsheets (numbers are not recognised by the software), image files and file types with very little text content such as calendar appointments.
  3. An Index is built of the text contained in the remaining documents. The index:
    1. Only includes the "body text" of emails; the Author, Recipient and Subject line fields are excluded because they are not felt to assist a process built on contextual searching.
    2. Is based on the "machine readable" text. Where text has been created by an Optical Character Recognition process, it is the extracted text (even if imperfect), rather than what a human can read in the source document, that matters so far as the software is concerned.
  4. The senior lawyer on the case reviews a sample of documents to train the software on what is relevant (and what is not). In this instance a set of 1,519 randomly selected documents was used to train the software.
  5. The software uses this training to "categorise" all remaining documents. It takes the decisions made on the training sample and applies them to conceptually similar documents – effectively creating "clusters" of textually similar documents.
  6. The senior reviewer then initiates a number of Quality Assurance ("QA") rounds that test the software's accuracy in categorising the documents. Each round involves another random sample being "blind coded" by the senior reviewer (meaning that they code the documents without knowing how the software has categorised them).
  7. At the end of each QA round Millnet reports to the lawyer:
    1. The total number of documents where the reviewer's decision differed from the software's decision. These documents are known as "overturns".
    2. The overturns are then reconsidered by the reviewer to determine whether, on reflection:
      1. the reviewer was right and the software was wrong; or
      2. the software was right and the reviewer was wrong.
  8. In our experience almost every QA round results in the senior reviewer concluding that at least some of their own initial decisions were wrong. This demonstrates (and impresses on reviewers) that the software can also be used as a reviewer QA tool, and illustrates that a human-led review is never likely to be perfect – there are always "borderline" documents where it is difficult to be dogmatic about whether or not they should be classed as relevant.
  9. Depending on the number of overturns, and the reviewer's determinations on reconsidering them, the accuracy of the software is assessed and we decide either:
    1. To re-categorise – i.e. use the additional human tagging performed in the last round (the overturns) to re-train the software with more documents; OR
    2. That the software has achieved a level of accuracy the senior reviewer is comfortable with, so that we can move on to finalise the assisted review process (a simplified sketch of this train / categorise / QA loop appears after this list).
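
For readers who want to see the mechanics, the sketch below approximates the train / categorise / QA loop described above. It is not kCura Relativity's actual engine: TF-IDF text vectors and a logistic-regression classifier from scikit-learn are used as stand-ins, and the coding dictionaries and QA sample size are illustrative assumptions rather than figures from this project.

```python
# Illustrative sketch only: Relativity Assisted Review has its own analytics
# engine; TF-IDF plus logistic regression are stand-ins for the general idea.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def categorise(texts, seed_coding):
    """texts: {doc_id: extracted body text} (the indexed text).
    seed_coding: {doc_id: True/False} coded by the senior reviewer
    (the 1,519 randomly selected training documents in this matter)."""
    doc_ids = list(texts)
    vectors = TfidfVectorizer(stop_words="english").fit_transform(
        [texts[d] for d in doc_ids])
    row = {d: i for i, d in enumerate(doc_ids)}

    # Train on the senior reviewer's coded sample...
    seed_ids = list(seed_coding)
    model = LogisticRegression(max_iter=1000)
    model.fit(vectors[[row[d] for d in seed_ids]],
              [seed_coding[d] for d in seed_ids])

    # ...then categorise every remaining document as responsive or not.
    rest = [d for d in doc_ids if d not in seed_coding]
    predictions = model.predict(vectors[[row[d] for d in rest]])
    return dict(zip(rest, (bool(p) for p in predictions)))


def draw_qa_sample(categorised, sample_size=500, seed=0):
    """Random sample for a blind QA round (the sample size is an assumption)."""
    random.seed(seed)
    return random.sample(list(categorised), min(sample_size, len(categorised)))


def qa_overturns(categorised, blind_coding):
    """blind_coding: {doc_id: True/False} blind-coded by the senior reviewer.
    Returns the 'overturns' (disagreements) and the overturn rate."""
    overturns = [d for d, decision in blind_coding.items()
                 if categorised[d] != decision]
    return overturns, len(overturns) / len(blind_coding)
```

In the real workflow this loop repeats – re-training on the overturns where necessary – until the overturn rate is low enough for the senior reviewer to sign off (0.87% in the final QA round of this matter).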

Project Statistics

The following figures are specific to this matter and have been rounded in the interests of clarity:

Total documents considered for Assisted Review: 150,000

Total documents (within the above) that were not compatible with the software and were reviewed manually: 24,000

Total data volume included in the Assisted Review process: 30 GB

Total documents reviewed by a senior reviewer to "train" the software on what was relevant: 6,500

Total QA rounds performed to achieve stability: 4

Non-responsive overturns as a percentage of the documents reviewed in the final QA round: 0.87%

Total documents manually reviewed by the junior team (the documents not compatible with Assisted Review, plus the documents categorised as responsive, which received a second-pass review for relevance and privilege): 34,750

Total documents ultimately subject to human review: 41,000

Total non-responsive documents identified by the Assisted Review software: 120,000

Total relevant documents ultimately disclosed: 9,000

Total Assisted Review cost:

– Senior lawyer (assuming an hourly rate of £250 and a review rate of 60 documents per hour) to "train" the software – £27,387.50
– Junior lawyer (assuming an hourly rate of £100 and a review rate of 100 documents per hour, with 34,750 documents to review) – £34,728
– Assisted Review software cost (including consulting time) – £9,375

£71,490.50

Total Conventional review cost:
– Junior lawyer reviewing all keyword-responsive documents within the date range (with no allowance for QA checks or senior supervision) at an assumed hourly rate of £100 and a review rate of 100 documents per hour

£150,000
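
The cost comparison can be sanity-checked from the rates and rounded document counts quoted above. The short calculation below is ours rather than the project's billing model, and because it uses the rounded counts it lands slightly below the quoted totals.

```python
# Back-of-the-envelope reproduction of the cost comparison, using the rounded
# document counts from the statistics table, so the results approximate rather
# than exactly match the quoted figures.

def review_cost(documents, docs_per_hour, hourly_rate):
    """Cost of a linear review at a given speed and hourly rate."""
    return documents / docs_per_hour * hourly_rate


senior_training = review_cost(6_500, 60, 250)        # ~ £27,100
junior_second_pass = review_cost(34_750, 100, 100)   # £34,750
software_and_consulting = 9_375

assisted = senior_training + junior_second_pass + software_and_consulting
conventional = review_cost(150_000, 100, 100)        # £150,000

print(f"Assisted Review: £{assisted:,.0f}")                 # ~ £71,200
print(f"Conventional:    £{conventional:,.0f}")             # £150,000
print(f"Saving:          £{conventional - assisted:,.0f}")  # ~ £78,800
```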

 

Summary

Use of Assisted Review on this project saved the client close to £80,000 against the conventional review estimate (a cost saving of more than 50%).

Other Points Worthy of Note:

  • On this project it was decided that, due to the consistent accuracy of the software's categorisation of the non-responsive documents, it was not necessary to perform any sampling or further review of those documents. On other projects it may be considered worthwhile to sample that set further, or to run additional targeted keyword searches over it, as a way of providing confidence that no other highly relevant documents remain (a simple sampling sketch appears after these bullets).
  • The accuracy of the software also depends on what kinds of documents are found during the overturn analysis. For example, the overturn rate could be less than 1%, but if the overturned documents include important, key documents then that level of overturn would still be unacceptable. On this project the documents being overturned in the final rounds were analysed and were found to be only non-contentious and "grey-area" documents that would have been of little consequence to the case had the software not identified them as relevant. Arguably, in any human review there is a similar if not higher likelihood that a junior or contract reviewer would miss or incorrectly code documents in this category.
  • In the majority of cases a second-pass review of the documents categorised responsive will be necessary because (i) some non-responsive documents may still be included and (ii) there is also a need to consider whether any of the material is privileged, requires redaction, or whether other "family members" (typically, families of documents occur where there is an email with multiple attachments) need to be excluded.
  • The decision and approach in relation to the documents categorised by the software depends entirely on the nature of the dispute. There are other approaches (not used in this project) that suit circumstances where each and every document needs to be reviewed by a human; there, the assisted review process is simply used to accelerate the identification of the particularly relevant documents, not as a substitute for an exhaustive review of every document in due course. In such cases the software is typically used to surface highly relevant documents to the senior team early on (i.e. by reviewing only 10% of the documents, the highly relevant material can be gathered and presented for the purpose of drafting pleadings, witness statements, chronologies, etc.).
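
As a footnote to the first bullet above, the further sampling of the non-responsive set is straightforward to script. The sketch below is a hypothetical illustration rather than something done on this project, and the 400-document sample size is an assumption.

```python
# Hypothetical check on the documents the software marked non-responsive:
# draw a random sample, have a human review it, and report how many relevant
# documents slipped through.
import random


def draw_sample(non_responsive_ids, sample_size=400, seed=1):
    """Random sample of non-responsive documents for human spot-checking."""
    random.seed(seed)
    ids = list(non_responsive_ids)
    return random.sample(ids, min(sample_size, len(ids)))


def missed_relevant(human_coding):
    """human_coding: {doc_id: True if the human found the document relevant}.
    Returns the missed documents and the miss rate within the sample."""
    missed = [d for d, relevant in human_coding.items() if relevant]
    return missed, len(missed) / len(human_coding)
```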