Top 10 eDiscovery Terms You Should Learn Today

Apr 26, 2018 4:00:00 AM / by Jennifer Roberts


Basic & Essential eDiscovery Terminology, Complete with Definitions in Context

The modern practice of law is a labyrinth of documentation. Even average-sized cases can require litigators to review thousands of documents at a time, ranging from dense PDFs or complex image files to monotonous emails and endless social media posts.

In larger cases, the number of documents can reach the tens of thousands or higher, sometimes in a single gigabyte! And yet the attorney team is expected to review all of it.

Welcome to eDiscovery. It’s become a critical part of the legal profession, and yet there’s very little talk of it in law school.

Unsurprisingly, then, many JDs emerge from the bar exam with a passing score but virtually no clue what their more experienced colleagues mean when they say things like “de-duping” or “spoliation.”

Maybe you’re one of them.

Have no fear. We’re here with easy-to-understand definitions for the ten most common and important eDiscovery terms.

Mind you, there are hundreds of terms in the eDiscovery professional’s lexicon, and we could write a whole book on each of these ten alone. But concise definitions are a good place to start. Let’s dive in.

10 All-Important eDiscovery Terms & Definitions

Batch Processing

“Processing” is one of an eDiscovery professional’s essential tasks. It refers to collecting data, narrowing it down, indexing the data, converting it to an appropriate file format, and extracting metadata (among other things). “Batch Processing” simply means processing a large amount of electronic data in a single step.

Big Data

When litigators talk about “big data” in reference to large-scale corporate parties, they are usually describing a set of data so expansive that even the standard tools of batch processing and database management aren’t up to the task.

For example, imagine a hypothetical class-action lawsuit involving a giant social media company, streaming video service, or even a government agency like the NSA. These are tech-reliant organizations that amass tremendous amounts of data on many millions of users — sometimes with separate data entries regarding time, geolocation, user activity, and other information for each user each time they interact with the company.

In today’s increasingly digital world, big data litigation is becoming more common, and it represents a real challenge for even the most well-equipped law firms.

Fortunately, with the right processing and analysis tools in place (and the right eDiscovery professionals assigned to the job), it is entirely possible to tackle big data and come away with relevant, interesting, and ultimately useful insights.


Custodian

The custodian of a file is the individual with whom a specified file originated. The custodian isn’t always the author or creator of the content in the file or even of the file itself. Rather, “custodian” may refer to:

• The person who created the electronic file; or

• A person who controlled the electronic file (or filing system) from which the record was extracted.

When litigators refer to the custodian of a given file, they are usually using the term in the second sense.


De-Duping

When reviewing documents during the discovery process, it isn’t uncommon to find multiple files that are essentially duplicates of each other. De-duping (or de-duplication) is the process of removing these duplicate documents from view by either hiding or deleting them.

De-duplication isn’t just the technical process of hiding or deleting files, however. It also entails careful discretion by the eDiscovery professional, because removing a file that might be relevant to the case or a discovery request could lead to trouble (see “spoliation” below).

How do you define “duplicate”? What if two documents are 99% the same? 98%? 95%? These are the questions eDiscovery professionals must approach carefully.

Ultimately, de-duping can be a valuable means of reducing review time, minimizing discovery costs, and increasing compliance and consistency. But it is best left to experienced practitioners who are familiar with the various technical and ethical demands.
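To make the “how similar is similar enough?” question concrete, here is a minimal Python sketch. It is an illustration only, not how any particular review platform works. The function names, the use of SHA-256 for exact matches, the use of Python’s difflib for similarity scoring, and the 95% default threshold are all assumptions chosen for the example.

```python
import hashlib
from difflib import SequenceMatcher


def content_hash(text: str) -> str:
    """Return a SHA-256 digest of the document text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_exact_duplicates(docs: dict[str, str]) -> dict[str, list[str]]:
    """Group document IDs whose text is character-for-character identical.

    Nothing is deleted here: the function only reports groups so that a
    reviewer can decide what to hide.
    """
    groups: dict[str, list[str]] = {}
    for doc_id, text in docs.items():
        groups.setdefault(content_hash(text), []).append(doc_id)
    # Keep only the hashes shared by more than one document.
    return {digest: ids for digest, ids in groups.items() if len(ids) > 1}


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two texts."""
    return SequenceMatcher(None, a, b).ratio()


def is_near_duplicate(a: str, b: str, threshold: float = 0.95) -> bool:
    """Flag two texts as near-duplicates at an assumed threshold."""
    return similarity(a, b) >= threshold
```

Note that the sketch flags documents rather than deleting them, and that moving the threshold from 0.95 to 0.99 changes which documents count as “duplicates.” That choice is exactly the judgment call described above.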


DeNISTing

Before we get to DeNISTing, let’s define NIST. It stands for the National Institute of Standards and Technology, the organization responsible for maintaining the National Software Reference Library (NSRL). The NSRL acts as a “master list” of known computer applications, including common system files that might turn up during eDiscovery but that were not created by any user and so are rarely relevant to a case.

DeNISTing is the process of separating computer-generated or application-generated files (e.g., system files and program files) from potentially relevant user-generated files by using the NSRL master list.
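In practice, this kind of filtering is typically done by comparing a digital fingerprint (a hash) of each collected file against a list of fingerprints for known software files. The Python sketch below shows the idea under stated assumptions: `known_hashes` is a hypothetical stand-in for the NSRL reference data, the function names are invented for illustration, and SHA-256 is simply the hash chosen for the example.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of a file's raw bytes (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()


def denist(files: dict[str, bytes], known_hashes: set[str]) -> tuple[list[str], list[str]]:
    """Split files into (kept, set_aside) by matching against known hashes.

    `known_hashes` is a hypothetical stand-in for a reference list of
    known system and application files. Matching files are set aside,
    not deleted.
    """
    kept: list[str] = []
    set_aside: list[str] = []
    for name, data in files.items():
        if file_digest(data) in known_hashes:
            set_aside.append(name)  # known system or program file
        else:
            kept.append(name)  # potentially relevant user-generated file
    return kept, set_aside
```

Because the match is made on the file’s contents rather than its name, renaming a system file would not keep it in the review set, and a user document that happens to share a system file’s name would not be set aside.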


ECA

ECA stands for Early Case Assessment. It involves reviewing documents early in the litigation process to determine whether a case is worth taking to trial and, if so, how much discovery might cost and what it might entail. During ECA, an eDiscovery expert can help a firm make a careful risk analysis, set aside irrelevant files, and get a broad sense of the ESI involved (see “ESI” below).


ESI

ESI stands for Electronically Stored Information. It is a ubiquitous term in the world of electronic discovery, inclusive of (but not limited to):

• Audio / video files

• CAD files

• Databases

• Electronic documents

• Emails

• PDFs

• Scanned documents

• Social media posts

• Voicemail

• Websites and other web content

Native Format / Native File

A native file is one still in its original format.

eDiscovery experts frequently convert files to different formats. But in some cases, it is important to have a record of a file’s native format — or to preserve the native file itself — so that potentially relevant metadata or data attributes aren’t lost or obscured during conversion to a different file format.

Predictive Coding

After a document review professional has reviewed a set of data and tagged or coded its contents and metadata, he or she may then use a process called “predictive coding” to predict how additional data will be coded or tagged in the same legal matter. Predictive coding works by analyzing the coding work already done and extrapolating from those decisions.

It is important to note that while eDiscovery experts might use software to aid in making those extrapolations (sometimes referred to as “machine learning”), the term “predictive coding” refers to the process itself and not the software.
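For readers curious about what “extrapolating from those decisions” can look like, here is a deliberately tiny Python sketch. It uses a simple naive Bayes word-count model, which is one possible technique swapped in for illustration and not a description of what any real review tool does. The function names and the “responsive” / “not responsive” tags are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a simplistic assumption)."""
    return text.lower().split()


def train(coded_docs: list[tuple[str, str]]):
    """Learn word counts per tag from documents a reviewer already coded."""
    word_counts: dict[str, Counter] = {}
    tag_counts: Counter = Counter()
    for text, tag in coded_docs:
        tag_counts[tag] += 1
        word_counts.setdefault(tag, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, tag_counts, vocab


def predict(model, text: str) -> str:
    """Return the tag whose previously coded documents best match the text."""
    word_counts, tag_counts, vocab = model
    total_docs = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag in tag_counts:
        # Start from how common the tag was among the coded documents.
        score = math.log(tag_counts[tag] / total_docs)
        total_words = sum(word_counts[tag].values())
        for word in tokenize(text):
            if word in vocab:
                # Add-one smoothing so unseen words never zero out a tag.
                score += math.log(
                    (word_counts[tag][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag
```

A reviewer would first code a sample of documents by hand, pass those decisions to `train`, and then use `predict` to suggest tags for the remaining documents. The suggestions still need human quality checks, which is part of why the term refers to the process and not just the software.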


Spoliation

Spoliation refers to destroying, hiding, withholding, fabricating, or otherwise altering evidence that might be relevant to a case, usually during the discovery process. Spoliation is a serious legal violation. In some jurisdictions, it is a crime and can result in a prison sentence.

Attorneys and eDiscovery experts must be extremely careful to avoid spoliation during processes such as de-duping and DeNISTing. That’s because spoliation can happen by accident, and the “spoliator” may still be on the hook for acts of negligence or omission.

There’s a Lot to Know About eDiscovery

Electronic discovery has truly developed into a world all its own, complete with a language that might seem foreign, even to many attorneys. But there’s good news – law firms don’t have to manage the risks and challenges of discovery on their own. LexInsight connects litigators with experienced eDiscovery experts. Learn more.


Written by Jennifer Roberts

Jennifer Roberts is the Director of Marketing for LexInsight. When she's not spreading the word about LexInsight, you can find her running half marathons, dancing Argentine tango, or writing about wine.
