Insight into Name Normalization: What Is Your Name?

January 24th, 2022

Assigning a single unique name to identify an individual has created problems that predate e-discovery. Think of Santa Claus, St. Nicholas, St. Nick, Noel or simply Santa. These variations all refer to one individual, and yet our society has been using different names for him since before we were born. In e-discovery, identifying individuals with information relevant to a dispute is one of the first and most important steps at the inception of a case. As such, identifying the different names for these individuals is critical to ensuring we do not miss potentially relevant files.

In addition to the variation of names, the digital nature of information adds the complexity of format. The way names and email addresses are formatted in headers or other communications may differ, and those differences affect how a name is indexed and captured. Therefore, it is also important to identify and consolidate these format variations.

Where should we look for variations of names?

Different sources of ESI may display names differently. If you are planning to collect email, then you should ask for all relevant email addresses – both business and personal (if the employee uses personal email to conduct business). If you are planning to collect social media, then you will need the usernames or aliases for the different platforms. You will also need the network usernames and any other usernames that the company uses for the employee (including chat platforms and phone numbers for mobile devices). You’ll also need to know if the employee’s name has changed over time or if the employee is referred to by any nicknames.

Can we cross-reference names in an automated way?

E-discovery platforms have name normalization tools that isolate and consolidate information in email headers and other metadata. This is an automated process that scans and associates the name variants, aliases and email addresses for individuals referenced in the data set. These associations are captured and the output should be available when reviewing documents.
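To make the idea concrete, here is a minimal Python sketch of one way such an association could be built: it parses header values with the standard library and groups every display name seen under the lowercased email address it appeared with. The function name `group_name_variants` and the sample addresses are assumptions made for illustration; commercial platforms use their own, more sophisticated logic.

```python
from collections import defaultdict
from email.utils import getaddresses


def group_name_variants(header_values):
    """Map each lowercased email address to the display names seen with it."""
    variants = defaultdict(set)
    for display_name, address in getaddresses(header_values):
        if not address:
            continue  # skip entries that could not be parsed into an address
        bucket = variants[address.strip().lower()]
        name = " ".join(display_name.split())  # collapse stray whitespace
        if name:
            bucket.add(name)
    return dict(variants)


# Hypothetical From/To/Cc header values for illustration only.
headers = [
    "Nicholas Claus <S.Claus@northpole.example>",
    '"Claus, Nick" <s.claus@northpole.example>, Rudolph <rudolph@northpole.example>',
    "s.claus@northpole.example",
]
result = group_name_variants(headers)
```

Grouping on the address is only a starting point. Linking one person's business and personal addresses, or a maiden and married name, still requires the information gathered from the custodian.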

Is this a “data cleansing” step?

Data cleansing is the process of organizing data to appear similar across all records and fields by (1) detecting and correcting corrupt or inaccurate records from a data set; and/or (2) bringing together data of varying file formats, naming conventions and columns and transforming it into one cohesive data set (such as revising the abbreviations “st.” and “rd.” to be “Street” and “Road” throughout). Name normalization can definitely be a data cleansing step, but I recommend tackling names at the inception of a case so that your data identification, collection and review searches include all possible name variations for relevant individuals (e.g., last name changes after marriage/divorce, nicknames and app-specific usernames).
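The abbreviation example can be sketched in a few lines of Python. This is a simple illustration under stated assumptions, not a description of any particular tool: the `ABBREVIATIONS` table and the `expand_abbreviations` function are hypothetical names, and the rule is meant to run only on address fields.

```python
import re

# Hypothetical lookup table; a real project would maintain a longer one.
ABBREVIATIONS = {"st": "Street", "rd": "Road"}

# Match "st" or "rd" as a whole word, with an optional trailing period.
_PATTERN = re.compile(r"\b(st|rd)\b\.?", re.IGNORECASE)


def expand_abbreviations(address_text):
    """Replace street-type abbreviations with their full form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], address_text)


cleaned = [expand_abbreviations(a) for a in ["12 Main St.", "45 Oak rd", "3rd Floor"]]
```

Note that the same rule applied to a name field would turn "St. Nicholas" into "Street Nicholas", which is one reason cleansing rules should be scoped to the fields they were designed for.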

Is the “normalization” process only for names?

The algorithm that evaluates the email headers for names and aliases also evaluates the entities that the person belongs to and standardizes the names of those entities as well. Identifying entities is extremely helpful when analyzing communications and using data visualization tools.
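How a given platform derives entities is not specified here. One plausible signal is the domain of the email address, and the short Python sketch below assumes exactly that. The `ENTITY_ALIASES` table, the `entity_for` function and the sample domains are all hypothetical.

```python
# Hypothetical table mapping known domains to one standardized entity name.
ENTITY_ALIASES = {
    "acme.example": "Acme Corporation",
    "mail.acme.example": "Acme Corporation",
}


def entity_for(address):
    """Return a standardized entity name for an email address.

    Falls back to the bare domain when the domain is not in the table.
    """
    domain = address.rsplit("@", 1)[-1].strip().lower()
    return ENTITY_ALIASES.get(domain, domain)


entities = [entity_for(a) for a in ["Jane.Doe@Mail.Acme.example", "bob@other.example"]]
```

Personal webmail domains would need to be excluded from a table like this, since they say nothing about the organization a person belongs to.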

What are the benefits of name normalization?

Name normalization:

Builds an inclusive set for key custodians.
Allows visualization tools to display information in an organized way.
Is the basis for communications analysis and the identification of additional custodians, search terms and relevant time periods.
Optimizes privilege and second level reviews.
Can be used as a tool to prioritize review based on custodians or potentially privileged data sets.
Optimizes search execution and results.

What are the drawbacks of name normalization?

The main drawback is that the normalization process may not include all variations because not all names have the same structure. In the US, for example, some names include middle names, a hyphenated structure, the same first and last name or just initials instead of names. A complete set of variations should also account for variations in the first name (e.g., Elizabeth, Beth and Betty) and nicknames. Additionally, not all countries or cultures use the same rules for names. For example, in some countries, marriage and divorce may result in someone’s last name being changed in a variety of ways; however, this may not be the case in other countries.
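One way to compensate is to generate candidate variations yourself and feed them into your searches. The Python sketch below is a rough illustration that assumes a Western "first last" structure and a non-empty first name; the `NICKNAMES` table and the `candidate_names` function are hypothetical, and the output would need review before use.

```python
from itertools import product

# Hypothetical nickname table; extend it per custodian interview.
NICKNAMES = {"elizabeth": {"beth", "betty"}}


def candidate_names(first, last):
    """Build lowercase name variations for use as search terms."""
    first, last = first.strip().lower(), last.strip().lower()
    # Full first name, initial, and any known nicknames.
    firsts = {first, first[0] + "."} | NICKNAMES.get(first, set())
    # Full last name plus each part of a hyphenated last name.
    lasts = {last} | set(last.split("-"))
    terms = set()
    for f, l in product(firsts, lasts):
        terms.add(f"{f} {l}")   # e.g. "betty jones"
        terms.add(f"{l}, {f}")  # e.g. "jones, betty"
    return terms


terms = candidate_names("Elizabeth", "Smith-Jones")
```

A generator like this over-includes by design, and it does nothing for naming conventions that do not follow the first/last pattern, so it supplements rather than replaces what the custodian tells you.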

Conclusion – Use it or Not?

Name normalization is a process that should be implemented in your e-discovery databases as it reduces redundancy, improves the quality of the metadata and optimizes document review.

DISCLAIMER: The information contained in this blog is not intended as legal advice or as an opinion on specific facts. For more information about these issues, please contact the author(s) of this blog or your existing LitSmart contact. The invitation to contact the author is not to be construed as a solicitation for legal work. Any new attorney/client relationship will be confirmed in writing.

