Home /Duplicates are Not Your Friend

Duplicates are Not Your Friend

Every data set that I work with needs to be “fixed” in some way shape or form. I may need to pull out invalid characters, merge certain data points or format cells properly for analysis. There is one operation that I use pretty frequently it is removing duplicates. In the field of database design, this is known as “normalization”. It this process of removing redundant data. If you are working in Excel there is a readily available tool that can assist this task.

Table with duplicate rows

Table with duplicate rows

Here we have a data set containing repeating employee names and ID’s.  In total there are 384 rows in this data set.  In order to get a condensed list showing one occurrence for each name and ID, we need to do the following.

  1. Highlight the table the entire table
Duplicate Row Path

Duplicate Row Path

3. Click the Data menu in the ribbon

4. Click Remove Duplicates

Duplicate row result

Duplicate row result

Note: A message box will appear letting you know how many duplicates were removed and the number of unique values remaining.  It is wise to review the number to see if it makes sense.

5. Click Ok

These brief few steps can save you hours summarizing data.




About the author



Othniel Denis is currently Principal of Excellent Ones Consulting LLC. He brings with him a background in finance, training, and information management systems. Prior to opening his own firm, Othniel spent the last 13 years in finance as an Analyst at organizations like New York University, Nassau County Government and Brookhaven National Laboratory to name a few...

Read More

Comments (0)

New Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Latest Tweet