Every data set that I work with needs to be “fixed” in some way shape or form. I may need to pull out invalid characters, merge certain data points or format cells properly for analysis. There is one operation that I use pretty frequently it is removing duplicates. In the field of database design, this is known as “normalization”. It this process of removing redundant data. If you are working in Excel there is a readily available tool that can assist this task.
Here we have a data set containing repeating employee names and ID’s. In total there are 384 rows in this data set. In order to get a condensed list showing one occurrence for each name and ID, we need to do the following.
- Highlight the table the entire table
3. Click the Data menu in the ribbon
4. Click Remove Duplicates
Note: A message box will appear letting you know how many duplicates were removed and the number of unique values remaining. It is wise to review the number to see if it makes sense.
5. Click Ok
These brief few steps can save you hours summarizing data.