How to find duplicates in Excel


redo-remove-duplicates-excel
Image: Viktor Pazemin/Adobe Stock

In the duplicate world, definition means everything. That’s because a duplicate is subjective to the context of its related data. Duplicates can occur within a single column, across multiple columns or complete records. There’s no one feature or technique that will find duplicates in every case. In this article, I’ll show you how to find duplicates in Excel.

SEE: Windows, Linux, and Mac commands everyone needs to know (free PDF) (TechRepublic)

I’m using Microsoft 365 on a Windows 64 bit-system, but you can use earlier versions. For your convenience, you can download the demonstration .xlsx and .xls files. Excel for the web doesn’t support advanced filters and limits formatting, but you can apply conditional formatting rules.

How to filter for duplicate records in Excel

Perhaps the easiest way to find duplicates is to use Excel’s advanced filter. It’s flexible and can easily find duplicate rows. What you do with the result is up to you. For example, we’ll use an advanced filter to copy the resulting records — sans duplicates to another location. That way you still have the original data and a separate set of unique records.

In this situation, the word find is a bit misleading. This feature won’t find the duplicates: It will filter them from the results, giving you a unique set of records.

Let’s look at a sheet that has two records that are duplicated in a Table object (Figure A). I recommend using Tables, but this feature will work with an ordinary data range. Even in a small sheet, finding duplicates visually is a bit of a task, and you’re apt to fail. To temporarily remove duplicates from the data set, use Excel’s advanced filter feature as follows:

  1. Select any cell inside the data set.
  2. Click the Data tab and then click Advanced Filter in the Sort & Filter group.
  3. Select Copy To Another Location in the Action section.
  4. Check the List Range to make sure Excel correctly references the original data.
  5. Enter a copy range in the Copy To control. I chose H2.
  6. Check Unique Records Only (Figure B) and click OK.

Figure A

ExcelDuplicateRedo A
Image: Susan Harkins/TechRepublic. These Microsoft Excel records have two duplicates in the Table.

Figure B

ExcelDuplicateRedo_B
Image: Susan Harkins/TechRepublic. Be sure to select the Check Unique Records Only option.

Figure C

ExcelDuplicateRedo_C
Image: Susan Harkins/TechRepublic. The filtered list is short two records because this Excel feature removed duplicates.

Excel will copy a filtered list of unique records (Figure C) to the range you specified in Step 5. At this point, you can replace the original data with the filtered list if you want to remove duplicates. As a rule, I don’t recommend deleting data, even if you think you’ll never refer to it again, but that’s up to you.

One thing you might not notice right away is that the Commission values in the filtered set are literal values. In the original data, that column contains an expression. Be on the lookout for these types of issues — if you plan on using the filtered set going forward, you must replace the values with the expression so new records will correctly calculate the commissions.

How to format duplicate values in Excel

Finding duplicates in a single column or across multiple columns is a bit more difficult than filtering for an entire record. Using Excel’s conditional formatting to highlight duplicates in a single column is one way to find them quickly, although there’s less motive to delete duplicates in this situation. Let’s format duplicate commission values as follows:

  1. Select cell F3:F13.
  2. On the Home tab, click Conditional Formatting in the Styles group.
  3. Choose New Rule in the dropdown.
  4. In the top pane, select the Use A Formula to Determine Which Cells to Format option.
  5. In the lower pane, enter =COUNTIF(F:F,F3)>1. (The period at the end is grammatical and not part of the Excel formula.)
  6. Click the Format button, click the Font tab, choose Red, and click OK (Figure D).
  7. Click OK to return to the worksheet.

Figure D

ExcelDuplicateRedo_D
Image: Susan Harkins/TechRepublic. Enter the Excel formula and choose a format.

Figure E

ExcelDuplicateRedo_E
Image: Susan Harkins/TechRepublic. The conditional formatting rule highlights duplicate commissions.

The conditional format will highlight any value in column F that’s repeated (Figure E). The Excel function, COUNTIF() returns a conditional count. In this case, it compares the current commission to all the other commission values and returns True if there’s more than one. If you want Excel to highlight only the copies, leaving the first occurrence of the value unaltered, enter the Excel formula =COUNTIF($F$3:$F3, F3)>1 in Step 5.

This conditional rule works great for a single column. How can we find duplicate values across multiple columns? For this task, we’ll use two Excel helper formulas: One to concatenate the columns you’re comparing; a second to count the duplicates. Let’s suppose you want to find duplicates for the name and commission. To begin, enter the first expressions into H3 and copy to the remaining cells:

=Commissions8[@Personnel]&Commissions8[@Commission]

The structured referencing is the result of using a Table object to store the data. If you’re using a data range, enter =D3&F3.

Next, in cell I3 enter the following Excel formula and copy it to accommodate the remaining list:

=IF(COUNTIF(H3:H13,H3)>1,"Duplicate","")

There are now two helper columns in place (Figure F). The Excel IF() function returns “Duplicate,” when the Excel function, COUNTIF() finds more than one occurrence of the concatenated values in column H.

Figure F

ExcelDuplicateRedo_F
Image: Susan Harkins/TechRepublic. We’ll base a conditional formatting rule on column H.

You could stop here or apply a new conditional format based on the Excel formula in column I to highlight duplicates as follows:

  1. Select cell B3:F13.
  2. On the Home tab, click Conditional Formatting in the Styles group.
  3. Choose New Rule in the dropdown.
  4. In the top pane, select the Use A Formula To Determine Which Cells To Format option.
  5. In the lower pane, enter =$I3="Duplicate".
  6. Click the Format button, click the Font tab, choose Red, and click OK (Figure G)
  7. Click OK to return to the worksheet.

Figure G

ExcelDuplicateRedo_G
Image: Susan Harkins/TechRepublic. Enter the rule and format.

The conditional rule highlights the first occurrence of a duplicate in both the Name and Commission columns (Figure H). Because the rule formats the entire record, users might assume that the entire record has a duplicate, so that would require a bit of training. In the second helper formula, you could display DuplicateNameCommission instead of only Duplicate, but that seems a bit over the top.

Figure H

ExcelDuplicateRedo_H
Image: Susan Harkins/TechRepublic. Two records have duplicate values in both the Personnel and Commission columns.

None of these techniques works with all types of duplicate data. Each situation, whether you’re comparing an entire record, a single column or multiple columns will dictate which method you choose. There are other ways to find duplicates, but these methods are quick and easy.

Editor’s note: This article has been updated.



Source link

Spread the love