Duplicate Detection in Excel: The Case-Sensitive Conundrum

Photo of author
Written By Chris

Just a simple "data guy". Changing the world one formula at a time.

Managing large datasets in Microsoft Excel makes the seemingly difficult task a quick possibility, including identifying and removing duplicate entries. One common question that arises when using the Remove Duplicates feature is whether it’s case-sensitive. In other words, will the tool consider “Smith” and “smith” as duplicates or different values?

To clarify this issue, Excel’s Remove Duplicates feature is, by default, not case-sensitive. It means that if you have two entries like “Apple” and “apple” in your dataset, Excel will treat them as duplicates and remove one.

However, there are methods to make the comparison case-sensitive if needed. Stick with me, and you’ll soon learn the easiest way of tackling Excel’s Remove Duplicates functionality and understand the difference between case-sensitive and case-insensitive removals.

The Excel Remove Duplicates Function

If you’re unfamiliar with Excel’s ‘Remove Duplicates’ feature, it’s available in Microsoft 365 and Excel 2007, 2010, 2013, 2016, 2019, and 2021. To use this function, follow these steps:

  1. Select the data in your table that contains duplicate values.
  2. Go to the ‘Data’ tab in the Excel toolbar.
  3. Click on ‘Remove Duplicates.’
removing duplicates in excel is not case sensitive

By doing this, Excel will remove duplicate values based on the selected columns, making it easier for you to work with unique data. But as you’ve learned, case sensitivity is a nonfactor in Excel’s eyes.

In the image below, you’ll see that after I created a conditional formatting rule to show duplicates, every item on my list was highlighted. Yikes!

duplicates with conditional formatting

So what are your options? How can you solve this problem without painfully searching row by row?

How to Remove Case-Sensitive Duplicates in Excel

Removing case-sensitive duplicates in Excel is relatively straightforward. You can achieve it with minimal hassle with a few extra simple steps. There are several methods to do this, including using conditional formatting, array formulas, and even Kutools and VBA. However, let’s stick to the basics to keep our solution simple. 

Using Formulas to Remove Case-Sensitive Duplicates

This is a great time to practice your nested Excel formulas on a data query everyone deals with. We can identify case-sensitive duplicates in Excel lickity split by using a helper formula!

The best and easiest method to remove case-sensitive duplicates in Excel is to use a nested formula consisting of SUMPRODUCT and EXACT. For my example dataset, the correct syntax is:

=SUMPRODUCT(–EXACT($G$3:$G$11,G3))>1

correct case sensitive formula

Dissecting The Solution Formula

In simpler terms, this formula checks if the value in cell G3 appears more than once in the range G3:G11 (including itself), taking into account the case sensitivity of the text. If it does, it returns TRUE, indicating that it’s a case-sensitive duplicate.

But let’s take each section of the formula and analyze what is happening through the process.

The SUMPRODUCT function calculates the sum of the products obtained by multiplying corresponding values from two or more arrays. In this formula, we’re using it to compare the values in the range G3:G11 to the item in cell G3, or our lookup value.

The double negative sign (–), also known as the double unary, converts the result of the EXACT function to a number. The EXACT function analyzes two text strings and returns TRUE if they are duplicates and FALSE otherwise.

The EXACT function is comparing the value in cell G3 to each value in the range G3:G11, one at a time. If the values are exactly the same (including capitalization), it returns TRUE; otherwise, it returns FALSE.

The SUMPRODUCT function then adds up all the TRUE and FALSE results returned by the EXACT function. Since TRUE is treated as 1 and FALSE is treated as 0 in Excel (thanks to the double negative), the result will be the number of times the value in cell G3 appears in the range G3:G11 (including itself).

Finally, the formula checks if the result of the SUMPRODUCT function is greater than 1. If it is, it means the value in cell G3 appears more than once in the range G3:G11 (including itself), and therefore it’s a case-sensitive duplicate. The formula returns TRUE if this is the case and FALSE if not.

Since we applied absolutes to our reference array, by dragging the formula down, the lookup value, or G3 in this example, changes to check each cell against the others in the list.

To get a TRUE value, I changed one of my items to match the case of another.

true or false result of formula

Filtering and Removing Duplicates

Now that we’ve identified our duplicates in the list, let’s use a filter to remove all values except those that are unique.

To do this, select any cell in your data and hit ALT + A + T to apply filters to your headers. Now filter your formula column to only show the values of TRUE, or your duplicates.

filtered on true

Next, select ONLY the visible cells of your list, NOT the formula column.

We will use the Remove Duplicates function, and even though the helper column shows TRUE for your duplicates, Excel won’t remove them since the formulas are unique.

With your duplicate list items chosen, select’ Remove Duplicates’ under the Data tab. A new window will appear, asking if you want to ‘Expand the selection.’ Unselect this option and choose ‘Continue with the current selection.’ Otherwise, Excel will revert back to deleting all duplicates without factoring in the case sensitivity.

removing case sensitive duplicates

Click ‘Remove Duplicates’ in the window and ‘Ok’ in the following. Now unfilter your Formula column, and you should now find your list full of unique values and no more instances of TRUE in your helper column.

unique values list

Sure, you could have just deleted that row, and that’s what I would have done. But it’s important to understand the steps as it will save you a ton of time when working with a much larger set of data.

Give yourself a round of applause! Nested functions can be confusing, but you just learned a great way to utilize them for a task I experience daily. Bye bye case sensitive duplicates ✌

Leave a Comment