Dealing with duplicates

A few customers have recently contacted us with examples of duplicate data being entered into the system by different people. We thought it would be good to try to address this for everyone at a system level.

Examples could be

  • As a purchaser, you start dealing with a new supplier and enter them onto the system, without realising that the finance dept. has already put their details in
  • You’re running marketing and sales campaigns from your CRM system. The marketing dept. import a list of 2000 of your top target companies, whilst salespeople enter individual companies they deal with that are already on that list
  • You’re a pastries manufacturer and there’s a sudden craze for cronuts – half a dozen customers ask if you can make them and a development recipe gets entered multiple times

Now in a perfect world, none of these scenarios would happen – people would check for existing records before adding new ones and everyone would spell names exactly the same.

In real life, it can happen from time to time and when it does, it can be really annoying and cause a lot of work for staff to sort out. It’s not just a matter of removing duplicates, because once, say, a company is in the system, all sorts of things get linked to it, like invoices, opportunities and contacts, all of which have to be unpicked.

So what do we do about it?


One thing that we’re introducing is a feature of the agileBase platform that automatically tells you, as you type, if there are any similarly named items already in the system. Here, for example, I’ve started typing ‘Invest Bath Bristol’ and the system has let me know that there’s already a company in there called ‘Invest Bristol Bath’, which is actually the one I’m thinking of.

This helps get around the fact that people often refer to company names slightly differently, so even if you force a field to be unique, that check won’t always catch near-duplicates.

How does it work?

Through an interesting and clever idea known as ‘trigrams’.

A trigram is a group of three consecutive characters taken from a string. We can measure the similarity of two strings by counting the number of trigrams they share. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages.
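
To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not the code agileBase runs: the function names are made up for this example, and the details (lower-casing, padding each word with spaces, and scoring as shared trigrams divided by all distinct trigrams) are assumptions chosen to resemble how database trigram matching commonly works.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of three-character groups in text, taken word by word."""
    grams = set()
    for word in text.lower().split():
        # Assumption: pad each word with two leading spaces and one trailing
        # space, so that the start and end of a word produce trigrams too.
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Score two strings from 0.0 (nothing shared) to 1.0 (same trigram set)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    # Shared trigrams divided by all distinct trigrams across both strings.
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(similarity("Invest Bath Bristol", "Invest Bristol Bath"))
    print(similarity("Invest Bristol", "Invest Bristol Bath"))
```

Because this sketch collects trigrams one word at a time, the same words in a different order produce the same set of trigrams, so ‘Invest Bath Bristol’ and ‘Invest Bristol Bath’ score 1.0 here even though a strict uniqueness check would treat them as different values. A partial match such as ‘Invest Bristol’ scores lower but still well above zero.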

The facility to compare text snippets like this is built into the PostgreSQL database that agileBase uses to store and query data. For more details, see the database documentation.

Activating the feature

This facility will work on any text field that has the options ‘use as record title’ and either ‘prominent’ or ‘required’ ticked in the field options. An administrator can set this up.

A starting point

On its own, this feature is unlikely to solve the duplication problem in all cases. However, as usual, we like to give customers useful features as soon as they’re ready, while acknowledging that they will continue to improve. This isn’t the final step.

Next steps for this feature might be to allow you to click on a suggested similar record to navigate to it, while deleting the new record that you started to type. However, we want to put plenty of work into planning and testing before releasing that update!

If you have any feedback, please let us know at [email protected]



Source: Agilebase