Parts List Matcher

Parts List Matcher

Clean up vendor data by using a fuzzy match algorithm to reconcile make/model names.

Challenge

The client’s team maintained constant contact with manufacturers and vendors, passing parts lists back and forth. They needed to match these up against internal records to make sure everything was accounted for.

Frustratingly, the Manufacturer and Model would often come in slightly different.

For example, these two should be the same:

And these models are the same:

Another example:

So if you see the part DC 1200, which isn’t an exact match from your master part list, you’d have to do a lot of manual eye-checking or subjective Ctrl + F searches. Now imagine doing this hundreds of times every day.

Solution

In data cleaning work, we usually perform multiple passes.

The easiest first pass is to remove unexpected whitespaces, and remove common suffixes (such as INC, LLC, etc. in manufacturer names).

From there, because a given part is always tied to a manufacturer, the next step was to concatenate the manufacturer with the model to create a unique ID/key for each entry.

Then, I created several LAMBDA functions that calculated “closeness” scores based on dynamic parameters, such as the source string length and how strict the matching should be.

Sample output of fuzzy matching algorithm

Finally, a comparison sheet would output an aggregated score (based on combining various matching techniques and parameters) with a list of the system’s best guesses.

Outcome

The client now has a simple Google Sheets template where they can paste incoming data, and see the fuzzy matches highlighted within a matter of seconds.

No Google Apps Script or fancy Python scripting required!

Everything is good so far and I don’t have any changes to make on the file.

Need something similar?

Get in touch:

See All Projects