card-1673581_640

How to Parse Data in Excel: DBA Names

If you deal with credit card transactions a lot then you know that trying to get the vendor name out of the descriptions can be challenging to say the least. Companies can sometimes cram a lot of information into their doing business as (DBA) name that can show up on your statement. Information such as store number or even the contact phone number can show up in there. The problem is that when trying parse this data in Excel, it can get a little messy.

However, I’ll show you how you can pull valuable information out of the data just by using a formula. Because there are no rules around DBA names, there’s no way that will work 100% as ultimately it’s up to the company to determine how the data shows up.

Quick side note – I’ve seen some interesting variations when it comes to DBA names, especially on corporate cards when some less-than-savory adult establishments tried to decoy themselves as restaurants. For example, it took some digging when I was examining expense reports to find out that a DBA name of Glenarm Restaurant in Denver was actually a strip club, the Diamond Cabaret. So yes, there’s not even a guarantee that the DBA name will reflect what’s even the name of the company. It’s a clever (and sleazy) way for an establishment to disguise itself as something else, especially say if someone’s spouse were to catch glimpse of their credit card statement…

…And we’re back from commercial break.

So as you can see from the above example, there’s an inherent limitation when it comes to using DBA names. However, we can still find ways to pull useful data from the majority of names that have at least predictable patterns and honest names in their descriptions. Here are some items from my latest credit card statement:

When it comes to parsing data it’s all about patterns. And there are a few things that stand out from the above list.

  • The vendor name shows up at the beginning of the description.
  • For those that have a store #, anything to the right of it is store and location level data and unnecessary for pulling the vendor name.
  • For those that don’t have a # we can usually just take the first couple of words as in most cases a vendor name won’t go to three words, and at the very least ,two should be enough for us to figure out who the vendor is.

So what we’ll have to do now is to build a formula that accounts for these assumptions to effectively create a formula that will parse it out.

First, let’s create a formula that will pull everything up until the # sign. A good test will be the fast food transactions, which typically have store numbers. To do this I’ll rely on the MID function as well as the FIND function. Here’s how it looks like:

=TRIM(MID(B1,1,FIND(“#”,B1,1)-1))

In the above example, assume that column B is where the data is stored in. Since the name begins in the first position, the second argument is the number 1, and the third argument is up until where the # sign is and that’s the point of the find function. A -1 is deducted to ensure the # sign itself isn’t included. That way, the description will grab everything up to just before the # sign. To clear off any trailing space, we can also the TRIM function to make sure no extra spaces are included in the results.

That takes care of the descriptions that have a # sign, but for the ones that don’t, it’ll result in an error. This is where we’ll want to create another formula to pull the first two words. The formula for that looks like this:

=MID(B1,1,FIND(” “,B1,1+FIND(” “,B1,1))-1)

This is a bit more complicated of a formula because it’s looking for the second space within the string. To do that, I have to the second FIND function picking up where the first one left off at is it starts looking for the blank cell after the first one is found (this is why the starting point is 1+ where the first blank cell is found). For the length of the string, here too we’ll want to add a -1 to the end to make sure that the blank space isn’t picked up.

So now we’ve got two formulas, and the next step is going to be to combine them into one. To do that, I’ll add an IF function to say if there is a # sign found within the text, to use the # sign formula, otherwise to use the two word formula. The IF condition looks like this:

=IF(ISNUMBER(FIND(“#”,B1,1)),1,0)

I will now replace the 1 or true argument with the # formula, and the 0 with the two word formula. And here’s what we’ve got so far:

=TRIM(IF(ISNUMBER(FIND(“#”,B1,1)),MID(B1,1,FIND(“#”,B1,1)-1),MID(B1,1,FIND(” “,B1,1+FIND(” “,B1,1))-1)))

The one change I made was I moved the TRIM function at the beginning of the formula to include all the arguments.

Lastly, I’ll want a catch all just in case I run into a DBA name that has fewer than two spaces in its name and no # signs. To do this, I’ll just use an IFERROR function at the beginning and the default will then be to just use the entire cell’s contents:

=IFERROR(TRIM(IF(ISNUMBER(FIND(“#”,B1,1)),MID(B1,1,FIND(“#”,B1,1)-1),MID(B1,1,FIND(” “,B1,1+FIND(” “,B1,1))-1))),B1)

It’s gotten to be a pretty big formula but this will do a pretty good job of getting the vendor names pretty well. Like I mentioned, it won’t be perfect, but at the very least it’ll do a good job of pulling out most of the vendor information that you need.

In my example, the DBA name was in cell B1, but you can just do a quick find and replace on this formula to adapt to your data set and then just copy the formula as you need it.

Here’s how my data looked like using this formula:

Looks like a pretty good job by the looks of it. At the very least, the vast majority look to be unique enough in case I wanted to group them.

This is part of an upcoming template that I’m working on to help quickly analyze expenses, and a key part of that is being able to efficiently pull out vendor data from your statement.

If you have any questions, comments or suggestions please let me know.

Tags: No tags

Add a Comment

You must be logged in to post a comment