howtofilterdataformulas.png

Dynamically Filter Data Using Only a Formula

In my previous post I went over advanced filters in Excel. This time around I’ll go over how to achieve the same result using just a formula. No macros, no VBA, just through a not-so-simple formula that can dynamically update based on your selections.

I’m again going to use my sample database file for this example. Here’s an excerpt of what that looks like:

Filtering based on one criteria


I’m going to start by filtering all that entries for a specific sales rep.

First, I’m going to use the INDEX function to select the range from where I’m pulling data from.

=INDEX(SampleDatabase!$A$1:$G$1000

For my results, I’m going to want them to show up in the order they appear in the database. For example, in the excerpt above Rep D shows up on lines 3 and 5, and I want that same order to stay intact.

In order to do this, I’m going to use the IF, SMALL and ROW functions, which will be inserted in the INDEX function.

SMALL(IF(SampleDatabase!$C$1:$C$1000=$I$2,ROW(SampleDatabase!$A$1:$A$1000),””),ROW(A1)),1)

$C$1:$C$1000=$I$2 : In this argument, assume that $I$2 is where I have my sales rep name, in this case it would be Rep D. Because I’m only interested in rows that relate to Rep D, this is the main argument that I want to evaluate.

ROW(SampleDatabase!$A$1:$A$1000), “”) : This will return the row number if the above argument is true. It doesn’t matter whether I reference column A, B, C or any other since I’m only pulling the row number. If it isn’t a match, the result will be a blank value.

ROW(A1) : This returns a value of 1, and what this will accomplish is that it will pull the smallest row number from the above list. For instance, for Rep D we know that lines 3 and 5 will be a match, but the smallest number, or the first time that there is a match, is 3. As I drag this formula onto subsequent lines, the row number, because it isn’t frozen, will change and on line 2 it will pull the second smallest row number, on the third line it will be the third smallest, and so on.

The last argument is which column you want to extract. I left it as 1, and that will return the date since that is the first column in my INDEX argument. However, if I wanted to pull the total sales, I could change that to 7, since that would indicate column G, which is the seventh column in the data set that I specified.

The completed formula will look as follows:

=INDEX(SampleDatabase!$A$1:$G$1000,SMALL(IF(SampleDatabase!$C$1:$C$1000=$I$2,ROW(SampleDatabase!$A$1:$A$1000),””),ROW(A1)),1)

This formula will need to be entered as an array, so be sure to hit CTRL+SHIFT+ENTER.

The first five results look as follows:

The one caveat is that if you don’t know how many entries you’ll have and copy the formula down too far, you’ll inevitably end up with #NUM! errors because the formula has not found any more matches. What you can do in this case is use the IFERROR function and include it in the formula:

=IFERROR(INDEX(SampleDatabase!$A$1:$G$1000,SMALL(IF(SampleDatabase!$C$1:$C$1000=$I$2,ROW(SampleDatabase!$A$1:$A$1000),””),ROW(A1)),1),””)

What this will do is now show a blank value if there are no more matches.

Filtering for multiple criteria


While it’s nice to be able to filter for just one criteria, what if you wanted to look for the entries with multiple conditions? Although this makes our already long formula even longer, it is still possible.
Much of the formula stays the same, and the key to making it work is by changing the first argument in the IF statement. Previously, It was only looking for the Sales Rep to be a match:
$C$1:$C$1000=$I$2

I’ll add another criteria, this time for records that include Product E, and I’ll put the product criteria in the cell below in I3. I will add the following to the formula:
$D$1:$D$1000=$I$3
How I combine the two arguments is by multiplying them by one another:
($C$1:$C$1000=$I$2)*($D$1:$D$1000=$I$3)=1

I add the =1 at the end because if both conditions are true then they will result in a 1 value for that line. For example, in the first condition it will look at whether the sales rep is a match, if it is the value will be true (or 1), and if not, it will be false (or 0). The same will happen if the product matches. 
Therefore, if either one of those conditions is false then a 0 will be returned and the two conditions multiplied against one another will not equal 1.
Below is how the new formula looks:
=IFERROR(INDEX(SampleDatabase!$A$1:$G$1000,SMALL(IF(($C$1:$C$1000=$I$2)*($D$1:$D$1000=$I$3)=1,ROW(SampleDatabase!$A$1:$A$1000),””),ROW(A1)),1),””)
You’ll notice much fewer matches in column L (multiple criteria) than in column J where I was only looking for the sales rep to be a match.
If you go back to the original excerpt I showed, you’ll see that for the 3/21/2017 entry, it was for both Rep D and Product E. If I change the values in column I then my calculations will adjust accordingly.

parsingdata.png

How to Parse Data to Make it More Useful for Data Analysis

how to parse data

In a previous post I have gone over how to use LEFT and MID functions for parsing data, but in this post I’ll go through a specific example from start to finish.

I am going to pull my data from the citymayors website, url as follows:

http://www.citymayors.com/features/capitals.html

The data on this page looks like this:

original data

At first glance this isn’t the most useful data that you can paste into an Excel spreadsheet but I’ll show you how it can be made more usable. First off I will copy the entire data set into a spreadsheet.

original data spreadsheet
It copies in much the same as how it looked on the webpage. The problem is it is not in a format that you can do any analysis on. The structure it currently follows is Country: City (population). The more consistent the data is, the easier it is pull the information out. In this sample there are some inconsistencies but for the most part it follows a logical pattern.
I am going to make the following columns: Country, City, and Population. 
I’ll start with the Country column. For this field I can use the LEFT function. However the country names range in length so I can’t simply take the first x amount of characters. Instead, I have to look at where the colon shows up and stop one character before that. 
I am going to start will cell B2 to analyze cell A2. In order to find the colon I can simply use the FIND function. The formula for this will look as follows:
=FIND(“:”,A2,1)
I insert this formula into the LEFT function so that I get the following
=LEFT(A2,FIND(“:”,A2,1)-1)
What this formula does is look at cell A2, and pull characters until one before the colon (since I don’t want to actually include the colon). I will make one additional adjustment to avoid errors and that is if the cell in A2 is fewer than two characters it will return a blank (rather than an error since it would not find a colon). The formula to check for a length greater than two characters is this:
LEN(A2)>2
Inserted into my earlier formula:
=IF(LEN(A2)>2,LEFT(A2,FIND(“:”,A2,1)-1),””)
This qualifies the cell first by saying only if it is more than two characters long will my formula try to pull data out, otherwise it will leave it as blank (“”). I will copy this formula all the way down my country field. This is what my spreadsheet looks like now:
parsing country data
Next up is the City field. This one is going to be a bit more difficult because I can’t start from the left and have to use the MID function where I will need to search for both the colon (my starting point) as well as the bracket that starts the population field (my ending point). In the MID function I need to specify the start and end point, whereas with the LEFT function it started from the first character in the cell.
The first formula I need to make is to get my starting point. But I’ve already done that in the country field, I can just copy the FIND formula from earlier:
FIND(“:”,A2,1)
In this case I will want to add +2 to the end of it so that it skips over the blank space after the colon and starts at the first character of the city name. My formula currently looks like this:
=MID(A2,FIND(“:”,A2,1)+2
Next I need to find the end point, and similarly I can use the FIND function to find the opening of the parentheses. The formula for this is similar to my earlier one:
=FIND(“(“,A2,1)
I will want to subtract two characters from this so that I do not include the open parenthesis character or the empty space before it. If I insert this formula into the MID function I now have the following:
=MID(A2,FIND(“:”,A2,1)+2,FIND(“(“,A2,1)-2
The problem with this is finding the ( character does not tell me how long the city field is. To get the actual length of the field, I need to subtract the starting point of the field, which is again using the earlier formula to find the colon. My adjusted formula looks like this:
=MID(A2,FIND(“:”,A2,1)+2,FIND(“(“,A2,1)-2-FIND(“:”,A2,1))
I will add the same qualifier to check for a length of two or more characters. My updated formula:
IF(LEN(A2)>2,MID(A2,FIND(“:”,A2,1)+2,FIND(“(“,A2,1)-2-FIND(“:”,A2,1)),””)
If I copy this formula down my spreadsheet now looks like this:
parsing city data
Next is the population field. Again I will use the MID function and I can use the end point of the city field as the starting point for my population field. I am only going to extract the numbers because numbers with text are not useful for analysis. If I wanted to I could pull the million text into another column and then could adjust the numbers accordingly. However in this instance it looks like all the figures are in millions so it is not necessary.
My formula starts as follows with the MID function and the previous formula to find the ( character:
=MID(A2,FIND(“(“,A2,1)+1
I added the +1 again so that it starts from the number rather than the ( itself. Next I need to find the length so I need end point for which I can use the FIND function again. This time I can just look for the empty space that comes after the number. So far I have the following:
=MID(A2,FIND(“(“,A2,1)+1,FIND(” “,A2,1)
The ” ” indicates a blank space. The problem here is I cannot start from the first character because it will find the first space. If the country has a space it will return a value from there, and if not there it will pull the space that comes right after the colon. What I need to do is change the value of 1 to where the ( is found. The updated formula:
=MID(A2,FIND(“(“,A2,1)+1,FIND(” “,A2,FIND(“(“,A2,1))
This will now make sure it retrieves the first space after the ( character, which is what I want. I could have made it simpler and just looked for the word ‘million’ but that would not work for instances where the word did not show up (and I also wanted to show a more complicated example). Next, I need to subtract the starting point so that the length is correctly calculated:
=MID(A2,FIND(“(“,A2,1)+1,FIND(” “,A2,FIND(“(“,A2,1))-FIND(“(“,A2,1)-1)
I deducted one at the end because I did not want to include the space after the number. However there is still one problem. Even though I extracted a number it is still text. I can convert it to a number simply by multiplying the result by one:
=MID(A2,FIND(“(“,A2,1)+1,FIND(” “,A2,FIND(“(“,A2,1))-FIND(“(“,A2,1)-1)*1
Now the number aligns to the right of the cell, indicating it is a number rather than text (which aligns to the left). I will add my qualifier for the length of the cell:
=IF(LEN(A2)>2,MID(A2,FIND(“(“,A2,1)+1,FIND(” “,A2,FIND(“(“,A2,1))-FIND(“(“,A2,1)-1)*1,””)
Unfortunately the data set is not perfect and in some cases there are text in parentheses so I would want to correct any of those cells – which should be easy to find since they result in errors. Alternatively, I could in the meantime use an IFERROR function to make any errors result in a 0 value:
=IFERROR(=IF(LEN(A2)>2,MID(A2,FIND(“(“,A2,1)+1,FIND(” “,A2,FIND(“(“,A2,1))-FIND(“(“,A2,1)-1)*1,””),0)
Copying the formula to all the cells my spreadsheet now looks like this:
parsing numbers
Using the IFERROR allows you to make the data usable for data analysis. And at the same time because you wouldn’t expect a population to be 0, you can still easily find error cells.
Once you are done parsing your data, I suggest copying and pasting it as values. This ensures you are not dependent on the original data. Once you have done that you can also also eliminate any blank values in the Country, City, or Population fields. This will allow you to have an unbroken data set that you can easily filter or use in a pivot table. My completed data set after these changes looks like this:
parsing data for analysis
The key thing to remember is that the original data needs some consistency in it before you can use a formula to be applied to it. If there is no consistency or has a lot variations to it, the more complicated your formula would need to be to pull what you need from it.  In those situations I prefer to use Visual Basic just because of the complexity that may be involved. This data set was fairly consistent and still involved some long, complex formulas to extract data from it. 

Custom Function: Extract All Matching Data

The purpose of this function is to act as multiple vlookups to pull all data that has multiple matches, and allows the user to specify how they want the data delimited

The function is called EXTRACTALL and it has four arguments:
Argument 1: Looup Field. This is the value that you are looking for. If it is text, make sure it is in quotes
Argument 2: Lookup Range. The range the function is going to search to find the value specified in argument 1
Argument 3: Output Range. If a match is found in the lookup range, this is what will be returned. The ranges should be the same length
Argument 4: Separator. This is how the results will be separated.

An example of the formula is below:

=EXTRACTALL(“A”, B:B,A:A,”,”)

This will look for the letter A in column B, and when there is a match the value from column A will be pulled. All values will be separated by commas.

You will note the values that correspond to A’s are 1, 3, 5, 8, and 9 which is what the result displays in cell D11. This acts effectively the same as a lookup function might with the difference being this will pull every match and put it into one cell whereas a lookup will just grab the first match and nothing else. This is more useful in the case of text values because after using this formula (especially if a comma separator is used) you can convert from a comma delimited field into multiple fields.

Below is the code:

Function extractall(lookupfield As String, lookuprange As Range, outputrange As Range, separator As String)

Dim cl As Range
Dim counter, offsetcol As Integer

counter = 0
extractall = “”


offsetcol = outputrange.Column – lookuprange.Column


For Each cl In lookuprange

    If cl = lookupfield Then
        counter = counter + 1

            If counter > 1 Then

               extractall = extractall & separator & cl.offset(0, offsetcol)
            ElseIf counter = 1 Then

                extractall = cl.offset(0, offsetcol)
            Else

            End If
    End If

Next cl

End Function

Pivot Tables: Avoid Changing Data Sources with Named Ranges

When working with a pivot table, you determine a data source to use for that table. But what if later you add rows or columns? You’ll normally have to change the data source to include the updated range, otherwise your pivot table isn’t including your changes. Unless you use a named range.

First, select the Name Manager under the Formulas tab and click on New

However, instead of selecting all the data, I’ll use the offset formula.

The OFFSET function allows you to specify the size of your data set. The formula below will keep the range equal to all the nonblank rows and cells starting from cell A1:

=OFFSET(Sheet1!$A$1,0,0,counta(Sheet1!$A:$A),counta(Sheet1!$1:$1))
This is the first cell of your data source

This is the first column of your data source

This is the first row of your data source

Once you’ve created a new name for your data set, put the formula above in the Refers to field and press OK:



Now you’ve created your custom, auto-updating range. All you need to do now is when creating a pivot table (or changing its data source), put in the named range.

Note this will not work properly if you have gaps in your columns or rows. The COUNTA function counts how many non-blank cells are within the range. So you could manually override the formulas if need be, but that would defeat the point of this post.

But if you follow the above steps and use the formula above, your pivot table will automatically be updated with any new rows or columns you add to it. All you’ll have to do is refresh the pivot table. And because the offset formula forces the range to change, that will automatically happen in the pivot table as well.

Formula to find day of week occurrences in Excel

In my previous post I showed a function that can help pull a certain instance of a day of the week using VBA. In this post, I’ll show how to do that without VBA. Specifically, three things this post will look at: 1) how to find the first Monday of the month 2) how to find the date ending/starting of a week, and 3) how to find the 3rd Monday of the month

1.Finding the first Monday of the month

First, I need to identify the weekday that the first day of the year falls on with this formula:

Formula 1.A

=WEEKDAY(DATE(2014,1,1))

This returns 4 (Wednesday). 

Next I’ll determine how many days away this is from my desired day – Monday. Since Monday is the 2nd day of the week, the first day of the year is 2 (4-2) days after Monday. If you want to use another day instead of Monday then change the 2 to the corresponding day of the week (for example, Sunday is 1 – assuming your settings are setup for weeks starting on a Sunday). 

With 7 days in a week, I need to deduct 2 from 7, making 5 days that I need to add to the first day of the year to get to the first Monday of the year. The formula so far looks like this:

Formula 1.B

=DATE(2014,1,1)+7-(WEEKDAY(DATE(2014,1,1))-2)

However, this formula will only work if the first day of the month falls after Monday. I need to include an IF statement so that in the case that Monday (or whatever day I choose) falls after the first day of the year, it will just add the difference. The formula that follows includes this consideration:


Formula 1.C

=DATE(2014,1,1)+IF(WEEKDAY(DATE(2014,1,1))-2<=0,2-WEEKDAY(DATE(2014,1,1)),7-((WEEKDAY(DATE(2014,1,1))-2)))

(the numbers highlighted in red relate to the day of the week I want to calculate)

This formula is a bit more complex but all it is saying is that if the difference between the days of the week are negative (e.g. first day of the year is Sunday and hence before Monday) then I would just add the difference to the first day of the month. Otherwise the formula remains as it was.

I’ve replaced all the possible variables with words so you can easily see where to change the values to calculate to the day you want.


Formula 1.D

=DATE(YEAR,MONTH,1)+IF(WEEKDAY(DATE(YEAR,MONTH,1))-WEEKDAYNUMBER<=0,WEEKDAYNUMBER-WEEKDAY(DATE(YEAR,MONTH,1)),7-((WEEKDAY(DATE(YEAR,MONTH,1))-WEEKDAYNUMBER)))

2. How to find the 3rd Monday of the month

Continuing from Formula 1.D, I’m going to make an adjustment to calculate the proper week. To do this I need to add two weeks to the formula:

Formula 2.A

=DATE(YEAR,MONTH,1)+((XTHOCCURRENCE-1)*7)+IF(WEEKDAY(DATE(YEAR,MONTH,1))-WEEKDAYNUMBER<=0,WEEKDAYNUMBER-WEEKDAY(DATE(YEAR,MONTH,1)),7-((WEEKDAY(DATE(YEAR,MONTH,1))-WEEKDAYNUMBER)))


Where XTHOCCURRENCE in this case will be equal to 3.

If your result is a number you will need to change the cell format (CTRL+1) to date.


3. Find the date ending/starting of a week

If you have weekly reporting and reference a week number, you may find it useful to show what date that week relates to.

This formula builds off of Formula 1.D. One difference is the month will always be set to 1 because we need to know the first weekday of the year. This is important to determine the number of weeks that need to be added.


If the first day of the year falls after the day I want (this example is Monday), then I know the first Monday will fall in week 2, not in week 1. What that means is if I want to determine the Monday on week 15, I will need to multiply by 13 and not by 14 weeks to get to that date. The formula is:

Formula 3.A

=DATE(2014,1,1)+IF(WEEKDAY(DATE(2014,1,1))-2<=0,(15-1)*7+2-WEEKDAY(DATE(2014,1,1)),(15-2)*7+7-((WEEKDAY(DATE(2014,1,1))-2)))

The only parts I added were those in purple, which just multiply the number of weeks I need. This formula will work for any day of the week. So if you want to calculate the the start or end of week 15, you can set the weekday you are calculating to as Monday (2) or for the end of the week to Friday (6). Here is the formula with the variables replaced with words:

=DATE(YEAR,1,1)+IF(WEEKDAY(DATE(YEAR,1,1))-WEEKDAYNUMBER<=0,(WEEKNUMBER-1)*7+WEEKDAYNUMBER-WEEKDAY(DATE(YEAR,1,1)),(WEEKNUMBER-2)*7+7-((WEEKDAY(DATE(YEAR,1,1))-WEEKDAYNUMBER)))

The weekday number relates to the day of the week (e.g. 1-Sunday, 2-Monday, 3-Tuesday, 4-Wednesday, 5-Thursday, 6-Friday, 7-Sunday). This may be slightly different depending on your regional settings, you may have Monday set as 1. The weekday number is the day of the week that you’re interested in determining where it falls.

Brief Intro to Macros and VBA

If you do a lot of repetitive tasks (and even if you don’t), macros can help save lots of time. In some cases you can be talking hours worth of work in one macro.

Recording a Macro

Now, if you’re not familiar with macros or VBA at all, a good way to learn is through the macro recorder tool (this is under the View tab in the ribbon). When you record your macro, you can see what was created on the VBA side by pressing ALT+F11 – you can also edit your macro here. You can do much more in VBA by coding straight from that screen, as the macro recorder is limited (e.g. it can’t create variables or other complex codes). But the macro recorder helps you learn how to reference different items.

When you create a macro – whether directly from VBA or the macro recorder, you can assign a shortcut (or a button to it) to it, saving you time in executing it.

To assign a shortcut: select the view macros option under the view tab and then select options for the macro you want. There it will allow you to assign a shortcut (see below)

For a button, you will need the developer tab enabled. If you don’t have it, go under Excel options and under ribbon options you will see an option to enable it.

Once enabled, under the developer tab you will see an insert controls button where you can insert a button (see below)

Once you do that you will create the size of the button, and when you are done it will ask you to assign a macro to it. And now when you click on that button your macro will run.

Inserting VBA Code

If you run across VBA code that you want to copy into your spreadsheet, you want to make sure it is put in the proper section.

If it is a custom function, you want to make sure that is put in a module to work properly. If you do not see a module section in VBA, you will need to add one from the Insert menu.

If it is just a macro (denoted by ‘Sub’ and then the name of the procedure) it can be put in a module, the worksheets, or the workbook section. If you want the macro to work in all worksheets, then I would recommend putting it in the workbook rather than the individual sheets.