H2EParsePowerQuery

How to Parse Data in Excel Using Power Query

In an old post, I went over how to parse data using various different functions. This time around, I’m going to show you how much easier it is do that in Power Query. If you’re not comfortable using LEN or MID functions, then this will make your life a whole lot easier. And to keep things simple, I’m going to use the same data set as I did in the previous post, which you can download from here.

Setting up the query

The first step involves copying the data from the webpage and then just pasting it into cell A1. With no adjustments, my data just contains the raw data:

Raw data download.

The one thing I’m going to do is remove the blank rows just so that Excel recognizes the full data set without having to adjust it. I can remove the blank rows in Power Query, but I’m going to do it at this stage so I don’t need to worry about finding what row number I need the range to go down to. To remove blanks, I will select column A, press F5, special, blanks, and then right-click delete on one of the cells. Now that there are no gaps in my data, I can set up the query.

To do that, I’ll go into the Data tab, and in the Get & Transform Data section, click on the From Sheet button.

Get & Transform section of the Data tab.

Excel should now autodetect the entire range. Click OK and the query will be created:

Data exported into Power Query.

Right now, it looks the same as what it was before, except it’s in the Power Query window. Next, I’ll actually start making the transformations.

Parsing the data using delimiters

Just like in the older post, I am going to set up fields for Country, City, and Population. But this time, you won’t have to fumble around and worry about setting up complex formulas. In the Power Query Editor, I’ll select the Add Column tab. And in there, I’m going to select the Extract drop-down selection and choose Text Before Delimiter:

Extract menu in Power Query.

I’m going to use the colon (:) as the delimiter and then click OK

Setting up the text before delimiter in Power Query.

That nicely parses out the countries:

Applying the text before delimiter for the Country field.

I can double-click on the header where it says Text Before Delimiter and change it say ‘Country’

Next, let’s parse out the City field. I need to make sure that Column1 remains selected. This time, I’m going to select Extract under the Add Column tab and then select Text Between Delimiters. I’m going to set my start delimiter as a colon. And the end delimiter will be the opening bracket:

Setting up the text between delimiters in Power Query.

And after re-naming the field to City, this is why my Power Query Editor looks like:

Power Query Editor after setting up the Country and City fields.

The last column to parse out is the Population. For this, I’m going to follow a similar step as above except I’m going to extract the text within the brackets. But in some instances, there is data within brackets that doesn’t relate to the population. But one consistency is that the population always comes at the end. So in this case, I’m going to use the Advanced options and specify that I want to start searching from the end of the string. I have left the other options the same:

The Advanced options in the Text Between Delimiters section.

Now, my fields look pretty good:

Country, City, and Population fields set up in the Power Query Editor.

The one thing I still need to do is remove the headers for the different letters. Since there is nothing in brackets, I can filter for any blank value in the City field. To do, this, I will click on the drop-down arrow for that field and select the option to Remove Empty:

Removing empty values from the City field.

Now, the data looks good and ready to import back into the worksheet:

City field after removing the blank values.

I technically don’t need that first column anymore. It’s done its job and one of the great things about Power Query is I can delete it, and it won’t impact everything else I’ve done. I’m going to right-click and delete that column so that all I’m left with are the fields I actually need:

Power Query Editor after removing the first column.

All that’s left now is to load the data into the spreadsheet. To do this, click on the Close & Load button in the top-left corner. It will now put that into a new tab by default:

Power Query data loaded back into an Excel sheet.

And just like that, you’ve parsed the data without having to go a painstaking effort of figuring out the correct formulas. But as easy as this was, there is an even easier way of parsing the data out (most of the time).

Parsing the data using examples

I’m going to re-do the previous step, this time taking a different approach, without the use of delimiters. This time, I’m going to the Add Column tab and select the Column From Examples button. This will generate another column on the right-hand-side:

Using Column From Examples in Power Query.

What I’m going to do in Column2 is give Power Query some examples of what I want this field to contain. Since it is the Country field, I’m going to start by typing out a country name. Even after just entering the first one, Power Query has figured out the pattern and does the rest for me:

Column From Example after entering in one value for the Country field.

You’ll notice at the top it has the Text Before Delimiter which is what I used when I did this manually. The less complicated the data, the quicker and easier it will be for Power Query to predict what I’m trying to do.

I’ll repeat the step for the City field. This time when I enter just one value, it hasn’t figured out the rest of the values:

Column From Example after entering in one value.

Instead of La Paz at the bottom of the above screenshot, it only pulls ‘La’ and so what I will do is correct that entry manually. Upon doing that it updates the calculations, but they still aren’t quite right. For Bosnia and Herzegovina, it is including part of the country name:

Column From Example after entering in two values.

I will manually update that value to just enter Sarajevo, and once I do that it now looks correct:

Column From Example after entering in three values.

And if I look at the formula that it has generate, it now is the same as what I did manually with selecting the delimiters:

Column From Examples formula.

The last column, Population, was the most challenging to set up because I needed to use the Advanced settings. Let’s see how well Power Query is able to extract this one using examples. Again, I’ll start with entering in the first value:

Column From Example after entering in one value for the population.

It doesn’t look too bad except for La Paz, it pulls in ‘seat of government’ which is in brackets, as opposed to the population. I’ll manually correct this one, and upon doing so this is what my column looks like:

Column From Example after entering in two values for the population.

Now the problem is the n/a values aren’t picking up correctly. Once I correct them, the column looks to be correct, except for Delhi:

Column From Example after entering in three values for the population.

After making a few more adjustments, the column looks to be correct:

Column From Example after entering in several values for the population.

One of the challenges with doing it this way is if a field isn’t easy to predict for Power Query, it may take some manual entry before it is able to get it just right. And even then, you may not be certain that you’ve accounted for all the possible variations. While this method can make it really easy for simple data parsing, for more advanced ones you will likely want to familiarize yourself with how to use the different Extract options.


If you liked this post on How to Parse Data in Excel Using Power Query, please give this site a like on Facebook and also be sure to check out some of the many templates that we have available for download. You can also follow us on Twitter and YouTube.

Add a Comment

You must be logged in to post a comment