A Linked Table to Sort Data for Charting

I am often asked how to sort the data in a table without sorting the original data. The answer is twofold: the chart’s source data has to be sorted for the charted data to be sorted, but you don’t have to sort the original data if you can write a few formulas. The trick is to use an intermediate range as the source data, which is linked to the original data, but uses formulas to place the data into the desired order.

Let’s look at a simple example. The table below shows a simple data set with some unsorted values:

	B	C
1	Label	Value
2	Alpha	7
3	Beta	7
4	Gamma	7
5	Delta	8
6	Epsilon	8
7	Zeta	3
8	Eta	4
9	Theta	8

Chart with Unsorted Data

The LARGE(<range>, N) worksheet function is used to return the Nth largest value in a range. Set up columns E and F for the chart source data. In E1 and F1 put headers, in F2 enter this formula:

=LARGE($C$2:$C$9,ROW()-1)

then copy this cell and paste it into F3:F9. To provide the labels in column E, enter this formula in E2:

=INDEX($B$2:$B$9,MATCH(F2,$C$2:$C$9,0))

Copy this cell and paste it into E3:E9. The table looks like this:

	E	F
1	Label	Sorted
2	Delta	8
3	Delta	8
4	Delta	8
5	Alpha	7
6	Alpha	7
7	Alpha	7
8	Eta	4
9	Zeta	3

The numbers are sorted appropriately. If you want them sorted in ascending order, use the SMALL() function instead of LARGE().

The column of labels is not correct, however: Delta appears three times next to the cells containing the value 8, while Epsilon and Theta, which also should be listed besides 8, are missing. Alpha also appears multiple times, and other labels are missing. The reason for this is that the MATCH() function used in column E picks out the first item in the array that matches the entered value. Delta is the first item in the list associated with the value 8, so it is returned every time 8 appears in the sorted list.

The labels are corrected by inserting some adjusted values into the worksheet, as shown below. I’ve used column A for the extra column, but you could locate it next to the second table, so it doesn’t interfere with the layout of the original table, if it is part of a report. Cell A2 contains the following formula, and the cell is copied and pasted into the range A3:A9. The ROW()/1000 term is used to add an insignificant amount to the values, to break ties in the sorting but not to affect the sorted order or to affect the perceived values in the chart.

=C2+ROW()/1000

	A	B	C
1		Label	Value
2	7.002	Alpha	7
3	7.003	Beta	7
4	7.004	Gamma	7
5	8.005	Delta	8
6	8.006	Epsilon	8
7	3.007	Zeta	3
8	4.008	Eta	4
9	8.009	Theta	8

The formulas in column F are changed to

=LARGE($A$2:$A$9,ROW()-1)

and the formulas in column E are changed to

=INDEX($B$2:$B$9,MATCH(F2,$A$2:$A$9,0))

The table now is properly sorted, the labels are correctly rendered, and the chart is ready to paste into a report.

	E	F
1	Label	Sorted
2	Theta	8.009
3	Epsilon	8.006
4	Delta	8.005
5	Gamma	7.004
6	Beta	7.003
7	Alpha	7.002
8	Eta	4.008
9	Zeta	3.007

Chart with Sorted Data

Comments

Andrew says

Sunday, February 24, 2008 at 11:51 pm

I suggest the following improvements:

In cell A2, use
=RANK(C2,$C$2:$C$9)+COUNTIF($C$2:C2,C2)-1
This formula creates an artificial sort allowing for duplicates with the use of the COUNTIF function. When duplicates values arise the first label in the list takes precedence.

In cell E2, use
=INDEX($B$2:$B$9,MATCH(ROW()-ROW($E$1),$A$2:$A$9,0))
This formula works in the same way as described in the blog but makes reference to the updated sort column.

In cell F2, use
=INDEX($C$2:$C$9,MATCH(ROW()-ROW($F$1),$A$2:$A$9,0))

Each of these formulae should be copied down.

The result is slightly different in that (a) my sort function is based on first listed data (as opposed to introducing ROW() as a tolerance variable), and (b) returns actual results (rather than close estimates).
Jon Peltier says

Tuesday, February 26, 2008 at 12:56 pm

Andrew –

Thanks for the comments. I noticed the reverse sorting of items in the sorted list: Delta, Epsilon, and Theta all show the same value, but they are listed with with Theta first, because ROW()/1000 gives a larger value for a higher row. I could have changed the added term to something like (100-ROW())/10000, which would restore the order of the items in the original list. Your formula looks nicer, though, and precludes the user having to decide what large number to divide the row number by.

You also mentioned that my chart is actually showing slightly distorted data, with numbers exaggerated by ROW()/1000. I should follow one of my cardinal rules, which is, if you should use another column, use another column. The added column would use the sorted numbers in column F to look up the actual data from column C, much as the labels in column E are looked up. The lookup formula is:

=INDEX($C$2:$C$9,MATCH(F2,$A$2:$A$9,0))

In fact I use this in my own web site tracking workbook, where I track statistics backwards for six months. I use the lookup for all six months of the tracking data, based on the sorted values for the most recent month.

Maybe when I get a minute I’ll fix the protocol above to address both of your suggestions.
Jon Peltier says

Thursday, February 28, 2008 at 11:09 am

I noticed that John Mansfield has covered this exact topic on his cellMatrix blog:
Auto-Sort Chart

Reader Interactions

Comments

Leave a Reply