<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Peltier Tech Blog &#187; Data Techniques</title>
	<atom:link href="http://peltiertech.com/WordPress/category/data-techniques/feed/" rel="self" type="application/rss+xml" />
	<link>http://peltiertech.com/WordPress</link>
	<description>Peltier Tech Excel Charts and Programming Blog</description>
	<lastBuildDate>Wed, 16 May 2012 17:58:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Clean Up Date Items in An Excel Pivot Table</title>
		<link>http://peltiertech.com/WordPress/clean-up-date-items-in-excel-pivot-table/</link>
		<comments>http://peltiertech.com/WordPress/clean-up-date-items-in-excel-pivot-table/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 08:00:59 +0000</pubDate>
		<dc:creator>Jon Peltier</dc:creator>
				<category><![CDATA[Data Techniques]]></category>
		<category><![CDATA[Pivot Tables]]></category>

		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=3375</guid>
		<description><![CDATA[In Grouping by Date in a Pivot Table I showed how to summarize daily data in a pivot table by grouping into monthly values. I&#8217;ll review this technique, then show how to clean up the dates when you don&#8217;t use the default starting and ending dates in the Grouping dialog. Here is a pivot table [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://peltiertech.com/WordPress/grouping-by-date-in-a-pivot-table/"class="vt-p" title="Grouping by Date in a Pivot Table" >Grouping by Date in a Pivot Table</a> I showed how to summarize daily data in a pivot table by grouping into monthly values. I&#8217;ll review this technique, then show how to clean up the dates when you don&#8217;t use the default starting and ending dates in the Grouping dialog.</p>
<p>Here is a pivot table with daily values. I worked this out in Excel 2003, because that&#8217;s what was open at the time, but the technique is much the same in all versions. I want to condense this into monthly values, and show only data from 2010 and 2011.</p>
<p style="text-align: center;"><img class="aligncenter" src="http://peltiertech.com/images/2012-02/PivotTable1.png" alt="Pivot Table of Daily Values" width="450" height="556" /></p>
<p>I navigate to the Group and Show Detail &gt; Group command, and choose the appropriate parameters in the Grouping dialog.</p>
<p style="text-align: center;"><img class="aligncenter" src="http://peltiertech.com/images/2012-02/PivotGroupingDialog.png" alt="Pivot Field Grouping Dialog" width="234" height="269" /></p>
<p><span id="more-3375"></span>I change the range of dates in the pivot table by unchecking the two Auto boxes at the top and entering the start and finish dates I want. Then I select the time units I want the data grouped by. Pick Months and Years, or you&#8217;ll only get 12 monthly values summed over all the years within the selected date range.</p>
<p>The pivot table is condensed into the desired time periods. The only problem is that Excel includes two additional pivot items in each date-related pivot field. The pivot fields Years and Date have pivot items &lt;1/1/10 and &gt;12/31/11 to account for data from outside our selected date range.</p>
<p style="text-align: center;"><img class="aligncenter" src="http://peltiertech.com/images/2012-02/PivotTable2.png" alt="Pivot Table grouped by Months and Years" width="515" height="494" /></p>
<p>We can click on the field header dropdowns and see the pivot items in each. The Years and Date pivot item lists are shown below.</p>
<p style="text-align: center;"><img class="aligncenter" src="http://peltiertech.com/images/2012-02/PivotFieldItemLists.png" alt="Pivot Item Lists for Years and Date Fields" width="424" height="310" /></p>
<p>The pivot items have checkboxes, so we can manually uncheck the extraneous items, every time the pivot table is refreshed. Bo-o-o-oring.</p>
<p>I&#8217;ve actually been doing this manually for years in some of my accounting workbooks. Like the plumber&#8217;s faucet that always drips, the programmer&#8217;s workbook gets updated by hand. But a reader emailed me and asked how to get rid of those extra date items. I thought for about 30 seconds, and coded for about 3 minutes, and came up with this routine that cleans up any fields in the Row and Column areas of all pivot tables on the active sheet that have items beginning with &#8220;&lt;&#8221; or &#8220;&gt;&#8221;.</p>
<pre class="vbasmall"><code>Sub Remove_GT_LT_PivotItems()
  ' Hide the out-of-range items (captions starting with "&lt;" or "&gt;")
  ' in the row and column fields of every pivot table on the active sheet.
  Dim pt As PivotTable
  Dim pf As PivotField
  Dim pi As PivotItem

  For Each pt In ActiveSheet.PivotTables

    ' Fields in the Row area
    For Each pf In pt.RowFields
      For Each pi In pf.PivotItems
        If Left$(pi.Caption, 1) = "&lt;" Or Left$(pi.Caption, 1) = "&gt;" Then
          pi.Visible = False
        End If
      Next
    Next

    ' Fields in the Column area
    For Each pf In pt.ColumnFields
      For Each pi In pf.PivotItems
        If Left$(pi.Caption, 1) = "&lt;" Or Left$(pi.Caption, 1) = "&gt;" Then
          pi.Visible = False
        End If
      Next
    Next

  Next

End Sub</code></pre>
<p>Here is the finished pivot table.</p>
<p style="text-align: center;"><img class="aligncenter" src="http://peltiertech.com/images/2012-02/PivotTable3.png" alt="Pivot Table with Cleaned Up Date Fields" width="500" height="460" /></p>
<p>Peltier Technical Services, Inc., Copyright © 2011.<br /> <br /><span style="font: 80% Verdana,Tahoma,Arial,sans-serif;">Licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/" rel="nofollow" rel="license" >Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License</a>.<br /> <br />
<a href="http://www.exceluser.com/cmd.asp?Clk=1374689" rel="nofollow" ><IMG SRC="http://www.exceluser.com/images/info/pub/info_dash_c02.gif" ALT="Learn how to create Excel dashboards." WIDTH="468" HEIGHT="60" border=0></a><br />
<br /><img src="http://www.exceluser.com/cmd.asp?Imp=1374689" width="0" height="0" border="0"></p>
]]></content:encoded>
			<wfw:commentRss>http://peltiertech.com/WordPress/clean-up-date-items-in-excel-pivot-table/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Data Smoothing Perils (Series Lines Follow-Up)</title>
		<link>http://peltiertech.com/WordPress/data-smoothing-perils-series-lines-follow-up/</link>
		<comments>http://peltiertech.com/WordPress/data-smoothing-perils-series-lines-follow-up/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 09:00:20 +0000</pubDate>
		<dc:creator>Jon Peltier</dc:creator>
				<category><![CDATA[Data Techniques]]></category>

		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=3351</guid>
		<description><![CDATA[Last week in Series Lines: Useful or Chart Junk?, I wrote about Excel&#8217;s &#8220;Series Lines&#8221; feature, and how it seems like a good way to clarify the data in a stacked column chart, until you implement it and realize it just adds chart junk to your chart. I proposed a panel chart to show the stacked [...]]]></description>
			<content:encoded><![CDATA[<p>Last week in <a href="http://peltiertech.com/WordPress/series-lines-useful-or-chart-junk/"class="vt-p" title="Series Lines: Useful or Chart Junk?" >Series Lines: Useful or Chart Junk?</a>, I wrote about Excel&#8217;s &#8220;Series Lines&#8221; feature, and how it seems like a good way to clarify the data in a stacked column chart, until you implement it and realize it just adds chart junk to your chart.</p>
<p>I proposed a panel chart to show the stacked chart data more clearly. Joe Mako commented that the line chart I showed prior to building a panel chart could be cleaned up by taking the rolling percentages of the cumulative total of the original data. I agreed that it makes the chart look cleaner, but I think it forces us to sacrifice too much detail in the data.</p>
<p><span id="more-3351"></span>I&#8217;ll illustrate my reasoning in this post.</p>
<h2>Individual Values</h2>
<p>These are the original values I used in my previous post.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/1ValuesData.png" alt="Original Values: Data" /></p>
<p>Here are the stacked column chart and line chart of this data. They are rather cluttered, and not easy to interpret.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/1ValuesTwoCharts.png" alt="Original Values: Stacked Column and Line Charts" /></p>
<p>Here is the panel chart. Splitting the data into separate panels according to category makes it much easier to see trends within an individual category and to compare values across categories.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/1ValuesPanel.png" alt="Original Values: Panel Chart" /></p>
<h2>Percentages of Annual Total</h2>
<p>This table shows the data as a percentage of the annual total. For example, the &#8220;alpha&#8221; value for 2005 is the alpha value divided by all values, or 16/(16+14+13+10+8), or 26.2%.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/2PercentageData.png" alt="Percentage of Annual Total: Data" /></p>
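<p>As a rough sketch outside Excel, the same percentage-of-total arithmetic can be written in a few lines of Python. The 2005 values are the ones quoted above, with alpha first; the function and variable names are made up for illustration.</p>

```python
# 2005 values quoted in the text; the first entry is alpha.
values_2005 = [16, 14, 13, 10, 8]


def percent_of_total(values):
    """Return each value as a percentage of the sum of all values."""
    total = sum(values)
    return [100.0 * v / total for v in values]


shares_2005 = percent_of_total(values_2005)
print(round(shares_2005[0], 1))  # alpha's share of the 2005 total
```

<p>By construction the shares for a year add up to 100%, which is why the stacked chart of these percentages is flat across the top.</p>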
<p>The stacked chart is now flat across the top, because the total in each year is 100%. The details within the stacked column chart and the line chart are no easier to see than the charts showing the individual values. In fact, it&#8217;s hard to see the differences between these and the earlier charts.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/2PercentageTwoCharts.png" alt="Percentage of Annual Total: Stacked Column and Line Charts" /></p>
<p>The panel chart of the percentage data isn&#8217;t much different from that for the individual values, and it is much easier to read than the stacked and line charts.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/2PercentagePanel.png" alt="Percentage of Annual Total: Panel Chart" /></p>
<h2>Rolling Percentages of Cumulative Total</h2>
<p>Here are the calculations proposed by Joe Mako. The values for the first year are the same. The values for each subsequent year are the values for a given category for all years so far divided by the values for all categories for all years so far. So alpha for 2006 would be (16+18)/((16+14+13+10+8)+(18+17+12+9+8)), or 27.2%.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/3RollingData.png" alt="Rolling Percentage of Total: Data" /></p>
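<p>The rolling calculation can be sketched the same way: keep a running sum per category, then divide by the running grand total. Only the 2005 and 2006 values quoted in the text are used here, with alpha first in each row; the names are hypothetical.</p>

```python
# Yearly values quoted in the text; the first entry in each row is alpha.
yearly_values = [
    [16, 14, 13, 10, 8],  # 2005
    [18, 17, 12, 9, 8],   # 2006
]


def rolling_percentages(rows):
    """For each year, return each category's share of the cumulative total so far."""
    running = [0] * len(rows[0])
    result = []
    for row in rows:
        # Add this year's values to the running sum for each category.
        running = [r + v for r, v in zip(running, row)]
        grand_total = sum(running)
        result.append([100.0 * r / grand_total for r in running])
    return result


rolling = rolling_percentages(yearly_values)
print(round(rolling[0][0], 1))  # alpha, 2005
print(round(rolling[1][0], 1))  # alpha, 2006
```

<p>Because every new year is averaged in with all the years before it, each additional year has less influence on the result than the one before, which is where the smoothing comes from.</p>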
<p>Essentially each year&#8217;s rolling value is a weighted average of the current value with the previous years&#8217; values, which results in a smoothing of the data. The data is smoothed so much that it&#8217;s not easy to see much difference at all within any particular category in the stacked chart below. In the line chart, we can make out trends, but they are not nearly as pronounced as in the original data or in the annual percentages.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/3RollingTwoCharts.png" alt="Rolling Percentage of Total: Stacked Column and Line Charts" /></p>
<p>The panel chart also shows this smoothed data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/3RollingPanel.png" alt="Rolling Percentage of Total: Panel Chart" /></p>
<h2>What&#8217;s the Big Deal?</h2>
<p>So why don&#8217;t I like this smoothing of the data? Smoothing can be helpful when you&#8217;re trying to find patterns in noisy data. In fact, I&#8217;ve written about <a href="http://peltiertech.com/WordPress/loess-smoothing-in-excel/"class="vt-p" title="LOESS Smoothing in Excel - Peltier Tech Blog" >LOESS smoothing in Excel</a>, and I&#8217;ve released a <a href="http://peltiertech.com/WordPress/loess-utility-awesome-update/"class="vt-p" title="LOESS Utility for Excel - Awesome Update - Peltier Tech Blog" >utility to perform LOESS smoothing</a> on worksheet data.</p>
<p>But smoothing can also obliterate details in data, and it can give the wrong impression of trends in the data.</p>
<p>In the latest stacked column chart above, the variation in the data from year to year has been almost smoothed out of existence. The latest line chart contains more white space, because points have been moved away to provide this space.</p>
<p>The line charts below compare the unsmoothed data with the rolling average data. In the chart on the left, the first and last years in the analysis are connected, without showing the intervening years. The solid symbols and lines show the annual percentages, while the open symbols and dashed lines show the rolling averages. The lines start at the same points in 2005, and both sets of data move in the same direction. But for all categories, the rolling averages change much less than the annual percentages. The rolling averages make it seem that alpha and beta have become close in value, while the annual percentages show that they have diverged to a great degree. The rolling averages show that gamma, delta, and epsilon have moved very close to each other, while the annual percentages have changed so much that the categories have crossed each other and reversed their order. The line chart on the right is another view of the two sets of data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/4PctgRollingLines.png" alt="Percentage and Rolling Percentage of Total: Line Charts" /></p>
<p>As is often the case, the panel chart shows the data most clearly. The blue solid diamond markers show the annual percentages, with their considerable variability, while the red open squares show very gradual changes. The smoothed data in this case tends toward the overall time average of the data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2012-01/4PctgRollingPanel.png" alt="Percentage and Rolling Percentage of Total: Panel Chart" /></p>
<h2>Data Smoothing</h2>
<p>Data smoothing can be a useful tool for teasing patterns out of noisy data. It&#8217;s important that the consumer of the data understands the unsmoothed and smoothed data, and how the smoothed data may have been distorted by the smoothing technique used.</p>
]]></content:encoded>
			<wfw:commentRss>http://peltiertech.com/WordPress/data-smoothing-perils-series-lines-follow-up/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Excel Interpolation Formulas</title>
		<link>http://peltiertech.com/WordPress/excel-interpolation-formulas/</link>
		<comments>http://peltiertech.com/WordPress/excel-interpolation-formulas/#comments</comments>
		<pubDate>Thu, 18 Aug 2011 09:00:38 +0000</pubDate>
		<dc:creator>Jon Peltier</dc:creator>
				<category><![CDATA[Data Techniques]]></category>
		<category><![CDATA[Interpolation]]></category>

		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=3322</guid>
		<description><![CDATA[In Getting value on Y axis by putting X axis value on the Mr Excel forum, someone wanted to know how to find in-between values of a function, given some known data points. The approach, of course, is to interpolate values given the known points on either side of the value you need. Interpolation requires some [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.mrexcel.com/forum/showthread.php?t=571986" rel="nofollow" title="Getting value on Y axis by putting X axis value : Mr Excel" >Getting value on Y axis by putting X axis value</a> on the <a href="http://www.mrexcel.com/forum/forumdisplay.php?f=10" rel="nofollow" title="Mr Excel" >Mr Excel</a> forum, someone wanted to know how to find in-between values of a function, given some known data points. The approach, of course, is to interpolate values given the known points on either side of the value you need.</p>
<p>Interpolation requires some simple algebra. The diagram below shows two points (blue diamonds connected by a blue line) with coordinates (X1, Y1) and (X2, Y2). We need to find the value of Y corresponding to a given X, represented by the red square at (X, Y).</p>
<p>The smaller triangle with hypotenuse (X1, Y1)-(X, Y) is &#8220;similar&#8221; to the larger triangle with hypotenuse (X1, Y1)-(X2, Y2), so the sides of the triangles are proportionally sized, leading to the first equation below the sketch. We rearrange this to solve for Y, in the second equation.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-08/interpolationalgebra.png" alt="Algebraic construction for interpolation" width="291" height="349" /></p>
<p><span id="more-3322"></span>We&#8217;ll set up our interpolation in the example below. Our data is in A5:B18, and the known values are plotted as blue diamonds connected by blue lines in the chart.</p>
<p>The analysis has two parts: first we need to determine which pair of points to interpolate between, then we need to do the interpolation. We will judge the validity of our interpolation by plotting the calculated point on the same chart.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-08/interpolationsheet.png" alt="Interpolating set-up in worksheet" width="565" height="324" /></p>
<h2>Solving for Y</h2>
<p>I&#8217;ve put the calculations above the data table. The yellow shaded cell, A2, holds the known X value, and a formula in cell B2 holds the calculated Y value. Cell A3 indicates which pair of points to interpolate between. The formulas are:</p>
<p><tt class="tt">A3: =MATCH(A2,A6:A18)</tt></p>
<p><tt class="tt">B2: =INDEX(B6:B18,A3) + (A2-INDEX(A6:A18,A3)) * (INDEX(B6:B18,A3+1)-INDEX(B6:B18,A3)) / (INDEX(A6:A18,A3+1)-INDEX(A6:A18,A3))</tt></p>
<p>We want the Gauge value (Y) when the Flow value (X) equals 3, so this is entered into the yellow shaded cell A2. The formula in A3 tells us that our computed point is between the 7th and 8th data points, and the formula in B2 calculates Y=0.444. The calculated point (A2, B2) is the red square that lies along the plotted data points. Looks good.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-08/interp_findY1.png" alt="Interpolating to solve for Y - example 1" width="565" height="324" /></p>
<p>To determine the Gauge value for a Flow value of 12, we enter this into A2. The red square moves along the blue line past the 11th point, where Gauge=0.548.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-08/interp_findY2.png" alt="Interpolating to solve for Y - example 2" width="565" height="324" /></p>
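<p>For readers who want to check the logic outside a worksheet, here is a minimal Python sketch of the same two steps: find the pair of points that brackets X, then interpolate between them. The sample points are hypothetical, not the Flow and Gauge data from the worksheet, and the function name is made up.</p>

```python
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Linearly interpolate y at x; xs must be sorted in ascending order."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the range of the known points")
    # Like MATCH with its default match type: the last position whose
    # value is less than or equal to x (zero-based here).
    i = bisect_right(xs, x) - 1
    # Step back at the right-hand end so that i + 1 is still a valid index.
    i = min(i, len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + (x - xs[i]) * slope


# Hypothetical sample points, for illustration only.
flow = [0.0, 1.0, 2.0, 4.0, 8.0]
gauge = [0.30, 0.35, 0.40, 0.45, 0.50]

print(interpolate(3.0, flow, gauge))  # halfway between the points at 2 and 4
```

<p>The sketch steps back one position when X equals the last known value. The worksheet formulas have no such guard, so at that point the A3+1 lookups would reach past the end of the data range.</p>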
<h2>Solving for X</h2>
<p>With a minor rearrangement, we can instead solve for Flow, given a value for Gauge. The known Gauge value is entered into B2, shaded yellow. B3 indicates the pair of points to interpolate between, and A2 provides the value for Flow. The formulas are:</p>
<p><tt class="tt">B3: =MATCH(B2,B6:B18)</tt></p>
<p><tt class="tt">A2: =INDEX(A6:A18,B3) + (B2-INDEX(B6:B18,B3)) * (INDEX(A6:A18,B3+1)-INDEX(A6:A18,B3)) / (INDEX(B6:B18,B3+1)-INDEX(B6:B18,B3))</tt></p>
<p>For a Gauge (Y) of 0.53, we compute a Flow (X) of 9.790. The red square shows where the calculated point lies along our plotted data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-08/interp_findX1.png" alt="Interpolating to solve for X - example 1" width="565" height="324" /></p>
<p>Changing the Gauge value to 0.35 moves the red square way to the left, to a Flow value of 0.400.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-08/interp_findX2.png" alt="Interpolating to solve for X - example 2" width="565" height="324" /></p>
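<p>Solving for X is the same calculation with the two columns swapped. The Python sketch below uses hypothetical sample points and a made-up function name. It also checks that the Y values are strictly increasing, since the lookup (like MATCH with its default match type) assumes the searched column is sorted in ascending order.</p>

```python
from bisect import bisect_right


def solve_for_x(y, xs, ys):
    """Find x for a given y by interpolating with the roles of x and y swapped."""
    # The bracketing lookup only works if the y values are sorted ascending.
    if any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("ys must be strictly increasing")
    if not ys[0] <= y <= ys[-1]:
        raise ValueError("y is outside the range of the known points")
    i = min(bisect_right(ys, y) - 1, len(ys) - 2)
    return xs[i] + (y - ys[i]) * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])


# Hypothetical sample points, for illustration only.
flow = [0.0, 1.0, 2.0, 4.0, 8.0]
gauge = [0.30, 0.35, 0.40, 0.45, 0.50]

print(solve_for_x(0.425, flow, gauge))
```

<p>If the Y values rise and then fall, a single Y can correspond to more than one X, and this kind of lookup cannot be relied on to find the pair of points you intended.</p>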
]]></content:encoded>
			<wfw:commentRss>http://peltiertech.com/WordPress/excel-interpolation-formulas/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Area Under a Fitted Curve</title>
		<link>http://peltiertech.com/WordPress/area-under-a-fitted-curve/</link>
		<comments>http://peltiertech.com/WordPress/area-under-a-fitted-curve/#comments</comments>
		<pubDate>Mon, 02 May 2011 05:00:55 +0000</pubDate>
		<dc:creator>Jon Peltier</dc:creator>
				<category><![CDATA[Data Techniques]]></category>
		<category><![CDATA[calculating area]]></category>
		<category><![CDATA[polynomial fit]]></category>
		<category><![CDATA[trendline]]></category>

		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=3303</guid>
		<description><![CDATA[A reader of my post about Trendline Fitting Errors asked how to calculate area under a fitted curve. His stated problem was that when he tried to calculate points based on the fit, it didn&#8217;t come close to matching his measured data, even though the fit had a very high R² value. I suspect this [...]]]></description>
			<content:encoded><![CDATA[<p>A reader of my post about <a href="http://peltiertech.com/WordPress/trendline-fitting-errors/"title="Trendline Fitting Errors | Peltier Tech Blog" >Trendline Fitting Errors</a> asked how to calculate area under a fitted curve. His stated problem was that when he tried to calculate points based on the fit, it didn&#8217;t come close to matching his measured data, even though the fit had a very high R² value. I suspect this problem was due to the tricky nature of the Excel formulas needed to make the various calculations.</p>
<p>Another problem, which is stated in the <strong>Trendline Fitting Errors</strong> post, is that the reader had calculated a 6th order polynomial fit to his data. This is fine, I guess, if you only need the data for interpolation, the fit is close, and due to curvature the fit deviates from a straight line between points.</p>
<p>Despite the problems with using a sixth order fit, I&#8217;ve decided to work out the calculations and compute the area under the curve, both under the measured data and under the calculated data based on the fit.</p>
<h2>The Experimental Data</h2>
<p>The measured data is shown in the table below. This represents the force needed to move a surface closer to a particular molecule starting at a distance of 1.22 (which is asymptotically close enough to infinity that we assume a force of zero). The units of force and distance are not stated, but it doesn&#8217;t matter for this exercise. As the surface approaches the molecule, the force increases, until it is close enough to start to distort the molecule, at which point the force tops out and even decreases at the closest measurement.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Sheet2Columns.png" alt="Measured Data Tabulated" /></p>
<p><span id="more-3303"></span>Here&#8217;s a plot of the data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data.png" alt="Measured Data Plotted" /></p>
<p>When I first saw the data, I mentioned to the reader that there appeared to be a bump in the curve near the bottom. I&#8217;ve zoomed in on this region below. The curve coming from the upper left doesn&#8217;t match up with the curve coming from the lower right, resulting in a zig-zag at about X=0.95.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data-ZigZag.png" alt="Bump in Measured Data" /></p>
<p>To me this means there may be some slippage in the mechanism that moves the surface, or some other experimental inconsistency. I would think that the actual curve would look like the dotted line below. If it were my experiment I would repeat the measurements, but the exercise below was performed on the unmodified data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data-UnZigZag.png" alt="Corrected Bump in Measured Data" /></p>
<h2>The Trendline</h2>
<p>I added a 6th order polynomial trendline to the data. Then, since trendlines are bold black lines that blast your eyes and obscure the data being fitted, I formatted the fitted curve as a thin red line. For most of the range of data the trendline fits pretty closely to the measured data, and the R² value of 0.9996 is very close to a perfect 1.0.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data+6thPoly.png" alt="Measured Data Plus Sixth Order Polynomial Fit" /></p>
<p>Near the top of the data, the trendline is not very accurate, overshooting the maximum of the measured data, then overcompensating for the decline in the last point.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data+6thPolyTop.png" alt="Measured Data Plus Sixth Order Polynomial Fit Showing Lack of Fit at Top" /></p>
<p>The fit at the opposite end also isn&#8217;t very good. The red line careens back and forth like an out-of-control bobsled trying to hold a line. Note how the trendline is thrown off by the discontinuity I remarked about earlier. The trendline actually reaches a local minimum at about X=1.15, then a local maximum at about X=1.20, before continuing downward, but not quite to zero.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data+6thPolyBottom.png" alt="Measured Data Plus Sixth Order Polynomial Fit Showing Lack of Fit at Bottom" /></p>
<p>These deviations of the trendline from the measurements, particularly the serpentine behavior at the lower end of the data, illustrate the problem with high order polynomial fits. There&#8217;s really not a physical basis for choosing such a fit; it&#8217;s simply convenient and gives a high R². Of course, a high R² is not the only reason to select a particular mathematical model, and does not by itself mean the model is a good one. You have to decide whether you think the selected model or the actual measurements know more about your data.</p>
<p>We must take care using the trendline equation in the earlier chart. Its coefficients have only five digits, and with so many coefficients multiplied by so many powers of X, errors can accumulate. Here are the points calculated with these imprecise coefficients. Not very good agreement: the calculated points deviate somewhat more than the trendline at the top, and they go off in another direction entirely at the bottom.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data+6thPoly+MisCalc.png" alt="Measured Data Plus Sixth Order Polynomial Fit Plus Calculated Points" /></p>
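<p>To see how much damage rounded coefficients can do, here is a small Python sketch. It uses a hypothetical sixth order polynomial, a scaled expansion of (x-1)^6, not the reader&#8217;s fit, and it assumes that &#8220;five digits&#8221; means five significant digits. The large coefficients with alternating signs nearly cancel, so tiny rounding errors in each one dominate the result.</p>

```python
def evaluate_polynomial(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed highest power first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c  # Horner's rule
    return result


# Hypothetical coefficients: 1234.56789 * (x - 1)^6, expanded.
binomial = [1, -6, 15, -20, 15, -6, 1]
exact_coeffs = [1234.56789 * b for b in binomial]

# Keep only five significant digits, as on a default chart label (assumed).
rounded_coeffs = [float(f"{c:.5g}") for c in exact_coeffs]

x = 1.1
exact_value = evaluate_polynomial(exact_coeffs, x)
rounded_value = evaluate_polynomial(rounded_coeffs, x)
print(exact_value, rounded_value)
```

<p>With these made-up numbers the full-precision result is on the order of 0.001, while the rounded coefficients give a value well above 1, even though no single coefficient changed by more than about 0.5.</p>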
<p>We can improve the precision of the coefficients by formatting the trendline formula in the chart to use scientific notation with 14 digits after the decimal point. It&#8217;s ugly, but it&#8217;s precise.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data+6thPolySci.png" alt="Measured Data Plus Sixth Order Polynomial Fit with More Precise Coefficients" /></p>
<h2>Calculating Coefficients</h2>
<p>Now, we could retype all of these coefficients into worksheet cells, but that would take a long time and leave us cross-eyed. The better approach is to let Excel make the calculations using its LINEST worksheet function. You need a range with five rows and as many columns as the order of the polynomial fit plus one. For a sixth order fit, we need seven columns. Select the range with the active cell in the top left of the range, type in the formula below, then hold down CTRL and SHIFT while pressing ENTER. CTRL+SHIFT+ENTER produces an array formula, which is a topic that could cover dozens of blog posts.</p>
<pre class="vbasmall"><code>=LINEST(B2:B19,A2:A19^{1,2,3,4,5,6},,TRUE)</code></pre>
<p>B2:B19 contains the Y values, A2:A19 contains the X values. We denote a sixth order poly fit with the ^{1,2,3,4,5,6} notation, which tells Excel to apply each of the exponents in the curly braces to X in its calculations. The third argument is left blank, because we don&#8217;t want to force the fit to go through the origin, and the last is TRUE because we do want to fill the entire selected range with calculations. (The first row of the output contains our coefficients. The second through fifth rows of the output contain information about the coefficients and the model, so we could have simply selected the first row of the range and not bothered with the rest.)</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/LinestFormula.png" alt="Entering the LINEST Formula" /></p>
<p>After pressing CTRL+SHIFT+ENTER we get the following table. I&#8217;ve inserted headers at the top to remind me which coefficient is which.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Linest.png" alt="Calculations of the LINEST Formula" /></p>
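<p>For the curious, a least-squares polynomial fit can be sketched in plain Python by building and solving the normal equations. This is only an illustration of the idea: the method Excel uses inside LINEST is not shown in this post and may well differ, the sample data is hypothetical, and normal equations become numerically fragile at high orders like six. The coefficients are returned highest power first, the same order in which LINEST lays them out.</p>

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial coefficients, highest power first."""
    n = order + 1
    # Normal equations (V^T V) c = V^T y, where V holds powers 0..order of x.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution, lowest power first.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    # Reverse so the highest power comes first.
    return coeffs[::-1]


# Hypothetical data lying exactly on y = 3x^2 + 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
print(polyfit(xs, ys, 2))
```

<p>Because the sample points lie exactly on a quadratic, the fit recovers coefficients of approximately 3, 2 and 1.</p>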
<h2>Calculating Points</h2>
<p>Now we can calculate values at the X values of interest. In this case, I&#8217;ve used the same X values for which we have measured data. The formula in cell C2 is shown below, and it is filled down to C19.</p>
<pre class="vbasmall"><code>=$N$2+$M$2*A2+$L$2*A2^2+$K$2*A2^3+$J$2*A2^4+$I$2*A2^5+$H$2*A2^6</code></pre>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Sheet3Columns.png" alt="Measured and Calculated Data Tabulated" /></p>
<p>These points exactly fit the trendline, and are pretty close to the measured data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data+6thPoly+Calc.png" alt="Measured Data Plus Sixth Order Polynomial Fit Plus Calculated Points" /></p>
<p>I&#8217;ve removed the curved trendline and connected the calculated points with straight lines. With this formatting it may be easier to see the deviation of the calculated points from the measured data.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Data+Calc.png" alt="Measured Data Plus Calculated Points" /></p>
<h2>Residuals</h2>
<p>We can easily calculate the deviation between measured and calculated points. Column D simply shows the difference between columns C and B (calculated minus measured).</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Sheet4Columns.png" alt="Measured and Calculated Data and Error Tabulated" /></p>
<p>And here is a plot of the residuals, which is the fancy word statisticians use for this deviation. It&#8217;s particularly high at the left end of the data (the top), where the trendline overshot the maximum, then overcompensated on the rebound.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Residuals.png" alt="Error Between Measured Data and Calculated Points" /></p>
<h2>Calculating Areas</h2>
<p>The original question the reader had was &#8220;What&#8217;s the area under the curve?&#8221; I think the whole polynomial overfit was really a distraction.</p>
<p>To calculate the area under a curve, we can cut the area into slices, figure out the area of each slice, then add them up to get the total area. We already have data points at certain intervals, so let&#8217;s slice the curve at each point. Here is the sliced up area under the measured curve.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/DataTrapezoids.png" alt="Measured Data Divided Into Trapezoids to Calculate Area" /></p>
<p>Here&#8217;s the sliced up area under the calculated curve. Not too different.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/CalcTrapezoids.png" alt="Calculated Data Divided Into Trapezoids to Calculate Area" /></p>
<p>The slices are trapezoids, and we know the area of a trapezoid: average height times thickness. In columns E and F I&#8217;ve calculated the areas under the measured and calculated curves. The formula in cell E2, filled down to E18, is</p>
<pre class="vbasmall"><code>=(A3-A2)*(B2+B3)/2</code></pre>
<p>where A3-A2 is the thickness and (B2+B3)/2 is the average height. The formula in cell F2, filled down to F18, is</p>
<pre class="vbasmall"><code>=(A3-A2)*(C2+C3)/2</code></pre>
<p>The total areas are summed in row 21.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/Sheet6Columns.png" alt="Area Computed for Both Measured and Calculated Data" /></p>
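<p>The slice-by-slice calculation can also be rolled into a single user-defined function, which saves a helper column. This is a minimal sketch, assuming the X and Y values are in single columns of equal length; the function name <tt class="tt">TrapezoidArea</tt> is hypothetical.</p>
<pre class="vbasmall"><code>Function TrapezoidArea(XValues As Range, YValues As Range) As Variant
  Dim vX As Variant
  Dim vY As Variant
  Dim i As Long
  Dim Area As Double

  ' both ranges are assumed to be single columns of equal length
  If XValues.Rows.Count &lt;&gt; YValues.Rows.Count Or XValues.Rows.Count &lt; 2 Then
    TrapezoidArea = CVErr(xlErrValue)
    Exit Function
  End If

  vX = XValues.Value   ' 2-D arrays, indexed (row, 1)
  vY = YValues.Value
  For i = 1 To UBound(vX, 1) - 1
    ' thickness times average height of one slice
    Area = Area + (vX(i + 1, 1) - vX(i, 1)) * (vY(i, 1) + vY(i + 1, 1)) / 2
  Next i
  TrapezoidArea = Area
End Function</code></pre>
<p>Entered as <tt class="tt">=TrapezoidArea(A2:A19,B2:B19)</tt>, this should return the same total as summing the slice areas in E2:E18; replacing the second argument with C2:C19 gives the total for the calculated points.</p>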
<p>The two computed areas are unusually close, differing by less than 0.03%. In this case, there was no benefit to using a trendline to calculate this area.</p>
<p>You could make the case that trapezoids don&#8217;t accurately capture the area under a curve if the data shows lots of curvature. If we had taken measurements more frequently, our points would lie under or over the straight top segments of our trapezoids. If we believe our trendline, we could calculate values at closer intervals, as shown below. We might then say that the computed area was more accurate.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2011-05/ManyCalcTrapezoids.png" alt="Calculated Data Divided Into Thin Trapezoids to Calculate Area" /></p>
<p>The area calculated for the thinner slices was 1.404070, which is about 0.2% less than the areas computed using thicker trapezoids. This difference is probably from the range between X=0.80 and X=0.97, where curvature in the trendline moved it below the straight line segments of the measured data. Is this a better value? It&#8217;s not substantially different from the calculation based on only the unmodified measurements, and I&#8217;m sure there are greater sources of error in the experimental setup.</p>
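<p>To experiment with different slice thicknesses without building a longer table, the polynomial evaluation and the trapezoid sum can be combined in one function. The sketch below is an illustration only: the function name and its arguments are hypothetical, and it assumes the LINEST coefficients are in a single row, highest power first.</p>
<pre class="vbasmall"><code>Function PolyTrapezoidArea(Coeffs As Range, XMin As Double, _
    XMax As Double, Slices As Long) As Variant
  Dim i As Long
  Dim dX As Double
  Dim X As Double
  Dim YPrev As Double
  Dim YNext As Double
  Dim Area As Double
  Dim c As Range

  If Slices &lt; 1 Then
    PolyTrapezoidArea = CVErr(xlErrNum)
    Exit Function
  End If

  dX = (XMax - XMin) / Slices
  For i = 0 To Slices
    X = XMin + i * dX
    ' evaluate the polynomial at X (coefficients highest power first)
    YNext = 0
    For Each c In Coeffs.Cells
      YNext = YNext * X + c.Value
    Next c
    ' add one trapezoid once both of its sides are known
    If i &gt; 0 Then Area = Area + dX * (YPrev + YNext) / 2
    YPrev = YNext
  Next i
  PolyTrapezoidArea = Area
End Function</code></pre>
<p>For example, <tt class="tt">=PolyTrapezoidArea($H$2:$N$2,A2,A19,200)</tt> cuts the fitted curve into 200 equal slices between the first and last X values. The slice count of 200 is arbitrary; increasing it until the result stops changing is a quick check on whether thinner slices matter.</p>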
<p>Peltier Technical Services, Inc., Copyright © 2011.<br /> <br /><span style="font: 80% Verdana,Tahoma,Arial,sans-serif;">Licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/" rel="nofollow" rel="license" >Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License</a>.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://peltiertech.com/WordPress/area-under-a-fitted-curve/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Copy a Pivot Table and Pivot Chart and Link to New Data</title>
		<link>http://peltiertech.com/WordPress/copy-pivot-table-pivot-chart-link-to-new-data/</link>
		<comments>http://peltiertech.com/WordPress/copy-pivot-table-pivot-chart-link-to-new-data/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 07:00:14 +0000</pubDate>
		<dc:creator>Jon Peltier</dc:creator>
				<category><![CDATA[Data Techniques]]></category>

		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=3212</guid>
		<description><![CDATA[A very common task you may have is to take a chart you&#8217;ve painstakingly formatted and use it with new data. I described a few ways to handle this in Make a Copied Chart Link to New Data. Most commonly you have a worksheet with a bunch of data and a corresponding chart, and you [...]]]></description>
			<content:encoded><![CDATA[<p>A very common task you may have is to take a chart you&#8217;ve painstakingly formatted and use it with new data. I described a few ways to handle this in <a href="http://peltiertech.com/WordPress/make-a-copied-chart-link-to-new-data/" title="Make a Copied Chart Link to New Data | Peltier Tech Blog | Excel Charts" >Make a Copied Chart Link to New Data</a>.</p>
<p>Most commonly you have a worksheet with a bunch of data and a corresponding chart, and you have another sheet of data you want to add a chart to. Copying and pasting the chart onto the new sheet requires you to change links in the chart, usually series by series. This is tedious and error-prone. But the article above describes an easier way:</p>
<ol style="margin-left: 24pt;">
<li>Make a copy of the worksheet with the old data and chart;</li>
<li>Copy the new data;</li>
<li>Paste the new data over the old data on the copied worksheet.</li>
</ol>
<p><span id="more-3212"></span>The chart on the new worksheet updates as soon as the new data is pasted into place. This works in Excel 2003 and earlier, and in Excel 2007 if you&#8217;ve installed the latest service packs.</p>
<p>What if the worksheet contains a pivot table and its sister pivot chart? Well, knowing the above protocol, you&#8217;d think you could copy the worksheet, change the data, and the chart would be fine. And in Excel 2003 this set of steps works great:</p>
<ol style="margin-left: 24pt;">
<li>Make a copy of the worksheet with the old pivot table and pivot chart;</li>
<li>Change the pivot table&#8217;s data source to the new range;</li>
<li>Refresh the pivot table.</li>
</ol>
<p>The new pivot chart (on the copied sheet) retains its link to the pivot table on its parent worksheet, so it updates as soon as the pivot table is refreshed.</p>
<p>But in Excel 2007, these steps don&#8217;t work the same way. When you copy the worksheet with the pivot table and chart, not only does the new pivot table link to the same old data, the new chart also links to the old pivot table. You can easily enough change the data source of the new pivot table to the appropriate range; in fact, this is easier to do in Excel 2007 than in earlier versions. But the new chart cannot be linked to the new pivot table. It is permanently linked to the old pivot table.</p>
<p>My colleague Bill Manville wrote to me about this problem, citing an old forum post in which I doubted this could ever be solved. I&#8217;m glad to say that Bill has proved me wrong. He sent me a new protocol that makes this work.</p>
<ol style="margin-left: 24pt;">
<li>Make a copy of the worksheet with the old pivot table and pivot chart <em>in a different workbook</em>;</li>
<li>Move the copied worksheet back into the original workbook;</li>
<li>Change the new chart&#8217;s source data to the new pivot table;</li>
<li>Change the pivot table&#8217;s data source to the new range;</li>
<li>Refresh the pivot table.</li>
</ol>
<p>The difference is that the worksheet is copied into a new workbook (or another existing workbook) rather than within the original workbook. When this happens, the pivot table still links to the original data, but the chart becomes unlinked from the pivot table. In fact, the chart changes back from a pivot chart to a regular chart. It is unlinked from any data range, and the series formulas have been converted to written-out arrays.</p>
<p>Since the chart has been unlinked in the process, it now can be relinked. Thanks, Bill, for saving us lots of time and effort.</p>
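<p>Bill&#8217;s protocol could in principle be automated. The following is an untested sketch rather than anything Bill or I have verified in Excel 2007: the sheet names &#8220;Pivot&#8221; and &#8220;NewData&#8221; are assumptions, as is the layout of one pivot table and one embedded chart on the copied sheet, and the procedure name is hypothetical.</p>
<pre class="vbasmall"><code>Sub CopyPivotSheetAndRelink()
  Dim wbOrig As Workbook
  Dim wsNew As Worksheet
  Dim pt As PivotTable
  Dim cht As Chart
  Dim rNewData As Range

  Set wbOrig = ActiveWorkbook
  ' assumed location of the new source data
  Set rNewData = wbOrig.Worksheets("NewData").Range("A1").CurrentRegion

  ' 1. copy the sheet (assumed name "Pivot") into a new workbook;
  '    Copy with no arguments creates the new workbook
  wbOrig.Worksheets("Pivot").Copy
  Set wsNew = ActiveSheet

  ' 2. move the copy back to the end of the original workbook
  wsNew.Move After:=wbOrig.Sheets(wbOrig.Sheets.Count)
  Set wsNew = wbOrig.Sheets(wbOrig.Sheets.Count)

  Set pt = wsNew.PivotTables(1)
  Set cht = wsNew.ChartObjects(1).Chart

  ' 3. point the (now unlinked) chart at the copied pivot table
  cht.SetSourceData Source:=pt.TableRange1

  ' 4. point the pivot table at the new data
  pt.ChangePivotCache wbOrig.PivotCaches.Create( _
      SourceType:=xlDatabase, _
      SourceData:=rNewData.Address(ReferenceStyle:=xlR1C1, External:=True))

  ' 5. refresh the pivot table
  pt.RefreshTable
End Sub</code></pre>
<p>The numbered comments follow the five steps in the list above. <tt class="tt">ChangePivotCache</tt> and <tt class="tt">PivotCaches.Create</tt> were introduced in Excel 2007, so this sketch will not compile in Excel 2003, where the simpler protocol works anyway.</p>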
<p>Bill tells me that the familiar Excel 2003 behavior has been restored to Excel 2010.</p>
]]></content:encoded>
			<wfw:commentRss>http://peltiertech.com/WordPress/copy-pivot-table-pivot-chart-link-to-new-data/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Time Trials of Approaches to Measure Minimum and Maximum Chart Values</title>
		<link>http://peltiertech.com/WordPress/time-trials-measure-min-max-excel-chart-values/</link>
		<comments>http://peltiertech.com/WordPress/time-trials-measure-min-max-excel-chart-values/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 07:00:34 +0000</pubDate>
		<dc:creator>Jon Peltier</dc:creator>
				<category><![CDATA[Data Techniques]]></category>
		<category><![CDATA[series formula]]></category>
		<category><![CDATA[VBA]]></category>

		<guid isPermaLink="false">http://peltiertech.com/WordPress/?p=3182</guid>
		<description><![CDATA[In Get a Maximum and Minimum Value from Certain Charts, John Mansfield showed a simple routine he uses to find the maximum and minimum Y values in a chart. In the ensuing discussion, alternatives were suggested with various justifications. I wanted to say that this one was faster than that one, but I realized that [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.cellmatrix.net/index.php/site/comments/get_a_maximum_and_minimum_value_from_certain_charts/" rel="nofollow" title="Get a Maximum and Minimum Value from Certain Charts" >Get a Maximum and Minimum Value from Certain Charts</a>, John Mansfield showed a simple routine he uses to find the maximum and minimum Y values in a chart. In the ensuing discussion, alternatives were suggested with various justifications. I wanted to say that this one was faster than that one, but I realized that I didn&#8217;t know what was fastest. So I decided a series of time trials was called for.</p>
<h2>Experimental Design</h2>
<p>There are two ways to find the minimum and maximum of a set of values:</p>
<ul style="margin-left: 24pt;">
<li>Loop through the values, and keep track of the largest and smallest you&#8217;ve encountered</li>
<li>Use the worksheet functions <tt class="tt">MIN()</tt> and <tt class="tt">MAX()</tt> (i.e., <tt class="tt">WorksheetFunction.Min</tt> and <tt class="tt">.Max</tt>)</li>
</ul>
<p>There are three ways to obtain the values for each series:</p>
<ul style="margin-left: 24pt;">
<li>Use the <tt class="tt">.Values</tt> property of the series, which returns a 1-D array</li>
<li>Parse the series formula to identify the range, then get the <tt class="tt">.Value</tt> of this range object, which returns a 2-D array</li>
<li>Parse the series formula to identify the range, and analyze the range itself</li>
</ul>
<p><span id="more-3182"></span>This suggests a 2&#215;3 matrix, but I eliminated looping cell-by-cell through the range to find the largest and smallest values, since experience shows that looping through the cells of a range in this way is much slower than using the <tt class="tt">.Value</tt> property of the range to generate an array, and working with this array.</p>
<p>The five-block test matrix becomes:</p>
<ul style="margin-left: 24pt;">
<li>Looping Over Series Values</li>
<li>Worksheet Function on Series Values</li>
<li>Looping Over Range Values</li>
<li>Worksheet Function on Range Values</li>
<li>Worksheet Function on Range Object</li>
</ul>
<p>I wrote VBA procedures to perform each of these approaches. Then I wrote a master procedure that called each of these procedures 1000 times, 50 times each in 20 blocks, so I&#8217;d have the mean time for each block. The overall mean and standard deviation of the block means can be used to compare the different approaches.</p>
<p>I used a simple XY chart with 1000 points. Both X and Y values were taken from the standard normal distribution. The procedures below could be adapted to check the Y values of any chart and the X values of an XY chart or a line/column/bar/area chart with a date scale X axis. Here the X values don&#8217;t matter since I was merely sampling Y.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2010-03/1000datapoints.png" alt="XY chart used in Min-Max time trials" /></p>
<p>I ran the tests in Excel 2003 SP3 on a reasonably modern laptop, running Windows 7 Ultimate with 4GB of memory on a 2.20 GHz dual-core AMD processor.</p>
<h2>VBA Procedures</h2>
<h3>Looping Over Series Values</h3>
<pre class="vbasmall"><code>Sub Max_Min_Loop_Values()
  Dim Cht As Chart
  Dim Srs As Series
  Dim i As Long
  Dim A As Variant
  Dim MinVal As Double
  Dim MaxVal As Double 

  Set Cht = ActiveSheet.ChartObjects(1).Chart
  Set Srs = Cht.SeriesCollection(1)
  A = Srs.Values
  MaxVal = A(LBound(A, 1))
  MinVal = A(LBound(A, 1))
  For i = LBound(A, 1) + 1 To UBound(A, 1)
    If A(i) &gt; MaxVal Then MaxVal = A(i)
    If A(i) &lt; MinVal Then MinVal = A(i)
  Next i
End Sub</code></pre>
<p>Assuming there is one chart on the active worksheet, and one series in the chart, we set the chart and the series simply. We assign the Y values of the series to an array, and initially set the minimum and maximum values to the first value in the array. Then we loop from the second to the last values in the array, and if a given value is larger than the maximum value so far, we set the new maximum to the current value, and likewise for the minimum. I didn&#8217;t output the minimum and maximum values; all I wanted was the time to run the core of the procedure.</p>
<h3>Worksheet Function on Series Values</h3>
<pre class="vbasmall"><code>Sub Max_Min_Wkfn_Values()
  Dim Cht As Chart
  Dim Srs As Series
  Dim A As Variant
  Dim MinVal As Double
  Dim MaxVal As Double  

  Set Cht = ActiveSheet.ChartObjects(1).Chart
  Set Srs = Cht.SeriesCollection(1)
  A = Srs.Values
  With WorksheetFunction
    MaxVal = .Max(A)
    MinVal = .Min(A)
  End With
End Sub</code></pre>
<p>We set the chart and the series simply as above. We assign the Y values of the series to an array, and use the worksheet functions <tt class="tt">Min()</tt> and <tt class="tt">Max()</tt> to determine the minimum and maximum values.</p>
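<p>The procedures in this article look at a single series. As a sketch of how the worksheet function approach might be extended to every series in a chart (this variation was not part of the time trials, and the procedure name is hypothetical):</p>
<pre class="vbasmall"><code>Sub Max_Min_Wkfn_AllSeries()
  Dim Cht As Chart
  Dim Srs As Series
  Dim A As Variant
  Dim MinVal As Double
  Dim MaxVal As Double
  Dim bFirst As Boolean

  Set Cht = ActiveSheet.ChartObjects(1).Chart
  bFirst = True
  For Each Srs In Cht.SeriesCollection
    A = Srs.Values
    With WorksheetFunction
      If bFirst Then
        ' the first series initializes the running extremes
        MaxVal = .Max(A)
        MinVal = .Min(A)
        bFirst = False
      Else
        If .Max(A) &gt; MaxVal Then MaxVal = .Max(A)
        If .Min(A) &lt; MinVal Then MinVal = .Min(A)
      End If
    End With
  Next Srs
  Debug.Print "Min: " &amp; MinVal, "Max: " &amp; MaxVal
End Sub</code></pre>
<p>Unlike the timed procedures, this one prints its results to the Immediate window, so it is meant for use rather than for timing.</p>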
<h3>Looping Over Range Values</h3>
<pre class="vbasmall"><code>Sub Max_Min_Loop_RangeArray()
  Dim Cht As Chart
  Dim Srs As Series
  Dim i As Long
  Dim A As Variant
  Dim MinVal As Double
  Dim MaxVal As Double
  Dim sSeriesFmla As String 

  Set Cht = ActiveSheet.ChartObjects(1).Chart
  Set Srs = Cht.SeriesCollection(1)
  sSeriesFmla = Split(Srs.Formula, ",")(2)
  A = Range(sSeriesFmla).Value
  MaxVal = A(LBound(A, 1), 1)
  MinVal = A(LBound(A, 1), 1)
  For i = LBound(A, 1) + 1 To UBound(A, 1)
    If A(i, 1) &gt; MaxVal Then MaxVal = A(i, 1)
    If A(i, 1) &lt; MinVal Then MinVal = A(i, 1)
  Next i
End Sub</code></pre>
<p>We set the chart and the series simply as above. We parse the formula of the series to find the range, use the <tt class="tt">.Value</tt> property of the range to define an array, and initially set the minimum and maximum values to the first value in the array. Then we loop from the second to the last values in the array, and if a given value is larger than the maximum value so far, we set the new maximum to the current value, and likewise for the minimum.</p>
<h3>Worksheet Function on Range Values</h3>
<pre class="vbasmall"><code>Sub Max_Min_Wkfn_RangeArray()
  Dim Cht As Chart
  Dim Srs As Series
  Dim A As Variant
  Dim MinVal As Double
  Dim MaxVal As Double
  Dim sSeriesFmla As String 

  Set Cht = ActiveSheet.ChartObjects(1).Chart
  Set Srs = Cht.SeriesCollection(1)
  sSeriesFmla = Split(Srs.Formula, ",")(2)
  A = Range(sSeriesFmla).Value
  With WorksheetFunction
    MaxVal = .Max(A)
    MinVal = .Min(A)
  End With
End Sub</code></pre>
<p>We set the chart and the series simply as above. We parse the formula of the series to find the range, use the <tt class="tt">.Value</tt> property of the range to define an array, and use the worksheet functions <tt class="tt">Min()</tt> and <tt class="tt">Max()</tt> to determine the minimum and maximum values.</p>
<h3>Worksheet Function on Range Object</h3>
<pre class="vbasmall"><code>Sub Max_Min_Wkfn_Range()
  Dim Cht As Chart
  Dim Srs As Series
  Dim rng As Range
  Dim MinVal As Double
  Dim MaxVal As Double
  Dim sSeriesFmla As String 

  Set Cht = ActiveSheet.ChartObjects(1).Chart
  Set Srs = Cht.SeriesCollection(1)
  sSeriesFmla = Split(Srs.Formula, ",")(2)
  Set rng = Range(sSeriesFmla)
  With WorksheetFunction
    MaxVal = .Max(rng)
    MinVal = .Min(rng)
  End With
End Sub</code></pre>
<p>We set the chart and the series simply as above. We parse the formula of the series to find the range, and use the Worksheet Functions <tt class="tt">Min()</tt> and <tt class="tt">Max()</tt> to determine the minimum and maximum values of the range object directly.</p>
<h3>Timing Procedure</h3>
<pre class="vbasmall"><code>Sub TestTimes()
  Dim Looper As Long
  Dim Blocker As Long
  Dim tStart As Double
  Dim tTotal As Double
  Const LooperMax As Long = 50
  Const BlockerMax As Long = 20  

  For Blocker = 1 To BlockerMax
    tStart = Timer
    For Looper = 1 To LooperMax
      Max_Min_Loop_Values
    Next
    tTotal = Timer - tStart
    WriteToLog "Loop - Values: " &amp; tTotal &amp; " sec for " &amp; LooperMax &amp; " iterations"
  Next  

  For Blocker = 1 To BlockerMax
    tStart = Timer
    For Looper = 1 To LooperMax
      Max_Min_Wkfn_Values
    Next
    tTotal = Timer - tStart
    WriteToLog "Func - Values: " &amp; tTotal &amp; " sec for " &amp; LooperMax &amp; " iterations"
  Next  

  For Blocker = 1 To BlockerMax
    tStart = Timer
    For Looper = 1 To LooperMax
      Max_Min_Loop_RangeArray
    Next
    tTotal = Timer - tStart
    WriteToLog "Loop - RngArr: " &amp; tTotal &amp; " sec for " &amp; LooperMax &amp; " iterations"
  Next  

  For Blocker = 1 To BlockerMax
    tStart = Timer
    For Looper = 1 To LooperMax
      Max_Min_Wkfn_RangeArray
    Next
    tTotal = Timer - tStart
    WriteToLog "Func - RngArr: " &amp; tTotal &amp; " sec for " &amp; LooperMax &amp; " iterations"
  Next  

  For Blocker = 1 To BlockerMax
    tStart = Timer
    For Looper = 1 To LooperMax
      Max_Min_Wkfn_Range
    Next
    tTotal = Timer - tStart
    WriteToLog "Func - Range:  " &amp; tTotal &amp; " sec for " &amp; LooperMax &amp; " iterations"
  Next 

End Sub</code></pre>
<p>This procedure calls each of the Max_Min procedures <tt class="tt">LooperMax</tt> (50) times, records the total time for 50 iterations, and repeats this <tt class="tt">BlockerMax</tt> (20) times for each procedure, before moving on to the next. The times are recorded in a text file using the <tt class="tt">WriteToLog</tt> procedure below.</p>
<h3>Logging Procedure</h3>
<pre class="vbasmall"><code>Public Sub WriteToLog(sLogEntry As String)
  ' write information to a log file
  Dim iFile As Integer
  Dim sFileName As String
  Const sLogFile As String = "TimeLogger"   

  sFileName = ThisWorkbook.Path &amp; "\" &amp; sLogFile &amp; Format$(Now, "YYMMDD") &amp; ".txt"
  iFile = FreeFile
  Open sFileName For Append As iFile
  Print #iFile, Now; " "; sLogEntry
  Close iFile
End Sub</code></pre>
<p>This is a very handy procedure I use all the time for logging debug information or for saving settings. It uses good old VB I/O protocols, which are very fast, to populate a text file.</p>
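<p>The log holds the raw block times, and the means and standard deviations still have to be worked out from them. As a sketch of doing this in VBA instead (not the procedure used for the results below), the block means could be kept in an array and summarized with worksheet functions. The procedure name is hypothetical, and it relies on <tt class="tt">Max_Min_Wkfn_Values</tt> and <tt class="tt">WriteToLog</tt> from this article being in the same project.</p>
<pre class="vbasmall"><code>Sub SummarizeBlocks()
  Const LooperMax As Long = 50
  Const BlockerMax As Long = 20
  Dim Looper As Long
  Dim Blocker As Long
  Dim tStart As Double
  Dim BlockMeans(1 To BlockerMax) As Double

  For Blocker = 1 To BlockerMax
    tStart = Timer
    For Looper = 1 To LooperMax
      Max_Min_Wkfn_Values
    Next
    ' mean time per call for this block, in milliseconds
    BlockMeans(Blocker) = 1000 * (Timer - tStart) / LooperMax
  Next

  With WorksheetFunction
    WriteToLog "Func - Values: mean " &amp; .Average(BlockMeans) &amp; _
        " ms, st dev " &amp; .StDev(BlockMeans) &amp; " ms over " &amp; _
        BlockerMax &amp; " blocks"
  End With
End Sub</code></pre>
<p>Swap in any of the other Max_Min procedures to summarize a different approach.</p>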
<h2>Test Results</h2>
<p>The results of these experiments are shown in the following table. The approaches are sorted from slowest to fastest. Processing an array extracted from the source data range takes about 1.2 milliseconds, whether looping or using the worksheet functions. Processing the series values array takes about 0.7 milliseconds by either technique. Measuring the range directly takes less than 0.5 milliseconds.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2010-03/1000iterationstats.png" alt="Table of results from Min-Max time trials" /></p>
<p>50-iteration means of the 20 blocks for each approach are plotted in this chart, with means sorted from fastest to slowest to give a pseudo-distribution of procedure times.</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2010-03/1000iterationplot.png" alt="Plot of block mean times from Min-Max time trials" /></p>
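<p>This chart is stepped, because there seems to be a discrete quantum of time, that is, a resolution of about 0.078 milliseconds per iteration in the measurement of time using VBA&#8217;s Timer function on this microprocessor. Since each block times 50 iterations, this corresponds to steps of about 3.9 milliseconds in the raw Timer readings.</p>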
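<p>To check the size of this step on your own machine, you can watch for the moment the Timer value changes. This is a rough sketch with a hypothetical procedure name; the step it reports may differ from one system to another.</p>
<pre class="vbasmall"><code>Sub MeasureTimerStep()
  Dim t0 As Single
  Dim t1 As Single

  ' wait for the start of a fresh step
  t0 = Timer
  Do
    t1 = Timer
  Loop While t1 = t0

  ' then wait for the next change in the Timer value
  t0 = t1
  Do
    t1 = Timer
  Loop While t1 = t0

  Debug.Print "Timer step: " &amp; Format$((t1 - t0) * 1000, "0.000") &amp; " ms"
End Sub</code></pre>
<p>Dividing the reported step by the number of iterations in a block gives the smallest difference in mean time that the trials can resolve.</p>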
<h2>What These Results Mean</h2>
<p>One school of thought is that looping through an array would be slow compared to a single function call, while another is that any call from VBA to Excel (i.e., <tt class="tt">WorksheetFunction</tt>) would incur a performance hit. An unexpected result is that on this machine, it makes essentially no difference whether I use a loop or worksheet functions to find the minimum and maximum values in a series of 1000 points. I guess that&#8217;s why we occasionally bother to run these time trials.</p>
<p>We see that using the array of series values is faster than using the array of values from the source data range, by not quite a factor of two. Thinking of the VBA-Excel interface, we have to cross it once to get the series values, or twice if we parse the formula: once to get the series formula and again to get the range values. This makes sense, and I have to admit, I would never have thought of getting the values from the source data range, since they are so easy to extract using series <tt class="tt">.Values</tt>.</p>
<p>The fastest method of all, which was proposed by Daniel Ferry (<a href="http://excelhero.com/blog" rel="nofollow" >Excel Hero</a>), is to parse the series formula to find the range, and directly measure the min and max of this range. Apparently the performance hit to cross the VBA-Excel barrier is offset by the much faster performance of worksheet functions operating on Excel objects directly within Excel.</p>
<p>Although the direct range object measurement approach is fastest, I will stick to my series values approach. The main reason is that not all series values come from a single range in a worksheet, or even from a worksheet range at all: some series use multiple area ranges, while others have values hard-coded as a literal array directly in the series formula.</p>
<p>This is a &#8220;normal&#8221; series formula (based on the first ten points of experimental data from our chart above):</p>
<pre class="vbasmall"><code>=SERIES(<span style="color: #008000;">Sheet1!$B$1</span>,<span style="color: #800080;">Sheet1!$A$2:$A$11</span>,<span style="color: #0000ff;">Sheet1!$B$2:$B$11</span>,<span style="color: #ff0000;">1</span>)</code></pre>
<p>In this formula, <span style="color: #008000;">the series name is the argument in green</span>, <span style="color: #800080;">X values are in purple</span>, <span style="color: #0000ff;">Y values are in blue</span>, and <span style="color: #ff0000;">plot order is in red</span>. These are the same colors Excel uses to highlight the data ranges of a selected series in the underlying worksheet (illustrated below).</p>
<p><img style="display: block; margin-left: auto; margin-right: auto;" src="http://peltiertech.com/images/2010-03/10datapointhighlight.png" alt="Excel highlights data range of selected series" /></p>
<p>This formula is for a series that uses multiple area ranges for X and Y values, the areas for X and Y enclosed in parentheses:</p>
<pre><code>=SERIES(Sheet1!$B$1,(Sheet1!$A$2:$A$501,Sheet1!$A$502:$A$1001),
(Sheet1!$B$2:$B$501,Sheet1!$B$502:$B$1001),1)</code></pre>
<p>Multiple area formulas are not too uncommon. Less common is a series whose formula has hard-coded arrays as series values (first ten points of experimental data converted to values using F9 key):</p>
<pre class="vbasmall"><code>=SERIES(<span style="color: #008000;">"Y"</span>,<span style="color: #800080;">{-0.172902780162854;0.0788276881579948;0.752718316150293;-1.15534740638389;
-0.651271029427549;1.61522977181724;0.218894823127643;-1.38264191435207;
0.107605524088065;0.281018951052876}</span>,
<span style="color: #0000ff;">{-0.640448122430912;-0.355817827507841;-0.318480200209528;-0.0116013833433011;
-1.15185971606638;0.183605522997797;-0.376897171002227;-0.488810695788606;
-2.14375416629929;0.154502474308393}</span>,<span style="color: #ff0000;">1</span>)</code></pre>
<p>There is a practical limit of around 250 characters (you&#8217;d guess 255, but it&#8217;s slightly less) each for the X values and Y values of a series formula. In the formula above, the ten X values consume 183 characters, the ten Y values 188. You can squeeze in more values by reducing the resolution of the values (e.g., -0.173 in place of -0.172902780162854), but that&#8217;s just nibbling around the margins. A practical limit is around 17 or 18 values before the formula crashes.</p>
<p>The arrays (within {curly braces} above) are delimited by semicolons in this case, because the original data came from a column, but data in rows would be delimited by commas.</p>
<p>These complications in the series formula lead to errors when using simple approaches to parse such formulas. John Walkenbach has developed <a href="http://spreadsheetpage.com/index.php/tip/a_class_module_to_manipulate_a_chart_series/" rel="nofollow" title="Spreadsheet Page Excel Tips: A Class Module To Manipulate A Chart Series" >A Class Module To Manipulate A Chart Series</a> which can deal with such difficult situations.</p>
<p>Since accessing values using the <tt class="tt">.Values</tt> property is so easy, and since this approach is only about 50% slower than the risky approach of parsing the formula and measuring the range directly, that&#8217;s the approach I will stick with.</p>
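<p>If you want the speed of the range approach when it is safe, one compromise is to parse the formula only when it has the &#8220;normal&#8221; shape, and fall back to <tt class="tt">.Values</tt> otherwise. The following is a sketch under that assumption, not a full parser like John Walkenbach&#8217;s class module; both procedure names are hypothetical.</p>
<pre class="vbasmall"><code>Function SimpleYRange(Srs As Series) As Range
  ' Returns the Y value range of a "normal" series formula,
  ' or Nothing for anything more complicated
  Dim sArgs As String
  Dim vArgs As Variant

  sArgs = Srs.Formula
  If Left$(sArgs, 8) &lt;&gt; "=SERIES(" Then Exit Function
  ' strip the leading "=SERIES(" and the trailing ")"
  sArgs = Mid$(sArgs, 9)
  sArgs = Left$(sArgs, Len(sArgs) - 1)

  ' parentheses mean multiple areas, braces mean literal arrays
  If InStr(sArgs, "(") &gt; 0 Or InStr(sArgs, "{") &gt; 0 Then Exit Function
  vArgs = Split(sArgs, ",")
  ' expect exactly name, X values, Y values, plot order
  If UBound(vArgs) &lt;&gt; 3 Then Exit Function

  ' if the text is not a valid address, the result stays Nothing
  On Error Resume Next
  Set SimpleYRange = Application.Range(vArgs(2))
  On Error GoTo 0
End Function

Sub Max_Min_Range_Or_Values()
  Dim Srs As Series
  Dim rng As Range
  Dim MinVal As Double
  Dim MaxVal As Double

  Set Srs = ActiveSheet.ChartObjects(1).Chart.SeriesCollection(1)
  Set rng = SimpleYRange(Srs)
  With WorksheetFunction
    If rng Is Nothing Then
      ' multiple areas, literal arrays, or anything unexpected
      MaxVal = .Max(Srs.Values)
      MinVal = .Min(Srs.Values)
    Else
      MaxVal = .Max(rng)
      MinVal = .Min(rng)
    End If
  End With
End Sub</code></pre>
<p>The check is deliberately conservative: a sheet name containing a comma or parenthesis, for example, simply sends the series down the slower <tt class="tt">.Values</tt> path rather than risking a wrong range.</p>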
]]></content:encoded>
			<wfw:commentRss>http://peltiertech.com/WordPress/time-trials-measure-min-max-excel-chart-values/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
	</channel>
</rss>

