Yogi Bera is credited with saying, "It's tough to make
predictions, especially about the future."
humor here is that forecasting implies the application of a
variety of tools to predict future situations, or results. "Fore" implies a forward
predication, though it is possible to use data from the 1920s to forecast for
the 1930s, comparing our forecasts to the actual data from the 1930s. If we move
backward, attempting to predict a previous condition, the term
can be used, though the word, "predict," is misleading here, and should instead
be "suggest." If we know existing time series data points, but there are some
holes, we can estimate the values for those missing points between known points;
that process is called interpolation.
One way to classify
types of forecasts are based on the type of data. quantitative
forecasting is based on numeric data from several different time periods of the
past that can be assumed to provide a pattern that allows us to predict the
future; whereas qualitative forecasting relies more heavily on
non-numeric data, the judgments of specialists, and their variety of knowledge,
according to Makridakis, Wheelwright, & Hyndman (1998). They suggest the
following general steps in quantitative forecasting:
- Defining the problem
- Acquiring relevant data and judgments of key individuals
- Initial graphing and analysis
- Selecting and adapting a model to fit the data
- Applying that model to forecast future data, and evaluating that model
Time-Series Data and Trends
As noted, some data is particular helpful in quantitative forecasting. This
is time-series data, where we know the numerical value for several
different points in time. Typically, time is plotted along the horizontal x-axis
of a graph, and the variable measured or to be predicted is plotted along the
vertical y-axis. The term trend implies a
pattern of change over time, and if we suspect a trend exists, we may be able to
extrapolate that trend to future time periods, forecasting data for those
For example, sales at an ice cream stand
at the town park in June of each of the last five years has been good, but
in July it was about 20% more than in June. If this year, the stand took in
$10,000 in June (a new record), how much would you predict it will
take in in July? Well, if we were correct in our assumption based on
the historical data, we'd estimate the July figure would be 20% higher, or
Sometimes, the data does not suggest a trend, or the trend
is so weak that it might as well not exist, as in the following:
Figure 1. Random time-series data not suggesting a strong
Other times, however, there does seem to be a pattern to
time-series data, and in these cases, we may be able to use trend extrapolation
to forecast future data. First, let's look at some different classifications of
Some Common Types
Trends are often shown graphically (as
line graphs) with the level of a dependent variable on the y-axis and the
time period on the x-axis. There are different types of trends, including the
The following graphs each contain 100 points of fictitious
data connected with a blue line, and the trend superimposed with a black line.
They are based on Figure 3-1 on Page 112 from Levin, Rubin, & Stinson (1986).
Constant trends are those where there is no net
increase or decrease.
Figure 2. A constant trend that does not rise or fall.
However, there may be seasonality, or a periodic
fluctuation (as there would be in a graph of the temperature over a 1-year
period, with daytime temperatures higher than nighttime):
Figure 3. A constant trend that fluctuates regularly.
The above graph shows the same data with
periodic additions and subtractions. The blow graph shows this data, but the
periods are based on a multiple of the data in the x-axis. Often, it
is instead a multiplicative factor of the y-axis that is used.
Figure 4. A constant trend that fluctuates regularly, and
the degree of that fluctuate is a factor of the y-value, or in this case,
Linear trends show a steady,
straight-line increase or decrease. So the trendline may go up or down, and
the angle may be steep or shallow.
Figure 5. A linear trend showing a steady increase.
Figure 6. A linear trend with additive seasonality.
Figure 7. A linear trend
where the periodic fluctuation is a factor of the y-value.
Exponential trends are those where the data rises or
falls not at a steady rate, but at an increasing rate. The x-value
(plotted horizontally) is an exponent of the trendline formula to derive the
Figure 8. An upward exponential trend, where the x value
is an exponent in the equation to derive y.
Figure 9. An upward exponential trend with additive
Figure 10. An upward exponential trend with multiplicative
Damped trends are those that approach a horizontal
Figure 11. A damped trend.
Polynomial trends are those best
modeled by a polynomial equation. They may be second-order (quadratic)
equations of the form y = ax2 + bx + c,
resulting in a parabolic shape:
Figure 12. A second-order polynomial trend.
Figure 13. Another second-order polynomial trend.
Polynomial trendlines may also be third order (y
= ax3 + bx2 + c) or higher:
Figure 14. A third-order polynomial trend.
Figure 15. A third-order polynomial trend with additive
Figure 16. A third-order polynomial trend where
seasonality is a factor of the y value.
Figure 17. Another third-order polynomial trend.
Quantitative Forecasting using Trend
There are several tools available for using trend
extrapolation to first plot a trendline to historical time-series data, and then
extend this to future periods for the purpose of forecasting or predicting
values for those periods. Some might be tempted to visually extend a trendline
to future periods with a pencil on a printed graph, but algebraic techniques are
more precise, more varied, and more powerful.
are three general approaches to use algebraic techniques for trendline
extrapolation. The first of these involves applying formulas to calculate a
future period. The second technique is to make use of specialized functions
within a spreadsheet program or another data analysis program. The third
technique is to use a spreadsheet or data analysis program to construct a graph
with a trendline, and to automatically extend that trendline to future periods.
Based on Chapter 3 of Levin, Rubin, & Stinson,
(1986), I have created a tutorial (with practice) on using formulas in Microsoft
Excel™ to perform forecasts using the following forecasting techniques:
Please download the file from one of the following and follow
the instructions (required download), checking your answers.
Weighted Moving Averages
Single Exponential Smoothing
Double Exponential Smoothing
Notice that it is not that difficult to
type in the formulas that are needed to do the forecasting. In some
instances, such as Single Exponential Smoothing, there is a built-in feature
in the Microsoft Excel Analysis ToolPak™ Add-In.
for Trend Extrapolation
Microsoft Excel™ offers some built-in
tools for forecasting. One of these allows you to add a trendline to
existing data points on a chart. However, as with templates, the
developers of these tools have made some decisions for the user, and not all
users would agree with those decisions. If the limitations imposed by
Microsoft Excel's features for forecasting are inappropriate for a
particular forecasting task, the reader is instead encouraged to use direct
numerical manipulation using proven analytical techniques as described in
any of several texts on forecasting (such as Makridakis, Wheelwright & Hyndman,
Before You Begin
This remainder of this page is based on
the assumption that the reader has Microsoft
Excel™ 2007 with the Analysis ToolPak Add-In from Microsoft installed.
Getting the Data
Let's look at some data concerning
compact fluorescent lamps (CFLs) using the following as a source document:
US Department of Energy. (2009).
CFL Market Profile - March 2009. Washington, DC: Author. Retrieved
April 7, 2009 from
The analysis in this report was performed
by D & R International, LTD (http://www.drintl.com/.)
On Page 2, there is a bar chart (or bar
lists the number of shipments of CFLs by year up to 2007, and then predicts
the number of shipments in 2008, 2009, and 2010, based on that data.
Figure 18. Bar chart from US DOE, 2009, p.2.
Let's use the data in this chart, and the
power of Microsoft Excel™, to make a similar prediction. Ideally, you would
have the actual data values, but in this case, an estimate was made based on
the above graphic and the following was entered into an Excel spreadsheet.
Table 1. Raw data.
Let's only look at the historical data for 2000 to 2007,
not at the estimates or predictions for 2008 to 2010. We can recreate
the bar chart shown in the source document by selecting the historical
data in Excel and creating a bar chart:
19. Raw data in a bar chart to match original
But instead, let's create a scatter plot of the values
(since Excel's trendline equation feature can produce errors with bar
charts or line graphs.)
20. Raw data in a scatter plot.
Linear Trendline and
Now, remember, we are just concerned with the CFL data,
and we want to be able to predict future years. In order to add a
trendline, click on one of the icons representing a data point for CFLs,
and then right-click and select "Add Trendline." You'll see the
following dialog box.
Figure 21. Trendline options dialog box.
In this example, we will assume that the number of CFLs
shipped per year increases at a steady or linear rate. For now, in the Trendline Options area, select the
Trend / Regression type: Linear
Forecast - Forward 3 periods
Equation on chart
After moving the equation we have:
Figure 22. Raw data with a linear trendline and regression equation.
The equation is a linear regression equation.
That means that is it the equation of a straight line that best fits the
points on the chart. The method Excel uses to determine this equations involves finding
the line that produces the least value for the sum of the squares of the
vertical differences between data points and the line. Like all lines,
it has an equation in the form:
y is the number to be
calculated, the dependent
variable, or in this case, the number of millions of CFLs shipped per
m is the slope of the line, which equals the change in
the y value divided by the change in the x value;
x is the given data point or the dependent
variable, in this case, it is the year; and
b is the y-axis intercept of the line.
The equation, in this case, is:
That means that for the year 2010, the predicted value
y = (40.429 * 2010) + 80874
y = 388 million CFLs shipped
We can substitute other values for x, such as the year
2020, and since we now have an equation, we can predict that there will
be 793 million CFLs shipped in the year 2020. Of course, this is making
a lot of assumptions that we shouldn't make. In particular, we are
assuming that the trend is linear, and that it will continue far into
Method. You can find out the equation directly from the
tabled data, if you like. Select two cells like G5 and G6 and then start
typing in the formula: =LINEST(range) for the range, select all
of the known y values, then type the closing parenthesis, but don't hit
the Enter key. Instead, hit Control-Shift-Enter. You'll see the slope
and the intercept appear in these two cells.
Many trends are not linear. For example, human
population on the planet was fairly linear, but then it shot up, as
illustrated by the red line in the following figure:
There are several non-linear predictive
equations. We'll look at two, exponential equations and polynomial
equations, but you are advised to explore others.
Let's take that same historical CFL shipment data we used
above and apply some non-linear trendlines. Here is an exponential trendline.
It uses an equation that has the x value (the year) as an exponent. I
clicked on the new equation an selected "format trendline label" to display
the equation in scientific notation with six decimal points, since the
default does not give me enough precision for predicting.
Figure 24. Raw data with exponential trendline.
As we can see, there trendline is curved, not quite as much
as is indicated by the relatively high 2007 datum point, but it is still
The predictive equation is:
y = 1.598767 E -279 e ^ 3.226616 E -01 x
Recall that the capital E means "Times ten to the power of"
and that the lower case e is a constant approximately equal to 2.71828. In
Excel, I can then type in the following formula in any cell:
and by replacing "2010" with the year, get a prediction for
that year. The value for 2010 is 733 million CFLs, and the value for 2012 is
1.398 billion CFLs.
The predictive equation can be a polynomial. We saw that the
linear regression equation was
y = mx + b
A second-order, or quadratic, polynomial equation adds an x2
term, resulting in:
y = ax2 + bx + c
The graph of a quadratic equation of this form is typically
a parabola. Here is the same data with a second order polynomial trendline:
Figure 25. Second order polynomial trend line with equation.
It is possible to increase the order, adding an x3,
x4, or x5 term, if there is reason to believe such a
curve will be more accurate.
Modifying the Data
Sometimes, we suspect the data should be modified. In our
example, notice how high the value of 400 was for 2007. An analyst might
have reason to believe that this point was an outlier, and due to some
special circumstances, like a one-time marketing blitz, the high value of
this datum is throwing off the future prediction. Let's alter the data,
reducing that point to 300.
Table 2. Revised data.
Using the revised data and second-order polynomial
forecasting, we get:
Figure 26. The value of 2007 was changed from 400 to 300 in
the belief that this was an abnormal value.
Notice how Figure 10 is relatively close to the initial
prediction shown in the original US DOE source document.
There are many ways to transform and
adjust data, and in each instance the analyst should have a defensible line
of reasoning that justifies the transformation.
As with many forms of statistical
analysis, trendline extrapolation can be subject to deliberate attempts to
make the data suggest the analyst's bias. This is inappropriate. Where there
are alternative projections, it is best to present them with explanations of
each. For example, the following illustration shows several different paths
world population might take given different conditions explained by the
Figure 27. "World population according to five projection
scenarios, 1950 - 2150" from United Nations,
1998, p. 4.
Other Quantitative Methods
As seen on Excel's Trendline Options
dialog box, there are other types of trendlines that can be added, including
a logarithmic, power, and moving average trendline. The Analysis ToolPak
Add-in for Excel also has several forecasting tools. To access them, click
Data Analysis in the Data tab. You will see moving average, regression, and
exponential smoothing there, all of which can be used to forecast.
But don't stop there, Excel, like some
other programs for numerical manipulation, allows the user to directly
control the formulas used to derive values. We do not have to settle for the
default settings used in the Add Trendline feature of charts, but we can
instead perform the necessary calculations on the data directly.
For information on the methods in this
lesson, and others, such as the Box-Jenkins method, dynamic regression,
multiple regression, please consult a text on forecasting, such as the one
by Makridakis, Wheelwright, & Hyndman
Forecasting Exercise / Example
There is another spreadsheet that contains an example of using Microsoft
Excel's forecasting functions to predict future values. Using data on the number
of compact fluorescent lamps (CFLs) shipped, and on the recycling rate of CFLs
and their mercury content, this spreadsheet guides you through the process of
predicting the cumulative environmental discharge of toxic mercury from CFLs
that are not recycled (required download):.
Evaluating Trend Analysis &
Coefficient of Determination
If you apply a trendline to a graph using Excel, you
may notice that you can have both the trendline equation displayed, and the
value for r2. r2 is called the
coefficient of determination, and it is a measure of the proportion
of the variability that is explained by the trendline equation. If r2
= .74, then 74% of the fluctuation of the data corresponds to the trendline,
and 26% of the fluctuation deviates from what would have been predicted by
the trendline. Thus, an intermediary step in the evaluation of models for
trendline analysis would be to look at the value of the coefficient of
determination, but do not let this be the only criterion.
For example, consider the following sample data
Table 3. More raw data.
We could map a linear regress line to the historical data (Time = 1 to
8), but when we use Excel to map a 3rd order polynomial trendline, we find
the value for r2 is much higher, at R2 = 0.9787:
Figure 28. Finding a trend that fits: 3rd order polynomial?
This means that the trendline curve accounts for 97.87% of the variation
among the data points. That's pretty good. But we're not convinced it is
optimal, so we keep clicking options in Excel, finding that when we specify
a 6th order polynomial trendline, it accounts for 99.96% of the variation.
Figure 29. Finding a trend that fits: 6th order polynomial?
We are very impressed, and we decide to use this 6th order trendline to
predict future periods. Imagine our surprise when we extend the trendline
two periods to the future to result in the following:
Figure 30. That 6th order polynomial had a high coefficient of determination,
but was a poor predictor.
That can't be right. So we resolve to go back to the 3rd-order polynomial
and its predications:
Figure 31. A more reasonable prediction with the 3rd order
In all, one of the best ways to test predications is
to, well, wait. Testing a predication against the data that starts to come
in has no equal, but it does require waiting, and if we were willing to
wait, there would have been no reason to make a prediction in the first
place. Thus, it is wise to attempt to assess the strength of a predication
prior to the emergence of new data.
Trend extrapolation is one aspect of
the larger field of trend (or trendline) analysis. It attempts to extend
known data points to regions beyond the timeframe of known datapoints,
almost always in an attempt to predict future values with some degree of
probability. However, the assumptions made are critical.
For example, let's look at "Biomass as feedstock for a Bioenergy and
Bioproducts Industry: The Technical Feasibility of a Billion-Ton Annual
Supply," which was published in April 2005 under the auspices of the US
Department of Agriculture and the US Department of Energy (required visit):
Please look at the first page of the Executive Summary,
where the purpose is stated:
"The purpose of
this report is to determine whether the land resources of the United States
are capable of producing a sustainable supply of biomass sufficient to
displace 30 percent or more of the country's present petroleum
consumption.... The short answer to the question of whether that much
biomass feedstock can be produced is yes."
is worth repeating, "The short answer ... is yes." However, now look at the
assumptions under which this answer was made.
|"Important assumptions that were
made include the
- yields of corn, wheat, and other
small grains were increased by 50 percent;
- the residue-to-grain ratio for
soybeans was increased to 2:1;
- harvest technology was capable
of recovering 75 percent of annual crop residues (when removal
- all cropland was managed with
- 55 million acres of cropland,
idle cropland, and cropland pasture were dedicated to the
production of perennial bioenergy crops;
- all manure in excess of that
which can applied on-farm for soil improvement under anticipated
EPA restrictions was used for biofuel; and
- all other available residues
were utilized." (Page 2 of Executive Summary)
Are these assumptions reasonable? Are they likely?
What would the probability be that all of these assumptions would be shown,
over time, to be valid? Maybe a better approach would be for the authors to
say, "In order to achieve Level X, we would have to do Activities Y,
probability of Y leading to X determined to be ____ %. If we do less than Activities Y, such
as Activities Y2, the results would not be at Level X, but would
likely be at the lesser Level X2, with the same probability."
There are many additional resources online and in
"Forecasting by Smoothing Techniques from Professor Hossein Arsham at:
Try it out.
Examples in Technology Assessment
Nearly every typical technology assessment
project makes use of some type of forecasting technique, and while trend
extrapolation may not always be the technique chosen, it offers powerful
Rubin, D., & Stinson, J. (1986). Chapter 3: Forecasting. in Quantitative
approaches to management. NY: McGraw-Hill. (Available to those in the
BSU community at
S., Wheelwright, S., & Hyndman, R. (1998). Forecasting: Methods and
Applications.. 3rd ed.
New York: Wiley & Sons.
Nations. (1998). Long-range World Population Projections: Based on the
1998 Revision. Executive Summary: Author. Retrieved April 7, 2009 from
Nations. (1999). The World at Six Billion. New York: Author.
Retrieved April 7, 2009 from
States Department of Energy. (2009).
CFL Market Profile - March 2009. Washington, DC: Author. Retrieved
April 7, 2009 from