Consumer Demand Forecasting: Popular Techniques. Part 3: Regression Analysis
Author: Eyal Eckhaus, posted on 6/22/2010, in category "Logistics"
Abstract:
Regression analysis is a popular and efficient method for forecasting, used in various fields. This article demonstrates some basic types of regression analysis and provides numerical examples.
1. Regression analysis
Regression analysis can be used to assess the relationship between one or more independent or predictor variables and the dependent or response variable – which is the variable we want to predict [1]. Therefore, it can not only provide a forecast, but also explain the relationship between dependent variables and independent factors [2].
Multiple regression analysis involves two or more predictor variables [1]. It is a slightly more advanced forecasting method, but is believed to be the most accurate when used correctly [3]. This analysis is widely accepted in various disciplines, such as business, economics, engineering, and the social and biological sciences [4]. It can also be used for time-series analysis [5], and has become increasingly popular in both basic and applied research journals, with the advantage of flexibility in testing complex associations among different variables [6]. Several software packages exist today to compute the statistics required for the regression process [4].
2. Linear regression
As described by Han, Kamber, and Pei [1], linear regression involves finding a line that fits the relationship between two variables, so that by inputting one variable, the other can be predicted (multiple linear regression is an extension of linear regression in which more than two variables are involved). It is the simplest form of regression, involving a response variable y and a predictor variable x, forming the equation $y = b + wx$, where b is the intercept and w is the slope of the line. These are the regression coefficients, which can also be thought of as weights: $y = w_0 + w_1 x$.
A common goodness-of-fit check is R², the fraction of variance explained by the model. Its value ranges from 0 to 1: the higher the value, the better the linear regression fits the data [7].
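For reference, the standard least-squares estimates of the two coefficients and the definition of R² can be written as follows. The notation is assumed here rather than taken from the cited sources: x̄ and ȳ denote the sample means, and ŷ_i = w_0 + w_1 x_i is the fitted value for the i-th data pair.

```latex
w_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
w_0 = \bar{y} - w_1 \bar{x},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```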
Weighted linear regression (weighted least-squares regression)
Weighted linear regression is a simple improvement that accounts for differing uncertainties in the data (in the cited study, time-varying uncertainties in shoreline position estimates), placing greater emphasis on more reliable data [8]. Many techniques exist for assigning weights to the data pairs. One common technique uses the mean of their inverse variance [9].
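In least-squares terms, weighting can be sketched as minimizing a weighted sum of squared residuals. The symbol p_i for the weight of the i-th data pair is an assumption introduced here, chosen to avoid confusion with the coefficients w_0 and w_1.

```latex
\min_{w_0,\, w_1} \; \sum_{i=1}^{n} p_i \,\bigl(y_i - w_0 - w_1 x_i\bigr)^2
```

When all p_i are equal, this reduces to ordinary (unweighted) linear regression.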
3. Polynomial regression
Other types of regression can model a relationship between a response variable and a predictor that is not a straight line. A good example is polynomial regression, a nonlinear model that can easily be converted to a linear one by transforming the variables [1]. It is more flexible than standard regression and easy to implement [10]: the higher the polynomial's order, the greater the flexibility of the estimated function [11]. It can be seen as a special case of multiple linear regression [12]. It is preferred for smooth curves [13], and is the most commonly used method for meta-models of mechanical systems [14].
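The conversion to a linear model can be illustrated for a third-order polynomial, following the variable transformation described in [1]. The names x_1, x_2 and x_3 are only labels for the transformed variables.

```latex
y = w_0 + w_1 x + w_2 x^2 + w_3 x^3
\quad\xrightarrow{\; x_1 = x,\; x_2 = x^2,\; x_3 = x^3 \;}\quad
y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3
```

The second form is linear in the coefficients, so it can be fitted as a multiple linear regression.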
4. Examples
Example 1 – Simple linear regression
Table 1 shows pairs of years of experience (x) and monthly pay (y) of employees in a certain job, where each pair is given an equal weight. Figure 1 illustrates the same data graphically. (The charts were produced online here.)
Table 1: x values - years of experience, y values - pay.
Employee | Weight | x values – years of experience | y values – pay ($ thousands) |
A | 1 | 1 | 10 |
B | 1 | 2 | 15 |
C | 1 | 3 | 11 |
D | 1 | 3 | 14 |
E | 1 | 4 | 30 |
F | 1 | 6 | 28 |
G | 1 | 6 | 30 |
H | 1 | 7 | 35 |
I | 1 | 9 | 40 |
J | 1 | 9 | 42 |
Figure 1: Years of experience (in brackets next to name) and pay
Although no single straight line passes through all the data points, a pattern can be seen. Calculating the regression line (try it yourself here) produces the following linear regression equation:
Y = 4.04166666666666X + 5.29166666666667
R² = 0.896095238095238
Using this equation, we can predict an employee's pay for a given number of years of experience in this job, with a rather high R² of about 0.9. For example, the predicted pay for 5 years of experience is 4.04166666666666 × 5 + 5.29166666666667 = 25.5, that is, $25.5k.
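The calculation can be reproduced with a few lines of Python. This is a minimal sketch using the closed-form least-squares formulas for a single predictor; the function name fit_line is illustrative and is not the tool the article used.

```python
# Data from Table 1: years of experience (x) and pay in $ thousands (y).
xs = [1, 2, 3, 3, 4, 6, 6, 7, 9, 9]
ys = [10, 15, 11, 14, 30, 28, 30, 35, 40, 42]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R squared equals the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(xs, ys)
print(f"Y = {slope:.4f}X + {intercept:.4f}, R2 = {r_squared:.4f}")
# prints: Y = 4.0417X + 5.2917, R2 = 0.8961
print(f"Predicted pay at 5 years: {slope * 5 + intercept:.1f}")
# prints: Predicted pay at 5 years: 25.5
```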
Example 2 – Weighted linear regression
In Table 2, the weight of the last five employees (F to J) is increased by one, from 1 to 2.
Table 2: Assigning a different weight
Employee | Weight | x values – years of experience | y values – pay ($ thousands) |
A | 1 | 1 | 10 |
B | 1 | 2 | 15 |
C | 1 | 3 | 11 |
D | 1 | 3 | 14 |
E | 1 | 4 | 30 |
F | 2 | 6 | 28 |
G | 2 | 6 | 30 |
H | 2 | 7 | 35 |
I | 2 | 9 | 40 |
J | 2 | 9 | 42 |
Recalculating the regression line (try it yourself here) produces the following linear regression equation:
Y = 4.01394422310757X + 5.38579017264277
R² = 0.918406238784453
R² is higher than the previous, unweighted value (about 0.918 versus 0.896).
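The weighted fit can be sketched in the same way, assuming that the weights simply multiply each data pair's contribution to the means and to the sums of squares (weighted least squares). The function name fit_weighted_line is illustrative.

```python
# Data and weights from Table 2.
xs = [1, 2, 3, 3, 4, 6, 6, 7, 9, 9]
ys = [10, 15, 11, 14, 30, 28, 30, 35, 40, 42]
weights = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]


def fit_weighted_line(xs, ys, weights):
    """Return (slope, intercept, r_squared) of the weighted least-squares line."""
    total = sum(weights)
    # Weighted means of x and y.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted sums of squares and cross-products of deviations.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    syy = sum(w * (y - mean_y) ** 2 for w, y in zip(weights, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_weighted_line(xs, ys, weights)
print(f"Y = {slope:.4f}X + {intercept:.4f}, R2 = {r_squared:.4f}")
# prints: Y = 4.0139X + 5.3858, R2 = 0.9184
```

With whole-number weights such as these, the result is the same as an unweighted fit in which employees F to J each appear twice in the data.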
Example 3 – Polynomial regression
The data from Table 1 were fitted with polynomials of orders 1 to 5. (A 10-row table was generated and filled in once, and only the polynomial order was changed for each run; check it yourself here.) The results are as follows:
Order of polynomial - 1: as shown in Example 1, the results are:
Y = 4.04166666666666X + 5.29166666666667
R² = 0.896095238095238
Order of polynomial - 2:
Y = -0.060281766458953X² + 4.66960173394745X + 4.09306421024132
R² = 0.897117157564733
Order of polynomial - 3:
Y = -0.0454199667044577X³ + 0.64058363686047X² + 1.62704232791248X + 7.45710739578317
R² = 0.899951889463559
Order of polynomial - 4:
Y = 0.0589921881447424X⁴ - 1.20006492704488X³ + 8.07456457670924X² - 16.1079489242256X + 19.8235790256076
R² = 0.908300163716241
Order of polynomial - 5:
Y = -0.0107778110196932X⁵ + 0.317497838832423X⁴ - 3.4701816932502X³ + 17.0168727937853X² - 31.4029423432658X + 28.3985238345922
R² = 0.909248297284204
The above example shows a gradual improvement of R² with each increase in order. Because each higher-order polynomial contains the lower-order one as a special case, R² on the fitted data cannot decrease as the order rises.
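The polynomial fits can be explored with the sketch below. It assumes ordinary least squares solved through the normal equations, and it uses exact rational arithmetic (Python's fractions.Fraction) so that the high powers of x do not cause rounding problems. The helper names solve and fit_polynomial are illustrative and are not the tool the article used; the printed coefficients can be compared with the equations listed for each order.

```python
from fractions import Fraction

# Data from Table 1: years of experience (x) and pay in $ thousands (y).
xs = [1, 2, 3, 3, 4, 6, 6, 7, 9, 9]
ys = [10, 15, 11, 14, 30, 28, 30, 35, 40, 42]


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def fit_polynomial(xs, ys, order):
    """Return (coefficients, r_squared); coefficients[k] multiplies X**k."""
    fx = [Fraction(x) for x in xs]
    fy = [Fraction(y) for y in ys]
    size = order + 1
    # Normal equations: sum_k c_k * sum(x**(j+k)) = sum(x**j * y) for each j.
    matrix = [[sum(x ** (j + k) for x in fx) for k in range(size)]
              for j in range(size)]
    rhs = [sum(x ** j * y for x, y in zip(fx, fy)) for j in range(size)]
    coeffs = solve(matrix, rhs)
    mean_y = sum(fy) / len(fy)
    ss_res = sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
                 for x, y in zip(fx, fy))
    ss_tot = sum((y - mean_y) ** 2 for y in fy)
    return [float(c) for c in coeffs], float(1 - ss_res / ss_tot)


results = {order: fit_polynomial(xs, ys, order) for order in range(1, 6)}
for order, (coeffs, r_squared) in results.items():
    # Coefficients are listed from the constant term up to the highest power.
    print(order, [round(c, 4) for c in coeffs], round(r_squared, 4))
```

Solving the normal equations directly is chosen here only to keep the sketch dependency-free; numerical libraries typically use more stable factorizations for the same problem.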
5. Summary
Regression analysis is used to examine the relationship between one or more independent (predictor) variables and the dependent (response) variable, and is widely used as a forecasting method in various fields. There are many types of regression analysis. Some basic types (simple linear regression, weighted linear regression and polynomial regression) have been demonstrated with examples.
You can generate these types of regression with purchasesmarter.com's free hands-on regression equation development utility, or browse other free hands-on utilities.
This is the third in the set of articles describing forecasting techniques:
Part 1 introduces the demand forecasting issue and demonstrates the weighted and unweighted moving average techniques.
Part 2 demonstrates the simple exponential smoothing technique.
Part 4 discusses selection among all techniques.
Bibliography
[1] Han, J., M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Second Edition. 2006: Morgan Kaufmann.
[2] Oliver, S.A., et al., Forecasting readiness: Regression analysis techniques. Air Force Journal of Logistics, 2001. 25(3): p. 1,29+.
[3] Dawood, N. and W. Bates, Creation of a cost index and forecasting methodology for process/heavy civil engineering. AACE International Transactions, 1997: p. 137-144.
[4] Salaverry, J.A. and E.D. White III, Improving procurement through regression analysis: a case study of predicting Argentine jet fuel prices. Journal of Public Procurement, 2009. 9(1): p. 1-16.
[5] Weisel, J.A., Forecasting with Excel. Journal of Accountancy, 2009. 207(2): p. 62-67,14.
[6] Hoyt, W.T., S. Leierer, and M.J. Millington, Analysis and Interpretation of Findings Using Multiple Regression Techniques. Rehabilitation Counseling Bulletin, 2006. 49(4): p. 223-233.
[7] Kumar, S., A.D. Wolfe, and K.A. Wolfe, Using Six Sigma DMAIC to improve credit initiation process in a financial services operation. International Journal of Productivity and Performance Management, 2008. 57(8): p. 659-676.
[8] Ruggiero, P. and J.H. List, Improving Accuracy and Statistical Reliability of Shoreline Position and Change Rate Estimates. Journal of Coastal Research, 2009. 25(5): p. 1069-1081.
[9] Barron, U.G., et al., Estimation of Prevalence of Salmonella on Pig Carcasses and Pork Joints, Using a Quantitative Risk Assessment Model Aided by Meta-Analysis. Journal of Food Protection, 2009. 72(2): p. 274-285.
[10] Kalyanam, K. and T.S. Shively, Estimating irregular pricing effects: A stochastic spline regression approach. JMR, Journal of Marketing Research, 1998. 35(1): p. 16-29.
[11] Magee, L., Nonlocal behavior in polynomial regressions. The American Statistician, 1998. 52(1): p. 20-22.
[12] Sopek, P., The effect of financial crisis on Croatia's primary deficit and public debt. Financial Theory and Practice, 2009. 33(3): p. 273-298.
[13] Whiteside II, J.D., Developing Estimating Models. Cost Engineering, 2004. 46(9): p. 23-30.
[14] Salagame, R.R. and R.R. Barton, Factorial hypercube designs for spatial correlation regression. Journal of Applied Statistics, 1997. 24(4): p. 453-473.
Copyright © Purchasesmarter.com. All rights reserved. The material may not be published, rewritten, broadcast, or redistributed. Any individuals or organizations reproducing it in whole or in part will be held liable for copyright infringement to the full extent of the law.