Home > Multiple Regression Analysis and Forecasting User Guide > Inputting Data Help Topic Share

Inputting Data

Help for inputting data in the multiple regression analysis and forecasting template for Excel


User Guide Navigation:
Business Spreadsheets User Guides: Project Planning and Management Multiple Regression Analysis and Forecasting Investment and Business Valuation Real Options Valuation Portfolio Optimization Portfolio Performance Tracking Customer Invoicing
Multiple Regression Analysis and Forecasting Help Topics: Inputting Data Analyzing the Output Forecasting the Dependent Variable

Arriving at suitable input data for the Regression Forecasting model can be somewhat of an iterative process. Depending on the strength of the resulting output, independent variables can eliminated, replaced, or added until a statistically sound and satisfactory predictive regression equation eventuates. In selecting suitable independent variables, several factors should be taken into consideration:

 

Data should be input into the 'Input' sheet. The Dependent variable data and title should be entered into column A, and the Independent variables’ data and titles should be entered into columns B outward. There is no limit to the number of independent variables or observations and additional variables can be accommodated across the model automatically. Row formatting can be copied down is desired for observations.

 

Text based categorical and logistic/binary data types are supported and are automatically detected and converted to continuous data proxies for regression. The numerical variables created from categorical and logistic variables are added as additional variables to retain the mapping with the original variable columns which are marked OFF to be excluded from the analysis.

 

Feature Selection

Feature selection automatically identifies the combination of input variables that provides the best fit predictive regression equation without multicollinearity. The Minimum R-squared is first applied to each independent variable against the dependent and those under the R-squared are flagged as OFF. A value of zero will ignore this part of the feature selection. The Variance Inflation Factor (VIF) is used to test the adjusted R-Squared relationship of each independent variable with all other independent variables. An iterative process is undertaken to disable variables where the maximum VIF calculated as 1/(1-RSQ) to over this threshold. A rule of thumb for the VIF value is 10 lower values disabling more variables and higher values being more lenient on the multicollinearity tolerance. A VIF value of zero will ignore this part of the feature selection.

 

Variables can also be flagged as OFF manually for testing. Setting both feature selection variables to zero essentially disabled the feature selection and forces all numeric variables to be processed. The results will then highlight relative impact and multicollinearity for variables to then be manually turned OFF.


Screenshot: Inputting Data