Saturday, August 14, 2010

Regression Analysis

Regression analysis is a statistical technique that is widely used in research. It is used to predict the behavior of a dependent variable from a set of independent variables. In regression analysis, the dependent variable can be metric or non-metric, and the independent variables can be metric, categorical, or a combination of both. Researchers use regression analysis in two forms: linear regression analysis and non-linear regression analysis. Linear regression analysis is further divided into two types, simple linear regression and multiple linear regression. In simple linear regression, there is one dependent variable and one independent variable. In multiple linear regression, there is one dependent variable and many independent variables. Non-linear regression analysis is likewise of two types, simple non-linear regression and multiple non-linear regression. When there is a non-linear relationship between the variables and only one dependent and one independent variable, it is called simple non-linear regression. When there is one dependent variable and two or more independent variables, it is called multiple non-linear regression.
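As a sketch of the simplest case, simple linear regression with one dependent and one independent variable, the ordinary least squares fit has a closed form that can be written in a few lines. The data below are made up for illustration only:

```python
def simple_linear_regression(x, y):
    """Fit y = a + b*x by ordinary least squares (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope b = covariance(x, y) / variance(x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x  # intercept
    return a, b

# Made-up data: y grows roughly 2 units per unit of x
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
a, b = simple_linear_regression(x, y)
print(a, b)  # intercept ~0.09, slope ~1.99
```

Multiple linear regression generalizes this to several independent variables, which requires matrix algebra rather than the two-line closed form above.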

There is a difference between linear and non-linear regression analysis. Linear regression analysis rests on a set of assumptions. These assumptions are as follows:

1. The residuals are normally distributed.
2. There is a linear relationship between the dependent and independent variables.
3. There is no multicollinearity, that is, no exact correlation between the independent variables.
4. There is no autocorrelation: the lagged value of the error term does not affect its current value.
5. There is homoscedasticity: the variance of the error term is constant across all levels of the independent variables.
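Two of these assumptions can be screened numerically. The sketch below (with made-up numbers) computes a Pearson correlation between two independent variables as a rough multicollinearity check, and the Durbin-Watson statistic, a standard autocorrelation check in which values near 2 suggest no autocorrelation:

```python
import math

def pearson_r(u, v):
    """Correlation between two independent variables;
    values near +1 or -1 warn of multicollinearity."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive
    differences over sum of squared residuals; near 2 means
    no autocorrelation, near 0 or 4 means strong autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # exactly collinear -> 1.0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # alternating residuals -> 3.0
```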

However, non-linear regression analysis does not carry assumptions such as no autocorrelation, no multicollinearity, or homoscedasticity. Non-linear regression is used when the data do not meet the assumptions of linear regression. Logistic regression is an example of non-linear regression.
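As a sketch of how logistic regression handles a non-linear relationship, the toy example below (made-up data, a single predictor) fits the model by simple stochastic gradient ascent on the log-likelihood; the logistic (sigmoid) function maps the linear predictor to a probability between 0 and 1:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=2000):
    """Fit P(y=1) = sigmoid(a + b*x) by stochastic gradient ascent."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            a += lr * (yi - p)        # gradient of log-likelihood w.r.t. a
            b += lr * (yi - p) * xi   # gradient w.r.t. b
    return a, b

# Made-up binary outcome: 0 for small x, 1 for large x
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic(x, y)
# predicted probability is low for x=0 and high for x=5
print(sigmoid(a + b * 0), sigmoid(a + b * 5))
```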

Most researchers use two methods to estimate the coefficients in regression analysis. The first is the OLS method, which stands for ordinary least squares. The second is the maximum likelihood method. The OLS method is used when there is a linear relationship between the dependent and independent variables; the maximum likelihood method can be used for non-linear relationships as well. When there is a non-linear relationship between the dependent and independent variables, many researchers transform the data into linear form and then apply the OLS method. The maximum likelihood method is quite mathematical, which is why many researchers have preferred the OLS method. But these days, computers can solve this problem quite easily, and researchers now use the OLS and maximum likelihood methods equally.
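The transform-then-OLS idea can be shown concretely. In the sketch below, the data (made up and noise-free) follow an exponential curve y = A·e^(bx); taking logarithms turns this into the linear relationship ln(y) = ln(A) + b·x, so ordinary least squares applies:

```python
import math

# Made-up exponential data: y = 2 * exp(0.5 * x), a non-linear relationship
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]

# Log-transform the dependent variable to linearize the relationship
log_y = [math.log(yi) for yi in y]

def ols(x, y):
    """Ordinary least squares for one predictor: y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

a, b = ols(x, log_y)
print(math.exp(a), b)  # recovers approximately 2 and 0.5
```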

Regression analysis has two types of variables: the dependent variable and the independent variables. The intercept term shows the expected value of the dependent variable when all the independent variables are zero, and the beta coefficient shows the rate of change: how much the dependent variable changes when one unit of the independent variable increases. R-square shows how much of the total variance in the dependent variable is explained by the independent variables. The t-test is used to test the significance of each variable.

If an independent variable is categorical in nature, the researcher must convert it into a dummy variable; for example, male and female are coded as 0 and 1. When the dependent variable is categorical in nature, simple regression cannot be used. In such situations, logistic regression is used. When the dependent variable has two categories, binary logistic regression can be used to predict the probability of each category. If the dependent variable has more than two categories, multinomial logistic regression is used, and when the categories are ordinal in nature, ordinal logistic regression is used to predict the probability of the dependent variable categories.

Regression analysis is also used very frequently in time series analysis. ARIMA, ARCH, VAR, and co-integration are examples of regression analysis in time series analysis.
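A small illustration of dummy coding and R-square, with all values made up: a male/female variable is coded 0/1, and with a single 0/1 dummy the OLS fitted values are simply the two group means, so R-square can be computed directly from the residual and total sums of squares:

```python
# Made-up categorical predictor and outcome
groups = ["male", "female", "female", "male", "female"]
y = [10.0, 14.0, 15.0, 11.0, 13.0]

# Dummy coding: male -> 0, female -> 1
dummy = [0 if g == "male" else 1 for g in groups]

# With one 0/1 dummy, OLS fitted values equal the group means
mean0 = sum(yi for yi, d in zip(y, dummy) if d == 0) / dummy.count(0)
mean1 = sum(yi for yi, d in zip(y, dummy) if d == 1) / dummy.count(1)
y_hat = [mean1 if d else mean0 for d in dummy]

def r_squared(y, y_hat):
    """R-square = 1 - (residual sum of squares / total sum of squares)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

r2 = r_squared(y, y_hat)
print(r2)  # about 0.85: the dummy explains ~85% of the variance
```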
