# Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH

Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH sets a strong foundation, in terms of distribution theory, for the linear model (regression and ANOVA), univariate time series analysis (ARMAX and GARCH), and some multivariate models associated primarily with modeling financial asset returns (copula-based structures and the discrete mixed normal and Laplace). It builds on the author's previous book, Fundamental Statistical Inference: A Computational Approach, which introduced the major concepts of statistical inference. Attention is explicitly paid to application and numeric computation, with examples of MATLAB code throughout. The code offers a framework for discussion and illustration of numerics, and shows the mapping from theory to computation.

## Linear Models and Time-Series Analysis: Regression, ANOVA, ARMA and GARCH

By contrast, dynamic models use lagged predictors to incorporate feedback over time. There is nothing in the classical linear model (CLM) assumptions that explicitly excludes predictors with lags or leads. Indeed, lagged exogenous predictors x_{t-k}, free from interactions with the innovations e_t, do not, in themselves, affect the Gauss-Markov optimality of OLS estimation. If predictors include proximate lags x_{t-k}, x_{t-k-1}, x_{t-k-2}, ..., however, as economic models often do, then predictor interdependencies are likely to be introduced, violating the CLM assumption of no collinearity and producing associated problems for OLS estimation. This issue is discussed in the example Time Series Regression II: Collinearity and Estimator Variance.
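The collinearity problem described above can be illustrated with a short simulation. The sketch below (in Python with NumPy rather than MATLAB, purely for illustration; the data-generating process and parameter values are assumptions, not taken from the text) builds a persistent AR(1) predictor, stacks three proximate lags of it into a design matrix, and shows that adjacent lag columns are strongly correlated, which inflates the diagonal of (X'X)^{-1} and hence the OLS estimator variance.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# A persistent AR(1) exogenous predictor (coefficient 0.9 is an assumption):
# proximate lags of x are strongly correlated with one another by construction.
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()

# Design matrix from three proximate lags x_{t-1}, x_{t-2}, x_{t-3}.
X = np.column_stack([x[2:-1], x[1:-2], x[0:-3]])

# Sample correlation between adjacent lag columns is close to the
# AR coefficient, signalling near-collinearity.
corr = np.corrcoef(X, rowvar=False)
print(corr[0, 1])

# Estimator variance is proportional to the diagonal of (X'X)^{-1};
# near-collinear columns inflate these entries relative to an
# orthogonal design of the same scale.
XtX_inv = np.linalg.inv(X.T @ X)
print(np.diag(XtX_inv))
```

The same experiment could be run in MATLAB with `corrcoef` and `inv`; the point is only that lag structure alone, with no other modeling choices, manufactures the predictor interdependence the CLM assumptions rule out.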

Subsequent examples in this series consider linear regression models, built from a small set of potential predictors and calibrated to a rather small set of data. Still, the techniques, and the MATLAB toolbox functions considered, are representative of typical specification analyses. More importantly, the workflow, from initial data analysis, through tentative model building and refinement, and finally to testing in the practical arena of forecast performance, is also quite typical. As in most empirical endeavors, the process is the point.

Multiple linear regression models assume that a response variable is a linear combination of predictor variables, a constant, and a random disturbance. If the variables are time series processes, then classical linear model assumptions, such as spherical disturbances, might not hold. For more details on time series regression models and their departures from classical linear model assumptions, see Time Series Regression I: Linear Models.

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.[3] Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.

Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications.[4] This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.
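The contrast between plain least squares and its penalized variants can be made concrete with the ridge closed form. The sketch below (a minimal illustration in Python/NumPy; the simulated data and the penalty value lam are assumptions) fits OLS and ridge on the same design and shows that the L2 penalty shrinks the coefficient vector toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.standard_normal((n, p))
beta = np.array([2.0, -1.0, 0.5, 0.0, 3.0])  # assumed true coefficients
y = X @ beta + 0.5 * rng.standard_normal(n)

# Ordinary least squares: minimize ||y - X b||^2.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: minimize ||y - X b||^2 + lam * ||b||^2,
# which has the closed form b = (X'X + lam I)^{-1} X'y.
lam = 10.0
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The L2 penalty shrinks the coefficient vector toward zero.
print(np.linalg.norm(b_ridge) < np.linalg.norm(b_ols))  # True
```

The lasso (L1 penalty) has no such closed form and is typically fit by coordinate descent, which is one reason ridge is the usual first illustration of penalized least squares.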

Standard linear regression models with standard estimation techniques make a number of assumptions about the predictor variables, the response variables and their relationship. Numerous extensions have been developed that allow each of these assumptions to be relaxed (i.e. reduced to a weaker form), and in some cases eliminated entirely. Generally these extensions make the estimation procedure more complex and time-consuming, and may also require more data in order to produce an equally precise model.

Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is

y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + ... + β_p x_{ip} + ε_i,    i = 1, ..., n,

where y_i is the response for observation i, the x_{ij} are the values of the p predictors, the β_j are unknown parameters, and ε_i is a random disturbance.

Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term, multivariate linear regression, refers to cases where y is a vector, i.e., the same as general linear regression.
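A minimal fit of a multiple regression model, with a scalar response per observation as described above, can be sketched as follows (Python/NumPy for illustration; the two-predictor design and coefficient values are assumptions made up for the example).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Two predictors plus a constant; the response y is still a scalar
# for each observation (multiple, not multivariate, regression).
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
beta_true = np.array([1.0, 2.0, -0.5])      # assumed true parameters
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Least-squares estimate: b = argmin ||y - X b||^2.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat)  # close to [1.0, 2.0, -0.5]
```

In MATLAB the same fit is `X \ y`; either way the response is a length-n vector of scalars, not a matrix, which is what separates multiple from multivariate regression.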

Various models have been created that allow for heteroscedasticity, i.e. the errors for different response variables may have different variances. For example, weighted least squares is a method for estimating linear regression models when the response variables may have different error variances, possibly with correlated errors. (See also Weighted linear least squares, and Generalized least squares.) Heteroscedasticity-consistent standard errors is an improved method for use with uncorrelated but potentially heteroscedastic errors.
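Weighted least squares can be demonstrated directly from its normal equations. The sketch below (Python/NumPy; the variance-growing-with-x error structure and the coefficient values are assumptions for illustration) generates heteroscedastic data and solves b = (X'WX)^{-1} X'Wy with weights equal to the inverse error variances.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])

# Heteroscedastic errors: standard deviation grows with x (an assumption).
sigma = 0.2 * x
y = X @ np.array([1.0, 2.0]) + sigma * rng.standard_normal(n)

# Weighted least squares: weight each observation by 1/variance,
# b = (X' W X)^{-1} X' W y with W = diag(1 / sigma^2).
w = 1.0 / sigma**2
XtWX = X.T @ (w[:, None] * X)
XtWy = X.T @ (w * y)
b_wls = np.linalg.solve(XtWX, XtWy)
print(b_wls)  # close to [1.0, 2.0]
```

When the weights are unknown, they must themselves be estimated (feasible GLS), or one falls back on OLS with heteroscedasticity-consistent standard errors as the text notes.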

Hierarchical linear models (or multilevel regression) organize the data into a hierarchy of regressions, for example where A is regressed on B, and B is regressed on C. They are often used where the variables of interest have a natural hierarchical structure, such as in educational statistics, where students are nested in classrooms, classrooms are nested in schools, and schools are nested in some administrative grouping, such as a school district. The response variable might be a measure of student achievement such as a test score, and different covariates would be collected at the classroom, school, and school district levels.
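The nesting structure can be motivated with a short simulation. The sketch below (Python/NumPy; all numbers — 20 classrooms, 30 students each, a school-wide mean of 70, and the two variance components — are assumptions invented for the example, and this is a variance decomposition rather than a full multilevel fit) shows that classroom means vary far more than within-classroom sampling noise alone would explain, which is exactly the situation a hierarchical model is built for.

```python
import numpy as np

rng = np.random.default_rng(4)
n_classes, n_students = 20, 30

# Level-2 (classroom) intercepts drawn around an assumed school-wide mean of 70.
class_effect = rng.normal(0.0, 5.0, n_classes)

# Level-1 (student) scores: classroom intercept plus student-level noise.
scores = 70.0 + class_effect[:, None] + rng.normal(0.0, 10.0, (n_classes, n_students))

# Variance across classroom means vs. the sampling variance a single
# classroom mean would have if there were no classroom effect at all.
class_means = scores.mean(axis=1)
between_var = class_means.var(ddof=1)           # variance across classrooms
within_var = scores.var(axis=1, ddof=1).mean()  # average within-classroom variance
print(between_var, within_var / n_students)
```

Because the between-classroom variance clearly exceeds the within-classroom sampling variance of a mean, pooling all students into one regression would understate uncertainty; a multilevel model assigns the excess to classroom-level random intercepts.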

Errors-in-variables models (or "measurement error models") extend the traditional linear regression model to allow the predictor variables X to be observed with error. This error causes standard estimators of β to become biased. Generally, the form of bias is an attenuation, meaning that the effects are biased toward zero.
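The attenuation bias described above is easy to exhibit in simulation. In the sketch below (Python/NumPy; the true slope of 2 and the unit-variance measurement error are assumptions for the example), the OLS slope computed from the error-contaminated predictor converges to beta * var(x) / (var(x) + var(error)), i.e., it is biased toward zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
beta = 2.0  # assumed true slope

x_true = rng.standard_normal(n)
y = beta * x_true + 0.5 * rng.standard_normal(n)

# The predictor is observed with measurement error of variance 1.
x_obs = x_true + rng.standard_normal(n)

# OLS slope using the contaminated predictor.
b_hat = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Classical attenuation: plim b_hat = beta * var(x) / (var(x) + var(err)),
# which is 2 * 1 / (1 + 1) = 1 here — biased toward zero.
print(b_hat)
```

Errors-in-variables estimators (e.g., instrumental variables or method-of-moments corrections) are designed to undo exactly this attenuation.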

Generalized linear models, including logistic regression, Poisson regression, and contingency tables. Random effects and repeated measures. Modern regression techniques. Regression and classification trees. Neural networks. (3 Credits)

This course will cover statistical models and methods relevant to the analysis of financial data. Topics covered will include modeling and estimation of data from heavy-tailed distributions, models and inference with multivariate copulas, linear and non-linear time series analysis, and statistical portfolio modeling. Applications from finance will be used to illustrate the methods. (3 Credits)

The course is designed for graduate students interested in quantitative research to learn about regression models for data analysis. Topics include estimation and inference, diagnostics, model selection, and interpretation of results associated with linear models and general linear models. Additional topics may vary with the instructor. (3 Credits)

This is an advanced introduction to regression modeling and prediction, including traditional and modern computationally-intensive methods. The following topics will be covered: (1) Theory and practice of linear models, including the relevant distribution theory, estimation, confidence and prediction intervals, testing, model and variable selection, generalized least squares, robust fitting, and diagnostics; (2) Generalized linear models, including likelihood formulation, estimation and inference, diagnostics, and analysis of deviance; and (3) Large and small-sample inference as well as inference via the bootstrap, cross-validation, and permutation tests. (4 Credits)

This is an advanced introduction to the analysis of multivariate and categorical data. Topics include: (1) dimension reduction techniques, including principal component analysis, multidimensional scaling and extensions; (2) classification, starting with a conceptual framework developed from cost functions, Bayes classifiers, and issues of over-fitting and generalization, and continuing with a discussion of specific classification methods, including LDA, QDA, and KNN; (3) discrete data analysis, including estimation and testing for log-linear models and contingency tables; (4) large-scale multiple hypothesis testing, including Bonferroni, Westfall-Young and related approaches, and false discovery rates; (5) shrinkage and regularization, including ridge regression, principal component regression, partial least squares, and the lasso; (6) clustering methods, including hierarchical methods, partitioning methods, K-means, and model-based clustering. (4 Credits)

Wishart distribution, multivariate linear models, multivariate regression, Hotelling's T-square and its applications, discriminant analysis, canonical correlations, principal components analysis, growth curves. (3 Credits)

Gauss-Markov theorem; one-way, two-way analysis of variance, and complete higher-way layouts; regression; the general linear model and hypothesis; least squares theory; analysis of covariance; missing observations; multiple comparisons procedures; incomplete blocks, split plot designs, and Latin squares; variance component models, mixed models; treatment of residuals; robustness of the methods. Special topics in the second semester. (3 Credits)