The course begins with simple and multiple linear regression models, for which fitting, parameter and model inference, and prediction are explained. Topics covered include least squares (LS) and generalized LS, the Gauss-Markov theorem, and the geometry of least squares and orthogonal projections. Special attention is paid to diagnostic strategies, which are a key component of good model fitting. Further topics include transformations and weighting to correct model inadequacies, multicollinearity, variable subset selection and model-building techniques. Later in the course, general strategies for regression modelling are presented, with a particular focus on generalized linear models (GLMs), illustrated with binary and count response variables.
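To give a flavour of the practical side of these topics, here is a minimal sketch of fitting an LS model and a binary-response GLM on simulated data, assuming Python with the statsmodels package (the software actually used in the course may differ):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    # Simulated predictor and continuous response for ordinary least squares.
    x = rng.normal(size=100)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

    X = sm.add_constant(x)        # design matrix with an intercept column
    ols_fit = sm.OLS(y, X).fit()  # LS estimate: beta_hat = (X'X)^{-1} X'y
    print(ols_fit.summary())      # coefficients, t-tests, R^2, diagnostics

    # Binary response: logistic regression as a GLM with a binomial family.
    prob = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
    yb = rng.binomial(1, prob)
    glm_fit = sm.GLM(yb, X, family=sm.families.Binomial()).fit()
    print(glm_fit.params)         # estimated intercept and slope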
High-dimensional data, orders of magnitude larger than those for which classical regression theory was designed, are nowadays the rule rather than the exception in computer-age practice; examples include information technology, finance, genetics and astrophysics, to name just a few. The course therefore presents regression methodologies that cope with high dimensionality. The emphasis is placed on methods that control the regression fit by regularization (Ridge, Lasso and Elastic-Net), as well as on methods using derived input directions (Principal Components Regression and Partial Least Squares), which tamp down statistical variability in high-dimensional estimation and prediction problems.
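As an illustrative sketch only, the shrinkage and derived-direction methods above might be tried on a problem with more predictors than observations, assuming Python with scikit-learn (again, the course software may differ):

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression
    from sklearn.decomposition import PCA
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(1)
    n, p = 50, 200                       # more predictors than observations
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0                       # sparse true coefficient vector
    y = X @ beta + rng.normal(size=n)

    # Regularized fits: the penalty strength alpha controls the
    # bias-variance trade-off.
    for model in (Ridge(alpha=1.0), Lasso(alpha=0.1),
                  ElasticNet(alpha=0.1, l1_ratio=0.5)):
        print(type(model).__name__, model.fit(X, y).score(X, y))

    # Derived input directions: regress on a few principal components (PCR),
    # or on components chosen to covary with the response (PLS).
    pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)
    pls = PLSRegression(n_components=5).fit(X, y)
    print("PCR R^2:", pcr.score(X, y), " PLS R^2:", pls.score(X, y))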
A number of statistical learning procedures, with a focus on computer-based algorithms, are presented from a regression perspective.
Computer-aided project work with a variety of datasets forms an essential learning activity.