# Linear models and their application in R

Dr. Roger Mundry will hold a one-week intensive workshop on statistical methods for Ph.D. fellows and academic staff. The workshop is organized by MultiLing and ILN.

Photo: Markus Winkler / Unsplash

#### Admission and ECTS

All information on admission and ECTS is available on the registration page for the course HFILN9012.

#### Schedule

A detailed schedule of the workshop can be downloaded here.

#### Course Content

Linear models represent a flexible framework for analyzing the effects of one or several (quantitative or qualitative) predictors on a single response (which can be, e.g., continuous, a count, or binary). As such they encompass, for instance, linear regression, t-tests, ANOVA, ANCOVA, the Generalized Linear Model (e.g., logistic, Poisson, zero-inflated, or negative binomial models), and Mixed (a.k.a. multi-level) Models. Hence, linear models make it possible to address a huge variety of questions with various types of data, using a unified conceptual and statistical framework.

The course will cover all of the above, that is, linear models from simple regression to the Generalized Linear Mixed Model (GLMM). We will begin with simple linear regression and then explain how this concept can be extended to model the impact of multiple predictors, categorical predictors, interactions, and certain non-linear relationships (i.e., the 'general linear model'). We will then show how the general linear model can be expanded to the 'Generalized Linear Model' (e.g., logistic, Poisson, zero-inflated, or negative binomial regression). Finally, we will treat the (Generalized) Linear Mixed Model (i.e., models allowing the inclusion of grouping variables or 'random effects'). A further lesson will be devoted to the question of how to formulate scientifically meaningful models.

Throughout the course, we will put much emphasis on the conceptual meaning and interpretation of the models rather than on their 'mechanics' (i.e., the mathematical background). In practice, this means that we shall devote quite some time to understanding what such models reveal about 'life' (i.e., the process investigated) and particularly to understanding and interpreting interactions. In fact, an important component of the course is to teach how models and 'life' are linked, i.e., how one can put hypotheses and questions about life into models and what these can (and cannot) then reveal about it.

The course is mainly centered around a null-hypothesis significance testing framework, largely because this is still by far the most frequently used approach. However, the models themselves, i.e., how they are set up with regard to, for instance, interactions, fixed and random effects, random slopes, error and link function, as well as their meaning, interpretation, and limitations, are unaffected by the philosophy used to draw statistical inference.

#### Material

The course is accompanied by plenty of handouts, which will be made available before or during the course.

#### Structure

The course consists of roughly 50% theory and 50% practical applications, regularly interspersed, during which we shall work our way through various models. As part of that, participants will also learn how to plot the results of the models treated and how to describe them in the methods and results sections of a paper. Finally, I put much emphasis on assumptions and model diagnostics and how to evaluate them.

#### Requirements

The course requires some familiarity with general ideas and concepts of statistics and with the basic concepts of R. Regarding the former, participants should have some experience with applied statistics and be somewhat familiar with things like null-hypothesis significance testing, 'error level', etc. Regarding the latter, participants should have some experience with R and, for instance, know how to read a file into it, run some simple tests (e.g., t-test, ANOVA, or non-parametric tests), and create simple plots. A couple of weeks before the course begins, I'll make available two tutorials, one giving a general introduction to R and one an introduction to plotting in R, and I highly recommend that participants have a serious look at these (ca. 100 pages in total) before the course begins.

The individual lessons of the course build heavily upon one another. Hence, it is a requirement that every participant attends all of them in full (missing even just a few hours may make it very hard to catch up later). It will also probably pay off to invest extra time in going through the treated material again and in working on the exercises I may provide. Hence, I strongly advise keeping the period of the course as free of other obligations as possible.

#### Lecturer

Dr. Roger Mundry (Biostatistician, Leibniz-Science Campus, Primate Cognition)