R – Complete package

Content

Introduction to R • Introduction to modern statistics • Linear regression • ANOVA • Advanced regression • Visualisation & data exploration • Cluster analysis • Structural equation models (SEM) • Survival analysis • Biomarker data • Machine learning • Artificial intelligence
10 course days – SEK 50,000

R 1 - Introduction to R and to modern statistics – 2 days

This course helps you get started with R. We’ll cover the basics of R, ranging from importing and handling data to visualisation. You’ll learn about two fundamental tools in statistical analysis: hypothesis tests and confidence intervals. We’ll also discuss important concepts like p-values, power, and sample size calculations.

The popular tidyverse package is used for filtering, cleaning, and preparing data for analysis. The powerful plotting capabilities of the ggplot2 package are also covered. Both basic statistical concepts and fundamental topics in R programming are discussed. This course is a great fit if you’re curious about R, or already know that you want to use its many tools for advanced data analysis. Classical statistical tests like the t-test, nonparametric tests, and the chi-squared test are covered, along with modern computer-intensive methods like the bootstrap. The latter allows us to obtain p-values and confidence intervals without many of the constraints of traditional methods (such as requiring that the data follow a normal distribution), bringing your statistical toolbox into the 21st century.

Course goals: To be able to use R to import and wrangle data, and to describe data using graphs and tables. To understand the basics of hypothesis testing and confidence intervals, and to be able to use R to run common tests and compute common intervals.

Prerequisites: Basic computer skills.

R 2 - Linear regression, ANOVA & advanced regression models – 2 days

This course provides you with a solid understanding of modern linear regression and ANOVA models. It also covers some common but advanced regression models for dealing with categorical data and repeated measurements.

We will have a closer look at how these models work and how R can be used to build, visualise, and interpret such models. We will use modern techniques like the bootstrap and permutation tests to obtain confidence intervals and p-values without having to assume a normal distribution for your data. We will cover generalised linear models like logistic regression and Poisson regression, where the response variable can be binary (yes/no), a count, or a prevalence. Mixed models are used to analyse data with repeated measurements on the same subjects. We will also learn about methods for dealing with missing data.

Course goals: To be able to use R to fit, visualise, and interpret linear regression and ANOVA models. To understand how to visualise and interpret models for logistic regression, count regression, and mixed models. To be able to fit models with missing data.

Prerequisites: R1 or similar.

R 3 - Visualisation, data exploration, cluster analysis & SEM – 2 days

This course will teach you how to visually explore data in R, and how to create great-looking graphics using the powerful ggplot2 package. We’ll also discuss cluster analysis, including hierarchical and centroid-based methods; factor analysis and structural equation models (SEM), which are used to measure and analyse relationships between observed and hidden variables; and mediation analysis.

Topics covered include outlier detection, visualisation of trends, and visualisation of multivariate data. The course also covers dimension reduction of complex data using principal component analysis (PCA). Cluster analysis is used to find subgroups in exploratory analyses of your data. SEM allows us to study causal relationships between variables in our data and latent (unobservable) variables, such as difficult-to-measure attitudes. Mediation analysis is used to understand the mechanism behind causal relationships.

Course goals: To be able to use the R package ggplot2 to visualise and explore data. To be able to use cluster analysis to find subgroups in your data, and to use SEM to study causal relationships between variables.

Prerequisites: R1 or similar.

R 4 - Survival analysis and biomarker data / Statistics in medicine – 2 days

This course will cover methods for survival analysis, including visualisation techniques such as Kaplan-Meier plots and regression models such as Cox proportional hazards regression. On the second day you will learn how best to analyse biomarker data, which has become a vital part of modern medicine.

Many studies are concerned with the time until an event happens: time until an individual contracts a disease, time until a patient diagnosed with a disease dies, and so on. On day one of this course we study methods for survival analysis used for analysing such data. Examples include visualisation techniques such as Kaplan-Meier plots and regression models such as Cox proportional hazards regression, as well as newer regression models that are often better than the classical Cox model. In addition, we learn how to handle competing risks, recurrent events, and time-varying variables in survival models.

On the second day of the course we learn how best to analyse biomarker data, which has become a vital part of modern medicine. Biomarker measurements rarely follow a normal distribution, and often have detection limits, meaning that some measurements will fall below the lowest level of the biomarker that the laboratory analysis can detect. We can still make use of these nondetects if we use the right statistical methods. We study methods tailored to such data, including regression, visualisation, techniques for finding biomarkers related to diseases, and methods for understanding correlations between biomarkers.

Course goals: To be able to use R to analyse survival data and biomarker data using state-of-the-art methods.

Prerequisites: R2 or similar.

R 5 - Machine learning and AI – 2 days

During this course we learn how to train different kinds of machine learning models and how to evaluate their predictive performance. We will also discuss how modern AI systems work and build models for analysing text and images.

Machine learning models are used to make predictions, for instance to diagnose diseases or predict future stock prices. In this course we learn how to train different kinds of machine learning models, including random forests and lasso regression, and how to evaluate the predictive performance of our models. In addition, we learn how to deal with common challenges in machine learning projects, such as missing data and imbalanced data.

Modern AI systems use machine learning models known as deep neural networks. During the second day of this course, we learn how these work, and build models for analysing text and images. We also learn about common pitfalls and the limitations of present-day AI.

Course goals: To be able to use R to build, evaluate and use machine learning models, both for regression and classification. To understand how modern AI works and be able to use R to build simple AI models for analysing text and images.

Prerequisites: R2 or similar.