We currently offer three self-paced tutorials designed to support researchers in data analysis and the use of specific statistical software.

These tutorials provide an introduction to SAS for individuals working with administrative data.

Topics covered include:

- Introduction to the SAS Windows Environment
- Viewing SAS Data
- Basic Data Manipulation
- Debugging SAS Programs
- Intermediate SAS Statistical Procedures
- Variable Attributes and Display in Output
- Intermediate Data Manipulation
- Combining SAS Data Sets
- Advanced SAS Statistical Procedures
- Getting Data into & out of SAS

These tutorials provide an introduction to Mplus for individuals who wish to use this software program in their analyses.

Topics covered include:

- Introduction to Mplus
- Path Analysis with Mplus
- Factor Analysis with Mplus

These tutorials provide a general introduction to linear regression modeling.

Topics covered include:

- Understanding Simple Linear Regression
- Examining Model Assumptions and Fit
- Overview of Multiple Linear Regression
- Multiple Linear Regression: Model Refinement
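
As a rough illustration of the first two topics, simple linear regression and checking fit, here is a minimal sketch in plain Python. It is not taken from the tutorials: the function names and the `hours` and `score` data are invented for this example, and the tutorials themselves may use different software.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sum of cross-deviations / sum of squared deviations in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y that the fitted line accounts for."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented data, for illustration only.
hours = [1, 2, 3, 4, 5]
score = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear_regression(hours, score)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(hours, score)]
fit = r_squared(hours, score, b0, b1)
```

Plotting `residuals` against `hours` is one common first look at the model assumptions: a visible pattern suggests the straight-line form may not be adequate.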

RStudio is a software application that provides a powerful user interface for the R language, with the aim of making R easier and more productive to use. It is free and runs on Windows, macOS, and UNIX/Linux.

The Introduction to RStudio for SAS Users is a best-practices document for those who have had prior training in the SAS programming language but are new to the R language. It does not make one-to-one comparisons with SAS commands, statements, or syntax. Instead, it points out important similarities and differences between SAS and R where appropriate, to ease the transition to R for users familiar with SAS programming. In such cases, a separate ‘For SAS Users’ information box is provided. Prior experience with SAS is not required, however, and the document may also serve as a stand-alone introduction to RStudio.

This webinar series, “An Introduction to Data Visualization and Display using R Commander,” provides an overview of visualization using the R language’s superb graphics tools. R is a free, open-source language and environment for statistical computing and graphics, with an extensive collection of features for data visualization. We will use both R’s native graphing capabilities and the tools in ggplot2, an R package that is easily installed. Our graphical user interface is a (free) Canadian product from McMaster University called “R Commander.” This is an SPSS-like, menu-driven GUI that allows access to much of the power of R graphics without the need for programming in R. R Commander’s menu commands create and display the R code that you need. In most cases you will use this code directly to create data visualizations. Occasionally we will make some simple modifications or even write a line of new code. We will run R and R Commander within the highly regarded web browser-like interface “RStudio.”

This series introduces users to the basic principles of graphing data, visualizing data, and effectively displaying data in documents and dashboards. The webinar series will be divided into four 2-hour sessions. Homework activities will be provided for practice between sessions.

This four-module course provides an introduction to Data Management and Cleaning for Analysis using R Software. Each module includes a PowerPoint slide deck, training data, and associated practice exercises. You may download all documents for use on your own computer or work through the exercises within Population Data BC's Remote Training Lab (RTL). The RTL houses all of the exercises, training data, and R software you require.

Module One includes:

  • Introduction and Theory of Data Cleaning and Management
  • Getting Started with R Software

Module Two includes:

  • Data Cleaning: Errors and Missing Data
  • Subset and Restrict Data

Module Three includes:

  • Recoding: Editing Variables and Creating New Variables

Module Four includes:

  • Merging, Joining and Manipulation of Data
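
To give a feel for the kinds of steps named in Modules Two to Four, here is a minimal sketch. The course exercises themselves are in R. This example uses plain Python only as an illustration, and every record, field name, threshold, and function in it is invented rather than taken from the course materials.

```python
# Invented records: None marks a missing value, -1 an obvious entry error.
patients = [
    {"id": 1, "age": 34, "sex": "F"},
    {"id": 2, "age": None, "sex": "M"},
    {"id": 3, "age": -1, "sex": "F"},
    {"id": 4, "age": 71, "sex": "M"},
]
visits = [
    {"id": 1, "visits": 3},
    {"id": 4, "visits": 5},
    {"id": 5, "visits": 2},
]


def clean(rows):
    """Errors and missing data: set impossible ages to missing, drop incomplete rows."""
    fixed = []
    for row in rows:
        age = row["age"]
        if age is not None and not (0 <= age <= 120):  # assumed plausible range
            age = None
        fixed.append({**row, "age": age})
    return [row for row in fixed if row["age"] is not None]


def subset(rows, min_age):
    """Subset and restrict: keep only rows at or above a minimum age."""
    return [row for row in rows if row["age"] >= min_age]


def recode(rows):
    """Recoding: derive a new categorical variable from an existing one."""
    return [
        {**row, "age_group": "65+" if row["age"] >= 65 else "under 65"}
        for row in rows
    ]


def left_join(left, right, key):
    """Merging: attach matching fields from `right` to each row of `left`."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


cleaned = clean(patients)
adults = subset(cleaned, min_age=18)
recoded = recode(adults)
merged = left_join(recoded, visits, key="id")
```

Each step returns new rows rather than editing `patients` in place, so the raw data stays available for comparison after cleaning.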


Overview

Data Science is concerned with analyzing and reporting on a range of different kinds of data, including structured data stored in organizational databases and unstructured data that is often text-rich and not collected according to a particular data model. Work in this field requires specialized techniques and tools that draw on both statistical and computational methods to address complex real-world problems and that employ multidisciplinary analytics to derive knowledge from large sources of data (big data).

The following Data Science webinar series will provide an introduction to this rapidly growing field with a particular focus on machine learning methods and analytic techniques that can serve the needs of health and environmental researchers working to understand trends in society, health and human behavior.  

The presentations are intended for those interested in a broad overview of basic data science analytics. The sessions will benefit health and environmental researchers, analysts, and related professionals who want an introduction to data science approaches for data analytics using R software (Python code will also be provided). The webinar series includes four modules, each with an introductory session and a practicum session. Each module focuses on the application of specific machine learning methods and analytic techniques; general formulas are presented, but the sessions do not delve into the underlying statistical theory.

Requirements

To benefit from the webinar presentations, registrants should have knowledge of simple and multiple linear regression models and categorical data analysis such as logistic regression.

No prior working knowledge of R or Python is required, but some familiarity with R would be beneficial for following the practicum sessions.

As a supplemental resource for this series, you may wish to review our new free online resource: Data Management and Cleaning for Analysis with R software.

Webinar module resources

All modules include presentation recordings, slide decks, training data, R and Python code, and related references for further reading/study.

Module format

Each module includes two webinar presentations:

  • Session 1: A one-hour introductory presentation
  • Session 2: A two-hour practicum session that includes a focus on applied analytics using training data with R code and supplementary Python code.