Real-world Data and Missing Data: Multiple Imputation in R
Schedule
Thu Feb 16 2023 at 09:00 am to 05:00 pm
Location
Online
About this Event
Workshop content:
Do you work with real-world data from sources such as disease registries and electronic health records? In your statistical analyses, do you contend with missing data, especially in the explanatory variables? Are you concerned about the potential bias and loss of efficiency that result? This workshop is designed specifically for you.
Missing data are a major challenge with observational data. An example is data from cancer registries, where stage at diagnosis is often missing for upwards of 30% of the patients. Research in missing data methodology has already demonstrated the risks associated with ignoring the incomplete records and using ad hoc methods.
At the same time, advances in methodology have presented multiple imputation as a versatile solution, which provides validity of results under a broader and more realistic range of assumptions about the missing data mechanism. Current methodological research in this area is focusing on substantive model compatible multiple imputation approaches.
This limited-attendance one-day workshop will equip participants with the knowledge and tools needed to handle missing data in explanatory variables. We will start by elucidating the different missing data mechanisms: missing completely at random, covariate-only-dependent missing at random, outcome-dependent missing at random, and missing not at random.
We will do so under both the selection model and the pattern mixture factorizations. The pattern mixture factorization is particularly instructive in laying bare the unverifiable nature of assumptions made about the missing data mechanism. We will then look at the implications of each of these mechanisms for the validity of analyses that ignore the incomplete records.
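As an informal illustration of the four mechanisms, the following minimal Python sketch simulates a linear model with one partially observed covariate and fits a complete-case regression under each mechanism. Everything in it is an assumption made for illustration (the variable names `x`, `z`, `y`, the true coefficients, and the missingness probabilities); it is not taken from the workshop materials, which use R.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def ols(rows):
    """Least-squares fit of y on (1, x, z); rows are (x, z, y) tuples."""
    # Accumulate the normal equations A b = c.
    A = [[0.0] * 3 for _ in range(3)]
    c = [0.0] * 3
    for x, z, y in rows:
        v = (1.0, x, z)
        for i in range(3):
            c[i] += v[i] * y
            for j in range(3):
                A[i][j] += v[i] * v[j]
    # Solve by Gaussian elimination with partial pivoting.
    M = [A[i] + [c[i]] for i in range(3)]
    for k in range(3):
        p = max(range(k, 3), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, 3):
            f = M[r][k] / M[k][k]
            for col in range(k, 4):
                M[r][col] -= f * M[k][col]
    b = [0.0] * 3
    for k in (2, 1, 0):
        b[k] = (M[k][3] - sum(M[k][j] * b[j] for j in range(k + 1, 3))) / M[k][k]
    return b

# Hypothetical data: y = 1 + 2x + 0.5z + noise, with x subject to missingness.
rng = random.Random(2023)
data = []
for _ in range(20000):
    x, z = rng.gauss(0, 1), rng.gauss(0, 1)
    data.append((x, z, 1 + 2 * x + 0.5 * z + rng.gauss(0, 1)))

# Probability that x is OBSERVED under each (assumed) mechanism.
mechanisms = {
    "MCAR": lambda x, z, y: 0.7,
    "MAR, covariate-dependent": lambda x, z, y: expit(1.0 - 1.5 * z),
    "MAR, outcome-dependent": lambda x, z, y: expit(-1.5 * (y - 1.0)),
    "MNAR (depends on x itself)": lambda x, z, y: expit(1.0 - 1.5 * x),
}

slopes = {}
for name, p_obs in mechanisms.items():
    complete_cases = [row for row in data if rng.random() < p_obs(*row)]
    slopes[name] = ols(complete_cases)[1]  # coefficient of x; true value is 2
    print(f"{name:28s} n = {len(complete_cases):5d}  slope of x = {slopes[name]:.3f}")
```

In this particular setup the complete-case slope stays close to the true value of 2 whenever missingness does not depend on the outcome given the covariates, and is attenuated when missingness depends on the outcome. This holds for the regression coefficient in this example and should not be read as a general statement about other estimands.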
The workshop will then move on to the theoretical basis of multiple imputation, and explain how the model for imputing the missing data is obtained. We will then zoom in on multiple imputation by chained equations, also known as multiple imputation by the fully conditional specification approach.
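The cycling idea behind chained equations can be sketched in a few lines of Python. This is a deliberately simplified sketch under stated assumptions: two continuous variables, each imputed from a normal linear regression on the other, with hypothetical function names `fit_line` and `fcs_impute`. Unlike a proper implementation, it does not draw the regression parameters from their posterior distribution, so it understates imputation uncertainty. In R, this task is commonly handled by a dedicated package such as `mice`.

```python
import random

def fit_line(pairs):
    """Simple linear regression of target on predictor.
    pairs: list of (predictor, target). Returns (intercept, slope, residual_sd)."""
    n = len(pairs)
    mp = sum(p for p, _ in pairs) / n
    mt = sum(t for _, t in pairs) / n
    sxx = sum((p - mp) ** 2 for p, _ in pairs)
    sxy = sum((p - mp) * (t - mt) for p, t in pairs)
    slope = sxy / sxx
    intercept = mt - slope * mp
    rss = sum((t - intercept - slope * p) ** 2 for p, t in pairs)
    return intercept, slope, (rss / (n - 2)) ** 0.5

def fcs_impute(rows, n_iter=10, seed=0):
    """rows: list of [a, b] with None marking a missing entry.
    Returns one completed copy; observed values are left untouched."""
    rng = random.Random(seed)
    filled = [list(r) for r in rows]
    # Initialise each missing entry with a random observed value of that variable.
    for j in (0, 1):
        observed = [r[j] for r in rows if r[j] is not None]
        for orig, cur in zip(rows, filled):
            if orig[j] is None:
                cur[j] = rng.choice(observed)
    # Cycle through the variables, re-imputing each from its conditional model.
    for _ in range(n_iter):
        for j in (0, 1):
            k = 1 - j
            # Fit on rows where variable j is observed, using current values of k.
            train = [(cur[k], cur[j]) for orig, cur in zip(rows, filled)
                     if orig[j] is not None]
            a, b, sd = fit_line(train)
            for orig, cur in zip(rows, filled):
                if orig[j] is None:
                    cur[j] = a + b * cur[k] + rng.gauss(0, sd)
    return filled

# Hypothetical example: two correlated variables, each about 20% missing.
rng = random.Random(1)
rows = []
for _ in range(200):
    a = rng.gauss(0, 1)
    b = 0.8 * a + rng.gauss(0, 0.6)
    rows.append([a if rng.random() > 0.2 else None,
                 b if rng.random() > 0.2 else None])

# Five imputed datasets, one per seed.
imputed = [fcs_impute(rows, seed=s) for s in range(5)]
```

Each completed dataset would then be analysed separately, and the results combined across datasets.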
To solidify an understanding of how results from the imputed datasets are combined, application of Rubin’s rules will first be illustrated by hand. Their implementation in software will then be explained. Throughout the workshop, simulated data mirroring linked cancer registry and secondary care data will be used, to cement understanding of all the concepts taught.
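For readers who want to see the arithmetic, here is a minimal Python sketch of Rubin's rules for a single scalar estimate. The function name `pool_rubin` and the five estimates and standard errors are made-up illustrative values, not results from the workshop data.

```python
import math

def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors.
    Assumes m >= 2 and a non-zero between-imputation variance."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                # pooled estimate
    w = sum(variances) / m                                    # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    t = w + (1 + 1 / m) * b                                   # total variance
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2           # Rubin's degrees of freedom
    return {"estimate": q_bar, "within": w, "between": b,
            "total": t, "se": math.sqrt(t), "df": df}

# Hypothetical log odds ratios from m = 5 imputed datasets, each with SE 0.10.
estimates = [0.50, 0.60, 0.55, 0.45, 0.65]
variances = [0.10 ** 2] * 5

pooled = pool_rubin(estimates, variances)
print(pooled)
```

With these inputs the pooled estimate is 0.55, the within-imputation variance is 0.01, the between-imputation variance is 0.00625, and the total variance is 0.01 + 1.2 × 0.00625 = 0.0175, giving a standard error of about 0.132 on roughly 21.8 degrees of freedom.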
At the end of this workshop, attendees will be able to:
• Coherently explain what the different missing data mechanisms mean
• Explain the risks associated with ignoring the incomplete records
• Appreciate the theoretical basis of multiple imputation
• Understand how the model for imputation is obtained
• Implement multiple imputation using the fully conditional specification in R software
• Apply Rubin’s rules for combining results and correctly interpret the output
The Lecturer:
Dr Njeru Njagi is an Assistant Professor in Biostatistics at the London School of Hygiene & Tropical Medicine, a biostatistician and cancer epidemiologist with the Inequalities in Cancer Outcomes Network, and an expert in missing data and multiple imputation methodology. He has over 7 years of experience analysing cancer registry data linked to primary and secondary care data. His work on multiple imputation methodology is published in statistical journals such as Pharmaceutical Statistics and Statistical Methods in Medical Research, and in substantive-matter journals such as British Journal of Cancer and Cancers. Dr Njeru Njagi is an invited speaker on Missing Data for the Corsican Summer School on Modern Methods in Biostatistics and Epidemiology, a biennial international gathering of researchers working in chronic disease registries. He is currently authoring a book chapter on missing data in cancer epidemiology under the Challenges in estimation of Net SURvival (CENSUR) working group, the international group of biostatistics methodologists and epidemiologists who organise the summer school. He is also a past speaker on Missing Data for Cancer Survival: Principles, Methods and Applications, an international course at the London School of Hygiene & Tropical Medicine for researchers working in cancer registries, and a visiting staff member with the Epidemiology of Cancer Healthcare Outcomes at University College London. An Associate Fellow of the UK Higher Education Academy, he is a proficient lecturer in Medical Statistics, Epidemiology and Health Data Science.
Registration:
It is the aim of this workshop to offer attendees a one-on-one experience, and therefore attendance is capped at 10 participants. Registration opens on the 9th of January 2023 and closes on the 31st of January 2023. The workshop will be held on the 16th of February 2023 via Zoom. Note that familiarity with methods such as logistic regression and time-to-event (survival) analysis will be assumed. Prospective participants wishing to have an initial conversation with the Lecturer (in order to explore the complexity of the workshop), before registration, are encouraged to book a 10-minute one-on-one Zoom discussion with the Lecturer. This initial conversation is offered free.
Workshop Programme:
09:00 – 10:00: Missing data mechanisms: theory and practical examples
10:30 – 11:30: Implications of the different mechanisms and introduction to multiple imputation
12:30 – 13:30: Multiple imputation by the fully conditional specification approach
14:00 – 15:00: Practical 1 in R: Exploring the missing data mechanism and specifying the imputation model
15:30 – 17:00: Practical 2 in R: Implementing the imputation model, application of Rubin’s rules, and interpretation of results
Where is it happening?
Online
GBP 150.00 to GBP 300.00