Deriving and validating a risk prediction model for long COVID-19 : protocol for an observational cohort study using linked Scottish data

Daines, Luke and Mulholland, Rachel H and Vasileiou, Eleftheria and Hammersley, Vicky and Weatherill, David and Katikireddi, Srinivasa Vittal and Kerr, Steven and Moore, Emily and Pesenti, Elisa and Quint, Jennifer K and Shah, Syed Ahmar and Shi, Ting and Simpson, Colin R and Robertson, Chris and Sheikh, Aziz (2022) Deriving and validating a risk prediction model for long COVID-19 : protocol for an observational cohort study using linked Scottish data. BMJ Open, 12 (7). e059385. ISSN 2044-6055

Final Published Version
License: Creative Commons Attribution-NonCommercial 4.0

Introduction: COVID-19 is commonly experienced as an acute illness, yet some people continue to have symptoms that persist for weeks or months (commonly referred to as ‘long-COVID’). It remains unclear which patients are at highest risk of developing long-COVID. In this protocol, we describe plans to develop a prediction model to identify individuals at risk of developing long-COVID.

Methods and analysis: We will use the national Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) platform, a population-level linked dataset of routine electronic healthcare data from 5.4 million individuals in Scotland. We will identify potential indicators of long-COVID by examining patterns in primary care data linked to information from out-of-hours general practitioner encounters, accident and emergency visits, hospital admissions, outpatient visits, medication prescribing/dispensing and mortality. We will investigate these potential indicators by performing a matched analysis between individuals with a positive reverse transcriptase PCR (RT-PCR) test for SARS-CoV-2 infection and two control groups: (1) individuals with at least one negative RT-PCR test who never tested positive; and (2) the general population of Scotland (everyone who did not test positive). Cluster analysis will then be used to determine the final definition of the outcome measure for long-COVID. We will then derive a prediction model, and validate it internally and externally, to identify the epidemiological risk factors associated with long-COVID.

Ethics and dissemination: The EAVE II study has obtained approvals from the Research Ethics Committee (reference: 12/SS/0201) and the Public Benefit and Privacy Panel for Health and Social Care (reference: 1920-0279). Study findings will be published in peer-reviewed journals and presented at conferences. Understanding the predictors of long-COVID and identifying the patient groups at greatest risk of persisting symptoms will inform future treatments and preventative strategies for long-COVID.
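
As an illustration of the matched analysis mentioned in the methods, the sketch below pairs each RT-PCR-positive individual with controls from the same stratum. It is a minimal example only: the abstract does not state the matching variables, matching ratio or algorithm, so the choice of exact matching on `age_band` and `sex`, the 1:2 ratio, the function name `match_controls` and the toy records are all assumptions made for illustration and are not taken from the protocol.

```python
import random
from collections import defaultdict


def match_controls(cases, controls, keys=("age_band", "sex"), ratio=1, seed=0):
    """Exact-match each case to up to `ratio` controls with identical key values.

    Hypothetical helper: the matching variables and ratio are assumptions.
    Controls are used at most once. Returns {case_id: [control_id, ...]}.
    """
    rng = random.Random(seed)

    # Group the control pool by stratum (tuple of matching-variable values).
    pool = defaultdict(list)
    for person in controls:
        pool[tuple(person[k] for k in keys)].append(person["id"])

    # Shuffle within each stratum so the choice of controls is random.
    for ids in pool.values():
        rng.shuffle(ids)

    matched = {}
    for case in cases:
        stratum = pool[tuple(case[k] for k in keys)]
        # Pop controls so that none is matched to more than one case.
        n_available = min(ratio, len(stratum))
        matched[case["id"]] = [stratum.pop() for _ in range(n_available)]
    return matched


# Entirely synthetic records for demonstration (not EAVE II data).
cases = [
    {"id": "c1", "age_band": "40-49", "sex": "F"},
    {"id": "c2", "age_band": "60-69", "sex": "M"},
]
controls = [
    {"id": "n1", "age_band": "40-49", "sex": "F"},
    {"id": "n2", "age_band": "40-49", "sex": "F"},
    {"id": "n3", "age_band": "60-69", "sex": "M"},
    {"id": "n4", "age_band": "20-29", "sex": "M"},
]
matched = match_controls(cases, controls, ratio=2)
```

Here `c1` receives both controls in its stratum, `c2` receives the single available control, and `n4` is left unmatched because no case shares its stratum. A real analysis of this kind would typically also match on calendar time and handle cases with too few eligible controls, which this sketch does not attempt.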