Prevalence and risk factors for long COVID among adults in Scotland using electronic health records : a national, retrospective, observational cohort study

Jeffrey, Karen and Woolford, Lana and Maini, Rishma and Basetti, Siddharth and Batchelor, Ashleigh and Weatherill, David and White, Chris and Hammersley, Vicky and Millington, Tristan and Macdonald, Calum and Quint, Jennifer K. and Kerr, Robin and Kerr, Steven and Shah, Syed Ahmar and Rudan, Igor and Fagbamigbe, Adeniyi Francis and Simpson, Colin R. and Katikireddi, Srinivasa Vittal and Robertson, Chris and Ritchie, Lewis and Sheikh, Aziz and Daines, Luke (2024) Prevalence and risk factors for long COVID among adults in Scotland using electronic health records : a national, retrospective, observational cohort study. eClinicalMedicine, 71. 102590. ISSN 2589-5370

Text. Filename: Jeffrey-etal-eClinicalMedicine-2024-Prevalence-and-risk-factors-for-long-COVID-among-adults-in-Scotland.pdf
Final Published Version
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0


Background: Long COVID is a debilitating multisystem condition. The objective of this study was to estimate the prevalence of long COVID in the adult population of Scotland, and to identify risk factors associated with its development.

Methods: In this national, retrospective, observational cohort study, we analysed electronic health records (EHRs) for all adults (≥18 years) registered with a general medical practice and resident in Scotland between March 1, 2020, and October 26, 2022 (98–99% of the population). We linked data from primary care, secondary care, laboratory testing and prescribing. Four outcome measures were used to identify long COVID: clinical codes, free text in primary care records, free text on sick notes, and a novel operational definition. The operational definition was developed using Poisson regression to identify clinical encounters indicative of long COVID from a sample of negative and positive COVID-19 cases matched on time-varying propensity to test positive for SARS-CoV-2. Possible risk factors for long COVID were identified by stratifying descriptive statistics by long COVID status.

Findings: Of 4,676,390 participants, 81,219 (1.7%) were identified as having long COVID. Clinical codes identified the fewest cases (n = 1,092, 0.02%), followed by free text (n = 8,368, 0.2%), sick notes (n = 14,469, 0.3%), and the operational definition (n = 64,193, 1.4%). There was limited overlap in cases identified by the measures; however, temporal trends and patient characteristics were consistent across measures. Compared with the general population, a higher proportion of people with long COVID were female (65.1% versus 50.4%), aged 38–67 (63.7% versus 48.9%), overweight or obese (45.7% versus 29.4%), had one or more comorbidities (52.7% versus 36.0%), were immunosuppressed (6.9% versus 3.2%), shielding (7.9% versus 3.4%), or hospitalised within 28 days of testing positive (8.8% versus 3.3%), and had tested positive before Omicron became the dominant variant (44.9% versus 35.9%). The operational definition identified long COVID cases with combinations of clinical encounters (from four symptoms, six investigation types, and seven management strategies) recorded in EHRs within 4–26 weeks of a positive SARS-CoV-2 test. These combinations were significantly (p < 0.0001) more prevalent in positive COVID-19 patients than in matched negative controls. In a case-crossover analysis, 16.4% of those identified by the operational definition had similar healthcare patterns recorded before testing positive.

Interpretation: The prevalence of long COVID presenting in general practice was estimated to be 0.02–1.7%, depending on the measure used. Due to challenges in diagnosing long COVID and inconsistent recording of information in EHRs, the true prevalence of long COVID is likely to be higher. The operational definition provided a novel approach but relied on a restricted set of symptoms and may misclassify individuals with pre-existing health conditions. Further research is needed to refine and validate this approach.
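
The reported prevalence figures can be reproduced from the counts in the Findings. The following is a minimal sketch, not the authors' analysis code: the variable and function names are illustrative assumptions, and only the counts come from the abstract. It also derives an upper bound on how many people could have been identified by more than one measure, which is an inference from the published totals rather than a figure stated in the study.

```python
# Counts as reported in the abstract; all names here are illustrative.
TOTAL_PARTICIPANTS = 4_676_390
ANY_MEASURE = 81_219  # identified by at least one of the four measures

CASES_BY_MEASURE = {
    "clinical codes": 1_092,
    "free text": 8_368,
    "sick notes": 14_469,
    "operational definition": 64_193,
}


def prevalence_pct(cases: int, population: int) -> float:
    """Return cases as a percentage of the population."""
    return 100 * cases / population


for measure, cases in CASES_BY_MEASURE.items():
    print(f"{measure}: {prevalence_pct(cases, TOTAL_PARTICIPANTS):.2f}%")
print(f"any measure: {prevalence_pct(ANY_MEASURE, TOTAL_PARTICIPANTS):.2f}%")

# Each person flagged by more than one measure is counted at least one
# extra time in the per-measure sum, so the surplus over the combined
# total bounds the number of such people from above.
surplus = sum(CASES_BY_MEASURE.values()) - ANY_MEASURE
print(f"at most {surplus} people ({100 * surplus / ANY_MEASURE:.1f}% of cases) "
      "were identified by more than one measure")
```

Under these counts, the surplus is small relative to the combined total, which is consistent with the abstract's statement that overlap between measures was limited.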