Data
Data on all SARS-CoV-2 PCR tests from laboratories across England undertaking Pillar 1 and Pillar 2 testing from January 2020 until the end of August 2021 were obtained from the United Kingdom Health Security Agency’s Second Generation Surveillance System (SGSS) [11]. Pillar 1 tests were carried out in Public Health England (PHE) labs and National Health Service (NHS) hospitals for patients and health and care workers, whereas Pillar 2 tests were conducted for the wider population, e.g. at walk-in testing sites. For people with multiple SARS-CoV-2 positive tests, the earliest positive test date was retained.
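The rule of retaining the earliest positive test per person can be sketched as follows. This is a minimal illustration in Python rather than the study's own code (the analyses were run in R); the tuple layout and the function name `earliest_positive` are hypothetical and do not reflect the SGSS schema.

```python
from datetime import date

def earliest_positive(tests):
    """Return {person_id: earliest positive specimen date}.

    `tests` is an iterable of (person_id, specimen_date, is_positive) tuples.
    This record layout is an assumption made for illustration only.
    """
    first = {}
    for person_id, specimen_date, is_positive in tests:
        if not is_positive:
            continue  # negative tests never define a case
        if person_id not in first or specimen_date < first[person_id]:
            first[person_id] = specimen_date
    return first

# Made-up example records
tests = [
    ("A", date(2020, 4, 10), True),
    ("A", date(2020, 4, 3), True),    # earlier positive: this one is kept
    ("A", date(2020, 3, 30), False),
    ("B", date(2020, 5, 1), False),   # never positive: absent from the result
]
first_positive = earliest_positive(tests)
```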
Data on all hospital admissions in England were obtained from the Secondary Uses Service (SUS) [12]. From SUS, we constructed patient hospital spells made up of contiguous episodes at a single hospital trust (SUS data are presented in consultant episodes, where a patient is under the continuous care of a single consultant). These data contain information on admission and discharge dates, whether routine or emergency admission, age, sex, ICD-10 codes (used for defining a measure of comorbidities using the comorbidity R package, version 0.5.3), and surgical interventions. Data used in this analysis were extracted on Feb 20th 2022 and contained admissions from the beginning of March 2020 through to the end of August 2021. Records with missing patient spell identifiers were excluded, as were records with no discharge date recorded or where the discharge date was apparently before the admission date. In total some 2% of raw records were thus filtered out. Spells of less than 2 days in length were excluded as they are not relevant under our definition of potential nosocomial cases. Patients who had tested positive prior to admission, or within the first two days following hospitalisation, were removed from the risk set.
SARS-CoV-2 PCR test data were linked to SUS admissions data via patient NHS number where available, or using an exact match on both date of birth and local patient identifier where NHS number was not available. Hospital-linked cases are defined as those where the first positive test date occurs whilst a patient is in hospital, within 14 days prior to admission, or within 14 days following discharge. A summary of the data flows is shown in Fig. 1.
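The hospital-linked definition above amounts to a date-window check, sketched here in Python. The function name is hypothetical, and treating both 14-day boundaries as inclusive is an assumption; the text does not state how the boundary days are handled.

```python
from datetime import date, timedelta

def is_hospital_linked(first_positive, admission, discharge, window_days=14):
    """True if the first positive test falls during the stay, or within
    `window_days` before admission or after discharge.

    Inclusive boundaries are an assumption, not taken from the source.
    """
    window = timedelta(days=window_days)
    return admission - window <= first_positive <= discharge + window

# Made-up example stay
admission, discharge = date(2020, 4, 1), date(2020, 4, 10)
in_stay = is_hospital_linked(date(2020, 4, 5), admission, discharge)
before = is_hospital_linked(date(2020, 3, 18), admission, discharge)    # 14 days before admission
too_late = is_hospital_linked(date(2020, 4, 25), admission, discharge)  # 15 days after discharge
```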

Records with implausible ages (over 115 years) were removed. Missing hospital spell identifiers mean that up to 0.5% of hospital spells may not have been accurately built up from their component episodes. Missing spell end dates resulted in 0.1% of spells being excluded altogether.
All time-related information relating to admissions, discharges and PCR tests was available only to the nearest day. When a patient dies in hospital, this is counted as a discharge with discharge date equal to date of death. Wherever we refer to infection times, it should be understood to mean the number of days from admission date to the first detection of SARS-CoV-2 infection. First positive test (specimen date) therefore serves as a proxy for infection date.
We included sex, age, Charlson comorbidity index (based on ICD-10 codes), month of admission and admission method (elective/non-elective) as baseline confounders. Whether a patient has had invasive surgical procedures is a possible time-varying confounder and is also included. Where applicable, that is for phases 3 and 4, whether or not the patient had been double-vaccinated 14 days or more before admission was additionally included at baseline. These risk factors are all potential confounders given that they may influence both the length of stay and the risk of acquiring a SARS-CoV-2 infection.
LoS is calculated based on the admission and discharge dates of the hospital stay within which the positive test occurs. Subsequent re-admissions do not count toward LoS unless the re-admission date falls on the previous discharge date, in which case the stays are joined together. We do not include reinfections or reactivations of disease and assume that these are relatively small in number, though the full picture likely changes over time and is not fully understood at the time of writing [13].
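The joining rule for same-day re-admissions can be illustrated with a short Python sketch. The function name and the list-of-date-pairs layout are assumptions made for the example.

```python
from datetime import date

def join_contiguous_stays(stays):
    """Merge stays whose admission date equals the previous discharge date.

    `stays` is a list of (admission, discharge) date pairs for one patient
    (a hypothetical layout used only for this illustration).
    """
    merged = []
    for admission, discharge in sorted(stays):
        if merged and admission == merged[-1][1]:
            # Re-admitted on the day of discharge: extend the previous stay
            merged[-1] = (merged[-1][0], discharge)
        else:
            merged.append((admission, discharge))
    return merged

# Made-up example
stays = [
    (date(2020, 4, 1), date(2020, 4, 5)),
    (date(2020, 4, 5), date(2020, 4, 9)),    # same-day re-admission: joined
    (date(2020, 4, 12), date(2020, 4, 14)),  # later re-admission: kept separate
]
joined = join_contiguous_stays(stays)
los_days = [(d - a).days for a, d in joined]
```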
Scope
The analysis is split into four distinct phases (see Fig. 2) in order to examine the impact of COVID-19 at stages of the pandemic differing with respect to overall case numbers, the vaccine rollout and prevalence of variants. Phase 1 (March 2020 through June 2020 inclusive) consists of most of the first wave. Phase 2 (September 2020 through December 2020 inclusive) consists of the earliest part of the second wave, before large numbers of people received a first vaccine dose. Phase 3 (January 2021 through April 2021) consists of the remaining part of the second wave, when increasing numbers of patients had been vaccinated and the Alpha variant was dominant. Phase 4 (May 2021 through August 2021) consists of the third wave, when the Delta variant was dominant.

Analysis
We define a COVID-19 infection as hospital-onset if the patient’s first positive specimen date is at least two days after admission and does not occur after discharge. This definition includes possible, probable and definite hospital-onset cases, as described in [1]. The study population includes all patients admitted to English hospitals between March 1st 2020 and August 31st 2021 who stayed at least 2 days in hospital, excluding community-onset COVID-19 cases. Note that the latter includes those whose first positive specimen date is on the day of admission or on the day after; this is because these infections were very likely acquired before admission.
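As an illustration of the hospital-onset definition, the following Python sketch classifies a single stay. The function name and argument layout are hypothetical; dates are whole days, as in the source data.

```python
from datetime import date

def is_hospital_onset(admission, discharge, first_positive, min_days=2):
    """True if the first positive specimen date is at least `min_days`
    after admission and not after discharge."""
    if first_positive is None:
        return False  # never tested positive
    days_since_admission = (first_positive - admission).days
    return days_since_admission >= min_days and first_positive <= discharge

# Made-up example stay
admission, discharge = date(2020, 4, 1), date(2020, 4, 10)
day_after = is_hospital_onset(admission, discharge, date(2020, 4, 2))        # community-onset
two_days = is_hospital_onset(admission, discharge, date(2020, 4, 3))         # hospital-onset
post_discharge = is_hospital_onset(admission, discharge, date(2020, 4, 11))  # after discharge
```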
LoS is calculated in days as discharge date minus admission date, regardless of whether the patient was alive or dead on discharge. Since the outcome of interest is time to discharge, dead or alive, there is no competing risk between discharge and death. We estimate survival probabilities up to day 60 in hospital for the observed study population, comprising both infected (hospital-onset infections) and uninfected cohorts, using a standard Kaplan–Meier analysis. We sum these probabilities over the days up to day 60 to obtain the average LoS up to day 60, the restricted mean survival time [14]. Applying this restriction avoids including long-staying patients, for whom the low numbers of patients provide insufficient support to adjust accurately for time-varying confounding. Approximately 3% of cases in our dataset have LoS of more than 60 days.
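The restricted mean calculation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the study's implementation: time is in whole days, S(t) is taken to mean the probability that LoS exceeds t days, and the restricted mean is the sum of S(t) for t = 0, ..., 59. The function names and the toy data are hypothetical.

```python
def kaplan_meier(durations, events, horizon):
    """Return [S(0), ..., S(horizon - 1)], where S(t) = P(LoS > t days).

    durations: LoS in whole days; events: True if discharged on that day,
    False if censored on that day.
    """
    survival, s = [], 1.0
    for t in range(horizon):
        at_risk = sum(1 for d in durations if d >= t)
        discharged = sum(1 for d, e in zip(durations, events) if d == t and e)
        if at_risk > 0:
            s *= 1.0 - discharged / at_risk
        survival.append(s)
    return survival

def restricted_mean_los(durations, events, horizon=60):
    """Mean LoS truncated at `horizon` days: the sum of the survival curve."""
    return sum(kaplan_meier(durations, events, horizon))

# Toy data: five stays, all observed to discharge; one exceeds the horizon
durations = [2, 3, 3, 5, 80]
events = [True] * 5
rmst = restricted_mean_los(durations, events, horizon=60)
```

With no censoring, the result equals the plain average of LoS truncated at 60 days, here (2 + 3 + 3 + 5 + 60) / 5 = 14.6.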
We then estimate what the LoS would have been in the counterfactual scenario where the infected did not acquire the infection, following the methodology described in [9]. To estimate the counterfactual LoS, cases are censored on the day of infection, so that their observed LoS after infection does not contribute to the LoS estimate. Furthermore, inverse probability weights were used to account for the potential informative censoring introduced by treating SARS-CoV-2 infections as censoring events. The weights remove confounding by baseline and time-varying confounders by rebalancing the case contributions on each of the 60 days; they were constructed using pooled logistic regression models for the probability of infection on each day [15, 16], so for each patient each day in hospital is treated as a separate observation.
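To make the weighting step concrete, the sketch below turns a patient's modelled daily infection probabilities into cumulative inverse probability of censoring weights. It assumes unstabilised weights, which the text does not specify, and the probabilities are made-up values rather than output from the pooled logistic model. The function name is hypothetical.

```python
def censoring_weights(daily_infection_probs):
    """Cumulative inverse probability weights for one patient.

    daily_infection_probs[t] is the modelled probability of a first positive
    test on day t, given that the patient is still in hospital and uninfected.
    The weight on day t is 1 / P(still uninfected through day t).
    Unstabilised weights are an assumption of this sketch.
    """
    weights, prob_uninfected = [], 1.0
    for p in daily_infection_probs:
        prob_uninfected *= 1.0 - p
        weights.append(1.0 / prob_uninfected)
    return weights

# Made-up probabilities: zero on days 0 and 1, which fall before the
# two-day threshold for hospital-onset infection
weights = censoring_weights([0.0, 0.0, 0.02, 0.05])
```

Patients who remain uninfected on days with a high modelled infection probability receive larger weights, so that they stand in for similar patients who were censored at infection.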
As discussed above, potentially important risk factors were selected for inclusion as variables in the model. Continuous variables were modelled as cubic splines with degrees of freedom chosen to minimise the Akaike Information Criterion: age was modelled by a cubic spline with 5 degrees of freedom; Charlson comorbidity with 2 degrees of freedom. Interactions between age and comorbidity were considered, but found not to improve the fit.
The excess LoS is estimated as the difference between the mean LoS observed and the counterfactual mean LoS. The excess LoS per infected case is obtained by multiplying this difference in means by the total number of patients and dividing by the number of patients whose infection was detected within the first 60 days of their stay. We estimated 95% confidence intervals assuming that the weights are deterministic ([9], supplement). We tested that this assumption is appropriate by re-sampling the coefficients of the pooled regression model, and re-calculating the weights based on these coefficients, to verify that the uncertainty in the weights was small relative to the uncertainty of the regression model.
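The rescaling from a population-level difference in means to an excess per infected case is simple arithmetic, sketched here in Python. The numbers are invented for illustration and are not study results; the function name is hypothetical.

```python
def excess_los_per_case(mean_observed, mean_counterfactual, n_patients, n_infected):
    """Excess LoS per infected case: the difference in mean LoS across all
    patients, multiplied by the number of patients and divided by the number
    of patients whose infection was detected within the first 60 days."""
    return (mean_observed - mean_counterfactual) * n_patients / n_infected

# Invented numbers: a 0.05-day difference in means across 100,000 patients,
# of whom 1,000 had a hospital-onset infection detected
excess = excess_los_per_case(6.30, 6.25, n_patients=100_000, n_infected=1_000)
```

Here a difference of 0.05 days averaged over all patients corresponds to about 5 excess days per infected case, because only 1 in 100 patients contributes to the difference.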
Re-admissions
For the main analysis, re-admissions following the spell in which the SARS-CoV-2 infection was detected were not considered. To explore the possible impact of including re-admission in the length of stay calculations, we conducted an analysis for Phase 1. To achieve this, we additionally included any further hospital admissions for which the admission date was within 7 days following the discharge date of the initial hospital-onset stay. Additional days spent in hospital were added to the total LoS and treated as if a single continuous spell in hospital; days between discharge and re-admission are thus ignored.
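The re-admission rule can be sketched as follows. The function name and data layout are hypothetical, and treating gaps of exactly 0 and exactly 7 days as included is an assumption about the boundary handling.

```python
from datetime import date

def los_with_readmissions(index_stay, later_stays, max_gap_days=7):
    """Total days in hospital for the index stay plus any re-admission that
    begins within `max_gap_days` of the index discharge date.

    Days between discharge and re-admission are not counted.
    Inclusive gap boundaries are an assumption of this sketch.
    """
    admission, discharge = index_stay
    total = (discharge - admission).days
    for readmit, redischarge in later_stays:
        gap = (readmit - discharge).days
        if 0 <= gap <= max_gap_days:
            total += (redischarge - readmit).days
    return total

# Made-up example
index_stay = (date(2020, 4, 1), date(2020, 4, 10))    # 9 days
later_stays = [
    (date(2020, 4, 13), date(2020, 4, 16)),           # gap of 3 days: 3 days added
    (date(2020, 5, 1), date(2020, 5, 5)),             # gap of 21 days: ignored
]
total_los = los_with_readmissions(index_stay, later_stays)
```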
Sensitivity to hospital-onset definition
We carried out an analysis of the impact of altering the definition of hospital-onset from those cases that are detected at least 2 days following admission to 7 days and 14 days following admission. These correspond to alternative definitions based on the likelihood of the infection being acquired in hospital as used, for example, in [1]. The 3 definitions can be loosely interpreted as covering all (detected) possible hospital-acquired infections (2 days or more), those that are probably or almost certainly hospital-acquired (7 days or more), and those that are almost certainly hospital-acquired (14 days or more).
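Because the three definitions are nested, every case counted under a stricter threshold is also counted under the looser ones. A minimal Python sketch with invented data shows this; it assumes every listed positive test occurred before discharge, and the function name is hypothetical.

```python
def count_by_definition(days_to_first_positive, thresholds=(2, 7, 14)):
    """Count cases meeting each hospital-onset threshold.

    days_to_first_positive: days from admission to first positive test,
    for tests taken during the stay (an assumption of this sketch).
    """
    return {t: sum(1 for d in days_to_first_positive if d >= t) for t in thresholds}

# Invented detection days for eight patients
counts = count_by_definition([0, 1, 2, 5, 7, 9, 14, 30])
```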
Simulations
We tested simulated scenarios to validate the implementation of the methodology. Firstly, an analysis was run on a sample of the data to obtain an estimated excess LoS. An extra day was added to the discharge date of all hospital-onset cases in this sample and the analysis was re-run, resulting in an increase of one day to the estimated excess LoS. Secondly, a sample set was created where the hospital-onset cases had the same characteristics and LoS as the non-COVID-19 cases, and it was confirmed that the model returned no excess, as expected. In both examples, further additional days were added to hospital-onset discharge dates, with the expected result on the additional excess obtained.
Software
All analyses were carried out using R version 4.0.3.