### Data collection

#### Influenza surveillance data

We collected weekly ILI rates (the proportion of ILI consultations out of all outpatient visits) in Beijing and the Hong Kong SAR from the 1st week of 2011 to the 50th week of 2021. The weekly ILI rates in Beijing were retrieved from a sentinel surveillance system for ILI [11], which was designed and led by the Beijing Center for Disease Control and Prevention (BJCDC) and comprised 421 sentinel hospitals. In this system, outpatient clinicians in internal medicine, the emergency department, fever clinics, and pediatric clinics were required to diagnose all ILI cases using the World Health Organization definition of ILI (patients presenting with fever ≥ 38 °C and cough or sore throat) and to record the number of ILI consultations by age group (0–4, 5–14, 15–24, 25–59, and 60 + years) [12]. These data were then reported daily to the BJCDC via an internet-based system. The weekly ILI rates were obtained by dividing ILI consultations by outpatient visits. This ILI surveillance was conducted throughout the year to monitor influenza virus activity in Beijing.

The weekly ILI rates in the Hong Kong SAR were obtained from the Centre for Health Protection of Hong Kong SAR, based on General Outpatient Clinics/Private Medical Practitioner Clinics sentinel surveillance (Weekly consultation rates of influenza-like illness. https://www.chp.gov.hk/en/static/24015.html. Accessed 1 August 2022.) [13, 14]. The ILI sentinel surveillance network comprised approximately 60 outpatient clinics [15]. At the end of each week, sentinel practitioners reported weekly data on the rates of ILI per 1000 outpatient consultations [16]. Age-specific ILI rates were not reported. The data were retrieved from Hong Kong Island, Kowloon, New Territories East, and New Territories West and aggregated [17]. This ILI surveillance was conducted throughout the year to monitor influenza virus activity in the Hong Kong SAR.

In the Chinese mainland, each surveillance year comprises a 12-month interval; from the 14th week of one year to the 13th week of the following year. For consistency within this study, the surveillance year in the Hong Kong SAR was defined as the same as that in Beijing.

#### COVID-19 NPIs data

We downloaded daily NPIs data from the Oxford COVID-19 Government Response Tracker (OxCGRT; Blavatnik School of Government, University of Oxford, United Kingdom; OxCGRT/covid-policy-tracker-legacy. https://github.com/OxCGRT/covid-policy-tracker-legacy. Accessed 1 August 2022.) [18] in Beijing and the Hong Kong SAR from the 1st week of 2020 to the 50th week of 2021. OxCGRT collects publicly available information on governments’ COVID-19 response [18]. Eight of the policy indicators (C1–C8) relate to containment and closure policies, such as school closures and restrictions on movement. Four of the indicators (E1–E4) concern economic policies, such as income support to citizens or provision of foreign aid. Eight indicators (H1–H8) cover health system policies, such as the COVID-19 testing regimes or emergency investments into healthcare. Three indicators (V1–V3) concern vaccination policies. Finally, a miscellaneous indicator (M1) is for notes that do not fit elsewhere. For each indicator, the OxCGRT scores used a numeric scale with higher scores signifying more intense or wider coverage of the interventions, and data are collected and updated in real-time and reported daily [19].

In this study, nine indicators considered likely to play a key role in influenza transmission based on epidemiological principles and previous studies were retrieved to establish the COVID-19 NPI datasets (Additional file 1: Table S1). Given the potential collinearity between these NPI indicators in the modelling, we combined similar indicators (Additional file 1: Table S2) by calculating their mean. C1 and C2 were combined to C12 representing “closings of schools and workplaces”. C3 and C4 were integrated to C34 representing “cancelling public events or gatherings”. C5, C6 and C7 were combined to C567 representing “restrictions on internal travel”. C1‒8 and H6 were averaged to “All NPIs as a whole” representing the overall intensity of NPIs. Daily NPI data were converted to weekly averages to be consistent with the ILI data.

#### Meteorological data

Daily meteorological data, including mean temperature, mean dew point temperature, and other meteorological factors in Beijing and the Hong Kong SAR, were obtained from the National Centers for Environmental Information (Global Surface Summary of the Day—GSOD. https://www.ncei.noaa.gov/access/search/data-search/global-summary-of-the-day?bbox=53.544,73.620,18.198,134.761&place=Country:194&stations=54511099999&pageNum=4. Accessed 10 August 2022.) [20] for the 1st week of 2011 to the 50th week of 2021. Temperature and humidity have been found to play a more important role in influenza transmission by previous studies [21, 22], which were incorporated into models in the study. The daily saturation vapor pressure* (*\(E\)*)*, daily actual vapor pressure* (*\(e)\), daily relative humidity *(*\(RH\)*)*, Kelvin temperature *(T)*, and daily absolute humidity* (*\(AH\)*)* were derived from Eqs. 1–5, respectively. In addition, \(t\) and \(t_d\) represented mean temperature and mean dew temperature in Eqs. 1–5, respectively. Daily meteorological data were also converted to weekly averages to be consistent with the ILI data.

$$\beginarraycE=6.112\mathrmexp\left(\frac17.67*tt+243.5\right)\endarray$$

(1)

$$\beginarrayce=6.112\mathrmexp\left(\frac17.67*t_dt_d+243.5\right)\endarray$$

(2)

$$\beginarraycRH=\fraceE\times 100\%\endarray$$

(3)

$$\beginarraycT=t+273.15^\circ\rm C \endarray$$

(4)

$$\beginarraycAH=217\times \fraceT\endarray$$

(5)

#### Public and school holiday data

The holiday data were obtained from the WorldPop Data repository (Global Holiday Data. https://hub.worldpop.org/project/categories?id=19. Accessed 10 August 2022.) [23] and a previous relevant study [24] for the 1st week of 2011 to the 50th week of 2021. The holiday datasets were established as a time series to record how many days in each week contained public or school holidays.

#### Population density data

The yearly population density data of Beijing and the Hong Kong SAR was downloaded from the statistical yearbook from the portal of Beijing Municipal Bureau Statistics (Beijing Statistical Yearbook 2022. https://nj.tjj.beijing.gov.cn/nj/main/2022-tjnj/zk/indexch.htm. Accessed 4 August 2022.) [25] and Census and Statistics Department of the Government of the Hong Kong SAR (Demographic Trends in Hong Kong 1991–2021. https://www.censtatd.gov.hk/sc/EIndexbySubject.html?pcode=B1120017&scode=150. Accessed 4 August 2022.) [26].

### Data analysis

#### Descriptive analysis

We conducted descriptive analysis to present the temporal distribution of the mean temperature, relative humidity, and absolute humidity from the 1st week of 2011 to the 50th week of 2021 in Beijing and the Hong Kong SAR, respectively (Figs. 1 and 2). The nine NPI indicators were descriptively analyzed from the 1st week of 2020 to the 50th week of 2021 in the above two settings (Figs. 1 and 2).

The weekly ILI counts were computed by multiplying the weekly ILI rate by a constant 10,000, representing weekly ILI consultations per 10,000 outpatient visits. Given that the incubation period of influenza (1‒4) days [27] and the lagged effect of meteorological factors and NPIs on influenza transmission from exposure to reporting, the weekly ILI counts were smoothen one week back using the moving average method in the study. Then the temporal trend of the average weekly ILI counts in 2011–2019, the observed in 2020, and the observed in 2021 was presented together from the 1st week to the 52nd week in Beijing and the Hong Kong SAR, respectively (Fig. 3). The interannual and seasonal trend of observed weekly ILI counts was also presented together with estimates of the GAM models below from the 1st week of 2011 to the 50th week of 2021 in each city (Figs. 4 and 5).

#### GAM models

First, GAM models based on data from the 1st week of 2011 to the 52nd week of 2019 was established to predict the weekly ILI counts for the COVID-19 period of the 1st week of 2020 to the 50th week of 2021 under a counterfactual scenario of no COVID-19 and its interventions in Beijing and the Hong Kong SAR, respectively. Before modeling, collinearity between mean temperature, relative humidity, and absolute humidity was analyzed using the Pearson correlation method (Additional file 1: Tables S3 and S4). Thus, if the correlation coefficient for a pair was greater than 0.7, only one was included in the model. As a result, mean temperature and relative humidity, but not absolute humidity, were incorporated into two types of predictive GAM models as follows. Equation 6 takes both seasonal and interannual variations into account, which Eq. 7 only considers seasonal features.

$$\beginarrayc\mathitlog\left[E\left(Y_i\right)\right]=\alpha + ns\left(Year_i, df\right)+ns\left(W_i, df\right) + ns\left(T_i, df\right)\\ +ns\left(RH_i,df\right) + factor\left(H_i\right)+pd\endarray$$

(6)

$$\beginarrayc\mathitlog\left[E\left(Y_i\right)\right]=\alpha + ns\left(W_i, df\right) + ns\left(T_i, df\right)\\ +ns\left(RH_i,df\right) + factor\left(H_i\right)+pd\endarray$$

(7)

where *E(Y*_{i}*)* is the expected weekly ILI counts in a given week (*i)*, and the link function here is *log()* with the assumption that weekly ILI counts follows log normal distribution; α is the intercept; \(ns\left(.\right)\) is a cubic spline function, \(df\) is the degree of freedom; \(Year_i\) is a time series of year numbers (1, 2, 3…, 11) in the dataset, representing the potential interannual long-term trend in ILI counts; \(W_i\) is a time series of week numbers (1, 2, 3…, 52) in a calendar year, representing the potential seasonality trend in weekly ILI counts; \(T_i\) is mean temperature in week *(i)*; \(RH_i\) is relative humidity in week *(i)*. The degrees of these factors are determined by partial autocorrelation functions [28]. \(H_i\) is an indicator variable that equals 0–7 representing how many days of school and public holidays in a week (*i)*. The population density (*pd*) of each city per year was also incorporated into the model to adjust for potential demographic confounding factors across space and time.

The models were built and trained using 2011–2019 data in Beijing and the Hong Kong SAR, respectively. The goodness of fit of the GAM models was assessed by root mean square error (RMSE), Akaike information criterion (AIC), and adjusted R-square (Additional file 1: Tables S5 and S6). However, the ILI in 2018–2019 in the Hong Kong SAR were much lower than those in the previous years, which might be due to the instability of ILI reporting during the protests or riots in the city during 2018–2019. Thus, the inclusion of the \(Year_i\) term or ILI data in 2018–2019 could lead to a continuous downward trend in ILI prediction from 2018 to 2021, which might not be fully representative of the real situation. To address this, we also tested and compared Eq. 6 using 2011–2017 data for both cities. Considering the data reliability, the goodness of fit of the models and the uncertainty of predictions (Figs. 4 and 5), the subsequent results in the main text were mainly retrieved from the predictions of Eq. 7, while the results of other models were presented in the Additional file 1.

Second, a simple GAM model including univariate interventions was established to assess the relationship between the relative reduction of weekly ILI counts and each individual or combined NPI measure, from 4th week of 2020 when the large-scale COVID-19 NPIs were implemented, to the 50th week of 2021 in Beijing and the Hong Kong SAR, respectively. After examining the collinearity using the Pearson correlation (Additional file 1: Tables S7 and S8), the relative change of mean temperature and the relative change of relative humidity, but not the relative change of absolute humidity, were incorporated into the model. The basic model is as follows:

$$\beginarraycE\left(Y_i\_c\right)= a\_c + \beta _jX_j+ns\left(S_i, df\right) + ns\left(T_i\_c, df\right)\\ + ns\left(RH_i\_c,df\right) + pd_i\_c\endarray$$

(8)

where \(E\left(Y_i\_c\right)\) is the expected relative reduction of ILI counts in week (i) (Eq. 9) with the assumption that the relative reduction of weekly ILI counts follows normal distribution; \(a_c\) is the intercept; \(S_i\) is a time series of week numbers (1,2,3…,99) during the study period in 2020–2021, representing the potential seasonality and long-term trend in weekly ILI counts; \(T_i\_c\) is the relative change of mean temperature in week (*i*) (Eq. 10); \(RH_i\_c\) is the relative change of relative humidity in week (*i)* (Eq. 11); \(pd_i\_c\) is the relative change of population density (Eq. 12); \(X_j\) refers to the each individual or combined NPI indicator, and \(\beta _j\) is its coefficient. A positive coefficient represents that the intervention, or its intensity might not be statistically associated to the decrease of weekly ILI counts across the whole study period, while a negative coefficient reveals that the implementation of COVID-19 NPIs might be associated with the reduction of ILI in the city.

$$\beginarraycY_i\_c=\fracObserved \,weekly \,ILI \,counts-predicted \,weekly \,ILI \,countspredicted \,weekly \,ILI \,counts\endarray$$

(9)

$$\beginarraycT_i\_c=\fracObserved \,mean \,temperature-average \,in \,the \,past \,9 \,yearsaverage \,in \,the \,past \,9 \,years \,in \,2011-2019\endarray$$

(10)

$$\beginarraycRH_i\_c=\fracObserved \,relative \,humidity-average \,in \,the \,past \,9 \,yearsaverage \,in \,the \,past \,9 \,years \,in \,2011-2019\endarray$$

(11)

$$\beginarraycpd\_c=\fracPopulation \,density-average \,in \,the \,past \,9 \,yearsaverage \,in \,the \,past \,9 \,years \,in \,2011-2019\endarray$$

(12)

Third, a multivariate GAM model including multiple interventions was further built to assess the relationship between the relative reduction of weekly ILI counts and the NPI indicators. Due to the collinearity among the individual or combined NPI indicators (Additional file 1: Tables S9–S12), for each individual NPI, only C3, C4, C6, C7, C8, and H6 were incorporated into the GAM model for Beijing and C1, C2, C3, C4, C6, C7, C8, and H6 were incorporated into the model for Hong Kong; for the combined NPI indicators, C34 and C567 were incorporated into the GAM model with C8 and H6 for Beijing and the C12, C34, C567, C8 and H6 were incorporated into the GAM model for Hong Kong. The basic model is as follows:

$$\beginarraycE\left(Y_i\_c\right)= \alpha + \sum \beta _j\left(X_j\right)+ns\left(S_i, df\right) + ns\left(T_i\_c, df\right)\\ + ns\left(RH_i\_c,df\right) + pd\_c\endarray$$

(13)

The meaning of the symbols in Eq. 13 was the same as that in Eq. 8. The goodness of fit of the GAM models in Beijing and the Hong Kong SAR in 2020–2021 was also assessed by RMSE, AIC and adjusted R-square (Additional file 1: Table S13).

#### Cross-validation

To understand the robustness of GAM models, we performed a blocked cross-validation analysis. For the GAM models in 2011–2019, in the first step, data in 2011–2016 (312 consecutive weeks) were used as a training set to establish the model and data in the following year 2017 (52 consecutive weeks) were used as a test set. In the second step, it was pushed forward for a week both for the training and test set. Such analysis continued until the last week of 2019. At last, 105 analyses were performed for each model and goodness of fit indicators (mean and 95% *CI*) were calculated (Additional file 1: Tables S5 and S6). For the GAM models in 2011–2017, in the first step, data in 2011–2015 (260 consecutive weeks) were used as a training set to establish the model and data in the following year 2016 (52 consecutive weeks) were used as a test set. In the second step, it was pushed forward for a week both for the training and test set. Such analysis continued until the last week of 2019. At last, 52 analyses were performed for each model and goodness of fit indicators (mean and 95% *CI*) were calculated (Additional file 1: Tables S5 and S6). Similarly, to test the performance of GAM models for revealing the NPI impacts in 2020–2021, data in the first 52 consecutive weeks were initially used as a training set to establish the model and data in the following consecutive 8 weeks were used as a test set. Both sets were moved forward 1 week at a time in the following steps until the last week of 2021. At last, 35 analyses were performed for each model with the goodness of fit indicators calculated (Additional file 1: Table S13). Inferred from the blocked cross-validation analysis (Additional file 1: Tables S5, S6 and S13), the performance of model 1, 2 and 3 in each block were quite stable (relatively small values of average RMSE, AIC and adjusted R^{2} with narrow 95% confidence intervals, respectively). These further implied that all models were robust to the changes in data and had a low risk of overfitting.

Dataset establishment and data analyses for this study were implemented using R version 4.0.0 (2020-04-24) (R Foundation for Statistical Computing, Vienna, Austria) and Excel Microsoft 365MSO (207 Build 16.0.15427.20182) (Microsoft, Washington, USA).

link