Development and validation cohorts
We analyzed two previous studies which had as the primary aim to study adjustments in asthma treatment30,31. The development cohort was a randomized controlled trial comparing different inhaler medications with follow up of approximately 84 weeks31. The validation cohort was a single-blind placebo-controlled trial examining alternative treatment pathways with follow up of approximately 60 weeks32. All patients had stable mild-to-moderate chronic asthma. Both studies were conducted in an asthma clinic in New Zealand on patients referred by their general practitioners. For both studies, patients recorded their peak expiratory flow and use of \(\upbeta \)2-reliever (yes/no) in the morning and evening of every trial day in diaries. Nocturnal awakening (yes/no) was recorded in the morning (see below).
The outcome variable was measured daily and was defined as the occurrence of a severe asthma exacerbation within 2 days (the day of the measurement or the following day). Table 4 provides a visualization of this 2-day window outcome. Severe asthma exacerbations were defined as the need for a course of oral corticosteroids (prednisone) for a minimum of 3 days, as documented in medical records30,31.
All predictors were measured or calculated daily. Nocturnal awakening (yes/no), the average of morning and evening peak expiratory flow (PEF, measured in liters per minute) and the use of \(\upbeta \)2-reliever in morning and evening (used in both morning and evening/used in morning or evening/not used in morning and evening) were considered as potential predictors. For a rolling window of 7 days, we also calculated the PEF average, standard deviation, maximum and minimum and added these as predictors. This rolling window consisted of the current day and all 6 preceding days. The PEF personal best was determined per patient during a run-in period of 4 weeks and added to the models. Lastly, we constructed and added first differences (the difference in today’s measurement with respect to yesterday’s measurement) and lags (yesterday’s measurement) for PEF, nocturnal awakening, and use of \(\upbeta \)2-reliever.
Demographics and descriptive statistics of predictors (i.e., age, sex, mean PEF, PEF % personal best, nocturnal awakening, and use of \(\upbeta \)2-reliever) were calculated for each individual patient over their respective observational periods.
Missing values were interpolated based on previous and succeeding values and the data was normalized. The first ML model developed through supervised learning was a gradient boosted decision trees (XGBoost) model. This model was chosen as it is one of the most popular ML techniques, and it performs well for a wide selection of problems, including time series prediction33. The XGBoost model estimates many decision-trees sequentially. This is also called boosting. These decision tree predictions are combined into an ensemble model to arrive at the final predictions. The sequential training makes the XGBoost model faster and more efficient than other tree-based algorithms, such as random forest. A downside of this model is that, due to its complexity, it becomes hard to interpret. Moreover, when the missingness is high, tuning an XGBoost model may become increasingly difficult, which is less of an issue with other tree-based models like random forest.
Second, we trained an outlier detection model (one class SVM with Radial Basis Kernel)34. The one class SVM aims to find a frontier that delimits the contours of the original distribution. By estimating this frontier, it can identify whether a new data point falls outside of the original distribution and should therefore be classified as ‘irregular’. An advantage of this model is that it is particularly apt at dealing with the low event rate in the asthma data. A downside of this model is that it does not provide probability estimates like a regular support vector machine and we therefore must base its predictive performance on its classification metrics only (see below).
Additionally, we developed a prediction model using logistic regression as the popular classical prediction counterpart of these two ML models. Logistic regression assumes a probability distribution for the outcome variable and models the log-odds of each patient experiencing the outcome linearly. The log-odds are converted into probabilities via the logistic function. Logistic regression is an inherently interpretable technique and a hallmark of classical prediction modelling35,36. Due to its linearity restriction, it may however not provide the level of complexity needed to adequately model certain prediction problems. Machine learning methods, like XGBoost and one class SVM, provide more flexibility, which comes at a cost of the interpretability of these methods.
The hyperparameters of the XGBoost, one class SVM, and logistic regression models (see additional Table A4) were set using a full grid search and 5 × 5-fold cross-validation (stratified by patient) on the development cohort. We trained the final models using all data with optimized hyperparameters. We compared these model outcomes with a clinical rule that is currently proposed as action point in an asthma action plan by the British Thoracic Society: start oral corticosteroids treatment if PEF < 60% of personal best2,5.
After completing model development on the development cohort, all models and the clinical rule were applied to the validation cohort. The discriminative performance of the models producing probabilities (XGBoost and logistic regression) was measured via the area under the receiver operating characteristic curve (AUC) and histograms of the probability distributions were plotted. We applied the DeLong test to compare the AUCs from these two models. Calibration was assessed graphically and quantified through the calibration slope and intercept26. Confidence intervals were obtained through bootstrapping (based on a 1000 iterations). Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for all models at the following probability thresholds (the cut-off point at which probabilities are converted into binary outcomes): 0.1% and 0.2%. These were chosen as they circle the prevalence rate of the outcome in our data. For a fair comparison with the clinical rule, we also calculated these performance metrics (sensitivity, specificity, etc.) for the XGBoost and logistic regression models at the probability thresholds producing the same number of positive predictions as produced by the one class SVM and the clinical rule.
We performed a sensitivity analysis for predicting exacerbations within 4 and 8 days as opposed to 2 days (Table 4). This enabled us to study the effect of a variation in the length of the outcome window on the models’ discrimination and calibration capacities.
Second, we performed a sensitivity analysis to assess the effect of the number of lags on model performance. For this analysis, we varied the number of lags from 1 to 5 for the models predicting exacerbations within 2 days. For the XGBoost and logistic regression model, the AUC was compared. For the one class SVM model, the sensitivity, specificity, PPV, and NPV were compared.
All analyses were performed in Python 3.8.0. with R 3.6.3 plug-ins to obtain calibration results. The key functions and libraries can be found in additional file 2. The complete code is available on request.
Ethics approval and consent to participate
Ethics approval was obtained for the original data collection. These studies were conducted in accordance with the principles of the Declaration of Helsinki on biomedical research. The protocols were approved by the Otago and Canterbury ethics committees and all patients gave written informed consent prior to participation.