Survey design, initial questionnaire development, and data collection
Surveys were structured as a sequence of identically phrased questions to the dog owner, with each question or item addressing a specific dog behavior. We used two lead questions, “Please tell us how well each of these words describe your dog as he/she is today” and “Please tell us how well each of these words describe your dog at mealtime”, to collect daytime and mealtime information, respectively. Items were scored on a 1 to 7 Likert scale, with 1 labeled “does not describe at all” and 7 labeled “very much describes”. The technical implementation ensured that completed surveys had no missing data.
The initial item set to probe QoL across multiple domains was developed by a five-person team consisting of pet owners, veterinarians, veterinary nutritionists, and veterinary behaviorists. A literature review identified 9 commonly used QoL domains addressing physical (energy, mobility, pain, appetite, hydration, hygiene), emotional (happiness, anxiety), and social (social interaction) aspects. Tentative mappings of domains from 4 key publications12,13,14,15 onto this set are shown in Supplementary Table S1 as an example. The team decided to exclude two of these domains from the initial item set development: pain, because its effect is likely reflected in the other domains10, and hygiene, because it did not pass validation15. For the remaining 7 domains of interest, the team then developed an item set consisting of words that, based on their experience and on inclusion in existing canine QoL instruments12, provide direct or indirect information on these domains. Because the team felt that general appetite questions might lack resolution, a specific mealtime section was added to the survey as a potentially better practical way to obtain this information. This resulted in a 94-item questionnaire with 52 daytime items and 42 mealtime items (Supplementary Table S2).
The initial 94-item version of the survey was piloted on MARS employees from all United Kingdom and United States sites via internal social media groups, with responses collected over two weeks (employee study). A second, 98-item version (see Supplementary Table S2 for included items) was sent by e-mail to 3929 participants of the Pet Insight Project17, with responses collected over one week (citizen science study). The final data collection used a 36-item version (see Supplementary Table S3 for included items) and was executed by the Banfield Pet Hospital network of over 1000 general practice hospitals across the United States. A random sample of 49,000 dog owners received the survey by e-mail, and responses were collected over one week. Respondents were sent the same survey again 10 days after the initial contact to obtain data on survey repeatability (hospital client study). Because the studies did not involve interventions on animals, they were deemed exempt from ethical approval by the MARS ethical review board. Study objectives were shared prior to the survey, and participants consented to them by completing the survey. The use of electronic medical record data for scientific purposes (see below) is consented to by Banfield Pet Hospital clients.
Deriving dog signalment information and chronic disease status from medical records
To support sample characterization and construct validity analyses, surveys from the citizen science and hospital client studies were linked to the dog’s electronic medical record. Basic signalment information, including age, breed, and sex, was extracted from the last available visit, with age recalculated to the survey date. Breeds were recoded into the size categories toy, small, medium, large, and giant based on the breed’s average adult body weight18. Age-based life stage coding into the categories youth, midlife, and senior used breakpoints at 7 and 11 years for toy and small dogs, and at 6 and 10 years for medium, large, and giant dogs. Body condition score (BCS) was extracted from the last visit when available and carried forward from previous visits when not, and was recoded from the original 5- or 9-point scale into the categories underweight, normal, and overweight. The underweight group was excluded from the analysis because it included only 9 dogs.
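As an illustration of the life stage coding, the rules above can be sketched in Python (the study’s analyses were run in R; the function name, the string category labels, and the treatment of ages falling exactly on a breakpoint are assumptions):

```python
def life_stage(size_category: str, age_years: float) -> str:
    """Assign a life stage from breed size category and age at the survey date.

    Breakpoints follow the rules in the text: 7 and 11 years for toy and
    small dogs, 6 and 10 years for medium, large, and giant dogs. Treating
    an age exactly on a breakpoint as the older stage is an assumption.
    """
    youth_cut, senior_cut = (7, 11) if size_category in ("toy", "small") else (6, 10)
    if age_years < youth_cut:
        return "youth"
    if age_years < senior_cut:
        return "midlife"
    return "senior"
```

For example, under these rules a 6-year-old large dog is coded midlife, while a 6-year-old small dog is still coded youth.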
Chronic disease status was scored for 5 disease clusters: osteoarthritis, gastro-intestinal (GI) disease, cardiac disease, dental disease, and skin disease. A definition for each disease cluster was developed based on a set of structured diagnostic codes identified by a board-certified veterinary specialist as being associated with the condition. Osteoarthritis, GI disease, and cardiac disease were scored “present” when at least one associated diagnostic code was recorded during any visit in the medical record, and “absent” otherwise. For GI disease and skin disease, we further required that a cluster diagnostic code be recorded at least once in the 18 months prior to the survey, to increase the probability that the disease was still present at the time of the survey. For skin disease, we added the requirement that a cluster diagnostic code be recorded during at least 3 different visits, again to enrich for chronic conditions.
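A minimal sketch of this scoring logic, assuming a simple visit data structure and a 548-day window for the 18-month look-back (the exact window definition, parameter names, and data layout are assumptions, not the study’s code):

```python
from datetime import date, timedelta

EIGHTEEN_MONTHS = timedelta(days=548)  # ~18 months; exact window definition assumed

def disease_present(visits, cluster_codes, survey_date,
                    require_recent=False, min_hit_visits=1):
    """Score one disease cluster as present (True) or absent (False) for a dog.

    visits: list of (visit_date, set_of_diagnostic_codes).
    Base rule: present when at least one cluster code appears at any visit.
    GI and skin disease additionally require a cluster code in the 18 months
    before the survey (require_recent=True); skin disease further requires
    cluster codes at >= 3 different visits (min_hit_visits=3).
    """
    hit_dates = [d for d, codes in visits if codes & cluster_codes]
    if len(hit_dates) < min_hit_visits:
        return False
    if require_recent and not any(
        timedelta(0) <= survey_date - d <= EIGHTEEN_MONTHS for d in hit_dates
    ):
        return False
    return True
```

Skin disease, for instance, would be scored with `require_recent=True, min_hit_visits=3`, combining both extra requirements.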
For instrument development and validation, a factor analysis was performed on all items in scope using the R package psych19. The initial number of factors was determined by parallel analysis20. Factors were then sequentially reduced with an eye to interpretability, by removing items with a too low (r < 0.30) or too high (r > 0.80) within-domain Pearson correlation coefficient r, and by assessing item repeatability. For the latter, the intraclass correlation coefficient21 was calculated on the subset of dogs for whom repeated surveys were available. For the final instrument development stage and for the validation analyses, the number of factors was fixed a priori. For some domains in the final instrument, factor loadings were reversed so that all domains have higher scores for increasing levels of the construct expressed by the domain name. To ease interpretation, domain scores were mapped back onto the original 1–7 scale.
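The correlation-based item screen can be sketched as follows. The study used R’s psych package; this Python sketch is only illustrative, and the exact statistic is an assumption — here each item’s mean correlation with the other items in its domain is checked against the lower bound, and the maximum pairwise correlation against the upper (redundancy) bound:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def screen_items(items, low=0.30, high=0.80):
    """Flag each item in a domain as 'keep' or 'drop'.

    items: dict mapping item name -> list of responses (one per dog).
    An item is dropped when its mean correlation with the other items in
    the domain is below `low`, or when any pairwise correlation exceeds
    `high`; redundant pairs would then be resolved by the analyst.
    """
    names = list(items)
    flags = {}
    for name in names:
        rs = [pearson(items[name], items[o]) for o in names if o != name]
        flags[name] = "keep" if mean(rs) >= low and max(rs) <= high else "drop"
    return flags
```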
In the construct validity analyses, associations between domain scores and factors of interest were tested with a non-parametric Mann–Whitney U test for factors with 2 levels and with a Kruskal–Wallis test for factors with more than 2 levels. Factor effects were expressed as the difference between the factor level median and the median of the factor reference level. To enable effect size comparison between domains, factor effects were also scaled in approximate units of the domain’s population standard deviation. A robust estimate of the population standard deviation was obtained by dividing the domain interquartile range by 1.35, the number of standard deviation units the interquartile range covers in a normal distribution. Reported p values are not corrected for multiple testing, but statistical significance was only called for p values below 0.001. This corresponds to a Bonferroni correction for the 72 tests performed in this paper (0.05/72 ≈ 0.001) and bounds the probability of reporting at least one false positive result at 7.2%. All analyses were performed in the statistical software R22 version 3.6.3, using standard functions and packages where not explicitly mentioned.
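The effect size scaling can be sketched as follows (the study used R; this standard-library Python sketch is illustrative, and taking the interquartile range over the pooled scores of both factor levels is an assumption about how the "domain interquartile range" was computed):

```python
from statistics import median, quantiles

def scaled_effect(reference_scores, level_scores):
    """Median difference between a factor level and its reference level,
    scaled by a robust standard deviation estimate: the interquartile
    range divided by 1.35 (the IQR width in SD units for a normal
    distribution). IQR is taken over the pooled scores (assumption).
    """
    pooled = reference_scores + level_scores
    q1, _, q3 = quantiles(pooled, n=4, method="inclusive")
    robust_sd = (q3 - q1) / 1.35
    return (median(level_scores) - median(reference_scores)) / robust_sd
```

A scaled effect of 0.5, say, then reads as a shift of about half a population standard deviation in that domain, comparable across domains with different raw spreads.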