Congratulations to professor Bryan Shepherd and principal biostatistician Aihua Bian on winning the 2023 Biometrics Best Paper Award for "Multiwave validation sampling for error-prone electronic health records." The article was co-authored by Vanderbilt colleagues Shannon Pugh (Emergency Medicine), Stephany Duda (Biomedical Informatics), and William J. Heerman (Pediatrics), along with Kyunghee Han (University of Illinois at Chicago), Tong Chen (University of Auckland), Thomas Lumley (University of Auckland), and Pamela A. Shaw (Kaiser Permanente). The team used EHR data to "estimate the association between maternal weight gain during pregnancy and the risks of her child developing obesity or asthma." As the first known implementation of a multiwave sampling design, the study yielded several lessons and produced an R package (optimall) that takes on some of the data-handling chores:
"First, adaptive sampling designs provide an important chance to recover from a poorly chosen first sampling wave. Second, we learned that it takes quite a bit of time between receiving validation data from one wave to design the next wave. Upon receiving validation data we needed to perform data quality checks, deidentify data, rerun FPCA analyses, refit regression models to estimate influence functions, recompute Neyman allocation, and then meet as a team to discuss whether and how to divide strata. Keeping track of interim data sets also became tedious. To alleviate some of these challenges we have developed an R package, optimall, which performs Neyman allocation, allows easy splitting of strata, and keeps track of various data sets in an efficient manner. This package also implements integer-valued Neyman allocation (Wright, 2017), which provides exact optimality for a fixed sample size (ie, avoids rounding issues) and was employed in later waves of our validation sampling."
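For readers unfamiliar with the allocation step mentioned in the excerpt, here is a minimal Python sketch of the idea. It is not the optimall package or its API. The function names, stratum sizes, and standard deviations are hypothetical, chosen only for illustration. Classical Neyman allocation gives each stratum a share of the sample proportional to its size times its standard deviation, which usually produces fractional counts. The integer version below uses a greedy priority-value scheme in the spirit of Wright (2017), assuming every stratum starts with at least one sampled record.

```python
import heapq
import math


def neyman_allocation(sizes, sds, n):
    """Real-valued Neyman allocation: n_h is proportional to N_h * S_h."""
    weights = [N * s for N, s in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def integer_neyman_allocation(sizes, sds, n, n_min=1):
    """Integer allocation by greedy priority values (illustrative sketch).

    Each stratum starts with n_min units. Every remaining unit goes to the
    stratum whose next unit has the largest priority N_h * S_h / sqrt(m * (m + 1)),
    where m is that stratum's current allocation. No stratum exceeds its size.
    """
    H = len(sizes)
    if any(N < n_min for N in sizes):
        raise ValueError("every stratum must contain at least n_min units")
    if n < n_min * H:
        raise ValueError("n is too small to give each stratum n_min units")

    alloc = [n_min] * H

    def priority(h):
        m = alloc[h]
        return sizes[h] * sds[h] / math.sqrt(m * (m + 1))

    # heapq is a min-heap, so store negated priorities to pop the largest first.
    heap = [(-priority(h), h) for h in range(H) if alloc[h] < sizes[h]]
    heapq.heapify(heap)

    for _ in range(n - n_min * H):
        if not heap:
            raise ValueError("n exceeds the total number of available units")
        _, h = heapq.heappop(heap)
        alloc[h] += 1
        if alloc[h] < sizes[h]:
            heapq.heappush(heap, (-priority(h), h))
    return alloc


# Hypothetical strata: sizes N_h and standard deviations S_h of the quantity
# driving the design (for example, an estimated influence function).
sizes = [500, 300, 200]
sds = [1.0, 2.0, 4.0]

real_valued = neyman_allocation(sizes, sds, n=20)
integer_valued = integer_neyman_allocation(sizes, sds, n=20)
print(real_valued)
print(integer_valued)
```

The real-valued result has to be rounded before sampling, and rounding can leave the total off by a unit or move the design slightly away from the optimum. The greedy version returns whole numbers that sum exactly to the requested sample size, which is the rounding issue the excerpt refers to.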
The research was supported in part by PCORI award R-1609-36207 and NIH grants R01AI131771 and UL1TR002243.
This paper's team members are also the authors (with Vanderbilt Pediatrics and Nursing postdoc Nadia Sneed) of "Associations between gestational weight gain, gestational diabetes, and childhood obesity incidence," which appeared in the print issue of Maternal and Child Health Journal this month (following its initial e-publication in November). Drs. Shepherd, Shaw, and Duda are also known for their work on validating EHR data for HIV/AIDS research, which was recently recognized with a MERIT Award from the National Institute of Allergy and Infectious Diseases.
Schematic of multiwave sampling strategy for data validation in the childhood obesity study and the childhood asthma substudy. The numbers do not sum to 996 because of overlap of 38 records sampled for both the obesity and asthma studies. Figure 2 in Shepherd et al., "Multiwave validation sampling for error-prone electronic health records," © 2022 The International Biometric Society. https://doi.org/10.1111/biom.13713