Documentation of Initial Vancomycin Prediction Modelling

Below is the abstract of our publication “Development and Evaluation of a Machine Learning–Based Prediction-Modelling Approach for Initial Vancomycin Serum Concentrations in Septic ICU Patients Using Electronic Health Record Data” (not yet available) on individualized prediction of vancomycin concentrations in sepsis. This GitHub repository provides all associated code. In this single-center retrospective study, we developed and evaluated a machine learning–based modelling approach that predicts initial vancomycin serum concentrations from routinely available clinical data. The approach outperformed traditional population pharmacokinetic models and supports more precise, individualized dosing decisions in septic patients.

Abstract

Background: Sepsis remains a life-threatening condition with highly heterogeneous and dynamic pathophysiology, limiting the effectiveness of uniform therapeutic strategies. Beyond timely source control, antimicrobial therapy represents the only causal treatment option. Vancomycin is widely used for the treatment of Gram-positive infections; however, optimal dosing in septic patients is challenging due to pronounced pharmacokinetic variability and substantial interindividual heterogeneity. Underdosing may promote antimicrobial resistance, whereas overdosing increases the risk of toxicity. This study aimed to develop and validate a machine learning–based prediction model to support individualized vancomycin dosing using routinely available clinical data.

Methods: This single-center retrospective study included adult sepsis patients admitted to the intensive care unit, using routinely collected data from the hospital’s electronic medical records. Patients were eligible if they received a vancomycin loading dose followed by continuous infusion and had at least one measured serum concentration. Three machine learning models (elastic net regression, random forest, and XGBoost) were developed to predict the initial vancomycin serum concentration. To minimize bias and enhance generalizability, model training, hyperparameter tuning, and performance evaluation were conducted using a stratified nested cross-validation approach. Model performance was compared with that of seven commonly used population pharmacokinetic models.

Results: The best-performing model, an elastic net, achieved an average RMSE of 6.19, compared with 7.83 for the best-performing population pharmacokinetic model, highlighting the potential of early, individualized dosing supported by machine learning. Final model analysis revealed that noradrenaline administration, together with classical pharmacokinetic covariates such as body weight, serum creatinine, and the presence of chronic kidney disease, significantly influenced predictive performance.

Conclusions: This machine learning–based approach for predicting vancomycin serum concentrations outperforms conventional population pharmacokinetic (PK) models and enables more precise, individualized dosing even before therapeutic drug monitoring results are available. By integrating key clinical variables, the model facilitates data-driven decision-making in sepsis care and underscores the potential of machine learning to advance personalized antimicrobial therapy.

Code Structure

This repository contains the R code used for the analyses in the associated medical research publication. The scripts in the src/ directory are organized around distinct functional goals, reflecting the workflow from data preparation to model evaluation and visualization.

The file 00_misc.R provides a foundation for the analysis by listing all required R packages and implementing miscellaneous helper functions for aggregation and evaluation purposes. Data import is handled in 01_pdmsread.R, which contains functions to read and combine different components of the electronic health record (EHR) database.

Data preparation and feature engineering are carried out in 03_prepdata.R, where predictors from different medical categories are aggregated using the approaches implemented in the 02_treat_* files. Pharmacokinetic (PK) simulations are performed in 02_poppk_mrgmodels.R and 02_poppk_simulate.R, which generate individual time–concentration curves based on estimated parameters from various population PK models, such as those used in the online tool TDMx.eu.
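The repository's PK simulations are built on mrgsolve models. As a much simpler illustration of the kind of time–concentration curve such a simulation produces for a loading dose followed by continuous infusion, the sketch below uses the closed-form solution of a one-compartment model with first-order elimination, C(t) = (D/V)·exp(−kt) + (R/CL)·(1 − exp(−kt)) with k = CL/V. The function name conc_loading_plus_infusion and all parameter values are hypothetical and are not taken from any of the population PK models used in the study.

```r
# Hypothetical sketch: closed-form one-compartment model with first-order
# elimination. An IV loading dose is given at t = 0 and a continuous infusion
# at a constant rate starts at the same time. This is NOT one of the
# repository's mrgsolve population PK models.
conc_loading_plus_infusion <- function(t, dose_load, rate, cl, v) {
  k <- cl / v                                     # elimination rate constant (1/h)
  from_bolus    <- dose_load / v * exp(-k * t)    # decay of the loading dose
  from_infusion <- rate / cl * (1 - exp(-k * t))  # rise towards steady state rate / cl
  from_bolus + from_infusion
}

# Illustrative, made-up values: CL = 4 L/h, V = 60 L,
# loading dose = 1000 mg, infusion rate = 100 mg/h (steady state 25 mg/L)
times <- seq(0, 48, by = 6)  # hours
conc  <- conc_loading_plus_infusion(times, dose_load = 1000, rate = 100, cl = 4, v = 60)
round(conc, 1)               # concentrations in mg/L

# A loading dose of V * C_ss (60 L * 25 mg/L = 1500 mg) keeps the curve flat at C_ss
conc_loading_plus_infusion(times, dose_load = 1500, rate = 100, cl = 4, v = 60)
```

In this simplified model the loading dose only shifts where the curve starts; the steady-state level is set by rate / CL, which is why interindividual differences in clearance matter so much for continuous infusion.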

Initial pre-processing steps are implemented in 04_preproc.R. The subsequent pre-processing and modeling steps, which are applied within the nested resampling framework, are combined in 05_modelwfs.R. Hyperparameter tuning, training, and evaluation of the selected models using nested resampling are carried out in 06_estimateperformance.R. The final tuning of the models, which yields interpretable parameters and practically usable prediction models, is performed in 07_tunefinal.R.
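The idea behind nested resampling can be sketched in base R. This is a simplified illustration, not the code in 05_modelwfs.R or 06_estimateperformance.R, and it rests on several assumptions: closed-form ridge regression stands in for the elastic net, stratification bins the outcome into quartiles, the data are simulated, and all function names (stratified_folds, ridge_fit, ridge_predict, nested_cv) are hypothetical. The inner folds choose the penalty using only the outer training data, so each outer RMSE is computed on observations never seen during tuning.

```r
# Hypothetical base-R sketch of stratified nested cross-validation.
set.seed(1)

# Assign folds so that every fold covers the range of the outcome:
# bin y into quartiles and deal observations into folds within each bin.
stratified_folds <- function(y, k) {
  bins  <- cut(y, breaks = unique(quantile(y, probs = seq(0, 1, 0.25))),
               include.lowest = TRUE)
  folds <- integer(length(y))
  for (b in levels(bins)) {
    idx <- which(bins == b)
    idx <- idx[sample.int(length(idx))]            # shuffle within the bin
    folds[idx] <- rep_len(seq_len(k), length(idx)) # round-robin assignment
  }
  folds
}

# Ridge regression on standardized predictors (stand-in for the elastic net)
ridge_fit <- function(x, y, lambda) {
  mu_x <- colMeans(x); sd_x <- apply(x, 2, sd); mu_y <- mean(y)
  xs <- scale(x, center = mu_x, scale = sd_x)
  beta <- solve(crossprod(xs) + lambda * diag(ncol(xs)), crossprod(xs, y - mu_y))
  list(mu_x = mu_x, sd_x = sd_x, mu_y = mu_y, beta = beta)
}
ridge_predict <- function(fit, x) {
  xs <- scale(x, center = fit$mu_x, scale = fit$sd_x)
  drop(fit$mu_y + xs %*% fit$beta)
}
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

nested_cv <- function(x, y, lambdas, k_outer = 5, k_inner = 5) {
  outer <- stratified_folds(y, k_outer)
  res <- data.frame(fold = seq_len(k_outer), lambda = NA_real_, rmse = NA_real_)
  for (i in seq_len(k_outer)) {
    train <- outer != i
    x_tr <- x[train, , drop = FALSE]; y_tr <- y[train]
    # Inner loop: tune lambda using only the outer training data
    inner <- stratified_folds(y_tr, k_inner)
    inner_rmse <- sapply(lambdas, function(l) {
      mean(sapply(seq_len(k_inner), function(j) {
        fit <- ridge_fit(x_tr[inner != j, , drop = FALSE], y_tr[inner != j], l)
        rmse(y_tr[inner == j], ridge_predict(fit, x_tr[inner == j, , drop = FALSE]))
      }))
    })
    best <- lambdas[which.min(inner_rmse)]
    # Refit on the full outer training set, evaluate once on the held-out fold
    fit <- ridge_fit(x_tr, y_tr, best)
    res$lambda[i] <- best
    res$rmse[i] <- rmse(y[!train], ridge_predict(fit, x[!train, , drop = FALSE]))
  }
  res
}

# Simulated example data (not study data)
n <- 200; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(20 + x %*% c(3, -2, 1, 0, 0) + rnorm(n, sd = 4))
res <- nested_cv(x, y, lambdas = c(0.01, 0.1, 1, 10, 100))
res
mean(res$rmse)  # average RMSE across the outer folds
```

Averaging the outer-fold RMSE values gives a performance estimate for the whole tuning procedure rather than for one fixed hyperparameter setting; a final model for practical use is then tuned once on all data.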

Finally, all results, including tables and figures presented in the publication, are generated in 08_visualize.R.

Data Availability

Due to confidentiality, the data necessary to run these scripts is not included in the repository. Nevertheless, all relevant steps for data preparation, aggregation, pre-processing, modeling, tuning, evaluation, and visualization are fully implemented and transparently documented within the provided R scripts.

Utilized R-Version and Packages

R version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggrepel_0.9.6      shapviz_0.10.2     tikzDevice_0.12.6  kableExtra_1.4.0  
 [5] tables_0.9.31      RColorBrewer_1.1-3 ggh4x_0.3.1        future_1.67.0     
 [9] yardstick_1.3.2    workflowsets_1.1.1 workflows_1.2.0    tune_1.3.0        
[13] rsample_1.3.1      recipes_1.3.1      parsnip_1.3.2      modeldata_1.5.1   
[17] infer_1.0.9        dials_1.4.1        scales_1.4.0       broom_1.0.9       
[21] tidymodels_1.3.0   mrgsolve_1.6.1     DescTools_0.99.60  glmnet_4.1-10     
[25] Matrix_1.7-4       ranger_0.17.0      devtools_2.4.5     usethis_3.2.0     
[29] openxlsx_4.2.8     vroom_1.6.5        lubridate_1.9.4    forcats_1.0.0     
[33] stringr_1.5.1      dplyr_1.1.4        purrr_1.1.0        readr_2.1.5       
[37] tidyr_1.3.1        tibble_3.3.0       ggplot2_3.5.2      tidyverse_2.0.0   

loaded via a namespace (and not attached):
  [1] rstudioapi_0.17.1   jsonlite_2.0.0      shape_1.4.6.1      
  [4] magrittr_2.0.3      magick_2.8.7        farver_2.1.2       
  [7] rmarkdown_2.29      fs_1.6.6            vctrs_0.6.5        
 [10] memoise_2.0.1       askpass_1.2.1       tinytex_0.57       
 [13] htmltools_0.5.8.1   haven_2.5.5         xgboost_1.7.11.1   
 [16] cellranger_1.1.0    parallelly_1.45.1   htmlwidgets_1.6.4  
 [19] pdftools_3.7.0      rootSolve_1.8.2.4   cachem_1.1.0       
 [22] mime_0.13           lifecycle_1.0.4     iterators_1.0.14   
 [25] pkgconfig_2.0.3     R6_2.6.1            fastmap_1.2.0      
 [28] shiny_1.11.1        digest_0.6.37       Exact_3.3          
 [31] furrr_0.3.1         pkgload_1.4.0       textshaping_1.0.1  
 [34] timechange_0.3.0    httr_1.4.7          compiler_4.5.2     
 [37] proxy_0.4-27        remotes_2.5.0       bit64_4.6.0-1      
 [40] withr_3.0.2         backports_1.5.0     pkgbuild_1.4.8     
 [43] MASS_7.3-65         lava_1.8.2          sessioninfo_1.2.3  
 [46] gld_2.6.7           tools_4.5.2         filehash_2.4-6     
 [49] zip_2.3.3           httpuv_1.6.16       future.apply_1.20.0
 [52] nnet_7.3-20         glue_1.8.0          promises_1.3.3     
 [55] grid_4.5.2          generics_0.1.4      gtable_0.3.6       
 [58] tzdb_0.5.0          class_7.3-23        data.table_1.17.8  
 [61] lmom_3.2            hms_1.1.3           xml2_1.5.1         
 [64] foreach_1.5.2       pillar_1.11.0       later_1.4.4        
 [67] lhs_1.2.0           splines_4.5.2       lattice_0.22-7     
 [70] survival_3.8-3      bit_4.6.0           tidyselect_1.2.1   
 [73] miniUI_0.1.2        knitr_1.50          svglite_2.2.1      
 [76] xfun_0.53           expm_1.0-0          hardhat_1.4.2      
 [79] timeDate_4041.110   stringi_1.8.7       DiceDesign_1.10    
 [82] qpdf_1.4.1          yaml_2.3.10         boot_1.3-32        
 [85] evaluate_1.0.4      codetools_0.2-20    cli_3.6.5          
 [88] rpart_4.1.24        systemfonts_1.2.3   xtable_1.8-4       
 [91] Rcpp_1.1.0          readxl_1.4.5        globals_0.18.0     
 [94] parallel_4.5.2      ellipsis_0.3.2      gower_1.0.2        
 [97] profvis_0.4.0       urlchecker_1.0.1    GPfit_1.0-9        
[100] listenv_0.9.1       viridisLite_0.4.2   mvtnorm_1.3-3      
[103] ipred_0.9-15        prodlim_2025.04.28  e1071_1.7-16       
[106] crayon_1.5.3        rlang_1.1.6
