In Critical Reviews in Toxicology
We examine how Bayesian network (BN) learning and analysis methods can help to meet several methodological challenges that arise in interpreting significant regression coefficients in exposure-response regression modeling. As a motivating example, we consider the challenge of interpreting positive regression coefficients for blood lead level (BLL) as a predictor of mortality risk for nonsmoking men. We first note that practices such as dichotomizing or categorizing continuous confounders (e.g., income), omitting potentially important socioeconomic confounders (e.g., education), and assuming specific parametric regression model forms leave it unclear to what extent a positive regression coefficient reflects these modeling choices rather than a direct dependence of mortality risk on exposure. Therefore, significant exposure-response coefficients in parametric regression models do not necessarily reveal the extent to which reducing exposure-related variables (e.g., BLL) alone, while holding fixed other correlates of exposure and mortality risk (e.g., education and income), would reduce adverse outcome risks (e.g., mortality risk). We then consider how BN structure-learning and inference algorithms and nonparametric estimation methods (partial dependence plots) can be used to clarify dependencies among variables, inform variable selection, assess confounding, and quantify the joint effects of multiple factors on risk, including possible high-order interactions and nonlinearities. We conclude that these details must be carefully modeled to determine whether a data set provides evidence that exposure itself directly affects risks, and that BN and nonparametric effect estimation and uncertainty quantification methods can complement regression modeling and help to improve the scientific basis for risk management decisions and policy-making by addressing these issues.
Cox Louis Anthony
Causality, blood lead level, machine learning, mortality risk, partial dependence plots
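
The abstract's central point, that a positive exposure coefficient can reflect an omitted confounder rather than a direct effect, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (the variables named `income`, `exposure`, and `risk` are hypothetical stand-ins, not the study's data), exposure is constructed to have no direct effect on risk, and a simple least-squares model is swapped in for the nonparametric estimators the abstract refers to. The partial dependence curve is computed by brute force: exposure is set to each grid value for every row, the other columns are left as observed, and the predictions are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (hypothetical): income lowers both exposure and risk,
# while exposure has NO direct effect on risk by construction.
income = rng.normal(size=n)
exposure = -0.8 * income + rng.normal(scale=0.6, size=n)
risk = -1.0 * income + rng.normal(scale=0.5, size=n)


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predictions from a coefficient vector produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def partial_dependence(coef, X, col, grid):
    """Average prediction when column `col` is set to each grid value
    for all rows, with every other column left at its observed value."""
    values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, col] = v
        values.append(predict(coef, Xv).mean())
    return np.array(values)


X_naive = exposure.reshape(-1, 1)               # confounder omitted
X_full = np.column_stack([exposure, income])    # confounder included

naive = fit_linear(X_naive, risk)
adjusted = fit_linear(X_full, risk)

grid = np.linspace(-2.0, 2.0, 9)
pd_naive = partial_dependence(naive, X_naive, 0, grid)
pd_adjusted = partial_dependence(adjusted, X_full, 0, grid)

print("naive exposure slope:   ", round(float(naive[1]), 3))
print("adjusted exposure slope:", round(float(adjusted[1]), 3))
print("PD range, naive model:   ", round(float(pd_naive.max() - pd_naive.min()), 3))
print("PD range, adjusted model:", round(float(pd_adjusted.max() - pd_adjusted.min()), 3))
```

In this construction the model that omits income should show a clearly positive exposure slope and a rising partial dependence curve, while the model that includes income should show a slope near zero and a nearly flat curve. With a linear model the partial dependence curve is just a straight line with the fitted slope; a flexible learner would be needed to expose the interactions and nonlinearities the abstract mentions.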