| Title: | Reproducibility Diagnostics for Statistical Modeling |
|---|---|
| Description: | Tools for diagnosing the reproducibility of statistical model outputs under data perturbations. Implements bootstrap, subsampling, and noise-based perturbation schemes and computes coefficient stability, p-value stability, selection stability, prediction stability, and a composite reproducibility index on a 0 to 100 scale. Includes cross-validation ranking stability for model comparison and visualization utilities. Optional backends support robust M-estimation ('MASS') and penalized regression ('glmnet'). Bootstrap perturbation follows Efron and Tibshirani (1993, ISBN:9780412042317); selection stability follows Meinshausen and Buhlmann (2010) <doi:10.1111/j.1467-9868.2010.00740.x>; reproducibility framework follows Peng (2011) <doi:10.1126/science.1213847>. |
| Authors: | Gideon Nti Boateng [aut, cre] |
| Maintainer: | Gideon Nti Boateng <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-05-25 06:27:53 UTC |
| Source: | https://github.com/ntigideon/reprostat |
Computes the variance of each regression coefficient across perturbation iterations. Lower variance indicates greater stability.
coef_stability(diag_obj)coef_stability(diag_obj)
diag_obj |
A |
A named numeric vector of per-coefficient variances.
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) coef_stability(d)set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) coef_stability(d)
Evaluates model selection stability by repeatedly running -fold
cross-validation and recording the error-metric rank of each candidate model
across repetitions. Supports four modeling backends: "lm",
"glm", "rlm" (robust regression via MASS), and
"glmnet" (penalized regression via glmnet).
cv_ranking_stability( formulas, data, v = 5L, R = 30L, seed = 20260307L, family = NULL, backend = c("lm", "glm", "rlm", "glmnet"), en_alpha = 1, lambda = NULL, metric = c("auto", "rmse", "logloss") )cv_ranking_stability( formulas, data, v = 5L, R = 30L, seed = 20260307L, family = NULL, backend = c("lm", "glm", "rlm", "glmnet"), en_alpha = 1, lambda = NULL, metric = c("auto", "rmse", "logloss") )
formulas |
A named list of formulas, one per candidate model. |
data |
A data frame. |
v |
Number of cross-validation folds. Must satisfy
|
R |
Number of cross-validation repetitions. Default is |
seed |
Integer random seed for reproducibility. Default is
|
family |
A GLM family object (e.g. |
backend |
Modeling backend. One of |
en_alpha |
Elastic-net mixing parameter for |
lambda |
Regularization parameter for |
metric |
Error metric used for ranking. One of |
A list with components:
settingsList with v, R, seed,
metric, family, backend, en_alpha, and
lambda.
rmse_mat matrix of per-repeat mean
error values (RMSE or log-loss depending on metric).
rank_mat integer matrix of per-repeat
model ranks (rank 1 = best).
summaryData frame with columns model,
mean_rmse, sd_rmse, mean_rank, and
top1_frequency, ordered by mean rank. Note: the columns
mean_rmse and sd_rmse store the mean and standard
deviation of the chosen error metric (RMSE or log-loss); the column
names are retained for backwards compatibility.
# Linear models models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp) cv_ranking_stability(models, mtcars, v = 5, R = 20) # Logistic models glm_models <- list(m1 = am ~ wt + hp, m2 = am ~ wt + hp + qsec) cv_ranking_stability(glm_models, mtcars, v = 5, R = 20, family = stats::binomial(), metric = "logloss") # Robust regression if (requireNamespace("MASS", quietly = TRUE)) { cv_ranking_stability(models, mtcars, v = 5, R = 20, backend = "rlm") } # Penalized (LASSO) if (requireNamespace("glmnet", quietly = TRUE)) { lasso_models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp + qsec) cv_ranking_stability(lasso_models, mtcars, v = 5, R = 20, backend = "glmnet") }# Linear models models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp) cv_ranking_stability(models, mtcars, v = 5, R = 20) # Logistic models glm_models <- list(m1 = am ~ wt + hp, m2 = am ~ wt + hp + qsec) cv_ranking_stability(glm_models, mtcars, v = 5, R = 20, family = stats::binomial(), metric = "logloss") # Robust regression if (requireNamespace("MASS", quietly = TRUE)) { cv_ranking_stability(models, mtcars, v = 5, R = 20, backend = "rlm") } # Penalized (LASSO) if (requireNamespace("glmnet", quietly = TRUE)) { lasso_models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp + qsec) cv_ranking_stability(lasso_models, mtcars, v = 5, R = 20, backend = "glmnet") }
Generates a perturbed version of a dataset using one of three strategies: bootstrap resampling, subsampling without replacement, or Gaussian noise injection.
perturb_data( data, method = c("bootstrap", "subsample", "noise"), frac = 0.8, noise_sd = 0.05, response_col = NULL )perturb_data( data, method = c("bootstrap", "subsample", "noise"), frac = 0.8, noise_sd = 0.05, response_col = NULL )
data |
A data frame. |
method |
Character string specifying the perturbation method. One of
|
frac |
Fraction of rows to retain for subsampling. Ignored for other
methods. Must be in |
noise_sd |
Noise level as a fraction of each column's standard
deviation. Ignored unless |
response_col |
Optional character string naming the response (outcome)
column to exclude from noise injection. Useful when you want to
perturb predictors only and leave the outcome unchanged. Ignored for
|
A data frame with the same columns as data. The number of
rows equals nrow(data) for bootstrap and noise, and
floor(frac * nrow(data)) for subsampling.
set.seed(1) d_boot <- perturb_data(mtcars, method = "bootstrap") d_sub <- perturb_data(mtcars, method = "subsample", frac = 0.7) d_nois <- perturb_data(mtcars, method = "noise", noise_sd = 0.1) # Perturb predictors only, leave the response (mpg) unchanged: d_pred_only <- perturb_data(mtcars, method = "noise", noise_sd = 0.1, response_col = "mpg")set.seed(1) d_boot <- perturb_data(mtcars, method = "bootstrap") d_sub <- perturb_data(mtcars, method = "subsample", frac = 0.7) d_nois <- perturb_data(mtcars, method = "noise", noise_sd = 0.1) # Perturb predictors only, leave the response (mpg) unchanged: d_pred_only <- perturb_data(mtcars, method = "noise", noise_sd = 0.1, response_col = "mpg")
Produces a bar chart of either the top-1 selection frequency or the mean CV rank for each candidate model.
plot_cv_stability(cv_obj, metric = c("top1_frequency", "mean_rank"))plot_cv_stability(cv_obj, metric = c("top1_frequency", "mean_rank"))
cv_obj |
A list returned by |
metric |
Character string. One of |
Invisibly returns NULL; called for its side effect.
models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp) cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 20) plot_cv_stability(cv_obj, metric = "top1_frequency")models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp) cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 20) plot_cv_stability(cv_obj, metric = "top1_frequency")
Produces a ggplot2 horizontal bar chart of either the top-1 frequency or the mean rank of each candidate model from a repeated cross-validation stability run.
plot_cv_stability_gg(cv_obj, metric = c("top1_frequency", "mean_rank"))plot_cv_stability_gg(cv_obj, metric = c("top1_frequency", "mean_rank"))
cv_obj |
A list returned by |
metric |
One of |
A ggplot object.
if (requireNamespace("ggplot2", quietly = TRUE)) { models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp) cv <- cv_ranking_stability(models, mtcars, v = 5, R = 20) plot_cv_stability_gg(cv, "top1_frequency") plot_cv_stability_gg(cv, "mean_rank") }if (requireNamespace("ggplot2", quietly = TRUE)) { models <- list(m1 = mpg ~ wt + hp, m2 = mpg ~ wt + hp + disp) cv <- cv_ranking_stability(models, mtcars, v = 5, R = 20) plot_cv_stability_gg(cv, "top1_frequency") plot_cv_stability_gg(cv, "mean_rank") }
Produces a bar chart or histogram summarising one stability dimension from a
reprostat object.
plot_stability( diag_obj, type = c("coefficient", "pvalue", "selection", "prediction") )plot_stability( diag_obj, type = c("coefficient", "pvalue", "selection", "prediction") )
diag_obj |
A |
type |
Character string specifying the plot type. One of
|
Invisibly returns NULL; called for its side effect.
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) plot_stability(d, "coefficient") plot_stability(d, "selection")set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) plot_stability(d, "coefficient") plot_stability(d, "selection")
Produces a ggplot2 bar chart for either the coefficient variance or the selection frequency of each predictor term, ordered from lowest to highest value.
plot_stability_gg(diag_obj, type = c("coefficient", "selection"))plot_stability_gg(diag_obj, type = c("coefficient", "selection"))
diag_obj |
A |
type |
One of |
A ggplot object.
if (requireNamespace("ggplot2", quietly = TRUE)) { d <- run_diagnostics(mpg ~ wt + hp, mtcars, B = 50) plot_stability_gg(d, "coefficient") plot_stability_gg(d, "selection") }if (requireNamespace("ggplot2", quietly = TRUE)) { d <- run_diagnostics(mpg ~ wt + hp, mtcars, B = 50) plot_stability_gg(d, "coefficient") plot_stability_gg(d, "selection") }
Computes the pointwise variance of predictions across perturbation iterations, and the mean of these variances as a scalar summary.
prediction_stability(diag_obj)prediction_stability(diag_obj)
diag_obj |
A |
By default predictions are made on the training data used to fit the base
model. For method = "subsample" this means the held-out rows
receive genuine out-of-sample predictions, while for method =
"bootstrap" the predictions are a mix of in-bag and out-of-bag. Pass
predict_newdata to run_diagnostics for a dedicated
held-out evaluation set.
A list with components:
pointwise_varianceNumeric vector of per-observation prediction variances.
mean_varianceMean of pointwise variances.
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) prediction_stability(d)$mean_varianceset.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) prediction_stability(d)$mean_variance
Print a reprostat object
## S3 method for class 'reprostat' print(x, ...)## S3 method for class 'reprostat' print(x, ...)
x |
A |
... |
Further arguments (ignored). |
Invisibly returns x.
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 20) print(d)set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 20) print(d)
Computes the proportion of perturbation iterations in which each predictor
is statistically significant (p-value below alpha). The intercept
is excluded. Values near 0 or 1 indicate stable decisions; values near 0.5
indicate high instability.
pvalue_stability(diag_obj)pvalue_stability(diag_obj)
diag_obj |
A |
A named numeric vector of significance frequencies in ,
excluding the intercept. All NaN for backend = "glmnet"
(p-values are not defined).
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) pvalue_stability(d)set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) pvalue_stability(d)
Computes a composite reproducibility index (RI) on a 0–100 scale by aggregating normalized versions of coefficient, p-value, selection, and prediction stability.
reproducibility_index(diag_obj)reproducibility_index(diag_obj)
diag_obj |
A |
Each component is mapped to as follows:
), where
is a global scale reference. This prevents the exponential from
collapsing to zero for weakly-identified (near-zero) predictors.
Includes all model terms (intercept and predictors).
),
where is the proportion of perturbation iterations in
which term is significant at level alpha and
is the number of predictor terms. Values near 0 or 1 (consistent
decisions) score high; 0.5 (random) scores zero.
NA for backend = "glmnet".
)For "lm", "glm", "rlm": the mean
sign consistency across predictors — the proportion of
perturbation iterations in which each predictor's estimated sign
agrees with the base-fit sign. Captures stability of effect direction,
which is distinct from significance stability ().
For "glmnet": the mean non-zero selection frequency —
proportion of perturbation iterations in which each predictor's
coefficient is non-zero. Always available (never NA).
Excludes the intercept.
).
Prediction variance relative to outcome variance drives the decay.
The RI is , where is the number of non-NA
components. For backend = "glmnet", is NA so
the RI is based on three components (): ,
, and . All other backends
contribute all four components ().
Comparability across backends: because "glmnet" uses three
components and "lm"/"glm"/"rlm" use four, RI values
are not directly comparable across backends.
A list with components:
indexScalar RI on a 0–100 scale.
componentsNamed numeric vector with four sub-scores:
coef, pvalue, selection, prediction.
pvalue is NA for backend = "glmnet";
selection is always available.
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) reproducibility_index(d)set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) reproducibility_index(d)
Estimates uncertainty in the RI by resampling the perturbation iterations
already stored in a reprostat object (no additional model fitting).
ri_confidence_interval(diag_obj, level = 0.95, R = 1000L, seed = NULL)ri_confidence_interval(diag_obj, level = 0.95, R = 1000L, seed = NULL)
diag_obj |
A |
level |
Confidence level, e.g. |
R |
Number of bootstrap resamples of the perturbation draws.
Default is |
seed |
Integer random seed passed to |
A named numeric vector of length 2 giving the lower and upper quantile bounds of the RI bootstrap distribution.
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) ri_confidence_interval(d, R = 200, seed = 1)set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) ri_confidence_interval(d, R = 200, seed = 1)
Fits a model on the original data, then repeatedly fits on perturbed
versions to collect coefficient estimates, p-values, and predictions for
downstream stability analysis. Four modeling backends are supported:
ordinary least squares ("lm"), generalized linear models
("glm"), robust regression ("rlm" via MASS), and
penalized regression ("glmnet" via glmnet).
run_diagnostics( formula, data, B = 200, method = c("bootstrap", "subsample", "noise"), alpha = 0.05, frac = 0.8, noise_sd = 0.05, predict_newdata = NULL, family = NULL, backend = c("lm", "glm", "rlm", "glmnet"), en_alpha = 1, lambda = NULL, perturb_response = FALSE )run_diagnostics( formula, data, B = 200, method = c("bootstrap", "subsample", "noise"), alpha = 0.05, frac = 0.8, noise_sd = 0.05, predict_newdata = NULL, family = NULL, backend = c("lm", "glm", "rlm", "glmnet"), en_alpha = 1, lambda = NULL, perturb_response = FALSE )
formula |
A model formula. |
data |
A data frame. |
B |
Integer. Number of perturbation iterations. Default is |
method |
Perturbation method passed to |
alpha |
Significance threshold for p-value and selection stability.
Default is |
frac |
Subsampling fraction. Passed to |
noise_sd |
Noise level. Passed to |
predict_newdata |
Optional data frame for out-of-sample prediction
stability. Defaults to |
family |
A GLM family object (e.g. |
backend |
Modeling backend. One of |
en_alpha |
Elastic-net mixing parameter passed to
|
lambda |
Regularization parameter for |
perturb_response |
Logical. When |
An object of class "reprostat", a list with components:
callThe matched call.
formulaThe model formula.
BNumber of iterations used.
alphaSignificance threshold used.
base_fitModel fitted on the original data.
y_trainResponse vector from the original data (used internally for RI computation).
coef_matB x p matrix of coefficient estimates.
p_matB x p matrix of p-values, or all-NA for
backend = "glmnet" (where p-values are not defined).
pred_matn x B matrix of predictions.
methodPerturbation method used.
familyGLM family used, or NULL.
backendBackend used.
en_alphaElastic-net mixing parameter used (only relevant
for backend = "glmnet", otherwise NA).
set.seed(1) # Linear model (small B for a quick check) diag_lm <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 20) print(diag_lm) # Logistic regression diag_glm <- run_diagnostics(am ~ wt + hp + qsec, data = mtcars, B = 50, family = stats::binomial()) reproducibility_index(diag_glm) # Robust regression (requires MASS) if (requireNamespace("MASS", quietly = TRUE)) { diag_rlm <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50, backend = "rlm") reproducibility_index(diag_rlm) } # Penalized regression / LASSO (requires glmnet) if (requireNamespace("glmnet", quietly = TRUE)) { diag_glmnet <- run_diagnostics(mpg ~ wt + hp + disp + qsec, data = mtcars, B = 50, backend = "glmnet", en_alpha = 1) reproducibility_index(diag_glmnet) }set.seed(1) # Linear model (small B for a quick check) diag_lm <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 20) print(diag_lm) # Logistic regression diag_glm <- run_diagnostics(am ~ wt + hp + qsec, data = mtcars, B = 50, family = stats::binomial()) reproducibility_index(diag_glm) # Robust regression (requires MASS) if (requireNamespace("MASS", quietly = TRUE)) { diag_rlm <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50, backend = "rlm") reproducibility_index(diag_rlm) } # Penalized regression / LASSO (requires glmnet) if (requireNamespace("glmnet", quietly = TRUE)) { diag_glmnet <- run_diagnostics(mpg ~ wt + hp + disp + qsec, data = mtcars, B = 50, backend = "glmnet", en_alpha = 1) reproducibility_index(diag_glmnet) }
Measures how consistently each predictor is selected across perturbation iterations. The definition depends on the modeling backend:
selection_stability(diag_obj)selection_stability(diag_obj)
diag_obj |
A |
"lm", "glm", "rlm"
Sign consistency: the proportion of perturbation iterations in
which the estimated coefficient has the same sign as in the base fit.
A value of 1 means the direction of the effect is perfectly stable; 0.5
means the sign is random. Returns NA for a predictor whose
base-fit coefficient is exactly zero.
"glmnet"Non-zero selection frequency: the proportion of perturbation iterations in which the coefficient is non-zero (i.e. the variable survives the regularisation penalty).
The intercept is always excluded.
A named numeric vector of selection stability values in ,
excluding the intercept.
set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) selection_stability(d)set.seed(1) d <- run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 50) selection_stability(d)