Title: | Markov Chain Monte Carlo Small Area Estimation |
---|---|
Description: | Fit multi-level models with possibly correlated random effects using Markov Chain Monte Carlo simulation. Such models allow smoothing over space and time and are useful in, for example, small area estimation. |
Authors: | Harm Jan Boonstra [aut, cre], Grzegorz Baltissen [ctb] |
Maintainer: | Harm Jan Boonstra <[email protected]> |
License: | GPL-3 |
Version: | 0.7.7 |
Built: | 2024-11-24 03:10:57 UTC |
Source: | https://github.com/cran/mcmcsae |
Fit multi-level models with possibly correlated random effects using MCMC.
Functions to fit multi-level models with Gaussian, binomial, multinomial, negative binomial or Poisson likelihoods using MCMC. Models with a linear predictor consisting of various possibly correlated random effects are supported, allowing flexible modeling of temporal, spatial or other kinds of dependence structures. For Gaussian models the variance can be modeled too. By modeling variances at the unit level the marginal distribution can be changed to a Student-t or Laplace distribution, which may account better for outliers. The package has been developed with applications to small area estimation in official statistics in mind. The posterior samples for the model parameters can be passed to a prediction function to generate samples from the posterior predictive distribution for user-defined quantities such as finite population domain means. For model assessment, posterior predictive checks and DIC/WAIC criteria can easily be computed.
Return Metropolis-Hastings acceptance rates
acceptance_rates(obj, aggregate.chains = FALSE)
acceptance_rates(obj, aggregate.chains = FALSE)
obj |
an mcdraws object, i.e. the output of function |
aggregate.chains |
whether to return averages over chains or results per chain. |
A list of acceptance rates.
ex <- mcmcsae_example() # specify a model that requires MH sampling (in this case for a modeled # degrees of freedom parameter in the variance part of the model) sampler <- create_sampler(ex$model, data=ex$dat, formula.V=~vfac(factor="fA", prior=pr_invchisq(df="modeled"))) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) (summary(sim)) acceptance_rates(sim)
ex <- mcmcsae_example() # specify a model that requires MH sampling (in this case for a modeled # degrees of freedom parameter in the variance part of the model) sampler <- create_sampler(ex$model, data=ex$dat, formula.V=~vfac(factor="fA", prior=pr_invchisq(df="modeled"))) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) (summary(sim)) acceptance_rates(sim)
Utility function to construct a sparse aggregation matrix from a factor
aggrMatrix(fac, w = 1, mean = FALSE, facnames = FALSE)
aggrMatrix(fac, w = 1, mean = FALSE, facnames = FALSE)
fac |
factor variable. |
w |
vector of weights associated with the levels of |
mean |
if |
facnames |
whether the factor levels should be used as column names for the aggregation matrix. |
A sparse aggregation matrix of class tabMatrix
.
n <- 1000 f <- sample(1:100, n, replace=TRUE) x <- runif(n) M <- aggrMatrix(f) all.equal(crossprod_mv(M, x), as.vector(tapply(x, f, sum)))
n <- 1000 f <- sample(1:100, n, replace=TRUE) x <- runif(n) M <- aggrMatrix(f) all.equal(crossprod_mv(M, x), as.vector(tapply(x, f, sum)))
This function is intended to be used on the right hand side of the
formula
argument to create_sampler
or
generate_data
. It creates a BART term in the
model's linear predictor. To use this model component one needs
to have R package dbarts installed.
brt( formula, X = NULL, n.trees = 75L, name = "", debug = FALSE, keepTrees = FALSE, ... )
brt( formula, X = NULL, n.trees = 75L, name = "", debug = FALSE, keepTrees = FALSE, ... )
formula |
a formula specifying the predictors to be used in the BART
model component. Variable names are looked up in the data frame
passed as |
X |
a design matrix can be specified directly, as an alternative
to the creation of one based on |
n.trees |
number of trees used in the BART ensemble. |
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
debug |
if |
keepTrees |
whether to store the trees ensemble for each Monte Carlo draw. This
is required for prediction based on new data. The default is |
... |
parameters passed to |
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
H.A. Chipman, E.I. Georgea and R.E. McCulloch (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics 4(1), 266-298.
J.H. Friedman (1991). Multivariate adaptive regression splines. The Annals of Statistics 19, 1-67.
# generate data, based on an example in Friedman (1991) gendat <- function(n=200L, p=10L, sigma=1) { x <- matrix(runif(n * p), n, p) mu <- 10*sin(pi*x[, 1] * x[, 2]) + 20*(x[, 3] - 0.5)^2 + 10*x[, 4] + 5*x[, 5] y <- mu + sigma * rnorm(n) data.frame(x=x, mu=mu, y=y) } train <- gendat() test <- gendat(n=25) # keep trees for later prediction based on new data sampler <- create_sampler( y ~ brt(~ . - y, name="bart", keepTrees=TRUE), sigma.mod=pr_invchisq(df=3, scale=var(train$y)), data = train ) sim <- MCMCsim(sampler, n.chain=2, n.iter=700, thin=2, store.all=TRUE, verbose=FALSE) (summ <- summary(sim)) plot(train$mu, summ$bart[, "Mean"]); abline(0, 1) # NB prediction is currently slow pred <- predict(sim, newdata=test, iters=sample(seq_len(ndraws(sim)), 100), show.progress=FALSE ) (summpred <- summary(pred)) plot(test$mu, summpred[, "Mean"]); abline(0, 1)
# generate data, based on an example in Friedman (1991) gendat <- function(n=200L, p=10L, sigma=1) { x <- matrix(runif(n * p), n, p) mu <- 10*sin(pi*x[, 1] * x[, 2]) + 20*(x[, 3] - 0.5)^2 + 10*x[, 4] + 5*x[, 5] y <- mu + sigma * rnorm(n) data.frame(x=x, mu=mu, y=y) } train <- gendat() test <- gendat(n=25) # keep trees for later prediction based on new data sampler <- create_sampler( y ~ brt(~ . - y, name="bart", keepTrees=TRUE), sigma.mod=pr_invchisq(df=3, scale=var(train$y)), data = train ) sim <- MCMCsim(sampler, n.chain=2, n.iter=700, thin=2, store.all=TRUE, verbose=FALSE) (summ <- summary(sim)) plot(train$mu, summ$bart[, "Mean"]); abline(0, 1) # NB prediction is currently slow pred <- predict(sim, newdata=test, iters=sample(seq_len(ndraws(sim)), 100), show.progress=FALSE ) (summpred <- summary(pred)) plot(test$mu, summpred[, "Mean"]); abline(0, 1)
Set options for the conjugate gradient (CG) sampler
CG_control( max.it = NULL, stop.criterion = NULL, preconditioner = c("GMRF", "GMRF2", "GMRF3", "identity"), scale = 1, chol.control = chol_control(), verbose = FALSE )
CG_control( max.it = NULL, stop.criterion = NULL, preconditioner = c("GMRF", "GMRF2", "GMRF3", "identity"), scale = 1, chol.control = chol_control(), verbose = FALSE )
max.it |
maximum number of CG iterations. |
stop.criterion |
total squared error stop criterion for the CG algorithm. |
preconditioner |
one of "GMRF", "GMRF2", "GMRF3" and "identity". |
scale |
scale parameter; only used by the "GMRF3" preconditioner. |
chol.control |
options for Cholesky decomposition, see |
verbose |
whether diagnostic information about the CG sampler is shown. |
A list of options used by the conjugate gradients algorithm.
These options are only effective in case the matrix to be decomposed is sparse, i.p.
of class dsCMatrix-class
.
chol_control(perm = NULL, super = NA, ordering = 0L, inplace = TRUE)
chol_control(perm = NULL, super = NA, ordering = 0L, inplace = TRUE)
perm |
logical scalar, see |
super |
logical scalar, see |
ordering |
an integer scalar passed to CHOLMOD routines determining which reordering schemes are tried to limit sparse Cholesky fill-in. |
inplace |
whether sparse Cholesky updates should re-use the same memory location. |
A list with specified options used for Cholesky decomposition.
This function can be used to combine the results of parallel simulations.
combine_chains(...)
combine_chains(...)
... |
objects of class |
A combined object of class mcdraws
where the number of stored
chains equals the sum of the numbers of chains in the input objects.
This function is used to combine the results of parallel posterior predictive simulations.
combine_iters(...)
combine_iters(...)
... |
objects of class |
A combined object of class mcdraws
where the number of stored
draws equals the sum of the numbers of draws in the input objects.
This function computes incidence, precision and restriction matrices, or
a subset thereof, for a Gaussian Markov Random Field (GMRF).
A GMRF is specified by a formula passed to the factor
argument,
in the same way as for the factor
argument of gen
.
compute_GMRF_matrices( factor, data, D = TRUE, Q = TRUE, R = TRUE, cols2remove = NULL, remove.redundant.R.cols = TRUE, enclos = .GlobalEnv, n.parent = 1L, ... )
compute_GMRF_matrices( factor, data, D = TRUE, Q = TRUE, R = TRUE, cols2remove = NULL, remove.redundant.R.cols = TRUE, enclos = .GlobalEnv, n.parent = 1L, ... )
factor |
factor formula of a generic model component,
see |
data |
data frame to be used in deriving the matrices. |
D |
if |
Q |
if |
R |
if |
cols2remove |
if an integer vector is passed, the dimensions (columns of D, rows and columns of Q and rows of R) that are removed. This can be useful in the case of empty domains. |
remove.redundant.R.cols |
whether to test for and remove redundant restrictions from restriction matrix R |
enclos |
enclosure to look for objects not found in |
n.parent |
for internal use; in case of custom factor, the number of frames up the calling stack in which to evaluate any custom matrices |
... |
further arguments passed to |
A list containing some or all of the components D
(incidence matrix),
Q
(precision matrix) and R
(restriction matrix).
n <- 1000 dat <- data.frame( x = rnorm(n), f1 = factor(sample(1:50, n, replace=TRUE)), f2 = factor(sample(1:10, n, replace=TRUE)) ) mats <- compute_GMRF_matrices(~ f1 * RW1(f2), dat) str(mats)
n <- 1000 dat <- data.frame( x = rnorm(n), f1 = factor(sample(1:50, n, replace=TRUE)), f2 = factor(sample(1:10, n, replace=TRUE)) ) mats <- compute_GMRF_matrices(~ f1 * RW1(f2), dat) str(mats)
If sampler
is provided instead of formula
, the design matrices
are based on the model used to create the sampler environment. In that case, if
data
is NULL
, the design matrices stored in sampler
are returned,
otherwise the design matrices are computed for the provided data based on the sampler's model.
The output is a list of dense or sparse design matrices for the model components
with respect to data
.
computeDesignMatrix(formula = NULL, data = NULL, labels = TRUE)
computeDesignMatrix(formula = NULL, data = NULL, labels = TRUE)
formula |
model formula. |
data |
data frame to be used in deriving the design matrices. |
labels |
if |
A list of design matrices.
n <- 1000 dat <- data.frame( x = rnorm(n), f = factor(sample(1:50, n, replace=TRUE)) ) str(computeDesignMatrix(~ x, dat)[[1]]) model <- ~ reg(~x, name="beta") + gen(~x, factor=~f, name="v") X <- computeDesignMatrix(model, dat) str(X)
n <- 1000 dat <- data.frame( x = rnorm(n), f = factor(sample(1:50, n, replace=TRUE)) ) str(computeDesignMatrix(~ x, dat)[[1]]) model <- ~ reg(~x, name="beta") + gen(~x, factor=~f, name="v") X <- computeDesignMatrix(model, dat) str(X)
Element 'factor' of a model component created using function
gen
is a formula composed of several possible terms described
below. It is used to derive a (typically sparse) precision matrix for a set of
coefficients, and possibly a matrix representing a set of linear constraints
to be imposed on the coefficient vector.
Independent effects corresponding to the levels of factor f
.
First-order random walk over the levels of factor f
.
The random walk can be made circular and different (fixed) weights can be attached to the innovations.
If specified, w
must be a positive numeric vector of length one less than the number of
factor levels. For example, if the levels correspond to different times, it would often be
reasonable to choose w
proportional to the reciprocal time differences. For equidistant
times there is generally no need to specify w
.
Second-order random walk.
First-order autoregressive correlation structure among
the levels of f
. Required argument is the (fixed) autoregressive parameter phi
.
For irregularly spaced AR(1) processes weights can be specified, in the same way as for
RW1
.
Dummy seasonal with period period
.
CAR spatial correlation.
Argument poly.df
can either be an object of (S4) class SpatialPolygonsDataFrame
or an object of (S3) class sf
. The latter can be obtained, e.g., from reading in a
shape file using function st_read
. Arguments snap
and queen
are passed to poly2nb
.
If derive.constraints=TRUE
the constraint matrix for an IGMRF model component
is formed by computing the singular vectors of the precision matrix.
P-splines, i.e. penalized B-splines structure over
the domain of a quantitative variable f. Arguments knots and degree are passed to
splineDesign
. If knots
is a single value it is interpreted as
the number of knots, otherwise as a vector of knot positions. By default 40 equally spaced
knots are used, and a degree of 3.
Either a custom precision or incidence
matrix associated with factor f can be passed to argument Q
or D
. Optionally a
constraint matrix can be supplied as R
, or constraints can be derived from the null space
of the precision matrix by setting derive.constraints=TRUE
.
iid(name) RW1(name, circular = FALSE, w = NULL) RW2(name) AR1(name, phi, w = NULL) season(name, period) spatial(name, poly.df, snap = sqrt(.Machine$double.eps), queen = TRUE) spline(name, knots, degree) custom(name, D = NULL, Q = NULL, R = NULL, derive.constraints = NULL)
iid(name) RW1(name, circular = FALSE, w = NULL) RW2(name) AR1(name, phi, w = NULL) season(name, period) spatial(name, poly.df, snap = sqrt(.Machine$double.eps), queen = TRUE) spline(name, knots, degree) custom(name, D = NULL, Q = NULL, R = NULL, derive.constraints = NULL)
name |
name of a variable, unquoted. |
circular |
whether the random walk is circular. |
w |
a vector of weights. |
phi |
value of an autoregressive parameter. |
period |
a positive integer specifying the seasonal period. |
poly.df |
a spatial data frame. |
snap |
passed to |
queen |
passed to |
knots |
passed to |
degree |
passed to |
D |
custom incidence matrix. |
Q |
custom precision matrix. |
R |
custom restriction matrix. |
derive.constraints |
whether to derive the constraint matrix for an IGMRF model component numerically from the precision matrix. |
B. Allevius (2018). On the precision matrix of an irregularly sampled AR(1) process. arXiv:1801.03791.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
# example of CAR spatial random effects if (requireNamespace("sf")) { # 1. load a shape file of counties in North Carolina nc <- sf::st_read(system.file("shape/nc.shp", package="sf")) # 2. generate some data according to a model with a few regression # effects, as well as spatial random effects gd <- generate_data( ~ reg(~ AREA + BIR74, prior=pr_normal(precision=1), name="beta") + gen(factor = ~ spatial(NAME, poly.df=nc), name="vs"), sigma.mod = pr_invchisq(df=10, scale=1), data = nc ) # add the generated target variable and the spatial random effects to the # spatial dataframe object nc$y <- gd$y nc$vs_true <- gd$pars$vs # 3. fit a model to the generated data, and see to what extent the # parameters used to generate the data, gd$pars, are reproduced sampler <- create_sampler( y ~ reg(~ AREA + BIR74, prior=pr_normal(precision=1), name="beta") + gen(factor = ~ spatial(NAME, poly.df=nc), name="vs"), block=TRUE, data=nc ) sim <- MCMCsim(sampler, store.all=TRUE, n.iter=600, n.chain=2, verbose=FALSE) (summ <- summary(sim)) nc$vs <- summ$vs[, "Mean"] plot(nc[c("vs_true", "vs")]) plot(gd$pars$vs, summ$vs[, "Mean"]); abline(0, 1, col="red") }
# example of CAR spatial random effects if (requireNamespace("sf")) { # 1. load a shape file of counties in North Carolina nc <- sf::st_read(system.file("shape/nc.shp", package="sf")) # 2. generate some data according to a model with a few regression # effects, as well as spatial random effects gd <- generate_data( ~ reg(~ AREA + BIR74, prior=pr_normal(precision=1), name="beta") + gen(factor = ~ spatial(NAME, poly.df=nc), name="vs"), sigma.mod = pr_invchisq(df=10, scale=1), data = nc ) # add the generated target variable and the spatial random effects to the # spatial dataframe object nc$y <- gd$y nc$vs_true <- gd$pars$vs # 3. fit a model to the generated data, and see to what extent the # parameters used to generate the data, gd$pars, are reproduced sampler <- create_sampler( y ~ reg(~ AREA + BIR74, prior=pr_normal(precision=1), name="beta") + gen(factor = ~ spatial(NAME, poly.df=nc), name="vs"), block=TRUE, data=nc ) sim <- MCMCsim(sampler, store.all=TRUE, n.iter=600, n.chain=2, verbose=FALSE) (summ <- summary(sim)) nc$vs <- summ$vs[, "Mean"] plot(nc[c("vs_true", "vs")]) plot(gd$pars$vs, summ$vs[, "Mean"]); abline(0, 1, col="red") }
This function sets up a sampler object, based on the specification of a model. The object contains functions to
draw a set of model parameters from their prior and conditional posterior distributions, and
to generate starting values for the MCMC simulation. The functions share a common environment
containing precomputed quantities such as design matrices based on the model and the data.
The sampler object is the main input for the MCMC simulation function MCMCsim
.
create_sampler( formula, data = NULL, family = "gaussian", ny = NULL, ry = NULL, r.mod, sigma.fixed = NULL, sigma.mod = NULL, Q0 = NULL, formula.V = NULL, logJacobian = NULL, linpred = NULL, compute.weights = FALSE, block = NULL, prior.only = FALSE, control = sampler_control() )
create_sampler( formula, data = NULL, family = "gaussian", ny = NULL, ry = NULL, r.mod, sigma.fixed = NULL, sigma.mod = NULL, Q0 = NULL, formula.V = NULL, logJacobian = NULL, linpred = NULL, compute.weights = FALSE, block = NULL, prior.only = FALSE, control = sampler_control() )
formula |
formula to specify the response variable and additive model components. The model components
form the linear predictor part of the model. A model component on the right hand side can be either
a regression term specified by |
data |
data frame with n rows in which the variables specified in model components can be found. |
family |
character string describing the data distribution. The default is 'gaussian'.
Other options are 'binomial', 'multinomial', 'negbinomial' for the negative binomial distribution,
'poisson', and 'gamma'. See mcmcsae-family for the related functions
that can be used to specify |
ny |
in case |
ry |
in case |
r.mod |
prior specification for a scalar (reciprocal) dispersion parameter
of the negative binomial distribution. The prior can be specified by a call to a prior
specification function. Currently |
sigma.fixed |
for Gaussian models, if |
sigma.mod |
prior for the variance parameter of a gaussian sampling distribution.
This can be specified by a call to one of the prior specification functions
|
Q0 |
n x n data-level precision matrix for a Gaussian model. It defaults to the unit matrix.
If an n-vector is provided it will be expanded to a (sparse) diagonal matrix with Q0 on its diagonal.
If a name is supplied it will be looked up in |
formula.V |
a formula specifying the terms of a variance model in the case of a Gaussian likelihood.
Currently two types of terms are supported: a regression term for the log-variance
specified with |
logJacobian |
if the data are transformed the logarithm of the Jacobian can be supplied so that it
is incorporated in all log-likelihood computations. This can be useful for comparing information criteria
for different transformations. It should be supplied as a vector of the same size as the response variable,
and is currently only supported if |
linpred |
a list of matrices defining (possibly out-of-sample) linear predictors to be simulated.
This allows inference on e.g. (sub)population totals or means. The list must be of the form
|
compute.weights |
if |
block |
DEPRECATED, please use argument |
prior.only |
whether a sampler is set up only for sampling from the prior or for sampling from both prior
and posterior distributions. Default |
control |
a list with further computational options. These options can
be specified using function |
The right hand side of the formula
argument to create_sampler
can be used to specify
additive model components. Currently four model components are supported: reg(...)
for regression or 'fixed' effects, gen(...)
for generic random effects,
mec(...)
for measurement in covariates effects, and brt(...)
for a Bayesian additive regression trees component. Note that an offset can be added
separately, in the usual way using offset(...)
.
For gaussian models, formula.V
can be used to specify the variance structure of the model.
Currently two specialized variance model components are supported, vreg(...)
for regression
effects predicting the log-variance and vfac(...)
for modeled variance factors.
A sampler object, which is the main input for the MCMC simulation
function MCMCsim
. The sampler object is an environment with
precomputed quantities and functions. The main functions are rprior
,
which returns a sample from the prior distributions, draw
,
which returns a sample from the full conditional posterior distributions,
and start
, which returns a list with starting values for the Gibbs
sampler. If prior.only
is TRUE
, functions draw
and
start
are not created.
J.H. Albert and S. Chib (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association 88(422), 669-679.
D. Bates, M. Maechler, B. Bolker and S.C. Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67(1), 1-48.
S.W. Linderman, M.J. Johnson and R.P. Adams (2015). Dependent multinomial models made easy: Stick-breaking with the Polya-Gamma augmentation. Advances in Neural Information Processing Systems, 3456-3464.
P.A. Parker, S.H. Holan and R. Janicki (2023). Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure. Journal of Survey Statistics and Methodology, smad002.
N. Polson, J.G. Scott and J. Windle (2013). Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association 108(504), 1339-1349.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
M. Zhou and L. Carin (2015). Negative Binomial Process Count and Mixture Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2), 307-320.
# first generate some data n <- 200 x <- rnorm(n) y <- 0.5 + 2*x + 0.3*rnorm(n) # create a sampler for a simple linear regression model sampler <- create_sampler(y ~ x) sim <- MCMCsim(sampler) (summary(sim)) y <- rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2*x)))) # create a sampler for a binary logistic regression model sampler <- create_sampler(y ~ x, family="binomial") sim <- MCMCsim(sampler) (summary(sim))
# first generate some data n <- 200 x <- rnorm(n) y <- 0.5 + 2*x + 0.3*rnorm(n) # create a sampler for a simple linear regression model sampler <- create_sampler(y ~ x) sim <- MCMCsim(sampler) (summary(sim)) y <- rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2*x)))) # create a sampler for a binary logistic regression model sampler <- create_sampler(y ~ x, family="binomial") sim <- MCMCsim(sampler) (summary(sim))
This function sets up an object for multivariate normal sampling based on a specified precision matrix. Linear equality and inequality restrictions are supported. For sampling under inequality restrictions four algorithms are available. The default in that case is an exact Hamiltonian Monte Carlo algorithm (Pakman and Paninski, 2014). A related algorithm is the zig-zag Hamiltonian Monte Carlo method (Nishimura et al., 2021) in which momentum is sampled from a Laplace instead of normal distribution. Alternatively, a Gibbs sampling algorithm can be used (Rodriguez-Yam et al., 2004). The fourth option is a data augmentation method that samples from a smooth approximation to the truncated multivariate normal distribution (Souris et al., 2018).
create_TMVN_sampler( Q, mu = NULL, Xy = NULL, update.Q = FALSE, update.mu = update.Q, name = "x", coef.names = NULL, R = NULL, r = NULL, S = NULL, s = NULL, lower = NULL, upper = NULL, check.constraints = FALSE, method = NULL, reduce = NULL, chol.control = chol_control() )
create_TMVN_sampler( Q, mu = NULL, Xy = NULL, update.Q = FALSE, update.mu = update.Q, name = "x", coef.names = NULL, R = NULL, r = NULL, S = NULL, s = NULL, lower = NULL, upper = NULL, check.constraints = FALSE, method = NULL, reduce = NULL, chol.control = chol_control() )
Q |
precision matrix of the (unconstrained) multivariate normal distribution. |
mu |
mean of the (unconstrained) multivariate normal distribution. |
Xy |
alternative to specifying mu; in this case |
update.Q |
whether |
update.mu |
whether |
name |
name of the TMVN vector parameter. |
coef.names |
optional labels for the components of the vector parameter. |
R |
equality restriction matrix. |
r |
rhs vector for equality constraints |
S |
inequality restriction matrix. |
s |
rhs vector for inequality constraints |
lower |
alternative to |
upper |
alternative to |
check.constraints |
if |
method |
sampling method. The options are "direct" for direct sampling from the
unconstrained or equality constrained multivariate normal (MVN). For inequality constrained
MVN sampling three methods are supported: "HMC" for (exact) Hamiltonian Monte Carlo,
"HMCZigZag" for (exact) Hamiltonian Monte Carlo with Laplace momentum, "Gibbs" for a
component-wise Gibbs sampling approach, and "softTMVN" for a data augmentation method that samples
from a smooth approximation to the truncated MVN. Alternatively, the method setting
functions |
reduce |
whether to a priori restrict the simulation to the subspace defined by the equality constraints. |
chol.control |
options for Cholesky decomposition, see |
The componentwise Gibbs sampler uses univariate truncated normal samplers as described in Botev and L'Ecuyer (2016). These samplers are implemented in R package TruncatedNormal, but here translated to C++ for an additional speed-up.
An environment for sampling from a possibly degenerate and truncated multivariate normal distribution.
Harm Jan Boonstra, with help from Grzegorz Baltissen
Z.I. Botev and P. L'Ecuyer (2016). Simulation from the Normal Distribution Truncated to an Interval in the Tail. in VALUETOOLS.
Y. Cong, B. Chen and M. Zhou (2017). Fast simulation of hyperplane-truncated multivariate normal distributions. Bayesian Analysis 12(4), 1017-1037.
Y. Li and S.K. Ghosh (2015). Efficient sampling methods for truncated multivariate normal and student-t distributions subject to linear inequality constraints. Journal of Statistical Theory and Practice 9(4), 712-732.
A. Nishimura, Z. Zhang and M.A. Suchard (2021). Hamiltonian zigzag sampler got more momentum than its Markovian counterpart: Equivalence of two zigzags under a momentum refreshment limit. arXiv:2104.07694.
A. Pakman and L. Paninski (2014). Exact Hamiltonian Monte Carlo for truncated multivariate gaussians. Journal of Computational and Graphical Statistics 23(2), 518-542.
G. Rodriguez-Yam, R.A. Davis and L.L. Scharf (2004). Efficient Gibbs sampling of truncated multivariate normal with application to constrained linear regression. Unpublished manuscript.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
A. Souris, A. Bhattacharya and P. Debdeep (2018). The Soft Multivariate Truncated Normal Distribution. arXiv:1807.09155.
K.A. Valeriano, C.E. Galarza and L.A. Matos (2023). Moments and random number generation for the truncated elliptical family of distributions. Statistics and Computing 33(1), 1-20.
S <- cbind(diag(2), c(-1, 1), c(1.1, -1)) # inequality matrix # S'x >= 0 represents the wedge x1 <= x2 <= 1.1 x1 # example taken from Pakman and Paninski (2014) # 1. exact Hamiltonian Monte Carlo (Pakman and Paninski, 2014) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="HMC") sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".") # 2. exact Hamiltonian Monte Carlo with Laplace momentum (Nishimura et al., 2021) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="HMCZigZag") sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".") # 3. Gibbs sampling approach (Rodriguez-Yam et al., 2004) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="Gibbs") sim <- MCMCsim(sampler, burnin=500, n.iter=2000, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".") # 4. soft TMVN approximation (Souris et al., 2018) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="softTMVN") sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".")
S <- cbind(diag(2), c(-1, 1), c(1.1, -1)) # inequality matrix # S'x >= 0 represents the wedge x1 <= x2 <= 1.1 x1 # example taken from Pakman and Paninski (2014) # 1. exact Hamiltonian Monte Carlo (Pakman and Paninski, 2014) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="HMC") sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".") # 2. exact Hamiltonian Monte Carlo with Laplace momentum (Nishimura et al., 2021) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="HMCZigZag") sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".") # 3. Gibbs sampling approach (Rodriguez-Yam et al., 2004) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="Gibbs") sim <- MCMCsim(sampler, burnin=500, n.iter=2000, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".") # 4. soft TMVN approximation (Souris et al., 2018) sampler <- create_TMVN_sampler(Q=diag(2), mu=c(4, 4), S=S, method="softTMVN") sim <- MCMCsim(sampler, n.iter=600, verbose=FALSE) summary(sim) plot(as.matrix(sim$x), pch=".")
This function is intended to be used on the right hand side of the formula
argument to
create_sampler
or generate_data
.
gen( formula = ~1, factor = NULL, remove.redundant = FALSE, drop.empty.levels = FALSE, X = NULL, var = NULL, prior = NULL, Q0 = NULL, PX = NULL, GMRFmats = NULL, priorA = NULL, Leroux = FALSE, R0 = NULL, RA = NULL, constr = NULL, S0 = NULL, SA = NULL, formula.gl = NULL, a = 1000, name = "", sparse = NULL, control = gen_control(), debug = FALSE )
gen( formula = ~1, factor = NULL, remove.redundant = FALSE, drop.empty.levels = FALSE, X = NULL, var = NULL, prior = NULL, Q0 = NULL, PX = NULL, GMRFmats = NULL, priorA = NULL, Leroux = FALSE, R0 = NULL, RA = NULL, constr = NULL, S0 = NULL, SA = NULL, formula.gl = NULL, a = 1000, name = "", sparse = NULL, control = gen_control(), debug = FALSE )
formula |
a model formula specifying the effects that vary over the levels of
the factor variable(s) specified by argument |
factor |
a formula with factors by which the effects specified in the |
remove.redundant |
whether redundant columns should be removed from the model matrix
associated with |
drop.empty.levels |
whether to remove factor levels without observations. |
X |
A (possibly sparse) design matrix. This can be used instead of |
var |
the (co)variance structure among the varying effects defined by |
prior |
the prior specification for the variance parameters of the random effects.
These can currently be specified by a call to |
Q0 |
precision matrix associated with |
PX |
whether parameter expansion should be used. Default is
|
GMRFmats |
list of incidence/precision/constraint matrices. This can be specified
as an alternative to |
priorA |
prior distribution for scale factors at the variance scale associated with |
Leroux |
this option alters the precision matrix determined by |
R0 |
an optional equality restriction matrix acting on the coefficients defined by |
RA |
an optional equality restriction matrix acting on the coefficients defined by |
constr |
whether constraints corresponding to the null-vectors of the precision matrix
are to be imposed on the vector of coefficients. By default this is |
S0 |
an optional inequality restriction matrix acting on the coefficients defined by |
SA |
an optional inequality restriction matrix acting on the coefficients defined by |
formula.gl |
a formula of the form |
a |
only used in case the effects are MLiG distributed, such as is
assumed in case of a gamma sampling distribution, or for
gaussian variance modelling. In those cases |
name |
the name of the model component. This name is used in the output of the MCMC simulation
function |
sparse |
whether the model matrix associated with |
control |
a list with further computational options. These options can
be specified using function |
debug |
if |
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
J. Besag and C. Kooperberg (1995). On Conditional and Intrinsic Autoregression. Biometrika 82(4), 733-746.
C.M. Carvalho, N.G. Polson and J.G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97(2), 465-480.
L. Fahrmeir, T. Kneib and S. Lang (2004). Penalized Structured Additive Regression for Space-Time Data: a Bayesian Perspective. Statistica Sinica 14, 731-761.
A. Gelman (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1(3), 515-533.
A. Gelman, D.A. Van Dyk, Z. Huang and W.J. Boscardin (2008). Using Redundant Parameterizations to Fit Hierarchical Models. Journal of Computational and Graphical Statistics 17(1), 95-122.
B. Leroux, X. Lei and N. Breslow (1999). Estimation of Disease Rates in Small Areas: A New Mixed Model for Spatial Dependence. In M. Halloran and D. Berry (Eds.), Statistical Models in Epidemiology, the Environment and Clinical Trials, 135-178.
T. Park and G. Casella (2008). The Bayesian Lasso. Journal of the American Statistical Association 103(482), 681-686.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
Set computational options for the sampling algorithms used for a 'gen' model component
gen_control(MHprop = c("GiG", "LNRW"))
gen_control(MHprop = c("GiG", "LNRW"))
MHprop |
MH proposal for the variance component in case of a MLiG prior
on the coefficients. The two options are "GiG" for a generalized inverse gamma
proposal, and "LNRW" for a log_normal random walk proposal. The former should
approximate the conditional posterior quite well provided MLiG parameter |
A list with computational options regarding a 'gen' model component.
This function generates draws from the prior predictive distribution. Parameter values are drawn from their priors, and consequently data is generated from the sampling distribution given these parameter values.
generate_data( formula, data = NULL, family = "gaussian", ny = NULL, ry = NULL, r.mod, sigma.fixed = NULL, sigma.mod = NULL, Q0 = NULL, formula.V = NULL, linpred = NULL )
generate_data( formula, data = NULL, family = "gaussian", ny = NULL, ry = NULL, r.mod, sigma.fixed = NULL, sigma.mod = NULL, Q0 = NULL, formula.V = NULL, linpred = NULL )
formula |
A model formula, see |
data |
see |
family |
sampling distribution family, see |
ny |
see |
ry |
see |
r.mod |
see |
sigma.fixed |
see |
sigma.mod |
see |
Q0 |
see |
formula.V |
see |
linpred |
see |
A list with a generated data vector and a list of prior means of the parameters. The parameters are drawn from their priors.
n <- 250 dat <- data.frame( x = rnorm(n), g = factor(sample(1:10, n, replace=TRUE)), ny = 10 ) gd <- generate_data( ~ reg(~ 1 + x, Q0=10, b0=c(0, 1), name="beta") + gen(factor = ~ g, name="v"), family="binomial", ny="ny", data=dat ) gd plot(dat$x, gd$y)
n <- 250 dat <- data.frame( x = rnorm(n), g = factor(sample(1:10, n, replace=TRUE)), ny = 10 ) gd <- generate_data( ~ reg(~ 1 + x, Q0=10, b0=c(0, 1), name="beta") + gen(factor = ~ g, name="v"), family="binomial", ny="ny", data=dat ) gd plot(dat$x, gd$y)
Extract a list of parameter values for a single draw
get_draw(obj, iter, chain)
get_draw(obj, iter, chain)
obj |
an object of class |
iter |
iteration number. |
chain |
chain number. |
A list with all parameter values of draw iter
from chain chain
.
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) get_draw(sim, iter=20, chain=3)
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) get_draw(sim, iter=20, chain=3)
This function is intended to be used to specify the formula.gl
argument to
the gen
model component specification function.
Group-level predictors and hierarchical centering are
not used by default, and they currently cannot be used in a model component that is sampled
together with another model component in the same Gibbs block.
glreg( formula = NULL, remove.redundant = FALSE, prior = NULL, Q0 = NULL, data = NULL, name = "" )
glreg( formula = NULL, remove.redundant = FALSE, prior = NULL, Q0 = NULL, data = NULL, name = "" )
formula |
a formula specifying the group-level predictors to be used within a model
component. If no |
remove.redundant |
whether redundant columns should be removed from the design matrix.
Default is |
prior |
prior specification for the group-level effects. Currently only
normal priors with mean 0 can be specified, using function |
Q0 |
prior precision matrix for the group-level effects. The default is a
zero matrix corresponding to a noninformative improper prior.
DEPRECATED, please use argument |
data |
group-level data frame in which the group-level variables specified in
|
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
An object with precomputed quantities for sampling from prior or conditional posterior distributions for this model component. Only intended for internal use by other package functions.
Get and set the variable labels of a draws component object for a vector-valued parameter
## S3 method for class 'dc' labels(object, ...) labels(object) <- value
## S3 method for class 'dc' labels(object, ...) labels(object) <- value
object |
a draws component object. |
... |
currently not used. |
value |
a vector of labels. |
The extractor function returns the variable labels.
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=50, n.iter=100, n.chain=1, store.all=TRUE) labels(sim$beta) labels(sim$v) labels(sim$beta) <- c("a", "b") labels(sim$beta)
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=50, n.iter=100, n.chain=1, store.all=TRUE) labels(sim$beta) labels(sim$v) labels(sim$beta) <- c("a", "b") labels(sim$beta)
Functions for matrix-vector multiplies like %*%
and crossprod
,
but often faster for the matrix types supported. The return value is always a
numeric vector.
M %m*v% v crossprod_mv(M, v)
M %m*v% v crossprod_mv(M, v)
M |
a matrix of class 'matrix', 'dgCMatrix', 'dsCMatrix', 'tabMatrix', or 'ddiMatrix'. |
v |
a numeric vector. |
For %m*v%
the vector and for
crossprod_mv
the vector
where
denotes the transpose of
.
M <- matrix(rnorm(10*10), 10, 10) x <- rnorm(10) M %m*v% x crossprod_mv(M, x) M <- Matrix::rsparsematrix(100, 100, nnz=100) x <- rnorm(100) M %m*v% x crossprod_mv(M, x)
M <- matrix(rnorm(10*10), 10, 10) x <- rnorm(10) M %m*v% x crossprod_mv(M, x) M <- Matrix::rsparsematrix(100, 100, nnz=100) x <- rnorm(100) M %m*v% x crossprod_mv(M, x)
Maximize the log-likelihood or log-posterior as defined by a sampler closure
maximize_log_lh_p( sampler, type = c("llh", "lpost"), method = "BFGS", control = list(fnscale = -1), ... )
maximize_log_lh_p( sampler, type = c("llh", "lpost"), method = "BFGS", control = list(fnscale = -1), ... )
sampler |
sampler function closure, i.e. the return value of a call to |
type |
either "llh" (default) or "lpost", for optimization of the log-likelihood, or the log-posterior, respectively. |
method |
optimization method, passed to |
control |
control parameters, passed to |
... |
other parameters passed to |
A list of parameter values that, provided the optimization was successful, maximize the (log-)likelihood or (log-)posterior.
n <- 1000 dat <- data.frame( x = rnorm(n), f = factor(sample(1:50, n, replace=TRUE)) ) df <- generate_data( ~ reg(~x, name="beta", prior=pr_normal(precision=1)) + gen(~x, factor=~f, name="v"), sigma.fixed=TRUE, data=dat ) dat$y <- df$y sampler <- create_sampler(y ~ x + gen(~x, factor=~f, name="v"), data=dat) opt <- maximize_log_lh_p(sampler) str(opt) plot(df$par$v, opt$par$v); abline(0, 1, col="red")
n <- 1000 dat <- data.frame( x = rnorm(n), f = factor(sample(1:50, n, replace=TRUE)) ) df <- generate_data( ~ reg(~x, name="beta", prior=pr_normal(precision=1)) + gen(~x, factor=~f, name="v"), sigma.fixed=TRUE, data=dat ) dat$y <- df$y sampler <- create_sampler(y ~ x + gen(~x, factor=~f, name="v"), data=dat) opt <- maximize_log_lh_p(sampler) str(opt) plot(df$par$v, opt$par$v); abline(0, 1, col="red")
R_hat
computes Gelman-Rubin convergence diagnostics based on the MCMC output
in a model component, and n_eff
computes the effective sample sizes, .i.e.
estimates for the number of independent samples from the posterior distribution.
R_hat(dc) n_eff(dc, useFFT = TRUE, lag.max, cl = NULL)
R_hat(dc) n_eff(dc, useFFT = TRUE, lag.max, cl = NULL)
dc |
a draws component (dc) object corresponding to a model parameter. |
useFFT |
whether to use the Fast Fourier Transform algorithm. Default is |
lag.max |
the lag up to which autocorrelations are computed in case |
cl |
a cluster for parallel computation. |
In case of R_hat
the split-R-hat convergence diagnostic for each
component of the vector parameter, and in case of n_eff
the effective
number of independent samples for each component of the vector parameter.
A. Gelman and D. B. Rubin (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7, 457-511.
A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari and D.B. Rubin (2013). Bayesian Data Analysis, 3rd edition. Chapman & Hall/CRC.
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) n_eff(sim$beta) n_eff(sim$v_sigma) n_eff(sim$v_rho) R_hat(sim$beta) R_hat(sim$llh_) R_hat(sim$v_sigma)
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) n_eff(sim$beta) n_eff(sim$v_sigma) n_eff(sim$v_rho) R_hat(sim$beta) R_hat(sim$llh_) R_hat(sim$v_sigma)
Use to_mcmc
to convert a draws component to class mcmc.list
,
allowing one to use MCMC diagnostic functions provided by package coda.
Use as.array
to convert to an array of dimension (draws, chains, parameters)
.
The array format is supported by some packages for analysis or visualisation of MCMC
simulation results, e.g. bayesplot.
Use as.matrix
to convert to a matrix, concatenating the chains.
Finally, use to_draws_array
to convert either a draws component or
(a subset of components of) an mcdraws object to a draws_array
object
as defined in package posterior.
to_mcmc(x) to_draws_array(x, components = NULL) ## S3 method for class 'dc' as.array(x, ...) ## S3 method for class 'dc' as.matrix(x, colnames = TRUE, ...)
to_mcmc(x) to_draws_array(x, components = NULL) ## S3 method for class 'dc' as.array(x, ...) ## S3 method for class 'dc' as.matrix(x, colnames = TRUE, ...)
x |
a component of an mcdraws object corresponding to a scalar or vector model parameter. |
components |
optional character vector of names of draws components in an mcdraws object.
This can be used to select a subset of components to convert to
|
... |
currently ignored. |
colnames |
whether column names should be set. |
The draws component(s) coerced to an mcmc.list
object,
a draws_array
object, an array, or a matrix.
data(iris) sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, name="beta"), data=iris) sim <- MCMCsim(sampler, burnin=100, n.chain=2, n.iter=400) summary(sim) if (require("coda", quietly=TRUE)) { mcbeta <- to_mcmc(sim$beta) geweke.diag(mcbeta) } if (require("posterior", quietly=TRUE)) { mcbeta <- to_draws_array(sim$beta) mcbeta draws <- to_draws_array(sim) str(draws) } str(as.array(sim$beta)) str(as.matrix(sim$beta)) # generate some example data n <- 250 dat <- data.frame(x=runif(n), f=as.factor(sample(1:5, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat) sim <- MCMCsim(sampler, n.chain=2, n.iter=400) str(sim$beta) str(as.array(sim$beta)) bayesplot::mcmc_hist(as.array(sim$beta)) bayesplot::mcmc_dens_overlay(as.array(sim$beta)) # fake data simulation check: bayesplot::mcmc_recover_intervals(as.array(sim$beta), gd$pars$beta) bayesplot::mcmc_recover_hist(as.array(sim$beta), gd$pars$beta) ex <- mcmcsae_example() plot(ex$dat$fT, ex$dat$y) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, n.chain=2, n.iter=400, store.all=TRUE) str(sim$beta) str(as.matrix(sim$beta)) # fake data simulation check: bayesplot::mcmc_recover_intervals(as.matrix(sim$beta), ex$pars$beta) bayesplot::mcmc_recover_intervals(as.matrix(sim$u), ex$pars$u)
data(iris) sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, name="beta"), data=iris) sim <- MCMCsim(sampler, burnin=100, n.chain=2, n.iter=400) summary(sim) if (require("coda", quietly=TRUE)) { mcbeta <- to_mcmc(sim$beta) geweke.diag(mcbeta) } if (require("posterior", quietly=TRUE)) { mcbeta <- to_draws_array(sim$beta) mcbeta draws <- to_draws_array(sim) str(draws) } str(as.array(sim$beta)) str(as.matrix(sim$beta)) # generate some example data n <- 250 dat <- data.frame(x=runif(n), f=as.factor(sample(1:5, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat) sim <- MCMCsim(sampler, n.chain=2, n.iter=400) str(sim$beta) str(as.array(sim$beta)) bayesplot::mcmc_hist(as.array(sim$beta)) bayesplot::mcmc_dens_overlay(as.array(sim$beta)) # fake data simulation check: bayesplot::mcmc_recover_intervals(as.array(sim$beta), gd$pars$beta) bayesplot::mcmc_recover_hist(as.array(sim$beta), gd$pars$beta) ex <- mcmcsae_example() plot(ex$dat$fT, ex$dat$y) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, n.chain=2, n.iter=400, store.all=TRUE) str(sim$beta) str(as.matrix(sim$beta)) # fake data simulation check: bayesplot::mcmc_recover_intervals(as.matrix(sim$beta), ex$pars$beta) bayesplot::mcmc_recover_intervals(as.matrix(sim$u), ex$pars$u)
This function is used to generate data for several examples.
mcmcsae_example(n = 100L, family = "gaussian")
mcmcsae_example(n = 100L, family = "gaussian")
n |
the size of the generated dataset. |
family |
sampling distribution family, see |
A list
containing the generated dataset, the values of the model
parameters, and the model specification as a formula.
ex <- mcmcsae_example() str(ex)
ex <- mcmcsae_example() str(ex)
These functions are intended for use in the family
argument of create_sampler
.
In future versions these functions may gain additional arguments, but currently the corresponding
functions gaussian
and binomial
can be used as well.
f_gaussian(link = "identity") f_binomial(link = c("logit", "probit")) f_negbinomial(link = "logit") f_poisson(link = "log") f_multinomial(link = "logit", K = NULL) f_gamma( link = "log", shape.vec = ~1, shape.prior = pr_gamma(0.1, 0.1), shape.MH.type = c("RW", "gamma") ) f_gaussian_gamma(link = "identity", var.data, ...)
f_gaussian(link = "identity") f_binomial(link = c("logit", "probit")) f_negbinomial(link = "logit") f_poisson(link = "log") f_multinomial(link = "logit", K = NULL) f_gamma( link = "log", shape.vec = ~1, shape.prior = pr_gamma(0.1, 0.1), shape.MH.type = c("RW", "gamma") ) f_gaussian_gamma(link = "identity", var.data, ...)
link |
the name of a link function. Currently the only allowed link functions are:
|
K |
number of categories for multinomial model; this must be specified for prior predictive sampling. |
shape.vec |
optional formula specification of unequal shape parameter for gamma family |
shape.prior |
prior for gamma shape parameter. Supported prior distributions:
|
shape.MH.type |
the type of Metropolis-Hastings algorithm employed in case the shape parameter is to be inferred. The two choices currently supported are "RW" for a random walk proposal on the log-shape scale and "gamma" for an approximating gamma proposal, found using an iterative algorithm. In the latter case, a Metropolis-Hastings accept-reject step is currently omitted, so the sampling algorithm is an approximate one, though one that is usually quite accurate and efficient. |
var.data |
the (variance) data for the gamma part of family |
... |
further arguments passed to |
A family object.
J.W. Miller (2019). Fast and Accurate Approximation of the Full Conditional for Gamma Shape Parameters. Journal of Computational and Graphical Statistics 28(2), 476-480.
Given a sampler object this function runs a MCMC simulation and stores the
posterior draws. A sampler object for a wide class of multilevel models
can be created using create_sampler
, but users can also define
their own sampler functions, see below.
MCMCsim
allows to choose the parameters for which simulation results
must be stored. It is possible to define derived quantities that will also
be stored. To save memory, it is also possible to only store Monte Carlo
means/standard errors for some large vector parameters, say. Another
way to use less memory is to save the simulation results of large vector
parameters to file.
For parameters specified in plot.trace
trace plots or pair plots of
multiple parameters are displayed during the simulation.
MCMCsim( sampler, from.prior = FALSE, n.iter = 1000L, n.chain = 3L, thin = 1L, burnin = if (from.prior) 0L else 250L, start = NULL, store, store.all = FALSE, pred = NULL, store.mean, store.sds = FALSE, to.file = NULL, filename = "MCdraws_", write.single.prec = FALSE, verbose = TRUE, n.progress = n.iter%/%10L, trace.convergence = NULL, stop.on.convergence = FALSE, convergence.bound = 1.05, plot.trace = NULL, add.to.plot = TRUE, plot.type = "l", n.cores = 1L, cl = NULL, seed = NULL, export = NULL )
MCMCsim( sampler, from.prior = FALSE, n.iter = 1000L, n.chain = 3L, thin = 1L, burnin = if (from.prior) 0L else 250L, start = NULL, store, store.all = FALSE, pred = NULL, store.mean, store.sds = FALSE, to.file = NULL, filename = "MCdraws_", write.single.prec = FALSE, verbose = TRUE, n.progress = n.iter%/%10L, trace.convergence = NULL, stop.on.convergence = FALSE, convergence.bound = 1.05, plot.trace = NULL, add.to.plot = TRUE, plot.type = "l", n.cores = 1L, cl = NULL, seed = NULL, export = NULL )
sampler |
sampler object created by |
from.prior |
whether to sample from the prior. By default |
n.iter |
number of draws after burnin. |
n.chain |
number of independent chains. |
thin |
only every |
burnin |
number of draws to discard at the beginning of each chain. |
start |
an optional function to generate starting values or a list containing for each chain a named list of starting values. It may be used to provide starting values for some or all parameters. The sampler object's own start function, if it exists, is called to generate any starting values not provided by the user. |
store |
vector of names of parameters to store MCMC draws for. By default, simulations are
stored for all parameters returned by |
store.all |
if |
pred |
list of character strings defining derived quantities to be computed (and stored) for each draw. |
store.mean |
vector of names of parameters for which only the mean (per chain) is to be stored.
This may be useful for large vector parameters (e.g. regression residuals) for which storing complete
MCMC output would use too much memory. The function |
store.sds |
if |
to.file |
vector of names of parameters to write to file. |
filename |
name of file to write parameter draws to.
Each named parameter is written to a separate file, named |
write.single.prec |
Whether to write to file in single precision. Default is |
verbose |
if |
n.progress |
update diagnostics and plots after so many iterations. |
trace.convergence |
vector of names of parameters for which Gelman-Rubin R-hat diagnostics are printed to the screen every |
stop.on.convergence |
if |
convergence.bound |
threshold used with |
plot.trace |
character vector of parameter names for which to plot draws
during the simulation. For one or two parameters trace plots will be shown,
and if more parameters are specified the results will be displayed in a pairs
plot. For vector parameters a specific component can be selected using brackets,
e.g. |
add.to.plot |
if |
plot.type |
default is "l" (lines). |
n.cores |
the number of cpu cores to use. Default is 1, i.e. no parallel computation.
If an existing cluster |
cl |
an existing cluster can be passed for parallel computation. If |
seed |
a random seed (integer). For parallel computation it is used to independently seed RNG streams for all workers. |
export |
a character vector with names of objects to export to the workers. This may
be needed for parallel execution if expressions in |
A sampler object is an environment containing data and functions to use
for sampling. The following elements of the sampler object are used by
MCMCsim
:
function to generate starting values.
function to draw samples, typically from a full conditional posterior distribution.
function to draw from a prior distribution.
list of vectors of parameter coefficient names, for vector parameters.
vector of names of parameters that are sampled using a Metropolis-Hastings (MH) sampler; acceptance rates are kept for these parameters.
function of acceptance rates of MHpars
to adapt
MH-kernel, called every 100 iterations during the burn-in period.
An object of class mcdraws
containing posterior draws as well as some meta information.
# 1. create a sampler function sampler <- new.env() sampler$draw <- function(p) list(x=rnorm(1L), y=runif(1L)) # 2. do the simulation sim <- MCMCsim(sampler, store=c("x", "y")) str(sim) summary(sim) # example that requires start values or a start function sampler$draw <- function(p) list(x=rnorm(1L), y=p$x * runif(1L)) sampler$start <- function(p) list(x=rnorm(1L), y=runif(1L)) sim <- MCMCsim(sampler, store=c("x", "y")) summary(sim) plot(sim, c("x", "y")) # example using create_sampler; first generate some data n <- 100 dat <- data.frame(x=runif(n), f=as.factor(sample(1:4, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ x + f, data=dat) sim <- MCMCsim(sampler, burnin=100, n.iter=400, n.chain=2) (summary(sim)) gd$pars
# 1. create a sampler function sampler <- new.env() sampler$draw <- function(p) list(x=rnorm(1L), y=runif(1L)) # 2. do the simulation sim <- MCMCsim(sampler, store=c("x", "y")) str(sim) summary(sim) # example that requires start values or a start function sampler$draw <- function(p) list(x=rnorm(1L), y=p$x * runif(1L)) sampler$start <- function(p) list(x=rnorm(1L), y=runif(1L)) sim <- MCMCsim(sampler, store=c("x", "y")) summary(sim) plot(sim, c("x", "y")) # example using create_sampler; first generate some data n <- 100 dat <- data.frame(x=runif(n), f=as.factor(sample(1:4, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ x + f, data=dat) sim <- MCMCsim(sampler, burnin=100, n.iter=400, n.chain=2) (summary(sim)) gd$pars
This function is intended to be used on the right hand side of the
formula
argument to create_sampler
or
generate_data
. It creates an additive regression term in the
model's linear predictor. Covariates are assumed to be measured subject
to normally distributed errors with zero mean and variance specified using
the formula
or V
arguments. Note that this means that formula
should only contain quantitative variables, and no intercept.
By default, the prior for the regression
coefficients is improper uniform. A proper normal prior can be set up
using function pr_normal
, and passed to argument prior
.
It should be noted that pr_normal
expects a precision matrix
as input for its second argument, and that the prior variance (matrix) is
taken to be the inverse of this precision matrix, where in case the
model's family is "gaussian"
this matrix is additionally
multiplied by the residual scalar variance parameter sigma_^2
.
mec( formula = ~1, sparse = NULL, X = NULL, V = NULL, prior = NULL, Q0 = NULL, b0 = NULL, R = NULL, r = NULL, S = NULL, s = NULL, lower = NULL, upper = NULL, name = "", debug = FALSE )
mec( formula = ~1, sparse = NULL, X = NULL, V = NULL, prior = NULL, Q0 = NULL, b0 = NULL, R = NULL, r = NULL, S = NULL, s = NULL, lower = NULL, upper = NULL, name = "", debug = FALSE )
formula |
a formula specifying the predictors subject to measurement error
and possibly their variances as well. In the latter case the formula syntax
|
sparse |
whether the model matrix associated with |
X |
a (possibly sparse) design matrix can be specified directly, as an
alternative to the creation of one based on |
V |
measurement error variance; can contain zeros |
prior |
prior specification for the regression coefficients. Currently only
normal priors are supported, specified using function |
Q0 |
prior precision matrix for the regression effects. The default is a
zero matrix corresponding to a noninformative improper prior.
It can be specified as a scalar value, as a numeric vector of appropriate
length, or as a matrix object. DEPRECATED, please use argument |
b0 |
prior mean for the regression effect. Defaults to a zero vector.
It can be specified as a scalar value or as a numeric vector of
appropriate length. DEPRECATED, please use argument |
R |
optional constraint matrix for equality restrictions R'x = r where
|
r |
right hand side for the equality constraints. |
S |
optional constraint matrix for inequality constraints S'x >= s where x is the vector of regression effects. |
s |
right hand side for the inequality constraints. |
lower |
as an alternative to |
upper |
as an alternative to |
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
debug |
if |
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
L.M. Ybarra and S.L. Lohr (2008). Small area estimation when auxiliary information is measured with error. Biometrika 95(4), 919-931.
S. Arima, G.S. Datta and B. Liseo (2015). Bayesian estimators for small area models when auxiliary information is measured with error. Scandinavian Journal of Statistics 42(2), 518-529.
# example of Ybarra and Lohr (2008) m <- 50 X <- rnorm(m, mean=5, sd=3) # true covariate values v <- rnorm(m, sd=2) theta <- 1 + 3*X + v # true values psi <- rgamma(m, shape=4.5, scale=2) e <- rnorm(m, sd=sqrt(psi)) # sampling error y <- theta + e # direct estimates C <- c(rep(3, 10), rep(0, 40)) # measurement error for first 10 values W <- X + rnorm(m, sd=sqrt(C)) # covariate subject to measurement error # fit Ybarra-Lohr model sampler <- create_sampler( y ~ 1 + mec(~ 0 + W, V=C) + gen(factor=~local_), Q0=1/psi, sigma.fixed=TRUE, linpred="fitted" ) sim <- MCMCsim(sampler, n.iter=800, n.chain=2, store.all=TRUE, verbose=FALSE) (summ <- summary(sim)) plot(X, W, xlab="true X", ylab="inferred X") points(X, summ$mec2_X[, "Mean"], col="green") abline(0, 1, col="red") legend("topleft", legend=c("prior mean", "posterior mean"), col=c("black", "green"), pch=c(1,1))
# example of Ybarra and Lohr (2008) m <- 50 X <- rnorm(m, mean=5, sd=3) # true covariate values v <- rnorm(m, sd=2) theta <- 1 + 3*X + v # true values psi <- rgamma(m, shape=4.5, scale=2) e <- rnorm(m, sd=sqrt(psi)) # sampling error y <- theta + e # direct estimates C <- c(rep(3, 10), rep(0, 40)) # measurement error for first 10 values W <- X + rnorm(m, sd=sqrt(C)) # covariate subject to measurement error # fit Ybarra-Lohr model sampler <- create_sampler( y ~ 1 + mec(~ 0 + W, V=C) + gen(factor=~local_), Q0=1/psi, sigma.fixed=TRUE, linpred="fitted" ) sim <- MCMCsim(sampler, n.iter=800, n.chain=2, store.all=TRUE, verbose=FALSE) (summ <- summary(sim)) plot(X, W, xlab="true X", ylab="inferred X") points(X, summ$mec2_X[, "Mean"], col="green") abline(0, 1, col="red") legend("topleft", legend=c("prior mean", "posterior mean"), col=c("black", "green"), pch=c(1,1))
Compute possibly sparse model matrix
model_matrix( formula, data = NULL, contrasts.arg = NULL, drop.unused.levels = FALSE, sparse = NULL, drop0 = TRUE, catsep = "", by = NULL, tabM = FALSE, enclos = .GlobalEnv )
model_matrix( formula, data = NULL, contrasts.arg = NULL, drop.unused.levels = FALSE, sparse = NULL, drop0 = TRUE, catsep = "", by = NULL, tabM = FALSE, enclos = .GlobalEnv )
formula |
model formula. |
data |
data frame containing all variables used in |
contrasts.arg |
specification of contrasts for factor variables. Currently supported are "contr.none" (no contrasts applied), "contr.treatment" (first level removed) and "contr.SAS" (last level removed). Alternatively, a named list specifying a single level per factor variable can be passed. |
drop.unused.levels |
whether empty levels of individual factor variables should be removed. |
sparse |
if |
drop0 |
whether to drop any remaining explicit zeros in resulting sparse matrix. |
catsep |
separator for concatenating factor variable names and level names.
By default it is the empty string, reproducing the labels of |
by |
a vector by which to aggregate the result. |
tabM |
if |
enclos |
enclosure to look for objects not found in |
Design matrix X, either an ordinary matrix or a sparse dgCMatrix
.
Compute the Deviance Information Criterion (DIC) or
Watanabe-Akaike Information Criterion (WAIC) from an
object of class mcdraws
output by MCMCsim
.
Method waic.mcdraws
computes WAIC using package loo.
Method loo.mcdraws
also depends on package loo to compute
a Pareto-smoothed importance sampling (PSIS) approximation
to leave-one-out cross-validation.
compute_DIC(x, use.pV = FALSE) compute_WAIC( x, diagnostic = FALSE, batch.size = NULL, show.progress = TRUE, cl = NULL, n.cores = 1L ) ## S3 method for class 'mcdraws' waic(x, by.unit = FALSE, ...) ## S3 method for class 'mcdraws' loo(x, by.unit = FALSE, r_eff = FALSE, n.cores = 1L, ...)
compute_DIC(x, use.pV = FALSE) compute_WAIC( x, diagnostic = FALSE, batch.size = NULL, show.progress = TRUE, cl = NULL, n.cores = 1L ) ## S3 method for class 'mcdraws' waic(x, by.unit = FALSE, ...) ## S3 method for class 'mcdraws' loo(x, by.unit = FALSE, r_eff = FALSE, n.cores = 1L, ...)
x |
an object of class |
use.pV |
whether half the posterior variance of the deviance should be used as an alternative estimate of the effective number of model parameters for DIC. |
diagnostic |
whether vectors of log-pointwise-predictive-densities and pointwise contributions to the WAIC effective number of model parameters should be returned. |
batch.size |
number of data units to process per batch. |
show.progress |
whether to show a progress bar. |
cl |
an existing cluster can be passed for parallel computation. If |
n.cores |
the number of cpu cores to use. Default is one, i.e. no parallel computation. |
by.unit |
if |
... |
Other arguments, passed to |
r_eff |
whether to compute relative effective sample size estimates
for the likelihood of each observation. This takes more time, but should
result in a better PSIS approximation. See |
For compute_DIC
a vector with the deviance information criterion and
effective number of model parameters. For compute_WAIC
a vector with the
WAIC model selection criterion and WAIC effective number of model parameters.
Method waic
returns an object of class waic, loo
, see the
documentation for waic
in package loo.
Method loo
returns an object of class psis_loo
, see
loo
.
D. Spiegelhalter, N. Best, B. Carlin and A. van der Linde (2002). Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society B 64 (4), 583-639.
S. Watanabe (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning 11, 3571-3594.
A. Gelman, J. Hwang and A. Vehtari (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing 24, 997-1016.
A. Vehtari, D. Simpson, A. Gelman, Y. Yao and J. Gabry (2015). Pareto smoothed importance sampling. arXiv:1507.02646.
A. Vehtari, A. Gelman and J. Gabry (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27, 1413-1432.
P.-C. Buerkner, J. Gabry and A. Vehtari (2021). Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models. Computational Statistics 36, 1243-1261.
ex <- mcmcsae_example(n=100) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, n.chain=4, store.all=TRUE) compute_DIC(sim) compute_WAIC(sim) if (require(loo)) { waic(sim) loo(sim, r_eff=TRUE) }
ex <- mcmcsae_example(n=100) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, n.chain=4, store.all=TRUE) compute_DIC(sim) compute_WAIC(sim) if (require(loo)) { waic(sim) loo(sim, r_eff=TRUE) }
Get the number of chains, samples per chain or the number of variables in a simulation object
nchains(obj) ndraws(obj) nvars(dc)
nchains(obj) ndraws(obj) nvars(dc)
obj |
an mcdraws object or a draws component (dc) object. |
dc |
a draws component object. |
The number of chains or retained samples per chain or the number of variables.
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=5, store.all=TRUE) # resolve possible conflict with posterior package: nchains <- mcmcsae::nchains; ndraws <- mcmcsae::ndraws nchains(sim); nchains(sim$beta) ndraws(sim); ndraws(sim$beta) nvars(sim$beta); nvars(sim$sigma_); nvars(sim$llh_); nvars(sim$v) plot(sim, "beta") nchains(subset(sim$beta, chains=1:2)) ndraws(subset(sim$beta, draws=sample(1:ndraws(sim), 100))) nvars(subset(sim$u, vars=1:2))
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=5, store.all=TRUE) # resolve possible conflict with posterior package: nchains <- mcmcsae::nchains; ndraws <- mcmcsae::ndraws nchains(sim); nchains(sim$beta) ndraws(sim); ndraws(sim$beta) nvars(sim$beta); nvars(sim$sigma_); nvars(sim$llh_); nvars(sim$v) plot(sim, "beta") nchains(subset(sim$beta, chains=1:2)) ndraws(subset(sim$beta, draws=sample(1:ndraws(sim), 100))) nvars(subset(sim$u, vars=1:2))
Get the parameter names from an mcdraws object
par_names(obj)
par_names(obj)
obj |
an mcdraws object. |
The names of the parameters whose MCMC simulations are stored in obj
.
data(iris) sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, name="beta"), data=iris) sim <- MCMCsim(sampler, burnin=100, n.iter=400) (summary(sim)) par_names(sim)
data(iris) sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, name="beta"), data=iris) sim <- MCMCsim(sampler, burnin=100, n.iter=400) (summary(sim)) par_names(sim)
This function plots estimates with error bars. Multiple sets of
estimates can be compared. The error bars can either be based on
standard errors or on explicitly specified lower and upper bounds.
The function is adapted from function plot.sae
in package
hbsae, which in turn was adapted from function
coefplot.default
from package arm.
plot_coef( ..., n.se = 1, est.names, sort.by = NULL, decreasing = FALSE, index = NULL, maxrows = 50L, maxcols = 6L, offset = 0.1, cex.var = 0.8, mar = c(0.1, 2.1, 5.1, 0.1) )
plot_coef( ..., n.se = 1, est.names, sort.by = NULL, decreasing = FALSE, index = NULL, maxrows = 50L, maxcols = 6L, offset = 0.1, cex.var = 0.8, mar = c(0.1, 2.1, 5.1, 0.1) )
... |
|
n.se |
number of standard errors below and above the point estimates
to use for error bars. By default equal to 1. This only refers to the
objects of class |
est.names |
labels to use in the legend for the components of the |
sort.by |
vector by which to sort the coefficients, referring to the first object passed. |
decreasing |
if |
index |
vector of names or indices of the selected areas to be plotted. |
maxrows |
maximum number of rows in a column. |
maxcols |
maximum number of columns of estimates on a page. |
offset |
space used between plots of multiple estimates for the same area. |
cex.var |
the font size for the variable names, default=0.8. |
mar |
a numerical vector of the form |
# create artificial data set.seed(21) n <- 100 dat <- data.frame( x=runif(n), f=factor(sample(1:20, n, replace=TRUE)) ) model <- ~ reg(~ x, prior=pr_normal(precision=1), name="beta") + gen(factor=~f, name="v") gd <- generate_data(model, data=dat) dat$y <- gd$y # fit a base model model0 <- y ~ reg(~ 1, name="beta") + gen(factor=~f, name="v") sampler <- create_sampler(model0, data=dat, block=TRUE) sim <- MCMCsim(sampler, store.all=TRUE) (summ0 <- summary(sim)) # fit 'true' model model <- y ~ reg(~ x, name="beta") + gen(factor=~f, name="v") sampler <- create_sampler(model, data=dat, block=TRUE) sim <- MCMCsim(sampler, store.all=TRUE) (summ <- summary(sim)) # compare random effect estimates against true parameter values plot_coef(summ0$v, summ$v, list(est=gd$pars$v), n.se=2, offset=0.2, maxrows=10, est.names=c("base model", "true model", "true"))
# create artificial data set.seed(21) n <- 100 dat <- data.frame( x=runif(n), f=factor(sample(1:20, n, replace=TRUE)) ) model <- ~ reg(~ x, prior=pr_normal(precision=1), name="beta") + gen(factor=~f, name="v") gd <- generate_data(model, data=dat) dat$y <- gd$y # fit a base model model0 <- y ~ reg(~ 1, name="beta") + gen(factor=~f, name="v") sampler <- create_sampler(model0, data=dat, block=TRUE) sim <- MCMCsim(sampler, store.all=TRUE) (summ0 <- summary(sim)) # fit 'true' model model <- y ~ reg(~ x, name="beta") + gen(factor=~f, name="v") sampler <- create_sampler(model, data=dat, block=TRUE) sim <- MCMCsim(sampler, store.all=TRUE) (summ <- summary(sim)) # compare random effect estimates against true parameter values plot_coef(summ0$v, summ$v, list(est=gd$pars$v), n.se=2, offset=0.2, maxrows=10, est.names=c("base model", "true model", "true"))
Trace, density and autocorrelation plots for (parameters of a) draws component (dc) object
## S3 method for class 'dc' plot(x, nrows, ncols, ask = FALSE, ...)
## S3 method for class 'dc' plot(x, nrows, ncols, ask = FALSE, ...)
x |
a draws component object. |
nrows |
number of rows in plot layout. |
ncols |
number of columns in plot layout. |
ask |
ask before plotting the next page; default is |
... |
arguments passed to |
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) plot(sim$u)
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) plot(sim$u)
Trace, density and autocorrelation plots for selected components of an mcdraws
object.
## S3 method for class 'mcdraws' plot(x, vnames, nrows, ncols, ask = FALSE, ...)
## S3 method for class 'mcdraws' plot(x, vnames, nrows, ncols, ask = FALSE, ...)
x |
an object of class |
vnames |
optional character vector to select a subset of parameters. |
nrows |
number of rows in plot layout. |
ncols |
number of columns in plot layout. |
ask |
ask before plotting the next page; default is |
... |
arguments passed to |
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) plot(sim, c("beta", "u", "u_sigma", "v_sigma"), ask=TRUE)
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) plot(sim, c("beta", "u", "u_sigma", "v_sigma"), ask=TRUE)
Get means or standard deviations of parameters from the MCMC output in an mcdraws object
get_means(obj, vnames = NULL) get_sds(obj, vnames = NULL)
get_means(obj, vnames = NULL) get_sds(obj, vnames = NULL)
obj |
an object of class |
vnames |
optional character vector to select a subset of parameters. |
A list with simulation means or standard deviations.
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4) get_means(sim) get_means(sim, "e_") sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.mean=c("beta", "u"), store.sds=TRUE) summary(sim, "beta") get_means(sim, "beta") get_sds(sim, "beta") get_means(sim, "u") get_sds(sim, "u")
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4) get_means(sim) get_means(sim, "e_") sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.mean=c("beta", "u"), store.sds=TRUE) summary(sim, "beta") get_means(sim, "beta") get_sds(sim, "beta") get_means(sim, "u") get_sds(sim, "u")
Create an object representing exponential prior distributions
pr_exp(scale = 1)
pr_exp(scale = 1)
scale |
scalar or vector scale parameter. |
An environment representing the specified prior, for internal use.
Create an object representing a degenerate prior fixing a parameter (vector) to a fixed value
pr_fixed(value = 1)
pr_fixed(value = 1)
value |
scalar or vector value parameter. |
An environment representing the specified prior, for internal use.
Create an object representing gamma prior distributions
pr_gamma(shape = 1, rate = 1)
pr_gamma(shape = 1, rate = 1)
shape |
scalar or vector shape parameter. |
rate |
scalar or vector rate, i.e. inverse scale, parameter. |
An environment representing the specified prior, for internal use.
Create an object representing Generalized Inverse Gaussian (GIG) prior distributions
pr_gig(a, b, p)
pr_gig(a, b, p)
a |
scalar or vector parameter. |
b |
scalar or vector parameter. |
p |
scalar or vector parameter. |
An environment representing the specified prior, for internal use.
Create an object representing inverse chi-squared priors with possibly modeled degrees of freedom and scale parameters
pr_invchisq(df = 1, scale = 1)
pr_invchisq(df = 1, scale = 1)
df |
degrees of freedom parameter. This can be a numeric scalar or
vector of length
|
scale |
scalar or vector scale parameter. Alternatively,
|
An environment representing the specified prior, for internal use.
Create an object representing an inverse Wishart prior, possibly with modeled scale matrix
pr_invwishart(df = NULL, scale = NULL)
pr_invwishart(df = NULL, scale = NULL)
df |
Degrees of freedom parameter. This should be a scalar numeric value. Default value is the dimension plus one. |
scale |
Either a (known) scale matrix, or
|
An environment representing the specified prior, for internal use.
A. Huang and M.P. Wand (2013). Simple marginally noninformative prior distributions for covariance matrices. Bayesian Analysis 8, 439-452.
Create an object representing a Multivariate Log inverse Gamma (MLiG) prior distribution
pr_MLiG(mean = 0, precision = 0, labels = NULL, a = 1000)
pr_MLiG(mean = 0, precision = 0, labels = NULL, a = 1000)
mean |
scalar or vector parameter for the mean in the large
|
precision |
scalar or vector parameter for the precision in the
large |
labels |
optional character vector with coefficient labels. If specified,
it should have the same length as at least one of |
a |
scalar parameter that controls how close the prior is to independent
normal priors with |
An environment representing the specified prior, for internal use.
J.R. Bradley, S.H. Holan and C.K. Wikle (2018). Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data (with discussion). Bayesian Analysis 13(1), 253-310.
Create an object representing a possibly multivariate normal prior distribution
pr_normal(mean = 0, precision = 0, labels = NULL)
pr_normal(mean = 0, precision = 0, labels = NULL)
mean |
scalar or vector mean parameter. |
precision |
scalar, vector or matrix precision parameter. |
labels |
optional character vector with coefficient labels. If specified,
it should have the same length as at least one of |
An environment representing the specified prior, for internal use.
Generate draws from the predictive distribution
## S3 method for class 'mcdraws' predict( object, newdata = NULL, X. = if (is.null(newdata)) "in-sample" else NULL, type = c("data", "link", "response", "data_cat"), var = NULL, ny = NULL, ry = NULL, fun. = identity, labels = NULL, ppcheck = FALSE, iters = NULL, to.file = FALSE, filename, write.single.prec = FALSE, show.progress = TRUE, verbose = TRUE, n.cores = 1L, cl = NULL, seed = NULL, export = NULL, ... )
## S3 method for class 'mcdraws' predict( object, newdata = NULL, X. = if (is.null(newdata)) "in-sample" else NULL, type = c("data", "link", "response", "data_cat"), var = NULL, ny = NULL, ry = NULL, fun. = identity, labels = NULL, ppcheck = FALSE, iters = NULL, to.file = FALSE, filename, write.single.prec = FALSE, show.progress = TRUE, verbose = TRUE, n.cores = 1L, cl = NULL, seed = NULL, export = NULL, ... )
object |
an object of class |
newdata |
data frame with auxiliary information to be used for prediction. |
X. |
a list of design matrices; alternatively, |
type |
the type of predictions. The default is |
var |
variance(s) used for out-of-sample prediction. By default 1. |
ny |
number of trials used for out-of-sample prediction in case of a binomial model. By default 1. |
ry |
fixed part of the (reciprocal) dispersion parameter in case of a negative binomial model. |
fun. |
function applied to the vector of posterior predictions to compute one or multiple summaries or test statistics. The function can have one or two arguments. The first argument is always the vector of posterior predictions. The optional second argument represents a list of model parameters, needed only when a test statistic depends on them. The function must return an integer or numeric vector. |
labels |
optional names for the output object. Must be a vector of the same length as the result of |
ppcheck |
if |
iters |
iterations in |
to.file |
if |
filename |
name of the file to write predictions to in case |
write.single.prec |
Whether to write to file in single precision. Default is |
show.progress |
whether to show a progress bar. |
verbose |
whether to show informative messages. |
n.cores |
the number of cpu cores to use. Default is one, i.e. no parallel computation.
If an existing cluster |
cl |
an existing cluster can be passed for parallel computation. If |
seed |
a random seed (integer). For parallel computation it is used to independently seed RNG streams for all workers. |
export |
a character vector with names of objects to export to the workers. This may
be needed for parallel execution if expressions in |
... |
currently not used. |
An object of class dc
, containing draws from the posterior (or prior) predictive distribution.
If ppcheck=TRUE
posterior predictive p-values are returned as an additional attribute.
In case to.file=TRUE
the file name used is returned.
n <- 250 dat <- data.frame(x=runif(n)) dat$y <- 1 + dat$x + rnorm(n) sampler <- create_sampler(y ~ x, data=dat) sim <- MCMCsim(sampler) summary(sim) # in-sample prediction pred <- predict(sim, ppcheck=TRUE) hist(attr(pred, "ppp")) # out-of-sample prediction pred <- predict(sim, newdata=data.frame(x=seq(0, 1, by=0.1))) summary(pred)
n <- 250 dat <- data.frame(x=runif(n)) dat$y <- 1 + dat$x + rnorm(n) sampler <- create_sampler(y ~ x, data=dat) sim <- MCMCsim(sampler) summary(sim) # in-sample prediction pred <- predict(sim, ppcheck=TRUE) hist(attr(pred, "ppp")) # out-of-sample prediction pred <- predict(sim, newdata=data.frame(x=seq(0, 1, by=0.1))) summary(pred)
dc
objectDisplay a summary of a dc
object
## S3 method for class 'dc_summary' print( x, digits = 3L, max.lines = 1000L, tail = FALSE, sort = NULL, max.label.length = NULL, ... )
## S3 method for class 'dc_summary' print( x, digits = 3L, max.lines = 1000L, tail = FALSE, sort = NULL, max.label.length = NULL, ... )
x |
an object of class |
digits |
number of digits to use, defaults to 3. |
max.lines |
maximum number of lines to display.
If |
tail |
if |
sort |
column name on which to sort the output. |
max.label.length |
if specified, printed row labels will be abbreviated to at most this length. |
... |
passed on to |
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) print(summary(sim$u), sort="n_eff")
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) print(summary(sim$u), sort="n_eff")
Display a summary of an mcdraws
object, as output by MCMCsim
.
## S3 method for class 'mcdraws_summary' print(x, digits = 3L, max.lines = 10L, tail = FALSE, sort = NULL, ...)
## S3 method for class 'mcdraws_summary' print(x, digits = 3L, max.lines = 10L, tail = FALSE, sort = NULL, ...)
x |
an object of class |
digits |
number of digits to use, defaults to 3. |
max.lines |
maximum number of elements per vector parameter to display.
If |
tail |
if |
sort |
column name on which to sort the output. |
... |
passed on to |
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) print(summary(sim), sort="n_eff")
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) print(summary(sim), sort="n_eff")
Read draws written to file by MCMCsim
used with argument to.file
.
read_draws(name, filename = paste0("MCdraws_", name, ".dat"))
read_draws(name, filename = paste0("MCdraws_", name, ".dat"))
name |
name of the parameter to load the corresponding file with posterior draws for. |
filename |
name of the file in which the draws are stored. |
An object of class dc
containing MCMC draws for a (vector) parameter.
## Not run: # NB this example creates a file "MCdraws_e_.dat" in the working directory n <- 100 dat <- data.frame(x=runif(n), f=as.factor(sample(1:5, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat) # run the MCMC simulation and write draws of residuals to file: sim <- MCMCsim(sampler, n.iter=500, to.file="e_") summary(sim) mcres <- read_draws("e_") summary(mcres) ## End(Not run)
## Not run: # NB this example creates a file "MCdraws_e_.dat" in the working directory n <- 100 dat <- data.frame(x=runif(n), f=as.factor(sample(1:5, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat) # run the MCMC simulation and write draws of residuals to file: sim <- MCMCsim(sampler, n.iter=500, to.file="e_") summary(sim) mcres <- read_draws("e_") summary(mcres) ## End(Not run)
This function is intended to be used on the right hand side of the
formula
argument to create_sampler
or
generate_data
. It creates an additive regression term in the
model's linear predictor. By default, the prior for the regression
coefficients is improper uniform. A proper normal prior can be set up
using function pr_normal
, and passed to argument prior
.
It should be noted that pr_normal
expects a precision matrix
as input for its second argument, and that the prior variance (matrix) is
taken to be the inverse of this precision matrix, where in case the
model's family is "gaussian"
this matrix is additionally
multiplied by the residual scalar variance parameter sigma_^2
.
reg( formula = ~1, remove.redundant = FALSE, sparse = NULL, X = NULL, prior = NULL, Q0 = NULL, b0 = NULL, R = NULL, r = NULL, S = NULL, s = NULL, lower = NULL, upper = NULL, name = "", debug = FALSE )
reg( formula = ~1, remove.redundant = FALSE, sparse = NULL, X = NULL, prior = NULL, Q0 = NULL, b0 = NULL, R = NULL, r = NULL, S = NULL, s = NULL, lower = NULL, upper = NULL, name = "", debug = FALSE )
formula |
a formula specifying the predictors to be used in the model,
in the same way as the right hand side of the |
remove.redundant |
whether redundant columns should be removed from the
design matrix. Default is |
sparse |
whether the model matrix associated with |
X |
a (possibly sparse) design matrix can be specified directly, as an
alternative to the creation of one based on |
prior |
prior specification for the regression coefficients. Supported
priors can be specified using functions |
Q0 |
prior precision matrix for the regression effects. The default is a
zero matrix corresponding to a noninformative improper prior.
It can be specified as a scalar value, as a numeric vector of appropriate
length, or as a matrix object. DEPRECATED, please use argument |
b0 |
prior mean for the regression effect. Defaults to a zero vector.
It can be specified as a scalar value or as a numeric vector of
appropriate length. DEPRECATED, please use argument |
R |
optional constraint matrix for equality restrictions R'x = r where
|
r |
right hand side for the equality constraints. |
S |
optional constraint matrix for inequality constraints S'x >= s where
|
s |
right hand side for the inequality constraints. |
lower |
as an alternative to |
upper |
as an alternative to |
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
debug |
if |
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
data(iris) # default: flat priors on regression coefficients sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, name="beta"), data=iris ) sim <- MCMCsim(sampler, burnin=100, n.iter=400) summary(sim) # (weakly) informative normal priors on regression coefficients sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, prior=pr_normal(precision=1e-2), name="beta"), data=iris ) sim <- MCMCsim(sampler, burnin=100, n.iter=400) summary(sim) # binary regression sampler <- create_sampler(Species == "setosa" ~ reg(~ Sepal.Length, prior=pr_normal(precision=0.1), name="beta"), family="binomial", data=iris) sim <- MCMCsim(sampler, burnin=100, n.iter=400) summary(sim) pred <- predict(sim) str(pred) # example with equality constrained regression effects n <- 500 df <- data.frame(x=runif(n)) df$y <- rnorm(n, 1 + 2*df$x) R <- matrix(1, 2, 1) r <- 3 sampler <- create_sampler(y ~ reg(~ 1 + x, R=R, r=r, name="beta"), data=df) sim <- MCMCsim(sampler) summary(sim) plot(sim, "beta") summary(transform_dc(sim$beta, fun=function(x) crossprod_mv(R, x) - r))
data(iris) # default: flat priors on regression coefficients sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, name="beta"), data=iris ) sim <- MCMCsim(sampler, burnin=100, n.iter=400) summary(sim) # (weakly) informative normal priors on regression coefficients sampler <- create_sampler(Sepal.Length ~ reg(~ Petal.Length + Species, prior=pr_normal(precision=1e-2), name="beta"), data=iris ) sim <- MCMCsim(sampler, burnin=100, n.iter=400) summary(sim) # binary regression sampler <- create_sampler(Species == "setosa" ~ reg(~ Sepal.Length, prior=pr_normal(precision=0.1), name="beta"), family="binomial", data=iris) sim <- MCMCsim(sampler, burnin=100, n.iter=400) summary(sim) pred <- predict(sim) str(pred) # example with equality constrained regression effects n <- 500 df <- data.frame(x=runif(n)) df$y <- rnorm(n, 1 + 2*df$x) R <- matrix(1, 2, 1) r <- 3 sampler <- create_sampler(y ~ reg(~ 1 + x, R=R, r=r, name="beta"), data=df) sim <- MCMCsim(sampler) summary(sim) plot(sim, "beta") summary(transform_dc(sim$beta, fun=function(x) crossprod_mv(R, x) - r))
For a model created with create_sampler
and estimated using MCMCsim
,
these functions return the posterior draws of fitted values or residuals.
In the current implementation the fitted values correspond to the linear predictor
and the residuals are computed as the data vector minus the fitted values,
regardless of the model's distribution family.
For large datasets the returned object can become very large. One may therefore
select a subset of draws or chains or use mean.only=TRUE
to
return a vector of posterior means only.
## S3 method for class 'mcdraws' fitted( object, mean.only = FALSE, units = NULL, chains = seq_len(nchains(object)), draws = seq_len(ndraws(object)), matrix = FALSE, type = c("link", "response"), ... ) ## S3 method for class 'mcdraws' residuals( object, mean.only = FALSE, units = NULL, chains = seq_len(nchains(object)), draws = seq_len(ndraws(object)), matrix = FALSE, ... )
## S3 method for class 'mcdraws' fitted( object, mean.only = FALSE, units = NULL, chains = seq_len(nchains(object)), draws = seq_len(ndraws(object)), matrix = FALSE, type = c("link", "response"), ... ) ## S3 method for class 'mcdraws' residuals( object, mean.only = FALSE, units = NULL, chains = seq_len(nchains(object)), draws = seq_len(ndraws(object)), matrix = FALSE, ... )
object |
an object of class |
mean.only |
if |
units |
the data units (by default all) for which fitted values or residuals should be computed. |
chains |
optionally, a selection of chains. |
draws |
optionally, a selection of draws per chain. |
matrix |
whether a matrix should be returned instead of a dc object. |
type |
the type of fitted values: "link" for fitted values on the linear predictor scale (the default), and "response" for fitted values on the response scale. Returned residuals are always on the response scale. |
... |
currently not used. |
Either a draws component object or a matrix with draws of fitted values or residuals.
The residuals are always on the response scale, whereas fitted values can
be on the scale of the linear predictor or the response depending on type
.
If mean.only=TRUE
, a vector of posterior means.
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, store.all=TRUE) fitted(sim, mean.only=TRUE) summary(fitted(sim)) residuals(sim, mean.only=TRUE) summary(residuals(sim)) bayesplot::mcmc_intervals(as.matrix(subset(residuals(sim), vars=1:20)))
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, store.all=TRUE) fitted(sim, mean.only=TRUE) summary(fitted(sim)) residuals(sim, mean.only=TRUE) summary(residuals(sim)) bayesplot::mcmc_intervals(as.matrix(subset(residuals(sim), vars=1:20)))
Set computational options for the sampling algorithms
sampler_control( add.outer.R = TRUE, recompute.e = TRUE, expanded.cMVN.sampler = FALSE, CG = NULL, block = TRUE, block.V = TRUE, auto.order.block = TRUE, chol.control = chol_control(), max.size.cps.template = 100, PG.approx = TRUE, PG.approx.m = -2L, CRT.approx.m = 20L )
sampler_control( add.outer.R = TRUE, recompute.e = TRUE, expanded.cMVN.sampler = FALSE, CG = NULL, block = TRUE, block.V = TRUE, auto.order.block = TRUE, chol.control = chol_control(), max.size.cps.template = 100, PG.approx = TRUE, PG.approx.m = -2L, CRT.approx.m = 20L )
add.outer.R |
whether to add the outer product of a constraint matrix for a better conditioned
linear system of equations, typically for coefficients sampled in a Gibbs-block. Default is |
recompute.e |
when |
expanded.cMVN.sampler |
whether an expanded linear system including dual variables is used
for equality constrained multivariate normal sampling. If set to |
CG |
use a conjugate gradient iterative algorithm instead of Cholesky updates for sampling
the model's coefficients. This must be a list with possible components |
block |
if |
block.V |
if |
auto.order.block |
whether Gibbs blocks should be ordered automatically in such a way that those with the most sparse design matrices come first. This way of ordering can make Cholesky updates more efficient. |
chol.control |
options for Cholesky decomposition, see |
max.size.cps.template |
maximum allowed size in MB of the sparse matrix serving as a template for the sparse symmetric crossproduct X'QX of a dgCMatrix X, where Q is a diagonal matrix subject to change. |
PG.approx |
whether Polya-Gamma draws for logistic binomial models are
approximated by a hybrid gamma convolution approach. If not, |
PG.approx.m |
if |
CRT.approx.m |
scalar integer specifying the degree of approximation to sampling from a Chinese Restaurant Table distribution. The approximation is based on Le Cam's theorem. Larger values yield a slower but more accurate sampler. |
A list with specified computational options used by various sampling functions.
D. Bates, M. Maechler, B. Bolker and S.C. Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67(1), 1-48.
Y. Chen, T.A. Davis, W.W. Hager and S. Rajamanickam (2008). Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Transactions on Mathematical Software 35(3), 1-14.
Simulation based calibration
SBC_test( ..., pars, n.draws = 25L, n.sim = 20L * n.draws, burnin = 25L, thin = 2L, show.progress = TRUE, verbose = TRUE, n.cores = 1L, cl = NULL, seed = NULL, export = NULL )
SBC_test( ..., pars, n.draws = 25L, n.sim = 20L * n.draws, burnin = 25L, thin = 2L, show.progress = TRUE, verbose = TRUE, n.cores = 1L, cl = NULL, seed = NULL, export = NULL )
... |
passed to |
pars |
named list with univariate functions of the parameters to use in test. This list
is passed to argument |
n.draws |
number of posterior draws to retain in posterior simulations. |
n.sim |
number of simulation iterations. |
burnin |
burnin to use in posterior simulations, passed to |
thin |
thinning to use in posterior simulations, passed to |
show.progress |
whether a progress bar should be shown. |
verbose |
set to |
n.cores |
the number of cpu cores to use. Default is one, i.e. no parallel computation.
If an existing cluster |
cl |
an existing cluster can be passed for parallel computation. If |
seed |
a random seed (integer). For parallel computation it is used to independently seed RNG streams for all workers. |
export |
a character vector with names of objects to export to the workers. This may be needed for parallel execution if expressions in the model formulae depend on global variables. |
A matrix with ranks.
M. Modrak, A.H. Moon, S. Kim, P. Buerkner, N. Huurre, K. Faltejskova, A. Gelman and A. Vehtari (2023). Simulation-based calibration checking for Bayesian computation: The choice of test quantities shapes sensitivity. Bayesian Analysis, 1(1), 1-28.
## Not run: # this example may take a long time n <- 10L dat <- data.frame(x=runif(n)) ranks <- SBC_test(~ reg(~ 1 + x, prior=pr_normal(mean=c(0.25, 1), precision=1), name="beta"), sigma.mod=pr_invchisq(df=1, scale=list(df=1, scale=1)), data=dat, pars=list(mu="beta[1]", beta_x="beta[2]", sigma="sigma_"), n.draws=9L, n.sim=10L*20L, thin=2L, burnin=20L ) ranks ## End(Not run)
## Not run: # this example may take a long time n <- 10L dat <- data.frame(x=runif(n)) ranks <- SBC_test(~ reg(~ 1 + x, prior=pr_normal(mean=c(0.25, 1), precision=1), name="beta"), sigma.mod=pr_invchisq(df=1, scale=list(df=1, scale=1)), data=dat, pars=list(mu="beta[1]", beta_x="beta[2]", sigma="sigma_"), n.draws=9L, n.sim=10L*20L, thin=2L, burnin=20L ) ranks ## End(Not run)
The cluster is set up for a number of workers by loading the mcmcsae package and setting up independent RNG streams.
setup_cluster(n.cores = NULL, seed = NULL, export = NULL)
setup_cluster(n.cores = NULL, seed = NULL, export = NULL)
n.cores |
the number of cpu cores to use. |
seed |
optional random seed for reproducibility. |
export |
a character vector with names of objects to export to the workers. |
An object representing the cluster.
Stop a cluster set up by setup_cluster
.
stop_cluster(cl)
stop_cluster(cl)
cl |
the cluster object. |
NULL
.
Select a subset of chains, samples and parameters from a draws component (dc) object
## S3 method for class 'dc' subset( x, chains = seq_len(nchains(x)), draws = seq_len(ndraws(x)), vars = seq_len(nvars(x)), ... )
## S3 method for class 'dc' subset( x, chains = seq_len(nchains(x)), draws = seq_len(ndraws(x)), vars = seq_len(nvars(x)), ... )
x |
a draws component (dc) object. |
chains |
an integer vector indicating which chains to select. |
draws |
an integer vector indicating which samples to select. |
vars |
an integer vector indicating which parameters to select. |
... |
not used. |
The selected part of the draws component as an object of class dc
.
n <- 300 dat <- data.frame(x=runif(n), f=as.factor(sample(1:7, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat) sim <- MCMCsim(sampler) (summary(sim$beta)) (summary(subset(sim$beta, chains=1))) (summary(subset(sim$beta, chains=1, draws=sample(1:ndraws(sim), 100)))) (summary(subset(sim$beta, vars=1:2)))
n <- 300 dat <- data.frame(x=runif(n), f=as.factor(sample(1:7, n, replace=TRUE))) gd <- generate_data(~ reg(~ x + f, prior=pr_normal(precision=1), name="beta"), data=dat) dat$y <- gd$y sampler <- create_sampler(y ~ reg(~ x + f, name="beta"), data=dat) sim <- MCMCsim(sampler) (summary(sim$beta)) (summary(subset(sim$beta, chains=1))) (summary(subset(sim$beta, chains=1, draws=sample(1:ndraws(sim), 100)))) (summary(subset(sim$beta, vars=1:2)))
Summarize a draws component (dc) object
## S3 method for class 'dc' summary( object, probs = c(0.05, 0.5, 0.95), na.rm = FALSE, time = NULL, abbr = FALSE, batch.size = 100L, ... )
## S3 method for class 'dc' summary( object, probs = c(0.05, 0.5, 0.95), na.rm = FALSE, time = NULL, abbr = FALSE, batch.size = 100L, ... )
object |
an object of class |
probs |
vector of probabilities at which to evaluate quantiles. |
na.rm |
whether to remove NA/NaN draws in computing the summaries. |
time |
MCMC computation time; if specified the effective sample size per unit of time is returned in an extra column labeled 'efficiency'. |
abbr |
if |
batch.size |
number of parameter columns to process simultaneously. A larger batch size may speed things up a little, but if an out of memory error occurs it may be a good idea to use a smaller number and try again. The default is 100. |
... |
arguments passed to |
A matrix with summaries of class dc_summary
.
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) summary(sim$u)
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) summary(sim$u)
Summarize an mcdraws object
## S3 method for class 'mcdraws' summary( object, vnames = NULL, probs = c(0.05, 0.5, 0.95), na.rm = FALSE, efficiency = FALSE, abbr = FALSE, batch.size = 100L, ... )
## S3 method for class 'mcdraws' summary( object, vnames = NULL, probs = c(0.05, 0.5, 0.95), na.rm = FALSE, efficiency = FALSE, abbr = FALSE, batch.size = 100L, ... )
object |
an object of class |
vnames |
optional character vector to select a subset of parameters. |
probs |
vector of probabilities at which to evaluate quantiles. |
na.rm |
whether to remove NA/NaN draws in computing the summaries. |
efficiency |
if |
abbr |
if |
batch.size |
number of parameter columns to process simultaneously for vector parameters. A larger batch size may speed things up a little, but if an out of memory error occurs it may be a good idea to use a smaller number and try again. The default is 100. |
... |
arguments passed to |
A list of class mcdraws_summary
summarizing object
.
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) summary(sim) par_names(sim) summary(sim, c("beta", "v_sigma", "u_sigma"))
ex <- mcmcsae_example() sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, store.all=TRUE) summary(sim) par_names(sim) summary(sim, c("beta", "v_sigma", "u_sigma"))
These functions are intended for use in the method
argument of create_TMVN_sampler
.
m_direct() m_Gibbs(slice = FALSE, diagnostic = FALSE, debug = FALSE) m_HMC( Tsim = pi/2, max.events = .Machine$integer.max, diagnostic = FALSE, debug = FALSE ) m_HMCZigZag( Tsim = 1, rate = 1, prec.eq = NULL, diagnostic = FALSE, max.events = .Machine$integer.max, adapt = FALSE, debug = FALSE ) m_softTMVN( sharpness = 100, useV = FALSE, CG = NULL, PG.approx = TRUE, PG.approx.m = -2L, debug = FALSE )
m_direct() m_Gibbs(slice = FALSE, diagnostic = FALSE, debug = FALSE) m_HMC( Tsim = pi/2, max.events = .Machine$integer.max, diagnostic = FALSE, debug = FALSE ) m_HMCZigZag( Tsim = 1, rate = 1, prec.eq = NULL, diagnostic = FALSE, max.events = .Machine$integer.max, adapt = FALSE, debug = FALSE ) m_softTMVN( sharpness = 100, useV = FALSE, CG = NULL, PG.approx = TRUE, PG.approx.m = -2L, debug = FALSE )
slice |
if |
diagnostic |
whether information about violations of inequalities, bounces off inequality walls (for 'HMC' and 'HMCZigZag' methods) or gradient events (for 'HMCZigZag') is printed to the screen. |
debug |
if |
Tsim |
the duration of a Hamiltonian Monte Carlo simulated particle trajectory. This can be specified as either a single positive numeric value for a fixed simulation time, or as a function that is applied in each MCMC iteration to generates a simulation time. |
max.events |
maximum number of events (reflections off inequality walls and for method 'HMCZigZag' also gradient events). Default is unlimited. Specifying a finite number may speed up the sampling but may also result in a biased sampling algorithm. |
rate |
vector of Laplace rate parameters for method 'HMCZigZag'. It must be a positive numeric vector of length one or the number of variables. |
prec.eq |
positive numeric vector of length 1 or the number of equality restrictions,
to control the precision with which the equality restrictions are imposed; the larger
|
adapt |
experimental feature: if |
sharpness |
for method 'softTMVN', the sharpness of the soft inequalities; the larger the better the approximation of exact inequalities. It must a positive numeric vector of length one or the number of inequality restrictions. |
useV |
for method 'softTMVN' whether to base computations on variance instead of precision matrices. |
CG |
use a conjugate gradient iterative algorithm instead of Cholesky updates for sampling
the model's coefficients. This must be a list with possible components |
PG.approx |
see |
PG.approx.m |
see |
A method object, for internal use only.
Transform one or more draws component objects into a new one by applying a function
transform_dc(..., fun, to.matrix = FALSE, labels = NULL)
transform_dc(..., fun, to.matrix = FALSE, labels = NULL)
... |
draws component object(s) of class |
fun |
a function to apply. This function should take as many arguments as there are input objects. The arguments can be arbitrarily named, but they are assumed to be in the same order as the input objects. The function should return a vector. |
to.matrix |
if |
labels |
optional labels for the output object. |
Either a matrix or a draws component object.
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) summary(sim$v_sigma) summary(transform_dc(sim$v_sigma, fun=function(x) x^2)) summary(transform_dc(sim$u, sim$u_sigma, fun=function(x1, x2) abs(x1)/x2))
ex <- mcmcsae_example(n=50) sampler <- create_sampler(ex$model, data=ex$dat) sim <- MCMCsim(sampler, burnin=100, n.iter=300, thin=2, n.chain=4, store.all=TRUE) summary(sim$v_sigma) summary(transform_dc(sim$v_sigma, fun=function(x) x^2)) summary(transform_dc(sim$u, sim$u_sigma, fun=function(x1, x2) abs(x1)/x2))
This function is intended to be used on the right hand side of the formula.V
argument to
create_sampler
or generate_data
.
vfac( factor = "local_", prior = pr_invchisq(df = 1, scale = 1), name = "", debug = FALSE )
vfac( factor = "local_", prior = pr_invchisq(df = 1, scale = 1), name = "", debug = FALSE )
factor |
The name of a factor variable. The name |
prior |
the prior assigned to the variance factors. Currently the prior can be inverse chi-squared
or exponential, specified by a call to |
name |
The name of the variance model component. This name is used in the output of the MCMC simulation
function |
debug |
If |
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
This function is intended to be used on the right hand side of the formula.V
argument to
create_sampler
or generate_data
.
vreg( formula = NULL, remove.redundant = FALSE, sparse = NULL, X = NULL, prior = NULL, Q0 = NULL, b0 = NULL, name = "" )
vreg( formula = NULL, remove.redundant = FALSE, sparse = NULL, X = NULL, prior = NULL, Q0 = NULL, b0 = NULL, name = "" )
formula |
a formula for the regression effects explaining the log-variance.
Variable names are looked up in the data frame passed as |
remove.redundant |
whether redundant columns should be removed from the design matrix.
Default is |
sparse |
whether the model matrix associated with |
X |
a (possibly sparse) design matrix can be specified directly, as an alternative to the
creation of one based on |
prior |
prior specification for the coefficients. Currently only
normal priors are supported, specified using function |
Q0 |
prior precision matrix for the regression effects. The default is a
zero matrix corresponding to a noninformative improper prior.
DEPRECATED, please use argument |
b0 |
prior mean for the regression effect. Defaults to a zero vector.
DEPRECATED, please use argument |
name |
the name of the model component. This name is used in the output of the
MCMC simulation function |
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
E. Cepeda and D. Gamerman (2000). Bayesian modeling of variance heterogeneity in normal regression models. Brazilian Journal of Probability and Statistics, 207-221.
T.I. Lin and W.L. Wang (2011). Bayesian inference in joint modelling of location and scale parameters of the t distribution for longitudinal data. Journal of Statistical Planning and Inference 141(4), 1543-1553.
Extract weights from an mcdraws object
## S3 method for class 'mcdraws' weights(object, ...)
## S3 method for class 'mcdraws' weights(object, ...)
object |
an object of class |
... |
currently not used. |
A vector with (simulation means of) weights.
# first create a population data frame N <- 1000 # population size pop <- data.frame(x=rnorm(N), area=factor(sample(1:10, N, replace=TRUE))) pop$y <- 1 + 2*pop$x + seq(-1, to=1, length.out=10)[pop$area] + 0.5*rnorm(N) pop$sample <- FALSE pop$sample[sample(seq_len(N), 100)] <- TRUE # a simple linear regression model: sampler <- create_sampler( y ~ reg(~ x, name="beta"), linpred=list(beta=rowsum(model.matrix(~ x, pop), pop$area)), compute.weights=TRUE, data=pop[pop$sample, ] ) sim <- MCMCsim(sampler) (summary(sim)) str(weights(sim)) crossprod_mv(weights(sim), pop$y[pop$sample]) summary(sim$linpred_) # a multilevel model: sampler <- create_sampler( y ~ reg(~ x, name="beta") + gen(factor = ~ area, name="v"), linpred=list(beta=rowsum(model.matrix(~ x, pop), pop$area), v=diag(10)), compute.weights=TRUE, data=pop[pop$sample, ] ) sim <- MCMCsim(sampler) (summary(sim)) str(weights(sim)) crossprod_mv(weights(sim), pop$y[pop$sample]) summary(sim$linpred_)
# first create a population data frame N <- 1000 # population size pop <- data.frame(x=rnorm(N), area=factor(sample(1:10, N, replace=TRUE))) pop$y <- 1 + 2*pop$x + seq(-1, to=1, length.out=10)[pop$area] + 0.5*rnorm(N) pop$sample <- FALSE pop$sample[sample(seq_len(N), 100)] <- TRUE # a simple linear regression model: sampler <- create_sampler( y ~ reg(~ x, name="beta"), linpred=list(beta=rowsum(model.matrix(~ x, pop), pop$area)), compute.weights=TRUE, data=pop[pop$sample, ] ) sim <- MCMCsim(sampler) (summary(sim)) str(weights(sim)) crossprod_mv(weights(sim), pop$y[pop$sample]) summary(sim$linpred_) # a multilevel model: sampler <- create_sampler( y ~ reg(~ x, name="beta") + gen(factor = ~ area, name="v"), linpred=list(beta=rowsum(model.matrix(~ x, pop), pop$area), v=diag(10)), compute.weights=TRUE, data=pop[pop$sample, ] ) sim <- MCMCsim(sampler) (summary(sim)) str(weights(sim)) crossprod_mv(weights(sim), pop$y[pop$sample]) summary(sim$linpred_)