Title: | Hierarchical Bayesian Small Area Estimation |
---|---|
Description: | Functions to compute small area estimates based on a basic area or unit-level model. The model is fit using restricted maximum likelihood, or in a hierarchical Bayesian way. In the latter case numerical integration is used to average over the posterior density for the between-area variance. The output includes the model fit, small area estimates and corresponding mean squared errors, as well as some model selection measures. Additional functions provide means to compute aggregate estimates and mean squared errors, to minimally adjust the small area estimates to benchmarks at a higher aggregation level, and to graphically compare different sets of small area estimates. |
Authors: | Harm Jan Boonstra [aut, cre] |
Maintainer: | Harm Jan Boonstra <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2025-02-15 04:40:14 UTC |
Source: | https://github.com/cran/hbsae |
Package hbsae provides functions to compute small area estimates based on the basic unit-level and area-level models. The models are fit and small area estimates are computed in a hierarchical Bayesian way, using numerical integration.
The main function that does most of the computational work is fSAE.Unit
.
Function fSAE
is provided as a more convenient interface to
fSurvReg
, fSAE.Area
and fSAE.Unit
.
Compute aggregates of small area estimates and MSEs.
aggr(x, R)
aggr(x, R)
x |
sae object. |
R |
aggregation matrix, M x r matrix where M is the number of areas and r the number of aggregate areas; default is aggregation over all areas. |
Object of class sae
with aggregated small area estimates and MSEs.
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop) # by default aggregate over all areas global <- aggr(sae) EST(global); RMSE(global) # aggregation to broad area # first build aggregation matrix M <- d$Xpop[, c("area22", "area23", "area24")] / d$Xpop[, "(Intercept)"] M <- cbind(1 - rowSums(M), M); colnames(M)[1] <- "area21" est.area2 <- aggr(sae, M) EST(est.area2); RMSE(est.area2) COV(est.area2) # covariance matrix
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop) # by default aggregate over all areas global <- aggr(sae) EST(global); RMSE(global) # aggregation to broad area # first build aggregation matrix M <- d$Xpop[, c("area22", "area23", "area24")] / d$Xpop[, "(Intercept)"] M <- cbind(1 - rowSums(M), M); colnames(M)[1] <- "area21" est.area2 <- aggr(sae, M) EST(est.area2); RMSE(est.area2) COV(est.area2) # covariance matrix
Benchmark small area estimates to conform to given totals at aggregate levels.
bench(x, R, rhs, mseMethod = "no", Omega, Lambda)
bench(x, R, rhs, mseMethod = "no", Omega, Lambda)
x |
sae object to be benchmarked. As an alternative, a list can be supplied with at least components |
R |
restriction matrix, M x r matrix where r is the number of restrictions and M the number of areas; default is a single constraint on the population total.
Note that |
rhs |
r-vector of benchmark totals corresponding to the restrictions represented by (the columns of) |
mseMethod |
if |
Omega |
M x M matrix |
Lambda |
r x r matrix |
This function adjusts the small area estimates EST(x)
, denoted by , to
where
is a symmetric M x M matrix. By default,
is taken to be the covariance matrix
of the input sae-object
x
.
where
is the matrix passed to
bench
and denotes the population size
of the
th area, is a M x r matrix describing the aggregate level relative to the area level.
Note that the matrix
acts on the vector of area totals whereas
acts on the area means to
produce the aggregate totals.
The default for
is a column vector of 1s representing an additivity constraint to the overall population total.
is an r-vector of aggregate-level totals, specified as
rhs
, that the small area estimates should add up to.
is a symmetric r x r matrix controlling the penalty associated with deviations from the constraints
.
The default is
, implying that the constraints must hold exactly.
The adjusted or benchmarked small area estimates minimize the expectation of the loss function
with respect to the posterior for the unknown small area means .
Optionally, MSE(x)
is updated as well. If mseMethod="exact"
the covariance matrix is adjusted from
to
and if mseMethod
is "model"
the adjusted covariance matrix is
The latter method treats the benchmark adjustments as incurring a bias relative to the best predictor under the model.
An object of class sae
with adjusted estimates.
G.S. Datta, M. Ghosh, R. Steorts and J. Maples (2011). Bayesian benchmarking with applications to small area estimation. TEST 20(3), 574-588.
Y. You, J.N.K. Rao and P. Dick (2004). Benchmarking Hierarchical Bayes Small Area Estimators in the Canadian Census Undercoverage Estimation. Statistics in Transition 6(5), 631-640.
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop) # calibrate to overall population total sae.c <- bench(sae, rhs=sum(d$mY0*sae$Narea)) plot(sae, sae.c)
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop) # calibrate to overall population total sae.c <- bench(sae, rhs=sum(d$mY0*sae$Narea)) plot(sae, sae.c)
This function computes a cross-validation measure defined at the area level.
It can be used, for example, to compare the predictive ability of area and unit-level models.
The code is based in part on that of cv.glm
from package boot.
CVarea( sae, weight = TRUE, cost = function(y, yhat, w) sum(w * (y - yhat)^2)/sum(w), K = 10L, method = "hybrid", seed )
CVarea( sae, weight = TRUE, cost = function(y, yhat, w) sum(w * (y - yhat)^2)/sum(w), K = 10L, method = "hybrid", seed )
sae |
object of class sae, resulting from a call to |
weight |
if |
cost |
cost function to be used. Defaults to a quadratic cost function. |
K |
K in K-fold cross-validation. Specifies in how many parts the dataset should be divided. |
method |
method used to refit the model. One of "HB", "hybrid" (default) or "REML", in the order of slow to fast. |
seed |
random seed used in selecting groups of areas to leave out in K-fold cross-validation. |
The computed area-level cross-validation measure.
d <- generateFakeData() # compute small area estimates based on area-level and unit-level models saeArea <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="area", silent=TRUE, keep.data=TRUE) saeUnit <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="unit", silent=TRUE, keep.data=TRUE) # compare area and unit-level models based on area-level cross-validation CVarea(saeArea, K=10, seed=1) # 10-fold CV for area-level model CVarea(saeUnit, K=10, seed=1) # 10-fold CV for unit-level model
d <- generateFakeData() # compute small area estimates based on area-level and unit-level models saeArea <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="area", silent=TRUE, keep.data=TRUE) saeUnit <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="unit", silent=TRUE, keep.data=TRUE) # compare area and unit-level models based on area-level cross-validation CVarea(saeArea, K=10, seed=1) # 10-fold CV for area-level model CVarea(saeUnit, K=10, seed=1) # 10-fold CV for unit-level model
This function prepares the (unit-level) input data and calls one of the lower level functions fSurvReg
, fSAE.Area
or fSAE.Unit
to compute survey regression, area-level model or unit-level model small area estimates. Area-level model estimates
are computed by first computing survey regression estimates and using these as input for fSAE.Area
.
fSAE( formula, data, area = NULL, popdata = NULL, type = "unit", model.direct = NULL, formula.area = NULL, contrasts.arg = NULL, remove.redundant = TRUE, redundancy.tol = 1e-07, sparse = FALSE, ... )
fSAE( formula, data, area = NULL, popdata = NULL, type = "unit", model.direct = NULL, formula.area = NULL, contrasts.arg = NULL, remove.redundant = TRUE, redundancy.tol = 1e-07, sparse = FALSE, ... )
formula |
model formula, indicating response variable and covariates. |
data |
unit-level data frame containing all variables used in |
area |
name of area indicator variable in |
popdata |
data frame or matrix containing area population totals for all covariates. The rows should correspond to areas
for which estimates are required.
Column names should include those produced by |
type |
type of small area estimates: "direct" for survey regression, "area" for area-level model, "unit" for unit-level model estimates.
If |
model.direct |
if type="area", this argument can be used to specify by means of a formula the covariates to use for the computation of the initial survey regression estimates.
If unspecified, the covariates specified by |
formula.area |
if type="unit", this is an optional formula specifying covariates that should be used at the area level.
These covariates should be available in |
contrasts.arg |
list for specification of contrasts for factor variables. Passed to |
remove.redundant |
if |
redundancy.tol |
tolerance for detecting linear dependencies among the columns of the design matrix. Also used as tolerance in the check whether the design matrix redundancy is shared by the population totals. |
sparse |
if |
... |
An object of class sae
containing the small area estimates, their MSEs, and the model fit. If type
is "data" a list containing
the model matrix, response vector, area indicator, area population sizes and matrix of population means is returned.
d <- generateFakeData() # model fitting only (fit <- fSAE(y0 ~ x + area2, data=d$sam, area="area")) # model fitting and small area estimation, unit-level model saeHB <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, silent=TRUE) saeHB # print a summary EST(saeHB) # small area estimates RMSE(saeHB) # error estimates str(saeHB) plot(saeHB, list(est=d$mY0), CI=2) # compare to true population means # unit-level model with REML model-fit instead of Bayesian approach saeREML <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, method="REML", silent=TRUE) plot(saeHB, saeREML) # compare # basic area-level model saeA <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="area") plot(saeHB, saeA) # SAE estimates based on a linear unit-level model without area effects saeL <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, method="synthetic") plot(saeHB, saeL) # model-based estimation of overall population mean without area effects est.global <- fSAE(y0 ~ x + area2, data=d$sam, area=NULL, popdata=colSums(d$Xpop), method="synthetic") EST(est.global); RMSE(est.global) # no model fitting or estimation, but return design matrix, variable of interest, # area indicator, area population sizes and matrix of population means dat <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="data") str(dat)
d <- generateFakeData() # model fitting only (fit <- fSAE(y0 ~ x + area2, data=d$sam, area="area")) # model fitting and small area estimation, unit-level model saeHB <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, silent=TRUE) saeHB # print a summary EST(saeHB) # small area estimates RMSE(saeHB) # error estimates str(saeHB) plot(saeHB, list(est=d$mY0), CI=2) # compare to true population means # unit-level model with REML model-fit instead of Bayesian approach saeREML <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, method="REML", silent=TRUE) plot(saeHB, saeREML) # compare # basic area-level model saeA <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="area") plot(saeHB, saeA) # SAE estimates based on a linear unit-level model without area effects saeL <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, method="synthetic") plot(saeHB, saeL) # model-based estimation of overall population mean without area effects est.global <- fSAE(y0 ~ x + area2, data=d$sam, area=NULL, popdata=colSums(d$Xpop), method="synthetic") EST(est.global); RMSE(est.global) # no model fitting or estimation, but return design matrix, variable of interest, # area indicator, area population sizes and matrix of population means dat <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="data") str(dat)
This function returns small area estimates based on the basic area-level model, also known as the Fay-Herriot model.
It calls fSAE.Unit
to carry out the computations.
fSAE.Area(est.init, var.init, X, x, ...)
fSAE.Area(est.init, var.init, X, x, ...)
est.init |
m-vector of initial estimates, where m is the number of in-sample areas. |
var.init |
m-vector of corresponding variance estimates. |
X |
M x p matrix of area-level covariates (typically population means), where M is the number of areas for which estimates are computed.
If missing, a column vector of ones of the same length as |
x |
an optional m x p matrix with auxiliary area-level covariates to be used for fitting the model,
where the rows correspond to the components of |
... |
additional arguments passed to |
An object of class sae
containing the small area estimates and MSEs, the model fit, and model selection measures.
R.E. Fay and R.A. Herriot (1979). Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data. Journal of the American Statistical Association 74(366), 269-277.
J.N.K. Rao and I. Molina (2015). Small Area Estimation. Wiley.
d <- generateFakeData() # first compute input estimates without "borrowing strength" over areas sae0 <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="direct", keep.data=TRUE) # compute small area estimates based on the basic area-level model # using the above survey regression estimates as input sae <- fSAE.Area(EST(sae0), MSE(sae0), X=sae0$Xp) EST(sae) # estimates RMSE(sae) # standard errors
d <- generateFakeData() # first compute input estimates without "borrowing strength" over areas sae0 <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="direct", keep.data=TRUE) # compute small area estimates based on the basic area-level model # using the above survey regression estimates as input sae <- fSAE.Area(EST(sae0), MSE(sae0), X=sae0$Xp) EST(sae) # estimates RMSE(sae) # standard errors
This is the function that carries out most of the computational work. It computes small area estimates based on the basic unit-level model, also known as the
Battese-Harter-Fuller model, although it is also called by fSurvReg
and fSAE.Area
to compute survey regression
or area-level model small area estimates. By default, Hierarchical Bayes estimates are computed, using fast one-dimensional
numerical integration to average over the posterior density for the ratio of between and within area variance. This way, the small area estimates
and MSEs account for the uncertainty about this parameter. Besides hierarchical Bayes, REML and hybrid methods are supported.
These methods use the REML estimate or posterior mean of the variance ratio, respectively, as a plug-in estimate. Both methods do not account for uncertainty about this
parameter. Synthetic estimates are computed by setting the variance ratio to zero.
fSAE.Unit( y, X, area, Narea = NULL, Xpop = NULL, fpc = TRUE, v = NULL, vpop = NULL, w = NULL, wpop = NULL, method = "HB", beta0 = rep(0, ncol(X)), Omega0 = Diagonal(n = ncol(X), x = 0), nu0 = 0, s20 = 0, prior = function(x) rep.int(1L, length(x)), CV = prod(dim(X)) < 1e+06, CVweights = NULL, silent = FALSE, keep.data = FALSE, full.cov = nrow(Xpop) < 1000L, lambda0 = NULL, rel.int.tol = 0.01, ... )
fSAE.Unit( y, X, area, Narea = NULL, Xpop = NULL, fpc = TRUE, v = NULL, vpop = NULL, w = NULL, wpop = NULL, method = "HB", beta0 = rep(0, ncol(X)), Omega0 = Diagonal(n = ncol(X), x = 0), nu0 = 0, s20 = 0, prior = function(x) rep.int(1L, length(x)), CV = prod(dim(X)) < 1e+06, CVweights = NULL, silent = FALSE, keep.data = FALSE, full.cov = nrow(Xpop) < 1000L, lambda0 = NULL, rel.int.tol = 0.01, ... )
y |
response vector of length n. |
X |
n x p model matrix. |
area |
n-vector of area codes, typically a factor variable with m levels, where m is the number of in-sample areas. |
Narea |
M-vector of area population sizes, where M is the number of areas for which estimates are required.
There should be a one-to-one correspondence with the rows of |
Xpop |
M x p matrix of population means. If |
fpc |
whether a finite population correction should be used. Default is |
v |
unit-level variance structure, n-vector. Defaults to a vector of 1s. In some cases it might be useful to take v proportional to the sampling probabilities. |
vpop |
population area means of v, M-vector. Defaults to a vector of 1s. Not used when |
w |
area-level variance structure, m-vector. Defaults to a vector of 1s. |
wpop |
area-level variance structure, M-vector. Defaults to a vector of 1s.
Only components of |
method |
one of "HB", "hybrid", "REML", "synthetic", "survreg", "BLUP" where
"HB" (default) does the full hierarchical Bayes computation, i.e. numerical integration over the posterior density for the between area variance parameter,
"hybrid" computes the Best Linear Unbiased Predictor (BLUP) with the posterior mean for the variance parameter plugged in,
"REML" computes the BLUP with the restricted maximum likelihood estimate of the variance parameter plugged in,
"synthetic" computes synthetic estimates where the between area variance is set to 0, and
"survreg" computes survey regression estimates where the between area variance approaches infinity.
"BLUP" computes BLUP estimates with the value provided for |
beta0 |
mean vector of normal prior for coefficient vector. |
Omega0 |
inverse covariance matrix of normal prior for coefficient vector. Default prior corresponds to the (improper) uniform distribution. |
nu0 |
degrees of freedom parameter for inverse gamma prior for residual (within-area) variance. Default is 0. |
s20 |
scale parameter for inverse gamma prior for residual (within-area) variance. Default is 0. |
prior |
prior density for the ratio lambda = between-area-variance / within-area variance. This should be a (vectorized) function that takes a vector lambda and returns a vector of prior density values at lambda. The density does not have to be normalized. The default is the (improper) uniform prior. The within-area variance and lambda are assumed independent a priori. |
CV |
whether (an approximation to the) leave-one-out cross-validation measure should be computed. As this
requires the computation of a dense matrix the size of |
CVweights |
n-vector of weights to use for CV computation. |
silent |
if |
keep.data |
if |
full.cov |
if |
lambda0 |
optional starting value for the ratio of between and within-area variance used in the numerical routines.
If |
rel.int.tol |
tolerance for the estimated relative integration error (default is 1 percent). A warning is issued if the estimated relative error exceeds this value. |
... |
additional control parameters passed to function |
The default Hierarchical Bayes method uses numerical integration (as provided by function integrate
) to compute
small area estimates and MSEs. The model parameters returned, such as fixed and random effects, are currently not averaged over the
posterior distribution for the variance ratio. They are evaluated at the posterior mean of the variance ratio.
An object of class sae
containing the small area estimates and MSEs, the model fit, and model selection measures.
G.E. Battese, R.M. Harter and W.A. Fuller (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical Association, 83(401), 28-36.
G.S. Datta and M. Ghosh (1991). Bayesian Prediction in Linear Models: Applications to Small Area Estimation. The Annals of Statistics 19(4), 1748-1770.
J.N.K. Rao and I. Molina (2015). Small Area Estimation. Wiley.
d <- generateFakeData() # generate design matrix, variable of interest, area indicator and population data dat <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="data") # compute small area estimates based on the basic unit-level model sae <- fSAE.Unit(dat$y, dat$X, dat$area, dat$Narea, dat$PopMeans) EST(sae) # estimates RMSE(sae) # standard errors
d <- generateFakeData() # generate design matrix, variable of interest, area indicator and population data dat <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="data") # compute small area estimates based on the basic unit-level model sae <- fSAE.Unit(dat$y, dat$X, dat$area, dat$Narea, dat$PopMeans) EST(sae) # estimates RMSE(sae) # standard errors
This function computes survey regression estimates as a special case of unit-level model small area estimates with a (relatively) very large value for the between-area variance
but without including area effects in the model fit. The model assumes a single overall variance parameter, so that the resulting estimated variances are not area-specific but smoothed.
Varying inclusion probabilities may be taken into account by including them in the model, e.g. as an additional covariate,
and/or as model variance structure by specifying arguments v and vpop, see fSAE.Unit
. The resulting estimates may be used as input estimates for area-level model small area estimation.
fSurvReg(y, X, area, Narea, Xpop, removeEmpty = TRUE, ...)
fSurvReg(y, X, area, Narea, Xpop, removeEmpty = TRUE, ...)
y |
response vector of length n. |
X |
n x p model matrix. |
area |
n-vector of area codes, typically a factor variable with m levels, where m is the number of in-sample areas. |
Narea |
M-vector of area population sizes. |
Xpop |
M x p matrix of population means. |
removeEmpty |
whether out-of-sample areas should be removed from the results. If |
... |
optional arguments v and vpop passed to |
An object of class sae
containing the survey regression small area estimates and their estimated variances.
G.E. Battese, R.M. Harter and W.A. Fuller (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical Association, 83(401), 28-36.
J.N.K. Rao and I. Molina (2015). Small Area Estimation. Wiley.
d <- generateFakeData() # generate design matrix, variable of interest, area indicator and population data dat <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="data") sae <- fSurvReg(dat$y, dat$X, dat$area, dat$Narea, dat$PopMeans) EST(sae) # estimates RMSE(sae) # standard errors
d <- generateFakeData() # generate design matrix, variable of interest, area indicator and population data dat <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, type="data") sae <- fSurvReg(dat$y, dat$X, dat$area, dat$Narea, dat$PopMeans) EST(sae) # estimates RMSE(sae) # standard errors
Generate artificial dataset for demonstration and testing purposes.
generateFakeData( M = 50, meanNarea = 1000, sW = 100, sB = 50, sBx = 0.5, samplingFraction = 0.1 )
generateFakeData( M = 50, meanNarea = 1000, sW = 100, sB = 50, sBx = 0.5, samplingFraction = 0.1 )
M |
number of areas. |
meanNarea |
mean number of population units per area. |
sW |
within area standard deviation. |
sB |
between area standard deviation. |
sBx |
random slope standard deviation. |
samplingFraction |
sampling fraction used to draw a random sample from the population units. |
List containing sample (sam), population totals (Xpop), and true population means for four target variables (mY0, mY1, mY2, mY3).
This function plots small area estimates with error bars.
Multiple sets of estimates can be compared. The default ordering of the estimates
is by their area population sizes.
This method uses a plot function that is adapted from function
coefplot.default
of package arm.
## S3 method for class 'sae' plot( ..., n.se = 1, est.names, sort.by = NULL, decreasing = FALSE, index = NULL, maxrows = 50L, maxcols = 6L, type = "sae", offset = 0.1, cex.var = 0.8, mar = c(0.1, 2.1, 5.1, 0.1) )
## S3 method for class 'sae' plot( ..., n.se = 1, est.names, sort.by = NULL, decreasing = FALSE, index = NULL, maxrows = 50L, maxcols = 6L, type = "sae", offset = 0.1, cex.var = 0.8, mar = c(0.1, 2.1, 5.1, 0.1) )
... |
|
n.se |
number of standard errors below and above the point estimates
to use for error bars. By default equal to 1. This only refers to the
objects of class |
est.names |
labels to use in the legend for the components of the |
sort.by |
vector by which to sort the coefficients, referring to the first object passed. |
decreasing |
if |
index |
vector of names or indices of the selected areas to be plotted. |
maxrows |
maximum number of rows in a column. |
maxcols |
maximum number of columns of estimates on a page. |
type |
"sae" for small area estimates (default), "coef" for coefficients, "raneff" for random effects. |
offset |
space used between plots of multiple estimates for the same area. |
cex.var |
the fontsize of the variable names, default=0.8. |
mar |
a numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot. |
weights
.Plot method for objects of class weights
.
## S3 method for class 'weights' plot( x, log = FALSE, breaks = "Scott", main = "Distribution of weights", xlab = if (log) "log(weight)" else "weight", ylab = "frequency", col = "cyan", ... )
## S3 method for class 'weights' plot( x, log = FALSE, breaks = "Scott", main = "Distribution of weights", xlab = if (log) "log(weight)" else "weight", ylab = "frequency", col = "cyan", ... )
x |
object of class |
log |
whether to log-transform the weights. |
breaks |
breaks argument of function |
main |
main title of plot. |
xlab |
x-axis label. |
ylab |
y-axis label. |
col |
colour. |
... |
additional arguments passed to |
Print method for objects of class sae.
## S3 method for class 'sae' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'sae' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
object of class |
digits |
number of digits to display. |
... |
additional arguments passed to |
Functions fSAE
, fSurvReg
, fSAE.Area
and fSAE.Unit
return an object of class sae
. It contains information on the model fit as well as the
small area estimates, error estimates and a few model selection measures.
The functions listed below extract the main components from an object of class sae
.
EST(x, type="sae", tot=FALSE)
return the vector of small area estimates of sae
object x. Alternatively,
with type
"coef" or "raneff" fixed or random effect estimates are returned. If 'tot=TRUE' and 'type="sae"' estimates
for area population totals instead of means are returned.
MSE(x, type="sae", tot=FALSE)
return the vector of mean squared errors of sae
object x. Alternatively,
with type
"coef" or "raneff" MSEs of fixed or random effects are returned. If 'tot=TRUE' and 'type="sae"' MSEs
for area population totals instead of means are returned.
SE(x, type="sae", tot=FALSE)
extract standard errors, i.e. square roots of MSEs.
RMSE(x, type="sae", tot=FALSE)
alias for SE(x, type="sae", tot=FALSE)
relSE(x, type="sae")
extract relative standard errors.
COV(x)
extract the covariance matrix for the small area estimates.
COR(x)
extract the correlation matrix for the small area estimates.
coef(x)
coef
method for sae
objects; returns vector of fixed effects.
vcov(x)
vcov
method for sae
objects; returns covariance matrix for fixed effects.
raneff(x, pop)
return vector of random effects. If pop=TRUE
returns a vector for predicted areas (zero for out-of-sample areas), otherwise a vector for in-sample areas.
raneff.se(x, pop)
return vector of standard errors for random effects.
residuals(x)
residuals
method for sae
objects; returns a vector of residuals.
fitted(x)
fitted
method for sae
objects; returns a vector of fitted values.
se2(x)
extracts within-area variance estimate.
sv2(x)
extracts between-area variance estimate.
wDirect(x, pop)
extract vector of weights of the survey regression components in the small area estimates. If pop=TRUE
returns a vector for predicted areas (zero for out-of-sample areas), otherwise a vector for in-sample areas.
synthetic(x)
extract vector of synthetic estimates.
CV(x)
extract leave-one-out cross-validation measure.
cAIC(x)
extract conditional AIC measure.
R2(x)
extract unit-level R-squared goodness-of-fit measure.
Other components include
relErrM,relErrV
relative numerical integration errors in estimates and MSEs, for method
"HB".
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop) coef(sae) # fixed effects raneff(sae) # random effects sv2(sae) # between-area variance se2(sae) # within-area variance cAIC(sae) # conditional AIC
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop) coef(sae) # fixed effects raneff(sae) # random effects sv2(sae) # between-area variance se2(sae) # within-area variance cAIC(sae) # conditional AIC
weights
.Summary method for objects of class weights
.
## S3 method for class 'weights' summary(object, ...)
## S3 method for class 'weights' summary(object, ...)
object |
object of class |
... |
not used. |
The small area estimates can be interpreted as weighted sums of the response variable. This function computes the weights corresponding to the aggregated small area estimates or the weights corresponding to a specific small area estimate. The weights applied to the response variable need not exactly reproduce the Hierarchical Bayes estimate since the latter is averaged over the posterior distribution for the variance ratio whereas the weights are evaluated at the posterior mean. Under the default prior for the fixed effects, the weights applied to the design matrix reproduce the corresponding population numbers.
uweights(x, areaID = NULL, forTotal = FALSE)
uweights(x, areaID = NULL, forTotal = FALSE)
x |
sae object. |
areaID |
if left unspecified ( |
forTotal |
if |
An object of class weights
.
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, method="hybrid", keep.data=TRUE) # compute unit weights w <- uweights(sae, forTotal=TRUE) summary(w) # summary statistics plot(w) # histogram of weights # checks all.equal(sum(w * sae$y), sum(EST(sae) * sae$Narea)) all.equal(colSums(w * as.matrix(sae$X)), colSums(sae$Xp * sae$Narea))
d <- generateFakeData() # compute small area estimates sae <- fSAE(y0 ~ x + area2, data=d$sam, area="area", popdata=d$Xpop, method="hybrid", keep.data=TRUE) # compute unit weights w <- uweights(sae, forTotal=TRUE) summary(w) # summary statistics plot(w) # histogram of weights # checks all.equal(sum(w * sae$y), sum(EST(sae) * sae$Narea)) all.equal(colSums(w * as.matrix(sae$X)), colSums(sae$Xp * sae$Narea))