An introduction to the ngvb package
Rafael Cabral
2023-05-15
ngvb.Rmd
Introduction
Latent Gaussian models (LGMs) assume that the latent variables \(\mathbf{x}^G\) follow a Gaussian random
vector with mean \(\mathbf{\mu}\) and
precision matrix \(\mathbf{Q}=
\mathbf{\mathbf{D}}^T \mathbf{D}\). It can be expressed through
\[\begin{equation}\label{eq:gaussian}
\mathbf{D}(\mathbf{x}^G -\mathbf{\mu})\overset{d}{=} \mathbf{Z},
\end{equation}\] where \(\mathbf{Z}\) is a vector of i.i.d. standard
Gaussian variables. Models of this type can be fitted with
inla
. The non-Gaussian extension for \(\mathbf{x}^G\) consists in replacing the
driving noise distribution: \[\begin{equation}\label{eq:frame}
\mathbf{D}(\mathbf{x}-\mathbf{\mu})\overset{d}{=} \mathbf{\Lambda},
\end{equation}\] where \(\boldsymbol{\Lambda}\) is a vector of
independent and standardized normal-inverse Gaussian (NIG) random
variables that depend on the parameter \(\eta\), which controls the leptokurtosis.
This extension leads to latent non-Gaussian models (LnGMs). The package
ngvb
is designed to fit these models. ngvb
is
based on a variational Bayes algorithm. It leverages the fact that the
conditioned latent field \(\mathbf{x}|\mathbf{V}\) is normally
distributed where \(\mathbf{V}\) are
inverse-Gaussian mixing variables that regulate the extra flexibility of
the model.
We assume the user is familiar with the INLA package (Gómez-Rubio (2020)). Latent non-Gaussian models
are discussed in Cabral, Bolin, and Rue
(2022), and a Bayesian implementation of LnGMs in Stan can be
found in rafaelcabral96.github.io/nigstan/.
We present next a series of examples on how to use the ngvb
package.
RW1 model
Here we fit an RW1 latent process to the jumpts
time
series.
library(ngvb)
#> Loading required package: INLA
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loading required package: parallel
#> Loading required package: sp
#> This is INLA_22.06.03 built 2022-06-04 11:18:55 UTC.
#> - See www.r-inla.org/contact-us for how to get help.
#> Loading required package: GIGrvg
plot(jumpts)
The model is the following:
\[ y_i = \sigma_x x_i + \sigma_y \epsilon_i, \]
where \(\epsilon_i\) is standard Gaussian noise, and \(\mathbf{x}=[x_1,\dotsc,x_{100}]^T\) follows a non-Gaussian RW1 prior, defined by \(x_{i}= x_{i-1} + \Lambda_i(\eta), i=2,\dotsc,100\), where \(\Lambda_i(\eta)\) is normal-inverge Gaussian (NIG) noise with non-Gaussianity parameter \(\eta\) (\(\eta=0\) leads to the Gaussian model).
First, we start by fitting a latent Gaussian model (LGM), where the driving noise of the latent field is normally distributed:
formula <- y ~ -1 + f(x, model = "rw1")
LGM <- inla(formula,
data = jumpts,
control.compute = list(config = TRUE))
We check the adequacy of the latent Gaussian assumption for this
model and data through the ng.check
function, which
provides a set of diagnostic tools. It plots \(d_i(\mathbf{y}), \ i=1,\dotsc,n\), the
local increase (for small \(\eta\)) in
the Bayes factor (BF) when we make the latent field non-Gaussian. High
values of \(d_i(\mathbf{y})\) for a
particular index \(i\) indicate that
making the driving noise \(\Lambda_i(\eta)\) non-Gaussian increases
the Bayes Factor, and thus indicating the lack of flexibility of the
latent Gaussian assumption at that location. Notice that the two sudden
jumps are detected in the next plot.
It also plots the overall BF sensitivity \(s_0(\mathbf{y}) = \sum_i d_i(\mathbf{y})\) (red line) against its reference distribution. A high mismatch suggests that there could be non-Gaussian features in the observed data that are better modelled by an LnGM.
check.list <- ng.check(LGM)
#> Warning: geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> $plot.ranking
#>
#> $plot.ref
#> $plot.ref[[1]]
The diagnostic plots suggest we could benefit from fitting an LnGM.
Next, we consider the non-Gaussian version of the previous LGM model,
where the latent field is driven by NIG noise. We use the output of the
inla
function LGM
as the input of the
ngvb
function. By default, the leptokurtosis parameter
\(\eta\) has an exponential prior with
rate parameter equal to 1. This can be changed with the argument
alpha.eta
. Run ?ngvb
to see the other
arguments.
LnGM <- ngvb(fit = LGM)
#> [1] "Warning: You need to add log(|D|) to mll: check inla.doc(generic0)"
#>
#> 0Oo----------- Iteration 1 -----------oO0
#> Initial value of eta: 0.5
#>
#> 0Oo----------- Iteration 2 -----------oO0
#> Expectation of eta: 4.878
#>
#> 0Oo----------- Iteration 3 -----------oO0
#> Expectation of eta: 3.952
#>
#> 0Oo----------- Iteration 4 -----------oO0
#> Expectation of eta: 3.423
#>
#> 0Oo----------- Iteration 5 -----------oO0
#> Expectation of eta: 3.06
#>
#> 0Oo----------- Iteration 6 -----------oO0
#> Expectation of eta: 2.747
#>
#> 0Oo----------- Iteration 7 -----------oO0
#> Expectation of eta: 2.616
#>
#> 0Oo----------- Iteration 8 -----------oO0
#> Expectation of eta: 2.476
#>
#> 0Oo----------- Iteration 9 -----------oO0
#> Expectation of eta: 2.427
#>
#> 0Oo----------- Iteration 10 -----------oO0
#> Expectation of eta: 2.398
#>
#> 0Oo----------- Maximum number of iterations reached -----------oO0
LnGM
is an S4 object of class ngvb.list
containing the outputs, which can be accessed with
LnGM@...
. Available methods are summary
,
print
, plot
, fitted
, and
simulate
. To read the help functions, run:
#?`summary,ngvb.list-method`
#?`print,ngvb.list-method`
#?`plot,ngvb.list,missing-method`
#?`fitted,ngvb.list-method`
#?`simulate,ngvb.list-method`
We can see in plot(LnGM)
that the two jump locations
were picked up by the mixing variables \(V\) (which add more flexibility on those
locations), and the evolution of the parameter \(\eta\) seems to indicate convergence.
plot(LnGM)
LnGM@LGM
contains the inla object containing summaries
and marginals of the LGM approximation of \((x,\theta)\) for the last iteration. We
show next the Bayes factor:
exp(LnGM@LGM$mlik[2] - LGM$mlik[2])
#> [1] 1.760063e+15
Random slople, random intercept model (several model components)
We fit here growth curves from an orthodontic study including several
male and female children at ages 8,10,12, and 14. For more information
and references, run ?mice::potthoffroy
. Next, we show a
Trellis plot, where female subjects have labels starting with F and male
subjects starting with M.
We use the following linear mixed-effects model to describe the response growth with age:
\[ y_{i j}=\beta_0+\delta_0 I_i(F) +\left(\beta_1+\delta_1 I_i(F)\right) t_j + b_{0 i}+b_{1 i} t_j+\epsilon_{i j}, \]
where \(y_{i j}\) denotes the response for the \(i\)th subject at age \(t_j\), \(i=1, \ldots, 27\) and \(j=1, \ldots, 4\) ; \(\beta_0\) and \(\beta_1\) denote, respectively, the intercept and the slope fixed effects for boys; \(\delta_0\) and \(\delta_1\) denote, respectively, the difference in intercept and slope fixed effects between girls and boys; \(I_i(F)\) denotes an indicator variable for females; \(\mathbf{b}_i=\left(b_{0 i}, b_{1 i}\right)\) is the random effects vector for the \(i\) th subject; and \(\epsilon_{i j}\) is the within-subject error.
The dataset is:
summary(Orthodont)
#> subject Female time value tF
#> Min. : 1 Min. :0.0000 Min. :0.00 Min. :16.50 Min. :0.000
#> 1st Qu.: 7 1st Qu.:0.0000 1st Qu.:0.75 1st Qu.:22.00 1st Qu.:0.000
#> Median :14 Median :0.0000 Median :1.50 Median :23.75 Median :0.000
#> Mean :14 Mean :0.4074 Mean :1.50 Mean :24.02 Mean :1.019
#> 3rd Qu.:21 3rd Qu.:1.0000 3rd Qu.:2.25 3rd Qu.:26.00 3rd Qu.:2.000
#> Max. :27 Max. :1.0000 Max. :3.00 Max. :31.50 Max. :4.000
#> subject2
#> Min. : 1
#> 1st Qu.: 7
#> Median :14
#> Mean :14
#> 3rd Qu.:21
#> Max. :27
If the random effects have a normal prior, the previous model is an
LGM that can be fitted in INLA. The random intercept is elicited in
formula
by f(subject, model = "iid")
, and the
random slopes by f(subject2, time, model = "iid")
. The
covariates subject1
and subject2
are the same
since the f()-terms in INLA should depend on the unique names of the
covariates.
formula <- value ~ 1 + Female + time + tF + f(subject, model = "iid") + f(subject2, time, model = "iid")
LGM <- inla(formula,
data = Orthodont,
control.compute = list(config = TRUE))
We now run the ng.check
function to generate diagnostic
plots regarding the latent Gaussianity assumption.
check.list <- ng.check(LGM)
#> Warning: geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> $plot.ranking
#>
#> $plot.ref
#> $plot.ref[[1]]
#>
#> $plot.ref[[2]]
The last two plots do not show a large mismatch between \(s_0(\mathbf{y})\) and its reference distribution. The sensitivity measures of the fixed effects relative to the latent Gaussianity assumption for the random intercepts (subject) and random slopes (subject2) is shown next. These are small compared to the posterior standard deviation of the fixed effects.
check.list$sens.fixed.matrix
#> (Intercept) Female time tF
#> subject -6.467191e-01 8.005355e-01 -4.748893e-05 1.165662e-04
#> subject2 -9.455233e-11 -2.395009e-07 -2.940273e-07 2.397182e-07
However, for illustration purposes, we still fit an LnGM. We consider the prior random effects to be distributed according to long-tailed NIG variables to accommodate possible outliers in the intercepts and slopes.
LnGM <- ngvb(LGM)
#>
#> 0Oo----------- Iteration 1 -----------oO0
#> Initial value of eta: 0.5 0.5
#>
#> 0Oo----------- Iteration 2 -----------oO0
#> Expectation of eta: 0.302 0.13
#>
#> 0Oo----------- Iteration 3 -----------oO0
#> Expectation of eta: 0.351 0.152
#>
#> 0Oo----------- Iteration 4 -----------oO0
#> Expectation of eta: 0.401 0.151
#>
#> 0Oo----------- Iteration 5 -----------oO0
#> Expectation of eta: 0.388 0.152
#>
#> 0Oo----------- Iteration 6 -----------oO0
#> Expectation of eta: 0.391 0.151
#>
#> 0Oo----------- Convergence achieved -----------oO0
plot(LnGM)
For the random slopes (subject2), the posterior means of the mixing variables \(V\) are very close to 1, and \(\eta\) is close to 0, suggesting that a non-Gaussian model for this component is not necessary.
We show next the Bayes factor:
exp(LnGM@LGM$mlik[2] - LGM$mlik[2])
#> [1] 6.862794
SAR model (using manual.configs)
Here we demonstrate how to fit LnGM models currently unavailable in
ngvb
or inla
using the rgeneric functionality
of inla
and the manual.configs
argument of
ngvb
.
We study here areal data, which consists of the number of residential
burglaries and vehicle thefts per thousand households (\(y_i\)) in 49 counties of Columbus, Ohio, in
1980. This dataset can be found in the spdep
package. We
consider the following model:
\[ y_{i}= \beta_0 + \beta_1 \mathrm{HV}_i + \beta_2 \mathrm{HI}_i + \sigma_{\mathbf{x}}x_i + \sigma_{\epsilon}\epsilon_i, \]
where \(\mathrm{HV}_i\) and \(\mathrm{HI}_i\) are the average household value and household income for county \(i\), and \(\mathbf{x}\) accounts for structured spatial effects, while \(\epsilon_i \overset{i.i.d}{\sim} N(0,1)\) is an unstructured spatial effect.
We consider a simultaneous autoregressive (SAR) model for the spatially structured effects \(\mathbf{x}\). The Gaussian version of this model can be built from the following system \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{Z}\), where \(\mathbf{D}_{SAR}=\mathbf{I}-\rho\mathbf{W}\). \(\mathbf{W}\) is a row standardized adjacency matrix and \(-1<\rho<1\). The equivalent model driven by NIG noise is then \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{\Lambda}\), where \(\mathbf{\Lambda}\) is i.i.d. standardized NIG noise.
Gaussian SAR models are not implemented in INLA, so we have to use
the rgeneric
or cgeneric
functionalities
(Gómez-Rubio (2020)). We start by loading
and preparing the data.
# Required package names
packages <- c("spdep", "rgdal")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) { install.packages(packages[!installed_packages]) }
# Packages loading
invisible(lapply(packages, library, character.only = TRUE))
data(columbus)
data <- columbus[,c("CRIME","HOVAL","INC")] # data
N <- nrow(data) # number of counties
data$s <- 1:N
map <- readOGR(system.file("shapes/columbus.shp", package="spData")[1]) #shape file containing the polygons
#> OGR data source with driver: ESRI Shapefile
#> Source: "C:\R\library\spData\shapes\columbus.shp", layer: "columbus"
#> with 49 features
#> It has 20 fields
#> Integer64 fields read as strings: COLUMBUS_ COLUMBUS_I POLYID
plot(map)
We construct now the row standardized adjacency matrix \(\mathbf{W}\)
nb_q <- poly2nb(map) # Construct neighbours list from polygon list
n <- length(nb_q)
W <- matrix(0, nrow=n, ncol=n)
for(i in 1:length(nb_q)){
W[i,nb_q[[i]]] <- 1
}
W <- diag(1/rowSums(W))%*%W # Row standardize adjacency matrix
eigenv <- eigen(W)$values # Eigenvalues of W. We need them to compute the log-determinant of the nb_q
We now implement the latent process \(\mathbf{x}|\mathbf{y},\mathbf{V},\sigma_x,\rho
\sim N(\mathbf{0}, \sigma_x^{-2} \mathbf{D}_{SAR}^{-1}
\text{diag}(\mathbf{V})\mathbf{D}_{SAR}^{-T})\) in
rgeneric
. The precision matrix is \(\mathbf{Q} = \tau_x \mathbf{D}_{SAR}^T
\text{diag}(\mathbf{V})^{-1}\mathbf{D}_{SAR}\) and it is defined
in the Q
function. We consider a \(\text{Unif}(0,1)\) prior on \(\rho\) and a half-normal prior on \(\tau_x\).
'inla.rgeneric.sar.model' <- function(
cmd = c("graph", "Q", "mu", "initial", "log.norm.const",
"log.prior", "quit"),
theta = NULL) {
envir = parent.env(environment())
#Internal scale is log of precision and log of kappa
interpret.theta <- function() {
return(list(prec = exp(theta[1L]),
rho = exp(theta[2L])/(1+exp(theta[2L]))))
}
graph <- function(){
return(Q())
}
Q <- function() {
param = interpret.theta()
rho <- param$rho
prec <- param$prec
D = Diagonal(n,rep(1,n)) - rho*W
return(prec*t(D)%*%Diagonal(n,1/V)%*%D)
}
mu <- function()
{
return(numeric(0))
}
log.norm.const <- function() {
param = interpret.theta()
rho <- param$rho
prec <- param$prec
log.det.D <- sum(log(1-rho*eigenv))
res <- -0.5*n*log(2*pi) + 0.5*n*log(prec) + log.det.D -0.5*sum(log(V))
return (res) }
log.prior <- function() {
param = interpret.theta()
rho <- param$rho
prec <- param$prec
tau_prec = 1/sqrt(5)
prior.theta1 <- log( (2*sqrt(tau_prec/(2*pi)))*exp(-0.5*tau_prec*prec^2) ) + theta[1L]
prior.theta2 <- log(exp(theta[2L])/(1+exp(theta[2L]))^2)
res <- prior.theta1 + prior.theta2
return(res)
}
initial <- function() {
return(c(0,0))
}
quit <- function() {
return(invisible())
}
if (!length(theta)) theta = initial()
res <- do.call(match.arg(cmd), args = list())
return(res)
}
We now create a function that receives as input the mixing vector
\(\mathbf{V}\) and outputs the inla
object. The input should be a list of numeric vectors, where each vector
corresponds to each model component to extend to non-Gaussianity (in
this example, there is just one model component). The argument
control.compute = list(config = TRUE)
in the
inla
function is required.
inla.fit.V <- function(V){
model = inla.rgeneric.define(inla.rgeneric.sar.model, n = N, V = V[[1]], W = W, eigenv = eigenv)
formula <- CRIME ~ 1 + HOVAL + INC + f(s, model = model)
fit <- inla(formula,
data = data,
control.compute = list(config = TRUE))
return(fit)
}
If \(\mathbf{V}=\mathbf{1}\) then
the latent field is Gaussian and inla.fit.V
gives us the
LGM.
LGM <- inla.fit.V(list(rep(1,N)))
#> Warning in inla.model.properties.generic(inla.trim.family(model), mm[names(mm) == : Model 'rgeneric' in section 'latent' is marked as 'experimental'; changes may appear at any time.
#> Use this model with extra care!!! Further warnings are disabled.
LGM$summary.fixed
#> mean sd 0.025quant 0.5quant 0.975quant mode
#> (Intercept) 60.0654513 6.34655354 46.8598297 60.2593757 72.0097849 60.5304478
#> HOVAL -0.3038646 0.09423473 -0.4892529 -0.3039573 -0.1179609 -0.3041503
#> INC -0.9481599 0.37259524 -1.6913821 -0.9444518 -0.2239064 -0.9358919
#> kld
#> (Intercept) 1.869492e-07
#> HOVAL 1.006507e-08
#> INC 1.126924e-08
We now define the LnGM in ngvb
using the
manual.configs
argument. manual.configs
should
be a list containing inla.fit.V
. It should also include
Dfunc
which is a list containing the functions \(\mathbf{D}_{SAR}(\boldsymbol{\theta})\) for
each model component, that receive as input all hyperparameters \(\boldsymbol{\theta}\) in internal scale,
and outputs \(\mathbf{D}_{SAR}(\boldsymbol{\theta})\).
Finally, manual.configs
also needs h
, a list
containing all predefined constant vectors (see Cabral, Bolin, and Rue (2022)) for each model
component.
D1 <- function(theta){
prec <- exp(theta[2L])
rho <- exp(theta[3L])/(1+exp(theta[3L]))
return(sqrt(prec)*(Diagonal(N,rep(1,N)) - rho*W))
}
Dfunc <- list(D1)
h <- list(rep(1,N))
manual.configs <- list(inla.fit.V = inla.fit.V, Dfunc = Dfunc, h = h)
We run the diagnostic plots first as follows.
check.list <- ng.check(LGM, Dfunc, h)
#> Warning: geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> $plot.ranking
#>
#> $plot.ref
#> $plot.ref[[1]]
The last plots identify an outlying county. To fit the LnGM, we now
call ngvb
, and analyze the output.
LnGM <- ngvb(manual.configs = manual.configs, selection = list(s = 1:49),
iter = 10, d.sampling = TRUE, n.sampling = 1000)
#>
#> 0Oo----------- Iteration 1 -----------oO0
#> Initial value of eta: 0.5
#>
#> 0Oo----------- Iteration 2 -----------oO0
#> Expectation of eta: 0.36
#>
#> 0Oo----------- Iteration 3 -----------oO0
#> Expectation of eta: 0.541
#>
#> 0Oo----------- Iteration 4 -----------oO0
#> Expectation of eta: 0.605
#>
#> 0Oo----------- Iteration 5 -----------oO0
#> Expectation of eta: 0.655
#>
#> 0Oo----------- Iteration 6 -----------oO0
#> Expectation of eta: 0.666
#>
#> 0Oo----------- Iteration 7 -----------oO0
#> Expectation of eta: 0.645
#>
#> 0Oo----------- Iteration 8 -----------oO0
#> Expectation of eta: 0.672
#>
#> 0Oo----------- Iteration 9 -----------oO0
#> Expectation of eta: 0.681
#>
#> 0Oo----------- Iteration 10 -----------oO0
#> Expectation of eta: 0.687
#>
#> 0Oo----------- Convergence achieved -----------oO0
LnGM@LGM$summary.fixed
#> mean sd 0.025quant 0.5quant 0.975quant mode
#> (Intercept) 58.2211212 5.38888939 46.9580137 58.4203214 68.29904011 58.7650042
#> HOVAL -0.1663135 0.07726016 -0.3180045 -0.1665085 -0.01352487 -0.1668969
#> INC -1.1950034 0.30496795 -1.8095714 -1.1893868 -0.61022283 -1.1768686
#> kld
#> (Intercept) 7.312655e-08
#> HOVAL 1.080458e-08
#> INC 2.704894e-08
We can see in plot(LnGM)
that more flexibility was added
in 2 counties.
plot(LnGM)
The Bayes factor is:
exp(LnGM@LGM$mlik[2]-LGM$mlik[2])
#> [1] 111510.8