An introduction to the ngvb package

Introduction

Latent Gaussian models (LGMs) assume that the latent variables \(\mathbf{x}^G\) follow a Gaussian random vector with mean \(\mathbf{\mu}\) and precision matrix \(\mathbf{Q}= \mathbf{\mathbf{D}}^T \mathbf{D}\). It can be expressed through \[\begin{equation}\label{eq:gaussian} \mathbf{D}(\mathbf{x}^G -\mathbf{\mu})\overset{d}{=} \mathbf{Z}, \end{equation}\] where \(\mathbf{Z}\) is a vector of i.i.d. standard Gaussian variables. Models of this type can be fitted with inla. The non-Gaussian extension for \(\mathbf{x}^G\) consists in replacing the driving noise distribution: \[\begin{equation}\label{eq:frame} \mathbf{D}(\mathbf{x}-\mathbf{\mu})\overset{d}{=} \mathbf{\Lambda}, \end{equation}\] where \(\boldsymbol{\Lambda}\) is a vector of independent and standardized normal-inverse Gaussian (NIG) random variables that depend on the parameter \(\eta\), which controls the leptokurtosis. This extension leads to latent non-Gaussian models (LnGMs). The package ngvb is designed to fit these models. ngvb is based on a variational Bayes algorithm. It leverages the fact that the conditioned latent field \(\mathbf{x}|\mathbf{V}\) is normally distributed where \(\mathbf{V}\) are inverse-Gaussian mixing variables that regulate the extra flexibility of the model.

We assume the user is familiar with the INLA package (Gómez-Rubio (2020)). Latent non-Gaussian models are discussed in Cabral, Bolin, and Rue (2022), and a Bayesian implementation of LnGMs in Stan can be found in rafaelcabral96.github.io/nigstan/. We present next a series of examples on how to use the ngvb package.

RW1 model

Here we fit an RW1 latent process to the jumpts time series.

library(ngvb)
#> Loading required package: INLA
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loading required package: parallel
#> Loading required package: sp
#> This is INLA_22.06.03 built 2022-06-04 11:18:55 UTC.
#>  - See www.r-inla.org/contact-us for how to get help.
#> Loading required package: GIGrvg

plot(jumpts)

The model is the following:

\[ y_i = \sigma_x x_i + \sigma_y \epsilon_i, \]

where \(\epsilon_i\) is standard Gaussian noise, and \(\mathbf{x}=[x_1,\dotsc,x_{100}]^T\) follows a non-Gaussian RW1 prior, defined by \(x_{i}= x_{i-1} + \Lambda_i(\eta), i=2,\dotsc,100\), where \(\Lambda_i(\eta)\) is normal-inverge Gaussian (NIG) noise with non-Gaussianity parameter \(\eta\) (\(\eta=0\) leads to the Gaussian model).

First, we start by fitting a latent Gaussian model (LGM), where the driving noise of the latent field is normally distributed:

formula <- y ~ -1 + f(x,  model = "rw1") 
LGM     <- inla(formula, 
                data = jumpts,
                control.compute = list(config = TRUE))

We check the adequacy of the latent Gaussian assumption for this model and data through the ng.check function, which provides a set of diagnostic tools. It plots \(d_i(\mathbf{y}), \ i=1,\dotsc,n\), the local increase (for small \(\eta\)) in the Bayes factor (BF) when we make the latent field non-Gaussian. High values of \(d_i(\mathbf{y})\) for a particular index \(i\) indicate that making the driving noise \(\Lambda_i(\eta)\) non-Gaussian increases the Bayes Factor, and thus indicating the lack of flexibility of the latent Gaussian assumption at that location. Notice that the two sudden jumps are detected in the next plot.

It also plots the overall BF sensitivity \(s_0(\mathbf{y}) = \sum_i d_i(\mathbf{y})\) (red line) against its reference distribution. A high mismatch suggests that there could be non-Gaussian features in the observed data that are better modelled by an LnGM.

check.list <- ng.check(LGM)
#> Warning: geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> $plot.ranking

#> 
#> $plot.ref
#> $plot.ref[[1]]

The diagnostic plots suggest we could benefit from fitting an LnGM. Next, we consider the non-Gaussian version of the previous LGM model, where the latent field is driven by NIG noise. We use the output of the inla function LGM as the input of the ngvb function. By default, the leptokurtosis parameter \(\eta\) has an exponential prior with rate parameter equal to 1. This can be changed with the argument alpha.eta. Run ?ngvb to see the other arguments.

LnGM <- ngvb(fit = LGM)
#> [1] "Warning: You need to add log(|D|) to mll: check inla.doc(generic0)"
#> 
#> 0Oo----------- Iteration 1 -----------oO0
#> Initial value of eta: 0.5
#> 
#> 0Oo----------- Iteration 2 -----------oO0
#> Expectation of eta: 4.878
#> 
#> 0Oo----------- Iteration 3 -----------oO0
#> Expectation of eta: 3.952
#> 
#> 0Oo----------- Iteration 4 -----------oO0
#> Expectation of eta: 3.423
#> 
#> 0Oo----------- Iteration 5 -----------oO0
#> Expectation of eta: 3.06
#> 
#> 0Oo----------- Iteration 6 -----------oO0
#> Expectation of eta: 2.747
#> 
#> 0Oo----------- Iteration 7 -----------oO0
#> Expectation of eta: 2.616
#> 
#> 0Oo----------- Iteration 8 -----------oO0
#> Expectation of eta: 2.476
#> 
#> 0Oo----------- Iteration 9 -----------oO0
#> Expectation of eta: 2.427
#> 
#> 0Oo----------- Iteration 10 -----------oO0
#> Expectation of eta: 2.398
#> 
#>  0Oo----------- Maximum number of iterations reached -----------oO0

LnGM is an S4 object of class ngvb.list containing the outputs, which can be accessed with LnGM@.... Available methods are summary, print, plot, fitted, and simulate. To read the help functions, run:

#?`summary,ngvb.list-method`
#?`print,ngvb.list-method`
#?`plot,ngvb.list,missing-method`
#?`fitted,ngvb.list-method`
#?`simulate,ngvb.list-method`

We can see in plot(LnGM) that the two jump locations were picked up by the mixing variables \(V\) (which add more flexibility on those locations), and the evolution of the parameter \(\eta\) seems to indicate convergence.

plot(LnGM)

LnGM@LGM contains the inla object containing summaries and marginals of the LGM approximation of \((x,\theta)\) for the last iteration. We show next the Bayes factor:

exp(LnGM@LGM$mlik[2] - LGM$mlik[2])
#> [1] 1.760063e+15

Random slople, random intercept model (several model components)

We fit here growth curves from an orthodontic study including several male and female children at ages 8,10,12, and 14. For more information and references, run ?mice::potthoffroy. Next, we show a Trellis plot, where female subjects have labels starting with F and male subjects starting with M.

plot(Orthodont.plot, layout = c(16,2))

We use the following linear mixed-effects model to describe the response growth with age:

\[ y_{i j}=\beta_0+\delta_0 I_i(F) +\left(\beta_1+\delta_1 I_i(F)\right) t_j + b_{0 i}+b_{1 i} t_j+\epsilon_{i j}, \]

where \(y_{i j}\) denotes the response for the \(i\)th subject at age \(t_j\), \(i=1, \ldots, 27\) and \(j=1, \ldots, 4\) ; \(\beta_0\) and \(\beta_1\) denote, respectively, the intercept and the slope fixed effects for boys; \(\delta_0\) and \(\delta_1\) denote, respectively, the difference in intercept and slope fixed effects between girls and boys; \(I_i(F)\) denotes an indicator variable for females; \(\mathbf{b}_i=\left(b_{0 i}, b_{1 i}\right)\) is the random effects vector for the \(i\) th subject; and \(\epsilon_{i j}\) is the within-subject error.

The dataset is:

summary(Orthodont)
#>     subject       Female            time          value             tF       
#>  Min.   : 1   Min.   :0.0000   Min.   :0.00   Min.   :16.50   Min.   :0.000  
#>  1st Qu.: 7   1st Qu.:0.0000   1st Qu.:0.75   1st Qu.:22.00   1st Qu.:0.000  
#>  Median :14   Median :0.0000   Median :1.50   Median :23.75   Median :0.000  
#>  Mean   :14   Mean   :0.4074   Mean   :1.50   Mean   :24.02   Mean   :1.019  
#>  3rd Qu.:21   3rd Qu.:1.0000   3rd Qu.:2.25   3rd Qu.:26.00   3rd Qu.:2.000  
#>  Max.   :27   Max.   :1.0000   Max.   :3.00   Max.   :31.50   Max.   :4.000  
#>     subject2 
#>  Min.   : 1  
#>  1st Qu.: 7  
#>  Median :14  
#>  Mean   :14  
#>  3rd Qu.:21  
#>  Max.   :27

If the random effects have a normal prior, the previous model is an LGM that can be fitted in INLA. The random intercept is elicited in formula by f(subject, model = "iid"), and the random slopes by f(subject2, time, model = "iid"). The covariates subject1 and subject2 are the same since the f()-terms in INLA should depend on the unique names of the covariates.

formula <- value ~ 1 + Female + time + tF + f(subject, model = "iid") + f(subject2, time, model = "iid")

LGM <- inla(formula,
            data = Orthodont,
            control.compute = list(config = TRUE))

We now run the ng.check function to generate diagnostic plots regarding the latent Gaussianity assumption.

check.list <- ng.check(LGM)
#> Warning: geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> $plot.ranking

#> 
#> $plot.ref
#> $plot.ref[[1]]

#> 
#> $plot.ref[[2]]

The last two plots do not show a large mismatch between \(s_0(\mathbf{y})\) and its reference distribution. The sensitivity measures of the fixed effects relative to the latent Gaussianity assumption for the random intercepts (subject) and random slopes (subject2) is shown next. These are small compared to the posterior standard deviation of the fixed effects.

check.list$sens.fixed.matrix
#>            (Intercept)        Female          time           tF
#> subject  -6.467191e-01  8.005355e-01 -4.748893e-05 1.165662e-04
#> subject2 -9.455233e-11 -2.395009e-07 -2.940273e-07 2.397182e-07

However, for illustration purposes, we still fit an LnGM. We consider the prior random effects to be distributed according to long-tailed NIG variables to accommodate possible outliers in the intercepts and slopes.

LnGM <- ngvb(LGM)
#> 
#> 0Oo----------- Iteration 1 -----------oO0
#> Initial value of eta: 0.5 0.5
#> 
#> 0Oo----------- Iteration 2 -----------oO0
#> Expectation of eta: 0.302 0.13
#> 
#> 0Oo----------- Iteration 3 -----------oO0
#> Expectation of eta: 0.351 0.152
#> 
#> 0Oo----------- Iteration 4 -----------oO0
#> Expectation of eta: 0.401 0.151
#> 
#> 0Oo----------- Iteration 5 -----------oO0
#> Expectation of eta: 0.388 0.152
#> 
#> 0Oo----------- Iteration 6 -----------oO0
#> Expectation of eta: 0.391 0.151
#> 
#>  0Oo----------- Convergence achieved -----------oO0

plot(LnGM)

For the random slopes (subject2), the posterior means of the mixing variables \(V\) are very close to 1, and \(\eta\) is close to 0, suggesting that a non-Gaussian model for this component is not necessary.

We show next the Bayes factor:

exp(LnGM@LGM$mlik[2] - LGM$mlik[2])
#> [1] 6.862794

SAR model (using manual.configs)

Here we demonstrate how to fit LnGM models currently unavailable in ngvb or inla using the rgeneric functionality of inla and the manual.configs argument of ngvb.

We study here areal data, which consists of the number of residential burglaries and vehicle thefts per thousand households (\(y_i\)) in 49 counties of Columbus, Ohio, in 1980. This dataset can be found in the spdep package. We consider the following model:

\[ y_{i}= \beta_0 + \beta_1 \mathrm{HV}_i + \beta_2 \mathrm{HI}_i + \sigma_{\mathbf{x}}x_i + \sigma_{\epsilon}\epsilon_i, \]

where \(\mathrm{HV}_i\) and \(\mathrm{HI}_i\) are the average household value and household income for county \(i\), and \(\mathbf{x}\) accounts for structured spatial effects, while \(\epsilon_i \overset{i.i.d}{\sim} N(0,1)\) is an unstructured spatial effect.

We consider a simultaneous autoregressive (SAR) model for the spatially structured effects \(\mathbf{x}\). The Gaussian version of this model can be built from the following system \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{Z}\), where \(\mathbf{D}_{SAR}=\mathbf{I}-\rho\mathbf{W}\). \(\mathbf{W}\) is a row standardized adjacency matrix and \(-1<\rho<1\). The equivalent model driven by NIG noise is then \(\mathbf{D}_{SAR}\mathbf{x} = \sigma_{\mathbf{x}}\mathbf{\Lambda}\), where \(\mathbf{\Lambda}\) is i.i.d. standardized NIG noise.

Gaussian SAR models are not implemented in INLA, so we have to use the rgeneric or cgeneric functionalities (Gómez-Rubio (2020)). We start by loading and preparing the data.

# Required package names
packages <- c("spdep", "rgdal") 

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {  install.packages(packages[!installed_packages]) }

# Packages loading
invisible(lapply(packages, library, character.only = TRUE))

data(columbus)
data    <- columbus[,c("CRIME","HOVAL","INC")]       # data
N       <- nrow(data)                                   # number of counties
data$s  <- 1:N 
map     <- readOGR(system.file("shapes/columbus.shp", package="spData")[1]) #shape file containing the polygons
#> OGR data source with driver: ESRI Shapefile 
#> Source: "C:\R\library\spData\shapes\columbus.shp", layer: "columbus"
#> with 49 features
#> It has 20 fields
#> Integer64 fields read as strings:  COLUMBUS_ COLUMBUS_I POLYID
plot(map)

We construct now the row standardized adjacency matrix \(\mathbf{W}\)

nb_q <- poly2nb(map)              # Construct neighbours list from polygon list
n <- length(nb_q)
W <- matrix(0, nrow=n, ncol=n)
for(i in 1:length(nb_q)){
  W[i,nb_q[[i]]] <- 1 
}
W    <- diag(1/rowSums(W))%*%W    # Row standardize adjacency matrix 
eigenv    <- eigen(W)$values      # Eigenvalues of W. We need them to compute the log-determinant of the nb_q

We now implement the latent process \(\mathbf{x}|\mathbf{y},\mathbf{V},\sigma_x,\rho \sim N(\mathbf{0}, \sigma_x^{-2} \mathbf{D}_{SAR}^{-1} \text{diag}(\mathbf{V})\mathbf{D}_{SAR}^{-T})\) in rgeneric. The precision matrix is \(\mathbf{Q} = \tau_x \mathbf{D}_{SAR}^T \text{diag}(\mathbf{V})^{-1}\mathbf{D}_{SAR}\) and it is defined in the Q function. We consider a \(\text{Unif}(0,1)\) prior on \(\rho\) and a half-normal prior on \(\tau_x\).

'inla.rgeneric.sar.model' <- function(
    cmd = c("graph", "Q", "mu", "initial", "log.norm.const",
            "log.prior", "quit"),
    theta = NULL) {
  
  envir = parent.env(environment())
  
  #Internal scale is log of precision and log of kappa
  interpret.theta <- function() {
    return(list(prec = exp(theta[1L]),
                rho = exp(theta[2L])/(1+exp(theta[2L]))))
  }
  
  
  graph <- function(){
    return(Q())
  }
  
  Q <- function() { 
    param = interpret.theta()
    rho <- param$rho
    prec  <- param$prec
    
    D = Diagonal(n,rep(1,n)) - rho*W
    
    return(prec*t(D)%*%Diagonal(n,1/V)%*%D)
  }
  
  mu <- function()
  {
    return(numeric(0))
  }
  
  log.norm.const <- function() {
    
    param = interpret.theta()
    rho <- param$rho
    prec  <- param$prec
    
    log.det.D <- sum(log(1-rho*eigenv))
    
    res <- -0.5*n*log(2*pi) + 0.5*n*log(prec) + log.det.D -0.5*sum(log(V))
    return (res)  }
  
  

  log.prior <- function() {
    
    param = interpret.theta()
    rho <- param$rho
    prec  <- param$prec
    
    tau_prec = 1/sqrt(5)  
    
    prior.theta1 <- log( (2*sqrt(tau_prec/(2*pi)))*exp(-0.5*tau_prec*prec^2) ) + theta[1L]
    prior.theta2 <- log(exp(theta[2L])/(1+exp(theta[2L]))^2)
    
    res <-  prior.theta1 + prior.theta2
    
    return(res)
  }
  
  
  initial <- function() {
    return(c(0,0))
  }
  
  
  quit <- function() {
    return(invisible())
  }
  
  if (!length(theta)) theta = initial()
  res <- do.call(match.arg(cmd), args = list())
  return(res)
}

We now create a function that receives as input the mixing vector \(\mathbf{V}\) and outputs the inla object. The input should be a list of numeric vectors, where each vector corresponds to each model component to extend to non-Gaussianity (in this example, there is just one model component). The argument control.compute = list(config = TRUE) in the inla function is required.

inla.fit.V <- function(V){
  
  model = inla.rgeneric.define(inla.rgeneric.sar.model, n = N, V = V[[1]], W = W, eigenv = eigenv)

  formula <- CRIME ~ 1 + HOVAL + INC  + f(s, model = model)

  fit <- inla(formula, 
              data = data,
              control.compute = list(config = TRUE))
  
  return(fit)
}

If \(\mathbf{V}=\mathbf{1}\) then the latent field is Gaussian and inla.fit.V gives us the LGM.

LGM <- inla.fit.V(list(rep(1,N)))
#> Warning in inla.model.properties.generic(inla.trim.family(model), mm[names(mm) == : Model 'rgeneric' in section 'latent' is marked as 'experimental'; changes may appear at any time.
#>   Use this model with extra care!!! Further warnings are disabled.

LGM$summary.fixed
#>                   mean         sd 0.025quant   0.5quant 0.975quant       mode
#> (Intercept) 60.0654513 6.34655354 46.8598297 60.2593757 72.0097849 60.5304478
#> HOVAL       -0.3038646 0.09423473 -0.4892529 -0.3039573 -0.1179609 -0.3041503
#> INC         -0.9481599 0.37259524 -1.6913821 -0.9444518 -0.2239064 -0.9358919
#>                      kld
#> (Intercept) 1.869492e-07
#> HOVAL       1.006507e-08
#> INC         1.126924e-08

We now define the LnGM in ngvb using the manual.configs argument. manual.configs should be a list containing inla.fit.V. It should also include Dfunc which is a list containing the functions \(\mathbf{D}_{SAR}(\boldsymbol{\theta})\) for each model component, that receive as input all hyperparameters \(\boldsymbol{\theta}\) in internal scale, and outputs \(\mathbf{D}_{SAR}(\boldsymbol{\theta})\). Finally, manual.configs also needs h, a list containing all predefined constant vectors (see Cabral, Bolin, and Rue (2022)) for each model component.

D1  <- function(theta){
  prec   <- exp(theta[2L]) 
  rho    <- exp(theta[3L])/(1+exp(theta[3L]))
  return(sqrt(prec)*(Diagonal(N,rep(1,N)) - rho*W))
}
Dfunc <- list(D1)

h         <- list(rep(1,N))

manual.configs <- list(inla.fit.V = inla.fit.V, Dfunc = Dfunc, h = h)

We run the diagnostic plots first as follows.

check.list <- ng.check(LGM, Dfunc, h)
#> Warning: geom_vline(): Ignoring `mapping` because `xintercept` was provided.
#> $plot.ranking

#> 
#> $plot.ref
#> $plot.ref[[1]]

The last plots identify an outlying county. To fit the LnGM, we now call ngvb, and analyze the output.

LnGM <- ngvb(manual.configs = manual.configs, selection = list(s = 1:49),
             iter = 10, d.sampling = TRUE, n.sampling = 1000)
#> 
#> 0Oo----------- Iteration 1 -----------oO0
#> Initial value of eta: 0.5
#> 
#> 0Oo----------- Iteration 2 -----------oO0
#> Expectation of eta: 0.36
#> 
#> 0Oo----------- Iteration 3 -----------oO0
#> Expectation of eta: 0.541
#> 
#> 0Oo----------- Iteration 4 -----------oO0
#> Expectation of eta: 0.605
#> 
#> 0Oo----------- Iteration 5 -----------oO0
#> Expectation of eta: 0.655
#> 
#> 0Oo----------- Iteration 6 -----------oO0
#> Expectation of eta: 0.666
#> 
#> 0Oo----------- Iteration 7 -----------oO0
#> Expectation of eta: 0.645
#> 
#> 0Oo----------- Iteration 8 -----------oO0
#> Expectation of eta: 0.672
#> 
#> 0Oo----------- Iteration 9 -----------oO0
#> Expectation of eta: 0.681
#> 
#> 0Oo----------- Iteration 10 -----------oO0
#> Expectation of eta: 0.687
#> 
#>  0Oo----------- Convergence achieved -----------oO0

LnGM@LGM$summary.fixed
#>                   mean         sd 0.025quant   0.5quant  0.975quant       mode
#> (Intercept) 58.2211212 5.38888939 46.9580137 58.4203214 68.29904011 58.7650042
#> HOVAL       -0.1663135 0.07726016 -0.3180045 -0.1665085 -0.01352487 -0.1668969
#> INC         -1.1950034 0.30496795 -1.8095714 -1.1893868 -0.61022283 -1.1768686
#>                      kld
#> (Intercept) 7.312655e-08
#> HOVAL       1.080458e-08
#> INC         2.704894e-08

We can see in plot(LnGM) that more flexibility was added in 2 counties.

plot(LnGM)

The Bayes factor is:

exp(LnGM@LGM$mlik[2]-LGM$mlik[2])
#> [1] 111510.8

Cabral, Rafael, David Bolin, and Håvard Rue. 2022. “Controlling the Flexibility of Non-Gaussian Processes Through Shrinkage Priors.” arXiv Preprint arXiv:2203.05510.

Gómez-Rubio, Virgilio. 2020. Bayesian Inference with INLA. CRC Press.

Rafael Cabral

2023-05-15

Introduction

RW1 model

Random slople, random intercept model (several model components)

SAR model (using manual.configs)