## Archive for the ‘MCMSki IV abstract’ Category

### Ben Calderhead (Imperial College London): Parallel Monte Carlo with a Single Markov Chain

January 6, 2014

A major limitation of many MCMC methods is their inherently sequential nature. We propose a natural extension to the Metropolis-Hastings algorithm that allows for parallelising a single chain using existing MCMC samplers, while maintaining convergence to the correct stationary distribution. Our approach is generally applicable and straightforward to implement. We demonstrate how this construction may be used to greatly increase the computational speed via parallelisation of a wide variety of existing MCMC methods, including Metropolis-Adjusted Langevin Algorithms and Adaptive MCMC. Furthermore, we show how it allows for a principled way of utilising every integration step within Hamiltonian-based MCMC methods, resulting in increased accuracy of Monte Carlo estimates with minimal extra computational cost.

### Winfried Barta (George Washington University): Forgetting the Starting State in Interacting Tempering

January 4, 2014

We consider the interacting tempering algorithm, a simplified version of the equi-energy sampler by Kou, Zhou and Wong (2006). As a first step towards a quantitative and non-asymptotic analysis of its convergence behavior, we show that under (easy to verify) assumptions on the distribution of interest, the interacting tempering process rapidly forgets its starting state. The result applies, among others, to exponential random graph models, the Ising and Potts models (in mean field or on a bounded degree graph), as well as (Edwards-Anderson) Ising spin glasses.

To the extent we believe that some of these distributions are hard to (approximately) simulate, the result suggests that in interacting tempering, at least for some distributions, forgetting the starting state might happen a lot earlier than convergence to the limiting distribution. For bounding the mixing time of the interacting tempering algorithm, the result allows us to assume that the process started in a state drawn from the limiting distribution.

### Tatiana Xifara (University of California, Santa Cruz): Diffusions with position dependent volatility and the Metropolis adjusted Langevin algorithm

December 31, 2013

**Keywords:** Langevin diffusions

### Marco Banterle (Université Paris-Dauphine): Sufficient dimension reduction for ABC via RKHS

December 19, 2013

How to choose the “right” summary of the data, and thus how to minimize the information loss in reducing the dimension of the problem from the observed y to s=S(y), is still an open and essential matter, especially for Approximate Bayesian Computation (ABC), which relies on the approximation p(θ|s) ~ p(θ|y).

It has been shown that aiming for sufficiency is not a trivial matter, and hence it is usual to resort to a collection of hopefully quasi-sufficient statistics, typically dependent on the problem, under the condition that this collection lies in a lower dimensional space to avoid the so-called curse of dimensionality.

Following the idea underlying dimension reduction techniques for ABC such as Partial Least Squares (Wegmann et al., 2009), we derive a procedure to select a sufficient dimension reduction subset u of the vector s by characterizing it as the minimal subset such that θ is conditionally independent of s given u.

PLS, however, operates only in the original space, and the independence condition in its criterion is in fact replaced with an uncorrelation requirement. This simplification is known to fail outside Gaussian models, an assumption that is easily violated in the complicated models where ABC is needed. We thus resort to kernel methods by examining the conditional independence requirement through mean embedding in a RKHS as in Fukumizu et al. (2008) and Zhang et al. (2012), where linear relations are known to represent non-linear dependencies in the original space.

A proper testing procedure and a greedy procedure are compared in various simulation studies.
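To make the approximation p(θ|s) ~ p(θ|y) concrete, here is a minimal sketch of plain ABC rejection sampling, not the RKHS-based selection procedure of the abstract. The Gaussian model, the uniform prior, the tolerance and the function name `abc_rejection` are all assumptions chosen for illustration. In this toy model the sample mean happens to be sufficient, so the only approximation error comes from the tolerance.

```python
import random
import statistics

def abc_rejection(s_obs, n_obs, n_draws, tol, rng):
    """Keep prior draws whose simulated summary lies within tol of s_obs."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)                          # assumed Uniform(-5, 5) prior
        y_sim = [rng.gauss(theta, 1.0) for _ in range(n_obs)]   # assumed N(theta, 1) model
        s_sim = statistics.fmean(y_sim)                         # summary s = S(y): the sample mean
        if abs(s_sim - s_obs) < tol:                            # compare summaries, not raw data
            accepted.append(theta)
    return accepted

rng = random.Random(0)
posterior_draws = abc_rejection(s_obs=1.0, n_obs=20, n_draws=5000, tol=0.2, rng=rng)
print(len(posterior_draws), statistics.fmean(posterior_draws))
```

With a non-sufficient summary the accepted draws would target p(θ|s) rather than p(θ|y), which is the information loss the abstract sets out to control.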

### Merrill Liechty (Drexel University): Multivariate Sufficient Statistics Using Kronecker Products

December 13, 2013

We present a multivariate sufficient statistic using Kronecker products that dramatically increases computational efficiency in evaluating likelihood functions and/or posterior distributions. In particular, we examine the example of multivariate regression in a Bayesian setting. We compare the computation time for using the Gibbs sampler both with and without the sufficient statistic, and we show that the difference in computation time grows quadratically with the number of covariates and products and linearly with the number of individuals. In the simulation study, speedup factors ranging from at least 20 times to almost 300 times were observed when using the Kronecker sufficient statistic.

**Keywords** Multivariate Summary Statistic · Multivariate Normal Distribution · Inverse-Wishart Distribution · Computationally Efficient Sufficient Statistic · Bayesian Regression

Joint work with Tibbits, Matt (U.S. Department of Defense)
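The general idea of caching sufficient statistics can be sketched as follows. This is a simplified illustration under stated assumptions, not the Kronecker-product construction of the abstract: it treats the residual sum of squares tr((Y − XB)ᵀ(Y − XB)) of a multivariate regression with identity error covariance, and the helper names and toy data are hypothetical. The cross-products XᵀX, XᵀY and YᵀY are computed once, after which each evaluation no longer depends on the number of individuals.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def rss_direct(X, Y, B):
    """tr((Y - XB)'(Y - XB)), touching all n rows on every call."""
    XB = matmul(X, B)
    return sum((y - f) ** 2 for yr, fr in zip(Y, XB) for y, f in zip(yr, fr))

def sufficient_stats(X, Y):
    """One pass over the data: X'X, X'Y and Y'Y."""
    Xt, Yt = transpose(X), transpose(Y)
    return matmul(Xt, X), matmul(Xt, Y), matmul(Yt, Y)

def rss_cached(stats, B):
    """Same quantity as tr(Y'Y) - 2 tr(B'X'Y) + tr(B'X'XB), independent of n."""
    XtX, XtY, YtY = stats
    Bt = transpose(B)
    return trace(YtY) - 2.0 * trace(matmul(Bt, XtY)) + trace(matmul(matmul(Bt, XtX), B))

# Toy data (hypothetical): 4 individuals, 2 covariates, 2 responses.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [[1, 2], [2, 1], [4, 0], [5, -1]]
B = [[1.0, 2.0], [1.5, -1.0]]

stats = sufficient_stats(X, Y)
print(rss_direct(X, Y, B), rss_cached(stats, B))
```

Inside a Gibbs sampler the coefficient matrix changes at every iteration while the data do not, which is why precomputing the cross-products pays off.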

### Jegar Pitchforth (Queensland University of Technology): Combining Complex Systems Models for Better International Passenger Processing Operations

December 11, 2013

Every year over twenty million passengers travel into Australian territory, and the primary channel for their arrival is airport terminals. Given the stringent security and biosecurity regulations required to protect Australia, national border agencies face a significant challenge in processing international passengers within internationally mandated times. To understand this problem, a system of complex systems models has been developed to describe the behaviour of the terminal.

A Bayesian Network Modelling approach was adopted to describe the causal structure of the system, based on existing metric frameworks. While this approach overcame a number of challenges such as modelling the terminal at the flight level and coping with stochastic variability in processing times, including a temporal dimension in the model necessitated the use of other system interpretations that could more appropriately describe system behaviour over time.

By introducing alternative descriptions of the terminal behaviour, we also raise questions as to how to integrate two or more models of a system given there is an intersection in their required data. A number of options are presented along with an overview of how the current work fits with the larger research project, and how this work will be used in the future.

**Keywords**: Bayesian Network; systems modelling; operations research; airport; model integration; processing

### Xiyun Jiao (Imperial College London): Combining the Marginal Data Augmentation and the Ancillarity-Sufficiency Interweaving Strategy to Further Improve the Convergence of Data Augmentation Algorithm

December 10, 2013

The Data Augmentation (DA) algorithm is frequently used in posterior sampling, especially when the model has a hierarchical structure containing latent variables. Marginal Data Augmentation (MDA) and the Ancillarity-Sufficiency Interweaving Strategy (ASIS) are both effective tools to improve the convergence of DA. MDA introduces a working parameter α in the augmented data model that is not identifiable under the observed data model. Although the resulting joint chain may not be positive recurrent, the marginal chain for the parameter can mix very efficiently. ASIS considers a pair of special DA schemes: the so-called sufficient and ancillary schemes. If the sampler corresponding to one of these is fast, the sampler corresponding to the other is typically slow. The ASIS strategy can often substantially outperform both of the parent DA samplers. We propose to combine these strategies to further improve efficiency. We apply this combining strategy to a factor analysis model to show its effect. The intuition behind combining MDA and ASIS in this factor analysis model is that MDA is efficient in improving the convergence of the variance parameter, Σ, but has very little effect on the factor loading matrix, β. ASIS can result in samplers with much better convergence properties, but ancillary and sufficient augmentations can be difficult or impossible to derive in complex models unless we condition on a subset of the parameters. Thus, we consider handling Σ with MDA, and deriving ASIS for β given Σ. By combining MDA and ASIS in this way, we improve the convergence of both Σ and β significantly. Furthermore, we can guarantee that the subchain {(β^{(t)},Σ^{(t)}),t≥0} resulting from the combining strategy converges to the target distribution, i.e., p(β,Σ|Y).

*Joint work with David van Dyk (Imperial College London).*

### Ari Pakman (Columbia University): Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions

December 8, 2013

We present a new approach to sample from generic binary distributions, based on an exact Hamiltonian Monte Carlo algorithm applied to a piecewise continuous augmentation of the binary distribution of interest. An extension of this idea to distributions over mixtures of binary and possibly-truncated Gaussian or exponential variables allows us to sample from posteriors of linear and probit regression models with spike-and-slab priors and truncated parameters. We illustrate the advantages of these algorithms in several examples in which they outperform the Metropolis or Gibbs samplers.

**Keywords:** Hamiltonian Monte Carlo, binary distributions, spike-and-slab

**Link:** arxiv:1311.2166

### Gavin Band (University of Oxford): Bayesian analysis of genetic association with severe malaria

December 7, 2013

Malaria is an infectious disease prevalent in sub-Saharan Africa, south-east Asia and elsewhere, causing high rates of morbidity and an estimated 600,000 deaths each year, most of which are among young children in Africa. Although several genetic susceptibility loci are known, much remains to be discovered about both host and parasite genetics of the disease. As part of the MalariaGEN consortium, we describe a Bayesian analysis of human genetic loci associated with severe P. falciparum malaria, using 10,000 cases and 15,000 controls from 10 study sites located across sub-Saharan Africa. By choosing suitable prior distributions, we build and test a model of association accounting for uncertainty in the level of heterogeneity of effect between study sites and between severe malaria subtypes, as well as uncertainty in the mode of inheritance. We present evidence that there is substantial heterogeneity of effect at two or more well-known susceptibility loci. By making use of approximate Bayes factors and conjugate priors, we scale this analysis up across millions of SNPs, applying it to a genome-wide association analysis of severe malaria in Africa.

**Keywords:** malaria, Meta-analysis, Genome-wide association study.

### Anne Sabourin (Telecom ParisTech): Bayesian Dirichlet mixture model for multivariate extremes: a re-parametrization

December 7, 2013

The probabilistic framework of extreme value theory is well-known: the dependence structure of large events is characterized by an angular measure on the positive orthant of the unit sphere. The family of these angular measures is non-parametric by nature. Nonetheless, any angular measure may be approached arbitrarily well by a mixture of Dirichlet distributions.

The semi-parametric Dirichlet mixture model for angular measures (Boldi and Davison, 2007) is theoretically valid in arbitrary dimension, but the original parametrization is subject to a moment constraint making Bayesian inference very challenging in dimension greater than three. A new unconstrained parametrization is proposed. This allows for a natural prior specification as well as a simple implementation of a reversible-jump MCMC. Posterior consistency and ergodicity of the Markov chain are verified and the algorithm is tested up to dimension five. In this non-identifiable setting, convergence monitoring is performed by integrating the sampled angular densities against Dirichlet test functions.

*Joint work with Philippe Naveau.*

**Reference:** Sabourin, A., Naveau, P., “Bayesian Dirichlet mixture model for multivariate extremes: a re-parametrization”, to appear in *Computational Statistics and Data Analysis*.

### Jamie Owen (Newcastle University, UK): Scalable inference for intractable Markov processes using ABC and pMCMC

December 5, 2013

Bayesian inference for non-linear stochastic processes has become of increasing interest in recent years. Problems of this type typically have intractable likelihoods, and prior knowledge about model rate parameters is often poor. MCMC techniques can lead to ‘exact’ inference for such models, but in practice can suffer performance issues such as long burn-in periods, slow mixing and poor amenability to parallelisation. On the other hand, approximate Bayesian computation (ABC) techniques can allow rapid, concurrent exploration of a large parameter space but yield only approximate posterior distributions. Here we consider the combined use of ABC and more standard MCMC techniques for improved computational efficiency which still allow ‘exact’ posterior inference and effective use of parallel hardware.

**Keywords:** approximate Bayesian computation, ABC, MCMC, Markov processes, stochastic networks

### Paul Birrell (MRC, Cambridge): Efficient real-time statistical modelling for pandemic influenza

December 5, 2013

During the 2009 A/H1N1pdm outbreak, much attention was devoted to capturing the dynamics of the epidemic through real-time modelling. The goal was to provide up-to-the-moment assessments of the state of the epidemic at any time, as well as predictions of its future course based upon streams of information updated at regular intervals. In the UK, existing and expanding surveillance consists of a multiplicity of data sources, typically noisy and providing only indirect evidence on the epidemic characteristics of interest. Models capable of reconstructing a pandemic on the basis of this type of information are, therefore, necessarily complex, as they need to link the unobserved transmission process to the intricate mechanisms generating the observed data (e.g. healthcare-seeking behaviour and reporting). As the volume and type of data expand, so does the model complexity and the attendant computational burden, limiting the capacity for real-time inference. This problem is exacerbated when multiple model runs are required to adapt the model to sudden changes in data patterns due, for example, to modifications in population behaviour following an intervention.

Here we extend the modelling of Birrell et al. (2011), focussing on the capability to perform real-time inference. Originally, the model was implemented within the Bayesian statistical framework using Markov chain Monte Carlo (MCMC). The real-time utility of MCMC is limited by its requirement to consider all relevant data in their entirety each time the analysis is iterated. We investigate sequential methods for Bayesian analysis that form a hybrid of MCMC and particle filtering and that prove capable of drastically reducing the required computation time without losing the intrinsic accuracy of the “gold-standard” MCMC techniques. We illustrate this using both simulated data and data from the 2009 pandemic in England, highlighting how reconstructions and projections of the epidemic curve evolve over the course of the epidemic.

**Keywords** Epidemic modelling; transmission modelling; MCMC; resample-move; particle learning; real-time modelling

### Julien Stoehr (Université Montpellier 2): ABC model choice between hidden Gibbs random fields based on geometric summary statistics

December 2, 2013

Gibbs random fields are polymorphous statistical models that are useful for analysing different types of spatially correlated data, such as the pixels of an image. But selecting between two different dependence structures can be very challenging, due to the intractable normalizing constant in the Gibbs likelihoods. Approximate Bayesian Computation (ABC) algorithms provide a model choice method in the Bayesian paradigm. The scheme compares the observed data and many numerical simulations through summary statistics. The consistency of the ABC algorithm is obvious if the summary statistics are sufficient. Otherwise, the summary statistics have to be chosen carefully [2, 3]. When the Gibbs random field is directly observed, Grelaud et al. [1] exhibited sufficient summary statistics. But when the random field is hidden, those statistics are no longer sufficient. We provide new summary statistics based on pixel clusters, and assess their efficiency for an ABC model choice between hidden random field models.

**Keywords :** Approximate Bayesian Computation, model choice, hidden Gibbs random fields, summary statistics

*This is joint work with Pierre Pudlo and Lionel Cucala*

**References**

[1] A. Grelaud, C. P. Robert, J-M. Marin, F. Rodolphe and J-F. Taly. ABC likelihood-free methods for model choice in Gibbs random fields. *Bayesian Analysis*, 4(2):317–336, 2009.

[2] C. P. Robert, J-M. Cornuet, J-M. Marin and N. S. Pillai. Lack of confidence in approximate Bayesian computation model choice. *Proceedings of the National Academy of Sciences of the United States of America*, 108(37):15112–15117, 2011.

[3] J-M. Marin, N. S. Pillai, C. P. Robert and J. Rousseau. Relevant statistics for Bayesian model choice. To appear in the *Journal of the Royal Statistical Society, Series B*, 2014.

### Matthew Parno (MIT): Using multiscale structure and transport maps in MCMC for high-dimensional inverse problems

December 2, 2013

In many inverse problems, the forward model “smooths” or filters the input parameters. As a result, observations of the model output can only inform certain functionals of the parameters, while complementary parts of the posterior are dominated by the prior. In high-dimensional problems, discovering this structure through direct application of MCMC can be computationally intractable.

We propose a multiscale decomposition of the inference problem that takes advantage of multiscale methods for simulating the systems under consideration—including, but not limited to, multiscale methods for the solution of certain PDEs. Multiscale methods can be interpreted as a means of identifying conditional independence structure, such that parameters of interest are conditionally independent of the observations given some intermediate coarse-scale quantities. Inference that exploits this structure, particularly when the relationship between scales is nonlinear, requires new approaches. Using tools from optimal transportation, we develop an approach for extracting a prior distribution on these coarse scale quantities (as a pushforward of the original prior) and for conditionally sampling the original parameters given the coarse-scale quantities. The resulting scheme couples conditional sampling of the parameters to a low-dimensional inference problem where typical MCMC methods can be applied. We illustrate our approach on problems having between 2 and 10000 parameters. Also, we discuss the relationship between this approach and other “subspace MCMC” methods.

**Keywords:** Bayesian Inference, Multiscale, Optimal Transport, Reduced order modeling

*This is joint work with Youssef Marzouk and Tarek Moselhy.*

### André Martins (Besançon Observatory – Institut UTINAM): Constraining the Milky Way formation and evolution with MCMC/ABC method

December 2, 2013

The Besançon galaxy model (Robin et al. 2003) is an important tool in astronomy. The model simulates the stellar content of the Galaxy, assuming a scenario of formation and evolution. It is assumed that stars belong to 5 different populations: the thin disc, thick disc, halo, bulge and bar. The model is dynamically self-consistent, which gives it an advantage compared to other stellar population synthesis models. Among other applications, it can be used for data interpretation and to test different scenarios of galaxy formation and evolution. The Sloan Extension for Galactic Understanding and Exploration (SEGUE), one of the four surveys of the Sloan Digital Sky Survey (SDSS), covers a part of the sky and measures physical properties of stars in order to trace the structure, kinematics, and chemical evolution of the outer Milky Way disk and halo. The spectroscopic data include, for each star, temperature, gravity, metallicity (chemical abundance) and radial velocity in different directions close to the Galactic plane. We compare simulations from the Besançon galaxy model with spectroscopic data, specifically metallicity, in order to constrain the radial metallicity gradient of the Milky Way. We use an MCMC-ABC method to fit the radial metallicity gradient in the thick disc, younger thin disc, older thin disc and their respective dispersions in metallicity. We will extend our observational data to other directions by adding a new survey in the infrared, the Apache Point Observatory Galactic Evolution Experiment (APOGEE), so that we will be able to constrain the vertical metallicity gradient along with the radial metallicity gradient, using the same method.

### Yuting Chen (Ecole Centrale Paris): A comparison of Sequential Monte Carlo techniques for parameter estimation in a plant growth model

December 1, 2013

Plant models have specific characteristics that make their parameter estimation difficult. Models are usually strongly nonlinear with a strong mechanistic basis, they involve many unknown parameters, and often only scarce experimental data are available for model inference. Because of their mechanistic basis, there is generally some available knowledge on the biological processes at stake (owing to previous literature studies on the same processes) and it is possible to deduce some a priori distribution on the model parameters. This fact, together with the limited number of data, makes the Bayesian choice generally interesting for parameter estimation in plant growth models. In this study, we compare two approaches for Bayesian inference that are adapted to the characteristics of plant models. We first consider an MCMC algorithm implemented with Adaptive Metropolis and second, the Convolution Particle Filter (Rossi and Vila, 2006; Campillo and Rossi, 2009), which is inspired by the post-regularized particle filter (Musso and Oudjane, 1998). The performance of these two approaches is tested for the LNAS dynamic model of plant growth (Cournède et al., 2013), formalized as a general state space hidden Markov model (Cappé et al., 2005). We show that both methods perform well with virtual data; however, in realistic scenarios with sparse observations, the filtering method appears more consistent. New implementations of multiple interacting MCMC techniques are thus tested to improve the performance of the MCMC approach (Campillo et al., 2009).

*Joint work with Samis Trevezas, Sonia Malefaki, Paul-Henry Cournède.*

### Kaniav Kamari (Université Paris-Dauphine): A discussion on Bayesian analysis : Applying Noninformative Priors

December 1, 2013

In this paper, we discuss Bayesian statistics, which is concerned with generating the posterior distribution of the unknown parameters given both the data and some prior density for the parameters. It is of interest to show that Bayesian data analysis is stable relative to different noninformative choices of prior distribution. As cited in [1], there is unintended influence on the posterior for a function of interest caused by noninformative priors on the parameters and, according to these authors, this is a problem that “is often ignored in applications of Bayesian inference”. Those authors focussed on four examples and analysed them for a particular choice of prior. Here, we consider these examples and discuss the Bayesian processing of their model assuming different prior choices. The results of our analysis lead us to infer stability in the posterior distribution of all parameters. So, the effect of the chosen noninformative prior is essentially void, even though there is no such thing as a “noninformative” prior.

### Simon Lacoste-Julien (INRIA, ENS, Paris): Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

December 1, 2013

Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a RKHS with a potentially faster rate of convergence than Monte Carlo integration (and “kernel herding” was shown to be a special case of this procedure). In this work, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. By optimizing the position of the particles, we can obtain better accuracy than random or quasi-Monte Carlo sampling. In applications where the evaluation of the emission probabilities is expensive (such as in robot localization), the additional computational cost to generate the particles through optimization can be justified. Experiments on standard synthetic examples as well as on a robot localization task indeed indicate an improvement of accuracy over random and quasi-Monte Carlo sampling.

*Joint work with Fredrik Lindsten and Francis Bach.*
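As a rough illustration of the kernel herding special case mentioned in this abstract, the sketch below greedily selects points from a finite candidate set so that their empirical kernel mean tracks that of a set of target samples. It is plain herding (Frank-Wolfe with step size 1/(t+1)), not the sequential particle-filter version proposed by the authors; the Gaussian kernel, the bandwidth, the candidate grid and the function names are assumptions.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def kernel_herding(samples, candidates, n_points):
    """Greedily pick points whose empirical kernel mean tracks that of `samples`."""
    # Mean embedding of the target, evaluated at each candidate location.
    mu = {c: sum(gaussian_kernel(c, s) for s in samples) / len(samples)
          for c in candidates}
    chosen = []
    for t in range(n_points):
        def score(c):
            # Penalise candidates that are close to points already selected.
            penalty = sum(gaussian_kernel(c, x) for x in chosen) / (t + 1)
            return mu[c] - penalty
        chosen.append(max(candidates, key=score))
    return chosen

# Hypothetical toy target: five equally weighted points, also used as candidates.
points = [-2, -1, 0, 1, 2]
herd = kernel_herding(points, points, 3)
print(herd)
```

The first point lands at the mode of the mean embedding, and later points are pushed away from earlier ones, which is what yields a more even coverage than independent random draws.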

### Marian Farah (MRC Biostatistics Unit, Cambridge, UK): A Strategy for Calibrating Time Series Epidemic Simulators

November 30, 2013

*Keywords*: Multivariate time series, Emulation, Calibration, Gaussian process.

### Alain Durmus (Telecom ParisTech): New bounds for the sub-geometric convergence of Markov Chains in Wasserstein metric and application to the preconditioned Crank-Nicolson algorithm

November 30, 2013

In [1], the authors generalize the conditions of the well-known Harris theorem for the convergence of Markov chains in the total variation (TV) norm to obtain geometric ergodicity in Wasserstein distance. [2] takes inspiration from [1] and [3] to establish sub-geometric ergodicity in Wasserstein metric, but does not succeed in obtaining rates as good as in [2] for the TV. We will present an improvement of these rates, which lie between the ones in [2] and [3]. This result is established by a more probabilistic reasoning than in [1] and [2], which use more analytic techniques. We apply it to the non-linear autoregressive model and to the function space MCMC algorithm known as the preconditioned Crank-Nicolson algorithm. In particular, for the latter, we show that a simple Hölder condition on the log-density implies the sub-geometric ergodicity of the Markov chain produced by the algorithm in a Wasserstein distance, generalizing some conditions given in [4] for geometric ergodicity.

*Joint work with G. Fort and E. Moulines*

**References :**

[1] M. Hairer, J.C. Mattingly, and M. Scheutzow. Asymptotic coupling and a general form of Harris’ theorem with applications to stochastic delay equations. *Probability Theory and Related Fields*, 149(1-2):223–259, 2011.

[2] Oleg Butkovsky. Subgeometric rates of convergence of Markov processes in the Wasserstein metric. *Annals of Applied Probability*, 2012.

[3] Randal Douc, Gersende Fort, Éric Moulines, and Philippe Soulier. Practical drift conditions for subgeometric rates of convergence. *Ann. Appl. Probab.*, 14(3):1353–1377, 2004.

[4] M. Hairer, A.M. Stuart, and S.J. Vollmer. Spectral gaps for Metropolis-Hastings algorithms in infinite dimensions.

**Keywords**: sub-geometric ergodicity, Markov Chain, preconditioned Crank-Nicolson algorithm.
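For readers unfamiliar with the preconditioned Crank-Nicolson (pCN) sampler discussed in this abstract, the following is a minimal one-dimensional sketch, assuming a standard normal reference (prior) measure and a target proportional to exp(−Φ(x)) times that prior. The potential, the step parameter `beta` and the function name are illustrative assumptions; the function-space setting of the abstract is not reproduced.

```python
import math
import random

def pcn_chain(phi, n_steps, beta, x0, rng):
    """pCN sampler for a target proportional to exp(-phi(x)) * N(x; 0, 1)."""
    x, accepted, chain = x0, 0, []
    root = math.sqrt(1.0 - beta ** 2)
    for _ in range(n_steps):
        # The proposal leaves the N(0, 1) reference measure invariant.
        proposal = root * x + beta * rng.gauss(0.0, 1.0)
        # Only the potential enters the acceptance ratio.
        log_alpha = phi(x) - phi(proposal)
        # 1 - random() lies in (0, 1], which avoids log(0).
        if math.log(1.0 - rng.random()) <= log_alpha:
            x = proposal
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps

rng = random.Random(1)
# Hypothetical potential: combined with the N(0, 1) prior it gives a N(0.5, 0.5) target.
chain, rate = pcn_chain(lambda x: 0.5 * (x - 1.0) ** 2, 20000, 0.5, 0.0, rng)
# With a zero potential the target is the prior and every proposal is accepted.
_, flat_rate = pcn_chain(lambda x: 0.0, 1000, 0.5, 0.0, rng)
print(sum(chain) / len(chain), rate, flat_rate)
```

Because the acceptance ratio involves only Φ, the scheme stays well defined as the dimension grows, which is why it is described as a function space algorithm.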

### Doudou Tang (Durham University): Hamiltonian Monte Carlo with Local Stochastic Step-Size

November 29, 2013

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that tries to avoid the slow exploration of the state space caused by random walk behaviour and correlated parameters by taking several steps according to gradient information of the target distribution. This distinguishing feature makes HMC converge more quickly than traditional MCMC algorithms like Metropolis random walk. HMC’s performance, however, is very sensitive to one parameter, the step-size $\varepsilon$ used in approximating Hamiltonian dynamics, which the user must specify. In particular, too large a step-size leads to a low acceptance rate and a chain which gets stuck, while too small a step-size causes random walk behaviour and slow exploration. We find that the upper boundary of a suitable step-size is related to the gradient information and curvature of target distributions, and thus such a boundary might change across the state space. Therefore, we propose a device which uses a stochastic step-size for HMC. This allows the step-size to take small values to get out of sticky points and large values to avoid small MCMC steps. The proposed method exploits the geometric structure of the log-posterior for a statistical model to generate the step-size, and thus the step-size automatically adapts to the local structure at each MCMC iteration according to the parameter value.

*Keywords:* MCMC, Hamiltonian Monte Carlo, stochastic step-size
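The sensitivity to the step-size described above can be seen in a minimal sketch of standard HMC with a fixed $\varepsilon$, shown here for a one-dimensional standard normal target. This is the baseline algorithm, not the local stochastic step-size method of the abstract, and the target, the step-sizes and the function names are assumptions chosen for illustration.

```python
import math
import random

def leapfrog(q, p, grad_u, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with unit mass."""
    p -= 0.5 * eps * grad_u(q)            # initial half step for the momentum
    for i in range(n_steps):
        q += eps * p                      # full step for the position
        if i < n_steps - 1:
            p -= eps * grad_u(q)          # full momentum step, except after the last move
    p -= 0.5 * eps * grad_u(q)            # final half step for the momentum
    return q, p

def hmc(u, grad_u, q0, eps, n_leapfrog, n_samples, rng):
    q, samples, accepted = q0, [], 0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)           # resample the momentum
        q_new, p_new = leapfrog(q, p, grad_u, eps, n_leapfrog)
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        if math.log(1.0 - rng.random()) <= h_old - h_new:   # Metropolis correction
            q, accepted = q_new, accepted + 1
        samples.append(q)
    return samples, accepted / n_samples

def u(q):
    return 0.5 * q * q                    # negative log-density of N(0, 1), up to a constant

def grad_u(q):
    return q

rng = random.Random(2)
samples, rate_small = hmc(u, grad_u, 0.0, 0.2, 10, 5000, rng)   # stable step-size
_, rate_large = hmc(u, grad_u, 0.0, 2.5, 10, 5000, rng)         # step-size beyond the stability limit
print(rate_small, rate_large)
```

For this target the leapfrog scheme becomes unstable once $\varepsilon$ exceeds 2, so the second run rejects almost every proposal, while the first accepts nearly all of them.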

### Elodie Vernet (Université Paris Sud): Posterior consistency for nonparametric hidden Markov models with finite state space

November 28, 2013

Hidden Markov models (HMMs) have been widely used in diverse fields such as speech recognition, genomics and econometrics. Because parametric modeling of emission distributions may lead to poor results in practice, in particular for clustering purposes, interest in nonparametric HMMs has recently appeared in applications; see Yau, Papaspiliopoulos, Roberts and Holmes (2011). Here we study posterior consistency for different topologies on the parameters for nonparametric hidden Markov models with finite state space. We first obtain weak and strong posterior consistency for the marginal density function of finitely many consecutive observations and deduce posterior consistency for the different components of the parameter. We finally apply our results to independent emission probabilities, translated emission probabilities and discrete HMMs, and give some priors for which the posterior is consistent.

*Keywords*: Bayesian nonparametrics, consistency, hidden Markov models

link to article: arxiv.org/pdf/1311.3092

### Axel Finke (University of Warwick): Investigation of Exactly Approximated Rao-Blackwellised Particle Filters

November 28, 2013

To improve the efficiency of sequential Monte Carlo (SMC) algorithms, it is desirable to reduce the dimension of the target distribution by analytically integrating out as many components as possible. This leads to so-called Rao-Blackwellised particle filters (RBPFs).

Unfortunately, the necessary integrals can usually not be calculated except in special cases such as conditionally linear/Gaussian hidden Markov models. To deal with more general problems, an exactly approximated Rao-Blackwellised particle filter (EARBF) was proposed by A. M. Johansen, N. Whiteley and A. Doucet (2012).

The EARBF is a pseudo-marginal SMC algorithm in which each particle is associated with its own sub-level SMC algorithm. The latter is designed to approximate the integral that would have to be calculated for a RBPF. This leads to a hierarchical SMC algorithm not unlike the SMC^2 algorithm by N. Chopin, P. E. Jacob and O. Papaspiliopoulos (2012).

We investigate the relationship between the performance of the algorithm and a number of design parameters.

This is joint work with Adam M. Johansen and Dario Spano.

*Keywords:* sequential Monte Carlo, Rao-Blackwellised particle filters, pseudo-marginal approach

### Eric Frichot (Université Joseph Fourier, Grenoble): Testing for associations between loci and environmental gradients using latent factor mixed models

November 28, 2013Local adaptation through natural selection plays a central role in shaping the genetic variation of populations. A way to investigate signatures of local adaptation, especially when beneficial alleles have weak phenotypic effects, is to identify polymorphisms that exhibit high correlation with environmental variables. However the geographical basis of both environmental and genetic variation can confound interpretation of these associations, as they can also result from genetic drift at neutral loci.

Here we propose an integrated framework based on spatial statistics, population genetics and ecological modeling to scan genomic data for signatures of local adaptation. We present a novel class of algorithms to detect correlations between environmental and genetic variation that take into account background levels of population structure and spatial autocorrelation in allele frequencies generated by isolation-by-distance mechanisms. Our framework uses Latent Factor Mixed Models (LFMM), a hierarchical Bayesian mixed model in which environmental variables are fixed effects and population structure is introduced as random effects.

We implement fast algorithms that simultaneously estimate scores and loadings for the genotypic matrix and effects of environmental variables. Comparing these new algorithms with related methods provides evidence that LFMM can efficiently estimate random effects due to population history and isolation-by-distance patterns when computing gene-environment correlations, and decrease the number of false-positive associations in genome scans. We then apply these models to plant and human genetic data, identifying several genes with functions related to development that exhibit strong correlations with climatic gradients.

*Keywords:* mixed models, population genetics, local adaptation.

The link to the paper is:

http://mbe.oxfordjournals.org/content/30/7/1687.full

### Nicolas Duforet-Frebourg (UJF, Grenoble): Genome scans for local adaptation: a Bayesian factor model

November 28, 2013

We present a new Bayesian hierarchical model based on factor models for detecting outliers in high-dimensional data. Outliers are explicitly modeled using a variance inflation approach. The degree of outlyingness for each variable is measured using Bayes factors. In population genetics where many genetic markers are typed in different populations, we show that factor models can be used to map genomic regions involved in Darwinian adaptation. We provide a comparison with common approaches of genome scans and show that factor models reduce the false discovery rate.

Keywords: Factor models, outlier detection, population genetics, local adaptation

Joint work with Michael G.B. Blum and Eric Bazin

### Amandine Schreck (Télécom ParisTech): A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection

November 28, 2013

We consider the long-standing problem of Bayesian variable selection in a linear regression model. Variable selection is a complicated task in high-dimensional settings where the number of regression parameters P is much larger than the number of observations N. In this context, it is crucial to introduce sparsity assumptions based on the prior knowledge that only a small number of regression parameters are significant. Using a sequence of observations from a linear regression model, the aims are (i) to determine which components of the regression vector are active and explain the observations and (ii) to estimate the regression vector.

In this work, we introduce a new MCMC algorithm, called Shrinkage-Thresholding MALA (STMALA), designed to sample sparse regression vectors by jointly sampling a model and a regression vector in this model. This algorithm, which is a transdimensional MCMC method, relies on MALA (see [Roberts and Tweedie, 1996]). The proposal distribution of MALA is based on the computation of the gradient of the logarithm of the target distribution. In order to both deal with a non-differentiable target posterior distribution and to actually set some components to zero, we propose to combine MALA with a shrinkage-thresholding operator:

- we first compute a noisy gradient step involving the term of the logarithm of the target distribution which is continuously differentiable;
- then a shrinkage-thresholding operator is applied to ensure sparsity and shrink small values of the regression parameters toward zero.
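The two steps above can be sketched in a few lines of Python. This is only an illustration of the proposal mechanism, not the authors' implementation: the soft-thresholding operator, the function names, the step size `sigma` and the threshold `gamma` are all assumptions made for the example, and the Metropolis-Hastings acceptance step needed for a valid sampler is omitted.

```python
import random


def soft_threshold(value, gamma):
    """Shrink value toward zero by gamma; return exactly 0.0 if |value| <= gamma."""
    if value > gamma:
        return value - gamma
    if value < -gamma:
        return value + gamma
    return 0.0


def shrinkage_thresholding_proposal(beta, grad_smooth, sigma, gamma, rng):
    """Draw one proposal: noisy gradient step, then componentwise thresholding.

    beta        -- current regression vector (list of floats)
    grad_smooth -- hypothetical user-supplied callable returning the gradient
                   of the smooth part of the log target at beta
    sigma       -- step-size parameter (assumed name)
    gamma       -- threshold level (assumed name)
    """
    grad = grad_smooth(beta)
    proposal = []
    for b, g in zip(beta, grad):
        # Langevin-type move: drift along the gradient plus Gaussian noise.
        noisy = b + 0.5 * sigma ** 2 * g + sigma * rng.gauss(0.0, 1.0)
        # Thresholding sets small components exactly to zero.
        proposal.append(soft_threshold(noisy, gamma))
    return proposal


if __name__ == "__main__":
    # Toy smooth part of the log target: -0.5 * ||beta - target||^2.
    target = [2.0, 0.0, 0.0, -1.5]
    grad = lambda beta: [t - b for b, t in zip(beta, target)]
    rng = random.Random(0)
    print(shrinkage_thresholding_proposal([0.0] * 4, grad, 0.3, 0.2, rng))
```

Because the thresholding can zero out or reactivate several components in a single draw, a proposal of this kind can change the selected model by more than one variable at a time.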

Such an algorithm is motivated by Bayesian variable selection with non-smooth priors. This algorithm can perform global moves from one model to a rather distant one, which allows high-dimensional spaces to be explored more efficiently than with local-move algorithms such as reversible jump MCMC (RJMCMC; see [Green, 1995]). The geometric ergodicity of this new algorithm is proved for a large class of target distributions.

Joint work with Gersende Fort, Sylvain Le Corff and Éric Moulines.

References:

– P. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, *Biometrika* 82(4) 1995.

– G.O. Roberts, R.L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations, *Bernoulli* 2(4) 1996.

Keywords: Markov Chain Monte Carlo, Proximal operators, Bayesian variable selection.

### Patrick Conrad (MIT): Asymptotically Exact MCMC Algorithms for Computationally Expensive Models via Local Approximations

November 28, 2013

We construct a new framework for accelerating MCMC algorithms for sampling from posterior distributions in the context of computationally intensive models. We proceed by constructing local surrogates of the forward model within the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Our work builds upon previous work in surrogate-based inference by exploiting useful convergence characteristics of local surrogates. We prove the ergodicity of our approximate Markov chain and show that asymptotically it samples from the exact posterior density of interest. We describe variations of the algorithm that construct either local polynomial approximations or Gaussian process regressors, thus spanning two important classes of surrogate models. Numerical experiments demonstrate significant reductions in the number of forward model evaluations used in representative ODE or PDE inference problems, in both real and synthetic data examples.

This is joint work with Youssef Marzouk, Natesh Pillai, and Aaron Smith.

### Väinö Jääskinen (University of Helsinki): Sparse Markov Chains for Sequence Data

November 26, 2013

Finite memory sources and variable-length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree-based approaches. This can lead to a substantially higher rate of data compression, and such non-hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.

*Joint work with Jie Xiong, Jukka Corander, and Timo Koski.*

**Keywords:** Bayesian learning, data compression, predictive inference, Markov chains, variable order Markov models

### Daniel Rudolf (Friedrich Schiller University Jena): On the hybrid slice sampler

November 25, 2013

It is known that the simple slice sampler has very robust convergence properties. In contrast, we consider hybrid slice samplers, where another Markov chain approximately samples the uniform distribution on each slice. Under appropriate assumptions on the Markov chain on the slice, we show lower and upper bounds on the spectral gap of the hybrid slice sampler in terms of the spectral gap of the simple slice sampler.
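To make the distinction concrete, here is a minimal Python sketch of one possible hybrid slice sampler. It assumes a univariate unnormalised density and uses a uniform random-walk move as the Markov chain on the slice; the function names, the step size and the toy Gaussian target are illustrative choices, not taken from the abstract.

```python
import math
import random


def hybrid_slice_step(x, density, step, rng, inner_steps=1):
    """One transition of a hybrid slice sampler for an unnormalised density.

    1. Draw a level t uniformly below density(x); the slice is {y: density(y) > t}.
    2. Instead of sampling the slice exactly, run a few random-walk moves that
       leave the uniform distribution on the slice invariant.
    """
    t = rng.uniform(0.0, density(x))
    for _ in range(inner_steps):
        y = x + rng.uniform(-step, step)   # symmetric proposal
        if density(y) > t:                 # accept only if y lies on the slice
            x = y
    return x


def run_chain(x0, density, n, step=1.0, seed=0):
    """Run n hybrid slice transitions starting from x0 and return the draws."""
    rng = random.Random(seed)
    xs, x = [], x0
    for _ in range(n):
        x = hybrid_slice_step(x, density, step, rng)
        xs.append(x)
    return xs


if __name__ == "__main__":
    gauss = lambda x: math.exp(-0.5 * x * x)   # toy target: standard normal
    draws = run_chain(0.0, gauss, 20000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(round(mean, 2), round(var, 2))
```

A simple slice sampler would replace the inner loop with an exact uniform draw from the slice. Increasing `inner_steps` moves the hybrid version closer to that ideal at extra cost per transition.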

**Keywords:** hybrid Markov chains, slice sampler, spectral gap

### Jonathan Heydari (Newcastle University): Bayesian hierarchical modelling for inferring genetic interactions

November 24, 2013

### Johan Pensar (Åbo Akademi University): Bayesian learning of Labeled Directed Acyclic Graphs using non-reversible MCMC

November 22, 2013

An LDAG is a directed acyclic graph for which labels have been added to the edges. The labels represent certain local statements of context-specific independence. A probabilistic graphical model based on an LDAG has thereby a more refined dependence structure than a traditional Bayesian network. Efficient Bayesian learning of LDAGs is enabled by introducing an LDAG-based factorization of the Dirichlet prior for the model parameters such that the marginal likelihood can be calculated analytically. In addition to the marginal likelihood, we introduce a novel prior distribution that can appropriately penalize a model for its labeling complexity. To search the model space for high-posterior models, we combine a non-reversible MCMC algorithm with a greedy hill-climbing approach.

*Joint work with Henrik Nyman, Timo Koski, and Jukka Corander.*

### Adrien Todeschini (INRIA Bordeaux): BiiPS: A software for inference in Bayesian graphical models with sequential Monte Carlo methods

November 18, 2013

The main factor in the success of Markov chain Monte Carlo methods is that they can be implemented with little effort in a large variety of settings. Many software packages, such as BUGS and JAGS, have been developed that helped to popularize Bayesian methods. These packages allow users to define a statistical model in the so-called BUGS language and then run MCMC algorithms as a black box.

Although sequential Monte Carlo methods have become a very popular class of numerical methods over the last 20 years, there is no such “black box software” for this class of methods. The BiiPS software, which stands for Bayesian Inference with Interacting Particle Systems, aims at bridging this gap. From a graphical model defined in the BUGS language, it automatically implements sequential Monte Carlo algorithms and provides summaries of the posterior distributions. In this poster, we will highlight some of the features of the BiiPS software and illustrate its application to a stochastic volatility model.

### Luca Martino (University of Helsinki): An iterated batch importance sampler driven by an MCMC algorithm

November 15, 2013

Monte Carlo (MC) methods are widely used for statistical inference and stochastic optimization. A well-known class of MC methods is composed of importance sampling (IS) and its adaptive extensions (e.g., population Monte Carlo). In this work, we introduce an iterated batch importance sampler where the proposal is changed following a random walk generated by an MCMC technique. The novel algorithm provides a global estimation of the variables of interest iteratively, using all the generated (weighted) samples.

Compared with a traditional IS scheme using the same number of samples, the performance is strongly improved while the computational cost increases only slightly, owing to the acceptance step of the MCMC method (accepting or rejecting the move). Clearly, the dependence on the choice of the proposal is considerably reduced. The novel algorithm can be considered an IS scheme with a population of proposals automatically selected by the MCMC technique. Numerical results show the advantages of the proposed sampling scheme in terms of mean absolute error.

### Johan Dahlin (Linköping University): Particle Metropolis-Hastings using second-order proposals

November 4, 2013

We propose an improved proposal distribution in the Particle Metropolis-Hastings (PMH) algorithm for Bayesian parameter inference in nonlinear state space models. This proposal incorporates second-order information about the posterior distribution over the system parameters, which can be extracted from the particle filter used in the PMH algorithm. This makes the algorithm scale-invariant and simpler to calibrate, and shortens the burn-in phase. We also suggest improvements that reduce the computational complexity of our earlier first-order method. The complexity of the previous method is quadratic in the number of particles, whereas the new second-order method is linear.

Joint work of Johan Dahlin (Linköping University), Fredrik Lindsten (Linköping University), and Thomas B. Schön (Uppsala University)

### Kengo Kamatani (Osaka University): Local consistency of Markov chain Monte Carlo with some applications

October 29, 2013

In this poster, we study the notion of efficiency (consistency) and examine some asymptotic properties of Markov chain Monte Carlo methods. We apply these results to some data augmentation procedures for independent and identically distributed observations.

The advantages of using the local properties are the simplicity and the generality of the results. The local properties provide useful insight into the problem of how to construct efficient algorithms.

**Keywords:** Asymptotic Normality, Ergodicity, Data augmentation

### Luca Martino (Universidad Carlos III de Madrid): Adaptive sticky Metropolis

October 25, 2013

In order to draw from univariate target densities, several adaptive Monte Carlo techniques have been developed in recent decades that adaptively approximate the target distribution. The best-known methodologies in this area are probably adaptive rejection sampling (ARS) and adaptive rejection Metropolis sampling (ARMS). However, many other algorithms in which the proposal density is built using an interpolation procedure have been introduced in the literature. In this work, we introduce a novel and simple adaptive Metropolis-Hastings algorithm based on this idea. Moreover, the algorithm is strengthened by a step that controls the evolution of the set of support points. This extra stage reduces the computational cost and accelerates the convergence of the proposal distribution to the target. The performance of this novel approach is illustrated through numerical examples, and some extensions are discussed.

### Alberto Caimo (USI Lugano): Bayesian modeling of network heterogeneity

October 23, 2013

With respect to the available statistical models for cross-sectional network data, one may roughly distinguish between two strands: (a) models in which the existence of an edge depends on nodal random effects, and (b) models in which the existence of an edge also depends on the local network structure. Strand (a) is known as p1 and p2 models, and strand (b) is based on exponential random graph models (ERGMs). We present a comprehensive inferential framework for Bayesian ERGMs with nodal random effects in order to account for both global dependence structure and network heterogeneity. Parameter inference and model selection procedures are based on the use of an approximate exchange algorithm and its trans-dimensional extension.

**Keywords:** social network analysis, network heterogeneity, exponential random graphs, exchange algorithm.

*Joint work with Stephanie Thiemichen (LMU Munich), Goeran Kauermann (LMU Munich) and Nial Friel (UCD Dublin).*

### Sofia Tsepletidou (Université Paris-Dauphine): Computational Bayesian Tools for Modeling the Aging Process in Escherichia coli

October 16, 2013

This research studies the aging process in the bacterium *E. coli* in a Bayesian framework. Appropriately modeling this process, by reconstructing a hidden quantity that explains the physiological characteristics of each cell in the lineage tree, is the first step towards estimation. The last step is made possible by exploring the posterior distribution of the constructed model. To this end, an Approximate Bayesian Computation methodology was considered first. Later, Markov chain Monte Carlo methods, specifically a Gibbs sampler, were also applied. Finally, the results of each approach are discussed, as well as possible extensions.

### Weixuan Zhu (Universidad Carlos III de Madrid): Bootstrap Likelihood and Bayesian Computations

October 16, 2013

During the past ten years, a new class of algorithms has been proposed in the literature for addressing problems with an intractable likelihood. They are customarily called Approximate Bayesian Computation (ABC) methods or likelihood-free techniques. Unfortunately, ABC methods suffer from some calibration problems. Recently, Mengersen, Pudlo and Robert (2013) proposed an interesting alternative approach, based on empirical likelihood, which bypasses some of the typical problems of such algorithms. In a similar vein, in this work we propose an approach based on the Bootstrap Likelihood or, alternatively, the Bootstrap Likelihood representation of the empirical likelihood. There are some benefits in using this approach. Specifically, it is computationally faster and there are fewer parameters to set. The effectiveness of the method is tested on time series and bioinformatics models.

**References:**

Mengersen, K., Pudlo, P., and Robert, C.P. (2013). Bayesian computation via empirical likelihood. *Proceedings of the National Academy of Sciences* 110 (4), 1321–1326.

Davison, A.C., Hinkley, D.V. (1992). Bootstrap likelihoods. *Biometrika* 79 (1), 113–130.

Joint work with Juan Miguel Marin (Universidad Carlos III de Madrid) and Fabrizio Leisen (University of Kent)

### Samuel Wong (University of Florida): Sequential Monte Carlo methods in protein folding

October 16, 2013

Predicting the native structure of a protein from its amino acid sequence is a long-standing problem. A significant bottleneck of computational prediction is the lack of efficient sampling algorithms to explore the configuration space of a protein. In this talk we will introduce a sequential Monte Carlo method to address this challenge: fragment regrowth via energy-guided sequential sampling (FRESS). The FRESS algorithm combines statistical learning (namely, learning from the protein data bank) with sequential sampling to guide the computation, resulting in a fast and effective exploration of the configurations. We will illustrate the FRESS algorithm with both a lattice protein model and real proteins.

Joint work with Kevin Bartz, Samuel Kou, Jun Liu, Jinfeng Zhang

### Kasia Wolny (University of Warwick): Robust Metropolis-adjusted Langevin algorithms

October 16, 2013

The choice of an optimal scaling is a common concern when applying the Metropolis-Hastings algorithm. We address this problem by suggesting a proposal distribution with state-dependent variance. The new algorithm is a generalization of the Metropolis-adjusted Langevin algorithm (MALA) and it employs the local curvature of the target distribution $\pi$. Therefore we call it Curvature MALA (CMALA). The motivation for our choice of proposal distribution can be described as follows. Informally speaking, if the target distribution is steep at the current state $x$, we suggest a distribution which favours small moves, whereas big leaps are preferred when the chain explores flat parts of $\pi$. For one-dimensional distributions the proposals of CMALA follow a normal distribution $N(\mu_x, \sigma_x^2)$ with $\sigma_x^2 = h k_x^2$ and

$$\mu_x = x + h\left[\frac{1}{2} k_x^2 \frac{\partial}{\partial x}\log\pi(x) + \gamma k_x \frac{\partial}{\partial x} k_x\right], \quad \text{where } k_x = \left|\frac{\partial^2}{\partial x^2}\log\pi(x)\right|^{-\frac{1}{2}}, \quad \gamma = 1,$$

and $h$ is a time discretization step of the Langevin diffusion. Our main interest lies in showing when CMALA and netCMALA (with $\gamma = 0$) are geometrically ergodic (GE). We obtain that under the regularity conditions

(R0) $\pi(x) \propto \exp\{-|x|^{\beta_+}\}$ for $x > M$, where $M < \infty$, and $\pi(x) \propto \exp\{-|x|^{\beta_-}\}$ for $x < -M$,

(R1) $\pi(x) > 0 \ \forall x \in \mathbb{R}$, (R2*) $\pi \in C^2$, (R3) $\frac{\partial^2}{\partial x^2}\log\pi(x) \neq 0 \ \forall x \in \mathbb{R}$

and if

(C1.1) $\min\{\beta_+, \beta_-\} > 1$, (C2) $h > 0$ is sufficiently small

hold then netCMALA is well-defined and GE. If all above conditions are satisfied and in addition

(R2) $\pi \in C^3$

then also CMALA is well-defined and GE. Suppose (R0), (R1), (R2*) and (R3) hold. If also

(C1.2) $\beta_+ \neq \beta_-$, $\min\{\beta_+, \beta_-\} \in (0,1)$

then netCMALA is well-defined but not GE. If in addition (R2) holds then also CMALA is well-defined but not GE. Furthermore, we show similar results for CMALA-based algorithms adopting a randomized $h$.
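As a rough illustration of the proposal defined above, the following Python sketch evaluates the proposal mean and variance and performs one Metropolis-Hastings step. The toy target $\log\pi(x) = -x^2/2 - x^4/4$ is an assumption chosen here because its second derivative never vanishes; the function names are hypothetical, and the derivatives are coded by hand for this target only.

```python
import math
import random


def log_pi(x):
    # Toy target (an assumption for illustration): log pi(x) = -x^2/2 - x^4/4.
    return -0.5 * x ** 2 - 0.25 * x ** 4


def d_log_pi(x):
    return -x - x ** 3


def d2_log_pi(x):
    return -1.0 - 3.0 * x ** 2      # never zero, so k_x is well defined


def k(x):
    return abs(d2_log_pi(x)) ** -0.5


def dk(x):
    # Derivative of k(x) = (1 + 3x^2)^(-1/2) for this particular target.
    return -3.0 * x * (1.0 + 3.0 * x ** 2) ** -1.5


def proposal_params(x, h, gamma=1.0):
    """Mean and variance of the proposal; gamma=0 drops the k_x * k_x' term."""
    mean = x + h * (0.5 * k(x) ** 2 * d_log_pi(x) + gamma * k(x) * dk(x))
    var = h * k(x) ** 2
    return mean, var


def log_normal_pdf(y, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)


def cmala_step(x, h, rng, gamma=1.0):
    """One Metropolis-Hastings step with the state-dependent proposal."""
    m_x, v_x = proposal_params(x, h, gamma)
    y = rng.gauss(m_x, math.sqrt(v_x))
    m_y, v_y = proposal_params(y, h, gamma)
    log_ratio = (log_pi(y) + log_normal_pdf(x, m_y, v_y)
                 - log_pi(x) - log_normal_pdf(y, m_x, v_x))
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return y if accept else x


if __name__ == "__main__":
    rng = random.Random(0)
    x, draws = 0.0, []
    for _ in range(5000):
        x = cmala_step(x, h=0.5, rng=rng)
        draws.append(x)
    print(sum(draws) / len(draws))
```

Because the proposal variance depends on the state, the acceptance ratio must use the forward and reverse proposal densities with their own means and variances, as in `cmala_step`.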

Joint work with Gareth O. Roberts, Krzysztof Latuszynski

### Veroniká Rockova (Wharton Business School): Sequential EMVS Mode Detection for Posterior Reconstruction

October 16, 2013

EMVS is a fast deterministic approach to identifying sparse high posterior models for Bayesian variable selection under spike-and-slab priors. In large high-dimensional problems where exact full posterior inference must be sacrificed for computational feasibility, sequentially reinitialized deployments of EMVS can be used to find subsets of the highest posterior modes. These EMVS identified modes can then be used as a basis for a truncated reconstruction of the full parameter space posterior. Obtained as a posterior model probability weighted sum of expressions for model coefficient posterior distributions, this reconstruction is a rapidly computable closed form expression that yields fast approximations for model averaging and the median probability model.

Joint work with Edward George

### Samantha Robinson (University of Arkansas): Structural Functional Time Series Models

October 16, 2013

We propose a Bayesian model for functional time series that is based on an extension of the highly successful dynamic linear model to Banach space-valued data, discussing practical issues of discretization and simulation-based inference via MCMC. We discuss extensions of finite dimensional structural time series models to the functional setting, building a very convenient and flexible toolbox of standard functional dynamic linear models. The proposed models, together with the associated MCMC algorithms, are tested on two data sets: the first about electricity demand and the second about interest rates.

Joint work with Giovanni Petris, Samantha Robinson

### Murray Pollock (University of Warwick): Jump Diffusion Barriers

October 16, 2013

In this talk we will discuss recent work extending methodology for simulating finite dimensional representations of jump diffusion sample paths over finite intervals, without discretisation error (*exactly*), in such a way that the sample path can be *restored* at any finite collection of time points. We demonstrate the efficacy of our approach by showing that with finite computation it is possible to determine whether or not sample paths cross various non-standard barriers.

Joint work with Adam Johansen, Gareth Roberts

### Felipe Medina Aguayo (University of Warwick): The Pseudo-Marginal Approach to MCMC

October 16, 2013

The Metropolis-Hastings algorithm has emerged as a useful and powerful simulation tool in many settings across different areas. However, when we do not have an analytic expression for the target density, we may need to appeal to an algorithm on an extended space. This could easily turn into a high-dimensional problem, possibly leading to mixing and convergence complications. The use of importance sampling estimates for the unknown target distribution provides a second alternative. Even though this approximation introduces a bias, a simple modification of this idea will eliminate such error. The MCWM and GIMH algorithms are presented, as well as their respective extensions within a particle MCMC context.
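The difference between the two algorithms can be seen in a small Python sketch. The toy latent-variable model (latent $z \sim N(\theta, 1)$, observation $y \mid z \sim N(z, 1)$), the $N(0, 10^2)$ prior, the plain Monte Carlo likelihood estimator and all names below are assumptions made for illustration; they are not taken from the talk.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def likelihood_estimate(theta, y, n_samples, rng):
    """Unbiased Monte Carlo estimate of p(y | theta) = E[p(y | z)], z ~ N(theta, 1)."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(theta, 1.0)
        total += normal_pdf(y, z, 1.0)
    return total / n_samples


def pseudo_marginal_chain(y, n_iter, n_samples=10, step=1.0,
                          refresh_current=False, seed=0):
    """Random-walk Metropolis-Hastings on theta using estimated target values.

    refresh_current=False: the estimate at the current state is stored and
        reused (GIMH-style), which keeps the exact posterior invariant.
    refresh_current=True: the current estimate is recomputed at every
        iteration (MCWM-style), which is only approximate.
    """
    rng = random.Random(seed)
    prior = lambda t: normal_pdf(t, 0.0, 100.0)   # assumed N(0, 10^2) prior
    theta = 0.0
    target = prior(theta) * likelihood_estimate(theta, y, n_samples, rng)
    chain = []
    for _ in range(n_iter):
        if refresh_current:
            target = prior(theta) * likelihood_estimate(theta, y, n_samples, rng)
        proposal = theta + step * rng.gauss(0.0, 1.0)
        target_prop = prior(proposal) * likelihood_estimate(proposal, y, n_samples, rng)
        # Accept with probability min(1, target_prop / target).
        if rng.random() * target < target_prop:
            theta, target = proposal, target_prop
        chain.append(theta)
    return chain


if __name__ == "__main__":
    gimh = pseudo_marginal_chain(1.0, 10000)
    mcwm = pseudo_marginal_chain(1.0, 10000, refresh_current=True)
    print(sum(gimh) / len(gimh), sum(mcwm) / len(mcwm))
```

The only difference between the two variants is whether the noisy estimate attached to the current state is carried along with the chain or redrawn at each iteration.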

Joint work with Gareth Roberts, Anthony Lee

### Matthias Katzfuss (Texas A&M University): Statistical inference for massive distributed spatial data using low-rank models

October 16, 2013

Due to rapid data growth, it is becoming increasingly infeasible to move massive datasets, and statistical analyses have to be carried out where the data reside. If several datasets stored in separate physical locations are all relevant to a given problem, the challenge is to obtain valid inference based on all data without moving the datasets. This distributed data problem frequently arises in the geophysical and environmental sciences, for example, when the variable of interest is measured by several satellite instruments. We show that for a very widely used class of spatial low-rank models, which contain a component that can be written as a linear combination of spatial basis functions, computationally feasible spatial inference and prediction for massive distributed data can be carried out exactly. The required number of floating-point operations is linear in the number of data points, while the required amount of communication does not depend on the data sizes at all. After discussing several extensions and special cases, we apply our methodology to carry out spatio-temporal particle filtering inference on total precipitable water measured by three different sensor systems.
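A much-simplified Python sketch can show why communication need not grow with the data. It assumes a model with only a low-rank component, $y = B\eta + \varepsilon$ with independent Gaussian noise and a Gaussian prior on the $r$ basis coefficients $\eta$; this setup and all function names are assumptions for illustration, not the authors' method. Each site sends only an $r \times r$ matrix and an $r$-vector.

```python
def site_summary(basis_rows, y, noise_var):
    """Computed locally at one site: B'B / noise_var (r x r) and B'y / noise_var (r)."""
    r = len(basis_rows[0])
    gram = [[0.0] * r for _ in range(r)]
    vec = [0.0] * r
    for row, yi in zip(basis_rows, y):
        for a in range(r):
            vec[a] += row[a] * yi / noise_var
            for b in range(r):
                gram[a][b] += row[a] * row[b] / noise_var
    return gram, vec


def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col:
                f = aug[i][col] / aug[col][col]
                aug[i] = [u - f * v for u, v in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def posterior_mean(summaries, prior_precision):
    """Combine site summaries centrally; only r x r and r-sized objects are needed."""
    r = len(prior_precision)
    precision = [list(row) for row in prior_precision]
    rhs = [0.0] * r
    for gram, vec in summaries:
        for a in range(r):
            rhs[a] += vec[a]
            for b in range(r):
                precision[a][b] += gram[a][b]
    return solve(precision, rhs)


if __name__ == "__main__":
    # Two sites observe y = 1 + x at different locations; basis functions are (1, x).
    site_a = site_summary([[1.0, 0.0], [1.0, 1.0]], [1.0, 2.0], 1.0)
    site_b = site_summary([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]], [3.0, 4.0, 5.0], 1.0)
    print(posterior_mean([site_a, site_b], [[1e-9, 0.0], [0.0, 1e-9]]))
```

In this sketch the cost of `site_summary` is linear in the number of local observations, and the result of `posterior_mean` is identical to what would be obtained by pooling all observations in one place.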

Joint work with Dorit Hammerling

### Gregor Kastner (WU Vienna University of Economics and Business): Analysis of Multivariate Financial Time Series via Bayesian Factor Stochastic Volatility Models

October 16, 2013

In recent years, multivariate factor stochastic volatility (SV) models have been increasingly used to analyze high-dimensional financial and economic time series because they can pick up the joint volatility dynamics by a small number of latent time-varying factors. The main advantage of such a model is its parsimony; all variances and covariances of a time series vector are governed by a low-dimensional common factor with the components following independent SV models. For problems of this kind, MCMC is a very efficient estimation method; nevertheless, it is associated with a considerable computational burden when the number of assets is moderate to large. To overcome this, we avoid the usual forward-filtering backward-sampling (FFBS) algorithm by sampling the latent states “all without a loop” (AWOL), consider various reparameterizations such as (partial) non-centering, and apply an ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation at a univariate level, which can be applied directly to heteroskedasticity estimation for latent variables such as factors. Moreover, we use modern supercomputing architecture for parallel implementation. Our algorithm is designed in a way such that existing software crafted for efficient Bayesian univariate SV estimation can easily be incorporated. Finally, to show the effectiveness of our approach, we apply the model to a vector of daily returns.

Joint work with Sylvia Frühwirth-Schnatter and Hedibert F. Lopes

### Pierre Jacob (National University of Singapore & University of Oxford): Path storage in the particle filter

October 16, 2013

This talk considers the problem of storing all the paths generated by a particle filter. I will present a theoretical result bounding the expected memory cost and an efficient algorithm to realise this. The theoretical result and the algorithm are illustrated with numerical experiments.

Joint work with Lawrence Murray, Sylvain Rubenthaler

### Michael Gutmann (University of Helsinki): Bayesian Optimization for Likelihood-Free Estimation

October 16, 2013

In approximate Bayesian computation, sampled parameters are retained with high probability if the sampled and observed data are “close”. Similarly, in point estimation with indirect inference, a parameter is sought which makes the sampled and observed data close. Unfortunately, the sampled and observed data are rarely close enough, which makes both approaches to estimation computationally costly. In this work, we propose to use Bayesian optimization to increase the computational efficiency in indirect inference. The same principle may be used in ABC.
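The cost issue is easy to see in a basic ABC rejection sampler, sketched below in Python. This is a generic textbook scheme with a hard accept/reject threshold, not the Bayesian optimization approach proposed in the talk; the Gaussian toy model, the Uniform(-5, 5) prior, the sample-mean summary and the function name are assumptions for illustration.

```python
import random


def abc_rejection(observed_mean, n_obs, n_draws, tolerance, seed=0):
    """ABC rejection sampling for the mean of a N(theta, 1) model.

    Assumptions for this toy example: a Uniform(-5, 5) prior, the sample mean
    as summary statistic, and absolute difference as the distance.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)                       # draw from the prior
        sim = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # simulate data
        sim_mean = sum(sim) / n_obs
        if abs(sim_mean - observed_mean) <= tolerance:       # keep if "close"
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    draws = abc_rejection(observed_mean=1.0, n_obs=20, n_draws=5000, tolerance=0.2)
    print(len(draws), sum(draws) / len(draws))
```

Most simulated datasets fall outside the tolerance and are discarded, and shrinking `tolerance` to improve accuracy lowers the acceptance rate further.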

Joint work with Jukka Corander

### Clara Grazian (Università di Roma “La Sapienza” and Université Paris-Dauphine): Approximate Bayesian computation for the elimination of nuisance parameters

October 16, 2013

Recent developments allow Bayesian analysis even when the likelihood function $L(\theta; y)$ is intractable, that is, analytically unavailable or computationally prohibitive to evaluate. These methods are known as “approximate Bayesian computation” (ABC) or likelihood-free methods and are characterized by the fact that the approximation of the posterior distribution is obtained without explicitly evaluating the likelihood function. This kind of analysis is popular in genetic and financial settings. We propose a novel use of the approximate Bayesian methodology for the case where the intractable likelihood arises from the elimination of nuisance parameters. We propose to use ABC to approximate the likelihood function of the parameter of interest. We will present several applications of the methodology.

Joint work with Brunero Liseo