How to choose the “right” summary of the data, and thus how to minimize the information lost in reducing the dimension of the problem from the observed y to s = S(y), is still an open and essential question, especially for Approximate Bayesian Computation (ABC), which relies on the approximation p(θ|s) ≈ p(θ|y).
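To fix ideas, the role of the summary s = S(y) can be illustrated with a minimal ABC rejection sampler (a generic sketch, not the procedure developed here; the toy Gaussian model, tolerance, and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(y_obs, prior_sampler, simulator, summary, eps, n_draws=10_000):
    """Basic ABC rejection sampler: accept theta whenever the summary of
    the simulated data falls within eps of the summary of the observed data."""
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler()
        y_sim = simulator(theta)
        if np.linalg.norm(summary(y_sim) - s_obs) <= eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known variance,
# summarising the data by its sample mean (a sufficient statistic here,
# so no information is lost by the reduction y -> s).
y_obs = rng.normal(loc=2.0, scale=1.0, size=50)
post = abc_rejection(
    y_obs,
    prior_sampler=lambda: rng.uniform(-5, 5),
    simulator=lambda th: rng.normal(loc=th, scale=1.0, size=50),
    summary=lambda y: np.atleast_1d(y.mean()),
    eps=0.1,
)
```

When the summary is not sufficient, the accepted draws target p(θ|s) rather than p(θ|y), which is exactly the loss the abstract refers to.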

It has been shown that achieving sufficiency is not a trivial matter, and it is therefore usual to resort to a collection of hopefully quasi-sufficient statistics, typically problem-dependent, under the condition that this collection lies in a lower-dimensional space so as to avoid the so-called curse of dimensionality.

Following the idea underlying dimension reduction techniques for ABC such as Partial Least Squares (PLS), Wegmann et al. (2009), we derive a procedure to select a sufficient dimension reduction subset u of the vector s by characterizing it as the minimal subset such that the distribution of θ is independent of s given u.

PLS, however, operates only in the original space, and the independence condition in its criterion is in fact replaced with an uncorrelatedness requirement. This simplification is known to fail outside Gaussian models, an assumption easily violated in the complicated models for which ABC is needed. We therefore resort to kernel methods, examining the conditional independence requirement through mean embeddings in a reproducing kernel Hilbert space (RKHS), as in Fukumizu et al. (2008) and Zhang et al. (2012), where linear relations in the embedding space are known to capture non-linear dependencies in the original space.
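The contrast between correlation and kernel-based dependence can be illustrated with the (unconditional) Hilbert-Schmidt Independence Criterion, the basic building block behind such embedding-based measures; the conditional versions used in the cited references are built from analogous covariance operators. A minimal sketch (the Gaussian kernel, bandwidth, and toy data are illustrative assumptions):

```python
import numpy as np

def rbf_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for a one-dimensional sample x."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic(x, y, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering
    matrix. Non-negative, and close to zero when x and y are independent."""
    n = x.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x, bandwidth), rbf_gram(y, bandwidth)
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y_dep = x ** 2 + 0.1 * rng.normal(size=500)  # non-linear dependence, near-zero correlation
y_ind = rng.normal(size=500)                 # genuinely independent of x

# HSIC separates the two cases even though plain correlation would not.
print(hsic(x, y_dep), hsic(x, y_ind))
```

Here x and y_dep are (nearly) uncorrelated yet strongly dependent, which is exactly the failure mode of the uncorrelatedness criterion outside Gaussian models.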

A formal testing procedure and a greedy selection procedure are compared in various simulation studies.
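As a rough illustration of the greedy idea, the following sketch performs backward elimination over candidate statistics, dropping s_j whenever θ appears independent of s_j given the retained subset u. The dependence score used here is a simple residual-correlation stand-in (a hypothetical placeholder; a kernel-based conditional independence measure as discussed above would take its place), and the toy data, tolerance, and sample size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def dep_score(theta, s_j, u):
    """Hypothetical stand-in for a conditional dependence measure:
    absolute correlation of the residuals of theta and s_j after
    linearly regressing each on the retained statistics u."""
    def resid(z):
        X = np.column_stack([np.ones(len(z)), u]) if u.shape[1] else np.ones((len(z), 1))
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        return z - X @ beta
    return abs(np.corrcoef(resid(theta), resid(s_j))[0, 1])

def greedy_backward(theta, S, tol=0.1):
    """Drop one statistic at a time while theta stays (approximately)
    independent of the dropped statistic given the retained subset."""
    keep = list(range(S.shape[1]))
    changed = True
    while changed and len(keep) > 1:
        changed = False
        for j in list(keep):
            rest = [k for k in keep if k != j]
            if dep_score(theta, S[:, j], S[:, rest]) < tol:
                keep, changed = rest, True
                break
    return keep

# Toy data: s0 is informative, s1 is a noisy copy of s0 (so theta is
# independent of s1 given s0), and s2 is pure noise; the minimal
# subset should be [0].
n = 2000
theta = rng.normal(size=n)
s0 = theta + 0.1 * rng.normal(size=n)
s1 = s0 + rng.normal(size=n)
s2 = rng.normal(size=n)
S = np.column_stack([s0, s1, s2])
print(greedy_backward(theta, S))
```

The formal testing alternative would replace the fixed tolerance with a calibrated conditional independence test at each elimination step.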