62 commits
ea69430
update vi interface to match AdvancedVI@0.5
Red-Portal Oct 22, 2025
86ee6dd
revert unintended commit of `runtests.jl`
Red-Portal Oct 22, 2025
3e30e04
Merge branch 'breaking' of github.com:TuringLang/Turing.jl into bump_…
Red-Portal Oct 24, 2025
d870045
update docs for `vi`
Red-Portal Oct 24, 2025
2d928e0
add history entry for `AdvancedVI@0.5`
Red-Portal Oct 24, 2025
5211b37
remove export for removed symbol
Red-Portal Oct 24, 2025
f0d615d
fix formatting
Red-Portal Oct 24, 2025
1b2351f
fix formatting
Red-Portal Oct 24, 2025
2be31b4
tidy tests advi
Red-Portal Oct 24, 2025
e48ae42
fix rename file `advi.jl` to `vi.jl` to reflect naming changes
Red-Portal Oct 24, 2025
44f7762
fix docs
Red-Portal Oct 25, 2025
fd0e928
fix HISTORY.md
Red-Portal Oct 25, 2025
77276bd
fix HISTORY.md
Red-Portal Oct 25, 2025
cb1620c
Merge branch 'main' of github.com:TuringLang/Turing.jl into bump_adva…
Red-Portal Oct 25, 2025
e70ddb4
update history
Red-Portal Oct 25, 2025
115802d
Merge branch 'bump_advancedvi_0.5' of github.com:TuringLang/Turing.jl…
Red-Portal Oct 25, 2025
25b5087
Merge branch 'main' of github.com:TuringLang/Turing.jl into bump_adva…
Red-Portal Nov 19, 2025
4c02f7b
bump AdvancedVI version
Red-Portal Nov 19, 2025
6518b82
add exports new algorithms, modify `vi` to operate in unconstrained
Red-Portal Nov 19, 2025
5bd6978
Merge branch 'breaking' of github.com:TuringLang/Turing.jl into bump_…
Red-Portal Nov 19, 2025
874a0b2
add clarification on initializing unconstrained algorithms
Red-Portal Nov 19, 2025
e021eb7
update api
Red-Portal Nov 19, 2025
eec7ef2
run formatter
Red-Portal Nov 19, 2025
b6d8202
run formatter
Red-Portal Nov 19, 2025
b900ab4
run formatter
Red-Portal Nov 19, 2025
e71b07b
run formatter
Red-Portal Nov 19, 2025
c08de12
run formatter
Red-Portal Nov 19, 2025
ae80f1e
run formatter
Red-Portal Nov 19, 2025
73bd309
run formatter
Red-Portal Nov 19, 2025
eaac4c3
run formatter
Red-Portal Nov 19, 2025
757ebb4
revert changes to README
Red-Portal Nov 19, 2025
05ab711
fix wrong use of transformation in vi
Red-Portal Nov 20, 2025
91606b5
change inital value for scale matrices to 0.6*I and update docs
Red-Portal Nov 20, 2025
722153a
run formatter
Red-Portal Nov 20, 2025
65bfaa3
fix rename advi to vi
Red-Portal Nov 21, 2025
61e59a6
Merge branch 'bump_advancedvi_0.5' of github.com:TuringLang/Turing.jl…
Red-Portal Nov 21, 2025
4a039dd
add batch-and-match
Red-Portal Nov 21, 2025
f782f56
fix format api table
Red-Portal Nov 21, 2025
5251ee7
fix use fullrank Gaussian in tests since all algorithm support it
Red-Portal Nov 21, 2025
b665f96
fix tweak step sizes, remove unused kwargs
Red-Portal Nov 21, 2025
15af544
fix increase budgets for failing algorithms
Red-Portal Nov 21, 2025
7711e42
run formatter
Red-Portal Nov 21, 2025
a319962
rename main variational inference file to match module name
Red-Portal Nov 22, 2025
6e76afa
run formatter
Red-Portal Nov 27, 2025
2e78774
fix docstring
Red-Portal Nov 27, 2025
98d07f8
run formatter
Red-Portal Nov 27, 2025
cc2cac9
update docstring
Red-Portal Nov 27, 2025
2d88ddd
update docstring
Red-Portal Nov 27, 2025
025202f
fix missing docstring for `unconstrained` keyword argument
Red-Portal Nov 27, 2025
59fe8dc
fix relax assert transformed dist to an exception and add test
Red-Portal Nov 27, 2025
b1aa675
Update HISTORY.md
Red-Portal Nov 27, 2025
22895b0
run formatter
Red-Portal Nov 27, 2025
733f0be
run formatter
Red-Portal Nov 27, 2025
2530f05
run formatter
Red-Portal Nov 27, 2025
cb5a798
fix document the change in output of `vi` in `HISTORY`
Red-Portal Nov 27, 2025
8bd0ba7
run formatter
Red-Portal Nov 27, 2025
ed25920
run formatter
Red-Portal Nov 27, 2025
6be562d
run formatter
Red-Portal Nov 27, 2025
df1b832
run formatter
Red-Portal Nov 27, 2025
d83148b
add missing namespace
Red-Portal Nov 27, 2025
a686cae
fix add missing import for `@test_throws`
Red-Portal Nov 29, 2025
31256e3
Merge branch 'breaking' into bump_advancedvi_0.5
Red-Portal Nov 29, 2025
90 changes: 90 additions & 0 deletions HISTORY.md
@@ -18,6 +18,96 @@ As long as the above functions are defined correctly, Turing will be able to use

The `Turing.Inference.isgibbscomponent(::MySampler)` interface function still exists, but in this version the default has been changed to `true`, so you should not need to overload this.

## **AdvancedVI 0.6**

Turing.jl v0.42 updates its `AdvancedVI.jl` compatibility to 0.6 (the breaking 0.5 release was skipped, as it did not introduce new features).
`AdvancedVI.jl@0.6` introduces major structural changes, including breaking changes to the interface, as well as multiple new features.
The changes summarized below are those that affect end users of Turing.
For a more comprehensive list of changes, please refer to the [changelog](https://github.com/TuringLang/AdvancedVI.jl/blob/main/HISTORY.md) of `AdvancedVI`.

### Breaking Changes

A new interface layer for defining variational algorithms was introduced in `AdvancedVI` v0.5. As a result, the function `Turing.vi` now takes a keyword argument `algorithm`. The object `algorithm <: AdvancedVI.AbstractVariationalAlgorithm` contains all of the algorithm-specific configuration. Accordingly, the algorithm-specific keyword arguments of `vi`, such as `objective`, `operator`, and `averager`, have been moved into fields of the relevant `<: AdvancedVI.AbstractVariationalAlgorithm` structs.

The outputs of `vi` have also changed. Previously, `vi` returned both the last iterate of the algorithm, `q`, and the iterate average, `q_avg`. Now, for algorithms that perform parameter averaging, only `q_avg` is returned, so the number of returned values has been reduced from four to three.

For example,

```julia
q, q_avg, info, state = vi(
model, q, n_iters; objective=RepGradELBO(10), operator=AdvancedVI.ClipScale()
)
```

is now

```julia
q_avg, info, state = vi(
model,
q,
n_iters;
algorithm=KLMinRepGradDescent(adtype; n_samples=10, operator=AdvancedVI.ClipScale()),
)
```
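Here and in the examples below, `adtype` denotes an automatic-differentiation backend from [ADTypes.jl](https://github.com/SciML/ADTypes.jl); for example (an illustrative choice, any backend supported by `AdvancedVI` works):

```julia
using ADTypes

adtype = AutoForwardDiff()
```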

Similarly,

```julia
vi(
model,
q,
n_iters;
objective=RepGradELBO(10; entropy=AdvancedVI.ClosedFormEntropyZeroGradient()),
operator=AdvancedVI.ProximalLocationScaleEntropy(),
)
```

is now

```julia
vi(model, q, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10))
```

Lastly, to obtain the last iterate `q` of `KLMinRepGradDescent`, which is no longer returned under the new interface, simply set the averaging strategy to `AdvancedVI.NoAveraging()`. That is,

```julia
q, info, state = vi(
model,
q,
n_iters;
algorithm=KLMinRepGradDescent(
adtype;
n_samples=10,
operator=AdvancedVI.ClipScale(),
averager=AdvancedVI.NoAveraging(),
),
)
```

Additionally,

- The default hyperparameters of `DoG` and `DoWG` have been changed.
- The deprecated `AdvancedVI@0.2`-era interface is now removed.
- `estimate_objective` now always returns the value minimized by the optimization algorithm. For example, for ELBO-maximization algorithms, `estimate_objective` returns the *negative ELBO*. This is a breaking change from the previous behavior, where the ELBO itself was returned.
- The default initializations returned by `q_meanfield_gaussian`, `q_fullrank_gaussian`, and `q_locationscale` have changed. Specifically, the default initial value of the scale matrix has been changed from `I` to `0.6*I`.
- When using algorithms that expect to operate in unconstrained space, the user is now explicitly expected to provide a `Bijectors.TransformedDistribution` wrapping an unconstrained distribution. (Refer to the docstring of `vi`, and see the sketch below.)
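For instance, here is a minimal sketch of constructing such a wrapped distribution by hand for a single positive-constrained parameter; the base distribution and bijector are illustrative assumptions, not part of the Turing API:

```julia
using Bijectors, Distributions

# Variational approximation in unconstrained space (the 0.6 scale matches
# the new default initialization).
q_unconstrained = Normal(0.0, 0.6)

# Assume the parameter's support is the positive reals: the
# constrained-to-unconstrained map is `log`, so we wrap the base
# distribution with its inverse, `exp`.
b = Bijectors.bijector(LogNormal())  # log : (0, ∞) → ℝ
q = Bijectors.transformed(q_unconstrained, Bijectors.inverse(b))

rand(q)  # samples now lie in the positive reals
```

In practice, the helpers `q_meanfield_gaussian`, `q_fullrank_gaussian`, and `q_locationscale` already return a `Bijectors.TransformedDistribution` of this form for a given model, so most users should not need to construct one manually.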

### New Features

`AdvancedVI@0.6` adds numerous new features, including the following new VI algorithms:

- `KLMinWassFwdBwd`: Also known as "Wasserstein variational inference," this algorithm minimizes the KL divergence under the Wasserstein-2 metric.
- `KLMinNaturalGradDescent`: This algorithm, also known as "online variational Newton," is the canonical black-box natural gradient variational inference algorithm; it minimizes the KL divergence via mirror descent, using the KL divergence as the Bregman divergence.
- `KLMinSqrtNaturalGradDescent`: This is a recent variant of `KLMinNaturalGradDescent` that operates in the Cholesky-factor parameterization of Gaussians instead of the precision-matrix parameterization.
- `FisherMinBatchMatch`: This algorithm, known as "batch-and-match," minimizes a covariance-weighted variant of the second-order Fisher divergence via a proximal point-type algorithm.

Any of the new algorithms above can readily be used by simply swapping the `algorithm` keyword argument of `vi`.
For example, to use batch-and-match:

```julia
vi(model, q, n_iters; algorithm=FisherMinBatchMatch())
```

# 0.41.4

Fixed a bug where the `check_model=false` keyword argument would not be respected when sampling with multiple threads or cores.
2 changes: 1 addition & 1 deletion Project.toml
@@ -55,7 +55,7 @@ Accessors = "0.1"
AdvancedHMC = "0.8.3"
AdvancedMH = "0.8.9"
AdvancedPS = "0.7"
AdvancedVI = "0.4"
AdvancedVI = "0.6"
BangBang = "0.4.2"
Bijectors = "0.14, 0.15"
Compat = "4.15.0"
19 changes: 13 additions & 6 deletions docs/src/api.md
@@ -109,12 +109,19 @@ Turing.jl provides several strategies to initialise parameters for models.

See the [docs of AdvancedVI.jl](https://turinglang.org/AdvancedVI.jl/stable/) for detailed usage and the [variational inference tutorial](https://turinglang.org/docs/tutorials/09-variational-inference/) for a basic walkthrough.

| Exported symbol | Documentation | Description |
|:---------------------- |:------------------------------------------------- |:---------------------------------------------------------------------------------------- |
| `vi` | [`Turing.vi`](@ref) | Perform variational inference |
| `q_locationscale` | [`Turing.Variational.q_locationscale`](@ref) | Find a numerically non-degenerate initialization for a location-scale variational family |
| `q_meanfield_gaussian` | [`Turing.Variational.q_meanfield_gaussian`](@ref) | Find a numerically non-degenerate initialization for a mean-field Gaussian family |
| `q_fullrank_gaussian` | [`Turing.Variational.q_fullrank_gaussian`](@ref) | Find a numerically non-degenerate initialization for a full-rank Gaussian family |
| Exported symbol | Documentation | Description |
|:----------------------------- |:-------------------------------------------------------- |:------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vi` | [`Turing.vi`](@ref) | Perform variational inference |
| `q_locationscale` | [`Turing.Variational.q_locationscale`](@ref) | Find a numerically non-degenerate initialization for a location-scale variational family |
| `q_meanfield_gaussian` | [`Turing.Variational.q_meanfield_gaussian`](@ref) | Find a numerically non-degenerate initialization for a mean-field Gaussian family |
| `q_fullrank_gaussian` | [`Turing.Variational.q_fullrank_gaussian`](@ref) | Find a numerically non-degenerate initialization for a full-rank Gaussian family |
| `KLMinRepGradDescent` | [`Turing.Variational.KLMinRepGradDescent`](@ref) | KL divergence minimization via stochastic gradient descent with the reparameterization gradient |
| `KLMinRepGradProxDescent` | [`Turing.Variational.KLMinRepGradProxDescent`](@ref) | KL divergence minimization via stochastic proximal gradient descent with the reparameterization gradient over location-scale variational families |
| `KLMinScoreGradDescent` | [`Turing.Variational.KLMinScoreGradDescent`](@ref) | KL divergence minimization via stochastic gradient descent with the score gradient |
| `KLMinWassFwdBwd` | [`Turing.Variational.KLMinWassFwdBwd`](@ref) | KL divergence minimization via Wasserstein proximal gradient descent |
| `KLMinNaturalGradDescent` | [`Turing.Variational.KLMinNaturalGradDescent`](@ref) | KL divergence minimization via natural gradient descent |
| `KLMinSqrtNaturalGradDescent` | [`Turing.Variational.KLMinSqrtNaturalGradDescent`](@ref) | KL divergence minimization via natural gradient descent in the square-root parameterization |
| `FisherMinBatchMatch` | [`Turing.Variational.FisherMinBatchMatch`](@ref) | Covariance-weighted Fisher divergence minimization via the batch-and-match algorithm |

### Automatic differentiation types

10 changes: 8 additions & 2 deletions src/Turing.jl
@@ -47,7 +47,7 @@ include("stdlib/distributions.jl")
include("stdlib/RandomMeasures.jl")
include("mcmc/Inference.jl") # inference algorithms
using .Inference
include("variational/VariationalInference.jl")
include("variational/Variational.jl")
using .Variational

include("optimisation/Optimisation.jl")
@@ -117,10 +117,16 @@ export
externalsampler,
# Variational inference - AdvancedVI
vi,
ADVI,
q_locationscale,
q_meanfield_gaussian,
q_fullrank_gaussian,
KLMinRepGradProxDescent,
KLMinRepGradDescent,
KLMinScoreGradDescent,
KLMinNaturalGradDescent,
KLMinSqrtNaturalGradDescent,
KLMinWassFwdBwd,
FisherMinBatchMatch,
# ADTypes
AutoForwardDiff,
AutoReverseDiff,