Utils

AutoEncoderToolkit.jl offers a series of utility functions for different tasks.

Training Utilities

AutoEncoderToolkit.utils.step_scheduler - Function
step_scheduler(epoch, epoch_change, learning_rates)

Simple function to define different learning rates at specified epochs.

Arguments

  • epoch::Int: Epoch for which to determine the learning rate.
  • epoch_change::Vector{<:Int}: Epochs at which the learning rate changes. It must include an entry for the initial learning rate.
  • learning_rates::Vector{<:AbstractFloat}: Learning rate value for each epoch range. Must be the same length as epoch_change.

Returns

  • η::AbstractFloat: Learning rate for the current epoch.
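
As a rough illustration of a step schedule, the sketch below assumes that each entry of epoch_change marks the last epoch that uses the matching learning rate, and that the final rate is kept afterwards. The name step_schedule_sketch and this boundary convention are assumptions made for illustration and may differ from the behavior of step_scheduler itself.

```julia
# Hypothetical sketch of a step learning-rate schedule (not the package code).
# Assumption: epoch_change[i] is the last epoch that uses learning_rates[i].
function step_schedule_sketch(
    epoch::Int,
    epoch_change::AbstractVector{<:Int},
    learning_rates::AbstractVector{<:AbstractFloat}
)
    length(epoch_change) == length(learning_rates) ||
        throw(ArgumentError("epoch_change and learning_rates must have the same length"))
    # Sort both vectors by epoch so the ranges are visited in order
    order = sortperm(epoch_change)
    bounds, rates = epoch_change[order], learning_rates[order]
    # First range whose upper bound has not been passed yet
    idx = findfirst(b -> epoch <= b, bounds)
    # Past the last boundary: keep the final learning rate
    return idx === nothing ? rates[end] : rates[idx]
end

# Learning rate at a few epochs for a three-stage schedule
etas = [step_schedule_sketch(e, [10, 20, 30], [1e-3, 1e-4, 1e-5]) for e in (1, 10, 11, 25, 40)]
```

Under these assumptions, epochs 1 to 10 use the first rate, epochs 11 to 20 the second, and every epoch after 20 the third.
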
AutoEncoderToolkit.utils.cycle_anneal - Function
cycle_anneal(
    epoch::Int, 
    n_epoch::Int, 
    n_cycles::Int; 
    frac::AbstractFloat=0.5f0, 
    βmax::Number=1.0f0, 
    βmin::Number=0.0f0, 
    T::Type=Float32
)

Function that computes the value of the annealing parameter β for a variational autoencoder as a function of the epoch number according to the cyclical annealing strategy.

Arguments

  • epoch::Int: Epoch on which to evaluate the value of the annealing parameter.
  • n_epoch::Int: Number of epochs that will be run to train the VAE.
  • n_cycles::Int: Number of annealing cycles to be fit within the number of epochs.

Optional Arguments

  • frac::AbstractFloat=0.5f0: Fraction of the cycle in which the annealing parameter β will increase from the minimum to the maximum value.
  • βmax::Number=1.0f0: Maximum value that the annealing parameter can reach.
  • βmin::Number=0.0f0: Minimum value that the annealing parameter can reach.
  • T::Type=Float32: The type of the output. The function will convert the output to this type.

Returns

  • β::T: Value of the annealing parameter.

Citation

Fu, H. et al. Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing. Preprint at http://arxiv.org/abs/1903.10145 (2019).
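
A minimal sketch of the cyclical schedule described above, assuming a linear ramp from βmin to βmax during the first frac of each cycle and a constant βmax for the rest of the cycle. The name cycle_anneal_sketch is hypothetical, and the exact formula used by cycle_anneal may differ.

```julia
# Hypothetical sketch of cyclical annealing with a linear ramp (after Fu et al., 2019).
function cycle_anneal_sketch(
    epoch::Int, n_epoch::Int, n_cycles::Int;
    frac::AbstractFloat=0.5f0, βmax::Number=1.0f0, βmin::Number=0.0f0, T::Type=Float32
)
    0 < frac <= 1 || throw(ArgumentError("frac must be in (0, 1]"))
    cycle_length = n_epoch / n_cycles
    # Position within the current cycle, in [0, 1)
    τ = mod(epoch - 1, ceil(cycle_length)) / cycle_length
    # Linear ramp during the first `frac` of the cycle, then hold at βmax
    β = τ <= frac ? βmin + (βmax - βmin) * τ / frac : βmax
    return convert(T, β)
end

# β for every epoch of a 100-epoch run with 4 annealing cycles
βs = [cycle_anneal_sketch(e, 100, 4) for e in 1:100]
```

With these settings each cycle spans 25 epochs: β starts at βmin at epochs 1, 26, 51, and 76, and stays at βmax during the second half of every cycle.
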

AutoEncoderToolkit.utils.locality_sampler - Function

locality_sampler(data, dist_tree, n_primary, n_secondary, k_neighbors; index=false)

Algorithm to generate mini-batches based on spatial locality as determined by a pre-constructed nearest neighbors tree.

Arguments

  • data::AbstractArray: An array containing the data points. The data points can be of any dimension.
  • dist_tree::NearestNeighbors.NNTree: NearestNeighbors.jl tree used to determine the distance between data points.
  • n_primary::Int: Number of primary points to sample.
  • n_secondary::Int: Number of secondary points to sample from the neighbors of each primary point.
  • k_neighbors::Int: Number of nearest neighbors from which to potentially sample the secondary points.

Optional Keyword Arguments

  • index::Bool: If true, returns the indices of the selected samples. If false, returns the data corresponding to those indices. Defaults to false.

Returns

  • If index is true, returns sample_idx::Vector{Int64}: Indices of data points to include in the mini-batch.
  • If index is false, returns sample_data::AbstractArray: The data points to include in the mini-batch.

Description

This sampling algorithm consists of three steps:

  1. For each data point, determine the k_neighbors nearest neighbors using the dist_tree.
  2. Uniformly sample n_primary points without replacement from all data points.
  3. For each primary point, sample n_secondary points without replacement from its k_neighbors nearest neighbors.
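
The three steps above can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not the package implementation: locality_sampler_sketch is a hypothetical name, the neighbors come from a brute-force search rather than a NearestNeighbors.jl tree, each point is excluded from its own neighborhood, and duplicate indices are dropped from the final batch.

```julia
using Random

# Hypothetical sketch of locality-based mini-batch sampling (not the package code).
# Assumptions: columns of `data` are points, neighbors come from a brute-force
# search instead of a NearestNeighbors.jl tree, and duplicate indices are dropped.
function locality_sampler_sketch(
    rng::AbstractRNG, data::AbstractMatrix,
    n_primary::Int, n_secondary::Int, k_neighbors::Int
)
    n = size(data, 2)
    sqdist(i, j) = sum(abs2, view(data, :, i) .- view(data, :, j))
    # Step 1: k nearest neighbors of every point, excluding the point itself
    neighbors = map(1:n) do i
        d = [i == j ? Inf : sqdist(i, j) for j in 1:n]
        collect(partialsortperm(d, 1:k_neighbors))
    end
    # Step 2: primary points, sampled uniformly without replacement
    primary = randperm(rng, n)[1:n_primary]
    # Step 3: secondary points, sampled without replacement from each neighborhood
    secondary = [shuffle(rng, neighbors[p])[1:n_secondary] for p in primary]
    return unique(reduce(vcat, secondary; init=primary))
end

data = rand(MersenneTwister(1), 2, 100)
batch_idx = locality_sampler_sketch(MersenneTwister(2), data, 10, 5, 20)
```

Because neighborhoods of nearby primary points overlap, the batch holds at most n_primary * (1 + n_secondary) indices and often fewer.
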

Examples

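using NearestNeighbors
data = rand(3, 100)  # 100 data points stored as columns
# Pre-constructed NearestNeighbors.jl tree (Euclidean metric by default)
dist_tree = NearestNeighbors.KDTree(data)
sample_indices = locality_sampler(data, dist_tree, 10, 5, 50; index=true)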

Citation

Skafte, N., Jørgensen, M. & Hauberg, S. Reliable training and estimation of variance networks. in Advances in Neural Information Processing Systems vol. 32 (Curran Associates, Inc., 2019).

Centroid Finding Utilities

Some VAE models, such as the RHVAE, require clustering of the data. Specifically, the RHVAE can take a fixed subset of the training data as a reference for computing the metric tensor. The following functions can be used to define this reference subset, whose elements serve as centroids for the metric tensor computation.

AutoEncoderToolkit.utils.centroids_kmeans - Function
centroids_kmeans(
    x::AbstractMatrix, 
    n_centroids::Int; 
    assign::Bool=false
)

Perform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

Arguments

  • x::AbstractMatrix: The input data. Rows represent individual samples.
  • n_centroids::Int: The number of centroids to compute.

Optional Keyword Arguments

  • assign::Bool=false: If true, also return the assignments of each point to a centroid.

Returns

  • If assign is false, returns a matrix where each column is a centroid.
  • If assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.

Examples

data = rand(100, 10)
centroids = centroids_kmeans(data, 5)

centroids_kmeans(
    x::AbstractArray, 
    n_centroids::Int; 
    reshape_centroids::Bool=true, 
    assign::Bool=false
)

Perform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

The input data is flattened into a matrix before performing k-means clustering. This is done because k-means operates on a set of data points in a vector space and cannot handle multi-dimensional arrays. Flattening the input ensures that the k-means algorithm can process the data correctly.

By default, the output centroids are reshaped back to the original input shape. This is controlled by the reshape_centroids argument.
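
The flatten and reshape round trip described above can be illustrated with plain array operations. This is a sketch under assumptions: the clustering step is replaced by a stand-in that simply picks columns, and keeping the centroids along the last dimension is one possible output layout, not necessarily the one centroids_kmeans uses.

```julia
# Hypothetical illustration of the flatten / reshape round trip (not the package code).
x = rand(Float32, 28, 28, 1, 100)          # e.g. 100 single-channel 28×28 images
n_samples = size(x, ndims(x))

# Flatten every sample into one column: (28 * 28 * 1) × 100
x_flat = reshape(x, :, n_samples)

# Stand-in for the k-means output: any (features × n_centroids) matrix
n_centroids = 5
centroids_flat = x_flat[:, 1:n_centroids]

# Restore the per-sample shape, keeping centroids along the last dimension
centroids = reshape(centroids_flat, size(x)[1:end-1]..., n_centroids)
```
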

Arguments

  • x::AbstractArray: The input data. It can be a multi-dimensional array where the last dimension represents individual samples.
  • n_centroids::Int: The number of centroids to compute.

Optional Keyword Arguments

  • reshape_centroids::Bool=true: If true, reshape the output centroids back to the original input shape.
  • assign::Bool=false: If true, also return the assignments of each point to a centroid.

Returns

  • If assign is false, returns a matrix where each column is a centroid.
  • If assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.

Examples

data = rand(100, 10)
centroids = centroids_kmeans(data, 5)

AutoEncoderToolkit.utils.centroids_kmedoids - Function
centroids_kmedoids(
    x::AbstractMatrix,
    n_centroids::Int,
    dist::Distances.PreMetric=Distances.Euclidean();
    assign::Bool=false
)

Perform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

Arguments

  • x::AbstractMatrix: The input data. Rows represent individual samples.
  • n_centroids::Int: The number of centroids to compute.
  • dist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use when computing the pairwise distance matrix.

Optional Keyword Arguments

  • assign::Bool=false: If true, also return the assignments of each point to a centroid.

Returns

  • If assign is false, returns a matrix where each column is a centroid.
  • If assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.

Examples

data = rand(100, 10)
centroids = centroids_kmedoids(data, 5)

centroids_kmedoids(
    x::AbstractArray,
    n_centroids::Int,
    dist::Distances.PreMetric=Distances.Euclidean();
    assign::Bool=false
)

Perform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

Arguments

  • x::AbstractArray: The input data. The last dimension of x should contain each of the samples that should be clustered.
  • n_centroids::Int: The number of centroids to compute.
  • dist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use for the clustering. Defaults to Euclidean distance.

Optional Keyword Arguments

  • assign::Bool=false: If true, also return the assignments of each point to a centroid.

Returns

  • If assign is false, returns an array where each column is a centroid.
  • If assign is true, returns a tuple where the first element is the array of centroids and the second element is a vector of assignments.

Examples

data = rand(10, 100)
centroids = centroids_kmedoids(data, 5)

Other Utilities

AutoEncoderToolkit.utils.storage_type - Function
storage_type(A::AbstractArray)

Determine the storage type of an array.

This function recursively checks the parent of the array until it finds the base storage type. This is useful for determining whether an array or its subarrays are stored on the CPU or GPU.

Arguments

  • A::AbstractArray: The array whose storage type is to be determined.

Returns

The type of the array that is the base storage of A.
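
The recursive parent lookup described above can be sketched as follows. The name storage_type_sketch is hypothetical, and the stopping rule (stop once parent returns an array of the same type) is an assumption about how such a recursion can terminate, not a statement about the package code.

```julia
# Hypothetical sketch: follow `parent` until it stops changing the array type.
function storage_type_sketch(A::AbstractArray)
    P = parent(A)
    # Plain arrays are their own parent, which ends the recursion
    return typeof(P) === typeof(A) ? typeof(A) : storage_type_sketch(P)
end

A = rand(Float32, 4, 4)
wrapped = transpose(view(A, 1:2, :))   # Transpose of a SubArray of a Matrix
base_type = storage_type_sketch(wrapped)
```

Here the wrappers are peeled off one at a time (Transpose, then SubArray) until the underlying Matrix{Float32} is reached. The same idea would report a GPU array type for views of GPU arrays.
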

AutoEncoderToolkit.utils.vec_to_ltri - Function
vec_to_ltri(diag::AbstractVecOrMat, lower::AbstractVecOrMat)

Convert two vectors or matrices into a lower triangular matrix or a 3D tensor.

Arguments

  • diag::AbstractVecOrMat: The input vector or matrix to be converted into the diagonal of the matrix. If it's a matrix, each column is considered as a separate vector.
  • lower::AbstractVecOrMat: The input vector or matrix to be converted into the lower triangular part of the matrix. The length of this vector or the number of rows in this matrix should be a triangular number (i.e., the sum of the first n natural numbers for some n). If it's a matrix, each column is considered the lower part of a separate lower triangular matrix.

Returns

  • A lower triangular matrix or a 3D tensor where each slice is a lower triangular matrix constructed from diag and lower.

Description

This function constructs a lower triangular matrix or a 3D tensor from two input vectors or matrices, diag and lower. The diag vector or matrix provides the diagonal elements of the matrix, while the lower vector or matrix provides the elements below the diagonal. The function uses a comprehension to construct the matrix or tensor, with the lower_index function calculating the appropriate index in the lower vector or matrix for each element below the diagonal.
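
For the vector case, the construction can be sketched as below. The name vec_to_ltri_sketch is hypothetical, and the sketch assumes that lower stores the entries below the diagonal row by row; the ordering used by the package's lower_index may differ.

```julia
# Hypothetical sketch for the vector case (not the package code).
# Assumption: `lower` stores the strictly lower entries row by row.
function vec_to_ltri_sketch(diag::AbstractVector{T}, lower::AbstractVector{T}) where {T<:Number}
    n = length(diag)
    length(lower) == n * (n - 1) ÷ 2 ||
        throw(DimensionMismatch("lower must have n(n-1)/2 entries"))
    # Position of entry (i, j), i > j, inside `lower`
    lower_index(i, j) = (i - 1) * (i - 2) ÷ 2 + j
    # Diagonal from `diag`, below-diagonal from `lower`, zeros above
    entry(i, j) = i == j ? diag[i] : (i > j ? lower[lower_index(i, j)] : zero(T))
    return [entry(i, j) for i in 1:n, j in 1:n]
end

L = vec_to_ltri_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# L == [1.0 0.0 0.0; 4.0 2.0 0.0; 5.0 6.0 3.0]
```
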

GPU Support

The function supports both CPU and GPU arrays. For GPU arrays, the data is first transferred to the CPU, the lower triangular matrix or tensor is constructed, and then it is transferred back to the GPU.

AutoEncoderToolkit.utils.vec_mat_vec_batched - Function
vec_mat_vec_batched(
    v::AbstractVector, 
    M::AbstractMatrix, 
    w::AbstractVector
)

Compute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲.

This function takes two vectors v and w, and a matrix M, and computes the product v̲ᵀ M̲̲ w̲. This method is provided for consistency across multiple-dispatch methods.

Arguments

  • v::AbstractVector: A d dimensional vector.
  • M::AbstractMatrix: A d×d matrix.
  • w::AbstractVector: A d dimensional vector.

Returns

A scalar which is the result of the product v̲ᵀ M̲̲ w̲ for the corresponding vectors and matrix.

Notes

This function uses the LinearAlgebra.dot function to perform the multiplication of the matrix M with the vector w. The resulting vector is then element-wise multiplied with the vector v and summed over the dimensions to obtain the final result.

vec_mat_vec_batched(
    v::AbstractMatrix, 
    M::AbstractArray, 
    w::AbstractMatrix
)

Compute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲.

This function takes two matrices v and w, and a 3D array M, and computes the batched product v̲ᵀ M̲̲ w̲. The computation is performed in a broadcasted manner using the Flux.batched_vec function.

Arguments

  • v::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.
  • M::AbstractArray: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices.
  • w::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.

Returns

A vector of length n where each element is the result of the product v̲ᵀ M̲̲ w̲ for the corresponding vectors and matrix.

Notes

This function uses the Flux.batched_vec function to perform the batched multiplication of the matrices in M with the vectors in w. The resulting vectors are then element-wise multiplied with the vectors in v and summed over the dimensions to obtain the final result.
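
The same batched quadratic form can be written in plain Julia without Flux, which may help when checking shapes. The name vec_mat_vec_sketch is hypothetical and the loop over slices is a stand-in for the batched multiplication, not the package implementation.

```julia
using LinearAlgebra

# Hypothetical plain-Julia sketch of the batched quadratic form (no Flux dependency).
function vec_mat_vec_sketch(v::AbstractMatrix, M::AbstractArray{<:Number,3}, w::AbstractMatrix)
    # M[:, :, k] * w[:, k] for every batch element k, stored as columns (d × n)
    Mw = reduce(hcat, [M[:, :, k] * w[:, k] for k in axes(w, 2)])
    # Element-wise product with v, summed over the vector dimension
    return vec(sum(v .* Mw; dims=1))
end

d, n = 3, 4
v, w, M = rand(d, n), rand(d, n), rand(d, d, n)
result = vec_mat_vec_sketch(v, M, w)

# Reference: one three-argument dot product per batch element
reference = [dot(v[:, k], M[:, :, k], w[:, k]) for k in 1:n]
```

Both result and reference hold n scalars, one per column of v and w.
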

vec_mat_vec_batched(
    v::AbstractVector{T}, 
    M::AbstractMatrix{S}, 
    w::AbstractVector{T}
) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}

Compute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲ for a specific type of matrix and vectors.

This function takes two vectors v and w whose elements are of type TaylorDiff.TaylorScalar{Float32,2}, and a matrix M whose elements are a subtype of Number, and computes the product v̲ᵀ M̲̲ w̲. The computation first performs the matrix-vector multiplication M̲̲ w̲ and then computes the dot product of the resulting vector with v.

Arguments

  • v::AbstractVector{T}: A d dimensional vector. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.
  • M::AbstractMatrix{S}: A d×d matrix. S is a subtype of Number.
  • w::AbstractVector{T}: A d dimensional vector. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.

Returns

A scalar which is the result of the product v̲ᵀ M̲̲ w̲.

Notes

This function uses the dot function to compute the final dot product.

vec_mat_vec_batched(
    v::AbstractMatrix{T}, 
    M::AbstractArray{S,3}, 
    w::AbstractMatrix{T}
) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}

Compute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲ for a specific type of matrices and vectors.

This function takes two matrices v and w whose elements are of type TaylorDiff.TaylorScalar{Float32,2}, and a 3D array M whose elements are a subtype of Number, and computes the batched product v̲ᵀ M̲̲ w̲. The computation first extracts each slice of M and each column of w, then performs the matrix-vector multiplication for each slice-column pair, and finally multiplies the resulting matrix element-wise with v and sums over the dimensions.

Arguments

  • v::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.
  • M::AbstractArray{S,3}: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices. S is a subtype of Number.
  • w::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.

Returns

A vector of length n where each element is the result of the product v̲ᵀ M̲̲ w̲ for the corresponding vectors and matrix.

Notes

This function uses the eachslice and eachcol functions to extract the slices of M and the columns of w, respectively. It then uses an array comprehension to perform the matrix-vector multiplication for each slice-column pair, and finally computes the element-wise multiplication of the resulting matrix with v and sums over the dimensions to obtain the final result.

AutoEncoderToolkit.utils.slogdet - Function
slogdet(A::AbstractArray{T}; check::Bool=false) where {T<:Number}

Compute the log determinant of a positive-definite matrix A or a 3D array of such matrices.

Arguments

  • A::AbstractArray{T}: A positive-definite matrix or a 3D array of positive-definite matrices whose log determinant is to be computed.
  • check::Bool=false: A flag that determines whether to check if the input matrix A is positive-definite. Defaults to false because the check can be numerically unstable.

Returns

  • The log determinant of A. If A is a 3D array, returns a 1D array of log determinants, one for each slice along the third dimension of A.

Description

This function computes the log determinant of a positive-definite matrix A or a 3D array of such matrices. It first computes the Cholesky decomposition of A, and then calculates the log determinant as twice the sum of the log of the diagonal elements of the lower triangular matrix from the Cholesky decomposition.
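
The identity behind this computation is that if A = L Lᵀ with L lower triangular, then det(A) = (∏ᵢ Lᵢᵢ)², so log det(A) = 2 Σᵢ log Lᵢᵢ. A minimal sketch using only LinearAlgebra is shown below; logdet_chol_sketch is a hypothetical name and the slice-by-slice loop for 3D input is an assumption, not the package implementation.

```julia
using LinearAlgebra

# Hypothetical sketch of a Cholesky-based log determinant (not the package code).
function logdet_chol_sketch(A::AbstractMatrix)
    # A = L * L' with L lower triangular, so det(A) = prod(diag(L))^2
    L = cholesky(Symmetric(A)).L
    return 2 * sum(log, diag(L))
end

# 3D input: one log determinant per slice along the third dimension
logdet_chol_sketch(A::AbstractArray{<:Number,3}) =
    [logdet_chol_sketch(A[:, :, k]) for k in axes(A, 3)]

B = [2.0 1.0; 1.0 3.0]              # det(B) = 5
batch = cat(B, 2 .* B; dims=3)      # det(2B) = 20
single = logdet_chol_sketch(B)
batched = logdet_chol_sketch(batch)
```

Working with the logarithms of the Cholesky diagonal avoids forming the determinant itself, which can overflow or underflow for large matrices.
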

Conditions

The input matrix A must be a positive-definite matrix, i.e., it must be symmetric and all its eigenvalues must be positive. If check is set to true, the function will throw an error if A is not positive-definite.

GPU Support

The function supports both CPU and GPU arrays.

AutoEncoderToolkit.utils.sample_MvNormalCanon - Function
sample_MvNormalCanon(Σ⁻¹::AbstractArray{T}) where {T<:Number}

Draw a random sample from a multivariate normal distribution in canonical form.

Arguments

  • Σ⁻¹::AbstractArray{T}: The precision matrix (inverse of the covariance matrix) of the multivariate normal distribution. This can be a 2D array (matrix) or a 3D array.

Returns

  • A random sample drawn from the multivariate normal distribution specified by the input precision matrix. If Σ⁻¹ is a 3D array, returns a 2D array of samples, one for each slice along the third dimension of Σ⁻¹.

Description

This function draws a random sample from a multivariate normal distribution specified by a precision matrix Σ⁻¹. The precision matrix can be a 2D array (matrix) or a 3D array. If Σ⁻¹ is a 3D array, the function draws a sample for each slice along the third dimension of Σ⁻¹.

The function first inverts the precision matrix to obtain the covariance matrix, then performs a Cholesky decomposition of the covariance matrix. It then draws a sample from a standard normal distribution and multiplies it by the lower triangular matrix from the Cholesky decomposition to obtain the final sample.
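
The steps in the paragraph above can be sketched for a single precision matrix as follows. The name sample_precision_sketch and the explicit rng argument are hypothetical, and the sketch assumes a zero-mean distribution, since only the precision matrix is given.

```julia
using LinearAlgebra, Random

# Hypothetical sketch of sampling from N(0, Σ) given the precision matrix Σ⁻¹.
# Assumption: zero mean, since only the precision matrix is provided.
function sample_precision_sketch(rng::AbstractRNG, P::AbstractMatrix{T}) where {T<:AbstractFloat}
    # Invert the precision matrix to obtain the covariance matrix
    Σ = inv(P)
    # Σ = L * L' with L lower triangular (Symmetric guards against round-off asymmetry)
    L = cholesky(Symmetric(Σ)).L
    # If z ~ N(0, I), then L * z ~ N(0, Σ)
    return L * randn(rng, T, size(P, 1))
end

P = [2.0 0.5; 0.5 1.0]
x = sample_precision_sketch(MersenneTwister(42), P)
```

The transformation works because the covariance of L z is L I Lᵀ = Σ.
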

GPU Support

The function supports both CPU and GPU arrays.

AutoEncoderToolkit.utils.unit_vector - Function
unit_vector(x::AbstractVector, i::Int)

Create a unit vector of the same length as x with the i-th element set to 1.

Arguments

  • x::AbstractVector: The vector whose length is used to determine the dimension of the unit vector.
  • i::Int: The index of the element to be set to 1.

Returns

  • A unit vector of type eltype(x) and length equal to x with the i-th element set to 1.

Description

This function creates a unit vector of the same length as x with the i-th element set to 1. All other elements are set to 0.

Note

This function is marked with the @ignore_derivatives macro from the ChainRulesCore package, which means that all AutoDiff backends will ignore any call to this function when computing gradients.

unit_vector(x::AbstractMatrix, i::Int)

Create a unit vector of the same length as the number of rows in x with the i-th element set to 1.

Arguments

  • x::AbstractMatrix: The matrix whose number of rows is used to determine the dimension of the unit vector.
  • i::Int: The index of the element to be set to 1.

Returns

  • A unit vector of type eltype(x) and length equal to the number of rows in x with the i-th element set to 1.

Description

This function creates a unit vector of the same length as the number of rows in x with the i-th element set to 1. All other elements are set to 0.

AutoEncoderToolkit.utils.finite_difference_gradient - Function
finite_difference_gradient(
    f::Function,
    x::AbstractVecOrMat;
    fdtype::Symbol=:central
)

Compute the finite difference gradient of a function f at a point x.

Arguments

  • f::Function: The function for which the gradient is to be computed. This function must return a scalar value.
  • x::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.

Optional Keyword Arguments

  • fdtype::Symbol=:central: The finite difference type. It can be either :forward or :central. Defaults to :central.

Returns

  • A vector or a matrix representing the gradient of f at x, depending on the input type of x.

Description

This function computes the finite difference gradient of a function f at a point x. The gradient is a vector or a matrix where the i-th element is the partial derivative of f with respect to the i-th element of x.

The partial derivatives are computed using the forward or central difference formula, depending on the fdtype argument:

  • Forward difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x)] / ε
  • Central difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x - ε * eᵢ)] / (2ε)

where ε is the step size and eᵢ is the i-th unit vector.
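
Both formulas can be sketched for vector inputs as shown below. The name fd_gradient_sketch is hypothetical, and the step sizes (√eps for forward, ∛eps for central differences) are a common default chosen for this illustration, not necessarily the values used by finite_difference_gradient.

```julia
# Hypothetical sketch of a finite-difference gradient for vector inputs.
# Assumption: step sizes √eps (forward) and ∛eps (central), a common default.
function fd_gradient_sketch(f, x::AbstractVector{T}; fdtype::Symbol=:central) where {T<:AbstractFloat}
    # i-th unit vector with the same length and element type as x
    unit(i) = [j == i ? one(T) : zero(T) for j in eachindex(x)]
    if fdtype == :forward
        ε = sqrt(eps(T))
        fx = f(x)
        return [(f(x .+ ε .* unit(i)) - fx) / ε for i in eachindex(x)]
    elseif fdtype == :central
        ε = cbrt(eps(T))
        return [(f(x .+ ε .* unit(i)) - f(x .- ε .* unit(i))) / (2ε) for i in eachindex(x)]
    else
        throw(ArgumentError("fdtype must be :forward or :central"))
    end
end

f(x) = sum(abs2, x)            # exact gradient is 2x
x = [1.0, 2.0, 3.0]
g_central = fd_gradient_sketch(f, x)
g_forward = fd_gradient_sketch(f, x; fdtype=:forward)
```

The central formula needs two function evaluations per coordinate instead of one, but its truncation error shrinks with ε² rather than ε.
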

GPU Support

This function supports both CPU and GPU arrays.

AutoEncoderToolkit.utils.taylordiff_gradient - Function
taylordiff_gradient(
    f::Function,
    x::AbstractVecOrMat
)

Compute the gradient of a function f at a point x using Taylor series differentiation.

Arguments

  • f::Function: The function for which the gradient is to be computed. This must be a scalar function.
  • x::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.

Returns

  • A vector or a matrix representing the gradient of f at x, depending on the input type of x.

Description

This function computes the gradient of a function f at a point x using Taylor series differentiation. The gradient is a vector or a matrix where the i-th element or column is the partial derivative of f with respect to the i-th element of x.

The partial derivatives are computed using the TaylorDiff.derivative function.

GPU Support

This function currently only supports CPU arrays.