Utils
AutoEncoderToolkit.jl offers a series of utility functions for different tasks.
Training Utilities
AutoEncoderToolkit.utils.step_scheduler
step_scheduler(epoch, epoch_change, learning_rates)
Simple function to define different learning rates at specified epochs.
Arguments
- epoch::Int: Epoch at which to evaluate the learning rate.
- epoch_change::Vector{<:Int}: Epochs at which the learning rate changes. It must include an entry for the initial learning rate.
- learning_rates::Vector{<:AbstractFloat}: Learning rate value for each epoch range. Must be the same length as epoch_change.
Returns
- η::AbstractFloat: Learning rate for the current epoch.
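To make the lookup concrete, here is a minimal sketch of a piecewise-constant (step) schedule. The name step_scheduler_sketch is hypothetical, and the boundary convention it uses (learning_rates[i] applies while epoch ≤ epoch_change[i], and the last rate is kept after the final boundary) is an assumption; the library's exact convention may differ.

```julia
# Hypothetical sketch of a step (piecewise-constant) learning-rate schedule.
# Assumed convention: learning_rates[i] applies while epoch ≤ epoch_change[i];
# after the last boundary the final rate is kept.
function step_scheduler_sketch(
    epoch::Int,
    epoch_change::AbstractVector{<:Integer},
    learning_rates::AbstractVector{<:AbstractFloat},
)
    length(epoch_change) == length(learning_rates) || throw(
        ArgumentError("epoch_change and learning_rates must have the same length"),
    )
    # Read the schedule in epoch order
    order = sortperm(epoch_change)
    boundaries = epoch_change[order]
    rates = learning_rates[order]
    # The first boundary not yet passed selects the rate
    i = findfirst(b -> epoch <= b, boundaries)
    return i === nothing ? rates[end] : rates[i]
end

η = step_scheduler_sketch(15, [10, 20, 30], [1e-3, 5e-4, 1e-4])  # 0.0005
```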
AutoEncoderToolkit.utils.cycle_anneal
cycle_anneal(
    epoch::Int,
    n_epoch::Int,
    n_cycles::Int;
    frac::AbstractFloat=0.5f0,
    βmax::Number=1.0f0,
    βmin::Number=0.0f0,
    T::Type=Float32
)
Compute the value of the annealing parameter β of a variational autoencoder at a given epoch, following the cyclical annealing schedule.
Arguments
- epoch::Int: Epoch at which to evaluate the annealing parameter.
- n_epoch::Int: Total number of epochs used to train the VAE.
- n_cycles::Int: Number of annealing cycles to fit within the n_epoch epochs.
Optional Keyword Arguments
- frac::AbstractFloat=0.5f0: Fraction of each cycle during which β increases from its minimum to its maximum value.
- βmax::Number=1.0f0: Maximum value that the annealing parameter can reach.
- βmin::Number=0.0f0: Minimum value that the annealing parameter can reach.
- T::Type=Float32: Type of the output; the result is converted to this type.
Returns
- β::T: Value of the annealing parameter.
Citation
Fu, H. et al. Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing. Preprint at http://arxiv.org/abs/1903.10145 (2019).
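The cyclical schedule can be sketched as follows, assuming a linear ramp from βmin to βmax during the first frac of each cycle and a constant βmax for the rest of it. The position within the cycle is taken as τ = mod(epoch − 1, ⌈n_epoch / n_cycles⌉) / (n_epoch / n_cycles); both this formula and the name cycle_anneal_sketch are assumptions for illustration, not the library's code.

```julia
# Hypothetical sketch of a cyclical, linearly increasing β schedule
function cycle_anneal_sketch(epoch::Int, n_epoch::Int, n_cycles::Int;
                             frac::AbstractFloat=0.5f0, βmax::Number=1.0f0,
                             βmin::Number=0.0f0, T::Type=Float32)
    0 < frac <= 1 || throw(ArgumentError("frac must be in (0, 1]"))
    period = n_epoch / n_cycles
    # Position within the current cycle, in [0, 1)
    τ = mod(epoch - 1, ceil(period)) / period
    # Ramp linearly from βmin to βmax during the first `frac` of the cycle,
    # then hold β at βmax for the remainder of the cycle
    β = τ <= frac ? βmin + (βmax - βmin) * τ / frac : βmax
    return convert(T, β)
end

# 100 epochs, 4 cycles of 25 epochs each
βs = [cycle_anneal_sketch(e, 100, 4) for e in 1:100]
```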
AutoEncoderToolkit.utils.locality_sampler
locality_sampler(data, dist_tree, n_primary, n_secondary, k_neighbors; index=false)
Algorithm to generate mini-batches based on spatial locality as determined by a pre-constructed nearest neighbors tree.
Arguments
- data::AbstractArray: An array containing the data points. The data points can be of any dimension.
- dist_tree::NearestNeighbors.NNTree: NearestNeighbors.jl tree used to determine the distance between data points.
- n_primary::Int: Number of primary points to sample.
- n_secondary::Int: Number of secondary points to sample from the neighbors of each primary point.
- k_neighbors::Int: Number of nearest neighbors from which to potentially sample the secondary points.
Optional Keyword Arguments
- index::Bool=false: If true, returns the indices of the selected samples. If false, returns the data corresponding to those indices.
Returns
- If index is true, returns sample_idx::Vector{Int64}: indices of the data points to include in the mini-batch.
- If index is false, returns sample_data::AbstractArray: the data points to include in the mini-batch.
Description
This sampling algorithm consists of three steps:
- For each data point, determine its k_neighbors nearest neighbors using dist_tree.
- Uniformly sample n_primary points without replacement from all data points.
- For each primary point, sample n_secondary points without replacement from its k_neighbors nearest neighbors.
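The three steps can be sketched in a self-contained way. To avoid depending on NearestNeighbors.jl, this sketch swaps the tree for a brute-force Euclidean neighbor search (knn_bruteforce, a hypothetical helper) and computes neighborhoods only for the sampled primary points, which yields the same mini-batch. Removing duplicate indices with unique is an assumption about how overlapping neighborhoods are handled; locality_sampler_sketch is likewise a hypothetical name.

```julia
using LinearAlgebra, Random

# Hypothetical brute-force stand-in for a NearestNeighbors.jl tree query:
# indices of the k columns of `data` closest to column i (including i itself)
function knn_bruteforce(data::AbstractMatrix, i::Int, k::Int)
    d = [norm(data[:, i] - data[:, j]) for j in axes(data, 2)]
    return partialsortperm(d, 1:k)
end

function locality_sampler_sketch(data::AbstractMatrix, n_primary::Int,
                                 n_secondary::Int, k_neighbors::Int;
                                 rng::AbstractRNG=Random.default_rng())
    n = size(data, 2)
    n_primary <= n && n_secondary <= k_neighbors <= n ||
        throw(ArgumentError("need n_primary ≤ n and n_secondary ≤ k_neighbors ≤ n"))
    # Uniformly sample primary points without replacement
    primary = randperm(rng, n)[1:n_primary]
    idx = Int[]
    for p in primary
        # Neighborhood of the primary point (computed only where needed)
        nbrs = knn_bruteforce(data, p, k_neighbors)
        # Sample secondary points without replacement from that neighborhood
        append!(idx, nbrs[randperm(rng, k_neighbors)[1:n_secondary]])
    end
    # Overlapping neighborhoods can repeat points; keep each index once
    return unique(idx)
end
```

Columns of data are treated as points here, which mirrors how NearestNeighbors.jl trees store data.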
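Examples
using NearestNeighbors
# Pre-constructed NearestNeighbors.jl tree; columns of data are the points
data = rand(3, 1000)
dist_tree = NearestNeighbors.KDTree(data, NearestNeighbors.Euclidean())
# index=true returns indices instead of the data points themselves
sample_indices = locality_sampler(data, dist_tree, 10, 5, 50; index=true)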
Citation
Skafte, N., Jørgensen, M. & Hauberg, S. Reliable training and estimation of variance networks. in Advances in Neural Information Processing Systems vol. 32 (Curran Associates, Inc., 2019).
Centroid Finding Utilities
Some VAE models, such as the RHVAE, require clustering of the data. Specifically, the RHVAE can take a fixed subset of the training data as a reference for computing the metric tensor. The following functions can be used to select this reference subset, whose points then serve as centroids for the metric tensor computation.
AutoEncoderToolkit.utils.centroids_kmeans
centroids_kmeans(
    x::AbstractMatrix,
    n_centroids::Int;
    assign::Bool=false
)
Perform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).
Arguments
- x::AbstractMatrix: The input data. Columns represent individual samples.
- n_centroids::Int: The number of centroids to compute.
Optional Keyword Arguments
- assign::Bool=false: If true, also return the assignment of each point to a centroid.
Returns
- If assign is false, returns a matrix where each column is a centroid.
- If assign is true, returns a tuple whose first element is the matrix of centroids and whose second element is a vector of assignments.
Examples
# 100 samples (columns), each of dimension 10
data = rand(10, 100)
centroids = centroids_kmeans(data, 5)
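For intuition about what the centers are, here is a plain Lloyd's-algorithm k-means on a matrix whose columns are samples, returning the centers as columns. It is a sketch, not the library's implementation: the name kmeans_sketch, the farthest-first initialization, and the iteration cap are all assumptions.

```julia
using LinearAlgebra, Random

# Hypothetical Lloyd's-algorithm k-means; columns of X are samples.
function kmeans_sketch(X::AbstractMatrix{<:Real}, k::Int;
                       iters::Int=100, rng::AbstractRNG=Random.default_rng())
    n = size(X, 2)
    1 <= k <= n || throw(ArgumentError("need 1 ≤ k ≤ number of samples"))
    # Farthest-first initialization: a random sample, then repeatedly the
    # sample farthest from all centers chosen so far
    idx = [rand(rng, 1:n)]
    while length(idx) < k
        push!(idx, argmax([minimum(norm(X[:, j] - X[:, i]) for i in idx) for j in 1:n]))
    end
    C = float(X[:, idx])
    assign = zeros(Int, n)
    for _ in 1:iters
        # Assignment step: index of the nearest center for each sample
        new_assign = [argmin([norm(X[:, j] - C[:, c]) for c in 1:k]) for j in 1:n]
        new_assign == assign && break
        assign = new_assign
        # Update step: move each center to the mean of its members
        for c in 1:k
            members = findall(==(c), assign)
            isempty(members) || (C[:, c] = vec(sum(X[:, members]; dims=2)) ./ length(members))
        end
    end
    return C, assign
end
```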
centroids_kmeans(
x::AbstractArray,
n_centroids::Int;
reshape_centroids::Bool=true,
assign::Bool=false
)
Perform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).
Because k-means operates on points in a vector space and cannot handle multi-dimensional samples directly, each sample (a slice along the last dimension of x) is first flattened into a vector, so the input becomes a matrix with one column per sample.
By default, the resulting centroids are reshaped back to the original input shape; this is controlled by the reshape_centroids argument.
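The flatten-and-reshape round trip can be sketched with reshape alone. flatten_samples and unflatten_centroids are hypothetical helper names; because Julia arrays are column-major, each column of the flattened matrix is exactly one sample, so reshaping the centroid matrix back restores the per-sample shape.

```julia
# Hypothetical helpers; the last dimension of x indexes the samples.
# Flatten each sample into one column of a (product of sample dims) × n matrix
flatten_samples(x::AbstractArray) = reshape(x, :, size(x, ndims(x)))

# Reshape a matrix of centroids (one per column) back to the sample shape of x
unflatten_centroids(C::AbstractMatrix, x::AbstractArray) =
    reshape(C, size(x)[1:end-1]..., size(C, 2))

x = rand(4, 5, 100)          # 100 samples of size 4×5
X = flatten_samples(x)       # 20×100 matrix, one sample per column
```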
Arguments
- x::AbstractArray: The input data. It can be a multi-dimensional array whose last dimension indexes individual samples.
- n_centroids::Int: The number of centroids to compute.
Optional Keyword Arguments
- reshape_centroids::Bool=true: If true, reshape the output centroids back to the original input shape.
- assign::Bool=false: If true, also return the assignment of each point to a centroid.
Returns
- If assign is false, returns the centroids: a matrix where each column is a centroid or, when reshape_centroids is true, an array whose last dimension indexes the centroids.
- If assign is true, returns a tuple whose first element is the centroids and whose second element is a vector of assignments.
Examples
# 100 samples of size 28×28, stored along the last dimension
data = rand(28, 28, 100)
centroids = centroids_kmeans(data, 5)
AutoEncoderToolkit.utils.centroids_kmedoids
centroids_kmedoids(
    x::AbstractMatrix, n_centroids::Int; assign::Bool=false
)
Perform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).
Arguments
- x::AbstractMatrix: The input data. Columns represent individual samples.
- n_centroids::Int: The number of centroids to compute.
- dist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use when computing the pairwise distance matrix.
Optional Keyword Arguments
- assign::Bool=false: If true, also return the assignment of each point to a centroid.
Returns
- If assign is false, returns a matrix where each column is a centroid.
- If assign is true, returns a tuple whose first element is the matrix of centroids and whose second element is a vector of assignments.
Examples
# 100 samples (columns), each of dimension 10
data = rand(10, 100)
centroids = centroids_kmedoids(data, 5)
centroids_kmedoids(
x::AbstractArray,
n_centroids::Int,
dist::Distances.PreMetric=Distances.Euclidean();
assign::Bool=false
)
Perform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).
Arguments
- x::AbstractArray: The input data. The last dimension of x should index the samples to be clustered.
- n_centroids::Int: The number of centroids to compute.
- dist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use for the clustering. Defaults to Euclidean distance.
Optional Keyword Arguments
- assign::Bool=false: If true, also return the assignment of each point to a centroid.
Returns
- If assign is false, returns an array where each column is a centroid.
- If assign is true, returns a tuple whose first element is the array of centroids and whose second element is a vector of assignments.
Examples
data = rand(10, 100)
centroids = centroids_kmedoids(data, 5)
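Unlike k-means, k-medoids restricts every center to be an actual sample. The following alternating k-medoids sketch on Euclidean distances illustrates that idea; the name kmedoids_sketch, the farthest-first initialization, and the simple alternating update (rather than a PAM-style swap search) are assumptions, not a description of the library's implementation.

```julia
using LinearAlgebra, Random

# Hypothetical alternating k-medoids; columns of X are samples and the
# distance is Euclidean. Returns the medoids (actual samples) and assignments.
function kmedoids_sketch(X::AbstractMatrix{<:Real}, k::Int;
                         iters::Int=100, rng::AbstractRNG=Random.default_rng())
    n = size(X, 2)
    1 <= k <= n || throw(ArgumentError("need 1 ≤ k ≤ number of samples"))
    # Pairwise distance matrix
    D = [norm(X[:, i] - X[:, j]) for i in 1:n, j in 1:n]
    # Farthest-first initialization of the medoid indices
    medoids = [rand(rng, 1:n)]
    while length(medoids) < k
        push!(medoids, argmax([minimum(D[medoids, j]) for j in 1:n]))
    end
    # Label of the nearest medoid for every sample
    nearest(meds) = [argmin(D[meds, j]) for j in 1:n]
    for _ in 1:iters
        assign = nearest(medoids)
        new_medoids = copy(medoids)
        for c in 1:k
            members = findall(==(c), assign)
            # New medoid: the member with the smallest total distance to the others
            isempty(members) ||
                (new_medoids[c] = members[argmin(vec(sum(D[members, members]; dims=1)))])
        end
        new_medoids == medoids && break
        medoids = new_medoids
    end
    return X[:, medoids], nearest(medoids)
end
```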
Other Utilities
AutoEncoderToolkit.utils.storage_type
storage_type(A::AbstractArray)
Determine the storage type of an array.
This function recursively checks the parent of the array until it finds the base storage type. This is useful for determining whether an array or its subarrays are stored on the CPU or GPU.
Arguments
- A::AbstractArray: The array whose storage type is to be determined.
Returns
The type of the array that is the base storage of A.
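The recursive lookup described above can be sketched by following Base's parent function until an array is its own parent; Base documents that parent returns its input when the input is not a wrapper. storage_type_sketch is a hypothetical name, and the sketch is only exercised on CPU arrays here.

```julia
# Hypothetical sketch: follow `parent` until an array wraps nothing else.
function storage_type_sketch(A::AbstractArray)
    P = parent(A)
    # An unwrapped array is its own parent, so its type is the storage type
    return P === A ? typeof(A) : storage_type_sketch(P)
end

x = rand(3, 4)
# A reshaped view of x: two layers of wrappers around the same storage
storage_type_sketch(reshape(view(x, 1:2, :), 4, 2))  # Matrix{Float64}
```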
AutoEncoderToolkit.utils.vec_to_ltri
vec_to_ltri(diag::AbstractVecOrMat, lower::AbstractVecOrMat)
Convert two vectors (or two matrices) into a lower triangular matrix (or a 3D tensor of lower triangular matrices).
Arguments
- diag::AbstractVecOrMat: The vector or matrix providing the diagonal of the matrix. If it's a matrix, each column is treated as a separate vector.
- lower::AbstractVecOrMat: The vector or matrix providing the elements below the diagonal. The length of this vector, or the number of rows of this matrix, should be a triangular number (i.e., the sum of the first n natural numbers for some n); for a matrix with d diagonal entries, this is d(d−1)/2. If it's a matrix, each column is the lower part of a separate lower triangular matrix.
Returns
- A lower triangular matrix, or a 3D tensor where each slice is a lower triangular matrix, constructed from diag and lower.
Description
This function constructs a lower triangular matrix or a 3D tensor from two input vectors or matrices, diag and lower. The diag vector or matrix provides the diagonal elements, while the lower vector or matrix provides the elements below the diagonal. The function uses a comprehension to construct the matrix or tensor, with the lower_index function calculating the appropriate index in lower for each element below the diagonal.
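The comprehension-based construction can be sketched as follows. The ordering of the below-diagonal entries in lower (row by row: (2,1), (3,1), (3,2), (4,1), ...) is an assumption, as are the names lower_index_sketch and vec_to_ltri_sketch; the library's lower_index may order entries differently.

```julia
# Hypothetical index of element (i, j), i > j, when the strictly lower
# triangle is listed row by row: (2,1), (3,1), (3,2), (4,1), ...
lower_index_sketch(i::Int, j::Int) = (i - 1) * (i - 2) ÷ 2 + j

function vec_to_ltri_sketch(diag::AbstractVector{T}, lower::AbstractVector{T}) where {T<:Number}
    n = length(diag)
    length(lower) == n * (n - 1) ÷ 2 ||
        throw(DimensionMismatch("lower must have n(n-1)/2 = $(n * (n - 1) ÷ 2) elements"))
    # Comprehension over (i, j): diagonal from diag, below it from lower, zeros above
    return [i == j ? diag[i] : (i > j ? lower[lower_index_sketch(i, j)] : zero(T))
            for i in 1:n, j in 1:n]
end

# Matrix inputs: column k of diag and lower define slice k of the 3D output
function vec_to_ltri_sketch(diag::AbstractMatrix{T}, lower::AbstractMatrix{T}) where {T<:Number}
    slices = [vec_to_ltri_sketch(d, l) for (d, l) in zip(eachcol(diag), eachcol(lower))]
    return cat(slices...; dims=3)
end

L = vec_to_ltri_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# [1.0 0.0 0.0; 4.0 2.0 0.0; 5.0 6.0 3.0]
```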
GPU Support
The function supports both CPU and GPU arrays. For GPU arrays, the data is first transferred to the CPU, the lower triangular matrix or tensor is constructed, and then it is transferred back to the GPU.
AutoEncoderToolkit.utils.vec_mat_vec_batched
vec_mat_vec_batched(
    v::AbstractVector,
    M::AbstractMatrix,
    w::AbstractVector
)
Compute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲.
This function takes two vectors v and w and a matrix M, and computes the product v̲ᵀ M̲̲ w̲. It is provided so that the same function name covers both the single-vector case and the batched case through multiple dispatch.
Arguments
- v::AbstractVector: A d-dimensional vector.
- M::AbstractMatrix: A d×d matrix.
- w::AbstractVector: A d-dimensional vector.
Returns
A scalar equal to v̲ᵀ M̲̲ w̲ for the given vectors and matrix.
Notes
This function uses LinearAlgebra.dot to compute the result. Mathematically, this is equivalent to multiplying M by w and then multiplying the resulting vector element-wise with v and summing the entries.
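The equivalence stated in the Notes can be checked with LinearAlgebra's three-argument dot(x, A, y), which computes xᵀAy without allocating the intermediate product M * w (available since Julia 1.4). The values below are arbitrary illustrations.

```julia
using LinearAlgebra

v = [1.0, 2.0, 3.0]
M = [2.0 0.0 1.0; 0.0 1.0 0.0; 1.0 0.0 3.0]
w = [0.5, -1.0, 2.0]

# Three-argument dot: vᵀ M w in a single call
s1 = dot(v, M, w)            # 20.5
# Same value: multiply M by w, then multiply element-wise with v and sum
s2 = sum(v .* (M * w))       # 20.5
```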
vec_mat_vec_batched(
v::AbstractMatrix,
M::AbstractArray,
w::AbstractMatrix
)
Compute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲.
This function takes two matrices v and w and a 3D array M, and computes the batched product v̲ᵀ M̲̲ w̲ for each column of v and w and the corresponding slice of M. The computation is performed in a broadcasted manner using the Flux.batched_vec function.
Arguments
- v::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.
- M::AbstractArray: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices.
- w::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.
Returns
A vector of length n whose k-th element is v̲ᵀ M̲̲ w̲ computed from the k-th column of v, the k-th slice of M, and the k-th column of w.
Notes
This function uses the Flux.batched_vec function to multiply each matrix in M by the corresponding vector in w. The resulting vectors are then multiplied element-wise with the vectors in v and summed along the first (d) dimension to obtain the final result.
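The batched computation described above can be sketched without Flux by looping over the batch; vec_mat_vec_batched_sketch is a hypothetical name, and the explicit comprehension stands in for Flux.batched_vec. The result is compared against LinearAlgebra's three-argument dot for each slice.

```julia
using LinearAlgebra

# Hypothetical loop-based equivalent of the batched product; no Flux needed.
# v, w: d×n matrices; M: d×d×n array. Returns a length-n vector.
function vec_mat_vec_batched_sketch(v::AbstractMatrix, M::AbstractArray{<:Number,3},
                                    w::AbstractMatrix)
    n = size(M, 3)
    size(v, 2) == size(w, 2) == n || throw(DimensionMismatch("batch sizes differ"))
    # Column k of Mw is M[:, :, k] * w[:, k], the batched matrix-vector product
    Mw = reduce(hcat, [M[:, :, k] * w[:, k] for k in 1:n])
    # Multiply element-wise with v and sum along the first (d) dimension
    return vec(sum(v .* Mw; dims=1))
end
```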
vec_mat_vec_batched(
v::AbstractVector{T},
M::AbstractMatrix{S},
w::AbstractVector{T}
) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}
Compute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲ when the vectors hold TaylorDiff scalars.
This function takes two vectors v and w with element type TaylorDiff.TaylorScalar{Float32,2}, and a matrix M with numeric elements, and computes the product v̲ᵀ M̲̲ w̲. It first performs the matrix-vector multiplication M̲̲ w̲ and then computes the dot product of the resulting vector with v.
Arguments
- v::AbstractVector{T}: A d-dimensional vector. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.
- M::AbstractMatrix{S}: A d×d matrix. S is a subtype of Number.
- w::AbstractVector{T}: A d-dimensional vector. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.
Returns
A scalar equal to v̲ᵀ M̲̲ w̲.
Notes
This function uses the dot function to compute the final dot product.
vec_mat_vec_batched(
v::AbstractMatrix{T},
M::AbstractArray{S,3},
w::AbstractMatrix{T}
) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}
Compute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲ when the vectors hold TaylorDiff scalars.
This function takes two matrices v and w with element type TaylorDiff.TaylorScalar{Float32,2}, and a 3D array M with numeric elements, and computes the batched product v̲ᵀ M̲̲ w̲. It extracts each slice of M and each column of w, performs the matrix-vector multiplication for each pair, and finally multiplies the resulting matrix element-wise with v and sums along the first dimension.
Arguments
- v::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.
- M::AbstractArray{S,3}: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices. S is a subtype of Number.
- w::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.
Returns
A vector of length n whose k-th element is v̲ᵀ M̲̲ w̲ computed from the k-th column of v, the k-th slice of M, and the k-th column of w.
Notes
This function uses the eachslice and eachcol functions to extract the slices of M and the columns of w, respectively. It then uses a comprehension to perform the matrix-vector multiplication for each pair, multiplies the resulting matrix element-wise with v, and sums along the first dimension to obtain the final result.
AutoEncoderToolkit.utils.slogdet
slogdet(A::AbstractArray{T}; check::Bool=false) where {T<:Number}
Compute the log determinant of a positive-definite matrix A or of a 3D array of such matrices.
Arguments
- A::AbstractArray{T}: A positive-definite matrix, or a 3D array of positive-definite matrices, whose log determinant is to be computed.
- check::Bool=false: Whether to check that the input matrix A is positive-definite. Defaults to false because the check can fail due to numerical instability.
Returns
- The log determinant of A. If A is a 3D array, returns a 1D array of log determinants, one for each slice along the third dimension of A.
Description
This function computes the log determinant of a positive-definite matrix A or a 3D array of such matrices. It first computes the Cholesky decomposition A = LLᵀ, and then calculates the log determinant as twice the sum of the logs of the diagonal elements of the lower triangular factor L.
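The Cholesky route described above can be sketched on the CPU with LinearAlgebra.cholesky and checked against LinearAlgebra.logdet. slogdet_sketch is a hypothetical name, and wrapping the input in Symmetric (so that only one triangle is read) is an assumption of this sketch.

```julia
using LinearAlgebra

# Hypothetical sketch: log det A = 2 Σᵢ log Lᵢᵢ, where A = L Lᵀ
function slogdet_sketch(A::AbstractMatrix{<:Real}; check::Bool=false)
    L = cholesky(Symmetric(A); check=check).L
    return 2 * sum(log, diag(L))
end

# 3D input: one log determinant per slice along the third dimension
slogdet_sketch(A::AbstractArray{<:Real,3}; check::Bool=false) =
    [slogdet_sketch(S; check=check) for S in eachslice(A; dims=3)]

A = [4.0 2.0; 2.0 3.0]   # det(A) = 8
```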
Conditions
The input matrix A must be positive-definite, i.e., it must be symmetric and all its eigenvalues must be positive. If check is set to true, the function throws an error if A is not positive-definite.
GPU Support
The function supports both CPU and GPU arrays.
AutoEncoderToolkit.utils.sample_MvNormalCanon
sample_MvNormalCanon(Σ⁻¹::AbstractArray{T}) where {T<:Number}
Draw a random sample from a multivariate normal distribution in canonical form.
Arguments
- Σ⁻¹::AbstractArray{T}: The precision matrix (inverse of the covariance matrix) of the multivariate normal distribution. This can be a 2D array (matrix) or a 3D array.
Returns
- A random sample drawn from the multivariate normal distribution specified by the input precision matrix. If Σ⁻¹ is a 3D array, returns a 2D array of samples, one for each slice along the third dimension of Σ⁻¹.
Description
This function draws a random sample from a multivariate normal distribution specified by a precision matrix Σ⁻¹. The precision matrix can be a 2D array (matrix) or a 3D array; if Σ⁻¹ is a 3D array, the function draws one sample for each slice along its third dimension.
The function first inverts the precision matrix to obtain the covariance matrix and then performs a Cholesky decomposition of the covariance matrix. It then draws a sample from a standard normal distribution and multiplies it by the lower triangular matrix from the Cholesky decomposition to obtain the final sample.
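The procedure above can be sketched on the CPU as follows, assuming a zero mean (the docstring does not mention a mean or potential vector). sample_MvNormalCanon_sketch is a hypothetical name. The key step is that if z ~ N(0, I) and Σ = LLᵀ, then Lz has covariance LLᵀ = Σ.

```julia
using LinearAlgebra, Random

# Hypothetical sketch: zero-mean sample from N(0, Σ) given the precision Σ⁻¹
function sample_MvNormalCanon_sketch(Σ⁻¹::AbstractMatrix{<:Real};
                                     rng::AbstractRNG=Random.default_rng())
    # Invert the precision matrix to get the covariance
    Σ = Symmetric(inv(Matrix(Σ⁻¹)))
    # Lower Cholesky factor: Σ = L * Lᵀ
    L = cholesky(Σ).L
    # If z ~ N(0, I), then L * z ~ N(0, L * Lᵀ) = N(0, Σ)
    return L * randn(rng, size(Σ, 1))
end

# 3D input: one sample per slice, returned as the columns of a matrix
sample_MvNormalCanon_sketch(Σ⁻¹::AbstractArray{<:Real,3};
                            rng::AbstractRNG=Random.default_rng()) =
    reduce(hcat, [sample_MvNormalCanon_sketch(P; rng=rng) for P in eachslice(Σ⁻¹; dims=3)])
```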
GPU Support
The function supports both CPU and GPU arrays.
AutoEncoderToolkit.utils.unit_vector
unit_vector(x::AbstractVector, i::Int)
Create a unit vector of the same length as x with the i-th element set to 1.
Arguments
- x::AbstractVector: The vector whose length determines the dimension of the unit vector.
- i::Int: The index of the element to be set to 1.
Returns
- A unit vector with element type eltype(x) and the same length as x, with the i-th element set to 1.
Description
This function creates a unit vector of the same length as x with the i-th element set to 1. All other elements are set to 0.
Note
This function is marked with the @ignore_derivatives macro from the ChainRulesCore package, so automatic differentiation backends that use ChainRules (for example, Zygote) ignore calls to this function when computing gradients.
unit_vector(x::AbstractMatrix, i::Int)
Create a unit vector whose length equals the number of rows in x, with the i-th element set to 1.
Arguments
- x::AbstractMatrix: The matrix whose number of rows determines the dimension of the unit vector.
- i::Int: The index of the element to be set to 1.
Returns
- A unit vector with element type eltype(x) and length equal to the number of rows in x, with the i-th element set to 1.
Description
This function creates a unit vector whose length equals the number of rows in x, with the i-th element set to 1. All other elements are set to 0.
AutoEncoderToolkit.utils.finite_difference_gradient
finite_difference_gradient(
    f::Function,
    x::AbstractVecOrMat;
    fdtype::Symbol=:central
)
Compute the finite difference gradient of a function f at a point x.
Arguments
- f::Function: The function whose gradient is to be computed. It must return a scalar value.
- x::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column is a point at which f is evaluated and its gradient computed.
Optional Keyword Arguments
- fdtype::Symbol=:central: The finite difference type, either :forward or :central. Defaults to :central.
Returns
- A vector or a matrix representing the gradient of f at x, depending on the input type of x.
Description
This function computes the finite difference gradient of a function f at a point x. For a vector x, the i-th element of the gradient is the partial derivative of f with respect to the i-th element of x; for a matrix x, each column of the result is the gradient at the corresponding column of x.
The partial derivatives are computed using the forward or central difference formula, depending on the fdtype argument:
- Forward difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x)] / ε
- Central difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x - ε * eᵢ)] / (2ε)
where ε is the step size and eᵢ is the i-th unit vector.
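Both difference formulas can be sketched on the CPU as follows. unit_vector_sketch and finite_difference_gradient_sketch are hypothetical names, and the default step size ε = √eps is an assumption; the library's step size is not stated here.

```julia
# Hypothetical stand-in for unit_vector: zeros of eltype(x) with a one at position i
unit_vector_sketch(x::AbstractVector, i::Int) =
    [j == i ? one(eltype(x)) : zero(eltype(x)) for j in eachindex(x)]

# Hypothetical finite-difference gradient of a scalar function at a vector x.
# The step size ε = sqrt(eps) is an assumed default, not the library's choice.
function finite_difference_gradient_sketch(f::Function, x::AbstractVector{<:AbstractFloat};
                                           fdtype::Symbol=:central,
                                           ε::Real=sqrt(eps(eltype(x))))
    fdtype in (:forward, :central) ||
        throw(ArgumentError("fdtype must be :forward or :central"))
    grad = similar(x)
    for i in eachindex(x)
        e = unit_vector_sketch(x, i)
        if fdtype == :forward
            grad[i] = (f(x + ε * e) - f(x)) / ε                # forward difference
        else
            grad[i] = (f(x + ε * e) - f(x - ε * e)) / (2ε)     # central difference
        end
    end
    return grad
end

# Matrix input: each column is a point; gradients are returned as columns
finite_difference_gradient_sketch(f::Function, X::AbstractMatrix{<:AbstractFloat}; kwargs...) =
    reduce(hcat, [finite_difference_gradient_sketch(f, collect(c); kwargs...) for c in eachcol(X)])
```

For f(x) = sum(abs2, x) the exact gradient is 2x, which makes the sketch easy to check.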
GPU Support
This function supports both CPU and GPU arrays.
AutoEncoderToolkit.utils.taylordiff_gradient
taylordiff_gradient(
    f::Function,
    x::AbstractVecOrMat
)
Compute the gradient of a function f at a point x using Taylor series differentiation.
Arguments
- f::Function: The function whose gradient is to be computed. This must be a scalar function.
- x::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column is a point at which f is evaluated and its gradient computed.
Returns
- A vector or a matrix representing the gradient of f at x, depending on the input type of x.
Description
This function computes the gradient of a function f at a point x using Taylor series differentiation. For a vector x, the i-th element of the gradient is the partial derivative of f with respect to the i-th element of x; for a matrix x, each column of the result is the gradient at the corresponding column of x.
The partial derivatives are computed using the TaylorDiff.derivative function.
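To illustrate the idea without depending on TaylorDiff.jl, the sketch below propagates a Taylor expansion truncated at first order (equivalently, a dual number) through f, seeding one input direction at a time. FirstOrderTaylor and taylor_gradient_sketch are hypothetical names; TaylorDiff.jl's own types and API are not reproduced here, and only +, *, and sin are overloaded for the example.

```julia
# First-order truncated Taylor expansion val + der⋅t (i.e. a dual number).
# Hypothetical stand-in; TaylorDiff.jl's own types and API are not used here.
struct FirstOrderTaylor
    val::Float64  # function value
    der::Float64  # first-order coefficient (directional derivative)
end

Base.:+(a::FirstOrderTaylor, b::FirstOrderTaylor) = FirstOrderTaylor(a.val + b.val, a.der + b.der)
Base.:+(a::FirstOrderTaylor, c::Real) = FirstOrderTaylor(a.val + c, a.der)
Base.:+(c::Real, a::FirstOrderTaylor) = a + c
# Product rule
Base.:*(a::FirstOrderTaylor, b::FirstOrderTaylor) =
    FirstOrderTaylor(a.val * b.val, a.val * b.der + a.der * b.val)
Base.:*(c::Real, a::FirstOrderTaylor) = FirstOrderTaylor(c * a.val, c * a.der)
Base.:*(a::FirstOrderTaylor, c::Real) = c * a
# Chain rule for one elementary function
Base.sin(a::FirstOrderTaylor) = FirstOrderTaylor(sin(a.val), cos(a.val) * a.der)

# Gradient via one forward pass per direction eᵢ: seed der = 1 on input i
taylor_gradient_sketch(f::Function, x::AbstractVector{<:Real}) =
    [f([FirstOrderTaylor(x[j], j == i ? 1.0 : 0.0) for j in eachindex(x)]).der
     for i in eachindex(x)]

# Matrix input: each column is a point; gradients are returned as columns
taylor_gradient_sketch(f::Function, X::AbstractMatrix{<:Real}) =
    reduce(hcat, [taylor_gradient_sketch(f, c) for c in eachcol(X)])
```

For g(x) = x₁x₂ + sin(x₁) + 2x₂ the exact gradient is [x₂ + cos(x₁), x₁ + 2], which the sketch reproduces.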
GPU Support
This function currently only supports CPU arrays.