BYOL is a type of SSL that uses two networks: an online network and a target network. The key thing is that Self-supervised learning generates training signals from data structure itself, so both networks process views sampled from the same data distribution. Unlike the approach in Contrastive learning prevents model collapse by pushing apart positive and negative examples, BYOL uses no negative pairs; it prevents model collapse through asymmetric prediction and slow target updates instead.

The online network consists of an encoder, a projector, and a predictor. The target network has the same encoder and projector architecture but no predictor, and its weights update via an EMA of the online network’s parameters, because Exponential Moving Average creates stable learning targets in self-supervised systems. Both process differently augmented views of the same input image, and the online network predicts the target’s output in latent space rather than in pixels, which is beneficial because Predicting abstract representations reduces computational waste compared to pixel-level prediction.
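
Here is a minimal sketch of that structure, assuming PyTorch; the tiny Flatten-plus-Linear encoder and the MLP sizes are placeholders I chose for illustration, standing in for the ResNet backbone and the paper’s projector/predictor:

```python
import copy
import torch
import torch.nn as nn

# Minimal sketch, not the official BYOL code: the encoder below is a stand-in
# for a ResNet backbone, and the MLP sizes are placeholder assumptions.
class MLP(nn.Module):
    def __init__(self, in_dim, hidden_dim=4096, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

# Online network: encoder -> projector -> predictor.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))
projector = MLP(512)
predictor = MLP(256)

# Target network: copies of the encoder and projector only (no predictor),
# never updated by gradients, only by an EMA of the online parameters.
target_encoder = copy.deepcopy(encoder)
target_projector = copy.deepcopy(projector)
for p in list(target_encoder.parameters()) + list(target_projector.parameters()):
    p.requires_grad = False

@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, tau: float = 0.996):
    """Target update: xi <- tau * xi + (1 - tau) * theta."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(tau).add_(p_online, alpha=1 - tau)
```

Copying the modules with `copy.deepcopy` and freezing their gradients reflects the point above: the target network is never trained directly, it only drifts slowly toward the online network through `ema_update`.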

GPT came up with this example that helped me understand it:

Consider a single image of a bouncing ball, augmented into two views: $v$ (cropped left, color-distorted) and $v'$ (flipped, blurred).

  • Forward Pass 1: Feed $v$ to the online network (parameters $\theta$): encoder $f_\theta$, projector $g_\theta$, predictor $q_\theta$. Feed $v'$ to the target network (parameters $\xi$): encoder $f_\xi$, projector $g_\xi$.
  • Loss Computation: Minimize the normalized L2 distance between the online prediction and the target projection, $\mathcal{L}_{\theta,\xi} = 2 - 2\,\frac{\langle q_\theta(z_\theta),\, z'_\xi \rangle}{\|q_\theta(z_\theta)\|_2 \,\|z'_\xi\|_2}$, forcing the online prediction to match the target’s stable representation.
  • Symmetric Pass: Swap the inputs, $v'$ to the online network and $v$ to the target network, to get $\tilde{\mathcal{L}}_{\theta,\xi}$, and average the two losses: $\mathcal{L}^{\mathrm{BYOL}}_{\theta,\xi} = \tfrac{1}{2}\big(\mathcal{L}_{\theta,\xi} + \tilde{\mathcal{L}}_{\theta,\xi}\big)$.
  • Updates: The online parameters $\theta$ are optimized via gradient descent on $\mathcal{L}^{\mathrm{BYOL}}_{\theta,\xi}$; the target parameters follow $\xi \leftarrow \tau\xi + (1-\tau)\theta$, which, based on Exponential Moving Average creates stable learning targets in self-supervised systems, ensures slow, stable evolution. The whole step is sketched in code below.
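
Continuing the sketch above (reusing the assumed names `encoder`, `projector`, `predictor`, `target_encoder`, `target_projector`, and `ema_update`), here is roughly what one training step on the two views could look like; `byol_loss` and `training_step` are names I made up for illustration, not the paper’s code:

```python
import torch
import torch.nn.functional as F

def byol_loss(p, z):
    """Squared L2 distance between the normalized vectors,
    which equals 2 - 2 * cosine similarity."""
    p = F.normalize(p, dim=-1)
    z = F.normalize(z, dim=-1)
    return (2 - 2 * (p * z).sum(dim=-1)).mean()

def training_step(v1, v2, optimizer, tau=0.996):
    # Forward pass 1: v1 through the online branch, v2 through the target branch.
    p1 = predictor(projector(encoder(v1)))
    with torch.no_grad():                        # target branch gets no gradients
        t2 = target_projector(target_encoder(v2))

    # Symmetric pass: swap the views between the branches.
    p2 = predictor(projector(encoder(v2)))
    with torch.no_grad():
        t1 = target_projector(target_encoder(v1))

    # Average the two directional losses.
    loss = 0.5 * (byol_loss(p1, t2) + byol_loss(p2, t1))

    # Online parameters update by gradient descent on the loss...
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # ...target parameters update only through the slow EMA.
    ema_update(encoder, target_encoder, tau)
    ema_update(projector, target_projector, tau)
    return loss.item()
```

For a quick test, something like `optimizer = torch.optim.Adam([*encoder.parameters(), *projector.parameters(), *predictor.parameters()], lr=3e-4)` would do; the BYOL paper itself trains with LARS at large batch sizes.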