Exponential Moving Averages (EMAs) create stable learning targets in self-supervised learning (SSL) methods like BYOL by maintaining a slowly updated copy of the model parameters, which smooths out rapid fluctuations in the online network's updates. This keeps the online network from falling into unstable feedback loops where the model chases its own noisy predictions.
An EMA computes a weighted average of the parameters over time, giving exponentially decreasing weight to older values: after each training step, the target parameters are updated as ξ ← τ·ξ + (1 − τ)·θ, where θ are the online parameters and τ ∈ [0, 1) is a decay rate close to 1. Without EMA, using the online network itself as the target can cause representation collapse; the slowly moving average instead provides reliable supervision even as the online network updates aggressively.
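As a minimal sketch (not BYOL's reference code), here is what this update can look like in PyTorch; the `ema_update` helper, the `tau=0.996` decay, and the toy `Linear` encoder are illustrative assumptions:

```python
import copy
import torch

@torch.no_grad()
def ema_update(target_net, online_net, tau=0.996):
    """Update target parameters in place: xi <- tau * xi + (1 - tau) * theta."""
    for xi, theta in zip(target_net.parameters(), online_net.parameters()):
        xi.mul_(tau).add_(theta, alpha=1 - tau)

# Hypothetical usage: the target starts as a frozen copy of the online network.
online = torch.nn.Linear(128, 64)   # stand-in for the online encoder
target = copy.deepcopy(online)
for p in target.parameters():
    p.requires_grad_(False)         # target receives no gradients

# ... after each optimizer step on the online network:
ema_update(target, online, tau=0.996)
```

Because τ is close to 1, the target moves only a small fraction toward the online weights at each step, which is exactly what keeps the regression targets slowly varying.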