Unlike most ML models, which output a probability distribution over the possible outcomes, energy-based models (EBMs) assign a single scalar energy that measures how compatible a pair of inputs is. For example, if y is a good continuation of x, an EBM outputs a low energy for the pair, whereas if y is a bad continuation the energy is high. Think of it like magnets attracting vs. repelling. The efficiency argument for these EBMs over generative Transformers is that predicting abstract representations wastes less computation than pixel-level prediction.
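To make the low-energy vs. high-energy idea concrete, here is a minimal toy sketch. It is not a trained EBM: it assumes the two inputs have already been turned into equal-length representation vectors, and it uses squared Euclidean distance as a stand-in energy. The names energy, pick_lowest_energy, context, and candidates, as well as the vectors themselves, are made up for illustration.

```python
# Toy illustration of an energy function over a pair of inputs.
# Assumption: both inputs are already represented as equal-length vectors.
# Squared distance is a stand-in energy; a real EBM would learn this function.

def energy(x_repr, y_repr):
    """Return the squared Euclidean distance between two representations."""
    if len(x_repr) != len(y_repr):
        raise ValueError("representations must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x_repr, y_repr))


def pick_lowest_energy(x_repr, candidates):
    """Return the name of the candidate with the lowest energy against x_repr."""
    return min(candidates, key=lambda name: energy(x_repr, candidates[name]))


# Hypothetical hand-picked representations, not outputs of any real model.
context = [1.0, 0.0, 2.0]
candidates = {
    "good_continuation": [1.0, 0.5, 2.0],  # close to the context
    "bad_continuation": [-3.0, 4.0, 0.0],  # far from the context
}

for name, vec in candidates.items():
    print(name, energy(context, vec))
print("chosen:", pick_lowest_energy(context, candidates))
```

Note that nothing here is a probability: the energies are not normalized and do not need to sum to one. Only their relative order matters when choosing the most compatible candidate.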