Self-supervised learning (SSL) is a training paradigm that often accompanies JEPA; it lets the model learn meaningful representations from the data itself rather than from human-provided labels. One popular SSL technique is BYOL (Bootstrap Your Own Latent), which uses an online network and a target network to prevent representation collapse. SSL can learn from the structure of the data because it exploits invariances and relationships inherent in it, such as temporal continuity in videos or spatial consistency in images. And because predicting abstract representations wastes less computation than pixel-level prediction, this style of SSL can concentrate on learning robust features from the raw structure of the data.
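
To make the online/target structure concrete, here is a minimal PyTorch sketch of a BYOL-style setup. It is not the exact recipe from the BYOL paper: the backbone, projector and predictor sizes, the EMA decay, and the random "views" are illustrative assumptions. The online branch is trained to predict the target branch's projection of another augmented view, the target receives no gradients (a stop-gradient), and its weights slowly track the online weights via an exponential moving average, which is what keeps the representations from collapsing to a constant.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Small projection/prediction head, as used in BYOL-style setups (sizes are assumptions)."""
    def __init__(self, in_dim, hidden_dim=512, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class BYOL(nn.Module):
    def __init__(self, encoder, feat_dim, ema_decay=0.99):
        super().__init__()
        # Online branch: encoder + projector + predictor, trained by gradient descent.
        self.online_encoder = encoder
        self.online_projector = MLP(feat_dim)
        self.predictor = MLP(128, out_dim=128)
        # Target branch: a momentum (EMA) copy of the online branch; it is never backpropagated.
        self.target_encoder = copy.deepcopy(encoder)
        self.target_projector = copy.deepcopy(self.online_projector)
        for p in list(self.target_encoder.parameters()) + list(self.target_projector.parameters()):
            p.requires_grad = False
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_target(self):
        # Slowly move target weights toward the online weights (exponential moving average).
        for online, target in [(self.online_encoder, self.target_encoder),
                               (self.online_projector, self.target_projector)]:
            for po, pt in zip(online.parameters(), target.parameters()):
                pt.data = self.ema_decay * pt.data + (1.0 - self.ema_decay) * po.data

    def loss(self, view_a, view_b):
        # Online branch predicts the target branch's representation of the other view.
        pred = self.predictor(self.online_projector(self.online_encoder(view_a)))
        with torch.no_grad():  # stop-gradient: the target only provides a fixed regression target
            targ = self.target_projector(self.target_encoder(view_b))
        # Negative cosine similarity between the two representations.
        return -F.cosine_similarity(pred, targ, dim=-1).mean()

# Hypothetical usage: `backbone` and the two augmented views are placeholders.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU(), nn.Linear(256, 256))
model = BYOL(backbone, feat_dim=256)
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=3e-4)

view_a, view_b = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
loss = 0.5 * (model.loss(view_a, view_b) + model.loss(view_b, view_a))  # symmetrised loss
opt.zero_grad(); loss.backward(); opt.step()
model.update_target()  # EMA update after each optimiser step
```

The asymmetry in this sketch, a predictor on the online side plus a stop-gradient and EMA on the target side, is the mechanism that prevents both branches from trivially outputting the same constant vector, which is why no negative pairs are needed.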