Models

Core model pre-training and model architecture developments.

[Photo: Group leaders Venkatram Vishwanath and Sam Foreman]
Venkatram Vishwanath and Sam Foreman lead this team, which pursues core model pre-training and model architecture developments, such as mixture-of-experts (MoE) studies, long-context studies, alternative attention models, architecture and hyperparameter A/B testing, and model scaling performance analysis. This work will provide improved understanding of these dimensions of model architecture and training processes, plus a series of raw models (7B, 70B, etc.) trained on large datasets.
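To illustrate one of the architecture directions named above, here is a minimal, self-contained sketch of a mixture-of-experts feed-forward layer with top-k routing. It is not the team's implementation; all names, dimensions, and the per-token routing loop are illustrative assumptions, using plain NumPy rather than a training framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Toy mixture-of-experts feed-forward layer with top-k routing.

    Illustrative only: real MoE layers batch tokens per expert and add
    load-balancing losses; this loops over tokens for clarity.
    """

    def __init__(self, d_model, d_hidden, n_experts, top_k=2):
        self.top_k = top_k
        # One two-layer MLP per expert.
        self.w1 = rng.normal(0.0, 0.02, (n_experts, d_model, d_hidden))
        self.w2 = rng.normal(0.0, 0.02, (n_experts, d_hidden, d_model))
        # Router projects each token to one score per expert.
        self.router = rng.normal(0.0, 0.02, (d_model, n_experts))

    def __call__(self, x):
        # x: (n_tokens, d_model)
        gate = softmax(x @ self.router)                    # (n_tokens, n_experts)
        top = np.argsort(gate, axis=-1)[:, -self.top_k:]   # top-k expert ids per token
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            weights = gate[t, top[t]]
            weights = weights / weights.sum()              # renormalize over the top-k
            for e, w in zip(top[t], weights):
                h = np.maximum(x[t] @ self.w1[e], 0.0)     # ReLU MLP of expert e
                out[t] += w * (h @ self.w2[e])
        return out

layer = MoELayer(d_model=16, d_hidden=32, n_experts=4, top_k=2)
y = layer(rng.normal(size=(8, 16)))
print(y.shape)  # (8, 16)
```

The key design point the sketch captures is sparsity: each token activates only `top_k` of the `n_experts` MLPs, so parameter count grows with the number of experts while per-token compute does not.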