Training a German LLM from Scratch 🦜
14 Nov. 2024
This article is not finished and will be updated. The research group I work with has access to a small GPU cluster, which occasionally sits idle. To avoid wasting valuable compute resources (idle GPUs essentially burn money through opportunity costs), I decided to train a German GPT-2-style model from scratch, using only German text.
Existing German models available on Hugging Face have 137M parameters and a context length of 1024 tokens¹, which is quite limited compared to recently released …
Tagged with: Deep Learning · Generative Models · LLM