You can now train your own reasoning model like DeepSeek-R1 on your local device!
Hey guys! Last week, we released the R1 Dynamic 1.58-bit quants so you can run it locally, and we couldn't thank you all enough for the love!
I run an open-source project, Unsloth, with my brother, and I previously worked at NVIDIA, so optimizations are my thing. Today, we're back to announce that you can now train your own reasoning model like R1 locally.
- R1 was trained with an algorithm called GRPO (Group Relative Policy Optimization), and we enhanced the entire training process so it uses 80% less VRAM.
- We're not trying to replicate the entire R1 model, as that's unrealistic (unless you're super rich). We're trying to recreate R1's chain-of-thought/reasoning/thinking process.
- We want the model to learn by itself, without us providing any reasoning for how it derives its answers. GRPO only scores the model's outputs, so the model figures out the reasoning on its own. This is called the "aha" moment.
- GRPO can improve accuracy for tasks in medicine, law, math, coding + more.
- You can transform Llama 3.1 (8B), Phi-4 (14B), or any open model into a reasoning model. You'll need a minimum of 7GB of VRAM to do it! (There's a rough code sketch of the setup right after this list.)
- In a test example below, even after just one hour of GRPO training on Phi-4 (Microsoft's open-source model), the new model developed a clear thinking process and produced correct answers—unlike the original model.
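If you're curious what this looks like in code, here's a rough sketch of a GRPO run using Unsloth together with TRL's GRPOTrainer. Everything in it (the model id, the GSM8K stand-in dataset, the toy reward function, the hyperparameters) is a placeholder to show the shape of the setup, not our exact recipe; the blog and notebooks have the real thing.

```python
# Minimal GRPO sketch with Unsloth + TRL -- illustrative only.
# Model id, dataset, reward function and hyperparameters are placeholders;
# follow the blog/notebooks for the exact, tested recipe.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Load an open model in 4-bit so it fits in a small amount of VRAM.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4",   # placeholder id; Llama 3.1 (8B) etc. work too
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# GSM8K as a stand-in math dataset: GRPOTrainer expects a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

# Toy reward: 1.0 if the gold final answer (after "####" in GSM8K) shows up in
# the completion, else 0.0. GRPO only ever sees these scores, never a worked-out
# reasoning trace, which is why the model has to discover its own chain of thought.
def correctness_reward(prompts, completions, answer, **kwargs):
    return [1.0 if a.split("####")[-1].strip() in c else 0.0
            for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[correctness_reward],
    args=GRPOConfig(
        per_device_train_batch_size=4,
        num_generations=4,          # completions sampled per prompt (the "group")
        max_completion_length=256,
        max_steps=250,
        learning_rate=5e-6,
        output_dir="outputs",
    ),
    train_dataset=dataset,
)
trainer.train()
```

The key thing to notice is that the reward function only returns scores for each sampled completion; it never hands the model any reasoning to copy.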
Read our really informative blog + guide: https://unsloth.ai/blog/r1-reasoning
To train locally, install Unsloth by following the installation instructions in the blog.
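For most setups the install is a single pip one-liner (but do check the blog, since the exact command can differ depending on your CUDA/PyTorch versions):

```bash
pip install unsloth
```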
I also know some of you don't have GPUs, but worry not: you can do it for free on Google Colab or Kaggle using the 15GB GPUs they provide at no cost.
We created a notebook + guide so you can train GRPO with Phi-4 (14B) for free on Google Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4_(14B)-GRPO.ipynb
Have a lovely weekend! :)