You can now train your own DeepSeek-R1 model on your local device!

Hey guys! Last week, we released our R1 Dynamic 1.58-bit quants so you can run it locally, and we couldn't thank you all enough for the love!

I run the open-source project Unsloth with my brother and previously worked at NVIDIA, so optimizations are my thing. Today, we're back to announce that you can now train your own reasoning model like R1 locally.

  1. R1 was trained with an algorithm called GRPO (Group Relative Policy Optimization), and we enhanced the entire process so it uses 80% less VRAM.
  2. We're not trying to replicate the entire R1 model, as that's unrealistic (unless you're super rich). Instead, we're recreating R1's chain-of-thought/reasoning/thinking process.
  3. We want the model to learn by itself, without us providing any explanation of how it should derive its answers. GRPO allows the model to figure out the reasoning autonomously. This is called the "aha" moment.
  4. GRPO can improve accuracy for tasks in medicine, law, math, coding + more.
  5. You can transform Llama 3.1 (8B), Phi-4 (14B) or any open model into a reasoning model. You'll need a minimum of 7GB of VRAM to do it! (There's a rough code sketch of this below the image.)
  6. In a test example below, even after just one hour of GRPO training on Phi-4 (Microsoft's open-source model), the new model developed a clear thinking process and produced correct answers—unlike the original model.

https://preview.redd.it/tnr6ep71nrhe1.jpg?width=3812&format=pjpg&auto=webp&s=1f800377c5bea1e905b0db68d16917e3cf173ff5
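If you're curious what the training loop roughly looks like before reading the blog, here's a minimal sketch using Unsloth together with TRL's GRPOTrainer. To be clear: the model name, dataset, hyperparameters, and the toy reward function below are my own illustrative assumptions, not our exact recipe, so follow the blog/notebook for the real thing.

```python
# Minimal GRPO sketch (illustrative only; the blog/notebook is the source of truth).
# Assumed installs: pip install unsloth trl datasets
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Load an open model in 4-bit so the whole run fits in a small VRAM budget.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",  # or Phi-4, etc.
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# GRPO needs reward functions, not labeled reasoning traces: it samples several
# completions per prompt and reinforces whichever ones score higher, so the model
# discovers its own chain of thought (the "aha" moment).
def correctness_reward(prompts, completions, answer, **kwargs):
    # Toy reward: 1.0 if the ground-truth final answer appears in the completion.
    return [1.0 if a.split("####")[-1].strip() in c else 0.0
            for c, a in zip(completions, answer)]

# GSM8K as a stand-in math dataset; GRPOTrainer expects a "prompt" column and
# passes the remaining columns (here, "answer") to the reward function.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[correctness_reward],
    args=GRPOConfig(
        per_device_train_batch_size=4,
        num_generations=4,       # completions sampled per prompt for the group baseline
        max_prompt_length=256,
        max_completion_length=512,
        max_steps=250,
        logging_steps=10,
        output_dir="outputs",
    ),
    train_dataset=dataset,
)
trainer.train()
```

The key design point is the reward function: you never show the model a worked solution, you only score its attempts, and the reasoning process emerges from that signal alone.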

Read our really informative blog + guide: https://unsloth.ai/blog/r1-reasoning

To train locally, install Unsloth by following the installation instructions in the blog post.
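For reference (the blog has the authoritative command for your CUDA/PyTorch combo, so treat this as an assumption on my part), a basic `pip install unsloth` plus a quick import check is usually enough to confirm the setup:

```python
# Sanity check after installing (e.g. pip install unsloth):
import torch
from unsloth import FastLanguageModel  # raises ImportError if the install is broken

print("CUDA available:", torch.cuda.is_available())
```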

I also know some of you don't have GPUs, but worry not: you can do it for free on Google Colab or Kaggle using the free 15GB GPUs they provide.
We created a notebook + guide so you can train GRPO with Phi-4 (14B) for free on Google Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4_(14B)-GRPO.ipynb
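Once the notebook is open, you can sanity-check which GPU you got (usually a ~15GB Tesla T4 on the free tier, though that's not guaranteed) with a quick snippet like:

```python
# Print the GPU Colab/Kaggle assigned you and its total VRAM.
import torch

gpu = torch.cuda.get_device_properties(0)
print(f"{gpu.name}: {gpu.total_memory / 1024**3:.1f} GB VRAM")
```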

Have a lovely weekend! :)