Let's build DeepSeek from Scratch | Taught by MIT PhD graduate
https://i.redd.it/vjwhw6ticwie1.gif
Join us for the 6pm YouTube premiere here: https://youtu.be/QWNxQIq0hMo?si=YVHJtgMRjlVj2SZJ
Ever since DeepSeek launched, everyone has been focused on:
- Flashy headlines
- Company wars
- Building LLM applications powered by DeepSeek
I strongly believe that students, researchers, engineers, and working professionals should focus on the foundations.
The real question we should ask ourselves is:
“Can I build the DeepSeek architecture and model myself, from scratch?”
If you ask this question, you will discover that DeepSeek depends on a handful of key ingredients (a small code sketch of one of them follows the list):
(1) Mixture of Experts (MoE)
(2) Multi-head Latent Attention (MLA)
(3) Rotary Positional Encodings (RoPE)
(4) Multi-token prediction (MTP)
(5) Supervised Fine-Tuning (SFT)
(6) Group Relative Policy Optimisation (GRPO)
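To give a taste of what “from scratch” means, here is a minimal sketch of ingredient (3), Rotary Positional Encodings, in plain PyTorch. The function name and the half-split pairing convention are my own illustrative choices, not DeepSeek’s actual implementation, but it captures the core idea: rotate query/key features by a position-dependent angle so attention scores depend on relative position.

```python
import torch

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply a simple RoPE to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    half = dim // 2
    # One frequency per feature pair: theta_i = base^(-i / (dim/2))
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    # Rotation angle for position p and pair i: p * theta_i
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Pair-wise 2D rotation; rotated queries and keys then attend
    # based on relative, not absolute, position.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Quick check: rotate an 8-token, 64-dim query head.
q = torch.randn(8, 64)
print(rotary_embedding(q).shape)  # torch.Size([8, 64])
```

In the playlist, each of the six ingredients gets this treatment: the math first, then code like the above built up line by line.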
My aim with the “Build DeepSeek from Scratch” playlist is:
- To teach you the mathematical foundations behind all 6 ingredients above.
- To code all 6 ingredients above, from scratch.
- To assemble these ingredients and run a “mini DeepSeek” on your own.
After this, you will be among the top 0.1% of ML/LLM engineers who can build DeepSeek’s ingredients on their own.
This won’t be a 1-hour or 2-hour video. It will be a mega playlist of 35-40 videos, 40+ hours in total.
It will be in-depth. No fluff. Solid content.
Join us for the 6pm premiere here: https://youtu.be/QWNxQIq0hMo?si=YVHJtgMRjlVj2SZJ
P.S.: Attached is a small GIF showing the notes we have made. This is just 5-10% of the total notes and material we have prepared for this series!