Replicating DeepSeek-R3-Zero RL recipe on 3B LLM for <30$, the model develops self-verification and search abilities all on its own