Type: Podcast / Technical Roundtable / Strategic Forecast

Main Topic: A retrospective analysis of the "DeepSeek Moment" of 2025, the technical shift toward impactful post-training techniques (RLVR), and predictions for the AI landscape in 2026.

Speakers:
Lex Fridman (Host)
Sebastian Raschka (AI Researcher, Educator, Author of Build a Large Language Model (From Scratch))
Nathan Lambert (AI Researcher at AI2, RLHF Expert, Post-training Lead)

This conversation takes place in early 2026 (contextual timeframe), looking back at the explosive year of 2025. The primary goal is to dissect the technical breakthroughs that occurred, specifically the shift from massive pretraining to inference-time scaling and Reinforcement Learning with Verifiable Rewards (RLVR). The trio analyzes the geopolitical tension between US closed-source labs (OpenAI, Anthropic, Google) and Chinese open-weight dominance (DeepSeek, Qwen). They aim to provide a "state of the art" snapshot for researchers and engineers, demystifying how reasoning models work and where the economic value of AI will settle.

The speakers provide high-level educational breakdowns of complex mechanisms.

RLVR (Reinforcement Learning with Verifiable Rewards):
Definition: A training method where the model is rewarded based on the correctness of a final answer (e.g., in math or code) rather than on human preference.
Mechanism: The model generates thousands of internal "thoughts" or intermediate steps. If the final answer is correct, the entire chain is reinforced. This simulates "System 2" thinking. (A toy sketch of this reward loop appears after this summary.)
Impact: This enabled the "DeepSeek R1" moment, proving that smaller models can achieve state-of-the-art results by thinking longer rather than simply being bigger.

Inference-Time Scaling (The o1 Paradigm):
Concept: Instead of spending compute up front to train a massive model once (pretraining scaling), you spend compute during the actual query (inference scaling).
Tradeoff: You trade latency (speed) for intelligence; a minimal majority-voting sketch of this trade follows the RLVR example below.
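To make the RLVR mechanism concrete, here is a toy Python sketch of the reward loop described above, with a small categorical distribution over candidate answers standing in for an LLM and its chain of thought. The names (`verifiable_reward`, `rlvr_step`) and the group-mean baseline (in the spirit of GRPO-style updates) are illustrative assumptions, not DeepSeek's actual implementation.

```python
import math
import random

# Toy "policy": unnormalized log-probabilities over candidate final answers.
# In a real system this would be an LLM sampling a long chain of thought;
# here the chain is collapsed to a single sampled answer for illustration.
logits = {"4": 0.0, "5": 0.0, "22": 0.0}

def softmax(logit_dict):
    mx = max(logit_dict.values())
    exps = {k: math.exp(v - mx) for k, v in logit_dict.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def verifiable_reward(answer: str, ground_truth: str) -> float:
    # The RLVR core: reward is a programmatic check of the FINAL answer only
    # (exact match here; a unit test or math checker in practice). No human
    # preference model is involved, and intermediate steps are never scored.
    return 1.0 if answer == ground_truth else 0.0

def rlvr_step(ground_truth: str, num_samples: int = 16, lr: float = 0.5):
    """One REINFORCE-style update: sample a group of completions, score only
    their final answers, and reinforce the (collapsed) chains that were right."""
    probs = softmax(logits)
    answers = random.choices(list(probs), weights=probs.values(), k=num_samples)
    rewards = [verifiable_reward(a, ground_truth) for a in answers]
    baseline = sum(rewards) / len(rewards)  # group-mean baseline (GRPO-like)
    for a, r in zip(answers, rewards):
        adv = r - baseline
        for k in logits:
            # d log p(a) / d logit_k for a softmax policy:
            grad = (1.0 if k == a else 0.0) - probs[k]
            logits[k] += lr * adv * grad

random.seed(0)
for _ in range(50):
    rlvr_step(ground_truth="4")
print(softmax(logits))  # probability mass concentrates on the verified answer "4"
```

The property the speakers emphasize is visible in `verifiable_reward`: because the signal is a binary check on the final answer, the whole sampled chain gets reinforced when it ends correctly, with no reward model or human labeler in the loop.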
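The latency-for-intelligence trade can likewise be sketched with self-consistency (majority voting), one simple form of inference-time scaling: ask the model the same question n times and return the most common answer, paying roughly n times the inference cost and wall-clock latency. The `query_model` callable below is a hypothetical stand-in for any LLM call, not a specific API.

```python
import collections
import random
from typing import Callable

def self_consistency(query_model: Callable[[str], str], prompt: str, n: int = 16) -> str:
    """Inference-time scaling in its simplest form: spend n model calls
    (more latency, more cost) on one question and majority-vote the answers.
    Accuracy rises with n whenever the correct answer is sampled more often
    than any single wrong answer."""
    answers = [query_model(prompt) for _ in range(n)]
    winner, _count = collections.Counter(answers).most_common(1)[0]
    return winner

# Demo with a noisy stand-in "model" that is right only 40% of the time but
# whose errors are spread across several wrong answers; voting recovers "42".
def noisy_model(prompt: str) -> str:
    return random.choices(["42", "41", "43", "44"], weights=[0.4, 0.2, 0.2, 0.2])[0]

random.seed(1)
print(self_consistency(noisy_model, "What is 6 * 7?", n=32))  # usually "42"
```

This is the trade-off named in the summary in its rawest form: each extra sample buys accuracy at the price of latency, and reasoning models like o1 and R1 make a related trade internally by generating longer chains of thought before answering.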