Ai2 updates its Olmo 3 family of models to Olmo 3.1 following an additional round of extended reinforcement learning (RL) training to boost performance.
The company is positioning its new offerings as a business-ready way for enterprises to build domain-specific agents without first needing to create foundation models.
Reinforcement Pre-Training (RPT) is a new method for training large language models (LLMs) by reframing the standard task of predicting the next token in a sequence as a reasoning problem solved using ...
AI scaling faces diminishing returns due to the growing scarcity of high-quality, high-entropy data from the internet, pushing the industry towards richer, synthetic data. Nvidia is strategically ...