Reinforcement Learning LLM

RLHF and LLM Training with Invisible Technologies: Tech Disruptors

Matt Fitzpatrick, CEO of Invisible Technologies, talks about the use of reinforcement learning by frontier model providers for training and the company's enterprise business. From reinforcement learning ...

Nature

LLMs augmented hierarchical reinforcement learning with action primitives for long-horizon manipulation tasks

Deep reinforcement learning methods have shown promising results in learning specific tasks, but struggle to cope with the challenges of long horizon manipulation tasks. As task complexity increases, ...

VentureBeat

Nvidia researchers boost LLMs reasoning skills by getting them to 'think' during pre-training

Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called reinforcement learning pre-training (RLP), integrates ...

Tech Xplore

How AI chatbots become better learning coaches

Many AI systems answer questions in a matter of seconds—and, in the process, often prevent people from doing exactly what learning is all about: thinking for themselves. Machine learning expert Jakub ...

Computer Weekly

Ineffable Intelligence strikes Google Cloud deal for Vera Rubin GPU power

LLM” developer led by AlphaGo founder David Silver selects Google Cloud Vera Rubin GPU infrastructure to build reinforcement ...

VentureBeat

MiniMax-M1 is a new open source model with 1 MILLION TOKEN context and new, hyper efficient reinforcement learning

Chinese AI startup MiniMax, perhaps best known in the West for its hit realistic AI video model Hailuo, has released its latest large language model, MiniMax-M1 — and in great news for enterprises and ...

Nature

Preserving and combining knowledge in robotic lifelong reinforcement learning

Humans can continually accumulate knowledge and develop increasingly complex behaviours and skills throughout their lives, which is a capability known as ‘lifelong learning’. Although this lifelong ...

Sify.com

Pressure Paradox: How Punishing AI Makes Better LLMs

So far, scientists have relied on positive reinforcement learning to train LLMs, but the opposite seems to be giving much better results, finds Satyen K. Bordoloi… This is a finding that’ll have ...

Wired

The Man Behind AlphaGo Thinks AI Is Taking the Wrong Path

David Silver gave the world its very first glimpse of superintelligence. In 2016, an AI program he developed at Google DeepMind, AlphaGo, taught itself to play the famously difficult game of Go with a ...

The Conversation

What is reinforcement learning? An AI researcher explains a key method of teaching machines – and how it relates to training your dog

Understanding intelligence and creating intelligent machines are grand scientific challenges of our times. The ability to learn from experience is a cornerstone of intelligence for machines and living ...
