xent — a transparent path to AGI
a transparent path to improve the cognitive abilities of language models toward general intelligence
take a language model and turn it into a self-improving system that is stable and competitive at the same time
cognitive training — make a model discover relevant skills by creating tasks for itself
how is cognitive training implemented?
define an appropriate space of tasks — the xent games — that leverage the implicit knowledge of language models
identify a unique meta-objective on the xent-game space to measure a task's relevance, using symmetry arguments
how is cognitive training special?
xent games are rich enough to overlap with interesting tasks, yet structured enough that cognitive training is computationally tractable
relevant skill discovery is singled out as the quintessential game models ought to learn: the game of creating games
the meta-objective is fixed a priori: models grow in capabilities, while being unable to rewrite or alter the meta-objective
what is xent's mission?
our goal is to build a principled, stable, self-improving system that teaches itself new skills — leading to a generally capable system
the task is to make an environment of environments: the game is to create a game that is useful for a model
how do we do this? we need a space of tasks, an algorithm to train on them, and a way to evaluate the quality of a task
frost training — a new faster-than-Monte-Carlo RL algorithm that works for all xent games
the meta-objective: the one consistent way, up to two hyperparameters, to measure a game's usefulness
what is cognitive training?
1. realize there is implicit knowledge — models do not know their own probabilities
2. formulate games on top of the implicit knowledge: cross-entropy games
3. train models on cross-entropy games with special frost algorithms, enhancing their capabilities
4. define a meta-game: the game is to create cross-entropy games
5. play the meta-game! from a sufficiently strong model, the process leads to automatic skill discovery
what is our thesis?
cognitive training is the formalization of what it means to acquire relevant new skills
scaling it up leads to the emergence of AGI
models teach themselves new skills from within — no external environments needed — and keep improving
they improve in a balanced, organic, competitive way, while keeping a fixed meta-objective — leaving less room for undesirable surprises
what about the meta-game?
the goal of cognitive training is to optimize a meta-objective over the space of xent games
playing a move of the meta-game means creating a xent game
the reward for creating a game is its internal and external transfer value
the external value is what external benchmarks measure
the internal value is the key novelty: there is a principled derivation of it
surprisingly, there is only one meta-game, up to two hyperparameters
what is the internal value of a game?
the question: can a model trained on games judge for itself the value of a new game?
informally, the internal value measures how well a game balances relevance to old games with new skill discovery
the remarkable, exciting result: there is essentially one consistent expression for that value
why don't we have AGI yet?
models learn, at a spectacular level, the tasks they are trained on, but remain very weak on others
in other words, models are very uneven in their abilities, much more so than humans
equivalently, they generalize worse than humans beyond their training data
so: what is the weak point of model training?
what are examples of implicit-knowledge questions?
counterfactual — what information would change one's point of view on things?
interestingness — does a piece of information change our view of something?
in-filling — is there a plausible sequence of steps from A to B?
originality: among locally plausible continuations, what is the most surprising end to a story?
synthesis — given a family of texts, do ideas emerge that are in none of them?
what does "derived from first principles" mean?
a priori, there are many formulae to measure the quality of a game
but there is one that is more ‘right’ — more consistent than the others
is playing the meta-game expensive?
xent games are amenable to faster-than-Monte-Carlo training, thanks to their special differential structure — see frost algorithms
they also live in a much smaller space than the space of all tasks
what will we have, once cognitive training succeeds?
a model that learns faster and better from any post-training environment
how will we recognize AGI?
when models become very good at generating games that make them even better
when we get the equivalent of AlphaGo's move 37 for the meta-game: a game that makes sense only a posteriori
when models can build a solid foundation of simple games that let them learn new tasks quickly
an example of implicit knowledge — signal
imagine an experiment with two copies of yourself: one that receives a piece of information, one that doesn't
if you could compare how both copies fare in the world afterwards, you'd have a good idea of the value of that information
this is impossible for humans — life is lived once — but for models it can be done at will
learn to play this game well, and you learn to gauge the value of any information
e.g. what difference does it make to read an article (say, the cognitive-training paper)?
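a minimal sketch of the two-copies experiment, under loud assumptions: the toy `CacheUnigramModel`, the `signal` function, and the example corpus are all hypothetical stand-ins, not the xent implementation. here signal is taken to be the drop in cross-entropy (in bits) of a target text when the model has read the information first, compared with the copy that has not.

```python
import math
from collections import Counter


class CacheUnigramModel:
    """Toy stand-in for a language model (an assumption, not the xent model).

    An add-one-smoothed unigram model whose counts are updated by every
    token it reads, so that context changes later predictions.
    """

    def __init__(self, corpus, extra_vocab=()):
        self.base = Counter(corpus)
        self.vocab = set(corpus) | set(extra_vocab)

    def xent(self, target, context=()):
        """Cross-entropy of `target` in bits, after reading `context`."""
        unknown = [t for t in list(context) + list(target) if t not in self.vocab]
        if unknown:
            raise ValueError(f"tokens outside the vocabulary: {unknown}")
        counts = Counter(self.base)   # fresh copy: each call is a new "life"
        counts.update(context)        # the copy that receives the information
        total = sum(counts.values())
        bits = 0.0
        for tok in target:
            p = (counts[tok] + 1) / (total + len(self.vocab))
            bits -= math.log2(p)
            counts[tok] += 1          # the model also adapts to the target itself
            total += 1
        return bits


def signal(model, info, target):
    """Hypothetical 'signal': how many bits `info` saves on predicting `target`."""
    return model.xent(target) - model.xent(target, context=info)


# Illustrative data only.
corpus = "the cat sat on the mat the dog sat on the rug".split()
model = CacheUnigramModel(corpus, extra_vocab=["frost", "xent", "game"])
target = "xent game".split()

relevant = signal(model, "xent game xent game".split(), target)
irrelevant = signal(model, "the the the the".split(), target)
print(f"relevant info:   {relevant:+.3f} bits")
print(f"irrelevant info: {irrelevant:+.3f} bits")
```

with this toy model, relevant information yields a positive signal and irrelevant information a negative one. a real language model would replace `CacheUnigramModel.xent` with a sum of its own token log-probabilities, with and without the information in context.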
what is move 37 for the meta-game?
we don't know yet — but it would be an unexpected game that teaches us genuinely new skills: a new territory that makes things look simple in hindsight
something simple that, at the same time, increases performance on benchmarks