xent — postcards

xent — a transparent path to AGI

a transparent path to improve the cognitive abilities of language models toward general intelligence

take a language model and turn it into a self-improving system that is stable and competitive at the same time

cognitive training — make a model discover relevant skills by creating tasks for itself

how is cognitive training implemented?

define an appropriate space of tasks — the xent games — that leverage the implicit knowledge of language models

identify a unique meta-objective on the xent-game space to measure a task's relevance, using symmetry arguments

how is cognitive training special?

xent games are rich enough to overlap with interesting tasks, yet structured enough that cognitive training is computationally tractable

relevant skill discovery is singled out as the quintessential game models ought to learn: the game of creating games

the meta-objective is fixed a priori: models grow in capabilities, while being unable to rewrite or alter the meta-objective

what is xent's mission?

our goal is to build a principled, stable, self-improving system that teaches itself new skills — leading to a generally capable system

the task is to make an environment of environments: the game is to create a game that is useful for a model

how do we do this? we need a space of tasks, an algorithm to train on them, and a way to evaluate the quality of a task

frost training — a new faster-than-Monte-Carlo RL algorithm that works for all xent games

the meta-objective — the only mathematically correct way to measure a game's usefulness

what is cognitive training?

1. realize there is implicit knowledge — models do not know their own probabilities

2. formulate games on top of the implicit knowledge: cross-entropy games

3. train models on cross-entropy games with special frost algorithms, enhancing their capabilities

4. define a meta-game: the game is to create cross-entropy games

5. play the meta-game! from a sufficiently strong model, the process leads to automatic skill discovery
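
a minimal sketch of the quantity behind steps 1 and 2, assuming a toy character-bigram model stands in for a language model. the names `BigramModel` and `xent` are hypothetical and are not taken from any xent codebase. the sketch only shows that a string can be scored by its cross-entropy under a model, a number the model holds implicitly but does not state by itself

```python
import math
from collections import Counter, defaultdict


class BigramModel:
    """Toy character-bigram model with add-one smoothing (a stand-in for an LLM)."""

    def __init__(self, corpus: str):
        self.vocab = sorted(set(corpus))
        self.counts = defaultdict(Counter)
        for prev, nxt in zip(corpus, corpus[1:]):
            self.counts[prev][nxt] += 1

    def prob(self, prev: str, nxt: str) -> float:
        # Add-one smoothing over the known characters plus one "unknown" slot.
        total = sum(self.counts[prev].values())
        return (self.counts[prev][nxt] + 1) / (total + len(self.vocab) + 1)


def xent(model: BigramModel, text: str) -> float:
    """Cross-entropy of `text` in nats: minus the sum of log p(char | previous char)."""
    return -sum(math.log(model.prob(p, n)) for p, n in zip(text, text[1:]))


model = BigramModel("the cat sat on the mat. the cat ate the rat.")
print(xent(model, "the cat"))   # text resembling the corpus: lower cross-entropy
print(xent(model, "xqzvkjw"))   # unfamiliar text: higher cross-entropy
```

a real cross-entropy game would use a language model's token log-probabilities in place of the bigram counts; that substitution is an assumption of this sketch, not a description of frost training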

what is our thesis?

cognitive training is the formalization of what it means to acquire relevant new skills

scaling it up leads to the emergence of AGI

models teach themselves new skills from within — no external environments needed — and keep improving

they improve in a balanced, organic, competitive way, while keeping a fixed meta-objective — leaving less room for undesirable surprises

what about the meta-game?

the goal of cognitive training is to optimize a meta-objective over the space of xent games

playing a move of the meta-game means creating a xent game

the reward for creating a game is its internal and external transfer value

the external value is what external benchmarks measure

the internal value is the key novelty: there is a principled derivation of it

surprisingly, there is only one meta-game, up to two hyperparameters

what is the internal value of a game?

the question: can a model trained on games judge for itself the value of a new game?

informally, the internal value measures how well a game balances relevance to old games with new skill discovery

the remarkable, exciting result: there is essentially one consistent expression for that value

some questions about AGI

why don't we have AGI yet?

what does it mean to get to AGI?

what is xent's path to AGI?

where are we on that mission?

can this be a stable process?

why don't we have AGI yet?

models learn the tasks they are trained on to a spectacular level, but stay very weak on some others

in other words, models are very uneven in their abilities, much more so than humans

equivalently, they generalize less well than humans outside the training points

so: what is the weak point of model training?

what are examples of implicit-knowledge questions?

counterfactual — what information would change one's point of view on things?

interestingness — does a piece of information change our view of something?

in-filling — is there a plausible sequence of steps from A to B?

originality: given local plausibility, what is the most surprising end to a story?

synthesis — given a family of texts, do ideas emerge that are in none of them?

what does "derived from first principles" mean?

a priori, there are many formulae to measure the quality of a game

but there is one that is more ‘right’ — more consistent than the others

is playing the meta-game expensive?

xent games are amenable to faster-than-Monte-Carlo training, thanks to their special differential structure — see frost algorithms

they also live in a much smaller space than the space of all tasks

what will we have, once cognitive training succeeds?

a model that learns faster and better from any post-training environment

how will we recognize AGI?

when models become very good at generating games that make them even better

when we get the equivalent of move 37 for the meta-game — a game that makes sense only a posteriori

when models can build a solid foundation of simple games that let them learn new tasks quickly

an example of implicit knowledge — signal

imagine an experiment with two copies of yourself: one that receives a piece of information, one that doesn't

if you could compare how both copies fare in the world afterwards, you'd have a good idea of the value of that information

this is impossible for humans — life is lived once — but for models it can be done at will

learn to play this game well, and you learn to gauge the value of any information

e.g. what difference does it make to read an article (say, the cognitive-training paper)?
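
the two-copies experiment above can be sketched as a cross-entropy difference. this is a toy under stated assumptions: a word-count model with add-one smoothing replaces the language model, and the functions `xent` and `signal` are hypothetical names chosen for illustration. the source does not give the exact formula; here the signal is assumed to be the drop in cross-entropy of an outcome when the model has seen the information

```python
import math
from collections import Counter


def xent(target: str, context: str, background: str) -> float:
    """Cross-entropy (nats) of `target` under a toy unigram model.

    The model is built from word counts in `background` plus `context`,
    with add-one smoothing over every word seen anywhere.
    """
    counts = Counter(background.split()) + Counter(context.split())
    vocab = set(counts) | set(target.split())
    total = sum(counts.values()) + len(vocab)
    return -sum(math.log((counts[w] + 1) / total) for w in target.split())


def signal(info: str, outcome: str, background: str) -> float:
    """Assumed definition: how much `info` lowers the cross-entropy of `outcome`.

    The copy without the information is xent(outcome, "", ...);
    the copy with it is xent(outcome, info, ...).
    """
    return xent(outcome, "", background) - xent(outcome, info, background)


background = "the cat sat on the mat"
outcome = "the model plays games"

relevant = signal("the model plays games against itself", outcome, background)
unrelated = signal("rain fell all night", outcome, background)
print(relevant, unrelated)
```

in this toy, information that shares words with the outcome gets a positive signal, while unrelated information dilutes the counts and scores lower (here, below zero). with a real model, both copies would be the same network evaluated with and without the information in its context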

what is move 37 for the meta-game?

we don't know yet — but it would be an unexpected game that teaches us genuinely new skills: a new territory that makes things look simple in hindsight

something simple that, at the same time, increases performance on benchmarks

scalable oversight

(card in progress)