Adrià Garriga-Alonso
Inside me there are at least two wolves
Some people have many characters inside of them.
Nov 30, 2025 • The Column Space
Concrete reasons why alignment is on track to be solved
Evan Hubinger, alignment stress-testing lead at Anthropic, wrote a post making the case for why alignment remains a difficult unsolved problem. It is a…
Nov 27, 2025
Spatially distributed consciousness is not an abstract thought experiment if AI is conscious
A fun paper in philosophy of consciousness is Eric Schwitzgebel’s “If Materialism Is True, The United States Is Probably Conscious”.
Nov 26, 2025
Alignment will happen by default. What’s next?
I’m not fully convinced of this, but I’m fairly convinced, and increasingly so over time.
Nov 25, 2025
Why investigate neural networks that plan? (Part 1)
To catch a mesa-optimizer
Nov 24, 2025
Two ways to advance science: mapping and search
Is “computer science” actually a science? I argue it is.
Nov 23, 2025
Does training LLMs with RL improve their ability to reason?
Or does it draw from a fixed reservoir of reasoning ability that's already in the base model?
Nov 22, 2025
Two rules for good research taste
Avoid getting nerd-sniped by methods with attractive equations
Nov 20, 2025
AI safety researcher at FAR https://agarri.ga/