Human goals and values are fragile in the sense that worlds satisfying good human values make up only a very specific, narrow region in the space of all possible worlds.

  • Maybe an AI’s view of a good world is not the same as the human-good world (Nick Bostrom’s paperclip-maximizer thought experiment, where an AI turns the world into paperclips). Eliezer Yudkowsky argues that even a small change in the values an AI is aligned to can cause a massive change in the final world it creates.