Human goals and values are fragile in the sense that worlds satisfying good human values make up only a very specific, narrow region in the space of all possible worlds.

  • Maybe an AI’s view of a good world is not the same as the human-good world (Nick Bostrom’s paperclip-maximizer thought experiment, where an AI turns the world into paperclips). Eliezer Yudkowsky argues that even a small change in the values an AI is aligned to can cause a massive change in the final world it creates.