An Argument for Existential Catastrophe Arising from AGI That Becomes Rogue AI

Argument

  • P1: Artificial Superintelligence can be achieved
    • If we continue on the current path, there is a substantial probability that within the next few centuries AI capabilities will reach ASI levels.
  • P2: Agentic AI will arise that is goal-directed
    • If goal-directed superhuman AI is built, its output will be as bad as an empty universe by human lights
    • Even a superhuman AI built as an Oracle Machine can be wrapped in an agent harness to make it agentic
  • P3: Default Misalignment: If most goal-directed ASIs have bad goals, the future will be very bad
  • P4: Alignment is difficult: checking whether an AI is aligned is hard because we may not know until deployment
  • P5: Catastrophic Capability: an AI agent with misaligned goals → a catastrophic outcome for humanity
  • C: We face a non-trivial risk of existential catastrophe from AI development
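The premises above can be read as a probabilistic chain; the following schematic is one possible reading (the events A, M, and X are my own shorthand, not the author's):

```latex
% A = goal-directed ASI is built, M = its goals are misaligned,
% X = existential catastrophe occurs
\begin{align*}
  &\text{P1} \land \text{P2}: && \Pr(A) \text{ is substantial} \\
  &\text{P3} \land \text{P4}: && \Pr(M \mid A) \text{ is high} \\
  &\text{P5}: && \Pr(X \mid A \land M) \text{ is high} \\
  &\therefore\ \text{C}: && \Pr(X) \ge \Pr(A)\,\Pr(M \mid A)\,\Pr(X \mid A \land M) \text{ is non-trivial}
\end{align*}
```

On this reading, each counter-argument below attacks one of the three factors in the product.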

Counter Arguments

  • AGI may not be goal-directed
  • Goal-directed systems may not always be bad
    • Small differences in utility functions may not be catastrophic
    • Human and AI values may be similar
    • Value may not be fragile
  • Human success doesn't come from individual intelligence
  • AI agents may not be radically superior to combinations of humans and non-agentic machines
  • Intelligence may not be an overwhelming advantage
  • It is unclear whether many goals incentivize taking over the universe
  • Speed of intelligence growth is ambiguous
  • The argument proves too much: it would apply equally to corporations
  • If AI agents become misaligned, we should be able to detect this before it becomes uncontrollable