An argument for Existential Catastrophe arising from instances of AGI that become Rogue AI
Argument
- P1: Artificial Superintelligence can be achieved
	- If we continue on this path, there is a substantial probability that within the next centuries AI capabilities reach ASI levels.
- P2: Agentic, goal-directed AI will arise
	- If goal-directed superhuman AI is built, its output will be as bad as an empty universe by human lights
	- Superhuman AI, even if it is an Oracle Machine, can be wrapped in an agent harness to make it agentic
- P3: Default Misalignment: If most goal-directed ASI have bad goals, the future will be very bad
- P4: Alignment is difficult. Checking whether an AI is aligned is hard because we don't know until deployment
- P5: Catastrophic Capability: AI agent with misaligned goals → catastrophic outcome for humanity
- C: We face a non-trivial risk of Existential Catastrophe from AI development
Counter Arguments
- AGI Is Not Goal Directed
- Goal-directed systems may not always be bad
- Small differences in utility functions may not be catastrophic
- Human and AI values may be similar
- Maybe value isn't fragile
- Human success isn't from individual intelligence
- AI agents may not be radically superior to combinations of humans and non-agentic machines
- Intelligence may not be an overwhelming advantage
- Unclear whether many goals incentivize taking over the universe
- Speed of intelligence growth is ambiguous
- Argument proves too much about corporations
- If AI agents are misaligned, we should be able to detect this before it becomes uncontrollable