A study found where LLMs weights were modified, and then asked if their own weights were modified.

  • The LLMs could not check its own outputs, can only check itself Often responsed that its own weights were changed, but doesn’t necessarily know which weights were changed.