This is a safety research agenda focused on reverse engineering neural networks to find the internal circuits that determine which algorithm a model is implementing: model interpretability by directly analyzing weights and activations. Not to be confused with explainable machine learning, which typically explains model outputs post hoc rather than internal mechanisms.
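As a minimal sketch of what "analyzing weights and activations" means, the toy network below (hypothetical, hand-constructed weights, not from any real model) computes XOR; by reading off its weights and hidden activations you can recover the algorithm it implements, which is the reverse-engineering step in miniature.

```python
import numpy as np

# Hypothetical hand-built network: 2 inputs -> 2 hidden ReLU units -> 1 output.
# Both hidden units receive x0 + x1; the biases differentiate them:
#   h0 = relu(x0 + x1)       -> counts how many inputs are on
#   h1 = relu(x0 + x1 - 1)   -> fires only when BOTH inputs are on (AND)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
# Output reads: count - 2*AND, which equals XOR on {0,1} inputs.
W2 = np.array([1.0, -2.0])
b2 = 0.0

def forward(x):
    h = np.maximum(0, x @ W1 + b1)   # hidden activations
    return h, float(h @ W2 + b2)

# Inspecting activations on all inputs reveals the circuit's algorithm.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h, y = forward(np.array(x, dtype=float))
    print(f"input={x}  hidden={h}  output={y}")
```

The same idea scales up in real work: hook intermediate activations of a trained model, look for units or directions with interpretable roles, and check the weights connecting them to confirm the hypothesized circuit.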

Concepts

Tools

Papers

Context Samples

https://drive.google.com/drive/u/0/folders/1GfrgKJwndk-twnJ8K7Ba-TE9i_8wBWAU