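A paper by Anthropic on removing polysemanticity in toy models by adding an L1 regularization term on the hidden-layer activations to the loss function, i.e. adding $\lambda \lVert h \rVert_1$ to the loss, where $h$ is the hidden activation vector and $\lambda$ controls how strongly sparse activations are encouraged.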
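A minimal sketch of what "add $\lambda \lVert h \rVert_1$ to the loss" looks like in code. It assumes a hypothetical toy autoencoder (hidden layer `h = x @ W`, ReLU reconstruction `ReLU(h @ W.T + b)`) and squared-error reconstruction as the task loss, with both terms averaged over the batch. The architecture, the function names `toy_forward` and `regularized_loss`, and the averaging choices are illustrative assumptions, not the paper's code.

```python
import numpy as np


def toy_forward(W: np.ndarray, b: np.ndarray, x: np.ndarray):
    """Hypothetical toy model (an assumption, not the paper's exact setup).

    x: (batch, n_features), W: (n_features, n_hidden), b: (n_features,)
    """
    h = x @ W                             # hidden activations, shape (batch, n_hidden)
    x_hat = np.maximum(0.0, h @ W.T + b)  # ReLU reconstruction, shape (batch, n_features)
    return h, x_hat


def regularized_loss(W: np.ndarray, b: np.ndarray, x: np.ndarray, lam: float) -> float:
    """Task loss plus lam * ||h||_1, both averaged over the batch."""
    h, x_hat = toy_forward(W, b, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))  # squared reconstruction error per example
    l1 = np.mean(np.sum(np.abs(h), axis=1))            # L1 norm of each example's hidden vector
    return float(recon + lam * l1)


# Usage: compare the loss with and without the L1 penalty on random toy data.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))
b = np.zeros(5)
x = rng.uniform(size=(8, 5))
print(regularized_loss(W, b, x, lam=0.0), regularized_loss(W, b, x, lam=0.1))
```

Setting `lam=0` recovers the plain reconstruction loss; increasing it trades reconstruction quality for sparser hidden activations, which is the pressure intended to push each hidden unit toward representing fewer features.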