An exploration of an alternative approach to extracting steering vectors. Instead of using the classical contrastive method, which contrasts activations on paired prompts within a single model, we investigate whether comparing activations between a base model and a version of it fine-tuned to be deceptive reveals a more meaningful latent direction.
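
To make the idea concrete, here is a minimal sketch of one way such a direction could be computed. It assumes a difference-of-means estimate over activations collected on the same prompts from both models at a single layer; the description above does not specify the exact procedure, so the function names (`mean_activation`, `model_diff_direction`, `cosine_similarity`) and the toy numbers are purely illustrative, and the activations are plain Python lists rather than real model outputs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_activation(activations: Sequence[Sequence[float]]) -> Vector:
    """Average a batch of activation vectors (one per prompt) element-wise."""
    if not activations:
        raise ValueError("need at least one activation vector")
    dim = len(activations[0])
    sums = [0.0] * dim
    for vec in activations:
        if len(vec) != dim:
            raise ValueError("all activation vectors must have the same length")
        for i, value in enumerate(vec):
            sums[i] += value
    return [s / len(activations) for s in sums]


def model_diff_direction(
    base_acts: Sequence[Sequence[float]],
    tuned_acts: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical helper: mean fine-tuned activation minus mean base activation.

    Assumption: both batches come from the same prompts and the same layer,
    so the difference reflects the fine-tuning rather than the inputs.
    """
    mu_base = mean_activation(base_acts)
    mu_tuned = mean_activation(tuned_acts)
    return [t - b for t, b in zip(mu_tuned, mu_base)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, e.g. to compare against a contrastive direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy activations (2 prompts, hidden size 2); real ones would come from the models.
base_acts = [[1.0, 0.0], [3.0, 0.0]]
tuned_acts = [[2.0, 1.0], [4.0, 3.0]]
direction = model_diff_direction(base_acts, tuned_acts)

# A made-up contrastive direction, used only to show how the two could be compared.
contrastive = [2.0, 4.0]
similarity = cosine_similarity(direction, contrastive)
```

Under these assumptions, the cosine similarity between the model-difference direction and a classically extracted contrastive direction is one simple way to check whether the two methods recover related directions.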