sparse-autoencoders

Answers the question "How to do patching on all available SAEs on GPT-2?". Official implementation of the paper "Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small".

sparse-autoencoders sae sparse-autoencoder

Updated Jan 26, 2025
Python

255BITS / sae-evolver

Use evolution with sparse autoencoders

python evolutionary-algorithms sparse-autoencoders

Updated Jan 29, 2025
Python

jwuphysics / euclid-galaxy-morphology-saes

studying (self-)supervised representations of Euclid galaxy imaging via SAEs

astronomy galaxies sparse-autoencoders mechanistic-interpretability computer-visino

Updated Oct 29, 2025
Python

Dhia-naouali / Tickling-Vision-Models

Performing mechanistic interpretability on InceptionV1, from linear probing and sparse direction maximization to adversarial and circuit patching & ablation

circuit-analysis sparse-autoencoders xai mechanistic-interpretability

Updated Sep 10, 2025
Python

ashioyajotham / exploring_saes

Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features an interactive visualization dashboard and W&B integration.

sparse-autoencoders interpretability activation-functions neuron-activity wandb transformerlens mech-interp

Updated May 17, 2025
Python

behroozazarkhalili / SAE-Transcoder

Unified SAE and Transcoder training using the EleutherAI/sparsify library for neural network interpretability research

machine-learning deep-learning pytorch transcoder neural-networks sparse-autoencoders interpretability eleutherai

Updated Oct 2, 2025
Python

wasim / scaling-specialization-dense-lms

Do dense LMs develop MoE-like specialization as they scale? Measure it, visualize it, and turn it into speed.

transformers sparse-autoencoders scaling-laws mechanistic-interpretability llm-efficiency

Updated Oct 26, 2025
Python

DanielJamesDavies / Turing-LLM-1.0-254M

A framework for conducting interpretability research and for developing an LLM from a synthetic dataset.

python sparse-autoencoders interpretability mechanistic-interpretability large-language-model

Updated Sep 10, 2024
Python

krishnakanthnakka / MammoSAE

Official code release for the paper: "Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders"

sparse-autoencoders breast interpretability breast-cancer breast-imaging

Updated Oct 11, 2025
Python

peppinob-ol / attribution-graph-probing

Automates attribution-graph analysis via probe prompting: circuit-trace a prompt, auto-generate concept probes, profile feature activations, cluster supernodes.

graph-analysis sparse-autoencoders mechanistic-interpretability llm-interpretability research-tooling circuit-tracing attribution-graphs probe-prompting prompt-probing neuronpedia feature-activation supernodes cross-layer-transcoder

Updated Nov 5, 2025
Python

ghost1412 / Keras-Autoencoder

python deep-learning autoencoder sparse-autoencoders keras-tensorflow variational-autoencoder

Updated Sep 26, 2018
Python

sparse-autoencoders

Here are 17 public repositories matching this topic...

OpenMOSS / Language-Model-SAEs

LahiruJayasinghe / DeepDOA

dmis-lab / Monet

neuroexplicit-saar / Discover-then-Name

Abhipanda4 / Sparse-Autoencoders

meteahishali / SRL-SOA

MaheepChaudhary / SAE-Ravel

255BITS / sae-evolver

jwuphysics / euclid-galaxy-morphology-saes

Dhia-naouali / Tickling-Vision-Models

ashioyajotham / exploring_saes

behroozazarkhalili / SAE-Transcoder

wasim / scaling-specialization-dense-lms

DanielJamesDavies / Turing-LLM-1.0-254M

krishnakanthnakka / MammoSAE

peppinob-ol / attribution-graph-probing

ghost1412 / Keras-Autoencoder
