Deep neural networks and humans both benefit from compositional language structure

This repository holds the code for the paper:

Galke, L., Ram, Y. & Raviv, L. Deep neural networks and humans both benefit from compositional language structure. Nat Commun 15, 10816 (2024). https://doi.org/10.1038/s41467-024-55158-1

@article{galkeDeepNeuralNetworks2024,
  title = {Deep Neural Networks and Humans Both Benefit from Compositional Language Structure},
  author = {Galke, Lukas and Ram, Yoav and Raviv, Limor},
  year = {2024},
  journal = {Nature Communications},
  volume = {15},
  number = {1},
  pages = {10816},
  issn = {2041-1723},
  doi = {10.1038/s41467-024-55158-1}
}

Set up

1. Set up a virtual environment (e.g., via conda) with a recent Python version (we used Python 3.9.5).
2. Within the virtual environment, install PyTorch according to your OS, GPU availability, and Python package manager.
3. Within the virtual environment, install all other requirements with: pip install -r requirements.txt

Fetch data from experiments with human participants

The data can be obtained via OSF and should be placed in the ./data subfolder. In particular, you need all LearningExp_*_log.txt files and the input_languages.csv file.
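
Before training, it can help to confirm that the download landed where the scripts expect it. The following is a minimal sketch and not part of the repository: the helper name `check_data_dir` is hypothetical, and it only checks for the file names mentioned above.

```python
from pathlib import Path


def check_data_dir(data_dir="./data"):
    """Return a list of problems found in the data directory (empty if none)."""
    root = Path(data_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # At least one participant log named LearningExp_*_log.txt is expected.
    if not any(root.glob("LearningExp_*_log.txt")):
        problems.append("no LearningExp_*_log.txt files found")
    # The input languages file must sit next to the logs.
    if not (root / "input_languages.csv").is_file():
        problems.append("input_languages.csv is missing")
    return problems


if __name__ == "__main__":
    for problem in check_data_dir():
        print("Problem:", problem)
```

An empty return value only means the expected file names are present. It does not verify that the set of log files is complete.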

Main entry point

The main entry point is train.py. Information on command line arguments can be obtained via python3 train.py -h.

An example command to run an experiment is:

 python3 train.py --as_humans /data/path/to/experiment.log --seed 1234 --outdir "results-v1"

Scripts to reproduce experiments

Use the following command to reproduce the main experiments from the paper, sweeping over all experiment log files ten times with different random seeds.

 bash sweep_as_humans.bash

Results will be stored in a subfolder results-v1.
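
Conceptually, the sweep is a nested loop over log files and seeds. The sketch below is an assumption about that structure, not a transcription of sweep_as_humans.bash: the helper name `build_sweep_commands` and the seed values are hypothetical, and it uses only the train.py flags shown in the example command above.

```python
from pathlib import Path


def build_sweep_commands(data_dir="./data", seeds=range(10), outdir="results-v1"):
    """Build one train.py command per (log file, seed) pair without running it."""
    logs = sorted(Path(data_dir).glob("LearningExp_*_log.txt"))
    commands = []
    for log in logs:
        for seed in seeds:  # hypothetical seeds; the actual script may use others
            commands.append([
                "python3", "train.py",
                "--as_humans", str(log),
                "--seed", str(seed),
                "--outdir", outdir,
            ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in build_sweep_commands():
        print(" ".join(cmd))
```

To actually run a command from the list, it could be passed to `subprocess.run(cmd, check=True)`. The bash script remains the reference for reproducing the paper's results.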

Experiments with GPT-3

The main file for running our experiments with GPT-3 is lang2prompt.py. It expects the data directory to be present and populated, and it writes its outputs to gpt3-completions. You need to specify a language ID (S1,B1,S2,...,S5,B5) as a command line argument.

An example call to run the memorization test and the generalization test on language B4 would be:

 python3 lang2prompt.py B4 --gpt3-mem-test --gpt3-reg-test

Important: Make sure that the shell environment variable OPENAI_API_KEY holds your API key, and edit the line starting with openai.organization so that it contains your own organization ID.
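
A quick check that the key is visible to Python can save a failed run. This is a minimal sketch under stated assumptions: `require_api_key` is a hypothetical helper, not a function from this repository, and it only inspects the environment without contacting the API.

```python
import os


def require_api_key(env=os.environ):
    """Return the API key from the environment, or raise if it is unset or empty."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running lang2prompt.py"
        )
    return key


# Example usage (commented out so that importing this snippet has no side effects):
# require_api_key()
```

The check confirms only that the variable is non-empty, not that the key is valid.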


Run statistics

Use the following command to reproduce the statistical analysis.

 python3 stats.py -o stats-output results-v1
