@@ -25,108 +25,81 @@ This will install the PET model along with the ``metatrain`` package.
 Default Hyperparameters
 -----------------------

-The default hyperparameters for the PET model are:
+The description of all the hyperparameters used in PET is provided further
+down this page. However, here we provide a YAML file containing all the
+default hyperparameters, which can be a convenient starting point for
+creating your own hyperparameter files:

 .. literalinclude:: ../../../src/metatrain/pet/default-hypers.yaml
    :language: yaml
+   :lines: 2-

-Tuning Hyperparameters
+Tuning hyperparameters
 ----------------------

-PET offers a number of tuning knobs for flexibility across datasets:
-
 The default hyperparameters above will work well in most cases, but they
-may not be optimal for your specific dataset. In general, the most important
-hyperparameters to tune are (in decreasing order of importance):
-
-- ``cutoff``: This should be set to a value after which most of the interactions between
-  atoms is expected to be negligible. A lower cutoff will lead to faster models.
-- ``learning_rate``: The learning rate for the neural network. This hyperparameter
-  controls how much the weights of the network are updated at each step of the
-  optimization. A larger learning rate will lead to faster training, but might cause
-  instability and/or divergence.
-- ``batch_size``: The number of samples to use in each batch of training. This
-  hyperparameter controls the tradeoff between training speed and memory usage. In
-  general, larger batch sizes will lead to faster training, but might require more
-  memory.
-- ``d_pet``: This hyperparameters controls width of the neural network. In general,
-  increasing it might lead to better accuracy, especially on larger datasets, at the
-  cost of increased training and evaluation time.
-- ``d_node``: The dimension of the node features. Increasing this hyperparameter
-  might lead to better accuracy, with a relatively small increase in inference time.
-- ``num_gnn_layers``: The number of graph neural network layers. In general, decreasing
-  this hyperparameter to 1 will lead to much faster models, at the expense of accuracy.
-  Increasing it may or may not lead to better accuracy, depending on the dataset, at the
-  cost of increased training and evaluation time.
-- ``num_attention_layers``: The number of attention layers in each layer of the graph
-  neural network. Depending on the dataset, increasing this hyperparameter might lead to
-  better accuracy, at the cost of increased training and evaluation time.
-- ``loss``: This section describes the loss function to be used. See the
-  :ref:`loss-functions` for more details.
-- ``long_range``: In some systems and datasets, enabling long-range Coulomb interactions
-  might be beneficial for the accuracy of the model and/or its physical correctness.
-  See below for a breakdown of the long-range section of the model hyperparameters.
-
-All Hyperparameters
--------------------
-
-:param name: ``pet``
-
-model
-#####
-
-:param cutoff: Cutoff radius for neighbor search
-:param cutoff_width: Width of the smoothing function at the cutoff
-:param d_pet: Dimension of the edge features
-:param d_head: Dimension of the attention heads
-:param d_node: Dimension of the node features
-:param d_feedforward: Dimension of the feedforward network in the attention layer
-:param num_heads: Attention heads per attention layer
-:param num_attention_layers: Number of attention layers per GNN layer
-:param num_gnn_layers: Number of GNN layers
-:param normalization: Layer normalization type. Currently available options are
-  ``RMSNorm`` or ``LayerNorm``.
-:param activation: Activation function. Currently available options are ``SiLU``,
-  and ``SwiGLU``.
-:param transformer_type: The order in which the layer normalization and attention
-  are applied in a transformer block. Available options are ``PreLN``
-  (normalization before attention) and ``PostLN`` (normalization after attention).
-:param featurizer_type: Implementation of the featurizer of the model to use. Available
-  options are ``residual`` (the original featurizer from the PET paper, that uses
-  residual connections at each GNN layer for readout) and ``feedforward`` (a modern
-  version that uses the last representation after all GNN iterations for readout).
-  Additionally, the feedforward version uses bidirectional features flow during the
-  message passing iterations, that favors features flowing from atom ``i`` to atom
-  ``j`` to be not equal to the features flowing from atom ``j`` to atom ``i``.
-:param zbl: Use ZBL potential for short-range repulsion
-:param long_range: Long-range Coulomb interactions parameters:
-  - ``enable``: Toggle for enabling long-range interactions
-  - ``use_ewald``: Use Ewald summation. If False, P3M is used
-  - ``smearing``: Smearing width in Fourier space
-  - ``kspace_resolution``: Resolution of the reciprocal space grid
-  - ``interpolation_nodes``: Number of grid points for interpolation (for PME only)
-
-training
-########
-
-:param distributed: Whether to use distributed training
-:param distributed_port: Port for DDP communication
-:param batch_size: Training batch size
-:param num_epochs: Number of epochs
-:param warmup_fraction: Fraction of training steps used for learning rate warmup
-:param learning_rate: Learning rate
-:param log_interval: Interval to log metrics
-:param checkpoint_interval: Interval to save checkpoints
-:param scale_targets: Normalize targets to unit std during training
-:param fixed_composition_weights: Weights for atomic contributions
-:param per_structure_targets: Targets to calculate per-structure losses
-:param log_mae: Log MAE alongside RMSE
-:param log_separate_blocks: Log per-block error
-:param grad_clip_norm: Maximum gradient norm value, by default inf (no clipping)
-:param loss: Loss configuration (see above)
-:param best_model_metric: Metric used to select best checkpoint (e.g., ``rmse_prod``)
-:param num_workers: Number of workers for data loading. If not provided, it is set
-  automatically.
+may not be optimal for your specific dataset. There is a good number of
+parameters to tune, both for the :ref:`model <pet_model_hypers>` and the
+:ref:`trainer <pet_trainer_hypers>`. Since seeing them all for the first time
+might be overwhelming, here we provide a list of the parameters that are, in
+general, the most important (in decreasing order of importance), followed by a
+short example of how to override them:
+
+.. autoattribute:: metatrain.pet.hypers.PETHypers.cutoff
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETTrainerHypers.learning_rate
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETTrainerHypers.batch_size
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETHypers.d_pet
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETHypers.d_node
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETHypers.num_gnn_layers
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETHypers.num_attention_layers
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETTrainerHypers.loss
+   :no-index:
+
+.. autoattribute:: metatrain.pet.hypers.PETHypers.long_range
+   :no-index:
+
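For illustration, a minimal options file that overrides a few of these parameters
could look as follows. This is only a sketch: the section names mirror the
``architecture.model`` and ``architecture.trainer`` sections documented below, and
the values are placeholders rather than recommendations.

.. code-block:: yaml

    # Hypothetical excerpt of a user options file; every key that is omitted
    # keeps the default value from the YAML listing above.
    architecture:
      name: pet
      model:
        cutoff: 4.5           # neighbor-search cutoff radius
        d_pet: 256            # dimension of the edge features
      trainer:
        learning_rate: 1.0e-4
        batch_size: 32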
+.. _pet_model_hypers:
+
+Model hyperparameters
+---------------------
+
+The parameters that go under the ``architecture.model`` section of the config file
+are the following:
+
+.. autoclass:: metatrain.pet.hypers.PETHypers
+   :members:
+   :undoc-members:
+
+with the long-range section being:
+
+.. autoclass:: metatrain.pet.hypers.LongRangeHypers
+   :members:
+   :undoc-members:
+
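As an illustration of how these long-range options fit together, a hedged sketch of
the corresponding block inside ``architecture.model`` is shown below. The key names
are ``enable``, ``use_ewald``, ``smearing``, ``kspace_resolution`` and
``interpolation_nodes``; the values are placeholders only, not tuned settings.

.. code-block:: yaml

    # Hypothetical sketch of the long-range block; values are illustrative.
    long_range:
      enable: true              # turn on long-range Coulomb interactions
      use_ewald: false          # use Ewald summation; if false, a mesh-based solver is used
      smearing: 1.4             # smearing width in Fourier space
      kspace_resolution: 0.33   # resolution of the reciprocal-space grid
      interpolation_nodes: 4    # grid points used for interpolation (mesh solver only)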
+.. _pet_trainer_hypers:
+
+Trainer hyperparameters
+-----------------------
+
+The parameters that go under the ``architecture.trainer`` section of the config file
+are the following:
+
+.. autoclass:: metatrain.pet.hypers.PETTrainerHypers
+   :members:
+   :undoc-members:
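To make the mapping to the options file concrete, here is a short, hedged sketch of
a trainer section; only a handful of keys are shown, and the values are placeholders
rather than tuned settings.

.. code-block:: yaml

    # Hypothetical excerpt of the architecture.trainer section.
    trainer:
      num_epochs: 500            # total number of training epochs
      log_interval: 10           # how often metrics are logged
      checkpoint_interval: 100   # how often checkpoints are saved
      log_mae: true              # log MAE alongside RMSE
      grad_clip_norm: 1.0        # maximum gradient norm; the default is inf (no clipping)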

 References
 ----------