This can also be used in a tf.tpu.experimental.embedding.TableConfig as the optimizer parameter to set a table-specific optimizer. This will override the optimizer and parameters of the global embedding optimizer defined above:
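A minimal sketch of such a per-table override, assuming TensorFlow 2.x is installed. The vocabulary sizes, embedding dimensions, and variable names are hypothetical values chosen for illustration. The global optimizer is only constructed here; in a real setup it would be passed as the optimizer of the embedding API object.

```python
import tensorflow as tf

# Table-specific optimizer with its own learning rate.
table_optimizer = tf.tpu.experimental.embedding.Adam(learning_rate=0.2)

# Hypothetical table sizes, for illustration only.
table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=1000,
    dim=8,
    optimizer=table_optimizer)  # overrides the global optimizer for this table
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=500,
    dim=8)  # no optimizer set, so the global optimizer applies

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(table=table_two))

# Global embedding optimizer, used by every table without its own optimizer.
global_optimizer = tf.tpu.experimental.embedding.Adam(learning_rate=0.1)
```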
In the above example, the first feature will be looked up in a table that has a learning rate of 0.2 while the second feature will be looked up in a table that has a learning rate of 0.1.
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impacts on the optimizer algorithm.
Args
learning_rate
The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate.
beta_1
A float value. The exponential decay rate for the 1st moment estimates.
beta_2
A float value. The exponential decay rate for the 2nd moment estimates.
epsilon
A small constant for numerical stability.
lazy_adam
Whether to use lazy Adam instead of Adam. Lazy Adam trains faster.
sum_inside_sqrt
When this is true, the Adam update formula is changed from m / (sqrt(v) + epsilon) to m / sqrt(v + epsilon**2). This option improves the performance of TPU training and is not expected to harm model quality.
use_gradient_accumulation
Setting this to False makes the embedding gradient calculation less accurate but faster.
clip_weight_min
The minimum value to clip by; None means -infinity.
clip_weight_max
The maximum value to clip by; None means +infinity.
weight_decay_factor
Amount of weight decay to apply; None means that the weights are not decayed.
multiply_weight_decay_factor_by_learning_rate
If true, weight_decay_factor is multiplied by the current learning rate.
slot_variable_creation_fn
If you wish to directly control the creation of the slot variables, set this to a callable taking three parameters: a table variable, a list of slot names to create for it, and a list of initializers. This function should return a dict with the slot names as keys and the created variables as values, with types matching the table variable. When set to None (the default), the built-in variable creation is used.
clipvalue
Controls clipping of the gradient. Set to either a single positive scalar value to get clipping or a tuple of scalar values (min, max) to set a separate maximum or minimum. If one of the two entries is None, then there will be no clipping in that direction.
low_dimensional_packing_status
Status of the low-dimensional embedding packing optimization. Controls whether to optimize the packing of 1-dimensional, 2-dimensional, and 4-dimensional embedding tables in memory.
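To make the sum_inside_sqrt and clipvalue descriptions above concrete, here is a scalar sketch in plain Python. It is not the TPU implementation. It uses the textbook Adam moment updates without bias correction, the default values are illustrative, and it assumes that a scalar clipvalue c means clipping to the range [-c, c]. The function names are hypothetical.

```python
import math


def clip_gradient(grad, clipvalue):
    """Clip a scalar gradient according to clipvalue.

    Assumption: a scalar clipvalue c is treated as the range [-c, c];
    a (min, max) tuple clips each side separately, and None on either
    side disables clipping in that direction.
    """
    if clipvalue is None:
        return grad
    if isinstance(clipvalue, (tuple, list)):
        low, high = clipvalue
    else:
        low, high = -clipvalue, clipvalue
    if low is not None:
        grad = max(grad, low)
    if high is not None:
        grad = min(grad, high)
    return grad


def adam_step(weight, grad, m, v, learning_rate=0.001, beta_1=0.9,
              beta_2=0.999, epsilon=1e-7, sum_inside_sqrt=True,
              clipvalue=None):
    """One simplified Adam update for a single scalar weight."""
    grad = clip_gradient(grad, clipvalue)
    # Exponential moving averages of the gradient and its square.
    m = beta_1 * m + (1.0 - beta_1) * grad
    v = beta_2 * v + (1.0 - beta_2) * grad * grad
    if sum_inside_sqrt:
        direction = m / math.sqrt(v + epsilon ** 2)
    else:
        direction = m / (math.sqrt(v) + epsilon)
    return weight - learning_rate * direction, m, v


# Compare the two denominators on the same inputs.
w_inside, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, sum_inside_sqrt=True)
w_outside, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, sum_inside_sqrt=False)
print(w_inside, w_outside)
```

Because sqrt(v + epsilon**2) is never larger than sqrt(v) + epsilon, the two variants differ only slightly whenever v is much larger than epsilon**2.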
Last updated 2024-04-26 UTC.