@@ -991,11 +991,23 @@ val_check_interval
991991 :muted:
992992
993993How often within one training epoch to check the validation set.
994- Can specify as float or int.
994+ Can specify as float, int, or a time-based duration.
995995
996996- pass a ``float`` in the range [0.0, 1.0] to check after a fraction of the training epoch.
997997- pass an ``int`` to check after a fixed number of training batches. An ``int`` value can only be higher than the number of training
998998 batches when ``check_val_every_n_epoch=None``, which validates after every ``N`` training batches across epochs or iteration-based training.
999+ - pass a ``string`` duration in the format "DD:HH:MM:SS", a ``datetime.timedelta`` object, or a ``dictionary`` of keyword arguments that can be passed
1000+ to ``datetime.timedelta`` for time-based validation. With a time-based duration, validation triggers once the elapsed wall-clock time
1001+ since the last validation exceeds the interval. The check happens after the current batch completes; the validation loop then runs and
1002+ the timer resets.
1003+
1004+ **Time-based validation behavior with check_val_every_n_epoch:** When a time-based ``val_check_interval`` is combined with
1005+ ``check_val_every_n_epoch > 1``, validation is aligned to epoch multiples:
1006+
1007+ - If the time-based interval elapses **before** the next multiple-N epoch, validation runs at the start of that epoch (after the first batch),
1008+ and the timer resets.
1009+ - If the interval elapses **during** a multiple-N epoch, validation runs after the current batch.
1010+ - When ``check_val_every_n_epoch`` is ``None`` or ``1``, the time-based behavior of ``val_check_interval`` applies without additional alignment.
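
The rules above can be pictured with a small standalone sketch. It is not the Trainer's actual implementation: ``to_timedelta`` and ``validation_due`` are hypothetical helper names, the epoch counter is assumed to be 1-based, and treating "elapsed equals interval" as due is an assumption as well. It only illustrates how the three accepted duration forms could be normalized and how the epoch alignment described above might be evaluated after each batch.

```python
from datetime import timedelta
from typing import Optional, Union


def to_timedelta(interval: Union[str, dict, timedelta]) -> timedelta:
    """Hypothetical helper: normalize a time-based interval to a timedelta."""
    if isinstance(interval, timedelta):
        return interval
    if isinstance(interval, dict):
        # Keys are forwarded as keyword arguments, e.g. {"minutes": 15}.
        return timedelta(**interval)
    if isinstance(interval, str):
        parts = interval.strip().split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 'DD:HH:MM:SS', got {interval!r}")
        days, hours, minutes, seconds = (int(part) for part in parts)
        return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    raise TypeError(f"unsupported interval type: {type(interval).__name__}")


def validation_due(
    elapsed_seconds: float,
    interval: timedelta,
    epoch_number: int,
    every_n_epochs: Optional[int] = None,
) -> bool:
    """Hypothetical check, evaluated after a batch completes.

    epoch_number is assumed to be 1-based. The caller is expected to reset
    its timer only when this returns True, so an interval that elapses in a
    non-multiple epoch stays pending until the next multiple-N epoch.
    """
    # Assumption: reaching the interval exactly already counts as due.
    interval_elapsed = elapsed_seconds >= interval.total_seconds()
    if every_n_epochs is None or every_n_epochs == 1:
        # No additional alignment: the timer alone decides.
        return interval_elapsed
    return interval_elapsed and epoch_number % every_n_epochs == 0
```

With ``every_n_epochs=2`` and a 15-minute interval, a timer that expires during epoch 1 does not trigger validation there; the check stays pending and fires after the first batch of epoch 2, which matches the first bullet above.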
9991011
10001012.. testcode::
10011013
@@ -1013,10 +1025,25 @@ Can specify as float or int.
10131025 # (ie: production cases with streaming data)
10141026 trainer = Trainer(val_check_interval=1000, check_val_every_n_epoch=None)
10151027
1028+ # check validation every 15 minutes of wall-clock time using a string-based approach
1029+ trainer = Trainer(val_check_interval="00:00:15:00")
1030+
1031+ # check validation every 15 minutes of wall-clock time using a dictionary-based approach
1032+ trainer = Trainer(val_check_interval={"minutes": 15})
1033+
1034+ # check validation every 1 hour of wall-clock time using a dictionary-based approach
1035+ trainer = Trainer(val_check_interval={"hours": 1})
1036+
1037+ # check validation every 1 hour of wall-clock time using a datetime.timedelta object
1038+ from datetime import timedelta
1039+ trainer = Trainer(val_check_interval=timedelta(hours=1))
1040+
1041+
10161042
10171043.. code-block:: python
10181044
10191045 # Here is the computation to estimate the total number of batches seen within an epoch.
1046+ # This logic applies when `val_check_interval` is specified as an integer or a float.
10201047
10211048 # Find the total number of train batches
10221049 total_train_batches = total_train_samples // (train_batch_size * world_size)