
MLFlow Logger Tests Not Passing #21433

@alexeatscake

Description


Bug description

Currently, the continuous integration tests never call mlflow: the MLflow logger tests are skipped because the package isn't included in the development dependencies. The tests I am referring to are here.

This could be fixed by adding mlflow to requirements/pytorch/test.txt, but doing so would cause the tests to fail. The tests fail for three separate reasons.

FutureWarning

In tests/tests_pytorch/loggers/test_mlflow.py, two tests fail due to future warnings:

  • test_mlflow_run_name_setting
  • test_mlflow_logger_dirs_creation

These warnings appear from mlflow 3.6.0 onwards (3.7.0 being the latest version at the time of writing).

The warning message is as follows:

FutureWarning: The filesystem tracking backend (e.g., './mlruns') will be deprecated in February 2026. Consider transitioning to a database backend (e.g., 'sqlite:///mlflow.db') to take advantage of the latest MLflow features. See https://github.com/mlflow/mlflow/issues/18534 for more details and migration guidance. 
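One way the warning could be silenced (a sketch of a possible workaround, not the project's chosen fix) is a standard warnings filter matching the message above; in pytest the same filter could live in a `@pytest.mark.filterwarnings` marker or in the project's pytest configuration.

```python
import warnings

# A minimal sketch of suppressing the MLflow filesystem-backend FutureWarning.
# The filter pattern is taken from the message text shown above.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message="The filesystem tracking backend.*",
        category=FutureWarning,
    )
    # Anything that constructs MlflowClient with a file: URI would go here;
    # the warning raised in FileStore.__init__ is swallowed. Simulated:
    warnings.warn(
        "The filesystem tracking backend (e.g., './mlruns') will be deprecated",
        FutureWarning,
    )
print("FutureWarning suppressed")
```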

test_mlflow_run_name_setting

If you downgrade to mlflow<=3.5.0, the FutureWarnings disappear (they can also be suppressed), but the tests still do not pass.

test_mlflow_run_name_setting fails with AssertionError: expected call not found, because the logger passes 'mlflow.runName': 'run-name-1' as a tag to create_run while the expected call does not include it.

This happens because the test creates a client and a logger, then reinitialises the client without re-attaching it to the logger.
The assertions on the client in the later parts of the test therefore run against a mock that is no longer the one the logger is using.
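The attachment bug can be illustrated with a minimal mock sketch (hypothetical class and names, not the actual test code or Lightning's API): once a logger has captured its client, re-binding the mock client variable does not redirect the logger's calls.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for a logger: it keeps the client it was
# constructed with. (Names here are illustrative only.)
class DummyLogger:
    def __init__(self, client):
        self._client = client

    def create_run(self, tags):
        self._client.create_run(experiment_id="exp-id", tags=tags)

first_client = MagicMock()
logger = DummyLogger(first_client)

client = MagicMock()  # re-initialised, but never re-attached to the logger
logger.create_run({"mlflow.runName": "run-name-1"})

# Asserting on the new client fails: the call went to first_client.
print(client.create_run.called)        # False
print(first_client.create_run.called)  # True
```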

restore_env_variables

In tests/tests_pytorch/conftest.py, an autouse fixture records the environment variables present before each test and asserts that no new ones leak out afterwards.
If the mlflow package is installed, MLflow's own behaviour sets MLFLOW_TRACKING_URI during the test, so the fixture reports it as a leaked variable.
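The check boils down to a set difference on environment keys; a minimal sketch of what it detects, plus two possible remedies (assumptions on my part, not a decided fix), is:

```python
import os

# Simulate the leak check done in tests/tests_pytorch/conftest.py.
os.environ.pop("MLFLOW_TRACKING_URI", None)  # ensure a clean starting state
env_backup = os.environ.copy()

# MLflow sets this during the test; it survives into teardown.
os.environ["MLFLOW_TRACKING_URI"] = "file:./mlruns"

leaked = os.environ.keys() - env_backup.keys()

# Restore the environment, as the fixture does.
os.environ.clear()
os.environ.update(env_backup)

print(sorted(leaked))  # ['MLFLOW_TRACKING_URI']

# Possible remedies (hypothetical, not merged anywhere):
#   1. In each mlflow test: monkeypatch.delenv("MLFLOW_TRACKING_URI", raising=False)
#   2. Or add "MLFLOW_TRACKING_URI" to the fixture's allowlist.
```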

What version are you seeing the problem on?

master

Reproduced in studio

No response

How to reproduce the bug

git clone https://github.com/Lightning-AI/pytorch-lightning
cd pytorch-lightning
uv venv -p 3.12
source .venv/bin/activate
make setup
uv pip install "mlflow>=3.6"

Then you can either run `make test` or `pytest tests/tests_pytorch/loggers/test_mlflow.py`.

Error messages and logs

FutureWarning

src/lightning/pytorch/loggers/mlflow.py:153: in __init__
    self._mlflow_client = MlflowClient(tracking_uri)
.venv/lib/python3.12/site-packages/mlflow/tracking/client.py:224: in __init__
    self._tracking_client = TrackingServiceClient(final_tracking_uri)
.venv/lib/python3.12/site-packages/mlflow/tracking/_tracking_service/client.py:96: in __init__
    self.store
.venv/lib/python3.12/site-packages/mlflow/tracking/_tracking_service/client.py:100: in store
    return utils._get_store(self.tracking_uri)
.venv/lib/python3.12/site-packages/mlflow/tracking/_tracking_service/utils.py:253: in _get_store
    return _tracking_store_registry.get_store(store_uri, artifact_uri)
.venv/lib/python3.12/site-packages/mlflow/tracking/_tracking_service/registry.py:45: in get_store
    return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
.venv/lib/python3.12/site-packages/mlflow/tracking/_tracking_service/registry.py:56: in _get_store_with_resolved_uri
    return builder(store_uri=resolved_store_uri, artifact_uri=artifact_uri)
.venv/lib/python3.12/site-packages/mlflow/tracking/_tracking_service/utils.py:177: in _get_file_store
    return FileStore(store_uri, store_uri)

self = <mlflow.store.tracking.file_store.FileStore object at 0x16d905040>
root_directory = 'file:/.../pytest-of-name/pytest-18/test_mlflow_run_name_setting0'
artifact_root_uri = 'file:/.../pytest-18/test_mlflow_run_name_setting0'

    def __init__(self, root_directory=None, artifact_root_uri=None):
        """
        Create a new FileStore with the given root directory and a given default artifact root URI.
        """
        super().__init__()
>       warnings.warn(
            "The filesystem tracking backend (e.g., './mlruns') will be deprecated in "
            "February 2026. Consider transitioning to a database backend (e.g., "
            "'sqlite:///mlflow.db') to take advantage of the latest MLflow features. "
            "See https://github.com/mlflow/mlflow/issues/18534 for more details and migration "
            "guidance.",
            FutureWarning,
            stacklevel=2,
        )
E   FutureWarning: The filesystem tracking backend (e.g., './mlruns') will be deprecated in February 2026. Consider transitioning to a database backend (e.g., 'sqlite:///mlflow.db') to take advantage of the latest MLflow features. See https://github.com/mlflow/mlflow/issues/18534 for more details and migration guidance.

.venv/lib/python3.12/site-packages/mlflow/store/tracking/file_store.py:219: FutureWarning

test_mlflow_run_name_setting

self = <MagicMock name='mock.create_run' id='13200726048'>, args = ()
kwargs = {'experiment_id': 'exp-id', 'tags': {'mlflow.source.name': '/.../pytorch-lightning/.venv/bin/pytest', 'mlflow.source.type': 'LOCAL', 'mlflow.user': 'name'}}
expected = call(experiment_id='exp-id', tags={'mlflow.user': 'name', 'mlflow.source.name': '/.../pytorch-lightning/.venv/bin/pytest', 'mlflow.source.type': 'LOCAL'})
actual = call(experiment_id='exp-id', tags={'mlflow.user': 'name', 'mlflow.source.name': '/.../pytorch-lightning/.venv/bin/pytest', 'mlflow.source.type': 'LOCAL', 'mlflow.runName': 'run-name-1'})
_error_message = <function NonCallableMock.assert_called_with.<locals>._error_message at 0x313d20cc0>, cause = None

    def assert_called_with(self, /, *args, **kwargs):
        """assert that the last call was made with the specified arguments.

        Raises an AssertionError if the args and keyword args passed in are
        different to the last call to the mock."""
        if self.call_args is None:
            expected = self._format_mock_call_signature(args, kwargs)
            actual = 'not called.'
            error_message = ('expected call not found.\nExpected: %s\n  Actual: %s'
                             % (expected, actual))
            raise AssertionError(error_message)

        def _error_message():
            msg = self._format_mock_failure_message(args, kwargs)
            return msg

        expected = self._call_matcher(_Call((args, kwargs), two=True))
        actual = self._call_matcher(self.call_args)
        if actual != expected:
            cause = expected if isinstance(expected, Exception) else None
>           raise AssertionError(_error_message()) from cause
E           AssertionError: expected call not found.
E           Expected: create_run(experiment_id='exp-id', tags={'mlflow.user': 'name', 'mlflow.source.name': '/.../pytorch-lightning/.venv/bin/pytest', 'mlflow.source.type': 'LOCAL'})
E           Actual: create_run(experiment_id='exp-id', tags={'mlflow.user': 'name', 'mlflow.source.name': '/.../Lightning/pytorch-lightning/.venv/bin/pytest', 'mlflow.source.type': 'LOCAL', 'mlflow.runName': 'run-name-1'})

/.../lib/python3.12/unittest/mock.py:949: AssertionError

restore_env_variables

_________________ ERROR at teardown of test_mlflow_run_name_setting _________________

    @pytest.fixture(autouse=True)
    def restore_env_variables():
        """Ensures that environment variables set during the test do not leak out."""
        env_backup = os.environ.copy()
        yield
        leaked_vars = os.environ.keys() - env_backup.keys()
        # restore environment as it was before running the test
        os.environ.clear()
        os.environ.update(env_backup)
        # these are currently known leakers - ideally these would not be allowed
        allowlist = {
            "CUBLAS_WORKSPACE_CONFIG",  # enabled with deterministic flag
            "CUDA_DEVICE_ORDER",
            "LOCAL_RANK",
            "NODE_RANK",
            "WORLD_SIZE",
            "MASTER_ADDR",
            "MASTER_PORT",
            "PL_GLOBAL_SEED",
            "PL_SEED_WORKERS",
            "WANDB_MODE",
            "WANDB_REQUIRE_SERVICE",
            "WANDB_SERVICE",
            "RANK",  # set by DeepSpeed
            "CUDA_MODULE_LOADING",  # leaked by PyTorch
            "KMP_INIT_AT_FORK",  # leaked by PyTorch
            "KMP_DUPLICATE_LIB_OK",  # leaked by PyTorch
            "CRC32C_SW_MODE",  # leaked by tensorboardX
            "TRITON_CACHE_DIR",  # leaked by torch.compile
            "_TORCHINDUCTOR_PYOBJECT_TENSOR_DATA_PTR",  # leaked by torch.compile
            "OMP_NUM_THREADS",  # set by our launchers
            # leaked by XLA
            "ALLOW_MULTIPLE_LIBTPU_LOAD",
            "GRPC_VERBOSITY",
            "TF_CPP_MIN_LOG_LEVEL",
            "TF_GRPC_DEFAULT_OPTIONS",
            "XLA_FLAGS",
            "TORCHINDUCTOR_CACHE_DIR",  # leaked by torch.compile
            # TensorFlow and TPU related variables
            "TF2_BEHAVIOR",
            "TPU_ML_PLATFORM",
            "TPU_ML_PLATFORM_VERSION",
            "LD_LIBRARY_PATH",
            "ENABLE_RUNTIME_UPTIME_TELEMETRY",
        }
        leaked_vars.difference_update(allowlist)
>       assert not leaked_vars, f"test is leaking environment variable(s): {set(leaked_vars)}"
E       AssertionError: test is leaking environment variable(s): {'MLFLOW_TRACKING_URI'}
E       assert not {'MLFLOW_TRACKING_URI'}

tests/tests_pytorch/conftest.py:106: AssertionError

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.5.0): 2.6.0
#- PyTorch Version (e.g., 2.5): 2.9.0
#- Python version (e.g., 3.12): 3.12
#- OS (e.g., Linux): MacOS
#- CUDA/cuDNN version: None
#- GPU models and configuration: None
#- How you installed Lightning (`conda`, `pip`, source): source

More info

No response

cc @ethanwharris

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.5.x
