@asp2286 asp2286 commented Sep 4, 2025

Related issues

Fixes #3043

Add Isolation Forest Anomaly Detection Trainer (Experimental)

This PR adds an Isolation Forest anomaly detection trainer for ML.NET.
Isolation Forest (Liu, Ting, Zhou, 2008) is a tree-ensemble algorithm for unsupervised anomaly detection that isolates outliers via random partitioning. It complements existing ML.NET anomaly detectors (e.g., SR-CNN, IID) with a density-agnostic approach.
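
For readers unfamiliar with the algorithm, the anomaly score defined in the original paper can be written as below. This is the textbook formulation, not necessarily what IsolationForestModel computes: here h(x) is the path length of x in one tree, E[h(x)] is its average over the ensemble, and ψ is the subsample size. The paper's score lies in (0, 1), with values near 1 indicating anomalies. This description does not state how the 0–100 Score of this PR is derived from it.

```latex
s(x, \psi) = 2^{-\frac{E[h(x)]}{c(\psi)}},
\qquad
c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi},
\qquad
H(i) \approx \ln(i) + 0.5772156649
```

The normalizer c(ψ) is the average path length of an unsuccessful search in a binary search tree built over ψ points, which makes scores comparable across different subsample sizes.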


Motivation

  • Provides a widely used, general-purpose anomaly detection method.
  • Works without strong distribution assumptions.
  • Produces both a continuous anomaly score and a binary label.
  • Brings ML.NET closer to parity with popular libraries such as scikit-learn, which already ships an IsolationForest estimator.

Design (v1, Experimental)

  • Core engine: IsolationForestModel (pure C#) implements random partitioning trees, scoring, and SHAP-like path contributions.
  • Pipeline integration: IsolationForestTrainer : IEstimator<ITransformer> appends:
    • Score (float, scaled 0–100; higher = more anomalous),
    • PredictedLabel (bool), obtained by thresholding Score using Contamination or an explicit ThresholdOverride.
  • Options:
    • Trees
    • SampleSize (psi)
    • Seed
    • Contamination
    • ParallelBuild
    • ThresholdOverride
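
To illustrate how Contamination and ThresholdOverride could interact, here is a minimal sketch of one plausible thresholding scheme. Everything in it is an assumption for illustration: the ContaminationThreshold class and its methods are hypothetical and are not taken from this PR, and the actual trainer may pick its cutoff differently (for example with another quantile rule or tie handling).

```csharp
using System;
using System.Linq;

// Hypothetical helper, NOT the PR's code: derives a score cutoff from a
// contamination rate and applies it to produce boolean labels.
public static class ContaminationThreshold
{
    // Returns a cutoff such that roughly `contamination` of the training
    // scores are at or above it (higher score = more anomalous).
    public static float FromContamination(float[] trainingScores, double contamination)
    {
        if (trainingScores == null || trainingScores.Length == 0)
            throw new ArgumentException("At least one score is required.", nameof(trainingScores));
        if (contamination <= 0.0 || contamination >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(contamination), "Expected a value in (0, 1).");

        float[] sorted = trainingScores.OrderByDescending(s => s).ToArray();

        // Number of training rows to flag; always flag at least one.
        int flagged = Math.Max(1, (int)Math.Round(contamination * sorted.Length));
        return sorted[flagged - 1];
    }

    // Assumption: an explicit override, when present, takes precedence over
    // the contamination-derived cutoff.
    public static float Resolve(float[] trainingScores, double contamination, float? thresholdOverride)
        => thresholdOverride ?? FromContamination(trainingScores, contamination);

    // A row is labeled anomalous when its score reaches the cutoff.
    public static bool[] Label(float[] scores, float threshold)
        => scores.Select(s => s >= threshold).ToArray();
}
```

With five training scores and a contamination of 0.2, this sketch flags exactly the single highest-scoring row. Passing a non-null thresholdOverride bypasses the contamination-derived cutoff entirely.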

⚠️ Experimental note: v1 uses CustomMapping internally. Models trained with this trainer cannot currently be persisted with mlContext.Model.Save(). A follow-up will introduce a proper IsolationForestTransformer with save/load and efficient row-mapping.


Usage

    var pipeline = ml.Transforms.Concatenate("Features", "X1", "X2")
        .Append(new IsolationForestTrainer(new IsolationForestTrainer.Options
        {
            Trees = 200,
            SampleSize = 256,
            Contamination = 0.02
        }));

    var model = pipeline.Fit(data);

asp2286 commented Sep 4, 2025 via email

codecov bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 87.38255% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.05%. Comparing base (fb39755) to head (adab91a).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...crosoft.ML.IsolationForest/IsolationForestModel.cs | 84.69% | 30 Missing and 32 partials ⚠️ |
| ...osoft.ML.IsolationForest/IsolationForestTrainer.cs | 77.86% | 21 Missing and 8 partials ⚠️ |
| ...L.IsolationForest.Tests/IsolationForestAllTests.cs | 98.56% | 0 Missing and 3 partials ⚠️ |

Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main    #7497      +/-   ##
    ==========================================
    + Coverage   69.01%   69.05%   +0.03%
    ==========================================
      Files        1482     1485       +3
      Lines      273999   274744     +745
      Branches    28258    28388     +130
    ==========================================
    + Hits       189093   189717     +624
    - Misses      77520    77594      +74
    - Partials     7386     7433      +47

| Flag | Coverage Δ |
| --- | --- |
| Debug | 69.05% <87.38%> (+0.03%) ⬆️ |
| production | 63.34% <83.02%> (+0.03%) ⬆️ |
| test | 89.49% <98.56%> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...L.IsolationForest.Tests/IsolationForestAllTests.cs | 98.56% <98.56%> (ø) |
| ...osoft.ML.IsolationForest/IsolationForestTrainer.cs | 77.86% <77.86%> (ø) |
| ...crosoft.ML.IsolationForest/IsolationForestModel.cs | 84.69% <84.69%> (ø) |

... and 6 files with indirect coverage changes

@asp2286 asp2286 closed this Sep 18, 2025
@asp2286 asp2286 deleted the feature/Isolation-forest-trainer branch October 2, 2025 07:45