
Commit 51ea627

Merge pull request dotnet#1817 from shauheen/release/v0.8RC2
Cherry-pick for release 0.8
2 parents 2e8c723 + 7e3edd5 commit 51ea627


19 files changed (+778, -101 lines)

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
# ML.NET 0.8 Release Notes

Today we are excited to release ML.NET 0.8, and we can finally explain why it
is the best version so far! This release adds model explainability, so you can
understand which features (inputs) are most important. It also brings improved
debuggability, easier-to-use time series predictions, several API improvements,
a new recommendation use case, and more.

### Installation

ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET
Core 2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md)
for more details.

You can install the ML.NET NuGet package from the CLI using:
```
dotnet add package Microsoft.ML
```

From the Package Manager Console:
```
Install-Package Microsoft.ML
```

### Release Notes

Below are some of the highlights from this release.

* Added first steps towards model explainability
  ([#1735](https://github.com/dotnet/machinelearning/pull/1735),
  [#1692](https://github.com/dotnet/machinelearning/pull/1692))

    * Enabled explainability in the form of overall feature importance and
      generalized additive models.
    * Overall feature importance gives a sense of which features are most
      important to the model overall. For example, when predicting the
      sentiment of a tweet, the presence of "amazing" might matter more than
      whether the tweet contains "bird". This is enabled through Permutation
      Feature Importance. Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/3d33e20f33da70cdd3da2ad9e0b2b03df929bef4/docs/samples/Microsoft.ML.Samples/Dynamic/PermutationFeatureImportance.cs).
    * Generalized Additive Models have very explainable predictions. They are
      similar to linear models in terms of ease of understanding, but they are
      more flexible and can have better performance. Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/3d33e20f33da70cdd3da2ad9e0b2b03df929bef4/docs/samples/Microsoft.ML.Samples/Dynamic/GeneralizedAdditiveModels.cs).
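The idea behind Permutation Feature Importance can be sketched in plain C#. This is a minimal illustration, not ML.NET's API. It assumes the following:

* A hypothetical already-trained model is passed in as a `Func<double[], double>`.
* Mean squared error is used as the metric.
* `PfiSketch` is a made-up name.

Each feature column is shuffled in turn, and the resulting increase in error is taken as that feature's importance.

```csharp
using System;
using System.Linq;

// Hypothetical helper illustrating permutation feature importance (not ML.NET's implementation).
public static class PfiSketch
{
    // Mean squared error of a model over a dataset.
    public static double MeanSquaredError(Func<double[], double> model, double[][] rows, double[] labels)
    {
        double sum = 0;
        for (int i = 0; i < rows.Length; i++)
        {
            double diff = model(rows[i]) - labels[i];
            sum += diff * diff;
        }
        return sum / rows.Length;
    }

    // For each feature, returns how much the error grows when only that feature's
    // column is shuffled. A larger increase means the model relies more on the feature.
    public static double[] Importance(Func<double[], double> model, double[][] rows, double[] labels, int seed = 42)
    {
        var rng = new Random(seed);
        double baseline = MeanSquaredError(model, rows, labels);
        int featureCount = rows[0].Length;
        var importance = new double[featureCount];

        for (int f = 0; f < featureCount; f++)
        {
            // Copy the rows so the caller's data is left untouched.
            double[][] permuted = rows.Select(r => (double[])r.Clone()).ToArray();

            // Fisher-Yates shuffle of column f only.
            for (int i = permuted.Length - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1);
                double tmp = permuted[i][f];
                permuted[i][f] = permuted[j][f];
                permuted[j][f] = tmp;
            }

            importance[f] = MeanSquaredError(model, permuted, labels) - baseline;
        }
        return importance;
    }
}
```

A feature the model ignores gets an importance of zero, because shuffling it cannot change any prediction.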

* Improved debuggability by previewing IDataViews
  ([#1518](https://github.com/dotnet/machinelearning/pull/1518))

    * It is often useful to peek at the data that is read into an ML.NET
      pipeline, and even to look at it after some intermediate steps, to ensure
      the data is transformed as expected.
    * You can now preview an IDataView by opening the Watch window in the
      Visual Studio debugger, entering the name of the variable you want to
      preview, and calling its `Preview()` method.

![](dataPreview.gif)

* Enabled a stateful prediction engine for time series problems
  ([#1727](https://github.com/dotnet/machinelearning/pull/1727))

    * [ML.NET 0.7](https://github.com/dotnet/machinelearning/blob/483ec04a11fbdc056a88bc581d7e5cee9092a936/docs/release-notes/0.7/release-0.7.md)
      enabled anomaly detection scenarios. However, the prediction engine was
      stateless. Every time you wanted to determine whether the latest data
      point was anomalous, you also had to provide the historical data, which
      is cumbersome.
    * The prediction engine can now keep the state of the time series data
      seen so far, so you can get predictions by providing just the latest
      data point. This is enabled by using `CreateTimeSeriesPredictionFunction`
      instead of `MakePredictionFunction`. Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/3d33e20f33da70cdd3da2ad9e0b2b03df929bef4/test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs#L141).
      You'll need to add the Microsoft.ML.TimeSeries NuGet package to your project.
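The difference between a stateless and a stateful engine can be illustrated with a small self-contained class. This is a hypothetical sketch, not the `TimeSeriesPredictionFunction` API.

* **State:** it keeps a sliding window of recent values internally, so each `Predict` call takes only the newest point.
* **Alert rule:** it flags a point whose distance from the window mean exceeds a fixed threshold. This is a deliberately simple stand-in for ML.NET's p-value and martingale scoring.
* **Checkpointing:** `Clone` copies the state, loosely mirroring the idea of checkpointing an engine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stateful detector: it remembers the last `windowSize` values,
// so callers only pass the latest observation instead of the whole history.
public sealed class StatefulDetectorSketch
{
    private readonly int _windowSize;
    private readonly double _threshold;
    private readonly Queue<double> _window;

    public StatefulDetectorSketch(int windowSize, double threshold)
    {
        _windowSize = windowSize;
        _threshold = threshold;
        _window = new Queue<double>();
    }

    // Copy constructor used by Clone: duplicates the remembered window.
    private StatefulDetectorSketch(StatefulDetectorSketch other)
    {
        _windowSize = other._windowSize;
        _threshold = other._threshold;
        _window = new Queue<double>(other._window);
    }

    // Returns true when `value` deviates from the mean of the remembered window
    // by more than the threshold, then adds `value` to the state.
    public bool Predict(double value)
    {
        bool alert = _window.Count > 0 && Math.Abs(value - _window.Average()) > _threshold;
        _window.Enqueue(value);
        if (_window.Count > _windowSize)
            _window.Dequeue();
        return alert;
    }

    // Snapshot of the current state, similar in spirit to checkpointing the engine.
    public StatefulDetectorSketch Clone() => new StatefulDetectorSketch(this);
}
```

A clone taken before the change point produces the same alert as the original, which mirrors the expectation in the sample below that a checkpointed engine matches the live one.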

* Improved support for recommendation scenarios with implicit feedback
  ([#1664](https://github.com/dotnet/machinelearning/pull/1664))

    * [ML.NET 0.7](https://github.com/dotnet/machinelearning/blob/483ec04a11fbdc056a88bc581d7e5cee9092a936/docs/release-notes/0.7/release-0.7.md)
      included Matrix Factorization, which uses ratings provided by users to
      recommend other items they might like.
    * In some cases, you don't have specific ratings from users, only implicit
      feedback (e.g. they watched the movie but didn't rate it).
    * Matrix Factorization in ML.NET can now use this type of implicit data to
      train models for recommendation scenarios.
    * Example usage can be found
      [here](https://github.com/dotnet/machinelearning/blob/71d58fa83f77abb630d815e5cf8aa9dd3390aa65/test/Microsoft.ML.Tests/TrainerEstimators/MatrixFactorizationTests.cs#L335).
      You'll need to add the Microsoft.ML.MatrixFactorization NuGet package to
      your project.
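A common way to prepare implicit feedback for matrix factorization is to treat every observed (user, item) interaction as a positive example (label 1) and every unobserved pair as 0. The sketch below illustrates only that data-preparation convention. `Interaction` and `ImplicitFeedbackSketch.ToTrainingMatrix` are hypothetical names, and the code does not reproduce ML.NET's trainer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical implicit-feedback event: the user interacted with the item
// (for example, watched a movie) but gave no explicit rating.
public readonly struct Interaction
{
    public Interaction(int user, int item) { User = user; Item = item; }
    public int User { get; }
    public int Item { get; }
}

public static class ImplicitFeedbackSketch
{
    // Builds a dense user x item label matrix: 1 for observed interactions,
    // 0 for every pair that was never observed. Repeated events count once.
    public static float[,] ToTrainingMatrix(IEnumerable<Interaction> events, int userCount, int itemCount)
    {
        var labels = new float[userCount, itemCount]; // all entries start at 0
        foreach (var e in events)
        {
            if (e.User < 0 || e.User >= userCount || e.Item < 0 || e.Item >= itemCount)
                throw new ArgumentOutOfRangeException(nameof(events), "Interaction index out of range.");
            labels[e.User, e.Item] = 1f;
        }
        return labels;
    }
}
```

Treating every zero as a true negative is a simplification. An unobserved pair may just be an item the user never saw, so implicit-feedback methods often give those entries a lower weight.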

* Enabled saving and loading data as a binary file (IDataView/IDV)
  ([#1678](https://github.com/dotnet/machinelearning/pull/1678))

    * It is sometimes useful to save data after it has been transformed. For
      example, you might have featurized all the text into sparse vectors and
      want to experiment repeatedly with different trainers without repeating
      the data transformation each time.
    * Saving and loading files in ML.NET's binary format can improve
      efficiency, because the format is compressed and already schematized.
    * Reading a binary data file can be done using
      `mlContext.Data.ReadFromBinary("pathToFile")` and writing a binary data
      file can be done using `mlContext.Data.SaveAsBinary("pathToFile")`.
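Saving featurized data once and reloading it later can be illustrated with a round trip of sparse vectors through `BinaryWriter` and `BinaryReader`. The layout in this sketch is invented:

* a row count;
* then, for each row, its length, non-zero count, indices, and values.

It is not ML.NET's IDV format, which also records the schema and compresses the data. `SparseVector` and `BinaryCacheSketch` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sparse feature vector: only non-zero entries are stored.
public sealed class SparseVector
{
    public SparseVector(int length, int[] indices, float[] values)
    {
        if (indices.Length != values.Length)
            throw new ArgumentException("indices and values must have the same length");
        Length = length; Indices = indices; Values = values;
    }
    public int Length { get; }
    public int[] Indices { get; }
    public float[] Values { get; }
}

public static class BinaryCacheSketch
{
    // Writes rows in a made-up layout: row count, then per row its length,
    // non-zero count, indices, and values.
    public static void Save(Stream stream, IReadOnlyList<SparseVector> rows)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(rows.Count);
            foreach (var row in rows)
            {
                writer.Write(row.Length);
                writer.Write(row.Indices.Length);
                foreach (int index in row.Indices) writer.Write(index);
                foreach (float value in row.Values) writer.Write(value);
            }
        }
    }

    // Reads rows written by Save, without re-running any featurization.
    public static List<SparseVector> Load(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int rowCount = reader.ReadInt32();
            var rows = new List<SparseVector>(rowCount);
            for (int r = 0; r < rowCount; r++)
            {
                int length = reader.ReadInt32();
                int nonZeros = reader.ReadInt32();
                var indices = new int[nonZeros];
                var values = new float[nonZeros];
                for (int i = 0; i < nonZeros; i++) indices[i] = reader.ReadInt32();
                for (int i = 0; i < nonZeros; i++) values[i] = reader.ReadSingle();
                rows.Add(new SparseVector(length, indices, values));
            }
            return rows;
        }
    }
}
```

In practice `stream` would be a `FileStream` from `File.Create` or `File.OpenRead`, so the featurized rows survive across experiments.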

* Added filtering and caching APIs
  ([#1569](https://github.com/dotnet/machinelearning/pull/1569))

    * There is sometimes a need to filter the data used for training a model.
      For example, you may need to remove rows that don't have a label, or to
      focus your model on certain categories of inputs. This can now be done
      with additional filters, as shown
      [here](https://github.com/dotnet/machinelearning/blob/71d58fa83f77abb630d815e5cf8aa9dd3390aa65/test/Microsoft.ML.Tests/RangeFilterTests.cs#L30).

    * Some estimators iterate over the data multiple times. Instead of always
      reading from file, you can choose to cache the data to potentially speed
      things up. An example can be found
      [here](https://github.com/dotnet/machinelearning/blob/71d58fa83f77abb630d815e5cf8aa9dd3390aa65/test/Microsoft.ML.Tests/CachingTests.cs#L56).
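Both ideas can be sketched with plain LINQ over an in-memory sequence standing in for a file reader. `Row`, `CountingSource`, and `FilterCacheSketch` are hypothetical and do not correspond to ML.NET's filtering or caching API.

* **Filtering:** rows without a label, or with a label outside a range, are dropped lazily.
* **Caching:** materializing the filtered rows once with `ToList` lets a multi-pass consumer iterate repeatedly without reading the source again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Row
{
    public Row(float? label, float feature) { Label = label; Feature = feature; }
    public float? Label { get; }
    public float Feature { get; }
}

// Stand-in for reading a file: counts how many passes over the source are started.
public sealed class CountingSource
{
    private readonly Row[] _rows;
    public CountingSource(Row[] rows) { _rows = rows; }
    public int Passes { get; private set; }

    public IEnumerable<Row> Read()
    {
        Passes++; // runs each time an enumeration of the result begins
        foreach (var row in _rows)
            yield return row;
    }
}

public static class FilterCacheSketch
{
    // Keep only rows that have a label inside [min, max].
    public static IEnumerable<Row> FilterByLabel(IEnumerable<Row> rows, float min, float max) =>
        rows.Where(r => r.Label.HasValue && r.Label.Value >= min && r.Label.Value <= max);

    // Simulates a trainer that iterates over the data several times.
    public static double MultiPassSum(IEnumerable<Row> rows, int passes)
    {
        double total = 0;
        for (int p = 0; p < passes; p++)
            foreach (var row in rows)
                total += row.Feature;
        return total;
    }
}
```

Without caching, each of the three passes re-reads the source. With `ToList`, the source is read once and the results are identical.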

### Acknowledgements

Shoutout to [jwood803](https://github.com/jwood803),
[feiyun0112](https://github.com/feiyun0112),
[bojanmisic](https://github.com/bojanmisic),
[rantri](https://github.com/rantri), [Caraul](https://github.com/Caraul),
[van-tienhoang](https://github.com/van-tienhoang),
[Thomas-S-B](https://github.com/Thomas-S-B), and the ML.NET team for their
contributions as part of this release!

docs/samples/Microsoft.ML.Samples/Dynamic/IidChangePointDetectorTransform.cs

Lines changed: 122 additions & 7 deletions
@@ -4,6 +4,10 @@
 using Microsoft.ML.Runtime.Data;
 using Microsoft.ML.Runtime.Api;
 using Microsoft.ML.Runtime.TimeSeriesProcessing;
+using Microsoft.ML.Core.Data;
+using Microsoft.ML.TimeSeries;
+using System.IO;
+using Microsoft.ML.Data;
 
 namespace Microsoft.ML.Samples.Dynamic
 {
@@ -34,26 +38,26 @@ public static void IidChangePointDetectorTransform()
             var ml = new MLContext();
 
             // Generate sample series data with a change
-            const int size = 16;
-            var data = new List<IidChangePointData>(size);
-            for (int i = 0; i < size / 2; i++)
+            const int Size = 16;
+            var data = new List<IidChangePointData>(Size);
+            for (int i = 0; i < Size / 2; i++)
                 data.Add(new IidChangePointData(5));
             // This is a change point
-            for (int i = 0; i < size / 2; i++)
+            for (int i = 0; i < Size / 2; i++)
                 data.Add(new IidChangePointData(7));
 
             // Convert data to IDataView.
             var dataView = ml.CreateStreamingDataView(data);
 
             // Setup IidSpikeDetector arguments
-            string outputColumnName = "Prediction";
-            string inputColumnName = "Value";
+            string outputColumnName = nameof(ChangePointPrediction.Prediction);
+            string inputColumnName = nameof(IidChangePointData.Value);
             var args = new IidChangePointDetector.Arguments()
             {
                 Source = inputColumnName,
                 Name = outputColumnName,
                 Confidence = 95, // The confidence for spike detection in the range [0, 100]
-                ChangeHistoryLength = size / 4, // The length of the sliding window on p-values for computing the martingale score.
+                ChangeHistoryLength = Size / 4, // The length of the sliding window on p-values for computing the martingale score.
             };
 
             // The transformed data.
@@ -88,5 +92,116 @@ public static void IidChangePointDetectorTransform()
             // 7    0     7.00  0.50    0.00
             // 7    0     7.00  0.50    0.00
         }
+
+        // This example creates a time series (list of Data with the i-th element corresponding to the i-th time slot).
+        // IidChangePointDetector is then applied to identify points where the data distribution changed, using the time series
+        // prediction engine. The engine is checkpointed, loaded back from disk into memory, and used for prediction.
+        public static void IidChangePointDetectorPrediction()
+        {
+            // Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
+            // as well as the source of randomness.
+            var ml = new MLContext();
+
+            // Generate sample series data with a change
+            const int Size = 16;
+            var data = new List<IidChangePointData>(Size);
+            for (int i = 0; i < Size / 2; i++)
+                data.Add(new IidChangePointData(5));
+            // This is a change point
+            for (int i = 0; i < Size / 2; i++)
+                data.Add(new IidChangePointData(7));
+
+            // Convert data to IDataView.
+            var dataView = ml.CreateStreamingDataView(data);
+
+            // Set up IidChangePointDetector arguments
+            string outputColumnName = nameof(ChangePointPrediction.Prediction);
+            string inputColumnName = nameof(IidChangePointData.Value);
+            var args = new IidChangePointDetector.Arguments()
+            {
+                Source = inputColumnName,
+                Name = outputColumnName,
+                Confidence = 95, // The confidence for change point detection in the range [0, 100]
+                ChangeHistoryLength = Size / 4, // The length of the sliding window on p-values for computing the martingale score.
+            };
+
+            // Time Series model.
+            ITransformer model = new IidChangePointEstimator(ml, args).Fit(dataView);
+
+            // Create a time series prediction engine from the model.
+            var engine = model.CreateTimeSeriesPredictionFunction<IidChangePointData, ChangePointPrediction>(ml);
+            for (int index = 0; index < 8; index++)
+            {
+                // Anomaly change point detection.
+                var prediction = engine.Predict(new IidChangePointData(5));
+                Console.WriteLine("{0}\t{1}\t{2:0.00}\t{3:0.00}\t{4:0.00}", 5, prediction.Prediction[0],
+                    prediction.Prediction[1], prediction.Prediction[2], prediction.Prediction[3]);
+            }
+
+            // Change point
+            var changePointPrediction = engine.Predict(new IidChangePointData(7));
+            Console.WriteLine("{0}\t{1}\t{2:0.00}\t{3:0.00}\t{4:0.00}", 7, changePointPrediction.Prediction[0],
+                changePointPrediction.Prediction[1], changePointPrediction.Prediction[2], changePointPrediction.Prediction[3]);
+
+            // Checkpoint the model.
+            var modelPath = "temp.zip";
+            engine.CheckPoint(ml, modelPath);
+
+            // Keep a reference to the current time series engine, because in the next step "engine" will point to the
+            // checkpointed model loaded from disk.
+            var timeseries1 = engine;
+
+            // Load the model.
+            using (var file = File.OpenRead(modelPath))
+                model = TransformerChain.LoadFrom(ml, file);
+
+            // Create a time series prediction engine from the checkpointed model.
+            engine = model.CreateTimeSeriesPredictionFunction<IidChangePointData, ChangePointPrediction>(ml);
+            for (int index = 0; index < 8; index++)
+            {
+                // Anomaly change point detection.
+                var prediction = engine.Predict(new IidChangePointData(7));
+                Console.WriteLine("{0}\t{1}\t{2:0.00}\t{3:0.00}\t{4:0.00}", 7, prediction.Prediction[0],
+                    prediction.Prediction[1], prediction.Prediction[2], prediction.Prediction[3]);
+            }
+
+            // Predictions from the original time series engine should match the predictions from the
+            // checkpointed model.
+            engine = timeseries1;
+            for (int index = 0; index < 8; index++)
+            {
+                // Anomaly change point detection.
+                var prediction = engine.Predict(new IidChangePointData(7));
+                Console.WriteLine("{0}\t{1}\t{2:0.00}\t{3:0.00}\t{4:0.00}", 7, prediction.Prediction[0],
+                    prediction.Prediction[1], prediction.Prediction[2], prediction.Prediction[3]);
+            }
+
+            // Data Alert Score P-Value Martingale value
+            // 5    0     5.00  0.50    0.00     <-- Time Series 1.
+            // 5    0     5.00  0.50    0.00
+            // 5    0     5.00  0.50    0.00
+            // 5    0     5.00  0.50    0.00
+            // 5    0     5.00  0.50    0.00
+            // 5    0     5.00  0.50    0.00
+            // 5    0     5.00  0.50    0.00
+            // 5    0     5.00  0.50    0.00
+            // 7    1     7.00  0.00    10298.67 <-- alert is on, predicted changepoint (and model is checkpointed).
+
+            // 7    0     7.00  0.13    33950.16 <-- Time Series 2 : Model loaded back from disk and prediction is made.
+            // 7    0     7.00  0.26    60866.34
+            // 7    0     7.00  0.38    78362.04
+            // 7    0     7.00  0.50    0.01
+            // 7    0     7.00  0.50    0.00
+            // 7    0     7.00  0.50    0.00
+            // 7    0     7.00  0.50    0.00
+
+            // 7    0     7.00  0.13    33950.16 <-- Time Series 1 and prediction is made.
+            // 7    0     7.00  0.26    60866.34
+            // 7    0     7.00  0.38    78362.04
+            // 7    0     7.00  0.50    0.01
+            // 7    0     7.00  0.50    0.00
+            // 7    0     7.00  0.50    0.00
+            // 7    0     7.00  0.50    0.00
+        }
     }
 }
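The sample describes `ChangeHistoryLength` as the length of the sliding window on p-values used to compute the martingale score. Below is a minimal sketch of a power martingale over such a window. It assumes the following:

* the textbook betting factor `eps * p^(eps - 1)`;
* a hypothetical default of `eps = 0.1`;
* a made-up clamp that keeps p-values away from zero.

ML.NET's detector has its own options and numerical safeguards, so this sketch will not reproduce the exact scores shown in the sample output. `PowerMartingaleSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PowerMartingaleSketch
{
    // Multiplies the power-martingale betting factors eps * p^(eps - 1) for the
    // most recent `windowLength` p-values. Small p-values (surprising points)
    // give factors above 1, so the score grows quickly after a change.
    public static double Score(IReadOnlyList<double> pValues, int windowLength, double epsilon = 0.1)
    {
        if (epsilon <= 0 || epsilon >= 1)
            throw new ArgumentOutOfRangeException(nameof(epsilon), "epsilon must be in (0, 1).");

        // Sum in log space to avoid overflow; the 1e-12 clamp is an assumption of this sketch.
        double logScore = 0;
        foreach (double p in pValues.Skip(Math.Max(0, pValues.Count - windowLength)))
        {
            double clamped = Math.Max(p, 1e-12);
            logScore += Math.Log(epsilon) + (epsilon - 1) * Math.Log(clamped);
        }
        return Math.Exp(logScore);
    }
}
```

With p-values of 0.5, each factor is about 0.19, so a window of four stays close to zero. This matches the 0.00 martingale values printed while the series is stable. A run of small p-values pushes the score well above 1.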
