Skip to content

Conversation

@jordi1215
Copy link
Contributor

Related to: #1912
Related to: Issue 1091 on the Fairlearn Repo

As part of my summer internship project, I have created several classes that will provide a foundation for the further inclusion of Fairlearn algorithms into ML.NET.

For the first PR, I have included an end-to-end working solution for the gridSearch method in Fairlearn which leverages AutoML for parameter searching. The example is included in the Unit test.

We are excited to review your PR.
So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.
Jordi Ramos and others added 30 commits June 13, 2022 14:07
Added `inPlace:true` to the append to fix the bug that no columns are added; Added unit testing to test the basic funcitonality of Metrc.Regression.ByGroup()
Added MSE and MAE (MeanAbsoluteError), and included tests for MSE and RMS
RMS and MSE fully supported by Fairlearn.Metric.Regression now accross all three functions
created the moment class and utilityParity class. Initial commit
Passed the initial unit tests for Demographic Parity. Every class is made public which needs to be changed in the future. Utility Parity still needs to be changed for other parities to work
There is still a lot of development needed to be done with the moment class
…search getting sensitive feature column names directly from a getter function, adjusted default value for the option to a random value
built the first prototype for the gridSearchTrialRunner. We have enabled a seperate training set for training and testing set for validating the result, which is a different approach from the original implementation.
Added a new feature so that users can look up row item by the name of the column instead of the raw row index
… fairlearn In fairlearn AutoML, we have to add in a singleton to the serviceCollection called moment, which will be later extracted to calculate the fairness parity.
Added an if statement to output fairlearn metric if using fairlearn
the experiment is able to add a moment to its serviceCollection which is later used to calculate fairlearn parity.
Created a tuner so that we can go through the search space through the gridsearch algorithm
Comment on lines +57 to +58
}
//TODO: what should be the object type of X be? How can I make x capitilized to fit the whole data strcuture
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space between these.

/// overall mean utility
///
/// </summary>
public class UtilityParity : ClassificationMoment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class going to be used by the users? If not we should keep them internal. In general, if the user doesn't need to directly interact with it we should keep things internal.

using Microsoft.Data.Analysis;


namespace Microsoft.ML.Fairlearn.reductions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reductions namespace should start with caps.

/// <param name="x">The features</param>
/// <param name="y">The label</param>
/// <param name="sensitiveFeature">The sensitive groups</param>
public new void LoadData(IDataView x, DataFrameColumn y, StringDataFrameColumn sensitiveFeature)//, StringDataFrameColumn events = null, StringDataFrameColumn utilities = null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need the comments at the end?

Comment on lines +83 to +84
}
/// <summary>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space between here.

}
}

public class DemographicParity : UtilityParity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need an empty class?

Comment on lines +38 to +39
}
private DataFrame CreateDummyDataset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.


namespace Microsoft.ML.Fairlearn.Tests
{
public class GridSearchTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should extend BaseTestClass

Comment on lines +81 to +82
// Data generated so it is identical from Binary_Classification.ipynb from Fairlearn.github on Github
private DataFrame CreateGridScearhDataset()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing

Comment on lines +107 to +108
}
/// <summary>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.

return df;
}
/// <summary>
/// This trial runner run the tests from Grid searh for Binary Classification.ipynb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont need summary comments for test methods.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra space.

using Microsoft.ML.Data;
using Xunit;


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra space.


namespace Microsoft.ML.Fairlearn.Tests
{
public class MetricTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extend BaseTestClass

Comment on lines +23 to +24
}
public class HouseData
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.

Comment on lines +18 to +19
IDataView data;
public MetricTest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.


namespace Microsoft.ML.Fairlearn.Tests
{
public class UtilityTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.

Should extend BaseTestClass

Comment on lines +22 to +24
}
[Fact]
public void DemographyParityTest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.

int[] vs = { 1, 1, 1, 1, 1, 1, 1, 0, 0, 0 };
PrimitiveDataFrameColumn<int> y = new PrimitiveDataFrameColumn<int>("label", vs);


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra space.

Copy link
Contributor

@michaelgsharp michaelgsharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor formatting fixes.

@ghost ghost added the needs-author-action label Aug 9, 2022
@ghost ghost added the no-recent-activity label Aug 24, 2022
@ghost
Copy link

ghost commented Aug 24, 2022

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@tarekgh
Copy link
Member

tarekgh commented Aug 26, 2022

@jordi1215 could you please address the feedback and resolve the conflict in this PR? Thanks!

@ghost ghost removed the no-recent-activity label Aug 26, 2022
@ghost ghost added the no-recent-activity label Sep 9, 2022
@ghost
Copy link

ghost commented Sep 9, 2022

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost
Copy link

ghost commented Sep 24, 2022

This pull request will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the pull request, but please note that it will be locked if it remains inactive for another 30 days.

@ghost ghost closed this Sep 24, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 24, 2022
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.