
It is awkward to turn off char-grams with FeaturizeText #2946

@rogancarr


FeaturizeText was upgraded to allow specification of n-grams for words and characters. However, it is now awkward to use FeaturizeText without character n-grams: to turn them off, you have to explicitly set CharFeatureExtractor to null.

This is how to compose a bag-of-words with the current API:

var pipeline = mlContext.Transforms.Text.FeaturizeText(
    "Features",
    new TextFeaturizingEstimator.Options
    {
        KeepPunctuations = false,
        OutputTokens = true,
        CharFeatureExtractor = null,
        WordFeatureExtractor = new WordBagEstimator.Options { NgramLength = 1 },
        VectorNormalizer = TextFeaturizingEstimator.NormFunction.None
    },
    "SentimentText");

I would expect to be able to do something like

CharFeatureExtractor = new WordBagEstimator.Options { NgramLength = 0},

But this throws an error saying that Skipgrams is not less than NgramLength, and Skipgrams must be positive, so no valid Skipgrams value exists when NgramLength is 0.

Overall, it is awkward and not obvious that you have to manually set an option to null. Is this the API we want to ship in v1.0?


Labels

API: Issues pertaining the friendly API
