@samueltlg
Contributor

Only a smaller one this time

But a few key points/queries which will be highlighted in comments:

  • Not sure whether, as part of the planned re-architecture of canonicalization, these CanonicalForm functions will remain (i.e. as function-based replacement rules) or be replaced by individual, string-based rules. Of note, though: the forms published so far should be useful guidance for that work, & should reduce the workload substantially.
    (In some cases, though, I have wondered how, or whether, string-based replacement rules could achieve this so easily, considering the substantial set of conditions that some operations require.
    I.e. you could imagine: x_{type:number;!isFunctionExpression;...}^0 -> 1, or a_{AnyInfinity}^b_{type:complex; real > 0} -> ComplexInfinity, to be verbose...)

  • Have wondered, with regard to the discussion Feat/fix: revised canonical forms; tests; preliminary & associated fixes #238 (comment), particularly 'The value of symbols should not be considered during canonicalization, even constant ones...':

    • Should this be amended slightly to 'The value of *non-canonical/bound* symbols...'? Because, as will be remarked in the upcoming review, Pi, for example, being a library-defined ID, appears to typically be 'always' bound/canonical. So, unlike the case for user-defined/non-default library constants (which, as stated, should not be bound at canonicalization), it seems reasonable that Pi^0, -Pi^Infinity, etc. should apply?
  • I believe there is currently a bug in SymbolDefinitions.
    It appears that setting holdUntil to never should only be permitted when the constant switch is also true.
    Otherwise, these non-constant symbols get substituted at canonicalization, resulting in an inconsistent/variable full canonical form?

It should be 1-2 weeks until I can finish-up the final/remaining forms PR (at least to the same degree of attention as here), but that will come.

In the next post, I will leave some open questions that have popped up while completing this request & again stumbling into many things (most of which relate to what's going on here).

  if (b.is(0)) {
-   if (aIsNum) return a.isFinite ? ce.One : ce.NaN;
+   // If 'isFinite' is a boolean, then 'a' has a value.
+   if (aIsNum && a.isFinite !== undefined) return a.isFinite ? ce.One : ce.NaN;
Contributor Author

Here, isFinite being 'undefined' also acts as a check for whether 'a', if a symbol, has a bound (numeric) value.
Is there a method/accessor equivalent to hasValue()? This has come in useful in at least a couple of scenarios for me.
I know that the isConstant symbol method triggers binding (although the outdated inline doc. states otherwise); I guess, then, that it is not possible?

(Do see PR conversation post 'open questions' regarding this point (although, may not be posted at time you read this))
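
The tri-state check discussed here can be sketched in isolation. The `Operand` interface and `powerZero` function below are hypothetical stand-ins, not Compute Engine API; they only illustrate how an `isFinite` of `undefined` can double as a "no known value" signal, so that `a^0` is left alone when nothing is known about `a`.

```typescript
// Hypothetical stand-in for a boxed operand; not the library's BoxedExpression.
interface Operand {
  isNumeric: boolean;
  // true/false when a numeric value is known; undefined when it is not.
  isFinite: boolean | undefined;
}

type PowerZeroResult = "One" | "NaN" | undefined;

// a^0: simplify only when the finiteness of `a` is actually known.
function powerZero(a: Operand): PowerZeroResult {
  // Returning undefined means "leave a^0 unchanged".
  if (!a.isNumeric || a.isFinite === undefined) return undefined;
  return a.isFinite ? "One" : "NaN";
}

const two: Operand = { isNumeric: true, isFinite: true };
const infinity: Operand = { isNumeric: true, isFinite: false };
const unboundX: Operand = { isNumeric: true, isFinite: undefined };

console.log(powerZero(two), powerZero(infinity), powerZero(unboundX));
```

The third case is the one the revised condition protects: a numeric-typed symbol with no value is neither simplified to 1 nor to NaN.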

Member

Isn't this the same as checking that expr.value !== undefined?

Contributor Author

> Isn't this the same as checking that expr.value !== undefined?

You would think so... but not quite, since value currently triggers binding. And so 'isFinite' is a funny check here for determining whether a value is present (since it does not trigger binding).
Hence why a hasValue or equivalent might be useful as things stand; but maybe not, depending on incoming changes...

Member

@arnog arnog Apr 17, 2025

OK, thanks for bringing this up. This led me to revisit the various properties of BoxedExpression and which ones require a canonical form, or will automatically cause binding. It seems there are discrepancies between the documentation and what the code does, and perhaps the rules need to be revisited for improved consistency.

How about this:

  • operations that are purely structural can be done on unbound (non-canonical) expressions (e.g. expr.subs(), expr.toString(), expr.isSame(), etc...)
  • boxed expressions are never bound as a side effect, they are only bound when their canonical form is created (i.e. at construction)
  • operations/properties that rely on the value of the expression return undefined if the expression is not canonical (not bound), or is a function expression. Otherwise, for number literals and symbols with an associated value, the value is used (expr.isFinite, expr.isOdd, expr.re). (Perhaps confusingly, expr.valueOf() and therefore expr.value do not require the expression to be bound to return something, such as a string representation of the expression.)
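
A minimal sketch of the first and third rules, assuming a toy expression model: `MiniExpr`, `isSame` and `isFiniteOf` are hypothetical names for illustration and do not reflect how BoxedExpression is implemented. In this sketch "canonical" is taken to imply "bound".

```typescript
// Hypothetical miniature expression model; not the library's BoxedExpression.
interface MiniExpr {
  kind: "number" | "symbol" | "function";
  name: string; // literal text, symbol name, or operator name
  isCanonical: boolean; // in this sketch, canonical implies bound
  value?: number; // literal value, or the value associated with a symbol
}

// Purely structural: works on unbound (non-canonical) expressions too.
function isSame(a: MiniExpr, b: MiniExpr): boolean {
  return a.kind === b.kind && a.name === b.name;
}

// Value-dependent: undefined for non-canonical or function expressions.
function isFiniteOf(expr: MiniExpr): boolean | undefined {
  if (!expr.isCanonical || expr.kind === "function") return undefined;
  if (expr.value === undefined) return undefined; // symbol without a value
  return Number.isFinite(expr.value);
}

const rawX: MiniExpr = { kind: "symbol", name: "x", isCanonical: false, value: 3 };
const boundX: MiniExpr = { kind: "symbol", name: "x", isCanonical: true, value: 3 };
const sinX: MiniExpr = { kind: "function", name: "Sin", isCanonical: true };

console.log(isSame(rawX, boundX), isFiniteOf(rawX), isFiniteOf(boundX), isFiniteOf(sinX));
```

Only the canonical symbol with a value answers the value-dependent query; the structural comparison does not care about binding at all.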
Contributor Author

I think that makes sense. Obviously, the most important point there (being the biggest change) is:
> boxed expressions are never bound as a side effect, they are only bound when their canonical form is created (i.e. at construction)
With the newly proposed evaluation frames, this change makes more sense, maybe? (Also, I suppose there is not much need, or only a niche one, to inquire into a symbol's value outside of a context? And for that, .bind(), I suppose, could be explicitly called beforehand.)

The third/final point makes sense. The only potential blocking-point I can consider is whether, for functions, (returning undefined) this does not interfere with anything like the 'handler' functions (such as sgn, type)? I guess not, since many rely on the function being canonical, anyway...?

For reference - likely you have already seen - I did try to make personal sense of the present behaviour for BoxedSymbols here (a25cc37#diff-20c94cbe480f0ae94ef00ac68a8904b4da3dfe9248d1f0a111315801905ef803R72): maybe helpful to consider just-in-case you decide not to go with this proposal

Member

Yes, I did see your comment. Thanks for adding it. I think it's accurate (except that isSame() does not require binding AFAIK), and it helped me to realize that the current behavior was perhaps too inconsistent and confusing, hence my motivation to simplify it.

Contributor Author

@samueltlg samueltlg Apr 18, 2025

Thought that it could do with a little documentation for reference!

Before I go to sleep, I have just pondered whether it would still make sense for a symbol to be bound 'in-place', considering both 1) the separation of canonicalization/binding, and 2) evaluation contexts.
I.e., it seems that it is not helpful (at best) for a (non-constant) symbol to be bound in the context of its containing (supposedly canonical) expr., since this is, for the most part, only useful in the context of evaluation/similar?
Maybe this pondering is similar background reasoning to that which has led to the idea of evaluation-contexts...?
It seems that binding is unnecessary (until the 'last minute') for non-constant symbols? (Considering the proposed changes, the evaluation-context alone can be consulted to inquire about their value?)

(As a secondary thought, I can see why, in any case, it would have been convenient in the first place to bundle canonicalization and binding together and regard them as the same...)

Contributor Author

(Don't think it matters in either case anyway (in-place binding, or updating the internal state of symbol to reflect whether it is bound) - don't think it would pose any issues if it were to remain, and besides, temporarily forgot that symbols can be unbound/reset as appropriate, anyway.)

Member

I agree, on reflection, binding should be tied to canonicalization.

Binding (canonicalization) may be useful, if not necessary, before the evaluation of symbols, though. Consider a function whose canonicalization depends on whether its arguments are positive, or integers. In that case, the canonicalization handler would need to access this info, regardless of the value of the symbol (and the symbol may not have a value at all).

One case to consider is ["Declare", "i", "'integer'"]. In this case, the symbol i should not be bound automatically, instead the canonical handler of Declare should keep i as non-canonical, and the evaluation handler should use this unbound symbol to create a new entry in the current scope. Even though this i argument is not canonical, the function expression itself is canonical.
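
The Declare case can be sketched with a toy boxing function. Everything here (`Boxed`, `box`, `holdFirstArgument`, `evaluateDeclare`) is a hypothetical illustration of the idea, not the library's canonical or evaluation handlers: the function expression is canonical while its first argument is deliberately left non-canonical, and the evaluation step uses that unbound symbol to create a scope entry.

```typescript
type Json = string | [string, ...Json[]];

// Hypothetical boxed form; not the library's BoxedExpression.
interface Boxed {
  json: Json;
  isCanonical: boolean;
  args: Boxed[];
}

// Assumption for illustration: operators that keep their first argument unbound.
const holdFirstArgument = new Set(["Declare"]);

function box(json: Json, canonical: boolean): Boxed {
  if (typeof json === "string") return { json, isCanonical: canonical, args: [] };
  const [op, ...rest] = json;
  const args = rest.map((arg, i) =>
    box(arg, canonical && !(i === 0 && holdFirstArgument.has(op)))
  );
  return { json, isCanonical: canonical, args };
}

// Evaluation step: use the unbound symbol to create an entry in the given scope.
function evaluateDeclare(expr: Boxed, scopeEntries: Map<string, string>): void {
  const [symbol, type] = expr.args;
  if (!symbol || !type || typeof symbol.json !== "string" || typeof type.json !== "string") {
    throw new Error("Declare expects a symbol and a type");
  }
  scopeEntries.set(symbol.json, type.json);
}

const currentScope = new Map<string, string>();
const declaration = box(["Declare", "i", "'integer'"], true);
evaluateDeclare(declaration, currentScope);

console.log(declaration.isCanonical, declaration.args[0]?.isCanonical, currentScope.get("i"));
```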

Contributor Author

Both good examples I did not then consider; I guess even signature validation & potential concomitant inference of unknown symbols (& therefore binding to a type), also applies.

@samueltlg
Contributor Author

(Have to go; but there's a few open-ended Q's/queries to post here, pending, in relation to changes: particularly relation between canonicalization/binding, and (function) types)

@samueltlg
Contributor Author

samueltlg commented Apr 16, 2025

Some open questions:

Regarding the points made in #238 (comment)

  • 'Ideally, canonicalization would be independent of binding, particularly of binding of symbols'

    • The potential issue of this being the case is... ? (I'm guessing, scopes? I.e., by the time it comes to working further with the expression or its containing expression (such as for evaluation), this may occur in the context of a new scope, meaning that a new definition maybe should be associated with the symbol instance? Too late, if it is already bound?)
    • If these two were to be separated, then what would canonicalization of symbols entail, in lieu?
    • Generally, as a workaround for avoiding binding while obtaining a symbol's definition/info. (in the current scope), is requesting the canonical variant of the symbol acceptable?
      This in turn raises the question: is the problem with binding in this context the binding of the symbol instance, or what it entails, namely the creation of an associated definition (which would ideally be deferred)?

    Also, just in case it is worth asking: is it at all possible to obtain the symbol's type without binding? Or is it all the same (the definition being bound is required for determining any of isConstant, type, holdUntil, et cetera)?

  • As remarked/hinted at in some code comments already:

    • Some partial-canonicalization/CanonicalForm operations cannot reliably take place because non-canonical function-expressions cannot have their type ascertained (seemingly because they are not definition-bound). This is clear on inspection, for instance:
       ce.parse('{-\pi}^Infinity', {canonical: 'Power'}); // Remains unchanged
       ce.parse('{\pi + 1}^1', {canonical: 'Power'}); // Remains the same

In both of these cases, the revised canonicalPower is careful not to simplify, because the base operand cannot be verified to be numeric (i.e. to have type number, or a more refined one).
At least two workarounds then came to mind:

  • Requesting/peeking the canonical variant (if the base is a function-expression).
    Clearly, this is non-ideal (if not a tad problematic), and certainly non-optimal.
  • Perhaps better, if feasible, would be:
    • In a similar way to how 'common' numeric operations/fundamental arithmetic ones have special handling with reference to canonicalization (the 'makeNumericFunction' function, for instance), would it not be reasonable for these common operations to have a shortcut for peeking their type (based on arguments), i.e. without first being definitely bound to a definition? I imagine this would be reasonably useful, not least for the cases outlined here.
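
One way to picture the second workaround is a structural type peek over MathJSON-like input. The sketch below is an assumption-laden illustration: `peekType`, the operator list and the constant list are hypothetical and not part of Compute Engine; it only shows that the type of a common arithmetic expression can be inferred from its arguments without binding it to a definition.

```typescript
type Json = number | string | [string, ...Json[]];
type PeekedType = "number" | "unknown";

// Illustrative lists only; a real implementation would choose these carefully.
const numericOperators = new Set(["Add", "Negate", "Multiply", "Divide", "Power"]);
const knownNumericConstants = new Set(["Pi", "ExponentialE"]);

// Infer a coarse type from structure alone, without consulting any definition.
function peekType(expr: Json): PeekedType {
  if (typeof expr === "number") return "number";
  if (typeof expr === "string") {
    return knownNumericConstants.has(expr) ? "number" : "unknown";
  }
  const [op, ...args] = expr;
  if (!numericOperators.has(op)) return "unknown";
  return args.every((arg) => peekType(arg) === "number") ? "number" : "unknown";
}

console.log(peekType(["Negate", "Pi"]), peekType(["Add", "Pi", 1]), peekType(["Negate", "x"]));
```

With such a peek, a Power rule could treat the bases `-\pi` and `\pi + 1` as numeric, while still declining to simplify when an operand is an unknown symbol or an unlisted function.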

There are a couple more points, but enough for now... !

@arnog
Member

arnog commented Apr 16, 2025

> Regarding the points made in #238 (comment)
>
> • 'Ideally, canonicalization would be independent of binding, particularly of binding of symbols'
> • The potential issue of this being the case is... ? (I'm guessing, scopes? i.e., by the time it comes to working with the expression, or containing expression further (such as evaluation), this may occur in the context of a new scope: meaning that a new definition maybe should be associated with the symbol instance? (too late, if it is already bound?))

Yes, exactly. The scope during evaluation may be different than the scope during canonicalization. This is even more of an issue when a scope is created/modified as part of evaluation.

> If these two were to be separated, then what would canonicalization of symbols entail, in lieu?

As I envision it, once canonicalization is decoupled from binding, binding would not occur before evaluation, and each evaluation could result in a different binding.

Note that some function expressions that result in a symbol during canonicalization would still be canonicalized, for example the Symbol or Subscript operators.

> Generally, as a workaround for avoiding binding while obtaining a symbol's definition/info. (in the current scope), is requesting the canonical variant of the symbol acceptable?

Well, this would only be the canonical variant in the current context. So this could be problematic.

> This in turn raises the question: is the problem with binding in this context the binding of the symbol instance, or what it entails, namely the creation of an associated definition (which would ideally be deferred)?

Both.

> Also, just in case it is worth asking: is it at all possible to obtain the symbol's type without binding? Or is it all the same (the definition being bound is required for determining any of isConstant, type, holdUntil, et cetera)?

All this information is dependent on the binding.


> • As remarked/hinted at in some code comments already:
> • Some partial-canonicalization/CanonicalForm operations cannot reliably take place because non-canonical function-expressions cannot have their type ascertained (seemingly due to not being definition-bound): this is clear on inspection, for instance:
>
>     ce.parse('{-\pi}^Infinity', {canonical: 'Power'}); // Remains unchanged
>     ce.parse('{\pi + 1}^1', {canonical: 'Power'}); // Remains the same
>
> For both of these cases, the revised canonicalPower is careful not to simplify these two, because the base operand cannot be verified in each case to be numeric (have a number or more refined type).
> It did come to mind then, at least two workarounds:
>
> • Requesting/peeking the canonical variant (if the base is a function-expression).
>   Clearly, this is non-ideal (if not a tad problematic), and certainly non-optimal.
> • Perhaps better, if feasible, would be:
>   • In a similar way to how 'common' numeric operations/fundamental arithmetic ones have a special handling with reference to canonicalization (the 'makeNumericFunction' function, for instance), would it not be reasonable for these common operations to have a shortcut for peeking their type (based on arguments), i.e. without first being definitely bound to a definition? Imagine that this would have reasonable usefulness: not to mention these outlined cases.

Yes, having a “special case” for some symbols could be an option. Or perhaps a scope can be marked as “static” or “global”, meaning it is guaranteed not to change during evaluation, and can be relied on during canonicalization. Actually… this does make sense…

> Not sure if, as part of the planned re-arch. of canonicalization, that these CanonicalForm functions will remain (i.e. as function-based replacement rules), or be replaced by individual, string-based rules.

The plan is for these functions to remain. The mechanism by which they're invoked will change. Instead of being invoked as a function definition handler, they will be dispatched through a table indexed by the function operator.
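
A dispatch table of this kind might look roughly as follows. This is a minimal sketch under stated assumptions: `canonicalHandlers`, `canonicalAdd` and `canonicalize` are hypothetical names, and flattening nested `Add` is used only as a simple stand-in for a real canonical-form function.

```typescript
type Json = number | string | [string, ...Json[]];
type CanonicalHandler = (args: Json[]) => Json;

// Illustrative handler: flatten nested Add expressions.
function canonicalAdd(args: Json[]): Json {
  const flat: Json[] = [];
  for (const arg of args) {
    if (Array.isArray(arg) && arg[0] === "Add") {
      const [, ...inner] = arg;
      flat.push(...inner);
    } else {
      flat.push(arg);
    }
  }
  return ["Add", ...flat];
}

// Table indexed by the function operator (hypothetical contents).
const canonicalHandlers = new Map<string, CanonicalHandler>([["Add", canonicalAdd]]);

function canonicalize(expr: Json): Json {
  if (!Array.isArray(expr)) return expr;
  const [op, ...args] = expr;
  const canonicalArgs = args.map((arg) => canonicalize(arg));
  const handler = canonicalHandlers.get(op);
  // Operators without an entry keep their structure, with canonical arguments.
  return handler ? handler(canonicalArgs) : [op, ...canonicalArgs];
}

console.log(JSON.stringify(canonicalize(["Add", 1, ["Add", 2, ["Add", 3, "x"]]])));
```

Keeping the handlers as plain functions means the same function can be called directly or reached through the table, which matches the stated intent that the functions remain and only the invocation mechanism changes.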

> Have wondered, with regards to the discussion #238 (comment), particularly 'The value of symbols should not be considered during canonicalization, even constant ones...':
> Is it the case that this should be amended slightly to 'The value of non-canonical/bound symbols...'. Because, as will be remarked in the upcoming review, Pi, for example, being a library-defined ID., appears to typically be 'always' bound/canonical: such that, unlike the case for user-defined/non-default library constants (which as stated should not be bound at canonicalization), it seems reasonable that Pi^0, -Pi^Infinity, etc. should apply?

That's a good point. Some constants (Pi, ExponentialE, etc...) could be handled as such during canonicalization, even if not bound. However, I'm not sure how to handle user-defined constants that would require binding to ascertain their status. Perhaps these just get canonicalized as if they were not constants...

> Believe that there is a current bug for SymbolDefinitions.
> It appears that, holdUntil being set to never should only be permitted when the constant switch is also true.
> Otherwise, these non-constant symbols get substituted at canonicalization, resulting in inconsistent/variable full canonical-form?

:) Fair point, there is not much value in having a holdUntil: "never" if not a constant.
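
The constraint agreed on here could be enforced with a small check at definition time. The sketch below is hypothetical: `SymbolDefinitionSketch` and `checkHoldUntil` are illustrative names, and the `holdUntil` values other than `"never"` are an assumption about the option set rather than something stated in this thread.

```typescript
// Hypothetical shape; only `constant` and `holdUntil: "never"` come from the discussion.
interface SymbolDefinitionSketch {
  name: string;
  constant: boolean;
  holdUntil: "never" | "evaluate" | "N"; // values besides "never" are assumed
}

// Returns an error message when the combination is inconsistent, else undefined.
function checkHoldUntil(def: SymbolDefinitionSketch): string | undefined {
  if (def.holdUntil === "never" && !def.constant) {
    return `'${def.name}': holdUntil "never" requires constant: true`;
  }
  return undefined;
}

console.log(checkHoldUntil({ name: "k", constant: false, holdUntil: "never" }));
console.log(checkHoldUntil({ name: "c", constant: true, holdUntil: "never" }));
```

Rejecting the first combination avoids the situation described above, where a non-constant symbol is substituted at canonicalization and the full canonical form varies with the symbol's current value.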

@arnog
Member

arnog commented Apr 17, 2025

OK, I've given some more thoughts on canonicalization and binding.

I think that binding and canonicalization (including signature validation) can all happen at the same time, namely during boxing/parsing.

However, what will need to change is how values are associated with identifiers. They are currently associated with the definition record of the identifier, but this will need to move to a separate "frame" record in order to support recursion.

The value of constants will still be stored in their definition records, though. This means that the value of constants (and other properties, such as sign) can be considered during canonicalization (but only for constants and literals). The type of symbols (constant or not) is also available during canonicalization, as well as other symbol attributes, such as holdUntil.

Accessing the value or other properties depending on the value of non-constant identifiers during canonicalization should return undefined, or possibly throw an error (this can be achieved by disabling the frame stack at the start of canonicalization).
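
The split described in this comment can be sketched as follows, with the loud caveat that every name here (`DefinitionRecord`, `EngineSketch`, `lookupValue`, `withFramesDisabled`) is hypothetical. Constants keep their value in the definition record; other identifiers keep theirs in a frame; disabling the frame stack during canonicalization makes non-constant values unavailable (this sketch returns `undefined` rather than throwing).

```typescript
// Hypothetical records; not the library's data structures.
interface DefinitionRecord {
  type: string;
  constant: boolean;
  value?: number; // only meaningful for constants in this sketch
}

interface EngineSketch {
  definitions: Map<string, DefinitionRecord>;
  frameStack: Map<string, number>[];
  framesEnabled: boolean;
}

function lookupValue(engine: EngineSketch, name: string): number | undefined {
  const def = engine.definitions.get(name);
  if (def?.constant) return def.value; // constants: always available
  if (!engine.framesEnabled) return undefined; // e.g. during canonicalization
  // Simplification: only the innermost frame is consulted.
  return engine.frameStack[engine.frameStack.length - 1]?.get(name);
}

// Run `body` with the frame stack disabled, restoring the previous state after.
function withFramesDisabled<T>(engine: EngineSketch, body: () => T): T {
  const saved = engine.framesEnabled;
  engine.framesEnabled = false;
  try {
    return body();
  } finally {
    engine.framesEnabled = saved;
  }
}

const engine: EngineSketch = {
  definitions: new Map<string, DefinitionRecord>([
    ["Pi", { type: "real", constant: true, value: Math.PI }],
    ["x", { type: "integer", constant: false }],
  ]),
  frameStack: [new Map<string, number>([["x", 5]])],
  framesEnabled: true,
};

console.log(lookupValue(engine, "x"));
console.log(withFramesDisabled(engine, () => lookupValue(engine, "x")));
console.log(withFramesDisabled(engine, () => lookupValue(engine, "Pi")));
```

The type and `constant` attribute stay readable from the definition record in both modes, which is what lets canonicalization consult them without touching values.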

@samueltlg
Contributor Author

> OK, I've given some more thoughts on canonicalization and binding.
>
> I think that binding and canonicalization (including signature validation) can all happen at the same time, namely during boxing/parsing.
>
> However, what will need to change is how values are associated with identifiers. They are currently associated with the definition record of the identifier, but this will need to move to a separate "frame" record in order to support recursion.
>
> The value of constants will still be stored in their definition records, though. This means that the value of constants (and other properties, such as sign) can be considered during canonicalization (but only for constants and literals). The type of symbols (constant or not) is also available during canonicalization, as well as other symbol attributes, such as holdUntil.
>
> Accessing the value or other properties depending on the value of non-constant identifiers during canonicalization should return undefined, or possibly throw an error (this can be achieved by disabling the frame stack at the start of canonicalization).

That would be good, most likely, if signature validation could remain (at this stage).

I think I get the gist of what you are saying there... the 'frames' seemingly being like a scope stack, but rooted at the original definition (explicitly declared or otherwise)? And this would be instead of, or in addition to, a 'global symbol table/scope'?
And is this pertinent to functions (i.e. the binding), or function-definition bound symbols, too?
Would this help in being able to determine the type of numeric-functions during canonicalization, as in the previously given ce.parse('{-\pi}^Infinity', {canonical: 'Power'}) example?

It does sound useful that extended properties will be available for symbols (non-constant ones included) at canonicalization 👍

@arnog
Member

arnog commented Apr 17, 2025

> the 'frames' seemingly being like a scope stack, but rooted at the original definition (explicitly declared or otherwise)?

The frames are an evaluation context. They are arranged in a stack. A new frame is pushed on the stack when a function is evaluated (this could be optimized to be only when a function with named arguments or local variables is executed). The frame only keeps track of the current values of named arguments and local variables.

They are distinct from the scope, which is a lexical context. A new scope is pushed on the scope stack when a new declaration context is boxed (canonicalized), not when it's evaluated. The scope keeps track of the type and attributes of identifiers (constant, holdUntil, etc...)
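
The distinction can be illustrated with a recursive function. This is a sketch under stated assumptions, with hypothetical names (`lexicalScope`, `valueFrames`, `factorial`): the scope entry for the parameter is recorded once, standing in for what happens at boxing, while each evaluation pushes its own frame holding the current value.

```typescript
// Lexical context: type and attributes, recorded once (hypothetical shape).
interface ScopeEntry {
  type: string;
  constant: boolean;
}
const lexicalScope = new Map<string, ScopeEntry>();
lexicalScope.set("n", { type: "integer", constant: false });

// Evaluation context: one frame of current values per active evaluation.
const valueFrames: Map<string, number>[] = [];
let deepestFrameCount = 0;

function factorial(argument: number): number {
  valueFrames.push(new Map<string, number>([["n", argument]]));
  deepestFrameCount = Math.max(deepestFrameCount, valueFrames.length);
  try {
    // Read the argument back from the innermost frame, as an evaluator would.
    const n = valueFrames[valueFrames.length - 1]?.get("n") ?? 0;
    return n <= 1 ? 1 : n * factorial(n - 1);
  } finally {
    valueFrames.pop(); // the frame disappears when the evaluation completes
  }
}

const result = factorial(5);
console.log(result, deepestFrameCount, valueFrames.length, lexicalScope.size);
```

Each recursive call sees its own value of `n` because the values live in frames, while the single scope entry keeps describing `n` as a non-constant integer throughout.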

> And this would be instead of, or in addition, to a 'global symbol table/scope'?

This would be instead of. There might still be some value in a global table/scope, but not clear what at this point.

> And is pertinent to functions (i.e. the binding), or function-definition bound symbols, too?

Frames are not used for binding, only scopes are.

> Would this help in being able to determine the type of numeric-functions during canonicalization, as in the prior given ce.parse('{-\pi}^Infinity', {canonical: 'Power'}) example ?

Yes, because the type information and attributes (constant) (for Pi) would be available during canonicalization. In theory, even the value of constants could be used during canonicalization, but attempting to use the value of non-constants would probably throw an error.

@samueltlg
Contributor Author

samueltlg commented Apr 17, 2025

OK, think I've got it. The evaluation contexts are however still informed by/connected to scopes, no? In order that values of local variables/symbols can be looked up/determined? Think that is what I was getting at.
The issue with the '{-\pi}^Infinity' case at canonicalization is not Pi's attributes in this case, but rather those of 'Negate', i.e. because the Negate expression -\pi is not canonical & therefore not bound, its type cannot be calculated as 'Number', for example.

@arnog
Member

arnog commented Apr 17, 2025

> The evaluation contexts are however still informed by/connected to scopes, no? In order that values of local variables/symbols can be looked up/determined? Think that is what I was getting at.

The frames ("evaluation context") keep track of the values of local variables and arguments (non constant identifiers). They do not need the scopes (lexical context) to do so.

Each symbol or function expression is bound to a record in a scope, which can be used to access metadata about it (is it a constant, what is its type, etc...).

> The issue with the '{-\pi}^Infinity' case at canonicalization is not Pi's attributes in this case, but rather those of 'Negate', i.e. because the Negate expression -\pi is not canonical & therefore not bound, its type cannot be calculated as 'Number', for example.

Oh, I see. That's because there is partial canonicalization applied... Well, this wouldn't help with that case, no. I'm not sure what the expectation would be in this case. Canonicalizing the arguments seems like it could be surprising if you only asked for Power canonicalization. But then again, not having the canonicalization apply in this case (but apply if the Negate is not there) seems confusing as well.

@samueltlg
Contributor Author

Ok, will probably have to see it in action to fully grasp (its significance).

(Not sure of any course of action for the second point either; doesn't matter...)

I think I'll just pop-up the missing canonicalPower rule a^b^c -> a^(b*c) tomorrow morning, and this block of work should be good to go, I think...

…orms; assoc. fixes)
- !BoxedExpr.value now altogether returns 'undefined' for non-literal boxed-expr. types (e.g. functions)
- Fix: typo in calculation of value for 'Root' functions in `BoxedExpr.root()`
- Fix doc. of BoxedExpr properties: e.g. isPure/isConstant
(>subject-to-change) -Alternatively, this could be achieved through a 'Symbol' CanonicalForm: although going about things this way would make things inconsistent in respect to symbol-value substitution for those declared with 'holdUntil: never' *note*: currently, canonicalization also binds symbols (see cortex-js#238 (comment)), but for now, since is tied to canonicalization, is necessary to take place.
jointly, do more checks on operand types (particularly for function exprs.), more strictly check for type 'number'; less eager in simplifying FN-expressions as operands.
@samueltlg
Contributor Author

Think this should be good to go (aside from a small conflict), whenever you're ready...

@arnog arnog merged commit db8e27d into cortex-js:main Apr 19, 2025