Skip to content

Conversation

GreyCat
Copy link
Member

@GreyCat GreyCat commented Aug 28, 2025

PythonCompiler/PythonTranslator: implement escaping of reserved keywords adding _ to them to prevent creating syntactically incorrect code

Related test changes (without compiler fixes prove that the problem exists, with compiler fixes prove that the problem is resolved): kaitai-io/kaitai_struct_tests#136

Related prior art PR by @andrewmonostate: #320 — this implementation is strongly inspired by that PR, but this one:

  • covers more cases
  • avoids affecting _m_ member
  • does not add unit tests (in favor of fuller coverage in main tests repo)
…rds adding `_` to them to prevent creating syntactically incorrect code
@andrewmonostate
Copy link

Hey @GreyCat

I have to be honest - this feels very frustrating.

My PR (#320) fixed this exact issue 11 days ago, with tests and consistent escaping across identifiers. I myself implemented the same fix in 3 reis that depended on Katia's, and took my time to fix it here on the source so it stops happening for everyone. Now I see a new PR that basically re-implements the same logic, but:

•	skips escaping _m_ members (which is risky, because reserved keywords can still leak there), •	drops compiler-local unit tests in favor of “main tests” (which hurts maintainability and regression safety), •	and only tweaks scope a little wider (enums/type names) compared to mine. 

This is not an improvement - it’s the same idea, but with less safety. If the goal was broader coverage, that should have been built on top of the existing fix with tests, not a rewrite.

This is exactly the kind of thing that makes OSS contributors to fade away:
• Code review cycles drag on for weeks.
• Fixes sit unmerged.
• Then someone else lands a near-duplicate patch that removes safeguards and introduces inconsistencies.

If this review is going to be taken seriously, the right path here is:
1. Merge the tested, consistent fix (PR #320).
2. Incrementally extend it to enums/type names.
3. Keep escaping behavior consistent (including m) unless there’s a concrete reason not to.
4. Retain unit tests in the compiler repo itself, not rely solely on integration tests elsewhere, unless there's a concrete reason not to.

Otherwise we’re just encouraging people to stop contributing.

And just to touch on the “underscore vs numeric suffix” bikeshedding from my original PR: appending _ is the established convention in Python (PEP8, stdlib, countless OSS projects). It’s simple, predictable, and matches what developers already expect — class → class_. Dragging this into abstract arguments about “what if two underscores collide with another identifier” or “should we number them” is over-engineering. Those pathological cases are vanishingly rare compared to the real-world problem of generated code not even importing(!!). Numeric suffixes would actually make things less readable (class1, class2 instead of class_) and less Pythonic. The priority here should be solving the actual bug cleanly, not theoretical purity debates that stall fixes.

@GreyCat
Copy link
Member Author

GreyCat commented Aug 29, 2025

@andrewmonostate I understand and sympathize with frustration (especially in light of lengthy discussions in #320).

Now, put yourself in my shoes:

  • I've got a couple of free hours I can spend on my hobby project.
  • We have a bug #90 which needs to be fixed — compiler producing syntactically incorrect code is never ok.
  • I've got your contribution, which, unfortunately, is incomplete: it provides only the partial fix.
  • I've suggested why: unit testing makes it practically impossible to catch all the permutations, we need integration testing with actual target languages, and that's exactly why we have our tests repo.
  • I've notified you about missed parts on 2025-08-23 — no response until today.
  • Meanwhile, @generalmimon was very kind to provide a review on Reserved python keywords kaitai_struct_tests#136, which ultimately resulted in "well, we need at least 1 working language to actually prove that the test is correct, otherwise we can commit total nonsense as .kst".
  • Given that it all started with Python, I went ahead and implemented the fuller Python fix (at least up to the point currently tracked with .ksy test).

My intent was never to take the credit away from you, nor to downplay the initiative you came up with.

As I've mentioned, this is a complicated problem. Yes, it will involve changes in 2 repos instead of 1, and it's a bit more complicated than your average "load a project in IDE -> change code, add tests in same language -> run tests in IDE -> done". We can argue about monorepo vs polyrepo approach is easy to work with, but I will stand firmly by the fact we need to test such things against real target language interpreter (in this case, Python).

Now, we can do it either of two ways.

Option 1

Option 2

@andrewmonostate
Copy link

Thank you for the response. A few clarifications and concrete requests:

  1. “Incomplete” claim (scope)
    • My PR (Fix Python reserved keyword escaping in code generation #320) fixes the core bug by escaping reserved keywords at the identifier layer (including NamedIdentifier and InstanceIdentifier). That’s the minimal, Pythonic, correct fix that stops generated code from failing to import.
    • Your PR adds additional call sites (e.g., enums/type names). Great - that’s an incremental extension, not a reason to supersede the original fix. The correct process is to merge the tested base fix and then layer broader coverage on top.

  2. m members
    • Your PR deliberately does not escape m members. That creates an inconsistency and leaves a risk surface (reserved words can still appear there). If you want to change the behavior, please provide a concrete counterexample showing why escaping m is harmful, not just a preference. Otherwise, consistent escaping across identifiers is the safer default.

  3. Tests (compiler unit tests vs “main tests repo”)
    • Integration tests in the tests repo are useful, but not a substitute for targeted unit tests in the compiler repo, as there are others here already. Unit tests catch regressions at the point of change, are faster to run/iterate, and document intent right next to the code. Removing unit tests is against best practice; the right approach is both: keep the compiler-level tests and add end-to-end coverage in the main tests repo.
    • Also, the “monorepo vs polyrepo” point isn’t a reason to duplicate a patch. Polyrepo can justify additional integration tests, not deleting localized unit tests, at least in my view.

  4. Duplicate PR etiquette
    • Spinning up a parallel PR that re-implements the same escape logic (you called it “strongly inspired” by Fix Python reserved keyword escaping in code generation #320) fragments review history and credit, and prolongs resolution. The usual OSS etiquette is: review and extend the existing PR with additional cases (enums/type names), not replace it. We’ve already burned 11 days discussing this; creating a shadow PR adds more churn.

  5. Nine years of discussion is the real problem
    • This exact subject has been stuck since 2016 (see #90). My bug report (#1249) with a concrete Python repro was closed into that 9-year thread, and here we are repeating the same theoretical debates while users still can’t import generated code.

  6. Concrete path forward (pick one and I’ll cooperate immediately)
    Option A (cleaner in my view):
    • Merge Fix Python reserved keyword escaping in code generation #320 as-is (keep compiler unit tests).
    • Then extend coverage in a follow-up PR (add enum/type name escaping, etc.).
    • Keep m escaping for consistency unless you provide a concrete counterexample demonstrating harm; if you have one, you should document it and show with tests.

    Option B:
    • Proceed with your PR but:
    • Port over the compiler unit tests from Fix Python reserved keyword escaping in code generation #320 (ai wouldn't drop local tests)
    • Restore consistent m escaping or document/tests a clear rationale for excluding it.

Either way, just please ship the fix, if possible to do so without deleting tests or leaving m as a blind spot, great. The standard and practical Pythonic behavior would be class → class_, and both compiler-local and integration tests should backstop it.

PS: I didn't reply earlier or create a patch in the original PR because I was trying to understand where that discussion there was heading. I still don't know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants