- Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7342] Quidem test support for TopDownGeneralDecorrelator #4704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| I haven't submitted a dedicated JIRA to describe this, mainly because I'm unsure if we currently need a configuration for Calcite to support the new TopDownGeneralDecorrelator, as its functionality is still incomplete. However, it seems that Quidm testing is needed during the feature development process. I tried writing an implementation, but I'm not sure if it's appropriate. If anyone thinks it's necessary, I will create a JIRA to describe some of the capabilities available at this stage. |
| Thanks for working on this, I think this is worth a Jira ticket as it will be a non-trivial change, and I think it's very needed to build confidence on the new decorrelator. All this might eventually be removed in the future if we agree that the new decorrelator covers what the original one does, without regressions. |
@asolimando Thanks for your reply. I have filed a jira CALCITE-7342. Seeing your reply, I feel it's time for me to create a Jira instance to document this, even though I'm not sure if it's appropriate to implement it right now. But it will be necessary at some point, so I'll implement it according to my understanding and let everyone see if they accept it. |
2dfefb4 to bbd70e1 Compare | .with(CalciteConnectionProperty.FUN, SqlLibrary.CALCITE.fun) | ||
| .with(CalciteAssert.Config.SCOTT) | ||
| .connect(); | ||
| case "scott-top-down": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will require doubling the number of "connections" to test everything.
I was thinking to do this by creating a single extra test file that inherits CoreQuidemTests and which sets the new algorithm for all test files and just runs all of them.
(The ones which output plans will fail, though, so something has to be done about that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. However, I'm worried that this work will take a long time, and it might be better to migrate the .iq files one by one, or even just a portion of the content within a file? Wouldn't using the same .iq file be quite messy? While we can use !if (enable_new_decorrelator) { ... } to mark it, wouldn't that be too cluttered? Alternatively, could we have the new QuidmCoreTests initially only involve the newly added decorrelate.iq file, and then gradually add the !if statements to the existing files to maintain two different logics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move this discussion in JIRA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, due to the significant changes, I have force-push the code.
bbd70e1 to 913b084 Compare 913b084 to 5f02045 Compare
mihaibudiu left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but the CI fails, maybe that's expected at this stage.
| .replaceIfs(RelCollationTraitDef.INSTANCE, | ||
| () -> Collections.singletonList( | ||
| input.getTraitSet().getTrait(RelCollationTraitDef.INSTANCE))) | ||
| () -> input.getTraitSet().getTraits(RelCollationTraitDef.INSTANCE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change related to this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm not entirely sure why; I made some adjustments based on the error message. I will revert the changes to RelDistribution later.
> Error while executing command ExplainCommand [sql: with t (a, b) as (select * from (values (60, 'b'))) > select * from t where a in (select deptno from "scott".dept)] > java.sql.SQLException: Error while executing SQL "explain plan for with t (a, b) as (select * from (values (60, 'b'))) > select * from t where a in (select deptno from "scott".dept)": class org.apache.calcite.plan.RelCompositeTrait cannot be cast to class org.apache.calcite.rel.RelCollation (org.apache.calcite.plan.RelCompositeTrait and org.apache.calcite.rel.RelCollation are in unnamed module of loader 'app')
agg.iq failed, so I temporarily removed this test. I don't know the reason yet; it could be due to multiple factors. I want to get this test process working before analyzing it in detail. |
092e031 to f28845d Compare | The current CI for this PR seems to have some issues. It should actually pass all checks. You can refer to #4705 for reference. |
| final URL inUrl = QuidemTest.class.getResource("/" + n2u(path)); | ||
| inFile = Sources.of(requireNonNull(inUrl, "inUrl")).file(); | ||
| outFile = replaceDir(inFile, "resources", "quidem"); | ||
| outFile = replaceDir(inFile, "resources", "quidem/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous error was caused by two tests using the same comparison file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so can we do the overloading trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we're essentially adding a layer of class name as identifier. Overloading might be a good approach, but it could require duplicating some code or adding a new method, which doesn’t seem to differ much from the current implementation.
| if (config.forceDecorrelate()) { | ||
| final RelBuilder relBuilder = | ||
| RelFactories.LOGICAL_BUILDER.create(rel.getCluster(), null); | ||
| if (config.topDownGeneralDecorrelationEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we also need to adapt the Program of removing subquery. When using the new decorrelator, We recommend using CoreRules.FILTER_SUB_QUERY_TO_MARK_CORRELATE and CoreRules.PROJECT_SUB_QUERY_TO_MARK_CORRELATE to remove subqueries from Filter and Project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I will fix it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After replacing these two rules, left_mark join will be generated in agg.iq, so I removed this test for now.
| The code looks great, I just have some questions about the quidem test:
|
For the first issue, since quidem does not support the !if-!else syntax, I have added enable_new_decorrelater to support the demonstration of the new decorrelator’s execution. For the second issue, I believe we may maintain this state for a long time until we consider the old decorrelator no longer needs maintenance. At that point, we might clean up some of the logic, but it is also possible that both algorithms will continue to coexist. What we need to do now is to gradually include the excluded files in CoreQuidemTest2 one by one and ensure the new decorrelator can execute correctly. |
| final RelOptCluster cluster = RelOptCluster.create(planner, rexBuilder); | ||
| final SqlToRelConverter.Config config = | ||
| sqlToRelConverterConfig.withTrimUnusedFields(false); | ||
| sqlToRelConverterConfig.withTrimUnusedFields(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would align the with() each on a new line
| final Program oldProgram = of(builder.build(), true, metadataProvider); | ||
| | ||
| final HepProgramBuilder newBuilder = HepProgram.builder(); | ||
| newBuilder.addRuleCollection( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these cover all subquery cases?
I am not sure why the SUM rule is here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only found this one instance of logic for handling subqueries; there's nothing else. I'm also a bit confused about why it handles SUM, but since this is code from 2024, I've decided to keep it here for now, we can see CALCITE-6020.
| | ||
| @Override protected Collection<String> data() { | ||
| final List<String> paths = new ArrayList<>(super.data()); | ||
| paths.remove("sql/agg.iq"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would comment that these are temporary and will eventually be removed
| EnumerableTableScan(table=[[scott, EMP]]) | ||
| EnumerableCalc(expr#0..7=[{inputs}], expr#8=[CAST($t6):DECIMAL(12, 2)], expr#9=[5.00:DECIMAL(12, 2)], expr#10=[>($t8, $t9)], expr#11=[IS NOT NULL($t3)], expr#12=[AND($t10, $t11)], EMPNO=[$t0], MGR=[$t3], COMM=[$t6], $condition=[$t12]) | ||
| EnumerableTableScan(table=[[scott, EMP]]) | ||
| !plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the new decorrelator would make !plan always pass (by essentially ignoring the output), but I am not sure this is even possible with the way quidem works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can't just ignore them; we still need to use !if statements to process them one by one. This is because the SQL plans in the .iq file aren't necessarily covered by other tests (such as RelOptRulesTest), so they still have value, even if we can ignore the !plan comparison. My approach is to handle each case rigorously, although this might be quite extensive.
|



See CALCITE-7342