Skip to content

Conversation

@xiedeyantu
Copy link
Member

@xiedeyantu xiedeyantu commented Dec 22, 2025

@xiedeyantu xiedeyantu marked this pull request as draft December 22, 2025 10:45
@xiedeyantu
Copy link
Member Author

I haven't submitted a dedicated JIRA to describe this, mainly because I'm unsure if we currently need a configuration for Calcite to support the new TopDownGeneralDecorrelator, as its functionality is still incomplete. However, it seems that Quidm testing is needed during the feature development process. I tried writing an implementation, but I'm not sure if it's appropriate. If anyone thinks it's necessary, I will create a JIRA to describe some of the capabilities available at this stage.

@asolimando
Copy link
Member

Thanks for working on this, I think this is worth a Jira ticket as it will be a non-trivial change, and I think it's very needed to build confidence on the new decorrelator.

All this might eventually be removed in the future if we agree that the new decorrelator covers what the original one does, without regressions.

@xiedeyantu
Copy link
Member Author

Thanks for working on this, I think this is worth a Jira ticket as it will be a non-trivial change, and I think it's very needed to build confidence on the new decorrelator.

All this might eventually be removed in the future if we agree that the new decorrelator covers what the original one does, without regressions.

@asolimando Thanks for your reply. I have filed a jira CALCITE-7342. Seeing your reply, I feel it's time for me to create a Jira instance to document this, even though I'm not sure if it's appropriate to implement it right now. But it will be necessary at some point, so I'll implement it according to my understanding and let everyone see if they accept it.

@xiedeyantu xiedeyantu changed the title Quidem test support TopDownGeneralDecorrelator [CALCITE-7342] Quidem test support for TopDownGeneralDecorrelator Dec 22, 2025
.with(CalciteConnectionProperty.FUN, SqlLibrary.CALCITE.fun)
.with(CalciteAssert.Config.SCOTT)
.connect();
case "scott-top-down":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will require doubling the number of "connections" to test everything.
I was thinking to do this by creating a single extra test file that inherits CoreQuidemTests and which sets the new algorithm for all test files and just runs all of them.
(The ones which output plans will fail, though, so something has to be done about that.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. However, I'm worried that this work will take a long time, and it might be better to migrate the .iq files one by one, or even just a portion of the content within a file? Wouldn't using the same .iq file be quite messy? While we can use !if (enable_new_decorrelator) { ... } to mark it, wouldn't that be too cluttered? Alternatively, could we have the new QuidmCoreTests initially only involve the newly added decorrelate.iq file, and then gradually add the !if statements to the existing files to maintain two different logics?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move this discussion in JIRA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, due to the significant changes, I have force-push the code.

@xiedeyantu xiedeyantu added the discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR label Dec 22, 2025
@xiedeyantu xiedeyantu removed the discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR label Dec 23, 2025
Copy link
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but the CI fails, maybe that's expected at this stage.

.replaceIfs(RelCollationTraitDef.INSTANCE,
() -> Collections.singletonList(
input.getTraitSet().getTrait(RelCollationTraitDef.INSTANCE)))
() -> input.getTraitSet().getTraits(RelCollationTraitDef.INSTANCE))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change related to this issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm not entirely sure why; I made some adjustments based on the error message. I will revert the changes to RelDistribution later.

> Error while executing command ExplainCommand [sql: with t (a, b) as (select * from (values (60, 'b'))) > select * from t where a in (select deptno from "scott".dept)] > java.sql.SQLException: Error while executing SQL "explain plan for with t (a, b) as (select * from (values (60, 'b'))) > select * from t where a in (select deptno from "scott".dept)": class org.apache.calcite.plan.RelCompositeTrait cannot be cast to class org.apache.calcite.rel.RelCollation (org.apache.calcite.plan.RelCompositeTrait and org.apache.calcite.rel.RelCollation are in unnamed module of loader 'app') 
@xiedeyantu
Copy link
Member Author

This looks good, but the CI fails, maybe that's expected at this stage.

agg.iq failed, so I temporarily removed this test. I don't know the reason yet; it could be due to multiple factors. I want to get this test process working before analyzing it in detail.

@xiedeyantu
Copy link
Member Author

The current CI for this PR seems to have some issues. It should actually pass all checks. You can refer to #4705 for reference.

@xiedeyantu xiedeyantu marked this pull request as ready for review December 24, 2025 15:25
final URL inUrl = QuidemTest.class.getResource("/" + n2u(path));
inFile = Sources.of(requireNonNull(inUrl, "inUrl")).file();
outFile = replaceDir(inFile, "resources", "quidem");
outFile = replaceDir(inFile, "resources", "quidem/"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous error was caused by two tests using the same comparison file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so can we do the overloading trick?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're essentially adding a layer of class name as identifier. Overloading might be a good approach, but it could require duplicating some code or adding a new method, which doesn’t seem to differ much from the current implementation.

if (config.forceDecorrelate()) {
final RelBuilder relBuilder =
RelFactories.LOGICAL_BUILDER.create(rel.getCluster(), null);
if (config.topDownGeneralDecorrelationEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we also need to adapt the Program of removing subquery. When using the new decorrelator, We recommend using CoreRules.FILTER_SUB_QUERY_TO_MARK_CORRELATE and CoreRules.PROJECT_SUB_QUERY_TO_MARK_CORRELATE to remove subqueries from Filter and Project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I will fix it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After replacing these two rules, left_mark join will be generated in agg.iq, so I removed this test for now.

@silundong
Copy link
Contributor

The code looks great, I just have some questions about the quidem test:

  1. We want to use the new decorrelator to cover the test cases in the current iq files. After implementing left mark join in enumerable convention, we can verify the execution results. However, many iq files (such as sub-query.iq) use !plan to verify the plan (the plan is based on the old decorrelator ). How should we handle these?
  2. After we verify that the new decorrelator correctly covers the existing test cases, CoreQuidemTest2 might be removed. Perhaps at that time, CoreQuidemTest will support a configuration similar to set new_decorrelator, allowing users to switch between them in the iq file. Is that the intended approach?
@xiedeyantu
Copy link
Member Author

The code looks great, I just have some questions about the quidem test:

  1. We want to use the new decorrelator to cover the test cases in the current iq files. After implementing left mark join in enumerable convention, we can verify the execution results. However, many iq files (such as sub-query.iq) use !plan to verify the plan (the plan is based on the old decorrelator ). How should we handle these?
  2. After we verify that the new decorrelator correctly covers the existing test cases, CoreQuidemTest2 might be removed. Perhaps at that time, CoreQuidemTest will support a configuration similar to set new_decorrelator, allowing users to switch between them in the iq file. Is that the intended approach?

For the first issue, since quidem does not support the !if-!else syntax, I have added enable_new_decorrelater to support the demonstration of the new decorrelator’s execution. For the second issue, I believe we may maintain this state for a long time until we consider the old decorrelator no longer needs maintenance. At that point, we might clean up some of the logic, but it is also possible that both algorithms will continue to coexist. What we need to do now is to gradually include the excluded files in CoreQuidemTest2 one by one and ensure the new decorrelator can execute correctly.

final RelOptCluster cluster = RelOptCluster.create(planner, rexBuilder);
final SqlToRelConverter.Config config =
sqlToRelConverterConfig.withTrimUnusedFields(false);
sqlToRelConverterConfig.withTrimUnusedFields(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would align the with() each on a new line

final Program oldProgram = of(builder.build(), true, metadataProvider);

final HepProgramBuilder newBuilder = HepProgram.builder();
newBuilder.addRuleCollection(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these cover all subquery cases?
I am not sure why the SUM rule is here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only found this one instance of logic for handling subqueries; there's nothing else. I'm also a bit confused about why it handles SUM, but since this is code from 2024, I've decided to keep it here for now, we can see CALCITE-6020.


@Override protected Collection<String> data() {
final List<String> paths = new ArrayList<>(super.data());
paths.remove("sql/agg.iq");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would comment that these are temporary and will eventually be removed

EnumerableTableScan(table=[[scott, EMP]])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[CAST($t6):DECIMAL(12, 2)], expr#9=[5.00:DECIMAL(12, 2)], expr#10=[>($t8, $t9)], expr#11=[IS NOT NULL($t3)], expr#12=[AND($t10, $t11)], EMPNO=[$t0], MGR=[$t3], COMM=[$t6], $condition=[$t12])
EnumerableTableScan(table=[[scott, EMP]])
!plan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the new decorrelator would make !plan always pass (by essentially ignoring the output), but I am not sure this is even possible with the way quidem works

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't just ignore them; we still need to use !if statements to process them one by one. This is because the SQL plans in the .iq file aren't necessarily covered by other tests (such as RelOptRulesTest), so they still have value, even if we can ignore the !plan comparison. My approach is to handle each case rigorously, although this might be quite extensive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants