- Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: extract clean data #822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRefactors selector generation in Changes
Sequence Diagram(s)sequenceDiagram autonumber actor User participant Page participant SelectorGen as clientSelectorGenerator participant Shadow as Shadow/Subtree User->>Page: Click / Point(x,y) Page->>SelectorGen: getDeepestElementFromPoint(x,y) SelectorGen->>Page: elementsFromPoint(x,y) alt deepest is meaningful SelectorGen-->>Page: return deepestElement else not meaningful SelectorGen->>Shadow: findAtomicChildAtPoint(deepest, bbox) Shadow-->>SelectorGen: atomicMeaningfulDescendant? alt found SelectorGen-->>Page: return atomicDescendant else not found SelectorGen-->>Page: return original deepest end end sequenceDiagram autonumber participant Gen as Selector Builder participant Target as TargetElement participant Lists as OtherListElements participant Utils as Path/Attr Utils Gen->>Utils: getElementPath(Target) Gen->>Lists: findCorrespondingElement(root, path)* note over Gen,Lists: Align competitor elements by DOM path loop Candidate attributes/classes Gen->>Utils: isAttributeCommonAcrossLists(attrName, attrValue, Lists) alt common across lists Gen-->>Gen: emit attribute predicate else not common Gen-->>Gen: skip predicate end end Gen-->>Target: Final selector Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/helpers/clientSelectorGenerator.ts (1)
976-993: Invert grouped parent/child filtering to keep leaf-most elements.filterParentChildGroupedElements currently keeps parents that contain other grouped elements. Usually we want the deepest (leaf) grouped elements to avoid duplicate highlights.
Apply this diff:
private filterParentChildGroupedElements( groupedElements: HTMLElement[] ): HTMLElement[] { const result: HTMLElement[] = []; for (const element of groupedElements) { const hasGroupedChild = groupedElements.some( (other) => other !== element && element.contains(other) ); - if (hasGroupedChild) { - result.push(element); - } + if (!hasGroupedChild) { + result.push(element); + } } return result.length > 0 ? result : groupedElements; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/helpers/clientSelectorGenerator.ts(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/helpers/clientSelectorGenerator.ts (1)
maxun-core/src/browserSide/scraper.js (1)
MAX_DEPTH(955-955)
🔇 Additional comments (1)
src/helpers/clientSelectorGenerator.ts (1)
2552-2552: Ensure MAX_DEPTH divergence is intentional
MAX_DEPTH increased from 4→20 in clientSelectorGenerator.ts, while upstream scraper and server selectors cap at 5/4. If this deeper traversal interacts with those components, confirm it won’t degrade performance or cause selector mismatches—consider aligning or documenting the rationale.
| if (!addPositionToAll) { | ||
| const meaningfulAttrs = ["role", "type", "name", "src", "aria-label"]; | ||
| const meaningfulAttrs = ["role", "type"]; | ||
| for (const attrName of meaningfulAttrs) { | ||
| if (element.hasAttribute(attrName)) { | ||
| const value = element.getAttribute(attrName)!.replace(/'/g, "\\'"); | ||
| return `${tagName}[@${attrName}='${value}']`; | ||
| const isCommonAttribute = this.isAttributeCommonAcrossLists( | ||
| element, | ||
| attrName, | ||
| value, | ||
| otherListElements | ||
| ); | ||
| if (isCommonAttribute) { | ||
| return `${tagName}[@${attrName}='${value}']`; | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cross-list attribute check maps by absolute path; compute relative path to the list root and fail closed.
isAttributeCommonAcrossLists builds a path from the document root (getElementPath) then applies it starting from other list roots. This mismaps descendants and treats “corresponding element not found” as implicitly OK, letting non-common attributes pass. Compute a path relative to the current list root and require a match in every other list.
Apply these diffs:
@@ - if (!addPositionToAll) { - const meaningfulAttrs = ["role", "type"]; + if (!addPositionToAll) { + const meaningfulAttrs = ["role", "type"]; for (const attrName of meaningfulAttrs) { if (element.hasAttribute(attrName)) { const value = element.getAttribute(attrName)!.replace(/'/g, "\\'"); - const isCommonAttribute = this.isAttributeCommonAcrossLists( - element, - attrName, - value, - otherListElements - ); + const isCommonAttribute = rootElement + ? this.isAttributeCommonAcrossLists( + element, + rootElement, + attrName, + value, + otherListElements + ) + : false; if (isCommonAttribute) { return `${tagName}[@${attrName}='${value}']`; } } } } @@ - const testId = element.getAttribute("data-testid"); + const testId = element.getAttribute("data-testid"); if (testId && !addPositionToAll) { - const isCommon = this.isAttributeCommonAcrossLists( - element, - "data-testid", - testId, - otherListElements - ); + const isCommon = rootElement + ? this.isAttributeCommonAcrossLists( + element, + rootElement, + "data-testid", + testId, + otherListElements + ) + : false; if (isCommon) { return `${tagName}[@data-testid='${testId}']`; } } @@ - if (element.id && !element.id.match(/^\d/) && !addPositionToAll) { - const isCommon = this.isAttributeCommonAcrossLists( - element, - "id", - element.id, - otherListElements - ); + if (element.id && !element.id.match(/^\d/) && !addPositionToAll) { + const isCommon = rootElement + ? this.isAttributeCommonAcrossLists( + element, + rootElement, + "id", + element.id, + otherListElements + ) + : false; if (isCommon) { return `${tagName}[@id='${element.id}']`; } } @@ - if (!addPositionToAll) { + if (!addPositionToAll) { for (const attr of Array.from(element.attributes)) { if ( attr.name.startsWith("data-") && attr.name !== "data-testid" && attr.name !== "data-mx-id" && attr.value ) { - const isCommon = this.isAttributeCommonAcrossLists( - element, - attr.name, - attr.value, - otherListElements - ); + const isCommon = rootElement + ? this.isAttributeCommonAcrossLists( + element, + rootElement, + attr.name, + attr.value, + otherListElements + ) + : false; if (isCommon) { return `${tagName}[@${attr.name}='${attr.value}']`; } } } } @@ - private isAttributeCommonAcrossLists( - targetElement: HTMLElement, - attrName: string, - attrValue: string, - otherListElements: HTMLElement[] - ): boolean { - if (otherListElements.length === 0) { - return true; - } - - const targetPath = this.getElementPath(targetElement); - - for (const otherListElement of otherListElements) { - const correspondingElement = this.findCorrespondingElement( - otherListElement, - targetPath - ); - if (correspondingElement) { - const otherValue = correspondingElement.getAttribute(attrName); - if (otherValue !== attrValue) { - return false; - } - } - } - - return true; - } + private isAttributeCommonAcrossLists( + targetElement: HTMLElement, + rootElement: HTMLElement, + attrName: string, + attrValue: string, + otherListElements: HTMLElement[] + ): boolean { + if (otherListElements.length === 0) return true; + const relPath = this.getElementPath(targetElement, rootElement); + if (!relPath) return false; // cannot map reliably relative to root + for (const otherRoot of otherListElements) { + const corresponding = this.findCorrespondingElement(otherRoot, relPath); + if (!corresponding) return false; + if (corresponding.getAttribute(attrName) !== attrValue) return false; + } + return true; + } @@ - private getElementPath(element: HTMLElement): number[] { - const path: number[] = []; - let current: HTMLElement | null = element; - - while (current && current.parentElement) { - const siblings = Array.from(current.parentElement.children); - path.unshift(siblings.indexOf(current)); - current = current.parentElement; - } - - return path; - } + private getElementPath( + element: HTMLElement, + rootElement: HTMLElement + ): number[] | null { + const path: number[] = []; + let current: HTMLElement | null = element; + // Walk up until rootElement (crossing shadow hosts if needed) + while (current && current !== rootElement) { + const parent = + current.parentElement || + ((current.getRootNode() as ShadowRoot).host as HTMLElement | null); + if (!parent) return null; + const siblings = Array.from(parent.children); + const idx = siblings.indexOf(current); + if (idx < 0) return null; + path.unshift(idx); + current = parent; + } + return current === rootElement ? path : null; + }Also applies to: 2717-2728, 2730-2740, 2749-2759, 2926-2953, 2954-2966, 2967-2983
🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 2699-2715 (and also apply same change to ranges 2717-2728, 2730-2740, 2749-2759, 2926-2953, 2954-2966, 2967-2983): the current isAttributeCommonAcrossLists logic builds element paths from the document root and then queries other list roots using those absolute paths, which can map to wrong descendants and treats "not found" as success; change it to compute the element path relative to the current list root (strip or build the path starting from the list root node) and when checking otherListElements require that the corresponding relative path resolves to an element and that its attribute value equals the expected value in every other list (i.e., treat missing matches as failure), returning false if any list does not have a matching element or attribute value; update all mentioned code ranges to use the relative-path lookup and strict matching. | if (!this.isMeaningfulElementCached(deepestElement)) { | ||
| const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y); | ||
| if (atomicChild) { | ||
| return atomicChild; | ||
| } | ||
| } | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Atomic-child fallback misses shadow DOM descendants.
findAtomicChildAtPoint only walks light DOM children. If the meaningful descendant is in a shadow root, the fallback won’t find it.
Apply this diff:
- if (!this.isMeaningfulElementCached(deepestElement)) { - const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y); + if (!this.isMeaningfulElementCached(deepestElement)) { + const atomicChild = this.findAtomicChildAtPoint(deepestElement, x, y); if (atomicChild) { return atomicChild; } }And enhance traversal:
private findAtomicChildAtPoint( parent: HTMLElement, x: number, y: number ): HTMLElement | null { const stack: HTMLElement[] = [parent]; const visited = new Set<HTMLElement>(); while (stack.length > 0) { const element = stack.pop()!; if (visited.has(element)) continue; visited.add(element); if (element !== parent && this.isMeaningfulElementCached(element)) { const rect = element.getBoundingClientRect(); if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) { return element; } } - for (let i = element.children.length - 1; i >= 0; i--) { + // Light DOM children (iterate from topmost in stacking order approximation) + for (let i = element.children.length - 1; i >= 0; i--) { const child = element.children[i] as HTMLElement; const rect = child.getBoundingClientRect(); if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) { stack.push(child); } } + // Shadow DOM children (if any) + if ((element as HTMLElement).shadowRoot) { + const shadowChildren = (element as HTMLElement).shadowRoot!.children; + for (let i = shadowChildren.length - 1; i >= 0; i--) { + const child = shadowChildren[i] as HTMLElement; + const rect = child.getBoundingClientRect(); + if (x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom) { + stack.push(child); + } + } + } } return null; }Also applies to: 4003-4034
🤖 Prompt for AI Agents
In src/helpers/clientSelectorGenerator.ts around lines 3993-3999 (and similarly for 4003-4034), the atomic-child fallback uses findAtomicChildAtPoint which only traverses light DOM children and therefore misses meaningful descendants inside shadow roots; update the traversal to also check shadow DOM by: when examining a candidate element at the point, if it has a shadowRoot call shadowRoot.elementFromPoint(x,y) (or shadowRoot.deep element lookup) and continue descending into that shadow root’s composed tree (including following assigned nodes for slots and checking shadow-inserted children) before falling back to light DOM children; ensure the function returns the deepest atomic descendant found in either the shadowRoot or light children and mirror this shadow-aware logic in the other affected block (4003-4034).
What this PR does?
Summary by CodeRabbit
New Features
Bug Fixes
Performance
Refactor