- Notifications
You must be signed in to change notification settings - Fork 14
feat: Add LargeList support for JavaScript bindings #299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implement full support for the LargeList data type, which uses 64-bit offsets (BigInt64Array) instead of 32-bit offsets, enabling list values larger than 2GB. Changes include: - Add LargeList type class with BigInt64Array offset support - Implement visitor pattern methods across all visitors (get, iterator, indexof, typeassembler, jsontypeassembler) - Add IPC serialization/deserialization support - Add comprehensive test suite using existing test generation framework - Export LargeList in public API The implementation follows existing code patterns and all tests pass. Closes apache#70 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
| Could you read https://www.apache.org/legal/generative-tooling.html carefully and share what you ensure? |
| I've read the doc and can assure that the contribution meets the guidelines. Thankfully I only used Anthropic's claude code. Anthropic's Commercial Terms explicitly state: "Customer (a) retains all rights to its Inputs, and (b) owns its Outputs." Anthropic assigns all rights in outputs to the customer with no licensing restrictions on how outputs may be used or distributed. As for point 2 in the doc, all three conditions are satisfied: 2.1 - Output is not copyrightable: The implementation is functional code that:
2.2 - No third party materials included: The code:
2.3 - Third party materials used with permission: N/A - no third party materials were included .. For point 3. Reasonable certainty obtained through:
|
| Agreed that based on the code changes, it's extremely unlikely to violate any of the guidelines set out in Apache's guidance - it conforms precisely to the existing codebase and adds very little "new" code/logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements full support for the LargeList data type in Apache Arrow JavaScript bindings. LargeList uses 64-bit offsets (BigInt64Array) instead of 32-bit offsets, enabling list values larger than 2GB.
Key Changes:
- Added
LargeListtype (enum value 21) withBigInt64Arrayoffset support - Implemented visitor pattern methods across all visitor classes for LargeList support
- Added IPC serialization/deserialization support for LargeList metadata
- Enhanced test data generation with comprehensive LargeList test coverage
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/enum.ts | Added LargeList = 21 enum value |
| src/type.ts | Implemented LargeList class with BigInt64Array offsets and type guard |
| src/data.ts | Added LargeListDataProps interface and makeData visitor support |
| src/visitor.ts | Added visitLargeList method and type dispatch logic |
| src/visitor/get.ts | Implemented getLargeList function with BigInt to number conversion |
| src/visitor/iterator.ts | Added LargeList iterator support |
| src/visitor/indexof.ts | Added LargeList indexOf operation support |
| src/visitor/typeassembler.ts | Added FlatBuffers serialization for LargeList |
| src/visitor/jsontypeassembler.ts | Added JSON serialization for LargeList |
| src/ipc/metadata/message.ts | Added LargeList case in IPC metadata decoding |
| src/Arrow.ts | Exported LargeList class |
| src/Arrow.dom.ts | Exported LargeList class for DOM environments |
| test/generate-test-data.ts | Added generateLargeList function and fixed BigInt conversion |
| test/unit/generated-data-tests.ts | Added LargeList test suite |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
test/generate-test-data.ts Outdated
| } else { | ||
| do { | ||
| offsets[i + 1] = offsets[i] + BigInt(Math.min(max, Math.max(min, Math.trunc(rand() * max)))); | ||
| offsets[i + 1] = offsets[i] + BigInt(Math.trunc(Math.min(max, Math.max(min, Math.trunc(rand() * max))))); |
Copilot AI Oct 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested Math.trunc() calls are redundant. The outer Math.trunc() is unnecessary since Math.min() and Math.max() already return numbers, and the inner Math.trunc() handles the rand() * max conversion.
| offsets[i + 1] = offsets[i] + BigInt(Math.trunc(Math.min(max, Math.max(min, Math.trunc(rand() * max))))); | |
| offsets[i + 1] = offsets[i] + BigInt(Math.min(max, Math.max(min, Math.trunc(rand() * max)))); |
domoritz left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to try to reduce code size increases if possible. Otherwise looks reasonable. Please make sure the tests pass. Unit tests for specific logic would be great as well.
| return { values, vector: new Vector([makeData({ type, length, nullCount, nullBitmap, valueOffsets, child: childVec.data[0] })]) }; | ||
| } | ||
| | ||
| function generateLargeList<T extends LargeList>(this: TestDataVectorGenerator, type: T, length = 100, nullCount = Math.trunc(length * 0.2), child = this.visit(type.children[0].type, length * 3, nullCount * 3)): GeneratedVector<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to have a lot of overlap with the method above. Can we extract the logic into a helper so we don't blow up code size?
🤖 Generated with Claude Code
Summary
This PR implements full support for the
LargeListdata type in Apache Arrow JavaScript bindings. LargeList uses 64-bit offsets (BigInt64Array) instead of 32-bit offsets, enabling list values larger than 2GB.Related Issues
Closes #70
Implementation Details
Core Type System
LargeListenum value (Type.LargeList = 21)LargeList<T>class withBigInt64Arrayoffset supportisLargeList()type guardLargeListDataPropsinterface for data constructionVisitor Pattern Implementation
Implemented
visitLargeList()methods across all visitors:visitor/get.ts): Handles vector value retrieval with BigInt to number conversionvisitor/iterator.ts): Enables iteration support viaSymbol.iteratorvisitor/indexof.ts): SupportsindexOf()andincludes()operationsvisitor/typeassembler.ts): FlatBuffers serialization for IPCvisitor/jsontypeassembler.ts): JSON serializationIPC Support
ipc/metadata/message.tsfor reading IPC metadataTesting
test/generate-test-data.tsgenerateLargeList()function with proper BigInt64Array offset generationcreateVariableWidthOffsets64()to handle float-to-BigInt conversiontest/unit/generated-data-tests.tsPublic API
LargeListinsrc/Arrow.tsandsrc/Arrow.dom.tsTest Plan
All existing tests continue to pass, plus new comprehensive LargeList tests validate:
makeData()[Symbol.iterator]indexOf()andincludes()operationsRun tests with:
npm test -- -t srcChecklist
Notes