@GeorgeLeePatterson
Contributor

What's Changed

This PR adds write support for ListView and LargeListView types through builder classes, completing the view types implementation.

Implementation Details

Builders

  • ListViewBuilder and LargeListViewBuilder in src/builder/listview.ts
  • Uses Int32Array (ListView) and BigInt64Array (LargeListView) for offsets/sizes
  • Custom flush() to pass both valueOffsets and sizes to makeData
  • BigInt(0) syntax for ES5 compatibility (not 0n literals)
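As a rough illustration of the 64-bit offset/size buffers and the `BigInt(...)` call style described above, here is a minimal standalone sketch. The helper name `appendLargeEntry` and its parameters are hypothetical and are not part of arrow-js; it only assumes a runtime where `BigInt64Array` is available.

```typescript
// Hypothetical helper (not arrow-js API): records one list entry in the
// separate 64-bit offset and size buffers that a LargeListView layout uses.
function appendLargeEntry(
    offsets: BigInt64Array,
    sizes: BigInt64Array,
    index: number,
    childStart: number,
    length: number
): void {
    // BigInt(...) calls are used instead of literals such as 0n, since the
    // literal syntax is rejected when compiling for older targets.
    offsets[index] = BigInt(childStart);
    sizes[index] = BigInt(length);
}

const largeOffsets = new BigInt64Array(3); // zero-initialised
const largeSizes = new BigInt64Array(3);
appendLargeEntry(largeOffsets, largeSizes, 0, 0, 2); // child values [0, 2)
appendLargeEntry(largeOffsets, largeSizes, 1, 2, 3); // child values [2, 5)
// Entry 2 is left untouched: offset 0 and size 0, which reads as an empty list.
```

The ListView variant would look the same with `Int32Array` buffers and plain numbers.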

Integration

  • Added to builderctor visitor for factory support
  • Exported from Arrow.ts and Arrow.dom.ts for public API

What Works

  • Building ListView/LargeListView columns from JavaScript arrays
  • Separate offset and size tracking
  • Proper null handling
  • Round-trip: build → flush → read

Testing

  • Comprehensive unit tests in test/unit/builders/listview-tests.ts
  • Tests verify offset/size semantics, null handling
  • All existing tests pass
  • CI validated on fork

Builds on #320 and the ListView read support PR

This PR adds read support for BinaryView and Utf8View types (Arrow format 1.4.0+), enabling arrow-js to consume IPC data from systems like InfluxDB 3.0 and DataFusion that use view types for efficient string handling.

- Added BinaryView and Utf8View type classes with view struct layout constants
- Type enum entries: Type.BinaryView = 23, Type.Utf8View = 24
- Data class support for variadic buffer management
- Get visitor: Implements proper view semantics (16-byte structs, inline/out-of-line data)
- Set visitor: Marks as immutable (read-only)
- VectorLoader: Reads from IPC format with variadicBufferCounts
- TypeComparator, TypeCtor: Type system integration
- JSON visitors: Explicitly unsupported (throws error)

- Generated schema files for BinaryView, Utf8View, ListView, LargeListView
- Script to regenerate from Arrow format definitions

- Reading BinaryView/Utf8View columns from Arrow IPC files
- Accessing values with proper inline/out-of-line handling
- Variadic buffer management
- Type checking and comparison

- ✅ Unit tests for BinaryView and Utf8View (test/unit/ipc/view-types-tests.ts)
- ✅ Tests verify both inline (≤12 bytes) and out-of-line data handling
- ✅ TypeScript compiles without errors
- ✅ All existing tests pass
- ✅ Verified with DataFusion 50.0.3 integration (enables native view types, removing need for workarounds)

- Reading query results from DataFusion 50.0+ with view types enabled
- Consuming InfluxDB 3.0 Arrow data with Utf8View/BinaryView columns
- Processing Arrow IPC streams from any system using view types

- Builders for write operations
- ListView/LargeListView type implementation
- Additional test coverage

Closes apache#311
Related to apache#225
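To make the 16-byte view struct and the inline/out-of-line distinction concrete, here is a minimal standalone decoder. It follows the Arrow columnar layout (a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset). The names `readViewBytes`, `VIEW_SIZE`, and `INLINE_MAX` are hypothetical and are not taken from this PR's code.

```typescript
// Hypothetical standalone decoder; names are illustrative, not arrow-js API.
const VIEW_SIZE = 16;  // bytes per view struct
const INLINE_MAX = 12; // values up to this many bytes are stored inline

function readViewBytes(
    views: Uint8Array,
    variadicBuffers: Uint8Array[],
    index: number
): Uint8Array {
    const base = index * VIEW_SIZE;
    const dv = new DataView(views.buffer, views.byteOffset, views.byteLength);
    const length = dv.getInt32(base, true); // little-endian length
    if (length <= INLINE_MAX) {
        // Inline: the value itself follows the length inside the struct.
        return views.subarray(base + 4, base + 4 + length);
    }
    // Out-of-line: bytes 4..8 hold a prefix, 8..12 the buffer index, 12..16 the offset.
    const bufferIndex = dv.getInt32(base + 8, true);
    const offset = dv.getInt32(base + 12, true);
    return variadicBuffers[bufferIndex].subarray(offset, offset + length);
}

// Example data: one inline value and one value stored in variadic buffer 0.
const shortValue = new Uint8Array([104, 105]);                // 2 bytes
const longValue = new Uint8Array(20).map((_, i) => i + 1);    // 20 bytes
const exampleViews = new Uint8Array(2 * VIEW_SIZE);
const exampleDv = new DataView(exampleViews.buffer);
exampleDv.setInt32(0, shortValue.length, true);
exampleViews.set(shortValue, 4);
exampleDv.setInt32(16, longValue.length, true);
exampleViews.set(longValue.subarray(0, 4), 20); // prefix
exampleDv.setInt32(24, 0, true);                // buffer index
exampleDv.setInt32(28, 0, true);                // offset within that buffer

const decodedShort = readViewBytes(exampleViews, [longValue], 0);
const decodedLong = readViewBytes(exampleViews, [longValue], 1);
```

Both results are views over existing memory (`subarray`), so nothing is copied on read.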
Add scripts/update_flatbuffers.sh and test/unit/ipc/view-types-tests.ts to RAT (Release Audit Tool) exclusion list. Both files have proper Apache license headers but need to be excluded from license scanning.
Remove blank line after shebang to match Apache Arrow JS convention. License header must start on line 2 with '#' as shown in ci/scripts/build.sh
Add BinaryView and Utf8View to main exports in Arrow.ts. These types were implemented but not exported, causing 'BinaryView is not a constructor' errors in ES5 UMD tests.
Add BinaryView and Utf8View to Arrow.dom.ts exports. Arrow.node.ts re-exports from Arrow.dom.ts, so this fixes both entrypoints.
- Simplify variadicBuffers byteLength calculation with reduce
- Remove unsupported type enum entries (only add BinaryView and Utf8View)
- Eliminate type casting by extracting getBinaryViewBytes helper
- Simplify readVariadicBuffers with Array.from
- Remove CompressedVectorLoader override (inherits base implementation)
- Delete SparseTensor.ts (not implementing tensors in this PR)
- Implement BinaryViewBuilder with inline/out-of-line storage logic
- Implement Utf8ViewBuilder with UTF-8 encoding support
- Support random-access writes (not just append-only)
- Proper variadic buffer management (32MB buffers per spec)
- Handle null values correctly
- Register builders in builderctor visitor
- Add comprehensive test suite covering:
  - Inline values (≤12 bytes)
  - Out-of-line values (>12 bytes)
  - Mixed inline/out-of-line
  - Null values
  - Empty values
  - 12-byte boundary cases
  - UTF-8 multibyte characters
  - Large batches (1000 values)
  - Multiple flushes

Fixes:
- Correct buffer allocation for random-access writes
- Proper byteLength calculation (no double-counting)
- Follows FixedWidthBuilder patterns for index-based writes
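The inline/out-of-line storage decision on the write side can be sketched as follows. This is a minimal illustration under the Arrow view layout, not the builder code from this commit. The function name `encodeView` and its parameters are hypothetical, and the caller is assumed to have already copied out-of-line bytes into a variadic buffer at `offset`.

```typescript
// Hypothetical encoder for a single 16-byte view struct (not arrow-js API).
function encodeView(
    value: Uint8Array,
    bufferIndex: number,
    offset: number
): { view: Uint8Array; inline: boolean } {
    const view = new Uint8Array(16); // zero-filled, so unused inline bytes stay 0
    const dv = new DataView(view.buffer);
    dv.setInt32(0, value.length, true); // length in bytes, little-endian
    if (value.length <= 12) {
        // Inline: the whole value fits in bytes 4..16 of the struct.
        view.set(value, 4);
        return { view, inline: true };
    }
    // Out-of-line: store a 4-byte prefix plus the location of the full value.
    view.set(value.subarray(0, 4), 4);
    dv.setInt32(8, bufferIndex, true);
    dv.setInt32(12, offset, true);
    return { view, inline: false };
}

// The 12-byte boundary: 12 bytes stay inline, 13 bytes go out-of-line.
const twelveBytes = encodeView(new Uint8Array(12).fill(7), 0, 0);
const thirteenBytes = encodeView(new Uint8Array(13).fill(7), 0, 64);
```

For Utf8View the threshold applies to the encoded byte length rather than the character count, so a short string with multibyte characters can still end up out-of-line.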
ESLint rule jest/prefer-to-have-length requires using toHaveLength() instead of toBe() for length checks.
Use reduce instead of explicit loops for variadicBuffers byteLength calculation, consistent with changes in Data class.
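A minimal sketch of the reduce-based calculation, assuming the variadic buffers are held as an array of `Uint8Array`. The helper name `totalByteLength` is hypothetical and not taken from the commit.

```typescript
// Hypothetical helper: sums the byte lengths of all variadic buffers.
function totalByteLength(buffers: ReadonlyArray<Uint8Array>): number {
    // The initial value 0 makes the empty-array case return 0 instead of throwing.
    return buffers.reduce((sum, buffer) => sum + buffer.byteLength, 0);
}

const exampleTotal = totalByteLength([new Uint8Array(8), new Uint8Array(24)]);
const emptyTotal = totalByteLength([]);
```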
- Add ListView and LargeListView type classes with child field support
- Add type guard methods isListView and isLargeListView
- Add visitor support in typeassembler and typector
- Add Data interfaces for ListView with offsets and sizes buffers
- Add makeData overloads for ListView and LargeListView
- Update DataProps union type to include ListView types

ListView and LargeListView use offset+size buffers instead of consecutive offsets, allowing out-of-order writes and value sharing.
- Add ListView and LargeListView type classes to src/type.ts
- Add visitor support in src/visitor.ts (inferDType and getVisitFnByTypeId)
- Add visitor support in src/visitor/typector.ts and typeassembler.ts
- Add DataProps interfaces for ListView/LargeListView in src/data.ts
- Implement MakeDataVisitor methods for ListView/LargeListView
- Implement GetVisitor methods for ListView/LargeListView in src/visitor/get.ts
- Add comprehensive test suite in test/unit/ipc/list-view-tests.ts
  - Tests in-order and out-of-order offsets
  - Tests value sharing between list elements
  - Tests null handling and empty lists
  - Tests LargeListView with BigInt64Array offsets
  - Tests type properties

ListView and LargeListView are Arrow 1.4 variable-size list types that use offset+size buffers instead of consecutive offsets, enabling out-of-order writes and value sharing.
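The offset+size semantics can be illustrated with a small standalone reader. This is only a sketch: `getListViewEntry` is a hypothetical name, and validity is modelled as a boolean array for brevity where Arrow uses a bitmap.

```typescript
// Hypothetical reader (not arrow-js API): each entry is described by its own
// (offset, size) pair, so entries need not be consecutive or disjoint.
function getListViewEntry<T>(
    values: readonly T[],
    offsets: Int32Array,
    sizes: Int32Array,
    validity: readonly boolean[],
    index: number
): T[] | null {
    if (!validity[index]) return null;
    const start = offsets[index];
    return values.slice(start, start + sizes[index]);
}

const childValues = [10, 20, 30, 40, 50];
const viewOffsets = new Int32Array([3, 0, 0, 1]); // out of order: entry 0 starts at 3
const viewSizes = new Int32Array([2, 3, 0, 2]);
const viewValidity = [true, true, false, true];

const entries = [0, 1, 2, 3].map((i) =>
    getListViewEntry(childValues, viewOffsets, viewSizes, viewValidity, i)
);
// entries[1] and entries[3] overlap in the child array (value sharing).
```

A plain List layout could not express this example, because its consecutive offsets force entry `i` to end exactly where entry `i + 1` begins.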
Add type 25 (ListView) and 26 (LargeListView) to the Type enum.
Implements builders for ListView and LargeListView types:
- ListViewBuilder: Uses Int32Array for offsets and sizes
- LargeListViewBuilder: Uses BigInt64Array for offsets and sizes

Key implementation details:
- Both builders extend Builder directly (not VariableWidthBuilder)
- Use DataBufferBuilder for independent offset and size buffers
- Override flush() to pass both valueOffsets and sizes to makeData
- Properly handle null values and empty lists

Includes comprehensive test suite with 11 passing tests:
- Basic value appending
- Null handling
- Empty lists
- Multiple flushes
- Varying list sizes
- BigInt offset verification

This is part of the stacked PR strategy for view types support.
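The builder behaviour described above (independent offset and size tracking, null and empty lists, and a flush that hands back both buffers) can be approximated with this simplified standalone sketch. `ListViewBuilderSketch` is a hypothetical class that does not extend the arrow-js `Builder`, uses plain arrays in place of `DataBufferBuilder`, and returns a plain object in place of calling `makeData`.

```typescript
// Hypothetical, simplified stand-in for a ListView builder (not arrow-js API).
class ListViewBuilderSketch {
    private offsets: number[] = [];
    private sizes: number[] = [];
    private validity: boolean[] = [];
    private child: number[] = [];

    append(list: readonly number[] | null): this {
        // Offsets and sizes are recorded independently for every entry.
        this.offsets.push(this.child.length);
        this.sizes.push(list === null ? 0 : list.length);
        this.validity.push(list !== null);
        if (list !== null) this.child.push(...list);
        return this;
    }

    flush() {
        const result = {
            valueOffsets: Int32Array.from(this.offsets),
            sizes: Int32Array.from(this.sizes),
            validity: this.validity,
            child: this.child,
        };
        // Reset internal state so the builder can be reused after a flush.
        this.offsets = [];
        this.sizes = [];
        this.validity = [];
        this.child = [];
        return result;
    }
}

const sketchBuilder = new ListViewBuilderSketch();
const firstFlush = sketchBuilder.append([1, 2]).append(null).append([]).append([3]).flush();
const secondFlush = sketchBuilder.append([9]).flush();
```

A null entry and an empty list both get size 0 here; only the validity flag tells them apart.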
ESLint rule jest/prefer-to-have-length requires using toHaveLength() instead of toBe() for length checks.
@GeorgeLeePatterson
Contributor Author

Just as I did with the Utf8View and BinaryView PR, I will absorb this into the base PR, since a data type needs both read and write support to be considered supported. I will close this PR shortly.

@GeorgeLeePatterson GeorgeLeePatterson marked this pull request as draft November 6, 2025 14:25