Tags: ooples/AiDotNet
fix: resolve 6 unresolved pr review comments from #424 (#486)

* fix: correct onnx attributeproto field numbers per spec

Changed field numbers to match the ONNX protobuf specification:
- Field 20 for type (was field 3)
- Field 3 for int value (was field 4)
- Field 2 for float value (was field 5)
- Field 4 for string value (was field 6)
- Field 8 for repeated ints (unchanged, was correct)

This prevents corrupt ONNX attributes when exporting models. Fixes critical code review issue #4 from PR #424.

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: preserve coreml-specific configuration during export

CoreMLExporter was converting CoreMLConfiguration to a generic ExportConfiguration, losing CoreML-specific settings such as ComputeUnits, MinimumDeploymentTarget, SpecVersion, InputFeatures, OutputFeatures, and FlexibleInputShapes.

This fix:
- Stores the original CoreMLConfiguration in PlatformSpecificOptions during ExportToCoreML
- Retrieves the preserved configuration in ConvertOnnxToCoreML
- Falls back to creating a default config for backward compatibility

Addresses PR #424 review comment: exporter drops CoreML-specific configuration

* fix: add explicit null guard for directory creation

Added production-ready null handling for Path.GetDirectoryName edge cases:
- Explicit null check before directory operations
- Changed IsNullOrEmpty to IsNullOrWhiteSpace for better validation
- Added clarifying comments about edge cases (root paths, relative filenames)
- Documented fallback behavior when the directory is null/empty

Addresses PR #424 review comment: null directory edge case handling

* fix: use constraint-free hash computation in modelcache

Replaced Marshal.SizeOf/Buffer.BlockCopy hashing with a GetHashCode-based approach:
- Removed the requirement for a T : unmanaged constraint
- Uses unchecked hash combining with prime multipliers (17, 31)
- Samples large arrays (max 100 elements) for performance
- Includes the array length and last element for better distribution
- Proper null handling for reference types

This allows ModelCache to work with any numeric type without cascading constraint requirements through DeploymentRuntime, PredictionModelResult, and dozens of other classes.

Addresses PR #424 review comment: ModelCache T constraint for hashing semantics

* fix: correct event ordering in telemetrycollector getevents

Fixed incorrect ordering logic where Take(limit) was applied before OrderByDescending(timestamp), causing arbitrary events to be returned instead of the most recent ones.

Changed:
- _events.Take(limit).OrderByDescending(e => e.Timestamp)

To:
- _events.OrderByDescending(e => e.Timestamp).Take(limit)

This ensures the method returns the MOST RECENT events as intended, not random events from the ConcurrentBag. Added clarifying documentation explaining the fix and return value semantics.

Addresses PR #424 review comment: GetEvents ordering issue
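A minimal sketch of the corrected GetEvents ordering, assuming a ConcurrentBag<TelemetryEvent> field named `_events` and an event type with a `Timestamp` property as described in the commit; the surrounding class here is hypothetical, not AiDotNet's actual TelemetryCollector:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public sealed class TelemetryEvent
{
    public DateTimeOffset Timestamp { get; init; }
    public string Name { get; init; } = "";
}

public sealed class TelemetryCollector
{
    private readonly ConcurrentBag<TelemetryEvent> _events = new();

    public void Record(TelemetryEvent e) => _events.Add(e);

    // Returns the MOST RECENT `limit` events. Ordering must happen before
    // Take(limit): a ConcurrentBag enumerates in arbitrary order, so
    // _events.Take(limit) would grab random events, not the newest ones.
    public IReadOnlyList<TelemetryEvent> GetEvents(int limit)
    {
        return _events
            .OrderByDescending(e => e.Timestamp)
            .Take(limit)
            .ToList();
    }
}
```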
* fix: add comprehensive validation for tensorrt configuration

Added production-ready validation to prevent invalid TensorRT configurations:

1. ForInt8() method validation:
- Throws ArgumentNullException if the calibration data path is null/whitespace
- Ensures INT8 configurations always have calibration data

2. New Validate() method checks:
- INT8 enabled requires a non-empty CalibrationDataPath
- Calibration data file exists if a path is provided
- MaxBatchSize >= 1
- MaxWorkspaceSize >= 0
- BuilderOptimizationLevel in the valid range [0-5]
- NumStreams >= 1 when EnableMultiStream is true

This prevents runtime failures from misconfigured TensorRT engines, especially the critical scenario of INT8 without calibration data.

Addresses PR #424 review comment: TensorRTConfiguration calibration data validation
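A sketch of what such a Validate() method could look like. The property names (EnableInt8, CalibrationDataPath, MaxBatchSize, MaxWorkspaceSize, BuilderOptimizationLevel, EnableMultiStream, NumStreams) are inferred from the commit text; the actual TensorRTConfiguration API may differ:

```csharp
using System;
using System.IO;

public sealed class TensorRTConfiguration
{
    public bool EnableInt8 { get; set; }
    public string? CalibrationDataPath { get; set; }
    public int MaxBatchSize { get; set; } = 1;
    public long MaxWorkspaceSize { get; set; }
    public int BuilderOptimizationLevel { get; set; } = 3;
    public bool EnableMultiStream { get; set; }
    public int NumStreams { get; set; } = 1;

    // Mirrors the checks listed in the commit message above.
    public void Validate()
    {
        if (EnableInt8 && string.IsNullOrWhiteSpace(CalibrationDataPath))
            throw new InvalidOperationException(
                "INT8 mode requires a non-empty CalibrationDataPath.");

        if (!string.IsNullOrWhiteSpace(CalibrationDataPath) &&
            !File.Exists(CalibrationDataPath))
            throw new FileNotFoundException(
                "Calibration data file not found.", CalibrationDataPath);

        if (MaxBatchSize < 1)
            throw new InvalidOperationException("MaxBatchSize must be >= 1.");

        if (MaxWorkspaceSize < 0)
            throw new InvalidOperationException("MaxWorkspaceSize must be >= 0.");

        if (BuilderOptimizationLevel < 0 || BuilderOptimizationLevel > 5)
            throw new InvalidOperationException(
                "BuilderOptimizationLevel must be in [0, 5].");

        if (EnableMultiStream && NumStreams < 1)
            throw new InvalidOperationException(
                "NumStreams must be >= 1 when EnableMultiStream is true.");
    }
}
```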
* fix: add bounds checking for inputsize/outputsize casts in coreml proto

Validate InputSize and OutputSize are non-negative before casting to ulong, to prevent negative values from wrapping to large unsigned values in CoreML protobuf serialization.

* fix: add production-ready onnx parsing with type validation and correct shape extraction

This commit fixes three critical issues in ONNX→CoreML conversion:

1. **Data type validation in ParseTensor**: Now reads and validates the data_type field (field 5), ensuring only FLOAT tensors are converted. Throws NotSupportedException for unsupported types (DOUBLE, INT8, etc.) instead of silently corrupting data.

2. **Correct TypeProto parsing**: Fixed ParseTypeProto to properly handle the nested ONNX protobuf structure (TypeProto → tensor_type → shape → dim → dim_value) instead of incorrectly treating every varint as a dimension. This fixes tensor shape extraction for model inputs/outputs.

3. **Accurate InnerProduct layer sizing**: Changed from a Math.Sqrt approximation (which assumed square matrices) to using the actual tensor shape from ONNX dims. For MatMul/Gemm layers, correctly extracts [out_dim, in_dim] from the weight tensor shape.

Technical changes:
- ParseTensor now returns OnnxTensor with Name, Data, and Shape fields
- Added an OnnxTensor class to store tensor metadata alongside float data
- Updated OnnxGraphInfo.Initializers from Dictionary<string, float[]> to Dictionary<string, OnnxTensor>
- Added ParseTensorTypeProto, ParseTensorShapeProto, and ParseDimensionProto helper methods
- ConvertOperatorToLayer uses shape[0] and shape[1] for layer sizing, with a sqrt fallback

* fix: preserve all configuration properties across cloning and deserialization

This ensures deployment behavior, model adaptation capabilities, and training history are maintained when copying or reloading models.

Updated three methods:
1. WithParameters: Now passes LoRAConfiguration, CrossValidationResult, AgentConfig, AgentRecommendation, and DeploymentConfiguration to the constructor
2. DeepCopy: Same as WithParameters, for consistency
3. Deserialize: Now assigns all RAG components (RagRetriever, RagReranker, RagGenerator, QueryProcessors) and configuration properties (LoRAConfiguration, CrossValidationResult, AgentConfig, AgentRecommendation, DeploymentConfiguration) from the deserialized object

This fixes the issue where deployment/export/runtime settings, LoRA configurations, and meta-learning properties were lost when calling WithParameters, DeepCopy, or Deserialize.

* fix: correct onnx field numbers and address pr review comments

CRITICAL: Fix ONNX TensorProto field number compliance:
- OnnxProto.cs: Change field 3 → 8 for tensor name per the ONNX spec
- OnnxToCoreMLConverter.cs: Fix all TensorProto fields (1=dims, 2=data_type, 8=name, 9=raw_data)
- The previous incorrect field numbers would cause empty tensor names and broken shape inference

Additional fixes:
- CoreMLExporter.cs: Fix QuantizationBits mapping (Int8→8, Float16→16, default→32)
- TensorRTConfiguration.cs: Use ArgumentException instead of ArgumentNullException for whitespace validation
- ModelExporterBase.cs: Remove redundant null check (IsNullOrWhiteSpace handles null)

Addresses PR #486 review comments #1, #2, #4, #5, #6

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* style: use ternary operator for coreml config assignment

Simplify CoreMLExporter.cs by using the ternary conditional operator instead of if/else for the CoreMLConfiguration assignment.

Addresses PR #486 review comment #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: replace gethashcode with sha256 for model cache correctness

CRITICAL: Model caching requires cryptographically secure hashing to prevent hash collisions that would cause incorrect predictions.

Problems with the previous GetHashCode() approach:
- Hash collision probability ~2^-32 (unacceptable for ML inference)
- Non-deterministic across .NET runtimes, machines, and process restarts
- Sampled only 100 elements from large arrays (incomplete hashing)
- Could return the same cache entry for different inputs (silent data corruption)

SHA256-based approach:
- Collision probability ~2^-256 (cryptographically secure)
- Deterministic and stable across all platforms and runtimes
- Hashes ALL array elements for complete correctness
- Ensures cached results always match the correct input

Performance impact: SHA256 hashing adds microseconds, while inference takes milliseconds to seconds, so the overhead is negligible compared to model inference time. This fix prioritizes correctness over premature optimization. For production ML systems, silent data corruption from hash collisions is unacceptable.

Addresses PR #486 review comment #3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
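A minimal sketch of deterministic SHA256-based cache-key computation over a float input vector, illustrating the approach the commit describes; the class and method names here are hypothetical, and ModelCache's real key derivation may differ:

```csharp
using System;
using System.Security.Cryptography;

public static class CacheKey
{
    // Deterministic, collision-resistant key for an input vector. Every
    // element is hashed (no sampling), so two different inputs cannot share
    // a cache entry short of a SHA-256 collision (~2^-256 probability).
    public static string ForInput(float[] input)
    {
        if (input is null) throw new ArgumentNullException(nameof(input));

        // Serialize all elements with a fixed byte layout so the hash is
        // stable across runtimes, machines, and process restarts
        // (unlike GetHashCode, which guarantees none of these).
        var bytes = new byte[input.Length * sizeof(float)];
        Buffer.BlockCopy(input, 0, bytes, 0, bytes.Length);

        using (var sha256 = SHA256.Create())
        {
            byte[] hash = sha256.ComputeHash(bytes);
            return BitConverter.ToString(hash).Replace("-", string.Empty);
        }
    }
}
```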
test: verify automated release with tag protection disabled

Final test of the automated release pipeline with tag creation restrictions removed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Cleaned up some code (#11)

* Updated language version to use latest

Updated target frameworks to widen the range. Added a bunch of new metrics such as R2, standard deviation, standard error, etc.

* Added normalization code structure and examples

Added log normalization and decimal normalization. Updated example code. Cleaned up exceptions to include more info. Added some code documentation.

* Added a ZScore normalization option

Added a MinMax normalization option.

* Added multiple regression calculations

Moved some parameters to a new regression options class. Updated code examples. Added the Math.NET Numerics library to handle the matrix calculations. Added a new option to choose the matrix decomposition method for multiple regression, a new option to choose whether to use an intercept when calculating multiple regression, and a new option to choose the matrix layout (row arrays or column arrays).

* Added weighted regression

Removed some duplicate code by introducing a validation helper. Removed some duplicate code by introducing a usings helper. Some minor code fixes.

* Added a validations helper to handle all validation exceptions

Added new normalization and regression methods. Moved all regression parameter options into a new regression options class. Updated the readme example to reflect the new changes.

* Implemented normalization methods for all regression types

Updated NuGet packages. Changed the target framework version to be compatible with older .NET Framework versions.

* Cleaned up some code that was missed

---------

Signed-off-by: Franklin Moormann <cheatcountry@gmail.com>
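A minimal sketch of the kind of z-score (standardization) normalization added in this PR, under the standard definition z = (x - mean) / stdDev; the class and method names are hypothetical and do not reflect AiDotNet's actual normalization API:

```csharp
using System;
using System.Linq;

public static class ZScoreNormalization
{
    // Standardizes values to mean 0 and standard deviation 1.
    public static double[] Normalize(double[] values)
    {
        if (values is null || values.Length == 0)
            throw new ArgumentException("Input must be a non-empty array.", nameof(values));

        double mean = values.Average();
        double variance = values.Sum(v => (v - mean) * (v - mean)) / values.Length;
        double stdDev = Math.Sqrt(variance);

        // Constant input: every value equals the mean, so every z-score is 0.
        if (stdDev == 0)
            return new double[values.Length];

        return values.Select(v => (v - mean) / stdDev).ToArray();
    }
}
```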
Adding new Normalization options (#2)

* Updated language version to use latest

Updated target frameworks to widen the range. Added a bunch of new metrics such as R2, standard deviation, standard error, etc.

* Added normalization code structure and examples

Added log normalization and decimal normalization. Updated example code. Cleaned up exceptions to include more info. Added some code documentation.
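A sketch of decimal scaling normalization, one of the two options this PR mentions, following the textbook definition (divide by 10^j, where j is the smallest integer such that max |x| / 10^j < 1); names are hypothetical, not AiDotNet's API:

```csharp
using System;
using System.Linq;

public static class DecimalNormalization
{
    // Decimal scaling: divides every value by 10^j so that all outputs
    // fall in the open interval (-1, 1).
    public static double[] Normalize(double[] values)
    {
        if (values is null || values.Length == 0)
            throw new ArgumentException("Input must be a non-empty array.", nameof(values));

        double maxAbs = values.Max(Math.Abs);
        if (maxAbs == 0)
            return (double[])values.Clone(); // all zeros are already normalized

        // Smallest j with maxAbs / 10^j < 1; e.g. maxAbs = 875 gives j = 3.
        int j = (int)Math.Floor(Math.Log10(maxAbs)) + 1;
        double divisor = Math.Pow(10, j);

        return values.Select(v => v / divisor).ToArray();
    }
}
```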
Merge branch 'master' of https://github.com/ooples/AiDotNet