Claude/mixed precision training architecture 011 cv13wm js ax6s gj6 ryu37k #479
Conversation
Summary by CodeRabbit
Walkthrough

This pull request introduces comprehensive mixed-precision neural network training support to AiDotNet. Changes include a new MixedPrecisionContext, LossScaler, and MixedPrecisionTrainingLoop (see the sequence diagram below).
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant TrainLoop as MixedPrecisionTrainingLoop
    participant Network as NeuralNetworkBase
    participant LossFunc as ILossFunction
    participant Optimizer as IGradientBasedOptimizer
    participant MPContext as MixedPrecisionContext
    participant LossScaler

    Client->>TrainLoop: TrainStep(input, target)
    TrainLoop->>MPContext: CastWeightsToFP16()
    Note over MPContext: Master Weights (FP32) → Working Weights (FP16)
    TrainLoop->>Network: Forward(inputFP16)
    Network-->>TrainLoop: outputFP16
    TrainLoop->>LossFunc: Calculate(outputFP16, targetFP32)
    LossFunc-->>TrainLoop: lossFP32
    TrainLoop->>LossScaler: ScaleLoss(lossFP32)
    LossScaler-->>TrainLoop: scaledLoss
    TrainLoop->>Network: Backward(scaledLoss)
    Network-->>TrainLoop: gradientsFP16
    TrainLoop->>MPContext: PrepareGradientsForUpdate(gradientsFP16)
    MPContext->>LossScaler: UnscaleGradients(gradientsFP16)
    LossScaler->>LossScaler: DetectOverflow()
    alt Overflow Detected
        LossScaler->>LossScaler: SkippedUpdates++, ReduceScale()
        LossScaler-->>MPContext: false
        MPContext-->>TrainLoop: false
        TrainLoop-->>Client: false (step skipped)
    else No Overflow
        LossScaler-->>MPContext: true, gradientsFP32
        MPContext->>Optimizer: ApplyGradientsWithMixedPrecision()
        Optimizer->>MPContext: UpdateMasterWeights(gradients, learningRate)
        MPContext-->>Optimizer: ✓
        Optimizer-->>TrainLoop: ✓
        LossScaler->>LossScaler: CheckGrowthInterval(), PotentiallyGrowScale()
        TrainLoop-->>Client: true (step applied)
    end
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
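For readers following the diagram, here is a compact, self-contained C# sketch of the same flow. It is illustrative only: the type and member names that appear in the diagram (MixedPrecisionTrainingLoop, MixedPrecisionContext, LossScaler, TrainStep, CastWeightsToFP16, UpdateMasterWeights) come from this PR, but every signature, constant, and helper shape below is an assumption, not the actual AiDotNet implementation (e.g. UnscaleGradients is rendered here as a TryUnscaleGradients that folds in overflow detection).

```csharp
// Sketch of the mixed-precision step; requires .NET 5+ for System.Half.
using System;
using System.Linq;

// Dynamic loss scaler: multiplies the loss up before backprop so small FP16
// gradients do not flush to zero, then divides the gradients back down.
public sealed class LossScaler
{
    public float Scale { get; private set; } = 65536f; // assumed initial scale (2^16)
    public int SkippedUpdates { get; private set; }

    private int _stepsSinceOverflow;
    private const int GrowthInterval = 2000;   // assumed: grow after this many clean steps
    private const float GrowthFactor = 2f;
    private const float BackoffFactor = 0.5f;

    public float ScaleLoss(float loss) => loss * Scale;

    // Unscales in place; on Inf/NaN (FP16 overflow) backs off the scale and
    // reports failure so the caller skips the step ("DetectOverflow" branch).
    public bool TryUnscaleGradients(float[] gradients)
    {
        for (int i = 0; i < gradients.Length; i++)
        {
            gradients[i] /= Scale;
            if (float.IsNaN(gradients[i]) || float.IsInfinity(gradients[i]))
            {
                SkippedUpdates++;           // SkippedUpdates++ in the diagram
                _stepsSinceOverflow = 0;
                Scale *= BackoffFactor;     // ReduceScale()
                return false;
            }
        }
        if (++_stepsSinceOverflow >= GrowthInterval)
        {
            Scale *= GrowthFactor;          // PotentiallyGrowScale()
            _stepsSinceOverflow = 0;
        }
        return true;
    }
}

// Holds FP32 master weights; FP16 working copies are what forward/backward see.
public sealed class MixedPrecisionContext
{
    public float[] MasterWeights { get; }

    public MixedPrecisionContext(float[] initial) => MasterWeights = (float[])initial.Clone();

    public Half[] CastWeightsToFP16() => MasterWeights.Select(w => (Half)w).ToArray();

    public void UpdateMasterWeights(float[] gradients, float learningRate)
    {
        for (int i = 0; i < MasterWeights.Length; i++)
            MasterWeights[i] -= learningRate * gradients[i];  // update stays in FP32
    }
}

public sealed class MixedPrecisionTrainingLoop
{
    private readonly MixedPrecisionContext _ctx;
    private readonly LossScaler _scaler = new();

    public MixedPrecisionTrainingLoop(MixedPrecisionContext ctx) => _ctx = ctx;

    // forwardLoss/backward are stand-ins for NeuralNetworkBase.Forward/Backward
    // plus ILossFunction.Calculate in the diagram.
    public bool TrainStep(Func<Half[], float> forwardLoss,
                          Func<float, float[]> backward,
                          float learningRate)
    {
        Half[] working = _ctx.CastWeightsToFP16();        // FP32 -> FP16
        float loss = forwardLoss(working);                // FP16 forward, FP32 loss
        float[] grads = backward(_scaler.ScaleLoss(loss));

        if (!_scaler.TryUnscaleGradients(grads))
            return false;                                 // overflow: step skipped

        _ctx.UpdateMasterWeights(grads, learningRate);
        return true;                                      // step applied
    }
}
```

The design point the diagram encodes: overflow in the unscaled gradients skips the entire update and shrinks the scale, a long run of clean steps grows it back, and the FP32 master weights are the only state the optimizer ever mutates.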
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Force-pushed 0501169 to e8b3fb5:

Enhanced ConfigureMixedPrecision() documentation in both PredictionModelBuilder.cs and IPredictionModelBuilder.cs to clearly explain technical constraints:

1. Type Constraint: float only
   - Mixed-precision converts between FP32 (float) and FP16 (Half)
   - Cannot use double, decimal, or integer types
2. Gradient-Based Optimizers Only
   - Requires gradient computation for loss scaling, master weights, and gradient accumulation
   - Does NOT work with non-gradient methods (genetic algorithms, random search, Bayesian optimization)
3. Neural Networks (Recommended)
   - Best suited for networks with large parameter counts
   - Requires a GPU with Tensor Core support for the 2-3x speedup
   - Provides 50% memory reduction for massive models

Also removed temporary development scripts from the scripts/ directory:

- add-half-conditional*.py (conditional compilation helpers)
- add-half-ifdef.sh (development utility)
- check-encoding.sh (encoding validation)
- fix-encoding.py (encoding repair)
- fix-half-conditionals.py (development utility)

These were accidentally committed during the Half type conditional compilation work. The launch-distributed-training scripts were kept, as they are part of the public API.
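As a reading aid, here is a minimal call-site sketch of the three constraints the new documentation spells out. The fluent builder shape and generic parameter are assumptions inferred from the file names above; only ConfigureMixedPrecision() itself is named in this commit.

```csharp
// Hypothetical usage sketch -- not a verified AiDotNet signature.
// Constraint 1: the numeric type must be float, because mixed precision
// converts between FP32 (float) and FP16 (System.Half).
var builder = new PredictionModelBuilder<float>()
    .ConfigureMixedPrecision();

// Violates constraint 1 -- double/decimal/int cannot round-trip through Half:
// var invalid = new PredictionModelBuilder<double>().ConfigureMixedPrecision();

// Constraint 2: pair with a gradient-based optimizer. Loss scaling, master
// weights, and gradient accumulation all operate on gradients, so non-gradient
// methods (genetic algorithms, random search, Bayesian optimization) are out.

// Constraint 3: intended for neural networks with large parameter counts on a
// Tensor Core GPU, where the 2-3x speedup and 50% memory savings apply.
```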
Force-pushed e8b3fb5 to 81c56f6.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
User Story / Context
merge-dev2-to-master

Summary
Verification
Copilot Review Loop (Outcome-Based)
Record counts before/after your last push:
Files Modified
Notes