added safe data location #25
Closed
Issue Summary
We added custom charset handling (`decodeStringByCharSet` and `decodeStringWithEncoder`) to properly decode strings with different character encodings (latin1, utf8mb3, utf8mb4, gbk, etc.) from MySQL binlog events. However, this custom implementation did not account for an edge case in which the data slice is completely empty.
The Root Problem
The original go-mysql library's `decodeString` function just reads raw bytes without charset conversion. Our fork enhanced this by adding:
- `decodeStringByCharSet` - Routes string decoding based on charset
- `decodeStringWithEncoder` - Uses Go's `golang.org/x/text/encoding` packages to properly decode strings
The Bug
In `decodeStringWithEncoder`, we immediately access `data[0]` to read the string length without checking whether `data` has any content. This causes an "index out of range" panic when the binlog data for a VARCHAR column is unexpectedly empty.
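The failure mode can be reproduced in isolation. The following is a minimal sketch, not the fork's actual code: `readLengthPrefixed` and `tryRead` are hypothetical names, and the one-byte length prefix is an assumption about the layout being read. It only illustrates that indexing `data[0]` on an empty slice panics at runtime.

```go
package main

import "fmt"

// readLengthPrefixed is a hypothetical stand-in for the unguarded read
// described above: it assumes a one-byte length prefix followed by the
// string bytes, and it does NOT check that data is non-empty.
func readLengthPrefixed(data []byte) (s string, n int) {
	length := int(data[0]) // panics with "index out of range" if len(data) == 0
	n = length + 1
	return string(data[1:n]), n
}

// tryRead converts the runtime panic into an error so the failure can be
// observed without crashing the program.
func tryRead(data []byte) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	s, _ = readLengthPrefixed(data)
	return s, nil
}

func main() {
	fmt.Println(tryRead([]byte{3, 'a', 'b', 'c'})) // well-formed input decodes normally
	fmt.Println(tryRead(nil))                      // empty input triggers the panic
}
```

Recovering here is only a way to make the panic visible in a small program; it is not meant as the remedy.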
The Impact
The client was missing data because when this panic occurred, we skipped the entire batch of events rather than just the problematic row. This meant legitimate data was being dropped whenever we hit this edge case.
The Fix
Add a safety check at the beginning of `decodeStringWithEncoder` that returns early when `data` is empty. This prevents the panic and gracefully handles the empty data case by returning an empty string, allowing the rest of the batch to process normally.
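A guard of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: `decodeVarString` is a hypothetical, dependency-free stand-in for `decodeStringWithEncoder`, the `maxLen` parameter and the one-byte versus two-byte little-endian length prefix are assumptions about the VARCHAR layout, and charset conversion is left out entirely. The truncation checks go beyond what the description above states and are included only to show how the same idea extends to partially filled slices.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeVarString is a hypothetical stand-in for decodeStringWithEncoder.
// maxLen is assumed to be the column's declared maximum byte length, which
// selects a one-byte prefix (< 256) or a two-byte little-endian prefix.
// It returns the raw string, the number of bytes consumed, and an error.
func decodeVarString(data []byte, maxLen int) (s string, n int, err error) {
	// The safety check: an empty slice yields an empty string instead of a panic.
	if len(data) == 0 {
		return "", 0, nil
	}

	prefix := 1
	if maxLen >= 256 {
		prefix = 2
	}
	// Extra check (an assumption, not part of the described fix):
	// the slice must be long enough to hold the length prefix itself.
	if len(data) < prefix {
		return "", 0, fmt.Errorf("short length prefix: have %d bytes, need %d", len(data), prefix)
	}

	var length int
	if prefix == 1 {
		length = int(data[0])
	} else {
		length = int(binary.LittleEndian.Uint16(data))
	}

	n = prefix + length
	// Extra check (also an assumption): the payload must not be truncated.
	if len(data) < n {
		return "", 0, fmt.Errorf("truncated string: have %d bytes, need %d", len(data), n)
	}
	return string(data[prefix:n]), n, nil
}

func main() {
	fmt.Println(decodeVarString(nil, 255))                       // empty input, no panic
	fmt.Println(decodeVarString([]byte{2, 'h', 'i'}, 255))       // one-byte prefix
	fmt.Println(decodeVarString([]byte{2, 0, 'h', 'i'}, 300))    // two-byte prefix
}
```

Returning an error for truncated input, rather than an empty string, is one possible design choice: it would let a caller skip only the affected row instead of the whole batch.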