@eitamring eitamring commented Aug 31, 2025

Issue Summary

We added custom charset handling functionality (decodeStringByCharSet and decodeStringWithEncoder) to properly decode strings with different character encodings (latin1, utf8mb3, utf8mb4, gbk, etc.) from MySQL binlog events. However, this custom implementation didn't account for an edge case where the data slice could be completely empty.

The Root Problem

The original go-mysql library's decodeString function just reads raw bytes without charset conversion. Our fork enhanced this by:

  1. Adding decodeStringByCharSet - Routes string decoding based on charset
  2. Adding decodeStringWithEncoder - Uses Go's golang.org/x/text/encoding packages to properly decode strings
  3. Adding smart quote normalization - Handles special characters that some encodings don't support
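As a rough illustration of the charset routing described in the list above, here is a minimal sketch. It is not the fork's code: `decodeByCharset` and `decodeLatin1` are hypothetical names, and a hand-rolled ISO-8859-1 decoder stands in for the golang.org/x/text/encoding decoders so the example needs only the standard library. It also treats latin1 as plain ISO-8859-1, whereas MySQL's latin1 is closer to Windows-1252.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// decodeLatin1 converts ISO-8859-1 bytes to a UTF-8 Go string.
// Each ISO-8859-1 byte maps to the Unicode code point with the same value.
// Assumption: stands in for an x/text decoder; not the fork's implementation.
func decodeLatin1(raw []byte) string {
	var b strings.Builder
	b.Grow(len(raw))
	for _, c := range raw {
		b.WriteRune(rune(c))
	}
	return b.String()
}

// decodeByCharset is a hypothetical router: it picks a decoder from the
// column charset name and falls back to the raw bytes for unknown charsets.
func decodeByCharset(raw []byte, charset string) string {
	switch strings.ToLower(charset) {
	case "latin1":
		return decodeLatin1(raw)
	case "utf8", "utf8mb3", "utf8mb4":
		if utf8.Valid(raw) {
			return string(raw)
		}
		// Replace invalid byte sequences instead of passing them through.
		return strings.ToValidUTF8(string(raw), "\uFFFD")
	default:
		return string(raw)
	}
}
```

Calling `decodeByCharset([]byte{0x63, 0x61, 0x66, 0xE9}, "latin1")` yields the UTF-8 string "café", since byte 0xE9 maps to U+00E9.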

The Bug

In decodeStringWithEncoder, we immediately access data[0] to read the string length without checking if data has any content:

    if length < 256 {
        length = int(data[0]) // ← panics if len(data) == 0

This causes an "index out of range" panic when the binlog data for a VARCHAR column is unexpectedly empty.
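The failure can be reproduced in isolation. The sketch below is a simplified stand-in, not the fork's `decodeStringWithEncoder`: `readPrefixedUnchecked` and `panics` are hypothetical names, charset conversion is left out, and the two-byte little-endian branch for declared lengths of 256 or more is an assumption about the surrounding code.

```go
package main

// readPrefixedUnchecked mirrors the unguarded pattern: for declared column
// lengths under 256, the first byte of data holds the string length.
func readPrefixedUnchecked(data []byte, length int) (v string, n int) {
	if length < 256 {
		length = int(data[0]) // index out of range when data is empty
		n = length + 1
		v = string(data[1:n])
	} else {
		// Assumed: two-byte little-endian length prefix for longer columns.
		length = int(data[0]) | int(data[1])<<8
		n = length + 2
		v = string(data[2:n])
	}
	return v, n
}

// panics reports whether calling f triggers a runtime panic.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}
```

With these definitions, `panics(func() { readPrefixedUnchecked(nil, 10) })` reports true, while a well-formed input such as `[]byte{2, 'h', 'i'}` decodes to "hi" and consumes 3 bytes.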

The Impact

The client was missing data because when this panic occurred, we skipped the entire batch of events rather than just the problematic row. This meant legitimate data was being dropped whenever we hit this edge case.

The Fix

Add a safety check at the beginning of decodeStringWithEncoder:

    if len(data) == 0 {
        return "", 0
    }

This prevents the panic and gracefully handles the empty data case by returning an empty string, allowing the rest of the batch to process normally.
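To show where the guard sits relative to the length-prefix read, here is a minimal sketch under the same simplifications as before: `readPrefixedGuarded` is a hypothetical name, charset conversion is omitted, and the two-byte branch is an assumption rather than the fork's actual code.

```go
package main

// readPrefixedGuarded applies the empty-slice check before touching data[0].
func readPrefixedGuarded(data []byte, length int) (v string, n int) {
	if len(data) == 0 {
		return "", 0 // nothing to decode; report zero bytes consumed
	}
	if length < 256 {
		length = int(data[0]) // one-byte length prefix
		n = length + 1
		v = string(data[1:n])
	} else {
		// Assumed: two-byte little-endian length prefix for longer columns.
		length = int(data[0]) | int(data[1])<<8
		n = length + 2
		v = string(data[2:n])
	}
	return v, n
}
```

Note that this check covers only the fully empty slice. In this sketch, a non-empty but truncated slice (for example, a single byte when a two-byte prefix is expected) would still panic, so a stricter length check may be worth considering separately.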

snyk-io bot commented Aug 31, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found.

license/snyk check is complete. No issues have been found.

    case MYSQL_TYPE_NULL:
        return nil, 0, nil
    case MYSQL_TYPE_LONG:
        if len(data) < 4 {

The decodeValue function has some unit tests. Please add this case also to unit tests

}

func decodeStringByCharSet(data []byte, charset string, length int) (v string, n int) {
    if len(data) == 0 {

The decodeValue function has some unit tests. Please add this case also to unit tests
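One possible shape for the requested test is a table-driven case list that includes nil and zero-length input. This is only a sketch: `decodeLenPrefixed` is a hypothetical stand-in for the decoder under test, and `emptyDataCases` and `TestDecodeEmptyData` are illustrative names, not the existing decodeValue tests.

```go
package main

import "testing"

// decodeLenPrefixed is a hypothetical stand-in for the decoder under test:
// a one-byte length prefix followed by the string bytes, with the empty guard.
func decodeLenPrefixed(data []byte) (string, int) {
	if len(data) == 0 {
		return "", 0
	}
	n := int(data[0]) + 1
	return string(data[1:n]), n
}

// emptyDataCases lists the edge cases alongside one ordinary input.
var emptyDataCases = []struct {
	name  string
	data  []byte
	wantV string
	wantN int
}{
	{"nil slice", nil, "", 0},
	{"zero-length slice", []byte{}, "", 0},
	{"zero-length string", []byte{0}, "", 1},
	{"short string", []byte{2, 'h', 'i'}, "hi", 3},
}

func TestDecodeEmptyData(t *testing.T) {
	for _, tc := range emptyDataCases {
		t.Run(tc.name, func(t *testing.T) {
			v, n := decodeLenPrefixed(tc.data)
			if v != tc.wantV || n != tc.wantN {
				t.Errorf("got (%q, %d), want (%q, %d)", v, n, tc.wantV, tc.wantN)
			}
		})
	}
}
```

In a real test file the stand-in would be replaced by a call to the actual decoder, and the package clause would match the package under test.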

@eitamring eitamring closed this Aug 31, 2025