byte vector is incorrectly decoded as utf-8 string in ft result class #2275

@AnneYang720

Version:

```
$ pip3 show redis
Name: redis
Version: 4.3.4
```

Platform:
Python 3.9.2 on Debian GNU/Linux 11

Description: In vector search results, the raw bytes of the vector field are converted to a string, and this conversion is lossy. Any byte that is not valid UTF-8, such as b'\x80', is silently dropped, so the returned string no longer matches the stored bytes.

Example Code

```python
from redis import Redis
from redis.commands.search.field import VectorField
from redis.commands.search.query import Query

r = Redis(host='localhost', port=6379)

# 1-dimensional FLOAT32 vector index on field "v"
schema = (VectorField("v", "HNSW", {"TYPE": "FLOAT32", "DIM": 1, "DISTANCE_METRIC": "L2"}),)
r.ft().create_index(schema)

# Store a vector whose first byte (0x80) is not valid UTF-8 on its own
r.hset(f'{1}', mapping={'v': b'\x80\x00\x00\x00'})

q = Query("*=>[KNN 1 @v $vec AS vector_score]").dialect(2)
results = r.ft().search(q, query_params={"vec": b'\x80\x00\x00\x00'}).docs
for m in results:
    print(m.v)
    print('match emb =', bytes(m.v, 'utf-8'))
```
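For context, the 4-byte value in the example is a single FLOAT32 component. The sketch below packs and unpacks such vectors with Python's standard `struct` module. The little-endian byte order and the helper names `pack_float32`/`unpack_float32` are assumptions for illustration, not part of redis-py. It also shows that ordinary values are affected: 1.0 packs to `b'\x00\x00\x80?'`, which contains a 0x80 byte as well.

```python
import struct

def pack_float32(vec):
    """Pack a list of floats as little-endian FLOAT32 bytes (assumed byte order)."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack_float32(blob):
    """Unpack little-endian FLOAT32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

blob = pack_float32([1.0])
print(blob)                                  # b'\x00\x00\x80?'
print(blob.decode("utf-8", "ignore") == "\x00\x00?")  # True: the 0x80 byte is lost

# The vector from the issue is a tiny denormal float when read as FLOAT32
print(unpack_float32(b"\x80\x00\x00\x00"))
```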

The original bytes b'\x80\x00\x00\x00' come back as the string '\x00\x00\x00': the leading 0x80 byte has been dropped.
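The loss can be reproduced without a Redis server. A minimal sketch that applies the same `decode("utf-8", "ignore")` call used by `to_string()` to the vector bytes from the example:

```python
raw = b"\x80\x00\x00\x00"

# What to_string() does: bytes that are not valid UTF-8 are silently dropped
lossy = raw.decode("utf-8", "ignore")
print(repr(lossy))                    # '\x00\x00\x00'
print(len(raw), len(lossy))           # 4 3

# Encoding the result back does not restore the original vector
print(lossy.encode("utf-8") == raw)   # False

# A strict decode surfaces the problem instead of hiding it
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode fails:", exc.reason)  # invalid start byte
```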

Reason

```python
# /redis/commands/search/result.py
dict(
    dict(
        zip(
            map(to_string, res[i + fields_offset][::2]),
            map(to_string, res[i + fields_offset][1::2]),
        )
    )
)

# /redis/commands/search/_util.py
def to_string(s):
    if isinstance(s, str):
        return s
    elif isinstance(s, bytes):
        return s.decode("utf-8", "ignore")  # here! "ignore" drops invalid bytes such as 0x80
    else:
        return s
```
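One possible direction for a fix, sketched under assumptions: decode strictly and fall back to the raw bytes when a value is not valid UTF-8. `to_string_or_bytes` is a hypothetical helper, not an existing redis-py function, and the field list below is a simulated flat reply rather than real server output.

```python
def to_string_or_bytes(s):
    """Hypothetical replacement for to_string(): decode text, keep binary data."""
    if isinstance(s, bytes):
        try:
            return s.decode("utf-8")  # strict: raises on invalid UTF-8
        except UnicodeDecodeError:
            return s  # not text (e.g. a FLOAT32 vector): keep the raw bytes
    return s

# Simulated flat reply: [field, value, field, value, ...]
fields = [b"v", b"\x80\x00\x00\x00", b"vector_score", b"0"]
doc = dict(
    zip(
        map(to_string_or_bytes, fields[::2]),
        map(to_string_or_bytes, fields[1::2]),
    )
)
print(doc)  # {'v': b'\x80\x00\x00\x00', 'vector_score': '0'}
```

A strict decode never alters data: it either returns a str that encodes back to the same bytes or returns the bytes unchanged. The trade-off is that a vector blob that happens to be valid UTF-8 (for example an all-zero vector) still comes back as a str, so a per-field option to skip decoding for binary fields may be the more robust design.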
