Skip to content

Conversation

@itayTziv
Copy link
Contributor

@itayTziv itayTziv commented Oct 10, 2025

@jit-ci
Copy link

jit-ci bot commented Oct 10, 2025

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

REDIS_SERVER_NAME=redis-server$(PROG_SUFFIX)
REDIS_SENTINEL_NAME=redis-sentinel$(PROG_SUFFIX)
REDIS_SERVER_OBJ=threads_mngr.o memory_prefetch.o adlist.o quicklist.o ae.o anet.o dict.o ebuckets.o eventnotifier.o iothread.o mstr.o kvstore.o fwtree.o estore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o lolwut8.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o logreqres.o
REDIS_SERVER_OBJ=threads_mngr.o memory_prefetch.o adlist.o quicklist.o ae.o anet.o dict.o ebuckets.o eventnotifier.o iothread.o mstr.o kvstore.o fwtree.o estore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o lolwut8.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o logreqres.o sobj.o
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated: add test module

src/config.c Outdated
createBoolConfig("hide-user-data-from-log", NULL, MODIFIABLE_CONFIG, server.hide_user_data_from_log, 0, NULL, NULL),
createBoolConfig("lazyexpire-nested-arbitrary-keys", NULL, MODIFIABLE_CONFIG | HIDDEN_CONFIG, server.lazyexpire_nested_arbitrary_keys, 1, NULL, NULL),
createBoolConfig("cluster-slot-stats-enabled", NULL, MODIFIABLE_CONFIG, server.cluster_slot_stats_enabled, 0, NULL, NULL),
createBoolConfig("shared-strings-enabled", NULL, IMMUTABLE_CONFIG, server.shared_strings_enabled, 1, NULL, NULL),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can't find where that's used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting the grounds for it's usage, and prevent conflicts
https://redislabs.atlassian.net/browse/RED-170638

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we wanna add shared infra in the upstream, let's do what without public interfaces and backwards compatibility requirements.
or at the very least use HIDDEN_CONFIG

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add proper self-sufficient description to the PR

src/config.c Outdated
createBoolConfig("hide-user-data-from-log", NULL, MODIFIABLE_CONFIG, server.hide_user_data_from_log, 0, NULL, NULL),
createBoolConfig("lazyexpire-nested-arbitrary-keys", NULL, MODIFIABLE_CONFIG | HIDDEN_CONFIG, server.lazyexpire_nested_arbitrary_keys, 1, NULL, NULL),
createBoolConfig("cluster-slot-stats-enabled", NULL, MODIFIABLE_CONFIG, server.cluster_slot_stats_enabled, 0, NULL, NULL),
createBoolConfig("shared-strings-enabled", NULL, IMMUTABLE_CONFIG, server.shared_strings_enabled, 1, NULL, NULL),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we wanna add shared infra in the upstream, let's do what without public interfaces and backwards compatibility requirements.
or at the very least use HIDDEN_CONFIG

src/sobj.c Outdated
Comment on lines 63 to 68
void sobj_free_raw(sobj *o)
{
serverAssert(o->refcount == 1);
serverLog(LL_NOTICE, "Freeing shared-string %p, %d, (%p) '%s'", (void*)o, o->refcount, o->ptr, (char*)o->ptr);
decrRefCount(o);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot find good reason for this function. Better call directly to decrRefCount()

src/sobj.c Outdated
Comment on lines 112 to 116
if (o->refcount == 1) {
sobj_free_raw(o);
} else {
--o->refcount;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you replace sobj_free_raw() with direct call to decrRefCount(), then all this if-else will have direct call to decrRefCount()

sobj *new_obj = createStringObject(init, initlen);
serverAssert(new_obj->ptr);
dictEntry *existing_de;
if (dictAddRaw(pool, new_obj, &existing_de)) { /* No applicable shared-string found */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If adding to the pool is treated as its own incrRefCount(), the reference count logic will become simpler. Such as, no need for manual --o->refcount. Destructors can just decrRefCount. Replacing a key in the pool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed 👍

src/sobj.c Outdated
serverAssert(new_obj->ptr);
dictEntry *existing_de;
if (dictAddRaw(pool, new_obj, &existing_de)) { /* No applicable shared-string found */
serverLog(LL_NOTICE, "Added new shared-string %p, %d, (%p) '%s'", (void*)new_obj, new_obj->refcount, new_obj->ptr, (char*)new_obj->ptr);
Copy link
Collaborator

@moticless moticless Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid notice logs for debug printouts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guybe7 @oranagra
from production perspective, maybe we want to keep at least some of the logs? these can be used to understanding the real flow under the hood, though i'm not sure if it's too intrusive for OSS
I also ran some benchmarks and they seem to add abt ~8ms on avg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are 3 concerns this print seems to break.

one is that it could flood the log file since it happens too many times, from that perspective it should be made DEBUG or at the very least VERBOSE.

the other is that IIUC it prints user data, which we're generally not always permitted to print to log files (see hide-user-data-from-log).

and lastly, since this print is in a low level component (such as dict, rax, sds), you can't assume how the application is using it and for what. such components should completely avoid printing to the log (or even including server.h), and leave these jobs to the caller.

unsigned long cursor;
} defragModuleCtx;

/* Defrag helper for bare-bone allocation without freeing old pointer, and ignoring defrag hint. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra @moticless how are we feeling about the changes in defrag.c?

note that the purpose of these changes is to allow sobj_defrag to create a new sobj without using tcache (so basically we wouldn't need to touch defrag.c if we had dupStringObjectNoTcache)

if we think that anyway we would need these sorts of "defrag-no-free" changes in the future then all is good, but generally speaking, since the purpose is to duplicate an object without using tcache, maybe a better approach is to create "dup-no-tcache" functions in object.c)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question is if we think that allocating an object without using tcache is more of a defrag.c code or object.c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the reason to use NoTcache is defrag related, so i'm ok adding such capabilities to defrag.c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that said... taking a look at the defrag.c modifications, IIRC i told @itayTziv that i think that just blindly duplicating all objects is the wrong approach.
@itayTziv do you remember the details from that conversation? IIRC you presented the reason you do that and i gave you an alternative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue was with the possibility that robj will be defraged and sds will not. this could lead to robjA freeing an sds that's still in use by robjB. solution was to tie them together (if one's deffragged, so must the other..).
I also raised concern that if we don't defrag all, at the end of the full-cycle we'll have duplicate memory we continue to run with. you said it's ok and can wait for the next defrag triggered

on another note - after rediscussing with @inbaryuval and @guybe7 we decided to not add the feature to OSS yet, because of other features that redis supports (like async-flush) or will support (like ASM) that'll break the current implementation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so are you going to do that (if one is duplicated, duplicate both, but if neither required being moved, don't move either)?

.keys_are_odd = 0
};

dict *sobj_init(void)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please create a clear seperation between pool-related functions and obj-related functions

createBoolConfig("hide-user-data-from-log", NULL, MODIFIABLE_CONFIG, server.hide_user_data_from_log, 0, NULL, NULL),
createBoolConfig("lazyexpire-nested-arbitrary-keys", NULL, MODIFIABLE_CONFIG | HIDDEN_CONFIG, server.lazyexpire_nested_arbitrary_keys, 1, NULL, NULL),
createBoolConfig("cluster-slot-stats-enabled", NULL, MODIFIABLE_CONFIG, server.cluster_slot_stats_enabled, 0, NULL, NULL),
createBoolConfig("shared-strings-enabled", NULL, IMMUTABLE_CONFIG | HIDDEN_CONFIG, server.shared_strings_enabled, 1, NULL, NULL),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can find a better name for this config

first we need to decide if we can see this feautre used in redis core or only for modules

in addition, since we are already adding this config, let's make use it - make sure it's a only-startup config and edit sobj.c and possibly module.c accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically it's a hint for data-type codepaths, to know if they should use shared-strings in ther implementations
the reason you don't see it in use in core is because no core data-type decided to perform this optimization (yet, perhaps it could be used in other string based data-types)

if the core decides this opt will never be relevant, than the "right" thing to do it to rem it here and add it only on our fork instead of creating some stub usage for that

return o;
}

RedisModulePool *RM_CreateSharePool(void) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need several of these pools? maybe there should be only one pool, where all modules (and possibly redis core) save their shared strings to

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if we do decide to have several pools (let's say one for each module and one for redis core) i think that the pool (dict) should be created on module initialization and kept within the struct RedisModule

so the API should be

RM_CreateSharedString(const char *ptr, size_t len) RM_FreeSharedString(RedisModuleString *str) RM_DefragSharedString(RedisModuleDefragCtx *ctx, RedisModuleString *str) 
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should even be more explicit and define a new type RedisModuleSharedString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are pros and cons for mono/multi-pool approaches. for e.g. single pool will have larger refcounts, which means longer time to defrag (in which 2 copies are sustained...). you may chose your approach
regarding your suggestions - just yesterday I was talking to @moticless and he was worried the existing API will be too specific and tailored to specific use case. I believe it's actually very generic, but the above mentioned suggestions certainly drive it in this direction (which OSS seems to dislike).
e.g. RedisModuleSharedString couldn't be used with any of the other RMS APIs? why?
what if I want RedisModule with 2 pools? or a single pool shared between two of my modules?

please note the repeating argument is that you could functionally perform any of your suggestions with the existing API, whilst also being able to do other things we don't think on currently

@itayTziv itayTziv closed this Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants