-
Couldn't load subscription status.
- Fork 24.3k
RED-167994: shared-strings API #14425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RED-167994: shared-strings API #14425
Conversation
| Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
| REDIS_SERVER_NAME=redis-server$(PROG_SUFFIX) | ||
| REDIS_SENTINEL_NAME=redis-sentinel$(PROG_SUFFIX) | ||
| REDIS_SERVER_OBJ=threads_mngr.o memory_prefetch.o adlist.o quicklist.o ae.o anet.o dict.o ebuckets.o eventnotifier.o iothread.o mstr.o kvstore.o fwtree.o estore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o lolwut8.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o logreqres.o | ||
| REDIS_SERVER_OBJ=threads_mngr.o memory_prefetch.o adlist.o quicklist.o ae.o anet.o dict.o ebuckets.o eventnotifier.o iothread.o mstr.o kvstore.o fwtree.o estore.o server.o sds.o zmalloc.o lzf_c.o lzf_d.o pqsort.o zipmap.o sha1.o ziplist.o release.o networking.o util.o object.o db.o replication.o rdb.o t_string.o t_list.o t_set.o t_zset.o t_hash.o config.o aof.o pubsub.o multi.o debug.o sort.o intset.o syncio.o cluster.o cluster_legacy.o cluster_slot_stats.o crc16.o endianconv.o slowlog.o eval.o bio.o rio.o rand.o memtest.o syscheck.o crcspeed.o crccombine.o crc64.o bitops.o sentinel.o notify.o setproctitle.o blocked.o hyperloglog.o latency.o sparkline.o redis-check-rdb.o redis-check-aof.o geo.o lazyfree.o module.o evict.o expire.o geohash.o geohash_helper.o childinfo.o defrag.o siphash.o rax.o t_stream.o listpack.o localtime.o lolwut.o lolwut5.o lolwut6.o lolwut8.o acl.o tracking.o socket.o tls.o sha256.o timeout.o setcpuaffinity.o monotonic.o mt19937-64.o resp_parser.o call_reply.o script_lua.o script.o functions.o function_lua.o commands.o strl.o connection.o unix.o logreqres.o sobj.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated: add test module
src/config.c Outdated
| createBoolConfig("hide-user-data-from-log", NULL, MODIFIABLE_CONFIG, server.hide_user_data_from_log, 0, NULL, NULL), | ||
| createBoolConfig("lazyexpire-nested-arbitrary-keys", NULL, MODIFIABLE_CONFIG | HIDDEN_CONFIG, server.lazyexpire_nested_arbitrary_keys, 1, NULL, NULL), | ||
| createBoolConfig("cluster-slot-stats-enabled", NULL, MODIFIABLE_CONFIG, server.cluster_slot_stats_enabled, 0, NULL, NULL), | ||
| createBoolConfig("shared-strings-enabled", NULL, IMMUTABLE_CONFIG, server.shared_strings_enabled, 1, NULL, NULL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't find where that's used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting the grounds for it's usage, and prevent conflicts
https://redislabs.atlassian.net/browse/RED-170638
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we wanna add shared infra in the upstream, let's do what without public interfaces and backwards compatibility requirements.
or at the very least use HIDDEN_CONFIG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add proper self-sufficient description to the PR
src/config.c Outdated
| createBoolConfig("hide-user-data-from-log", NULL, MODIFIABLE_CONFIG, server.hide_user_data_from_log, 0, NULL, NULL), | ||
| createBoolConfig("lazyexpire-nested-arbitrary-keys", NULL, MODIFIABLE_CONFIG | HIDDEN_CONFIG, server.lazyexpire_nested_arbitrary_keys, 1, NULL, NULL), | ||
| createBoolConfig("cluster-slot-stats-enabled", NULL, MODIFIABLE_CONFIG, server.cluster_slot_stats_enabled, 0, NULL, NULL), | ||
| createBoolConfig("shared-strings-enabled", NULL, IMMUTABLE_CONFIG, server.shared_strings_enabled, 1, NULL, NULL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we wanna add shared infra in the upstream, let's do what without public interfaces and backwards compatibility requirements.
or at the very least use HIDDEN_CONFIG
src/sobj.c Outdated
| void sobj_free_raw(sobj *o) | ||
| { | ||
| serverAssert(o->refcount == 1); | ||
| serverLog(LL_NOTICE, "Freeing shared-string %p, %d, (%p) '%s'", (void*)o, o->refcount, o->ptr, (char*)o->ptr); | ||
| decrRefCount(o); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find good reason for this function. Better call directly to decrRefCount()
src/sobj.c Outdated
| if (o->refcount == 1) { | ||
| sobj_free_raw(o); | ||
| } else { | ||
| --o->refcount; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you replace sobj_free_raw() with direct call to decrRefCount(), then all this if-else will have direct call to decrRefCount()
| sobj *new_obj = createStringObject(init, initlen); | ||
| serverAssert(new_obj->ptr); | ||
| dictEntry *existing_de; | ||
| if (dictAddRaw(pool, new_obj, &existing_de)) { /* No applicable shared-string found */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If adding to the pool is treated as its own incrRefCount(), the reference count logic will become simpler. Such as, no need for manual --o->refcount. Destructors can just decrRefCount. Replacing a key in the pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed 👍
src/sobj.c Outdated
| serverAssert(new_obj->ptr); | ||
| dictEntry *existing_de; | ||
| if (dictAddRaw(pool, new_obj, &existing_de)) { /* No applicable shared-string found */ | ||
| serverLog(LL_NOTICE, "Added new shared-string %p, %d, (%p) '%s'", (void*)new_obj, new_obj->refcount, new_obj->ptr, (char*)new_obj->ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid notice logs for debug printouts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are 3 concerns this print seems to break.
one is that it could flood the log file since it happens too many times, from that perspective it should be made DEBUG or at the very least VERBOSE.
the other is that IIUC it prints user data, which we're generally not always permitted to print to log files (see hide-user-data-from-log).
and lastly, since this print is in a low level component (such as dict, rax, sds), you can't assume how the application is using it and for what. such components should completely avoid printing to the log (or even including server.h), and leave these jobs to the caller.
| unsigned long cursor; | ||
| } defragModuleCtx; | ||
| | ||
| /* Defrag helper for bare-bone allocation without freeing old pointer, and ignoring defrag hint. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oranagra @moticless how are we feeling about the changes in defrag.c?
note that the purpose of these changes is to allow sobj_defrag to create a new sobj without using tcache (so basically we wouldn't need to touch defrag.c if we had dupStringObjectNoTcache)
if we think that anyway we would need these sorts of "defrag-no-free" changes in the future then all is good, but generally speaking, since the purpose is to duplicate an object without using tcache, maybe a better approach is to create "dup-no-tcache" functions in object.c)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question is if we think that allocating an object without using tcache is more of a defrag.c code or object.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the reason to use NoTcache is defrag related, so i'm ok adding such capabilities to defrag.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue was with the possibility that robj will be defraged and sds will not. this could lead to robjA freeing an sds that's still in use by robjB. solution was to tie them together (if one's deffragged, so must the other..).
I also raised concern that if we don't defrag all, at the end of the full-cycle we'll have duplicate memory we continue to run with. you said it's ok and can wait for the next defrag triggered
on another note - after rediscussing with @inbaryuval and @guybe7 we decided to not add the feature to OSS yet, because of other features that redis supports (like async-flush) or will support (like ASM) that'll break the current implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so are you going to do that (if one is duplicated, duplicate both, but if neither required being moved, don't move either)?
| .keys_are_odd = 0 | ||
| }; | ||
| | ||
| dict *sobj_init(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please create a clear seperation between pool-related functions and obj-related functions
| createBoolConfig("hide-user-data-from-log", NULL, MODIFIABLE_CONFIG, server.hide_user_data_from_log, 0, NULL, NULL), | ||
| createBoolConfig("lazyexpire-nested-arbitrary-keys", NULL, MODIFIABLE_CONFIG | HIDDEN_CONFIG, server.lazyexpire_nested_arbitrary_keys, 1, NULL, NULL), | ||
| createBoolConfig("cluster-slot-stats-enabled", NULL, MODIFIABLE_CONFIG, server.cluster_slot_stats_enabled, 0, NULL, NULL), | ||
| createBoolConfig("shared-strings-enabled", NULL, IMMUTABLE_CONFIG | HIDDEN_CONFIG, server.shared_strings_enabled, 1, NULL, NULL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can find a better name for this config
first we need to decide if we can see this feautre used in redis core or only for modules
in addition, since we are already adding this config, let's make use it - make sure it's a only-startup config and edit sobj.c and possibly module.c accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically it's a hint for data-type codepaths, to know if they should use shared-strings in ther implementations
the reason you don't see it in use in core is because no core data-type decided to perform this optimization (yet, perhaps it could be used in other string based data-types)
if the core decides this opt will never be relevant, than the "right" thing to do it to rem it here and add it only on our fork instead of creating some stub usage for that
| return o; | ||
| } | ||
| | ||
| RedisModulePool *RM_CreateSharePool(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need several of these pools? maybe there should be only one pool, where all modules (and possibly redis core) save their shared strings to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if we do decide to have several pools (let's say one for each module and one for redis core) i think that the pool (dict) should be created on module initialization and kept within the struct RedisModule
so the API should be
RM_CreateSharedString(const char *ptr, size_t len) RM_FreeSharedString(RedisModuleString *str) RM_DefragSharedString(RedisModuleDefragCtx *ctx, RedisModuleString *str) There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should even be more explicit and define a new type RedisModuleSharedString
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are pros and cons for mono/multi-pool approaches. for e.g. single pool will have larger refcounts, which means longer time to defrag (in which 2 copies are sustained...). you may chose your approach
regarding your suggestions - just yesterday I was talking to @moticless and he was worried the existing API will be too specific and tailored to specific use case. I believe it's actually very generic, but the above mentioned suggestions certainly drive it in this direction (which OSS seems to dislike).
e.g. RedisModuleSharedString couldn't be used with any of the other RMS APIs? why?
what if I want RedisModule with 2 pools? or a single pool shared between two of my modules?
please note the repeating argument is that you could functionally perform any of your suggestions with the existing API, whilst also being able to do other things we don't think on currently
https://github.com/redislabsdev/Redis/issues/633