Dear all,
Redis crashes when I load a PyTorch model into RedisAI using the redislabs/redismod:edge Docker image. To illustrate the issue, I will use the imagenet example provided in https://github.com/RedisAI/redisai-examples.
I can successfully load the already serialised resnet50 model into RedisAI, but Redis crashes every time I try to load a model that I have serialised myself following the model_saver.py script. I am using PyTorch 1.6 on Python 3.7.
Thanks a lot!
Best regards,
manl
The bug report is as follows:
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1:M 27 Oct 2020 08:18:45.084 # Redis 6.0.1 crashed by signal: 11
1:M 27 Oct 2020 08:18:45.084 # Crashed running the instruction at: 0x7f53c9dca975
1:M 27 Oct 2020 08:18:45.084 # Accessing address: 0x18
1:M 27 Oct 2020 08:18:45.084 # Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
EIP:
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2f975)[0x7f53c9dca975]
Backtrace:
redis-server *:6379(logStackTrace+0x32)[0x562639f61872]
redis-server *:6379(sigsegvHandler+0x9e)[0x562639f61f4e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f53fabac730]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2f975)[0x7f53c9dca975]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2fbb8)[0x7f53c9dcabb8]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit16ScriptTypeParser18parseClassConstantERKNS0_6AssignE+0x8d)[0x7f53ca06154d]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f374af)[0x7f53c9dd24af]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f39e43)[0x7f53c9dd4e43]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a225)[0x7f53c9dd5225]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a30a)[0x7f53c9dd530a]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZNK5torch3jit16ScriptTypeParser17parseTypeFromExprERKNS0_4ExprE+0x1c5)[0x7f53ca063ba5]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f36844)[0x7f53c9dd1844]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f39e43)[0x7f53c9dd4e43]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a225)[0x7f53c9dd5225]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZNK5torch3jit14SourceImporter13loadNamedTypeERKN3c1013QualifiedNameE+0x2e)[0x7f53c9dc84fe]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3bf54)[0x7f53c9dd6f54]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f10e93)[0x7f53c9dabe93]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f17352)[0x7f53c9db2352]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f19d60)[0x7f53c9db4d60]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f1a311)[0x7f53c9db5311]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit21readArchiveAndTensorsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN3c108optionalISt8functionIFNS9_13StrongTypePtrERKNS9_13QualifiedNameEEEEENSA_ISB_IFNS9_13intrusive_ptrINS9_6ivalue6ObjectENS9_6detail34intrusive_target_default_null_typeISL_EEEESC_NS9_6IValueEEEEENSA_INS9_6DeviceEEERN6caffe29serialize19PyTorchStreamReaderE+0x6b2)[0x7f53c9dd6982]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3bc9d)[0x7f53c9dd6c9d]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3e3c4)[0x7f53c9dd93c4]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit4loadESt10unique_ptrIN6caffe29serialize20ReadAdapterInterfaceESt14default_deleteIS4_EEN3c108optionalINS8_6DeviceEEERSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESI_St4hashISI_ESt8equal_toISI_ESaISt4pairIKSI_SI_EEE+0x179)[0x7f53c9dd9bf9]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit4loadERSiN3c108optionalINS2_6DeviceEEERSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_St4hashISC_ESt8equal_toISC_ESaISt4pairIKSC_SC_EEE+0x75)[0x7f53c9dda3f5]
/usr/lib/redis/modules/backends/redisai_torch/redisai_torch.so(torchLoadModel+0x215)[0x7f53fa86b475]
/usr/lib/redis/modules/backends/redisai_torch/redisai_torch.so(RAI_ModelCreateTorch+0x8a)[0x7f53fa8641ea]
/usr/lib/redis/modules/redisai.so(RAI_ModelCreate+0x16d)[0x7f53fa9bc80d]
/usr/lib/redis/modules/redisai.so(RedisAI_ModelSet_RedisCommand+0x91b)[0x7f53fa9b422b]
redis-server *:6379(RedisModuleCommandDispatcher+0x54)[0x562639f91ca4]
redis-server *:6379(call+0x9d)[0x562639f1df0d]
redis-server *:6379(processCommand+0x327)[0x562639f1e687]
redis-server *:6379(processCommandAndResetClient+0x10)[0x562639f2c280]
redis-server *:6379(processInputBuffer+0x18f)[0x562639f307cf]
redis-server *:6379(+0xd4b4c)[0x562639fadb4c]
redis-server *:6379(aeProcessEvents+0x111)[0x562639f17a21]
redis-server *:6379(aeMain+0x2b)[0x562639f17eab]
redis-server *:6379(main+0x4db)[0x562639f147eb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f53fa9fb09b]
redis-server *:6379(_start+0x2a)[0x562639f14a7a]
------ INFO OUTPUT ------
# Server
redis_version:6.0.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:e02d1d807e41d65
redis_mode:standalone
os:Linux 4.19.76-linuxkit x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.3.0
process_id:1
run_id:e6a2f85ed2c4e92ed31dcff4906f4328e9323d73
tcp_port:6379
uptime_in_seconds:12
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:9951204
executable:/data/redis-server
config_file:
# Clients
connected_clients:1
client_recent_max_input_buffer:98074634
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:242778632
used_memory_human:231.53M
used_memory_rss:124690432
used_memory_rss_human:118.91M
used_memory_peak:242778632
used_memory_peak_human:231.53M
used_memory_peak_perc:193.70%
used_memory_overhead:105965986
used_memory_startup:7874368
used_memory_dataset:136812646
used_memory_dataset_perc:58.24%
allocator_allocated:109019704
allocator_active:109441024
allocator_resident:129314816
total_system_memory:8353112064
total_system_memory_human:7.78G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:421320
allocator_rss_ratio:1.18
allocator_rss_bytes:19873792
rss_overhead_ratio:0.96
rss_overhead_bytes:-4624384
mem_fragmentation_ratio:1.15
mem_fragmentation_bytes:16131624
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:98091618
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1603786712
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:1
total_commands_processed:5
instantaneous_ops_per_sec:0
total_net_input_bytes:102773871
total_net_output_bytes:0
instantaneous_input_kbps:51584.85
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
unexpected_error_replies:0
# Replication
role:master
connected_slaves:0
master_replid:4e792775378397fcff3cb2682c63199109f4eeeb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
master_repl_meaningful_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.183659
used_cpu_user:0.277650
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
# Modules
module:name=search,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=graph,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ReJSON,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=rg,ver=999999,api=1,filters=0,usedby=[],using=[ai],options=[]
module:name=bf,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ai,ver=999999,api=1,filters=0,usedby=[rg],using=[],options=[]
module:name=timeseries,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
# Commandstats
cmdstat_config:calls=1,usec=43,usec_per_call=43.00
cmdstat_info:calls=4,usec=53,usec_per_call=13.25
# Cluster
cluster_enabled:0
# Keyspace
------ CLIENT LIST OUTPUT ------
id=21 addr=172.17.0.1:40400 fd=16 name= age=2 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=102773786 obl=0 oll=0 omem=0 events=r cmd=ai.modelset user=default
------ CURRENT CLIENT INFO ------
id=21 addr=172.17.0.1:40400 fd=16 name= age=2 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=102773786 obl=0 oll=0 omem=0 events=r cmd=ai.modelset user=default
argv[0]: 'AI.MODELSET'
argv[1]: 'imagenet_model'
argv[2]: 'torch'
argv[3]: 'cpu'
argv[4]: 'BLOB'
argv[5]: 'PK'
------ REGISTERS ------
1:M 27 Oct 2020 08:18:45.095 # RAX:0000000000000000 RBX:000056263c008668 RCX:0000000000000000 RDX:0000000000000000 RDI:00007ffeb5711610 RSI:000056263c008668 RBP:00007ffeb5711890 RSP:00007ffeb5711610 R8 :0000000000000000 R9 :0000000000000001 R10:0000000000000001 R11:0000000000000020 R12:00007ffeb57126d0 R13:0000000000000138 R14:00007ffeb57126f0 R15:00007ffeb57126d0 RIP:00007f53c9dca975 EFL:0000000000010246 CSGSFS:002b000000000033
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161f) -> 000056263c005e50
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161e) -> 000056263c005e60
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161d) -> 00000000000000a0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161c) -> 000056263c005e50
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161b) -> 000056263c005e60
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161a) -> 0000000000000000
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711619) -> 00007f53c79a7e35
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711618) -> 00007ffeb5712540
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711617) -> 00007ffeb57118d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711616) -> 0000000000000000
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711615) -> 00007f53c9dcaeee
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711614) -> 00007ffeb57126d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711613) -> 00007ffeb57126f0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711612) -> 0000000000000138
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711611) -> 00007ffeb57126d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711610) -> 0000000000000000
------ MODULES INFO OUTPUT ------
# graph_executing commands
# ai_git
ai_git_sha:7a30eb39f3b3ce74bf4427b9c53f0fe6163e0ca2
# ai_load_time_configs
ai_threads_per_queue:1
ai_inter_op_parallelism:0
ai_intra_op_parallelism:0
# ai_cpu
ai_self_used_cpu_sys:0.183659
ai_self_used_cpu_user:0.277918
ai_children_used_cpu_sys:0.000000
ai_children_used_cpu_user:0.000000
ai_queue_CPU_bthread_#1_used_cpu_total:0.000000
------ FAST MEMORY TEST ------
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #0 terminated
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #1 terminated
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #2 terminated
*** Preparing to test memory region 56263a0ac000 (2277376 bytes)
*** Preparing to test memory region 56263b2a7000 (14139392 bytes)
*** Preparing to test memory region 7f53ba79d000 (205553664 bytes)
*** Preparing to test memory region 7f53c6ba5000 (524288 bytes)
*** Preparing to test memory region 7f53c6c25000 (331776 bytes)
*** Preparing to test memory region 7f53d5a28000 (282624 bytes)
*** Preparing to test memory region 7f53d5ff7000 (8192 bytes)
*** Preparing to test memory region 7f53d6000000 (302125056 bytes)
*** Preparing to test memory region 7f53ec023000 (331776 bytes)
*** Preparing to test memory region 7f53ec1f4000 (16384 bytes)
*** Preparing to test memory region 7f53ec3fb000 (8388608 bytes)
*** Preparing to test memory region 7f53ecbfc000 (8388608 bytes)
*** Preparing to test memory region 7f53ed3fd000 (8388608 bytes)
*** Preparing to test memory region 7f53edbfe000 (8388608 bytes)
*** Preparing to test memory region 7f53ee3ff000 (8388608 bytes)
*** Preparing to test memory region 7f53eec00000 (8388608 bytes)
*** Preparing to test memory region 7f53ef400000 (4194304 bytes)
*** Preparing to test memory region 7f53ef815000 (524288 bytes)
*** Preparing to test memory region 7f53ef896000 (8388608 bytes)
*** Preparing to test memory region 7f53f02b4000 (9437184 bytes)
*** Preparing to test memory region 7f53f0bb5000 (8388608 bytes)
*** Preparing to test memory region 7f53f13b6000 (8388608 bytes)
*** Preparing to test memory region 7f53f1bb7000 (8388608 bytes)
*** Preparing to test memory region 7f53f23b8000 (8388608 bytes)
*** Preparing to test memory region 7f53f2bb9000 (8388608 bytes)
*** Preparing to test memory region 7f53f37ef000 (139264 bytes)
*** Preparing to test memory region 7f53f3a25000 (8388608 bytes)
*** Preparing to test memory region 7f53f44c4000 (12288 bytes)
*** Preparing to test memory region 7f53f44c8000 (8388608 bytes)
*** Preparing to test memory region 7f53f4cc9000 (8388608 bytes)
*** Preparing to test memory region 7f53f54ca000 (8388608 bytes)
*** Preparing to test memory region 7f53f5ccb000 (8388608 bytes)
*** Preparing to test memory region 7f53f64cc000 (8388608 bytes)
*** Preparing to test memory region 7f53f6ccd000 (8388608 bytes)
*** Preparing to test memory region 7f53f74ce000 (8388608 bytes)
*** Preparing to test memory region 7f53f7ccf000 (8388608 bytes)
*** Preparing to test memory region 7f53f8d9f000 (16384 bytes)
*** Preparing to test memory region 7f53f8da4000 (8388608 bytes)
*** Preparing to test memory region 7f53f97fc000 (12288 bytes)
*** Preparing to test memory region 7f53f9800000 (8388608 bytes)
*** Preparing to test memory region 7f53fa000000 (8388608 bytes)
*** Preparing to test memory region 7f53fa826000 (180224 bytes)
*** Preparing to test memory region 7f53fa883000 (4096 bytes)
*** Preparing to test memory region 7f53fa8c2000 (4096 bytes)
*** Preparing to test memory region 7f53fa923000 (4096 bytes)
*** Preparing to test memory region 7f53fa9d2000 (20480 bytes)
*** Preparing to test memory region 7f53fab94000 (24576 bytes)
*** Preparing to test memory region 7f53fabb7000 (16384 bytes)
*** Preparing to test memory region 7f53faea0000 (16384 bytes)
*** Preparing to test memory region 7f53fb0c8000 (8192 bytes)
*** Preparing to test memory region 7f53fb0f6000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so (base 0x7f53c6e9b000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
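In the report above, argv[5] starts with 'PK', which suggests the uploaded blob is a zip archive, the container format that TorchScript files use. One way to compare the working resnet50 file with the self-serialised one is to inspect both archives side by side. The following is only a minimal sketch, assuming both files are such archives and carry a `version` record. The function name `torchscript_archive_info` is hypothetical, and the report does not establish whether the two files actually differ in this record.

```python
import io
import zipfile


def torchscript_archive_info(blob: bytes) -> dict:
    """Return the entry names of a zip archive and its `version` record, if any.

    Hypothetical helper: it only reads the zip container and never loads the model.
    """
    stream = io.BytesIO(blob)
    if not zipfile.is_zipfile(stream):
        raise ValueError("not a zip archive, so probably not a TorchScript file")

    version = None
    with zipfile.ZipFile(stream) as archive:
        names = archive.namelist()
        for name in names:
            # Assumption: the record is stored as '<top-level-dir>/version'.
            if name.split("/")[-1] == "version":
                version = archive.read(name).decode("ascii", "replace").strip()
                break

    return {"entries": names, "version": version}
```

Reading each model file in binary mode and passing the bytes to this function gives two dictionaries whose `version` values and entry lists can be compared directly.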