From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============7321499716978131195==" MIME-Version: 1.0 From: Harris, James R Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem Date: Mon, 01 Apr 2019 21:27:04 +0000 Message-ID: <25CD21CD-F692-4C8B-B3B3-FE7CD47C9DEC@intel.com> In-Reply-To: AM6PR04MB512748A9107588845108831989550@AM6PR04MB5127.eurprd04.prod.outlook.com List-ID: To: spdk@lists.01.org --===============7321499716978131195== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Hi Shahar, On v18.10.x, _nvmf_ctrlr_destruct calls spdk_nvmf_ctrlr_destruct, and spdk_= nvmf_ctrlr_destruct ends with free(ctrlr). So the compiler is doing a tail= call optimization. The call stack makes sense now. Seeing malloc_printerr in the backtrace indicates that most likely this ctr= lr pointer was double-freed. I see this spdk_nvmf_ctrlr_destruct call chai= n has been modified since v18.10, possibly to fix this issue. I=E2=80=99m = explicitly adding Seth here to get his take. Thanks, -Jim From: Shahar Salzman Date: Monday, April 1, 2019 at 2:16 PM To: James Harris , Storage Performance Developm= ent Kit Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsys= tem Looks like the problem is in _nvmf_ctrlr_destruct. Since the memory is not accessible, I copied the function to another on sta= ck array, and nullified the value after it has been successfully called. Looking at _nvmf_ctrlr_destruct, I see that it indeed calls free. I'll get to the bottom of things tomorrow, but I still can't understand how= the compiler has been able to optimize the function calls in between, as t= hey are all function pointers... #5 0x000000000047320f in _spdk_event_queue_run_batch (arg=3D0x7fff54008640= ) at reactor.c:207 event =3D count =3D 8 i =3D events =3D {0x2000fe77e640, 0x2000fe77f240, 0x2000fe77f900, 0x2000f= e77f540, 0x2000fe77ef40, 0x2000fe77f180, 0x2000fe77e580, 0x2000fe77e880} fns =3D {0, 0, 0, 0, 0x46aef0 <_nvmf_ctrlr_destruct>, 0, 0, 0} ________________________________ From: SPDK on behalf of Shahar Salzman Sent: Monday, April 1, 2019 9:48 PM To: Harris, James R; Storage Performance Development Kit Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsys= tem Thanks for the tips! I just recreated the issue with a smaller application, I will try to recrea= te it with the spdk app The spdk stack is 18.10.x, but the application is ours, which is why the to= p frames are not familiar. I already tried to print the events, but I am unable to print them, possibl= y since they have already been freed. I will try to recreate this with a debug build, and then I can get a better= idea as to the functions being run on the event. ________________________________ From: SPDK on behalf of Harris, James R Sent: Monday, April 1, 2019 7:02 PM To: Storage Performance Development Kit Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsys= tem On 4/1/19, 8:15 AM, "SPDK on behalf of Shahar Salzman" wrote: Hi, I am running with spdk 18.10.3, and crashing during delete_nvmf_subsyst= em. This happens pretty consistently (every 2-3 attempts) when I do the del= ete while the initiator is still connected. I saw that bug #235 has the same stack trace as I am seeing, but I alre= ady have the fix. I am digging into this, but would be glad to get some pointers as to wh= ere to start looking, or more specifically, how did you know to look into t= he lib/nvmf controller code according to this error. Hi Shahar, Stack frames #8 and #9 indicate maybe an out-of-tree implementation of some= parts of the reactor framework? If so, are you able to reproduce this iss= ue with the stock SPDK NVMe-of target? Can you go to frame #5 and print event->fn? The callstack indicates this i= s calling free() directly, but that doesn't really make sense here. I woul= d suggest you start looking there for clues. Regards, -Jim This is my stack: (gdb) bt #0 0x00007f338ed46495 in raise () from /lib64/libc.so.6 #1 0x00007f338ed47c75 in abort () from /lib64/libc.so.6 #2 0x00007f338ed843a7 in __libc_message () from /lib64/libc.so.6 #3 0x00007f338ed89dee in malloc_printerr () from /lib64/libc.so.6 #4 0x00007f338ed8cc80 in _int_free () from /lib64/libc.so.6 #5 0x000000000047ca07 in _spdk_event_queue_run_batch (arg=3D0x7f32dc00= 8640) at reactor.c:205 #6 _spdk_reactor_run (arg=3D0x7f32dc008640) at reactor.c:502 #7 0x000000000047ccac in spdk_reactors_start () at reactor.c:677 #8 0x00000000007a942e in km_spdk_target_reactor_thread (arg=3D0x1da4b1= 60) at km_spdk_target_reactor.c:143 #9 0x00000000007b28e1 in km_thread_log (ctxt=3D0x7f32e0001ea0) at km_t= hread_pthreads.c:79 #10 0x00007f338fb85aa1 in start_thread () from /lib64/libpthread.so.0 #11 0x00007f338edfcbcd in clone () from /lib64/libc.so.6 I will open a bug on github for this once I collect all the information. Shahar _______________________________________________ SPDK mailing list SPDK(a)lists.01.org https://lists.01.org/mailman/listinfo/spdk _______________________________________________ SPDK mailing list SPDK(a)lists.01.org https://lists.01.org/mailman/listinfo/spdk _______________________________________________ SPDK mailing list SPDK(a)lists.01.org https://lists.01.org/mailman/listinfo/spdk --===============7321499716978131195==--