From: Howell, Seth
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem
Date: Wed, 03 Apr 2019 18:39:02 +0000
To: spdk@lists.01.org

Hi Shahar,

I think your overall conclusion is right about needing to remove the dual path for destroying a controller. But I am not sure that skipping the direct deletion of the controllers in the subsystem_destroy path is the right fix.

What causes the whole dangling controller problem is that in the subsystem_pause path, we first send a message to the ctrlr thread to modify the ctrlr bitmask, and then send a second message from the ctrlr thread to the subsystem thread to delete the controller.

Meanwhile, spdk_for_each_channel has already returned to the original thread (the one running the RPC poller) and called the callback, invoking spdk_nvmf_subsystem_destroy.

I thought it would be enough to just send a message and call spdk_nvmf_subsystem_destroy from the subsystem thread, but that doesn't work: since there are two levels of message passing when we destroy the controllers, we lose the ability to deterministically send a message to the subsystem thread that we know will execute after all of the controllers have been freed. (So my initial solution doesn't work.)

I think your solution would be functional because it removes the second free. The only thing I don't like about it is that it allows the ctrlr to exist after the subsystem has been destroyed. I feel like ctrlrs should only be allowed to exist as part of a subsystem, and the lifecycle of a ctrlr should fit inside of the lifecycle of the subsystem to which it belongs.

Ben suggested that we could add a flag to the subsystem that indicates that we are trying to destroy it, and then invoke spdk_nvmf_subsystem_destroy after we remove and free the last ctrlr (as a sort of callback). We would, of course, still have to think of a solution for the edge case where the subsystem doesn't have any qpairs or controllers associated with it (I don't think the no-controller case is possible, but the no-qpair case might be).

I am leaning more towards something like Ben's suggestion just because of the lifecycle requirements.
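To make the shape of that concrete, here is a standalone toy of the pattern (this is not SPDK code and not a proposed patch; all of the struct and function names below are made up):

/* Toy model of the "destroying flag" idea: the destroy only completes once
 * the last controller has been removed, so the ctrlr lifecycle always fits
 * inside the subsystem lifecycle. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

struct ctrlr {
	TAILQ_ENTRY(ctrlr) link;
};

struct subsystem {
	TAILQ_HEAD(, ctrlr) ctrlrs;
	bool destroying;
};

static void
subsystem_destroy_done(struct subsystem *ss)
{
	/* Free namespaces, listeners, and finally the subsystem itself. */
	printf("subsystem freed\n");
	free(ss);
}

static void
subsystem_destroy(struct subsystem *ss)
{
	if (!TAILQ_EMPTY(&ss->ctrlrs)) {
		/* Controllers are still draining on their own threads; the
		 * last remove_ctrlr() call finishes the destroy for us. */
		ss->destroying = true;
		return;
	}
	subsystem_destroy_done(ss);
}

static void
subsystem_remove_ctrlr(struct subsystem *ss, struct ctrlr *ctrlr)
{
	TAILQ_REMOVE(&ss->ctrlrs, ctrlr, link);
	free(ctrlr);

	if (ss->destroying && TAILQ_EMPTY(&ss->ctrlrs)) {
		subsystem_destroy_done(ss);
	}
}

int
main(void)
{
	struct subsystem *ss = calloc(1, sizeof(*ss));
	struct ctrlr *c = calloc(1, sizeof(*c));

	TAILQ_INIT(&ss->ctrlrs);
	TAILQ_INSERT_TAIL(&ss->ctrlrs, c, link);

	subsystem_destroy(ss);         /* defers: one ctrlr still alive */
	subsystem_remove_ctrlr(ss, c); /* last ctrlr gone, destroy completes */
	return 0;
}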
Thanks,
Seth

From: Shahar Salzman [mailto:shahar.salzman(a)kaminario.com]
Sent: Wednesday, April 3, 2019 12:02 AM
To: Harris, James R; Howell, Seth; Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

I was looking deeper at the 18.10.x code, and it seems that this may not be a race, but rather a (sort of) dangling controller.

I went back to one of the crashes, and I saw that the subsystem has already been removed when the reactor calls _nvmf_ctrlr_destruct:

(gdb) spdk_print_nvmf_subsystems
SPDK object of type struct spdk_nvmf_subsystem at 0x7f5238008b60
((struct spdk_nvmf_subsystem*) 0x7f5238008b60)
name "nqn.2014-08.org.nvmexpress.discovery", '\000'
nqn "nqn.2014-08.org.nvmexpress.discovery", '\000'
ID 0

Looking at the code, the call to spdk_nvmf_subsystem_destroy is made via spdk_for_each_channel, which makes the call via spdk_thread_send_msg, so there is no synchronization issue, only a free from two places.

If the controller is still active during the subsystem remove, isn't the fix I proposed legitimate? I.e. the subsystem does not free the controller, only removes it from the list, and the message from _spdk_nvmf_ctrlr_free_from_qpair removes it from the list only if needed, and frees the controller.

I think the "racy" part is which event gets scheduled first, _nvmf_ctrlr_destruct or spdk_nvmf_subsystem_destroy.

WDYT?

________________________________
From: SPDK on behalf of Shahar Salzman
Sent: Wednesday, April 3, 2019 8:48 AM
To: Harris, James R; Storage Performance Development Kit; Howell, Seth
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Hi Seth,

Did you get to this implementation? If you did, add me as a reviewer so that I can review and test it in my environment. If not, I'll take a swing at it today.

Shahar

Get Outlook for Android

________________________________
From: Howell, Seth
Sent: Tuesday, April 2, 2019 6:02:05 PM
To: Shahar Salzman; Harris, James R; Storage Performance Development Kit
Subject: RE: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Hi Shahar,

Thanks for doing these extra experiments. I was unable to repro this locally yesterday.

If spdk_nvmf_subsystem_destroy is run from the subsystem thread, all of our synchronization issues go away. I think we could add a wrapper function that can be passed to spdk_thread_send_msg, and that just calls spdk_nvmf_subsystem_destroy and then calls a callback that sends the RPC response back.
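Roughly what I have in mind, as an untested sketch (destroy_ctx, destroy_subsystem_async and done_cb are made-up names, and how we get hold of the subsystem's thread is left open here and passed in as a parameter):

/* Sketch: hop to the subsystem's thread, destroy it there, then let the
 * callback send the RPC response. Untested; done_cb and subsystem_thread
 * are placeholders for whatever the RPC layer actually provides. */
#include "spdk/nvmf.h"
#include "spdk/thread.h"
#include <stdlib.h>

struct destroy_ctx {
	struct spdk_nvmf_subsystem	*subsystem;
	void				(*done_cb)(void *cb_arg);	/* e.g. sends the RPC response */
	void				*cb_arg;
};

static void
_subsystem_destroy_msg(void *arg)
{
	struct destroy_ctx *ctx = arg;

	/* Running on the subsystem's thread, so this destroy is serialized
	 * with the ctrlr teardown messages sent to the same thread. */
	spdk_nvmf_subsystem_destroy(ctx->subsystem);
	ctx->done_cb(ctx->cb_arg);
	free(ctx);
}

/* Called from the RPC handler; subsystem_thread is whatever thread the
 * subsystem was created on. */
static void
destroy_subsystem_async(struct spdk_thread *subsystem_thread,
			struct spdk_nvmf_subsystem *subsystem,
			void (*done_cb)(void *cb_arg), void *cb_arg)
{
	struct destroy_ctx *ctx = calloc(1, sizeof(*ctx));

	if (ctx == NULL) {
		/* A real version would report the allocation failure. */
		return;
	}
	ctx->subsystem = subsystem;
	ctx->done_cb = done_cb;
	ctx->cb_arg = cb_arg;
	spdk_thread_send_msg(subsystem_thread, _subsystem_destroy_msg, ctx);
}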
That way we keep the entire lifecycle of the controller within that of the subsystem, and never have to worry about the case you mentioned below where the subsystem has been freed but the controller is still sitting around.

We have several other components within SPDK that are shared, but which we force to be freed on the same thread that created them. I think that bdev descriptors are one example of that.

I will take a crack at the fix I mentioned above later this morning.

Thanks,
Seth

From: Shahar Salzman [mailto:shahar.salzman(a)kaminario.com]
Sent: Tuesday, April 2, 2019 7:23 AM
To: Howell, Seth; Harris, James R; Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Hi Seth,

I think that the race you pointed to was indeed the race that has been causing my crashes.

I added the patch below to mitigate the race. Although the race may still occur under very specific timing, it is less likely to happen.

In the patch, spdk_nvmf_subsystem_destroy only removes the ctrlr from subsystem->ctrlrs and sets ctrlr->subsys to NULL. spdk_nvmf_ctrlr_destruct will do the freeing via the thread event, and remove the controller from subsystem->ctrlrs only if it still has the pointer (obvious race here).

I tested this in the configuration below, and hit the scenario without crashing the system:

rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#0 changed to: IBV_QPS_ERR
rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#0 changed to: IBV_QPS_ERR
rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#1 changed to: IBV_QPS_ERR
rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#2 changed to: IBV_QPS_ERR
rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#2 changed to: IBV_QPS_ERR
rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#3 changed to: IBV_QPS_ERR
rdma.c: 526:spdk_nvmf_rdma_set_ibv_state: *NOTICE*: IBV QP#3 changed to: IBV_QPS_ERR
ctrlr.c: 223:spdk_nvmf_ctrlr_destruct: *ERROR*: Removing controller 0x7f8bf06cae10 subsystem 0x7f8bf0086e10
ctrlr.c: 223:spdk_nvmf_ctrlr_destruct: *ERROR*: Removing controller 0x7f8bf086c540 subsystem (nil)
subsystem.c:1252:spdk_nvmf_subsystem_remove_ctrlr: *ERROR*: Attempting to remove from NULL subsystem

For a longer term solution, I thought to move the entire block where spdk_nvmf_subsystem_destroy removes the controllers from the list into a thread event, so that subsystem->ctrlrs is synchronized (see the sketch after the patch below).

The only concern I have is whether there is a potential memory leak, as spdk_nvmf_subsystem_destroy now will never free the controller.

WDYT?
Shahar

diff --git a/lib/nvmf/ctrlr.c b/lib/nvmf/ctrlr.c
index ed5e68f..e9a662b 100644
--- a/lib/nvmf/ctrlr.c
+++ b/lib/nvmf/ctrlr.c
@@ -219,6 +219,8 @@ spdk_nvmf_ctrlr_create(struct spdk_nvmf_subsystem *subsystem,
 void
 spdk_nvmf_ctrlr_destruct(struct spdk_nvmf_ctrlr *ctrlr)
 {
+	SPDK_ERRLOG("Removing controller %p subsystem %p\n", ctrlr,
+		    ctrlr->subsys);
 	spdk_nvmf_subsystem_remove_ctrlr(ctrlr->subsys, ctrlr);
 
 	free(ctrlr);
diff --git a/lib/nvmf/subsystem.c b/lib/nvmf/subsystem.c
index 9e28f3c..c910477 100644
--- a/lib/nvmf/subsystem.c
+++ b/lib/nvmf/subsystem.c
@@ -334,7 +334,11 @@ spdk_nvmf_subsystem_destroy(struct spdk_nvmf_subsystem *subsystem)
 	}
 
 	TAILQ_FOREACH_SAFE(ctrlr, &subsystem->ctrlrs, link, ctrlr_tmp) {
+#if 0
 		spdk_nvmf_ctrlr_destruct(ctrlr);
+#else
+		spdk_nvmf_subsystem_remove_ctrlr(subsystem, ctrlr);
+#endif
 	}
 
 	ns = spdk_nvmf_subsystem_get_first_ns(subsystem);
@@ -1244,8 +1248,17 @@ void
 spdk_nvmf_subsystem_remove_ctrlr(struct spdk_nvmf_subsystem *subsystem,
 				 struct spdk_nvmf_ctrlr *ctrlr)
 {
+	if (subsystem == NULL || ctrlr->subsys == NULL) {
+		SPDK_ERRLOG("Attempting to remove from NULL subsystem\n");
+		goto out;
+	}
+
 	assert(subsystem == ctrlr->subsys);
 	TAILQ_REMOVE(&subsystem->ctrlrs, ctrlr, link);
+	ctrlr->subsys = NULL;
+
+out:
+	return;
 }
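To be concrete about the longer term idea, the removal loop would move onto the subsystem's thread with something like the following (untested sketch that would live in lib/nvmf/subsystem.c, where the internal ctrlr types are visible; the function names are made up and the thread handle is passed in because I am not sure where best to keep it):

/* Sketch: run the controller-removal loop as a message on the subsystem's
 * thread so that subsystem->ctrlrs is only ever touched from that thread.
 * Untested; assumes the internal nvmf headers of lib/nvmf are included. */
static void
_remove_all_ctrlrs_msg(void *arg)
{
	struct spdk_nvmf_subsystem *subsystem = arg;
	struct spdk_nvmf_ctrlr *ctrlr, *ctrlr_tmp;

	TAILQ_FOREACH_SAFE(ctrlr, &subsystem->ctrlrs, link, ctrlr_tmp) {
		spdk_nvmf_subsystem_remove_ctrlr(subsystem, ctrlr);
	}
}

static void
_schedule_ctrlr_removal(struct spdk_thread *subsystem_thread,
			struct spdk_nvmf_subsystem *subsystem)
{
	/* The removal now always happens on the subsystem's thread. */
	spdk_thread_send_msg(subsystem_thread, _remove_all_ctrlrs_msg, subsystem);
}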
________________________________
From: SPDK on behalf of Shahar Salzman
Sent: Tuesday, April 2, 2019 1:38 PM
To: Howell, Seth; Harris, James R; Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

For completeness, the test I am running in order to recreate the issue is the following:

Setup:
- initial configuration:

[Global]
  ReactorMask 0x2
  LogFacility "local7"

[Nvmf]
  MaxQueuesPerSession 4
  MaxQueueDepth 256
  AcceptorPollRate 10000

[Rpc]
  Enable Yes

- bring up the target with the same construct/add_ns/add_host commands below (I know construct is deprecated, this is based on an old test I had)
- nvme connect from the initiator
- now run the following commands in a loop (I added a sleep of 0.3 seconds between commands and 12 seconds between iterations):

rpc.py delete_nvmf_subsystem nqn.2016-06.io.spdk:del_test
rpc.py construct_nvmf_subsystem --max-namespaces=1024 -s KAMINARIO.K2.N00001 nqn.2016-06.io.spdk:del_test 'trtype:RDMA traddr:1.1.1.1 trsvcid:1023,trtype:RDMA traddr:2.2.2.1 trsvcid:1023' nqn.2014-08.org.nvmexpress:uuid:84aefe1b-c50d-48dc-8440-5~b2d2b6fde7c0
rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:del_test Km2
rpc.py nvmf_subsystem_add_host nqn.2016-06.io.spdk:del_test nqn.2014-08.org.nvmexpress:uuid:84aefe1b-c50d-48dc-8440-b2d2b6fde7c0

After 10-20 iterations I can catch the problem.

** I think that the sleep between iterations has to be >10 seconds, since we want the initiator to auto-connect between each delete/create.

________________________________
From: Howell, Seth
Sent: Tuesday, April 2, 2019 1:15 AM
To: Harris, James R; Shahar Salzman; Storage Performance Development Kit
Subject: RE: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Hi All,

I didn't make any changes to this specific callback path since 18.10, and don't recall seeing any issues related to this being filed.

Looking at it though, I can see a possible race condition within the RPC function even on master:

The first thing that this RPC does is call spdk_nvmf_subsystem_stop, which triggers disconnecting every qpair on the subsystem. Each qpair destruct call ends up invoking _spdk_nvmf_ctrlr_free_from_qpair. In the case of the last qpair from each ctrlr, it ends up sending a message to the subsystem's thread to call _nvmf_ctrlr_destruct.

Once that returns, we call spdk_nvmf_subsystem_destroy, which calls spdk_nvmf_ctrlr_destruct directly without passing a message to the subsystem thread. I think that is the mechanism through which we are getting the double free.

I did a quick search over our tests, and I couldn't find an instance where we are actually calling the delete_nvmf_subsystem RPC before disconnecting the initiator.

I will have to look at this a little bit, but I think we need to add a couple of tests to simulate this, and also fix the race condition.

Thanks,
Seth

From: Harris, James R
Sent: Monday, April 1, 2019 2:27 PM
To: Shahar Salzman; Storage Performance Development Kit; Howell, Seth
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Hi Shahar,

On v18.10.x, _nvmf_ctrlr_destruct calls spdk_nvmf_ctrlr_destruct, and spdk_nvmf_ctrlr_destruct ends with free(ctrlr). So the compiler is doing a tail-call optimization. The call stack makes sense now.

Seeing malloc_printerr in the backtrace indicates that most likely this ctrlr pointer was double-freed. I see this spdk_nvmf_ctrlr_destruct call chain has been modified since v18.10, possibly to fix this issue. I'm explicitly adding Seth here to get his take.

Thanks,

-Jim

From: Shahar Salzman
Date: Monday, April 1, 2019 at 2:16 PM
To: James Harris, Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Looks like the problem is in _nvmf_ctrlr_destruct.

Since the memory is not accessible, I copied the function pointers to a separate on-stack array, and nullified each value after it was successfully called.

Looking at _nvmf_ctrlr_destruct, I see that it indeed calls free.

I'll get to the bottom of things tomorrow, but I still can't understand how the compiler has been able to optimize the function calls in between, as they are all function pointers...
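For reference, the on-stack copy was along these lines (a simplified sketch of the idea, not the exact change; the real loop in reactor.c dequeues its batch from an spdk_ring and recycles the events to a mempool). The fns array is what shows up in frame #5 below:

/* Simplified sketch of the instrumentation: keep an on-stack copy of each
 * event's function pointer so that, if we crash inside events[i]->fn(),
 * fns[i] is still readable in the core dump even after the event object
 * itself has been recycled. Entries are cleared once the callback returns,
 * so only the in-flight callback stays visible. */
#define EVENT_BATCH_SIZE 8

struct event {
	void	(*fn)(void *arg1, void *arg2);
	void	*arg1;
	void	*arg2;
};

static void
event_queue_run_batch(struct event **events, unsigned int count)
{
	void (*fns[EVENT_BATCH_SIZE])(void *, void *) = {0};
	unsigned int i;

	for (i = 0; i < count && i < EVENT_BATCH_SIZE; i++) {
		fns[i] = events[i]->fn;
		events[i]->fn(events[i]->arg1, events[i]->arg2);
		fns[i] = 0;	/* cleared only once the callback returned */
	}
}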
#5  0x000000000047320f in _spdk_event_queue_run_batch (arg=0x7fff54008640) at reactor.c:207
        event =
        count = 8
        i =
        events = {0x2000fe77e640, 0x2000fe77f240, 0x2000fe77f900, 0x2000fe77f540, 0x2000fe77ef40, 0x2000fe77f180, 0x2000fe77e580, 0x2000fe77e880}
        fns = {0, 0, 0, 0, 0x46aef0 <_nvmf_ctrlr_destruct>, 0, 0, 0}

________________________________
From: SPDK on behalf of Shahar Salzman
Sent: Monday, April 1, 2019 9:48 PM
To: Harris, James R; Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

Thanks for the tips!

I just recreated the issue with a smaller application; I will try to recreate it with the spdk app.

The spdk stack is 18.10.x, but the application is ours, which is why the top frames are not familiar.

I already tried to print the events, but I am unable to print them, possibly since they have already been freed.

I will try to recreate this with a debug build, and then I can get a better idea as to the functions being run on the event.

________________________________
From: SPDK on behalf of Harris, James R
Sent: Monday, April 1, 2019 7:02 PM
To: Storage Performance Development Kit
Subject: Re: [SPDK] __libc_message calling assert during delete_nvmf_subsystem

On 4/1/19, 8:15 AM, "SPDK on behalf of Shahar Salzman" wrote:

    Hi,

    I am running with spdk 18.10.3, and crashing during delete_nvmf_subsystem. This happens pretty consistently (every 2-3 attempts) when I do the delete while the initiator is still connected.

    I saw that bug #235 has the same stack trace as I am seeing, but I already have the fix.

    I am digging into this, but would be glad to get some pointers as to where to start looking, or more specifically, how did you know to look into the lib/nvmf controller code according to this error.

Hi Shahar,

Stack frames #8 and #9 indicate maybe an out-of-tree implementation of some parts of the reactor framework? If so, are you able to reproduce this issue with the stock SPDK NVMe-oF target?

Can you go to frame #5 and print event->fn? The call stack indicates this is calling free() directly, but that doesn't really make sense here. I would suggest you start looking there for clues.

Regards,

-Jim

    This is my stack:

    (gdb) bt
    #0  0x00007f338ed46495 in raise () from /lib64/libc.so.6
    #1  0x00007f338ed47c75 in abort () from /lib64/libc.so.6
    #2  0x00007f338ed843a7 in __libc_message () from /lib64/libc.so.6
    #3  0x00007f338ed89dee in malloc_printerr () from /lib64/libc.so.6
    #4  0x00007f338ed8cc80 in _int_free () from /lib64/libc.so.6
    #5  0x000000000047ca07 in _spdk_event_queue_run_batch (arg=0x7f32dc008640) at reactor.c:205
    #6  _spdk_reactor_run (arg=0x7f32dc008640) at reactor.c:502
    #7  0x000000000047ccac in spdk_reactors_start () at reactor.c:677
    #8  0x00000000007a942e in km_spdk_target_reactor_thread (arg=0x1da4b160) at km_spdk_target_reactor.c:143
    #9  0x00000000007b28e1 in km_thread_log (ctxt=0x7f32e0001ea0) at km_thread_pthreads.c:79
    #10 0x00007f338fb85aa1 in start_thread () from /lib64/libpthread.so.0
    #11 0x00007f338edfcbcd in clone () from /lib64/libc.so.6

    I will open a bug on github for this once I collect all the information.
    Shahar

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk