From: peng yu
Subject: [SPDK] Re: nvmeof to localhost will hang forever
Date: Fri, 06 Dec 2019 09:19:34 -0800
To: spdk@lists.01.org

Hi Ziye,

I didn't explain my question clearly. I mean that the bdev_nvme_attach_controller command might impact the I/O on other devices. Assume the spdk application has a native nvme device, exports it as a vhost device, and that device is used by a virtual machine. Now I call the bdev_nvme_attach_controller command to connect to an nvmeof target. The primary core is stuck until it receives the response from the nvmeof target, and during this time it cannot handle any I/O for the native nvme device. So I want to know whether we could avoid letting the primary core handle any I/O and have it handle rpc requests only, so that I/O won't be impacted. If I have misunderstood anything, or you have any other idea, please let me know.

I also wonder whether we should add a timeout in the nvme_tcp_qpair_icreq_send function (and in the corresponding function for rdma). Based on my test, if the nvmeof target accepts the tcp connection but never sends a response, the spdk application is stuck forever and won't reply to any rpc afterwards. We shouldn't let the spdk application get stuck because of a problem on a remote nvmeof target.
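To make the timeout idea concrete, below is a rough sketch of how the wait in nvme_tcp_qpair_icreq_send could be bounded. It is only an illustration: the ICREQ_TIMEOUT_SEC constant and the -ETIMEDOUT error path are my assumption of how it might look, not existing SPDK code; spdk_get_ticks() and spdk_get_ticks_hz() are the tick helpers from spdk/env.h.

    #include <errno.h>
    #include "spdk/env.h"

    #define ICREQ_TIMEOUT_SEC 5    /* arbitrary value, for illustration only */

    /* Inside nvme_tcp_qpair_icreq_send(), replacing the unbounded loop: */
    uint64_t start_tick = spdk_get_ticks();
    uint64_t timeout_ticks = ICREQ_TIMEOUT_SEC * spdk_get_ticks_hz();

    while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
            nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);

            if (spdk_get_ticks() - start_tick > timeout_ticks) {
                    /* The target accepted the TCP connection but never sent an
                     * ICResp; fail the connect instead of spinning forever. */
                    SPDK_ERRLOG("icreq send timed out\n");
                    return -ETIMEDOUT;
            }
    }

The caller of nvme_tcp_qpair_icreq_send would then have to handle the error and tear down the qpair instead of hanging.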
Best regards.

On Fri, Dec 6, 2019 at 12:02 AM Yang, Ziye wrote:
>
> Hi Peng Yu,
>
> It will not. After the connection is constructed, all the I/Os become async.
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: peng yu
> Sent: Friday, December 6, 2019 3:43 PM
> To: Storage Performance Development Kit
> Subject: [SPDK] Re: nvmeof to localhost will hang forever
>
> Hi Ziye
>
> Thanks for your explanation. I found a workaround: I can run multiple spdk applications on the same server and specify a different socket path for each of them, then use one as the nvmeof target and another as the nvmeof initiator. Based on my simple test, it works.
>
> I have another concern: the synchronous operation blocks the primary core. Is it possible to have all I/O operations handled by other cores? Assume I run bdev_nvme_attach_controller and the target has a problem (it doesn't respond, or its response latency is very high); all I/O on this core would then be stuck together with the bdev_nvme_attach_controller command. I hope this kind of issue won't impact I/O performance.
>
> On Thu, Dec 5, 2019 at 11:06 PM Yang, Ziye wrote:
> >
> > By the way, this issue affects not only the TCP transport but also the RDMA transport. And the current conclusion in the previously reported issue is "we don't recommend this as a use case".
> >
> > Best Regards
> > Ziye Yang
> >
> > -----Original Message-----
> > From: Yang, Ziye
> > Sent: Friday, December 6, 2019 2:58 PM
> > To: Storage Performance Development Kit
> > Subject: RE: [SPDK] nvmeof to localhost will hang forever
> >
> > Hi Peng Yu,
> >
> > We currently do not support testing the target and the initiator in the same process instance; see the previously reported spdk issue below, which is the same as yours. Adding an async poller would not be OK in the low-level nvme transport library.
> >
> > https://github.com/spdk/spdk/issues/587
> >
> > Best Regards
> > Ziye Yang
> >
> > -----Original Message-----
> > From: peng yu
> > Sent: Friday, December 6, 2019 2:50 PM
> > To: Storage Performance Development Kit
> > Subject: [SPDK] nvmeof to localhost will hang forever
> >
> > Below are the steps to reproduce the issue:
> >
> > (1) run a spdk application, e.g.:
> > sudo ./app/spdk_tgt/spdk_tgt
> >
> > (2) run the nvmeof target part commands:
> > sudo ./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
> > sudo ./scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
> > sudo ./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
> > sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
> > sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420
> >
> > (3) run the nvmeof initiator part command:
> > sudo ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a 127.0.0.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1
> >
> > The bdev_nvme_attach_controller command hangs forever. I found that the problem is in the nvme_tcp_qpair_icreq_send function; spdk gets stuck in the code below:
> >
> > while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
> >         nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
> > }
> >
> > The while loop never finishes. The nvme_tcp_qpair_process_completions function tries to receive a response from the target. The target is the same spdk application, and since the application is spinning in the above while loop, the nvmeof target part of the code never gets a chance to send a response.
> >
> > Is it possible to use a poller to replace the while loop? We could add a callback function and let the poller call it once tqpair->state is no longer NVME_TCP_QPAIR_STATE_INVALID. Does that make sense?
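P.S. To make the poller idea quoted above more concrete, here is a rough sketch of what the replacement could look like. It assumes the code could use the pollers from spdk/thread.h, which, as Ziye notes, would not be OK inside the low-level nvme transport library, so treat it only as an illustration of the shape of the change; the icreq_wait_ctx struct, the icreq_wait_poll function, and the completion callback are hypothetical names, not existing SPDK code.

    /* Rough sketch only (not existing SPDK code): replace the busy-wait in
     * nvme_tcp_qpair_icreq_send() with a poller plus a completion callback. */
    #include <stdlib.h>
    #include "spdk/thread.h"

    struct icreq_wait_ctx {
            struct nvme_tcp_qpair   *tqpair;
            struct spdk_poller      *poller;
            void                    (*cb_fn)(void *cb_arg, int rc); /* hypothetical callback */
            void                    *cb_arg;
    };

    static int
    icreq_wait_poll(void *arg)
    {
            struct icreq_wait_ctx *ctx = arg;

            /* Drive the qpair forward here instead of spinning in a while loop,
             * so the same reactor can also run the target side of the connection. */
            nvme_tcp_qpair_process_completions(&ctx->tqpair->qpair, 0);

            if (ctx->tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
                    return 0;       /* ICResp not received yet; keep polling */
            }

            /* The icreq handshake finished: stop polling and notify the caller. */
            spdk_poller_unregister(&ctx->poller);
            ctx->cb_fn(ctx->cb_arg, 0);
            free(ctx);
            return 1;
    }

    /* In nvme_tcp_qpair_icreq_send(), instead of the while loop, something like:
     *
     *     ctx->poller = spdk_poller_register(icreq_wait_poll, ctx, 0);
     *     return 0;    // the result is reported later through ctx->cb_fn
     */

The idea is that nvme_tcp_qpair_icreq_send would return immediately after registering the poller and report success or failure later through cb_fn, so the reactor is never blocked waiting for the ICResp.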