* [SPDK] Re: nvmeof to localhost will hung forever
@ 2019-12-07  0:28 Yang, Ziye
  0 siblings, 0 replies; 6+ messages in thread
From: Yang, Ziye @ 2019-12-07  0:28 UTC (permalink / raw)
  To: spdk

For the potential hang issue, we can add a timeout, as already discussed in our previous issue thread; that is feasible.

For the async approach, I do not have a good method yet. The nvme tcp and RDMA initiator libraries are designed for any application (they can be used without the SPDK poller framework), so it is not easy to decouple these functions into two parts, i.e., send and async completion.
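
As a rough illustration only (this is not code that exists in SPDK today), the busy-wait quoted later in this thread could be bounded by a deadline. The 5-second value and the error path are assumptions made for the sketch; spdk_get_ticks()/spdk_get_ticks_hz() are the public SPDK tick helpers:

    /* Sketch: bound the icreq wait instead of spinning forever.
     * Assumed to sit inside nvme_tcp_qpair_icreq_send(); the 5-second
     * timeout is an arbitrary example value, not an SPDK default. */
    uint64_t deadline = spdk_get_ticks() + 5 * spdk_get_ticks_hz();

    while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
        nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
        if (spdk_get_ticks() > deadline) {
            SPDK_ERRLOG("icreq was not acknowledged by the target in time\n");
            return -ETIMEDOUT; /* fail the connect instead of hanging */
        }
    }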

Sent from my iPad

> On Dec 7, 2019, at 1:19 AM, peng yu <yupeng0921(a)gmail.com> wrote:
> 
> Hi Ziye
> 
> I didn't explain my question clearly. I mean that the
> bdev_nvme_attach_controller command might impact the IOs on other
> devices. Assume the spdk application has a native nvme device, exports
> it as a vhost device, and that device is used by a virtual machine. Now
> I call the bdev_nvme_attach_controller command to connect to a nvmeof
> target. The primary core will be stuck until it receives the response
> from the nvmeof target. During this time, the primary core cannot
> handle any IO for the native nvme device. So I want to know whether we
> could avoid letting the primary core handle any IO and let it handle
> rpc requests only; then the IO won't be impacted. If I have
> misunderstood anything or you have any other idea, please let me know.
> 
> I also wonder whether we should add a timeout in the
> nvme_tcp_qpair_icreq_send function (and the corresponding function for
> rdma). Based on my test, if the nvmeof target accepts the tcp
> connection but doesn't send any response, the spdk application is
> stuck forever and won't reply to any rpc in the future. We shouldn't
> let the spdk application get stuck because of a problem in a remote
> nvmeof target.
> 
> Best regards.
> 
>> On Fri, Dec 6, 2019 at 12:02 AM Yang, Ziye <ziye.yang(a)intel.com> wrote:
>> 
>> Hi Peng Yu,
>> 
>> It will not. After the connection is established, all the I/Os become async.
>> 
>> 
>> 
>> Best Regards
>> Ziye Yang
>> 
>> -----Original Message-----
>> From: peng yu <yupeng0921(a)gmail.com>
>> Sent: Friday, December 6, 2019 3:43 PM
>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> Subject: [SPDK] Re: nvmeof to localhost will hung forever
>> 
>> Hi Ziye
>> 
>> Thanks for your explanation. I found a workaround:
>> I can run multiple spdk applications on the same server and specify a different rpc socket path for each. Then I use one as the nvmeof target and another as the nvmeof initiator. Based on my simple test, it works.
>> 
>> I have another concern: the synchronous operation would block the primary core. Is it possible to have all IO operations handled by other cores? Assume I run bdev_nvme_attach_controller and the target has a problem (it doesn't respond, or its response latency is very high); all IOs on this core would be stuck together with the bdev_nvme_attach_controller command. I hope such an issue won't impact IO performance.
>> 
>>> On Thu, Dec 5, 2019 at 11:06 PM Yang, Ziye <ziye.yang(a)intel.com> wrote:
>>> 
>>> By the way, this issue affects not only the TCP transport but also the RDMA transport. The conclusion in the previous issue was "Don't recommend this as a use case".
>>> 
>>> 
>>> 
>>> 
>>> Best Regards
>>> Ziye Yang
>>> 
>>> -----Original Message-----
>>> From: Yang, Ziye
>>> Sent: Friday, December 6, 2019 2:58 PM
>>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>>> Subject: RE: [SPDK] nvmeof to localhost will hung forever
>>> 
>>> Hi Peng Yu,
>>> 
>>> We currently do not support running the target and initiator in the same process instance.
>>> Adding an async poller is not acceptable in the low-level nvme transport library. See the previously reported spdk issue below; it is the same as yours.
>>> 
>>> https://github.com/spdk/spdk/issues/587
>>> 
>>> 
>>> 
>>> 
>>> Best Regards
>>> Ziye Yang
>>> 
>>> -----Original Message-----
>>> From: peng yu <yupeng0921(a)gmail.com>
>>> Sent: Friday, December 6, 2019 2:50 PM
>>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>>> Subject: [SPDK] nvmeof to localhost will hung forever
>>> 
>>> Below are the steps to reproduce the issue:
>>>
>>> (1) run a spdk application, e.g.:
>>> sudo ./app/spdk_tgt/spdk_tgt
>>>
>>> (2) run the nvmeof target part commands:
>>> sudo ./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
>>> sudo ./scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
>>> sudo ./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
>>> sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
>>> sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420
>>>
>>> (3) run the nvmeof initiator part command:
>>> sudo ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a 127.0.0.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1
>>>
>>> The bdev_nvme_attach_controller command hangs forever. I found that the problem is in the nvme_tcp_qpair_icreq_send function; spdk gets stuck in the code below:
>>>
>>>    while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
>>>        nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
>>>    }
>>>
>>> The while loop never finishes. The nvme_tcp_qpair_process_completions function tries to receive a response from the target, but the target is the same spdk application, and since the application is spinning in the above while loop, the nvmeof target part of the code never gets a chance to send a response.
>>>
>>> Is it possible to replace the while loop with a poller? We could add a callback function and let the poller call it once tqpair->state is no longer NVME_TCP_QPAIR_STATE_INVALID. Does that make sense?
> _______________________________________________
> SPDK mailing list -- spdk(a)lists.01.org
> To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [SPDK] Re: nvmeof to localhost will hung forever
@ 2019-12-06 17:19 peng yu
  0 siblings, 0 replies; 6+ messages in thread
From: peng yu @ 2019-12-06 17:19 UTC (permalink / raw)
  To: spdk

Hi Ziye

I didn't explain my question clearly. I mean that the
bdev_nvme_attach_controller command might impact the IOs on other
devices. Assume the spdk application has a native nvme device, exports
it as a vhost device, and that device is used by a virtual machine. Now
I call the bdev_nvme_attach_controller command to connect to a nvmeof
target. The primary core will be stuck until it receives the response
from the nvmeof target. During this time, the primary core cannot
handle any IO for the native nvme device. So I want to know whether we
could avoid letting the primary core handle any IO and let it handle
rpc requests only; then the IO won't be impacted. If I have
misunderstood anything or you have any other idea, please let me know.

I also wonder whether we should add a timeout in the
nvme_tcp_qpair_icreq_send function (and the corresponding function for
rdma). Based on my test, if the nvmeof target accepts the tcp
connection but doesn't send any response, the spdk application is
stuck forever and won't reply to any rpc in the future. We shouldn't
let the spdk application get stuck because of a problem in a remote
nvmeof target.

Best regards.

On Fri, Dec 6, 2019 at 12:02 AM Yang, Ziye <ziye.yang(a)intel.com> wrote:
>
> Hi Peng Yu,
>
> It will not. After the connection is established, all the I/Os become async.
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: peng yu <yupeng0921(a)gmail.com>
> Sent: Friday, December 6, 2019 3:43 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: [SPDK] Re: nvmeof to localhost will hung forever
>
> Hi Ziye
>
> Thanks for your explanation. I found a workaround:
> I can run multiple spdk applications on the same server and specify a different rpc socket path for each. Then I use one as the nvmeof target and another as the nvmeof initiator. Based on my simple test, it works.
>
> I have another concern: the synchronous operation would block the primary core. Is it possible to have all IO operations handled by other cores? Assume I run bdev_nvme_attach_controller and the target has a problem (it doesn't respond, or its response latency is very high); all IOs on this core would be stuck together with the bdev_nvme_attach_controller command. I hope such an issue won't impact IO performance.
>
> On Thu, Dec 5, 2019 at 11:06 PM Yang, Ziye <ziye.yang(a)intel.com> wrote:
> >
> > By the way, this issue affects not only the TCP transport but also the RDMA transport. The conclusion in the previous issue was "Don't recommend this as a use case".
> >
> >
> >
> >
> > Best Regards
> > Ziye Yang
> >
> > -----Original Message-----
> > From: Yang, Ziye
> > Sent: Friday, December 6, 2019 2:58 PM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: RE: [SPDK] nvmeof to localhost will hung forever
> >
> > Hi Peng Yu,
> >
> > We currently do not support running the target and initiator in the same process instance.
> > Adding an async poller is not acceptable in the low-level nvme transport library. See the previously reported spdk issue below; it is the same as yours.
> >
> > https://github.com/spdk/spdk/issues/587
> >
> >
> >
> >
> > Best Regards
> > Ziye Yang
> >
> > -----Original Message-----
> > From: peng yu <yupeng0921(a)gmail.com>
> > Sent: Friday, December 6, 2019 2:50 PM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: [SPDK] nvmeof to localhost will hung forever
> >
> > Below are the steps to reproduce the issue:
> >
> > (1) run a spdk application, e.g.:
> > sudo ./app/spdk_tgt/spdk_tgt
> >
> > (2) run the nvmeof target part commands:
> > sudo ./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
> > sudo ./scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
> > sudo ./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
> > sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
> > sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420
> >
> > (3) run the nvmeof initiator part command:
> > sudo ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a 127.0.0.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1
> >
> > The bdev_nvme_attach_controller command hangs forever. I found that the problem is in the nvme_tcp_qpair_icreq_send function; spdk gets stuck in the code below:
> >
> >     while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
> >         nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
> >     }
> >
> > The while loop never finishes. The nvme_tcp_qpair_process_completions function tries to receive a response from the target, but the target is the same spdk application, and since the application is spinning in the above while loop, the nvmeof target part of the code never gets a chance to send a response.
> >
> > Is it possible to replace the while loop with a poller? We could add a callback function and let the poller call it once tqpair->state is no longer NVME_TCP_QPAIR_STATE_INVALID. Does that make sense?
> _______________________________________________
> SPDK mailing list -- spdk(a)lists.01.org
> To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [SPDK] Re: nvmeof to localhost will hung forever
@ 2019-12-06  8:02 Yang, Ziye
  0 siblings, 0 replies; 6+ messages in thread
From: Yang, Ziye @ 2019-12-06  8:02 UTC (permalink / raw)
  To: spdk

Hi Peng Yu,

It will not. After the connection is established, all the I/Os become async.



Best Regards
Ziye Yang 

-----Original Message-----
From: peng yu <yupeng0921(a)gmail.com> 
Sent: Friday, December 6, 2019 3:43 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Re: nvmeof to localhost will hung forever

Hi Ziye

Thanks for your explanation. I found a workaround:
I can run multiple spdk applications on the same server and specify a different rpc socket path for each. Then I use one as the nvmeof target and another as the nvmeof initiator. Based on my simple test, it works.

I have another concern: the synchronous operation would block the primary core. Is it possible to have all IO operations handled by other cores? Assume I run bdev_nvme_attach_controller and the target has a problem (it doesn't respond, or its response latency is very high); all IOs on this core would be stuck together with the bdev_nvme_attach_controller command. I hope such an issue won't impact IO performance.

On Thu, Dec 5, 2019 at 11:06 PM Yang, Ziye <ziye.yang(a)intel.com> wrote:
>
> By the way, this issue affects not only the TCP transport but also the RDMA transport. The conclusion in the previous issue was "Don't recommend this as a use case".
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: Yang, Ziye
> Sent: Friday, December 6, 2019 2:58 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: RE: [SPDK] nvmeof to localhost will hung forever
>
> Hi Peng Yu,
>
> We currently do not support running the target and initiator in the same process instance.
> Adding an async poller is not acceptable in the low-level nvme transport library. See the previously reported spdk issue below; it is the same as yours.
>
> https://github.com/spdk/spdk/issues/587
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: peng yu <yupeng0921(a)gmail.com>
> Sent: Friday, December 6, 2019 2:50 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: [SPDK] nvmeof to localhost will hung forever
>
> Below are the steps to reproduce the issue:
>
> (1) run a spdk application, e.g.:
> sudo ./app/spdk_tgt/spdk_tgt
>
> (2) run the nvmeof target part commands:
> sudo ./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
> sudo ./scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
> sudo ./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
> sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
> sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420
>
> (3) run the nvmeof initiator part command:
> sudo ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a 127.0.0.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1
>
> The bdev_nvme_attach_controller command hangs forever. I found that the problem is in the nvme_tcp_qpair_icreq_send function; spdk gets stuck in the code below:
>
>     while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
>         nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
>     }
>
> The while loop never finishes. The nvme_tcp_qpair_process_completions function tries to receive a response from the target, but the target is the same spdk application, and since the application is spinning in the above while loop, the nvmeof target part of the code never gets a chance to send a response.
>
> Is it possible to replace the while loop with a poller? We could add a callback function and let the poller call it once tqpair->state is no longer NVME_TCP_QPAIR_STATE_INVALID. Does that make sense?
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [SPDK] Re: nvmeof to localhost will hung forever
@ 2019-12-06  7:42 peng yu
  0 siblings, 0 replies; 6+ messages in thread
From: peng yu @ 2019-12-06  7:42 UTC (permalink / raw)
  To: spdk

Hi Ziye

Thanks for your explanation. I found a workaround:
I can run multiple spdk applications on the same server and specify a
different rpc socket path for each. Then I use one as the nvmeof
target and another as the nvmeof initiator. Based on my simple test,
it works.

I have another concern: the synchronous operation would block the
primary core. Is it possible to have all IO operations handled by
other cores? Assume I run bdev_nvme_attach_controller and the target
has a problem (it doesn't respond, or its response latency is very
high); all IOs on this core would be stuck together with the
bdev_nvme_attach_controller command. I hope such an issue won't impact
IO performance.

On Thu, Dec 5, 2019 at 11:06 PM Yang, Ziye <ziye.yang(a)intel.com> wrote:
>
> By the way, this issue affects not only the TCP transport but also the RDMA transport. The conclusion in the previous issue was "Don't recommend this as a use case".
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: Yang, Ziye
> Sent: Friday, December 6, 2019 2:58 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: RE: [SPDK] nvmeof to localhost will hung forever
>
> Hi Peng Yu,
>
> We currently do not support running the target and initiator in the same process instance.
> Adding an async poller is not acceptable in the low-level nvme transport library. See the previously reported spdk issue below; it is the same as yours.
>
> https://github.com/spdk/spdk/issues/587
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: peng yu <yupeng0921(a)gmail.com>
> Sent: Friday, December 6, 2019 2:50 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: [SPDK] nvmeof to localhost will hung forever
>
> Below are the steps to reproduce the issue:
>
> (1) run a spdk application, e.g.:
> sudo ./app/spdk_tgt/spdk_tgt
>
> (2) run the nvmeof target part commands:
> sudo ./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
> sudo ./scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
> sudo ./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
> sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
> sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420
>
> (3) run the nvmeof initiator part command:
> sudo ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a 127.0.0.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1
>
> The bdev_nvme_attach_controller command hangs forever. I found that the problem is in the nvme_tcp_qpair_icreq_send function; spdk gets stuck in the code below:
>
>     while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
>         nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
>     }
>
> The while loop never finishes. The nvme_tcp_qpair_process_completions function tries to receive a response from the target, but the target is the same spdk application, and since the application is spinning in the above while loop, the nvmeof target part of the code never gets a chance to send a response.
>
> Is it possible to replace the while loop with a poller? We could add a callback function and let the poller call it once tqpair->state is no longer NVME_TCP_QPAIR_STATE_INVALID. Does that make sense?
> _______________________________________________
> SPDK mailing list -- spdk(a)lists.01.org
> To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [SPDK] Re: nvmeof to localhost will hung forever
@ 2019-12-06  7:06 Yang, Ziye
  0 siblings, 0 replies; 6+ messages in thread
From: Yang, Ziye @ 2019-12-06  7:06 UTC (permalink / raw)
  To: spdk

By the way, this issue affects not only the TCP transport but also the RDMA transport. The conclusion in the previous issue was "Don't recommend this as a use case".




Best Regards
Ziye Yang 

-----Original Message-----
From: Yang, Ziye 
Sent: Friday, December 6, 2019 2:58 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: RE: [SPDK] nvmeof to localhost will hung forever

Hi Peng Yu,

We currently do not support running the target and initiator in the same process instance.
Adding an async poller is not acceptable in the low-level nvme transport library. See the previously reported spdk issue below; it is the same as yours.

https://github.com/spdk/spdk/issues/587




Best Regards
Ziye Yang 

-----Original Message-----
From: peng yu <yupeng0921(a)gmail.com> 
Sent: Friday, December 6, 2019 2:50 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] nvmeof to localhost will hung forever

Below are the steps to reproduce the issue:

(1) run a spdk application, e.g.:
sudo ./app/spdk_tgt/spdk_tgt

(2) run the nvmeof target part commands:
sudo ./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
sudo ./scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
sudo ./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420

(3) run the nvmeof initiator part command:
sudo ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a 127.0.0.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1

The bdev_nvme_attach_controller command hangs forever. I found that the problem is in the nvme_tcp_qpair_icreq_send function; spdk gets stuck in the code below:

    while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
        nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
    }

The while loop never finishes. The nvme_tcp_qpair_process_completions function tries to receive a response from the target, but the target is the same spdk application, and since the application is spinning in the above while loop, the nvmeof target part of the code never gets a chance to send a response.

Is it possible to replace the while loop with a poller? We could add a callback function and let the poller call it once tqpair->state is no longer NVME_TCP_QPAIR_STATE_INVALID. Does that make sense?
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [SPDK] Re: nvmeof to localhost will hung forever
@ 2019-12-06  6:57 Yang, Ziye
  0 siblings, 0 replies; 6+ messages in thread
From: Yang, Ziye @ 2019-12-06  6:57 UTC (permalink / raw)
  To: spdk

Hi Peng Yu,

We currently do not support running the target and initiator in the same process instance.
Adding an async poller is not acceptable in the low-level nvme transport library. See the previously reported spdk issue below; it is the same as yours.

https://github.com/spdk/spdk/issues/587




Best Regards
Ziye Yang 

-----Original Message-----
From: peng yu <yupeng0921(a)gmail.com> 
Sent: Friday, December 6, 2019 2:50 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] nvmeof to localhost will hung forever

Below are the steps to reproduce the issue:

(1) run a spdk application, e.g.:
sudo ./app/spdk_tgt/spdk_tgt

(2) run the nvmeof target part commands:
sudo ./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
sudo ./scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
sudo ./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420

(3) run the nvmeof initiator part command:
sudo ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t tcp -a 127.0.0.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1

The bdev_nvme_attach_controller command hangs forever. I found that the problem is in the nvme_tcp_qpair_icreq_send function; spdk gets stuck in the code below:

    while (tqpair->state == NVME_TCP_QPAIR_STATE_INVALID) {
        nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
    }

The while loop never finishes. The nvme_tcp_qpair_process_completions function tries to receive a response from the target, but the target is the same spdk application, and since the application is spinning in the above while loop, the nvmeof target part of the code never gets a chance to send a response.

Is it possible to replace the while loop with a poller? We could add a callback function and let the poller call it once tqpair->state is no longer NVME_TCP_QPAIR_STATE_INVALID. Does that make sense?
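
For illustration only, here is a sketch of what the poller-based approach suggested above might look like. The icreq_poller and icreq_done_fn/icreq_done_ctx fields are hypothetical names invented for this sketch, and (as noted earlier in the thread) the low-level nvme transport library is meant to stay usable without the SPDK poller framework, so this is not how SPDK actually implements it:

    /* Hypothetical: poll for icreq completion instead of busy-waiting. */
    static int
    icreq_poll(void *ctx)
    {
        struct nvme_tcp_qpair *tqpair = ctx;

        nvme_tcp_qpair_process_completions(&tqpair->qpair, 0);
        if (tqpair->state != NVME_TCP_QPAIR_STATE_INVALID) {
            /* Handshake finished: stop polling and notify the caller. */
            spdk_poller_unregister(&tqpair->icreq_poller);
            tqpair->icreq_done_fn(tqpair->icreq_done_ctx);
        }
        return 1;
    }

    /* Registered in place of the while loop, for example:
     *   tqpair->icreq_poller = spdk_poller_register(icreq_poll, tqpair, 0);
     */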
_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-12-07  0:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-07  0:28 [SPDK] Re: nvmeof to localhost will hung forever Yang, Ziye
  -- strict thread matches above, loose matches on Subject: below --
2019-12-06 17:19 peng yu
2019-12-06  8:02 Yang, Ziye
2019-12-06  7:42 peng yu
2019-12-06  7:06 Yang, Ziye
2019-12-06  6:57 Yang, Ziye
