* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06  7:59 Valeriy Glushkov
  0 siblings, 0 replies; 21+ messages in thread
From: Valeriy Glushkov @ 2019-02-06  7:59 UTC (permalink / raw)
  To: spdk


Hi Jim,

Our module is an implementation of the NVMeoF/RDMA host.

It works well with SPDK 18.10.1, so the problem seems to be related to the
SPDK 19.01 code.

I can see that the RDMA request state machine has been changed in the
recent SPDK release.
So it would be great if the author of the modifications could take a look
at the issue...

Thank you for your help!

-- 
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Harris, James R <james.r.harris(a)intel.com> wrote in his message of Wed, 06
Feb 2019 02:35:38 +0200:

> Hi Valeriy,
>
> Can you provide more details on this test host module?
>
> Thanks,
>
> Jim
>
>> On Feb 5, 2019, at 4:41 AM, Valeriy Glushkov  
>> <valeriy.glushkov(a)starwind.com> wrote:
>>
>> Hi All!
>>
>> It seems there is an issue in the 19.01 code that leaves a request
>> unanswered, so our test NVMeoF/RDMA host module does not receive
>> the completion.
>>
>> From the debug log of the target:
>> =======
>> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20  
>> entering state 1
>> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20  
>> entering state 2
>> rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request  
>> 0x1402a20 took 1 buffer/s from central pool
>> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20  
>> entering state 5
>> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0  
>> cid 2751 nsid 4294967295 cdw10 0x001e0002
>> request.c: 127:nvmf_trace_command: *DEBUG*: psdt 0
>> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr  
>> 0xffffb28f67223dd0 key 0x8294c len 0x200
>> ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page:  
>> LID=0x02 offset=0x0 len=0x7c
>> request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2751  
>> cdw0=0x00000000 rsvd1=0 status=0x0000
>> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20  
>> entering state 7
>> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20  
>> entering state 8
>> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20  
>> entering state 8
>> ========
>>
>> The same host module works well with the SPDK 18.10.1 NVMf target.
>>
>> Is this a bug in the recent SPDK code?
>>
>> --
>> Best regards,
>> Valeriy Glushkov
>> www.starwind.com
>> valeriy.glushkov(a)starwind.com
>>

	




* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-28 22:07 Valeriy Glushkov
  0 siblings, 0 replies; 21+ messages in thread
From: Valeriy Glushkov @ 2019-02-28 22:07 UTC (permalink / raw)
  To: spdk


Hi Seth,

Thank you for the update!
The NVMf target works for me now.

-- 
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote in his message of Wed, 27 Feb
2019 17:00:47 +0200:

> Hi Valeriy,
>
> This was a really stupid mistake on my part. I moved a decrement up
> without moving up the assert that accompanied it (in my original
> fix). Please see this one-liner, which should fix it. I will also
> backport it to 19.01.1. The silver lining here is that there is nothing
> functionally wrong with the code; the assert was simply erroneous and
> only fires when SPDK is built in debug mode.
>
> https://review.gerrithub.io/c/spdk/spdk/+/446440/1/lib/nvmf/rdma.c
>
> Thank you for replying and pointing this out,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy  
> Glushkov
> Sent: Tuesday, February 26, 2019 3:21 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Hi Seth,
>
> It seems that some problem is still present in the NVMf target built
> from the recent SPDK trunk.
> It crashes with a core dump under our performance test.
>
> ===========
> # Starting SPDK v19.04-pre / DPDK 18.11.0 initialization...
> [ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --log-level=lib.eal:6
> --base-virtaddr=0x200000000000 --file-prefix=spdk_pid131804 ]
> EAL: No free hugepages reported in hugepages-2048kB
> EAL: 2 hugepages of size 1073741824 reserved, but no mounted hugetlbfs  
> found for that size
> app.c: 624:spdk_app_start: *NOTICE*: Total cores available: 1
> reactor.c: 233:_spdk_reactor_run: *NOTICE*: Reactor started on core 0
> rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002458d000
> length=4096
> rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002464b000
> length=4096
> rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245ae000
> length=4096
> rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000
> length=4096
> rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000
> length=4096
> rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245a8000
> length=4096
> nvmf_tgt: rdma.c:2789: spdk_nvmf_rdma_poller_poll: Assertion  
> `rdma_req->num_outstanding_data_wr > 0' failed.
> ^C
> [1]+  Aborted                 (core dumped) ./nvmf_tgt -c ./nvmf.conf
> =========
>
> Do you need the dump file or some additional info?
>
> --
> Best regards,
> Valeriy Glushkov
> www.starwind.com
> valeriy.glushkov(a)starwind.com
>

	




* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-27 15:04 Howell, Seth
  0 siblings, 0 replies; 21+ messages in thread
From: Howell, Seth @ 2019-02-27 15:04 UTC (permalink / raw)
  To: spdk


Hi Lance,

The data_wr tracker was not added until 19.01 and was not backported to 18.10.1, so this issue won't occur there.

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Lance Hartmann ORACLE
Sent: Tuesday, February 26, 2019 4:45 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,


> On Feb 7, 2019, at 12:18 PM, Howell, Seth <seth.howell(a)intel.com> wrote:
> 
> Hi Sasha, Valeriy,
> 
> With the help of Valeriy's logs I was able to get to the bottom of this. The root cause is that for NVMe-oF requests that don't transfer any data, such as keep_alive, we were not properly resetting the value of rdma_req->num_outstanding_data_wr between uses of that structure. All data carrying operations properly reset this value in spdk_nvmf_rdma_req_parse_sgl. 
> 
> My local repro steps look like this for anyone interested.
> 
> Start the SPDK target,
> Submit a full queue depth worth of Smart log requests (sequentially is fine). A smaller number also works, but takes much longer.
> Wait for a while (This assumes you have keep alive enabled). Keep alive requests will reuse the rdma_req objects slowly incrementing the curr_send_depth on the admin qpair.
> Eventually the admin qpair will be unable to submit I/O.
> 
> I was able to fix the issue locally with the following patch. https://review.gerrithub.io/#/c/spdk/spdk/+/443811/. Valeriy, please let me know if applying this patch also fixes it for you ( I am pretty sure that it will).


Does this issue present only in 19.01, or can it also occur in 18.10.1?

--
Lance Hartmann



* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-27 15:00 Howell, Seth
  0 siblings, 0 replies; 21+ messages in thread
From: Howell, Seth @ 2019-02-27 15:00 UTC (permalink / raw)
  To: spdk


Hi Valeriy,

This was a really stupid mistake on my part. I moved a decrement up without moving up the assert that accompanied it (in my original fix). Please see this one-liner, which should fix it. I will also backport it to 19.01.1. The silver lining here is that there is nothing functionally wrong with the code; the assert was simply erroneous and only fires when SPDK is built in debug mode.

https://review.gerrithub.io/c/spdk/spdk/+/446440/1/lib/nvmf/rdma.c
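
For anyone following along, the shape of the mistake in hypothetical code (illustrative only, not the actual SPDK diff linked above):

#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the request structure; only the field
 * discussed in this thread is shown. */
struct rdma_req_stub {
    uint32_t num_outstanding_data_wr;
};

/* Buggy ordering: the decrement was moved up, but the assert that
 * accompanied it was not, so it now fires once the counter reaches 0. */
static void data_wr_completed_buggy(struct rdma_req_stub *req)
{
    req->num_outstanding_data_wr--;
    assert(req->num_outstanding_data_wr > 0);   /* trips in debug builds */
}

/* Fixed ordering: check the counter before decrementing it. */
static void data_wr_completed_fixed(struct rdma_req_stub *req)
{
    assert(req->num_outstanding_data_wr > 0);
    req->num_outstanding_data_wr--;
}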

Thank you for replying and pointing this out,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Tuesday, February 26, 2019 3:21 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,

> It seems that some problem is still present in the NVMf target built from the recent SPDK trunk.
> It crashes with a core dump under our performance test.

===========
# Starting SPDK v19.04-pre / DPDK 18.11.0 initialization...
[ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --log-level=lib.eal:6
--base-virtaddr=0x200000000000 --file-prefix=spdk_pid131804 ]
EAL: No free hugepages reported in hugepages-2048kB
EAL: 2 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
app.c: 624:spdk_app_start: *NOTICE*: Total cores available: 1
reactor.c: 233:_spdk_reactor_run: *NOTICE*: Reactor started on core 0
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002458d000
length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002464b000
length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245ae000
length=4096
rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000
length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000
length=4096
rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245a8000
length=4096
nvmf_tgt: rdma.c:2789: spdk_nvmf_rdma_poller_poll: Assertion `rdma_req->num_outstanding_data_wr > 0' failed.
^C
[1]+  Aborted                 (core dumped) ./nvmf_tgt -c ./nvmf.conf
=========

Do you need the dump file or some additional info?

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote in his message of Thu, 07 Feb
2019 20:18:28 +0200:

> Hi Sasha, Valeriy,
>
> With the help of Valeriy's logs I was able to get to the bottom of this.  
> The root cause is that for NVMe-oF requests that don't transfer any 
> data, such as keep_alive, we were not properly resetting the value of 
> rdma_req->num_outstanding_data_wr between uses of that structure. All 
> data carrying operations properly reset this value in 
> spdk_nvmf_rdma_req_parse_sgl.
>
> My local repro steps look like this for anyone interested.
>
> Start the SPDK target,
> Submit a full queue depth worth of Smart log requests (sequentially is 
> fine). A smaller number also works, but takes much longer.
> Wait for a while (This assumes you have keep alive enabled). Keep 
> alive requests will reuse the rdma_req objects slowly incrementing the 
> curr_send_depth on the admin qpair.
> Eventually the admin qpair will be unable to submit I/O.
>
> I was able to fix the issue locally with the following patch.  
> https://review.gerrithub.io/#/c/spdk/spdk/+/443811/. Valeriy, please 
> let me know if applying this patch also fixes it for you ( I am pretty 
> sure that it will).
>
> Thank you for the bug report and for all of your help,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha 
> Kotchubievsky
> Sent: Thursday, February 7, 2019 11:06 AM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Hi,
>
> RNR value shouldn't affect NVMF. I just want to check if NVMF prepost 
> enough receive requests.  19.10 introduced some new way for flow 
> control and count number of send and receive work requests. Probably, 
> NVMF doesn't pre-post enough requests.
>
> Which network do you use : IB or ROcE? What it is you HW and SW stack 
> in host and in target sides? (OS, OFED/MOFED version, NIC type)
>
> I'd suggest to configure NVMF with big max queue depth, and in your 
> test actually use a half of the value.
>
> On 2/7/2019 5:37 PM, Valeriy Glushkov wrote:
>> Hi Sasha,
>>
>> There is no IBV on the host side, it's Windows.
>> So we have no control over the RNR field.
>>
>>   From a RDMA session's dump I can see that the initiator sets 
>> infiniband.cm.req.rnrretrcount to 0x6.
>>
>> Could the RNR value be related to the problem we have with SPDK 19.01 
>> NVMeoF target?
>>

	



* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-26 23:45 Lance Hartmann ORACLE
  0 siblings, 0 replies; 21+ messages in thread
From: Lance Hartmann ORACLE @ 2019-02-26 23:45 UTC (permalink / raw)
  To: spdk


Hi Seth,


> On Feb 7, 2019, at 12:18 PM, Howell, Seth <seth.howell(a)intel.com> wrote:
> 
> Hi Sasha, Valeriy,
> 
> With the help of Valeriy's logs I was able to get to the bottom of this. The root cause is that for NVMe-oF requests that don't transfer any data, such as keep_alive, we were not properly resetting the value of rdma_req->num_outstanding_data_wr between uses of that structure. All data carrying operations properly reset this value in spdk_nvmf_rdma_req_parse_sgl. 
> 
> My local repro steps look like this for anyone interested.
> 
> Start the SPDK target,
> Submit a full queue depth worth of Smart log requests (sequentially is fine). A smaller number also works, but takes much longer.
> Wait for a while (This assumes you have keep alive enabled). Keep alive requests will reuse the rdma_req objects slowly incrementing the curr_send_depth on the admin qpair.
> Eventually the admin qpair will be unable to submit I/O.
> 
> I was able to fix the issue locally with the following patch. https://review.gerrithub.io/#/c/spdk/spdk/+/443811/. Valeriy, please let me know if applying this patch also fixes it for you ( I am pretty sure that it will).


Does this issue present only in 19.01, or can it also occur in 18.10.1?

--
Lance Hartmann




* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-26 22:21 Valeriy Glushkov
  0 siblings, 0 replies; 21+ messages in thread
From: Valeriy Glushkov @ 2019-02-26 22:21 UTC (permalink / raw)
  To: spdk


Hi Seth,

It seems that some problem is still present in the NVMf target built from
the recent SPDK trunk.
It crashes with a core dump under our performance test.

===========
# Starting SPDK v19.04-pre / DPDK 18.11.0 initialization...
[ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --log-level=lib.eal:6  
--base-virtaddr=0x200000000000 --file-prefix=spdk_pid131804 ]
EAL: No free hugepages reported in hugepages-2048kB
EAL: 2 hugepages of size 1073741824 reserved, but no mounted hugetlbfs  
found for that size
app.c: 624:spdk_app_start: *NOTICE*: Total cores available: 1
reactor.c: 233:_spdk_reactor_run: *NOTICE*: Reactor started on core 0
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002458d000  
length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002464b000  
length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245ae000  
length=4096
rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000  
length=4096
rdma.c:2758:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x20002457e000  
length=4096
rdma.c:2786:spdk_nvmf_rdma_poller_poll: *ERROR*: data=0x2000245a8000  
length=4096
nvmf_tgt: rdma.c:2789: spdk_nvmf_rdma_poller_poll: Assertion  
`rdma_req->num_outstanding_data_wr > 0' failed.
^C
[1]+  Aborted                 (core dumped) ./nvmf_tgt -c ./nvmf.conf
=========

Do you need the dump file or some additional info?

-- 
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote in his message of Thu, 07 Feb
2019 20:18:28 +0200:

> Hi Sasha, Valeriy,
>
> With the help of Valeriy's logs I was able to get to the bottom of this.  
> The root cause is that for NVMe-oF requests that don't transfer any  
> data, such as keep_alive, we were not properly resetting the value of  
> rdma_req->num_outstanding_data_wr between uses of that structure. All  
> data carrying operations properly reset this value in  
> spdk_nvmf_rdma_req_parse_sgl.
>
> My local repro steps look like this for anyone interested.
>
> Start the SPDK target,
> Submit a full queue depth worth of Smart log requests (sequentially is  
> fine). A smaller number also works, but takes much longer.
> Wait for a while (This assumes you have keep alive enabled). Keep alive  
> requests will reuse the rdma_req objects slowly incrementing the  
> curr_send_depth on the admin qpair.
> Eventually the admin qpair will be unable to submit I/O.
>
> I was able to fix the issue locally with the following patch.  
> https://review.gerrithub.io/#/c/spdk/spdk/+/443811/. Valeriy, please let  
> me know if applying this patch also fixes it for you ( I am pretty sure  
> that it will).
>
> Thank you for the bug report and for all of your help,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha  
> Kotchubievsky
> Sent: Thursday, February 7, 2019 11:06 AM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Hi,
>
> RNR value shouldn't affect NVMF. I just want to check if NVMF prepost  
> enough receive requests.  19.10 introduced some new way for flow control  
> and count number of send and receive work requests. Probably, NVMF  
> doesn't pre-post enough requests.
>
> Which network do you use : IB or ROcE? What it is you HW and SW stack in  
> host and in target sides? (OS, OFED/MOFED version, NIC type)
>
> I'd suggest to configure NVMF with big max queue depth, and in your test  
> actually use a half of the value.
>
> On 2/7/2019 5:37 PM, Valeriy Glushkov wrote:
>> Hi Sasha,
>>
>> There is no IBV on the host side, it's Windows.
>> So we have no control over the RNR field.
>>
>>   From a RDMA session's dump I can see that the initiator sets
>> infiniband.cm.req.rnrretrcount to 0x6.
>>
>> Could the RNR value be related to the problem we have with SPDK 19.01
>> NVMeoF target?
>>

	




* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-07 18:18 Howell, Seth
  0 siblings, 0 replies; 21+ messages in thread
From: Howell, Seth @ 2019-02-07 18:18 UTC (permalink / raw)
  To: spdk


Hi Sasha, Valeriy,

With the help of Valeriy's logs I was able to get to the bottom of this. The root cause is that for NVMe-oF requests that don't transfer any data, such as keep_alive, we were not properly resetting the value of rdma_req->num_outstanding_data_wr between uses of that structure. All data carrying operations properly reset this value in spdk_nvmf_rdma_req_parse_sgl. 

My local repro steps look like this for anyone interested.

1. Start the SPDK target.
2. Submit a full queue depth worth of SMART log requests (sequentially is fine). A smaller number also works, but takes much longer.
3. Wait for a while (this assumes you have keep alive enabled). Keep alive requests will reuse the rdma_req objects, slowly incrementing the curr_send_depth on the admin qpair.
4. Eventually the admin qpair will be unable to submit I/O.

I was able to fix the issue locally with the following patch: https://review.gerrithub.io/#/c/spdk/spdk/+/443811/. Valeriy, please let me know if applying this patch also fixes it for you (I am pretty sure that it will).
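
The idea behind the fix, sketched with hypothetical names (the actual change is the patch linked above; this is only an illustration):

#include <stdint.h>

/* Hypothetical stand-in for the per-request structure discussed here. */
struct rdma_req_stub {
    uint32_t num_outstanding_data_wr;
};

/* Sketch of the idea: reset the counter whenever the request object is
 * recycled, not only in the SGL-parsing path, so a zero-data command
 * (e.g. keep_alive) that reuses the object starts from zero instead of
 * inheriting the previous request's value. */
static void rdma_request_reset_stub(struct rdma_req_stub *req)
{
    req->num_outstanding_data_wr = 0;
}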

Thank you for the bug report and for all of your help,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
Sent: Thursday, February 7, 2019 11:06 AM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi,

The RNR value shouldn't affect NVMf. I just want to check whether NVMf pre-posts enough receive requests. 19.01 introduced a new flow-control scheme that counts the number of send and receive work requests. Probably NVMf doesn't pre-post enough requests.

Which network do you use: IB or RoCE? What are your HW and SW stacks on the host and target sides (OS, OFED/MOFED version, NIC type)?

I'd suggest configuring NVMf with a large max queue depth and, in your test, actually using half of that value.

On 2/7/2019 5:37 PM, Valeriy Glushkov wrote:
> Hi Sasha,
>
> There is no IBV on the host side, it's Windows.
> So we have no control over the RNR field.
>
>   From a RDMA session's dump I can see that the initiator sets 
> infiniband.cm.req.rnrretrcount to 0x6.
>
> Could the RNR value be related to the problem we have with SPDK 19.01 
> NVMeoF target?
>

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-07 18:05 Sasha Kotchubievsky
  0 siblings, 0 replies; 21+ messages in thread
From: Sasha Kotchubievsky @ 2019-02-07 18:05 UTC (permalink / raw)
  To: spdk


Hi,

The RNR value shouldn't affect NVMf. I just want to check whether NVMf
pre-posts enough receive requests. 19.01 introduced a new flow-control
scheme that counts the number of send and receive work requests. Probably
NVMf doesn't pre-post enough requests.

Which network do you use: IB or RoCE? What are your HW and SW stacks on
the host and target sides (OS, OFED/MOFED version, NIC type)?

I'd suggest configuring NVMf with a large max queue depth and, in your
test, actually using half of that value.
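
For context, pre-posting receives on a queue pair follows the usual libibverbs pattern; a generic illustration only, not how SPDK itself does it:

#include <stddef.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Post 'depth' receive work requests up front so the peer does not hit
 * receiver-not-ready (RNR).  'buf' is a registered region of
 * depth * buf_size bytes and 'mr' is its memory registration. */
static int prepost_recvs(struct ibv_qp *qp, struct ibv_mr *mr,
                         char *buf, uint32_t buf_size, int depth)
{
    for (int i = 0; i < depth; i++) {
        struct ibv_sge sge = {
            .addr   = (uint64_t)(uintptr_t)(buf + (size_t)i * buf_size),
            .length = buf_size,
            .lkey   = mr->lkey,
        };
        struct ibv_recv_wr wr = {
            .wr_id   = (uint64_t)i,
            .sg_list = &sge,
            .num_sge = 1,
        };
        struct ibv_recv_wr *bad_wr = NULL;

        if (ibv_post_recv(qp, &wr, &bad_wr)) {
            return -1;  /* receive queue full or QP in the wrong state */
        }
    }
    return 0;
}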

On 2/7/2019 5:37 PM, Valeriy Glushkov wrote:
> Hi Sasha,
>
> There is no IBV on the host side, it's Windows.
> So we have no control over the RNR field.
>
>   From a RDMA session's dump I can see that the initiator sets
> infiniband.cm.req.rnrretrcount to 0x6.
>
> Could the RNR value be related to the problem we have with SPDK 19.01
> NVMeoF target?
>


* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-07 16:59 Howell, Seth
  0 siblings, 0 replies; 21+ messages in thread
From: Howell, Seth @ 2019-02-07 16:59 UTC (permalink / raw)
  To: spdk


Hi Valeriy,

I have created a patch with the debug prints I would like to see and uploaded it to GerritHub. Please run the following from your spdk repo while you have v19.01 checked out:

git fetch "https://review.gerrithub.io/spdk/spdk" refs/changes/01/443801/1 && git cherry-pick FETCH_HEAD

Then, if you could rerun your test and post the results back here, that would be very helpful.

Thanks,

Seth
-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Thursday, February 7, 2019 8:38 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Sasha,

There is no IBV on the host side, it's Windows.
So we have no control over the RNR field.

 From a RDMA session's dump I can see that the initiator sets infiniband.cm.req.rnrretrcount to 0x6.

Could the RNR value be related to the problem we have with SPDK 19.01 NVMeoF target?

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Sasha Kotchubievsky <sashakot(a)dev.mellanox.co.il> wrote in his message of Thu, 07 Feb 2019 16:31:33 +0200:

> Hi,
>
> What "rnr count" value you set in initiator side at connect to NVMF? 
> Do you see**any *IBV_WC_RNR_RETRY_EXC_ERR *error in initiator side?
>
> Best regards
>
> Sasha
>
>
> On 2/7/2019 6:50 AM, Valeriy Glushkov wrote:
>> Hi Seth,
>>
>> It would be great if you send me a modified source with changes you 
>> need or a patch and I add it to the SPDK build, reproduce the case 
>> and send you the resulting debug log.
>>
>> It seems that the Log page request is okay itself as there are many 
>> such requests before in the log...
>>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


	



* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-07 15:37 Valeriy Glushkov
  0 siblings, 0 replies; 21+ messages in thread
From: Valeriy Glushkov @ 2019-02-07 15:37 UTC (permalink / raw)
  To: spdk


Hi Sasha,

There is no IBV on the host side; it's Windows.
So we have no control over the RNR field.

From an RDMA session dump I can see that the initiator sets
infiniband.cm.req.rnrretrcount to 0x6.

Could the RNR value be related to the problem we have with the SPDK 19.01
NVMeoF target?

-- 
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Sasha Kotchubievsky <sashakot(a)dev.mellanox.co.il> wrote in his message of
Thu, 07 Feb 2019 16:31:33 +0200:

> Hi,
>
> What "rnr count" value you set in initiator side at connect to NVMF? Do  
> you see**any *IBV_WC_RNR_RETRY_EXC_ERR *error in initiator side?
>
> Best regards
>
> Sasha
>
>
> On 2/7/2019 6:50 AM, Valeriy Glushkov wrote:
>> Hi Seth,
>>
>> It would be great if you send me a modified source with changes you need
>> or a patch and I add it to the SPDK build, reproduce the case and send  
>> you
>> the resulting debug log.
>>
>> It seems that the Log page request is okay itself as there are many such
>> requests before in the log...
>>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


	




* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-07 14:31 Sasha Kotchubievsky
  0 siblings, 0 replies; 21+ messages in thread
From: Sasha Kotchubievsky @ 2019-02-07 14:31 UTC (permalink / raw)
  To: spdk


Hi,

What "rnr count" value you set in initiator side at connect to NVMF? Do 
you see**any *IBV_WC_RNR_RETRY_EXC_ERR *error in initiator side?

Best regards

Sasha


On 2/7/2019 6:50 AM, Valeriy Glushkov wrote:
> Hi Seth,
>
> It would be great if you send me a modified source with changes you need
> or a patch and I add it to the SPDK build, reproduce the case and send you
> the resulting debug log.
>
> It seems that the Log page request is okay itself as there are many such
> requests before in the log...
>


* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-07  4:50 Valeriy Glushkov
  0 siblings, 0 replies; 21+ messages in thread
From: Valeriy Glushkov @ 2019-02-07  4:50 UTC (permalink / raw)
  To: spdk


Hi Seth,

It would be great if you could send me a modified source with the changes
you need, or a patch; I will add it to the SPDK build, reproduce the case,
and send you the resulting debug log.

It seems that the Log Page request itself is okay, as there are many such
requests earlier in the log...

-- 

Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote in his message of Thu, 07 Feb
2019 01:03:06 +0200:

> Hi Valeriy,
>
> Thanks for sending the full log. At the surface, everything looks how I  
> would expect it to. I am going to try to replicate your I/O patterns  
> here locally to see if I can repro this with another initiator. In the  
> meantime, can you add debug statements to all of the places where we  
> increment and decrement the current_send_depth value and run the test  
> again?
>
> Also, does it always fail on a get_log_page command?
>
> Thanks,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
> Sent: Wednesday, February 6, 2019 1:28 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Thanks Valeriy,
>
> I agree with you that there is something going wrong with the  
> current_send_depth value.
> My first thought with something like this is usually that we are doing  
> our math wrong and we are underflowing. But the value being reported is  
> 62 which implies that we have simply submitted 62 send_wrs without  
> reaping any on the completion queue. Underflow would probably result in  
> a number closer to 2^32-1. This leads me to believe we are blocking the  
> decrement in some way.
>
> In order for a blockage like this to occur, one of two things has to be  
> happening:
> 1. The SENDs are not properly being posted to the completion queue.  
> (This is unlikely since we add the flag IBV_SEND_SIGNALED to all of the  
> work requests. The spec says they will definitely get completed back to  
> the completion queue in that case) 2. We aren't properly decrementing  
> the current_send_depth when we get a completion (Also very unlikely. You  
> can inspect the code in spdk_nvmf_rdma_poller_poll to see that for every  
> completion that is classified as a send we decrement current_send_depth  
> in both the standard and error paths).
>
> Can you please send me the full debug log? I want to look at all of the  
> commands previous to the failed one to see if they are returning to  
> state 0, which would imply that we are getting the send completions back.
> Something else that would be really helpful is if you printed out the  
> value of current_send_depth each time we increment or decrement it. That  
> would help me see exactly what's missing here.
>
> Just a few more questions to establish a baseline - 1. What kind of a  
> NIC are you using?
> 2. Have you made any changes to the target code on top of 19.01? i.e.  
> applied any custom patches?
>
> Thanks,
>
> Seth
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy  
> Glushkov
> Sent: Wednesday, February 6, 2019 11:15 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
> Hi Seth,
>
> thank you for your great explanation of the changes you have done!
>
> I added some debug prints to the target code as you requested.
>
> Here is a part of the target's debug log with the last Admin queue
> requests:
> =======
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 1
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 5
> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0  
> cid
> 2815 nsid 0 cdw10 0x00000006
> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0  
> len 0x0
> request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815
> cdw0=0x00000001 rsvd1=0 status=0x0000
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 7
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 9
> rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv:
> 0xfdd2e8 Connection: 0xf04970
> rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request:
> 0xff14e0 Connection: 0xf04970
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 11
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 11
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 12
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
> entering state 0 ...
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
> entering state 1
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
> entering state 2
> rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20  
> took 1 buffer/s from central pool
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
> entering state 5
> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0  
> cid
> 2847 nsid 4294967295 cdw10 0x001e0002
> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr
> 0xffffe483a781fdd0 key 0x82929 len 0x200
> ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page:  
> LID=0x02
> offset=0x0 len=0x7c
> request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847
> cdw0=0x00000000 rsvd1=0 status=0x0000
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
> entering state 7
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
> entering state 8
> rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs  
> outstanding (max_send_depth 63). Waiting until some finish.
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
> entering state 8
> rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs  
> outstanding (max_send_depth 63). Waiting until some finish.
> =======
>
> The Admin queue depth = 32.
>
> It seems that there is an issue with the rqpair->current_send_depth  
> counter as the request was the only one in the queue at the moment.
>
> I could send you the complete debug log, if you need.
>
> Thank you for your help.
>
> --
> Best regards,
> Valeriy Glushkov
> www.starwind.com
> valeriy.glushkov(a)starwind.com
>
> Howell, Seth <seth.howell(a)intel.com> wrote in his message of Wed, 06 Feb
> 2019 18:03:49 +0200:
>
>> Hi Valeriy,
>>
>> I made the changes that you mentioned to the RDMA request state machine.
>> Here is a little background on those changes.
>> Those changes were part of an effort to better approach the limits of
>> work requests that can be posted at once to the send queue. Previously
>> we were limiting both RDMA reads and RDMA writes to the same limit.
>> That is the one indicated by device_attr.qp_rd_atom_max. This value
>> only controls rdma READ operations, and write operations should be
>> governed by the size of the send queue. So we separated the state
>> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states,
>> one for controller to host transfers, and the other for host to
>> controller transfers.
>> This was useful in two ways. Firstly, it allowed us to loosen
>> restrictions on rdma sends, and secondly it allowed us to make the
>> states of the rdma request a linear chain that can be followed through
>> from start to finish.
>> It looks like your I/O is getting stuck in state 8, or
>> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from
>> your stacktrace that the I/O enters that state at least twice, which
>> is normal. If there is more than one I/O queued in that state, we
>> operate on them in a fifo manner. Also, if there are currently too
>> many outstanding send wrs, we wait until we receive some completions
>> on the send queue to continue. Every time we get a completion, we will
>> poll through this list to try to submit more SENDs.
>>
>> I have a few follow up questions about your configuration and
>> information we will need to be able to help:
>>
>> 1. What is the queue depth of I/O you are submitting to the target?
>> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed
>> read/write workload?
>> 3. Your stacktrace shows the I/O enter state 8 twice, does this trend
>> continue forever until you get a timeout on the I/O?
>> 4. Can you add two extra debug prints to the target, one to each if
>> statement inside of state
>> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we
>> are staying in that state. In those debug prints, it would be useful
>> if you could include the following information:
>> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr,  
>> rqpair->max_send_depth.
>>
>> Some other information that may be useful:
>> How many cores are you running the SPDK target on? How many devices
>> and connections do you have when this hits?
>>
>> This information will help us understand better why your I/O aren't
>> making forward progress. I am really interested to see what is going
>> on here since we don't have this issue on either the kernel or SPDK
>> initiators, and the changes we made should have loosened up the
>> requirements for sending I/O.
>>
>> Thanks,
>>
>> Seth Howell
>>
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris,
>> James R
>> Sent: Wednesday, February 6, 2019 7:55 AM
>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>>
>>
>>
>> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov"
>> <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com>
>> wrote:
>>
>>     Hi Jim,
>>    Our module is an implementation of the NVMeoF/RDMA host.
>>    It works with SPDK 18.10.1 well, so the problem seems to be related
>> to the
>>     SPDK 19.01 code.
>>    I can see that the RDMA request's state engine have been changed in
>> the
>>     recent SPDK release.
>>     So it would be great if the author of the modifications could take
>> a look
>>     at the issue...
>>    Thank you for your help!
>>
>>
>> Hi Valeriy,
>>
>> Can you provide detailed information about what your host module is
>> doing to induce this behavior?  Our tests with the Linux kernel host
>> driver and the SPDK host driver do not seem to be hitting this problem.
>>
>> Thanks,
>>
>> -Jim
>>
>>
>

	




* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06 23:06 Harris, James R
  0 siblings, 0 replies; 21+ messages in thread
From: Harris, James R @ 2019-02-06 23:06 UTC (permalink / raw)
  To: spdk


Could you also (as an experiment) try reading the full 512 bytes of the SMART/Health Information log page from your host driver?  Currently it looks like it's trying to read just the first 124 bytes.
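
For reference, a sketch of the cdw10 arithmetic involved (generic NVMe Get Log Page encoding, not tied to any particular host driver):

#include <stdint.h>

/* Get Log Page encoding for LID 0x02 (SMART / Health Information) reading
 * the full 512-byte page.  NUMD is a 0-based dword count, so
 * 512 / 4 - 1 = 127 (0x7f), giving cdw10 = 0x007f0002.  The 124-byte read
 * seen in the debug log corresponds to NUMD = 0x1e, i.e. cdw10 0x001e0002. */
static uint32_t smart_log_cdw10_full_page(void)
{
    uint32_t lid  = 0x02;
    uint32_t numd = 512 / 4 - 1;

    return (numd << 16) | lid;
}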

On 2/6/19, 4:03 PM, "SPDK on behalf of Howell, Seth" <spdk-bounces(a)lists.01.org on behalf of seth.howell(a)intel.com> wrote:

    Hi Valeriy,
    
    Thanks for sending the full log. At the surface, everything looks how I would expect it to. I am going to try to replicate your I/O patterns here locally to see if I can repro this with another initiator. In the meantime, can you add debug statements to all of the places where we increment and decrement the current_send_depth value and run the test again?
    
    Also, does it always fail on a get_log_page command?
    
    Thanks,
    
    Seth
    
    -----Original Message-----
    From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
    Sent: Wednesday, February 6, 2019 1:28 PM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
    
    Thanks Valeriy,
    
    I agree with you that there is something going wrong with the current_send_depth value.
    My first thought with something like this is usually that we are doing our math wrong and we are underflowing. But the value being reported is 62 which implies that we have simply submitted 62 send_wrs without reaping any on the completion queue. Underflow would probably result in a number closer to 2^32-1. This leads me to believe we are blocking the decrement in some way.
    
    In order for a blockage like this to occur, one of two things has to be happening:
    1. The SENDs are not properly being posted to the completion queue. (This is unlikely since we add the flag IBV_SEND_SIGNALED to all of the work requests. The spec says they will definitely get completed back to the completion queue in that case) 2. We aren't properly decrementing the current_send_depth when we get a completion (Also very unlikely. You can inspect the code in spdk_nvmf_rdma_poller_poll to see that for every completion that is classified as a send we decrement current_send_depth in both the standard and error paths).
    
    Can you please send me the full debug log? I want to look at all of the commands previous to the failed one to see if they are returning to state 0, which would imply that we are getting the send completions back.
    Something else that would be really helpful is if you printed out the value of current_send_depth each time we increment or decrement it. That would help me see exactly what's missing here.
    
    Just a few more questions to establish a baseline - 1. What kind of a NIC are you using?
    2. Have you made any changes to the target code on top of 19.01? i.e. applied any custom patches?
    
    Thanks,
    
    Seth
    
    -----Original Message-----
    From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
    Sent: Wednesday, February 6, 2019 11:15 AM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
    
    Hi Seth,
    
    thank you for your great explanation of the changes you have done!
    
    I added some debug prints to the target code as you requested.
    
    Here is a part of the target's debug log with the last Admin queue
    requests:
    =======
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
    request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid
    2815 nsid 0 cdw10 0x00000006
    request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
    request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815
    cdw0=0x00000001 rsvd1=0 status=0x0000
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
    rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv:  
    0xfdd2e8 Connection: 0xf04970
    rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request:  
    0xff14e0 Connection: 0xf04970
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0 ...
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
    rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
    request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid
    2847 nsid 4294967295 cdw10 0x001e0002
    request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr
    0xffffe483a781fdd0 key 0x82929 len 0x200
    ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02
    offset=0x0 len=0x7c
    request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847
    cdw0=0x00000000 rsvd1=0 status=0x0000
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
    rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
    rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
    rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
    =======
    
    The Admin queue depth = 32.
    
    It seems that there is an issue with the rqpair->current_send_depth counter as the request was the only one in the queue at the moment.
    
    I could send you the complete debug log, if you need.
    
    Thank you for your help.
    
    --
    Best regards,
    Valeriy Glushkov
    www.starwind.com
    valeriy.glushkov(a)starwind.com
    
    Howell, Seth <seth.howell(a)intel.com> wrote in his message of Wed, 06 Feb
    2019 18:03:49 +0200:
    
    > Hi Valeriy,
    >
    > I made the changes that you mentioned to the RDMA request state machine.  
    > Here is a little background on those changes.
    > Those changes were part of an effort to better approach the limits of 
    > work requests that can be posted at once to the send queue. Previously 
    > we were limiting both RDMA reads and RDMA writes to the same limit.
    > That is the one indicated by device_attr.qp_rd_atom_max. This value 
    > only controls rdma READ operations, and write operations should be 
    > governed by the size of the send queue. So we separated the state 
    > RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states, 
    > one for controller to host transfers, and the other for host to 
    > controller transfers.
    > This was useful in two ways. Firstly, it allowed us to loosen 
    > restrictions on rdma sends, and secondly it allowed us to make the 
    > states of the rdma request a linear chain that can be followed through 
    > from start to finish.
    > It looks like your I/O is getting stuck in state 8, or 
    > RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from 
    > your stacktrace that the I/O enters that state at least twice, which 
    > is normal. If there is more than one I/O queued in that state, we 
    > operate on them in a fifo manner. Also, if there are currently too 
    > many outstanding send wrs, we wait until we receive some completions 
    > on the send queue to continue. Every time we get a completion, we will 
    > poll through this list to try to submit more SENDs.
    >
    > I have a few follow up questions about your configuration and 
    > information we will need to be able to help:
    >
    > 1. What is the queue depth of I/O you are submitting to the target?
    > 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed 
    > read/write workload?
    > 3. Your stacktrace shows the I/O enter state 8 twice, does this trend 
    > continue forever until you get a timeout on the I/O?
    > 4. Can you add two extra debug prints to the target, one to each if 
    > statement inside of state 
    > RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we 
    > are staying in that state. In those debug prints, it would be useful 
    > if you could include the following information:
    > rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
    >
    > Some other information that may be useful:
    > How many cores are you running the SPDK target on? How many devices 
    > and connections do you have when this hits?
    >
    > This information will help us understand better why your I/O aren't 
    > making forward progress. I am really interested to see what is going 
    > on here since we don't have this issue on either the kernel or SPDK 
    > initiators, and the changes we made should have loosened up the 
    > requirements for sending I/O.
    >
    > Thanks,
    >
    > Seth Howell
    >
    >
    > -----Original Message-----
    > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, 
    > James R
    > Sent: Wednesday, February 6, 2019 7:55 AM
    > To: Storage Performance Development Kit <spdk(a)lists.01.org>
    > Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
    >
    >
    >
    > On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov"  
    > <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com>
    > wrote:
    >
    >     Hi Jim,
    >    Our module is an implementation of the NVMeoF/RDMA host.
    >    It works with SPDK 18.10.1 well, so the problem seems to be related 
    > to the
    >     SPDK 19.01 code.
    >    I can see that the RDMA request's state engine have been changed in 
    > the
    >     recent SPDK release.
    >     So it would be great if the author of the modifications could take 
    > a look
    >     at the issue...
    >    Thank you for your help!
    >
    >
    > Hi Valeriy,
    >
    > Can you provide detailed information about what your host module is 
    > doing to induce this behavior?  Our tests with the Linux kernel host 
    > driver and the SPDK host driver do not seem to be hitting this problem.
    >
    > Thanks,
    >
    > -Jim
    >
    >
    
    
    	
    
    

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06 23:03 Howell, Seth
  0 siblings, 0 replies; 21+ messages in thread
From: Howell, Seth @ 2019-02-06 23:03 UTC (permalink / raw)
  To: spdk


Hi Valeriy,

Thanks for sending the full log. On the surface, everything looks how I would expect it to. I am going to try to replicate your I/O patterns here locally to see if I can repro this with another initiator. In the meantime, can you add debug statements to all of the places where we increment and decrement the current_send_depth value and run the test again?

Also, does it always fail on a get_log_page command?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Howell, Seth
Sent: Wednesday, February 6, 2019 1:28 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Thanks Valeriy,

I agree with you that there is something going wrong with the current_send_depth value.
My first thought with something like this is usually that we are doing our math wrong and we are underflowing. But the value being reported is 62 which implies that we have simply submitted 62 send_wrs without reaping any on the completion queue. Underflow would probably result in a number closer to 2^32-1. This leads me to believe we are blocking the decrement in some way.

In order for a blockage like this to occur, one of two things has to be happening:
1. The SENDs are not properly being posted to the completion queue. (This is unlikely since we add the flag IBV_SEND_SIGNALED to all of the work requests. The spec says they will definitely get completed back to the completion queue in that case.)
2. We aren't properly decrementing the current_send_depth when we get a completion. (Also very unlikely. You can inspect the code in spdk_nvmf_rdma_poller_poll to see that for every completion that is classified as a send we decrement current_send_depth in both the standard and error paths.)
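
In generic verbs terms, the bookkeeping being described looks roughly like this (an illustrative sketch, not SPDK's actual poller; error handling elided):

#include <stdint.h>
#include <infiniband/verbs.h>

/* Every signaled send-side WR posted to the QP increments the counter, and
 * every send-side completion reaped from the CQ decrements it.  If either
 * half is skipped, current_send_depth creeps upward until the queue looks
 * full and no further SENDs can be submitted. */
static void poll_send_completions(struct ibv_cq *cq, uint32_t *current_send_depth)
{
    struct ibv_wc wc[32];
    int n = ibv_poll_cq(cq, 32, wc);

    for (int i = 0; i < n; i++) {
        switch (wc[i].opcode) {
        case IBV_WC_SEND:
        case IBV_WC_RDMA_WRITE:
        case IBV_WC_RDMA_READ:
            (*current_send_depth)--;    /* one per signaled WR */
            break;
        default:
            break;                      /* receive completions handled elsewhere */
        }
    }
}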

Can you please send me the full debug log? I want to look at all of the commands previous to the failed one to see if they are returning to state 0, which would imply that we are getting the send completions back.
Something else that would be really helpful is if you printed out the value of current_send_depth each time we increment or decrement it. That would help me see exactly what's missing here.

Just a few more questions to establish a baseline - 1. What kind of a NIC are you using?
2. Have you made any changes to the target code on top of 19.01? i.e. applied any custom patches?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Wednesday, February 6, 2019 11:15 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,

thank you for your great explanation of the changes you have done!

I added some debug prints to the target code as you requested.

Here is a part of the target's debug log with the last Admin queue
requests:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid
2815 nsid 0 cdw10 0x00000006
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815
cdw0=0x00000001 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv:  
0xfdd2e8 Connection: 0xf04970
rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request:  
0xff14e0 Connection: 0xf04970
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0 ...
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid
2847 nsid 4294967295 cdw10 0x001e0002
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr
0xffffe483a781fdd0 key 0x82929 len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02
offset=0x0 len=0x7c
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847
cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
=======

The Admin queue depth = 32.

It seems that there is an issue with the rqpair->current_send_depth counter as the request was the only one in the queue at the moment.

I could send you the complete debug log, if you need.

Thank you for your help.

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote in his message of Wed, 06 Feb
2019 18:03:49 +0200:

> Hi Valeriy,
>
> I made the changes that you mentioned to the RDMA request state machine.  
> Here is a little background on those changes.
> Those changes were part of an effort to better approach the limits of 
> work requests that can be posted at once to the send queue. Previously 
> we were limiting both RDMA reads and RDMA writes to the same limit.
> That is the one indicated by device_attr.qp_rd_atom_max. This value 
> only controls rdma READ operations, and write operations should be 
> governed by the size of the send queue. So we separated the state 
> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states, 
> one for controller to host transfers, and the other for host to 
> controller transfers.
> This was useful in two ways. Firstly, it allowed us to loosen 
> restrictions on rdma sends, and secondly it allowed us to make the 
> states of the rdma request a linear chain that can be followed through 
> from start to finish.
> It looks like your I/O is getting stuck in state 8, or 
> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from 
> your stacktrace that the I/O enters that state at least twice, which 
> is normal. If there is more than one I/O queued in that state, we 
> operate on them in a fifo manner. Also, if there are currently too 
> many outstanding send wrs, we wait until we receive some completions 
> on the send queue to continue. Every time we get a completion, we will 
> poll through this list to try to submit more SENDs.
>
> I have a few follow up questions about your configuration and 
> information we will need to be able to help:
>
> 1. What is the queue depth of I/O you are submitting to the target?
> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed 
> read/write workload?
> 3. Your stacktrace shows the I/O enter state 8 twice, does this trend 
> continue forever until you get a timeout on the I/O?
> 4. Can you add two extra debug prints to the target, one to each if 
> statement inside of state 
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we 
> are staying in that state. In those debug prints, it would be useful 
> if you could include the following information:
> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
>
> Some other information that may be useful:
> How many cores are you running the SPDK target on? How many devices 
> and connections do you have when this hits?
>
> This information will help us understand better why your I/O aren't 
> making forward progress. I am really interested to see what is going 
> on here since we don't have this issue on either the kernel or SPDK 
> initiators, and the changes we made should have loosened up the 
> requirements for sending I/O.
>
> Thanks,
>
> Seth Howell
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, 
> James R
> Sent: Wednesday, February 6, 2019 7:55 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
>
>
> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov"  
> <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com>
> wrote:
>
>     Hi Jim,
>    Our module is an implementation of the NVMeoF/RDMA host.
>    It works with SPDK 18.10.1 well, so the problem seems to be related 
> to the
>     SPDK 19.01 code.
>    I can see that the RDMA request's state engine have been changed in 
> the
>     recent SPDK release.
>     So it would be great if the author of the modifications could take 
> a look
>     at the issue...
>    Thank you for your help!
>
>
> Hi Valeriy,
>
> Can you provide detailed information about what your host module is 
> doing to induce this behavior?  Our tests with the Linux kernel host 
> driver and the SPDK host driver do not seem to be hitting this problem.
>
> Thanks,
>
> -Jim
>
>


	


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06 20:27 Howell, Seth
  0 siblings, 0 replies; 21+ messages in thread
From: Howell, Seth @ 2019-02-06 20:27 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 9419 bytes --]

Thanks Valeriy,

I agree with you that there is something going wrong with the current_send_depth value.
My first thought with something like this is usually that we are doing our math wrong and underflowing. But the value being reported is 62, which implies that we have simply submitted 62 send WRs without reaping any on the completion queue. Underflow would probably result in a number closer to 2^32-1. This leads me to believe the decrement is being blocked in some way.

In order for a blockage like this to occur, one of two things has to be happening:
1. The completions for the SENDs are not being posted to the completion queue. (This is unlikely, since we set the IBV_SEND_SIGNALED flag on all of the work requests, and the spec says they will definitely be completed back to the completion queue in that case.)
2. We aren't properly decrementing current_send_depth when we get a completion. (Also very unlikely: you can inspect the code in spdk_nvmf_rdma_poller_poll to see that for every completion classified as a send, we decrement current_send_depth in both the standard and error paths.)

Can you please send me the full debug log? I want to look at all of the commands prior to the failed one to see whether they are returning to state 0, which would imply that we are getting the send completions back.
Something else that would be really helpful is if you printed out the value of current_send_depth each time we increment or decrement it. That would help me see exactly what's missing here.
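
To make that concrete, here is a tiny standalone model (plain C, not a patch against rdma.c; the struct and helper are invented for illustration) of the kind of per-change print I mean. It also shows why an underflowed 32-bit counter would read back as a number near 2^32-1 rather than 62:

=======
/* Toy model only -- the qpair is reduced to the one counter we care about.
 * Every change to current_send_depth is printed, and the last call
 * demonstrates the underflow signature. */
#include <stdint.h>
#include <stdio.h>

struct toy_qpair {
	uint32_t current_send_depth;
};

static void
send_depth_change(struct toy_qpair *qp, int delta, const char *why)
{
	qp->current_send_depth += (uint32_t)delta;
	printf("send_depth %+d (%s) -> %u\n", delta, why, qp->current_send_depth);
}

int main(void)
{
	struct toy_qpair qp = { 0 };

	send_depth_change(&qp, +2, "posted WRITE+SEND");   /* increment on post */
	send_depth_change(&qp, -2, "reaped completions");  /* decrement on completion */
	send_depth_change(&qp, -1, "spurious decrement");  /* wraps to 4294967295 */
	return 0;
}
=======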

Just a few more questions to establish a baseline -
1. What kind of a NIC are you using?
2. Have you made any changes to the target code on top of 19.01? i.e. applied any custom patches?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Valeriy Glushkov
Sent: Wednesday, February 6, 2019 11:15 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target

Hi Seth,

Thank you for the great explanation of the changes you have made!

I added some debug prints to the target code as you requested.

Here is a part of the target's debug log with the last Admin queue
requests:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid
2815 nsid 0 cdw10 0x00000006
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0 len 0x0
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815
cdw0=0x00000001 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 9
rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv:  
0xfdd2e8 Connection: 0xf04970
rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request:  
0xff14e0 Connection: 0xf04970
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 12
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0 entering state 0 ...
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20 took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid
2847 nsid 4294967295 cdw10 0x001e0002
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr
0xffffe483a781fdd0 key 0x82929 len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02
offset=0x0 len=0x7c
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847
cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20 entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding (max_send_depth 63). Waiting until some finish.
=======

The Admin queue depth = 32.

It seems that there is an issue with the rqpair->current_send_depth counter, as this request was the only one in the queue at that moment.

I can send you the complete debug log if you need it.

Thank you for your help.

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote on Wed, 06 Feb 2019 18:03:49 +0200:

> Hi Valeriy,
>
> I made the changes that you mentioned to the RDMA request state machine.  
> Here is a little background on those changes.
> Those changes were part of an effort to better approach the limits of 
> work requests that can be posted at once to the send queue. Previously 
> we were limiting both RDMA reads and RDMA writes to the same limit. 
> That is the one indicated by device_attr.qp_rd_atom_max. This value 
> only controls rdma READ operations, and write operations should be 
> governed by the size of the send queue. So we separated the state 
> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states, 
> one for controller to host transfers, and the other for host to 
> controller transfers.
> This was useful in two ways. Firstly, it allowed us to loosen 
> restrictions on rdma sends, and secondly it allowed us to make the 
> states of the rdma request a linear chain that can be followed through 
> from start to finish.
> It looks like your I/O is getting stuck in state 8, or 
> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from 
> your stacktrace that the I/O enters that state at least twice, which 
> is normal. If there is more than one I/O queued in that state, we 
> operate on them in a fifo manner. Also, if there are currently too 
> many outstanding send wrs, we wait until we receive some completions 
> on the send queue to continue. Every time we get a completion, we will 
> poll through this list to try to submit more SENDs.
>
> I have a few follow up questions about your configuration and 
> information we will need to be able to help:
>
> 1. What is the queue depth of I/O you are submitting to the target?
> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed 
> read/write workload?
> 3. Your stacktrace shows the I/O enter state 8 twice, does this trend 
> continue forever until you get a timeout on the I/O?
> 4. Can you add two extra debug prints to the target, one to each if 
> statement inside of state 
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we 
> are staying in that state. In those debug prints, it would be useful 
> if you could include the following information: 
> rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
>
> Some other information that may be useful:
> How many cores are you running the SPDK target on? How many devices 
> and connections do you have when this hits?
>
> This information will help us understand better why your I/O aren't 
> making forward progress. I am really interested to see what is going 
> on here since we don't have this issue on either the kernel or SPDK 
> initiators, and the changes we made should have loosened up the 
> requirements for sending I/O.
>
> Thanks,
>
> Seth Howell
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, 
> James R
> Sent: Wednesday, February 6, 2019 7:55 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
>
>
> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov"  
> <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com>
> wrote:
>
>     Hi Jim,
>    Our module is an implementation of the NVMeoF/RDMA host.
>    It works with SPDK 18.10.1 well, so the problem seems to be related 
> to the
>     SPDK 19.01 code.
>    I can see that the RDMA request's state engine have been changed in 
> the
>     recent SPDK release.
>     So it would be great if the author of the modifications could take 
> a look
>     at the issue...
>    Thank you for your help!
>
>
> Hi Valeriy,
>
> Can you provide detailed information about what your host module is 
> doing to induce this behavior?  Our tests with the Linux kernel host 
> driver and the SPDK host driver do not seem to be hitting this problem.
>
> Thanks,
>
> -Jim
>
>


	


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06 18:15 Valeriy Glushkov
  0 siblings, 0 replies; 21+ messages in thread
From: Valeriy Glushkov @ 2019-02-06 18:15 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 7384 bytes --]

Hi Seth,

Thank you for the great explanation of the changes you have made!

I added some debug prints to the target code as you requested.

Here is a part of the target's debug log with the last Admin queue  
requests:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x0a fuse 0 cid  
2815 nsid 0 cdw10 0x00000006
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed: addr 0x0 key 0x0  
len 0x0
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2815  
cdw0=0x00000001 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 9
rdma.c: 978:request_transfer_out: *DEBUG*: RDMA RECV POSTED. Recv:  
0xfdd2e8 Connection: 0xf04970
rdma.c:1001:request_transfer_out: *DEBUG*: RDMA SEND POSTED. Request:  
0xff14e0 Connection: 0xf04970
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 11
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 12
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff14e0  
entering state 0
...
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0xff1a20  
took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid  
2847 nsid 4294967295 cdw10 0x001e0002
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr  
0xffffe483a781fdd0 key 0x82929 len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02  
offset=0x0 len=0x7c
request.c:  92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2847  
cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding  
(max_send_depth 63). Waiting until some finish.
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0xff1a20  
entering state 8
rdma.c:1643:spdk_nvmf_rdma_request_process: *DEBUG*: 62+1 WRs outstanding  
(max_send_depth 63). Waiting until some finish.
=======

The Admin queue depth = 32.

It seems that there is an issue with the rqpair->current_send_depth
counter, as this request was the only one in the queue at that moment.

I can send you the complete debug log if you need it.

Thank you for your help.

-- 
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com

Howell, Seth <seth.howell(a)intel.com> wrote on Wed, 06 Feb 2019 18:03:49 +0200:

> Hi Valeriy,
>
> I made the changes that you mentioned to the RDMA request state machine.  
> Here is a little background on those changes.
> Those changes were part of an effort to better approach the limits of  
> work requests that can be posted at once to the send queue. Previously  
> we were limiting both RDMA reads and RDMA writes to the same limit. That  
> is the one indicated by device_attr.qp_rd_atom_max. This value only  
> controls rdma READ operations, and write operations should be governed  
> by the size of the send queue. So we separated the state  
> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states, one  
> for controller to host transfers, and the other for host to controller  
> transfers.
> This was useful in two ways. Firstly, it allowed us to loosen  
> restrictions on rdma sends, and secondly it allowed us to make the  
> states of the rdma request a linear chain that can be followed through  
> from start to finish.
> It looks like your I/O is getting stuck in state 8, or  
> RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from your  
> stacktrace that the I/O enters that state at least twice, which is  
> normal. If there is more than one I/O queued in that state, we operate  
> on them in a fifo manner. Also, if there are currently too many  
> outstanding send wrs, we wait until we receive some completions on the  
> send queue to continue. Every time we get a completion, we will poll  
> through this list to try to submit more SENDs.
>
> I have a few follow up questions about your configuration and  
> information we will need to be able to help:
>
> 1. What is the queue depth of I/O you are submitting to the target?
> 2. Are you doing a 100% NVMe-oF read workload, or is it a mixed  
> read/write workload?
> 3. Your stacktrace shows the I/O enter state 8 twice, does this trend  
> continue forever until you get a timeout on the I/O?
> 4. Can you add two extra debug prints to the target, one to each if  
> statement inside of state  
> RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we  
> are staying in that state. In those debug prints, it would be useful if  
> you could include the following information: rqpair->current_send_depth,  
> rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
>
> Some other information that may be useful:
> How many cores are you running the SPDK target on? How many devices and  
> connections do you have when this hits?
>
> This information will help us understand better why your I/O aren't  
> making forward progress. I am really interested to see what is going on  
> here since we don't have this issue on either the kernel or SPDK  
> initiators, and the changes we made should have loosened up the  
> requirements for sending I/O.
>
> Thanks,
>
> Seth Howell
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James  
> R
> Sent: Wednesday, February 6, 2019 7:55 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
>
>
>
> On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov"  
> <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com>  
> wrote:
>
>     Hi Jim,
>    Our module is an implementation of the NVMeoF/RDMA host.
>    It works with SPDK 18.10.1 well, so the problem seems to be related  
> to the
>     SPDK 19.01 code.
>    I can see that the RDMA request's state engine have been changed in  
> the
>     recent SPDK release.
>     So it would be great if the author of the modifications could take a  
> look
>     at the issue...
>    Thank you for your help!
>
>
> Hi Valeriy,
>
> Can you provide detailed information about what your host module is  
> doing to induce this behavior?  Our tests with the Linux kernel host  
> driver and the SPDK host driver do not seem to be hitting this problem.
>
> Thanks,
>
> -Jim
>
>


	



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06 16:54 Harris, James R
  0 siblings, 0 replies; 21+ messages in thread
From: Harris, James R @ 2019-02-06 16:54 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4912 bytes --]

Hey Seth,

Looking at Valeriy's original e-mail, this looks like a GET_LOG_PAGE.

request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2751 nsid 4294967295 cdw10 0x001e0002

Admin cmd opc 0x02 = GET LOG PAGE
cdw10 0x001e0002
Log Page Identifier (07:00) == 0x02 (SMART/Health)
Number of Dwords Lower (31:16) == 0x1e == 30 (but since this is 0's based, it actually means 31 dwords)
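
A quick standalone check of that decode (plain C; the bit ranges follow the breakdown above) reproduces the 0x7c length that shows up in the target's log:

=======
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t cdw10 = 0x001e0002;
	uint32_t lid   = cdw10 & 0xff;           /* bits 07:00, Log Page Identifier */
	uint32_t numdl = (cdw10 >> 16) & 0xffff; /* bits 31:16, Number of Dwords Lower (0's based) */

	printf("LID=0x%02x NUMDL=%u -> %u dwords = 0x%x bytes\n",
	       lid, numdl, numdl + 1, (numdl + 1) * 4);
	/* Prints: LID=0x02 NUMDL=30 -> 31 dwords = 0x7c bytes, matching the
	 * "Get log page: LID=0x02 ... len=0x7c" line in the target's log. */
	return 0;
}
=======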

-Jim




On 2/6/19, 9:29 AM, "SPDK on behalf of Howell, Seth" <spdk-bounces(a)lists.01.org on behalf of seth.howell(a)intel.com> wrote:

    Hi Valeriy,
    
    I made the changes that you mentioned to the RDMA request state machine. Here is a little background on those changes.
    Those changes were part of an effort to better approach the limits of work requests that can be posted at once to the send queue. Previously we were limiting both RDMA reads and RDMA writes to the same limit. That is the one indicated by device_attr.qp_rd_atom_max. This value only controls rdma READ operations, and write operations should be governed by the size of the send queue. So we separated the state RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states, one for controller to host transfers, and the other for host to controller transfers.
    This was useful in two ways. Firstly, it allowed us to loosen restrictions on rdma sends, and secondly it allowed us to make the states of the rdma request a linear chain that can be followed through from start to finish.
    It looks like your I/O is getting stuck in state 8, or RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from your stacktrace that the I/O enters that state at least twice, which is normal. If there is more than one I/O queued in that state, we operate on them in a fifo manner. Also, if there are currently too many outstanding send wrs, we wait until we receive some completions on the send queue to continue. Every time we get a completion, we will poll through this list to try to submit more SENDs.
    
    I have a few follow up questions about your configuration and information we will need to be able to help:
    
    1. What is the queue depth of I/O you are submitting to the target?
    2. Are you doing a 100% NVMe-oF read workload, or is it a mixed read/write workload?
    3. Your stacktrace shows the I/O enter state 8 twice, does this trend continue forever until you get a timeout on the I/O?
    4. Can you add two extra debug prints to the target, one to each if statement inside of state RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING? I want to see why we are staying in that state. In those debug prints, it would be useful if you could include the following information: rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, rqpair->max_send_depth.
    
    Some other information that may be useful:
    How many cores are you running the SPDK target on? How many devices and connections do you have when this hits? 
    
    This information will help us understand better why your I/O aren't making forward progress. I am really interested to see what is going on here since we don't have this issue on either the kernel or SPDK initiators, and the changes we made should have loosened up the requirements for sending I/O.
    
    Thanks,
    
    Seth Howell
    
    
    -----Original Message-----
    From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
    Sent: Wednesday, February 6, 2019 7:55 AM
    To: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
    
    
    
    On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com> wrote:
    
        Hi Jim,
        
        Our module is an implementation of the NVMeoF/RDMA host.
        
        It works with SPDK 18.10.1 well, so the problem seems to be related to the  
        SPDK 19.01 code.
        
        I can see that the RDMA request's state engine have been changed in the  
        recent SPDK release.
        So it would be great if the author of the modifications could take a look  
        at the issue...
        
        Thank you for your help!
    
    
    Hi Valeriy,
    
    Can you provide detailed information about what your host module is doing to induce this behavior?  Our tests with the Linux kernel host driver and the SPDK host driver do not seem to be hitting this problem.
    
    Thanks,
    
    -Jim
    
    
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk
    


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06 16:03 Howell, Seth
  0 siblings, 0 replies; 21+ messages in thread
From: Howell, Seth @ 2019-02-06 16:03 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3949 bytes --]

Hi Valeriy,

I made the changes that you mentioned to the RDMA request state machine. Here is a little background on those changes.
Those changes were part of an effort to better approach the limits on how many work requests can be posted to the send queue at once. Previously we were limiting both RDMA reads and RDMA writes to the same limit, the one indicated by device_attr.qp_rd_atom_max. That value only controls RDMA READ operations; WRITE operations should instead be governed by the size of the send queue. So we separated the state RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING into two different states, one for controller-to-host transfers and the other for host-to-controller transfers.
This was useful in two ways. Firstly, it allowed us to loosen the restrictions on RDMA sends, and secondly it allowed us to make the states of the RDMA request a linear chain that can be followed through from start to finish.
It looks like your I/O is getting stuck in state 8, RDMA_REQUEST_STATE_DATA_TRANSFERRING_HOST_TO_CONTROLLER. I see from your stacktrace that the I/O enters that state at least twice, which is normal. If there is more than one I/O queued in that state, we operate on them in a FIFO manner. Also, if there are currently too many outstanding send WRs, we wait until we receive some completions on the send queue to continue. Every time we get a completion, we poll through this list to try to submit more SENDs.
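
As a purely illustrative aside (a tiny standalone C model with made-up numbers, not code from rdma.c), the queuing behavior described above looks roughly like this:

=======
/* Standalone illustration only: requests queued in that state wait in FIFO
 * order behind a send-depth limit, and each send completion returns one unit
 * of credit that lets the head of the queue try again.  The numbers are made
 * up: 7 WRs from earlier requests are assumed outstanding against a limit
 * of 8. */
#include <stdint.h>
#include <stdio.h>

#define QUEUED 4

int main(void)
{
	uint32_t max_send_depth = 8, current_send_depth = 7;
	uint32_t wrs_needed[QUEUED] = { 2, 2, 1, 2 };	/* e.g. WRITE+SEND, or SEND alone */
	uint32_t head = 0;
	int completion;

	for (completion = 0; head < QUEUED; completion++) {
		if (completion > 0) {
			current_send_depth--;	/* one send completion reaped */
		}
		/* Drain the queue strictly from the head while credit allows. */
		while (head < QUEUED &&
		       current_send_depth + wrs_needed[head] <= max_send_depth) {
			current_send_depth += wrs_needed[head];
			printf("after completion %d: submitted request %u, send depth now %u\n",
			       completion, head, current_send_depth);
			head++;
		}
	}
	return 0;
}
=======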

I have a few follow up questions about your configuration and information we will need to be able to help:

1. What is the queue depth of I/O you are submitting to the target?
2. Are you doing a 100% NVMe-oF read workload, or is it a mixed read/write workload?
3. Your stacktrace shows the I/O entering state 8 twice; does this continue forever until you get a timeout on the I/O?
4. Can you add two extra debug prints to the target, one in each if statement inside the RDMA_REQUEST_STATE_DATA_TRANSFER_TO_HOST_PENDING state? I want to see why we are staying in that state. In those debug prints, it would be useful if you could include the following information: rqpair->current_send_depth, rdma_req->num_outstanding_data_wr, and rqpair->max_send_depth (one possible form is sketched below).
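
For question 4, one possible shape for those prints (illustrative only; SPDK_DEBUGLOG()/SPDK_LOG_RDMA and the exact variables in scope are assumptions based on this thread, not text copied from rdma.c):

=======
/* Drop-in style sketch for each "waiting" branch; adjust the names to match
 * the local variables actually in scope at that point in rdma.c. */
SPDK_DEBUGLOG(SPDK_LOG_RDMA,
	      "Request %p waiting: current_send_depth=%u num_outstanding_data_wr=%u max_send_depth=%u\n",
	      rdma_req, rqpair->current_send_depth,
	      rdma_req->num_outstanding_data_wr, rqpair->max_send_depth);
=======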

Some other information that may be useful:
How many cores are you running the SPDK target on? How many devices and connections do you have when this hits? 

This information will help us better understand why your I/O isn't making forward progress. I am really interested to see what is going on here, since we don't hit this issue with either the kernel or SPDK initiators, and the changes we made should have loosened the requirements for sending I/O.

Thanks,

Seth Howell


-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
Sent: Wednesday, February 6, 2019 7:55 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target



On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com> wrote:

    Hi Jim,
    
    Our module is an implementation of the NVMeoF/RDMA host.
    
    It works with SPDK 18.10.1 well, so the problem seems to be related to the  
    SPDK 19.01 code.
    
    I can see that the RDMA request's state engine have been changed in the  
    recent SPDK release.
    So it would be great if the author of the modifications could take a look  
    at the issue...
    
    Thank you for your help!


Hi Valeriy,

Can you provide detailed information about what your host module is doing to induce this behavior?  Our tests with the Linux kernel host driver and the SPDK host driver do not seem to be hitting this problem.

Thanks,

-Jim


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06 14:55 Harris, James R
  0 siblings, 0 replies; 21+ messages in thread
From: Harris, James R @ 2019-02-06 14:55 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]



On 2/6/19, 12:59 AM, "SPDK on behalf of Valeriy Glushkov" <spdk-bounces(a)lists.01.org on behalf of valeriy.glushkov(a)starwind.com> wrote:

    Hi Jim,
    
    Our module is an implementation of the NVMeoF/RDMA host.
    
    It works with SPDK 18.10.1 well, so the problem seems to be related to the  
    SPDK 19.01 code.
    
    I can see that the RDMA request's state engine have been changed in the  
    recent SPDK release.
    So it would be great if the author of the modifications could take a look  
    at the issue...
    
    Thank you for your help!


Hi Valeriy,

Can you provide detailed information about what your host module is doing to induce this behavior?  Our tests with the Linux kernel host driver and the SPDK host driver do not seem to be hitting this problem.

Thanks,

-Jim



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-06  0:35 Harris, James R
  0 siblings, 0 replies; 21+ messages in thread
From: Harris, James R @ 2019-02-06  0:35 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]

Hi Valeriy,

Can you provide more details on this test host module?

Thanks,

Jim

> On Feb 5, 2019, at 4:41 AM, Valeriy Glushkov <valeriy.glushkov(a)starwind.com> wrote:
> 
> Hi All!
> 
> It seems there is an issue in the 19.01 code that leads to a request stays unanswered and our test NVMeoF/RDMA host module does not receive the completion.
> 
> From the debug log of the target:
> =======
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 1
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 2
> rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0x1402a20 took 1 buffer/s from central pool
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 5
> request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2751 nsid 4294967295 cdw10 0x001e0002
> request.c: 127:nvmf_trace_command: *DEBUG*: psdt 0
> request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffb28f67223dd0 key 0x8294c len 0x200
> ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
> request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2751 cdw0=0x00000000 rsvd1=0 status=0x0000
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 7
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 8
> rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 8
> ========
> 
> The same host module works well with the SPDK 18.10.1 NVMf target.
> 
> Is this a bug in the recent SPDK code?
> 
> --
> Best regards,
> Valeriy Glushkov
> www.starwind.com
> valeriy.glushkov(a)starwind.com
> 
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target
@ 2019-02-05 11:41 Valeriy Glushkov
  0 siblings, 0 replies; 21+ messages in thread
From: Valeriy Glushkov @ 2019-02-05 11:41 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1549 bytes --]

Hi All!

It seems there is an issue in the 19.01 code that leads to a request staying unanswered, so our test NVMeoF/RDMA host module does not receive the completion.

From the debug log of the target:
=======
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 1
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 2
rdma.c:1414:spdk_nvmf_rdma_request_parse_sgl: *DEBUG*: Request 0x1402a20 took 1 buffer/s from central pool
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 5
request.c: 121:nvmf_trace_command: *DEBUG*: Admin cmd: opc 0x02 fuse 0 cid 2751 nsid 4294967295 cdw10 0x001e0002
request.c: 127:nvmf_trace_command: *DEBUG*: psdt 0
request.c: 136:nvmf_trace_command: *DEBUG*: SGL: Keyed (Inv): addr 0xffffb28f67223dd0 key 0x8294c len 0x200
ctrlr.c:1285:spdk_nvmf_ctrlr_get_log_page: *DEBUG*: Get log page: LID=0x02 offset=0x0 len=0x7c
request.c: 92:spdk_nvmf_request_complete: *DEBUG*: cpl: cid=2751 cdw0=0x00000000 rsvd1=0 status=0x0000
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 7
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 8
rdma.c:1508:spdk_nvmf_rdma_request_process: *DEBUG*: Request 0x1402a20 entering state 8
========

The same host module works well with the SPDK 18.10.1 NVMf target.

Is this a bug in the recent SPDK code?

--
Best regards,
Valeriy Glushkov
www.starwind.com
valeriy.glushkov(a)starwind.com




^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-02-28 22:07 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06  7:59 [SPDK] A problem with SPDK 19.01 NVMeoF/RDMA target Valeriy Glushkov
  -- strict thread matches above, loose matches on Subject: below --
2019-02-28 22:07 Valeriy Glushkov
2019-02-27 15:04 Howell, Seth
2019-02-27 15:00 Howell, Seth
2019-02-26 23:45 Lance Hartmann ORACLE
2019-02-26 22:21 Valeriy Glushkov
2019-02-07 18:18 Howell, Seth
2019-02-07 18:05 Sasha Kotchubievsky
2019-02-07 16:59 Howell, Seth
2019-02-07 15:37 Valeriy Glushkov
2019-02-07 14:31 Sasha Kotchubievsky
2019-02-07  4:50 Valeriy Glushkov
2019-02-06 23:06 Harris, James R
2019-02-06 23:03 Howell, Seth
2019-02-06 20:27 Howell, Seth
2019-02-06 18:15 Valeriy Glushkov
2019-02-06 16:54 Harris, James R
2019-02-06 16:03 Howell, Seth
2019-02-06 14:55 Harris, James R
2019-02-06  0:35 Harris, James R
2019-02-05 11:41 Valeriy Glushkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.