* Windows NVMeoF client - debug assistance requested
@ 2016-06-21  1:50 Robert Randall (rrandall)
  2016-06-21 13:56 ` Steve Wise
  2016-06-21 14:19 ` Sagi Grimberg
From: Robert Randall (rrandall) @ 2016-06-21  1:50 UTC



Hello all,

We were making great progress on a Windows version of the NVMe over Fabrics host / initiator / client until we ran into something very interesting.  All read/write requests from the client execute without any errors reported on either side; the Linux server is happy and the Windows client is happy.  The interesting part is that the writes never make it to the device.  Each write completes with success, but a subsequent read does not return what was written.  We've verified the command packets, etc., and the writes look just fine and do succeed, but the data never makes it to the device.  The server reports no errors and returns success.

If we use the Linux host to write data to the device, or just 'dd' a good partition table to the device, we can read it without any problems from Windows.

I could use some advice on the best strategy to track this down.

The traffic generated by the Linux host and the Windows host is quite similar.  The only difference that might be interesting is the RDMA addresses: on Linux they tend to be low in the address range, while on Windows they tend to be high.  But this is true of the reads as well.

A sample write from the Linux host:
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087148] nvmet: drivers/nvme/target/core.c nvmet_req_init 499
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087149] nvmet:  struct nvme_common_command {
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087150] nvmet: __u8 opcode 1
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087151] nvmet: __u8 flags 40
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087153] nvmet: __u16 command_id 78
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087154] nvmet: __le32 nsid 1
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087155] nvmet: __le32 cdw2[0] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087156] nvmet: __le32 cdw2[1] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087157] nvmet: __le64 metadata 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087158] nvmet: union nvme_data_ptr { nvme_keyed_sgl_desc
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087159] nvmet: nvme_keyed_sgl_desc.__le64 addr 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087161] nvmet: nvme_keyed_sgl_desc.__u8 length[0] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087162] nvmet: nvme_keyed_sgl_desc.__u8 length[1] 10
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087163] nvmet: nvme_keyed_sgl_desc.__u8 length[2] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087164] nvmet: nvme_keyed_sgl_desc.__u8 key[0] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087166] nvmet: nvme_keyed_sgl_desc.__u8 key[1] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087167] nvmet: nvme_keyed_sgl_desc.__u8 key[2] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087168] nvmet: nvme_keyed_sgl_desc.__u8 key[3] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087169] nvmet: nvme_keyed_sgl_desc.__u8 type 1
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087170] nvmet: __le32 cdw10[0] 9C8
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087171] nvmet: __le32 cdw10[1] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087173] nvmet: __le32 cdw10[2] 7
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087174] nvmet: __le32 cdw10[3] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087175] nvmet: __le32 cdw10[4] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087176] nvmet: __le32 cdw10[5] 0

A sample write from the Windows host:
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109389] nvmet:  struct nvme_common_command {
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109390] nvmet: __u8 opcode 1
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109391] nvmet: __u8 flags 40
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109392] nvmet: __u16 command_id A
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109394] nvmet: __le32 nsid 1
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109395] nvmet: __le32 cdw2[0] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109396] nvmet: __le32 cdw2[1] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109397] nvmet: __le64 metadata 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109398] nvmet: union nvme_data_ptr { nvme_keyed_sgl_desc
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109400] nvmet: nvme_keyed_sgl_desc.__le64 addr FFFFD0009BB95040
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109401] nvmet: nvme_keyed_sgl_desc.__u8 length[0] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109402] nvmet: nvme_keyed_sgl_desc.__u8 length[1] 2
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109405] nvmet: nvme_keyed_sgl_desc.__u8 length[2] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109406] nvmet: nvme_keyed_sgl_desc.__u8 key[0] A
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109407] nvmet: nvme_keyed_sgl_desc.__u8 key[1] 64
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109408] nvmet: nvme_keyed_sgl_desc.__u8 key[2] 1
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109409] nvmet: nvme_keyed_sgl_desc.__u8 key[3] 92
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109411] nvmet: nvme_keyed_sgl_desc.__u8 type 40
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109412] nvmet: __le32 cdw10[0] 21
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109413] nvmet: __le32 cdw10[1] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109414] nvmet: __le32 cdw10[2] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109415] nvmet: __le32 cdw10[3] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109416] nvmet: __le32 cdw10[4] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109418] nvmet: __le32 cdw10[5] 0


I haven't updated my code since 6/8 and I'm using the nvmf-all.2 branch.

Any / all assistance would be much appreciated.  We would like to demo this at FMS in August and, assuming Microsoft releases the documentation for the in-kernel RDMA client API we're using, we would like to open source the driver as well.

Best regards,
Robert Randall


* Windows NVMeoF client - debug assistance requested
  2016-06-21  1:50 Windows NVMeoF client - debug assistance requested Robert Randall (rrandall)
@ 2016-06-21 13:56 ` Steve Wise
  2016-06-21 14:16   ` Robert Randall (rrandall)
  2016-06-21 14:19 ` Sagi Grimberg
From: Steve Wise @ 2016-06-21 13:56 UTC


> I haven't updated my code since 6/8 and I'm using the nvmf-all.2 branch.
>

I had problems with nvmf-all.2 on Linux.  Perhaps you should try nvmf-all.3?
 
steve.


* Windows NVMeoF client - debug assistance requested
  2016-06-21 13:56 ` Steve Wise
@ 2016-06-21 14:16   ` Robert Randall (rrandall)
  2016-06-21 14:22     ` Steve Wise
From: Robert Randall (rrandall) @ 2016-06-21 14:16 UTC


Similar problems?  What kind of problems?

-----Original Message-----
From: Steve Wise [mailto:swise@opengridcomputing.com] 
Sent: Tuesday, June 21, 2016 8:56 AM
To: Robert Randall (rrandall); linux-nvme at lists.infradead.org
Subject: RE: Windows NVMeoF client - debug assistance requested

> I haven't updated my code since 6/8 and I'm using the nvmf-all.2 branch.
>

I had problems with nvmf-all.2 on Linux.  Perhaps you should try nvmf-all.3?
 
steve.


* Windows NVMeoF client - debug assistance requested
  2016-06-21  1:50 Windows NVMeoF client - debug assistance requested Robert Randall (rrandall)
  2016-06-21 13:56 ` Steve Wise
@ 2016-06-21 14:19 ` Sagi Grimberg
  2016-06-21 14:55   ` Robert Randall
From: Sagi Grimberg @ 2016-06-21 14:19 UTC



> [...]
>
> A sample write from the Linux host:
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087148] nvmet: drivers/nvme/target/core.c nvmet_req_init 499
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087149] nvmet:  struct nvme_common_command {
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087150] nvmet: __u8 opcode 1
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087151] nvmet: __u8 flags 40
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087153] nvmet: __u16 command_id 78
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087154] nvmet: __le32 nsid 1
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087155] nvmet: __le32 cdw2[0] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087156] nvmet: __le32 cdw2[1] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087157] nvmet: __le64 metadata 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087158] nvmet: union nvme_data_ptr { nvme_keyed_sgl_desc
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087159] nvmet: nvme_keyed_sgl_desc.__le64 addr 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087161] nvmet: nvme_keyed_sgl_desc.__u8 length[0] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087162] nvmet: nvme_keyed_sgl_desc.__u8 length[1] 10
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087163] nvmet: nvme_keyed_sgl_desc.__u8 length[2] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087164] nvmet: nvme_keyed_sgl_desc.__u8 key[0] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087166] nvmet: nvme_keyed_sgl_desc.__u8 key[1] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087167] nvmet: nvme_keyed_sgl_desc.__u8 key[2] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087168] nvmet: nvme_keyed_sgl_desc.__u8 key[3] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087169] nvmet: nvme_keyed_sgl_desc.__u8 type 1
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087170] nvmet: __le32 cdw10[0] 9C8
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087171] nvmet: __le32 cdw10[1] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087173] nvmet: __le32 cdw10[2] 7
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087174] nvmet: __le32 cdw10[3] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087175] nvmet: __le32 cdw10[4] 0
> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087176] nvmet: __le32 cdw10[5] 0
>
> A sample write from the Windows host:
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109389] nvmet:  struct nvme_common_command {
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109390] nvmet: __u8 opcode 1
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109391] nvmet: __u8 flags 40
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109392] nvmet: __u16 command_id A
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109394] nvmet: __le32 nsid 1
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109395] nvmet: __le32 cdw2[0] 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109396] nvmet: __le32 cdw2[1] 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109397] nvmet: __le64 metadata 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109398] nvmet: union nvme_data_ptr { nvme_keyed_sgl_desc
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109400] nvmet: nvme_keyed_sgl_desc.__le64 addr FFFFD0009BB95040
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109401] nvmet: nvme_keyed_sgl_desc.__u8 length[0] 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109402] nvmet: nvme_keyed_sgl_desc.__u8 length[1] 2
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109405] nvmet: nvme_keyed_sgl_desc.__u8 length[2] 0

Looks like the writes differ in length (4k vs. 512).

> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109406] nvmet: nvme_keyed_sgl_desc.__u8 key[0] A
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109407] nvmet: nvme_keyed_sgl_desc.__u8 key[1] 64
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109408] nvmet: nvme_keyed_sgl_desc.__u8 key[2] 1
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109409] nvmet: nvme_keyed_sgl_desc.__u8 key[3] 92
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109411] nvmet: nvme_keyed_sgl_desc.__u8 type 40
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109412] nvmet: __le32 cdw10[0] 21
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109413] nvmet: __le32 cdw10[1] 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109414] nvmet: __le32 cdw10[2] 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109415] nvmet: __le32 cdw10[3] 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109416] nvmet: __le32 cdw10[4] 0
> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109418] nvmet: __le32 cdw10[5] 0

Looks like you never use in-capsule data (which we do in Linux for 4k
or less); you passed to the target:
{addr=0xFFFFD0009BB95040, length=0x200, key=0x9201640a}

Are you sure that this is the buffer address you have registered with
the device? Do you see the actual data travel the wire?
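
For reference, here is a minimal decode of the keyed-SGL bytes dumped above.
This is a hedged sketch assuming the struct nvme_keyed_sgl_desc layout from
include/linux/nvme.h and the NVMe SGL identifier encoding (bits 7:4 type,
bits 3:0 subtype); the decoder itself is illustrative, not target code:

#include <stdint.h>
#include <stdio.h>

/* Keyed SGL data block descriptor as dumped above; layout matches
 * struct nvme_keyed_sgl_desc in include/linux/nvme.h. */
struct keyed_sgl {
	uint64_t addr;
	uint8_t  length[3];	/* 24-bit little-endian byte count */
	uint8_t  key[4];	/* 32-bit little-endian remote key */
	uint8_t  type;		/* bits 7:4 type, bits 3:0 subtype */
};

static void decode(const char *who, const struct keyed_sgl *d)
{
	uint32_t len = d->length[0] | d->length[1] << 8 | d->length[2] << 16;
	uint32_t key = d->key[0] | d->key[1] << 8 | d->key[2] << 16 |
		       (uint32_t)d->key[3] << 24;

	printf("%s: addr=0x%llx len=0x%x key=0x%x type=%d subtype=%d\n",
	       who, (unsigned long long)d->addr, len, key,
	       d->type >> 4, d->type & 0xf);
}

int main(void)
{
	/* Linux write above: length[1]=0x10 -> 0x1000 (4k); type 0x01 is
	 * a plain Data Block with offset subtype, i.e. in-capsule data. */
	struct keyed_sgl lnx = { 0, { 0, 0x10, 0 }, { 0, 0, 0, 0 }, 0x01 };
	/* Windows write above: length[1]=0x02 -> 0x200 (512); type 0x40 is
	 * a Keyed Data Block with address subtype, and the key bytes
	 * 0A 64 01 92 assemble to 0x9201640a, as summarized above. */
	struct keyed_sgl win = { 0xFFFFD0009BB95040ULL,
				 { 0, 0x02, 0 }, { 0x0a, 0x64, 0x01, 0x92 }, 0x40 };

	decode("linux", &lnx);
	decode("windows", &win);
	return 0;
}

Run, this prints len=0x1000 for the Linux sample and {addr=0xFFFFD0009BB95040,
len=0x200, key=0x9201640a} for the Windows one, matching the summary above.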


* Windows NVMeoF client - debug assistance requested
  2016-06-21 14:16   ` Robert Randall (rrandall)
@ 2016-06-21 14:22     ` Steve Wise
From: Steve Wise @ 2016-06-21 14:22 UTC



> 
> Similar problems?  What kind of problems.
>

The same, I think.  From me in an earlier private email:

> > When using nvmf-all.2 branch from infradead, I can discover/connect ok to
> > a target, but when I do an mkfs.ext4 on the nvmf device, it succeeds, but
> > the data isn't written to the device.  So a subsequent mount will fail.  I
> > backed up to nvmf-all branch, and it works fine.

Christoph provided me a patch to include, and I think it is here in nvmf-all.3:

commit bb18f6e76300970383d66dbc17936b854f6d79e6
Author: Christoph Hellwig <hch at lst.de>
Date:   Tue Jun 14 09:27:53 2016 +0200

    nvmet: more fixups for the op vs flags split

    Signed-off-by: Christoph Hellwig <hch at lst.de>

I didn't try .2 + just this fix, because .3 popped out and I went to that version.



Steve.


* Windows NVMeoF client - debug assistance requested
  2016-06-21 14:19 ` Sagi Grimberg
@ 2016-06-21 14:55   ` Robert Randall
  2016-06-21 15:20     ` Robert Randall
From: Robert Randall @ 2016-06-21 14:55 UTC


Correct, we don't use in-capsule data.  What is confusing is that the
reads are working fine without in-capsule data.  The writes are
failing silently.  Nothing in the protocol indicates an error but the
bits never get to the device.

On Tue, Jun 21, 2016 at 9:19 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
> [...]
>
> Looks like you never use in-capsule data (which we do in Linux for 4k
> or less); you passed to the target:
> {addr=0xFFFFD0009BB95040, length=0x200, key=0x9201640a}
>
> Are you sure that this is the buffer address you have registered with
> the device? Do you see the actual data travel the wire?



-- 
Robert Randall | robert.r.randall at gmail.com


* Windows NVMeoF client - debug assistance requested
  2016-06-21 14:55   ` Robert Randall
@ 2016-06-21 15:20     ` Robert Randall
  2016-06-21 16:02       ` Christoph Hellwig
  2016-06-21 16:02       ` Robert Randall
From: Robert Randall @ 2016-06-21 15:20 UTC


Well, this is interesting: based on the telemetry from the block layer,
the write is going down as a read!!  I'll start looking for the bug.

Jun 21 10:10:38 nvmeof0 kernel: [  160.772992] nvmet_rdma: drivers/nvme/target/rdma.c nvmet_rdma_read_data_done 552
Jun 21 10:10:38 nvmeof0 kernel: [  160.772996] nvmet: drivers/nvme/target/io-cmd.c nvmet_execute_rw 62
Jun 21 10:10:38 nvmeof0 kernel: [  160.772998] nvmet: drivers/nvme/target/io-cmd.c nvmet_execute_rw 71 WRITE
Jun 21 10:10:38 nvmeof0 kernel: [  160.773005] nvmet: req->cmd->rw details
Jun 21 10:10:38 nvmeof0 kernel: [  160.773011] nvmet: __u8 opcode 1
Jun 21 10:10:38 nvmeof0 kernel: [  160.773013] nvmet: __u8 flags 40
Jun 21 10:10:38 nvmeof0 kernel: [  160.773015] nvmet: __u16 command_id 3
Jun 21 10:10:38 nvmeof0 kernel: [  160.773017] nvmet: __le32 nsid 1
Jun 21 10:10:38 nvmeof0 kernel: [  160.773019] nvmet: __u64 rsvd2 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773030] nvmet: __le64 metadata 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773035] nvmet: __le64 dptr.ksgl.addr FFFFD0009D6B4400
Jun 21 10:10:38 nvmeof0 kernel: [  160.773037] nvmet: __u8 dptr.ksgl.length[0] 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773040] nvmet: __u8 dptr.ksgl.length[1] 2
Jun 21 10:10:38 nvmeof0 kernel: [  160.773042] nvmet: __u8 dptr.ksgl.length[2] 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773044] nvmet: __u8 dptr.ksgl.key[0] F
Jun 21 10:10:38 nvmeof0 kernel: [  160.773047] nvmet: __u8 dptr.ksgl.key[1] 85
Jun 21 10:10:38 nvmeof0 kernel: [  160.773055] nvmet: __u8 dptr.ksgl.key[2] 1
Jun 21 10:10:38 nvmeof0 kernel: [  160.773064] nvmet: __u8 dptr.ksgl.key[3] BA
Jun 21 10:10:38 nvmeof0 kernel: [  160.773066] nvmet: __u8 dptr.ksgl.type 40
Jun 21 10:10:38 nvmeof0 kernel: [  160.773068] nvmet: __le64 slba 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773071] nvmet: __le16 length 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773072] nvmet: __le16 control 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773075] nvmet: __le32 dsmgmt 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773077] nvmet: __le32 reftag 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773085] nvmet: __le16 apptag 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773094] nvmet: __le16 appmask 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773101] nvmet: bio before
Jun 21 10:10:38 nvmeof0 kernel: [  160.773103] nvmet: bio->bi_flags 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773105] nvmet: bio->bi_error 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773107] nvmet: bio->bi_rw 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773109] nvmet: bio->bi_ioprio 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773112] nvmet: bio->bi_phys_segments 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773114] nvmet: bio->bi_phys_segments 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773117] nvmet: drivers/nvme/target/io-cmd.c nvmet_execute_rw 123 sg_cnt 1
************** Jun 21 10:10:38 nvmeof0 kernel: [  160.773124] kworker/u17:1(3103): READ block 0 on nvme1n1 (1 sectors) ***********
Jun 21 10:10:38 nvmeof0 kernel: [  160.773134] nvmet: drivers/nvme/target/io-cmd.c nvmet_execute_rw 146 calling blk_poll
Jun 21 10:10:38 nvmeof0 kernel: [  160.773136] nvmet: bio after
Jun 21 10:10:38 nvmeof0 kernel: [  160.773138] nvmet: bio->bi_flags 2
Jun 21 10:10:38 nvmeof0 kernel: [  160.773140] nvmet: bio->bi_error 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773142] nvmet: bio->bi_rw 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773144] nvmet: bio->bi_ioprio 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773146] nvmet: bio->bi_phys_segments 1
Jun 21 10:10:38 nvmeof0 kernel: [  160.773149] nvmet: bio->bi_phys_segments 1
Jun 21 10:10:38 nvmeof0 kernel: [  160.773165] nvmet: drivers/nvme/target/io-cmd.c nvmet_bio_done 29
Jun 21 10:10:38 nvmeof0 kernel: [  160.773166] nvmet: block io complete
Jun 21 10:10:38 nvmeof0 kernel: [  160.773168] nvmet: drivers/nvme/target/core.c nvmet_req_complete 427
Jun 21 10:10:38 nvmeof0 kernel: [  160.773169] nvmet: request is complete.
Jun 21 10:10:38 nvmeof0 kernel: [  160.773170] nvmet: drivers/nvme/target/core.c __nvmet_req_complete 406
Jun 21 10:10:38 nvmeof0 kernel: [  160.773172] nvmet: __nvmet_req_complete status is zero 0
Jun 21 10:10:38 nvmeof0 kernel: [  160.773173] nvmet: __nvmet_req_complete status not set on request!!
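
In hindsight the two bio dumps above already contain the clue: bio->bi_rw
is 0 both before and after submission.  A minimal sketch of why that turns
the write into a read, assuming the op/flags encoding used by the nvmf
branches, where REQ_OP_READ is 0 (illustrative only, not kernel source):

#include <assert.h>

/* bio_alloc() leaves the op field at 0, and REQ_OP_READ is 0, so a bio
 * that is never tagged as a write via bio_set_op_attrs() is decoded and
 * submitted by the block layer as a read, matching the
 * "READ block 0 on nvme1n1" line above. */
enum { SKETCH_REQ_OP_READ = 0, SKETCH_REQ_OP_WRITE = 1 };

int main(void)
{
	unsigned long bi_rw = 0;		/* op field never set */
	assert(bi_rw == SKETCH_REQ_OP_READ);	/* submitted as a READ */
	return 0;
}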

On Tue, Jun 21, 2016 at 9:55 AM, Robert Randall
<robert.r.randall@gmail.com> wrote:
> Correct, we don't use in-capsule data.  What is confusing is that the
> reads are working fine without in-capsule data.  The writes are
> failing silently.  Nothing in the protocol indicates an error but the
> bits never get to the device.
>
> [...]



-- 
Robert Randall | robert.r.randall at gmail.com


* Windows NVMeoF client - debug assistance requested
  2016-06-21 15:20     ` Robert Randall
@ 2016-06-21 16:02       ` Christoph Hellwig
  2016-06-21 16:02       ` Robert Randall
From: Christoph Hellwig @ 2016-06-21 16:02 UTC


Please use the nvmf-all.3 branch.  As Steve already pointed out, these
issues have been fixed for a while.


* Windows NVMeoF client - debug assistance requested
  2016-06-21 15:20     ` Robert Randall
  2016-06-21 16:02       ` Christoph Hellwig
@ 2016-06-21 16:02       ` Robert Randall
From: Robert Randall @ 2016-06-21 16:02 UTC


Found the problem.  A *slight* bug in io-cmd.c / nvmet_execute_rw.
The call to set up the block io attributes was missing.  Added the
following line where it should have been:

bio_set_op_attrs(bio, op, op_flags);

Writes are now working as expected so far.
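
For context, a hedged sketch of where that call sits in nvmet_execute_rw(),
with the shape based on the drivers/nvme/target/io-cmd.c of that era; names
and details are approximate, not the exact patch:

static void nvmet_execute_rw(struct nvmet_req *req)
{
	int op, op_flags = 0;
	struct bio *bio;
	sector_t sector;

	/* Map the NVMe opcode onto the block layer's op/flags split. */
	if (req->cmd->rw.opcode == nvme_cmd_write) {
		op = REQ_OP_WRITE;
		if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
			op_flags |= REQ_FUA;
	} else {
		op = REQ_OP_READ;
	}

	sector = le64_to_cpu(req->cmd->rw.slba) <<
			(req->ns->blksize_shift - 9);

	bio = bio_alloc(GFP_KERNEL, min(req->sg_cnt, BIO_MAX_PAGES));
	bio->bi_bdev = req->ns->bdev;
	bio->bi_iter.bi_sector = sector;

	/* The missing call: without it the bio's op field stays 0, and
	 * since REQ_OP_READ is 0 the write above went down as a read. */
	bio_set_op_attrs(bio, op, op_flags);

	/* ... map req->sg into the bio, chain further bios as needed,
	 * then submit ... */
	submit_bio(bio);
}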

On Tue, Jun 21, 2016 at 10:20 AM, Robert Randall
<robert.r.randall@gmail.com> wrote:
> Well, this is interesting: based on the telemetry from the block layer,
> the write is going down as a read!!  I'll start looking for the bug.
>
> [...]



-- 
Robert Randall | robert.r.randall at gmail.com

