* Need help to get more information about "nvme_rdma: nvme completion status=0x4007"
@ 2017-06-20  2:14 Jie Tang
  2017-06-20  6:33 ` Sagi Grimberg
  0 siblings, 1 reply; 3+ messages in thread
From: Jie Tang @ 2017-06-20  2:14 UTC


I am using the standard 4.8.1 nvme-rdma module for NVMf testing. The host side always hits 0x4007 during stress testing.
I checked the definition of 0x4007: it is a command abort status. This status is generated by the NVMe SSD on the target side and passed to the host.
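
For reference, a minimal sketch of how 0x4007 breaks down, assuming the printed value uses the Linux host driver's encoding (the completion status field with the phase tag bit already shifted out). The function name and the small name table are illustrative only and are not taken from the driver.

```python
# Assumed bit layout (Linux host driver encoding, phase tag already removed):
#   bits 7:0   status code (SC)
#   bits 10:8  status code type (SCT)
#   bit  13    More
#   bit  14    Do Not Retry (DNR)

# A few generic (SCT=0) status codes from the NVMe specification.
GENERIC_STATUS_NAMES = {
    0x00: "Successful Completion",
    0x04: "Data Transfer Error",
    0x06: "Internal Error",
    0x07: "Command Abort Requested",
}


def decode_status(status: int) -> dict:
    """Split a 15-bit NVMe completion status into its fields."""
    return {
        "sc": status & 0xFF,
        "sct": (status >> 8) & 0x7,
        "more": bool(status & 0x2000),
        "dnr": bool(status & 0x4000),
    }


def describe(status: int) -> str:
    """Return a one-line human-readable summary of the status."""
    f = decode_status(status)
    name = "unknown"
    if f["sct"] == 0:
        name = GENERIC_STATUS_NAMES.get(f["sc"], "unknown")
    return "SCT=%d SC=0x%02x (%s) DNR=%d" % (f["sct"], f["sc"], name, f["dnr"])


if __name__ == "__main__":
    print(describe(0x4007))
```

Under that assumption, 0x4007 is status code 0x07 (Command Abort Requested) in the generic status code type, with the Do Not Retry bit set.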

Has anyone encountered a similar error during NVMf stress testing? I would appreciate any insight into why this error occurs. The SSD was confirmed to be good when the host failed with 0x4007. Is there any known bug in the host nvme driver related to this?

Thanks
TJ

[  741.511259] nvme nvme0: queue_size 64 > ctrl maxcmd 1, clamping down
[  741.511263] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 12.12.12.12:4420
[  746.611906] nvme nvme0: Device shutdown incomplete; abort shutdown
[  752.659261] nvme_rdma: param.responder_resources: 16
[  752.666421] nvme nvme0: creating 4 I/O queues.
[  752.667527] nvme_rdma: param.responder_resources: 16
[  752.669024] nvme_rdma: param.responder_resources: 16
[  752.670375] nvme_rdma: param.responder_resources: 16
[  752.671740] nvme_rdma: param.responder_resources: 16
[  752.700869] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.test", addr 12.12.12.12:4420
[ 8226.418141] perf: interrupt took too long (2515 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[10909.246852] perf: interrupt took too long (3153 > 3143), lowering kernel.perf_event_max_sample_rate to 63000
[14408.023327] nvme_rdma: nvme completion status=0x4007
[14408.023333] blk_update_request: I/O error, dev nvme0n7, sector 2244579880
[14408.023344] nvme_rdma: nvme completion status=0x4007
[14408.023345] nvme_rdma: nvme completion status=0x4007
[14408.023347] nvme_rdma: nvme completion status=0x4007
[14408.023348] nvme_rdma: nvme completion status=0x4007
[14408.023350] blk_update_request: I/O error, dev nvme0n6, sector 2253649344
[14408.023352] blk_update_request: I/O error, dev nvme0n8, sector 2245553720
[14408.023352] blk_update_request: I/O error, dev nvme0n7, sector 2244580040
[14408.023357] blk_update_request: I/O error, dev nvme0n5, sector 2224823152
[14408.023358] nvme_rdma: nvme completion status=0x4007
[14408.023358] nvme_rdma: nvme completion status=0x4007
[14408.023359] nvme_rdma: nvme completion status=0x4007
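
To see which namespaces are affected, the I/O errors in a saved dmesg capture can be tallied per device. This is only a sketch: the function name is hypothetical, and the sample text simply repeats the blk_update_request lines from the log above.

```python
import re
from collections import Counter

# Matches lines such as:
#   [14408.023333] blk_update_request: I/O error, dev nvme0n7, sector 2244579880
IO_ERROR_RE = re.compile(
    r"blk_update_request: I/O error, dev ([^,\s]+), sector (\d+)"
)


def count_io_errors(dmesg_text: str) -> Counter:
    """Count block-layer I/O errors per device name."""
    return Counter(m.group(1) for m in IO_ERROR_RE.finditer(dmesg_text))


SAMPLE = """\
[14408.023327] nvme_rdma: nvme completion status=0x4007
[14408.023333] blk_update_request: I/O error, dev nvme0n7, sector 2244579880
[14408.023350] blk_update_request: I/O error, dev nvme0n6, sector 2253649344
[14408.023352] blk_update_request: I/O error, dev nvme0n8, sector 2245553720
[14408.023352] blk_update_request: I/O error, dev nvme0n7, sector 2244580040
[14408.023357] blk_update_request: I/O error, dev nvme0n5, sector 2224823152
"""

if __name__ == "__main__":
    for dev, n in sorted(count_io_errors(SAMPLE).items()):
        print(dev, n)
```

In the excerpt above the errors land on four different namespaces (nvme0n5 through nvme0n8) within the same millisecond.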



This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.


* Need help to get more information about "nvme_rdma: nvme completion status=0x4007"
  2017-06-20  2:14 Need help to get more information about "nvme_rdma: nvme completion status=0x4007" Jie Tang
@ 2017-06-20  6:33 ` Sagi Grimberg
  2017-06-20  7:27   ` Jie Tang
  0 siblings, 1 reply; 3+ messages in thread
From: Sagi Grimberg @ 2017-06-20  6:33 UTC



> I am using the standard 4.8.1 nvme-rdma module for NVMf testing. The host side always hits 0x4007 during stress testing.
> I checked the definition of 0x4007: it is a command abort status. This status is generated by the NVMe SSD on the target side and passed to the host.

Do you mean kernel 4.8.1?

Any chance you could test with the latest upstream kernel? We try our best to
backport fixes to stable, but no one is assigned to make sure that stable
kernels actually work.

> Has anyone encountered a similar error during NVMf stress testing? I would appreciate any insight into why this error occurs. The SSD was confirmed to be good when the host failed with 0x4007. Is there any known bug in the host nvme driver related to this?
> 
> Thanks
> TJ
> 
> [  741.511259] nvme nvme0: queue_size 64 > ctrl maxcmd 1, clamping down
> [  741.511263] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 12.12.12.12:4420
> [  746.611906] nvme nvme0: Device shutdown incomplete; abort shutdown

Seems like the target is not responding to shutdown.

> [  752.659261] nvme_rdma: param.responder_resources: 16
> [  752.666421] nvme nvme0: creating 4 I/O queues.

That is strange, why is a discovery controller creating I/O queues?
Could it be that a fix in this area has missed stable?

> [  752.667527] nvme_rdma: param.responder_resources: 16
> [  752.669024] nvme_rdma: param.responder_resources: 16
> [  752.670375] nvme_rdma: param.responder_resources: 16
> [  752.671740] nvme_rdma: param.responder_resources: 16

I assume these are your prints?

> [  752.700869] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.test", addr 12.12.12.12:4420
> [ 8226.418141] perf: interrupt took too long (2515 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
> [10909.246852] perf: interrupt took too long (3153 > 3143), lowering kernel.perf_event_max_sample_rate to 63000
> [14408.023327] nvme_rdma: nvme completion status=0x4007

This appears to be the target failing the command based on what it received
from its backend device. Can you please attach the target log?


* Need help to get more information about "nvme_rdma: nvme completion status=0x4007"
  2017-06-20  6:33 ` Sagi Grimberg
@ 2017-06-20  7:27   ` Jie Tang
  0 siblings, 0 replies; 3+ messages in thread
From: Jie Tang @ 2017-06-20  7:27 UTC


Hi, Sagi

Yes, this is kernel 4.8.1. I plan to verify with the latest stable kernel, 4.11.6, as a quick test and will update you later.

Regarding "nvme_rdma: nvme completion status=0x4007": the 0x4007 from the target means command abort, but the NVMe SSD on the target side is in good status.
There are no abnormal messages on the target side.

I just want to check whether anyone has encountered a similar issue and how they fixed or avoided it.

Thanks


