* nvme-tcp: i/o errors and stalled host from failure to send command pdus
@ 2022-08-30 23:10 Jonathan Nicklin
  2022-08-31  9:02 ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-08-30 23:10 UTC (permalink / raw)
  To: linux-nvme

Hello,

We've seen sporadic I/O errors with nvme-tcp in our lab when testing
with kernel versions 5.15 through 5.19. Using a combination of tracing
events and network traces, we can see nvme_setup_cmd() trace events
that do not result in transmitted command PDUs.

Here's a snippet of the relevant portion of the nvme trace. The
following events are the last of tens of thousands of successfully
executed reads...

tar-9004    [004] ..... 22457.614117: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=53272, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760448, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
tar-9004    [004] ..... 22457.614118: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=61463, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760384, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=16406, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760376, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=28693, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760312, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=40980, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760248, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)

The following 5 commands are issued to nvme, but never make it to the
wire. The wire state was verified manually using tcpdump on the host and
trace events on the target. Perhaps the "need-resched" marker in the
trace and the context switch to the kernel worker are relevant
to the failure?

tar-9004    [004] .n... 22457.614171: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=61469, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760704, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
kworker/4:1H-8668    [004] ..... 22457.614183: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=45084, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760640, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=28699, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760632, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=40986, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760568, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=5, cmdid=16409, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=15760504, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)

The reads above eventually time out and recover after reset. Here are
the related messages.

[22486.420606] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760568, 64 blocks, I/O
Error (sct 0x3 / sc 0x71)
[22486.429326] I/O error, dev nvme1c1n1, sector 15760568 op 0x0:(READ)
flags 0x4084700 phys_seg 1 prio class 0
[22486.439262] nvme nvme1: queue 5: timeout request 0x1b type 4
[22486.461788] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760632, 8 blocks, I/O
Error (sct 0x3 / sc 0x71)
[22486.470440] I/O error, dev nvme1c1n1, sector 15760632 op 0x0:(READ)
flags 0x4084700 phys_seg 1 prio class 0
[22486.480386] nvme nvme1: queue 5: timeout request 0x1c type 4
[22486.486256] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760640, 64 blocks, I/O
Error (sct 0x3 / sc 0x71)
[22486.494994] I/O error, dev nvme1c1n1, sector 15760640 op 0x0:(READ)
flags 0x4084700 phys_seg 2 prio class 0
[22486.504946] nvme nvme1: queue 5: timeout request 0x1d type 4
[22486.510822] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760704, 56 blocks, I/O
Error (sct 0x3 / sc 0x71)
[22486.519563] I/O error, dev nvme1c1n1, sector 15760704 op 0x0:(READ)
flags 0x4080700 phys_seg 2 prio class 0
[22486.529723] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760504, 64 blocks, I/O
Error (sct 0x3 / sc 0x71)
[22486.538610] I/O error, dev nvme1c1n1, sector 15760504 op 0x0:(READ)
flags 0x4084700 phys_seg 2 prio class 0

To further aid in debugging, I resurrected Keith's nvme_tcp tracing event
patch from a ways back. If my understanding is correct,
nvme_tcp_cmd_pdu() is being called for the commands in question.

I've isolated the failure to a scenario we can reproduce at will
in about a minute of runtime. Is there any additional debug that we can
provide to help locate the bug?

Thanks,
-Jonathan



* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-08-30 23:10 nvme-tcp: i/o errors and stalled host from failure to send command pdus Jonathan Nicklin
@ 2022-08-31  9:02 ` Sagi Grimberg
  2022-08-31 16:04   ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2022-08-31  9:02 UTC (permalink / raw)
  To: Jonathan Nicklin, linux-nvme


> Hello,

Hey Jonathan,

> We've seen sporadic I/O errors with nvme-tcp in our lab when testing
> with kernel versions 5.15 through 5.19. Using a combination of tracing
> events and network traces, we can see nvme_setup_cmd() trace events
> that do not result in transmitted commands PDUs.
> 
> Here's a snippet of the relevant portion of the nvme trace. The
> following events are the last of tens of thousands of successully
> executed reads...
> 
> tar-9004    [004] ..... 22457.614117: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=53272, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760448, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> tar-9004    [004] ..... 22457.614118: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=61463, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760384, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=16406, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760376, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=28693, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760312, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=40980, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760248, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> 
> The following 5 commands are issued to nvme, but never make it to the
> wire. The wire state was verified manually using tcpdump on the host and
> trace events on the target. Perhaps the "need-resched" marker in the
> trace and the context switch to the kernel worker are relevant
> to the failure?
> 
> tar-9004    [004] .n... 22457.614171: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=61469, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760704, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> kworker/4:1H-8668    [004] ..... 22457.614183: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=45084, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760640, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=28699, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760632, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=40986, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760568, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=5, cmdid=16409, nsid=1, flags=0x0, meta=0x0,
>    cmd=(nvme_cmd_read slba=15760504, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> 
> The reads above eventually timeout and recover after reset. Here are
> the related reset messages.
> 
> [22486.420606] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760568, 64 blocks, I/O
> Error (sct 0x3 / sc 0x71)
> [22486.429326] I/O error, dev nvme1c1n1, sector 15760568 op 0x0:(READ)
> flags 0x4084700 phys_seg 1 prio class 0
> [22486.439262] nvme nvme1: queue 5: timeout request 0x1b type 4
> [22486.461788] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760632, 8 blocks, I/O
> Error (sct 0x3 / sc 0x71)
> [22486.470440] I/O error, dev nvme1c1n1, sector 15760632 op 0x0:(READ)
> flags 0x4084700 phys_seg 1 prio class 0
> [22486.480386] nvme nvme1: queue 5: timeout request 0x1c type 4
> [22486.486256] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760640, 64 blocks, I/O
> Error (sct 0x3 / sc 0x71)
> [22486.494994] I/O error, dev nvme1c1n1, sector 15760640 op 0x0:(READ)
> flags 0x4084700 phys_seg 2 prio class 0
> [22486.504946] nvme nvme1: queue 5: timeout request 0x1d type 4
> [22486.510822] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760704, 56 blocks, I/O
> Error (sct 0x3 / sc 0x71)
> [22486.519563] I/O error, dev nvme1c1n1, sector 15760704 op 0x0:(READ)
> flags 0x4080700 phys_seg 2 prio class 0
> [22486.529723] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760504, 64 blocks, I/O
> Error (sct 0x3 / sc 0x71)
> [22486.538610] I/O error, dev nvme1c1n1, sector 15760504 op 0x0:(READ)
> flags 0x4084700 phys_seg 2 prio class 0
> 
> To further aid in debugging, I resurrected Keith's nvme_tcp tracing event
> patch from a ways back. If my understanding is correct,
> nvme_tcp_cmd_pdu() is being called for the commands in question.
> 
> I've isolated the failure to a scenario we can reproduce at will
> in about a minute of runtime. Is there any additional debug that we can
> provide to help locate the bug?

It would be interesting to understand whether nvme_tcp_queue_request()
scheduled io_work for these requests (this can be marked on the command
and printed when the timeout expires). If not, it would be interesting
to understand why.
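
Something along these lines would do. This is only an untested sketch;
the dbg_* fields are hypothetical debug-only additions to struct
nvme_tcp_request, just for this experiment:

    /* in nvme_tcp_queue_request(), record what the queueing path decided */
    req->dbg_io_cpu = queue->io_cpu;
    req->dbg_cpu    = raw_smp_processor_id();
    req->dbg_sync   = sync;
    req->dbg_empty  = empty;
    req->dbg_last   = last;
    req->dbg_queued_io_work = false; /* set true in the queue_work_on() branch */

    /* in the timeout handler, next to the existing "timeout request" print */
    pr_warn("nvme_tcp: tag %#x io_cpu %d cpu %d sync %d empty %d last %d io_work %d\n",
            rq->tag, req->dbg_io_cpu, req->dbg_cpu, req->dbg_sync,
            req->dbg_empty, req->dbg_last, req->dbg_queued_io_work);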



* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-08-31  9:02 ` Sagi Grimberg
@ 2022-08-31 16:04   ` Jonathan Nicklin
  2022-08-31 18:50     ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-08-31 16:04 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Wed, Aug 31, 2022 at 5:02 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> > Hello,
>
> Hey Jonathan,
>
> > We've seen sporadic I/O errors with nvme-tcp in our lab when testing
> > with kernel versions 5.15 through 5.19. Using a combination of tracing
> > events and network traces, we can see nvme_setup_cmd() trace events
> > that do not result in transmitted commands PDUs.
> >
> > Here's a snippet of the relevant portion of the nvme trace. The
> > following events are the last of tens of thousands of successully
> > executed reads...
> >
> > tar-9004    [004] ..... 22457.614117: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=53272, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760448, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> > tar-9004    [004] ..... 22457.614118: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=61463, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760384, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=16406, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760376, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> > tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=28693, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760312, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=40980, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760248, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >
> > The following 5 commands are issued to nvme, but never make it to the
> > wire. The wire state was verified manually using tcpdump on the host and
> > trace events on the target. Perhaps the "need-resched" marker in the
> > trace and the context switch to the kernel worker are relevant
> > to the failure?
> >
> > tar-9004    [004] .n... 22457.614171: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=61469, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760704, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> > kworker/4:1H-8668    [004] ..... 22457.614183: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=45084, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760640, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=28699, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760632, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> > kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=40986, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760568, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=5, cmdid=16409, nsid=1, flags=0x0, meta=0x0,
> >    cmd=(nvme_cmd_read slba=15760504, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >
> > The reads above eventually timeout and recover after reset. Here are
> > the related reset messages.
> >
> > [22486.420606] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760568, 64 blocks, I/O
> > Error (sct 0x3 / sc 0x71)
> > [22486.429326] I/O error, dev nvme1c1n1, sector 15760568 op 0x0:(READ)
> > flags 0x4084700 phys_seg 1 prio class 0
> > [22486.439262] nvme nvme1: queue 5: timeout request 0x1b type 4
> > [22486.461788] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760632, 8 blocks, I/O
> > Error (sct 0x3 / sc 0x71)
> > [22486.470440] I/O error, dev nvme1c1n1, sector 15760632 op 0x0:(READ)
> > flags 0x4084700 phys_seg 1 prio class 0
> > [22486.480386] nvme nvme1: queue 5: timeout request 0x1c type 4
> > [22486.486256] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760640, 64 blocks, I/O
> > Error (sct 0x3 / sc 0x71)
> > [22486.494994] I/O error, dev nvme1c1n1, sector 15760640 op 0x0:(READ)
> > flags 0x4084700 phys_seg 2 prio class 0
> > [22486.504946] nvme nvme1: queue 5: timeout request 0x1d type 4
> > [22486.510822] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760704, 56 blocks, I/O
> > Error (sct 0x3 / sc 0x71)
> > [22486.519563] I/O error, dev nvme1c1n1, sector 15760704 op 0x0:(READ)
> > flags 0x4080700 phys_seg 2 prio class 0
> > [22486.529723] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760504, 64 blocks, I/O
> > Error (sct 0x3 / sc 0x71)
> > [22486.538610] I/O error, dev nvme1c1n1, sector 15760504 op 0x0:(READ)
> > flags 0x4084700 phys_seg 2 prio class 0
> >
> > To further aid in debugging, I resurrected Keith's nvme_tcp tracing event
> > patch from a ways back. If my understanding is correct,
> > nvme_tcp_cmd_pdu() is being called for the commands in question.
> >
> > I've isolated the failure to a scenario we can reproduce at will
> > in about a minute of runtime. Is there any additional debug that we can
> > provide to help locate the bug?
>
> Would be interesting to understand if for these requests,
> nvme_tcp_queue_request scheduled io_work or not (can be marked on the
> command and printed during the timeout expiration). If not, would be
> interesting to understand why.

Okay, here's a repro with a bit more context. I recorded the state
from nvme_tcp_queue_request() in the request and printed it out as you
suggested. I also added trace events for nvme_tcp_queue_rq() to make
it easier to match the state.

- lock is the return value of mutex_trylock() (if executed)
- more is the return value of nvme_tcp_queue_more() (if executed)
- this_cpu is the value of raw_smp_processor_id()

Here's the trace output for the failed requests with the associated
completion messages (from dmesg) stitched in.

  tar-2153 [001] .....   536.383588: nvme_tcp_queue_rq:
    nvme1: qid=2 tag=5 op=0 data_len=32768
  tar-2153 [001] .....   536.383588: nvme_setup_cmd:
    nvme1: disk=nvme1c1n1, qid=2, cmdid=8197, nsid=1, flags=0x0, meta=0x0,
      cmd=(nvme_cmd_read slba=9950336, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
  dmesg.out           [  565.574700] nvme_tcp_queue_request()
    state: tag 0x5 io_cpu 1 this_cpu 1 lock 1 sync 1 empty 1 last 0 more 1

 tar-2153  [001] .N...   536.383594: nvme_tcp_queue_rq:
   nvme1: qid=2 tag=4 op=0 data_len=32768
 tar-2153  [001] .N...   536.383594: nvme_setup_cmd:
   nvme1: disk=nvme1c1n1, qid=2, cmdid=16388, nsid=1, flags=0x0, meta=0x0,
     cmd=(nvme_cmd_read slba=9950272, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
 dmesg.out            [  565.540965] nvme_tcp_queue_request()
   state: tag 0x4 io_cpu 1 this_cpu 5 lock 1 sync 1 empty 1 last 0 more 1

kworker/1:0H-23  [001] .....   536.383619: nvme_tcp_queue_rq:
  nvme1: qid=2 tag=3 op=0 data_len=32768
kworker/1:0H-23  [001] .....   536.383619: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=2, cmdid=24579, nsid=1, flags=0x0, meta=0x0,
    cmd=(nvme_cmd_read slba=9950208, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
dmesg.out              [  565.490597] nvme_tcp_queue_request()
  state: tag 0x3 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 0 more 1

kworker/1:0H-23  [001] .....   536.383620: nvme_tcp_queue_rq:
  nvme1: qid=2 tag=2 op=0 data_len=32768
kworker/1:0H-23  [001] .....   536.383620: nvme_setup_cmd:
  nvme1: disk=nvme1c1n1, qid=2, cmdid=40962, nsid=1, flags=0x0, meta=0x0,
    cmd=(nvme_cmd_read slba=9950144, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
dmesg.out                   [  565.470687] nvme_tcp_queue_request()
  state: tag 0x2 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 1 more 1

Does this shed any light on the issue?



* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-08-31 16:04   ` Jonathan Nicklin
@ 2022-08-31 18:50     ` Sagi Grimberg
  2022-08-31 19:06       ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2022-08-31 18:50 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-nvme



On 8/31/22 19:04, Jonathan Nicklin wrote:
> On Wed, Aug 31, 2022 at 5:02 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>>
>>
>>> Hello,
>>
>> Hey Jonathan,
>>
>>> We've seen sporadic I/O errors with nvme-tcp in our lab when testing
>>> with kernel versions 5.15 through 5.19. Using a combination of tracing
>>> events and network traces, we can see nvme_setup_cmd() trace events
>>> that do not result in transmitted commands PDUs.
>>>
>>> Here's a snippet of the relevant portion of the nvme trace. The
>>> following events are the last of tens of thousands of successully
>>> executed reads...
>>>
>>> tar-9004    [004] ..... 22457.614117: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=53272, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760448, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> tar-9004    [004] ..... 22457.614118: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61463, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760384, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16406, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760376, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28693, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760312, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40980, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760248, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>>>
>>> The following 5 commands are issued to nvme, but never make it to the
>>> wire. The wire state was verified manually using tcpdump on the host and
>>> trace events on the target. Perhaps the "need-resched" marker in the
>>> trace and the context switch to the kernel worker are relevant
>>> to the failure?
>>>
>>> tar-9004    [004] .n... 22457.614171: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61469, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760704, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> kworker/4:1H-8668    [004] ..... 22457.614183: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=45084, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760640, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28699, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760632, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40986, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760568, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
>>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16409, nsid=1, flags=0x0, meta=0x0,
>>>     cmd=(nvme_cmd_read slba=15760504, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>>>
>>> The reads above eventually timeout and recover after reset. Here are
>>> the related reset messages.
>>>
>>> [22486.420606] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760568, 64 blocks, I/O
>>> Error (sct 0x3 / sc 0x71)
>>> [22486.429326] I/O error, dev nvme1c1n1, sector 15760568 op 0x0:(READ)
>>> flags 0x4084700 phys_seg 1 prio class 0
>>> [22486.439262] nvme nvme1: queue 5: timeout request 0x1b type 4
>>> [22486.461788] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760632, 8 blocks, I/O
>>> Error (sct 0x3 / sc 0x71)
>>> [22486.470440] I/O error, dev nvme1c1n1, sector 15760632 op 0x0:(READ)
>>> flags 0x4084700 phys_seg 1 prio class 0
>>> [22486.480386] nvme nvme1: queue 5: timeout request 0x1c type 4
>>> [22486.486256] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760640, 64 blocks, I/O
>>> Error (sct 0x3 / sc 0x71)
>>> [22486.494994] I/O error, dev nvme1c1n1, sector 15760640 op 0x0:(READ)
>>> flags 0x4084700 phys_seg 2 prio class 0
>>> [22486.504946] nvme nvme1: queue 5: timeout request 0x1d type 4
>>> [22486.510822] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760704, 56 blocks, I/O
>>> Error (sct 0x3 / sc 0x71)
>>> [22486.519563] I/O error, dev nvme1c1n1, sector 15760704 op 0x0:(READ)
>>> flags 0x4080700 phys_seg 2 prio class 0
>>> [22486.529723] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760504, 64 blocks, I/O
>>> Error (sct 0x3 / sc 0x71)
>>> [22486.538610] I/O error, dev nvme1c1n1, sector 15760504 op 0x0:(READ)
>>> flags 0x4084700 phys_seg 2 prio class 0
>>>
>>> To further aid in debugging, I resurrected Keith's nvme_tcp tracing event
>>> patch from a ways back. If my understanding is correct,
>>> nvme_tcp_cmd_pdu() is being called for the commands in question.
>>>
>>> I've isolated the failure to a scenario we can reproduce at will
>>> in about a minute of runtime. Is there any additional debug that we can
>>> provide to help locate the bug?
>>
>> Would be interesting to understand if for these requests,
>> nvme_tcp_queue_request scheduled io_work or not (can be marked on the
>> command and printed during the timeout expiration). If not, would be
>> interesting to understand why.
> 
> Okay, here's a repro with a bit more context. I recorded the state
> from nvme_tcp_queue_request() in the request and printed it out as you
> suggested. I also added trace events for nvme_tcp_queue_rq() to make
> it easier to match the state.
> 
> - lock is the return value of trylock() (if executed)
> - more is the return value of nvme_tcp_queue_more(if executed)
> - this_cpu is raw_smp_processor_id();
> 
> Here's the trace output for the failed requests with the associated
> completion messages (from dmesg) stitched in.
> 
>    tar-2153 [001] .....   536.383588: nvme_tcp_queue_rq:
>      nvme1: qid=2 tag=5 op=0 data_len=32768
>    tar-2153 [001] .....   536.383588: nvme_setup_cmd:
>      nvme1: disk=nvme1c1n1, qid=2, cmdid=8197, nsid=1, flags=0x0, meta=0x0,
>        cmd=(nvme_cmd_read slba=9950336, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>    dmesg.out           [  565.574700] nvme_tcp_queue_request()
>      state: tag 0x5 io_cpu 1 this_cpu 1 lock 1 sync 1 empty 1 last 0 more 1

This is strange...
The CPU matches and the lock was taken, which means the request should
have been sent to the wire immediately...

However, last=0 marks the send with MSG_SENDPAGE_NOTLAST, which tells
the network stack not to push it out right away...
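
For reference, the cmd PDU send path picks its socket flags roughly like
this (paraphrased, not copied verbatim from any particular tree, so the
exact conditions may differ slightly):

    static inline bool nvme_tcp_queue_more(struct nvme_tcp_queue *queue)
    {
            /* "more" == another request is queued or a batch is still open */
            return !list_empty(&queue->send_list) ||
                    !llist_empty(&queue->req_list) || queue->more_requests;
    }

    /* in nvme_tcp_try_send_cmd_pdu() */
    int flags = MSG_DONTWAIT;

    if (nvme_tcp_has_inline_data(req) || nvme_tcp_queue_more(queue))
            /* more to come: let the stack coalesce, don't push yet */
            flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
    else
            /* nothing else pending: push this PDU out now */
            flags |= MSG_EOR;
    /* ... kernel_sendpage(queue->sock, ..., flags) follows ... */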

> 
>   tar-2153  [001] .N...   536.383594: nvme_tcp_queue_rq:
>     nvme1: qid=2 tag=4 op=0 data_len=32768
>   tar-2153  [001] .N...   536.383594: nvme_setup_cmd:
>     nvme1: disk=nvme1c1n1, qid=2, cmdid=16388, nsid=1, flags=0x0, meta=0x0,
>       cmd=(nvme_cmd_read slba=9950272, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>   dmesg.out            [  565.540965] nvme_tcp_queue_request()
>     state: tag 0x4 io_cpu 1 this_cpu 5 lock 1 sync 1 empty 1 last 0 more 1

Same here... sent down the network stack, but with MSG_SENDPAGE_NOTLAST.

> kworker/1:0H-23  [001] .....   536.383619: nvme_tcp_queue_rq:
>    nvme1: qid=2 tag=3 op=0 data_len=32768
> kworker/1:0H-23  [001] .....   536.383619: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=2, cmdid=24579, nsid=1, flags=0x0, meta=0x0,
>      cmd=(nvme_cmd_read slba=9950208, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> dmesg.out              [  565.490597] nvme_tcp_queue_request()
>    state: tag 0x3 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 0 more 1

Here, the queue is not empty, so we do not send right away, and it is
also not the last request in the batch, so we do not queue io_work
either; we wait for the last request in the batch to schedule io_work,
which sends the batch to the network.

> kworker/1:0H-23  [001] .....   536.383620: nvme_tcp_queue_rq:
>    nvme1: qid=2 tag=2 op=0 data_len=32768
> kworker/1:0H-23  [001] .....   536.383620: nvme_setup_cmd:
>    nvme1: disk=nvme1c1n1, qid=2, cmdid=40962, nsid=1, flags=0x0, meta=0x0,
>      cmd=(nvme_cmd_read slba=9950144, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> dmesg.out                   [  565.470687] nvme_tcp_queue_request()
>    state: tag 0x2 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 1 more 1

Here again, the queue is not empty, but this is the last request in the
batch, so we should have scheduled io_work to push the queued requests
in the send_list along.

These are not necessarily requests that were issued one after the other;
however, the last request should have flushed all the prior ones...
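
For context, the decision path in nvme_tcp_queue_request() looks roughly
like the following (paraphrased, not copied verbatim from any particular
tree, so the exact conditions may differ slightly):

    static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
                    bool sync, bool last)
    {
            struct nvme_tcp_queue *queue = req->queue;
            bool empty;

            /* empty == nothing else is currently pending on this queue */
            empty = llist_add(&req->lentry, &queue->req_list) &&
                    list_empty(&queue->send_list) && !queue->request;

            if (queue->io_cpu == raw_smp_processor_id() &&
                sync && empty && mutex_trylock(&queue->send_mutex)) {
                    /* same CPU, nothing pending: send from this context */
                    queue->more_requests = !last;
                    nvme_tcp_send_all(queue);
                    queue->more_requests = false;
                    mutex_unlock(&queue->send_mutex);
            } else if (last && !llist_empty(&queue->req_list)) {
                    /* end of a batch with queued requests: io_work flushes it */
                    queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
            }
    }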

Can you add an indication that a command was actually sent to the wire
(i.e., that nvme_tcp_try_send_cmd_pdu() sent the entire cmd PDU)?
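
For example, something like this (untested; dbg_pdu_sent is a
hypothetical debug-only field on struct nvme_tcp_request):

    /* in nvme_tcp_try_send_cmd_pdu(), at the point where the whole command
     * PDU (header + digest) has been handed to kernel_sendpage() */
    req->dbg_pdu_sent = true;

    /* and in the timeout handler, alongside the rest of the state dump */
    pr_warn("nvme_tcp: tag %#x cmd pdu fully sent to socket: %d\n",
            rq->tag, req->dbg_pdu_sent);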



* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-08-31 18:50     ` Sagi Grimberg
@ 2022-08-31 19:06       ` Jonathan Nicklin
  2022-08-31 20:04         ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-08-31 19:06 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

Something about the logged state didn't make sense to me. Sure enough,
I had a bug in the state tracking. Here's another attempt.

I also added "smp_id2", which is raw_smp_processor_id() recorded
immediately after the direct-send case. It is interesting to point out
that the CPU appears to change for the first command that goes missing.

Here's the arrangement of smp_id and smp_id2.

    smp_id = raw_smp_processor_id();
    if (queue->io_cpu == smp_id && sync && empty) {
        lock = mutex_trylock(&queue->send_mutex);
        if (lock) {
            queue->more_requests = !last;
            nvme_tcp_send_all(queue);
            queue->more_requests = false;
            mutex_unlock(&queue->send_mutex);
        }
    }
    smp_id2 = raw_smp_processor_id();

Here's the new trace with dmesg outputs:

tar-1825    [005] .....   430.572846: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=2f op=0 data_len=32768
tar-1825    [005] .N...   430.572849: nvme_setup_cmd:
 nvme1: disk=nvme1c1n1, qid=6, cmdid=45103, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=9917888, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)

dmesg.out [  459.626849] nvme nvme1:
 state: tag 0x2f io_cpu 5 smp_id 5 smp_id2 2 sync 1 empty 1 lock 1 last 0 more 0
                 ^^^^^^^^ ^^^^^^^^ ^^^^^^^^^

kworker/5:2H-1827    [005] .....   430.572913: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=2e op=0 data_len=32768
kworker/5:2H-1827    [005] .....   430.572913: nvme_setup_cmd:
 nvme1: disk=nvme1c1n1, qid=6, cmdid=16430, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=9917824, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)

dmesg.out [  459.593681] nvme nvme1:
 state: tag 0x2e io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 0 more 1


kworker/5:2H-1827    [005] .....   430.572914: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=2d op=0 data_len=32768
kworker/5:2H-1827    [005] .....   430.572914: nvme_setup_cmd:
 nvme1: disk=nvme1c1n1, qid=6, cmdid=28717, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=9917760, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)

dmesg.out [  459.560506] nvme nvme1:
 state: tag 0x2d io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 0 more 1


kworker/5:2H-1827    [005] .....   430.572914: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=29 op=0 data_len=32768
kworker/5:2H-1827    [005] .....   430.572914: nvme_setup_cmd:
 nvme1: disk=nvme1c1n1, qid=6, cmdid=24617, nsid=1, flags=0x0, meta=0x0,
  cmd=(nvme_cmd_read slba=9917696, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
dmesg.out [  459.540904] nvme nvme1:
 state: tag 0x29 io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 1 more 1


On Wed, Aug 31, 2022 at 2:50 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>
> On 8/31/22 19:04, Jonathan Nicklin wrote:
> > On Wed, Aug 31, 2022 at 5:02 AM Sagi Grimberg <sagi@grimberg.me> wrote:
> >>
> >>
> >>> Hello,
> >>
> >> Hey Jonathan,
> >>
> >>> We've seen sporadic I/O errors with nvme-tcp in our lab when testing
> >>> with kernel versions 5.15 through 5.19. Using a combination of tracing
> >>> events and network traces, we can see nvme_setup_cmd() trace events
> >>> that do not result in transmitted commands PDUs.
> >>>
> >>> Here's a snippet of the relevant portion of the nvme trace. The
> >>> following events are the last of tens of thousands of successully
> >>> executed reads...
> >>>
> >>> tar-9004    [004] ..... 22457.614117: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=53272, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760448, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> tar-9004    [004] ..... 22457.614118: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61463, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760384, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16406, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760376, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28693, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760312, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40980, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760248, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>>
> >>> The following 5 commands are issued to nvme, but never make it to the
> >>> wire. The wire state was verified manually using tcpdump on the host and
> >>> trace events on the target. Perhaps the "need-resched" marker in the
> >>> trace and the context switch to the kernel worker are relevant
> >>> to the failure?
> >>>
> >>> tar-9004    [004] .n... 22457.614171: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61469, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760704, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> kworker/4:1H-8668    [004] ..... 22457.614183: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=45084, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760640, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28699, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760632, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40986, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760568, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16409, nsid=1, flags=0x0, meta=0x0,
> >>>     cmd=(nvme_cmd_read slba=15760504, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >>>
> >>> The reads above eventually timeout and recover after reset. Here are
> >>> the related reset messages.
> >>>
> >>> [22486.420606] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760568, 64 blocks, I/O
> >>> Error (sct 0x3 / sc 0x71)
> >>> [22486.429326] I/O error, dev nvme1c1n1, sector 15760568 op 0x0:(READ)
> >>> flags 0x4084700 phys_seg 1 prio class 0
> >>> [22486.439262] nvme nvme1: queue 5: timeout request 0x1b type 4
> >>> [22486.461788] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760632, 8 blocks, I/O
> >>> Error (sct 0x3 / sc 0x71)
> >>> [22486.470440] I/O error, dev nvme1c1n1, sector 15760632 op 0x0:(READ)
> >>> flags 0x4084700 phys_seg 1 prio class 0
> >>> [22486.480386] nvme nvme1: queue 5: timeout request 0x1c type 4
> >>> [22486.486256] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760640, 64 blocks, I/O
> >>> Error (sct 0x3 / sc 0x71)
> >>> [22486.494994] I/O error, dev nvme1c1n1, sector 15760640 op 0x0:(READ)
> >>> flags 0x4084700 phys_seg 2 prio class 0
> >>> [22486.504946] nvme nvme1: queue 5: timeout request 0x1d type 4
> >>> [22486.510822] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760704, 56 blocks, I/O
> >>> Error (sct 0x3 / sc 0x71)
> >>> [22486.519563] I/O error, dev nvme1c1n1, sector 15760704 op 0x0:(READ)
> >>> flags 0x4080700 phys_seg 2 prio class 0
> >>> [22486.529723] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760504, 64 blocks, I/O
> >>> Error (sct 0x3 / sc 0x71)
> >>> [22486.538610] I/O error, dev nvme1c1n1, sector 15760504 op 0x0:(READ)
> >>> flags 0x4084700 phys_seg 2 prio class 0
> >>>
> >>> To further aid in debugging, I resurrected Keith's nvme_tcp tracing event
> >>> patch from a ways back. If my understanding is correct,
> >>> nvme_tcp_cmd_pdu() is being called for the commands in question.
> >>>
> >>> I've isolated the failure to a scenario we can reproduce at will
> >>> in about a minute of runtime. Is there any additional debug that we can
> >>> provide to help locate the bug?
> >>
> >> Would be interesting to understand if for these requests,
> >> nvme_tcp_queue_request scheduled io_work or not (can be marked on the
> >> command and printed during the timeout expiration). If not, would be
> >> interesting to understand why.
> >
> > Okay, here's a repro with a bit more context. I recorded the state
> > from nvme_tcp_queue_request() in the request and printed it out as you
> > suggested. I also added trace events for nvme_tcp_queue_rq() to make
> > it easier to match the state.
> >
> > - lock is the return value of trylock() (if executed)
> > - more is the return value of nvme_tcp_queue_more(if executed)
> > - this_cpu is raw_smp_processor_id();
> >
> > Here's the trace output for the failed requests with the associated
> > completion messages (from dmesg) stitched in.
> >
> >    tar-2153 [001] .....   536.383588: nvme_tcp_queue_rq:
> >      nvme1: qid=2 tag=5 op=0 data_len=32768
> >    tar-2153 [001] .....   536.383588: nvme_setup_cmd:
> >      nvme1: disk=nvme1c1n1, qid=2, cmdid=8197, nsid=1, flags=0x0, meta=0x0,
> >        cmd=(nvme_cmd_read slba=9950336, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >    dmesg.out           [  565.574700] nvme_tcp_queue_request()
> >      state: tag 0x5 io_cpu 1 this_cpu 1 lock 1 sync 1 empty 1 last 0 more 1
>
> This is strange...
> cpu matches, and the lock is taken, this means that the request should
> have been sent immediately to the wire...
>
> However last=0, this will make the send marked with MSG_SENDPAGE_NOTLAST
> which means the network stack will not send it right away...
>
> >
> >   tar-2153  [001] .N...   536.383594: nvme_tcp_queue_rq:
> >     nvme1: qid=2 tag=4 op=0 data_len=32768
> >   tar-2153  [001] .N...   536.383594: nvme_setup_cmd:
> >     nvme1: disk=nvme1c1n1, qid=2, cmdid=16388, nsid=1, flags=0x0, meta=0x0,
> >       cmd=(nvme_cmd_read slba=9950272, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> >   dmesg.out            [  565.540965] nvme_tcp_queue_request()
> >     state: tag 0x4 io_cpu 1 this_cpu 5 lock 1 sync 1 empty 1 last 0 more 1
>
> Same here... sent down the network stack, but with MSG_SENDPAGE_NOTLAST.
>
> > kworker/1:0H-23  [001] .....   536.383619: nvme_tcp_queue_rq:
> >    nvme1: qid=2 tag=3 op=0 data_len=32768
> > kworker/1:0H-23  [001] .....   536.383619: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=2, cmdid=24579, nsid=1, flags=0x0, meta=0x0,
> >      cmd=(nvme_cmd_read slba=9950208, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > dmesg.out              [  565.490597] nvme_tcp_queue_request()
> >    state: tag 0x3 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 0 more 1
>
> Here, the queue is not empty, so we are not sending right away, and its
> also not the last in the batch, so we are also not queueing io_work, we
> will wait for the last request in the batch to schedule io_work, to send
> the batch to the network.
>
> > kworker/1:0H-23  [001] .....   536.383620: nvme_tcp_queue_rq:
> >    nvme1: qid=2 tag=2 op=0 data_len=32768
> > kworker/1:0H-23  [001] .....   536.383620: nvme_setup_cmd:
> >    nvme1: disk=nvme1c1n1, qid=2, cmdid=40962, nsid=1, flags=0x0, meta=0x0,
> >      cmd=(nvme_cmd_read slba=9950144, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > dmesg.out                   [  565.470687] nvme_tcp_queue_request()
> >    state: tag 0x2 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 1 more 1
>
> Here again, the queue is not empty, and its the last request in the
> batch, so we should have scheduled io_work to send the queued requests
> in the send_list along.
>
> These are not necessarily requests that were one after the other,
> however the last request, should have flushed all the prior ones...
>
> Can you add an indication that a command was actually sent to the wire
> (i.e. nvme_tcp_try_send_cmd_pdu() sends the entire cmd pdu).


* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-08-31 19:06       ` Jonathan Nicklin
@ 2022-08-31 20:04         ` Jonathan Nicklin
  2022-08-31 23:42           ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-08-31 20:04 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Wed, Aug 31, 2022 at 3:06 PM Jonathan Nicklin
<jnicklin@blockbridge.com> wrote:
>
> Something about the logged state didn't make sense to me. Sure enough,
> I had a bug in the state tracking. Here's another attempt.
>
> I also added in"smp_id2", which is raw_smp_processor_id() recorded
> immediately after the direct send case. Is it interesting to point out that the
> CPU appears to change for the first command to go missing.
>
> Here's the arrangement of smp_id and smp_id2.
>
>     smp_id = raw_smp_processor_id();
> if (queue->io_cpu == smp_id && sync && empty) {
>         lock = mutex_trylock(&queue->send_mutex);
>         if (lock) {
>             queue->more_requests = !last;
>             nvme_tcp_send_all(queue);
>             queue->more_requests = false;
>             mutex_unlock(&queue->send_mutex);
>         }
> }
>     smp_id2 = raw_smp_processor_id();
>
> Here's the new trace with dmesg outputs:
>
> tar-1825    [005] .....   430.572846: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=2f op=0 data_len=32768
> tar-1825    [005] .N...   430.572849: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=45103, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917888, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>
> dmesg.out [  459.626849] nvme nvme1:
>  state: tag 0x2f io_cpu 5 smp_id 5 smp_id2 2 sync 1 empty 1 lock 1 last 0 more 0
>                  ^^^^^^^^ ^^^^^^^^ ^^^^^^^^^
>
> kworker/5:2H-1827    [005] .....   430.572913: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=2e op=0 data_len=32768
> kworker/5:2H-1827    [005] .....   430.572913: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=16430, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917824, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>
> dmesg.out [  459.593681] nvme nvme1:
>  state: tag 0x2e io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 0 more 1
>
>
> kworker/5:2H-1827    [005] .....   430.572914: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=2d op=0 data_len=32768
> kworker/5:2H-1827    [005] .....   430.572914: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=28717, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917760, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>
> dmest.out [  459.560506] nvme nvme1:
>  state: tag 0x2d io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 0 more 1
>
>
> kworker/5:2H-1827    [005] .....   430.572914: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=29 op=0 data_len=32768
> kworker/5:2H-1827    [005] .....   430.572914: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=24617, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917696, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> dmest.out [  459.540904] nvme nvme1:
>  state: tag 0x29 io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 1 more 1
>
>
> On Wed, Aug 31, 2022 at 2:50 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> >
> > On 8/31/22 19:04, Jonathan Nicklin wrote:
> > > On Wed, Aug 31, 2022 at 5:02 AM Sagi Grimberg <sagi@grimberg.me> wrote:
> > >>
> > >>
> > >>> Hello,
> > >>
> > >> Hey Jonathan,
> > >>
> > >>> We've seen sporadic I/O errors with nvme-tcp in our lab when testing
> > >>> with kernel versions 5.15 through 5.19. Using a combination of tracing
> > >>> events and network traces, we can see nvme_setup_cmd() trace events
> > >>> that do not result in transmitted commands PDUs.
> > >>>
> > >>> Here's a snippet of the relevant portion of the nvme trace. The
> > >>> following events are the last of tens of thousands of successully
> > >>> executed reads...
> > >>>
> > >>> tar-9004    [004] ..... 22457.614117: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=53272, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760448, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614118: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61463, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760384, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16406, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760376, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28693, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760312, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40980, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760248, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>>
> > >>> The following 5 commands are issued to nvme, but never make it to the
> > >>> wire. The wire state was verified manually using tcpdump on the host and
> > >>> trace events on the target. Perhaps the "need-resched" marker in the
> > >>> trace and the context switch to the kernel worker are relevant
> > >>> to the failure?
> > >>>
> > >>> tar-9004    [004] .n... 22457.614171: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61469, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760704, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614183: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=45084, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760640, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28699, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760632, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40986, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760568, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16409, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760504, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>>
> > >>> The reads above eventually timeout and recover after reset. Here are
> > >>> the related reset messages.
> > >>>
> > >>> [22486.420606] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760568, 64 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.429326] I/O error, dev nvme1c1n1, sector 15760568 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 1 prio class 0
> > >>> [22486.439262] nvme nvme1: queue 5: timeout request 0x1b type 4
> > >>> [22486.461788] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760632, 8 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.470440] I/O error, dev nvme1c1n1, sector 15760632 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 1 prio class 0
> > >>> [22486.480386] nvme nvme1: queue 5: timeout request 0x1c type 4
> > >>> [22486.486256] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760640, 64 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.494994] I/O error, dev nvme1c1n1, sector 15760640 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 2 prio class 0
> > >>> [22486.504946] nvme nvme1: queue 5: timeout request 0x1d type 4
> > >>> [22486.510822] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760704, 56 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.519563] I/O error, dev nvme1c1n1, sector 15760704 op 0x0:(READ)
> > >>> flags 0x4080700 phys_seg 2 prio class 0
> > >>> [22486.529723] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760504, 64 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.538610] I/O error, dev nvme1c1n1, sector 15760504 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 2 prio class 0
> > >>>
> > >>> To further aid in debugging, I resurrected Keith's nvme_tcp tracing event
> > >>> patch from a ways back. If my understanding is correct,
> > >>> nvme_tcp_cmd_pdu() is being called for the commands in question.
> > >>>
> > >>> I've isolated the failure to a scenario we can reproduce at will
> > >>> in about a minute of runtime. Is there any additional debug that we can
> > >>> provide to help locate the bug?
> > >>
> > >> Would be interesting to understand if for these requests,
> > >> nvme_tcp_queue_request scheduled io_work or not (can be marked on the
> > >> command and printed during the timeout expiration). If not, would be
> > >> interesting to understand why.
> > >
> > > Okay, here's a repro with a bit more context. I recorded the state
> > > from nvme_tcp_queue_request() in the request and printed it out as you
> > > suggested. I also added trace events for nvme_tcp_queue_rq() to make
> > > it easier to match the state.
> > >
> > > - lock is the return value of trylock() (if executed)
> > > - more is the return value of nvme_tcp_queue_more(if executed)
> > > - this_cpu is raw_smp_processor_id();
> > >
> > > Here's the trace output for the failed requests with the associated
> > > completion messages (from dmesg) stitched in.
> > >
> > >    tar-2153 [001] .....   536.383588: nvme_tcp_queue_rq:
> > >      nvme1: qid=2 tag=5 op=0 data_len=32768
> > >    tar-2153 [001] .....   536.383588: nvme_setup_cmd:
> > >      nvme1: disk=nvme1c1n1, qid=2, cmdid=8197, nsid=1, flags=0x0, meta=0x0,
> > >        cmd=(nvme_cmd_read slba=9950336, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >    dmesg.out           [  565.574700] nvme_tcp_queue_request()
> > >      state: tag 0x5 io_cpu 1 this_cpu 1 lock 1 sync 1 empty 1 last 0 more 1
> >
> > This is strange...
> > cpu matches, and the lock is taken, this means that the request should
> > have been sent immediately to the wire...
> >
> > However last=0, this will make the send marked with MSG_SENDPAGE_NOTLAST
> > which means the network stack will not send it right away...
> >
> > >
> > >   tar-2153  [001] .N...   536.383594: nvme_tcp_queue_rq:
> > >     nvme1: qid=2 tag=4 op=0 data_len=32768
> > >   tar-2153  [001] .N...   536.383594: nvme_setup_cmd:
> > >     nvme1: disk=nvme1c1n1, qid=2, cmdid=16388, nsid=1, flags=0x0, meta=0x0,
> > >       cmd=(nvme_cmd_read slba=9950272, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >   dmesg.out            [  565.540965] nvme_tcp_queue_request()
> > >     state: tag 0x4 io_cpu 1 this_cpu 5 lock 1 sync 1 empty 1 last 0 more 1
> >
> > Same here... sent down the network stack, but with MSG_SENDPAGE_NOTLAST.
> >
> > > kworker/1:0H-23  [001] .....   536.383619: nvme_tcp_queue_rq:
> > >    nvme1: qid=2 tag=3 op=0 data_len=32768
> > > kworker/1:0H-23  [001] .....   536.383619: nvme_setup_cmd:
> > >    nvme1: disk=nvme1c1n1, qid=2, cmdid=24579, nsid=1, flags=0x0, meta=0x0,
> > >      cmd=(nvme_cmd_read slba=9950208, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > > dmesg.out              [  565.490597] nvme_tcp_queue_request()
> > >    state: tag 0x3 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 0 more 1
> >
> > Here, the queue is not empty, so we are not sending right away, and its
> > also not the last in the batch, so we are also not queueing io_work, we
> > will wait for the last request in the batch to schedule io_work, to send
> > the batch to the network.
> >
> > > kworker/1:0H-23  [001] .....   536.383620: nvme_tcp_queue_rq:
> > >    nvme1: qid=2 tag=2 op=0 data_len=32768
> > > kworker/1:0H-23  [001] .....   536.383620: nvme_setup_cmd:
> > >    nvme1: disk=nvme1c1n1, qid=2, cmdid=40962, nsid=1, flags=0x0, meta=0x0,
> > >      cmd=(nvme_cmd_read slba=9950144, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > > dmesg.out                   [  565.470687] nvme_tcp_queue_request()
> > >    state: tag 0x2 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 1 more 1
> >
> > Here again, the queue is not empty, and its the last request in the
> > batch, so we should have scheduled io_work to send the queued requests
> > in the send_list along.
> >
> > These are not necessarily requests that were one after the other,
> > however the last request, should have flushed all the prior ones...
> >
> > Can you add an indication that a command was actually sent to the wire
> > (i.e. nvme_tcp_try_send_cmd_pdu() sends the entire cmd pdu).
>
> Something about the logged state didn't make sense to me. Sure enough,
> I had a bug in the state tracking. Here's another attempt.
>
> I also added in"smp_id2", which is raw_smp_processor_id() recorded
> immediately after the direct send case. Is it interesting to point out that the
> CPU appears to change for the first command to go missing.
>
> Here's the arrangement of smp_id and smp_id2.
>
>     smp_id = raw_smp_processor_id();
> if (queue->io_cpu == smp_id && sync && empty) {
>         lock = mutex_trylock(&queue->send_mutex);
>         if (lock) {
>             queue->more_requests = !last;
>             nvme_tcp_send_all(queue);
>             queue->more_requests = false;
>             mutex_unlock(&queue->send_mutex);
>         }
> }
>     smp_id2 = raw_smp_processor_id();
>
> Here's the new trace with dmesg outputs:
>
> tar-1825    [005] .....   430.572846: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=2f op=0 data_len=32768
> tar-1825    [005] .N...   430.572849: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=45103, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917888, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>
> dmesg.out [  459.626849] nvme nvme1:
>  state: tag 0x2f io_cpu 5 smp_id 5 smp_id2 2 sync 1 empty 1 lock 1 last 0 more 0
>                  ^^^^^^^^ ^^^^^^^^ ^^^^^^^^^
>
> kworker/5:2H-1827    [005] .....   430.572913: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=2e op=0 data_len=32768
> kworker/5:2H-1827    [005] .....   430.572913: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=16430, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917824, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>
> dmesg.out [  459.593681] nvme nvme1:
>  state: tag 0x2e io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 0 more 1
>
>
> kworker/5:2H-1827    [005] .....   430.572914: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=2d op=0 data_len=32768
> kworker/5:2H-1827    [005] .....   430.572914: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=28717, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917760, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
>
> dmest.out [  459.560506] nvme nvme1:
>  state: tag 0x2d io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 0 more 1
>
>
> kworker/5:2H-1827    [005] .....   430.572914: nvme_tcp_queue_rq:
>  nvme1: qid=6 tag=29 op=0 data_len=32768
> kworker/5:2H-1827    [005] .....   430.572914: nvme_setup_cmd:
>  nvme1: disk=nvme1c1n1, qid=6, cmdid=24617, nsid=1, flags=0x0, meta=0x0,
>   cmd=(nvme_cmd_read slba=9917696, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> dmest.out [  459.540904] nvme nvme1:
>  state: tag 0x29 io_cpu 5 smp_id 5 smp_id2 5 sync 1 empty 0 lock 0 last 1 more 1
>
>
> On Wed, Aug 31, 2022 at 2:50 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> >
> > On 8/31/22 19:04, Jonathan Nicklin wrote:
> > > On Wed, Aug 31, 2022 at 5:02 AM Sagi Grimberg <sagi@grimberg.me> wrote:
> > >>
> > >>
> > >>> Hello,
> > >>
> > >> Hey Jonathan,
> > >>
> > >>> We've seen sporadic I/O errors with nvme-tcp in our lab when testing
> > >>> with kernel versions 5.15 through 5.19. Using a combination of tracing
> > >>> events and network traces, we can see nvme_setup_cmd() trace events
> > >>> that do not result in transmitted commands PDUs.
> > >>>
> > >>> Here's a snippet of the relevant portion of the nvme trace. The
> > >>> following events are the last of tens of thousands of successully
> > >>> executed reads...
> > >>>
> > >>> tar-9004    [004] ..... 22457.614117: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=53272, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760448, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614118: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61463, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760384, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16406, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760376, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28693, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760312, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> tar-9004    [004] ..... 22457.614119: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40980, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760248, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>>
> > >>> The following 5 commands are issued to nvme, but never make it to the
> > >>> wire. The wire state was verified manually using tcpdump on the host and
> > >>> trace events on the target. Perhaps the "need-resched" marker in the
> > >>> trace and the context switch to the kernel worker are relevant
> > >>> to the failure?
> > >>>
> > >>> tar-9004    [004] .n... 22457.614171: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=61469, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760704, len=55, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614183: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=45084, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760640, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=28699, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760632, len=7, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=40986, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760568, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>> kworker/4:1H-8668    [004] ..... 22457.614184: nvme_setup_cmd:
> > >>>     nvme1: disk=nvme1c1n1, qid=5, cmdid=16409, nsid=1, flags=0x0, meta=0x0,
> > >>>     cmd=(nvme_cmd_read slba=15760504, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >>>
> > >>> The reads above eventually timeout and recover after reset. Here are
> > >>> the related reset messages.
> > >>>
> > >>> [22486.420606] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760568, 64 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.429326] I/O error, dev nvme1c1n1, sector 15760568 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 1 prio class 0
> > >>> [22486.439262] nvme nvme1: queue 5: timeout request 0x1b type 4
> > >>> [22486.461788] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760632, 8 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.470440] I/O error, dev nvme1c1n1, sector 15760632 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 1 prio class 0
> > >>> [22486.480386] nvme nvme1: queue 5: timeout request 0x1c type 4
> > >>> [22486.486256] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760640, 64 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.494994] I/O error, dev nvme1c1n1, sector 15760640 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 2 prio class 0
> > >>> [22486.504946] nvme nvme1: queue 5: timeout request 0x1d type 4
> > >>> [22486.510822] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760704, 56 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.519563] I/O error, dev nvme1c1n1, sector 15760704 op 0x0:(READ)
> > >>> flags 0x4080700 phys_seg 2 prio class 0
> > >>> [22486.529723] nvme1c1n1: I/O Cmd(0x2) @ LBA 15760504, 64 blocks, I/O
> > >>> Error (sct 0x3 / sc 0x71)
> > >>> [22486.538610] I/O error, dev nvme1c1n1, sector 15760504 op 0x0:(READ)
> > >>> flags 0x4084700 phys_seg 2 prio class 0
> > >>>
> > >>> To further aid in debugging, I resurrected Keith's nvme_tcp tracing event
> > >>> patch from a ways back. If my understanding is correct,
> > >>> nvme_tcp_cmd_pdu() is being called for the commands in question.
> > >>>
> > >>> I've isolated the failure to a scenario we can reproduce at will
> > >>> in about a minute of runtime. Is there any additional debug that we can
> > >>> provide to help locate the bug?
> > >>
> > >> Would be interesting to understand if for these requests,
> > >> nvme_tcp_queue_request scheduled io_work or not (can be marked on the
> > >> command and printed during the timeout expiration). If not, would be
> > >> interesting to understand why.
> > >
> > > Okay, here's a repro with a bit more context. I recorded the state
> > > from nvme_tcp_queue_request() in the request and printed it out as you
> > > suggested. I also added trace events for nvme_tcp_queue_rq() to make
> > > it easier to match the state.
> > >
> > > - lock is the return value of trylock() (if executed)
> > > - more is the return value of nvme_tcp_queue_more(if executed)
> > > - this_cpu is raw_smp_processor_id();
> > >
> > > Here's the trace output for the failed requests with the associated
> > > completion messages (from dmesg) stitched in.
> > >
> > >    tar-2153 [001] .....   536.383588: nvme_tcp_queue_rq:
> > >      nvme1: qid=2 tag=5 op=0 data_len=32768
> > >    tar-2153 [001] .....   536.383588: nvme_setup_cmd:
> > >      nvme1: disk=nvme1c1n1, qid=2, cmdid=8197, nsid=1, flags=0x0, meta=0x0,
> > >        cmd=(nvme_cmd_read slba=9950336, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >    dmesg.out           [  565.574700] nvme_tcp_queue_request()
> > >      state: tag 0x5 io_cpu 1 this_cpu 1 lock 1 sync 1 empty 1 last 0 more 1
> >
> > This is strange...
> > cpu matches, and the lock is taken, this means that the request should
> > have been sent immediately to the wire...
> >
> > However last=0, this will make the send marked with MSG_SENDPAGE_NOTLAST
> > which means the network stack will not send it right away...
> >
> > >
> > >   tar-2153  [001] .N...   536.383594: nvme_tcp_queue_rq:
> > >     nvme1: qid=2 tag=4 op=0 data_len=32768
> > >   tar-2153  [001] .N...   536.383594: nvme_setup_cmd:
> > >     nvme1: disk=nvme1c1n1, qid=2, cmdid=16388, nsid=1, flags=0x0, meta=0x0,
> > >       cmd=(nvme_cmd_read slba=9950272, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > >   dmesg.out            [  565.540965] nvme_tcp_queue_request()
> > >     state: tag 0x4 io_cpu 1 this_cpu 5 lock 1 sync 1 empty 1 last 0 more 1
> >
> > Same here... sent down the network stack, but with MSG_SENDPAGE_NOTLAST.
> >
> > > kworker/1:0H-23  [001] .....   536.383619: nvme_tcp_queue_rq:
> > >    nvme1: qid=2 tag=3 op=0 data_len=32768
> > > kworker/1:0H-23  [001] .....   536.383619: nvme_setup_cmd:
> > >    nvme1: disk=nvme1c1n1, qid=2, cmdid=24579, nsid=1, flags=0x0, meta=0x0,
> > >      cmd=(nvme_cmd_read slba=9950208, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > > dmesg.out              [  565.490597] nvme_tcp_queue_request()
> > >    state: tag 0x3 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 0 more 1
> >
> > Here, the queue is not empty, so we are not sending right away, and its
> > also not the last in the batch, so we are also not queueing io_work, we
> > will wait for the last request in the batch to schedule io_work, to send
> > the batch to the network.
> >
> > > kworker/1:0H-23  [001] .....   536.383620: nvme_tcp_queue_rq:
> > >    nvme1: qid=2 tag=2 op=0 data_len=32768
> > > kworker/1:0H-23  [001] .....   536.383620: nvme_setup_cmd:
> > >    nvme1: disk=nvme1c1n1, qid=2, cmdid=40962, nsid=1, flags=0x0, meta=0x0,
> > >      cmd=(nvme_cmd_read slba=9950144, len=63, ctrl=0x8000, dsmgmt=7, reftag=0)
> > > dmesg.out                   [  565.470687] nvme_tcp_queue_request()
> > >    state: tag 0x2 io_cpu 1 this_cpu 1 lock 0 sync 1 empty 0 last 1 more 1
> >
> > Here again, the queue is not empty, and its the last request in the
> > batch, so we should have scheduled io_work to send the queued requests
> > in the send_list along.
> >
> > These are not necessarily requests that were one after the other,
> > however the last request, should have flushed all the prior ones...
> >
> > Can you add an indication that a command was actually sent to the wire
> > (i.e. nvme_tcp_try_send_cmd_pdu() sends the entire cmd pdu).

Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.

req->dbg.send = 1;
ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
req->dbg.sendpage = ret;
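
For context, the per-request debug state referenced above comes from a
local patch; roughly, it is shaped like this (field names are ours, not
anything upstream):

struct nvme_tcp_request_dbg {
	int  io_cpu;    /* queue->io_cpu at queue time */
	int  smp_id;    /* CPU on entry to nvme_tcp_queue_request() */
	int  smp_id2;   /* CPU right after the direct-send attempt */
	bool sync;
	bool empty;
	bool lock;      /* mutex_trylock() result, if attempted */
	bool last;
	bool more;      /* nvme_tcp_queue_more() result, if evaluated */
	bool send;      /* nvme_tcp_try_send_cmd_pdu() was reached */
	int  sendpage;  /* kernel_sendpage() return value, -1 if not called */
};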


nvme-trace.out

tar-1785    [005] .....   232.768892: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=18 op=0 data_len=20480
tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=17 op=0 data_len=32768
tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=14 op=0 data_len=32768
kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=13 op=0 data_len=32768
kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
 nvme1: qid=6 tag=12 op=0 data_len=12288
tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
 nvme1: qid=8 tag=69 op=0 data_len=20480


dmesg.out (reordered to match nvme-trace.out)

[  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
 empty 1 lock 1 last 0 more 0 send 0 sendpage -1
[  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
 empty 1 lock 1 last 0 more 0 send 0 sendpage -1
[  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
 empty 1 lock 1 last 0 more 0 send 0 sendpage -1
[  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
 empty 0 lock 0 last 0 more 1 send 1 sendpage 72
[  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
 empty 0 lock 0 last 1 more 1 send 1 sendpage 72


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-08-31 20:04         ` Jonathan Nicklin
@ 2022-08-31 23:42           ` Sagi Grimberg
  2022-09-01 12:37             ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2022-08-31 23:42 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-nvme


> 
> Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
> 
> req->dbg.send = 1;
> ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> req->dbg.sendpage = ret;
> 
> 
> nvme-trace.out
> 
> tar-1785    [005] .....   232.768892:nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=18 op=0 data_len=20480
> tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=17 op=0 data_len=32768
> tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=14 op=0 data_len=32768
> kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=13 op=0 data_len=32768
> kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
>   nvme1: qid=6 tag=12 op=0 data_len=12288
> tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
>   nvme1: qid=8 tag=69 op=0 data_len=20480
> 
> 
> dmesg.out (reordered to match nvme-trace.out)
> 
> [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 1 lock 1 last 0 more 0 send 0 sendpage -1

Something is not adding up...
lock=1 means that we get into nvme_tcp_send_all(), but somehow we are 
not able to send it to the wire?

I'd look into this and add an indication that it is pulled from the
req_list, and in turn from the send_list. It looks like it is somehow
picked out and silently dropped, because as long as it is on the list(s)
another context should have picked it up at some point...
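
Something along these lines in nvme_tcp_fetch_request() would show it
(rough sketch of the current code, not a verbatim copy; "fetched" is
just an illustrative marker for your dbg patch):

static inline struct nvme_tcp_request *
nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
{
	struct nvme_tcp_request *req;

	req = list_first_entry_or_null(&queue->send_list,
			struct nvme_tcp_request, entry);
	if (!req) {
		/* splice the lockless req_list over to the send_list */
		nvme_tcp_process_req_list(queue);
		req = list_first_entry_or_null(&queue->send_list,
				struct nvme_tcp_request, entry);
		if (unlikely(!req))
			return NULL;
	}

	/* the request is now owned by the send path */
	req->dbg.fetched = 1;
	list_del(&req->entry);
	return req;
}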

> [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
>   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>   empty 0 lock 0 last 1 more 1 send 1 sendpage 72


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-08-31 23:42           ` Sagi Grimberg
@ 2022-09-01 12:37             ` Jonathan Nicklin
  2022-09-01 13:22               ` Jonathan Nicklin
  2022-09-01 17:00               ` Sagi Grimberg
  0 siblings, 2 replies; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-01 12:37 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Wed, Aug 31, 2022 at 7:42 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> >
> > Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
> >
> > req->dbg.send = 1;
> > ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> > req->dbg.sendpage = ret;
> >
> >
> > nvme-trace.out
> >
> > tar-1785    [005] .....   232.768892:nvme_tcp_queue_rq:
> >   nvme1: qid=6 tag=18 op=0 data_len=20480
> > tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
> >   nvme1: qid=6 tag=17 op=0 data_len=32768
> > tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
> >   nvme1: qid=6 tag=14 op=0 data_len=32768
> > kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
> >   nvme1: qid=6 tag=13 op=0 data_len=32768
> > kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
> >   nvme1: qid=6 tag=12 op=0 data_len=12288
> > tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
> >   nvme1: qid=8 tag=69 op=0 data_len=20480
> >
> >
> > dmesg.out (reordered to match nvme-trace.out)
> >
> > [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
>
> Something is not adding up...
> lock=1, means that we get into nvme_tcp_send_all(), but somehow we are
> not able to send it to the wire?
>
> I'd look into this and add indication that it is pulled from the
> req_list, and in turn from the send_list. It looks like it is somehow
> picked out and silteny dropped, because as long as it is on the list(s)
> another context should have picked it up at some point...
>
> > [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
> >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >   empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> > [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >   empty 0 lock 0 last 1 more 1 send 1 sendpage 72

Ugh. The previous output was misleading because the send state tracking
was re-initialized after the direct-send case. Here's a better trace with
more information. It appears as though sendpage() is always called with
the MSG_SENDPAGE_NOTLAST flag set. I suppose this could mean that there
is some confusion or a race around nvme_tcp_queue_more()?

tar-1872          [004] ..... 389.559784: nvme_tcp_queue_rq:
 nvme1: qid=5 tag=28 op=0 data_len=16384
tar-1872          [004] .N... 389.559786: nvme_tcp_queue_rq:
 nvme1: qid=5 tag=27 op=0 data_len=16384
kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
 nvme1: qid=5 tag=26 op=0 data_len=32768
kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
 nvme1: qid=5 tag=25 op=0 data_len=32768
kworker/4:2H-1873 [004] ..... 389.559799: nvme_tcp_queue_rq:
 nvme1: qid=5 tag=24 op=0 data_len=32768


[  419.790505] nvme nvme1: state: tag 0x28 io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
[  419.790299] nvme nvme1: state: tag 0x27 io_cpu 4 smp_id 4 smp_id2 0 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
[  419.789809] nvme nvme1: state: tag 0x25 io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
[  419.790065] nvme nvme1: state: tag 0x26 io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
[  419.789663] nvme nvme1: state: tag 0x24 io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
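
For what it's worth, send_flags 0x28040 decodes to
MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST. If I'm reading
nvme_tcp_try_send_cmd_pdu() right, the flags are derived roughly like
this (paraphrased, not verbatim):

	int flags = MSG_DONTWAIT;

	if (inline_data || nvme_tcp_queue_more(queue))
		flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
	else
		flags |= MSG_EOR;

	ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);

so every send in the trace above took the nvme_tcp_queue_more() branch.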


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 12:37             ` Jonathan Nicklin
@ 2022-09-01 13:22               ` Jonathan Nicklin
  2022-09-01 15:37                 ` Jonathan Nicklin
  2022-09-01 17:00               ` Sagi Grimberg
  1 sibling, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-01 13:22 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Thu, Sep 1, 2022 at 8:37 AM Jonathan Nicklin
<jnicklin@blockbridge.com> wrote:
>
> On Wed, Aug 31, 2022 at 7:42 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> > >
> > > Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
> > >
> > > req->dbg.send = 1;
> > > ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> > > req->dbg.sendpage = ret;
> > >
> > >
> > > nvme-trace.out
> > >
> > > tar-1785    [005] .....   232.768892:nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=18 op=0 data_len=20480
> > > tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=17 op=0 data_len=32768
> > > tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=14 op=0 data_len=32768
> > > kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=13 op=0 data_len=32768
> > > kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
> > >   nvme1: qid=6 tag=12 op=0 data_len=12288
> > > tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
> > >   nvme1: qid=8 tag=69 op=0 data_len=20480
> > >
> > >
> > > dmesg.out (reordered to match nvme-trace.out)
> > >
> > > [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> >
> > Something is not adding up...
> > lock=1, means that we get into nvme_tcp_send_all(), but somehow we are
> > not able to send it to the wire?
> >
> > I'd look into this and add indication that it is pulled from the
> > req_list, and in turn from the send_list. It looks like it is somehow
> > picked out and silteny dropped, because as long as it is on the list(s)
> > another context should have picked it up at some point...
> >
> > > [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > > [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
> > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > > [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> > > [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > >   empty 0 lock 0 last 1 more 1 send 1 sendpage 72
>
> Ugh. It was misleading because the send state tracking was
> re-initialized after the direct send case. Here's a better trace with
> more information. It appears as though sendpage() is always called
> with the NOTLAST flag set. I suppose this could mean that there is
> some confusion or race with nvme_tcp_queue_more()?
>
> tar-1872          [004] ..... 389.559784: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=28 op=0 data_len=16384
> tar-1872          [004] .N... 389.559786: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=27 op=0 data_len=16384
> kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=26 op=0 data_len=32768
> kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=25 op=0 data_len=32768
> kworker/4:2H-1873 [004] ..... 389.559799: nvme_tcp_queue_rq:
>  nvme1: qid=5 tag=24 op=0 data_len=32768
>
>
> [  419.790505] nvme nvme1: state: tag 0x28 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> [  419.790299] nvme nvme1: state: tag 0x27 io_cpu 4 smp_id 4 smp_id2 0 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> [  419.789809] nvme nvme1: state: tag 0x25 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> [  419.790065] nvme nvme1: state: tag 0x26 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> [  419.789663] nvme nvme1: state: tag 0x24 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040

Additional tracing suggests it could be a race or issue with
queue->more_requests.

[  675.762509] nvme nvme1: state: tag 0x5f io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
  q_send_list 0 q_req_list 0 q_more_requests 1
[  675.720866] nvme nvme1: state: tag 0x5e io_cpu 1 smp_id 1 smp_id2 5 sync 1
  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
  q_send_list 0 q_req_list 0 q_more_requests 1
[  675.679256] nvme nvme1: state: tag 0x5d io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
  q_send_list 1 q_req_list 0 q_more_requests 1
[  675.621054] nvme nvme1: state: tag 0x5c io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
  q_send_list 1 q_req_list 0 q_more_requests 1
[  675.593414] nvme nvme1: state: tag 0x5a io_cpu 1 smp_id 1 smp_id2 1 sync 1
  empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
  q_send_list 0 q_req_list 0 q_more_requests 1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 13:22               ` Jonathan Nicklin
@ 2022-09-01 15:37                 ` Jonathan Nicklin
  2022-09-01 17:02                   ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-01 15:37 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Thu, Sep 1, 2022 at 9:22 AM Jonathan Nicklin
<jnicklin@blockbridge.com> wrote:
>
> On Thu, Sep 1, 2022 at 8:37 AM Jonathan Nicklin
> <jnicklin@blockbridge.com> wrote:
> >
> > On Wed, Aug 31, 2022 at 7:42 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> > >
> > >
> > > >
> > > > Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
> > > >
> > > > req->dbg.send = 1;
> > > > ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> > > > req->dbg.sendpage = ret;
> > > >
> > > >
> > > > nvme-trace.out
> > > >
> > > > tar-1785    [005] .....   232.768892:nvme_tcp_queue_rq:
> > > >   nvme1: qid=6 tag=18 op=0 data_len=20480
> > > > tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
> > > >   nvme1: qid=6 tag=17 op=0 data_len=32768
> > > > tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
> > > >   nvme1: qid=6 tag=14 op=0 data_len=32768
> > > > kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
> > > >   nvme1: qid=6 tag=13 op=0 data_len=32768
> > > > kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
> > > >   nvme1: qid=6 tag=12 op=0 data_len=12288
> > > > tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
> > > >   nvme1: qid=8 tag=69 op=0 data_len=20480
> > > >
> > > >
> > > > dmesg.out (reordered to match nvme-trace.out)
> > > >
> > > > [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > >
> > > Something is not adding up...
> > > lock=1, means that we get into nvme_tcp_send_all(), but somehow we are
> > > not able to send it to the wire?
> > >
> > > I'd look into this and add indication that it is pulled from the
> > > req_list, and in turn from the send_list. It looks like it is somehow
> > > picked out and silteny dropped, because as long as it is on the list(s)
> > > another context should have picked it up at some point...
> > >
> > > > [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > > > [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
> > > >   empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> > > > [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > > >   empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> > > > [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> > > >   empty 0 lock 0 last 1 more 1 send 1 sendpage 72
> >
> > Ugh. It was misleading because the send state tracking was
> > re-initialized after the direct send case. Here's a better trace with
> > more information. It appears as though sendpage() is always called
> > with the NOTLAST flag set. I suppose this could mean that there is
> > some confusion or race with nvme_tcp_queue_more()?
> >
> > tar-1872          [004] ..... 389.559784: nvme_tcp_queue_rq:
> >  nvme1: qid=5 tag=28 op=0 data_len=16384
> > tar-1872          [004] .N... 389.559786: nvme_tcp_queue_rq:
> >  nvme1: qid=5 tag=27 op=0 data_len=16384
> > kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
> >  nvme1: qid=5 tag=26 op=0 data_len=32768
> > kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
> >  nvme1: qid=5 tag=25 op=0 data_len=32768
> > kworker/4:2H-1873 [004] ..... 389.559799: nvme_tcp_queue_rq:
> >  nvme1: qid=5 tag=24 op=0 data_len=32768
> >
> >
> > [  419.790505] nvme nvme1: state: tag 0x28 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> > [  419.790299] nvme nvme1: state: tag 0x27 io_cpu 4 smp_id 4 smp_id2 0 sync 1
> >  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> > [  419.789809] nvme nvme1: state: tag 0x25 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> > [  419.790065] nvme nvme1: state: tag 0x26 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >  empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> > [  419.789663] nvme nvme1: state: tag 0x24 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >  empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
>
> Additional tracing suggests it could be a race or issue with
> queue->more_requests.
>
> [  675.762509] nvme nvme1: state: tag 0x5f io_cpu 1 smp_id 1 smp_id2 1 sync 1
>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>   q_send_list 0 q_req_list 0 q_more_requests 1
> [  675.720866] nvme nvme1: state: tag 0x5e io_cpu 1 smp_id 1 smp_id2 5 sync 1
>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>   q_send_list 0 q_req_list 0 q_more_requests 1
> [  675.679256] nvme nvme1: state: tag 0x5d io_cpu 1 smp_id 1 smp_id2 1 sync 1
>   empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
>   q_send_list 1 q_req_list 0 q_more_requests 1
> [  675.621054] nvme nvme1: state: tag 0x5c io_cpu 1 smp_id 1 smp_id2 1 sync 1
>   empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
>  q_send_list 1 q_req_list 0 q_more_requests 1
> [  675.593414] nvme nvme1: state: tag 0x5a io_cpu 1 smp_id 1 smp_id2 1 sync 1
>   empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
>   q_send_list 0 q_req_list 0 q_more_requests 1

Question: queue->more_requests is updated under a lock. However, not
all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
values seem to show that the same queue was being processed on
different CPUs. Is this a race between the kworker and inline
submission?
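
To make the question concrete, these are roughly the spots I mean
(paraphrasing the 5.19 code from memory, not a verbatim copy):

static inline bool nvme_tcp_queue_more(struct nvme_tcp_queue *queue)
{
	return !list_empty(&queue->send_list) ||
		!llist_empty(&queue->req_list) ||
		queue->more_requests;
}

	/*
	 * tail of nvme_tcp_queue_request(), after the direct-send attempt;
	 * send_mutex is not held when more_requests is read here
	 */
	if (last && nvme_tcp_queue_more(queue))
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);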


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 12:37             ` Jonathan Nicklin
  2022-09-01 13:22               ` Jonathan Nicklin
@ 2022-09-01 17:00               ` Sagi Grimberg
  2022-09-01 17:39                 ` Jonathan Nicklin
  1 sibling, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2022-09-01 17:00 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-nvme


>>> Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
>>>
>>> req->dbg.send = 1;
>>> ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
>>> req->dbg.sendpage = ret;
>>>
>>>
>>> nvme-trace.out
>>>
>>> tar-1785    [005] .....   232.768892:nvme_tcp_queue_rq:
>>>    nvme1: qid=6 tag=18 op=0 data_len=20480
>>> tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
>>>    nvme1: qid=6 tag=17 op=0 data_len=32768
>>> tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
>>>    nvme1: qid=6 tag=14 op=0 data_len=32768
>>> kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
>>>    nvme1: qid=6 tag=13 op=0 data_len=32768
>>> kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
>>>    nvme1: qid=6 tag=12 op=0 data_len=12288
>>> tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
>>>    nvme1: qid=8 tag=69 op=0 data_len=20480
>>>
>>>
>>> dmesg.out (reordered to match nvme-trace.out)
>>>
>>> [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>>>    empty 1 lock 1 last 0 more 0 send 0 sendpage -1
>>
>> Something is not adding up...
>> lock=1, means that we get into nvme_tcp_send_all(), but somehow we are
>> not able to send it to the wire?
>>
>> I'd look into this and add indication that it is pulled from the
>> req_list, and in turn from the send_list. It looks like it is somehow
>> picked out and silteny dropped, because as long as it is on the list(s)
>> another context should have picked it up at some point...
>>
>>> [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>>>    empty 1 lock 1 last 0 more 0 send 0 sendpage -1
>>> [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
>>>    empty 1 lock 1 last 0 more 0 send 0 sendpage -1
>>> [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>>>    empty 0 lock 0 last 0 more 1 send 1 sendpage 72
>>> [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
>>>    empty 0 lock 0 last 1 more 1 send 1 sendpage 72
> 
> Ugh. It was misleading because the send state tracking was
> re-initialized after the direct send case. Here's a better trace with
> more information. It appears as though sendpage() is always called
> with the NOTLAST flag set. I suppose this could mean that there is
> some confusion or race with nvme_tcp_queue_more()?

I don't know yet...

> tar-1872          [004] ..... 389.559784: nvme_tcp_queue_rq:
>   nvme1: qid=5 tag=28 op=0 data_len=16384
> tar-1872          [004] .N... 389.559786: nvme_tcp_queue_rq:
>   nvme1: qid=5 tag=27 op=0 data_len=16384
> kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
>   nvme1: qid=5 tag=26 op=0 data_len=32768
> kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
>   nvme1: qid=5 tag=25 op=0 data_len=32768
> kworker/4:2H-1873 [004] ..... 389.559799: nvme_tcp_queue_rq:
>   nvme1: qid=5 tag=24 op=0 data_len=32768
> 
> 
> [  419.790505] nvme nvme1: state: tag 0x28 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> [  419.790299] nvme nvme1: state: tag 0x27 io_cpu 4 smp_id 4 smp_id2 0 sync 1
>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040

Here lock=1, which means that we send inline; however last=0, which
means that more requests are coming, and that corresponds with the
MSG_SENDPAGE_NOTLAST flag.

> [  419.789809] nvme nvme1: state: tag 0x25 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>   empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> [  419.790065] nvme nvme1: state: tag 0x26 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>   empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040

Here lock=0, but last=0 as well, so we add the request to the req_list
and do _not_ trigger io_work, again because more requests are coming.

> [  419.789663] nvme nvme1: state: tag 0x24 io_cpu 4 smp_id 4 smp_id2 4 sync 1
>   empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040

Here lock=0, last=1, and we have requests that are queued (more=1), so
we add to the req_list and schedule io_work. Now, sending the last
3 commands happens asynchronously, so by the time we get to send
to the wire, it is possible that more requests got queued, which
may explain why the MSG_SENDPAGE_NOTLAST flag is set...

So we cannot know for sure... You could also record what
nvme_tcp_queue_more() returns at the time the flag is determined, but
according to the code, it is set if more requests are queued in the
req_list/send_list.

At some point, we'd expect to send a request that has the
MSG_SENDPAGE_NOTLAST flag cleared. It would be interesting to make sure
that we are getting there.

BTW, is this a read-only workload? Which filesystem? Can you share
more about the workload?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 15:37                 ` Jonathan Nicklin
@ 2022-09-01 17:02                   ` Sagi Grimberg
  2022-09-01 17:46                     ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2022-09-01 17:02 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-nvme


> Question: queue->more_requests is updated under a lock. However, not
> all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
> values seem to show that the same queue was being processed on
> different CPUs. Is this a race between the kworker and inline
> submission?

The entire send path in io_work is wrapped with the send_mutex. Which
part is not protected with the lock?
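
i.e. it has roughly this shape (abridged from memory, not verbatim):

static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		bool pending = false;

		if (mutex_trylock(&queue->send_mutex)) {
			/* all sending from io_work is under send_mutex */
			if (nvme_tcp_try_send(queue) > 0)
				pending = true;
			mutex_unlock(&queue->send_mutex);
		}

		if (nvme_tcp_try_recv(queue) > 0)
			pending = true;

		if (!pending)
			return;
	} while (!time_after(jiffies, deadline));

	/* requeue on the queue's io_cpu and keep going */
	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}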


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 17:00               ` Sagi Grimberg
@ 2022-09-01 17:39                 ` Jonathan Nicklin
  0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-01 17:39 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

Of course! The workload is a test from a suite of data-consistency tests
we run. The test walks a directory and generates a checksum of the
contents, so it is pretty much read-only. The underlying filesystem
is XFS. The kernel is 5.19.5. To reproduce, I'm just running the
following in a loop:

umount /mnt/nvme
mount /dev/nvme1n1 /mnt/nvme
find /mnt/nvme/linux-5.18.3 | LC_ALL=C sort | sudo tar -Pcf - -T -
--no-recursion | md5sum

This is on a 4-core Intel CPU with hyperthreading (4+4) and a 100G
Mellanox NIC (all interrupts bound to core 0).


On Thu, Sep 1, 2022 at 1:00 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> >>> Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
> >>>
> >>> req->dbg.send = 1;
> >>> ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> >>> req->dbg.sendpage = ret;
> >>>
> >>>
> >>> nvme-trace.out
> >>>
> >>> tar-1785    [005] .....   232.768892:nvme_tcp_queue_rq:
> >>>    nvme1: qid=6 tag=18 op=0 data_len=20480
> >>> tar-1785    [005] .....   232.768894: nvme_tcp_queue_rq:
> >>>    nvme1: qid=6 tag=17 op=0 data_len=32768
> >>> tar-1785    [005] .N...   232.768895: nvme_tcp_queue_rq:
> >>>    nvme1: qid=6 tag=14 op=0 data_len=32768
> >>> kworker/5:1H-475     [005] .....   232.768923: nvme_tcp_queue_rq:
> >>>    nvme1: qid=6 tag=13 op=0 data_len=32768
> >>> kworker/5:1H-475     [005] .....   232.768924: nvme_tcp_queue_rq:
> >>>    nvme1: qid=6 tag=12 op=0 data_len=12288
> >>> tar-1785    [007] .....   232.769141: nvme_tcp_queue_rq:
> >>>    nvme1: qid=8 tag=69 op=0 data_len=20480
> >>>
> >>>
> >>> dmesg.out (reordered to match nvme-trace.out)
> >>>
> >>> [  262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >>>    empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> >>
> >> Something is not adding up...
> >> lock=1, means that we get into nvme_tcp_send_all(), but somehow we are
> >> not able to send it to the wire?
> >>
> >> I'd look into this and add indication that it is pulled from the
> >> req_list, and in turn from the send_list. It looks like it is somehow
> >> picked out and silteny dropped, because as long as it is on the list(s)
> >> another context should have picked it up at some point...
> >>
> >>> [  262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >>>    empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> >>> [  262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
> >>>    empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> >>> [  262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >>>    empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> >>> [  262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> >>>    empty 0 lock 0 last 1 more 1 send 1 sendpage 72
> >
> > Ugh. It was misleading because the send state tracking was
> > re-initialized after the direct send case. Here's a better trace with
> > more information. It appears as though sendpage() is always called
> > with the NOTLAST flag set. I suppose this could mean that there is
> > some confusion or race with nvme_tcp_queue_more()?
>
> I don't know yet...
>
> > tar-1872          [004] ..... 389.559784: nvme_tcp_queue_rq:
> >   nvme1: qid=5 tag=28 op=0 data_len=16384
> > tar-1872          [004] .N... 389.559786: nvme_tcp_queue_rq:
> >   nvme1: qid=5 tag=27 op=0 data_len=16384
> > kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
> >   nvme1: qid=5 tag=26 op=0 data_len=32768
> > kworker/4:2H-1873 [004] ..... 389.559798: nvme_tcp_queue_rq:
> >   nvme1: qid=5 tag=25 op=0 data_len=32768
> > kworker/4:2H-1873 [004] ..... 389.559799: nvme_tcp_queue_rq:
> >   nvme1: qid=5 tag=24 op=0 data_len=32768
> >
> >
> > [  419.790505] nvme nvme1: state: tag 0x28 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> > [  419.790299] nvme nvme1: state: tag 0x27 io_cpu 4 smp_id 4 smp_id2 0 sync 1
> >   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>
> Here lock=1, which means that we send inline, however last=0, which
> means that more requests are coming and corresponds with NOLAST flag.
>
> > [  419.789809] nvme nvme1: state: tag 0x25 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >   empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
> > [  419.790065] nvme nvme1: state: tag 0x26 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >   empty 0 lock 0 last 0 more 1 send 1 send_ret 72 send_flags 0x28040
>
> here lock=0, but last=0 as well, so we add the request to req_list and
> _not_ trigger io_work, again because more requests are coming.
>
> > [  419.789663] nvme nvme1: state: tag 0x24 io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >   empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
>
> Here lock=0, last=1, and we have requests that are queued (more=1), so
> we add the the req_list, and schedule io_work. Now sending the last
> 3 commands happens asynchronously, so by the time that we get to send
> to the wire, it is possible that more requests got queued, which
> may explain why NOLAST flag is set...
>
> So we cannot know for sure... You can add what is nvme_tcp_queue_more
> at the time the flag is determined, but according to the code, it is
> set if more requests are queued in req_list/send_list.
>
> At some point, we'd expect to send with a request that has NOLAST
> flag cleared. Would be interesting to make sure that we are getting
> there.
>
> BTW, is this a read-only workload? filesystem? can you share more
> about the workload?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 17:02                   ` Sagi Grimberg
@ 2022-09-01 17:46                     ` Jonathan Nicklin
  2022-09-01 22:56                       ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-01 17:46 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

nvme_tcp_queue_more() reads queue->more_requests. So, I suppose
nvme_tcp_queue_request() is an example where queue->more_requests gets
read without the send_mutex? The call to queue_work_on() is conditional
on it.

Apologies in advance if I'm misunderstanding the code.

On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> > Question: queue->more_requests is updated under a lock. However, not
> > all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
> > values seem to show that the same queue was being processed on
> > different CPUs. Is this a race between the kworker and inline
> > submission?
>
> The entire send path in io_work is wrapped with the send_mutex. Which
> part is not protected with the lock?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 17:46                     ` Jonathan Nicklin
@ 2022-09-01 22:56                       ` Jonathan Nicklin
  2022-09-02 13:47                         ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-01 22:56 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Thu, Sep 1, 2022 at 1:46 PM Jonathan Nicklin
<jnicklin@blockbridge.com> wrote:
>
> nvme_tcp_queue_more() reads queue->mode_requests. So, I supposed
> nvme_tcp_queue_request() is an example where queue->more_requests gets
> read without the send_mutex? queue_work_on() is conditionalized on this.
>
> Apologies in advance if I'm misunderstanding the code.
>
> On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> > > Question: queue->more_requests is updated under a lock. However, not
> > > all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
> > > values seem to show that the same queue was being processed on
> > > different CPUs. Is this a race between the kworker and inline
> > > submission?
> >
> > The entire send path in io_work is wrapped with the send_mutex. Which
> > part is not protected with the lock?

Okay, maybe some progress?! What appears to be happening is that
the request marked "last" is being picked up via nvme_tcp_send_all() when
processing the previous command. Consider another trace:

                 CPU
tar-2263    [004] .....   680.671184: nvme_tcp_queue_rq: nvme1: qid=5
tag=5e op=0 data_len=16384
tar-2263    [004] .....   680.671185: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5e op=0 page_offset=1352 send_len=72
tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
tag=5d op=0 data_len=32768
tar-2263    [004] .....   680.671186: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5d op=0 page_offset=1424 send_len=72
tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
tag=5c op=0 data_len=32768
tar-2263    [004] .....   680.671187: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5c op=0 page_offset=1496 send_len=72
tar-2263    [004] .N...   680.671188: nvme_tcp_queue_rq: nvme1: qid=5
tag=5b op=0 data_len=32768
tar-2263    [004] .N...   680.671189: nvme_tcp_cmd_pdu: nvme1: qid=5
tag=5b op=0 page_offset=1568 send_len=72
kworker/4:0H-38 [004] .....   680.671198: nvme_tcp_queue_rq: nvme1:
qid=5 tag=5a op=0 data_len=16384
tar-2263        [000] .....   680.671199: nvme_tcp_cmd_pdu: nvme1:
qid=5 tag=5a op=0 page_offset=1640 send_len=72

I've added some additional state to the request, printed out on
timeout. First, queue->more_requests now records the tag of the command
that asserted it. Second, I recorded the number of commands issued via
nvme_tcp_send_all(). As you can see, when tag 0x5b was issued, it ended
up sending 0x5a as well. more_requests was still asserted by 0x5b when
nvme_tcp_try_send_cmd_pdu() processed 0x5a, so we never actually pushed
it out onto the wire.

[  709.457051] nvme nvme1: state: tag 0x5e io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5e req_sent 1
[  709.414327] nvme nvme1: state: tag 0x5d io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5d req_sent 1
[  709.371665] nvme nvme1: state: tag 0x5c io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5c req_sent 1
[  709.312298] nvme nvme1: state: tag 0x5b io_cpu 4 smp_id 4 smp_id2 0 sync 1
 empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5b req_sent 2 !!!
[  709.283534] nvme nvme1: state: tag 0x5a io_cpu 4 smp_id 4 smp_id2 4 sync 1
 empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
q_send_list 0
 q_req_list 0 q_more_requests 0x5b !!! req_sent 0
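
That is consistent with nvme_tcp_send_all() draining everything that is
queued, including requests queued by other contexts in the meantime
(roughly, for reference):

static void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
{
	int ret;

	/* keep sending until the send_list and req_list are both drained */
	do {
		ret = nvme_tcp_try_send(queue);
	} while (ret > 0);
}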


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-01 22:56                       ` Jonathan Nicklin
@ 2022-09-02 13:47                         ` Jonathan Nicklin
  2022-09-05  8:49                           ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-02 13:47 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Thu, Sep 1, 2022 at 6:56 PM Jonathan Nicklin
<jnicklin@blockbridge.com> wrote:
>
> On Thu, Sep 1, 2022 at 1:46 PM Jonathan Nicklin
> <jnicklin@blockbridge.com> wrote:
> >
> > nvme_tcp_queue_more() reads queue->mode_requests. So, I supposed
> > nvme_tcp_queue_request() is an example where queue->more_requests gets
> > read without the send_mutex? queue_work_on() is conditionalized on this.
> >
> > Apologies in advance if I'm misunderstanding the code.
> >
> > On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> > >
> > >
> > > > Question: queue->more_requests is updated under a lock. However, not
> > > > all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
> > > > values seem to show that the same queue was being processed on
> > > > different CPUs. Is this a race between the kworker and inline
> > > > submission?
> > >
> > > The entire send path in io_work is wrapped with the send_mutex. Which
> > > part is not protected with the lock?
>
> Okay, maybe some progress?! What appears to be happening is that
> the request marked "last" is being picked up via nvme_tcp_send_all() when
> processing the previous command. Consider another trace:
>
>                  CPU
> tar-2263    [004] .....   680.671184: nvme_tcp_queue_rq: nvme1: qid=5
> tag=5e op=0 data_len=16384
> tar-2263    [004] .....   680.671185: nvme_tcp_cmd_pdu: nvme1: qid=5
> tag=5e op=0 page_offset=1352 send_len=72
> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
> tag=5d op=0 data_len=32768
> tar-2263    [004] .....   680.671186: nvme_tcp_cmd_pdu: nvme1: qid=5
> tag=5d op=0 page_offset=1424 send_len=72
> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
> tag=5c op=0 data_len=32768
> tar-2263    [004] .....   680.671187: nvme_tcp_cmd_pdu: nvme1: qid=5
> tag=5c op=0 page_offset=1496 send_len=72
> tar-2263    [004] .N...   680.671188: nvme_tcp_queue_rq: nvme1: qid=5
> tag=5b op=0 data_len=32768
> tar-2263    [004] .N...   680.671189: nvme_tcp_cmd_pdu: nvme1: qid=5
> tag=5b op=0 page_offset=1568 send_len=72
> kworker/4:0H-38 [004] .....   680.671198: nvme_tcp_queue_rq: nvme1:
> qid=5 tag=5a op=0 data_len=16384
> tar-2263        [000] .....   680.671199: nvme_tcp_cmd_pdu: nvme1:
> qid=5 tag=5a op=0 page_offset=1640 send_len=72
>
> I've added some additional state to the request to print out on
> timeout. First, queue->more_requests now tracks the command tag
> that asserted more_requests. Second, I recorded the number of commands
> issued via nvme_tcp_send_all(). As you can see, when tag 0x5b was
> issued, it ended up sending 0x5a as well. More_requests was asserted
> by 0x5b when nvme_tcp_try_send_cmd_pdu() processed 0x5a, so we never
> actually pushed it out onto the wire.
>
> [  709.457051] nvme nvme1: state: tag 0x5e io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> q_send_list 0
>  q_req_list 0 q_more_requests 0x5e req_sent 1
> [  709.414327] nvme nvme1: state: tag 0x5d io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> q_send_list 0
>  q_req_list 0 q_more_requests 0x5d req_sent 1
> [  709.371665] nvme nvme1: state: tag 0x5c io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> q_send_list 0
>  q_req_list 0 q_more_requests 0x5c req_sent 1
> [  709.312298] nvme nvme1: state: tag 0x5b io_cpu 4 smp_id 4 smp_id2 0 sync 1
>  empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> q_send_list 0
>  q_req_list 0 q_more_requests 0x5b req_sent 2 !!!
> [  709.283534] nvme nvme1: state: tag 0x5a io_cpu 4 smp_id 4 smp_id2 4 sync 1
>  empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
> q_send_list 0
>  q_req_list 0 q_more_requests 0x5b !!! req_sent 0

I'm not quite sure of all of the mechanics of command submission... but it
looks like, if we are preempted in nvme_tcp_queue_request(), subsequent commands
pile up on the req_list. When we are rescheduled, we drain the req_list with
more_requests still set, which prevents us from pushing the bits onto the wire.

Note the "N" (need-resched) flag in the traces, and the fact that the
starting CPU "smp_id" differs from the ending CPU "smp_id2" for
the associated tag.
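
To make the suspected interleaving concrete, here is the direct-send path
in nvme_tcp_queue_request() as it stands in these kernels (the code matches
the driver; the annotations are my interpretation of the 0x5b/0x5a case,
not proof):
--
	if (queue->io_cpu == raw_smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		queue->more_requests = !last;	/* 0x5b: last=0, so this stays true */
		nvme_tcp_send_all(queue);	/* preempted in here; 0x5a is queued to
						 * req_list and drained by this same call,
						 * but nvme_tcp_queue_more() still sees
						 * more_requests, so 0x5a's PDU is sent with
						 * MSG_MORE|MSG_SENDPAGE_NOTLAST (together
						 * with MSG_DONTWAIT this is the send_flags
						 * 0x28040 in the dump) and is left sitting
						 * in the socket */
		queue->more_requests = false;
		mutex_unlock(&queue->send_mutex);
	}
--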


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-02 13:47                         ` Jonathan Nicklin
@ 2022-09-05  8:49                           ` Sagi Grimberg
  2022-09-05 14:54                             ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2022-09-05  8:49 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-nvme



On 9/2/22 16:47, Jonathan Nicklin wrote:
> On Thu, Sep 1, 2022 at 6:56 PM Jonathan Nicklin
> <jnicklin@blockbridge.com> wrote:
>>
>> On Thu, Sep 1, 2022 at 1:46 PM Jonathan Nicklin
>> <jnicklin@blockbridge.com> wrote:
>>>
>>> nvme_tcp_queue_more() reads queue->more_requests. So, I suppose
>>> nvme_tcp_queue_request() is an example where queue->more_requests gets
>>> read without the send_mutex? queue_work_on() is conditionalized on this.
>>>
>>> Apologies in advance if I'm misunderstanding the code.
>>>
>>> On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>>>>
>>>>
>>>>> Question: queue->more_requests is updated under a lock. However, not
>>>>> all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
>>>>> values seem to show that the same queue was being processed on
>>>>> different CPUs. Is this a race between the kworker and inline
>>>>> submission?
>>>>
>>>> The entire send path in io_work is wrapped with the send_mutex. Which
>>>> part is not protected with the lock?
>>
>> Okay, maybe some progress?! What appears to be happening is that
>> the request marked "last" is being picked up via nvme_tcp_send_all() when
>> processing the previous command. Consider another trace:
>>
>>                   CPU
>> tar-2263    [004] .....   680.671184: nvme_tcp_queue_rq: nvme1: qid=5
>> tag=5e op=0 data_len=16384
>> tar-2263    [004] .....   680.671185: nvme_tcp_cmd_pdu: nvme1: qid=5
>> tag=5e op=0 page_offset=1352 send_len=72
>> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
>> tag=5d op=0 data_len=32768
>> tar-2263    [004] .....   680.671186: nvme_tcp_cmd_pdu: nvme1: qid=5
>> tag=5d op=0 page_offset=1424 send_len=72
>> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
>> tag=5c op=0 data_len=32768
>> tar-2263    [004] .....   680.671187: nvme_tcp_cmd_pdu: nvme1: qid=5
>> tag=5c op=0 page_offset=1496 send_len=72
>> tar-2263    [004] .N...   680.671188: nvme_tcp_queue_rq: nvme1: qid=5
>> tag=5b op=0 data_len=32768
>> tar-2263    [004] .N...   680.671189: nvme_tcp_cmd_pdu: nvme1: qid=5
>> tag=5b op=0 page_offset=1568 send_len=72
>> kworker/4:0H-38 [004] .....   680.671198: nvme_tcp_queue_rq: nvme1:
>> qid=5 tag=5a op=0 data_len=16384
>> tar-2263        [000] .....   680.671199: nvme_tcp_cmd_pdu: nvme1:
>> qid=5 tag=5a op=0 page_offset=1640 send_len=72
>>
>> I've added some additional state to the request to print out on
>> timeout. First, queue->more_requests now tracks the command tag
>> that asserted more_requests. Second, I recorded the number of commands
>> issued via nvme_tcp_send_all(). As you can see, when tag 0x5b was
>> issued, it ended up sending 0x5a as well. More_requests was asserted
>> by 0x5b when nvme_tcp_try_send_cmd_pdu() processed 0x5a, so we never
>> actually pushed it out onto the wire.
>>
>> [  709.457051] nvme nvme1: state: tag 0x5e io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>> q_send_list 0

What is the difference between smp_id and smp_id2? Where is each
taken?

>>   q_req_list 0 q_more_requests 0x5e req_sent 1
>> [  709.414327] nvme nvme1: state: tag 0x5d io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>> q_send_list 0
>>   q_req_list 0 q_more_requests 0x5d req_sent 1
>> [  709.371665] nvme nvme1: state: tag 0x5c io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>> q_send_list 0
>>   q_req_list 0 q_more_requests 0x5c req_sent 1
>> [  709.312298] nvme nvme1: state: tag 0x5b io_cpu 4 smp_id 4 smp_id2 0 sync 1
>>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>> q_send_list 0
>>   q_req_list 0 q_more_requests 0x5b req_sent 2 !!!
>> [  709.283534] nvme nvme1: state: tag 0x5a io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>   empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
>> q_send_list 0
>>   q_req_list 0 q_more_requests 0x5b !!! req_sent 0
> 
> I'm not quite sure of all of the mechanics of command submission... but it
> looks like if we are preempted in nvme_tcp_queue_request(), subsequent commands
> pile up on the req_list. When rescheduled, we drain the req_list with
> more_requests set, which prevents us from pushing the bits onto the wire.
> 
> Take notice of the "N" flag in the traces and the fact that the
> starting cpu "smp_id" is different from the ending cpu "smp_id2" for
> the associated tag.

According to the trace, 0x5a came after 0x5b, and 0x5a had last=1, which
means it should have cleared NOLAST and pushed everything down the wire;
however, NOLAST was still set.

What happens here is that 0x5b sets queue->more_requests=true for the
lifetime of its execution, and that is wrong.

To prove the theory, can you try this patch?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3e3ebde4eff5..644def671df4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -121,7 +121,6 @@ struct nvme_tcp_queue {
         struct mutex            send_mutex;
         struct llist_head       req_list;
         struct list_head        send_list;
-       bool                    more_requests;

         /* recv state */
         void                    *pdu;
@@ -320,7 +319,7 @@ static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
  static inline bool nvme_tcp_queue_more(struct nvme_tcp_queue *queue)
  {
         return !list_empty(&queue->send_list) ||
-               !llist_empty(&queue->req_list) || queue->more_requests;
+               !llist_empty(&queue->req_list);
  }

  static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
@@ -339,9 +338,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
          */
         if (queue->io_cpu == raw_smp_processor_id() &&
             sync && empty && mutex_trylock(&queue->send_mutex)) {
-               queue->more_requests = !last;
                 nvme_tcp_send_all(queue);
-               queue->more_requests = false;
                 mutex_unlock(&queue->send_mutex);
         }
--

The downside here is that commands that are queued in a batch will no
longer have NOLAST set, which means that they will be sent directly
to the wire...

I'll need to think about the correct way to optimize for this case;
however, there is an argument that they might as well just be sent
asap (i.e. give up batching).

Alternatively, perhaps remember this for the lifetime of the command;
however, this needs careful attention because network sends are also
driven by R2T PDUs sent by the controller...
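
Just to illustrate the direction (not a proposal): the per-command variant
could look roughly like the sketch below, where req->more is a hypothetical
field recorded at queue time; it ignores the inline-data case and the
R2T-driven sends mentioned above, so treat it strictly as a sketch:
--
/* sketch only: derive cmd PDU flags from a hypothetical per-request hint */
static int nvme_tcp_cmd_pdu_flags(struct nvme_tcp_queue *queue,
				  struct nvme_tcp_request *req)
{
	int flags = MSG_DONTWAIT;

	if (req->more ||			/* hypothetical: set to !last at queue time */
	    !llist_empty(&queue->req_list) ||
	    !list_empty(&queue->send_list))
		flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
	else
		flags |= MSG_EOR;

	return flags;
}
--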


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-05  8:49                           ` Sagi Grimberg
@ 2022-09-05 14:54                             ` Jonathan Nicklin
  2022-09-05 15:06                               ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-05 14:54 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

On Mon, Sep 5, 2022 at 4:50 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>
> On 9/2/22 16:47, Jonathan Nicklin wrote:
> > On Thu, Sep 1, 2022 at 6:56 PM Jonathan Nicklin
> > <jnicklin@blockbridge.com> wrote:
> >>
> >> On Thu, Sep 1, 2022 at 1:46 PM Jonathan Nicklin
> >> <jnicklin@blockbridge.com> wrote:
> >>>
> >>> nvme_tcp_queue_more() reads queue->more_requests. So, I suppose
> >>> nvme_tcp_queue_request() is an example where queue->more_requests gets
> >>> read without the send_mutex? queue_work_on() is conditionalized on this.
> >>>
> >>> Apologies in advance if I'm misunderstanding the code.
> >>>
> >>> On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >>>>
> >>>>
> >>>>> Question: queue->more_requests is updated under a lock. However, not
> >>>>> all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
> >>>>> values seem to show that the same queue was being processed on
> >>>>> different CPUs. Is this a race between the kworker and inline
> >>>>> submission?
> >>>>
> >>>> The entire send path in io_work is wrapped with the send_mutex. Which
> >>>> part is not protected with the lock?
> >>
> >> Okay, maybe some progress?! What appears to be happening is that
> >> the request marked "last" is being picked up via nvme_tcp_send_all() when
> >> processing the previous command. Consider another trace:
> >>
> >>                   CPU
> >> tar-2263    [004] .....   680.671184: nvme_tcp_queue_rq: nvme1: qid=5
> >> tag=5e op=0 data_len=16384
> >> tar-2263    [004] .....   680.671185: nvme_tcp_cmd_pdu: nvme1: qid=5
> >> tag=5e op=0 page_offset=1352 send_len=72
> >> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
> >> tag=5d op=0 data_len=32768
> >> tar-2263    [004] .....   680.671186: nvme_tcp_cmd_pdu: nvme1: qid=5
> >> tag=5d op=0 page_offset=1424 send_len=72
> >> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
> >> tag=5c op=0 data_len=32768
> >> tar-2263    [004] .....   680.671187: nvme_tcp_cmd_pdu: nvme1: qid=5
> >> tag=5c op=0 page_offset=1496 send_len=72
> >> tar-2263    [004] .N...   680.671188: nvme_tcp_queue_rq: nvme1: qid=5
> >> tag=5b op=0 data_len=32768
> >> tar-2263    [004] .N...   680.671189: nvme_tcp_cmd_pdu: nvme1: qid=5
> >> tag=5b op=0 page_offset=1568 send_len=72
> >> kworker/4:0H-38 [004] .....   680.671198: nvme_tcp_queue_rq: nvme1:
> >> qid=5 tag=5a op=0 data_len=16384
> >> tar-2263        [000] .....   680.671199: nvme_tcp_cmd_pdu: nvme1:
> >> qid=5 tag=5a op=0 page_offset=1640 send_len=72
> >>
> >> I've added some additional state to the request to print out on
> >> timeout. First, queue->more_requests now tracks the command tag
> >> that asserted more_requests. Second, I recorded the number of commands
> >> issued via nvme_tcp_send_all(). As you can see, when tag 0x5b was
> >> issued, it ended up sending 0x5a as well. More_requests was asserted
> >> by 0x5b when nvme_tcp_try_send_cmd_pdu() processed 0x5a, so we never
> >> actually pushed it out onto the wire.
> >>
> >> [  709.457051] nvme nvme1: state: tag 0x5e io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >> q_send_list 0
>
> What is the difference between smp_id and smp_id2? where is each
> taken?

smp_id2 is taken immediately after the direct send case (after unlock). It just
confirms that we were preempted and rescheduled.

smp_id = raw_smp_processor_id();                /* CPU on entry */
if (queue->io_cpu == smp_id && sync && empty) {
        lock = mutex_trylock(&queue->send_mutex);
        if (lock) {
                nvme_tcp_send_all(queue, &sent);  /* instrumented to count sends */
                mutex_unlock(&queue->send_mutex);
        }
}
smp_id2 = raw_smp_processor_id();               /* CPU after the inline send attempt */

>
> >>   q_req_list 0 q_more_requests 0x5e req_sent 1
> >> [  709.414327] nvme nvme1: state: tag 0x5d io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >> q_send_list 0
> >>   q_req_list 0 q_more_requests 0x5d req_sent 1
> >> [  709.371665] nvme nvme1: state: tag 0x5c io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >> q_send_list 0
> >>   q_req_list 0 q_more_requests 0x5c req_sent 1
> >> [  709.312298] nvme nvme1: state: tag 0x5b io_cpu 4 smp_id 4 smp_id2 0 sync 1
> >>   empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >> q_send_list 0
> >>   q_req_list 0 q_more_requests 0x5b req_sent 2 !!!
> >> [  709.283534] nvme nvme1: state: tag 0x5a io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>   empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
> >> q_send_list 0
> >>   q_req_list 0 q_more_requests 0x5b !!! req_sent 0
> >
> > I'm not quite sure of all of the mechanics of command submission... but it
> > looks like if we are preempted in nvme_tcp_queue_request(), subsequent commands
> > pile up on the req_list. When rescheduled, we drain the req_list with
> > more_requests set, which prevents us from pushing the bits onto the wire.
> >
> > Take notice of the "N" flag in the traces and the fact that the
> > starting cpu "smp_id" is different from the ending cpu "smp_id2" for
> > the associated tag.
>
> According to the trace, 0x5a came after 0x5b, and 0x5a was last=1, which
> means it should have cleared NOLAST and push it all down the wire,
> however it had it set.
>
> What happens here is that 0x5b execution sets queue->more_requests=true
> for the lifetime of its execution and that is wrong.
>
> To prove the theory can you try this patch?
> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 3e3ebde4eff5..644def671df4 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -121,7 +121,6 @@ struct nvme_tcp_queue {
>          struct mutex            send_mutex;
>          struct llist_head       req_list;
>          struct list_head        send_list;
> -       bool                    more_requests;
>
>          /* recv state */
>          void                    *pdu;
> @@ -320,7 +319,7 @@ static inline void nvme_tcp_send_all(struct
> nvme_tcp_queue *queue)
>   static inline bool nvme_tcp_queue_more(struct nvme_tcp_queue *queue)
>   {
>          return !list_empty(&queue->send_list) ||
> -               !llist_empty(&queue->req_list) || queue->more_requests;
> +               !llist_empty(&queue->req_list);
>   }
>
>   static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
> @@ -339,9 +338,7 @@ static inline void nvme_tcp_queue_request(struct
> nvme_tcp_request *req,
>           */
>          if (queue->io_cpu == raw_smp_processor_id() &&
>              sync && empty && mutex_trylock(&queue->send_mutex)) {
> -               queue->more_requests = !last;
>                  nvme_tcp_send_all(queue);
> -               queue->more_requests = false;
>                  mutex_unlock(&queue->send_mutex);
>          }
> --
>
> The downside here, is that commands now that are queued in a batch,
> will not have NOLAST set, which means that they will be sent directly
> to the wire...
>
> I'll need to think about the correct way to optimize for this case,
> however there is an argument that they might as well just be sent
> asap (i.e. give up batching).
>
> Alternatively, perhaps remember this in the lifetime of the command
> however this needs careful attention because network sends are also
> driven from r2t sent by the controller...

I can confirm that ignoring queue->more_requests resolves the issue. I've run
about 300 iterations without a failure. Pre-fix, the problem reproduced almost
immediately (within 1-3 runs).


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-05 14:54                             ` Jonathan Nicklin
@ 2022-09-05 15:06                               ` Sagi Grimberg
  2022-09-06 13:36                                 ` Jonathan Nicklin
  0 siblings, 1 reply; 21+ messages in thread
From: Sagi Grimberg @ 2022-09-05 15:06 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-nvme



On 9/5/22 17:54, Jonathan Nicklin wrote:
> On Mon, Sep 5, 2022 at 4:50 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>>
>>
>>
>> On 9/2/22 16:47, Jonathan Nicklin wrote:
>>> On Thu, Sep 1, 2022 at 6:56 PM Jonathan Nicklin
>>> <jnicklin@blockbridge.com> wrote:
>>>>
>>>> On Thu, Sep 1, 2022 at 1:46 PM Jonathan Nicklin
>>>> <jnicklin@blockbridge.com> wrote:
>>>>>
>>>>> nvme_tcp_queue_more() reads queue->more_requests. So, I suppose
>>>>> nvme_tcp_queue_request() is an example where queue->more_requests gets
>>>>> read without the send_mutex? queue_work_on() is conditionalized on this.
>>>>>
>>>>> Apologies in advance if I'm misunderstanding the code.
>>>>>
>>>>> On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>>>>>>
>>>>>>
>>>>>>> Question: queue->more_requests is updated under a lock. However, not
>>>>>>> all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
>>>>>>> values seem to show that the same queue was being processed on
>>>>>>> different CPUs. Is this a race between the kworker and inline
>>>>>>> submission?
>>>>>>
>>>>>> The entire send path in io_work is wrapped with the send_mutex. Which
>>>>>> part is not protected with the lock?
>>>>
>>>> Okay, maybe some progress?! What appears to be happening is that
>>>> the request marked "last" is being picked up via nvme_tcp_send_all() when
>>>> processing the previous command. Consider another trace:
>>>>
>>>>                    CPU
>>>> tar-2263    [004] .....   680.671184: nvme_tcp_queue_rq: nvme1: qid=5
>>>> tag=5e op=0 data_len=16384
>>>> tar-2263    [004] .....   680.671185: nvme_tcp_cmd_pdu: nvme1: qid=5
>>>> tag=5e op=0 page_offset=1352 send_len=72
>>>> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
>>>> tag=5d op=0 data_len=32768
>>>> tar-2263    [004] .....   680.671186: nvme_tcp_cmd_pdu: nvme1: qid=5
>>>> tag=5d op=0 page_offset=1424 send_len=72
>>>> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
>>>> tag=5c op=0 data_len=32768
>>>> tar-2263    [004] .....   680.671187: nvme_tcp_cmd_pdu: nvme1: qid=5
>>>> tag=5c op=0 page_offset=1496 send_len=72
>>>> tar-2263    [004] .N...   680.671188: nvme_tcp_queue_rq: nvme1: qid=5
>>>> tag=5b op=0 data_len=32768
>>>> tar-2263    [004] .N...   680.671189: nvme_tcp_cmd_pdu: nvme1: qid=5
>>>> tag=5b op=0 page_offset=1568 send_len=72
>>>> kworker/4:0H-38 [004] .....   680.671198: nvme_tcp_queue_rq: nvme1:
>>>> qid=5 tag=5a op=0 data_len=16384
>>>> tar-2263        [000] .....   680.671199: nvme_tcp_cmd_pdu: nvme1:
>>>> qid=5 tag=5a op=0 page_offset=1640 send_len=72
>>>>
>>>> I've added some additional state to the request to print out on
>>>> timeout. First, queue->more_requests now tracks the command tag
>>>> that asserted more_requests. Second, I recorded the number of commands
>>>> issued via nvme_tcp_send_all(). As you can see, when tag 0x5b was
>>>> issued, it ended up sending 0x5a as well. More_requests was asserted
>>>> by 0x5b when nvme_tcp_try_send_cmd_pdu() processed 0x5a, so we never
>>>> actually pushed it out onto the wire.
>>>>
>>>> [  709.457051] nvme nvme1: state: tag 0x5e io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>>>> q_send_list 0
>>
>> What is the difference between smp_id and smp_id2? where is each
>> taken?
> 
> smp_id2 is taken immediately after the direct send case (after unlock). It just
> confirms that we were preempted and rescheduled.
> 
> smp_id = raw_smp_processor_id();
> if (queue->io_cpu == smp_id && sync && empty) {
>          lock = mutex_trylock(&queue->send_mutex);
>          if (lock) {
>              nvme_tcp_send_all(queue, &sent);
>              mutex_unlock(&queue->send_mutex);
>          }
> }
> smp_id2 = raw_smp_processor_id();

Got it.

> 
>>
>>>>    q_req_list 0 q_more_requests 0x5e req_sent 1
>>>> [  709.414327] nvme nvme1: state: tag 0x5d io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>>>> q_send_list 0
>>>>    q_req_list 0 q_more_requests 0x5d req_sent 1
>>>> [  709.371665] nvme nvme1: state: tag 0x5c io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>>>> q_send_list 0
>>>>    q_req_list 0 q_more_requests 0x5c req_sent 1
>>>> [  709.312298] nvme nvme1: state: tag 0x5b io_cpu 4 smp_id 4 smp_id2 0 sync 1
>>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
>>>> q_send_list 0
>>>>    q_req_list 0 q_more_requests 0x5b req_sent 2 !!!
>>>> [  709.283534] nvme nvme1: state: tag 0x5a io_cpu 4 smp_id 4 smp_id2 4 sync 1
>>>>    empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
>>>> q_send_list 0
>>>>    q_req_list 0 q_more_requests 0x5b !!! req_sent 0
>>>
>>> I'm not quite sure of all of the mechanics of command submission... but it
>>> looks like if we are preempted in nvme_tcp_queue_request(), subsequent commands
>>> pile up on the req_list. When rescheduled, we drain the req_list with
>>> more_requests set, which prevents us from pushing the bits onto the wire.
>>>
>>> Take notice of the "N" flag in the traces and the fact that the
>>> starting cpu "smp_id" is different from the ending cpu "smp_id2" for
>>> the associated tag.
>>
>> According to the trace, 0x5a came after 0x5b, and 0x5a was last=1, which
>> means it should have cleared NOLAST and push it all down the wire,
>> however it had it set.
>>
>> What happens here is that 0x5b execution sets queue->more_requests=true
>> for the lifetime of its execution and that is wrong.
>>
>> To prove the theory can you try this patch?
>> --
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 3e3ebde4eff5..644def671df4 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -121,7 +121,6 @@ struct nvme_tcp_queue {
>>           struct mutex            send_mutex;
>>           struct llist_head       req_list;
>>           struct list_head        send_list;
>> -       bool                    more_requests;
>>
>>           /* recv state */
>>           void                    *pdu;
>> @@ -320,7 +319,7 @@ static inline void nvme_tcp_send_all(struct
>> nvme_tcp_queue *queue)
>>    static inline bool nvme_tcp_queue_more(struct nvme_tcp_queue *queue)
>>    {
>>           return !list_empty(&queue->send_list) ||
>> -               !llist_empty(&queue->req_list) || queue->more_requests;
>> +               !llist_empty(&queue->req_list);
>>    }
>>
>>    static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>> @@ -339,9 +338,7 @@ static inline void nvme_tcp_queue_request(struct
>> nvme_tcp_request *req,
>>            */
>>           if (queue->io_cpu == raw_smp_processor_id() &&
>>               sync && empty && mutex_trylock(&queue->send_mutex)) {
>> -               queue->more_requests = !last;
>>                   nvme_tcp_send_all(queue);
>> -               queue->more_requests = false;
>>                   mutex_unlock(&queue->send_mutex);
>>           }
>> --
>>
>> The downside here, is that commands now that are queued in a batch,
>> will not have NOLAST set, which means that they will be sent directly
>> to the wire...
>>
>> I'll need to think about the correct way to optimize for this case,
>> however there is an argument that they might as well just be sent
>> asap (i.e. give up batching).
>>
>> Alternatively, perhaps remember this in the lifetime of the command
>> however this needs careful attention because network sends are also
>> driven from r2t sent by the controller...
> 
> I can confirm that ignoring queue->more_requests resolves the issue. I've run
> about 300 iterations without issue. Pre-fix, the problem reproduced almost
> immediately (1-3 runs).

Thanks a lot, Jonathan, for reporting this issue and helping to resolve
it. I appreciate the effort.

I'll send a patch; can you reply with your Tested-by tag?
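
That is, a reply with a line like the one below (using your address from
this thread) that I can fold into the patch:

  Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com>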


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-05 15:06                               ` Sagi Grimberg
@ 2022-09-06 13:36                                 ` Jonathan Nicklin
  2022-09-07  7:55                                   ` Sagi Grimberg
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Nicklin @ 2022-09-06 13:36 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme

Not a problem. Is it possible to get this fix backported to the last few
longterm kernels? It seems like anyone running nvme/tcp could benefit from
it.

On Mon, Sep 5, 2022 at 11:06 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>
> On 9/5/22 17:54, Jonathan Nicklin wrote:
> > On Mon, Sep 5, 2022 at 4:50 AM Sagi Grimberg <sagi@grimberg.me> wrote:
> >>
> >>
> >>
> >> On 9/2/22 16:47, Jonathan Nicklin wrote:
> >>> On Thu, Sep 1, 2022 at 6:56 PM Jonathan Nicklin
> >>> <jnicklin@blockbridge.com> wrote:
> >>>>
> >>>> On Thu, Sep 1, 2022 at 1:46 PM Jonathan Nicklin
> >>>> <jnicklin@blockbridge.com> wrote:
> >>>>>
> >>>>> nvme_tcp_queue_more() reads queue->more_requests. So, I suppose
> >>>>> nvme_tcp_queue_request() is an example where queue->more_requests gets
> >>>>> read without the send_mutex? queue_work_on() is conditionalized on this.
> >>>>>
> >>>>> Apologies in advance if I'm misunderstanding the code.
> >>>>>
> >>>>> On Thu, Sep 1, 2022 at 1:02 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Question: queue->more_requests is updated under a lock. However, not
> >>>>>>> all readers of queue->more_requests acquire the lock. The smp_id/smp_id2
> >>>>>>> values seem to show that the same queue was being processed on
> >>>>>>> different CPUs. Is this a race between the kworker and inline
> >>>>>>> submission?
> >>>>>>
> >>>>>> The entire send path in io_work is wrapped with the send_mutex. Which
> >>>>>> part is not protected with the lock?
> >>>>
> >>>> Okay, maybe some progress?! What appears to be happening is that
> >>>> the request marked "last" is being picked up via nvme_tcp_send_all() when
> >>>> processing the previous command. Consider another trace:
> >>>>
> >>>>                    CPU
> >>>> tar-2263    [004] .....   680.671184: nvme_tcp_queue_rq: nvme1: qid=5
> >>>> tag=5e op=0 data_len=16384
> >>>> tar-2263    [004] .....   680.671185: nvme_tcp_cmd_pdu: nvme1: qid=5
> >>>> tag=5e op=0 page_offset=1352 send_len=72
> >>>> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
> >>>> tag=5d op=0 data_len=32768
> >>>> tar-2263    [004] .....   680.671186: nvme_tcp_cmd_pdu: nvme1: qid=5
> >>>> tag=5d op=0 page_offset=1424 send_len=72
> >>>> tar-2263    [004] .....   680.671186: nvme_tcp_queue_rq: nvme1: qid=5
> >>>> tag=5c op=0 data_len=32768
> >>>> tar-2263    [004] .....   680.671187: nvme_tcp_cmd_pdu: nvme1: qid=5
> >>>> tag=5c op=0 page_offset=1496 send_len=72
> >>>> tar-2263    [004] .N...   680.671188: nvme_tcp_queue_rq: nvme1: qid=5
> >>>> tag=5b op=0 data_len=32768
> >>>> tar-2263    [004] .N...   680.671189: nvme_tcp_cmd_pdu: nvme1: qid=5
> >>>> tag=5b op=0 page_offset=1568 send_len=72
> >>>> kworker/4:0H-38 [004] .....   680.671198: nvme_tcp_queue_rq: nvme1:
> >>>> qid=5 tag=5a op=0 data_len=16384
> >>>> tar-2263        [000] .....   680.671199: nvme_tcp_cmd_pdu: nvme1:
> >>>> qid=5 tag=5a op=0 page_offset=1640 send_len=72
> >>>>
> >>>> I've added some additional state to the request to print out on
> >>>> timeout. First, queue->more_requests now tracks the command tag
> >>>> that asserted more_requests. Second, I recorded the number of commands
> >>>> issued via nvme_tcp_send_all(). As you can see, when tag 0x5b was
> >>>> issued, it ended up sending 0x5a as well. More_requests was asserted
> >>>> by 0x5b when nvme_tcp_try_send_cmd_pdu() processed 0x5a, so we never
> >>>> actually pushed it out onto the wire.
> >>>>
> >>>> [  709.457051] nvme nvme1: state: tag 0x5e io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >>>> q_send_list 0
> >>
> >> What is the difference between smp_id and smp_id2? where is each
> >> taken?
> >
> > smp_id2 is taken immediately after the direct send case (after unlock). It just
> > confirms that we were preempted and rescheduled.
> >
> > smp_id = raw_smp_processor_id();
> > if (queue->io_cpu == smp_id && sync && empty) {
> >          lock = mutex_trylock(&queue->send_mutex);
> >          if (lock) {
> >              nvme_tcp_send_all(queue, &sent);
> >              mutex_unlock(&queue->send_mutex);
> >          }
> > }
> > smp_id2 = raw_smp_processor_id();
>
> Got it.
>
> >
> >>
> >>>>    q_req_list 0 q_more_requests 0x5e req_sent 1
> >>>> [  709.414327] nvme nvme1: state: tag 0x5d io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >>>> q_send_list 0
> >>>>    q_req_list 0 q_more_requests 0x5d req_sent 1
> >>>> [  709.371665] nvme nvme1: state: tag 0x5c io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >>>> q_send_list 0
> >>>>    q_req_list 0 q_more_requests 0x5c req_sent 1
> >>>> [  709.312298] nvme nvme1: state: tag 0x5b io_cpu 4 smp_id 4 smp_id2 0 sync 1
> >>>>    empty 1 lock 1 last 0 more 0 send 1 send_ret 72 send_flags 0x28040
> >>>> q_send_list 0
> >>>>    q_req_list 0 q_more_requests 0x5b req_sent 2 !!!
> >>>> [  709.283534] nvme nvme1: state: tag 0x5a io_cpu 4 smp_id 4 smp_id2 4 sync 1
> >>>>    empty 0 lock 0 last 1 more 1 send 1 send_ret 72 send_flags 0x28040
> >>>> q_send_list 0
> >>>>    q_req_list 0 q_more_requests 0x5b !!! req_sent 0
> >>>
> >>> I'm not quite sure of all of the mechanics of command submission... but it
> >>> looks like if we are preempted in nvme_tcp_queue_request(), subsequent commands
> >>> pile up on the req_list. When rescheduled, we drain the req_list with
> >>> more_requests set, which prevents us from pushing the bits onto the wire.
> >>>
> >>> Take notice of the "N" flag in the traces and the fact that the
> >>> starting cpu "smp_id" is different from the ending cpu "smp_id2" for
> >>> the associated tag.
> >>
> >> According to the trace, 0x5a came after 0x5b, and 0x5a was last=1, which
> >> means it should have cleared NOLAST and push it all down the wire,
> >> however it had it set.
> >>
> >> What happens here is that 0x5b execution sets queue->more_requests=true
> >> for the lifetime of its execution and that is wrong.
> >>
> >> To prove the theory can you try this patch?
> >> --
> >> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> >> index 3e3ebde4eff5..644def671df4 100644
> >> --- a/drivers/nvme/host/tcp.c
> >> +++ b/drivers/nvme/host/tcp.c
> >> @@ -121,7 +121,6 @@ struct nvme_tcp_queue {
> >>           struct mutex            send_mutex;
> >>           struct llist_head       req_list;
> >>           struct list_head        send_list;
> >> -       bool                    more_requests;
> >>
> >>           /* recv state */
> >>           void                    *pdu;
> >> @@ -320,7 +319,7 @@ static inline void nvme_tcp_send_all(struct
> >> nvme_tcp_queue *queue)
> >>    static inline bool nvme_tcp_queue_more(struct nvme_tcp_queue *queue)
> >>    {
> >>           return !list_empty(&queue->send_list) ||
> >> -               !llist_empty(&queue->req_list) || queue->more_requests;
> >> +               !llist_empty(&queue->req_list);
> >>    }
> >>
> >>    static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
> >> @@ -339,9 +338,7 @@ static inline void nvme_tcp_queue_request(struct
> >> nvme_tcp_request *req,
> >>            */
> >>           if (queue->io_cpu == raw_smp_processor_id() &&
> >>               sync && empty && mutex_trylock(&queue->send_mutex)) {
> >> -               queue->more_requests = !last;
> >>                   nvme_tcp_send_all(queue);
> >> -               queue->more_requests = false;
> >>                   mutex_unlock(&queue->send_mutex);
> >>           }
> >> --
> >>
> >> The downside here, is that commands now that are queued in a batch,
> >> will not have NOLAST set, which means that they will be sent directly
> >> to the wire...
> >>
> >> I'll need to think about the correct way to optimize for this case,
> >> however there is an argument that they might as well just be sent
> >> asap (i.e. give up batching).
> >>
> >> Alternatively, perhaps remember this in the lifetime of the command
> >> however this needs careful attention because network sends are also
> >> driven from r2t sent by the controller...
> >
> > I can confirm that ignoring queue->more_requests resolves the issue. I've run
> > about 300 iterations without issue. Pre-fix, the problem reproduced almost
> > immediately (1-3 runs).
>
> Thanks a lot Jonathan for reporting this issue and helping to resolve
> it. Appreciate the effort.
>
> I'll send a patch, can reply with your Tested-by tag?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: nvme-tcp: i/o errors and stalled host from failure to send command pdus
  2022-09-06 13:36                                 ` Jonathan Nicklin
@ 2022-09-07  7:55                                   ` Sagi Grimberg
  0 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2022-09-07  7:55 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-nvme


> Not a problem. Is it possible to get this fix committed back into the last few
> longterm kernels? It seems like anyone running nvme/tcp could benefit from
> this fix.

The patch has a Fixes tag, and the title clarifies that it fixes a
regression, so it should be picked up for stable kernels...
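
For reference, the stable flow keys off a trailer of this form in the patch
(the hash and subject here are placeholders, not the actual commit); an
explicit "Cc: stable@vger.kernel.org" tag can also be added to request the
backport directly:

  Fixes: 123456789abc ("nvme-tcp: commit that introduced the regression")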


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-09-07  8:07 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2022-08-30 23:10 nvme-tcp: i/o errors and stalled host from failure to send command pdus Jonathan Nicklin
2022-08-31  9:02 ` Sagi Grimberg
2022-08-31 16:04   ` Jonathan Nicklin
2022-08-31 18:50     ` Sagi Grimberg
2022-08-31 19:06       ` Jonathan Nicklin
2022-08-31 20:04         ` Jonathan Nicklin
2022-08-31 23:42           ` Sagi Grimberg
2022-09-01 12:37             ` Jonathan Nicklin
2022-09-01 13:22               ` Jonathan Nicklin
2022-09-01 15:37                 ` Jonathan Nicklin
2022-09-01 17:02                   ` Sagi Grimberg
2022-09-01 17:46                     ` Jonathan Nicklin
2022-09-01 22:56                       ` Jonathan Nicklin
2022-09-02 13:47                         ` Jonathan Nicklin
2022-09-05  8:49                           ` Sagi Grimberg
2022-09-05 14:54                             ` Jonathan Nicklin
2022-09-05 15:06                               ` Sagi Grimberg
2022-09-06 13:36                                 ` Jonathan Nicklin
2022-09-07  7:55                                   ` Sagi Grimberg
2022-09-01 17:00               ` Sagi Grimberg
2022-09-01 17:39                 ` Jonathan Nicklin
