linux-nvme.lists.infradead.org archive mirror
* [bug report] blktests nvme/tcp nvme/014 failed due to Read/Flush timeout
@ 2024-04-22  4:55 Yi Zhang
  2024-04-22  8:27 ` Daniel Wagner
  0 siblings, 1 reply; 2+ messages in thread
From: Yi Zhang @ 2024-04-22  4:55 UTC
  To: open list:NVM EXPRESS DRIVER
  Cc: Chaitanya Kulkarni, Ming Lei, Shinichiro Kawasaki, Sagi Grimberg

Hello

I found that nvme/014 failed[1] on one of my servers. It seems to be due
to Read/Flush timeouts during the test[2]. When I add a sync after the dd
operation[3], the test passes and there are no timeouts in dmesg. Could
anyone help check whether this is a test issue or an nvme/tcp stack
issue? Thanks.

[1]
# uname -r
6.9.0-rc4+
# nvme_trtype=tcp ./check nvme/014
nvme/014 (flush a NVMeOF block device-backed ns)             [failed]
    runtime  33.432s  ...  41.334s
    --- tests/nvme/014.out 2024-04-19 00:02:03.596691663 -0400
    +++ /root/blktests/results/nodev/nvme/014.out.bad 2024-04-22 00:10:10.042478026 -0400
    @@ -1,4 +1,4 @@
     Running nvme/014
    -NVMe Flush: success
    +NVMe status: Command Aborted By Host: The command was aborted as a result of host action(0x371)
     disconnected 1 controller(s)
     Test complete
[2]
# dmesg
[  724.690961] run blktests nvme/014 at 2024-04-22 00:09:28
[  724.706957] loop0: detected capacity change from 0 to 2097152
[  724.715174] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  724.724103] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  724.735368] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  724.749633] nvme nvme1: creating 96 I/O queues.
[  724.757813] nvme nvme1: mapped 96/0/0 default/read/poll queues.
[  724.789035] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  758.518247] nvme nvme1: I/O tag 1 (0001) type 4 opcode 0x2 (I/O Cmd) QID 86 timeout
[  758.525908] nvme nvme1: starting error recovery
[  758.530447] nvme nvme1: I/O tag 81 (0051) type 4 opcode 0x0 (I/O Cmd) QID 87 timeout
[  758.538914] nvme1n1: I/O Cmd(0x2) @ LBA 2097024, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[  758.547101] I/O error, dev nvme1n1, sector 2097024 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  758.562940] nvme nvme1: Reconnecting in 10 seconds...
[  758.568093] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  758.572165] nvme1n1: I/O Cmd(0x0) @ LBA 2097024, 8 blocks, I/O Error (sct 0x3 / sc 0x70)
[  758.582367] Buffer I/O error on dev nvme1n1, logical block 262128, async page read
[  758.618831] nvme nvme1: Property Set error: 880, offset 0x14
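
For readers matching the two outputs: the 0x371 printed by nvme-cli in
[1] and the "sct 0x3 / sc 0x71" pair in dmesg describe the same status
value. A minimal sketch of the split, assuming the usual NVMe layout
where the low 8 bits are the Status Code and the next 3 bits are the
Status Code Type. decode_nvme_status is a hypothetical helper name, not
part of nvme-cli or blktests:

```shell
# Hypothetical helper (illustration only): split an NVMe status value
# into Status Code Type (bits 10:8) and Status Code (bits 7:0).
decode_nvme_status() {
    local status=$(( $1 ))
    local sct=$(( (status >> 8) & 0x7 ))
    local sc=$(( status & 0xff ))
    printf 'sct=0x%x sc=0x%x\n' "${sct}" "${sc}"
}

decode_nvme_status 0x371
```

By the same split, the sc 0x70 logged for the Flush (opcode 0x0)
corresponds to a status value of 0x370.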

[3]
diff --git a/tests/nvme/014 b/tests/nvme/014
index 839b91f..3f6a68f 100755
--- a/tests/nvme/014
+++ b/tests/nvme/014
@@ -37,7 +37,7 @@ test() {

        dd if=/dev/urandom of="/dev/${ns}" \
                count="${count}" bs="${bs}" status=none
-
+       sync
        nvme flush "/dev/${ns}"

        _nvme_disconnect_subsys

-- 
Best Regards,
  Yi Zhang




* Re: [bug report] blktests nvme/tcp nvme/014 failed due to Read/Flush timeout
  2024-04-22  4:55 [bug report] blktests nvme/tcp nvme/014 failed due to Read/Flush timeout Yi Zhang
@ 2024-04-22  8:27 ` Daniel Wagner
  0 siblings, 0 replies; 2+ messages in thread
From: Daniel Wagner @ 2024-04-22  8:27 UTC
  To: Yi Zhang
  Cc: open list:NVM EXPRESS DRIVER, Chaitanya Kulkarni, Ming Lei,
	Shinichiro Kawasaki, Sagi Grimberg

On Mon, Apr 22, 2024 at 12:55:33PM +0800, Yi Zhang wrote:
> I found nvme/014 failed[1] on one of my servers, seems it was due to
> Read/Flush timeout during the test[2], I tried to add sync operation
> after the dd operation[3] and the failure can be fixed now and no
> timeout from dmesg, could anyone help check if it's one test issue or
> nvme/tcp stack issue, thanks.

From your log I'd say the sync command is just timing out too early. The
write operations are just too slow. I'm not really sure what the test is
supposed to test here. Should it test whether the flush works, or whether
the flush is flushing any pending I/O? In the first case, the sync would
be the right fix. In the second case, I suppose we would need to consider
how long it takes to flush pending I/Os and extend the timeout accordingly?

Thoughts?
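
For the second case, one possible knob is the block layer's per-queue
io_timeout attribute (in milliseconds) under /sys/block/<dev>/queue/.
The sketch below is only an illustration under that assumption:
set_io_timeout_ms is a hypothetical helper, not part of blktests, and
the thread does not confirm that raising this value is the right fix or
that it covers the Flush issued by nvme-cli.

```shell
# Hypothetical helper (illustration only, not part of blktests).
#   $1 = block device name, e.g. nvme1n1
#   $2 = new timeout in milliseconds
#   $3 = sysfs root (defaults to /sys; overridable so it can be dry-run)
# Prints the previous value so the caller can restore it afterwards.
set_io_timeout_ms() {
    local dev="$1" ms="$2" sysfs="${3:-/sys}"
    local attr="${sysfs}/block/${dev}/queue/io_timeout"

    if [ ! -w "${attr}" ]; then
        echo "cannot write ${attr}" >&2
        return 1
    fi

    cat "${attr}"            # previous timeout, for later restore
    echo "${ms}" > "${attr}" # apply the new timeout
}
```

In a test this might be called as old="$(set_io_timeout_ms "${ns}" 120000)"
before the nvme flush, and the saved value written back afterwards. The
120000 is an arbitrary example value, not a recommendation.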

