From: Max Gurtovoy <mgurtovoy@nvidia.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, <linux-nvme@lists.infradead.org>,
	<linux-block@vger.kernel.org>, <axboe@kernel.dk>,
	<sagi@grimberg.me>
Subject: Re: [PATCHv2 1/3] block: introduce rq_list_for_each_safe macro
Date: Thu, 6 Jan 2022 13:54:28 +0200
Message-ID: <02a943c0-2919-a4d4-6044-7a6349b9aaf5@nvidia.com>
In-Reply-To: <20220105172625.GA3181467@dhcp-10-100-145-180.wdc.com>


On 1/5/2022 7:26 PM, Keith Busch wrote:
> On Tue, Jan 04, 2022 at 02:15:58PM +0200, Max Gurtovoy wrote:
>> This patch worked for me with 2 namespaces for NVMe PCI.
>>
>> I'll check it later with my RDMA queue_rqs patches as well. There we also
>> have tagset sharing with the connect_q (and not only with multiple
>> namespaces).
>>
>> But the connect_q uses reserved tags only (for the connect commands).
>>
>> I saw some strange things that I couldn't understand:
>>
>> 1. running randread fio with libaio ioengine didn't call nvme_queue_rqs -
>> expected
>>
>> 2. running randwrite fio with libaio ioengine did call nvme_queue_rqs - Not
>> expected!!
>>
>> 3. running randread fio with io_uring ioengine (and --iodepth_batch=32)
>> didn't call nvme_queue_rqs - Not expected!!
>>
>> 4. running randwrite fio with io_uring ioengine (and --iodepth_batch=32) did
>> call nvme_queue_rqs - expected
>>
>> 5. running randread fio with io_uring ioengine (and --iodepth_batch=32
>> --runtime=30) didn't finish after 30 seconds and was stuck for 300 seconds
>> (the fio jobs required "kill -9 fio" to remove the refcounts from nvme_core)
>> - Not expected!!
>>
>> debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
>> seconds, it appears to be stuck. Doing forceful exit of this job.
>>
>> 6. running randwrite fio with io_uring ioengine (and --iodepth_batch=32
>> --runtime=30) didn't finish after 30 seconds and was stuck for 300 seconds
>> (the fio jobs required "kill -9 fio" to remove the refcounts from nvme_core)
>> - Not expected!!
>>
>> debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
>> seconds, it appears to be stuck. Doing forceful exit of this job.
>>
>>
>> Any idea what could cause these unexpected scenarios? At least unexpected
>> for me :)
> Not sure about all the scenarios. I believe it should call queue_rqs
> anytime we finish a plugged list of requests as long as the requests
> come from the same request_queue, and it's not being flushed from
> io_schedule().

I also see that we get batch > 1 only at the start of the fio run. After
X IO operations the batch size stays at 1 until the end of the run.
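
For reference, a tiny userspace model of the condition described above, as
I understand it (made-up names, not the kernel code; I'm also assuming an
attached I/O scheduler forces the non-batched path):

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a plugged request list's state. */
struct plug_model {
	bool multiple_queues;	/* requests target more than one request_queue */
	bool has_elevator;	/* an I/O scheduler is attached (assumption) */
};

/* True if the batched ->queue_rqs() path would be taken on plug flush. */
static bool would_use_queue_rqs(const struct plug_model *plug, bool from_schedule)
{
	return !plug->multiple_queues && !plug->has_elevator && !from_schedule;
}

int main(void)
{
	struct plug_model one_q = { .multiple_queues = false, .has_elevator = false };
	struct plug_model two_q = { .multiple_queues = true,  .has_elevator = false };

	printf("one queue, normal flush:  %d\n", would_use_queue_rqs(&one_q, false));
	printf("one queue, io_schedule(): %d\n", would_use_queue_rqs(&one_q, true));
	printf("two queues, normal flush: %d\n", would_use_queue_rqs(&two_q, false));
	return 0;
}

If that model is right, the ioengine by itself shouldn't matter; only
requests spanning more than one request_queue, a scheduler, or a flush
from io_schedule() should push us off the queue_rqs path.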

>
> The stuck fio job might be a lost request, which is what this series
> should address. It would be unusual to see such an error happen in
> normal operation, though. I had to synthesize errors to verify the bug
> and fix.

But there are no timeout errors or prints in dmesg.

If there were timeout prints I would suspect the local NVMe device, but
there aren't any.

Also, this phenomenon doesn't happen with the NVMf/RDMA code I developed
locally.
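
For anyone reading along, here is a small standalone sketch (simplified
names and layout, not the in-kernel definitions) of what the safe iterator
from patch 1/3 gives us, as I read it: the next pointer is sampled before
the body runs, so the current request can be detached, e.g. moved to a
requeue list, without losing the rest of the walk:

#include <stdio.h>

/* Toy request: a singly linked list threaded through ->next. */
struct req {
	struct req *next;
	int tag;
};

/*
 * "Safe" iterator: 'nxt' is loaded before the body runs, so the body may
 * unlink or repoint the current entry without breaking the traversal.
 */
#define req_list_for_each_safe(listptr, pos, nxt)			\
	for (pos = *(listptr), nxt = pos ? pos->next : NULL;		\
	     pos;							\
	     pos = nxt, nxt = pos ? pos->next : NULL)

int main(void)
{
	struct req r2 = { NULL, 2 }, r1 = { &r2, 1 }, r0 = { &r1, 0 };
	struct req *list = &r0, *requeue = NULL;
	struct req *pos, *nxt;

	req_list_for_each_safe(&list, pos, nxt) {
		if (pos->tag == 1) {
			/* Pretend this request failed to prep: move it to a
			 * requeue list.  (For brevity the previous link is not
			 * fixed up; we only walk the list once.) */
			pos->next = requeue;
			requeue = pos;
			continue;	/* 'nxt' was saved, the walk survives */
		}
		printf("submit tag %d\n", pos->tag);
	}
	printf("requeued tag %d\n", requeue ? requeue->tag : -1);
	return 0;
}

That property is what should prevent the lost-request case described above
when queue_rqs splits the plugged list.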

>
> In any case, I'll run more multi-namespace tests to see if I can find
> any other issues with shared tags.

I believe the above concerns are not related to shared tags but to the
entire mechanism.
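
To make "the entire mechanism" concrete, this is roughly how I picture the
list splitting a ->queue_rqs() implementation has to do, as a standalone
sketch with made-up names (not the nvme-pci code); the prep-failure/requeue
handling that this series actually fixes is omitted for brevity:

#include <stdio.h>

/* Toy request with a hardware-queue id. */
struct mreq {
	struct mreq *next;
	int queue;	/* which hw queue this request targets */
	int tag;
};

static void submit_batch(struct mreq *head)
{
	struct mreq *r;

	for (r = head; r; r = r->next)
		printf("queue %d: submit tag %d\n", r->queue, r->tag);
}

/*
 * Split the plugged list into contiguous same-queue runs and submit each
 * run as one batch.
 */
static void model_queue_rqs(struct mreq **list)
{
	struct mreq *pos = *list, *next;

	while (pos) {
		next = pos->next;
		if (!next || next->queue != pos->queue) {
			pos->next = NULL;	/* cut the run out of the list */
			submit_batch(*list);
			*list = next;		/* remainder becomes the new head */
		}
		pos = next;
	}
}

int main(void)
{
	struct mreq r3 = { NULL, 1, 3 }, r2 = { &r3, 1, 2 };
	struct mreq r1 = { &r2, 0, 1 }, r0 = { &r1, 0, 0 };
	struct mreq *list = &r0;

	model_queue_rqs(&list);
	return 0;
}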



Thread overview: 22+ messages
2021-12-27 16:41 [PATCHv2 1/3] block: introduce rq_list_for_each_safe macro Keith Busch
2021-12-27 16:41 ` [PATCHv2 2/3] block: introduce rq_list_move Keith Busch
2021-12-27 18:49   ` kernel test robot
2021-12-27 18:49     ` kernel test robot
2021-12-29 17:41   ` Christoph Hellwig
2021-12-29 20:59     ` Keith Busch
2021-12-27 16:41 ` [PATCHv2 3/3] nvme-pci: fix queue_rqs list splitting Keith Busch
2021-12-29 17:46   ` Christoph Hellwig
2021-12-29 21:04     ` Keith Busch
2021-12-30  7:53       ` Christoph Hellwig
2022-01-04 19:38     ` Keith Busch
2022-01-05  7:35       ` Christoph Hellwig
2021-12-29 17:39 ` [PATCHv2 1/3] block: introduce rq_list_for_each_safe macro Christoph Hellwig
2021-12-29 20:57   ` Keith Busch
2021-12-30 14:38     ` Max Gurtovoy
2021-12-30 15:30       ` Keith Busch
2022-01-03 15:23         ` Max Gurtovoy
2022-01-03 18:15           ` Keith Busch
2022-01-04 12:15             ` Max Gurtovoy
2022-01-05 17:26               ` Keith Busch
2022-01-06 11:54                 ` Max Gurtovoy [this message]
2022-01-06 13:41                   ` Jens Axboe
