All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <ming.lei@redhat.com>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"osandov@fb.com" <osandov@fb.com>
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
Date: Fri, 19 Jan 2018 09:41:51 -0700	[thread overview]
Message-ID: <26833249-cadf-ba9c-1128-0bcb70ceb9e1@kernel.dk> (raw)
In-Reply-To: <20180119163736.GE14827@ming.t460p>

On 1/19/18 9:37 AM, Ming Lei wrote:
> On Fri, Jan 19, 2018 at 09:27:46AM -0700, Jens Axboe wrote:
>> On 1/19/18 9:26 AM, Ming Lei wrote:
>>> On Fri, Jan 19, 2018 at 09:19:24AM -0700, Jens Axboe wrote:
>>>> On 1/19/18 9:05 AM, Ming Lei wrote:
>>>>> On Fri, Jan 19, 2018 at 08:48:55AM -0700, Jens Axboe wrote:
>>>>>> On 1/19/18 8:40 AM, Ming Lei wrote:
>>>>>>>>>> Where does the dm STS_RESOURCE error usually come from - what's exact
>>>>>>>>>> resource are we running out of?
>>>>>>>>>
>>>>>>>>> It is from blk_get_request(underlying queue), see
>>>>>>>>> multipath_clone_and_map().
>>>>>>>>
>>>>>>>> That's what I thought. So for a low queue depth underlying queue, it's
>>>>>>>> quite possible that this situation can happen. Two potential solutions
>>>>>>>> I see:
>>>>>>>>
>>>>>>>> 1) As described earlier in this thread, having a mechanism for being
>>>>>>>>    notified when the scarce resource becomes available. It would not
>>>>>>>>    be hard to tap into the existing sbitmap wait queue for that.
>>>>>>>>
>>>>>>>> 2) Have dm set BLK_MQ_F_BLOCKING and just sleep on the resource
>>>>>>>>    allocation. I haven't read the dm code to know if this is a
>>>>>>>>    possibility or not.
>>>>>>>>
>>>>>>>> I'd probably prefer #1. It's a classic case of trying to get the
>>>>>>>> request, and if it fails, add ourselves to the sbitmap tag wait
>>>>>>>> queue head, retry, and bail if that also fails. Connecting the
>>>>>>>> scarce resource and the consumer is the only way to really fix
>>>>>>>> this, without bogus arbitrary delays.
>>>>>>>
>>>>>>> Right, as I have replied to Bart, using mod_delayed_work_on() with
>>>>>>> returning BLK_STS_NO_DEV_RESOURCE(or sort of name) for the scarce
>>>>>>> resource should fix this issue.
>>>>>>
>>>>>> It'll fix the forever stall, but it won't really fix it, as we'll slow
>>>>>> down the dm device by some random amount.
>>>>>>
>>>>>> A simple test case would be to have a null_blk device with a queue depth
>>>>>> of one, and dm on top of that. Start a fio job that runs two jobs: one
>>>>>> that does IO to the underlying device, and one that does IO to the dm
>>>>>> device. If the job on the dm device runs substantially slower than the
>>>>>> one to the underlying device, then the problem isn't really fixed.
>>>>>
>>>>> I remembered that I tried this test on scsi-debug & dm-mpath over scsi-debug,
>>>>> seems not observed this issue, could you explain a bit why IO over dm-mpath
>>>>> may be slower? Because both two IO contexts call same get_request(), and
>>>>> in theory dm-mpath should be a bit quicker since it uses direct issue for
>>>>> underlying queue, without io scheduler involved.
>>>>
>>>> Because if you lose the race for getting the request, you'll have some
>>>> arbitrary delay before trying again, potentially. Compared to the direct
>>>
>>> But the restart still works, one request is completed, then the queue
>>> is return immediately because we use mod_delayed_work_on(0), so looks
>>> no such issue.
>>
>> There are no pending requests for this case, nothing to restart the
>> queue. When you fail that blk_get_request(), you are idle, nothing
>> is pending.
> 
> I think we needn't worry about that, once a device is attached to
> dm-rq, it can't be mounted any more, and usually user don't use the device
> directly and by dm-mpath at the same time.

Even if it doesn't happen for a normal dm setup, it is a case that
needs to be handled. The request allocation is just one example of
a wider scope resource that can be unavailable. If the driver returns
NO_DEV_RESOURCE (or whatever name), it will be a possibility that
the device itself is currently idle.

-- 
Jens Axboe

WARNING: multiple messages have this Message-ID (diff)
From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <ming.lei@redhat.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	Bart Van Assche <Bart.VanAssche@wdc.com>,
	"osandov@fb.com" <osandov@fb.com>
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
Date: Fri, 19 Jan 2018 09:41:51 -0700	[thread overview]
Message-ID: <26833249-cadf-ba9c-1128-0bcb70ceb9e1@kernel.dk> (raw)
In-Reply-To: <20180119163736.GE14827@ming.t460p>

On 1/19/18 9:37 AM, Ming Lei wrote:
> On Fri, Jan 19, 2018 at 09:27:46AM -0700, Jens Axboe wrote:
>> On 1/19/18 9:26 AM, Ming Lei wrote:
>>> On Fri, Jan 19, 2018 at 09:19:24AM -0700, Jens Axboe wrote:
>>>> On 1/19/18 9:05 AM, Ming Lei wrote:
>>>>> On Fri, Jan 19, 2018 at 08:48:55AM -0700, Jens Axboe wrote:
>>>>>> On 1/19/18 8:40 AM, Ming Lei wrote:
>>>>>>>>>> Where does the dm STS_RESOURCE error usually come from - what's exact
>>>>>>>>>> resource are we running out of?
>>>>>>>>>
>>>>>>>>> It is from blk_get_request(underlying queue), see
>>>>>>>>> multipath_clone_and_map().
>>>>>>>>
>>>>>>>> That's what I thought. So for a low queue depth underlying queue, it's
>>>>>>>> quite possible that this situation can happen. Two potential solutions
>>>>>>>> I see:
>>>>>>>>
>>>>>>>> 1) As described earlier in this thread, having a mechanism for being
>>>>>>>>    notified when the scarce resource becomes available. It would not
>>>>>>>>    be hard to tap into the existing sbitmap wait queue for that.
>>>>>>>>
>>>>>>>> 2) Have dm set BLK_MQ_F_BLOCKING and just sleep on the resource
>>>>>>>>    allocation. I haven't read the dm code to know if this is a
>>>>>>>>    possibility or not.
>>>>>>>>
>>>>>>>> I'd probably prefer #1. It's a classic case of trying to get the
>>>>>>>> request, and if it fails, add ourselves to the sbitmap tag wait
>>>>>>>> queue head, retry, and bail if that also fails. Connecting the
>>>>>>>> scarce resource and the consumer is the only way to really fix
>>>>>>>> this, without bogus arbitrary delays.
>>>>>>>
>>>>>>> Right, as I have replied to Bart, using mod_delayed_work_on() with
>>>>>>> returning BLK_STS_NO_DEV_RESOURCE(or sort of name) for the scarce
>>>>>>> resource should fix this issue.
>>>>>>
>>>>>> It'll fix the forever stall, but it won't really fix it, as we'll slow
>>>>>> down the dm device by some random amount.
>>>>>>
>>>>>> A simple test case would be to have a null_blk device with a queue depth
>>>>>> of one, and dm on top of that. Start a fio job that runs two jobs: one
>>>>>> that does IO to the underlying device, and one that does IO to the dm
>>>>>> device. If the job on the dm device runs substantially slower than the
>>>>>> one to the underlying device, then the problem isn't really fixed.
>>>>>
>>>>> I remembered that I tried this test on scsi-debug & dm-mpath over scsi-debug,
>>>>> seems not observed this issue, could you explain a bit why IO over dm-mpath
>>>>> may be slower? Because both two IO contexts call same get_request(), and
>>>>> in theory dm-mpath should be a bit quicker since it uses direct issue for
>>>>> underlying queue, without io scheduler involved.
>>>>
>>>> Because if you lose the race for getting the request, you'll have some
>>>> arbitrary delay before trying again, potentially. Compared to the direct
>>>
>>> But the restart still works, one request is completed, then the queue
>>> is return immediately because we use mod_delayed_work_on(0), so looks
>>> no such issue.
>>
>> There are no pending requests for this case, nothing to restart the
>> queue. When you fail that blk_get_request(), you are idle, nothing
>> is pending.
> 
> I think we needn't worry about that, once a device is attached to
> dm-rq, it can't be mounted any more, and usually user don't use the device
> directly and by dm-mpath at the same time.

Even if it doesn't happen for a normal dm setup, it is a case that
needs to be handled. The request allocation is just one example of
a wider scope resource that can be unavailable. If the driver returns
NO_DEV_RESOURCE (or whatever name), it will be a possibility that
the device itself is currently idle.

-- 
Jens Axboe

  reply	other threads:[~2018-01-19 16:41 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-18  2:41 [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle Ming Lei
2018-01-18 16:50 ` Bart Van Assche
2018-01-18 17:03   ` Mike Snitzer
2018-01-18 17:03     ` Mike Snitzer
2018-01-18 17:20     ` Bart Van Assche
2018-01-18 17:20       ` Bart Van Assche
2018-01-18 18:30       ` Mike Snitzer
2018-01-18 18:47         ` Bart Van Assche
2018-01-18 18:47           ` Bart Van Assche
2018-01-18 20:11           ` Jens Axboe
2018-01-18 20:11             ` Jens Axboe
2018-01-18 20:48             ` Mike Snitzer
2018-01-18 20:58               ` Bart Van Assche
2018-01-18 20:58                 ` Bart Van Assche
2018-01-18 21:23                 ` Mike Snitzer
2018-01-18 21:23                   ` Mike Snitzer
2018-01-18 21:37                   ` Laurence Oberman
2018-01-18 21:39                   ` [dm-devel] " Bart Van Assche
2018-01-18 21:39                     ` Bart Van Assche
2018-01-18 21:45                     ` Laurence Oberman
2018-01-18 21:45                       ` Laurence Oberman
2018-01-18 22:01                     ` Mike Snitzer
2018-01-18 22:18                       ` Laurence Oberman
2018-01-18 22:20                         ` Laurence Oberman
2018-01-18 22:20                           ` Laurence Oberman
2018-01-18 22:24                         ` Bart Van Assche
2018-01-18 22:24                           ` Bart Van Assche
2018-01-18 22:35                           ` Laurence Oberman
2018-01-18 22:39                             ` Jens Axboe
2018-01-18 22:55                               ` Bart Van Assche
2018-01-18 22:55                                 ` Bart Van Assche
2018-01-18 22:20                       ` Bart Van Assche
2018-01-18 22:20                         ` Bart Van Assche
2018-01-23  9:22                         ` [PATCH] block: neutralize blk_insert_cloned_request IO stall regression (was: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle) Mike Snitzer
2018-01-23 10:53                           ` Ming Lei
2018-01-23 12:15                             ` Mike Snitzer
2018-01-23 12:17                               ` Ming Lei
2018-01-23 12:43                                 ` Mike Snitzer
2018-01-23 16:43                           ` [PATCH] " Bart Van Assche
2018-01-23 16:43                             ` Bart Van Assche
2018-01-19  2:32             ` [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle Ming Lei
2018-01-19  4:02               ` Jens Axboe
2018-01-19  7:26                 ` Ming Lei
2018-01-19 15:20                   ` Bart Van Assche
2018-01-19 15:20                     ` Bart Van Assche
2018-01-19 15:25                     ` Jens Axboe
2018-01-19 15:33                     ` Ming Lei
2018-01-19 16:06                       ` Bart Van Assche
2018-01-19 16:06                         ` Bart Van Assche
2018-01-19 15:24                   ` Jens Axboe
2018-01-19 15:40                     ` Ming Lei
2018-01-19 15:40                       ` Ming Lei
2018-01-19 15:48                       ` Jens Axboe
2018-01-19 16:05                         ` Ming Lei
2018-01-19 16:19                           ` Jens Axboe
2018-01-19 16:26                             ` Ming Lei
2018-01-19 16:27                               ` Jens Axboe
2018-01-19 16:37                                 ` Ming Lei
2018-01-19 16:41                                   ` Jens Axboe [this message]
2018-01-19 16:41                                     ` Jens Axboe
2018-01-19 16:47                                     ` Mike Snitzer
2018-01-19 16:52                                       ` Jens Axboe
2018-01-19 17:05                                         ` Ming Lei
2018-01-19 17:09                                           ` Jens Axboe
2018-01-19 17:20                                             ` Ming Lei
2018-01-19 17:38                                   ` Jens Axboe
2018-01-19 18:24                                     ` Ming Lei
2018-01-19 18:24                                       ` Ming Lei
2018-01-19 18:33                                     ` Mike Snitzer
2018-01-19 23:52                                     ` Ming Lei
2018-01-20  4:27                                       ` Jens Axboe
2018-01-19 16:13                         ` Mike Snitzer
2018-01-19 16:23                           ` Jens Axboe
2018-01-19 23:57                             ` Ming Lei
2018-01-29 22:37                     ` Bart Van Assche
2018-01-19  5:09               ` Bart Van Assche
2018-01-19  5:09                 ` Bart Van Assche
2018-01-19  7:34                 ` Ming Lei
2018-01-19 19:47                   ` Bart Van Assche
2018-01-19 19:47                     ` Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=26833249-cadf-ba9c-1128-0bcb70ceb9e1@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=Bart.VanAssche@wdc.com \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=osandov@fb.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.