From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, Omar Sandoval <osandov@fb.com>
Subject: Re: [PATCH V2] blk-mq: avoid to starve tag allocation after allocation process migrates
Date: Wed, 23 May 2018 17:32:31 +0800
Message-ID: <20180523093225.GA32067@ming.t460p>
In-Reply-To: <7a4499e8-60bb-892a-91cf-f2d2c4be74b7@kernel.dk>

On Tue, May 22, 2018 at 09:59:17PM -0600, Jens Axboe wrote:
> On 5/19/18 1:44 AM, Ming Lei wrote:
> > When the allocation process is scheduled back and the mapped hw queue is
> > changed, do one extra wake up on original queue for compensating wake up
> > miss, so other allocations on the original queue won't be starved.
> > 
> > This patch fixes one request allocation hang issue, which can be
> > triggered easily in case of very low nr_request.
> 
> Trying to think of better ways we can fix this, but I don't see
> any right now. Getting rid of the wake_up_nr() kills us on tons
> of tasks waiting. 

I am not sure I understand your point, but this issue actually isn't
related to wake_up_nr(), and it can still be reproduced after reverting
4e5dff41be7b5201c1c47c (blk-mq: improve heavily contended tag case).

All tasks in the current sbq_wait_state may be scheduled to other CPUs
while there are still tasks waiting for allocation from this
sbitmap_queue. The root cause is cross-queue allocation; as you said,
there are too many queues. :-)

> Maybe it might be possible to only go through
> the fake wakeup IFF we have a task waiting on the list, that'd
> spare us the atomic dec and cmpxchg for all cases except if we
> have a task (or more) waiting on the existing wait state.

sbq_wake_ptr() checks whether there are waiting tasks, and it does
nothing if there aren't any.
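
To illustrate the point, here is a minimal userspace sketch of that
check. It is not the kernel code: the model_* names, the plain int
waiter counter and the single-threaded (non-atomic) index update are
all assumptions made for illustration. It only shows that the scan
returns NULL, so no wakeup work is done, when no wait state has a
waiter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: number of wait states per queue (an assumption). */
#define MODEL_WAIT_QUEUES 8

struct model_wait_state {
	int nr_waiters;		/* stand-in for a kernel wait queue head */
};

struct model_queue {
	int wake_index;		/* where the next scan starts */
	struct model_wait_state ws[MODEL_WAIT_QUEUES];
};

/*
 * Scan the wait states round-robin starting at wake_index and return
 * the first one with a waiter, or NULL if nobody is waiting.
 */
static struct model_wait_state *model_wake_ptr(struct model_queue *q)
{
	int idx = q->wake_index;

	assert(idx >= 0 && idx < MODEL_WAIT_QUEUES);

	for (int i = 0; i < MODEL_WAIT_QUEUES; i++) {
		struct model_wait_state *ws = &q->ws[idx];

		if (ws->nr_waiters > 0) {
			/* Remember where we found a waiter. */
			q->wake_index = idx;
			return ws;
		}
		idx = (idx + 1) % MODEL_WAIT_QUEUES;
	}

	return NULL;	/* no waiters: the caller has nothing to do */
}
```

With an all-zero model_queue the function returns NULL; once one
wait state has a waiter, that wait state is returned instead.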

In theory, the fake wakeup is only needed when the current wakeup
is triggered by the last in-flight request, but it isn't cheap
to figure out that accurately.

Given that it is quite unusual for a process to be migrated from one
CPU to another, and that the cost of process migration can be much
bigger than that of the wakeup, I guess we may not need to worry about
the performance effect.
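
The compensation idea itself can be sketched in userspace as follows.
This is a simplified model, not the patch: model_tag_queue,
model_queue_wake_up() and model_after_resume() are hypothetical names,
and the waiter bookkeeping is a plain counter. The sketch only shows
that the extra wakeup is issued on the previous queue when the queue
has changed after the task is scheduled back, and is skipped otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one tag queue with sleeping allocators. */
struct model_tag_queue {
	int nr_waiters;		/* tasks still sleeping on this queue */
	int wakeups_issued;	/* wakeups delivered to this queue */
};

/* Wake one waiter if there is one; otherwise do nothing. */
static void model_queue_wake_up(struct model_tag_queue *q)
{
	if (q->nr_waiters > 0) {
		q->nr_waiters--;
		q->wakeups_issued++;
	}
}

/*
 * Called after a woken allocator has been re-mapped. If it now
 * allocates from a different queue, the wakeup it consumed on 'prev'
 * is lost to the other waiters there, so pass one wakeup on.
 */
static struct model_tag_queue *
model_after_resume(struct model_tag_queue *prev, struct model_tag_queue *now)
{
	assert(prev != NULL && now != NULL);

	if (now != prev)
		model_queue_wake_up(prev);

	return now;
}
```

In the common case (no migration) the function costs one pointer
comparison, which is the reason the extra wakeup should rarely matter
for performance.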

> 
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 336dde07b230..77607f89d205 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
> >  	ws = bt_wait_ptr(bt, data->hctx);
> >  	drop_ctx = data->ctx == NULL;
> >  	do {
> > +		struct sbitmap_queue *bt_orig;
> 
> This should be called 'bt_prev'.

OK.

> 
> > diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> > index 841585f6e5f2..b23f50355281 100644
> > --- a/include/linux/sbitmap.h
> > +++ b/include/linux/sbitmap.h
> > @@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
> >  void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);
> >  
> >  /**
> > + * sbitmap_wake_up() - Do a regular wake up compensation if the queue
> > + * allocated from is changed after scheduling back.
> > + * @sbq: Bitmap queue to wake up.
> > + */
> > +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
> 
> The blk-mq issue is bleeding into sbitmap here. This should just detail
> that this issues a wakeup, similar to how freeing a tag would

Right, will fix it in V3.

Thanks,
Ming
