* [RFC PATCH] sbitmap: fix batching wakeup
@ 2023-07-21  9:57 Ming Lei
  2023-07-21 10:40 ` Keith Busch
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Ming Lei @ 2023-07-21  9:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, David Jeffery, Kemeng Shi, Gabriel Krisman Bertazi,
	Chengming Zhou, Jan Kara, Ming Lei

From: David Jeffery <djeffery@redhat.com>

The current code assumes that waking up one wait queue after each
completion batch is enough to guarantee forward progress.

Unfortunately that isn't enough, because a waiter can be added to a
wait queue just after the queue is woken up.

Here is one example (64 tag depth, wake_batch of 8):

1) all 64 tags are active

2) each wait queue holds exactly one waiter

3) each completion batch (8 completions) wakes up just one waiter from
one wait queue, and immediately afterwards one new sleeper is added to
that wait queue

4) after 64 completions, 8 waiters have been woken up, but 8 waiters
(one per wait queue) are still left sleeping

5) after another 8 active tags complete, only one more waiter can be
woken up, and the remaining 7 can never be woken up

It turns out this problem isn't easy to fix properly, so simply wake
up enough waiters for a single batch.
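
For illustration, the five steps above can be replayed with a small
standalone model (hypothetical user-space C, not the kernel code; tag
allocation, wake_index rotation, and the race windows are abstracted
into fixed per-batch behavior):

```c
#include <assert.h>

#define NR_WS 8	/* number of wait queues, as in sbitmap */
#define BATCH 8	/* wake_batch in the example */

/* Old policy: stop at the first wait queue that yields any wakeup. */
static int wake_old(int waiters[NR_WS], int nr)
{
	for (int i = 0; i < NR_WS; i++) {
		if (waiters[i] > 0) {
			int w = waiters[i] < nr ? waiters[i] : nr;

			waiters[i] -= w;
			return w;	/* may be far below nr */
		}
	}
	return 0;
}

/* Patched policy: carry the remainder of nr on to the next queues. */
static int wake_new(int waiters[NR_WS], int nr)
{
	int woken = 0;

	for (int i = 0; i < NR_WS && nr > 0; i++) {
		int w = waiters[i] < nr ? waiters[i] : nr;

		waiters[i] -= w;
		woken += w;
		nr -= w;
	}
	return woken;
}

/*
 * Replay steps 1-5: during the first 64 completions every woken waiter
 * is immediately replaced by a new sleeper on the same queue (so the
 * queue state is unchanged after each batch); then the last 8 active
 * tags complete and trigger one final wakeup round.  Returns how many
 * sleepers are never woken up.
 */
static int stuck_sleepers(int (*wake)(int *, int))
{
	int waiters[NR_WS], snapshot[NR_WS];
	int stuck = 0;

	for (int i = 0; i < NR_WS; i++)
		waiters[i] = 1;				/* step 2 */

	for (int batch = 0; batch < 8; batch++) {	/* steps 3-4 */
		for (int i = 0; i < NR_WS; i++)
			snapshot[i] = waiters[i];
		wake(waiters, BATCH);
		for (int i = 0; i < NR_WS; i++)
			waiters[i] = snapshot[i];	/* refilled by new sleepers */
	}

	wake(waiters, BATCH);				/* step 5: final batch */

	for (int i = 0; i < NR_WS; i++)
		stuck += waiters[i];			/* no further batches come */
	return stuck;
}
```

In this model the old policy strands 7 sleepers forever, while the
patched policy wakes all of them in the final round.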

Cc: David Jeffery <djeffery@redhat.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 lib/sbitmap.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index eff4e42c425a..d0a5081dfd12 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -550,7 +550,7 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
 
 static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
 {
-	int i, wake_index;
+	int i, wake_index, woken;
 
 	if (!atomic_read(&sbq->ws_active))
 		return;
@@ -567,13 +567,12 @@ static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
 		 */
 		wake_index = sbq_index_inc(wake_index);
 
-		/*
-		 * It is sufficient to wake up at least one waiter to
-		 * guarantee forward progress.
-		 */
-		if (waitqueue_active(&ws->wait) &&
-		    wake_up_nr(&ws->wait, nr))
-			break;
+		if (waitqueue_active(&ws->wait)) {
+			woken = wake_up_nr(&ws->wait, nr);
+			if (woken == nr)
+				break;
+			nr -= woken;
+		}
 	}
 
 	if (wake_index != atomic_read(&sbq->wake_index))
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
@ 2023-07-21 10:40 ` Keith Busch
  2023-07-21 10:50   ` Ming Lei
  2023-07-21 11:51 ` Keith Busch
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2023-07-21 10:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Gabriel Krisman Bertazi, Chengming Zhou, Jan Kara

On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>

...
 
> Cc: David Jeffery <djeffery@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Shouldn't the author include their sign off? Or is this supposed to be
from you?


* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21 10:40 ` Keith Busch
@ 2023-07-21 10:50   ` Ming Lei
  2023-07-21 17:38     ` David Jeffery
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2023-07-21 10:50 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Gabriel Krisman Bertazi, Chengming Zhou, Jan Kara, ming.lei

On Fri, Jul 21, 2023 at 12:40:19PM +0200, Keith Busch wrote:
> On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
> > From: David Jeffery <djeffery@redhat.com>
> 
> ...
>  
> > Cc: David Jeffery <djeffery@redhat.com>
> > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> Shouldn't the author include their sign off? Or is this supposed to be
> from you?

I understand the Signed-off-by needs to come explicitly from David, so
let's focus on the patch itself for now. I believe David can reply with
his Signed-off-by when the patch is ready to go.


Thanks,
Ming



* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
  2023-07-21 10:40 ` Keith Busch
@ 2023-07-21 11:51 ` Keith Busch
  2023-07-21 16:35 ` Gabriel Krisman Bertazi
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2023-07-21 11:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Gabriel Krisman Bertazi, Chengming Zhou, Jan Kara

On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
>  static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  {
> -	int i, wake_index;
> +	int i, wake_index, woken;
>  
>  	if (!atomic_read(&sbq->ws_active))
>  		return;
> @@ -567,13 +567,12 @@ static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  		 */
>  		wake_index = sbq_index_inc(wake_index);
>  
> -		/*
> -		 * It is sufficient to wake up at least one waiter to
> -		 * guarantee forward progress.
> -		 */
> -		if (waitqueue_active(&ws->wait) &&
> -		    wake_up_nr(&ws->wait, nr))
> -			break;
> +		if (waitqueue_active(&ws->wait)) {
> +			woken = wake_up_nr(&ws->wait, nr);
> +			if (woken == nr)
> +				break;
> +			nr -= woken;
> +		}
>  	}

This looks good. I had something similar at one point, but after all
the churn this file has gone through, I somehow convinced myself it
wasn't necessary anymore. Your analysis and fix look correct to me!
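
To make the before/after difference concrete, here is a hypothetical
user-space model of the two loop policies (the function names and the
fixed waiter counts are invented for illustration, and the wake_index
bookkeeping is omitted): with an uneven spread of waiters and nr = 8,
the old loop stops at the first queue that yields any wakeup, while the
patched loop carries the remainder to the following queues.

```c
#include <assert.h>

#define NR_WS 4

/* Old loop body: break as soon as one queue yields any wakeup. */
static int wake_old(int waiters[NR_WS], int nr)
{
	for (int i = 0; i < NR_WS; i++) {
		if (waiters[i] > 0) {
			int w = waiters[i] < nr ? waiters[i] : nr;

			waiters[i] -= w;
			return w;	/* may be well below nr */
		}
	}
	return 0;
}

/*
 * Patched loop body: subtract what each queue actually woke from nr
 * and keep going until the full batch has been delivered (or no
 * waiters remain).
 */
static int wake_new(int waiters[NR_WS], int nr)
{
	int woken = 0;

	for (int i = 0; i < NR_WS && nr > 0; i++) {
		int w = waiters[i] < nr ? waiters[i] : nr;

		waiters[i] -= w;
		woken += w;
		nr -= w;
	}
	return woken;
}
```

With waiter counts {3, 0, 2, 5} and nr = 8, the old loop delivers only
3 of the 8 wakeups; the patched loop delivers all 8.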

Reviewed-by: Keith Busch <kbusch@kernel.org>


* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
  2023-07-21 10:40 ` Keith Busch
  2023-07-21 11:51 ` Keith Busch
@ 2023-07-21 16:35 ` Gabriel Krisman Bertazi
  2023-07-22  2:42   ` Ming Lei
  2023-07-21 17:29 ` Jens Axboe
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-07-21 16:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Chengming Zhou, Jan Kara

Ming Lei <ming.lei@redhat.com> writes:

> From: David Jeffery <djeffery@redhat.com>
>
> Current code supposes that it is enough to provide forward progress by just
> waking up one wait queue after one completion batch is done.
>
> Unfortunately this way isn't enough, cause waiter can be added to
> wait queue just after it is woken up.
>
> Follows one example(64 depth, wake_batch is 8)
>
> 1) all 64 tags are active
>
> 2) in each wait queue, there is only one single waiter
>
> 3) each time one completion batch(8 completions) wakes up just one waiter in each wait
> queue, then immediately one new sleeper is added to this wait queue
>
> 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each
> wait queue
>
> 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7
> can't be waken up anymore.
>
> Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for
> single batch.

Yes, I think this makes sense. When working on this algorithm, I
remember considering it (hence wake_up_nr being ready), but I ended up
believing it wasn't needed. Please take:

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>

I wonder how likely it is to reach it.  Did you get a bug report?

Thanks,

-- 
Gabriel Krisman Bertazi


* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
                   ` (2 preceding siblings ...)
  2023-07-21 16:35 ` Gabriel Krisman Bertazi
@ 2023-07-21 17:29 ` Jens Axboe
  2023-07-21 17:40 ` Jens Axboe
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2023-07-21 17:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, David Jeffery, Kemeng Shi, Gabriel Krisman Bertazi,
	Chengming Zhou, Jan Kara

On 7/21/23 3:57 AM, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> Current code supposes that it is enough to provide forward progress by just
> waking up one wait queue after one completion batch is done.
> 
> Unfortunately this way isn't enough, cause waiter can be added to
> wait queue just after it is woken up.
> 
> Follows one example(64 depth, wake_batch is 8)
> 
> 1) all 64 tags are active
> 
> 2) in each wait queue, there is only one single waiter
> 
> 3) each time one completion batch(8 completions) wakes up just one waiter in each wait
> queue, then immediately one new sleeper is added to this wait queue
> 
> 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each
> wait queue
> 
> 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7
> can't be waken up anymore.
> 
> Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for
> single batch.

Change looks good, but commit message needs:

1) Signed-off-by from author
2) To be wrapped at ~72 chars, it's got lines that are way too long.

I can edit the commit message, but David does need to reply with his SOB
before it can get applied.

-- 
Jens Axboe



* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21 10:50   ` Ming Lei
@ 2023-07-21 17:38     ` David Jeffery
  0 siblings, 0 replies; 13+ messages in thread
From: David Jeffery @ 2023-07-21 17:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, Jens Axboe, linux-block, Kemeng Shi,
	Gabriel Krisman Bertazi, Chengming Zhou, Jan Kara

On Fri, Jul 21, 2023 at 6:50 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Fri, Jul 21, 2023 at 12:40:19PM +0200, Keith Busch wrote:
> > On Fri, Jul 21, 2023 at 05:57:15PM +0800, Ming Lei wrote:
> > > From: David Jeffery <djeffery@redhat.com>
> >
> > ...
> >
> > > Cc: David Jeffery <djeffery@redhat.com>
> > > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >
> > Shouldn't the author include their sign off? Or is this supposed to be
> > from you?
>
> I understand signed-off-by needs to be explicit from David, and let's
> focus on patch itself. And I believe David can reply with his
> signed-off-by when the patch is ready to go.
>
>
> Thanks,
> Ming
>

Thank you for the initial checking of the patch, Ming. I appreciate
your patience and wisdom looking over my ideas. Please add my
signed-off-by to the patch.

Signed-off-by: David Jeffery <djeffery@redhat.com>

Regards,
David Jeffery



* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
                   ` (3 preceding siblings ...)
  2023-07-21 17:29 ` Jens Axboe
@ 2023-07-21 17:40 ` Jens Axboe
  2023-08-02 16:05 ` Jan Kara
  2024-01-15  9:51 ` Kemeng Shi
  6 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2023-07-21 17:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, David Jeffery, Kemeng Shi, Gabriel Krisman Bertazi,
	Chengming Zhou, Jan Kara


On Fri, 21 Jul 2023 17:57:15 +0800, Ming Lei wrote:
> Current code supposes that it is enough to provide forward progress by just
> waking up one wait queue after one completion batch is done.
> 
> Unfortunately this way isn't enough, cause waiter can be added to
> wait queue just after it is woken up.
> 
> Follows one example(64 depth, wake_batch is 8)
> 
> [...]

Applied, thanks!

[1/1] sbitmap: fix batching wakeup
      commit: 106397376c0369fcc01c58dd189ff925a2724a57

Best regards,
-- 
Jens Axboe





* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21 16:35 ` Gabriel Krisman Bertazi
@ 2023-07-22  2:42   ` Ming Lei
  0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2023-07-22  2:42 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Chengming Zhou, Jan Kara, ming.lei

On Fri, Jul 21, 2023 at 12:35:43PM -0400, Gabriel Krisman Bertazi wrote:
> Ming Lei <ming.lei@redhat.com> writes:
> 
> > From: David Jeffery <djeffery@redhat.com>
> >
> > Current code supposes that it is enough to provide forward progress by just
> > waking up one wait queue after one completion batch is done.
> >
> > Unfortunately this way isn't enough, cause waiter can be added to
> > wait queue just after it is woken up.
> >
> > Follows one example(64 depth, wake_batch is 8)
> >
> > 1) all 64 tags are active
> >
> > 2) in each wait queue, there is only one single waiter
> >
> > 3) each time one completion batch(8 completions) wakes up just one waiter in each wait
> > queue, then immediately one new sleeper is added to this wait queue
> >
> > 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each
> > wait queue
> >
> > 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7
> > can't be waken up anymore.
> >
> > Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for
> > single batch.
> 
> yes, I think this makes sense. When working on this algorithm I remember
> I considered it (thus wake_up_nr being ready), but ended up believing it
> wasn't needed.  please take:
> 
> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
> 
> I wonder how likely it is to reach it.  Did you get a bug report?

It was reported by one RH customer, and I also hit it once while
running dbench on loop(bfq) over scsi_debug in my routine tests.

David came up with the idea, and we discussed other solutions too, but
it turns out the others can't work, and the above (extreme) example
seems the easiest one to follow, in my view.

Per David's earlier analysis, it should have become easier to trigger
since commit 26edb30dd1c0 ("sbitmap: Try each queue to wake up at least
one waiter"), because 'wake_index' isn't updated before calling
wake_up_nr(); multiple completion batches may then wake up the same
wait queue while, meanwhile, multiple sleepers are added to that same
wait queue.


Thanks, 
Ming



* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
                   ` (4 preceding siblings ...)
  2023-07-21 17:40 ` Jens Axboe
@ 2023-08-02 16:05 ` Jan Kara
  2023-08-08  8:18   ` Ming Lei
  2024-01-15  9:51 ` Kemeng Shi
  6 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2023-08-02 16:05 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Gabriel Krisman Bertazi, Chengming Zhou, Jan Kara

On Fri 21-07-23 17:57:15, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> Current code supposes that it is enough to provide forward progress by just
> waking up one wait queue after one completion batch is done.
> 
> Unfortunately this way isn't enough, cause waiter can be added to
> wait queue just after it is woken up.
> 
> Follows one example(64 depth, wake_batch is 8)
> 
> 1) all 64 tags are active
> 
> 2) in each wait queue, there is only one single waiter
> 
> 3) each time one completion batch(8 completions) wakes up just one waiter in each wait
> queue, then immediately one new sleeper is added to this wait queue
> 
> 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each
> wait queue
> 
> 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7
> can't be waken up anymore.
> 
> Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for
> single batch.
> 
> Cc: David Jeffery <djeffery@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

I'm sorry for the delay - I was on vacation. I can see the patch got
already merged and I'm not strictly against that (although I think Gabriel
was experimenting with this exact wakeup scheme and as far as I remember
the more eager waking up was causing a performance decrease for some
configurations). But let me challenge the analysis above a bit. For the
sleeper to be added to a waitqueue in step 3), blk_mq_mark_tag_wait() must
fail the blk_mq_get_driver_tag() call. Which means that all tags were used
at that moment. To summarize, anytime we add any new waiter to the
waitqueue, all tags are used and thus we should eventually receive enough
wakeups to wake all of them. What am I missing?

								Honza

> ---
>  lib/sbitmap.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index eff4e42c425a..d0a5081dfd12 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -550,7 +550,7 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
>  
>  static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  {
> -	int i, wake_index;
> +	int i, wake_index, woken;
>  
>  	if (!atomic_read(&sbq->ws_active))
>  		return;
> @@ -567,13 +567,12 @@ static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  		 */
>  		wake_index = sbq_index_inc(wake_index);
>  
> -		/*
> -		 * It is sufficient to wake up at least one waiter to
> -		 * guarantee forward progress.
> -		 */
> -		if (waitqueue_active(&ws->wait) &&
> -		    wake_up_nr(&ws->wait, nr))
> -			break;
> +		if (waitqueue_active(&ws->wait)) {
> +			woken = wake_up_nr(&ws->wait, nr);
> +			if (woken == nr)
> +				break;
> +			nr -= woken;
> +		}
>  	}
>  
>  	if (wake_index != atomic_read(&sbq->wake_index))
> -- 
> 2.40.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-08-02 16:05 ` Jan Kara
@ 2023-08-08  8:18   ` Ming Lei
  2023-08-08 10:30     ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2023-08-08  8:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Gabriel Krisman Bertazi, Chengming Zhou, ming.lei

On Wed, Aug 02, 2023 at 06:05:53PM +0200, Jan Kara wrote:
> On Fri 21-07-23 17:57:15, Ming Lei wrote:
> > From: David Jeffery <djeffery@redhat.com>
> > 
> > Current code supposes that it is enough to provide forward progress by just
> > waking up one wait queue after one completion batch is done.
> > 
> > Unfortunately this way isn't enough, cause waiter can be added to
> > wait queue just after it is woken up.
> > 
> > Follows one example(64 depth, wake_batch is 8)
> > 
> > 1) all 64 tags are active
> > 
> > 2) in each wait queue, there is only one single waiter
> > 
> > 3) each time one completion batch(8 completions) wakes up just one waiter in each wait
> > queue, then immediately one new sleeper is added to this wait queue
> > 
> > 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each
> > wait queue
> > 
> > 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7
> > can't be waken up anymore.
> > 
> > Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for
> > single batch.
> > 
> > Cc: David Jeffery <djeffery@redhat.com>
> > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> I'm sorry for the delay - I was on vacation. I can see the patch got
> already merged and I'm not strictly against that (although I think Gabriel
> was experimenting with this exact wakeup scheme and as far as I remember
> the more eager waking up was causing performance decrease for some
> configurations). But let me challenge the analysis above a bit. For the
> sleeper to be added to a waitqueue in step 3), blk_mq_mark_tag_wait() must
> fail the blk_mq_get_driver_tag() call. Which means that all tags were used

Here only request allocation via blk_mq_get_tag() is involved; getting
a driver tag isn't involved.

> at that moment. To summarize, anytime we add any new waiter to the
> waitqueue, all tags are used and thus we should eventually receive enough
> wakeups to wake all of them. What am I missing?

When running the final retry (__blk_mq_get_tag) before sleeping
(io_schedule()) in blk_mq_get_tag(), the sleeper has already been added
to the wait queue.

So when two completion batches come in, both may wake up the same wq,
because both completion paths can observe the same ->wake_index, and
both wake_up_nr() calls can return > 0, since adding sleepers to the wq
can interleave with wake_up_nr(). Then 16 completions wake up just 2
sleepers, both added to the same wq.

If that happens on one wq with >= 8 sleepers, and there are another two
pending wqs, an IO hang is triggered.
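
The interleaving described above can be sketched as a sequential model
(hypothetical user-space C, not the kernel code; the two racing batches
are serialized here purely to show the effect of the stale index):

```c
#include <assert.h>

/*
 * Two completion batches race: both read the same stale wake_index
 * before either advances it, and a sleeper arrival interleaves with
 * the wakeups, so wake_up_nr() returns > 0 for both batches and the
 * old loop breaks out both times.
 */
static int unwoken_sleepers(void)
{
	int waiters[3] = { 1, 8, 8 };	/* queue 0: 1 sleeper; queues 1-2: 8 each */
	int wake_index = 0;

	/* batch A (8 completions): reads wake_index = 0 and wakes the
	 * single sleeper on queue 0; since one task was woken, the old
	 * loop stops here without touching queues 1 and 2 */
	waiters[wake_index] -= 1;

	/* a new sleeper is added to queue 0 before batch B runs */
	waiters[0] += 1;

	/* batch B (8 more completions): raced with A, also observed
	 * wake_index = 0, so it too wakes one sleeper from queue 0 */
	waiters[wake_index] -= 1;

	/* 16 completions produced only 2 wakeups, both on queue 0 */
	return waiters[1] + waiters[2];
}
```

The model ends with all 16 sleepers on queues 1 and 2 still asleep,
despite two full completion batches having been delivered.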


Thanks, 
Ming



* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-08-08  8:18   ` Ming Lei
@ 2023-08-08 10:30     ` Jan Kara
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2023-08-08 10:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jan Kara, Jens Axboe, linux-block, David Jeffery, Kemeng Shi,
	Gabriel Krisman Bertazi, Chengming Zhou

On Tue 08-08-23 16:18:50, Ming Lei wrote:
> On Wed, Aug 02, 2023 at 06:05:53PM +0200, Jan Kara wrote:
> > On Fri 21-07-23 17:57:15, Ming Lei wrote:
> > > From: David Jeffery <djeffery@redhat.com>
> > > 
> > > Current code supposes that it is enough to provide forward progress by just
> > > waking up one wait queue after one completion batch is done.
> > > 
> > > Unfortunately this way isn't enough, cause waiter can be added to
> > > wait queue just after it is woken up.
> > > 
> > > Follows one example(64 depth, wake_batch is 8)
> > > 
> > > 1) all 64 tags are active
> > > 
> > > 2) in each wait queue, there is only one single waiter
> > > 
> > > 3) each time one completion batch(8 completions) wakes up just one waiter in each wait
> > > queue, then immediately one new sleeper is added to this wait queue
> > > 
> > > 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each
> > > wait queue
> > > 
> > > 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7
> > > can't be waken up anymore.
> > > 
> > > Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for
> > > single batch.
> > > 
> > > Cc: David Jeffery <djeffery@redhat.com>
> > > Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> > > Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> > > Cc: Chengming Zhou <zhouchengming@bytedance.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > 
> > I'm sorry for the delay - I was on vacation. I can see the patch got
> > already merged and I'm not strictly against that (although I think Gabriel
> > was experimenting with this exact wakeup scheme and as far as I remember
> > the more eager waking up was causing performance decrease for some
> > configurations). But let me challenge the analysis above a bit. For the
> > sleeper to be added to a waitqueue in step 3), blk_mq_mark_tag_wait() must
> > fail the blk_mq_get_driver_tag() call. Which means that all tags were used
> 
> Here only allocating request by blk_mq_get_tag() is involved, and
> getting driver tag isn't involved.
> 
> > at that moment. To summarize, anytime we add any new waiter to the
> > waitqueue, all tags are used and thus we should eventually receive enough
> > wakeups to wake all of them. What am I missing?
> 
> When running the final retry(__blk_mq_get_tag) before
> sleeping(io_schedule()) in blk_mq_get_tag(), the sleeper has been added to
> wait queue.
> 
> So when two completion batch comes, the two may wake up same wq because
> same ->wake_index can be observed from two completion path, and both two
> wake_up_nr() can return > 0 because adding sleeper into wq and wake_up_nr()
> can be interleaved, then 16 completions just wakeup 2 sleepers added to
> same wq.
> 
> If the story happens on one wq with >= 8 sleepers, io hang will be
> triggered, if there are another two pending wq.

OK, thanks for explanation! I think I see the problem now.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC PATCH] sbitmap: fix batching wakeup
  2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
                   ` (5 preceding siblings ...)
  2023-08-02 16:05 ` Jan Kara
@ 2024-01-15  9:51 ` Kemeng Shi
  6 siblings, 0 replies; 13+ messages in thread
From: Kemeng Shi @ 2024-01-15  9:51 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, David Jeffery, Gabriel Krisman Bertazi,
	Chengming Zhou, Jan Kara



on 7/21/2023 5:57 PM, Ming Lei wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> Current code supposes that it is enough to provide forward progress by just
> waking up one wait queue after one completion batch is done.
> 
> Unfortunately this way isn't enough, cause waiter can be added to
> wait queue just after it is woken up.
> 
Sorry for missing this. The change looks good and makes the code
simpler, but the issue is not supposed to happen in the real world.
> Follows one example(64 depth, wake_batch is 8)
> 
> 1) all 64 tags are active
> 
> 2) in each wait queue, there is only one single waiter
> 
> 3) each time one completion batch(8 completions) wakes up just one waiter in each wait
> queue, then immediately one new sleeper is added to this wait queue
> 
> 4) after 64 completions, 8 waiters are wakeup, and there are still 8 waiters in each
> wait queue
>
As we only wait when all tags are consumed (after
sbitmap_prepare_to_wait, we will try sbitmap_queue_get before
sleeping), there will be 64 active tags again whenever any new sleeper
is added, so the mentioned issue should not exist. If there was no gain
from the old way of waking up, this looks good; otherwise we may need
to reconsider.
Hope this helps, thanks!
> 5) after another 8 active tags are completed, only one waiter can be wakeup, and the other 7
> can't be waken up anymore.
> 
> Turns out it isn't easy to fix this problem, so simply wakeup enough waiters for
> single batch.
> 
> Cc: David Jeffery <djeffery@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> Cc: Chengming Zhou <zhouchengming@bytedance.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  lib/sbitmap.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index eff4e42c425a..d0a5081dfd12 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -550,7 +550,7 @@ EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
>  
>  static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  {
> -	int i, wake_index;
> +	int i, wake_index, woken;
>  
>  	if (!atomic_read(&sbq->ws_active))
>  		return;
> @@ -567,13 +567,12 @@ static void __sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
>  		 */
>  		wake_index = sbq_index_inc(wake_index);
>  
> -		/*
> -		 * It is sufficient to wake up at least one waiter to
> -		 * guarantee forward progress.
> -		 */
> -		if (waitqueue_active(&ws->wait) &&
> -		    wake_up_nr(&ws->wait, nr))
> -			break;
> +		if (waitqueue_active(&ws->wait)) {
> +			woken = wake_up_nr(&ws->wait, nr);
> +			if (woken == nr)
> +				break;
> +			nr -= woken;
> +		}
>  	}
>  
>  	if (wake_index != atomic_read(&sbq->wake_index))
> 



end of thread, other threads:[~2024-01-15  9:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-21  9:57 [RFC PATCH] sbitmap: fix batching wakeup Ming Lei
2023-07-21 10:40 ` Keith Busch
2023-07-21 10:50   ` Ming Lei
2023-07-21 17:38     ` David Jeffery
2023-07-21 11:51 ` Keith Busch
2023-07-21 16:35 ` Gabriel Krisman Bertazi
2023-07-22  2:42   ` Ming Lei
2023-07-21 17:29 ` Jens Axboe
2023-07-21 17:40 ` Jens Axboe
2023-08-02 16:05 ` Jan Kara
2023-08-08  8:18   ` Ming Lei
2023-08-08 10:30     ` Jan Kara
2024-01-15  9:51 ` Kemeng Shi
