All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>,
	Keith Busch <kbusch@kernel.org>,
	Yu Kuai <yukuai1@huaweicloud.com>,
	Liu Song <liusong@linux.alibaba.com>,
	Hillf Danton <hdanton@sina.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH next v2] sbitmap: fix lockup while swapping
Date: Thu, 29 Sep 2022 10:39:39 +0200	[thread overview]
Message-ID: <20220929083939.ioytch563qikyflz@quack3> (raw)
In-Reply-To: <9f68731-e699-5679-6a71-77634767b8dd@google.com>

On Tue 27-09-22 21:07:46, Hugh Dickins wrote:
> Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
> is a big improvement: without it, I had to revert to before commit
> 040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
> to avoid the high system time and freezes which that had introduced.
> 
> Now okay on the NVME laptop, but 4acb83417cad is a disaster for heavy
> swapping (kernel builds in low memory) on another: soon locking up in
> sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
> around with waitqueue_active() but wait_cnt 0 .  Here is a backtrace,
> showing the common pattern of outer sbitmap_queue_wake_up() interrupted
> before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
> are idle, in other cases they're spinning for a lock in dd_bio_merge()):
> 
> sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
> __blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
> scsi_end_request < scsi_io_completion < scsi_finish_command <
> scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
> __irq_exit_rcu < irq_exit_rcu < common_interrupt < asm_common_interrupt <
> _raw_spin_unlock_irqrestore < __wake_up_common_lock < __wake_up <
> sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
> __blk_mq_free_request < blk_mq_free_request < dd_bio_merge <
> blk_mq_sched_bio_merge < blk_mq_attempt_bio_merge < blk_mq_submit_bio <
> __submit_bio < submit_bio_noacct_nocheck < submit_bio_noacct <
> submit_bio < __swap_writepage < swap_writepage < pageout <
> shrink_folio_list < evict_folios < lru_gen_shrink_lruvec <
> shrink_lruvec < shrink_node < do_try_to_free_pages < try_to_free_pages <
> __alloc_pages_slowpath < __alloc_pages < folio_alloc < vma_alloc_folio <
> do_anonymous_page < __handle_mm_fault < handle_mm_fault <
> do_user_addr_fault < exc_page_fault < asm_exc_page_fault
> 
> See how the process-context sbitmap_queue_wake_up() has been interrupted,
> after bringing wait_cnt down to 0 (and in this example, after doing its
> wakeups), before advancing wake_index and refilling wake_cnt: an
> interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.
> 
> I have almost no grasp of all the possible sbitmap races, and their
> consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0,
> so it is better if sbq_wake_ptr() skips on to the next ws in that case:
> which fixes the lockup and shows no adverse consequence for me.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Perhaps we could add a note here like: "The check for wait_cnt being 0 is
obviously racy and ultimately can lead to lost wakeups for example when
there is only single waitqueue with waiters. However in these cases lost
wakeups are unlikely to matter and proper fix requires redesign (and
benchmarking) of batched wakeup code so let's plug the hole with this band
aid for now."

Otherwise feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> v2: - v1 to __sbq_wake_up() broke out when this happens, but
>       v2 to sbq_wake_ptr() does better by skipping on to the next.
>     - added more comment and deleted dubious Fixes attribution.
>     - and apologies to Mr Axboe and all for my axbod typo
> 
>  lib/sbitmap.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -587,7 +587,7 @@ static struct sbq_wait_state *sbq_wake_p
>  	for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
>  		struct sbq_wait_state *ws = &sbq->ws[wake_index];
>  
> -		if (waitqueue_active(&ws->wait)) {
> +		if (waitqueue_active(&ws->wait) && atomic_read(&ws->wait_cnt)) {
>  			if (wake_index != atomic_read(&sbq->wake_index))
>  				atomic_set(&sbq->wake_index, wake_index);
>  			return ws;
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2022-09-29  8:39 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-18 21:10 [PATCH next] sbitmap: fix lockup while swapping Hugh Dickins
2022-09-19 21:22 ` Keith Busch
2022-09-19 23:01   ` Hugh Dickins
2022-09-21 16:40     ` Jan Kara
2022-09-23 14:43       ` Jan Kara
2022-09-23 15:13         ` Keith Busch
2022-09-23 16:16         ` Hugh Dickins
2022-09-23 19:07           ` Keith Busch
2022-09-23 21:29             ` Hugh Dickins
2022-09-23 23:15               ` Hugh Dickins
2022-09-26 11:44                 ` Jan Kara
2022-09-26 14:08                   ` Yu Kuai
2022-09-27  3:39                   ` Hugh Dickins
2022-09-27 10:31                     ` Jan Kara
2022-09-28  3:56                       ` Hugh Dickins
2022-09-28  3:59                         ` [PATCH next v2] " Hugh Dickins
2022-09-28  4:07                           ` Hugh Dickins
2022-09-29  8:39                             ` Jan Kara [this message]
2022-09-29 19:50                               ` [PATCH next v3] " Hugh Dickins
2022-09-29 19:56                                 ` Keith Busch
2022-09-29 23:58                                 ` Jens Axboe
     [not found]               ` <20220924023047.1410-1-hdanton@sina.com>
2022-09-27  4:02                 ` [PATCH next] " Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220929083939.ioytch563qikyflz@quack3 \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liusong@linux.alibaba.com \
    --cc=yukuai1@huaweicloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.