linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Luis Claudio R. Goncalves" <lgoncalv@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>, Ingo Molnar <mingo@redhat.com>,
	John Ogness <john.ogness@linutronix.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Petr Mladek <pmladek@suse.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v3 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Wed, 28 Jun 2023 15:56:20 +0200	[thread overview]
Message-ID: <ZJw8BFHj6YD2Tl6x@dhcp22.suse.cz> (raw)
In-Reply-To: <20230623201517.yw286Knb@linutronix.de>

Andrew, it seems that we have a consensus on the MM side of things that
this is good enough to go. I am not sure about patch 1, that is more on
lockdep people but I think that this patch is good enough on this own.
Can we get this patch merged into mm tree and see whether any of Tetsuo
concerns pop out?

On Fri 23-06-23 22:15:17, Sebastian Andrzej Siewior wrote:
> __build_all_zonelists() acquires zonelist_update_seq by first disabling
> interrupts via local_irq_save() and then acquiring the seqlock with
> write_seqlock(). This is troublesome and leads to problems on
> PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping
> lock on PREEMPT_RT and must not be acquired with disabled interrupts.
> 
> The API provides write_seqlock_irqsave() which does the right thing in
> one step.
> printk_deferred_enter() has to be invoked in non-migrate-able context to
> ensure that deferred printing is enabled and disabled on the same CPU.
> This is the case after zonelist_update_seq has been acquired.
> 
> There was discussion on the first submission that the order should be:
> 	local_irq_disable();
> 	printk_deferred_enter();
> 	write_seqlock();
> 
> to avoid pitfalls like having an unaccounted printk() coming from
> write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
> only origin of such a printk() can be a lockdep splat because the
> lockdep annotation happens after the sequence count is incremented.
> This is exceptional and subject to change.
> 
> It was also pointed that PREEMPT_RT can be affected by the printk
> problem since its write_seqlock_irqsave() does not really disable
> interrupts. This isn't the case because PREEMPT_RT's printk
> implementation differs from the mainline implementation in two important
> aspects:
> - Printing happens in a dedicated threads and not at during the
>   invocation of printk().
> - In emergency cases where synchronous printing is used, a different
>   driver is used which does not use tty_port::lock.
> 
> Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
> printk output.
> 
> Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> v2…v3
>   - Update comment as per Michal's suggestion.
> 
> v1…v2:
>   - Improve commit description
> 
>  mm/page_alloc.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 47421bedc12b7..440e9af67b48d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5808,19 +5808,17 @@ static void __build_all_zonelists(void *data)
>  	unsigned long flags;
>  
>  	/*
> -	 * Explicitly disable this CPU's interrupts before taking seqlock
> -	 * to prevent any IRQ handler from calling into the page allocator
> -	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
> +	 * The zonelist_update_seq must be acquired with irqsave because the
> +	 * reader can be invoked from IRQ with GFP_ATOMIC.
>  	 */
> -	local_irq_save(flags);
> +	write_seqlock_irqsave(&zonelist_update_seq, flags);
>  	/*
> -	 * Explicitly disable this CPU's synchronous printk() before taking
> -	 * seqlock to prevent any printk() from trying to hold port->lock, for
> +	 * Also disable synchronous printk() to prevent any printk() from
> +	 * trying to hold port->lock, for
>  	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
>  	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
>  	 */
>  	printk_deferred_enter();
> -	write_seqlock(&zonelist_update_seq);
>  
>  #ifdef CONFIG_NUMA
>  	memset(node_load, 0, sizeof(node_load));
> @@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data)
>  #endif
>  	}
>  
> -	write_sequnlock(&zonelist_update_seq);
>  	printk_deferred_exit();
> -	local_irq_restore(flags);
> +	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
>  }
>  
>  static noinline void __init
> -- 
> 2.40.1

-- 
Michal Hocko
SUSE Labs


  parent reply	other threads:[~2023-06-28 13:56 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-23 17:12 [PATCH v2 0/2] seqlock,mm: lockdep annotation + write_seqlock_irqsave() Sebastian Andrzej Siewior
2023-06-23 17:12 ` [PATCH v2 1/2] seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested() Sebastian Andrzej Siewior
2023-06-24  6:54   ` Tetsuo Handa
2023-06-26  8:12     ` Sebastian Andrzej Siewior
2023-06-26  9:25       ` Tetsuo Handa
2023-06-26 10:48         ` Peter Zijlstra
2023-06-26 11:26           ` Tetsuo Handa
2023-06-26 11:35             ` Michal Hocko
2023-06-26 12:27               ` Tetsuo Handa
2023-06-26 13:16                 ` Michal Hocko
2023-06-26 12:46               ` Sebastian Andrzej Siewior
2023-06-26 13:13           ` Sebastian Andrzej Siewior
2023-06-26 14:44       ` Petr Mladek
2023-06-28 12:14         ` Tetsuo Handa
2023-07-27 15:10           ` Sebastian Andrzej Siewior
2023-07-29  5:31             ` Tetsuo Handa
2023-07-29 11:05               ` Tetsuo Handa
2023-07-31 14:25                 ` Michal Hocko
2023-08-03 13:18                   ` Tetsuo Handa
2023-08-03 14:49                     ` Michal Hocko
2023-08-04 13:27                       ` Tetsuo Handa
2023-08-07  8:20                         ` Michal Hocko
2023-06-26 12:56   ` Mel Gorman
2023-06-23 17:12 ` [PATCH v2 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save() Sebastian Andrzej Siewior
2023-06-23 18:17   ` Michal Hocko
2023-06-23 20:15     ` [PATCH v3 " Sebastian Andrzej Siewior
2023-06-26  7:56       ` David Hildenbrand
2023-06-26 13:14       ` Mel Gorman
2023-06-28 13:56       ` Michal Hocko [this message]
2023-06-25  2:27 ` [PATCH v2 0/2] seqlock,mm: lockdep annotation + write_seqlock_irqsave() Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJw8BFHj6YD2Tl6x@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=john.ogness@linutronix.de \
    --cc=lgoncalv@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@redhat.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).