From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Boqun Feng <boqun.feng@gmail.com>, Ingo Molnar <mingo@redhat.com>,
John Ogness <john.ogness@linutronix.de>,
Mel Gorman <mgorman@techsingularity.net>,
Michal Hocko <mhocko@suse.com>,
Peter Zijlstra <peterz@infradead.org>,
Petr Mladek <pmladek@suse.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.co>, Will Deacon <will@kernel.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH v2 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Fri, 23 Jun 2023 19:12:32 +0200 [thread overview]
Message-ID: <20230623171232.892937-3-bigeasy@linutronix.de> (raw)
In-Reply-To: <20230623171232.892937-1-bigeasy@linutronix.de>
__build_all_zonelists() acquires zonelist_update_seq by first disabling
interrupts via local_irq_save() and then acquiring the seqlock with
write_seqlock(). This is troublesome and leads to problems on
PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping
lock on PREEMPT_RT and must not be acquired with disabled interrupts.
The API provides write_seqlock_irqsave() which does the right thing in
one step.
printk_deferred_enter() has to be invoked in non-migrate-able context to
ensure that deferred printing is enabled and disabled on the same CPU.
This is the case after zonelist_update_seq has been acquired.
There was discussion on the first submission that the order should be:
local_irq_disable();
printk_deferred_enter();
write_seqlock();
to avoid pitfalls like having an unaccounted printk() coming from
write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
only origin of such a printk() can be a lockdep splat because the
lockdep annotation happens after the sequence count is incremented.
This is exceptional and subject to change.
It was also pointed that PREEMPT_RT can be affected by the printk
problem since its write_seqlock_irqsave() does not really disable
interrupts. This isn't the case because PREEMPT_RT's printk
implementation differs from the mainline implementation in two important
aspects:
- Printing happens in a dedicated threads and not at during the
invocation of printk().
- In emergency cases where synchronous printing is used, a different
driver is used which does not use tty_port::lock.
Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
printk output.
Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
mm/page_alloc.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b7..99b7e7d09c5c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5808,11 +5808,10 @@ static void __build_all_zonelists(void *data)
unsigned long flags;
/*
- * Explicitly disable this CPU's interrupts before taking seqlock
- * to prevent any IRQ handler from calling into the page allocator
- * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ * The zonelist_update_seq must be acquired with irqsave because the
+ * reader can be invoked from IRQ with GFP_ATOMIC.
*/
- local_irq_save(flags);
+ write_seqlock_irqsave(&zonelist_update_seq, flags);
/*
* Explicitly disable this CPU's synchronous printk() before taking
* seqlock to prevent any printk() from trying to hold port->lock, for
@@ -5820,7 +5819,6 @@ static void __build_all_zonelists(void *data)
* calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
*/
printk_deferred_enter();
- write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
memset(node_load, 0, sizeof(node_load));
@@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data)
#endif
}
- write_sequnlock(&zonelist_update_seq);
printk_deferred_exit();
- local_irq_restore(flags);
+ write_sequnlock_irqrestore(&zonelist_update_seq, flags);
}
static noinline void __init
--
2.40.1
next prev parent reply other threads:[~2023-06-23 17:12 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-23 17:12 [PATCH v2 0/2] seqlock,mm: lockdep annotation + write_seqlock_irqsave() Sebastian Andrzej Siewior
2023-06-23 17:12 ` [PATCH v2 1/2] seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested() Sebastian Andrzej Siewior
2023-06-24 6:54 ` Tetsuo Handa
2023-06-26 8:12 ` Sebastian Andrzej Siewior
2023-06-26 9:25 ` Tetsuo Handa
2023-06-26 10:48 ` Peter Zijlstra
2023-06-26 11:26 ` Tetsuo Handa
2023-06-26 11:35 ` Michal Hocko
2023-06-26 12:27 ` Tetsuo Handa
2023-06-26 13:16 ` Michal Hocko
2023-06-26 12:46 ` Sebastian Andrzej Siewior
2023-06-26 13:13 ` Sebastian Andrzej Siewior
2023-06-26 14:44 ` Petr Mladek
2023-06-28 12:14 ` Tetsuo Handa
2023-07-27 15:10 ` Sebastian Andrzej Siewior
2023-07-29 5:31 ` Tetsuo Handa
2023-07-29 11:05 ` Tetsuo Handa
2023-07-31 14:25 ` Michal Hocko
2023-08-03 13:18 ` Tetsuo Handa
2023-08-03 14:49 ` Michal Hocko
2023-08-04 13:27 ` Tetsuo Handa
2023-08-07 8:20 ` Michal Hocko
2023-06-26 12:56 ` Mel Gorman
2023-06-23 17:12 ` Sebastian Andrzej Siewior [this message]
2023-06-23 18:17 ` [PATCH v2 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save() Michal Hocko
2023-06-23 20:15 ` [PATCH v3 " Sebastian Andrzej Siewior
2023-06-26 7:56 ` David Hildenbrand
2023-06-26 13:14 ` Mel Gorman
2023-06-28 13:56 ` Michal Hocko
2023-06-25 2:27 ` [PATCH v2 0/2] seqlock,mm: lockdep annotation + write_seqlock_irqsave() Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230623171232.892937-3-bigeasy@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=akpm@linux-foundation.org \
--cc=boqun.feng@gmail.com \
--cc=john.ogness@linutronix.de \
--cc=lgoncalv@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.co \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=mingo@redhat.com \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).