linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-mm@kvack.org,
	"Luis Claudio R. Goncalves" <lgoncalv@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Michal Hocko <mhocko@suse.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Thu, 22 Jun 2023 17:04:43 +0200	[thread overview]
Message-ID: <ZJRjC0e_nmXQRirr@alley> (raw)
In-Reply-To: <ZJRWm0beSphxKsKA@alley>

On Thu 2023-06-22 16:11:41, Petr Mladek wrote:
> On Thu 2023-06-22 22:36:27, Tetsuo Handa wrote:
> > On 2023/06/22 8:24, Tetsuo Handa wrote:
> > > By the way, given
> > > 
> > >   write_seqlock_irqsave(&zonelist_update_seq, flags);
> > >   <<IRQ>>
> > >     some_timer_function() {
> > >       kmalloc(GFP_ATOMIC);
> > >     }
> > >   <</IRQ>>
> > >   printk_deferred_enter();
> > > 
> > > scenario in CONFIG_PREEMPT_RT=y case is handled by executing some_timer_function()
> > > on a dedicated kernel thread for IRQs, what guarantees that the kernel thread for
> > > IRQs gives up CPU and the user thread which called write_seqlock() gains CPU until
> > > write_sequnlock() is called? How can the kernel figure out that executing the user
> > > thread needs higher priority than the kernel thread?

My understanding is that this is achieved by spin_lock_irqsave(&sl->lock, flags).
When RT is enabled then	rt_spin_lock(lock) is used.

AFAIK, rt_spin_lock(lock) fulfills exactly the above requirements.
The owner could schedule. The waiter could schedule as well so that
they could be running on the same CPU. Also the current owner gets
higher priority when the is a waiter with the higher priority to avoid
the priority inversion.

> > I haven't got response on this question.
> > 
> > Several years ago, I demonstrated that a SCHED_IDLE priority userspace thread holding
> > oom_lock causes other concurrently allocating !SCHED_IDLE priority threads to
> > misunderstand that mutex_trylock(&oom_lock) failure implies we are making forward
> > progress (despite the SCHED_IDLE priority userspace thread was unable to wake up for
> > minutes).
> > 
> > If a SCHED_IDLE priority thread which called write_seqlock_irqsave() is preempted by
> > some other !SCHED_IDLE priority threads (especially realtime priority threads), and
> > such !SCHED_IDLE priority thread calls kmalloc(GFP_ATOMIC) or printk(), a similar thing
> > (misunderstand that spinning on read_seqbegin() from zonelist_iter_begin() can make
> > forward progress despite a thread which called write_seqlock_irqsave() cannot make
> > progress due to preemption) can happen.
> > 
> > Question to Sebastian:
> > To make sure that such thing cannot happen, we should make sure that
> > a thread which entered write_seqcount_begin(&zonelist_update_seq.seqcount) from 
> > write_seqlock_irqsave(&zonelist_update_seq, flags) can continue using CPU until
> > write_seqcount_end(&zonelist_update_seq.seqcount) from
> > write_seqlock_irqrestore(&zonelist_update_seq, flags).
> > Does adding preempt_disable() before write_seqlock(&zonelist_update_seq, flags) help?
> > 
> > Question to Peter:
> > Even if local_irq_save(flags) disables IRQ, NMI context can enqueue message via printk().
> > When does the message enqueued from NMI context gets printed?
> 
> They are flushed to the console either by irq_work or by another
> printk(). The irq_work could not be proceed when IRQs are disabled.
> But another non-deferred printk() would try to flush them immediately.
> 
> > If there is a possibility
> > that the message enqueued from NMI context gets printed between
> > "write_seqlock_irqsave(&zonelist_update_seq, flags) and printk_deferred_enter()" or
> > "printk_deferred_exit() and write_sequnlock_irqrestore(&zonelist_update_seq, flags)" ?
> > If yes, we can't increment zonelist_update_seq.seqcount before printk_deferred_enter()...
> 
> It might happen when a printk() is called in these holes.

I believe that this hole is the only remaining problem. IMHO, the only
solution is to disable migration/preemtion in printk_deferred_enter().

It is not a big deal, really. __rt_spin_lock() would call
migrate_disable() anyway. And nested migrate_disable() is fast.
It just increments the counter when it is not 0.

I would suggest to do this change in printk_deferred_enter() first.
It will allows to call it before write_seqlock_irqsave(). And we
will not need to change the ordering back and forth.

The result would look like:

in kernel/linux/printk.h:

static inline void printk_deferred_enter(void)
{
	if (!defined(CONFIG_PREEMPT_RT))
		preempt_disable();
	else
		migrate_disable();

	__printk_safe_enter();
}

in mm/page_alloc.c

	printk_deferred_enter();
	write_seqlock_irqsafe(&zonelist_update_seq, flags);


Best Regards,
Petr


  parent reply	other threads:[~2023-06-22 15:05 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-21 10:40 [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save() Sebastian Andrzej Siewior
2023-06-21 10:59 ` Michal Hocko
2023-06-21 11:16   ` Sebastian Andrzej Siewior
2023-06-21 11:49     ` Michal Hocko
2023-06-21 13:11       ` Sebastian Andrzej Siewior
2023-06-21 13:22         ` Michal Hocko
2023-06-21 13:25           ` Sebastian Andrzej Siewior
2023-06-21 11:14 ` David Hildenbrand
2023-06-21 11:33 ` Tetsuo Handa
2023-06-21 12:40   ` Petr Mladek
2023-06-21 13:08     ` Sebastian Andrzej Siewior
2023-06-21 13:06   ` Sebastian Andrzej Siewior
2023-06-21 13:32     ` Tetsuo Handa
2023-06-21 14:34       ` Sebastian Andrzej Siewior
2023-06-21 14:50         ` Tetsuo Handa
2023-06-21 23:24           ` Tetsuo Handa
2023-06-22  7:18             ` Michal Hocko
2023-06-22 10:58               ` Tetsuo Handa
2023-06-22 12:09                 ` Michal Hocko
2023-06-22 13:36             ` Tetsuo Handa
2023-06-22 14:11               ` Petr Mladek
2023-06-22 14:28                 ` Tetsuo Handa
2023-06-23  9:35                   ` Sebastian Andrzej Siewior
2023-06-22 15:04                 ` Petr Mladek [this message]
2023-06-22 15:43                   ` Tetsuo Handa
2023-06-23  9:45                     ` Sebastian Andrzej Siewior
2023-06-23  9:51                       ` Tetsuo Handa
2023-06-23 10:11                         ` Sebastian Andrzej Siewior
2023-06-23 10:36                           ` Tetsuo Handa
2023-06-23 12:44                             ` Sebastian Andrzej Siewior
2023-06-23 12:57                               ` Michal Hocko
2023-06-23 10:53                           ` Petr Mladek
2023-06-23 11:16                             ` Tetsuo Handa
2023-06-23 13:31                             ` Sebastian Andrzej Siewior
2023-06-23 15:38                               ` Petr Mladek
2023-06-23 16:04                                 ` Sebastian Andrzej Siewior
2023-06-23  9:31               ` Sebastian Andrzej Siewior
2023-06-23  7:27           ` Sebastian Andrzej Siewior
2023-06-21 15:38         ` Petr Mladek
2023-06-23  8:12           ` Sebastian Andrzej Siewior
2023-06-23  9:21             ` Michal Hocko
2023-06-23  9:58               ` Sebastian Andrzej Siewior
2023-06-23 10:43                 ` Michal Hocko
2023-06-23 10:45                 ` Sebastian Andrzej Siewior
2023-06-23 10:50                   ` Sebastian Andrzej Siewior
2023-06-23 11:32                   ` Michal Hocko
2023-06-23 10:40             ` Petr Mladek
2023-06-23 13:24               ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJRjC0e_nmXQRirr@alley \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=lgoncalv@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).