From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>
Subject: Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full
Date: Wed, 20 Mar 2024 17:15:48 +0100
Message-ID: <ZfsLtMijRrNZfqh6@localhost.localdomain>
In-Reply-To: <1b5752c8-ef32-4ed4-b539-95d507ec99ce@paulmck-laptop>

On Wed, Mar 20, 2024 at 04:14:24AM -0700, Paul E. McKenney wrote:
> On Tue, Mar 19, 2024 at 02:18:00AM -0700, Paul E. McKenney wrote:
> > On Tue, Mar 19, 2024 at 12:07:29AM +0100, Frederic Weisbecker wrote:
> > > While running in nohz_full mode, a task may enqueue a timer while the
> > > tick is stopped. However the only places where the timer wheel,
> > > alongside the timer migration machinery's decision, may reprogram the
> > > next event accordingly with that new timer's expiry are the idle loop or
> > > any IRQ tail.
> > > 
> > > However neither the idle task nor an interrupt may run on the CPU if it
> > > resumes busy work in userspace for a long while in full dynticks mode.
> > > 
> > > To solve this, the timer enqueue path raises a self-IPI that will
> > > re-evaluate the timer wheel on its IRQ tail. This asynchronous solution
> > > avoids potential locking inversion.
> > > 
> > > This is supposed to happen both for local and global timers but commit:
> > > 
> > > 	b2cf7507e186 ("timers: Always queue timers on the local CPU")
> > > 
> > > broke the global timers case with removing the ->is_idle field handling
> > > for the global base. As a result, global timers enqueue may go unnoticed
> > > in nohz_full.
> > > 
> > > Fix this with restoring the idle tracking of the global timer's base,
> > > allowing self-IPIs again on enqueue time.
> > 
> > Testing with the previous patch (1/2 in this series) reduced the number of
> > problems by about an order of magnitude, down to two sched_tick_remote()
> > instances and one enqueue_hrtimer() instance, very good!
> > 
> > I have kicked off a test including this patch.  Here is hoping!  ;-)
> 
> And 22*100 hours of TREE07 got me one run with a sched_tick_remote()
> complaint and another run with a starved RCU grace-period kthread.
> So this is definitely getting more reliable, but still a little ways
> to go.

Right, there is clearly something else. Investigation continues...

Thread overview: 28+ messages
2024-03-18 23:07 [PATCH 0/2] timers: More fixes Frederic Weisbecker
2024-03-18 23:07 ` [PATCH 1/2] timers/migration: Fix endless timer requeue after idle interrupts Frederic Weisbecker
2024-03-21 11:24   ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker
2024-03-18 23:07 ` [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full Frederic Weisbecker
2024-03-19  9:18   ` Paul E. McKenney
2024-03-20 11:14     ` Paul E. McKenney
2024-03-20 16:15       ` Frederic Weisbecker [this message]
2024-03-20 22:55         ` Paul E. McKenney
2024-03-21 11:42           ` Frederic Weisbecker
2024-03-21 12:47             ` Paul E. McKenney
2024-03-22 11:32               ` Frederic Weisbecker
2024-03-22 13:22                 ` for_each_domain()/sched_domain_span() has offline CPUs (was Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full) Frederic Weisbecker
2024-03-26 16:46                   ` Valentin Schneider
2024-03-27 12:42                     ` Frederic Weisbecker
2024-03-27 14:28                       ` Valentin Schneider
2024-03-28 14:08                         ` Valentin Schneider
2024-03-28 16:58                           ` Frederic Weisbecker
2024-03-28 20:31                             ` Valentin Schneider
2024-03-27 20:42                     ` Thomas Gleixner
2024-03-28 20:39                       ` Valentin Schneider
2024-03-29  2:08                         ` Tejun Heo
2024-03-29 17:06                           ` Waiman Long
2024-04-01 21:26               ` [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full Paul E. McKenney
2024-04-01 21:56                 ` Frederic Weisbecker
2024-04-02  0:04                   ` Paul E. McKenney
2024-04-02 16:47                     ` Paul E. McKenney
2024-04-03 18:05                       ` Paul E. McKenney
2024-03-21 11:24   ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker
