All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>
Subject: Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full
Date: Thu, 21 Mar 2024 12:42:09 +0100	[thread overview]
Message-ID: <ZfwdEROGFFmIbkCM@lothringen> (raw)
In-Reply-To: <6a95b6ac-6681-4492-b155-e30c19bb3341@paulmck-laptop>

On Wed, Mar 20, 2024 at 03:55:17PM -0700, Paul E. McKenney wrote:
> On Wed, Mar 20, 2024 at 05:15:48PM +0100, Frederic Weisbecker wrote:
> > Le Wed, Mar 20, 2024 at 04:14:24AM -0700, Paul E. McKenney a écrit :
> > > On Tue, Mar 19, 2024 at 02:18:00AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Mar 19, 2024 at 12:07:29AM +0100, Frederic Weisbecker wrote:
> > > > > While running in nohz_full mode, a task may enqueue a timer while the
> > > > > tick is stopped. However the only places where the timer wheel,
> > > > > alongside the timer migration machinery's decision, may reprogram the
> > > > > next event accordingly with that new timer's expiry are the idle loop or
> > > > > any IRQ tail.
> > > > > 
> > > > > However neither the idle task nor an interrupt may run on the CPU if it
> > > > > resumes busy work in userspace for a long while in full dynticks mode.
> > > > > 
> > > > > To solve this, the timer enqueue path raises a self-IPI that will
> > > > > re-evaluate the timer wheel on its IRQ tail. This asynchronous solution
> > > > > avoids potential locking inversion.
> > > > > 
> > > > > This is supposed to happen both for local and global timers but commit:
> > > > > 
> > > > > 	b2cf7507e186 ("timers: Always queue timers on the local CPU")
> > > > > 
> > > > > broke the global timers case with removing the ->is_idle field handling
> > > > > for the global base. As a result, global timers enqueue may go unnoticed
> > > > > in nohz_full.
> > > > > 
> > > > > Fix this with restoring the idle tracking of the global timer's base,
> > > > > allowing self-IPIs again on enqueue time.
> > > > 
> > > > Testing with the previous patch (1/2 in this series) reduced the number of
> > > > problems by about an order of magnitude, down to two sched_tick_remote()
> > > > instances and one enqueue_hrtimer() instance, very good!
> > > > 
> > > > I have kicked off a test including this patch.  Here is hoping!  ;-)
> > > 
> > > And 22*100 hours of TREE07 got me one run with a sched_tick_remote()
> 
> Sigh.  s/22/12/
> 
> > > complaint and another run with a starved RCU grace-period kthread.
> > > So this is definitely getting more reliable, but still a little ways
> > > to go.
> 
> An additional eight hours got anohtre sched_tick_remote().
> 
> > Right, there is clearly something else. Investigation continues...
> 
> Please let me know if there is a debug patch that I could apply.

So there are three things:

_ The sched_tick_remote() warning. I can easily trigger this one and while
  trying a bisection, I realize it actually also triggers on v6.8
  I'm not really tempted to investigate further because the warning doesn't make
  much sense to me. This computes the delta between the time the kworker got
  scheduled and the time it reaches the middle of the work. It happens to be
  ~3s but isn't it something to be expected with jitter and all involved into
  rcutorture?

  We should probably just remove this warning. This remote tick is my most horrible
  hackery anyway.

_ The hrtimer enqueue to an offline CPU warning:

	[ 1054.265335] WARNING: CPU: 8 PID: 150 at kernel/time/hrtimer.c:1091 enqueue_hrtimer+0x6f/0x80
	[ 1054.269166] Modules linked in:
	[ 1054.270077] CPU: 8 PID: 150 Comm: rcu_torture_rea Not tainted 6.8.0-11407-ge990136580ab-dirty #21
	[ 1054.272768] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
	[ 1054.275893] RIP: 0010:enqueue_hrtimer+0x6f/0x80
	[ 1054.277252] Code: a3 05 b5 ff b7 01 73 bd 48 8b 05 44 c5 b5 01 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 7b b5 ff ff 48 8b 03 f6 40 10 10 75 a5 90 <0f> 0b 90 eb 9f 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
	[ 1054.282448] RSP: 0000:ffffb1480056fd70 EFLAGS: 00010046
	[ 1054.283931] RAX: ffff95b7df2616c0 RBX: ffff95b7df261700 RCX: ffff95b7df2616c0
	[ 1054.286091] RDX: 0000000000000001 RSI: ffff95b7df261700 RDI: ffffb1480056fde0
	[ 1054.288095] RBP: ffffb1480056fde0 R08: 0000000000000001 R09: 000000000000fc03
	[ 1054.290189] R10: 0000000000000001 R11: ffff95b7c1271c80 R12: 0000000000000040
	[ 1054.292592] R13: ffff95b7df261700 R14: ffff95b7df261700 R15: ffff95b7df2616c0
	[ 1054.294622] FS:  0000000000000000(0000) GS:ffff95b7df400000(0000) knlGS:0000000000000000
	[ 1054.296884] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 1054.298497] CR2: 0000000000000000 CR3: 000000001822c000 CR4: 00000000000006f0
	[ 1054.300475] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	[ 1054.302516] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	[ 1054.304580] Call Trace:
	[ 1054.305354]  <TASK>
	[ 1054.306009]  ? __warn+0x77/0x120
	[ 1054.306973]  ? enqueue_hrtimer+0x6f/0x80
	[ 1054.308177]  ? report_bug+0xf1/0x1d0
	[ 1054.310200]  ? handle_bug+0x40/0x70
	[ 1054.311252]  ? exc_invalid_op+0x13/0x70
	[ 1054.312437]  ? asm_exc_invalid_op+0x16/0x20
	[ 1054.313738]  ? enqueue_hrtimer+0x6f/0x80
	[ 1054.314994]  hrtimer_start_range_ns+0x258/0x2f0
	[ 1054.316531]  ? __pfx_rcu_torture_reader+0x10/0x10
	[ 1054.317983]  schedule_hrtimeout_range_clock+0x95/0x120
	[ 1054.319604]  ? __pfx_hrtimer_wakeup+0x10/0x10
	[ 1054.320957]  torture_hrtimeout_ns+0x50/0x70
	[ 1054.322211]  rcu_torture_reader+0x1be/0x280
	[ 1054.323455]  ? __pfx_rcu_torture_timer+0x10/0x10
	[ 1054.329888]  ? kthread+0xc4/0xf0
	[ 1054.330880]  ? __pfx_rcu_torture_reader+0x10/0x10
	[ 1054.332334]  kthread+0xc4/0xf0
	[ 1054.333305]  ? __pfx_kthread+0x10/0x10
	[ 1054.334466]  ret_from_fork+0x2b/0x40
	[ 1054.335551]  ? __pfx_kthread+0x10/0x10
	[ 1054.336690]  ret_from_fork_asm+0x1a/0x30
	[ 1054.337878]  </TASK>

  I don't know how it happened. This is where I'm investigating now.

_ The RCU CPU Stall report. I strongly suspect the cause is the hrtimer
  enqueue to an offline CPU. Let's solve that and we'll see if it still
  triggers.

  reply	other threads:[~2024-03-21 11:42 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-18 23:07 [PATCH 0/2] timers: More fixes Frederic Weisbecker
2024-03-18 23:07 ` [PATCH 1/2] timers/migration: Fix endless timer requeue after idle interrupts Frederic Weisbecker
2024-03-21 11:24   ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker
2024-03-18 23:07 ` [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full Frederic Weisbecker
2024-03-19  9:18   ` Paul E. McKenney
2024-03-20 11:14     ` Paul E. McKenney
2024-03-20 16:15       ` Frederic Weisbecker
2024-03-20 22:55         ` Paul E. McKenney
2024-03-21 11:42           ` Frederic Weisbecker [this message]
2024-03-21 12:47             ` Paul E. McKenney
2024-03-22 11:32               ` Frederic Weisbecker
2024-03-22 13:22                 ` for_each_domain()/sched_domain_span() has offline CPUs (was Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full) Frederic Weisbecker
2024-03-26 16:46                   ` Valentin Schneider
2024-03-27 12:42                     ` Frederic Weisbecker
2024-03-27 14:28                       ` Valentin Schneider
2024-03-28 14:08                         ` Valentin Schneider
2024-03-28 16:58                           ` Frederic Weisbecker
2024-03-28 20:31                             ` Valentin Schneider
2024-03-27 20:42                     ` Thomas Gleixner
2024-03-28 20:39                       ` Valentin Schneider
2024-03-29  2:08                         ` Tejun Heo
2024-03-29 17:06                           ` Waiman Long
2024-04-01 21:26               ` [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full Paul E. McKenney
2024-04-01 21:56                 ` Frederic Weisbecker
2024-04-02  0:04                   ` Paul E. McKenney
2024-04-02 16:47                     ` Paul E. McKenney
2024-04-03 18:05                       ` Paul E. McKenney
2024-03-21 11:24   ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfwdEROGFFmIbkCM@lothringen \
    --to=frederic@kernel.org \
    --cc=anna-maria@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.