From: Valentin Schneider <vschneid@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Alex Shi <alexs@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Barry Song <song.bao.hua@hisilicon.com>
Subject: Re: for_each_domain()/sched_domain_span() has offline CPUs (was Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full)
Date: Thu, 28 Mar 2024 21:31:21 +0100
Message-ID: <xhsmhmsqiayjq.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
In-Reply-To: <ZgWhteHzLb8Jutp3@localhost.localdomain>

On 28/03/24 17:58, Frederic Weisbecker wrote:
> On Thu, Mar 28, 2024 at 03:08:08PM +0100, Valentin Schneider wrote:
>> On 27/03/24 15:28, Valentin Schneider wrote:
>> > On 27/03/24 13:42, Frederic Weisbecker wrote:
>> >> On Tue, Mar 26, 2024 at 05:46:07PM +0100, Valentin Schneider wrote:
>> >>> > Then with that patch I ran TREE07, just some short iterations:
>> >>> >
>> >>> > tools/testing/selftests/rcutorture/bin/kvm.sh --configs "10*TREE07" --allcpus --bootargs "rcutorture.onoff_interval=200" --duration 2
>> >>> >
>> >>> > And the warning triggers very quickly. At least since v6.3 but maybe since
>> >>> > earlier. Is this expected behaviour or am I right to assume that
>> >>> > for_each_domain()/sched_domain_span() shouldn't return an offline CPU?
>> >>> >
>> >>> 
>> >>> I would very much assume an offline CPU shouldn't show up in a
>> >>> sched_domain_span().
>> >>> 
>> >>> Now, on top of the above, there's one more thing worth noting:
>> >>>   cpu_up_down_serialize_trainwrecks()
>> >>> 
>> >>> This just flushes the cpuset work, so after that the sched_domain topology
>> >>> should be sane. However I see it's invoked at the tail end of _cpu_down(),
>> >>> IOW /after/ takedown_cpu() has run, which sounds too late. The comments
>> >>> around this vs. lock ordering aren't very reassuring however, so I need to
>> >>> look into this more.
>> >>
>> >> Ouch...
>> >>
>> >>> 
>> >>> Maybe as a "quick" test to see if this is the right culprit, you could try
>> >>> that with CONFIG_CPUSETS=n? Because in that case the sched_domain update is
>> >>> run within sched_cpu_deactivate().
>> >>
>> >> I just tried and I fear that doesn't help. It still triggers even without
>> >> cpusets :-s
>> >>
>> >
>> > What, you mean I can't always blame cgroups? What has the world come to?
>> >
>> > That's interesting, it means the deferred work item isn't the (only)
>> > issue. I'll grab your test patch and try to reproduce on TREE07.
>> >
>> 
>> Unfortunately I haven't been able to trigger your warning with ~20 runs of
>> TREE07 & CONFIG_CPUSETS=n, however it does trigger reliably with
>> CONFIG_CPUSETS=y, so I'm back to thinking the cpuset work is a likely
>> culprit...
>
> Funny, I just checked again and I can still reliably reproduce with:
>
> ./tools/testing/selftests/rcutorture/bin/kvm.sh --kconfig "CONFIG_CPUSETS=n CONFIG_PROC_PID_CPUSET=n" --configs "10*TREE07" --allcpus --bootargs "rcutorture.onoff_interval=200" --duration 2
>
> I'm thinking there might be several culprits... ;-)

Hmm, frustrating that I can't seem to reproduce this...

Could you run this with CONFIG_SCHED_DEBUG=y and sched_verbose on the
cmdline? It would also help to tweak the warning so it prints which CPU's
sched_domain we are scanning and which CPU in the span we found to be offline.
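
Something along these lines should do, I think (untested on my end; I'm just
assuming the extra --kconfig and --bootargs entries can be appended to your
invocation above):

  ./tools/testing/selftests/rcutorture/bin/kvm.sh --kconfig "CONFIG_CPUSETS=n CONFIG_PROC_PID_CPUSET=n CONFIG_SCHED_DEBUG=y" --configs "10*TREE07" --allcpus --bootargs "rcutorture.onoff_interval=200 sched_verbose" --duration 2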



