linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anna-Maria Behnsen <anna-maria@linutronix.de>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Eric Dumazet <edumazet@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Arjan van de Ven <arjan@infradead.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Rik van Riel <riel@surriel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Siewior <bigeasy@linutronix.de>,
	Giovanni Gherdovich <ggherdovich@suse.cz>,
	Lukasz Luba <lukasz.luba@arm.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
	Srinivas Pandruvada <srinivas.pandruvada@intel.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model
Date: Tue, 30 Jan 2024 18:56:32 +0100	[thread overview]
Message-ID: <87v87a9033.fsf@somnus> (raw)
In-Reply-To: <Zbb5m0hRHgk59-8z@pavilion.home>

Frederic Weisbecker <frederic@kernel.org> writes:

> Le Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen a écrit :
>> +int tmigr_requires_handle_remote(void)
>> +{
>> +	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
>> +	struct tmigr_remote_data data;
>> +	unsigned int ret = 0;
>> +	unsigned long jif;
>> +
>> +	if (tmigr_is_not_available(tmc))
>> +		return ret;
>> +
>> +	data.now = get_jiffies_update(&jif);
>> +	data.childmask = tmc->childmask;
>> +	data.firstexp = KTIME_MAX;
>> +	data.tmc_active = !tmc->idle;
>> +	data.check = false;
>> +
>> +	/*
>> +	 * If the CPU is active, walk the hierarchy to check whether a remote
>> +	 * expiry is required.
>> +	 *
>> +	 * Check is done lockless as interrupts are disabled and @tmc->idle is
>> +	 * set only by the local CPU.
>> +	 */
>> +	if (!tmc->idle) {
>> +		__walk_groups(&tmigr_requires_handle_remote_up, &data, tmc);
>> +
>> +		if (data.firstexp != KTIME_MAX)
>> +			ret = 1;
>> +
>> +		return ret;
>> +	}
>> +
>> +	/*
>> +	 * If the CPU is idle, check whether the recalculation of @tmc->wakeup
>> +	 * is required. @tmc->wakeup_recalc is set, when the last active CPU
>> +	 * went offline. The last active CPU delegated the handling of the timer
>> +	 * migration hierarchy to another (this) CPU by updating this flag and
>> +	 * sending a reschedule.
>> +	 *
>> +	 * Racy lockless check is valid:
>> +	 * - @tmc->wakeup_recalc is set by the remote CPU before it issues
>> +	 *   reschedule IPI.
>> +	 * - As interrupts are disabled here this CPU will either observe
>> +	 *   @tmc->wakeup_recalc set before the reschedule IPI can be handled or
>> +	 *   it will observe it when this function is called again on return
>> +	 *   from handling the reschedule IPI.
>> +	 */
>> +	if (tmc->wakeup_recalc) {
>> +		__walk_groups(&tmigr_requires_handle_remote_up, &data, tmc);
>> +
>> +		if (data.firstexp != KTIME_MAX)
>> +			ret = 1;
>> +
>> +		raw_spin_lock(&tmc->lock);
>> +		WRITE_ONCE(tmc->wakeup, data.firstexp);
>> +		tmc->wakeup_recalc = false;
>> +		raw_spin_unlock(&tmc->lock);
>
> Suppose we have:
>
>             [GRP1:0]
>             migrator = GRP0:1
>             active   = GRP0:0, GRP0:1
>               /                \
>      [GRP0:0]                  [GRP0:1]
>      migrator = CPU 1         migrator = CPU 3
>      active   = CPU 1         active   = CPU 3
>        /         \               /         \
> CPUs  0           1             2           3
>      idle        active        idle        active
>
> CPU 0 and CPU 2 have no timer.
> CPU 1 has a timer in a few millisecs.
>
>             [GRP1:0]
>             migrator = GRP0:1
>             active   = GRP0:1
>               /                \
>      [GRP0:0]                  [GRP0:1]
>      migrator = NONE           migrator = CPU 3
>      active   = NONE           active   = CPU 3
>        /         \                /         \
> CPUs  0           1              2           3
>      idle        idle         idle        active
>
>
> CPU 1 went idle, CPU 3 will take care of CPU 1's timer. Then come two
> things happening at the same time: CPU 0 has a timer interrupt, due to
> RCU callbacks handling for example, and CPU 3 goes offline:
>
> CPU 0                                   CPU 3
> -----                                   -----
>                                         // On top level [GRP1:0], just set migrator = TMIGR_NONE
>                                         tmigr_inactive_up() {
>                                             cpu = cpumask_any_but(cpu_online_mask, cpu);
>                                             //cpu == 0
>                                             tmc_resched = per_cpu_ptr(&tmigr_cpu, CPU 0);
>                                             raw_spin_lock(&tmc_resched->lock);
>                                             tmc_resched->wakeup_recalc = true;
>                                             raw_spin_unlock(&tmc_resched->lock);
> // timer interrupt
> run_local_timers() {
>     tmigr_requires_handle_remote() {
>         data.firstexp = KTIME_MAX;
>         // CPU 0 sees the tmc_resched->wakeup_recalc
>         // latest update
>         if (tmc->wakeup_recalc) {
>             tmigr_requires_handle_remote_up() {
>                 // CPU 0 doesn't see GRP0:0 
>                 // latest update from CPU 1,
>                 // because it has no locking
>                 // and does a racy check.
>         	    if (!tmigr_check_migrator(group, childmask))
>                     return true;
>             }
>             raw_spin_lock(&tmc->lock);
>             WRITE_ONCE(tmc->wakeup, data.firstexp);
>             tmc->wakeup_recalc = false;
>             raw_spin_unlock(&tmc->lock)
>             return 0;
>         }
>                                             // IPI is sent only now
> 		                                    smp_send_reschedule(cpu);
>                                             }
>
>
> There is nothing that prevents CPU 0 from not seeing the hierarchy updates from
> other CPUs since it checks the migrators in a racy way. As a result the timer of
> CPU 1 may be ignored by CPU 0.
>
> You'd need to lock the tmc while calling tmigr_requires_handle_remote_up(), so
> that CPU 0 "inherits" everything that CPU 3 has seen, and that includes changes
> from CPU 1.
>

puhh. ok. But for the !idle case the lockless walk of
tmigr_requires_handle_remote_up() is ok? It's also possible, that the
CPU misses an update of the state - another CPU goes idle and selects
this CPU as the new migrator. And this CPU reads a stale value where the
other CPU is migrator. But this will be revisited on the next
tick. hmm...

>
> But I see that tmigr_cpu_new_timer() does it right. Wouldn't it be possible to
> exlusively let tmigr_cpu_new_timer() handle the wakeup_recalc thing? This is
> going to be called after the end of the IRQ (whether timer interrupt or sched
> IPI) in any case.

Should work, yes. But when a timer has to be handled right away and it
is checked after the end of the IRQ, then the tick might be reprogrammed
so that CPU comes out of idle, or am I wrong?

But, while I'm having a deeper look at the code - I completely destroyed
the logic to use the 'check' value of the tmigr_remote_date struct for
making a decision whether to raise a timer softirq or not... Whenever
the expiry in the top level is !KTIME_MAX, then
tmigr_require_handle_remote returns 1. Oh no. Also the struct member
description is not up to date.

Thanks,

	Anna-Maria

  reply	other threads:[~2024-01-30 17:56 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-15 14:37 [PATCH v10 00/20] timers: Move from a push remote at enqueue to a pull at expiry model Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 01/20] timers: Restructure get_next_timer_interrupt() Anna-Maria Behnsen
2024-01-17 15:01   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 02/20] timers: Split out get next timer interrupt Anna-Maria Behnsen
2024-01-17 15:06   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 03/20] timers: Move marking timer bases idle into tick_nohz_stop_tick() Anna-Maria Behnsen
2024-01-17 16:02   ` Frederic Weisbecker
2024-01-22 11:45     ` Anna-Maria Behnsen
2024-01-22 21:49       ` Frederic Weisbecker
2024-02-19  8:52   ` [PATCH v10a] " Anna-Maria Behnsen
2024-02-19 22:37     ` Frederic Weisbecker
2024-02-20 10:48       ` Anna-Maria Behnsen
2024-02-20 11:41         ` Frederic Weisbecker
2024-02-20 12:02           ` Anna-Maria Behnsen
2024-02-20 12:34             ` Frederic Weisbecker
2024-02-20 14:00               ` Anna-Maria Behnsen
2024-02-20 15:10                 ` Frederic Weisbecker
2024-02-20 15:23                   ` Anna-Maria Behnsen
2024-02-20 15:25                     ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 04/20] timers: Optimization for timer_base_try_to_set_idle() Anna-Maria Behnsen
2024-01-17 16:45   ` Frederic Weisbecker
2024-01-22 11:48     ` Anna-Maria Behnsen
2024-01-22 22:22   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 05/20] timers: Introduce add_timer() variants which modify timer flags Anna-Maria Behnsen
2024-01-17 17:01   ` Frederic Weisbecker
2024-01-22 11:50     ` Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 06/20] workqueue: Use global variant for add_timer() Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 07/20] timers: add_timer_on(): Make sure TIMER_PINNED flag is set Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 08/20] timers: Ease code in run_local_timers() Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 09/20] timers: Split next timer interrupt logic Anna-Maria Behnsen
2024-01-23 14:28   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 10/20] timers: Keep the pinned timers separate from the others Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 11/20] timers: Retrieve next expiry of pinned/non-pinned timers separately Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 12/20] timers: Split out "get next timer interrupt" functionality Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 13/20] timers: Add get next timer interrupt functionality for remote CPUs Anna-Maria Behnsen
2024-02-19 16:04   ` Frederic Weisbecker
2024-02-19 16:57     ` Anna-Maria Behnsen
2024-01-15 14:37 ` [PATCH v10 14/20] timers: Restructure internal locking Anna-Maria Behnsen
2024-01-24 13:56   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 15/20] timers: Check if timers base is handled already Anna-Maria Behnsen
2024-01-24 14:22   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 16/20] tick/sched: Split out jiffies update helper function Anna-Maria Behnsen
2024-01-24 14:42   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 17/20] timers: Introduce function to check timer base is_idle flag Anna-Maria Behnsen
2024-01-24 14:52   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 18/20] timers: Implement the hierarchical pull model Anna-Maria Behnsen
2024-01-25 14:30   ` Frederic Weisbecker
2024-01-28 15:58     ` Anna-Maria Behnsen
2024-01-30 15:29       ` Frederic Weisbecker
2024-01-30 16:45         ` Anna-Maria Behnsen
2024-01-26 12:53   ` Frederic Weisbecker
2024-01-27 22:54   ` Frederic Weisbecker
2024-01-29 10:50     ` Anna-Maria Behnsen
2024-01-29 22:21       ` Frederic Weisbecker
2024-01-30 13:32         ` Anna-Maria Behnsen
2024-01-29 13:50     ` Paul E. McKenney
2024-01-29  1:04   ` Frederic Weisbecker
2024-01-30 17:56     ` Anna-Maria Behnsen [this message]
2024-01-30 21:13       ` Frederic Weisbecker
2024-01-31 11:19         ` Anna-Maria Behnsen
2024-01-30 15:37   ` Frederic Weisbecker
2024-02-01 14:59     ` Anna-Maria Behnsen
2024-02-01 15:05   ` Frederic Weisbecker
2024-02-01 16:15     ` Anna-Maria Behnsen
2024-02-01 17:43       ` Frederic Weisbecker
2024-02-01 20:52         ` Anna-Maria Behnsen
2024-02-05 13:29           ` Anna-Maria Behnsen
2024-02-05 20:30             ` Frederic Weisbecker
2024-02-06 10:06               ` Anna-Maria Behnsen
2024-02-06 10:29                 ` Frederic Weisbecker
2024-02-01 16:33   ` Frederic Weisbecker
2024-02-05 15:59     ` Anna-Maria Behnsen
2024-02-05 20:28       ` Frederic Weisbecker
2024-02-04 22:02   ` Frederic Weisbecker
2024-02-06 11:03     ` Anna-Maria Behnsen
2024-02-06 11:11       ` Frederic Weisbecker
2024-02-04 22:32   ` Frederic Weisbecker
2024-02-06 11:36     ` Anna-Maria Behnsen
2024-02-06 13:21       ` Frederic Weisbecker
2024-02-06 14:13         ` Anna-Maria Behnsen
2024-02-06 14:21           ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 19/20] timer_migration: Add tracepoints Anna-Maria Behnsen
2024-02-01 16:47   ` Frederic Weisbecker
2024-01-15 14:37 ` [PATCH v10 20/20] timers: Always queue timers on the local CPU Anna-Maria Behnsen
2024-02-01 17:36   ` Frederic Weisbecker
2024-02-01 20:58     ` Anna-Maria Behnsen
2024-02-02 11:57       ` Frederic Weisbecker
2024-01-30 22:07 ` [PATCH v10 00/20] timers: Move from a push remote at enqueue to a pull at expiry model Christian Loehle
2024-02-01 15:03   ` Anna-Maria Behnsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v87a9033.fsf@somnus \
    --to=anna-maria@linutronix.de \
    --cc=arjan@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=gautham.shenoy@amd.com \
    --cc=ggherdovich@suse.cz \
    --cc=jstultz@google.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@surriel.com \
    --cc=rostedt@goodmis.org \
    --cc=srinivas.pandruvada@intel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).