linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavan Kondeti <quic_pkondeti@quicinc.com>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Pavan Kondeti <quic_pkondeti@quicinc.com>,
	<linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Eric Dumazet <edumazet@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Arjan van de Ven <arjan@infradead.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	"Frederic Weisbecker" <fweisbec@gmail.com>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH v4 00/16] timer: Move from a push remote at enqueue to a pull at expiry model
Date: Wed, 9 Nov 2022 00:18:22 +0530	[thread overview]
Message-ID: <20221108184822.GB28920@hu-pkondeti-hyd.qualcomm.com> (raw)
In-Reply-To: <d1814f32-c8a2-c65f-836d-6c84878c4650@linutronix.de>

On Tue, Nov 08, 2022 at 06:39:22PM +0100, Anna-Maria Behnsen wrote:
> On Tue, 8 Nov 2022, Pavan Kondeti wrote:
> 
> > Hi Anna-Maria,
> > 
> > On Tue, Nov 08, 2022 at 04:06:15PM +0100, Anna-Maria Behnsen wrote:
> > > On Tue, 8 Nov 2022, Pavan Kondeti wrote:
> > > 
> > > > Hi Anna-Maria,
> > > > 
> > > > On Fri, Nov 04, 2022 at 03:57:21PM +0100, Anna-Maria Behnsen wrote:
> > > > > Next Steps
> > > > > ~~~~~~~~~~
> > > > > 
> > > > > Simple deferrable timers are no longer required as they can be converted to
> > > > > global timers. If a CPU goes idle, a formerly deferrable timer will not
> > > > > prevent the CPU to sleep as long as possible. Only the last migrator CPU
> > > > > has to take care of them. Deferrable timers with timer pinned flags needs
> > > > > to be expired on the specified CPU but must not prevent CPU from going
> > > > > idle. They require their own timer base which is never taken into account
> > > > > when calculating the next expiry time. This conversation and required
> > > > > cleanup will be done in a follow up series.
> > > > > 
> > > > 
> > > > Taking non-pinned deferrable timers case, they are queued on their own base
> > > > and its expiry is not taken into account while programming the next timer
> > > > event during idle.
> > > 
> > > If CPU is not the last CPU going idle, then yes.
> > 
> > What is special with last CPU that is going idle? Sorry, it is not clear where
> > the deferrable timer expiry is taken into account while programming the next
> > wakeup event?
> 
> The last CPU has to make sure the global timers are handled. At the moment
> the deferrable timer expiry is not taken into account for next wakeup.
> 

Right. Nothing changes wrt deferrable timers with this series.

> > forward_and_idle_timer_bases()->tmigr_cpu_deactivate() is only taking global
> > timer expiry (deferrable timers are NOT queued on global base) and comparing
> > against the local base expiry. This makes me think that we are not taking
> > deferrable timers expiry into account, which is correct IMO.
> 
> The information "deferrable timers [...] can be converted to global timers"
> is below the heading "Next Steps". It is _NOT_ part of this series, it will
> be part of a follow up patch series.
> 
> The posted series only introduces the timer migration hierarchy and then
> removes the heuristic on which CPU a timer will be enqueued. The only
> change for deferrable timers after this series is: They are always enqueued
> on the local CPU. The rest stays the same.
> 

Understood. 

> 
> > > > When the deferrable timers will be queued on global base, once a CPU comes out
> > > > of idle and serve the timers on global base, the deferrable timers also would
> > > > be served. This is a welcoming change. We would see a truly deferrable global
> > > > timer something we would be interested in. [1] has some background on this.
> > > 
> > > Serving the deferrable timers once a CPU comes out of idle is already the
> > > case even without the timer migration hierarchy. See upstream version of
> > > run_local_timers().
> > 
> > However upstream version does not wake a CPU just to serve a deferrable timer.
> > But it seems if we consider a deferrable timer just as another global timer,
> > sure it will not prevent the local CPU going idle but there would be one CPU
> > (thus, the system) that pays the penalty.
> >
> 
> Right.
> 
> > > > [1]
> > > > https://lore.kernel.org/lkml/1430188744-24737-1-git-send-email-joonwoop@codeaurora.org/
> > > 
> > > As far as I understand the problem you are linking to correctly, you want
> > > to have a real "unbound" solution for deferrable or delayed work. This is
> > > what you get with the timer migration hierarchy and when enqueuing
> > > deferrable timers into global timer base. Timers are executed on the
> > > migrator CPU because this CPU is not idle - doesn't matter where they have
> > > been queued before.
> > > 
> > > It might be possible, that a fomerly deferrable timer enforces the last CPU
> > > going idle to come back out of idle. But the question is, how often does
> > > this occur in contrast to a wakeup cause by a non deferrable timer?  If you
> > > have a look at the timers in kernel you have 64 deferrable timers (this
> > > number also contain the deferrable and pinned timers). There are 7 timers
> > > with only TIMER_PINNED flag and some additional using the add_timer_on() to
> > > be enqueued on a dedicated CPU. But in total we have more than 1000
> > > timers. Sure - in the end, numbers hardly depends on the selected kernel
> > > config...
> > 
> > I will give an example here. Lets say we have 4 CPUs in a system. There is a
> > devfreq governor driver that configures a delayed work for every 20 msec.
> 
> s/delayed work/deferrable work ?

yeah, deferrable work.

> 
> > #1 When the system is busy, this *deferrable* timer expires at the 20 msec
> > boundary. However, when the system is idle (i.e no use case is running but
> > system does not enter global suspend because of other reasons like display
> > ON etc), we don't expect this deferrable timer to expire at every 20 msec. 
> > 
> > With your proposal, we endup seeing the system (last CPU that enters idle)
> > coming out of idle for every 20 msec which is not desirable.
> 
> With my proposal for next steps only timers with pinned and deferrable flag
> set would keep the old behavior.

pinned timers/work are meant for collecting/doing per-CPU things. Those who
don't care about these like devfreq would generally want the scheduler to select
the best CPU for their work and probably don't use pinned timers.

> 
> > #2 Today, deferrable is local to CPU. Irrespective of the activity on the
> > other CPUs, this deferrable timer does not expire as long as the local CPU
> > is idle for whatever reason. That is definitly not the devfreq governor
> > expectation. The intention is to save power when system is idle but serving
> > the purpose when it is relatively busy.
> > 
> > > 
> > > Side note: One big problem of deferrable timers disappear with this
> > > approach. All deferrable timers _WILL_ expire. Even if CPU where they have
> > > been enqueued does not come back out of idle. Only deferrable and pinned
> > > timers will still have this problem.
> > > 
> > Yes, this is a welcoming change. Solves the #2 problem as mentioned above.
> 
> But this welcoming change is only accessible when enqueuing deferrable
> timers into global base. But be aware, the problem sill exists for pinned
> deferrable timers.
> 
Yes. I will look forward to your series that implements Next steps and we can
discuss more about this. Please do keep the usecase/example, I have mentioned
above. We really don't want folks to use pinned deferrable timers as a
workaround because now deferrable timers do wake up CPUs otherwise.

Thanks,
Pavan

      reply	other threads:[~2022-11-08 18:51 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-04 14:57 [PATCH v4 00/16] timer: Move from a push remote at enqueue to a pull at expiry model Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 01/16] tick-sched: Warn when next tick seems to be in the past Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 02/16] timer: Move store of next event into __next_timer_interrupt() Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 03/16] timer: Split next timer interrupt logic Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 04/16] timer: Rework idle logic Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 05/16] add_timer_on(): Make sure callers have TIMER_PINNED flag Anna-Maria Behnsen
2022-11-04 16:43   ` Frederic Weisbecker
2022-11-07  8:11     ` Anna-Maria Behnsen
2022-11-07 10:11       ` Frederic Weisbecker
2022-11-04 14:57 ` [PATCH v4 06/16] timer: Keep the pinned timers separate from the others Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 07/16] timer: Retrieve next expiry of pinned/non-pinned timers seperately Anna-Maria Behnsen
2022-11-07 11:58   ` Frederic Weisbecker
2022-11-04 14:57 ` [PATCH v4 08/16] timer: Rename get_next_timer_interrupt() Anna-Maria Behnsen
2022-11-07 12:13   ` Frederic Weisbecker
2022-11-04 14:57 ` [PATCH v4 09/16] timer: Split out "get next timer interrupt" functionality Anna-Maria Behnsen
2022-11-07 12:42   ` Frederic Weisbecker
2022-11-08 15:30     ` Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 10/16] timer: Add get next timer interrupt functionality for remote CPUs Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 11/16] timer: Restructure internal locking Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 12/16] timer: Check if timers base is handled already Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 13/16] tick/sched: Split out jiffies update helper function Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 14/16] timer: Implement the hierarchical pull model Anna-Maria Behnsen
2022-11-07 22:07   ` Frederic Weisbecker
2022-11-08 16:16     ` Anna-Maria Behnsen
2022-11-09 17:12       ` Frederic Weisbecker
2022-11-08 10:47   ` Frederic Weisbecker
2022-11-08 17:02     ` Anna-Maria Behnsen
2022-11-09 17:13       ` Frederic Weisbecker
2022-11-08 11:48   ` Frederic Weisbecker
2022-11-09 16:39   ` Frederic Weisbecker
2022-11-10  6:34     ` Anna-Maria Behnsen
2022-11-14 13:09   ` Frederic Weisbecker
2022-11-15 11:31   ` Frederic Weisbecker
2022-11-24  7:47     ` Anna-Maria Behnsen
2022-11-28 16:20       ` Frederic Weisbecker
2022-11-29 10:30         ` Frederic Weisbecker
2022-11-16 13:40   ` Frederic Weisbecker
2022-11-04 14:57 ` [PATCH v4 15/16] timer_migration: Add tracepoints Anna-Maria Behnsen
2022-11-04 14:57 ` [PATCH v4 16/16] timer: Always queue timers on the local CPU Anna-Maria Behnsen
2022-11-08  4:37 ` [PATCH v4 00/16] timer: Move from a push remote at enqueue to a pull at expiry model Pavan Kondeti
2022-11-08 15:06   ` Anna-Maria Behnsen
2022-11-08 16:04     ` Pavan Kondeti
2022-11-08 17:39       ` Anna-Maria Behnsen
2022-11-08 18:48         ` Pavan Kondeti [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221108184822.GB28920@hu-pkondeti-hyd.qualcomm.com \
    --to=quic_pkondeti@quicinc.com \
    --cc=anna-maria@linutronix.de \
    --cc=arjan@infradead.org \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).