From: Suren Baghdasaryan <surenb@google.com>
To: Pavan Kondeti <quic_pkondeti@quicinc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Charan Teja Kalla <quic_charante@quicinc.com>
Subject: Re: PSI idle-shutoff
Date: Sun, 2 Oct 2022 23:11:10 -0700
Message-ID: <CAJuCfpEeNzDQ-CvMN3fP5LejOzpnfgUgvkzpPj1CLF-8NqNoww@mail.gmail.com>
In-Reply-To: <CAJuCfpE_nM2uqixnds0d6wbsz4=OQ3KPoJ5HOqDhQXaxFGxwXQ@mail.gmail.com>

On Fri, Sep 16, 2022 at 10:45 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Sep 14, 2022 at 11:20 PM Pavan Kondeti
> <quic_pkondeti@quicinc.com> wrote:
> >
> > On Tue, Sep 13, 2022 at 07:38:17PM +0530, Pavan Kondeti wrote:
> > > Hi
> > >
> > > Since psi_avgs_work()->collect_percpu_times()->get_recent_times()
> > > runs from a kworker thread, the PSI_NONIDLE condition is always
> > > observed because there is a RUNNING task. So we always end up re-arming
> > > the work.
> > >
> > > If the work is re-armed from psi_avgs_work() itself, the back-off
> > > logic in psi_task_change() (will be moved to psi_task_switch soon) can't
> > > help. The work is already scheduled, so we don't do anything there.
>
> Hi Pavan,
> Thanks for reporting the issue. IIRC [1] was meant to fix exactly this
> issue. At the time it was written I tested it and it seemed to work.
> Maybe I missed something or some other change introduced afterwards
> affected the shutoff logic. I'll take a closer look next week when I'm
> back at my computer and will consult with Johannes.

Sorry for the delay. I had some time to look into this and test psi
shutoff on my device and I think you are right. The patch I mentioned
prevents new psi_avgs_work from being scheduled when the only non-idle
task is psi_avgs_work itself; however, the regular 2-second averaging
work still keeps running. I think we could record the fact that the only
active task is psi_avgs_work in record_times() using a new
psi_group_cpu.state_mask flag and then prevent psi_avgs_work() from
rescheduling itself if that flag is set for all non-idle CPUs. I'll
test this approach and will post a patch for review if that works.
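
As a rough illustration of that idea, here is a small userspace model of
the re-arm decision. It is only a sketch under stated assumptions, not
kernel code: every sketch_* name and both flag bits are hypothetical, and
the real record_times() and psi_avgs_work() take different arguments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the names are illustrative, not the kernel's. */
#define SKETCH_NONIDLE        (1u << 0) /* CPU had at least one runnable task */
#define SKETCH_ONLY_AVGS_WORK (1u << 1) /* ...and it was only the averaging worker */

/* Stand-in for the per-CPU state that carries a state_mask. */
struct sketch_group_cpu {
	uint32_t state_mask;
};

/*
 * Model of the recording step: besides marking the CPU non-idle, note
 * whether the averaging worker is the only runnable task on it.
 */
static void sketch_record_times(struct sketch_group_cpu *gc,
				unsigned int nr_running,
				bool avgs_worker_running)
{
	gc->state_mask = 0;
	if (nr_running == 0)
		return;

	gc->state_mask |= SKETCH_NONIDLE;
	if (nr_running == 1 && avgs_worker_running)
		gc->state_mask |= SKETCH_ONLY_AVGS_WORK;
}

/*
 * Model of the re-arm decision: reschedule only if some non-idle CPU was
 * busy with something other than the averaging worker. If the flag is set
 * on every non-idle CPU, the work is allowed to shut off.
 */
static bool sketch_should_rearm(const struct sketch_group_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t m = cpus[i].state_mask;

		if ((m & SKETCH_NONIDLE) && !(m & SKETCH_ONLY_AVGS_WORK))
			return true;
	}
	return false;
}
```

With one idle CPU and one CPU running only the worker,
sketch_should_rearm() returns false; as soon as any CPU runs another
task, it returns true again.
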
Thanks,
Suren.

> Thanks,
> Suren.
>
> [1] 1b69ac6b40eb "psi: fix aggregation idle shut-off"
>
> > >
> > > Probably I am missing something here. Can you please clarify how we
> > > shut off re-arming the psi avg work?
> > >
> >
> > I have collected traces on an idle system (running android12-5.10 with minimal
> > user space). This is an older kernel; however, the issue remains on the latest
> > kernel as per code inspection.
> >
> > I have eliminated noise created by other work items, for example vmstat_work.
> > This is a deferrable work, but it gets executed because it is queued on the
> > same CPU on which the PSI work timer is queued. So I have increased
> > sysctl_stat_interval to 60 * HZ to suppress this work.
> >
> > As we can see from the traces, CPU#7 comes out of idle only to execute the PSI
> > work every 2 seconds. The work is always re-armed from psi_avgs_work()
> > because it finds the PSI_NONIDLE condition. The non-idle time is essentially
> >
> > non_idle_time = (work_start_now - wakeup_now) + (sleep_prev - work_end_prev)
> >
> > The first term accounts for the non-idle time from when the task is woken up
> > (queued) to the execution of the work item. It is ~4 usec
> > (54.119420 - 54.119416).
> >
> > The second term accounts for the previous update: ~2 usec (52.135424 -
> > 52.135422).
> >
> > PSI work needs to run when there has been some activity since the last
> > update, i.e. since the last time the work ran. Since we use a non-deferrable
> > timer, the other deferrable timers get woken up and they might queue work or
> > wake up other threads, creating activity which in turn causes the PSI work to
> > be scheduled.
> >
> > PSI work can't simply be made deferrable work, because it is system-level
> > work: if the CPU on which it is queued is idle for a long duration while
> > other CPUs are active, we miss PSI updates. What we probably need is a global
> > deferrable timer [1], i.e. a timer that is not bound to any CPU but
> > runs when any CPU comes out of idle. As long as one CPU is busy, we keep
> > running the PSI work, but if the whole system is idle, we never wake up.
> >
> >           <idle>-0     [007]    52.135402: cpu_idle:             state=4294967295 cpu_id=7
> >           <idle>-0     [007]    52.135415: workqueue_activate_work: work struct 0xffffffc011bd5010
> >           <idle>-0     [007]    52.135417: sched_wakeup:         comm=kworker/7:3 pid=196 prio=120 target_cpu=007
> >           <idle>-0     [007]    52.135421: sched_switch:         prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/7:3 next_pid=196 next_prio=120
> >      kworker/7:3-196   [007]    52.135421: workqueue_execute_start: work struct 0xffffffc011bd5010: function psi_avgs_work
> >      kworker/7:3-196   [007]    52.135422: timer_start:          timer=0xffffffc011bd5040 function=delayed_work_timer_fn expires=4294905814 [timeout=494] cpu=7 idx=123 flags=D|P|I
> >      kworker/7:3-196   [007]    52.135422: workqueue_execute_end: work struct 0xffffffc011bd5010: function psi_avgs_work
> >      kworker/7:3-196   [007]    52.135424: sched_switch:         prev_comm=kworker/7:3 prev_pid=196 prev_prio=120 prev_state=I ==> next_comm=swapper/7 next_pid=0 next_prio=120
> >           <idle>-0     [007]    52.135428: cpu_idle:             state=0 cpu_id=7
> >
> >           <system is idle and gets woken up after 2 seconds due to PSI work>
> >
> >           <idle>-0     [007]    54.119402: cpu_idle:             state=4294967295 cpu_id=7
> >           <idle>-0     [007]    54.119414: workqueue_activate_work: work struct 0xffffffc011bd5010
> >           <idle>-0     [007]    54.119416: sched_wakeup:         comm=kworker/7:3 pid=196 prio=120 target_cpu=007
> >           <idle>-0     [007]    54.119420: sched_switch:         prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/7:3 next_pid=196 next_prio=120
> >      kworker/7:3-196   [007]    54.119420: workqueue_execute_start: work struct 0xffffffc011bd5010: function psi_avgs_work
> >      kworker/7:3-196   [007]    54.119421: timer_start:          timer=0xffffffc011bd5040 function=delayed_work_timer_fn expires=4294906315 [timeout=499] cpu=7 idx=122 flags=D|P|I
> >      kworker/7:3-196   [007]    54.119422: workqueue_execute_end: work struct 0xffffffc011bd5010: function psi_avgs_work
> >
> > [1]
> > https://lore.kernel.org/lkml/1430188744-24737-1-git-send-email-joonwoop@codeaurora.org/
> >
> > Thanks,
> > Pavan

