All of lore.kernel.org
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Benjamin Segall" <bsegall@google.com>,
	"Mel Gorman" <mgorman@suse.de>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	matthias.bgg@gmail.com, "Minchan Kim" <minchan@google.com>,
	"Tim Murray" <timmurray@google.com>,
	"YT Chang" <yt.chang@mediatek.com>,
	"Wenju Xu (许文举)" <wenju.xu@mediatek.com>,
	"Jonathan JMChen (陳家明)" <jonathan.jmchen@mediatek.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org,
	kernel-team <kernel-team@android.com>,
	"SH Chen" <show-hong.chen@mediatek.com>
Subject: Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
Date: Wed, 7 Jul 2021 15:43:48 -0700	[thread overview]
Message-ID: <CAJuCfpGx22iTaDGCfOrM_pD6PYZqQrni2+u5jQy+NpNeNg7B9w@mail.gmail.com> (raw)
In-Reply-To: <YOWugYxQ9Yfsqba2@cmpxchg.org>

On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> This looks good to me now code wise. Just a comment on the comments:
>
> On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> >       return now + group->poll_min_period;
> >  }
> >
> > -/* Schedule polling if it's not already scheduled. */
> > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > +/* Schedule polling if it's not already scheduled or forced. */
> > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > +                                bool force)
> >  {
> >       struct task_struct *task;
> >
> > -     /*
> > -      * Do not reschedule if already scheduled.
> > -      * Possible race with a timer scheduled after this check but before
> > -      * mod_timer below can be tolerated because group->polling_next_update
> > -      * will keep updates on schedule.
> > -      */
> > -     if (timer_pending(&group->poll_timer))
> > +     /* xchg should be called even when !force to set poll_scheduled */
> > +     if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> >               return;
>
> This explains what the code does, but not why. It would be good to
> explain the ordering with poll_work, here or there. But both sides
> should mention each other.

How about this:

/*
 * atomic_xchg should be called even when !force to always set poll_scheduled
 * and to provide a memory barrier (see the comment inside psi_poll_work).
 */

>
> > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> >
> >       now = sched_clock();
> >
> > +     if (now > group->polling_until) {
> > +             /*
> > +              * We are either about to start or might stop polling if no
> > +              * state change was recorded. Resetting poll_scheduled leaves
> > +              * a small window for psi_group_change to sneak in and schedule
> > +              * an immegiate poll_work before we get to rescheduling. One
> > +              * potential extra wakeup at the end of the polling window
> > +              * should be negligible and polling_next_update still keeps
> > +              * updates correctly on schedule.
> > +              */
> > +             atomic_set(&group->poll_scheduled, 0);
> > +             /*
> > +              * Ensure that operations of clearing group->poll_scheduled and
> > +              * obtaining changed_states are not reordered.
> > +              */
> > +             smp_mb();
>
> Same here, it would be good to explain that this is ordering the
> scheduler with the timer such that no events are missed. Feel free to
> reuse my race diagram from the other thread - those are better at
> conveying the situation than freeform text.

I tried to make your diagram a bit less abstract by using the actual
names. How about this?

/*
 * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
 * reads and writes in psi_poll_work and psi_group_change functions.
Otherwise we
 * might fail to reschedule the timer when monitored states change:
 *
 * psi_poll_work:
 *     poll_scheduled = 0
 *     smp_mb()
 *     changed_states = collect_percpu_times()
 *     if changed_states && xchg(poll_scheduled, 1) == 0
 *         mod_timer()
 *
 * psi_group_change:
 *     record_times()
 *     smp_mb()
 *     if xchg(poll_scheduled, 1) == 0
 *         mod_timer()
 *
 * atomic_xchg in psi_schedule_poll_work implements an implicit memory
barrier but
 * we need an explicit one here.
 */

If we remove smp_mb barriers then there are the following possible
reordering cases:

Case1: reordering in psi_poll_work
psi_poll_work                    psi_group_change
  changed_states = collect_percpu_times()
                                              record_times()
                                              if xchg(poll_scheduled,
1) == 0 <-- false
                                                  mod_timer()
  poll_scheduled = 0
  if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
      mod_timer()

Case2: reordering in psi_group_change
psi_poll_work                    psi_group_change
                                              if xchg(poll_scheduled,
1) == 0 <-- false
                                                  mod_timer()
  poll_scheduled = 0
  changed_states = collect_percpu_times()
                                                  record_times()
  if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
      mod_timer()

In both cases mod_timer() is not called, poll update is missed. But
describing this all in the comments would be an overkill IMHO.
WDYT?

>
> Thanks

WARNING: multiple messages have this Message-ID (diff)
From: Suren Baghdasaryan <surenb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Benjamin Segall" <bsegall@google.com>,
	"Mel Gorman" <mgorman@suse.de>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	matthias.bgg@gmail.com, "Minchan Kim" <minchan@google.com>,
	"Tim Murray" <timmurray@google.com>,
	"YT Chang" <yt.chang@mediatek.com>,
	"Wenju Xu (许文举)" <wenju.xu@mediatek.com>,
	"Jonathan JMChen (陳家明)" <jonathan.jmchen@mediatek.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org,
	kernel-team <kernel-team@android.com>,
	"SH Chen" <show-hong.chen@mediatek.com>
Subject: Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
Date: Wed, 7 Jul 2021 15:43:48 -0700	[thread overview]
Message-ID: <CAJuCfpGx22iTaDGCfOrM_pD6PYZqQrni2+u5jQy+NpNeNg7B9w@mail.gmail.com> (raw)
In-Reply-To: <YOWugYxQ9Yfsqba2@cmpxchg.org>

On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> This looks good to me now code wise. Just a comment on the comments:
>
> On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> >       return now + group->poll_min_period;
> >  }
> >
> > -/* Schedule polling if it's not already scheduled. */
> > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > +/* Schedule polling if it's not already scheduled or forced. */
> > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > +                                bool force)
> >  {
> >       struct task_struct *task;
> >
> > -     /*
> > -      * Do not reschedule if already scheduled.
> > -      * Possible race with a timer scheduled after this check but before
> > -      * mod_timer below can be tolerated because group->polling_next_update
> > -      * will keep updates on schedule.
> > -      */
> > -     if (timer_pending(&group->poll_timer))
> > +     /* xchg should be called even when !force to set poll_scheduled */
> > +     if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> >               return;
>
> This explains what the code does, but not why. It would be good to
> explain the ordering with poll_work, here or there. But both sides
> should mention each other.

How about this:

/*
 * atomic_xchg should be called even when !force to always set poll_scheduled
 * and to provide a memory barrier (see the comment inside psi_poll_work).
 */

>
> > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> >
> >       now = sched_clock();
> >
> > +     if (now > group->polling_until) {
> > +             /*
> > +              * We are either about to start or might stop polling if no
> > +              * state change was recorded. Resetting poll_scheduled leaves
> > +              * a small window for psi_group_change to sneak in and schedule
> > +              * an immegiate poll_work before we get to rescheduling. One
> > +              * potential extra wakeup at the end of the polling window
> > +              * should be negligible and polling_next_update still keeps
> > +              * updates correctly on schedule.
> > +              */
> > +             atomic_set(&group->poll_scheduled, 0);
> > +             /*
> > +              * Ensure that operations of clearing group->poll_scheduled and
> > +              * obtaining changed_states are not reordered.
> > +              */
> > +             smp_mb();
>
> Same here, it would be good to explain that this is ordering the
> scheduler with the timer such that no events are missed. Feel free to
> reuse my race diagram from the other thread - those are better at
> conveying the situation than freeform text.

I tried to make your diagram a bit less abstract by using the actual
names. How about this?

/*
 * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
 * reads and writes in psi_poll_work and psi_group_change functions.
Otherwise we
 * might fail to reschedule the timer when monitored states change:
 *
 * psi_poll_work:
 *     poll_scheduled = 0
 *     smp_mb()
 *     changed_states = collect_percpu_times()
 *     if changed_states && xchg(poll_scheduled, 1) == 0
 *         mod_timer()
 *
 * psi_group_change:
 *     record_times()
 *     smp_mb()
 *     if xchg(poll_scheduled, 1) == 0
 *         mod_timer()
 *
 * atomic_xchg in psi_schedule_poll_work implements an implicit memory
barrier but
 * we need an explicit one here.
 */

If we remove smp_mb barriers then there are the following possible
reordering cases:

Case1: reordering in psi_poll_work
psi_poll_work                    psi_group_change
  changed_states = collect_percpu_times()
                                              record_times()
                                              if xchg(poll_scheduled,
1) == 0 <-- false
                                                  mod_timer()
  poll_scheduled = 0
  if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
      mod_timer()

Case2: reordering in psi_group_change
psi_poll_work                    psi_group_change
                                              if xchg(poll_scheduled,
1) == 0 <-- false
                                                  mod_timer()
  poll_scheduled = 0
  changed_states = collect_percpu_times()
                                                  record_times()
  if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
      mod_timer()

In both cases mod_timer() is not called, poll update is missed. But
describing this all in the comments would be an overkill IMHO.
WDYT?

>
> Thanks

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

WARNING: multiple messages have this Message-ID (diff)
From: Suren Baghdasaryan <surenb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Benjamin Segall" <bsegall@google.com>,
	"Mel Gorman" <mgorman@suse.de>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	matthias.bgg@gmail.com, "Minchan Kim" <minchan@google.com>,
	"Tim Murray" <timmurray@google.com>,
	"YT Chang" <yt.chang@mediatek.com>,
	"Wenju Xu (许文举)" <wenju.xu@mediatek.com>,
	"Jonathan JMChen (陳家明)" <jonathan.jmchen@mediatek.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org,
	kernel-team <kernel-team@android.com>,
	"SH Chen" <show-hong.chen@mediatek.com>
Subject: Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
Date: Wed, 7 Jul 2021 15:43:48 -0700	[thread overview]
Message-ID: <CAJuCfpGx22iTaDGCfOrM_pD6PYZqQrni2+u5jQy+NpNeNg7B9w@mail.gmail.com> (raw)
In-Reply-To: <YOWugYxQ9Yfsqba2@cmpxchg.org>

On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> This looks good to me now code wise. Just a comment on the comments:
>
> On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> >       return now + group->poll_min_period;
> >  }
> >
> > -/* Schedule polling if it's not already scheduled. */
> > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > +/* Schedule polling if it's not already scheduled or forced. */
> > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > +                                bool force)
> >  {
> >       struct task_struct *task;
> >
> > -     /*
> > -      * Do not reschedule if already scheduled.
> > -      * Possible race with a timer scheduled after this check but before
> > -      * mod_timer below can be tolerated because group->polling_next_update
> > -      * will keep updates on schedule.
> > -      */
> > -     if (timer_pending(&group->poll_timer))
> > +     /* xchg should be called even when !force to set poll_scheduled */
> > +     if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> >               return;
>
> This explains what the code does, but not why. It would be good to
> explain the ordering with poll_work, here or there. But both sides
> should mention each other.

How about this:

/*
 * atomic_xchg should be called even when !force to always set poll_scheduled
 * and to provide a memory barrier (see the comment inside psi_poll_work).
 */

>
> > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> >
> >       now = sched_clock();
> >
> > +     if (now > group->polling_until) {
> > +             /*
> > +              * We are either about to start or might stop polling if no
> > +              * state change was recorded. Resetting poll_scheduled leaves
> > +              * a small window for psi_group_change to sneak in and schedule
> > +              * an immegiate poll_work before we get to rescheduling. One
> > +              * potential extra wakeup at the end of the polling window
> > +              * should be negligible and polling_next_update still keeps
> > +              * updates correctly on schedule.
> > +              */
> > +             atomic_set(&group->poll_scheduled, 0);
> > +             /*
> > +              * Ensure that operations of clearing group->poll_scheduled and
> > +              * obtaining changed_states are not reordered.
> > +              */
> > +             smp_mb();
>
> Same here, it would be good to explain that this is ordering the
> scheduler with the timer such that no events are missed. Feel free to
> reuse my race diagram from the other thread - those are better at
> conveying the situation than freeform text.

I tried to make your diagram a bit less abstract by using the actual
names. How about this?

/*
 * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
 * reads and writes in psi_poll_work and psi_group_change functions.
Otherwise we
 * might fail to reschedule the timer when monitored states change:
 *
 * psi_poll_work:
 *     poll_scheduled = 0
 *     smp_mb()
 *     changed_states = collect_percpu_times()
 *     if changed_states && xchg(poll_scheduled, 1) == 0
 *         mod_timer()
 *
 * psi_group_change:
 *     record_times()
 *     smp_mb()
 *     if xchg(poll_scheduled, 1) == 0
 *         mod_timer()
 *
 * atomic_xchg in psi_schedule_poll_work implements an implicit memory
barrier but
 * we need an explicit one here.
 */

If we remove smp_mb barriers then there are the following possible
reordering cases:

Case1: reordering in psi_poll_work
psi_poll_work                    psi_group_change
  changed_states = collect_percpu_times()
                                              record_times()
                                              if xchg(poll_scheduled,
1) == 0 <-- false
                                                  mod_timer()
  poll_scheduled = 0
  if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
      mod_timer()

Case2: reordering in psi_group_change
psi_poll_work                    psi_group_change
                                              if xchg(poll_scheduled,
1) == 0 <-- false
                                                  mod_timer()
  poll_scheduled = 0
  changed_states = collect_percpu_times()
                                                  record_times()
  if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
      mod_timer()

In both cases mod_timer() is not called, poll update is missed. But
describing this all in the comments would be an overkill IMHO.
WDYT?

>
> Thanks

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-07-07 22:44 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-07  2:39 [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling Suren Baghdasaryan
2021-07-07  2:39 ` Suren Baghdasaryan
2021-07-07  2:39 ` Suren Baghdasaryan
2021-07-07 13:39 ` Johannes Weiner
2021-07-07 13:39   ` Johannes Weiner
2021-07-07 13:39   ` Johannes Weiner
2021-07-07 22:43   ` Suren Baghdasaryan [this message]
2021-07-07 22:43     ` Suren Baghdasaryan
2021-07-07 22:43     ` Suren Baghdasaryan
2021-07-08 14:44     ` Johannes Weiner
2021-07-08 14:44       ` Johannes Weiner
2021-07-08 14:44       ` Johannes Weiner
2021-07-08 15:54       ` Suren Baghdasaryan
2021-07-08 15:54         ` Suren Baghdasaryan
2021-07-08 15:54         ` Suren Baghdasaryan
2021-07-08 18:38         ` Johannes Weiner
2021-07-08 18:38           ` Johannes Weiner
2021-07-08 18:38           ` Johannes Weiner
2021-07-08 19:55           ` Suren Baghdasaryan
2021-07-08 19:55             ` Suren Baghdasaryan
2021-07-08 19:55             ` Suren Baghdasaryan
2021-07-08 20:37             ` Suren Baghdasaryan
2021-07-08 20:37               ` Suren Baghdasaryan
2021-07-08 20:37               ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJuCfpGx22iTaDGCfOrM_pD6PYZqQrni2+u5jQy+NpNeNg7B9w@mail.gmail.com \
    --to=surenb@google.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jonathan.jmchen@mediatek.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=matthias.bgg@gmail.com \
    --cc=mgorman@suse.de \
    --cc=minchan@google.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=show-hong.chen@mediatek.com \
    --cc=timmurray@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=wenju.xu@mediatek.com \
    --cc=yt.chang@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.