* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 14:44 ` Johannes Weiner
0 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-07-08 14:44 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > This looks good to me now code wise. Just a comment on the comments:
> >
> > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > return now + group->poll_min_period;
> > > }
> > >
> > > -/* Schedule polling if it's not already scheduled. */
> > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > +/* Schedule polling if it's not already scheduled or forced. */
> > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > + bool force)
> > > {
> > > struct task_struct *task;
> > >
> > > - /*
> > > - * Do not reschedule if already scheduled.
> > > - * Possible race with a timer scheduled after this check but before
> > > - * mod_timer below can be tolerated because group->polling_next_update
> > > - * will keep updates on schedule.
> > > - */
> > > - if (timer_pending(&group->poll_timer))
> > > + /* xchg should be called even when !force to set poll_scheduled */
> > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > return;
> >
> > This explains what the code does, but not why. It would be good to
> > explain the ordering with poll_work, here or there. But both sides
> > should mention each other.
>
> How about this:
>
> /*
> * atomic_xchg should be called even when !force to always set poll_scheduled
> * and to provide a memory barrier (see the comment inside psi_poll_work).
> */
The memory barrier part makes sense, but the first part says what the
code does and the message is unclear to me. Are you worried somebody
might turn this around in the future and only conditionalize on
poll_scheduled when !force? Essentially, I don't see the downside of
dropping that. But maybe I'm missing something.
/*
* The xchg implies a full barrier that matches the one
* in psi_poll_work() (see corresponding comment there).
*/
> > > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> > >
> > > now = sched_clock();
> > >
> > > + if (now > group->polling_until) {
> > > + /*
> > > + * We are either about to start or might stop polling if no
> > > + * state change was recorded. Resetting poll_scheduled leaves
> > > + * a small window for psi_group_change to sneak in and schedule
> > > + * an immegiate poll_work before we get to rescheduling. One
> > > + * potential extra wakeup at the end of the polling window
> > > + * should be negligible and polling_next_update still keeps
> > > + * updates correctly on schedule.
> > > + */
> > > + atomic_set(&group->poll_scheduled, 0);
> > > + /*
> > > + * Ensure that operations of clearing group->poll_scheduled and
> > > + * obtaining changed_states are not reordered.
> > > + */
> > > + smp_mb();
> >
> > Same here, it would be good to explain that this is ordering the
> > scheduler with the timer such that no events are missed. Feel free to
> > reuse my race diagram from the other thread - those are better at
> > conveying the situation than freeform text.
>
> I tried to make your diagram a bit less abstract by using the actual
> names. How about this?
>
> /*
> * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
> * reads and writes in psi_poll_work and psi_group_change functions.
> Otherwise we
> * might fail to reschedule the timer when monitored states change:
> *
> * psi_poll_work:
> * poll_scheduled = 0
> * smp_mb()
> * changed_states = collect_percpu_times()
> * if changed_states && xchg(poll_scheduled, 1) == 0
> * mod_timer()
Those last two lines aren't relevant for the race, right? I'd leave
those out to not distract from it.
> * psi_group_change:
> * record_times()
> * smp_mb()
> * if xchg(poll_scheduled, 1) == 0
> * mod_timer()
The reason I tend to keep these more abstract is because 1) the names
of the functions change (I had already sent out patches to rename half
the variable and function names in this diagram), while the
architecture (task change vs poll worker) likely won't, and 2) because
it's easy to drown out what the reads, writes, and thus the race
condition is with code details and function call indirections.
How about a compromise?
/*
* A task change can race with the poll worker that is supposed to
* report on it. To avoid missing events, ensure ordering between
* poll_scheduled and the task state accesses, such that if the poll
* worker misses the state update, the task change is guaranteed to
* reschedule the poll worker:
*
* poll worker:
* atomic_set(poll_scheduled, 0)
* smp_mb()
* LOAD states
*
* task change:
* STORE states
* if atomic_xchg(poll_scheduled, 1) == 0:
* schedule poll worker
*
* The atomic_xchg() implies a full barrier.
*/
smp_mb();
This gives a high-level view of what's happening but it can still be
mapped to the code by following the poll_scheduled variable.
> If we remove smp_mb barriers then there are the following possible
> reordering cases:
>
> Case1: reordering in psi_poll_work
> psi_poll_work psi_group_change
> changed_states = collect_percpu_times()
> record_times()
> if xchg(poll_scheduled,
> 1) == 0 <-- false
> mod_timer()
> poll_scheduled = 0
> if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> mod_timer()
>
> Case2: reordering in psi_group_change
> psi_poll_work psi_group_change
> if xchg(poll_scheduled,
> 1) == 0 <-- false
> mod_timer()
> poll_scheduled = 0
> changed_states = collect_percpu_times()
> record_times()
> if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> mod_timer()
>
> In both cases mod_timer() is not called, poll update is missed. But
> describing this all in the comments would be an overkill IMHO.
> WDYT?
Yeah, I also think that's overkill. The failure cases can be derived
from the concurrency diagram and explanation.
Thanks
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 14:44 ` Johannes Weiner
0 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-07-08 14:44 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > This looks good to me now code wise. Just a comment on the comments:
> >
> > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > return now + group->poll_min_period;
> > > }
> > >
> > > -/* Schedule polling if it's not already scheduled. */
> > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > +/* Schedule polling if it's not already scheduled or forced. */
> > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > + bool force)
> > > {
> > > struct task_struct *task;
> > >
> > > - /*
> > > - * Do not reschedule if already scheduled.
> > > - * Possible race with a timer scheduled after this check but before
> > > - * mod_timer below can be tolerated because group->polling_next_update
> > > - * will keep updates on schedule.
> > > - */
> > > - if (timer_pending(&group->poll_timer))
> > > + /* xchg should be called even when !force to set poll_scheduled */
> > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > return;
> >
> > This explains what the code does, but not why. It would be good to
> > explain the ordering with poll_work, here or there. But both sides
> > should mention each other.
>
> How about this:
>
> /*
> * atomic_xchg should be called even when !force to always set poll_scheduled
> * and to provide a memory barrier (see the comment inside psi_poll_work).
> */
The memory barrier part makes sense, but the first part says what the
code does and the message is unclear to me. Are you worried somebody
might turn this around in the future and only conditionalize on
poll_scheduled when !force? Essentially, I don't see the downside of
dropping that. But maybe I'm missing something.
/*
* The xchg implies a full barrier that matches the one
* in psi_poll_work() (see corresponding comment there).
*/
> > > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> > >
> > > now = sched_clock();
> > >
> > > + if (now > group->polling_until) {
> > > + /*
> > > + * We are either about to start or might stop polling if no
> > > + * state change was recorded. Resetting poll_scheduled leaves
> > > + * a small window for psi_group_change to sneak in and schedule
> > > + * an immegiate poll_work before we get to rescheduling. One
> > > + * potential extra wakeup at the end of the polling window
> > > + * should be negligible and polling_next_update still keeps
> > > + * updates correctly on schedule.
> > > + */
> > > + atomic_set(&group->poll_scheduled, 0);
> > > + /*
> > > + * Ensure that operations of clearing group->poll_scheduled and
> > > + * obtaining changed_states are not reordered.
> > > + */
> > > + smp_mb();
> >
> > Same here, it would be good to explain that this is ordering the
> > scheduler with the timer such that no events are missed. Feel free to
> > reuse my race diagram from the other thread - those are better at
> > conveying the situation than freeform text.
>
> I tried to make your diagram a bit less abstract by using the actual
> names. How about this?
>
> /*
> * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
> * reads and writes in psi_poll_work and psi_group_change functions.
> Otherwise we
> * might fail to reschedule the timer when monitored states change:
> *
> * psi_poll_work:
> * poll_scheduled = 0
> * smp_mb()
> * changed_states = collect_percpu_times()
> * if changed_states && xchg(poll_scheduled, 1) == 0
> * mod_timer()
Those last two lines aren't relevant for the race, right? I'd leave
those out to not distract from it.
> * psi_group_change:
> * record_times()
> * smp_mb()
> * if xchg(poll_scheduled, 1) == 0
> * mod_timer()
The reason I tend to keep these more abstract is because 1) the names
of the functions change (I had already sent out patches to rename half
the variable and function names in this diagram), while the
architecture (task change vs poll worker) likely won't, and 2) because
it's easy to drown out what the reads, writes, and thus the race
condition is with code details and function call indirections.
How about a compromise?
/*
* A task change can race with the poll worker that is supposed to
* report on it. To avoid missing events, ensure ordering between
* poll_scheduled and the task state accesses, such that if the poll
* worker misses the state update, the task change is guaranteed to
* reschedule the poll worker:
*
* poll worker:
* atomic_set(poll_scheduled, 0)
* smp_mb()
* LOAD states
*
* task change:
* STORE states
* if atomic_xchg(poll_scheduled, 1) == 0:
* schedule poll worker
*
* The atomic_xchg() implies a full barrier.
*/
smp_mb();
This gives a high-level view of what's happening but it can still be
mapped to the code by following the poll_scheduled variable.
> If we remove smp_mb barriers then there are the following possible
> reordering cases:
>
> Case1: reordering in psi_poll_work
> psi_poll_work psi_group_change
> changed_states = collect_percpu_times()
> record_times()
> if xchg(poll_scheduled,
> 1) == 0 <-- false
> mod_timer()
> poll_scheduled = 0
> if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> mod_timer()
>
> Case2: reordering in psi_group_change
> psi_poll_work psi_group_change
> if xchg(poll_scheduled,
> 1) == 0 <-- false
> mod_timer()
> poll_scheduled = 0
> changed_states = collect_percpu_times()
> record_times()
> if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> mod_timer()
>
> In both cases mod_timer() is not called, poll update is missed. But
> describing this all in the comments would be an overkill IMHO.
> WDYT?
Yeah, I also think that's overkill. The failure cases can be derived
from the concurrency diagram and explanation.
Thanks
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
2021-07-08 14:44 ` Johannes Weiner
(?)
@ 2021-07-08 15:54 ` Suren Baghdasaryan
-1 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 15:54 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
t
On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > This looks good to me now code wise. Just a comment on the comments:
> > >
> > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > return now + group->poll_min_period;
> > > > }
> > > >
> > > > -/* Schedule polling if it's not already scheduled. */
> > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > + bool force)
> > > > {
> > > > struct task_struct *task;
> > > >
> > > > - /*
> > > > - * Do not reschedule if already scheduled.
> > > > - * Possible race with a timer scheduled after this check but before
> > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > - * will keep updates on schedule.
> > > > - */
> > > > - if (timer_pending(&group->poll_timer))
> > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > return;
> > >
> > > This explains what the code does, but not why. It would be good to
> > > explain the ordering with poll_work, here or there. But both sides
> > > should mention each other.
> >
> > How about this:
> >
> > /*
> > * atomic_xchg should be called even when !force to always set poll_scheduled
> > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > */
>
> The memory barrier part makes sense, but the first part says what the
> code does and the message is unclear to me. Are you worried somebody
> might turn this around in the future and only conditionalize on
> poll_scheduled when !force? Essentially, I don't see the downside of
> dropping that. But maybe I'm missing something.
Actually you are right. Originally I was worried that there might be a
case when poll_scheduled==0 and force==true and if someone flips the
conditions we will reschedule the timer but will not set
poll_scheduled back to 1. However I don't think this condition is
possible. We set force=true only when we skipped resetting
poll_schedule to 0 and on initial wakeup we always reset
poll_schedule. How about changing the comment to this:
/*
* atomic_xchg should be called even when !force to provide a
* full memory barrier (see the comment inside psi_poll_work).
*/
> /*
> * The xchg implies a full barrier that matches the one
> * in psi_poll_work() (see corresponding comment there).
> */
>
> > > > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> > > >
> > > > now = sched_clock();
> > > >
> > > > + if (now > group->polling_until) {
> > > > + /*
> > > > + * We are either about to start or might stop polling if no
> > > > + * state change was recorded. Resetting poll_scheduled leaves
> > > > + * a small window for psi_group_change to sneak in and schedule
> > > > + * an immegiate poll_work before we get to rescheduling. One
> > > > + * potential extra wakeup at the end of the polling window
> > > > + * should be negligible and polling_next_update still keeps
> > > > + * updates correctly on schedule.
> > > > + */
> > > > + atomic_set(&group->poll_scheduled, 0);
> > > > + /*
> > > > + * Ensure that operations of clearing group->poll_scheduled and
> > > > + * obtaining changed_states are not reordered.
> > > > + */
> > > > + smp_mb();
> > >
> > > Same here, it would be good to explain that this is ordering the
> > > scheduler with the timer such that no events are missed. Feel free to
> > > reuse my race diagram from the other thread - those are better at
> > > conveying the situation than freeform text.
> >
> > I tried to make your diagram a bit less abstract by using the actual
> > names. How about this?
> >
> > /*
> > * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
> > * reads and writes in psi_poll_work and psi_group_change functions.
> > Otherwise we
> > * might fail to reschedule the timer when monitored states change:
> > *
> > * psi_poll_work:
> > * poll_scheduled = 0
> > * smp_mb()
> > * changed_states = collect_percpu_times()
> > * if changed_states && xchg(poll_scheduled, 1) == 0
> > * mod_timer()
>
> Those last two lines aren't relevant for the race, right? I'd leave
> those out to not distract from it.
They did help me illustrate the two failure cases but yeah, someone
who can read the code can derive the rest :)
>
> > * psi_group_change:
> > * record_times()
> > * smp_mb()
> > * if xchg(poll_scheduled, 1) == 0
> > * mod_timer()
>
> The reason I tend to keep these more abstract is because 1) the names
> of the functions change (I had already sent out patches to rename half
> the variable and function names in this diagram), while the
> architecture (task change vs poll worker) likely won't, and 2) because
> it's easy to drown out what the reads, writes, and thus the race
> condition is with code details and function call indirections.
Got it.
>
> How about a compromise?
>
> /*
> * A task change can race with the poll worker that is supposed to
> * report on it. To avoid missing events, ensure ordering between
> * poll_scheduled and the task state accesses, such that if the poll
> * worker misses the state update, the task change is guaranteed to
> * reschedule the poll worker:
> *
> * poll worker:
> * atomic_set(poll_scheduled, 0)
> * smp_mb()
> * LOAD states
> *
> * task change:
> * STORE states
> * if atomic_xchg(poll_scheduled, 1) == 0:
> * schedule poll worker
> *
> * The atomic_xchg() implies a full barrier.
> */
> smp_mb();
>
> This gives a high-level view of what's happening but it can still be
> mapped to the code by following the poll_scheduled variable.
This looks really good to me.
If you agree on the first comment modification, should I respin the
next version?
>
> > If we remove smp_mb barriers then there are the following possible
> > reordering cases:
> >
> > Case1: reordering in psi_poll_work
> > psi_poll_work psi_group_change
> > changed_states = collect_percpu_times()
> > record_times()
> > if xchg(poll_scheduled,
> > 1) == 0 <-- false
> > mod_timer()
> > poll_scheduled = 0
> > if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> > mod_timer()
> >
> > Case2: reordering in psi_group_change
> > psi_poll_work psi_group_change
> > if xchg(poll_scheduled,
> > 1) == 0 <-- false
> > mod_timer()
> > poll_scheduled = 0
> > changed_states = collect_percpu_times()
> > record_times()
> > if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> > mod_timer()
> >
> > In both cases mod_timer() is not called, poll update is missed. But
> > describing this all in the comments would be an overkill IMHO.
> > WDYT?
>
> Yeah, I also think that's overkill. The failure cases can be derived
> from the concurrency diagram and explanation.
>
> Thanks
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 15:54 ` Suren Baghdasaryan
0 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 15:54 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
t
On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > This looks good to me now code wise. Just a comment on the comments:
> > >
> > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > return now + group->poll_min_period;
> > > > }
> > > >
> > > > -/* Schedule polling if it's not already scheduled. */
> > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > + bool force)
> > > > {
> > > > struct task_struct *task;
> > > >
> > > > - /*
> > > > - * Do not reschedule if already scheduled.
> > > > - * Possible race with a timer scheduled after this check but before
> > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > - * will keep updates on schedule.
> > > > - */
> > > > - if (timer_pending(&group->poll_timer))
> > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > return;
> > >
> > > This explains what the code does, but not why. It would be good to
> > > explain the ordering with poll_work, here or there. But both sides
> > > should mention each other.
> >
> > How about this:
> >
> > /*
> > * atomic_xchg should be called even when !force to always set poll_scheduled
> > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > */
>
> The memory barrier part makes sense, but the first part says what the
> code does and the message is unclear to me. Are you worried somebody
> might turn this around in the future and only conditionalize on
> poll_scheduled when !force? Essentially, I don't see the downside of
> dropping that. But maybe I'm missing something.
Actually you are right. Originally I was worried that there might be a
case when poll_scheduled==0 and force==true and if someone flips the
conditions we will reschedule the timer but will not set
poll_scheduled back to 1. However I don't think this condition is
possible. We set force=true only when we skipped resetting
poll_schedule to 0 and on initial wakeup we always reset
poll_schedule. How about changing the comment to this:
/*
* atomic_xchg should be called even when !force to provide a
* full memory barrier (see the comment inside psi_poll_work).
*/
> /*
> * The xchg implies a full barrier that matches the one
> * in psi_poll_work() (see corresponding comment there).
> */
>
> > > > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> > > >
> > > > now = sched_clock();
> > > >
> > > > + if (now > group->polling_until) {
> > > > + /*
> > > > + * We are either about to start or might stop polling if no
> > > > + * state change was recorded. Resetting poll_scheduled leaves
> > > > + * a small window for psi_group_change to sneak in and schedule
> > > > + * an immegiate poll_work before we get to rescheduling. One
> > > > + * potential extra wakeup at the end of the polling window
> > > > + * should be negligible and polling_next_update still keeps
> > > > + * updates correctly on schedule.
> > > > + */
> > > > + atomic_set(&group->poll_scheduled, 0);
> > > > + /*
> > > > + * Ensure that operations of clearing group->poll_scheduled and
> > > > + * obtaining changed_states are not reordered.
> > > > + */
> > > > + smp_mb();
> > >
> > > Same here, it would be good to explain that this is ordering the
> > > scheduler with the timer such that no events are missed. Feel free to
> > > reuse my race diagram from the other thread - those are better at
> > > conveying the situation than freeform text.
> >
> > I tried to make your diagram a bit less abstract by using the actual
> > names. How about this?
> >
> > /*
> > * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
> > * reads and writes in psi_poll_work and psi_group_change functions.
> > Otherwise we
> > * might fail to reschedule the timer when monitored states change:
> > *
> > * psi_poll_work:
> > * poll_scheduled = 0
> > * smp_mb()
> > * changed_states = collect_percpu_times()
> > * if changed_states && xchg(poll_scheduled, 1) == 0
> > * mod_timer()
>
> Those last two lines aren't relevant for the race, right? I'd leave
> those out to not distract from it.
They did help me illustrate the two failure cases but yeah, someone
who can read the code can derive the rest :)
>
> > * psi_group_change:
> > * record_times()
> > * smp_mb()
> > * if xchg(poll_scheduled, 1) == 0
> > * mod_timer()
>
> The reason I tend to keep these more abstract is because 1) the names
> of the functions change (I had already sent out patches to rename half
> the variable and function names in this diagram), while the
> architecture (task change vs poll worker) likely won't, and 2) because
> it's easy to drown out what the reads, writes, and thus the race
> condition is with code details and function call indirections.
Got it.
>
> How about a compromise?
>
> /*
> * A task change can race with the poll worker that is supposed to
> * report on it. To avoid missing events, ensure ordering between
> * poll_scheduled and the task state accesses, such that if the poll
> * worker misses the state update, the task change is guaranteed to
> * reschedule the poll worker:
> *
> * poll worker:
> * atomic_set(poll_scheduled, 0)
> * smp_mb()
> * LOAD states
> *
> * task change:
> * STORE states
> * if atomic_xchg(poll_scheduled, 1) == 0:
> * schedule poll worker
> *
> * The atomic_xchg() implies a full barrier.
> */
> smp_mb();
>
> This gives a high-level view of what's happening but it can still be
> mapped to the code by following the poll_scheduled variable.
This looks really good to me.
If you agree on the first comment modification, should I respin the
next version?
>
> > If we remove smp_mb barriers then there are the following possible
> > reordering cases:
> >
> > Case1: reordering in psi_poll_work
> > psi_poll_work psi_group_change
> > changed_states = collect_percpu_times()
> > record_times()
> > if xchg(poll_scheduled,
> > 1) == 0 <-- false
> > mod_timer()
> > poll_scheduled = 0
> > if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> > mod_timer()
> >
> > Case2: reordering in psi_group_change
> > psi_poll_work psi_group_change
> > if xchg(poll_scheduled,
> > 1) == 0 <-- false
> > mod_timer()
> > poll_scheduled = 0
> > changed_states = collect_percpu_times()
> > record_times()
> > if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> > mod_timer()
> >
> > In both cases mod_timer() is not called, poll update is missed. But
> > describing this all in the comments would be an overkill IMHO.
> > WDYT?
>
> Yeah, I also think that's overkill. The failure cases can be derived
> from the concurrency diagram and explanation.
>
> Thanks
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 15:54 ` Suren Baghdasaryan
0 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 15:54 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
t
On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > This looks good to me now code wise. Just a comment on the comments:
> > >
> > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > return now + group->poll_min_period;
> > > > }
> > > >
> > > > -/* Schedule polling if it's not already scheduled. */
> > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > + bool force)
> > > > {
> > > > struct task_struct *task;
> > > >
> > > > - /*
> > > > - * Do not reschedule if already scheduled.
> > > > - * Possible race with a timer scheduled after this check but before
> > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > - * will keep updates on schedule.
> > > > - */
> > > > - if (timer_pending(&group->poll_timer))
> > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > return;
> > >
> > > This explains what the code does, but not why. It would be good to
> > > explain the ordering with poll_work, here or there. But both sides
> > > should mention each other.
> >
> > How about this:
> >
> > /*
> > * atomic_xchg should be called even when !force to always set poll_scheduled
> > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > */
>
> The memory barrier part makes sense, but the first part says what the
> code does and the message is unclear to me. Are you worried somebody
> might turn this around in the future and only conditionalize on
> poll_scheduled when !force? Essentially, I don't see the downside of
> dropping that. But maybe I'm missing something.
Actually you are right. Originally I was worried that there might be a
case when poll_scheduled==0 and force==true and if someone flips the
conditions we will reschedule the timer but will not set
poll_scheduled back to 1. However I don't think this condition is
possible. We set force=true only when we skipped resetting
poll_schedule to 0 and on initial wakeup we always reset
poll_schedule. How about changing the comment to this:
/*
* atomic_xchg should be called even when !force to provide a
* full memory barrier (see the comment inside psi_poll_work).
*/
> /*
> * The xchg implies a full barrier that matches the one
> * in psi_poll_work() (see corresponding comment there).
> */
>
> > > > @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
> > > >
> > > > now = sched_clock();
> > > >
> > > > + if (now > group->polling_until) {
> > > > + /*
> > > > + * We are either about to start or might stop polling if no
> > > > + * state change was recorded. Resetting poll_scheduled leaves
> > > > + * a small window for psi_group_change to sneak in and schedule
> > > > + * an immegiate poll_work before we get to rescheduling. One
> > > > + * potential extra wakeup at the end of the polling window
> > > > + * should be negligible and polling_next_update still keeps
> > > > + * updates correctly on schedule.
> > > > + */
> > > > + atomic_set(&group->poll_scheduled, 0);
> > > > + /*
> > > > + * Ensure that operations of clearing group->poll_scheduled and
> > > > + * obtaining changed_states are not reordered.
> > > > + */
> > > > + smp_mb();
> > >
> > > Same here, it would be good to explain that this is ordering the
> > > scheduler with the timer such that no events are missed. Feel free to
> > > reuse my race diagram from the other thread - those are better at
> > > conveying the situation than freeform text.
> >
> > I tried to make your diagram a bit less abstract by using the actual
> > names. How about this?
> >
> > /*
> > * We need to enforce ordering between poll_scheduled and psi_group_cpu.times
> > * reads and writes in psi_poll_work and psi_group_change functions.
> > Otherwise we
> > * might fail to reschedule the timer when monitored states change:
> > *
> > * psi_poll_work:
> > * poll_scheduled = 0
> > * smp_mb()
> > * changed_states = collect_percpu_times()
> > * if changed_states && xchg(poll_scheduled, 1) == 0
> > * mod_timer()
>
> Those last two lines aren't relevant for the race, right? I'd leave
> those out to not distract from it.
They did help me illustrate the two failure cases but yeah, someone
who can read the code can derive the rest :)
>
> > * psi_group_change:
> > * record_times()
> > * smp_mb()
> > * if xchg(poll_scheduled, 1) == 0
> > * mod_timer()
>
> The reason I tend to keep these more abstract is because 1) the names
> of the functions change (I had already sent out patches to rename half
> the variable and function names in this diagram), while the
> architecture (task change vs poll worker) likely won't, and 2) because
> it's easy to drown out what the reads, writes, and thus the race
> condition is with code details and function call indirections.
Got it.
>
> How about a compromise?
>
> /*
> * A task change can race with the poll worker that is supposed to
> * report on it. To avoid missing events, ensure ordering between
> * poll_scheduled and the task state accesses, such that if the poll
> * worker misses the state update, the task change is guaranteed to
> * reschedule the poll worker:
> *
> * poll worker:
> * atomic_set(poll_scheduled, 0)
> * smp_mb()
> * LOAD states
> *
> * task change:
> * STORE states
> * if atomic_xchg(poll_scheduled, 1) == 0:
> * schedule poll worker
> *
> * The atomic_xchg() implies a full barrier.
> */
> smp_mb();
>
> This gives a high-level view of what's happening but it can still be
> mapped to the code by following the poll_scheduled variable.
This looks really good to me.
If you agree on the first comment modification, should I respin the
next version?
>
> > If we remove smp_mb barriers then there are the following possible
> > reordering cases:
> >
> > Case1: reordering in psi_poll_work
> > psi_poll_work psi_group_change
> > changed_states = collect_percpu_times()
> > record_times()
> > if xchg(poll_scheduled,
> > 1) == 0 <-- false
> > mod_timer()
> > poll_scheduled = 0
> > if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> > mod_timer()
> >
> > Case2: reordering in psi_group_change
> > psi_poll_work psi_group_change
> > if xchg(poll_scheduled,
> > 1) == 0 <-- false
> > mod_timer()
> > poll_scheduled = 0
> > changed_states = collect_percpu_times()
> > record_times()
> > if changed_states && xchg(poll_scheduled, 1) == 0 <-- changed_states is false
> > mod_timer()
> >
> > In both cases mod_timer() is not called, poll update is missed. But
> > describing this all in the comments would be an overkill IMHO.
> > WDYT?
>
> Yeah, I also think that's overkill. The failure cases can be derived
> from the concurrency diagram and explanation.
>
> Thanks
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
2021-07-08 15:54 ` Suren Baghdasaryan
(?)
@ 2021-07-08 18:38 ` Johannes Weiner
-1 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-07-08 18:38 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > This looks good to me now code wise. Just a comment on the comments:
> > > >
> > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > return now + group->poll_min_period;
> > > > > }
> > > > >
> > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > + bool force)
> > > > > {
> > > > > struct task_struct *task;
> > > > >
> > > > > - /*
> > > > > - * Do not reschedule if already scheduled.
> > > > > - * Possible race with a timer scheduled after this check but before
> > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > - * will keep updates on schedule.
> > > > > - */
> > > > > - if (timer_pending(&group->poll_timer))
> > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > return;
> > > >
> > > > This explains what the code does, but not why. It would be good to
> > > > explain the ordering with poll_work, here or there. But both sides
> > > > should mention each other.
> > >
> > > How about this:
> > >
> > > /*
> > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > */
> >
> > The memory barrier part makes sense, but the first part says what the
> > code does and the message is unclear to me. Are you worried somebody
> > might turn this around in the future and only conditionalize on
> > poll_scheduled when !force? Essentially, I don't see the downside of
> > dropping that. But maybe I'm missing something.
>
> Actually you are right. Originally I was worried that there might be a
> case when poll_scheduled==0 and force==true and if someone flips the
> conditions we will reschedule the timer but will not set
> poll_scheduled back to 1.
Oh I see.
Right, flipping the condition doesn't make sense because we need
poll_scheduled to be set when we go ahead - whether we're forcing or
not. I.e. if we were in a locked section, we'd write it like this:
if (poll_scheduled)
if (!force)
return;
else
poll_scheduled = 1;
> However I don't think this condition is possible. We set force=true
> only when we skipped resetting poll_schedule to 0 and on initial
> wakeup we always reset poll_schedule. How about changing the comment
> to this:
>
> /*
> * atomic_xchg should be called even when !force to provide a
> * full memory barrier (see the comment inside psi_poll_work).
> */
Personally, I still find this more confusing than no comment on
!force, because when you read it it sort of raises the question what
the alternatives would be. And the alternatives appear to be
nonsensical code rather than legitimate options.
But I won't insist if you prefer to leave it in. Your call.
> > /*
> > * A task change can race with the poll worker that is supposed to
> > * report on it. To avoid missing events, ensure ordering between
> > * poll_scheduled and the task state accesses, such that if the poll
> > * worker misses the state update, the task change is guaranteed to
> > * reschedule the poll worker:
> > *
> > * poll worker:
> > * atomic_set(poll_scheduled, 0)
> > * smp_mb()
> > * LOAD states
> > *
> > * task change:
> > * STORE states
> > * if atomic_xchg(poll_scheduled, 1) == 0:
> > * schedule poll worker
> > *
> > * The atomic_xchg() implies a full barrier.
> > */
> > smp_mb();
> >
> > This gives a high-level view of what's happening but it can still be
> > mapped to the code by following the poll_scheduled variable.
>
> This looks really good to me.
> If you agree on the first comment modification, should I respin the
> next version?
Yeah, sounds good to me!
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 18:38 ` Johannes Weiner
0 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-07-08 18:38 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > This looks good to me now code wise. Just a comment on the comments:
> > > >
> > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > return now + group->poll_min_period;
> > > > > }
> > > > >
> > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > + bool force)
> > > > > {
> > > > > struct task_struct *task;
> > > > >
> > > > > - /*
> > > > > - * Do not reschedule if already scheduled.
> > > > > - * Possible race with a timer scheduled after this check but before
> > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > - * will keep updates on schedule.
> > > > > - */
> > > > > - if (timer_pending(&group->poll_timer))
> > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > return;
> > > >
> > > > This explains what the code does, but not why. It would be good to
> > > > explain the ordering with poll_work, here or there. But both sides
> > > > should mention each other.
> > >
> > > How about this:
> > >
> > > /*
> > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > */
> >
> > The memory barrier part makes sense, but the first part says what the
> > code does and the message is unclear to me. Are you worried somebody
> > might turn this around in the future and only conditionalize on
> > poll_scheduled when !force? Essentially, I don't see the downside of
> > dropping that. But maybe I'm missing something.
>
> Actually you are right. Originally I was worried that there might be a
> case when poll_scheduled==0 and force==true and if someone flips the
> conditions we will reschedule the timer but will not set
> poll_scheduled back to 1.
Oh I see.
Right, flipping the condition doesn't make sense because we need
poll_scheduled to be set when we go ahead - whether we're forcing or
not. I.e. if we were in a locked section, we'd write it like this:
if (poll_scheduled)
if (!force)
return;
else
poll_scheduled = 1;
> However I don't think this condition is possible. We set force=true
> only when we skipped resetting poll_schedule to 0 and on initial
> wakeup we always reset poll_schedule. How about changing the comment
> to this:
>
> /*
> * atomic_xchg should be called even when !force to provide a
> * full memory barrier (see the comment inside psi_poll_work).
> */
Personally, I still find this more confusing than no comment on
!force, because when you read it it sort of raises the question what
the alternatives would be. And the alternatives appear to be
nonsensical code rather than legitimate options.
But I won't insist if you prefer to leave it in. Your call.
> > /*
> > * A task change can race with the poll worker that is supposed to
> > * report on it. To avoid missing events, ensure ordering between
> > * poll_scheduled and the task state accesses, such that if the poll
> > * worker misses the state update, the task change is guaranteed to
> > * reschedule the poll worker:
> > *
> > * poll worker:
> > * atomic_set(poll_scheduled, 0)
> > * smp_mb()
> > * LOAD states
> > *
> > * task change:
> > * STORE states
> > * if atomic_xchg(poll_scheduled, 1) == 0:
> > * schedule poll worker
> > *
> > * The atomic_xchg() implies a full barrier.
> > */
> > smp_mb();
> >
> > This gives a high-level view of what's happening but it can still be
> > mapped to the code by following the poll_scheduled variable.
>
> This looks really good to me.
> If you agree on the first comment modification, should I respin the
> next version?
Yeah, sounds good to me!
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 18:38 ` Johannes Weiner
0 siblings, 0 replies; 24+ messages in thread
From: Johannes Weiner @ 2021-07-08 18:38 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > This looks good to me now code wise. Just a comment on the comments:
> > > >
> > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > return now + group->poll_min_period;
> > > > > }
> > > > >
> > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > + bool force)
> > > > > {
> > > > > struct task_struct *task;
> > > > >
> > > > > - /*
> > > > > - * Do not reschedule if already scheduled.
> > > > > - * Possible race with a timer scheduled after this check but before
> > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > - * will keep updates on schedule.
> > > > > - */
> > > > > - if (timer_pending(&group->poll_timer))
> > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > return;
> > > >
> > > > This explains what the code does, but not why. It would be good to
> > > > explain the ordering with poll_work, here or there. But both sides
> > > > should mention each other.
> > >
> > > How about this:
> > >
> > > /*
> > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > */
> >
> > The memory barrier part makes sense, but the first part says what the
> > code does and the message is unclear to me. Are you worried somebody
> > might turn this around in the future and only conditionalize on
> > poll_scheduled when !force? Essentially, I don't see the downside of
> > dropping that. But maybe I'm missing something.
>
> Actually you are right. Originally I was worried that there might be a
> case when poll_scheduled==0 and force==true and if someone flips the
> conditions we will reschedule the timer but will not set
> poll_scheduled back to 1.
Oh I see.
Right, flipping the condition doesn't make sense because we need
poll_scheduled to be set when we go ahead - whether we're forcing or
not. I.e. if we were in a locked section, we'd write it like this:
if (poll_scheduled)
if (!force)
return;
else
poll_scheduled = 1;
> However I don't think this condition is possible. We set force=true
> only when we skipped resetting poll_schedule to 0 and on initial
> wakeup we always reset poll_schedule. How about changing the comment
> to this:
>
> /*
> * atomic_xchg should be called even when !force to provide a
> * full memory barrier (see the comment inside psi_poll_work).
> */
Personally, I still find this more confusing than no comment on
!force, because when you read it it sort of raises the question what
the alternatives would be. And the alternatives appear to be
nonsensical code rather than legitimate options.
But I won't insist if you prefer to leave it in. Your call.
> > /*
> > * A task change can race with the poll worker that is supposed to
> > * report on it. To avoid missing events, ensure ordering between
> > * poll_scheduled and the task state accesses, such that if the poll
> > * worker misses the state update, the task change is guaranteed to
> > * reschedule the poll worker:
> > *
> > * poll worker:
> > * atomic_set(poll_scheduled, 0)
> > * smp_mb()
> > * LOAD states
> > *
> > * task change:
> > * STORE states
> > * if atomic_xchg(poll_scheduled, 1) == 0:
> > * schedule poll worker
> > *
> > * The atomic_xchg() implies a full barrier.
> > */
> > smp_mb();
> >
> > This gives a high-level view of what's happening but it can still be
> > mapped to the code by following the poll_scheduled variable.
>
> This looks really good to me.
> If you agree on the first comment modification, should I respin the
> next version?
Yeah, sounds good to me!
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
2021-07-08 18:38 ` Johannes Weiner
(?)
@ 2021-07-08 19:55 ` Suren Baghdasaryan
-1 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 19:55 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 8, 2021 at 11:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> > On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > This looks good to me now code wise. Just a comment on the comments:
> > > > >
> > > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > > return now + group->poll_min_period;
> > > > > > }
> > > > > >
> > > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > > + bool force)
> > > > > > {
> > > > > > struct task_struct *task;
> > > > > >
> > > > > > - /*
> > > > > > - * Do not reschedule if already scheduled.
> > > > > > - * Possible race with a timer scheduled after this check but before
> > > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > > - * will keep updates on schedule.
> > > > > > - */
> > > > > > - if (timer_pending(&group->poll_timer))
> > > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > > return;
> > > > >
> > > > > This explains what the code does, but not why. It would be good to
> > > > > explain the ordering with poll_work, here or there. But both sides
> > > > > should mention each other.
> > > >
> > > > How about this:
> > > >
> > > > /*
> > > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > > */
> > >
> > > The memory barrier part makes sense, but the first part says what the
> > > code does and the message is unclear to me. Are you worried somebody
> > > might turn this around in the future and only conditionalize on
> > > poll_scheduled when !force? Essentially, I don't see the downside of
> > > dropping that. But maybe I'm missing something.
> >
> > Actually you are right. Originally I was worried that there might be a
> > case when poll_scheduled==0 and force==true and if someone flips the
> > conditions we will reschedule the timer but will not set
> > poll_scheduled back to 1.
>
> Oh I see.
>
> Right, flipping the condition doesn't make sense because we need
> poll_scheduled to be set when we go ahead - whether we're forcing or
> not. I.e. if we were in a locked section, we'd write it like this:
>
> if (poll_scheduled)
> if (!force)
> return;
> else
> poll_scheduled = 1;
>
> > However I don't think this condition is possible. We set force=true
> > only when we skipped resetting poll_schedule to 0 and on initial
> > wakeup we always reset poll_schedule. How about changing the comment
> > to this:
> >
> > /*
> > * atomic_xchg should be called even when !force to provide a
> > * full memory barrier (see the comment inside psi_poll_work).
> > */
>
> Personally, I still find this more confusing than no comment on
> !force, because when you read it it sort of raises the question what
> the alternatives would be. And the alternatives appear to be
> nonsensical code rather than legitimate options.
>
> But I won't insist if you prefer to leave it in. Your call.
I would like to keep it as a precaution, if you don't mind. In case
someone in the future thinks about "optimizing" this by flipping the
condition, hopefully the comment will give them a pause to think about
it :)
>
> > > /*
> > > * A task change can race with the poll worker that is supposed to
> > > * report on it. To avoid missing events, ensure ordering between
> > > * poll_scheduled and the task state accesses, such that if the poll
> > > * worker misses the state update, the task change is guaranteed to
> > > * reschedule the poll worker:
> > > *
> > > * poll worker:
> > > * atomic_set(poll_scheduled, 0)
> > > * smp_mb()
> > > * LOAD states
> > > *
> > > * task change:
> > > * STORE states
> > > * if atomic_xchg(poll_scheduled, 1) == 0:
> > > * schedule poll worker
> > > *
> > > * The atomic_xchg() implies a full barrier.
> > > */
> > > smp_mb();
> > >
> > > This gives a high-level view of what's happening but it can still be
> > > mapped to the code by following the poll_scheduled variable.
> >
> > This looks really good to me.
> > If you agree on the first comment modification, should I respin the
> > next version?
>
> Yeah, sounds good to me!
Thanks! I'll post an update shortly.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 19:55 ` Suren Baghdasaryan
0 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 19:55 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 8, 2021 at 11:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> > On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > This looks good to me now code wise. Just a comment on the comments:
> > > > >
> > > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > > return now + group->poll_min_period;
> > > > > > }
> > > > > >
> > > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > > + bool force)
> > > > > > {
> > > > > > struct task_struct *task;
> > > > > >
> > > > > > - /*
> > > > > > - * Do not reschedule if already scheduled.
> > > > > > - * Possible race with a timer scheduled after this check but before
> > > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > > - * will keep updates on schedule.
> > > > > > - */
> > > > > > - if (timer_pending(&group->poll_timer))
> > > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > > return;
> > > > >
> > > > > This explains what the code does, but not why. It would be good to
> > > > > explain the ordering with poll_work, here or there. But both sides
> > > > > should mention each other.
> > > >
> > > > How about this:
> > > >
> > > > /*
> > > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > > */
> > >
> > > The memory barrier part makes sense, but the first part says what the
> > > code does and the message is unclear to me. Are you worried somebody
> > > might turn this around in the future and only conditionalize on
> > > poll_scheduled when !force? Essentially, I don't see the downside of
> > > dropping that. But maybe I'm missing something.
> >
> > Actually you are right. Originally I was worried that there might be a
> > case when poll_scheduled==0 and force==true and if someone flips the
> > conditions we will reschedule the timer but will not set
> > poll_scheduled back to 1.
>
> Oh I see.
>
> Right, flipping the condition doesn't make sense because we need
> poll_scheduled to be set when we go ahead - whether we're forcing or
> not. I.e. if we were in a locked section, we'd write it like this:
>
> if (poll_scheduled)
> if (!force)
> return;
> else
> poll_scheduled = 1;
>
> > However I don't think this condition is possible. We set force=true
> > only when we skipped resetting poll_schedule to 0 and on initial
> > wakeup we always reset poll_schedule. How about changing the comment
> > to this:
> >
> > /*
> > * atomic_xchg should be called even when !force to provide a
> > * full memory barrier (see the comment inside psi_poll_work).
> > */
>
> Personally, I still find this more confusing than no comment on
> !force, because when you read it it sort of raises the question what
> the alternatives would be. And the alternatives appear to be
> nonsensical code rather than legitimate options.
>
> But I won't insist if you prefer to leave it in. Your call.
I would like to keep it as a precaution, if you don't mind. In case
someone in the future thinks about "optimizing" this by flipping the
condition, hopefully the comment will give them a pause to think about
it :)
>
> > > /*
> > > * A task change can race with the poll worker that is supposed to
> > > * report on it. To avoid missing events, ensure ordering between
> > > * poll_scheduled and the task state accesses, such that if the poll
> > > * worker misses the state update, the task change is guaranteed to
> > > * reschedule the poll worker:
> > > *
> > > * poll worker:
> > > * atomic_set(poll_scheduled, 0)
> > > * smp_mb()
> > > * LOAD states
> > > *
> > > * task change:
> > > * STORE states
> > > * if atomic_xchg(poll_scheduled, 1) == 0:
> > > * schedule poll worker
> > > *
> > > * The atomic_xchg() implies a full barrier.
> > > */
> > > smp_mb();
> > >
> > > This gives a high-level view of what's happening but it can still be
> > > mapped to the code by following the poll_scheduled variable.
> >
> > This looks really good to me.
> > If you agree on the first comment modification, should I respin the
> > next version?
>
> Yeah, sounds good to me!
Thanks! I'll post an update shortly.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 19:55 ` Suren Baghdasaryan
0 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 19:55 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 8, 2021 at 11:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> > On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > This looks good to me now code wise. Just a comment on the comments:
> > > > >
> > > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > > return now + group->poll_min_period;
> > > > > > }
> > > > > >
> > > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > > + bool force)
> > > > > > {
> > > > > > struct task_struct *task;
> > > > > >
> > > > > > - /*
> > > > > > - * Do not reschedule if already scheduled.
> > > > > > - * Possible race with a timer scheduled after this check but before
> > > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > > - * will keep updates on schedule.
> > > > > > - */
> > > > > > - if (timer_pending(&group->poll_timer))
> > > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > > return;
> > > > >
> > > > > This explains what the code does, but not why. It would be good to
> > > > > explain the ordering with poll_work, here or there. But both sides
> > > > > should mention each other.
> > > >
> > > > How about this:
> > > >
> > > > /*
> > > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > > */
> > >
> > > The memory barrier part makes sense, but the first part says what the
> > > code does and the message is unclear to me. Are you worried somebody
> > > might turn this around in the future and only conditionalize on
> > > poll_scheduled when !force? Essentially, I don't see the downside of
> > > dropping that. But maybe I'm missing something.
> >
> > Actually you are right. Originally I was worried that there might be a
> > case when poll_scheduled==0 and force==true and if someone flips the
> > conditions we will reschedule the timer but will not set
> > poll_scheduled back to 1.
>
> Oh I see.
>
> Right, flipping the condition doesn't make sense because we need
> poll_scheduled to be set when we go ahead - whether we're forcing or
> not. I.e. if we were in a locked section, we'd write it like this:
>
> if (poll_scheduled)
> if (!force)
> return;
> else
> poll_scheduled = 1;
>
> > However I don't think this condition is possible. We set force=true
> > only when we skipped resetting poll_schedule to 0 and on initial
> > wakeup we always reset poll_schedule. How about changing the comment
> > to this:
> >
> > /*
> > * atomic_xchg should be called even when !force to provide a
> > * full memory barrier (see the comment inside psi_poll_work).
> > */
>
> Personally, I still find this more confusing than no comment on
> !force, because when you read it it sort of raises the question what
> the alternatives would be. And the alternatives appear to be
> nonsensical code rather than legitimate options.
>
> But I won't insist if you prefer to leave it in. Your call.
I would like to keep it as a precaution, if you don't mind. In case
someone in the future thinks about "optimizing" this by flipping the
condition, hopefully the comment will give them a pause to think about
it :)
>
> > > /*
> > > * A task change can race with the poll worker that is supposed to
> > > * report on it. To avoid missing events, ensure ordering between
> > > * poll_scheduled and the task state accesses, such that if the poll
> > > * worker misses the state update, the task change is guaranteed to
> > > * reschedule the poll worker:
> > > *
> > > * poll worker:
> > > * atomic_set(poll_scheduled, 0)
> > > * smp_mb()
> > > * LOAD states
> > > *
> > > * task change:
> > > * STORE states
> > > * if atomic_xchg(poll_scheduled, 1) == 0:
> > > * schedule poll worker
> > > *
> > > * The atomic_xchg() implies a full barrier.
> > > */
> > > smp_mb();
> > >
> > > This gives a high-level view of what's happening but it can still be
> > > mapped to the code by following the poll_scheduled variable.
> >
> > This looks really good to me.
> > If you agree on the first comment modification, should I respin the
> > next version?
>
> Yeah, sounds good to me!
Thanks! I'll post an update shortly.
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
2021-07-08 19:55 ` Suren Baghdasaryan
(?)
@ 2021-07-08 20:37 ` Suren Baghdasaryan
-1 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 20:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 8, 2021 at 12:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Jul 8, 2021 at 11:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> > > On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > > This looks good to me now code wise. Just a comment on the comments:
> > > > > >
> > > > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > > > return now + group->poll_min_period;
> > > > > > > }
> > > > > > >
> > > > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > > > + bool force)
> > > > > > > {
> > > > > > > struct task_struct *task;
> > > > > > >
> > > > > > > - /*
> > > > > > > - * Do not reschedule if already scheduled.
> > > > > > > - * Possible race with a timer scheduled after this check but before
> > > > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > > > - * will keep updates on schedule.
> > > > > > > - */
> > > > > > > - if (timer_pending(&group->poll_timer))
> > > > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > > > return;
> > > > > >
> > > > > > This explains what the code does, but not why. It would be good to
> > > > > > explain the ordering with poll_work, here or there. But both sides
> > > > > > should mention each other.
> > > > >
> > > > > How about this:
> > > > >
> > > > > /*
> > > > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > > > */
> > > >
> > > > The memory barrier part makes sense, but the first part says what the
> > > > code does and the message is unclear to me. Are you worried somebody
> > > > might turn this around in the future and only conditionalize on
> > > > poll_scheduled when !force? Essentially, I don't see the downside of
> > > > dropping that. But maybe I'm missing something.
> > >
> > > Actually you are right. Originally I was worried that there might be a
> > > case when poll_scheduled==0 and force==true and if someone flips the
> > > conditions we will reschedule the timer but will not set
> > > poll_scheduled back to 1.
> >
> > Oh I see.
> >
> > Right, flipping the condition doesn't make sense because we need
> > poll_scheduled to be set when we go ahead - whether we're forcing or
> > not. I.e. if we were in a locked section, we'd write it like this:
> >
> > if (poll_scheduled)
> > if (!force)
> > return;
> > else
> > poll_scheduled = 1;
> >
> > > However I don't think this condition is possible. We set force=true
> > > only when we skipped resetting poll_schedule to 0 and on initial
> > > wakeup we always reset poll_schedule. How about changing the comment
> > > to this:
> > >
> > > /*
> > > * atomic_xchg should be called even when !force to provide a
> > > * full memory barrier (see the comment inside psi_poll_work).
> > > */
> >
> > Personally, I still find this more confusing than no comment on
> > !force, because when you read it it sort of raises the question what
> > the alternatives would be. And the alternatives appear to be
> > nonsensical code rather than legitimate options.
> >
> > But I won't insist if you prefer to leave it in. Your call.
>
> I would like to keep it as a precaution, if you don't mind. In case
> someone in the future thinks about "optimizing" this by flipping the
> condition, hopefully the comment will give them a pause to think about
> it :)
>
> >
> > > > /*
> > > > * A task change can race with the poll worker that is supposed to
> > > > * report on it. To avoid missing events, ensure ordering between
> > > > * poll_scheduled and the task state accesses, such that if the poll
> > > > * worker misses the state update, the task change is guaranteed to
> > > > * reschedule the poll worker:
> > > > *
> > > > * poll worker:
> > > > * atomic_set(poll_scheduled, 0)
> > > > * smp_mb()
> > > > * LOAD states
> > > > *
> > > > * task change:
> > > > * STORE states
> > > > * if atomic_xchg(poll_scheduled, 1) == 0:
> > > > * schedule poll worker
> > > > *
> > > > * The atomic_xchg() implies a full barrier.
> > > > */
> > > > smp_mb();
> > > >
> > > > This gives a high-level view of what's happening but it can still be
> > > > mapped to the code by following the poll_scheduled variable.
> > >
> > > This looks really good to me.
> > > If you agree on the first comment modification, should I respin the
> > > next version?
> >
> > Yeah, sounds good to me!
>
> Thanks! I'll post an update shortly.
v4 is posted at https://lore.kernel.org/patchwork/patch/1455172/
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 20:37 ` Suren Baghdasaryan
0 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 20:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 8, 2021 at 12:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Jul 8, 2021 at 11:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> > > On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > > This looks good to me now code wise. Just a comment on the comments:
> > > > > >
> > > > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > > > return now + group->poll_min_period;
> > > > > > > }
> > > > > > >
> > > > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > > > + bool force)
> > > > > > > {
> > > > > > > struct task_struct *task;
> > > > > > >
> > > > > > > - /*
> > > > > > > - * Do not reschedule if already scheduled.
> > > > > > > - * Possible race with a timer scheduled after this check but before
> > > > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > > > - * will keep updates on schedule.
> > > > > > > - */
> > > > > > > - if (timer_pending(&group->poll_timer))
> > > > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > > > return;
> > > > > >
> > > > > > This explains what the code does, but not why. It would be good to
> > > > > > explain the ordering with poll_work, here or there. But both sides
> > > > > > should mention each other.
> > > > >
> > > > > How about this:
> > > > >
> > > > > /*
> > > > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > > > */
> > > >
> > > > The memory barrier part makes sense, but the first part says what the
> > > > code does and the message is unclear to me. Are you worried somebody
> > > > might turn this around in the future and only conditionalize on
> > > > poll_scheduled when !force? Essentially, I don't see the downside of
> > > > dropping that. But maybe I'm missing something.
> > >
> > > Actually you are right. Originally I was worried that there might be a
> > > case when poll_scheduled==0 and force==true and if someone flips the
> > > conditions we will reschedule the timer but will not set
> > > poll_scheduled back to 1.
> >
> > Oh I see.
> >
> > Right, flipping the condition doesn't make sense because we need
> > poll_scheduled to be set when we go ahead - whether we're forcing or
> > not. I.e. if we were in a locked section, we'd write it like this:
> >
> > if (poll_scheduled)
> > if (!force)
> > return;
> > else
> > poll_scheduled = 1;
> >
> > > However I don't think this condition is possible. We set force=true
> > > only when we skipped resetting poll_schedule to 0 and on initial
> > > wakeup we always reset poll_schedule. How about changing the comment
> > > to this:
> > >
> > > /*
> > > * atomic_xchg should be called even when !force to provide a
> > > * full memory barrier (see the comment inside psi_poll_work).
> > > */
> >
> > Personally, I still find this more confusing than no comment on
> > !force, because when you read it it sort of raises the question what
> > the alternatives would be. And the alternatives appear to be
> > nonsensical code rather than legitimate options.
> >
> > But I won't insist if you prefer to leave it in. Your call.
>
> I would like to keep it as a precaution, if you don't mind. In case
> someone in the future thinks about "optimizing" this by flipping the
> condition, hopefully the comment will give them a pause to think about
> it :)
>
> >
> > > > /*
> > > > * A task change can race with the poll worker that is supposed to
> > > > * report on it. To avoid missing events, ensure ordering between
> > > > * poll_scheduled and the task state accesses, such that if the poll
> > > > * worker misses the state update, the task change is guaranteed to
> > > > * reschedule the poll worker:
> > > > *
> > > > * poll worker:
> > > > * atomic_set(poll_scheduled, 0)
> > > > * smp_mb()
> > > > * LOAD states
> > > > *
> > > > * task change:
> > > > * STORE states
> > > > * if atomic_xchg(poll_scheduled, 1) == 0:
> > > > * schedule poll worker
> > > > *
> > > > * The atomic_xchg() implies a full barrier.
> > > > */
> > > > smp_mb();
> > > >
> > > > This gives a high-level view of what's happening but it can still be
> > > > mapped to the code by following the poll_scheduled variable.
> > >
> > > This looks really good to me.
> > > If you agree on the first comment modification, should I respin the
> > > next version?
> >
> > Yeah, sounds good to me!
>
> Thanks! I'll post an update shortly.
v4 is posted at https://lore.kernel.org/patchwork/patch/1455172/
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
@ 2021-07-08 20:37 ` Suren Baghdasaryan
0 siblings, 0 replies; 24+ messages in thread
From: Suren Baghdasaryan @ 2021-07-08 20:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Benjamin Segall, Mel Gorman,
Daniel Bristot de Oliveira, matthias.bgg, Minchan Kim,
Tim Murray, YT Chang, Wenju Xu (许文举),
Jonathan JMChen (陳家明),
LKML, linux-arm-kernel, linux-mediatek, kernel-team, SH Chen
On Thu, Jul 8, 2021 at 12:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Jul 8, 2021 at 11:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> > > On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > > This looks good to me now code wise. Just a comment on the comments:
> > > > > >
> > > > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > > > return now + group->poll_min_period;
> > > > > > > }
> > > > > > >
> > > > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > > > + bool force)
> > > > > > > {
> > > > > > > struct task_struct *task;
> > > > > > >
> > > > > > > - /*
> > > > > > > - * Do not reschedule if already scheduled.
> > > > > > > - * Possible race with a timer scheduled after this check but before
> > > > > > > - * mod_timer below can be tolerated because group->polling_next_update
> > > > > > > - * will keep updates on schedule.
> > > > > > > - */
> > > > > > > - if (timer_pending(&group->poll_timer))
> > > > > > > + /* xchg should be called even when !force to set poll_scheduled */
> > > > > > > + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > > > return;
> > > > > >
> > > > > > This explains what the code does, but not why. It would be good to
> > > > > > explain the ordering with poll_work, here or there. But both sides
> > > > > > should mention each other.
> > > > >
> > > > > How about this:
> > > > >
> > > > > /*
> > > > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > > > */
> > > >
> > > > The memory barrier part makes sense, but the first part says what the
> > > > code does and the message is unclear to me. Are you worried somebody
> > > > might turn this around in the future and only conditionalize on
> > > > poll_scheduled when !force? Essentially, I don't see the downside of
> > > > dropping that. But maybe I'm missing something.
> > >
> > > Actually you are right. Originally I was worried that there might be a
> > > case when poll_scheduled==0 and force==true and if someone flips the
> > > conditions we will reschedule the timer but will not set
> > > poll_scheduled back to 1.
> >
> > Oh I see.
> >
> > Right, flipping the condition doesn't make sense because we need
> > poll_scheduled to be set when we go ahead - whether we're forcing or
> > not. I.e. if we were in a locked section, we'd write it like this:
> >
> > if (poll_scheduled)
> > if (!force)
> > return;
> > else
> > poll_scheduled = 1;
> >
> > > However I don't think this condition is possible. We set force=true
> > > only when we skipped resetting poll_schedule to 0 and on initial
> > > wakeup we always reset poll_schedule. How about changing the comment
> > > to this:
> > >
> > > /*
> > > * atomic_xchg should be called even when !force to provide a
> > > * full memory barrier (see the comment inside psi_poll_work).
> > > */
> >
> > Personally, I still find this more confusing than no comment on
> > !force, because when you read it it sort of raises the question what
> > the alternatives would be. And the alternatives appear to be
> > nonsensical code rather than legitimate options.
> >
> > But I won't insist if you prefer to leave it in. Your call.
>
> I would like to keep it as a precaution, if you don't mind. In case
> someone in the future thinks about "optimizing" this by flipping the
> condition, hopefully the comment will give them a pause to think about
> it :)
>
> >
> > > > /*
> > > > * A task change can race with the poll worker that is supposed to
> > > > * report on it. To avoid missing events, ensure ordering between
> > > > * poll_scheduled and the task state accesses, such that if the poll
> > > > * worker misses the state update, the task change is guaranteed to
> > > > * reschedule the poll worker:
> > > > *
> > > > * poll worker:
> > > > * atomic_set(poll_scheduled, 0)
> > > > * smp_mb()
> > > > * LOAD states
> > > > *
> > > > * task change:
> > > > * STORE states
> > > > * if atomic_xchg(poll_scheduled, 1) == 0:
> > > > * schedule poll worker
> > > > *
> > > > * The atomic_xchg() implies a full barrier.
> > > > */
> > > > smp_mb();
> > > >
> > > > This gives a high-level view of what's happening but it can still be
> > > > mapped to the code by following the poll_scheduled variable.
> > >
> > > This looks really good to me.
> > > If you agree on the first comment modification, should I respin the
> > > next version?
> >
> > Yeah, sounds good to me!
>
> Thanks! I'll post an update shortly.
v4 is posted at https://lore.kernel.org/patchwork/patch/1455172/
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 24+ messages in thread