From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Davidlohr Bueso <dave@stgolabs.net>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: Fix data-race in wakeup
Date: Tue, 17 Nov 2020 10:29:36 +0100
Message-ID: <20201117092936.GA3121406@hirez.programming.kicks-ass.net>
In-Reply-To: <20201117091545.GA31837@willie-the-truck>

On Tue, Nov 17, 2020 at 09:15:46AM +0000, Will Deacon wrote:
> On Tue, Nov 17, 2020 at 09:30:16AM +0100, Peter Zijlstra wrote:
> > Subject: sched: Fix data-race in wakeup
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Tue Nov 17 09:08:41 CET 2020
> >
> > Mel reported that on some ARM64 platforms loadavg goes bananas and
> > tracked it down to the following data race:
> >
> >   CPU0                                  CPU1
> >
> >   schedule()
> >     prev->sched_contributes_to_load = X;
> >     deactivate_task(prev);
> >
> >                                         try_to_wake_up()
> >                                           if (p->on_rq && ...) // false
> >                                             if (smp_load_acquire(&p->on_cpu) && // true
> >                                                 ttwu_queue_wakelist())
> >                                               p->sched_remote_wakeup = Y;
> >
> >     smp_store_release(prev->on_cpu, 0);
>
> (nit: I suggested this race over at [1] ;)

Ah, I'll amend and get you a Debugged-by line or something ;-)

> > where both p->sched_contributes_to_load and p->sched_remote_wakeup are
> > in the same word, and thus the stores X and Y race (and can clobber
> > one another's data).
> >
> > Whereas prior to commit c6e7bd7afaeb ("sched/core: Optimize ttwu()
> > spinning on p->on_cpu") the p->on_cpu handoff serialized access to
> > p->sched_remote_wakeup (just as it still does with
> > p->sched_contributes_to_load), that commit broke it by calling
> > ttwu_queue_wakelist() with p->on_cpu != 0.
> >
> > However, due to
> >
> >   p->XXX = X;                           ttwu()
> >   schedule()                              if (p->on_rq && ...) // false
> >     smp_mb__after_spinlock();             if (smp_load_acquire(&p->on_cpu) &&
> >     deactivate_task()                         ttwu_queue_wakelist())
> >       p->on_rq = 0;                           p->sched_remote_wakeup = X;
> >
> > we can be sure any 'current' store is complete and 'current' is
> > guaranteed asleep. Therefore we can move p->sched_remote_wakeup into
> > the 'current' flags word.
> >
> > Note: while the observed failure was loadavg accounting gone wrong due
> > to ttwu() clobbering p->sched_contributes_to_load, the reverse problem
> > is also possible where schedule() clobbers p->sched_remote_wakeup;
> > this could result in enqueue_entity() wrecking ->vruntime and causing
> > scheduling artifacts.
> >
> > Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
> > Reported-by: Mel Gorman <mgorman@techsingularity.net>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/linux/sched.h |   13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -775,7 +775,6 @@ struct task_struct {
> >  	unsigned			sched_reset_on_fork:1;
> >  	unsigned			sched_contributes_to_load:1;
> >  	unsigned			sched_migrated:1;
> > -	unsigned			sched_remote_wakeup:1;
> >  #ifdef CONFIG_PSI
> >  	unsigned			sched_psi_wake_requeue:1;
> >  #endif
> > @@ -785,6 +784,18 @@ struct task_struct {
> >
> >  	/* Unserialized, strictly 'current' */
> >
> > +	/*
> > +	 * p->in_iowait = 1;		ttwu()
> > +	 * schedule()			  if (p->on_rq && ...) // false
> > +	 *   smp_mb__after_spinlock();	  if (smp_load_acquire(&p->on_cpu) && // true
> > +	 *   deactivate_task()		      ttwu_queue_wakelist())
> > +	 *     p->on_rq = 0;		        p->sched_remote_wakeup = X;
> > +	 *
> > +	 * Guarantees all stores of 'current' are visible before
> > +	 * ->sched_remote_wakeup gets used.
>
> I'm still not sure this is particularly clear -- don't we want to highlight
> that the store of p->on_rq is unordered wrt the update to
> p->sched_contributes_to_load in deactivate_task()?

I can explicitly call that out, I suppose.

> I dislike bitfields with a passion, but the fix looks good:

I don't particularly hate them; they're just a flag field with names on
(in this case).

> Acked-by: Will Deacon <will@kernel.org>

Thanks!

> Now the million dollar question is why KCSAN hasn't run into this.

Hrmph,

  kernel/sched/Makefile:KCSAN_SANITIZE := n

might have something to do with that, I suppose.