From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754984AbcIEIhN (ORCPT ); Mon, 5 Sep 2016 04:37:13 -0400 Received: from mail-yw0-f195.google.com ([209.85.161.195]:33898 "EHLO mail-yw0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752268AbcIEIhL (ORCPT ); Mon, 5 Sep 2016 04:37:11 -0400 MIME-Version: 1.0 In-Reply-To: <20160905074858.GP10153@twins.programming.kicks-ass.net> References: <33872563-f4bb-0b01-6848-aaa8482aa91f@gmail.com> <1473059659.2313.45.camel@kernel.crashing.org> <20160905074858.GP10153@twins.programming.kicks-ass.net> From: Balbir Singh Date: Mon, 5 Sep 2016 18:37:10 +1000 Message-ID: Subject: Re: [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task To: Peter Zijlstra Cc: Benjamin Herrenschmidt , LKML , Oleg Nesterov , Nicholas Piggin , Alexey Kardashevskiy Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Sep 5, 2016 at 5:48 PM, Peter Zijlstra wrote: > On Mon, Sep 05, 2016 at 05:14:19PM +1000, Benjamin Herrenschmidt wrote: >> On Mon, 2016-09-05 at 13:16 +1000, Balbir Singh wrote: >> >> .../... >> > >> > Cc: Peter Zijlstra >> > Cc: Nicholas Piggin >> >> Acked-by: Benjamin Herrenschmidt >> >> > Signed-off-by: Balbir Singh >> > --- >> > kernel/sched/core.c | 11 +++++++++++ >> > 1 file changed, 11 insertions(+) >> > >> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> > index 2a906f2..582c684 100644 >> > --- a/kernel/sched/core.c >> > +++ b/kernel/sched/core.c >> > @@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned >> > int state, int wake_flags) >> > success = 1; /* we're going to change ->state */ >> > cpu = task_cpu(p); >> > >> > + /* >> > + * Ensure we see on_rq and p_state consistently >> > + * >> > + * For example in __rwsem_down_write_failed(), we have >> > + * [S] ->on_rq = 1 [L] ->state >> > + * MB RMB >> > + * [S] ->state = TASK_UNINTERRUPTIBLE [L] ->on_rq >> > + * In the absence of the RMB p->on_rq can be observed to be 0 >> > + * and we end up spinning indefinitely in while (p->on_cpu) >> > + */ > > So I did replace that comment with the one I proposed earlier. I checked > a fair number of architectures and many did not have an obvious barrier > in switch_to(). So that is not something we can rely on, nor do we need > to I think. > Thanks for the comment edit and thanks for letting us know. Balbir Singh