Date: Sun, 14 Dec 2014 19:04:11 +0100
From: Frederic Weisbecker
To: Ingo Molnar
Cc: Linus Torvalds, Dave Jones, Chris Mason, Mike Galbraith,
	Peter Zijlstra, Dâniel Fraga, Sasha Levin,
	"Paul E. McKenney", Linux Kernel Mailing List
Subject: Re: [PATCH] sched: Fix lost reschedule in __cond_resched()
Message-ID: <20141214180409.GB12622@lerouge>
References: <20141205171501.GA1320@redhat.com>
	<1417806247.4845.1@mail.thefacebook.com>
	<20141211145408.GB16800@redhat.com>
	<20141212185454.GB4716@redhat.com>
	<20141213073634.GD32572@gmail.com>
In-Reply-To: <20141213073634.GD32572@gmail.com>

On Sat, Dec 13, 2014 at 08:36:34AM +0100, Ingo Molnar wrote:
> 
> * Linus Torvalds wrote:
> 
> > I'm also not sure if the bug ever happens with preemption
> > disabled. Sasha, was that you who reported that you cannot
> > reproduce it without preemption? It strikes me that there's a
> > race condition in __cond_resched() wrt preemption, for example:
> > we do
> > 
> >         __preempt_count_add(PREEMPT_ACTIVE);
> >         __schedule();
> >         __preempt_count_sub(PREEMPT_ACTIVE);
> > 
> > and in between the __schedule() and __preempt_count_sub(), if
> > an interrupt comes in and wakes up some important process, it
> > won't reschedule (because preemption is active), but then we
> > enable preemption again and don't check whether we should
> > reschedule (again), and we just go on our merry ways.
> 
> Indeed, that's a really good find regardless of whether it's the
> source of these lockups - the (untested) patch below ought to
> cure that.
> 
> > Now, I don't see how that could really matter for a long time -
> > returning to user space will check need_resched, and sleeping
> > will obviously force a reschedule anyway, so these kinds of
> > races should at most delay things by just a tiny amount, but
> > maybe there is some case where we screw up in a bigger way. So
> > I do *not* believe that the one in __cond_resched() matters,
> > but I'm giving it as an example of the kind of things that
> > could go wrong.
> 
> (as you later note) NOHZ is somewhat special in this regard,
> because there we try really hard not to run anything
> periodically, so a lost reschedule will matter more.
> 
> But ... I'd be surprised if this patch made a difference: it
> should normally not be possible to go idle with tasks on the
> runqueue (even with this bug present), and with at least one busy
> task on the CPU we get the regular scheduler tick which ought to
> hide such latencies.
> 
> It's nevertheless a good thing to fix, I'm just not sure it's the
> root cause of the observed lockup here.
> 
> Thanks,
> 
> 	Ingo
> 
> --
> 
> Reported-by: Linus Torvalds
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bb398c0c5f08..532809aa0544 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4207,6 +4207,8 @@ static void __cond_resched(void)
>  	__preempt_count_add(PREEMPT_ACTIVE);
>  	__schedule();
>  	__preempt_count_sub(PREEMPT_ACTIVE);
> +	if (need_resched())
> +		__schedule();
>  }
>  
>  int __sched _cond_resched(void)
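
To make the window concrete, here is the sequence quoted above as
plain C with the race point marked inline (the comments are editorial
annotation of Linus's description, not kernel source):

	__preempt_count_add(PREEMPT_ACTIVE);
	__schedule();
	/*
	 * An IRQ landing here can wake a task: the wakeup sets
	 * TIF_NEED_RESCHED, but the IRQ-return path sees a non-zero
	 * preempt count (PREEMPT_ACTIVE) and skips preemption.
	 */
	__preempt_count_sub(PREEMPT_ACTIVE);
	/*
	 * Preemption is allowed again, but need_resched() is never
	 * re-checked, so the wakeup is only serviced at the next
	 * scheduling point.
	 */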
Nice catch! This indeed matters a lot for full nohz, where a lost
reschedule interrupt may simply be ignored rather than papered over
by an upcoming tick. And even when a tick does recover it, a
reschedule delayed by up to a full tick is still a latency problem.

That said, the new __schedule() above should probably still run as a
preemption point (under PREEMPT_ACTIVE), so that a caller in
TASK_[UN]INTERRUPTIBLE state is handled as expected rather than
being deactivated early. Such as:

Signed-off-by: Frederic Weisbecker
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240157c..6e942f3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2922,6 +2922,21 @@ void __sched schedule_preempt_disabled(void)
 	preempt_disable();
 }
 
+static void __preempt_schedule(void)
+{
+	do {
+		__preempt_count_add(PREEMPT_ACTIVE);
+		__schedule();
+		__preempt_count_sub(PREEMPT_ACTIVE);
+
+		/*
+		 * Check again in case we missed a preemption opportunity
+		 * between schedule and now.
+		 */
+		barrier();
+	} while (need_resched());
+}
+
 #ifdef CONFIG_PREEMPT
 /*
  * this is the entry point to schedule() from in-kernel preemption
@@ -2937,17 +2952,7 @@ asmlinkage __visible void __sched notrace preempt_schedule(void)
 	if (likely(!preemptible()))
 		return;
 
-	do {
-		__preempt_count_add(PREEMPT_ACTIVE);
-		__schedule();
-		__preempt_count_sub(PREEMPT_ACTIVE);
-
-		/*
-		 * Check again in case we missed a preemption opportunity
-		 * between schedule and now.
-		 */
-		barrier();
-	} while (need_resched());
+	__preempt_schedule();
 }
 NOKPROBE_SYMBOL(preempt_schedule);
 EXPORT_SYMBOL(preempt_schedule);
@@ -4249,9 +4254,7 @@ SYSCALL_DEFINE0(sched_yield)
 
 static void __cond_resched(void)
 {
-	__preempt_count_add(PREEMPT_ACTIVE);
-	__schedule();
-	__preempt_count_sub(PREEMPT_ACTIVE);
+	__preempt_schedule();
 }
 
 int __sched _cond_resched(void)
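
Two properties of this version are worth spelling out: the do/while
loop re-checks need_resched() after PREEMPT_ACTIVE is dropped, so a
wakeup that raced with __schedule() is serviced before returning; and
every __schedule() call runs under PREEMPT_ACTIVE, so a caller that is
TASK_[UN]INTERRUPTIBLE is treated as preempted instead of being
deactivated off the runqueue.

As an illustrative sketch of the kind of caller this protects (not
from the patch; process_item() and nr_items are hypothetical names):

	/*
	 * Long kernel-side loop; cond_resched() must not swallow a
	 * wakeup that arrives while we are inside __schedule().
	 */
	for (i = 0; i < nr_items; i++) {
		process_item(&items[i]);	/* hypothetical helper */
		cond_resched();
	}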