Message-Id: <20150605084836.364306429@infradead.org>
Date: Fri, 05 Jun 2015 10:48:36 +0200
From: Peter Zijlstra
To: umgwanakikbuti@gmail.com, mingo@elte.hu
Cc: ktkhai@parallels.com, rostedt@goodmis.org, tglx@linutronix.de,
    juri.lelli@gmail.com, pang.xunlei@linaro.org, oleg@redhat.com,
    wanpeng.li@linux.intel.com, linux-kernel@vger.kernel.org,
    peterz@infradead.org
Subject: [PATCH 00/14] sched: balance callbacks
X-Mailing-List: linux-kernel@vger.kernel.org

Mike stumbled over a cute bug in the RT/DL balancing ops.

The exact scenario is __sched_setscheduler() changing a (runnable) task
from FIFO to OTHER. In switched_from_rt(), where we do pull_rt_task(), we
temporarily drop rq->lock. This gap allows regular cfs load-balancing to
step in and migrate our task.

However, check_class_changed() will happily continue with
switched_to_fair(), which assumes our task is still on the old rq, and
makes the kernel go boom.

Instead of trying to patch this up and make things complicated, simply
disallow these methods from dropping rq->lock, extend the current
post_schedule stuff into a balancing callback list, and use that.

This survives Mike's testcase.

Changes since -v2:
 - reworked the hrtimer patch. -- Kirill, tglx
 - added lock pinning

Changes since -v1:
 - make SMP=n build,
 - cured switched_from_dl()'s cancel_dl_timer().

No real tests on the new parts other than booting / building kernels.
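
For readers who want the shape of the idea without reading the series, here
is a rough user-space sketch of a balance callback list. It is a toy model
under stated assumptions, not the code from these patches: every toy_* name
is hypothetical, the "lock" is a plain flag, and the pull is a counter.
What it tries to show is that the class-change path never drops the lock;
it only queues the balancing work, which runs once the change is complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq;

/* One deferred piece of balancing work (hypothetical type). */
struct toy_callback {
	struct toy_callback *next;
	void (*func)(struct toy_rq *rq);
};

/* Toy runqueue; 'locked' merely stands in for rq->lock. */
struct toy_rq {
	bool locked;
	struct toy_callback *pending;	/* callbacks queued under the lock */
	int pulls;			/* counts simulated pull operations */
};

static void toy_lock(struct toy_rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void toy_unlock(struct toy_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Queue work instead of dropping the lock; the caller must hold the lock. */
static void toy_queue_callback(struct toy_rq *rq, struct toy_callback *cb,
			       void (*func)(struct toy_rq *rq))
{
	assert(rq->locked);
	if (cb->func)		/* already queued, nothing to do */
		return;
	cb->func = func;
	cb->next = rq->pending;
	rq->pending = cb;
}

/* Run and clear the queued callbacks, after the class change is done. */
static void toy_run_callbacks(struct toy_rq *rq)
{
	struct toy_callback *cb;

	toy_lock(rq);
	cb = rq->pending;
	rq->pending = NULL;
	while (cb) {
		struct toy_callback *next = cb->next;
		void (*func)(struct toy_rq *rq) = cb->func;

		cb->next = NULL;
		cb->func = NULL;	/* allow the callback to be requeued */
		func(rq);
		cb = next;
	}
	toy_unlock(rq);
}

/* Simulated pull: the kind of work that used to drop the lock mid-switch. */
static void toy_pull(struct toy_rq *rq)
{
	rq->pulls++;
}

/* Simulated class switch: lock held throughout, pull only queued. */
static void toy_switch_class(struct toy_rq *rq, struct toy_callback *cb)
{
	toy_lock(rq);
	toy_queue_callback(rq, cb, toy_pull);	/* instead of pulling here */
	/* ... the rest of the class change would run here, lock held ... */
	toy_unlock(rq);
	toy_run_callbacks(rq);
}
```

In this model a caller sets up a zeroed struct toy_rq and struct
toy_callback and calls toy_switch_class(); the pull then happens exactly
once, and only after the switch has finished with the lock held throughout.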