Message-Id: <20150601135818.506080835@infradead.org>
Date: Mon, 01 Jun 2015 15:58:18 +0200
From: Peter Zijlstra
To: umgwanakikbuti@gmail.com, mingo@elte.hu
Cc: ktkhai@parallels.com, rostedt@goodmis.org, juri.lelli@gmail.com, pang.xunlei@linaro.org, oleg@redhat.com, linux-kernel@vger.kernel.org, "Peter Zijlstra"
Subject: [RFC][PATCH 0/7] sched: balance callbacks

Hi,

Mike stumbled over a cute bug in the RT/DL balancing ops.

The exact scenario is __sched_setscheduler() changing a (runnable) task from FIFO to OTHER. In switched_from_rt(), where we do pull_rt_task(), we temporarily drop rq->lock. This gap allows regular cfs load-balancing to step in and migrate our task.

However, check_class_changed() will happily continue with switched_to_fair(), which assumes our task is still on the old rq, and makes the kernel go boom.

Instead of trying to patch this up and make things complicated, simply disallow these methods from dropping rq->lock, extend the current post_schedule stuff into a balancing callback list, and use that.

This survives Mike's testcase for well over an hour on my ivb-ep. I've not yet tested it on anything bigger.
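
To make the idea above concrete, here is a rough userspace sketch of the "queue a balance callback instead of dropping the lock" pattern. It is not code from the series: every toy_* name, the struct layout and the lock_drops counter are made up for illustration, and a plain bool stands in for rq->lock. The point is only that the class-change hooks run with the lock held throughout, and the work that may drop the lock is deferred until they are done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rq;

/* One deferred balancing request (hypothetical, not the kernel's type). */
struct toy_callback {
	struct toy_callback *next;
	void (*func)(struct toy_rq *rq);
	bool queued;
};

struct toy_rq {
	bool locked;                           /* stands in for rq->lock */
	int lock_drops;                        /* times the lock was dropped */
	int pulls;                             /* times the pull callback ran */
	struct toy_callback *balance_callback; /* pending deferred work */
};

static void toy_lock(struct toy_rq *rq)   { assert(!rq->locked); rq->locked = true; }
static void toy_unlock(struct toy_rq *rq) { assert(rq->locked); rq->locked = false; }

/* Queue work for later; caller holds the lock. Double queueing is a no-op. */
static void toy_queue_balance_callback(struct toy_rq *rq, struct toy_callback *cb,
				       void (*func)(struct toy_rq *rq))
{
	assert(rq->locked);
	if (cb->queued)
		return;
	cb->queued = true;
	cb->func = func;
	cb->next = rq->balance_callback;
	rq->balance_callback = cb;
}

/* Detach the pending list and run it; callbacks may drop and retake the lock. */
static void toy_run_balance_callbacks(struct toy_rq *rq)
{
	struct toy_callback *head = rq->balance_callback;

	assert(rq->locked);
	rq->balance_callback = NULL;
	while (head) {
		struct toy_callback *next = head->next;

		head->next = NULL;
		head->queued = false;
		head->func(rq);
		head = next;
	}
}

/* Stand-in for a pull operation that has to drop the lock temporarily. */
static void toy_pull_task(struct toy_rq *rq)
{
	toy_unlock(rq);
	rq->lock_drops++;
	toy_lock(rq);
	rq->pulls++;
}

static struct toy_callback toy_pull_cb;

/* "switched_from" hook: no longer pulls directly, it only queues the pull. */
static void toy_switched_from(struct toy_rq *rq)
{
	toy_queue_balance_callback(rq, &toy_pull_cb, toy_pull_task);
}

/* "switched_to" hook: relies on the lock never having been dropped. */
static void toy_switched_to(struct toy_rq *rq, int drops_at_start)
{
	assert(rq->locked);
	assert(rq->lock_drops == drops_at_start);
}

/* Class change: both hooks run under one lock hold, deferred work runs last. */
int toy_change_class(struct toy_rq *rq)
{
	int drops;

	toy_lock(rq);
	drops = rq->lock_drops;
	toy_switched_from(rq);
	toy_switched_to(rq, drops);
	toy_run_balance_callbacks(rq);
	toy_unlock(rq);
	return rq->pulls;
}
```

Calling toy_change_class() on a zero-initialised struct toy_rq performs one deferred pull per call. If toy_switched_from() called toy_pull_task() directly, the assertion in toy_switched_to() would fire, which is the toy equivalent of the kernel going boom.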