Date: Thu, 6 Apr 2017 13:10:01 +0200 (CEST)
From: Thomas Gleixner
To: Ingo Molnar
cc: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org,
    Peter Zijlstra, Mike Galbraith, Ingo Molnar, "Rafael J. Wysocki"
Subject: Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
In-Reply-To: <20170406110215.GA1367@gmail.com>
References: <20170404184202.20376-1-bigeasy@linutronix.de>
 <20170405073943.GA17266@gmail.com>
 <20170405083753.7eszej2njds4ovdb@linutronix.de>
 <20170406061622.GA19979@gmail.com>
 <20170406073832.e7bu4ldpfuq44ui6@linutronix.de>
 <20170406080139.GA22069@gmail.com>
 <20170406110215.GA1367@gmail.com>

On Thu, 6 Apr 2017, Ingo Molnar wrote:
> CPU hotplug and changing the affinity mask are the more complex cases,
> because there migrating or not migrating is a correctness issue:
>
> - CPU hotplug has to be aware of this anyway, regardless of whether it's
>   solved via a counter or the affinity mask.

You simply have to prevent CPU hotplug as long as there are migration
disabled tasks in flight. Making that depend on whether they are on a CPU
which is about to be unplugged or not would be complete overkill, as you
still have to solve the case where a task calls migrate_disable() AFTER the
cpu down machinery has started.

> - Changing the affinity mask (set_cpus_allowed()) has two main cases:
>   the synchronous and the asynchronous case:
>
>   - Synchronous is when the current task changes its own affinity mask.
>     This should work fine mostly out of the box, as we don't call
>     set_cpus_allowed() from inside migration disabled regions. (We can
>     enforce this via a debugging check.)
>
>   - The asynchronous case is when the affinity mask of some other task is
>     changed. This would not have an immediate effect with
>     migration-disabled logic; the migration would be delayed until
>     migration is re-enabled again.
>
> As for general fragility, is there any reason why a simple debugging check
> in set_task_cpu() would not catch most mishaps:
>
>     WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);
>
> ... or something like that?
>
> I.e. my point is that I think using a counter would be much simpler, yet
> still as robust and maintainable. We could in fact move
> migrate_disable()/enable() upstream straight away and eliminate this small
> fork of functionality between mainline and -rt.

The counter alone might be enough for the scheduler placement decisions,
but it cannot solve the hotplug issue. You still need something like what I
sketched out in my previous reply.

Thanks,

	tglx
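
A minimal, self-contained sketch of the counter-based
migrate_disable()/migrate_enable() scheme discussed above, together with the
proposed set_task_cpu() debugging check. This is plain userspace C, not
kernel code: the field name migration_disabled is taken from the
WARN_ON_ONCE() line quoted above, while struct task, TASK_RUNNING and the
assert()-based stand-in for WARN_ON_ONCE() are illustrative assumptions, not
the real scheduler data structures.

/* Userspace model of the per-task migrate-disable counter (illustrative only). */
#include <assert.h>
#include <stdio.h>

#define TASK_RUNNING	0

struct task {
	int state;			/* 0 == TASK_RUNNING in this sketch */
	int migration_disabled;		/* nesting counter, 0 == migration allowed */
};

static void migrate_disable(struct task *p)
{
	p->migration_disabled++;	/* nestable: only the outermost enable re-allows migration */
}

static void migrate_enable(struct task *p)
{
	assert(p->migration_disabled > 0);
	p->migration_disabled--;
}

/* Stand-in for set_task_cpu(); assert() models the proposed WARN_ON_ONCE() */
static void set_task_cpu(struct task *p, int cpu)
{
	/* a non-running task must not be placed while migration is disabled */
	assert(!(p->state != TASK_RUNNING && p->migration_disabled));
	printf("task placed on CPU %d\n", cpu);
}

int main(void)
{
	struct task t = { .state = TASK_RUNNING, .migration_disabled = 0 };

	migrate_disable(&t);
	/* ... section which must stay on the current CPU ... */
	migrate_enable(&t);

	set_task_cpu(&t, 1);		/* fine: counter dropped back to zero */
	return 0;
}

The sketch only models the placement side of the argument; the hotplug
interaction that the reply above insists on is deliberately left out, since
that is exactly the part a bare counter does not cover.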