Date: Thu, 6 Apr 2017 13:10:01 +0200 (CEST)
From: Thomas Gleixner
To: Ingo Molnar
cc: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org,
    Peter Zijlstra, Mike Galbraith, Ingo Molnar, "Rafael J. Wysocki"
Subject: Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
In-Reply-To: <20170406110215.GA1367@gmail.com>
References: <20170404184202.20376-1-bigeasy@linutronix.de>
 <20170405073943.GA17266@gmail.com>
 <20170405083753.7eszej2njds4ovdb@linutronix.de>
 <20170406061622.GA19979@gmail.com>
 <20170406073832.e7bu4ldpfuq44ui6@linutronix.de>
 <20170406080139.GA22069@gmail.com>
 <20170406110215.GA1367@gmail.com>

On Thu, 6 Apr 2017, Ingo Molnar wrote:
> CPU hotplug and changing the affinity mask are the more complex cases,
> because there migrating or not migrating is a correctness issue:
>
> - CPU hotplug has to be aware of this anyway, regardless of whether it's
>   solved via a counter or the affinity mask.

You simply have to prevent CPU hotplug as long as there are migration
disabled tasks in flight. Making that depend on whether they are on a CPU
which is about to be unplugged or not would be complete overkill, as you
still have to solve the case where a task calls migrate_disable() AFTER the
cpu down machinery has started.

> - Changing the affinity mask (set_cpus_allowed()) has two main cases:
>   the synchronous and the asynchronous case:
>
>   - Synchronous is when the current task changes its own affinity mask.
>     This should work fine mostly out of the box, as we don't call
>     set_cpus_allowed() from inside migration disabled regions. (We can
>     enforce this via a debugging check.)
>
>   - The asynchronous case is when the affinity mask of some other task is
>     changed. This would not have an immediate effect with
>     migration-disabled logic; the migration would be delayed until
>     migration is re-enabled again.
>
> As for general fragility, is there any reason why a simple debugging check
> in set_task_cpu() would not catch most mishaps:
>
>     WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);
>
> ... or something like that?
>
> I.e. my point is that I think using a counter would be much simpler, yet
> still as robust and maintainable. We could in fact move
> migrate_disable()/enable() upstream straight away and eliminate this small
> fork of functionality between mainline and -rt.

The counter alone might be enough for the scheduler placement decisions,
but it cannot solve the hotplug issue. You still need something like what I
sketched out in my previous reply.

Thanks,

	tglx
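
A minimal, self-contained sketch of the counter-based
migrate_disable()/migrate_enable() scheme discussed above, together with the
proposed set_task_cpu() debugging check. This is plain userspace C, not
kernel code: the field name migration_disabled is taken from the
WARN_ON_ONCE() line quoted above, while struct task, TASK_RUNNING and the
assert()-based stand-in for WARN_ON_ONCE() are illustrative assumptions, not
the real scheduler data structures.

/* Userspace model of the per-task migrate-disable counter (illustrative only). */
#include <assert.h>
#include <stdio.h>

#define TASK_RUNNING	0

struct task {
	int state;			/* 0 == TASK_RUNNING in this sketch */
	int migration_disabled;		/* nesting counter, 0 == migration allowed */
};

static void migrate_disable(struct task *p)
{
	p->migration_disabled++;	/* nestable: only the outermost enable re-allows migration */
}

static void migrate_enable(struct task *p)
{
	assert(p->migration_disabled > 0);
	p->migration_disabled--;
}

/* Stand-in for set_task_cpu(); assert() models the proposed WARN_ON_ONCE() */
static void set_task_cpu(struct task *p, int cpu)
{
	/* a non-running task must not be placed while migration is disabled */
	assert(!(p->state != TASK_RUNNING && p->migration_disabled));
	printf("task placed on CPU %d\n", cpu);
}

int main(void)
{
	struct task t = { .state = TASK_RUNNING, .migration_disabled = 0 };

	migrate_disable(&t);
	/* ... section which must stay on the current CPU ... */
	migrate_enable(&t);

	set_task_cpu(&t, 1);		/* fine: counter dropped back to zero */
	return 0;
}

The sketch only models the placement side of the argument; the hotplug
interaction that the reply above insists on is deliberately left out, since
that is exactly the part a bare counter does not cover.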