From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758003AbcH3QbA (ORCPT ); Tue, 30 Aug 2016 12:31:00 -0400 Received: from mail-ua0-f172.google.com ([209.85.217.172]:35799 "EHLO mail-ua0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756917AbcH3Qa6 (ORCPT ); Tue, 30 Aug 2016 12:30:58 -0400 MIME-Version: 1.0 In-Reply-To: References: <1471382376-5443-1-git-send-email-cmetcalf@mellanox.com> <1471382376-5443-5-git-send-email-cmetcalf@mellanox.com> <20160829163352.GV10153@twins.programming.kicks-ass.net> <20160830075854.GZ10153@twins.programming.kicks-ass.net> From: Andy Lutomirski Date: Tue, 30 Aug 2016 09:30:36 -0700 Message-ID: Subject: Re: [PATCH v15 04/13] task_isolation: add initial support To: Chris Metcalf Cc: Peter Zijlstra , Gilad Ben Yossef , Steven Rostedt , Ingo Molnar , Andrew Morton , Rik van Riel , Tejun Heo , Frederic Weisbecker , Thomas Gleixner , "Paul E. McKenney" , Christoph Lameter , Viresh Kumar , Catalin Marinas , Will Deacon , Michal Hocko , "linux-mm@kvack.org" , "linux-doc@vger.kernel.org" , Linux API , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf wrote: > On 8/30/2016 3:58 AM, Peter Zijlstra wrote: >> >> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote: >>> >>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote: >>>> >>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote: >>>>> >>>>> + /* >>>>> + * Request rescheduling unless we are in full dynticks mode. >>>>> + * We would eventually get pre-empted without this, and if >>>>> + * there's another task waiting, it would run; but by >>>>> + * explicitly requesting the reschedule, we may reduce the >>>>> + * latency. We could directly call schedule() here as well, >>>>> + * but since our caller is the standard place where schedule() >>>>> + * is called, we defer to the caller. >>>>> + * >>>>> + * A more substantive approach here would be to use a struct >>>>> + * completion here explicitly, and complete it when we shut >>>>> + * down dynticks, but since we presumably have nothing better >>>>> + * to do on this core anyway, just spinning seems plausible. >>>>> + */ >>>>> + if (!tick_nohz_tick_stopped()) >>>>> + set_tsk_need_resched(current); >>>> >>>> This is broken.. and it would be really good if you don't actually need >>>> to do this. >>> >>> Can you elaborate? We clearly do want to wait until we are in full >>> dynticks mode before we return to userspace. >>> >>> We could do it just in the prctl() syscall only, but then we lose the >>> ability to implement the NOSIG mode, which can be a convenience. >> >> So this isn't spelled out anywhere. Why does this need to be in the >> return to user path? > > > I'm not sure where this should be spelled out, to be honest. I guess > I can add some commentary to the commit message explaining this part. > > The basic idea is just that we don't want to be at risk from the > dyntick getting enabled. Similarly, we don't want to be at risk of a > later global IPI due to lru_add_drain stuff, for example. And, we may > want to add additional stuff, like catching kernel TLB flushes and > deferring them when a remote core is in userspace. To do all of this > kind of stuff, we need to run in the return to user path so we are > late enough to guarantee no further kernel things will happen to > perturb our carefully-arranged isolation state that includes dyntick > off, per-cpu lru cache empty, etc etc. None of the above should need to *loop*, though, AFAIK.