linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Metcalf <cmetcalf@mellanox.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Gilad Ben Yossef <giladb@mellanox.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Tejun Heo <tj@kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Christoph Lameter <cl@linux.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Michal Hocko <mhocko@suse.com>, <linux-mm@kvack.org>,
	<linux-doc@vger.kernel.org>, <linux-api@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v15 04/13] task_isolation: add initial support
Date: Tue, 30 Aug 2016 11:32:16 -0400	[thread overview]
Message-ID: <a321c8a7-fa9c-21f7-61f8-54a8f80763fe@mellanox.com> (raw)
In-Reply-To: <20160830075854.GZ10153@twins.programming.kicks-ass.net>

On 8/30/2016 3:58 AM, Peter Zijlstra wrote:
> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote:
>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote:
>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote:
>>>> +	/*
>>>> +	 * Request rescheduling unless we are in full dynticks mode.
>>>> +	 * We would eventually get pre-empted without this, and if
>>>> +	 * there's another task waiting, it would run; but by
>>>> +	 * explicitly requesting the reschedule, we may reduce the
>>>> +	 * latency.  We could directly call schedule() here as well,
>>>> +	 * but since our caller is the standard place where schedule()
>>>> +	 * is called, we defer to the caller.
>>>> +	 *
>>>> +	 * A more substantive approach here would be to use a struct
>>>> +	 * completion here explicitly, and complete it when we shut
>>>> +	 * down dynticks, but since we presumably have nothing better
>>>> +	 * to do on this core anyway, just spinning seems plausible.
>>>> +	 */
>>>> +	if (!tick_nohz_tick_stopped())
>>>> +		set_tsk_need_resched(current);
>>> This is broken.. and it would be really good if you don't actually need
>>> to do this.
>> Can you elaborate?  We clearly do want to wait until we are in full
>> dynticks mode before we return to userspace.
>>
>> We could do it just in the prctl() syscall only, but then we lose the
>> ability to implement the NOSIG mode, which can be a convenience.
> So this isn't spelled out anywhere. Why does this need to be in the
> return to user path?

I'm not sure where this should be spelled out, to be honest.  I guess
I can add some commentary to the commit message explaining this part.

The basic idea is just that we don't want to be at risk from the
dyntick getting enabled.  Similarly, we don't want to be at risk of a
later global IPI due to lru_add_drain stuff, for example.  And, we may
want to add additional stuff, like catching kernel TLB flushes and
deferring them when a remote core is in userspace.  To do all of this
kind of stuff, we need to run in the return to user path so we are
late enough to guarantee no further kernel things will happen to
perturb our carefully-arranged isolation state that includes dyntick
off, per-cpu lru cache empty, etc etc.

>> Even without that consideration, we really can't be sure we stay in
>> dynticks mode if we disable the dynamic tick, but then enable interrupts,
>> and end up taking an interrupt on the way back to userspace, and
>> it turns the tick back on.  That's why we do it here, where we know
>> interrupts will stay disabled until we get to userspace.
> But but but.. task_isolation_enter() is explicitly ran with IRQs
> _enabled_!! It even WARNs if they're disabled.

Yes, true!  But if you pop up to the caller, the key thing is the
task_isolation_ready() routine where we are invoked with interrupts
disabled, and we confirm that all our criteria are met (including
tick_nohz_tick_stopped), and then leave interrupts disabled as we
return from there onwards to userspace.

The task_isolation_enter() code just does its best-faith attempt to
make sure all these criteria are met, just like all the other TIF_xxx
flag tests do in exit_to_usermode_loop() on x86, like scheduling,
delivering signals, etc.  As you know, we might run that code, go
around the loop, and discover that the TIF flag has been re-set, and
we have to run the code again before all of that stuff has "quiesced".
The isolation code uses that same model; the only difference is that
we clear the TIF flag manually in the loop by checking
task_isolation_ready().

>> So if we are doing it here, what else can/should we do?  There really
>> shouldn't be any other tasks waiting to run at this point, so there's
>> not a heck of a lot else to do on this core.  We could just spin and
>> check need_resched and signal status manually instead, but that
>> seems kind of duplicative of code already done in our caller here.
> What !? I really don't get this, what are you waiting for? Why is
> rescheduling making things better.

We need to wait for the last dyntick to fire before we can return to
userspace.  There are plenty of options as to what we can do in the
meanwhile.

1. Try to schedule().  Good luck with that in practice, since a
userspace process that has enabled task isolation is going to be alone
on its core unless something pretty broken is happening on the system.
But, at least folks understand the idiom of scheduling out while you wait.

2. Another variant of that: set up a wait completion and have the
dynticks code complete it when the tick turns off.  But this adds
complexity to option 1, and really doesn't buy us much in practice
that I can see.

3. Just admit that we are likely alone on the core, and just burn
cycles in a busy loop waiting for that last tick to fire.  Obviously
if we do this we also need to test for signals and resched so the core
remains responsive.  We can either do this in a loop just by spinning
explicitly, or I could literally just remove the line in the current
patchset that sets TIF_NEED_RESCHED, at which point we busy-wait by
just going around and around in exit_to_usermode_loop().  The only
flaw here is that we don't mark the task explicitly as TASK_INTERRUPTIBLE
while we are doing this - and that's probably worth doing.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

  reply	other threads:[~2016-08-30 15:32 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-16 21:19 [PATCH v15 00/13] support "task_isolation" mode Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 01/13] vmstat: add quiet_vmstat_sync function Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 02/13] vmstat: add vmstat_idle function Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 03/13] lru_add_drain_all: factor out lru_add_drain_needed Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 04/13] task_isolation: add initial support Chris Metcalf
2016-08-29 16:33   ` Peter Zijlstra
2016-08-29 16:40     ` Chris Metcalf
2016-08-29 16:48       ` Peter Zijlstra
2016-08-29 16:53         ` Chris Metcalf
2016-08-30  7:59           ` Peter Zijlstra
2016-08-30  7:58       ` Peter Zijlstra
2016-08-30 15:32         ` Chris Metcalf [this message]
2016-08-30 16:30           ` Andy Lutomirski
2016-08-30 17:02             ` Chris Metcalf
2016-08-30 18:43               ` Andy Lutomirski
2016-08-30 19:37                 ` Chris Metcalf
2016-08-30 19:50                   ` Andy Lutomirski
2016-09-02 14:04                     ` Chris Metcalf
2016-09-02 17:28                       ` Andy Lutomirski
2016-09-09 17:40                         ` Chris Metcalf
2016-09-12 17:41                           ` Andy Lutomirski
2016-09-12 19:25                             ` Chris Metcalf
2016-09-27 14:22                         ` Frederic Weisbecker
2016-09-27 14:39                           ` Peter Zijlstra
2016-09-27 14:51                             ` Frederic Weisbecker
2016-09-27 14:48                           ` Paul E. McKenney
2016-09-30 16:59                 ` Chris Metcalf
2016-09-01 10:06           ` Peter Zijlstra
2016-09-02 14:03             ` Chris Metcalf
2016-09-02 16:40               ` Peter Zijlstra
2017-02-02 16:13   ` Eugene Syromiatnikov
2017-02-02 18:12     ` Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 05/13] task_isolation: track asynchronous interrupts Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 06/13] arch/x86: enable task isolation functionality Chris Metcalf
2016-08-30 21:46   ` Andy Lutomirski
2016-08-16 21:19 ` [PATCH v15 07/13] arm64: factor work_pending state machine to C Chris Metcalf
2016-08-17  8:05   ` Will Deacon
2016-08-16 21:19 ` [PATCH v15 08/13] arch/arm64: enable task isolation functionality Chris Metcalf
2016-08-26 16:25   ` Catalin Marinas
2016-08-16 21:19 ` [PATCH v15 09/13] arch/tile: " Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 10/13] arm, tile: turn off timer tick for oneshot_stopped state Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 11/13] task_isolation: support CONFIG_TASK_ISOLATION_ALL Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 12/13] task_isolation: add user-settable notification signal Chris Metcalf
2016-08-16 21:19 ` [PATCH v15 13/13] task_isolation self test Chris Metcalf
2016-08-17 19:37 ` [PATCH] Fix /proc/stat freezes (was [PATCH v15] "task_isolation" mode) Christoph Lameter
2016-08-20  1:42   ` Chris Metcalf
2016-09-28 13:16   ` Frederic Weisbecker
2016-08-29 16:27 ` Ping: [PATCH v15 00/13] support "task_isolation" mode Chris Metcalf
2016-09-07 21:11   ` Francis Giraldeau
2016-09-07 21:39     ` Francis Giraldeau
2016-09-08 16:21     ` Francis Giraldeau
2016-09-12 16:01     ` Chris Metcalf
2016-09-12 16:14       ` Peter Zijlstra
2016-09-12 21:15         ` Rafael J. Wysocki
2016-09-13  0:05           ` Rafael J. Wysocki
2016-09-13 16:00             ` Francis Giraldeau
2016-09-13  0:20       ` Francis Giraldeau
2016-09-13 16:12         ` Chris Metcalf
2016-09-27 14:49         ` Frederic Weisbecker
2016-09-27 14:35   ` Frederic Weisbecker
2016-09-30 17:07     ` Chris Metcalf
2016-11-05  4:04 ` task isolation discussion at Linux Plumbers Chris Metcalf
2016-11-05 16:05   ` Christoph Lameter
2016-11-07 16:55   ` Thomas Gleixner
2016-11-07 18:36     ` Thomas Gleixner
2016-11-07 19:12       ` Rik van Riel
2016-11-07 19:16         ` Will Deacon
2016-11-07 19:18           ` Rik van Riel
2016-11-11 20:54     ` Luiz Capitulino
2016-11-09  1:40   ` Paul E. McKenney
2016-11-09 11:14     ` Andy Lutomirski
2016-11-09 17:38       ` Paul E. McKenney
2016-11-09 18:57         ` Will Deacon
2016-11-09 19:11           ` Paul E. McKenney
2016-11-10  1:44         ` Andy Lutomirski
2016-11-10  4:52           ` Paul E. McKenney
2016-11-10  5:10             ` Paul E. McKenney
2016-11-11 17:00             ` Andy Lutomirski
2016-11-09 11:07   ` Frederic Weisbecker
2016-12-19 14:37   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a321c8a7-fa9c-21f7-61f8-54a8f80763fe@mellanox.com \
    --to=cmetcalf@mellanox.com \
    --cc=akpm@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=fweisbec@gmail.com \
    --cc=giladb@mellanox.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=viresh.kumar@linaro.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).