From: Andy Lutomirski <luto@amacapital.net>
To: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Rik van Riel <riel@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Ingo Molnar <mingo@kernel.org>, Oleg Nesterov <oleg@redhat.com>,
	X86 ML <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC PATCH] x86, fpu: Use eagerfpu by default on all CPUs
Date: Mon, 23 Feb 2015 18:31:57 -0800
Message-ID: <CALCETrV5bD=L2OOtDE1TbHcVMc7Ahhs5YQtx6jTmArM08wqPjg@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.11.1502240112070.17311@eddie.linux-mips.org>

On Mon, Feb 23, 2015 at 6:14 PM, Maciej W. Rozycki <macro@linux-mips.org> wrote:
> On Mon, 23 Feb 2015, Andy Lutomirski wrote:
>
>> >> After a context switch, the instructions from the old task are no
>> >> longer in the pipeline.
>> >
>> >  I'd say it's implementation-specific.  As I mentioned the i486 aborted
>> > any transcendental x87 instruction in progress upon taking an exception or
>> > interrupt.  That was a model like you refer to, but as I also mentioned it
>> > had its shortcomings.
>>
>> IRET is serializing, according to the docs (I think) and according
>> to the Intel engineers I asked (I'm absolutely certain about this
>> part).  So FPU ops are entirely done at the end of a normal context
>> switch.
>
>  No question about the serialising property of IRET, it has been like this
> since the original Pentium implementation.  Do you have an architecture
> specification reference to back up your claim though as far as the FPU is
> concerned?  I'm asking because I am genuinely curious.
>
>  The x87 case is so special, there isn't anything there really that is
> externally observable or should be affected by IRET or any other
> synchronisation barriers apart from WAIT (or a waiting x87 instruction)
> that has been there for this purpose since forever.  And it would defeat
> some documented benefits of running the FP pipeline in parallel.

It's plausible that this is special, but I doubt it.  Especially since
this optimization would be nuts post-SSE2.

>
>  And certainly such synchronisation didn't happen in the old days.
>
>> We also always save the FPU context on every context switch away from
>> a task that used the FPU, even in lazy mode.  This is because we might
>> switch the task back in on a different CPU, and we don't want to use
>> an IPI to move the FPU context.
>
>  That's an interesting case too, although not necessarily related.  If you
> say that we always save the FP context eagerly for the purpose of process
> migration, then sure, that invalidates any benefit we'd have from letting
> the x87 proceed.
>
>  However I can see different ways to address this case avoiding the need
> of eager FP context saving or an IPI:
>
> 1. We could bind any currently suspended process with an unsaved FP
>    context to the CPU it last executed on.

This would be insane.

>
> 2. We could mark such a process for migration next time and let it execute
>    on the CPU that holds its FP context once more, and then save the FP
>    context eagerly on the way out.

This would be worse than insane.  Now, in order to wake such a process
on a different CPU, we'd have to force a *context switch* on the
source CPU.  Now we're replacing a few hundred cycles at worst for a
transcendental function with at least 10k cycles (at a guess) and
possibly many orders of magnitude more if locks are held, plus
priority issues, plus total craziness.

>
> In some cases a lazily retained FP context would be preempted by another
> process before the process in question would resume anyway.  In this case
> any temporary binding to a CPU could be given up.
>
>> Given that we're only talking about old CPUs here, I sincerely doubt
>> that there's any relevant case in which an fxsave can usefully wait
>> for a long-running transcendental op to finish while we continue doing
>> useful work.  *Especially* since there will almost certainly be
>> several more mfences or atomic ops before the end of the context
>> switch, even if we're lucky enough to complete the context switching
>> using sysret.
>
>  I am not sure what you mean by FXSAVE usefully waiting for an op, please
> elaborate.  At the point you've reached FXSAVE and an earlier x87
> instruction hasn't completed, you've already lost.  The pipeline will be
> stalled until the x87 instruction has completed and it can be hundreds of
> cycles.  My point therefore has been about not executing FXSAVE for
> the old task until absolutely necessary, that with the lazy FP context
> switching would be at the next x87 (or SSE) instruction reached by the new
> task.
>
>  Likewise I don't see why MFENCE or an atomic operation should affect the
> execution of say FSINCOS.  Whether the results of FSINCOS arrive before
> or after MFENCE, etc. are not externally observable.

FSINCOS; FXSAVE; MFENCE had better serialize all the way, no matter
what weird architectural crud is going on.

>
>  And I'm not sure if this all affects old CPUs only -- I don't know how
> much x87 software is out there, but after all these years I'd expect quite
> some.  Sure, lots of this can be recompiled to use SSE instead, but not
> all, and even where it is feasible, that's an extra burden for people,
> beyond say a routine hardware or Linux distribution or for that matter
> lone kernel upgrade.  Therefore I think we need to be careful not to
> pessimise things for a subset of people too much and ideally at all.
>
>  And to be clear, I am not against removing lazy FP context switching per
> se.  I am just emphasizing to be careful with that and be absolutely sure
> that it does not cause excessive harm.

We're talking about the unusual case in which we context switch within
~100 cycles of a legacy transcendental operation, and, even so,
there's *still* no regression, since we don't optimize this case
today.

--Andy
