All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Brown <broonie@kernel.org>
To: Dave Martin <Dave.Martin@arm.com>
Cc: Julien Grall <julien@xen.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Zhang Lei <zhang.lei@jp.fujitsu.com>,
	Julien Grall <julien.grall@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	Daniel Kiss <Daniel.Kiss@arm.com>
Subject: Re: [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return
Date: Tue, 17 Nov 2020 23:03:38 +0000	[thread overview]
Message-ID: <20201117230338.GI5142@sirena.org.uk> (raw)
In-Reply-To: <20201117181618.GC6882@arm.com>


[-- Attachment #1.1: Type: text/plain, Size: 12248 bytes --]

On Tue, Nov 17, 2020 at 06:16:20PM +0000, Dave Martin wrote:

> I think that thinking of TIF_SVE_NEEDS_FLUSH as a distinct toggle may
> not help helping here -- it feels more like an override or a special
> case.

Well, we need a flag somewhere - it's a question of where we put it.

> I also wonder whether increasing the amount of commenting is actually a
> trap here: there's a risk of piling new confusion on old.  The actual
> problems might be that there is too much commenting already, not too
> little...

It's an interesting balance.  Having worked with the code it's
definitely overwhelmingly more commented than the rest of the
architecture, I think because it *is* a bit fiddly and has even fewer
people who've looked at it in detail than the rest of it.  This can make
it look much more complicated than it actually appears to be but on the
other hand it may be that other bits of the code should be more like the
FP code rather than the other way around.

> Taking a step back, this is how I think it was all supposed to work,
> prior to this series.  This time, I'll try to describe the flags
> separately rather than enumerating all the combinations (which I think
> may have been a misstep):

I tend to agree that thinking about the combinations when you don't have
to makes it harder - in particular with TIF_FOREIGN_FPSTATE.

> At this level of abstraction, these two concepts are entirely
> independent:

>           !SVE            SVE
>         +---------------+---------------+
>  !FFP   | Track: FPSIMD | Track: SVE    |
>         | Where: regs   | Where: regs   |
>         +---------------+---------------+
>  FFP    | Track: FPSIMD | Track: SVE    |
>         | Where: memory | Where: memory |
>         +---------------+---------------+

Yes.

> I think where we get in a tangle is that TIF_SVE conflates mechanism
> (whether the kernel tracks the full SVE state) with policy (whether
> userspace cares about the full SVE state).  At present, we can't
> describe the situation where we track SVE state but userspace doesn't
> care, so we either have to do nothing at all and just leave SVE on all
> the time, or we have to disable SVE at the first
> opportunity.

I think this is the current conflation that is causing issues but I
wouldn't state it quite like that, I don't know that *care* is quite the
right word for what userspace is doing here.  There's two pieces,
there's the SVE state in the registers and there's the ability to
execute SVE instructions without trapping.  When userspace executes a
SVE instruction we enable execution and also start tracking the register
state which we currently track in the single TIF_SVE flag.  When
userspace does a syscall the extra register state becomes undefined
(realistically we'll probably have to continue resetting it to zero as
we currently do) and we currently don't the ability to track the ability
to execute the instructions separately to discarding that state so we
just disable both at once.  This means that the syscall overhead is
amplified for any task that mixes SVE and syscalls.

> So, what we need here is a way (i.e., a new state) to indicate that
> userspace doesn't care about the extra SVE state.  Obviously this not
> orthogonal to TIF_SVE: if SVE state is not tracked, then there is
> no state _to_ care about.

> As I understand it, this is what the TIF_SVE_NEEDS_FLUSH flag
> describes.  It separates policy (userspace's policy in this case --
> i.e., whether it needs the full SVE state) from mechanism (whether the

That's approximately the roles, yes.  The implementation is currently
different in that both SVE flags are separately wrapping in the
execution permission.

> kernel keeps track of the full SVE state).  Perhaps we can come up with
> another name, such as TIF_SVE_MAY_DISCARD.

Naming is definitely a thing here - _MAY_DISCARD definitely doesn't seem
right.

> Whatever we call this new intermediate state, we have a something like
> this.
> 
>           !SVE            ???             SVE
>         +---------------+---------------+---------------+
>         | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
>  !FFP   | Track: FPSIMD | Track: SVE    | Track: SVE    |
>         | Where: regs   | Where: regs   | Where: regs   |
>         +---------------+---------------+---------------+
>         | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
>  FFP    | Track: FPSIMD | Track: SVE    | Track: SVE    |
>         | Where: memory | Where: memory | Where: memory |
>         +---------------+---------------+---------------+
> 
> (FFP = TIF_FOREIGN_FPSTATE, SVE = TIF_SVE)

I think I get what you mean here but I'm finding the "need" a bit
confusing here and in the following discussion.  I think instead of need
and track it might be clearer to say valid registers and execute - it's
which registers are have defined data in them and if userspace can
execute SVE instructions without trapping.

> However, the middle column is a bit special: we can't run in userspace
> in this state, since once userspace touches the regs we no longer know
> whether it cares about the data in them.  Also, it's a bit pointless
> representing this state in memory, since the main point of having it is
> to _avoid_ dumping to memory.  And we do need to do a bit of work to
> sanitise the state -- unless we always do it on the syscall entry path.

While we can't be in userspace in the middle column I'm not sure it's
entirely true to say that we can't usefully represent things here in
memory if we think about the execute part of things as well.  If we
schedule another task we can still track if we want to have SVE
instruction execution on return to userspace even if the only data we're
storing is the FPSIMD subset, avoiding a second SVE trap on the first
SVE instruction after a syscall while also storing less data in memory.

As noted in the cover letter for the series we're probably going to find
we want to reset the execute permission intermittently on syscall
workloads in order to ensure that tasks that execute SVE infrequently
don't always have SVE enabled but I think we want to sort out what we're
doing with the basic case first.

>                                                           Effort needed
>                                                           to load regs
>           !SVE            ???             SVE
>         +---------------+               +---------------+
>         | Need: FPSIMD  |               | Need: SVE     | none
>  !FFP   | Track: FPSIMD +---------------+ Track: SVE    |   ^
>         | Where: regs   | N: FPSIMD     | Where: regs   |   :
>         +---------------+ T: undecided  +---------------+ some
>         | Need: FPSIMD  | W:regs+cleanup| Need: SVE     |   :
>  FFP    | Track: FPSIMD +---------------+ Track: SVE    |   v
>         | Where: memory |               | Where: memory | full reload
>         +---------------+               +---------------+

...

> And finally, on sched-out (or any other fpsimd_save() event):

>         +---------------+               +---------------+
>         |               |               |               |
>         |  :            +---------------+            :  |
>         |  :            |               |            :  |
>         +--:------------+               +------------:--+
>         |  :         <=======  [*]  =======>         :  |
>         |  V            +---------------+            V  |
>         |               |               |               |
>         +---------------+               +---------------+

> ...at least in threory.

I agree up to this final schedule case since here we'd either be taking
a decision to force userspace to trap on the next SVE instruction or to
store and reload full SVE state in order to avoid that neither of which
seems ideal.  With the current patch if TIF_SVE_NEEDS_FLUSH is set we
only save the FPSIMD state and then restore it and switch to TIF_SVE
when scheduling the task again - this is masked a bit in the patch since
it does not update the existing code in fpsimd_save() as TIF_SVE is not
set when setting TIF_SVE_NEEDS_FLUSH so there's no change in that path.

We could potentially use this as a heuristic to decide that we want to
drop SVE on a syscall that schedules rather than after some number of
syscalls (which we don't currently do but was mentioned as future work
in the cover letter and has precedents on other architectures) but that
doesn't feel great, the syscall counter feels more robust.

> While TIF_FOREIGN_FPSTATE and TIF_SVE remain orthogonal to each other,
> this new state is not orthogonal to either.  I don't think that's a bug,
> providing that we understand its role.  Since it's a 5th state, we are
> going to burn a flag on it, but this doesn't mean that it needs to (or
> even should) have a meaning that's fully independent of the others.
> Rather it may be useful precisely because it's not independent -- it
> decribes a situation which is currently an overlap between the other
> states.

> I'm pretty sure this is more or less what the proposed patches do,
> but I'll review again with this picture in mind.

It differs in the handling of scheduling while in a syscall but
otherwise yes, I think that's pretty much it.

With that in mind and thinking on the subthread with Catalin if we
restructure the SVE flags such that we have separate SVE_EXEC and
SVE_REGS flags (bad but hopefully comprehensible names short enough for
drawing a table) for the execute and storage permissions we end up with:

           !SVE/SVE_REGS   SVE_EXEC        SVE_REGS+SVE_EXEC
         +---------------+---------------+---------------+
         | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE    |
  !FFP   | Trap:  Yes    | Trap:  No     | Trap:  No     |
         | Where: regs   | Where: regs   | Where: regs   |
         +---------------+---------------+---------------+
         | Valid: FPSIMD | Valid: FPSIMD | Valid: SVE    |
  FFP    | Trap:  Yes    | Trap:  No     | Trap:  No     |
         | Where: memory | Where: memory | Where: memory |
         +---------------+---------------+---------------+

(the first column being either no SVE flags set or only SVE_REGS set.)
All entries should go to one of the !FFP cases.  Entry from SVE traps or
syscalls with SVE_EXEC set should go to SVE_EXEC, other entries with
SVE_EXEC already set should go to SVE_REGS+SVE_EXEC and all other
entries should go to !SVE/SVE_REGS.

Currently TIF_SVE maps onto SVE_REGS+SVE_EXEC and TIF_SVE_NO_FLUSH maps
onto SVE_EXEC.
 
> Note: Making things symmetric
> -----------------------------

> An alternative behaviour for syscalls would be:

>                                           TIF_SVE
>                !TIF_SVE    ,-------------------------.
> 	  ,--------------------------.
> 
>         +---------------+               +---------------+
>         |               |               |               |
>         |               +---------------+               |
>         |            ======>         <======            |
>         +---------------+               +---------------+
>         |               |               |               |
>         |               +---------------+               |
>         |               |               |               |
>         +---------------+               +---------------+

> This shouldn't change the underlying behaviour other than way certain
> if() conditions would be expressed.  This model may make it clearer
> that the TIF_SVE versus !TIF_SVE decision doesn't occur until moving
> out of the middle state.  This gives a natural place to turn SVE on
> speculatively for example, when a syscall occurs with !TIF_SVE.
> (Whether this is useful is we could ever make an accurate enough guess
> is another matter...  but this still might make the code a bit easier
> to conceptualise.)

I worry that this might make it harder to follow in that you'd have to
understand why we're considering enabling SVE for tasks that never tried
to do anything with it.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-11-17 23:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-06 19:35 [PATCH v5 0/2] arm64/sve: First steps towards optimizing syscalls Mark Brown
2020-11-06 19:35 ` [PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return Mark Brown
2020-11-13 18:48   ` Catalin Marinas
2020-11-13 20:13     ` Mark Brown
2020-11-16 17:59       ` Dave Martin
2020-11-16 19:45         ` Mark Brown
2020-11-17 18:16           ` Dave Martin
2020-11-17 23:03             ` Mark Brown [this message]
2020-11-18 12:51               ` Dave Martin
2020-11-18 17:52                 ` Mark Brown
2020-11-19 18:27                   ` Dave Martin
2020-11-19 19:02                     ` Mark Brown
2020-11-06 19:35 ` [PATCH v5 2/2] arm64/sve: Rework SVE trap access to use TIF_SVE_NEEDS_FLUSH Mark Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201117230338.GI5142@sirena.org.uk \
    --to=broonie@kernel.org \
    --cc=Daniel.Kiss@arm.com \
    --cc=Dave.Martin@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=julien.grall@arm.com \
    --cc=julien@xen.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=will@kernel.org \
    --cc=zhang.lei@jp.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.