From: Catalin Marinas <catalin.marinas@arm.com>
To: Dave Martin <Dave.Martin@arm.com>
Cc: linux-arch@vger.kernel.org, libc-alpha@sourceware.org,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
	"Szabolcs Nagy" <szabolcs.nagy@arm.com>,
	"Richard Sandiford" <richard.sandiford@arm.com>,
	"Will Deacon" <will.deacon@arm.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 11/28] arm64/sve: Core task context handling
Date: Tue, 19 Sep 2017 18:13:33 +0100
Message-ID: <20170919171332.sjlhwnxqklmb5wsx@armageddon.cambridge.arm.com>
In-Reply-To: <20170914193944.GA24231@e103592.cambridge.arm.com>

On Thu, Sep 14, 2017 at 08:40:41PM +0100, Dave P Martin wrote:
> On Wed, Sep 13, 2017 at 03:21:29PM -0700, Catalin Marinas wrote:
> > On Wed, Sep 13, 2017 at 08:17:07PM +0100, Dave P Martin wrote:
> > > On Wed, Sep 13, 2017 at 10:26:05AM -0700, Catalin Marinas wrote:
> > > > On Thu, Aug 31, 2017 at 06:00:43PM +0100, Dave P Martin wrote:
> > > > > +/*
> > > > > + * Trapped SVE access
> > > > > + */
> > > > > +void do_sve_acc(unsigned int esr, struct pt_regs *regs)
> > > > > +{
> > > > > +	/* Even if we chose not to use SVE, the hardware could still trap: */
> > > > > +	if (unlikely(!system_supports_sve()) || WARN_ON(is_compat_task())) {
> > > > > +		force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0);
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	task_fpsimd_save();
> > > > > +
> > > > > +	sve_alloc(current);
> > > > > +	fpsimd_to_sve(current);
> > > > > +	if (test_and_set_thread_flag(TIF_SVE))
> > > > > +		WARN_ON(1); /* SVE access shouldn't have trapped */
> > > > > +
> > > > > +	task_fpsimd_load();
> > > > > +}
> > > >
> > > > When this function is entered, do we expect TIF_SVE to always be
> > > > cleared? It's worth adding a comment on the expected conditions. If
> > >
> > > Yes, and this is required for correctness, as you observe.
> > >
> > > I had a BUG_ON() here which I removed, but it makes sense to add a
> > > comment to capture the precondition here, and how it is satisfied.
> > >
> > > > that's the case, task_fpsimd_save() would only save the FPSIMD state
> > > > which is fine. However, you subsequently transfer the FPSIMD state to
> > > > SVE, set TIF_SVE and restore the full SVE state. If we don't care about
> > > > the SVE state here, can we call task_fpsimd_load() *before* setting
> > > > TIF_SVE?
> > >
> > > There should be no way to reach this code with TIF_SVE set, unless
> > > task_fpsimd_load() sets the CPACR trap bit wrongly, or the hardware is
> > > broken -- either of which is a bug.
> >
> > Thanks for confirming my assumptions. What I meant was rewriting the
> > above function as:
> >
> > 	/* reset the SVE state (other than FPSIMD) */
> > 	task_fpsimd_save();
> > 	task_fpsimd_load();
>
> I think this works, but can you explain your rationale?
>
> I think the main effect of your suggestion is that it is cheaper, due
> to eliminating some unnecessary load/store operations.

My rationale was to avoid copying between the in-memory FPSIMD and SVE
state.

> We could go one better, and do
>
> 	mov	v0.16b, v0.16b
> 	mov	v1.16b, v1.16b
> 	// ...
> 	mov	v31.16b, v31.16b
>
> which doesn't require any memory access.

Yes, that's even better.

> But I still prefer to zero p0..p15, ffr for cleanliness, even though
> the SVE programmer's model doesn't require this (unlike for the Z-reg
> high bits where we do need to zero them in order not to violate the
> programmer's model).

I missed the px, ffr aspect. Can you not have a clear_sve_state() (or a
better name) function to zero the predicate regs, ffr and the top bits
of the vectors?

> Currently sve_alloc()+task_fpsimd_load() ensures that all the non-FPSIMD
> regs are zeroed too, in addition to the Z-reg high bits.

Yes, just wondering if this can be implemented with less memory accesses
since the SVE state is irrelevant at this stage.

> So we might want a special-purpose helper -- if so, we can do it all
> with no memory access.
>
> 	pfalse	p0.b
> 	// ..
> 	pfalse	p15.b
> 	wrffr	p0.b
>
> This would allow the memset-zero an sve_alloc() to be removed, but I
> would need to check what other code is relying on it.
>
> I guess I hadn't done this because I viewed it as an optimisation.

It looked like some low-hanging optimisation to slightly accelerate the
allocation of the SVE state on access, though I'm also worried I don't
fully understand all the corner cases (like what happens if we allow
interrupts during this function and get preempted).

Anyway, I'm fine to leave this as it is for now and try to optimise it
later with additional patches on top.

-- 
Catalin
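The register-state effect discussed in this message (keep the low 128 bits of each Z register, which alias the FPSIMD V registers; zero the Z-register high bits, p0..p15 and ffr) can be modelled in plain C. This is only a sketch under stated assumptions, not kernel code: struct sve_model, MODEL_VL_BYTES and the fixed 256-bit vector length are hypothetical, and clear_sve_state() borrows the name floated above purely for illustration. A real helper would operate on the hardware registers (for example with the mov/pfalse/wrffr sequences quoted in the thread) rather than on memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FPSIMD_BYTES   16  /* low 128 bits of each Z reg alias V0..V31 */
#define MODEL_VL_BYTES 32  /* assumed vector length (256 bits) for this model */
#define NUM_ZREGS      32
#define NUM_PREGS      16

/* Hypothetical in-memory model of the SVE register file. */
struct sve_model {
	uint8_t z[NUM_ZREGS][MODEL_VL_BYTES];      /* Z0..Z31 */
	uint8_t p[NUM_PREGS][MODEL_VL_BYTES / 8];  /* P0..P15: one bit per Z byte */
	uint8_t ffr[MODEL_VL_BYTES / 8];           /* first-fault register */
};

/*
 * Model of the state reset: preserve the FPSIMD part of each Z register,
 * zero everything that is SVE-only.
 */
static void clear_sve_state(struct sve_model *s)
{
	assert(s != NULL);

	/* Zero the bits of each Z register above the low 128. */
	for (int i = 0; i < NUM_ZREGS; i++)
		memset(&s->z[i][FPSIMD_BYTES], 0,
		       MODEL_VL_BYTES - FPSIMD_BYTES);

	/* Zero all predicate registers and the FFR. */
	memset(s->p, 0, sizeof(s->p));
	memset(s->ffr, 0, sizeof(s->ffr));
}
```

Filling a struct sve_model with a non-zero pattern and calling clear_sve_state() on it leaves bytes 0..15 of every z[i] untouched and everything else zero, which is the post-trap state the thread is aiming for.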