All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/38] arm64/sme: Initial support for the Scalable Matrix Extension
@ 2021-09-30 18:11 ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

This series provides initial support for the ARMv9 Scalable Matrix
Extension (SME).  SME takes the approach used for vectors in SVE and
extends this to provide architectural support for matrix operations.  A
more detailed overview can be found in [1].

For the kernel SME can be thought of as a series of features which are
intended to be used together by applications but operate mostly
orthogonally:

 - The ZA matrix register.
 - Streaming mode, in which ZA can be accessed and a subset of SVE
   features are available.
 - A second vector length, used for streaming mode SVE and ZA and
   controlled using a similar interface to that for SVE.
 - TPIDR2, a new userspace controllable system register intended for use
   by the C library for storing context related to the ZA ABI.

A substantial part of the series is dedicated to refactoring the
existing SVE support so that we don't need to duplicate code for
handling vector lengths and the SVE registers, this involves creating an
array of vector types and making the users take the vector type as a
parameter.  I'm not 100% happy with this but wasn't able to come up with
anything better, duplicating code definitely felt like a bad idea so
this felt like the least bad thing.  If this approach makes sense to
people it might make sense to split this off into a separate series
and/or merge it while the rest is pending review to try to make things a
little more digestable, the series is very large so it'd probably make
things easier to digest if some of the preparatory refactoring could be
merged before the rest is ready.

One feature of the architecture of particular note is that switching
to and from streaming mode may change the size of and invalidate the
contents of the SVE registers, and when in streaming mode the FFR is not
accessible.  This complicates aspects of the ABI like signal handling
and ptrace.

This initial implementation is mainly intended to get the ABI in place,
there are several areas which will be worked on going forwards - some of
these will be blockers, others could be handled in followup serieses:

 - KVM is not currently supported and we depend on !KVM, in hopefully
   the next version I will add support for coexisting with KVM and then
   in a subsequent series implement real support for KVM guests.
 - It is likely some build configurations have issues, I've not fully
   checked this yet.  In general testing is still ongoing, I anticipate
   finding and fixing some issues in the implementation.
 - No support is currently provided for scheduler control of SME or SME
   applications, given the size of the SME register state the context
   switch overhead may be noticable so this may be needed especially for
   real time applications.  Similar concerns already exist for larger
   SVE vector lengths but are amplified for SME, particularly as the
   vector length increases.
 - There has been no work on optimising the performance of anything the
   kernel does.

It is not expected that any systems will be encountered that support SME
but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
The code attempts to handle any such systems that are encountered but
this hasn't been tested extensively.

Due to dependencies on kselftest changes already upstreamed this series
is based on for-next/kselftest in the arm64 tree.

[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/scalable-matrix-extension-armv9-a-architecture

Mark Brown (38):
  arm64/fp: Reindent fpsimd_save()
  arm64/sve: Remove sve_load_from_fpsimd_state()
  arm64/sve: Make access to FFR optional
  arm64/sve: Rename find_supported_vector_length()
  arm64/sve: Use accessor functions for vector lengths in thread_struct
  arm64/sve: Put system wide vector length information into structs
  arm64/sve: Explicitly load vector length when restoring SVE state
  arm64/sve: Track vector lengths for tasks in an array
  arm64/sve: Make sysctl interface for SVE reusable by SME
  arm64/sve: Generalise vector length configuration prctl() for SME
  selftests: arm64: Parameterise ptrace vector length information
  arm64/sme: Provide ABI documentation for SME
  arm64/sme: System register and exception syndrome definitions
  arm64/sme: Define macros for manually encoding SME instructions
  arm64/sme: Early CPU setup for SME
  arm64/sme: Basic enumeration support
  arm64/sme: Identify supported SME vector lengths at boot
  arm64/sme: Implement sysctl to set the default vector length
  arm64/sme: Implement vector length configuration prctl()s
  arm64/sme: Implement support for TPIDR2
  arm64/sme: Implement SVCR context switching
  arm64/sme: Implement streaming SVE context switching
  arm64/sme: Implement ZA context switching
  arm64/sme: Implement traps and syscall handling for SME
  arm64/sme: Implement streaming SVE signal handling
  arm64/sme: Implement ZA signal handling
  arm64/sme: Implement ptrace support for streaming mode SVE registers
  arm64/sme: Add ptrace support for ZA
  arm64/sme: Disable streaming mode and ZA when flushing CPU state
  arm64/sme: Save and restore streaming mode over EFI runtime calls
  arm64/sme: Provide Kconfig for SME
  kselftest/arm64: Add tests for TPIDR2
  kselftest/arm64: Extend vector configuration API tests to cover SME
  kselftest/arm64: sme: Provide streaming mode SVE stress test
  kselftest/arm64: Add stress test for SME ZA context switching
  kselftest/arm64: signal: Add SME signal handling tests
  selftests: arm64: Add streaming SVE to SVE ptrace tests
  selftests: arm64: Add coverage for the ZA ptrace interface

 Documentation/arm64/elf_hwcaps.rst            |  29 +
 Documentation/arm64/index.rst                 |   1 +
 Documentation/arm64/sme.rst                   | 427 +++++++++
 Documentation/arm64/sve.rst                   |  62 +-
 arch/arm64/Kconfig                            |  11 +
 arch/arm64/include/asm/cpu.h                  |   4 +
 arch/arm64/include/asm/cpufeature.h           |  18 +
 arch/arm64/include/asm/el2_setup.h            |  36 +
 arch/arm64/include/asm/esr.h                  |   3 +-
 arch/arm64/include/asm/exception.h            |   1 +
 arch/arm64/include/asm/fpsimd.h               | 178 +++-
 arch/arm64/include/asm/fpsimdmacros.h         |  94 +-
 arch/arm64/include/asm/hwcap.h                |   7 +
 arch/arm64/include/asm/kvm_arm.h              |   1 +
 arch/arm64/include/asm/processor.h            |  67 +-
 arch/arm64/include/asm/sysreg.h               |  53 ++
 arch/arm64/include/asm/thread_info.h          |   4 +-
 arch/arm64/include/uapi/asm/hwcap.h           |   7 +
 arch/arm64/include/uapi/asm/ptrace.h          |  69 +-
 arch/arm64/include/uapi/asm/sigcontext.h      |  55 +-
 arch/arm64/kernel/cpufeature.c                |  96 +-
 arch/arm64/kernel/cpuinfo.c                   |  12 +
 arch/arm64/kernel/entry-common.c              |  10 +
 arch/arm64/kernel/entry-fpsimd.S              |  63 +-
 arch/arm64/kernel/fpsimd.c                    | 900 ++++++++++++++----
 arch/arm64/kernel/process.c                   |  19 +-
 arch/arm64/kernel/ptrace.c                    | 358 ++++++-
 arch/arm64/kernel/signal.c                    | 189 +++-
 arch/arm64/kernel/syscall.c                   |  49 +-
 arch/arm64/kernel/traps.c                     |   1 +
 arch/arm64/kvm/fpsimd.c                       |   3 +-
 arch/arm64/kvm/hyp/fpsimd.S                   |   6 +-
 arch/arm64/kvm/reset.c                        |  14 +-
 arch/arm64/tools/cpucaps                      |   1 +
 include/uapi/linux/elf.h                      |   2 +
 include/uapi/linux/prctl.h                    |   9 +
 kernel/sys.c                                  |   6 +
 tools/testing/selftests/arm64/Makefile        |   2 +-
 tools/testing/selftests/arm64/abi/.gitignore  |   1 +
 tools/testing/selftests/arm64/abi/Makefile    |  13 +
 tools/testing/selftests/arm64/abi/tpidr2.c    | 204 ++++
 tools/testing/selftests/arm64/fp/.gitignore   |   4 +
 tools/testing/selftests/arm64/fp/Makefile     |  12 +-
 tools/testing/selftests/arm64/fp/rdvl-sme.c   |  14 +
 tools/testing/selftests/arm64/fp/rdvl.S       |  16 +
 tools/testing/selftests/arm64/fp/rdvl.h       |   1 +
 tools/testing/selftests/arm64/fp/ssve-stress  |  59 ++
 tools/testing/selftests/arm64/fp/sve-ptrace.c | 203 ++--
 tools/testing/selftests/arm64/fp/sve-test.S   |  30 +
 tools/testing/selftests/arm64/fp/vec-syscfg.c |  10 +
 tools/testing/selftests/arm64/fp/za-ptrace.c  | 353 +++++++
 tools/testing/selftests/arm64/fp/za-stress    |  59 ++
 tools/testing/selftests/arm64/fp/za-test.S    | 545 +++++++++++
 .../testing/selftests/arm64/signal/.gitignore |   2 +
 .../selftests/arm64/signal/test_signals.h     |   2 +
 .../arm64/signal/test_signals_utils.c         |   3 +
 .../testcases/fake_sigreturn_sme_change_vl.c  |  92 ++
 .../selftests/arm64/signal/testcases/sme_vl.c |  70 ++
 .../arm64/signal/testcases/ssve_regs.c        | 129 +++
 59 files changed, 4293 insertions(+), 396 deletions(-)
 create mode 100644 Documentation/arm64/sme.rst
 create mode 100644 tools/testing/selftests/arm64/abi/.gitignore
 create mode 100644 tools/testing/selftests/arm64/abi/Makefile
 create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
 create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
 create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
 create mode 100644 tools/testing/selftests/arm64/fp/za-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c


base-commit: 8694e5e6388695195a32bd5746635ca166a8df56
-- 
2.20.1


^ permalink raw reply	[flat|nested] 143+ messages in thread

* [PATCH v1 00/38] arm64/sme: Initial support for the Scalable Matrix Extension
@ 2021-09-30 18:11 ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

This series provides initial support for the ARMv9 Scalable Matrix
Extension (SME).  SME takes the approach used for vectors in SVE and
extends this to provide architectural support for matrix operations.  A
more detailed overview can be found in [1].

For the kernel SME can be thought of as a series of features which are
intended to be used together by applications but operate mostly
orthogonally:

 - The ZA matrix register.
 - Streaming mode, in which ZA can be accessed and a subset of SVE
   features are available.
 - A second vector length, used for streaming mode SVE and ZA and
   controlled using a similar interface to that for SVE.
 - TPIDR2, a new userspace controllable system register intended for use
   by the C library for storing context related to the ZA ABI.

A substantial part of the series is dedicated to refactoring the
existing SVE support so that we don't need to duplicate code for
handling vector lengths and the SVE registers, this involves creating an
array of vector types and making the users take the vector type as a
parameter.  I'm not 100% happy with this but wasn't able to come up with
anything better, duplicating code definitely felt like a bad idea so
this felt like the least bad thing.  If this approach makes sense to
people it might make sense to split this off into a separate series
and/or merge it while the rest is pending review to try to make things a
little more digestable, the series is very large so it'd probably make
things easier to digest if some of the preparatory refactoring could be
merged before the rest is ready.

One feature of the architecture of particular note is that switching
to and from streaming mode may change the size of and invalidate the
contents of the SVE registers, and when in streaming mode the FFR is not
accessible.  This complicates aspects of the ABI like signal handling
and ptrace.

This initial implementation is mainly intended to get the ABI in place,
there are several areas which will be worked on going forwards - some of
these will be blockers, others could be handled in followup serieses:

 - KVM is not currently supported and we depend on !KVM, in hopefully
   the next version I will add support for coexisting with KVM and then
   in a subsequent series implement real support for KVM guests.
 - It is likely some build configurations have issues, I've not fully
   checked this yet.  In general testing is still ongoing, I anticipate
   finding and fixing some issues in the implementation.
 - No support is currently provided for scheduler control of SME or SME
   applications, given the size of the SME register state the context
   switch overhead may be noticable so this may be needed especially for
   real time applications.  Similar concerns already exist for larger
   SVE vector lengths but are amplified for SME, particularly as the
   vector length increases.
 - There has been no work on optimising the performance of anything the
   kernel does.

It is not expected that any systems will be encountered that support SME
but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
The code attempts to handle any such systems that are encountered but
this hasn't been tested extensively.

Due to dependencies on kselftest changes already upstreamed this series
is based on for-next/kselftest in the arm64 tree.

[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/scalable-matrix-extension-armv9-a-architecture

Mark Brown (38):
  arm64/fp: Reindent fpsimd_save()
  arm64/sve: Remove sve_load_from_fpsimd_state()
  arm64/sve: Make access to FFR optional
  arm64/sve: Rename find_supported_vector_length()
  arm64/sve: Use accessor functions for vector lengths in thread_struct
  arm64/sve: Put system wide vector length information into structs
  arm64/sve: Explicitly load vector length when restoring SVE state
  arm64/sve: Track vector lengths for tasks in an array
  arm64/sve: Make sysctl interface for SVE reusable by SME
  arm64/sve: Generalise vector length configuration prctl() for SME
  selftests: arm64: Parameterise ptrace vector length information
  arm64/sme: Provide ABI documentation for SME
  arm64/sme: System register and exception syndrome definitions
  arm64/sme: Define macros for manually encoding SME instructions
  arm64/sme: Early CPU setup for SME
  arm64/sme: Basic enumeration support
  arm64/sme: Identify supported SME vector lengths at boot
  arm64/sme: Implement sysctl to set the default vector length
  arm64/sme: Implement vector length configuration prctl()s
  arm64/sme: Implement support for TPIDR2
  arm64/sme: Implement SVCR context switching
  arm64/sme: Implement streaming SVE context switching
  arm64/sme: Implement ZA context switching
  arm64/sme: Implement traps and syscall handling for SME
  arm64/sme: Implement streaming SVE signal handling
  arm64/sme: Implement ZA signal handling
  arm64/sme: Implement ptrace support for streaming mode SVE registers
  arm64/sme: Add ptrace support for ZA
  arm64/sme: Disable streaming mode and ZA when flushing CPU state
  arm64/sme: Save and restore streaming mode over EFI runtime calls
  arm64/sme: Provide Kconfig for SME
  kselftest/arm64: Add tests for TPIDR2
  kselftest/arm64: Extend vector configuration API tests to cover SME
  kselftest/arm64: sme: Provide streaming mode SVE stress test
  kselftest/arm64: Add stress test for SME ZA context switching
  kselftest/arm64: signal: Add SME signal handling tests
  selftests: arm64: Add streaming SVE to SVE ptrace tests
  selftests: arm64: Add coverage for the ZA ptrace interface

 Documentation/arm64/elf_hwcaps.rst            |  29 +
 Documentation/arm64/index.rst                 |   1 +
 Documentation/arm64/sme.rst                   | 427 +++++++++
 Documentation/arm64/sve.rst                   |  62 +-
 arch/arm64/Kconfig                            |  11 +
 arch/arm64/include/asm/cpu.h                  |   4 +
 arch/arm64/include/asm/cpufeature.h           |  18 +
 arch/arm64/include/asm/el2_setup.h            |  36 +
 arch/arm64/include/asm/esr.h                  |   3 +-
 arch/arm64/include/asm/exception.h            |   1 +
 arch/arm64/include/asm/fpsimd.h               | 178 +++-
 arch/arm64/include/asm/fpsimdmacros.h         |  94 +-
 arch/arm64/include/asm/hwcap.h                |   7 +
 arch/arm64/include/asm/kvm_arm.h              |   1 +
 arch/arm64/include/asm/processor.h            |  67 +-
 arch/arm64/include/asm/sysreg.h               |  53 ++
 arch/arm64/include/asm/thread_info.h          |   4 +-
 arch/arm64/include/uapi/asm/hwcap.h           |   7 +
 arch/arm64/include/uapi/asm/ptrace.h          |  69 +-
 arch/arm64/include/uapi/asm/sigcontext.h      |  55 +-
 arch/arm64/kernel/cpufeature.c                |  96 +-
 arch/arm64/kernel/cpuinfo.c                   |  12 +
 arch/arm64/kernel/entry-common.c              |  10 +
 arch/arm64/kernel/entry-fpsimd.S              |  63 +-
 arch/arm64/kernel/fpsimd.c                    | 900 ++++++++++++++----
 arch/arm64/kernel/process.c                   |  19 +-
 arch/arm64/kernel/ptrace.c                    | 358 ++++++-
 arch/arm64/kernel/signal.c                    | 189 +++-
 arch/arm64/kernel/syscall.c                   |  49 +-
 arch/arm64/kernel/traps.c                     |   1 +
 arch/arm64/kvm/fpsimd.c                       |   3 +-
 arch/arm64/kvm/hyp/fpsimd.S                   |   6 +-
 arch/arm64/kvm/reset.c                        |  14 +-
 arch/arm64/tools/cpucaps                      |   1 +
 include/uapi/linux/elf.h                      |   2 +
 include/uapi/linux/prctl.h                    |   9 +
 kernel/sys.c                                  |   6 +
 tools/testing/selftests/arm64/Makefile        |   2 +-
 tools/testing/selftests/arm64/abi/.gitignore  |   1 +
 tools/testing/selftests/arm64/abi/Makefile    |  13 +
 tools/testing/selftests/arm64/abi/tpidr2.c    | 204 ++++
 tools/testing/selftests/arm64/fp/.gitignore   |   4 +
 tools/testing/selftests/arm64/fp/Makefile     |  12 +-
 tools/testing/selftests/arm64/fp/rdvl-sme.c   |  14 +
 tools/testing/selftests/arm64/fp/rdvl.S       |  16 +
 tools/testing/selftests/arm64/fp/rdvl.h       |   1 +
 tools/testing/selftests/arm64/fp/ssve-stress  |  59 ++
 tools/testing/selftests/arm64/fp/sve-ptrace.c | 203 ++--
 tools/testing/selftests/arm64/fp/sve-test.S   |  30 +
 tools/testing/selftests/arm64/fp/vec-syscfg.c |  10 +
 tools/testing/selftests/arm64/fp/za-ptrace.c  | 353 +++++++
 tools/testing/selftests/arm64/fp/za-stress    |  59 ++
 tools/testing/selftests/arm64/fp/za-test.S    | 545 +++++++++++
 .../testing/selftests/arm64/signal/.gitignore |   2 +
 .../selftests/arm64/signal/test_signals.h     |   2 +
 .../arm64/signal/test_signals_utils.c         |   3 +
 .../testcases/fake_sigreturn_sme_change_vl.c  |  92 ++
 .../selftests/arm64/signal/testcases/sme_vl.c |  70 ++
 .../arm64/signal/testcases/ssve_regs.c        | 129 +++
 59 files changed, 4293 insertions(+), 396 deletions(-)
 create mode 100644 Documentation/arm64/sme.rst
 create mode 100644 tools/testing/selftests/arm64/abi/.gitignore
 create mode 100644 tools/testing/selftests/arm64/abi/Makefile
 create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
 create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
 create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
 create mode 100644 tools/testing/selftests/arm64/fp/za-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c


base-commit: 8694e5e6388695195a32bd5746635ca166a8df56
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* [PATCH v1 01/38] arm64/fp: Reindent fpsimd_save()
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Currently all the active code in fpsimd_save() is inside a check for
TIF_FOREIGN_FPSTATE. Reduce the indentation level by changing to return
from the function if TIF_FOREIGN_FPSTATE is set.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ff4962750b3d..995f8801602b 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -308,24 +308,26 @@ static void fpsimd_save(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		if (IS_ENABLED(CONFIG_ARM64_SVE) &&
-		    test_thread_flag(TIF_SVE)) {
-			if (WARN_ON(sve_get_vl() != last->sve_vl)) {
-				/*
-				 * Can't save the user regs, so current would
-				 * re-enter user with corrupt state.
-				 * There's no way to recover, so kill it:
-				 */
-				force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
-				return;
-			}
-
-			sve_save_state((char *)last->sve_state +
-						sve_ffr_offset(last->sve_vl),
-				       &last->st->fpsr);
-		} else
-			fpsimd_save_state(last->st);
+	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
+		return;
+
+	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
+	    test_thread_flag(TIF_SVE)) {
+		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
+			/*
+			 * Can't save the user regs, so current would
+			 * re-enter user with corrupt state.
+			 * There's no way to recover, so kill it:
+			 */
+			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
+			return;
+		}
+
+		sve_save_state((char *)last->sve_state +
+					sve_ffr_offset(last->sve_vl),
+			       &last->st->fpsr);
+	} else {
+		fpsimd_save_state(last->st);
 	}
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 01/38] arm64/fp: Reindent fpsimd_save()
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Currently all the active code in fpsimd_save() is inside a check for
TIF_FOREIGN_FPSTATE. Reduce the indentation level by changing to return
from the function if TIF_FOREIGN_FPSTATE is set.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ff4962750b3d..995f8801602b 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -308,24 +308,26 @@ static void fpsimd_save(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		if (IS_ENABLED(CONFIG_ARM64_SVE) &&
-		    test_thread_flag(TIF_SVE)) {
-			if (WARN_ON(sve_get_vl() != last->sve_vl)) {
-				/*
-				 * Can't save the user regs, so current would
-				 * re-enter user with corrupt state.
-				 * There's no way to recover, so kill it:
-				 */
-				force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
-				return;
-			}
-
-			sve_save_state((char *)last->sve_state +
-						sve_ffr_offset(last->sve_vl),
-				       &last->st->fpsr);
-		} else
-			fpsimd_save_state(last->st);
+	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
+		return;
+
+	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
+	    test_thread_flag(TIF_SVE)) {
+		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
+			/*
+			 * Can't save the user regs, so current would
+			 * re-enter user with corrupt state.
+			 * There's no way to recover, so kill it:
+			 */
+			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
+			return;
+		}
+
+		sve_save_state((char *)last->sve_state +
+					sve_ffr_offset(last->sve_vl),
+			       &last->st->fpsr);
+	} else {
+		fpsimd_save_state(last->st);
 	}
 }
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 02/38] arm64/sve: Remove sve_load_from_fpsimd_state()
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Following optimisations of the SVE register handling we no longer load the
SVE state from a saved copy of the FPSIMD registers, we convert directly
in registers or from one saved state to another. Remove the function so we
don't need to update it during further refactoring.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 --
 arch/arm64/kernel/entry-fpsimd.S | 16 ----------------
 2 files changed, 18 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9a62884183e5..e0e30567b80f 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,8 +69,6 @@ extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern void sve_flush_live(unsigned long vq_minus_1);
-extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
-				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 196e921f61de..afbf7dc47e1d 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -66,22 +66,6 @@ SYM_FUNC_START(sve_set_vq)
 	ret
 SYM_FUNC_END(sve_set_vq)
 
-/*
- * Load SVE state from FPSIMD state.
- *
- * x0 = pointer to struct fpsimd_state
- * x1 = VQ - 1
- *
- * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
- * and the rest zeroed. All the other SVE registers will be zeroed.
- */
-SYM_FUNC_START(sve_load_from_fpsimd_state)
-	sve_load_vq	x1, x2, x3
-	fpsimd_restore	x0, 8
-	sve_flush_p_ffr
-	ret
-SYM_FUNC_END(sve_load_from_fpsimd_state)
-
 /*
  * Zero all SVE registers but the first 128-bits of each vector
  *
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 02/38] arm64/sve: Remove sve_load_from_fpsimd_state()
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Following optimisations of the SVE register handling we no longer load the
SVE state from a saved copy of the FPSIMD registers, we convert directly
in registers or from one saved state to another. Remove the function so we
don't need to update it during further refactoring.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  |  2 --
 arch/arm64/kernel/entry-fpsimd.S | 16 ----------------
 2 files changed, 18 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 9a62884183e5..e0e30567b80f 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,8 +69,6 @@ extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern void sve_flush_live(unsigned long vq_minus_1);
-extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
-				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 196e921f61de..afbf7dc47e1d 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -66,22 +66,6 @@ SYM_FUNC_START(sve_set_vq)
 	ret
 SYM_FUNC_END(sve_set_vq)
 
-/*
- * Load SVE state from FPSIMD state.
- *
- * x0 = pointer to struct fpsimd_state
- * x1 = VQ - 1
- *
- * Each SVE vector will be loaded with the first 128-bits taken from FPSIMD
- * and the rest zeroed. All the other SVE registers will be zeroed.
- */
-SYM_FUNC_START(sve_load_from_fpsimd_state)
-	sve_load_vq	x1, x2, x3
-	fpsimd_restore	x0, 8
-	sve_flush_p_ffr
-	ret
-SYM_FUNC_END(sve_load_from_fpsimd_state)
-
 /*
  * Zero all SVE registers but the first 128-bits of each vector
  *
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 03/38] arm64/sve: Make access to FFR optional
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

SME introduces streaming SVE mode in which FFR is not present and the
instructions for accessing it UNDEF. In preparation for handling this
update the low level SVE state access functions to take a flag specifying
if FFR should be handled. When saving the register state we store a zero
for FFR to guard against uninitialized data being read. No behaviour change
should be introduced by this patch.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  6 +++---
 arch/arm64/include/asm/fpsimdmacros.h | 20 ++++++++++++++------
 arch/arm64/kernel/entry-fpsimd.S      | 17 +++++++++++------
 arch/arm64/kernel/fpsimd.c            | 10 ++++++----
 arch/arm64/kvm/hyp/fpsimd.S           |  6 ++++--
 5 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index e0e30567b80f..bf5bb881ef67 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -65,10 +65,10 @@ static inline void *sve_pffr(struct thread_struct *thread)
 	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
 }
 
-extern void sve_save_state(void *state, u32 *pfpsr);
+extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
-			   unsigned long vq_minus_1);
-extern void sve_flush_live(unsigned long vq_minus_1);
+			   int restore_ffr, unsigned long vq_minus_1);
+extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
 
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 00a2c0b69c2b..84d8cb7b07fa 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -217,28 +217,36 @@
 .macro sve_flush_z
  _for n, 0, 31, _sve_flush_z	\n
 .endm
-.macro sve_flush_p_ffr
+.macro sve_flush_p
  _for n, 0, 15, _sve_pfalse	\n
+.endm
+.macro sve_flush_ffr
 		_sve_wrffr	0
 .endm
 
-.macro sve_save nxbase, xpfpsr, nxtmp
+.macro sve_save nxbase, xpfpsr, save_ffr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
+		cbz		\save_ffr, 921f
 		_sve_rdffr	0
 		_sve_str_p	0, \nxbase
 		_sve_ldr_p	0, \nxbase, -16
-
+		b		922f
+921:
+		str		xzr, [x\nxbase, #0]	// Zero out FFR
+922:
 		mrs		x\nxtmp, fpsr
 		str		w\nxtmp, [\xpfpsr]
 		mrs		x\nxtmp, fpcr
 		str		w\nxtmp, [\xpfpsr, #4]
 .endm
 
-.macro __sve_load nxbase, xpfpsr, nxtmp
+.macro __sve_load nxbase, xpfpsr, restore_ffr, nxtmp
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
+		cbz		\restore_ffr, 921f
 		_sve_ldr_p	0, \nxbase
 		_sve_wrffr	0
+921:
  _for n, 0, 15,	_sve_ldr_p	\n, \nxbase, \n - 16
 
 		ldr		w\nxtmp, [\xpfpsr]
@@ -247,7 +255,7 @@
 		msr		fpcr, x\nxtmp
 .endm
 
-.macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
+.macro sve_load nxbase, xpfpsr, restore_ffr, xvqminus1, nxtmp, xtmp2
 		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
-		__sve_load	\nxbase, \xpfpsr, \nxtmp
+		__sve_load	\nxbase, \xpfpsr, \restore_ffr, \nxtmp
 .endm
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index afbf7dc47e1d..13c27465bfa8 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -38,9 +38,10 @@ SYM_FUNC_END(fpsimd_load_state)
  *
  * x0 - pointer to buffer for state
  * x1 - pointer to storage for FPSR
+ * x2 - Save FFR if non-zero
  */
 SYM_FUNC_START(sve_save_state)
-	sve_save 0, x1, 2
+	sve_save 0, x1, x2, 3
 	ret
 SYM_FUNC_END(sve_save_state)
 
@@ -49,10 +50,11 @@ SYM_FUNC_END(sve_save_state)
  *
  * x0 - pointer to buffer for state
  * x1 - pointer to storage for FPSR
- * x2 - VQ-1
+ * x2 - Restore FFR if non-zero
+ * x3 - VQ-1
  */
 SYM_FUNC_START(sve_load_state)
-	sve_load 0, x1, x2, 3, x4
+	sve_load 0, x1, x2, x3, 4, x5
 	ret
 SYM_FUNC_END(sve_load_state)
 
@@ -72,12 +74,15 @@ SYM_FUNC_END(sve_set_vq)
  * VQ must already be configured by caller, any further updates of VQ
  * will need to ensure that the register state remains valid.
  *
- * x0 = VQ - 1
+ * x0 = include FFR?
+ * x1 = VQ - 1
  */
 SYM_FUNC_START(sve_flush_live)
-	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
+	cbz		x1, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
 	sve_flush_z
-1:	sve_flush_p_ffr
+1:	cbz		x0, 2f
+	sve_flush_p
+2:	sve_flush_ffr
 	ret
 SYM_FUNC_END(sve_flush_live)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 995f8801602b..3465d3328e44 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -289,7 +289,7 @@ static void task_fpsimd_load(void)
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE))
 		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr,
+			       &current->thread.uw.fpsimd_state.fpsr, true,
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
 	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
@@ -325,7 +325,7 @@ static void fpsimd_save(void)
 
 		sve_save_state((char *)last->sve_state +
 					sve_ffr_offset(last->sve_vl),
-			       &last->st->fpsr);
+			       &last->st->fpsr, true);
 	} else {
 		fpsimd_save_state(last->st);
 	}
@@ -962,7 +962,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 		unsigned long vq_minus_one =
 			sve_vq_from_vl(current->thread.sve_vl) - 1;
 		sve_set_vq(vq_minus_one);
-		sve_flush_live(vq_minus_one);
+		sve_flush_live(true, vq_minus_one);
 		fpsimd_bind_task_to_cpu();
 	} else {
 		fpsimd_to_sve(current);
@@ -1356,7 +1356,8 @@ void __efi_fpsimd_begin(void)
 			__this_cpu_write(efi_sve_state_used, true);
 
 			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl),
-				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr);
+				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
+				       true);
 		} else {
 			fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state));
 		}
@@ -1382,6 +1383,7 @@ void __efi_fpsimd_end(void)
 
 			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
+				       true,
 				       sve_vq_from_vl(sve_get_vl()) - 1);
 
 			__this_cpu_write(efi_sve_state_used, false);
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
index 3c635929771a..1bb3b04b84e6 100644
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ b/arch/arm64/kvm/hyp/fpsimd.S
@@ -21,11 +21,13 @@ SYM_FUNC_START(__fpsimd_restore_state)
 SYM_FUNC_END(__fpsimd_restore_state)
 
 SYM_FUNC_START(__sve_restore_state)
-	__sve_load 0, x1, 2
+	mov	x2, #1
+	__sve_load 0, x1, x2, 3
 	ret
 SYM_FUNC_END(__sve_restore_state)
 
 SYM_FUNC_START(__sve_save_state)
-	sve_save 0, x1, 2
+	mov	x2, #1
+	sve_save 0, x1, x2, 3
 	ret
 SYM_FUNC_END(__sve_save_state)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 03/38] arm64/sve: Make access to FFR optional
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

SME introduces streaming SVE mode in which FFR is not present and the
instructions for accessing it UNDEF. In preparation for handling this
update the low level SVE state access functions to take a flag specifying
if FFR should be handled. When saving the register state we store a zero
for FFR to guard against uninitialized data being read. No behaviour change
should be introduced by this patch.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  6 +++---
 arch/arm64/include/asm/fpsimdmacros.h | 20 ++++++++++++++------
 arch/arm64/kernel/entry-fpsimd.S      | 17 +++++++++++------
 arch/arm64/kernel/fpsimd.c            | 10 ++++++----
 arch/arm64/kvm/hyp/fpsimd.S           |  6 ++++--
 5 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index e0e30567b80f..bf5bb881ef67 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -65,10 +65,10 @@ static inline void *sve_pffr(struct thread_struct *thread)
 	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
 }
 
-extern void sve_save_state(void *state, u32 *pfpsr);
+extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
-			   unsigned long vq_minus_1);
-extern void sve_flush_live(unsigned long vq_minus_1);
+			   int restore_ffr, unsigned long vq_minus_1);
+extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
 
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 00a2c0b69c2b..84d8cb7b07fa 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -217,28 +217,36 @@
 .macro sve_flush_z
  _for n, 0, 31, _sve_flush_z	\n
 .endm
-.macro sve_flush_p_ffr
+.macro sve_flush_p
  _for n, 0, 15, _sve_pfalse	\n
+.endm
+.macro sve_flush_ffr
 		_sve_wrffr	0
 .endm
 
-.macro sve_save nxbase, xpfpsr, nxtmp
+.macro sve_save nxbase, xpfpsr, save_ffr, nxtmp
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
+		cbz		\save_ffr, 921f
 		_sve_rdffr	0
 		_sve_str_p	0, \nxbase
 		_sve_ldr_p	0, \nxbase, -16
-
+		b		922f
+921:
+		str		xzr, [x\nxbase, #0]	// Zero out FFR
+922:
 		mrs		x\nxtmp, fpsr
 		str		w\nxtmp, [\xpfpsr]
 		mrs		x\nxtmp, fpcr
 		str		w\nxtmp, [\xpfpsr, #4]
 .endm
 
-.macro __sve_load nxbase, xpfpsr, nxtmp
+.macro __sve_load nxbase, xpfpsr, restore_ffr, nxtmp
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
+		cbz		\restore_ffr, 921f
 		_sve_ldr_p	0, \nxbase
 		_sve_wrffr	0
+921:
  _for n, 0, 15,	_sve_ldr_p	\n, \nxbase, \n - 16
 
 		ldr		w\nxtmp, [\xpfpsr]
@@ -247,7 +255,7 @@
 		msr		fpcr, x\nxtmp
 .endm
 
-.macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
+.macro sve_load nxbase, xpfpsr, restore_ffr, xvqminus1, nxtmp, xtmp2
 		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
-		__sve_load	\nxbase, \xpfpsr, \nxtmp
+		__sve_load	\nxbase, \xpfpsr, \restore_ffr, \nxtmp
 .endm
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index afbf7dc47e1d..13c27465bfa8 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -38,9 +38,10 @@ SYM_FUNC_END(fpsimd_load_state)
  *
  * x0 - pointer to buffer for state
  * x1 - pointer to storage for FPSR
+ * x2 - Save FFR if non-zero
  */
 SYM_FUNC_START(sve_save_state)
-	sve_save 0, x1, 2
+	sve_save 0, x1, x2, 3
 	ret
 SYM_FUNC_END(sve_save_state)
 
@@ -49,10 +50,11 @@ SYM_FUNC_END(sve_save_state)
  *
  * x0 - pointer to buffer for state
  * x1 - pointer to storage for FPSR
- * x2 - VQ-1
+ * x2 - Restore FFR if non-zero
+ * x3 - VQ-1
  */
 SYM_FUNC_START(sve_load_state)
-	sve_load 0, x1, x2, 3, x4
+	sve_load 0, x1, x2, x3, 4, x5
 	ret
 SYM_FUNC_END(sve_load_state)
 
@@ -72,12 +74,15 @@ SYM_FUNC_END(sve_set_vq)
  * VQ must already be configured by caller, any further updates of VQ
  * will need to ensure that the register state remains valid.
  *
- * x0 = VQ - 1
+ * x0 = include FFR?
+ * x1 = VQ - 1
  */
 SYM_FUNC_START(sve_flush_live)
-	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
+	cbz		x1, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
 	sve_flush_z
-1:	sve_flush_p_ffr
+1:	cbz		x0, 2f
+	sve_flush_p
+2:	sve_flush_ffr
 	ret
 SYM_FUNC_END(sve_flush_live)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 995f8801602b..3465d3328e44 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -289,7 +289,7 @@ static void task_fpsimd_load(void)
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE))
 		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr,
+			       &current->thread.uw.fpsimd_state.fpsr, true,
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
 	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
@@ -325,7 +325,7 @@ static void fpsimd_save(void)
 
 		sve_save_state((char *)last->sve_state +
 					sve_ffr_offset(last->sve_vl),
-			       &last->st->fpsr);
+			       &last->st->fpsr, true);
 	} else {
 		fpsimd_save_state(last->st);
 	}
@@ -962,7 +962,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 		unsigned long vq_minus_one =
 			sve_vq_from_vl(current->thread.sve_vl) - 1;
 		sve_set_vq(vq_minus_one);
-		sve_flush_live(vq_minus_one);
+		sve_flush_live(true, vq_minus_one);
 		fpsimd_bind_task_to_cpu();
 	} else {
 		fpsimd_to_sve(current);
@@ -1356,7 +1356,8 @@ void __efi_fpsimd_begin(void)
 			__this_cpu_write(efi_sve_state_used, true);
 
 			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl),
-				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr);
+				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
+				       true);
 		} else {
 			fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state));
 		}
@@ -1382,6 +1383,7 @@ void __efi_fpsimd_end(void)
 
 			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
+				       true,
 				       sve_vq_from_vl(sve_get_vl()) - 1);
 
 			__this_cpu_write(efi_sve_state_used, false);
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
index 3c635929771a..1bb3b04b84e6 100644
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ b/arch/arm64/kvm/hyp/fpsimd.S
@@ -21,11 +21,13 @@ SYM_FUNC_START(__fpsimd_restore_state)
 SYM_FUNC_END(__fpsimd_restore_state)
 
 SYM_FUNC_START(__sve_restore_state)
-	__sve_load 0, x1, 2
+	mov	x2, #1
+	__sve_load 0, x1, x2, 3
 	ret
 SYM_FUNC_END(__sve_restore_state)
 
 SYM_FUNC_START(__sve_save_state)
-	sve_save 0, x1, 2
+	mov	x2, #1
+	sve_save 0, x1, x2, 3
 	ret
 SYM_FUNC_END(__sve_save_state)
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 04/38] arm64/sve: Rename find_supported_vector_length()
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The function has SVE specific checks in it and it will be more trouble
to add conditional code for SME than it is to simply rename it to be SVE
specific.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 3465d3328e44..dff71c041e85 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -337,7 +337,7 @@ static void fpsimd_save(void)
  * If things go wrong there's a bug somewhere, but try to fall back to a
  * safe choice.
  */
-static unsigned int find_supported_vector_length(unsigned int vl)
+static unsigned int find_supported_sve_vector_length(unsigned int vl)
 {
 	int bit;
 	int max_vl = sve_max_vl;
@@ -379,7 +379,7 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
 
-	set_sve_default_vl(find_supported_vector_length(vl));
+	set_sve_default_vl(find_supported_sve_vector_length(vl));
 	return 0;
 }
 
@@ -598,7 +598,7 @@ int sve_set_vector_length(struct task_struct *task,
 	if (vl > SVE_VL_ARCH_MAX)
 		vl = SVE_VL_ARCH_MAX;
 
-	vl = find_supported_vector_length(vl);
+	vl = find_supported_sve_vector_length(vl);
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
@@ -873,14 +873,14 @@ void __init sve_setup(void)
 	 * Sanity-check that the max VL we determined through CPU features
 	 * corresponds properly to sve_vq_map.  If not, do our best:
 	 */
-	if (WARN_ON(sve_max_vl != find_supported_vector_length(sve_max_vl)))
-		sve_max_vl = find_supported_vector_length(sve_max_vl);
+	if (WARN_ON(sve_max_vl != find_supported_sve_vector_length(sve_max_vl)))
+		sve_max_vl = find_supported_sve_vector_length(sve_max_vl);
 
 	/*
 	 * For the default VL, pick the maximum supported value <= 64.
 	 * VL == 64 is guaranteed not to grow the signal frame.
 	 */
-	set_sve_default_vl(find_supported_vector_length(64));
+	set_sve_default_vl(find_supported_sve_vector_length(64));
 
 	bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map,
 		      SVE_VQ_MAX);
@@ -1066,7 +1066,7 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
 
-		supported_vl = find_supported_vector_length(vl);
+		supported_vl = find_supported_sve_vector_length(vl);
 		if (WARN_ON(supported_vl != vl))
 			vl = supported_vl;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 04/38] arm64/sve: Rename find_supported_vector_length()
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The function has SVE specific checks in it and it will be more trouble
to add conditional code for SME than it is to simply rename it to be SVE
specific.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 3465d3328e44..dff71c041e85 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -337,7 +337,7 @@ static void fpsimd_save(void)
  * If things go wrong there's a bug somewhere, but try to fall back to a
  * safe choice.
  */
-static unsigned int find_supported_vector_length(unsigned int vl)
+static unsigned int find_supported_sve_vector_length(unsigned int vl)
 {
 	int bit;
 	int max_vl = sve_max_vl;
@@ -379,7 +379,7 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
 
-	set_sve_default_vl(find_supported_vector_length(vl));
+	set_sve_default_vl(find_supported_sve_vector_length(vl));
 	return 0;
 }
 
@@ -598,7 +598,7 @@ int sve_set_vector_length(struct task_struct *task,
 	if (vl > SVE_VL_ARCH_MAX)
 		vl = SVE_VL_ARCH_MAX;
 
-	vl = find_supported_vector_length(vl);
+	vl = find_supported_sve_vector_length(vl);
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
@@ -873,14 +873,14 @@ void __init sve_setup(void)
 	 * Sanity-check that the max VL we determined through CPU features
 	 * corresponds properly to sve_vq_map.  If not, do our best:
 	 */
-	if (WARN_ON(sve_max_vl != find_supported_vector_length(sve_max_vl)))
-		sve_max_vl = find_supported_vector_length(sve_max_vl);
+	if (WARN_ON(sve_max_vl != find_supported_sve_vector_length(sve_max_vl)))
+		sve_max_vl = find_supported_sve_vector_length(sve_max_vl);
 
 	/*
 	 * For the default VL, pick the maximum supported value <= 64.
 	 * VL == 64 is guaranteed not to grow the signal frame.
 	 */
-	set_sve_default_vl(find_supported_vector_length(64));
+	set_sve_default_vl(find_supported_sve_vector_length(64));
 
 	bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map,
 		      SVE_VQ_MAX);
@@ -1066,7 +1066,7 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
 
-		supported_vl = find_supported_vector_length(vl);
+		supported_vl = find_supported_sve_vector_length(vl);
 		if (WARN_ON(supported_vl != vl))
 			vl = supported_vl;
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 05/38] arm64/sve: Use accessor functions for vector lengths in thread_struct
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In a system with SME there are parallel vector length controls for SVE and
SME vectors which function in much the same way so it is desirable to
share the code for handling them as much as possible. In order to prepare
for doing this add a layer of accessor functions for the various VL related
operations on tasks.

Since almost all current interactions are actually via task->thread rather
than directly with the thread_info the accessors use that. Accessors are
provided for both generic and SVE specific usage, the generic accessors
should be used for cases where register state is being manipulated since
the registers are shared between streaming and regular SVE so we know that
when SME support is implemented we will always have to be in the appropriate
mode already and hence can generalise now.

Since we are using task_struct and we don't want to cause widespread
inclusion of sched.h the acessors are all out of line, it is hoped that
none of the uses are in a sufficiently critical path for this to be an
issue. Those that are most likely to present an issue are in the same
translation unit so hopefully the compiler may be able to inline anyway.

This is purely adding the layer of abstraction, additional work will be
needed to support tasks using SME.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  2 +-
 arch/arm64/include/asm/processor.h | 10 ++++++
 arch/arm64/kernel/fpsimd.c         | 55 +++++++++++++++++++++---------
 arch/arm64/kernel/ptrace.c         |  4 +--
 arch/arm64/kernel/signal.c         |  6 ++--
 5 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index bf5bb881ef67..7d0204f77f90 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -62,7 +62,7 @@ static inline size_t sve_ffr_offset(int vl)
 
 static inline void *sve_pffr(struct thread_struct *thread)
 {
-	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
+	return (char *)thread->sve_state + sve_ffr_offset(thread_get_sve_vl(thread));
 }
 
 extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index ee2bdc1b9f5b..adb6a46a1fae 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -164,6 +164,16 @@ struct thread_struct {
 	u64			sctlr_user;
 };
 
+static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
+{
+	return thread->sve_vl;
+}
+
+unsigned int task_get_sve_vl(const struct task_struct *task);
+void task_set_sve_vl(struct task_struct *task, unsigned long vl);
+unsigned int task_get_sve_vl_onexec(const struct task_struct *task);
+void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl);
+
 #define SCTLR_USER_MASK                                                        \
 	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB |   \
 	 SCTLR_EL1_TCF0_MASK)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index dff71c041e85..b0acaa20457c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -228,6 +228,26 @@ static void sve_free(struct task_struct *task)
 	__sve_free(task);
 }
 
+unsigned int task_get_sve_vl(const struct task_struct *task)
+{
+	return task->thread.sve_vl;
+}
+
+void task_set_sve_vl(struct task_struct *task, unsigned long vl)
+{
+	task->thread.sve_vl = vl;
+}
+
+unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
+{
+	return task->thread.sve_vl_onexec;
+}
+
+void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl)
+{
+	task->thread.sve_vl_onexec = vl;
+}
+
 /*
  * TIF_SVE controls whether a task can use SVE without trapping while
  * in userspace, and also the way a task's FPSIMD/SVE state is stored
@@ -290,7 +310,7 @@ static void task_fpsimd_load(void)
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE))
 		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr, true,
-			       sve_vq_from_vl(current->thread.sve_vl) - 1);
+			       sve_vq_from_vl(task_get_sve_vl(current)) - 1);
 	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
 }
@@ -458,7 +478,7 @@ static void fpsimd_to_sve(struct task_struct *task)
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(task));
 	__fpsimd_to_sve(sst, fst, vq);
 }
 
@@ -484,7 +504,7 @@ static void sve_to_fpsimd(struct task_struct *task)
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(task));
 	for (i = 0; i < SVE_NUM_ZREGS; ++i) {
 		p = (__uint128_t const *)ZREG(sst, vq, i);
 		fst->vregs[i] = arm64_le128_to_cpu(*p);
@@ -499,7 +519,7 @@ static void sve_to_fpsimd(struct task_struct *task)
  */
 size_t sve_state_size(struct task_struct const *task)
 {
-	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task->thread.sve_vl));
+	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task_get_sve_vl(task)));
 }
 
 /*
@@ -574,7 +594,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
 	if (!test_tsk_thread_flag(task, TIF_SVE))
 		return;
 
-	vq = sve_vq_from_vl(task->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(task));
 
 	memset(sst, 0, SVE_SIG_REGS_SIZE(vq));
 	__fpsimd_to_sve(sst, fst, vq);
@@ -602,16 +622,16 @@ int sve_set_vector_length(struct task_struct *task,
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
-		task->thread.sve_vl_onexec = vl;
+		task_set_sve_vl_onexec(task, vl);
 	else
 		/* Reset VL to system default on next exec: */
-		task->thread.sve_vl_onexec = 0;
+		task_set_sve_vl_onexec(task, 0);
 
 	/* Only actually set the VL if not deferred: */
 	if (flags & PR_SVE_SET_VL_ONEXEC)
 		goto out;
 
-	if (vl == task->thread.sve_vl)
+	if (vl == task_get_sve_vl(task))
 		goto out;
 
 	/*
@@ -638,7 +658,7 @@ int sve_set_vector_length(struct task_struct *task,
 	 */
 	sve_free(task);
 
-	task->thread.sve_vl = vl;
+	task_set_sve_vl(task, vl);
 
 out:
 	update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
@@ -658,9 +678,9 @@ static int sve_prctl_status(unsigned long flags)
 	int ret;
 
 	if (flags & PR_SVE_SET_VL_ONEXEC)
-		ret = current->thread.sve_vl_onexec;
+		ret = task_get_sve_vl_onexec(current);
 	else
-		ret = current->thread.sve_vl;
+		ret = task_get_sve_vl(current);
 
 	if (test_thread_flag(TIF_SVE_VL_INHERIT))
 		ret |= PR_SVE_VL_INHERIT;
@@ -960,7 +980,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	 */
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		unsigned long vq_minus_one =
-			sve_vq_from_vl(current->thread.sve_vl) - 1;
+			sve_vq_from_vl(task_get_sve_vl(current)) - 1;
 		sve_set_vq(vq_minus_one);
 		sve_flush_live(true, vq_minus_one);
 		fpsimd_bind_task_to_cpu();
@@ -1060,8 +1080,9 @@ void fpsimd_flush_thread(void)
 		 * If a bug causes this to go wrong, we make some noise and
 		 * try to fudge thread.sve_vl to a safe value here.
 		 */
-		vl = current->thread.sve_vl_onexec ?
-			current->thread.sve_vl_onexec : get_sve_default_vl();
+		vl = task_get_sve_vl_onexec(current);
+		if (!vl)
+			vl = get_sve_default_vl();
 
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
@@ -1070,14 +1091,14 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(supported_vl != vl))
 			vl = supported_vl;
 
-		current->thread.sve_vl = vl;
+		task_set_sve_vl(current, vl);
 
 		/*
 		 * If the task is not set to inherit, ensure that the vector
 		 * length will be reset by a subsequent exec:
 		 */
 		if (!test_thread_flag(TIF_SVE_VL_INHERIT))
-			current->thread.sve_vl_onexec = 0;
+			task_set_sve_vl_onexec(current, 0);
 	}
 
 	put_cpu_fpsimd_context();
@@ -1122,7 +1143,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	WARN_ON(!system_supports_fpsimd());
 	last->st = &current->thread.uw.fpsimd_state;
 	last->sve_state = current->thread.sve_state;
-	last->sve_vl = current->thread.sve_vl;
+	last->sve_vl = task_get_sve_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	if (system_supports_sve()) {
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e26196a33cf4..95ff03a1b077 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -725,7 +725,7 @@ static void sve_init_header_from_task(struct user_sve_header *header,
 	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
 		header->flags |= SVE_PT_VL_INHERIT;
 
-	header->vl = target->thread.sve_vl;
+	header->vl = task_get_sve_vl(target);
 	vq = sve_vq_from_vl(header->vl);
 
 	header->max_vl = sve_max_vl;
@@ -820,7 +820,7 @@ static int sve_set(struct task_struct *target,
 		goto out;
 
 	/* Actual VL set may be less than the user asked for: */
-	vq = sve_vq_from_vl(target->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(target));
 
 	/* Registers: FPSIMD-only case */
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index c287b9407f28..aa1d9d7918da 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -227,7 +227,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 {
 	int err = 0;
 	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
-	unsigned int vl = current->thread.sve_vl;
+	unsigned int vl = task_get_sve_vl(current);
 	unsigned int vq = 0;
 
 	if (test_thread_flag(TIF_SVE))
@@ -266,7 +266,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	if (__copy_from_user(&sve, user->sve, sizeof(sve)))
 		return -EFAULT;
 
-	if (sve.vl != current->thread.sve_vl)
+	if (sve.vl != task_get_sve_vl(current))
 		return -EINVAL;
 
 	if (sve.head.size <= sizeof(*user->sve)) {
@@ -597,7 +597,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 			int vl = sve_max_vl;
 
 			if (!add_all)
-				vl = current->thread.sve_vl;
+				vl = task_get_sve_vl(current);
 
 			vq = sve_vq_from_vl(vl);
 		}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 05/38] arm64/sve: Use accessor functions for vector lengths in thread_struct
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In a system with SME there are parallel vector length controls for SVE and
SME vectors which function in much the same way so it is desirable to
share the code for handling them as much as possible. In order to prepare
for doing this add a layer of accessor functions for the various VL related
operations on tasks.

Since almost all current interactions are actually via task->thread rather
than directly with the thread_info the accessors use that. Accessors are
provided for both generic and SVE specific usage, the generic accessors
should be used for cases where register state is being manipulated since
the registers are shared between streaming and regular SVE so we know that
when SME support is implemented we will always have to be in the appropriate
mode already and hence can generalise now.

Since we are using task_struct and we don't want to cause widespread
inclusion of sched.h the acessors are all out of line, it is hoped that
none of the uses are in a sufficiently critical path for this to be an
issue. Those that are most likely to present an issue are in the same
translation unit so hopefully the compiler may be able to inline anyway.

This is purely adding the layer of abstraction, additional work will be
needed to support tasks using SME.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  2 +-
 arch/arm64/include/asm/processor.h | 10 ++++++
 arch/arm64/kernel/fpsimd.c         | 55 +++++++++++++++++++++---------
 arch/arm64/kernel/ptrace.c         |  4 +--
 arch/arm64/kernel/signal.c         |  6 ++--
 5 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index bf5bb881ef67..7d0204f77f90 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -62,7 +62,7 @@ static inline size_t sve_ffr_offset(int vl)
 
 static inline void *sve_pffr(struct thread_struct *thread)
 {
-	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
+	return (char *)thread->sve_state + sve_ffr_offset(thread_get_sve_vl(thread));
 }
 
 extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index ee2bdc1b9f5b..adb6a46a1fae 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -164,6 +164,16 @@ struct thread_struct {
 	u64			sctlr_user;
 };
 
+static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
+{
+	return thread->sve_vl;
+}
+
+unsigned int task_get_sve_vl(const struct task_struct *task);
+void task_set_sve_vl(struct task_struct *task, unsigned long vl);
+unsigned int task_get_sve_vl_onexec(const struct task_struct *task);
+void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl);
+
 #define SCTLR_USER_MASK                                                        \
 	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB |   \
 	 SCTLR_EL1_TCF0_MASK)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index dff71c041e85..b0acaa20457c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -228,6 +228,26 @@ static void sve_free(struct task_struct *task)
 	__sve_free(task);
 }
 
+unsigned int task_get_sve_vl(const struct task_struct *task)
+{
+	return task->thread.sve_vl;
+}
+
+void task_set_sve_vl(struct task_struct *task, unsigned long vl)
+{
+	task->thread.sve_vl = vl;
+}
+
+unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
+{
+	return task->thread.sve_vl_onexec;
+}
+
+void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl)
+{
+	task->thread.sve_vl_onexec = vl;
+}
+
 /*
  * TIF_SVE controls whether a task can use SVE without trapping while
  * in userspace, and also the way a task's FPSIMD/SVE state is stored
@@ -290,7 +310,7 @@ static void task_fpsimd_load(void)
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE))
 		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr, true,
-			       sve_vq_from_vl(current->thread.sve_vl) - 1);
+			       sve_vq_from_vl(task_get_sve_vl(current)) - 1);
 	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
 }
@@ -458,7 +478,7 @@ static void fpsimd_to_sve(struct task_struct *task)
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(task));
 	__fpsimd_to_sve(sst, fst, vq);
 }
 
@@ -484,7 +504,7 @@ static void sve_to_fpsimd(struct task_struct *task)
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(task));
 	for (i = 0; i < SVE_NUM_ZREGS; ++i) {
 		p = (__uint128_t const *)ZREG(sst, vq, i);
 		fst->vregs[i] = arm64_le128_to_cpu(*p);
@@ -499,7 +519,7 @@ static void sve_to_fpsimd(struct task_struct *task)
  */
 size_t sve_state_size(struct task_struct const *task)
 {
-	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task->thread.sve_vl));
+	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task_get_sve_vl(task)));
 }
 
 /*
@@ -574,7 +594,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
 	if (!test_tsk_thread_flag(task, TIF_SVE))
 		return;
 
-	vq = sve_vq_from_vl(task->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(task));
 
 	memset(sst, 0, SVE_SIG_REGS_SIZE(vq));
 	__fpsimd_to_sve(sst, fst, vq);
@@ -602,16 +622,16 @@ int sve_set_vector_length(struct task_struct *task,
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
-		task->thread.sve_vl_onexec = vl;
+		task_set_sve_vl_onexec(task, vl);
 	else
 		/* Reset VL to system default on next exec: */
-		task->thread.sve_vl_onexec = 0;
+		task_set_sve_vl_onexec(task, 0);
 
 	/* Only actually set the VL if not deferred: */
 	if (flags & PR_SVE_SET_VL_ONEXEC)
 		goto out;
 
-	if (vl == task->thread.sve_vl)
+	if (vl == task_get_sve_vl(task))
 		goto out;
 
 	/*
@@ -638,7 +658,7 @@ int sve_set_vector_length(struct task_struct *task,
 	 */
 	sve_free(task);
 
-	task->thread.sve_vl = vl;
+	task_set_sve_vl(task, vl);
 
 out:
 	update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
@@ -658,9 +678,9 @@ static int sve_prctl_status(unsigned long flags)
 	int ret;
 
 	if (flags & PR_SVE_SET_VL_ONEXEC)
-		ret = current->thread.sve_vl_onexec;
+		ret = task_get_sve_vl_onexec(current);
 	else
-		ret = current->thread.sve_vl;
+		ret = task_get_sve_vl(current);
 
 	if (test_thread_flag(TIF_SVE_VL_INHERIT))
 		ret |= PR_SVE_VL_INHERIT;
@@ -960,7 +980,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	 */
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		unsigned long vq_minus_one =
-			sve_vq_from_vl(current->thread.sve_vl) - 1;
+			sve_vq_from_vl(task_get_sve_vl(current)) - 1;
 		sve_set_vq(vq_minus_one);
 		sve_flush_live(true, vq_minus_one);
 		fpsimd_bind_task_to_cpu();
@@ -1060,8 +1080,9 @@ void fpsimd_flush_thread(void)
 		 * If a bug causes this to go wrong, we make some noise and
 		 * try to fudge thread.sve_vl to a safe value here.
 		 */
-		vl = current->thread.sve_vl_onexec ?
-			current->thread.sve_vl_onexec : get_sve_default_vl();
+		vl = task_get_sve_vl_onexec(current);
+		if (!vl)
+			vl = get_sve_default_vl();
 
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
@@ -1070,14 +1091,14 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(supported_vl != vl))
 			vl = supported_vl;
 
-		current->thread.sve_vl = vl;
+		task_set_sve_vl(current, vl);
 
 		/*
 		 * If the task is not set to inherit, ensure that the vector
 		 * length will be reset by a subsequent exec:
 		 */
 		if (!test_thread_flag(TIF_SVE_VL_INHERIT))
-			current->thread.sve_vl_onexec = 0;
+			task_set_sve_vl_onexec(current, 0);
 	}
 
 	put_cpu_fpsimd_context();
@@ -1122,7 +1143,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	WARN_ON(!system_supports_fpsimd());
 	last->st = &current->thread.uw.fpsimd_state;
 	last->sve_state = current->thread.sve_state;
-	last->sve_vl = current->thread.sve_vl;
+	last->sve_vl = task_get_sve_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	if (system_supports_sve()) {
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e26196a33cf4..95ff03a1b077 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -725,7 +725,7 @@ static void sve_init_header_from_task(struct user_sve_header *header,
 	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
 		header->flags |= SVE_PT_VL_INHERIT;
 
-	header->vl = target->thread.sve_vl;
+	header->vl = task_get_sve_vl(target);
 	vq = sve_vq_from_vl(header->vl);
 
 	header->max_vl = sve_max_vl;
@@ -820,7 +820,7 @@ static int sve_set(struct task_struct *target,
 		goto out;
 
 	/* Actual VL set may be less than the user asked for: */
-	vq = sve_vq_from_vl(target->thread.sve_vl);
+	vq = sve_vq_from_vl(task_get_sve_vl(target));
 
 	/* Registers: FPSIMD-only case */
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index c287b9407f28..aa1d9d7918da 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -227,7 +227,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 {
 	int err = 0;
 	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
-	unsigned int vl = current->thread.sve_vl;
+	unsigned int vl = task_get_sve_vl(current);
 	unsigned int vq = 0;
 
 	if (test_thread_flag(TIF_SVE))
@@ -266,7 +266,7 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	if (__copy_from_user(&sve, user->sve, sizeof(sve)))
 		return -EFAULT;
 
-	if (sve.vl != current->thread.sve_vl)
+	if (sve.vl != task_get_sve_vl(current))
 		return -EINVAL;
 
 	if (sve.head.size <= sizeof(*user->sve)) {
@@ -597,7 +597,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 			int vl = sve_max_vl;
 
 			if (!add_all)
-				vl = current->thread.sve_vl;
+				vl = task_get_sve_vl(current);
 
 			vq = sve_vq_from_vl(vl);
 		}
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 06/38] arm64/sve: Put system wide vector length information into structs
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

With the introduction of SME we will have a second vector length in the
system, enumerated and configured in a very similar fashion to the
existing SVE vector length.  While there are a few differences in how
things are handled this is a relatively small portion of the overall
code so in order to avoid code duplication we factor out

We create two structs, one vl_info for the static hardware properties
and one vl_config for the runtime configuration, with an array
instantiated for each and update all the users to reference these. Some
accessor functions are provided where helpful for readability, and the
write to set the vector length is put into a function since the system
register being updated needs to be chosen at compile time.

This is a mostly mechanical replacement, further work will be required
to actually make things generic, ensuring that we handle those places
where there are differences properly.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  90 +++++++++++++---
 arch/arm64/include/asm/processor.h |   5 +
 arch/arm64/kernel/cpufeature.c     |   6 +-
 arch/arm64/kernel/fpsimd.c         | 163 ++++++++++++++++-------------
 arch/arm64/kernel/ptrace.c         |   2 +-
 arch/arm64/kernel/signal.c         |   2 +-
 arch/arm64/kvm/reset.c             |   6 +-
 7 files changed, 183 insertions(+), 91 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 7d0204f77f90..02fa676d1a9a 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -77,10 +77,6 @@ extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
 extern u64 read_zcr_features(void);
 
-extern int __ro_after_init sve_max_vl;
-extern int __ro_after_init sve_max_virtualisable_vl;
-extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
-
 /*
  * Helpers to translate bit indices in sve_vq_map to VQ values (and
  * vice versa).  This allows find_next_bit() to be used to find the
@@ -96,11 +92,6 @@ static inline unsigned int __bit_to_vq(unsigned int bit)
 	return SVE_VQ_MAX - bit;
 }
 
-/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
-static inline bool sve_vq_available(unsigned int vq)
-{
-	return test_bit(__vq_to_bit(vq), sve_vq_map);
-}
 
 #ifdef CONFIG_ARM64_SVE
 
@@ -141,11 +132,84 @@ static inline void sve_user_enable(void)
  * Probing and setup functions.
  * Calls to these functions must be serialised with one another.
  */
-extern void __init sve_init_vq_map(void);
-extern void sve_update_vq_map(void);
-extern int sve_verify_vq_map(void);
+enum vec_type;
+
+extern void __init vec_init_vq_map(enum vec_type type);
+extern void vec_update_vq_map(enum vec_type type);
+extern int vec_verify_vq_map(enum vec_type type);
 extern void __init sve_setup(void);
 
+struct vl_info {
+	enum vec_type type;
+	const char *name;		/* For display purposes */
+
+	/* Minimum supported vector length across all CPUs */
+	int min_vl;
+
+	/* Maximum supported vector length across all CPUs */
+	int max_vl;
+	int max_virtualisable_vl;
+
+	/*
+	 * Set of available vector lengths,
+	 * where length vq encoded as bit __vq_to_bit(vq):
+	 */
+	DECLARE_BITMAP(vq_map, SVE_VQ_MAX);
+
+	/* Set of vector lengths present on at least one cpu: */
+	DECLARE_BITMAP(vq_partial_map, SVE_VQ_MAX);
+};
+
+extern __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX];
+
+static inline void write_vl(enum vec_type type, u64 val)
+{
+	u64 tmp;
+
+	switch (type) {
+#ifdef CONFIG_ARM64_SVE
+	case ARM64_VEC_SVE:
+		tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK;
+		write_sysreg_s(tmp | val, SYS_ZCR_EL1);
+		break;
+#endif
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
+
+static inline int vec_max_vl(enum vec_type type)
+{
+	return vl_info[type].max_vl;
+}
+
+static inline int vec_max_virtualisable_vl(enum vec_type type)
+{
+	return vl_info[type].max_virtualisable_vl;
+}
+
+static inline int sve_max_vl(void)
+{
+	return vec_max_vl(ARM64_VEC_SVE);
+}
+
+static inline int sve_max_virtualisable_vl(void)
+{
+	return vec_max_virtualisable_vl(ARM64_VEC_SVE);
+}
+
+/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
+static inline bool vq_available(enum vec_type type, unsigned int vq)
+{
+	return test_bit(__vq_to_bit(vq), vl_info[type].vq_map);
+}
+
+static inline bool sve_vq_available(unsigned int vq)
+{
+	return vq_available(ARM64_VEC_SVE, vq);
+}
+
 #else /* ! CONFIG_ARM64_SVE */
 
 static inline void sve_alloc(struct task_struct *task) { }
@@ -163,6 +227,8 @@ static inline int sve_get_current_vl(void)
 	return -EINVAL;
 }
 
+static inline bool sve_vq_available(unsigned int vq) { return false; }
+
 static inline void sve_user_disable(void) { BUILD_BUG(); }
 static inline void sve_user_enable(void) { BUILD_BUG(); }
 
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index adb6a46a1fae..fb0608fe9ded 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -115,6 +115,11 @@ struct debug_info {
 #endif
 };
 
+enum vec_type {
+	ARM64_VEC_SVE = 0,
+	ARM64_VEC_MAX,
+};
+
 struct cpu_context {
 	unsigned long x19;
 	unsigned long x20;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ec7036ef7e1..405a65d7e618 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -941,7 +941,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 
 	if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
 		init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
-		sve_init_vq_map();
+		vec_init_vq_map(ARM64_VEC_SVE);
 	}
 
 	if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
@@ -1175,7 +1175,7 @@ void update_cpu_features(int cpu,
 		/* Probe vector lengths, unless we already gave up on SVE */
 		if (id_aa64pfr0_sve(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1)) &&
 		    !system_capabilities_finalized())
-			sve_update_vq_map();
+			vec_update_vq_map(ARM64_VEC_SVE);
 	}
 
 	/*
@@ -2739,7 +2739,7 @@ static void verify_sve_features(void)
 	unsigned int safe_len = safe_zcr & ZCR_ELx_LEN_MASK;
 	unsigned int len = zcr & ZCR_ELx_LEN_MASK;
 
-	if (len < safe_len || sve_verify_vq_map()) {
+	if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SVE)) {
 		pr_crit("CPU%d: SVE: vector length support mismatch\n",
 			smp_processor_id());
 		cpu_die_early();
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index b0acaa20457c..d45f14a68b9c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -121,40 +121,51 @@ struct fpsimd_last_state_struct {
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
 
-/* Default VL for tasks that don't set it explicitly: */
-static int __sve_default_vl = -1;
+__ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
+#ifdef CONFIG_ARM64_SVE
+	[ARM64_VEC_SVE] = {
+		.type			= ARM64_VEC_SVE,
+		.name			= "SVE",
+		.min_vl			= SVE_VL_MIN,
+		.max_vl			= SVE_VL_MIN,
+		.max_virtualisable_vl	= SVE_VL_MIN,
+	},
+#endif
+};
+
+struct vl_config {
+	int __default_vl;		/* Default VL for tasks */
+};
+
+static struct vl_config vl_config[ARM64_VEC_MAX];
+
+static int get_default_vl(enum vec_type type)
+{
+	return READ_ONCE(vl_config[type].__default_vl);
+}
+
+static void set_default_vl(enum vec_type type, int val)
+{
+	WRITE_ONCE(vl_config[type].__default_vl, val);
+}
 
 static int get_sve_default_vl(void)
 {
-	return READ_ONCE(__sve_default_vl);
+	return get_default_vl(ARM64_VEC_SVE);
 }
 
 #ifdef CONFIG_ARM64_SVE
 
 static void set_sve_default_vl(int val)
 {
-	WRITE_ONCE(__sve_default_vl, val);
+	set_default_vl(ARM64_VEC_SVE, val);
 }
 
-/* Maximum supported vector length across all CPUs (initially poisoned) */
-int __ro_after_init sve_max_vl = SVE_VL_MIN;
-int __ro_after_init sve_max_virtualisable_vl = SVE_VL_MIN;
-
-/*
- * Set of available vector lengths,
- * where length vq encoded as bit __vq_to_bit(vq):
- */
-__ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
-/* Set of vector lengths present on at least one cpu: */
-static __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
-
 static void __percpu *efi_sve_state;
 
 #else /* ! CONFIG_ARM64_SVE */
 
 /* Dummy declaration for code that will be optimised out: */
-extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
-extern __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
 extern void __percpu *efi_sve_state;
 
 #endif /* ! CONFIG_ARM64_SVE */
@@ -357,21 +368,23 @@ static void fpsimd_save(void)
  * If things go wrong there's a bug somewhere, but try to fall back to a
  * safe choice.
  */
-static unsigned int find_supported_sve_vector_length(unsigned int vl)
+static unsigned int find_supported_vector_length(enum vec_type type,
+						 unsigned int vl)
 {
+	struct vl_info *info = &vl_info[type];
 	int bit;
-	int max_vl = sve_max_vl;
+	int max_vl = info->max_vl;
 
 	if (WARN_ON(!sve_vl_valid(vl)))
-		vl = SVE_VL_MIN;
+		vl = info->min_vl;
 
 	if (WARN_ON(!sve_vl_valid(max_vl)))
-		max_vl = SVE_VL_MIN;
+		max_vl = info->min_vl;
 
 	if (vl > max_vl)
 		vl = max_vl;
 
-	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
+	bit = find_next_bit(info->vq_map, SVE_VQ_MAX,
 			    __vq_to_bit(sve_vq_from_vl(vl)));
 	return sve_vl_from_vq(__bit_to_vq(bit));
 }
@@ -381,6 +394,7 @@ static unsigned int find_supported_sve_vector_length(unsigned int vl)
 static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 				  void *buffer, size_t *lenp, loff_t *ppos)
 {
+	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
 	int ret;
 	int vl = get_sve_default_vl();
 	struct ctl_table tmp_table = {
@@ -394,12 +408,12 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 
 	/* Writing -1 has the special meaning "set to max": */
 	if (vl == -1)
-		vl = sve_max_vl;
+		vl = info->max_vl;
 
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
 
-	set_sve_default_vl(find_supported_sve_vector_length(vl));
+	set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, vl));
 	return 0;
 }
 
@@ -618,7 +632,7 @@ int sve_set_vector_length(struct task_struct *task,
 	if (vl > SVE_VL_ARCH_MAX)
 		vl = SVE_VL_ARCH_MAX;
 
-	vl = find_supported_sve_vector_length(vl);
+	vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
@@ -716,18 +730,15 @@ int sve_get_current_vl(void)
 	return sve_prctl_status(0);
 }
 
-static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
+static void vec_probe_vqs(struct vl_info *info,
+			  DECLARE_BITMAP(map, SVE_VQ_MAX))
 {
 	unsigned int vq, vl;
-	unsigned long zcr;
 
 	bitmap_zero(map, SVE_VQ_MAX);
 
-	zcr = ZCR_ELx_LEN_MASK;
-	zcr = read_sysreg_s(SYS_ZCR_EL1) & ~zcr;
-
 	for (vq = SVE_VQ_MAX; vq >= SVE_VQ_MIN; --vq) {
-		write_sysreg_s(zcr | (vq - 1), SYS_ZCR_EL1); /* self-syncing */
+		write_vl(info->type, vq - 1); /* self-syncing */
 		vl = sve_get_vl();
 		vq = sve_vq_from_vl(vl); /* skip intervening lengths */
 		set_bit(__vq_to_bit(vq), map);
@@ -738,10 +749,11 @@ static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
  * Initialise the set of known supported VQs for the boot CPU.
  * This is called during kernel boot, before secondary CPUs are brought up.
  */
-void __init sve_init_vq_map(void)
+void __init vec_init_vq_map(enum vec_type type)
 {
-	sve_probe_vqs(sve_vq_map);
-	bitmap_copy(sve_vq_partial_map, sve_vq_map, SVE_VQ_MAX);
+	struct vl_info *info = &vl_info[type];
+	vec_probe_vqs(info, info->vq_map);
+	bitmap_copy(info->vq_partial_map, info->vq_map, SVE_VQ_MAX);
 }
 
 /*
@@ -749,30 +761,33 @@ void __init sve_init_vq_map(void)
  * those not supported by the current CPU.
  * This function is called during the bring-up of early secondary CPUs only.
  */
-void sve_update_vq_map(void)
+void vec_update_vq_map(enum vec_type type)
 {
+	struct vl_info *info = &vl_info[type];
 	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
 
-	sve_probe_vqs(tmp_map);
-	bitmap_and(sve_vq_map, sve_vq_map, tmp_map, SVE_VQ_MAX);
-	bitmap_or(sve_vq_partial_map, sve_vq_partial_map, tmp_map, SVE_VQ_MAX);
+	vec_probe_vqs(info, tmp_map);
+	bitmap_and(info->vq_map, info->vq_map, tmp_map, SVE_VQ_MAX);
+	bitmap_or(info->vq_partial_map, info->vq_partial_map, tmp_map,
+		  SVE_VQ_MAX);
 }
 
 /*
  * Check whether the current CPU supports all VQs in the committed set.
  * This function is called during the bring-up of late secondary CPUs only.
  */
-int sve_verify_vq_map(void)
+int vec_verify_vq_map(enum vec_type type)
 {
+	struct vl_info *info = &vl_info[type];
 	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
 	unsigned long b;
 
-	sve_probe_vqs(tmp_map);
+	vec_probe_vqs(info, tmp_map);
 
 	bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
-	if (bitmap_intersects(tmp_map, sve_vq_map, SVE_VQ_MAX)) {
-		pr_warn("SVE: cpu%d: Required vector length(s) missing\n",
-			smp_processor_id());
+	if (bitmap_intersects(tmp_map, info->vq_map, SVE_VQ_MAX)) {
+		pr_warn("%s: cpu%d: Required vector length(s) missing\n",
+			info->name, smp_processor_id());
 		return -EINVAL;
 	}
 
@@ -788,7 +803,7 @@ int sve_verify_vq_map(void)
 	/* Recover the set of supported VQs: */
 	bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
 	/* Find VQs supported that are not globally supported: */
-	bitmap_andnot(tmp_map, tmp_map, sve_vq_map, SVE_VQ_MAX);
+	bitmap_andnot(tmp_map, tmp_map, info->vq_map, SVE_VQ_MAX);
 
 	/* Find the lowest such VQ, if any: */
 	b = find_last_bit(tmp_map, SVE_VQ_MAX);
@@ -799,9 +814,9 @@ int sve_verify_vq_map(void)
 	 * Mismatches above sve_max_virtualisable_vl are fine, since
 	 * no guest is allowed to configure ZCR_EL2.LEN to exceed this:
 	 */
-	if (sve_vl_from_vq(__bit_to_vq(b)) <= sve_max_virtualisable_vl) {
-		pr_warn("SVE: cpu%d: Unsupported vector length(s) present\n",
-			smp_processor_id());
+	if (sve_vl_from_vq(__bit_to_vq(b)) <= info->max_virtualisable_vl) {
+		pr_warn("%s: cpu%d: Unsupported vector length(s) present\n",
+			info->name, smp_processor_id());
 		return -EINVAL;
 	}
 
@@ -810,6 +825,8 @@ int sve_verify_vq_map(void)
 
 static void __init sve_efi_setup(void)
 {
+	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
+
 	if (!IS_ENABLED(CONFIG_EFI))
 		return;
 
@@ -818,11 +835,11 @@ static void __init sve_efi_setup(void)
 	 * This is evidence of a crippled system and we are returning void,
 	 * so no attempt is made to handle this situation here.
 	 */
-	if (!sve_vl_valid(sve_max_vl))
+	if (!sve_vl_valid(info->max_vl))
 		goto fail;
 
 	efi_sve_state = __alloc_percpu(
-		SVE_SIG_REGS_SIZE(sve_vq_from_vl(sve_max_vl)), SVE_VQ_BYTES);
+		SVE_SIG_REGS_SIZE(sve_vq_from_vl(info->max_vl)), SVE_VQ_BYTES);
 	if (!efi_sve_state)
 		goto fail;
 
@@ -871,6 +888,7 @@ u64 read_zcr_features(void)
 
 void __init sve_setup(void)
 {
+	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
 	u64 zcr;
 	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
 	unsigned long b;
@@ -883,49 +901,52 @@ void __init sve_setup(void)
 	 * so sve_vq_map must have at least SVE_VQ_MIN set.
 	 * If something went wrong, at least try to patch it up:
 	 */
-	if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map)))
-		set_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map);
+	if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), info->vq_map)))
+		set_bit(__vq_to_bit(SVE_VQ_MIN), info->vq_map);
 
 	zcr = read_sanitised_ftr_reg(SYS_ZCR_EL1);
-	sve_max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1);
+	info->max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1);
 
 	/*
 	 * Sanity-check that the max VL we determined through CPU features
 	 * corresponds properly to sve_vq_map.  If not, do our best:
 	 */
-	if (WARN_ON(sve_max_vl != find_supported_sve_vector_length(sve_max_vl)))
-		sve_max_vl = find_supported_sve_vector_length(sve_max_vl);
+	if (WARN_ON(info->max_vl != find_supported_vector_length(ARM64_VEC_SVE,
+								 info->max_vl)))
+		info->max_vl = find_supported_vector_length(ARM64_VEC_SVE,
+							    info->max_vl);
 
 	/*
 	 * For the default VL, pick the maximum supported value <= 64.
 	 * VL == 64 is guaranteed not to grow the signal frame.
 	 */
-	set_sve_default_vl(find_supported_sve_vector_length(64));
+	set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, 64));
 
-	bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map,
+	bitmap_andnot(tmp_map, info->vq_partial_map, info->vq_map,
 		      SVE_VQ_MAX);
 
 	b = find_last_bit(tmp_map, SVE_VQ_MAX);
 	if (b >= SVE_VQ_MAX)
 		/* No non-virtualisable VLs found */
-		sve_max_virtualisable_vl = SVE_VQ_MAX;
+		info->max_virtualisable_vl = SVE_VQ_MAX;
 	else if (WARN_ON(b == SVE_VQ_MAX - 1))
 		/* No virtualisable VLs?  This is architecturally forbidden. */
-		sve_max_virtualisable_vl = SVE_VQ_MIN;
+		info->max_virtualisable_vl = SVE_VQ_MIN;
 	else /* b + 1 < SVE_VQ_MAX */
-		sve_max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1));
+		info->max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1));
 
-	if (sve_max_virtualisable_vl > sve_max_vl)
-		sve_max_virtualisable_vl = sve_max_vl;
+	if (info->max_virtualisable_vl > info->max_vl)
+		info->max_virtualisable_vl = info->max_vl;
 
-	pr_info("SVE: maximum available vector length %u bytes per vector\n",
-		sve_max_vl);
-	pr_info("SVE: default vector length %u bytes per vector\n",
-		get_sve_default_vl());
+	pr_info("%s: maximum available vector length %u bytes per vector\n",
+		info->name, info->max_vl);
+	pr_info("%s: default vector length %u bytes per vector\n",
+		info->name, get_sve_default_vl());
 
 	/* KVM decides whether to support mismatched systems. Just warn here: */
-	if (sve_max_virtualisable_vl < sve_max_vl)
-		pr_warn("SVE: unvirtualisable vector lengths present\n");
+	if (sve_max_virtualisable_vl() < sve_max_vl())
+		pr_warn("%s: unvirtualisable vector lengths present\n",
+			info->name);
 
 	sve_efi_setup();
 }
@@ -1087,7 +1108,7 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
 
-		supported_vl = find_supported_sve_vector_length(vl);
+		supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
 		if (WARN_ON(supported_vl != vl))
 			vl = supported_vl;
 
@@ -1376,7 +1397,7 @@ void __efi_fpsimd_begin(void)
 
 			__this_cpu_write(efi_sve_state_used, true);
 
-			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl),
+			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
 				       true);
 		} else {
@@ -1402,7 +1423,7 @@ void __efi_fpsimd_end(void)
 		    likely(__this_cpu_read(efi_sve_state_used))) {
 			char const *sve_state = this_cpu_ptr(efi_sve_state);
 
-			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl),
+			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
 				       true,
 				       sve_vq_from_vl(sve_get_vl()) - 1);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 95ff03a1b077..88a9034fb9b5 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -728,7 +728,7 @@ static void sve_init_header_from_task(struct user_sve_header *header,
 	header->vl = task_get_sve_vl(target);
 	vq = sve_vq_from_vl(header->vl);
 
-	header->max_vl = sve_max_vl;
+	header->max_vl = sve_max_vl();
 	header->size = SVE_PT_SIZE(vq, header->flags);
 	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
 				      SVE_PT_REGS_SVE);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index aa1d9d7918da..8f6372b44b65 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -594,7 +594,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 		unsigned int vq = 0;
 
 		if (add_all || test_thread_flag(TIF_SVE)) {
-			int vl = sve_max_vl;
+			int vl = sve_max_vl();
 
 			if (!add_all)
 				vl = task_get_sve_vl(current);
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ce36b0a3343..09cd30a9aafb 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -46,7 +46,7 @@ unsigned int kvm_sve_max_vl;
 int kvm_arm_init_sve(void)
 {
 	if (system_supports_sve()) {
-		kvm_sve_max_vl = sve_max_virtualisable_vl;
+		kvm_sve_max_vl = sve_max_virtualisable_vl();
 
 		/*
 		 * The get_sve_reg()/set_sve_reg() ioctl interface will need
@@ -61,7 +61,7 @@ int kvm_arm_init_sve(void)
 		 * Don't even try to make use of vector lengths that
 		 * aren't available on all CPUs, for now:
 		 */
-		if (kvm_sve_max_vl < sve_max_vl)
+		if (kvm_sve_max_vl < sve_max_vl())
 			pr_warn("KVM: SVE vector length for guests limited to %u bytes\n",
 				kvm_sve_max_vl);
 	}
@@ -102,7 +102,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
 	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
 	 * set_sve_vls().  Double-check here just to be sure:
 	 */
-	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
+	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl() ||
 		    vl > SVE_VL_ARCH_MAX))
 		return -EIO;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 06/38] arm64/sve: Put system wide vector length information into structs
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

With the introduction of SME we will have a second vector length in the
system, enumerated and configured in a very similar fashion to the
existing SVE vector length.  While there are a few differences in how
things are handled this is a relatively small portion of the overall
code so in order to avoid code duplication we factor out

We create two structs, one vl_info for the static hardware properties
and one vl_config for the runtime configuration, with an array
instantiated for each and update all the users to reference these. Some
accessor functions are provided where helpful for readability, and the
write to set the vector length is put into a function since the system
register being updated needs to be chosen at compile time.

This is a mostly mechanical replacement, further work will be required
to actually make things generic, ensuring that we handle those places
where there are differences properly.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    |  90 +++++++++++++---
 arch/arm64/include/asm/processor.h |   5 +
 arch/arm64/kernel/cpufeature.c     |   6 +-
 arch/arm64/kernel/fpsimd.c         | 163 ++++++++++++++++-------------
 arch/arm64/kernel/ptrace.c         |   2 +-
 arch/arm64/kernel/signal.c         |   2 +-
 arch/arm64/kvm/reset.c             |   6 +-
 7 files changed, 183 insertions(+), 91 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 7d0204f77f90..02fa676d1a9a 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -77,10 +77,6 @@ extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
 extern u64 read_zcr_features(void);
 
-extern int __ro_after_init sve_max_vl;
-extern int __ro_after_init sve_max_virtualisable_vl;
-extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
-
 /*
  * Helpers to translate bit indices in sve_vq_map to VQ values (and
  * vice versa).  This allows find_next_bit() to be used to find the
@@ -96,11 +92,6 @@ static inline unsigned int __bit_to_vq(unsigned int bit)
 	return SVE_VQ_MAX - bit;
 }
 
-/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
-static inline bool sve_vq_available(unsigned int vq)
-{
-	return test_bit(__vq_to_bit(vq), sve_vq_map);
-}
 
 #ifdef CONFIG_ARM64_SVE
 
@@ -141,11 +132,84 @@ static inline void sve_user_enable(void)
  * Probing and setup functions.
  * Calls to these functions must be serialised with one another.
  */
-extern void __init sve_init_vq_map(void);
-extern void sve_update_vq_map(void);
-extern int sve_verify_vq_map(void);
+enum vec_type;
+
+extern void __init vec_init_vq_map(enum vec_type type);
+extern void vec_update_vq_map(enum vec_type type);
+extern int vec_verify_vq_map(enum vec_type type);
 extern void __init sve_setup(void);
 
+struct vl_info {
+	enum vec_type type;
+	const char *name;		/* For display purposes */
+
+	/* Minimum supported vector length across all CPUs */
+	int min_vl;
+
+	/* Maximum supported vector length across all CPUs */
+	int max_vl;
+	int max_virtualisable_vl;
+
+	/*
+	 * Set of available vector lengths,
+	 * where length vq encoded as bit __vq_to_bit(vq):
+	 */
+	DECLARE_BITMAP(vq_map, SVE_VQ_MAX);
+
+	/* Set of vector lengths present on at least one cpu: */
+	DECLARE_BITMAP(vq_partial_map, SVE_VQ_MAX);
+};
+
+extern __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX];
+
+static inline void write_vl(enum vec_type type, u64 val)
+{
+	u64 tmp;
+
+	switch (type) {
+#ifdef CONFIG_ARM64_SVE
+	case ARM64_VEC_SVE:
+		tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK;
+		write_sysreg_s(tmp | val, SYS_ZCR_EL1);
+		break;
+#endif
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
+
+static inline int vec_max_vl(enum vec_type type)
+{
+	return vl_info[type].max_vl;
+}
+
+static inline int vec_max_virtualisable_vl(enum vec_type type)
+{
+	return vl_info[type].max_virtualisable_vl;
+}
+
+static inline int sve_max_vl(void)
+{
+	return vec_max_vl(ARM64_VEC_SVE);
+}
+
+static inline int sve_max_virtualisable_vl(void)
+{
+	return vec_max_virtualisable_vl(ARM64_VEC_SVE);
+}
+
+/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
+static inline bool vq_available(enum vec_type type, unsigned int vq)
+{
+	return test_bit(__vq_to_bit(vq), vl_info[type].vq_map);
+}
+
+static inline bool sve_vq_available(unsigned int vq)
+{
+	return vq_available(ARM64_VEC_SVE, vq);
+}
+
 #else /* ! CONFIG_ARM64_SVE */
 
 static inline void sve_alloc(struct task_struct *task) { }
@@ -163,6 +227,8 @@ static inline int sve_get_current_vl(void)
 	return -EINVAL;
 }
 
+static inline bool sve_vq_available(unsigned int vq) { return false; }
+
 static inline void sve_user_disable(void) { BUILD_BUG(); }
 static inline void sve_user_enable(void) { BUILD_BUG(); }
 
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index adb6a46a1fae..fb0608fe9ded 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -115,6 +115,11 @@ struct debug_info {
 #endif
 };
 
+enum vec_type {
+	ARM64_VEC_SVE = 0,
+	ARM64_VEC_MAX,
+};
+
 struct cpu_context {
 	unsigned long x19;
 	unsigned long x20;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ec7036ef7e1..405a65d7e618 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -941,7 +941,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 
 	if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
 		init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
-		sve_init_vq_map();
+		vec_init_vq_map(ARM64_VEC_SVE);
 	}
 
 	if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
@@ -1175,7 +1175,7 @@ void update_cpu_features(int cpu,
 		/* Probe vector lengths, unless we already gave up on SVE */
 		if (id_aa64pfr0_sve(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1)) &&
 		    !system_capabilities_finalized())
-			sve_update_vq_map();
+			vec_update_vq_map(ARM64_VEC_SVE);
 	}
 
 	/*
@@ -2739,7 +2739,7 @@ static void verify_sve_features(void)
 	unsigned int safe_len = safe_zcr & ZCR_ELx_LEN_MASK;
 	unsigned int len = zcr & ZCR_ELx_LEN_MASK;
 
-	if (len < safe_len || sve_verify_vq_map()) {
+	if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SVE)) {
 		pr_crit("CPU%d: SVE: vector length support mismatch\n",
 			smp_processor_id());
 		cpu_die_early();
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index b0acaa20457c..d45f14a68b9c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -121,40 +121,51 @@ struct fpsimd_last_state_struct {
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
 
-/* Default VL for tasks that don't set it explicitly: */
-static int __sve_default_vl = -1;
+__ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
+#ifdef CONFIG_ARM64_SVE
+	[ARM64_VEC_SVE] = {
+		.type			= ARM64_VEC_SVE,
+		.name			= "SVE",
+		.min_vl			= SVE_VL_MIN,
+		.max_vl			= SVE_VL_MIN,
+		.max_virtualisable_vl	= SVE_VL_MIN,
+	},
+#endif
+};
+
+struct vl_config {
+	int __default_vl;		/* Default VL for tasks */
+};
+
+static struct vl_config vl_config[ARM64_VEC_MAX];
+
+static int get_default_vl(enum vec_type type)
+{
+	return READ_ONCE(vl_config[type].__default_vl);
+}
+
+static void set_default_vl(enum vec_type type, int val)
+{
+	WRITE_ONCE(vl_config[type].__default_vl, val);
+}
 
 static int get_sve_default_vl(void)
 {
-	return READ_ONCE(__sve_default_vl);
+	return get_default_vl(ARM64_VEC_SVE);
 }
 
 #ifdef CONFIG_ARM64_SVE
 
 static void set_sve_default_vl(int val)
 {
-	WRITE_ONCE(__sve_default_vl, val);
+	set_default_vl(ARM64_VEC_SVE, val);
 }
 
-/* Maximum supported vector length across all CPUs (initially poisoned) */
-int __ro_after_init sve_max_vl = SVE_VL_MIN;
-int __ro_after_init sve_max_virtualisable_vl = SVE_VL_MIN;
-
-/*
- * Set of available vector lengths,
- * where length vq encoded as bit __vq_to_bit(vq):
- */
-__ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
-/* Set of vector lengths present on at least one cpu: */
-static __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
-
 static void __percpu *efi_sve_state;
 
 #else /* ! CONFIG_ARM64_SVE */
 
 /* Dummy declaration for code that will be optimised out: */
-extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
-extern __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
 extern void __percpu *efi_sve_state;
 
 #endif /* ! CONFIG_ARM64_SVE */
@@ -357,21 +368,23 @@ static void fpsimd_save(void)
  * If things go wrong there's a bug somewhere, but try to fall back to a
  * safe choice.
  */
-static unsigned int find_supported_sve_vector_length(unsigned int vl)
+static unsigned int find_supported_vector_length(enum vec_type type,
+						 unsigned int vl)
 {
+	struct vl_info *info = &vl_info[type];
 	int bit;
-	int max_vl = sve_max_vl;
+	int max_vl = info->max_vl;
 
 	if (WARN_ON(!sve_vl_valid(vl)))
-		vl = SVE_VL_MIN;
+		vl = info->min_vl;
 
 	if (WARN_ON(!sve_vl_valid(max_vl)))
-		max_vl = SVE_VL_MIN;
+		max_vl = info->min_vl;
 
 	if (vl > max_vl)
 		vl = max_vl;
 
-	bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
+	bit = find_next_bit(info->vq_map, SVE_VQ_MAX,
 			    __vq_to_bit(sve_vq_from_vl(vl)));
 	return sve_vl_from_vq(__bit_to_vq(bit));
 }
@@ -381,6 +394,7 @@ static unsigned int find_supported_sve_vector_length(unsigned int vl)
 static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 				  void *buffer, size_t *lenp, loff_t *ppos)
 {
+	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
 	int ret;
 	int vl = get_sve_default_vl();
 	struct ctl_table tmp_table = {
@@ -394,12 +408,12 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 
 	/* Writing -1 has the special meaning "set to max": */
 	if (vl == -1)
-		vl = sve_max_vl;
+		vl = info->max_vl;
 
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
 
-	set_sve_default_vl(find_supported_sve_vector_length(vl));
+	set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, vl));
 	return 0;
 }
 
@@ -618,7 +632,7 @@ int sve_set_vector_length(struct task_struct *task,
 	if (vl > SVE_VL_ARCH_MAX)
 		vl = SVE_VL_ARCH_MAX;
 
-	vl = find_supported_sve_vector_length(vl);
+	vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
@@ -716,18 +730,15 @@ int sve_get_current_vl(void)
 	return sve_prctl_status(0);
 }
 
-static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
+static void vec_probe_vqs(struct vl_info *info,
+			  DECLARE_BITMAP(map, SVE_VQ_MAX))
 {
 	unsigned int vq, vl;
-	unsigned long zcr;
 
 	bitmap_zero(map, SVE_VQ_MAX);
 
-	zcr = ZCR_ELx_LEN_MASK;
-	zcr = read_sysreg_s(SYS_ZCR_EL1) & ~zcr;
-
 	for (vq = SVE_VQ_MAX; vq >= SVE_VQ_MIN; --vq) {
-		write_sysreg_s(zcr | (vq - 1), SYS_ZCR_EL1); /* self-syncing */
+		write_vl(info->type, vq - 1); /* self-syncing */
 		vl = sve_get_vl();
 		vq = sve_vq_from_vl(vl); /* skip intervening lengths */
 		set_bit(__vq_to_bit(vq), map);
@@ -738,10 +749,11 @@ static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
  * Initialise the set of known supported VQs for the boot CPU.
  * This is called during kernel boot, before secondary CPUs are brought up.
  */
-void __init sve_init_vq_map(void)
+void __init vec_init_vq_map(enum vec_type type)
 {
-	sve_probe_vqs(sve_vq_map);
-	bitmap_copy(sve_vq_partial_map, sve_vq_map, SVE_VQ_MAX);
+	struct vl_info *info = &vl_info[type];
+	vec_probe_vqs(info, info->vq_map);
+	bitmap_copy(info->vq_partial_map, info->vq_map, SVE_VQ_MAX);
 }
 
 /*
@@ -749,30 +761,33 @@ void __init sve_init_vq_map(void)
  * those not supported by the current CPU.
  * This function is called during the bring-up of early secondary CPUs only.
  */
-void sve_update_vq_map(void)
+void vec_update_vq_map(enum vec_type type)
 {
+	struct vl_info *info = &vl_info[type];
 	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
 
-	sve_probe_vqs(tmp_map);
-	bitmap_and(sve_vq_map, sve_vq_map, tmp_map, SVE_VQ_MAX);
-	bitmap_or(sve_vq_partial_map, sve_vq_partial_map, tmp_map, SVE_VQ_MAX);
+	vec_probe_vqs(info, tmp_map);
+	bitmap_and(info->vq_map, info->vq_map, tmp_map, SVE_VQ_MAX);
+	bitmap_or(info->vq_partial_map, info->vq_partial_map, tmp_map,
+		  SVE_VQ_MAX);
 }
 
 /*
  * Check whether the current CPU supports all VQs in the committed set.
  * This function is called during the bring-up of late secondary CPUs only.
  */
-int sve_verify_vq_map(void)
+int vec_verify_vq_map(enum vec_type type)
 {
+	struct vl_info *info = &vl_info[type];
 	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
 	unsigned long b;
 
-	sve_probe_vqs(tmp_map);
+	vec_probe_vqs(info, tmp_map);
 
 	bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
-	if (bitmap_intersects(tmp_map, sve_vq_map, SVE_VQ_MAX)) {
-		pr_warn("SVE: cpu%d: Required vector length(s) missing\n",
-			smp_processor_id());
+	if (bitmap_intersects(tmp_map, info->vq_map, SVE_VQ_MAX)) {
+		pr_warn("%s: cpu%d: Required vector length(s) missing\n",
+			info->name, smp_processor_id());
 		return -EINVAL;
 	}
 
@@ -788,7 +803,7 @@ int sve_verify_vq_map(void)
 	/* Recover the set of supported VQs: */
 	bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
 	/* Find VQs supported that are not globally supported: */
-	bitmap_andnot(tmp_map, tmp_map, sve_vq_map, SVE_VQ_MAX);
+	bitmap_andnot(tmp_map, tmp_map, info->vq_map, SVE_VQ_MAX);
 
 	/* Find the lowest such VQ, if any: */
 	b = find_last_bit(tmp_map, SVE_VQ_MAX);
@@ -799,9 +814,9 @@ int sve_verify_vq_map(void)
 	 * Mismatches above sve_max_virtualisable_vl are fine, since
 	 * no guest is allowed to configure ZCR_EL2.LEN to exceed this:
 	 */
-	if (sve_vl_from_vq(__bit_to_vq(b)) <= sve_max_virtualisable_vl) {
-		pr_warn("SVE: cpu%d: Unsupported vector length(s) present\n",
-			smp_processor_id());
+	if (sve_vl_from_vq(__bit_to_vq(b)) <= info->max_virtualisable_vl) {
+		pr_warn("%s: cpu%d: Unsupported vector length(s) present\n",
+			info->name, smp_processor_id());
 		return -EINVAL;
 	}
 
@@ -810,6 +825,8 @@ int sve_verify_vq_map(void)
 
 static void __init sve_efi_setup(void)
 {
+	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
+
 	if (!IS_ENABLED(CONFIG_EFI))
 		return;
 
@@ -818,11 +835,11 @@ static void __init sve_efi_setup(void)
 	 * This is evidence of a crippled system and we are returning void,
 	 * so no attempt is made to handle this situation here.
 	 */
-	if (!sve_vl_valid(sve_max_vl))
+	if (!sve_vl_valid(info->max_vl))
 		goto fail;
 
 	efi_sve_state = __alloc_percpu(
-		SVE_SIG_REGS_SIZE(sve_vq_from_vl(sve_max_vl)), SVE_VQ_BYTES);
+		SVE_SIG_REGS_SIZE(sve_vq_from_vl(info->max_vl)), SVE_VQ_BYTES);
 	if (!efi_sve_state)
 		goto fail;
 
@@ -871,6 +888,7 @@ u64 read_zcr_features(void)
 
 void __init sve_setup(void)
 {
+	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
 	u64 zcr;
 	DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
 	unsigned long b;
@@ -883,49 +901,52 @@ void __init sve_setup(void)
 	 * so sve_vq_map must have at least SVE_VQ_MIN set.
 	 * If something went wrong, at least try to patch it up:
 	 */
-	if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map)))
-		set_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map);
+	if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), info->vq_map)))
+		set_bit(__vq_to_bit(SVE_VQ_MIN), info->vq_map);
 
 	zcr = read_sanitised_ftr_reg(SYS_ZCR_EL1);
-	sve_max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1);
+	info->max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1);
 
 	/*
 	 * Sanity-check that the max VL we determined through CPU features
 	 * corresponds properly to sve_vq_map.  If not, do our best:
 	 */
-	if (WARN_ON(sve_max_vl != find_supported_sve_vector_length(sve_max_vl)))
-		sve_max_vl = find_supported_sve_vector_length(sve_max_vl);
+	if (WARN_ON(info->max_vl != find_supported_vector_length(ARM64_VEC_SVE,
+								 info->max_vl)))
+		info->max_vl = find_supported_vector_length(ARM64_VEC_SVE,
+							    info->max_vl);
 
 	/*
 	 * For the default VL, pick the maximum supported value <= 64.
 	 * VL == 64 is guaranteed not to grow the signal frame.
 	 */
-	set_sve_default_vl(find_supported_sve_vector_length(64));
+	set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, 64));
 
-	bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map,
+	bitmap_andnot(tmp_map, info->vq_partial_map, info->vq_map,
 		      SVE_VQ_MAX);
 
 	b = find_last_bit(tmp_map, SVE_VQ_MAX);
 	if (b >= SVE_VQ_MAX)
 		/* No non-virtualisable VLs found */
-		sve_max_virtualisable_vl = SVE_VQ_MAX;
+		info->max_virtualisable_vl = SVE_VQ_MAX;
 	else if (WARN_ON(b == SVE_VQ_MAX - 1))
 		/* No virtualisable VLs?  This is architecturally forbidden. */
-		sve_max_virtualisable_vl = SVE_VQ_MIN;
+		info->max_virtualisable_vl = SVE_VQ_MIN;
 	else /* b + 1 < SVE_VQ_MAX */
-		sve_max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1));
+		info->max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1));
 
-	if (sve_max_virtualisable_vl > sve_max_vl)
-		sve_max_virtualisable_vl = sve_max_vl;
+	if (info->max_virtualisable_vl > info->max_vl)
+		info->max_virtualisable_vl = info->max_vl;
 
-	pr_info("SVE: maximum available vector length %u bytes per vector\n",
-		sve_max_vl);
-	pr_info("SVE: default vector length %u bytes per vector\n",
-		get_sve_default_vl());
+	pr_info("%s: maximum available vector length %u bytes per vector\n",
+		info->name, info->max_vl);
+	pr_info("%s: default vector length %u bytes per vector\n",
+		info->name, get_sve_default_vl());
 
 	/* KVM decides whether to support mismatched systems. Just warn here: */
-	if (sve_max_virtualisable_vl < sve_max_vl)
-		pr_warn("SVE: unvirtualisable vector lengths present\n");
+	if (sve_max_virtualisable_vl() < sve_max_vl())
+		pr_warn("%s: unvirtualisable vector lengths present\n",
+			info->name);
 
 	sve_efi_setup();
 }
@@ -1087,7 +1108,7 @@ void fpsimd_flush_thread(void)
 		if (WARN_ON(!sve_vl_valid(vl)))
 			vl = SVE_VL_MIN;
 
-		supported_vl = find_supported_sve_vector_length(vl);
+		supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
 		if (WARN_ON(supported_vl != vl))
 			vl = supported_vl;
 
@@ -1376,7 +1397,7 @@ void __efi_fpsimd_begin(void)
 
 			__this_cpu_write(efi_sve_state_used, true);
 
-			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl),
+			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
 				       true);
 		} else {
@@ -1402,7 +1423,7 @@ void __efi_fpsimd_end(void)
 		    likely(__this_cpu_read(efi_sve_state_used))) {
 			char const *sve_state = this_cpu_ptr(efi_sve_state);
 
-			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl),
+			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
 				       true,
 				       sve_vq_from_vl(sve_get_vl()) - 1);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 95ff03a1b077..88a9034fb9b5 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -728,7 +728,7 @@ static void sve_init_header_from_task(struct user_sve_header *header,
 	header->vl = task_get_sve_vl(target);
 	vq = sve_vq_from_vl(header->vl);
 
-	header->max_vl = sve_max_vl;
+	header->max_vl = sve_max_vl();
 	header->size = SVE_PT_SIZE(vq, header->flags);
 	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
 				      SVE_PT_REGS_SVE);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index aa1d9d7918da..8f6372b44b65 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -594,7 +594,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 		unsigned int vq = 0;
 
 		if (add_all || test_thread_flag(TIF_SVE)) {
-			int vl = sve_max_vl;
+			int vl = sve_max_vl();
 
 			if (!add_all)
 				vl = task_get_sve_vl(current);
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ce36b0a3343..09cd30a9aafb 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -46,7 +46,7 @@ unsigned int kvm_sve_max_vl;
 int kvm_arm_init_sve(void)
 {
 	if (system_supports_sve()) {
-		kvm_sve_max_vl = sve_max_virtualisable_vl;
+		kvm_sve_max_vl = sve_max_virtualisable_vl();
 
 		/*
 		 * The get_sve_reg()/set_sve_reg() ioctl interface will need
@@ -61,7 +61,7 @@ int kvm_arm_init_sve(void)
 		 * Don't even try to make use of vector lengths that
 		 * aren't available on all CPUs, for now:
 		 */
-		if (kvm_sve_max_vl < sve_max_vl)
+		if (kvm_sve_max_vl < sve_max_vl())
 			pr_warn("KVM: SVE vector length for guests limited to %u bytes\n",
 				kvm_sve_max_vl);
 	}
@@ -102,7 +102,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
 	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
 	 * set_sve_vls().  Double-check here just to be sure:
 	 */
-	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
+	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl() ||
 		    vl > SVE_VL_ARCH_MAX))
 		return -EIO;
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 07/38] arm64/sve: Explicitly load vector length when restoring SVE state
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Currently when restoring the SVE state we supply the SVE vector length
as an argument to sve_load_state() and the underlying macros. This becomes
inconvenient with the addition of SME since we may need to restore any
combination of SVE and SME vector lengths, and we already separately
restore the vector length in the KVM code. We don't need to know the vector
length during the actual register load since the SME load instructions can
index into the data array for us.

Refactor the interface so we explicitly set the vector length separately
to restoring the SVE registers in preparation for adding SME support, no
functional change should be involved.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  2 +-
 arch/arm64/include/asm/fpsimdmacros.h |  7 +------
 arch/arm64/kernel/entry-fpsimd.S      |  3 +--
 arch/arm64/kernel/fpsimd.c            | 13 +++++++------
 arch/arm64/kvm/hyp/fpsimd.S           |  2 +-
 5 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 02fa676d1a9a..a18f409dec33 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -67,7 +67,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
 
 extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
-			   int restore_ffr, unsigned long vq_minus_1);
+			   int restore_ffr);
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 84d8cb7b07fa..b22538a6137e 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -241,7 +241,7 @@
 		str		w\nxtmp, [\xpfpsr, #4]
 .endm
 
-.macro __sve_load nxbase, xpfpsr, restore_ffr, nxtmp
+.macro sve_load nxbase, xpfpsr, restore_ffr, nxtmp
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
 		cbz		\restore_ffr, 921f
 		_sve_ldr_p	0, \nxbase
@@ -254,8 +254,3 @@
 		ldr		w\nxtmp, [\xpfpsr, #4]
 		msr		fpcr, x\nxtmp
 .endm
-
-.macro sve_load nxbase, xpfpsr, restore_ffr, xvqminus1, nxtmp, xtmp2
-		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
-		__sve_load	\nxbase, \xpfpsr, \restore_ffr, \nxtmp
-.endm
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 13c27465bfa8..2339d370bfe1 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -51,10 +51,9 @@ SYM_FUNC_END(sve_save_state)
  * x0 - pointer to buffer for state
  * x1 - pointer to storage for FPSR
  * x2 - Restore FFR if non-zero
- * x3 - VQ-1
  */
 SYM_FUNC_START(sve_load_state)
-	sve_load 0, x1, x2, x3, 4, x5
+	sve_load 0, x1, x2, 4
 	ret
 SYM_FUNC_END(sve_load_state)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index d45f14a68b9c..44bb3203c9d1 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -318,12 +318,13 @@ static void task_fpsimd_load(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE))
+	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
+		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
 		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr, true,
-			       sve_vq_from_vl(task_get_sve_vl(current)) - 1);
-	else
+			       &current->thread.uw.fpsimd_state.fpsr, true);
+	} else {
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+	}
 }
 
 /*
@@ -1423,10 +1424,10 @@ void __efi_fpsimd_end(void)
 		    likely(__this_cpu_read(efi_sve_state_used))) {
 			char const *sve_state = this_cpu_ptr(efi_sve_state);
 
+			sve_set_vq(sve_vq_from_vl(sve_get_vl()) - 1);
 			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
-				       true,
-				       sve_vq_from_vl(sve_get_vl()) - 1);
+				       true);
 
 			__this_cpu_write(efi_sve_state_used, false);
 		} else {
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
index 1bb3b04b84e6..e950875e31ce 100644
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ b/arch/arm64/kvm/hyp/fpsimd.S
@@ -22,7 +22,7 @@ SYM_FUNC_END(__fpsimd_restore_state)
 
 SYM_FUNC_START(__sve_restore_state)
 	mov	x2, #1
-	__sve_load 0, x1, x2, 3
+	sve_load 0, x1, x2, 3
 	ret
 SYM_FUNC_END(__sve_restore_state)
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 07/38] arm64/sve: Explicitly load vector length when restoring SVE state
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Currently when restoring the SVE state we supply the SVE vector length
as an argument to sve_load_state() and the underlying macros. This becomes
inconvenient with the addition of SME since we may need to restore any
combination of SVE and SME vector lengths, and we already separately
restore the vector length in the KVM code. We don't need to know the vector
length during the actual register load since the SME load instructions can
index into the data array for us.

Refactor the interface so we explicitly set the vector length separately
to restoring the SVE registers in preparation for adding SME support, no
functional change should be involved.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  2 +-
 arch/arm64/include/asm/fpsimdmacros.h |  7 +------
 arch/arm64/kernel/entry-fpsimd.S      |  3 +--
 arch/arm64/kernel/fpsimd.c            | 13 +++++++------
 arch/arm64/kvm/hyp/fpsimd.S           |  2 +-
 5 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 02fa676d1a9a..a18f409dec33 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -67,7 +67,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
 
 extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
-			   int restore_ffr, unsigned long vq_minus_1);
+			   int restore_ffr);
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 84d8cb7b07fa..b22538a6137e 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -241,7 +241,7 @@
 		str		w\nxtmp, [\xpfpsr, #4]
 .endm
 
-.macro __sve_load nxbase, xpfpsr, restore_ffr, nxtmp
+.macro sve_load nxbase, xpfpsr, restore_ffr, nxtmp
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
 		cbz		\restore_ffr, 921f
 		_sve_ldr_p	0, \nxbase
@@ -254,8 +254,3 @@
 		ldr		w\nxtmp, [\xpfpsr, #4]
 		msr		fpcr, x\nxtmp
 .endm
-
-.macro sve_load nxbase, xpfpsr, restore_ffr, xvqminus1, nxtmp, xtmp2
-		sve_load_vq	\xvqminus1, x\nxtmp, \xtmp2
-		__sve_load	\nxbase, \xpfpsr, \restore_ffr, \nxtmp
-.endm
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 13c27465bfa8..2339d370bfe1 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -51,10 +51,9 @@ SYM_FUNC_END(sve_save_state)
  * x0 - pointer to buffer for state
  * x1 - pointer to storage for FPSR
  * x2 - Restore FFR if non-zero
- * x3 - VQ-1
  */
 SYM_FUNC_START(sve_load_state)
-	sve_load 0, x1, x2, x3, 4, x5
+	sve_load 0, x1, x2, 4
 	ret
 SYM_FUNC_END(sve_load_state)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index d45f14a68b9c..44bb3203c9d1 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -318,12 +318,13 @@ static void task_fpsimd_load(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE))
+	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
+		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
 		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr, true,
-			       sve_vq_from_vl(task_get_sve_vl(current)) - 1);
-	else
+			       &current->thread.uw.fpsimd_state.fpsr, true);
+	} else {
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
+	}
 }
 
 /*
@@ -1423,10 +1424,10 @@ void __efi_fpsimd_end(void)
 		    likely(__this_cpu_read(efi_sve_state_used))) {
 			char const *sve_state = this_cpu_ptr(efi_sve_state);
 
+			sve_set_vq(sve_vq_from_vl(sve_get_vl()) - 1);
 			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
-				       true,
-				       sve_vq_from_vl(sve_get_vl()) - 1);
+				       true);
 
 			__this_cpu_write(efi_sve_state_used, false);
 		} else {
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
index 1bb3b04b84e6..e950875e31ce 100644
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ b/arch/arm64/kvm/hyp/fpsimd.S
@@ -22,7 +22,7 @@ SYM_FUNC_END(__fpsimd_restore_state)
 
 SYM_FUNC_START(__sve_restore_state)
 	mov	x2, #1
-	__sve_load 0, x1, x2, 3
+	sve_load 0, x1, x2, 3
 	ret
 SYM_FUNC_END(__sve_restore_state)
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As for SVE we will track a per task SME vector length for tasks. Convert
the existing storage for the vector length into an array and update
fpsimd_flush_task() to initialise this in a function.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/processor.h   | 44 +++++++++++--
 arch/arm64/include/asm/thread_info.h |  2 +-
 arch/arm64/kernel/fpsimd.c           | 97 ++++++++++++++++------------
 3 files changed, 95 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index fb0608fe9ded..9b854e8196df 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -152,8 +152,8 @@ struct thread_struct {
 
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
-	unsigned int		sve_vl;		/* SVE vector length */
-	unsigned int		sve_vl_onexec;	/* SVE vl after next exec */
+	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
+	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
@@ -169,15 +169,45 @@ struct thread_struct {
 	u64			sctlr_user;
 };
 
+static inline unsigned int thread_get_vl(struct thread_struct *thread,
+					 enum vec_type type)
+{
+	return thread->vl[type];
+}
+
 static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
 {
-	return thread->sve_vl;
+	return thread_get_vl(thread, ARM64_VEC_SVE);
+}
+
+unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
+void task_set_vl(struct task_struct *task, enum vec_type type,
+		 unsigned long vl);
+void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
+			unsigned long vl);
+unsigned int task_get_vl_onexec(const struct task_struct *task,
+				enum vec_type type);
+
+static inline unsigned int task_get_sve_vl(const struct task_struct *task)
+{
+	return task_get_vl(task, ARM64_VEC_SVE);
 }
 
-unsigned int task_get_sve_vl(const struct task_struct *task);
-void task_set_sve_vl(struct task_struct *task, unsigned long vl);
-unsigned int task_get_sve_vl_onexec(const struct task_struct *task);
-void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl);
+static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
+{
+	task_set_vl(task, ARM64_VEC_SVE, vl);
+}
+
+static inline unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
+{
+	return task_get_vl_onexec(task, ARM64_VEC_SVE);
+}
+
+static inline void task_set_sve_vl_onexec(struct task_struct *task,
+					  unsigned long vl)
+{
+	task_set_vl_onexec(task, ARM64_VEC_SVE, vl);
+}
 
 #define SCTLR_USER_MASK                                                        \
 	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB |   \
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 6623c99f0984..d5c8ac81ce11 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -78,7 +78,7 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_SINGLESTEP		21
 #define TIF_32BIT		22	/* 32bit process */
 #define TIF_SVE			23	/* Scalable Vector Extension in use */
-#define TIF_SVE_VL_INHERIT	24	/* Inherit sve_vl_onexec across exec */
+#define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 44bb3203c9d1..814080209093 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -133,6 +133,17 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
 #endif
 };
 
+static unsigned int vec_vl_inherit_flag(enum vec_type type)
+{
+	switch (type) {
+	case ARM64_VEC_SVE:
+		return TIF_SVE_VL_INHERIT;
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
 struct vl_config {
 	int __default_vl;		/* Default VL for tasks */
 };
@@ -239,24 +250,27 @@ static void sve_free(struct task_struct *task)
 	__sve_free(task);
 }
 
-unsigned int task_get_sve_vl(const struct task_struct *task)
+unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
 {
-	return task->thread.sve_vl;
+	return task->thread.vl[type];
 }
 
-void task_set_sve_vl(struct task_struct *task, unsigned long vl)
+void task_set_vl(struct task_struct *task, enum vec_type type,
+		 unsigned long vl)
 {
-	task->thread.sve_vl = vl;
+	task->thread.vl[type] = vl;
 }
 
-unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
+unsigned int task_get_vl_onexec(const struct task_struct *task,
+				enum vec_type type)
 {
-	return task->thread.sve_vl_onexec;
+	return task->thread.vl_onexec[type];
 }
 
-void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl)
+void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
+			unsigned long vl)
 {
-	task->thread.sve_vl_onexec = vl;
+	task->thread.vl_onexec[type] = vl;
 }
 
 /*
@@ -1074,10 +1088,43 @@ void fpsimd_thread_switch(struct task_struct *next)
 	__put_cpu_fpsimd_context();
 }
 
-void fpsimd_flush_thread(void)
+static void fpsimd_flush_thread_vl(enum vec_type type)
 {
 	int vl, supported_vl;
 
+	/*
+	 * Reset the task vector length as required.  This is where we
+	 * ensure that all user tasks have a valid vector length
+	 * configured: no kernel task can become a user task without
+	 * an exec and hence a call to this function.  By the time the
+	 * first call to this function is made, all early hardware
+	 * probing is complete, so __sve_default_vl should be valid.
+	 * If a bug causes this to go wrong, we make some noise and
+	 * try to fudge thread.sve_vl to a safe value here.
+	 */
+	vl = task_get_vl_onexec(current, type);
+	if (!vl)
+		vl = get_default_vl(type);
+
+	if (WARN_ON(!sve_vl_valid(vl)))
+		vl = SVE_VL_MIN;
+
+	supported_vl = find_supported_vector_length(type, vl);
+	if (WARN_ON(supported_vl != vl))
+		vl = supported_vl;
+
+	task_set_vl(current, type, vl);
+
+	/*
+	 * If the task is not set to inherit, ensure that the vector
+	 * length will be reset by a subsequent exec:
+	 */
+	if (!test_thread_flag(vec_vl_inherit_flag(type)))
+		task_set_vl_onexec(current, type, 0);
+}
+
+void fpsimd_flush_thread(void)
+{
 	if (!system_supports_fpsimd())
 		return;
 
@@ -1090,37 +1137,7 @@ void fpsimd_flush_thread(void)
 	if (system_supports_sve()) {
 		clear_thread_flag(TIF_SVE);
 		sve_free(current);
-
-		/*
-		 * Reset the task vector length as required.
-		 * This is where we ensure that all user tasks have a valid
-		 * vector length configured: no kernel task can become a user
-		 * task without an exec and hence a call to this function.
-		 * By the time the first call to this function is made, all
-		 * early hardware probing is complete, so __sve_default_vl
-		 * should be valid.
-		 * If a bug causes this to go wrong, we make some noise and
-		 * try to fudge thread.sve_vl to a safe value here.
-		 */
-		vl = task_get_sve_vl_onexec(current);
-		if (!vl)
-			vl = get_sve_default_vl();
-
-		if (WARN_ON(!sve_vl_valid(vl)))
-			vl = SVE_VL_MIN;
-
-		supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
-		if (WARN_ON(supported_vl != vl))
-			vl = supported_vl;
-
-		task_set_sve_vl(current, vl);
-
-		/*
-		 * If the task is not set to inherit, ensure that the vector
-		 * length will be reset by a subsequent exec:
-		 */
-		if (!test_thread_flag(TIF_SVE_VL_INHERIT))
-			task_set_sve_vl_onexec(current, 0);
+		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
 	}
 
 	put_cpu_fpsimd_context();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As for SVE we will track a per task SME vector length for tasks. Convert
the existing storage for the vector length into an array and update
fpsimd_flush_task() to initialise this in a function.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/processor.h   | 44 +++++++++++--
 arch/arm64/include/asm/thread_info.h |  2 +-
 arch/arm64/kernel/fpsimd.c           | 97 ++++++++++++++++------------
 3 files changed, 95 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index fb0608fe9ded..9b854e8196df 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -152,8 +152,8 @@ struct thread_struct {
 
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
-	unsigned int		sve_vl;		/* SVE vector length */
-	unsigned int		sve_vl_onexec;	/* SVE vl after next exec */
+	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
+	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
@@ -169,15 +169,45 @@ struct thread_struct {
 	u64			sctlr_user;
 };
 
+static inline unsigned int thread_get_vl(struct thread_struct *thread,
+					 enum vec_type type)
+{
+	return thread->vl[type];
+}
+
 static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
 {
-	return thread->sve_vl;
+	return thread_get_vl(thread, ARM64_VEC_SVE);
+}
+
+unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
+void task_set_vl(struct task_struct *task, enum vec_type type,
+		 unsigned long vl);
+void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
+			unsigned long vl);
+unsigned int task_get_vl_onexec(const struct task_struct *task,
+				enum vec_type type);
+
+static inline unsigned int task_get_sve_vl(const struct task_struct *task)
+{
+	return task_get_vl(task, ARM64_VEC_SVE);
 }
 
-unsigned int task_get_sve_vl(const struct task_struct *task);
-void task_set_sve_vl(struct task_struct *task, unsigned long vl);
-unsigned int task_get_sve_vl_onexec(const struct task_struct *task);
-void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl);
+static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
+{
+	task_set_vl(task, ARM64_VEC_SVE, vl);
+}
+
+static inline unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
+{
+	return task_get_vl_onexec(task, ARM64_VEC_SVE);
+}
+
+static inline void task_set_sve_vl_onexec(struct task_struct *task,
+					  unsigned long vl)
+{
+	task_set_vl_onexec(task, ARM64_VEC_SVE, vl);
+}
 
 #define SCTLR_USER_MASK                                                        \
 	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB |   \
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 6623c99f0984..d5c8ac81ce11 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -78,7 +78,7 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_SINGLESTEP		21
 #define TIF_32BIT		22	/* 32bit process */
 #define TIF_SVE			23	/* Scalable Vector Extension in use */
-#define TIF_SVE_VL_INHERIT	24	/* Inherit sve_vl_onexec across exec */
+#define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 44bb3203c9d1..814080209093 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -133,6 +133,17 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
 #endif
 };
 
+static unsigned int vec_vl_inherit_flag(enum vec_type type)
+{
+	switch (type) {
+	case ARM64_VEC_SVE:
+		return TIF_SVE_VL_INHERIT;
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
 struct vl_config {
 	int __default_vl;		/* Default VL for tasks */
 };
@@ -239,24 +250,27 @@ static void sve_free(struct task_struct *task)
 	__sve_free(task);
 }
 
-unsigned int task_get_sve_vl(const struct task_struct *task)
+unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
 {
-	return task->thread.sve_vl;
+	return task->thread.vl[type];
 }
 
-void task_set_sve_vl(struct task_struct *task, unsigned long vl)
+void task_set_vl(struct task_struct *task, enum vec_type type,
+		 unsigned long vl)
 {
-	task->thread.sve_vl = vl;
+	task->thread.vl[type] = vl;
 }
 
-unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
+unsigned int task_get_vl_onexec(const struct task_struct *task,
+				enum vec_type type)
 {
-	return task->thread.sve_vl_onexec;
+	return task->thread.vl_onexec[type];
 }
 
-void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl)
+void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
+			unsigned long vl)
 {
-	task->thread.sve_vl_onexec = vl;
+	task->thread.vl_onexec[type] = vl;
 }
 
 /*
@@ -1074,10 +1088,43 @@ void fpsimd_thread_switch(struct task_struct *next)
 	__put_cpu_fpsimd_context();
 }
 
-void fpsimd_flush_thread(void)
+static void fpsimd_flush_thread_vl(enum vec_type type)
 {
 	int vl, supported_vl;
 
+	/*
+	 * Reset the task vector length as required.  This is where we
+	 * ensure that all user tasks have a valid vector length
+	 * configured: no kernel task can become a user task without
+	 * an exec and hence a call to this function.  By the time the
+	 * first call to this function is made, all early hardware
+	 * probing is complete, so __sve_default_vl should be valid.
+	 * If a bug causes this to go wrong, we make some noise and
+	 * try to fudge thread.sve_vl to a safe value here.
+	 */
+	vl = task_get_vl_onexec(current, type);
+	if (!vl)
+		vl = get_default_vl(type);
+
+	if (WARN_ON(!sve_vl_valid(vl)))
+		vl = SVE_VL_MIN;
+
+	supported_vl = find_supported_vector_length(type, vl);
+	if (WARN_ON(supported_vl != vl))
+		vl = supported_vl;
+
+	task_set_vl(current, type, vl);
+
+	/*
+	 * If the task is not set to inherit, ensure that the vector
+	 * length will be reset by a subsequent exec:
+	 */
+	if (!test_thread_flag(vec_vl_inherit_flag(type)))
+		task_set_vl_onexec(current, type, 0);
+}
+
+void fpsimd_flush_thread(void)
+{
 	if (!system_supports_fpsimd())
 		return;
 
@@ -1090,37 +1137,7 @@ void fpsimd_flush_thread(void)
 	if (system_supports_sve()) {
 		clear_thread_flag(TIF_SVE);
 		sve_free(current);
-
-		/*
-		 * Reset the task vector length as required.
-		 * This is where we ensure that all user tasks have a valid
-		 * vector length configured: no kernel task can become a user
-		 * task without an exec and hence a call to this function.
-		 * By the time the first call to this function is made, all
-		 * early hardware probing is complete, so __sve_default_vl
-		 * should be valid.
-		 * If a bug causes this to go wrong, we make some noise and
-		 * try to fudge thread.sve_vl to a safe value here.
-		 */
-		vl = task_get_sve_vl_onexec(current);
-		if (!vl)
-			vl = get_sve_default_vl();
-
-		if (WARN_ON(!sve_vl_valid(vl)))
-			vl = SVE_VL_MIN;
-
-		supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
-		if (WARN_ON(supported_vl != vl))
-			vl = supported_vl;
-
-		task_set_sve_vl(current, vl);
-
-		/*
-		 * If the task is not set to inherit, ensure that the vector
-		 * length will be reset by a subsequent exec:
-		 */
-		if (!test_thread_flag(TIF_SVE_VL_INHERIT))
-			task_set_sve_vl_onexec(current, 0);
+		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
 	}
 
 	put_cpu_fpsimd_context();
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 09/38] arm64/sve: Make sysctl interface for SVE reusable by SME
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The vector length configuration for SME is very similar to that for SVE
so in order to allow reuse refactor the SVE configuration so that it takes
the vector type from the struct ctl_table. Since there's no dedicated space
for this we repurpose the extra1 field to store the vector type, this is
otherwise unused for integer sysctls.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 814080209093..e3a88fba390d 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -15,6 +15,7 @@
 #include <linux/compiler.h>
 #include <linux/cpu.h>
 #include <linux/cpu_pm.h>
+#include <linux/ctype.h>
 #include <linux/kernel.h>
 #include <linux/linkage.h>
 #include <linux/irqflags.h>
@@ -406,17 +407,21 @@ static unsigned int find_supported_vector_length(enum vec_type type,
 
 #if defined(CONFIG_ARM64_SVE) && defined(CONFIG_SYSCTL)
 
-static int sve_proc_do_default_vl(struct ctl_table *table, int write,
+static int vec_proc_do_default_vl(struct ctl_table *table, int write,
 				  void *buffer, size_t *lenp, loff_t *ppos)
 {
-	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
+	struct vl_info *info = table->extra1;
+	enum vec_type type = info->type;
 	int ret;
-	int vl = get_sve_default_vl();
+	int vl = get_default_vl(type);
 	struct ctl_table tmp_table = {
 		.data = &vl,
 		.maxlen = sizeof(vl),
 	};
 
+	if (!info)
+		return -EINVAL;
+
 	ret = proc_dointvec(&tmp_table, write, buffer, lenp, ppos);
 	if (ret || !write)
 		return ret;
@@ -428,7 +433,7 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
 
-	set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, vl));
+	set_default_vl(type, find_supported_vector_length(type, vl));
 	return 0;
 }
 
@@ -436,7 +441,8 @@ static struct ctl_table sve_default_vl_table[] = {
 	{
 		.procname	= "sve_default_vector_length",
 		.mode		= 0644,
-		.proc_handler	= sve_proc_do_default_vl,
+		.proc_handler	= vec_proc_do_default_vl,
+		.extra1		= &vl_info[ARM64_VEC_SVE],
 	},
 	{ }
 };
@@ -1107,7 +1113,7 @@ static void fpsimd_flush_thread_vl(enum vec_type type)
 		vl = get_default_vl(type);
 
 	if (WARN_ON(!sve_vl_valid(vl)))
-		vl = SVE_VL_MIN;
+		vl = vl_info[type].min_vl;
 
 	supported_vl = find_supported_vector_length(type, vl);
 	if (WARN_ON(supported_vl != vl))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 09/38] arm64/sve: Make sysctl interface for SVE reusable by SME
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The vector length configuration for SME is very similar to that for SVE
so in order to allow reuse refactor the SVE configuration so that it takes
the vector type from the struct ctl_table. Since there's no dedicated space
for this we repurpose the extra1 field to store the vector type, this is
otherwise unused for integer sysctls.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 814080209093..e3a88fba390d 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -15,6 +15,7 @@
 #include <linux/compiler.h>
 #include <linux/cpu.h>
 #include <linux/cpu_pm.h>
+#include <linux/ctype.h>
 #include <linux/kernel.h>
 #include <linux/linkage.h>
 #include <linux/irqflags.h>
@@ -406,17 +407,21 @@ static unsigned int find_supported_vector_length(enum vec_type type,
 
 #if defined(CONFIG_ARM64_SVE) && defined(CONFIG_SYSCTL)
 
-static int sve_proc_do_default_vl(struct ctl_table *table, int write,
+static int vec_proc_do_default_vl(struct ctl_table *table, int write,
 				  void *buffer, size_t *lenp, loff_t *ppos)
 {
-	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
+	struct vl_info *info = table->extra1;
+	enum vec_type type = info->type;
 	int ret;
-	int vl = get_sve_default_vl();
+	int vl = get_default_vl(type);
 	struct ctl_table tmp_table = {
 		.data = &vl,
 		.maxlen = sizeof(vl),
 	};
 
+	if (!info)
+		return -EINVAL;
+
 	ret = proc_dointvec(&tmp_table, write, buffer, lenp, ppos);
 	if (ret || !write)
 		return ret;
@@ -428,7 +433,7 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
 
-	set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, vl));
+	set_default_vl(type, find_supported_vector_length(type, vl));
 	return 0;
 }
 
@@ -436,7 +441,8 @@ static struct ctl_table sve_default_vl_table[] = {
 	{
 		.procname	= "sve_default_vector_length",
 		.mode		= 0644,
-		.proc_handler	= sve_proc_do_default_vl,
+		.proc_handler	= vec_proc_do_default_vl,
+		.extra1		= &vl_info[ARM64_VEC_SVE],
 	},
 	{ }
 };
@@ -1107,7 +1113,7 @@ static void fpsimd_flush_thread_vl(enum vec_type type)
 		vl = get_default_vl(type);
 
 	if (WARN_ON(!sve_vl_valid(vl)))
-		vl = SVE_VL_MIN;
+		vl = vl_info[type].min_vl;
 
 	supported_vl = find_supported_vector_length(type, vl);
 	if (WARN_ON(supported_vl != vl))
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 10/38] arm64/sve: Generalise vector length configuration prctl() for SME
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In preparation for adding SME support update the bulk of the implementation
for the vector length configuration prctl() calls to be independent of
vector type.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h |  8 +++---
 arch/arm64/kernel/fpsimd.c      | 47 ++++++++++++++++++---------------
 arch/arm64/kernel/ptrace.c      |  4 +--
 arch/arm64/kvm/reset.c          |  8 +++---
 4 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index a18f409dec33..802597140121 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -51,8 +51,8 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
 
-/* Maximum VL that SVE VL-agnostic software can transparently support */
-#define SVE_VL_ARCH_MAX 0x100
+/* Maximum VL that SVE/SME VL-agnostic software can transparently support */
+#define VL_ARCH_MAX 0x100
 
 /* Offset of FFR in the SVE register dump */
 static inline size_t sve_ffr_offset(int vl)
@@ -103,11 +103,13 @@ extern void fpsimd_sync_to_sve(struct task_struct *task);
 extern void sve_sync_to_fpsimd(struct task_struct *task);
 extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
 
-extern int sve_set_vector_length(struct task_struct *task,
+extern int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 				 unsigned long vl, unsigned long flags);
 
 extern int sve_set_current_vl(unsigned long arg);
 extern int sve_get_current_vl(void);
+extern int sme_set_current_vl(unsigned long arg);
+extern int sme_get_current_vl(void);
 
 static inline void sve_user_disable(void)
 {
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e3a88fba390d..b3d4786e2601 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -635,7 +635,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
 	__fpsimd_to_sve(sst, fst, vq);
 }
 
-int sve_set_vector_length(struct task_struct *task,
+int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 			  unsigned long vl, unsigned long flags)
 {
 	if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
@@ -646,33 +646,35 @@ int sve_set_vector_length(struct task_struct *task,
 		return -EINVAL;
 
 	/*
-	 * Clamp to the maximum vector length that VL-agnostic SVE code can
-	 * work with.  A flag may be assigned in the future to allow setting
-	 * of larger vector lengths without confusing older software.
+	 * Clamp to the maximum vector length that VL-agnostic code
+	 * can work with.  A flag may be assigned in the future to
+	 * allow setting of larger vector lengths without confusing
+	 * older software.
 	 */
-	if (vl > SVE_VL_ARCH_MAX)
-		vl = SVE_VL_ARCH_MAX;
+	if (vl > VL_ARCH_MAX)
+		vl = VL_ARCH_MAX;
 
-	vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
+	vl = find_supported_vector_length(type, vl);
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
-		task_set_sve_vl_onexec(task, vl);
+		task_set_vl_onexec(task, type, vl);
 	else
 		/* Reset VL to system default on next exec: */
-		task_set_sve_vl_onexec(task, 0);
+		task_set_vl_onexec(task, type, 0);
 
 	/* Only actually set the VL if not deferred: */
 	if (flags & PR_SVE_SET_VL_ONEXEC)
 		goto out;
 
-	if (vl == task_get_sve_vl(task))
+	if (vl == task_get_vl(task, type))
 		goto out;
 
 	/*
 	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
 	 * write any live register state back to task_struct, and convert to a
-	 * non-SVE thread.
+	 * regular FPSIMD thread.  Since the vector length can only be changed
+	 * with a syscall we can't be in streaming mode while reconfiguring.
 	 */
 	if (task == current) {
 		get_cpu_fpsimd_context();
@@ -693,10 +695,10 @@ int sve_set_vector_length(struct task_struct *task,
 	 */
 	sve_free(task);
 
-	task_set_sve_vl(task, vl);
+	task_set_vl(task, type, vl);
 
 out:
-	update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
+	update_tsk_thread_flag(task, vec_vl_inherit_flag(type),
 			       flags & PR_SVE_VL_INHERIT);
 
 	return 0;
@@ -704,20 +706,21 @@ int sve_set_vector_length(struct task_struct *task,
 
 /*
  * Encode the current vector length and flags for return.
- * This is only required for prctl(): ptrace has separate fields
+ * This is only required for prctl(): ptrace has separate fields.
+ * SVE and SME use the same bits for _ONEXEC and _INHERIT.
  *
- * flags are as for sve_set_vector_length().
+ * flags are as for vec_set_vector_length().
  */
-static int sve_prctl_status(unsigned long flags)
+static int vec_prctl_status(enum vec_type type, unsigned long flags)
 {
 	int ret;
 
 	if (flags & PR_SVE_SET_VL_ONEXEC)
-		ret = task_get_sve_vl_onexec(current);
+		ret = task_get_vl_onexec(current, type);
 	else
-		ret = task_get_sve_vl(current);
+		ret = task_get_vl(current, type);
 
-	if (test_thread_flag(TIF_SVE_VL_INHERIT))
+	if (test_thread_flag(vec_vl_inherit_flag(type)))
 		ret |= PR_SVE_VL_INHERIT;
 
 	return ret;
@@ -735,11 +738,11 @@ int sve_set_current_vl(unsigned long arg)
 	if (!system_supports_sve() || is_compat_task())
 		return -EINVAL;
 
-	ret = sve_set_vector_length(current, vl, flags);
+	ret = vec_set_vector_length(current, ARM64_VEC_SVE, vl, flags);
 	if (ret)
 		return ret;
 
-	return sve_prctl_status(flags);
+	return vec_prctl_status(ARM64_VEC_SVE, flags);
 }
 
 /* PR_SVE_GET_VL */
@@ -748,7 +751,7 @@ int sve_get_current_vl(void)
 	if (!system_supports_sve() || is_compat_task())
 		return -EINVAL;
 
-	return sve_prctl_status(0);
+	return vec_prctl_status(ARM64_VEC_SVE, 0);
 }
 
 static void vec_probe_vqs(struct vl_info *info,
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 88a9034fb9b5..716dde289446 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -812,9 +812,9 @@ static int sve_set(struct task_struct *target,
 
 	/*
 	 * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by
-	 * sve_set_vector_length(), which will also validate them for us:
+	 * vec_set_vector_length(), which will also validate them for us:
 	 */
-	ret = sve_set_vector_length(target, header.vl,
+	ret = vec_set_vector_length(target, ARM64_VEC_SVE, header.vl,
 		((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);
 	if (ret)
 		goto out;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 09cd30a9aafb..0f6741c80226 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -52,10 +52,10 @@ int kvm_arm_init_sve(void)
 		 * The get_sve_reg()/set_sve_reg() ioctl interface will need
 		 * to be extended with multiple register slice support in
 		 * order to support vector lengths greater than
-		 * SVE_VL_ARCH_MAX:
+		 * VL_ARCH_MAX:
 		 */
-		if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX))
-			kvm_sve_max_vl = SVE_VL_ARCH_MAX;
+		if (WARN_ON(kvm_sve_max_vl > VL_ARCH_MAX))
+			kvm_sve_max_vl = VL_ARCH_MAX;
 
 		/*
 		 * Don't even try to make use of vector lengths that
@@ -103,7 +103,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
 	 * set_sve_vls().  Double-check here just to be sure:
 	 */
 	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl() ||
-		    vl > SVE_VL_ARCH_MAX))
+		    vl > VL_ARCH_MAX))
 		return -EIO;
 
 	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 10/38] arm64/sve: Generalise vector length configuration prctl() for SME
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In preparation for adding SME support update the bulk of the implementation
for the vector length configuration prctl() calls to be independent of
vector type.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h |  8 +++---
 arch/arm64/kernel/fpsimd.c      | 47 ++++++++++++++++++---------------
 arch/arm64/kernel/ptrace.c      |  4 +--
 arch/arm64/kvm/reset.c          |  8 +++---
 4 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index a18f409dec33..802597140121 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -51,8 +51,8 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
 
-/* Maximum VL that SVE VL-agnostic software can transparently support */
-#define SVE_VL_ARCH_MAX 0x100
+/* Maximum VL that SVE/SME VL-agnostic software can transparently support */
+#define VL_ARCH_MAX 0x100
 
 /* Offset of FFR in the SVE register dump */
 static inline size_t sve_ffr_offset(int vl)
@@ -103,11 +103,13 @@ extern void fpsimd_sync_to_sve(struct task_struct *task);
 extern void sve_sync_to_fpsimd(struct task_struct *task);
 extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
 
-extern int sve_set_vector_length(struct task_struct *task,
+extern int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 				 unsigned long vl, unsigned long flags);
 
 extern int sve_set_current_vl(unsigned long arg);
 extern int sve_get_current_vl(void);
+extern int sme_set_current_vl(unsigned long arg);
+extern int sme_get_current_vl(void);
 
 static inline void sve_user_disable(void)
 {
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e3a88fba390d..b3d4786e2601 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -635,7 +635,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
 	__fpsimd_to_sve(sst, fst, vq);
 }
 
-int sve_set_vector_length(struct task_struct *task,
+int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 			  unsigned long vl, unsigned long flags)
 {
 	if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
@@ -646,33 +646,35 @@ int sve_set_vector_length(struct task_struct *task,
 		return -EINVAL;
 
 	/*
-	 * Clamp to the maximum vector length that VL-agnostic SVE code can
-	 * work with.  A flag may be assigned in the future to allow setting
-	 * of larger vector lengths without confusing older software.
+	 * Clamp to the maximum vector length that VL-agnostic code
+	 * can work with.  A flag may be assigned in the future to
+	 * allow setting of larger vector lengths without confusing
+	 * older software.
 	 */
-	if (vl > SVE_VL_ARCH_MAX)
-		vl = SVE_VL_ARCH_MAX;
+	if (vl > VL_ARCH_MAX)
+		vl = VL_ARCH_MAX;
 
-	vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
+	vl = find_supported_vector_length(type, vl);
 
 	if (flags & (PR_SVE_VL_INHERIT |
 		     PR_SVE_SET_VL_ONEXEC))
-		task_set_sve_vl_onexec(task, vl);
+		task_set_vl_onexec(task, type, vl);
 	else
 		/* Reset VL to system default on next exec: */
-		task_set_sve_vl_onexec(task, 0);
+		task_set_vl_onexec(task, type, 0);
 
 	/* Only actually set the VL if not deferred: */
 	if (flags & PR_SVE_SET_VL_ONEXEC)
 		goto out;
 
-	if (vl == task_get_sve_vl(task))
+	if (vl == task_get_vl(task, type))
 		goto out;
 
 	/*
 	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
 	 * write any live register state back to task_struct, and convert to a
-	 * non-SVE thread.
+	 * regular FPSIMD thread.  Since the vector length can only be changed
+	 * with a syscall we can't be in streaming mode while reconfiguring.
 	 */
 	if (task == current) {
 		get_cpu_fpsimd_context();
@@ -693,10 +695,10 @@ int sve_set_vector_length(struct task_struct *task,
 	 */
 	sve_free(task);
 
-	task_set_sve_vl(task, vl);
+	task_set_vl(task, type, vl);
 
 out:
-	update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
+	update_tsk_thread_flag(task, vec_vl_inherit_flag(type),
 			       flags & PR_SVE_VL_INHERIT);
 
 	return 0;
@@ -704,20 +706,21 @@ int sve_set_vector_length(struct task_struct *task,
 
 /*
  * Encode the current vector length and flags for return.
- * This is only required for prctl(): ptrace has separate fields
+ * This is only required for prctl(): ptrace has separate fields.
+ * SVE and SME use the same bits for _ONEXEC and _INHERIT.
  *
- * flags are as for sve_set_vector_length().
+ * flags are as for vec_set_vector_length().
  */
-static int sve_prctl_status(unsigned long flags)
+static int vec_prctl_status(enum vec_type type, unsigned long flags)
 {
 	int ret;
 
 	if (flags & PR_SVE_SET_VL_ONEXEC)
-		ret = task_get_sve_vl_onexec(current);
+		ret = task_get_vl_onexec(current, type);
 	else
-		ret = task_get_sve_vl(current);
+		ret = task_get_vl(current, type);
 
-	if (test_thread_flag(TIF_SVE_VL_INHERIT))
+	if (test_thread_flag(vec_vl_inherit_flag(type)))
 		ret |= PR_SVE_VL_INHERIT;
 
 	return ret;
@@ -735,11 +738,11 @@ int sve_set_current_vl(unsigned long arg)
 	if (!system_supports_sve() || is_compat_task())
 		return -EINVAL;
 
-	ret = sve_set_vector_length(current, vl, flags);
+	ret = vec_set_vector_length(current, ARM64_VEC_SVE, vl, flags);
 	if (ret)
 		return ret;
 
-	return sve_prctl_status(flags);
+	return vec_prctl_status(ARM64_VEC_SVE, flags);
 }
 
 /* PR_SVE_GET_VL */
@@ -748,7 +751,7 @@ int sve_get_current_vl(void)
 	if (!system_supports_sve() || is_compat_task())
 		return -EINVAL;
 
-	return sve_prctl_status(0);
+	return vec_prctl_status(ARM64_VEC_SVE, 0);
 }
 
 static void vec_probe_vqs(struct vl_info *info,
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 88a9034fb9b5..716dde289446 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -812,9 +812,9 @@ static int sve_set(struct task_struct *target,
 
 	/*
 	 * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by
-	 * sve_set_vector_length(), which will also validate them for us:
+	 * vec_set_vector_length(), which will also validate them for us:
 	 */
-	ret = sve_set_vector_length(target, header.vl,
+	ret = vec_set_vector_length(target, ARM64_VEC_SVE, header.vl,
 		((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);
 	if (ret)
 		goto out;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 09cd30a9aafb..0f6741c80226 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -52,10 +52,10 @@ int kvm_arm_init_sve(void)
 		 * The get_sve_reg()/set_sve_reg() ioctl interface will need
 		 * to be extended with multiple register slice support in
 		 * order to support vector lengths greater than
-		 * SVE_VL_ARCH_MAX:
+		 * VL_ARCH_MAX:
 		 */
-		if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX))
-			kvm_sve_max_vl = SVE_VL_ARCH_MAX;
+		if (WARN_ON(kvm_sve_max_vl > VL_ARCH_MAX))
+			kvm_sve_max_vl = VL_ARCH_MAX;
 
 		/*
 		 * Don't even try to make use of vector lengths that
@@ -103,7 +103,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
 	 * set_sve_vls().  Double-check here just to be sure:
 	 */
 	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl() ||
-		    vl > SVE_VL_ARCH_MAX))
+		    vl > VL_ARCH_MAX))
 		return -EIO;
 
 	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 11/38] selftests: arm64: Parameterise ptrace vector length information
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

SME introduces a new mode called streaming mode in which the SVE registers
have a different vector length. Since the ptrace interface for this is
based on the existing SVE interface prepare for supporting this by moving
the regset specific configuration into  struct and passing that around,
allowing these tests to be reused for streaming mode. As we will also have
to verify the interoperation of the SVE and streaming SVE regsets don't
just iterate over an array.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/sve-ptrace.c | 192 ++++++++++++------
 1 file changed, 129 insertions(+), 63 deletions(-)

diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index ac0629f05365..e200f7ed9572 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -21,16 +21,36 @@
 
 #include "../../kselftest.h"
 
-#define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
-#define FPSIMD_TESTS 3
-
-#define EXPECTED_TESTS (VL_TESTS + FPSIMD_TESTS)
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
 /* <linux/elf.h> and <sys/auxv.h> don't like each other, so: */
 #ifndef NT_ARM_SVE
 #define NT_ARM_SVE 0x405
 #endif
 
+struct vec_type {
+	const char *name;
+	unsigned long hwcap_type;
+	unsigned long hwcap;
+	int regset;
+	int prctl_set;
+};
+
+static const struct vec_type vec_types[] = {
+	{
+		.name = "SVE",
+		.hwcap_type = AT_HWCAP,
+		.hwcap = HWCAP_SVE,
+		.regset = NT_ARM_SVE,
+		.prctl_set = PR_SVE_SET_VL,
+	},
+};
+
+#define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
+#define FPSIMD_TESTS 3
+
+#define EXPECTED_TESTS ((VL_TESTS + FPSIMD_TESTS) * ARRAY_SIZE(vec_types))
+
 static void fill_buf(char *buf, size_t size)
 {
 	int i;
@@ -59,7 +79,8 @@ static int get_fpsimd(pid_t pid, struct user_fpsimd_state *fpsimd)
 	return ptrace(PTRACE_GETREGSET, pid, NT_PRFPREG, &iov);
 }
 
-static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
+static struct user_sve_header *get_sve(pid_t pid, const struct vec_type *type,
+				       void **buf, size_t *size)
 {
 	struct user_sve_header *sve;
 	void *p;
@@ -80,7 +101,7 @@ static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
 
 		iov.iov_base = *buf;
 		iov.iov_len = sz;
-		if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov))
+		if (ptrace(PTRACE_GETREGSET, pid, type->regset, &iov))
 			goto error;
 
 		sve = *buf;
@@ -96,17 +117,19 @@ static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
 	return NULL;
 }
 
-static int set_sve(pid_t pid, const struct user_sve_header *sve)
+static int set_sve(pid_t pid, const struct vec_type *type,
+		   const struct user_sve_header *sve)
 {
 	struct iovec iov;
 
 	iov.iov_base = (void *)sve;
 	iov.iov_len = sve->size;
-	return ptrace(PTRACE_SETREGSET, pid, NT_ARM_SVE, &iov);
+	return ptrace(PTRACE_SETREGSET, pid, type->regset, &iov);
 }
 
 /* Validate attempting to set the specfied VL via ptrace */
-static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
+static void ptrace_set_get_vl(pid_t child, const struct vec_type *type,
+			      unsigned int vl, bool *supported)
 {
 	struct user_sve_header sve;
 	struct user_sve_header *new_sve = NULL;
@@ -116,10 +139,10 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
 	*supported = false;
 
 	/* Check if the VL is supported in this process */
-	prctl_vl = prctl(PR_SVE_SET_VL, vl);
+	prctl_vl = prctl(type->prctl_set, vl);
 	if (prctl_vl == -1)
-		ksft_exit_fail_msg("prctl(PR_SVE_SET_VL) failed: %s (%d)\n",
-				   strerror(errno), errno);
+		ksft_exit_fail_msg("prctl(PR_%s_SET_VL) failed: %s (%d)\n",
+				   type->name, strerror(errno), errno);
 
 	/* If the VL is not supported then a supported VL will be returned */
 	*supported = (prctl_vl == vl);
@@ -128,9 +151,10 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
 	memset(&sve, 0, sizeof(sve));
 	sve.size = sizeof(sve);
 	sve.vl = vl;
-	ret = set_sve(child, &sve);
+	ret = set_sve(child, type, &sve);
 	if (ret != 0) {
-		ksft_test_result_fail("Failed to set VL %u\n", vl);
+		ksft_test_result_fail("Failed to set %s VL %u\n",
+				      type->name, vl);
 		return;
 	}
 
@@ -138,12 +162,14 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
 	 * Read back the new register state and verify that we have the
 	 * same VL that we got from prctl() on ourselves.
 	 */
-	if (!get_sve(child, (void **)&new_sve, &new_sve_size)) {
-		ksft_test_result_fail("Failed to read VL %u\n", vl);
+	if (!get_sve(child, type, (void **)&new_sve, &new_sve_size)) {
+		ksft_test_result_fail("Failed to read %s VL %u\n",
+				      type->name, vl);
 		return;
 	}
 
-	ksft_test_result(new_sve->vl = prctl_vl, "Set VL %u\n", vl);
+	ksft_test_result(new_sve->vl = prctl_vl, "Set %s VL %u\n",
+			 type->name, vl);
 
 	free(new_sve);
 }
@@ -159,7 +185,7 @@ static void check_u32(unsigned int vl, const char *reg,
 }
 
 /* Access the FPSIMD registers via the SVE regset */
-static void ptrace_sve_fpsimd(pid_t child)
+static void ptrace_sve_fpsimd(pid_t child, const struct vec_type *type)
 {
 	void *svebuf = NULL;
 	size_t svebufsz = 0;
@@ -169,17 +195,18 @@ static void ptrace_sve_fpsimd(pid_t child)
 	unsigned char *p;
 
 	/* New process should start with FPSIMD registers only */
-	sve = get_sve(child, &svebuf, &svebufsz);
+	sve = get_sve(child, type, &svebuf, &svebufsz);
 	if (!sve) {
-		ksft_test_result_fail("get_sve: %s\n", strerror(errno));
+		ksft_test_result_fail("get_sve(%s): %s\n",
+				      type->name, strerror(errno));
 
 		return;
 	} else {
-		ksft_test_result_pass("get_sve(FPSIMD)\n");
+		ksft_test_result_pass("get_sve(%s FPSIMD)\n", type->name);
 	}
 
 	ksft_test_result((sve->flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD,
-			 "Set FPSIMD registers\n");
+			 "Set FPSIMD registers via %s\n", type->name);
 	if ((sve->flags & SVE_PT_REGS_MASK) != SVE_PT_REGS_FPSIMD)
 		goto out;
 
@@ -193,9 +220,9 @@ static void ptrace_sve_fpsimd(pid_t child)
 			p[j] = j;
 	}
 
-	if (set_sve(child, sve)) {
-		ksft_test_result_fail("set_sve(FPSIMD): %s\n",
-				      strerror(errno));
+	if (set_sve(child, type, sve)) {
+		ksft_test_result_fail("set_sve(%s FPSIMD): %s\n",
+				      type->name, strerror(errno));
 
 		goto out;
 	}
@@ -207,16 +234,20 @@ static void ptrace_sve_fpsimd(pid_t child)
 		goto out;
 	}
 	if (memcmp(fpsimd, &new_fpsimd, sizeof(*fpsimd)) == 0)
-		ksft_test_result_pass("get_fpsimd() gave same state\n");
+		ksft_test_result_pass("%s get_fpsimd() gave same state\n",
+				      type->name);
 	else
-		ksft_test_result_fail("get_fpsimd() gave different state\n");
+		ksft_test_result_fail("%s get_fpsimd() gave different state\n",
+				      type->name);
 
 out:
 	free(svebuf);
 }
 
 /* Validate attempting to set SVE data and read SVE data */
-static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
+static void ptrace_set_sve_get_sve_data(pid_t child,
+					const struct vec_type *type,
+					unsigned int vl)
 {
 	void *write_buf;
 	void *read_buf = NULL;
@@ -231,8 +262,8 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 	data_size = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, SVE_PT_REGS_SVE);
 	write_buf = malloc(data_size);
 	if (!write_buf) {
-		ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
-				      data_size, vl);
+		ksft_test_result_fail("Error allocating %d byte buffer for %s VL %u\n",
+				      data_size, type->name, vl);
 		return;
 	}
 	write_sve = write_buf;
@@ -256,23 +287,26 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 
 	/* TODO: Generate a valid FFR pattern */
 
-	ret = set_sve(child, write_sve);
+	ret = set_sve(child, type, write_sve);
 	if (ret != 0) {
-		ksft_test_result_fail("Failed to set VL %u data\n", vl);
+		ksft_test_result_fail("Failed to set %s VL %u data\n",
+				      type->name, vl);
 		goto out;
 	}
 
 	/* Read the data back */
-	if (!get_sve(child, (void **)&read_buf, &read_sve_size)) {
-		ksft_test_result_fail("Failed to read VL %u data\n", vl);
+	if (!get_sve(child, type, (void **)&read_buf, &read_sve_size)) {
+		ksft_test_result_fail("Failed to read %s VL %u data\n",
+				      type->name, vl);
 		goto out;
 	}
 	read_sve = read_buf;
 
 	/* We might read more data if there's extensions we don't know */
 	if (read_sve->size < write_sve->size) {
-		ksft_test_result_fail("Wrote %d bytes, only read %d\n",
-				      write_sve->size, read_sve->size);
+		ksft_test_result_fail("%s wrote %d bytes, only read %d\n",
+				      type->name, write_sve->size,
+				      read_sve->size);
 		goto out_read;
 	}
 
@@ -299,7 +333,8 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 	check_u32(vl, "FPCR", write_buf + SVE_PT_SVE_FPCR_OFFSET(vq),
 		  read_buf + SVE_PT_SVE_FPCR_OFFSET(vq), &errors);
 
-	ksft_test_result(errors == 0, "Set and get SVE data for VL %u\n", vl);
+	ksft_test_result(errors == 0, "Set and get %s data for VL %u\n",
+			 type->name, vl);
 
 out_read:
 	free(read_buf);
@@ -308,7 +343,9 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 }
 
 /* Validate attempting to set SVE data and read SVE data */
-static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
+static void ptrace_set_sve_get_fpsimd_data(pid_t child,
+					   const struct vec_type *type,
+					   unsigned int vl)
 {
 	void *write_buf;
 	struct user_sve_header *write_sve;
@@ -326,8 +363,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 	data_size = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, SVE_PT_REGS_SVE);
 	write_buf = malloc(data_size);
 	if (!write_buf) {
-		ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
-				      data_size, vl);
+		ksft_test_result_fail("Error allocating %d byte buffer for %s VL %u\n",
+				      data_size, type->name, vl);
 		return;
 	}
 	write_sve = write_buf;
@@ -345,16 +382,17 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 	fill_buf(write_buf + SVE_PT_SVE_FPSR_OFFSET(vq), SVE_PT_SVE_FPSR_SIZE);
 	fill_buf(write_buf + SVE_PT_SVE_FPCR_OFFSET(vq), SVE_PT_SVE_FPCR_SIZE);
 
-	ret = set_sve(child, write_sve);
+	ret = set_sve(child, type, write_sve);
 	if (ret != 0) {
-		ksft_test_result_fail("Failed to set VL %u data\n", vl);
+		ksft_test_result_fail("Failed to set %s VL %u data\n",
+				      type->name, vl);
 		goto out;
 	}
 
 	/* Read the data back */
 	if (get_fpsimd(child, &fpsimd_state)) {
-		ksft_test_result_fail("Failed to read VL %u FPSIMD data\n",
-				      vl);
+		ksft_test_result_fail("Failed to read %s VL %u FPSIMD data\n",
+				      type->name, vl);
 		goto out;
 	}
 
@@ -369,7 +407,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 		       sizeof(tmp));
 
 		if (tmp != fpsimd_state.vregs[i]) {
-			printf("# Mismatch in FPSIMD for VL %u Z%d\n", vl, i);
+			printf("# Mismatch in FPSIMD for %s VL %u Z%d\n",
+			       type->name, vl, i);
 			errors++;
 		}
 	}
@@ -379,8 +418,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 	check_u32(vl, "FPCR", write_buf + SVE_PT_SVE_FPCR_OFFSET(vq),
 		  &fpsimd_state.fpcr, &errors);
 
-	ksft_test_result(errors == 0, "Set and get FPSIMD data for VL %u\n",
-			 vl);
+	ksft_test_result(errors == 0, "Set and get FPSIMD data for %s VL %u\n",
+			 type->name, vl);
 
 out:
 	free(write_buf);
@@ -390,7 +429,7 @@ static int do_parent(pid_t child)
 {
 	int ret = EXIT_FAILURE;
 	pid_t pid;
-	int status;
+	int status, i;
 	siginfo_t si;
 	unsigned int vq, vl;
 	bool vl_supported;
@@ -449,23 +488,50 @@ static int do_parent(pid_t child)
 		}
 	}
 
-	/* FPSIMD via SVE regset */
-	ptrace_sve_fpsimd(child);
-
-	/* Step through every possible VQ */
-	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
-		vl = sve_vl_from_vq(vq);
+	/*
+	 * Do all the FPSIMD tests before starting SVE so we know we
+	 * don't have SVE specific register state.
+	 */
+	for (i = 0; i < ARRAY_SIZE(vec_types); i++) {
+		/* FPSIMD via SVE regset */
+		if (getauxval(vec_types[i].hwcap_type) & vec_types[i].hwcap) {
+			ptrace_sve_fpsimd(child, &vec_types[i]);
+		} else {
+			ksft_test_result_skip("%s FPSIMD get via SVE\n",
+					      vec_types[i].name);
+			ksft_test_result_skip("%s FPSIMD set via SVE\n",
+					      vec_types[i].name);
+			ksft_test_result_skip("%s set read via FPSIMD\n",
+					      vec_types[i].name);
+		}
+	}
 
-		/* First, try to set this vector length */
-		ptrace_set_get_vl(child, vl, &vl_supported);
+	for (i = 0; i < ARRAY_SIZE(vec_types); i++) {
+		/* Step through every possible VQ */
+		for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
+			vl = sve_vl_from_vq(vq);
+
+			/* First, try to set this vector length */
+			if (getauxval(vec_types[i].hwcap_type) &
+			    vec_types[i].hwcap) {
+				ptrace_set_get_vl(child, &vec_types[i], vl,
+						  &vl_supported);
+			} else {
+				ksft_test_result_skip("%s get/set VL %d\n",
+						      vec_types[i].name, vl);
+				vl_supported = false;
+			}
 
-		/* If the VL is supported validate data set/get */
-		if (vl_supported) {
-			ptrace_set_sve_get_sve_data(child, vl);
-			ptrace_set_sve_get_fpsimd_data(child, vl);
-		} else {
-			ksft_test_result_skip("set SVE get SVE for VL %d\n", vl);
-			ksft_test_result_skip("set SVE get FPSIMD for VL %d\n", vl);
+			/* If the VL is supported validate data set/get */
+			if (vl_supported) {
+				ptrace_set_sve_get_sve_data(child, &vec_types[i], vl);
+				ptrace_set_sve_get_fpsimd_data(child, &vec_types[i], vl);
+			} else {
+				ksft_test_result_skip("%s set SVE get SVE for VL %d\n",
+						      vec_types[i].name, vl);
+				ksft_test_result_skip("%s set SVE get FPSIMD for VL %d\n",
+						      vec_types[i].name, vl);
+			}
 		}
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 11/38] selftests: arm64: Parameterise ptrace vector length information
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

SME introduces a new mode called streaming mode in which the SVE registers
have a different vector length. Since the ptrace interface for this is
based on the existing SVE interface prepare for supporting this by moving
the regset specific configuration into  struct and passing that around,
allowing these tests to be reused for streaming mode. As we will also have
to verify the interoperation of the SVE and streaming SVE regsets don't
just iterate over an array.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/sve-ptrace.c | 192 ++++++++++++------
 1 file changed, 129 insertions(+), 63 deletions(-)

diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index ac0629f05365..e200f7ed9572 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -21,16 +21,36 @@
 
 #include "../../kselftest.h"
 
-#define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
-#define FPSIMD_TESTS 3
-
-#define EXPECTED_TESTS (VL_TESTS + FPSIMD_TESTS)
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
 /* <linux/elf.h> and <sys/auxv.h> don't like each other, so: */
 #ifndef NT_ARM_SVE
 #define NT_ARM_SVE 0x405
 #endif
 
+struct vec_type {
+	const char *name;
+	unsigned long hwcap_type;
+	unsigned long hwcap;
+	int regset;
+	int prctl_set;
+};
+
+static const struct vec_type vec_types[] = {
+	{
+		.name = "SVE",
+		.hwcap_type = AT_HWCAP,
+		.hwcap = HWCAP_SVE,
+		.regset = NT_ARM_SVE,
+		.prctl_set = PR_SVE_SET_VL,
+	},
+};
+
+#define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
+#define FPSIMD_TESTS 3
+
+#define EXPECTED_TESTS ((VL_TESTS + FPSIMD_TESTS) * ARRAY_SIZE(vec_types))
+
 static void fill_buf(char *buf, size_t size)
 {
 	int i;
@@ -59,7 +79,8 @@ static int get_fpsimd(pid_t pid, struct user_fpsimd_state *fpsimd)
 	return ptrace(PTRACE_GETREGSET, pid, NT_PRFPREG, &iov);
 }
 
-static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
+static struct user_sve_header *get_sve(pid_t pid, const struct vec_type *type,
+				       void **buf, size_t *size)
 {
 	struct user_sve_header *sve;
 	void *p;
@@ -80,7 +101,7 @@ static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
 
 		iov.iov_base = *buf;
 		iov.iov_len = sz;
-		if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov))
+		if (ptrace(PTRACE_GETREGSET, pid, type->regset, &iov))
 			goto error;
 
 		sve = *buf;
@@ -96,17 +117,19 @@ static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
 	return NULL;
 }
 
-static int set_sve(pid_t pid, const struct user_sve_header *sve)
+static int set_sve(pid_t pid, const struct vec_type *type,
+		   const struct user_sve_header *sve)
 {
 	struct iovec iov;
 
 	iov.iov_base = (void *)sve;
 	iov.iov_len = sve->size;
-	return ptrace(PTRACE_SETREGSET, pid, NT_ARM_SVE, &iov);
+	return ptrace(PTRACE_SETREGSET, pid, type->regset, &iov);
 }
 
 /* Validate attempting to set the specfied VL via ptrace */
-static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
+static void ptrace_set_get_vl(pid_t child, const struct vec_type *type,
+			      unsigned int vl, bool *supported)
 {
 	struct user_sve_header sve;
 	struct user_sve_header *new_sve = NULL;
@@ -116,10 +139,10 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
 	*supported = false;
 
 	/* Check if the VL is supported in this process */
-	prctl_vl = prctl(PR_SVE_SET_VL, vl);
+	prctl_vl = prctl(type->prctl_set, vl);
 	if (prctl_vl == -1)
-		ksft_exit_fail_msg("prctl(PR_SVE_SET_VL) failed: %s (%d)\n",
-				   strerror(errno), errno);
+		ksft_exit_fail_msg("prctl(PR_%s_SET_VL) failed: %s (%d)\n",
+				   type->name, strerror(errno), errno);
 
 	/* If the VL is not supported then a supported VL will be returned */
 	*supported = (prctl_vl == vl);
@@ -128,9 +151,10 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
 	memset(&sve, 0, sizeof(sve));
 	sve.size = sizeof(sve);
 	sve.vl = vl;
-	ret = set_sve(child, &sve);
+	ret = set_sve(child, type, &sve);
 	if (ret != 0) {
-		ksft_test_result_fail("Failed to set VL %u\n", vl);
+		ksft_test_result_fail("Failed to set %s VL %u\n",
+				      type->name, vl);
 		return;
 	}
 
@@ -138,12 +162,14 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
 	 * Read back the new register state and verify that we have the
 	 * same VL that we got from prctl() on ourselves.
 	 */
-	if (!get_sve(child, (void **)&new_sve, &new_sve_size)) {
-		ksft_test_result_fail("Failed to read VL %u\n", vl);
+	if (!get_sve(child, type, (void **)&new_sve, &new_sve_size)) {
+		ksft_test_result_fail("Failed to read %s VL %u\n",
+				      type->name, vl);
 		return;
 	}
 
-	ksft_test_result(new_sve->vl = prctl_vl, "Set VL %u\n", vl);
+	ksft_test_result(new_sve->vl = prctl_vl, "Set %s VL %u\n",
+			 type->name, vl);
 
 	free(new_sve);
 }
@@ -159,7 +185,7 @@ static void check_u32(unsigned int vl, const char *reg,
 }
 
 /* Access the FPSIMD registers via the SVE regset */
-static void ptrace_sve_fpsimd(pid_t child)
+static void ptrace_sve_fpsimd(pid_t child, const struct vec_type *type)
 {
 	void *svebuf = NULL;
 	size_t svebufsz = 0;
@@ -169,17 +195,18 @@ static void ptrace_sve_fpsimd(pid_t child)
 	unsigned char *p;
 
 	/* New process should start with FPSIMD registers only */
-	sve = get_sve(child, &svebuf, &svebufsz);
+	sve = get_sve(child, type, &svebuf, &svebufsz);
 	if (!sve) {
-		ksft_test_result_fail("get_sve: %s\n", strerror(errno));
+		ksft_test_result_fail("get_sve(%s): %s\n",
+				      type->name, strerror(errno));
 
 		return;
 	} else {
-		ksft_test_result_pass("get_sve(FPSIMD)\n");
+		ksft_test_result_pass("get_sve(%s FPSIMD)\n", type->name);
 	}
 
 	ksft_test_result((sve->flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD,
-			 "Set FPSIMD registers\n");
+			 "Set FPSIMD registers via %s\n", type->name);
 	if ((sve->flags & SVE_PT_REGS_MASK) != SVE_PT_REGS_FPSIMD)
 		goto out;
 
@@ -193,9 +220,9 @@ static void ptrace_sve_fpsimd(pid_t child)
 			p[j] = j;
 	}
 
-	if (set_sve(child, sve)) {
-		ksft_test_result_fail("set_sve(FPSIMD): %s\n",
-				      strerror(errno));
+	if (set_sve(child, type, sve)) {
+		ksft_test_result_fail("set_sve(%s FPSIMD): %s\n",
+				      type->name, strerror(errno));
 
 		goto out;
 	}
@@ -207,16 +234,20 @@ static void ptrace_sve_fpsimd(pid_t child)
 		goto out;
 	}
 	if (memcmp(fpsimd, &new_fpsimd, sizeof(*fpsimd)) == 0)
-		ksft_test_result_pass("get_fpsimd() gave same state\n");
+		ksft_test_result_pass("%s get_fpsimd() gave same state\n",
+				      type->name);
 	else
-		ksft_test_result_fail("get_fpsimd() gave different state\n");
+		ksft_test_result_fail("%s get_fpsimd() gave different state\n",
+				      type->name);
 
 out:
 	free(svebuf);
 }
 
 /* Validate attempting to set SVE data and read SVE data */
-static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
+static void ptrace_set_sve_get_sve_data(pid_t child,
+					const struct vec_type *type,
+					unsigned int vl)
 {
 	void *write_buf;
 	void *read_buf = NULL;
@@ -231,8 +262,8 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 	data_size = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, SVE_PT_REGS_SVE);
 	write_buf = malloc(data_size);
 	if (!write_buf) {
-		ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
-				      data_size, vl);
+		ksft_test_result_fail("Error allocating %d byte buffer for %s VL %u\n",
+				      data_size, type->name, vl);
 		return;
 	}
 	write_sve = write_buf;
@@ -256,23 +287,26 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 
 	/* TODO: Generate a valid FFR pattern */
 
-	ret = set_sve(child, write_sve);
+	ret = set_sve(child, type, write_sve);
 	if (ret != 0) {
-		ksft_test_result_fail("Failed to set VL %u data\n", vl);
+		ksft_test_result_fail("Failed to set %s VL %u data\n",
+				      type->name, vl);
 		goto out;
 	}
 
 	/* Read the data back */
-	if (!get_sve(child, (void **)&read_buf, &read_sve_size)) {
-		ksft_test_result_fail("Failed to read VL %u data\n", vl);
+	if (!get_sve(child, type, (void **)&read_buf, &read_sve_size)) {
+		ksft_test_result_fail("Failed to read %s VL %u data\n",
+				      type->name, vl);
 		goto out;
 	}
 	read_sve = read_buf;
 
 	/* We might read more data if there's extensions we don't know */
 	if (read_sve->size < write_sve->size) {
-		ksft_test_result_fail("Wrote %d bytes, only read %d\n",
-				      write_sve->size, read_sve->size);
+		ksft_test_result_fail("%s wrote %d bytes, only read %d\n",
+				      type->name, write_sve->size,
+				      read_sve->size);
 		goto out_read;
 	}
 
@@ -299,7 +333,8 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 	check_u32(vl, "FPCR", write_buf + SVE_PT_SVE_FPCR_OFFSET(vq),
 		  read_buf + SVE_PT_SVE_FPCR_OFFSET(vq), &errors);
 
-	ksft_test_result(errors == 0, "Set and get SVE data for VL %u\n", vl);
+	ksft_test_result(errors == 0, "Set and get %s data for VL %u\n",
+			 type->name, vl);
 
 out_read:
 	free(read_buf);
@@ -308,7 +343,9 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
 }
 
 /* Validate attempting to set SVE data and read SVE data */
-static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
+static void ptrace_set_sve_get_fpsimd_data(pid_t child,
+					   const struct vec_type *type,
+					   unsigned int vl)
 {
 	void *write_buf;
 	struct user_sve_header *write_sve;
@@ -326,8 +363,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 	data_size = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, SVE_PT_REGS_SVE);
 	write_buf = malloc(data_size);
 	if (!write_buf) {
-		ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
-				      data_size, vl);
+		ksft_test_result_fail("Error allocating %d byte buffer for %s VL %u\n",
+				      data_size, type->name, vl);
 		return;
 	}
 	write_sve = write_buf;
@@ -345,16 +382,17 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 	fill_buf(write_buf + SVE_PT_SVE_FPSR_OFFSET(vq), SVE_PT_SVE_FPSR_SIZE);
 	fill_buf(write_buf + SVE_PT_SVE_FPCR_OFFSET(vq), SVE_PT_SVE_FPCR_SIZE);
 
-	ret = set_sve(child, write_sve);
+	ret = set_sve(child, type, write_sve);
 	if (ret != 0) {
-		ksft_test_result_fail("Failed to set VL %u data\n", vl);
+		ksft_test_result_fail("Failed to set %s VL %u data\n",
+				      type->name, vl);
 		goto out;
 	}
 
 	/* Read the data back */
 	if (get_fpsimd(child, &fpsimd_state)) {
-		ksft_test_result_fail("Failed to read VL %u FPSIMD data\n",
-				      vl);
+		ksft_test_result_fail("Failed to read %s VL %u FPSIMD data\n",
+				      type->name, vl);
 		goto out;
 	}
 
@@ -369,7 +407,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 		       sizeof(tmp));
 
 		if (tmp != fpsimd_state.vregs[i]) {
-			printf("# Mismatch in FPSIMD for VL %u Z%d\n", vl, i);
+			printf("# Mismatch in FPSIMD for %s VL %u Z%d\n",
+			       type->name, vl, i);
 			errors++;
 		}
 	}
@@ -379,8 +418,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
 	check_u32(vl, "FPCR", write_buf + SVE_PT_SVE_FPCR_OFFSET(vq),
 		  &fpsimd_state.fpcr, &errors);
 
-	ksft_test_result(errors == 0, "Set and get FPSIMD data for VL %u\n",
-			 vl);
+	ksft_test_result(errors == 0, "Set and get FPSIMD data for %s VL %u\n",
+			 type->name, vl);
 
 out:
 	free(write_buf);
@@ -390,7 +429,7 @@ static int do_parent(pid_t child)
 {
 	int ret = EXIT_FAILURE;
 	pid_t pid;
-	int status;
+	int status, i;
 	siginfo_t si;
 	unsigned int vq, vl;
 	bool vl_supported;
@@ -449,23 +488,50 @@ static int do_parent(pid_t child)
 		}
 	}
 
-	/* FPSIMD via SVE regset */
-	ptrace_sve_fpsimd(child);
-
-	/* Step through every possible VQ */
-	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
-		vl = sve_vl_from_vq(vq);
+	/*
+	 * Do all the FPSIMD tests before starting SVE so we know we
+	 * don't have SVE specific register state.
+	 */
+	for (i = 0; i < ARRAY_SIZE(vec_types); i++) {
+		/* FPSIMD via SVE regset */
+		if (getauxval(vec_types[i].hwcap_type) & vec_types[i].hwcap) {
+			ptrace_sve_fpsimd(child, &vec_types[i]);
+		} else {
+			ksft_test_result_skip("%s FPSIMD get via SVE\n",
+					      vec_types[i].name);
+			ksft_test_result_skip("%s FPSIMD set via SVE\n",
+					      vec_types[i].name);
+			ksft_test_result_skip("%s set read via FPSIMD\n",
+					      vec_types[i].name);
+		}
+	}
 
-		/* First, try to set this vector length */
-		ptrace_set_get_vl(child, vl, &vl_supported);
+	for (i = 0; i < ARRAY_SIZE(vec_types); i++) {
+		/* Step through every possible VQ */
+		for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
+			vl = sve_vl_from_vq(vq);
+
+			/* First, try to set this vector length */
+			if (getauxval(vec_types[i].hwcap_type) &
+			    vec_types[i].hwcap) {
+				ptrace_set_get_vl(child, &vec_types[i], vl,
+						  &vl_supported);
+			} else {
+				ksft_test_result_skip("%s get/set VL %d\n",
+						      vec_types[i].name, vl);
+				vl_supported = false;
+			}
 
-		/* If the VL is supported validate data set/get */
-		if (vl_supported) {
-			ptrace_set_sve_get_sve_data(child, vl);
-			ptrace_set_sve_get_fpsimd_data(child, vl);
-		} else {
-			ksft_test_result_skip("set SVE get SVE for VL %d\n", vl);
-			ksft_test_result_skip("set SVE get FPSIMD for VL %d\n", vl);
+			/* If the VL is supported validate data set/get */
+			if (vl_supported) {
+				ptrace_set_sve_get_sve_data(child, &vec_types[i], vl);
+				ptrace_set_sve_get_fpsimd_data(child, &vec_types[i], vl);
+			} else {
+				ksft_test_result_skip("%s set SVE get SVE for VL %d\n",
+						      vec_types[i].name, vl);
+				ksft_test_result_skip("%s set SVE get FPSIMD for VL %d\n",
+						      vec_types[i].name, vl);
+			}
 		}
 	}
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Provide ABI documentation for SME similar to that for SVE. Due to the very
large overlap around streaming SVE mode in both implementation and
interfaces documentation for streaming mode SVE is added to the SVE
document rather than the SME one.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/arm64/index.rst |   1 +
 Documentation/arm64/sme.rst   | 427 ++++++++++++++++++++++++++++++++++
 Documentation/arm64/sve.rst   |  62 ++++-
 3 files changed, 480 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/arm64/sme.rst

diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 4f840bac083e..ae21f8118830 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -21,6 +21,7 @@ ARM64 Architecture
     perf
     pointer-authentication
     silicon-errata
+    sme
     sve
     tagged-address-abi
     tagged-pointers
diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
new file mode 100644
index 000000000000..7a69bf637afd
--- /dev/null
+++ b/Documentation/arm64/sme.rst
@@ -0,0 +1,427 @@
+===================================================
+Scalable Matrix Extension support for AArch64 Linux
+===================================================
+
+This document outlines briefly the interface provided to userspace by Linux in
+order to support use of the ARM Scalable Matrix Extension (SME).
+
+This is an outline of the most important features and issues only and not
+intended to be exhaustive.  It should be read in conjunction with the SVE
+documentation in sve.rst which provides details on the Streaming SVE mode
+included in SME.
+
+This document does not aim to describe the SME architecture or programmer's
+model.  To aid understanding, a minimal description of relevant programmer's
+model features for SME is included in Appendix A.
+
+
+1.  General
+-----------
+
+* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
+  register state are tracked per thread.
+
+* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
+  AT_HWCAP2 entry.  Presence of this flag implies the presence of the SME
+  instructions and registers, and the Linux-specific system interfaces
+  described in this document.  SME is reported in /proc/cpuinfo as "sme".
+
+* Support for the execution of SME instructions in userspace can also be
+  detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
+  instruction, and checking that the value of the SME field is nonzero. [3]
+
+  It does not guarantee the presence of the system interfaces described in the
+  following sections: software that needs to verify that those interfaces are
+  present must check for HWCAP2_SME instead.
+
+* There are a number of optional SME features, presence of these is reported
+  through AT_HWCAP2 through:
+
+	HWCAP2_SME_I16I64
+	HWCAP2_SME_F64F64
+	HWCAP2_SME_I8I32
+	HWCAP2_SME_F16F32
+	HWCAP2_SME_B16F32
+	HWCAP2_SME_F32F32
+
+  This list may be extended over time as the SME architecture evolves.
+
+  These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1,
+  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
+  cpu-feature-registers.txt for details.
+
+* Debuggers should restrict themselves to interacting with the target via the
+  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets.  The recommended way
+  of detecting support for these regsets is to connect to a target process
+  first and then attempt a
+
+	ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
+
+* Whenever ZA register values are exchanged in memory between userspace and
+  the kernel, the register value is encoded in memory as a series of horizontal
+  vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
+  used for SVE vectors.
+
+
+2.  Vector lengths
+------------------
+
+SME defines a second vector length similar to the SVE vector length which is
+controls the size of the streaming mode SVE vectors and the ZA matrix array.
+The ZA matrix is square with each side having as many bytes as a SVE vector.
+
+
+3.  Sharing of streaming and non-streaming mode SVE state
+---------------------------------------------------------
+
+It is implementation defined which if any parts of the SVE state are shared
+between streaming and non-streaming modes.  When switching between modes
+via software interfaces such as ptrace if no register content is provided as
+part of switching no state will be assumed to be shared and everything will
+be zeroed.
+
+
+4.  System call behaviour
+-------------------------
+
+* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
+  ZA matrix are preserved.
+
+* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
+  as normal.
+
+* Neither the SVE registers nor ZA are used to pass arguments to or receive
+  results from any syscall.
+
+* On creation fork() or clone() the newly created process will have PSTATE.SM
+  and PSTATE.ZA cleared.
+
+* All other SME state of a thread, including the currently configured vector
+  length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector
+  length (if any), is preserved across all syscalls, subject to the specific
+  exceptions for execve() described in section 6.
+
+
+5.  Signal handling
+-------------------
+
+* A new signal frame record za_context encodes the ZA register contents on
+  signal delivery. [1]
+
+* The signal frame record for ZA always contains basic metadata, in particular
+  the thread's vector length (in za_context.vl).
+
+* The ZA matrix may or may not be included in the record, depending on
+  the value of PSTATE.ZA.  The registers are present if  and only if:
+  za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
+  in which case PSTATE.ZA == 1.
+
+* If matrix data is present, the remainder of the record has a vl-dependent
+  size and layout.  Macros ZA_SIG_* are defined [1] to facilitate access to
+  them.
+
+* The matrix is stored as a series of horizontal vectors in the same format as
+  is used for SVE vectors.
+
+* If the ZA context is too big to fit in sigcontext.__reserved[], then extra
+  space is allocated on the stack, an extra_context record is written in
+  __reserved[] referencing this space.  za_context is then written in the
+  extra space.  Refer to [1] for further details about this mechanism.
+
+
+5.  Signal return
+-----------------
+
+When returning from a signal handler:
+
+* If there is no za_context record in the signal frame, or if the record is
+  present but contains no register data as desribed in the previous section,
+  then ZA is disabled.
+
+* If za_context is present in the signal frame and contains matrix data then
+  PSTATE.ZA is set to 1 and ZA is populated with the specified data.
+
+* The vector length cannot be changed via signal return.  If za_context.vl in
+  the signal frame does not match the current vector length, the signal return
+  attempt is treated as illegal, resulting in a forced SIGSEGV.
+
+
+6.  prctl extensions
+--------------------
+
+Some new prctl() calls are added to allow programs to manage the SVE vector
+length:
+
+prctl(PR_SME_SET_VL, unsigned long arg)
+
+    Sets the vector length of the calling thread and related flags, where
+    arg == vl | flags.  Other threads of the calling process are unaffected.
+
+    vl is the desired vector length, where sve_vl_valid(vl) must be true.
+
+    flags:
+
+	PR_SME_VL_INHERIT
+
+	    Inherit the current vector length across execve().  Otherwise, the
+	    vector length is reset to the system default at execve().  (See
+	    Section 9.)
+
+	PR_SME_SET_VL_ONEXEC
+
+	    Defer the requested vector length change until the next execve()
+	    performed by this thread.
+
+	    The effect is equivalent to implicit exceution of the following
+	    call immediately after the next execve() (if any) by the thread:
+
+		prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC)
+
+	    This allows launching of a new program with a different vector
+	    length, while avoiding runtime side effects in the caller.
+
+	    Without PR_SME_SET_VL_ONEXEC, the requested change takes effect
+	    immediately.
+
+
+    Return value: a nonnegative on success, or a negative value on error:
+	EINVAL: SME not supported, invalid vector length requested, or
+	    invalid flags.
+
+
+    On success:
+
+    * Either the calling thread's vector length or the deferred vector length
+      to be applied at the next execve() by the thread (dependent on whether
+      PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value
+      supported by the system that is less than or equal to vl.  If vl ==
+      SVE_VL_MAX, the value set will be the largest value supported by the
+      system.
+
+    * Any previously outstanding deferred vector length change in the calling
+      thread is cancelled.
+
+    * The returned value describes the resulting configuration, encoded as for
+      PR_SME_GET_VL.  The vector length reported in this value is the new
+      current vector length for this thread if PR_SME_SET_VL_ONEXEC was not
+      present in arg; otherwise, the reported vector length is the deferred
+      vector length that will be applied at the next execve() by the calling
+      thread.
+
+    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
+      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
+      unspecified, including both streaming and non-streaming SVE state.
+      Calling PR_SME_SET_VL with vl equal to the thread's current vector
+      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
+      does not constitute a change to the vector length for this purpose.
+
+    * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
+      Calling PR_SME_SET_VL with vl equal to the thread's current vector
+      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
+      does not constitute a change to the vector length for this purpose.
+
+
+prctl(PR_SME_GET_VL)
+
+    Gets the vector length of the calling thread.
+
+    The following flag may be OR-ed into the result:
+
+	PR_SME_VL_INHERIT
+
+	    Vector length will be inherited across execve().
+
+    There is no way to determine whether there is an outstanding deferred
+    vector length change (which would only normally be the case between a
+    fork() or vfork() and the corresponding execve() in typical use).
+
+    To extract the vector length from the result, and it with
+    PR_SME_VL_LEN_MASK.
+
+    Return value: a nonnegative value on success, or a negative value on error:
+	EINVAL: SME not supported.
+
+
+7.  ptrace extensions
+---------------------
+
+* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE
+  state via PTRACE_GETREGSET and  PTRACE_SETREGSET, this is documented in
+  sve.rst.
+
+* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via
+  PTRACE_GETREGSET and PTRACE_SETREGSET.
+
+  Refer to [2] for definitions.
+
+The regset data starts with struct user_za_header, containing:
+
+    size
+
+	Size of the complete regset, in bytes.
+	This depends on vl and possibly on other things in the future.
+
+	If a call to PTRACE_GETREGSET requests less data than the value of
+	size, the caller can allocate a larger buffer and retry in order to
+	read the complete regset.
+
+    max_size
+
+	Maximum size in bytes that the regset can grow to for the target
+	thread.  The regset won't grow bigger than this even if the target
+	thread changes its vector length etc.
+
+    vl
+
+	Target thread's current streaming vector length, in bytes.
+
+    max_vl
+
+	Maximum possible streaming vector length for the target thread.
+
+    flags
+
+	Zero or more of the following flags, which have the same
+	meaning and behaviour as the corresponding PR_SET_VL_* flags:
+
+	    SME_PT_VL_INHERIT
+
+	    SME_PT_VL_ONEXEC (SETREGSET only).
+
+* The effects of changing the vector length and/or flags are equivalent to
+  those documented for PR_SME_SET_VL.
+
+  The caller must make a further GETREGSET call if it needs to know what VL is
+  actually set by SETREGSET, unless is it known in advance that the requested
+  VL is supported.
+
+* The size and layout of the payload depends on the header fields.  The
+  SME_PT_ZA_*() macros are provided to facilitate access to the data.
+
+* In either case, for SETREGSET it is permissible to omit the payload, in which
+  case the vector length and flags are changed and PSTATE.ZA is set to 0
+  (along with any consequences of those changes).  If a payload is provided
+  then PSTATE.ZA will be set to 1.
+
+* For SETREGSET, if the requested VL is not supported, the effect will be the
+  same as if the payload were omitted, except that an EIO error is reported.
+  No attempt is made to translate the payload data to the correct layout
+  for the vector length actually set.  It is up to the caller to translate the
+  payload layout for the actual VL and retry.
+
+* The effect of writing a partial, incomplete payload is unspecified.
+
+
+8.  ELF coredump extensions
+---------------------------
+
+* NT_ARM_SSVE notes will be added to each coredump for
+  each thread of the dumped process.  The contents will be equivalent to the
+  data that would have been read if a PTRACE_GETREGSET of the corresponding
+  type were executed for each thread when the coredump was generated.
+
+* A NT_ARM_ZA note will be added to each coredump for each thread of the
+  dumped process.  The contents will be equivalent to the data that would have
+  been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
+  when the coredump was generated.
+
+
+9.  System runtime configuration
+--------------------------------
+
+* To mitigate the ABI impact of expansion of the signal frame, a policy
+  mechanism is provided for administrators, distro maintainers and developers
+  to set the default vector length for userspace processes:
+
+/proc/sys/abi/sme_default_vector_length
+
+    Writing the text representation of an integer to this file sets the system
+    default vector length to the specified value, unless the value is greater
+    than the maximum vector length supported by the system in which case the
+    default vector length is set to that maximum.
+
+    The result can be determined by reopening the file and reading its
+    contents.
+
+    At boot, the default vector length is initially set to 32 or the maximum
+    supported vector length, whichever is smaller and supported.  This
+    determines the initial vector length of the init process (PID 1).
+
+    Reading this file returns the current system default vector length.
+
+* At every execve() call, the new vector length of the new process is set to
+  the system default vector length, unless
+
+    * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the
+      calling thread, or
+
+    * a deferred vector length change is pending, established via the
+      PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC).
+
+* Modifying the system default vector length does not affect the vector length
+  of any existing process or thread that does not make an execve() call.
+
+
+Appendix A.  SME programmer's model (informative)
+=================================================
+
+This section provides a minimal description of the additions made by SVE to the
+ARMv8-A programmer's model that are relevant to this document.
+
+Note: This section is for information only and not intended to be complete or
+to replace any architectural specification.
+
+A.1.  Registers
+---------------
+
+In A64 state, SME adds the following:
+
+* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE
+  features are available.  When supported EL0 software may enter and leave
+  streaming mode at any time.
+
+  For best system performance it is strongly encourage for software to enable
+  streaming mode only when it is actively being used.
+
+* A new vector length controlling the size of ZA and the Z registers when in
+  streaming mode, separately to the vector length used for SVE when not in
+  streaming mode.  There is no requirement that either the currently selected
+  vector length or the set of vector lengths supported for the two modes in
+  a given system have any relationship.  The streaming mode vector length
+  is referred to as SVL.
+
+* A new ZA matrix register.  This is a square matrix of SVLxSVL bits.  Most
+  operations on ZA require that streaming mode be enabled but ZA can be
+  enabled without streaming mode in order to load, save and retain data.
+
+  For best system performance it is strongly encourage for software to enable
+  ZA only when it is actively being used.
+
+* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
+  SMSTOP instructions or by access to the SVCR system register:
+
+  * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid
+    data while if it is 0 then ZA can not be accessed.  When PSTATE.ZA is
+    changed from 0 to 1 all bits in ZA are cleared.
+
+  * PSTATE.SM, if this is 1 then the PE is in streaming mode.  When the value
+    of PSTATE.SM is changed then it is implementationd defined if the subset
+    of the floating point register bits valid in both modes may be retained.
+    Any other bits will be cleared.
+
+
+References
+==========
+
+[1] arch/arm64/include/uapi/asm/sigcontext.h
+    AArch64 Linux signal ABI definitions
+
+[2] arch/arm64/include/uapi/asm/ptrace.h
+    AArch64 Linux ptrace ABI definitions
+
+[3] Documentation/arm64/cpu-feature-registers.rst
+
+[4] ARM IHI0055C
+    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
+    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
+    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
index 03137154299e..574ad883b0f3 100644
--- a/Documentation/arm64/sve.rst
+++ b/Documentation/arm64/sve.rst
@@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
 Date:   4 August 2017
 
 This document outlines briefly the interface provided to userspace by Linux in
-order to support use of the ARM Scalable Vector Extension (SVE).
+order to support use of the ARM Scalable Vector Extension (SVE), including
+interactions with Streaming SVE mode added by the Scalable Matrix Extension
+(SME).
 
 This is an outline of the most important features and issues only and not
 intended to be exhaustive.
@@ -23,6 +25,9 @@ model features for SVE is included in Appendix A.
 * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
   tracked per-thread.
 
+* In streaming mode FFR is not accessible, when these interfaces are used to
+  access streaming mode FFR is read and written as zero.
+
 * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
   AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
   instructions and registers, and the Linux-specific system interfaces
@@ -53,10 +58,19 @@ model features for SVE is included in Appendix A.
   which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
   cpu-feature-registers.txt for details.
 
+* On hardware that supports the SME extensions, HWCAP2_SME will also be
+  reported in the AT_HWCAP2 aux vector entry.  Among other things SME adds
+  streaming mode which provides a subset of the SVE feature set using a
+  separate SME vector length and the same Z/V registers.  See sme.rst
+  for more details.
+
 * Debuggers should restrict themselves to interacting with the target via the
   NT_ARM_SVE regset.  The recommended way of detecting support for this regset
   is to connect to a target process first and then attempt a
-  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
+  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).  Note that when SME is
+  present and streaming SVE mode is in use the FPSIMD subset of registers
+  will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
+  in the target.
 
 * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
   between userspace and the kernel, the register value is encoded in memory in
@@ -126,6 +140,11 @@ the SVE instruction set architecture.
   are only present in fpsimd_context.  For convenience, the content of V0..V31
   is duplicated between sve_context and fpsimd_context.
 
+* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
+  if set indicates that the thread is in streaming mode and the vector length
+  and register data (if present) describe the streaming SVE data and vector
+  length.
+
 * The signal frame record for SVE always contains basic metadata, in particular
   the thread's vector length (in sve_context.vl).
 
@@ -170,6 +189,11 @@ When returning from a signal handler:
   the signal frame does not match the current vector length, the signal return
   attempt is treated as illegal, resulting in a forced SIGSEGV.
 
+* It is permitted to enter or leave streaming mode by setting or clearing
+  the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
+  when doing so sve_context.vl and any register data are appropriate for the
+  vector length in the new mode.
+
 
 6.  prctl extensions
 --------------------
@@ -265,8 +289,14 @@ prctl(PR_SVE_GET_VL)
 7.  ptrace extensions
 ---------------------
 
-* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
-  PTRACE_SETREGSET.
+* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
+  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
+  streaming mode SVE registers and NT_ARM_SVE describes the
+  non-streaming mode SVE registers.
+
+  In this description a register set is referred to as being "live" when
+  the target is in the appropriate streaming or non-streaming mode and is
+  using data beyond the subset shared with the FPSIMD Vn registers.
 
   Refer to [2] for definitions.
 
@@ -297,7 +327,7 @@ The regset data starts with struct user_sve_header, containing:
 
     flags
 
-	either
+	at most one of
 
 	    SVE_PT_REGS_FPSIMD
 
@@ -331,6 +361,10 @@ The regset data starts with struct user_sve_header, containing:
 
 	    SVE_PT_VL_ONEXEC (SETREGSET only).
 
+	If neither FPSIMD nor SVE flags are provided then no register
+	payload is available, this is only possible when SME is implemented.
+
+
 * The effects of changing the vector length and/or flags are equivalent to
   those documented for PR_SVE_SET_VL.
 
@@ -355,17 +389,25 @@ The regset data starts with struct user_sve_header, containing:
   unspecified.  It is up to the caller to translate the payload layout
   for the actual VL and retry.
 
+* Where SME is implemented it is not possible to GETREGSET the register
+  state for normal SVE when in streaming mode, nor the streaming mode
+  register state when in normal mode, regardless of the implementation defined
+  behaviour of the hardware for sharing data between the two modes.
+
+* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
+  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
+  if the target was not in streaming mode.
+
 * The effect of writing a partial, incomplete payload is unspecified.
 
 
 8.  ELF coredump extensions
 ---------------------------
 
-* A NT_ARM_SVE note will be added to each coredump for each thread of the
-  dumped process.  The contents will be equivalent to the data that would have
-  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
-  when the coredump was generated.
-
+* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
+  each thread of the dumped process.  The contents will be equivalent to the
+  data that would have been read if a PTRACE_GETREGSET of the corresponding
+  type were executed for each thread when the coredump was generated.
 
 9.  System runtime configuration
 --------------------------------
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Provide ABI documentation for SME similar to that for SVE. Due to the very
large overlap around streaming SVE mode in both implementation and
interfaces documentation for streaming mode SVE is added to the SVE
document rather than the SME one.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/arm64/index.rst |   1 +
 Documentation/arm64/sme.rst   | 427 ++++++++++++++++++++++++++++++++++
 Documentation/arm64/sve.rst   |  62 ++++-
 3 files changed, 480 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/arm64/sme.rst

diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 4f840bac083e..ae21f8118830 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -21,6 +21,7 @@ ARM64 Architecture
     perf
     pointer-authentication
     silicon-errata
+    sme
     sve
     tagged-address-abi
     tagged-pointers
diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
new file mode 100644
index 000000000000..7a69bf637afd
--- /dev/null
+++ b/Documentation/arm64/sme.rst
@@ -0,0 +1,427 @@
+===================================================
+Scalable Matrix Extension support for AArch64 Linux
+===================================================
+
+This document outlines briefly the interface provided to userspace by Linux in
+order to support use of the ARM Scalable Matrix Extension (SME).
+
+This is an outline of the most important features and issues only and not
+intended to be exhaustive.  It should be read in conjunction with the SVE
+documentation in sve.rst which provides details on the Streaming SVE mode
+included in SME.
+
+This document does not aim to describe the SME architecture or programmer's
+model.  To aid understanding, a minimal description of relevant programmer's
+model features for SME is included in Appendix A.
+
+
+1.  General
+-----------
+
+* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
+  register state are tracked per thread.
+
+* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
+  AT_HWCAP2 entry.  Presence of this flag implies the presence of the SME
+  instructions and registers, and the Linux-specific system interfaces
+  described in this document.  SME is reported in /proc/cpuinfo as "sme".
+
+* Support for the execution of SME instructions in userspace can also be
+  detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
+  instruction, and checking that the value of the SME field is nonzero. [3]
+
+  It does not guarantee the presence of the system interfaces described in the
+  following sections: software that needs to verify that those interfaces are
+  present must check for HWCAP2_SME instead.
+
+* There are a number of optional SME features, presence of these is reported
+  through AT_HWCAP2 through:
+
+	HWCAP2_SME_I16I64
+	HWCAP2_SME_F64F64
+	HWCAP2_SME_I8I32
+	HWCAP2_SME_F16F32
+	HWCAP2_SME_B16F32
+	HWCAP2_SME_F32F32
+
+  This list may be extended over time as the SME architecture evolves.
+
+  These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1,
+  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
+  cpu-feature-registers.txt for details.
+
+* Debuggers should restrict themselves to interacting with the target via the
+  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets.  The recommended way
+  of detecting support for these regsets is to connect to a target process
+  first and then attempt a
+
+	ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
+
+* Whenever ZA register values are exchanged in memory between userspace and
+  the kernel, the register value is encoded in memory as a series of horizontal
+  vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
+  used for SVE vectors.
+
+
+2.  Vector lengths
+------------------
+
+SME defines a second vector length similar to the SVE vector length which is
+controls the size of the streaming mode SVE vectors and the ZA matrix array.
+The ZA matrix is square with each side having as many bytes as a SVE vector.
+
+
+3.  Sharing of streaming and non-streaming mode SVE state
+---------------------------------------------------------
+
+It is implementation defined which if any parts of the SVE state are shared
+between streaming and non-streaming modes.  When switching between modes
+via software interfaces such as ptrace if no register content is provided as
+part of switching no state will be assumed to be shared and everything will
+be zeroed.
+
+
+4.  System call behaviour
+-------------------------
+
+* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
+  ZA matrix are preserved.
+
+* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
+  as normal.
+
+* Neither the SVE registers nor ZA are used to pass arguments to or receive
+  results from any syscall.
+
+* On creation fork() or clone() the newly created process will have PSTATE.SM
+  and PSTATE.ZA cleared.
+
+* All other SME state of a thread, including the currently configured vector
+  length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector
+  length (if any), is preserved across all syscalls, subject to the specific
+  exceptions for execve() described in section 6.
+
+
+5.  Signal handling
+-------------------
+
+* A new signal frame record za_context encodes the ZA register contents on
+  signal delivery. [1]
+
+* The signal frame record for ZA always contains basic metadata, in particular
+  the thread's vector length (in za_context.vl).
+
+* The ZA matrix may or may not be included in the record, depending on
+  the value of PSTATE.ZA.  The registers are present if  and only if:
+  za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
+  in which case PSTATE.ZA == 1.
+
+* If matrix data is present, the remainder of the record has a vl-dependent
+  size and layout.  Macros ZA_SIG_* are defined [1] to facilitate access to
+  them.
+
+* The matrix is stored as a series of horizontal vectors in the same format as
+  is used for SVE vectors.
+
+* If the ZA context is too big to fit in sigcontext.__reserved[], then extra
+  space is allocated on the stack, an extra_context record is written in
+  __reserved[] referencing this space.  za_context is then written in the
+  extra space.  Refer to [1] for further details about this mechanism.
+
+
+5.  Signal return
+-----------------
+
+When returning from a signal handler:
+
+* If there is no za_context record in the signal frame, or if the record is
+  present but contains no register data as desribed in the previous section,
+  then ZA is disabled.
+
+* If za_context is present in the signal frame and contains matrix data then
+  PSTATE.ZA is set to 1 and ZA is populated with the specified data.
+
+* The vector length cannot be changed via signal return.  If za_context.vl in
+  the signal frame does not match the current vector length, the signal return
+  attempt is treated as illegal, resulting in a forced SIGSEGV.
+
+
+6.  prctl extensions
+--------------------
+
+Some new prctl() calls are added to allow programs to manage the SVE vector
+length:
+
+prctl(PR_SME_SET_VL, unsigned long arg)
+
+    Sets the vector length of the calling thread and related flags, where
+    arg == vl | flags.  Other threads of the calling process are unaffected.
+
+    vl is the desired vector length, where sve_vl_valid(vl) must be true.
+
+    flags:
+
+	PR_SME_VL_INHERIT
+
+	    Inherit the current vector length across execve().  Otherwise, the
+	    vector length is reset to the system default at execve().  (See
+	    Section 9.)
+
+	PR_SME_SET_VL_ONEXEC
+
+	    Defer the requested vector length change until the next execve()
+	    performed by this thread.
+
+	    The effect is equivalent to implicit exceution of the following
+	    call immediately after the next execve() (if any) by the thread:
+
+		prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC)
+
+	    This allows launching of a new program with a different vector
+	    length, while avoiding runtime side effects in the caller.
+
+	    Without PR_SME_SET_VL_ONEXEC, the requested change takes effect
+	    immediately.
+
+
+    Return value: a nonnegative on success, or a negative value on error:
+	EINVAL: SME not supported, invalid vector length requested, or
+	    invalid flags.
+
+
+    On success:
+
+    * Either the calling thread's vector length or the deferred vector length
+      to be applied at the next execve() by the thread (dependent on whether
+      PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value
+      supported by the system that is less than or equal to vl.  If vl ==
+      SVE_VL_MAX, the value set will be the largest value supported by the
+      system.
+
+    * Any previously outstanding deferred vector length change in the calling
+      thread is cancelled.
+
+    * The returned value describes the resulting configuration, encoded as for
+      PR_SME_GET_VL.  The vector length reported in this value is the new
+      current vector length for this thread if PR_SME_SET_VL_ONEXEC was not
+      present in arg; otherwise, the reported vector length is the deferred
+      vector length that will be applied at the next execve() by the calling
+      thread.
+
+    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
+      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
+      unspecified, including both streaming and non-streaming SVE state.
+      Calling PR_SME_SET_VL with vl equal to the thread's current vector
+      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
+      does not constitute a change to the vector length for this purpose.
+
+    * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
+      Calling PR_SME_SET_VL with vl equal to the thread's current vector
+      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
+      does not constitute a change to the vector length for this purpose.
+
+
+prctl(PR_SME_GET_VL)
+
+    Gets the vector length of the calling thread.
+
+    The following flag may be OR-ed into the result:
+
+	PR_SME_VL_INHERIT
+
+	    Vector length will be inherited across execve().
+
+    There is no way to determine whether there is an outstanding deferred
+    vector length change (which would only normally be the case between a
+    fork() or vfork() and the corresponding execve() in typical use).
+
+    To extract the vector length from the result, and it with
+    PR_SME_VL_LEN_MASK.
+
+    Return value: a nonnegative value on success, or a negative value on error:
+	EINVAL: SME not supported.
+
+
+7.  ptrace extensions
+---------------------
+
+* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE
+  state via PTRACE_GETREGSET and  PTRACE_SETREGSET, this is documented in
+  sve.rst.
+
+* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via
+  PTRACE_GETREGSET and PTRACE_SETREGSET.
+
+  Refer to [2] for definitions.
+
+The regset data starts with struct user_za_header, containing:
+
+    size
+
+	Size of the complete regset, in bytes.
+	This depends on vl and possibly on other things in the future.
+
+	If a call to PTRACE_GETREGSET requests less data than the value of
+	size, the caller can allocate a larger buffer and retry in order to
+	read the complete regset.
+
+    max_size
+
+	Maximum size in bytes that the regset can grow to for the target
+	thread.  The regset won't grow bigger than this even if the target
+	thread changes its vector length etc.
+
+    vl
+
+	Target thread's current streaming vector length, in bytes.
+
+    max_vl
+
+	Maximum possible streaming vector length for the target thread.
+
+    flags
+
+	Zero or more of the following flags, which have the same
+	meaning and behaviour as the corresponding PR_SET_VL_* flags:
+
+	    SME_PT_VL_INHERIT
+
+	    SME_PT_VL_ONEXEC (SETREGSET only).
+
+* The effects of changing the vector length and/or flags are equivalent to
+  those documented for PR_SME_SET_VL.
+
+  The caller must make a further GETREGSET call if it needs to know what VL is
+  actually set by SETREGSET, unless is it known in advance that the requested
+  VL is supported.
+
+* The size and layout of the payload depends on the header fields.  The
+  SME_PT_ZA_*() macros are provided to facilitate access to the data.
+
+* In either case, for SETREGSET it is permissible to omit the payload, in which
+  case the vector length and flags are changed and PSTATE.ZA is set to 0
+  (along with any consequences of those changes).  If a payload is provided
+  then PSTATE.ZA will be set to 1.
+
+* For SETREGSET, if the requested VL is not supported, the effect will be the
+  same as if the payload were omitted, except that an EIO error is reported.
+  No attempt is made to translate the payload data to the correct layout
+  for the vector length actually set.  It is up to the caller to translate the
+  payload layout for the actual VL and retry.
+
+* The effect of writing a partial, incomplete payload is unspecified.
+
+
+8.  ELF coredump extensions
+---------------------------
+
+* NT_ARM_SSVE notes will be added to each coredump for
+  each thread of the dumped process.  The contents will be equivalent to the
+  data that would have been read if a PTRACE_GETREGSET of the corresponding
+  type were executed for each thread when the coredump was generated.
+
+* A NT_ARM_ZA note will be added to each coredump for each thread of the
+  dumped process.  The contents will be equivalent to the data that would have
+  been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
+  when the coredump was generated.
+
+
+9.  System runtime configuration
+--------------------------------
+
+* To mitigate the ABI impact of expansion of the signal frame, a policy
+  mechanism is provided for administrators, distro maintainers and developers
+  to set the default vector length for userspace processes:
+
+/proc/sys/abi/sme_default_vector_length
+
+    Writing the text representation of an integer to this file sets the system
+    default vector length to the specified value, unless the value is greater
+    than the maximum vector length supported by the system in which case the
+    default vector length is set to that maximum.
+
+    The result can be determined by reopening the file and reading its
+    contents.
+
+    At boot, the default vector length is initially set to 32 or the maximum
+    supported vector length, whichever is smaller and supported.  This
+    determines the initial vector length of the init process (PID 1).
+
+    Reading this file returns the current system default vector length.
+
+* At every execve() call, the new vector length of the new process is set to
+  the system default vector length, unless
+
+    * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the
+      calling thread, or
+
+    * a deferred vector length change is pending, established via the
+      PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC).
+
+* Modifying the system default vector length does not affect the vector length
+  of any existing process or thread that does not make an execve() call.
+
+
+Appendix A.  SME programmer's model (informative)
+=================================================
+
+This section provides a minimal description of the additions made by SVE to the
+ARMv8-A programmer's model that are relevant to this document.
+
+Note: This section is for information only and not intended to be complete or
+to replace any architectural specification.
+
+A.1.  Registers
+---------------
+
+In A64 state, SME adds the following:
+
+* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE
+  features are available.  When supported EL0 software may enter and leave
+  streaming mode at any time.
+
+  For best system performance it is strongly encourage for software to enable
+  streaming mode only when it is actively being used.
+
+* A new vector length controlling the size of ZA and the Z registers when in
+  streaming mode, separately to the vector length used for SVE when not in
+  streaming mode.  There is no requirement that either the currently selected
+  vector length or the set of vector lengths supported for the two modes in
+  a given system have any relationship.  The streaming mode vector length
+  is referred to as SVL.
+
+* A new ZA matrix register.  This is a square matrix of SVLxSVL bits.  Most
+  operations on ZA require that streaming mode be enabled but ZA can be
+  enabled without streaming mode in order to load, save and retain data.
+
+  For best system performance it is strongly encourage for software to enable
+  ZA only when it is actively being used.
+
+* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
+  SMSTOP instructions or by access to the SVCR system register:
+
+  * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid
+    data while if it is 0 then ZA can not be accessed.  When PSTATE.ZA is
+    changed from 0 to 1 all bits in ZA are cleared.
+
+  * PSTATE.SM, if this is 1 then the PE is in streaming mode.  When the value
+    of PSTATE.SM is changed then it is implementationd defined if the subset
+    of the floating point register bits valid in both modes may be retained.
+    Any other bits will be cleared.
+
+
+References
+==========
+
+[1] arch/arm64/include/uapi/asm/sigcontext.h
+    AArch64 Linux signal ABI definitions
+
+[2] arch/arm64/include/uapi/asm/ptrace.h
+    AArch64 Linux ptrace ABI definitions
+
+[3] Documentation/arm64/cpu-feature-registers.rst
+
+[4] ARM IHI0055C
+    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
+    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
+    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
index 03137154299e..574ad883b0f3 100644
--- a/Documentation/arm64/sve.rst
+++ b/Documentation/arm64/sve.rst
@@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
 Date:   4 August 2017
 
 This document outlines briefly the interface provided to userspace by Linux in
-order to support use of the ARM Scalable Vector Extension (SVE).
+order to support use of the ARM Scalable Vector Extension (SVE), including
+interactions with Streaming SVE mode added by the Scalable Matrix Extension
+(SME).
 
 This is an outline of the most important features and issues only and not
 intended to be exhaustive.
@@ -23,6 +25,9 @@ model features for SVE is included in Appendix A.
 * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
   tracked per-thread.
 
+* In streaming mode FFR is not accessible, when these interfaces are used to
+  access streaming mode FFR is read and written as zero.
+
 * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
   AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
   instructions and registers, and the Linux-specific system interfaces
@@ -53,10 +58,19 @@ model features for SVE is included in Appendix A.
   which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
   cpu-feature-registers.txt for details.
 
+* On hardware that supports the SME extensions, HWCAP2_SME will also be
+  reported in the AT_HWCAP2 aux vector entry.  Among other things SME adds
+  streaming mode which provides a subset of the SVE feature set using a
+  separate SME vector length and the same Z/V registers.  See sme.rst
+  for more details.
+
 * Debuggers should restrict themselves to interacting with the target via the
   NT_ARM_SVE regset.  The recommended way of detecting support for this regset
   is to connect to a target process first and then attempt a
-  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
+  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).  Note that when SME is
+  present and streaming SVE mode is in use the FPSIMD subset of registers
+  will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
+  in the target.
 
 * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
   between userspace and the kernel, the register value is encoded in memory in
@@ -126,6 +140,11 @@ the SVE instruction set architecture.
   are only present in fpsimd_context.  For convenience, the content of V0..V31
   is duplicated between sve_context and fpsimd_context.
 
+* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
+  if set indicates that the thread is in streaming mode and the vector length
+  and register data (if present) describe the streaming SVE data and vector
+  length.
+
 * The signal frame record for SVE always contains basic metadata, in particular
   the thread's vector length (in sve_context.vl).
 
@@ -170,6 +189,11 @@ When returning from a signal handler:
   the signal frame does not match the current vector length, the signal return
   attempt is treated as illegal, resulting in a forced SIGSEGV.
 
+* It is permitted to enter or leave streaming mode by setting or clearing
+  the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
+  when doing so sve_context.vl and any register data are appropriate for the
+  vector length in the new mode.
+
 
 6.  prctl extensions
 --------------------
@@ -265,8 +289,14 @@ prctl(PR_SVE_GET_VL)
 7.  ptrace extensions
 ---------------------
 
-* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
-  PTRACE_SETREGSET.
+* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
+  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
+  streaming mode SVE registers and NT_ARM_SVE describes the
+  non-streaming mode SVE registers.
+
+  In this description a register set is referred to as being "live" when
+  the target is in the appropriate streaming or non-streaming mode and is
+  using data beyond the subset shared with the FPSIMD Vn registers.
 
   Refer to [2] for definitions.
 
@@ -297,7 +327,7 @@ The regset data starts with struct user_sve_header, containing:
 
     flags
 
-	either
+	at most one of
 
 	    SVE_PT_REGS_FPSIMD
 
@@ -331,6 +361,10 @@ The regset data starts with struct user_sve_header, containing:
 
 	    SVE_PT_VL_ONEXEC (SETREGSET only).
 
+	If neither FPSIMD nor SVE flags are provided then no register
+	payload is available, this is only possible when SME is implemented.
+
+
 * The effects of changing the vector length and/or flags are equivalent to
   those documented for PR_SVE_SET_VL.
 
@@ -355,17 +389,25 @@ The regset data starts with struct user_sve_header, containing:
   unspecified.  It is up to the caller to translate the payload layout
   for the actual VL and retry.
 
+* Where SME is implemented it is not possible to GETREGSET the register
+  state for normal SVE when in streaming mode, nor the streaming mode
+  register state when in normal mode, regardless of the implementation defined
+  behaviour of the hardware for sharing data between the two modes.
+
+* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
+  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
+  if the target was not in streaming mode.
+
 * The effect of writing a partial, incomplete payload is unspecified.
 
 
 8.  ELF coredump extensions
 ---------------------------
 
-* A NT_ARM_SVE note will be added to each coredump for each thread of the
-  dumped process.  The contents will be equivalent to the data that would have
-  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
-  when the coredump was generated.
-
+* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
+  each thread of the dumped process.  The contents will be equivalent to the
+  data that would have been read if a PTRACE_GETREGSET of the corresponding
+  type were executed for each thread when the coredump was generated.
 
 9.  System runtime configuration
 --------------------------------
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 13/38] arm64/sme: System register and exception syndrome definitions
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The arm64 Scalable Matrix Extension (SME) adds some new system registers,
fields in existing system registers and exception syndromes. This patch
adds definitions for these for use in future patches implementing support
for this extension.

Since SME will be the first user of FEAT_HCX in the kernel also include
the definitions for enumerating it and the HCRX system register it adds.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/esr.h     |  3 +-
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/include/asm/sysreg.h  | 53 ++++++++++++++++++++++++++++++++
 arch/arm64/kernel/traps.c        |  1 +
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 29f97eb3dad4..8483b200c598 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -37,7 +37,8 @@
 #define ESR_ELx_EC_ERET		(0x1a)	/* EL2 only */
 /* Unallocated EC: 0x1B */
 #define ESR_ELx_EC_FPAC		(0x1C)	/* EL1 and above */
-/* Unallocated EC: 0x1D - 0x1E */
+#define ESR_ELx_EC_SME		(0x1D)
+/* Unallocated EC: 0x1E */
 #define ESR_ELx_EC_IMP_DEF	(0x1f)	/* EL3 only */
 #define ESR_ELx_EC_IABT_LOW	(0x20)
 #define ESR_ELx_EC_IABT_CUR	(0x21)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 327120c0089f..438de250ab39 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -279,6 +279,7 @@
 #define CPTR_EL2_TCPAC	(1 << 31)
 #define CPTR_EL2_TAM	(1 << 30)
 #define CPTR_EL2_TTA	(1 << 20)
+#define CPTR_EL2_TSM	(1 << 12)
 #define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
 #define CPTR_EL2_TZ	(1 << 8)
 #define CPTR_NVHE_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 (nVHE) */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b268082d67ed..86f442a3769e 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -171,6 +171,7 @@
 #define SYS_ID_AA64PFR0_EL1		sys_reg(3, 0, 0, 4, 0)
 #define SYS_ID_AA64PFR1_EL1		sys_reg(3, 0, 0, 4, 1)
 #define SYS_ID_AA64ZFR0_EL1		sys_reg(3, 0, 0, 4, 4)
+#define SYS_ID_AA64SMFR0_EL1		sys_reg(3, 0, 0, 4, 5)
 
 #define SYS_ID_AA64DFR0_EL1		sys_reg(3, 0, 0, 5, 0)
 #define SYS_ID_AA64DFR1_EL1		sys_reg(3, 0, 0, 5, 1)
@@ -193,6 +194,8 @@
 
 #define SYS_ZCR_EL1			sys_reg(3, 0, 1, 2, 0)
 #define SYS_TRFCR_EL1			sys_reg(3, 0, 1, 2, 1)
+#define SYS_SMPRI_EL1			sys_reg(3, 0, 1, 2, 4)
+#define SYS_SMCR_EL1			sys_reg(3, 0, 1, 2, 6)
 
 #define SYS_TTBR0_EL1			sys_reg(3, 0, 2, 0, 0)
 #define SYS_TTBR1_EL1			sys_reg(3, 0, 2, 0, 1)
@@ -385,6 +388,8 @@
 #define TRBIDR_ALIGN_MASK		GENMASK(3, 0)
 #define TRBIDR_ALIGN_SHIFT		0
 
+#define SMPRI_EL1_PRIORITY_MASK		0xf
+
 #define SYS_PMINTENSET_EL1		sys_reg(3, 0, 9, 14, 1)
 #define SYS_PMINTENCLR_EL1		sys_reg(3, 0, 9, 14, 2)
 
@@ -440,8 +445,13 @@
 #define SYS_CCSIDR_EL1			sys_reg(3, 1, 0, 0, 0)
 #define SYS_CLIDR_EL1			sys_reg(3, 1, 0, 0, 1)
 #define SYS_GMID_EL1			sys_reg(3, 1, 0, 0, 4)
+#define SYS_SMIDR_EL1			sys_reg(3, 1, 0, 0, 6)
 #define SYS_AIDR_EL1			sys_reg(3, 1, 0, 0, 7)
 
+#define SYS_SMIDR_EL1_IMPLEMENTER_SHIFT	24
+#define SYS_SMIDR_EL1_SMPS_SHIFT	15
+#define SYS_SMIDR_EL1_AFFINITY_SHIFT	0
+
 #define SYS_CSSELR_EL1			sys_reg(3, 2, 0, 0, 0)
 
 #define SYS_CTR_EL0			sys_reg(3, 3, 0, 0, 1)
@@ -450,6 +460,10 @@
 #define SYS_RNDR_EL0			sys_reg(3, 3, 2, 4, 0)
 #define SYS_RNDRRS_EL0			sys_reg(3, 3, 2, 4, 1)
 
+#define SYS_SVCR_EL0			sys_reg(3, 3, 4, 2, 2)
+#define SYS_SVCR_EL0_ZA_MASK		2
+#define SYS_SVCR_EL0_SM_MASK		1
+
 #define SYS_PMCR_EL0			sys_reg(3, 3, 9, 12, 0)
 #define SYS_PMCNTENSET_EL0		sys_reg(3, 3, 9, 12, 1)
 #define SYS_PMCNTENCLR_EL0		sys_reg(3, 3, 9, 12, 2)
@@ -466,6 +480,7 @@
 
 #define SYS_TPIDR_EL0			sys_reg(3, 3, 13, 0, 2)
 #define SYS_TPIDRRO_EL0			sys_reg(3, 3, 13, 0, 3)
+#define SYS_TPIDR2_EL0			sys_reg(3, 3, 13, 0, 5)
 
 #define SYS_SCXTNUM_EL0			sys_reg(3, 3, 13, 0, 7)
 
@@ -532,6 +547,9 @@
 #define SYS_HFGITR_EL2			sys_reg(3, 4, 1, 1, 6)
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
+#define SYS_HCRX_EL2			sys_reg(3, 4, 1, 2, 2)
+#define SYS_SMPRIMAP_EL2		sys_reg(3, 4, 1, 2, 5)
+#define SYS_SMCR_EL2			sys_reg(3, 4, 1, 2, 6)
 #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
 #define SYS_HDFGRTR_EL2			sys_reg(3, 4, 3, 1, 4)
 #define SYS_HDFGWTR_EL2			sys_reg(3, 4, 3, 1, 5)
@@ -591,6 +609,7 @@
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+#define SYS_SMCR_EL12			sys_reg(3, 5, 1, 2, 3)
 #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
 #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
 #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
@@ -614,6 +633,7 @@
 #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
 /* Common SCTLR_ELx flags. */
+#define SCTLR_ELx_ENTP2	(BIT(60))
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
 
@@ -793,6 +813,7 @@
 #define ID_AA64PFR0_ELx_32BIT_64BIT	0x2
 
 /* id_aa64pfr1 */
+#define ID_AA64PFR1_SME_SHIFT		24
 #define ID_AA64PFR1_MPAMFRAC_SHIFT	16
 #define ID_AA64PFR1_RASFRAC_SHIFT	12
 #define ID_AA64PFR1_MTE_SHIFT		8
@@ -803,6 +824,7 @@
 #define ID_AA64PFR1_SSBS_PSTATE_ONLY	1
 #define ID_AA64PFR1_SSBS_PSTATE_INSNS	2
 #define ID_AA64PFR1_BT_BTI		0x1
+#define ID_AA64PFR1_SME			1
 
 #define ID_AA64PFR1_MTE_NI		0x0
 #define ID_AA64PFR1_MTE_EL0		0x1
@@ -830,6 +852,21 @@
 #define ID_AA64ZFR0_AES_PMULL		0x2
 #define ID_AA64ZFR0_SVEVER_SVE2		0x1
 
+/* id_aa64smfr0 */
+#define ID_AA64SMFR0_I16I64_SHIFT	52
+#define ID_AA64SMFR0_F64F64_SHIFT	48
+#define ID_AA64SMFR0_I8I32_SHIFT	36
+#define ID_AA64SMFR0_F16F32_SHIFT	35
+#define ID_AA64SMFR0_B16F32_SHIFT	34
+#define ID_AA64SMFR0_F32F32_SHIFT	32
+
+#define ID_AA64SMFR0_I16I64		0x4
+#define ID_AA64SMFR0_F64F64		0x1
+#define ID_AA64SMFR0_I8I32		0x4
+#define ID_AA64SMFR0_F16F32		0x1
+#define ID_AA64SMFR0_B16F32		0x1
+#define ID_AA64SMFR0_F32F32		0x1
+
 /* id_aa64mmfr0 */
 #define ID_AA64MMFR0_ECV_SHIFT		60
 #define ID_AA64MMFR0_FGT_SHIFT		56
@@ -881,6 +918,7 @@
 #endif
 
 /* id_aa64mmfr1 */
+#define ID_AA64MMFR1_HCX_SHIFT		40
 #define ID_AA64MMFR1_ETS_SHIFT		36
 #define ID_AA64MMFR1_TWED_SHIFT		32
 #define ID_AA64MMFR1_XNX_SHIFT		28
@@ -1072,6 +1110,19 @@
 #define ZCR_ELx_LEN_SIZE	9
 #define ZCR_ELx_LEN_MASK	0x1ff
 
+/*
+ * The SMCR_ELx_LEN_* definitions intentionally include bits [8:4] which
+ * are reserved by the SME architecture for future expansion of the LEN
+ * field, with compatible semantics.
+ */
+#define SMCR_ELx_LEN_SHIFT	0
+#define SMCR_ELx_LEN_SIZE	9
+#define SMCR_ELx_LEN_MASK	0x1ff
+
+#define CPACR_EL1_SMEN_EL1EN	(BIT(24)) /* enable EL1 access */
+#define CPACR_EL1_SMEN_EL0EN	(BIT(25)) /* enable EL0 access, if EL1EN set */
+#define CPACR_EL1_SMEN		(CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN)
+
 #define CPACR_EL1_ZEN_EL1EN	(BIT(16)) /* enable EL1 access */
 #define CPACR_EL1_ZEN_EL0EN	(BIT(17)) /* enable EL0 access, if EL1EN set */
 #define CPACR_EL1_ZEN		(CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)
@@ -1125,6 +1176,8 @@
 #define TRFCR_ELx_ExTRE			BIT(1)
 #define TRFCR_ELx_E0TRE			BIT(0)
 
+/* HCRX_EL2 definitions */
+#define HCRX_EL2_SMPME_MASK		(1 << 5)
 
 /* GIC Hypervisor interface registers */
 /* ICH_MISR_EL2 bit definitions */
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index b03e383d944a..cb8934e75654 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -816,6 +816,7 @@ static const char *esr_class_str[] = {
 	[ESR_ELx_EC_SVE]		= "SVE",
 	[ESR_ELx_EC_ERET]		= "ERET/ERETAA/ERETAB",
 	[ESR_ELx_EC_FPAC]		= "FPAC",
+	[ESR_ELx_EC_SME]		= "SME",
 	[ESR_ELx_EC_IMP_DEF]		= "EL3 IMP DEF",
 	[ESR_ELx_EC_IABT_LOW]		= "IABT (lower EL)",
 	[ESR_ELx_EC_IABT_CUR]		= "IABT (current EL)",
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 13/38] arm64/sme: System register and exception syndrome definitions
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The arm64 Scalable Matrix Extension (SME) adds some new system registers,
fields in existing system registers and exception syndromes. This patch
adds definitions for these for use in future patches implementing support
for this extension.

Since SME will be the first user of FEAT_HCX in the kernel also include
the definitions for enumerating it and the HCRX system register it adds.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/esr.h     |  3 +-
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/include/asm/sysreg.h  | 53 ++++++++++++++++++++++++++++++++
 arch/arm64/kernel/traps.c        |  1 +
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 29f97eb3dad4..8483b200c598 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -37,7 +37,8 @@
 #define ESR_ELx_EC_ERET		(0x1a)	/* EL2 only */
 /* Unallocated EC: 0x1B */
 #define ESR_ELx_EC_FPAC		(0x1C)	/* EL1 and above */
-/* Unallocated EC: 0x1D - 0x1E */
+#define ESR_ELx_EC_SME		(0x1D)
+/* Unallocated EC: 0x1E */
 #define ESR_ELx_EC_IMP_DEF	(0x1f)	/* EL3 only */
 #define ESR_ELx_EC_IABT_LOW	(0x20)
 #define ESR_ELx_EC_IABT_CUR	(0x21)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 327120c0089f..438de250ab39 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -279,6 +279,7 @@
 #define CPTR_EL2_TCPAC	(1 << 31)
 #define CPTR_EL2_TAM	(1 << 30)
 #define CPTR_EL2_TTA	(1 << 20)
+#define CPTR_EL2_TSM	(1 << 12)
 #define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
 #define CPTR_EL2_TZ	(1 << 8)
 #define CPTR_NVHE_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 (nVHE) */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b268082d67ed..86f442a3769e 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -171,6 +171,7 @@
 #define SYS_ID_AA64PFR0_EL1		sys_reg(3, 0, 0, 4, 0)
 #define SYS_ID_AA64PFR1_EL1		sys_reg(3, 0, 0, 4, 1)
 #define SYS_ID_AA64ZFR0_EL1		sys_reg(3, 0, 0, 4, 4)
+#define SYS_ID_AA64SMFR0_EL1		sys_reg(3, 0, 0, 4, 5)
 
 #define SYS_ID_AA64DFR0_EL1		sys_reg(3, 0, 0, 5, 0)
 #define SYS_ID_AA64DFR1_EL1		sys_reg(3, 0, 0, 5, 1)
@@ -193,6 +194,8 @@
 
 #define SYS_ZCR_EL1			sys_reg(3, 0, 1, 2, 0)
 #define SYS_TRFCR_EL1			sys_reg(3, 0, 1, 2, 1)
+#define SYS_SMPRI_EL1			sys_reg(3, 0, 1, 2, 4)
+#define SYS_SMCR_EL1			sys_reg(3, 0, 1, 2, 6)
 
 #define SYS_TTBR0_EL1			sys_reg(3, 0, 2, 0, 0)
 #define SYS_TTBR1_EL1			sys_reg(3, 0, 2, 0, 1)
@@ -385,6 +388,8 @@
 #define TRBIDR_ALIGN_MASK		GENMASK(3, 0)
 #define TRBIDR_ALIGN_SHIFT		0
 
+#define SMPRI_EL1_PRIORITY_MASK		0xf
+
 #define SYS_PMINTENSET_EL1		sys_reg(3, 0, 9, 14, 1)
 #define SYS_PMINTENCLR_EL1		sys_reg(3, 0, 9, 14, 2)
 
@@ -440,8 +445,13 @@
 #define SYS_CCSIDR_EL1			sys_reg(3, 1, 0, 0, 0)
 #define SYS_CLIDR_EL1			sys_reg(3, 1, 0, 0, 1)
 #define SYS_GMID_EL1			sys_reg(3, 1, 0, 0, 4)
+#define SYS_SMIDR_EL1			sys_reg(3, 1, 0, 0, 6)
 #define SYS_AIDR_EL1			sys_reg(3, 1, 0, 0, 7)
 
+#define SYS_SMIDR_EL1_IMPLEMENTER_SHIFT	24
+#define SYS_SMIDR_EL1_SMPS_SHIFT	15
+#define SYS_SMIDR_EL1_AFFINITY_SHIFT	0
+
 #define SYS_CSSELR_EL1			sys_reg(3, 2, 0, 0, 0)
 
 #define SYS_CTR_EL0			sys_reg(3, 3, 0, 0, 1)
@@ -450,6 +460,10 @@
 #define SYS_RNDR_EL0			sys_reg(3, 3, 2, 4, 0)
 #define SYS_RNDRRS_EL0			sys_reg(3, 3, 2, 4, 1)
 
+#define SYS_SVCR_EL0			sys_reg(3, 3, 4, 2, 2)
+#define SYS_SVCR_EL0_ZA_MASK		2
+#define SYS_SVCR_EL0_SM_MASK		1
+
 #define SYS_PMCR_EL0			sys_reg(3, 3, 9, 12, 0)
 #define SYS_PMCNTENSET_EL0		sys_reg(3, 3, 9, 12, 1)
 #define SYS_PMCNTENCLR_EL0		sys_reg(3, 3, 9, 12, 2)
@@ -466,6 +480,7 @@
 
 #define SYS_TPIDR_EL0			sys_reg(3, 3, 13, 0, 2)
 #define SYS_TPIDRRO_EL0			sys_reg(3, 3, 13, 0, 3)
+#define SYS_TPIDR2_EL0			sys_reg(3, 3, 13, 0, 5)
 
 #define SYS_SCXTNUM_EL0			sys_reg(3, 3, 13, 0, 7)
 
@@ -532,6 +547,9 @@
 #define SYS_HFGITR_EL2			sys_reg(3, 4, 1, 1, 6)
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
+#define SYS_HCRX_EL2			sys_reg(3, 4, 1, 2, 2)
+#define SYS_SMPRIMAP_EL2		sys_reg(3, 4, 1, 2, 5)
+#define SYS_SMCR_EL2			sys_reg(3, 4, 1, 2, 6)
 #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
 #define SYS_HDFGRTR_EL2			sys_reg(3, 4, 3, 1, 4)
 #define SYS_HDFGWTR_EL2			sys_reg(3, 4, 3, 1, 5)
@@ -591,6 +609,7 @@
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+#define SYS_SMCR_EL12			sys_reg(3, 5, 1, 2, 3)
 #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
 #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
 #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
@@ -614,6 +633,7 @@
 #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
 /* Common SCTLR_ELx flags. */
+#define SCTLR_ELx_ENTP2	(BIT(60))
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
 
@@ -793,6 +813,7 @@
 #define ID_AA64PFR0_ELx_32BIT_64BIT	0x2
 
 /* id_aa64pfr1 */
+#define ID_AA64PFR1_SME_SHIFT		24
 #define ID_AA64PFR1_MPAMFRAC_SHIFT	16
 #define ID_AA64PFR1_RASFRAC_SHIFT	12
 #define ID_AA64PFR1_MTE_SHIFT		8
@@ -803,6 +824,7 @@
 #define ID_AA64PFR1_SSBS_PSTATE_ONLY	1
 #define ID_AA64PFR1_SSBS_PSTATE_INSNS	2
 #define ID_AA64PFR1_BT_BTI		0x1
+#define ID_AA64PFR1_SME			1
 
 #define ID_AA64PFR1_MTE_NI		0x0
 #define ID_AA64PFR1_MTE_EL0		0x1
@@ -830,6 +852,21 @@
 #define ID_AA64ZFR0_AES_PMULL		0x2
 #define ID_AA64ZFR0_SVEVER_SVE2		0x1
 
+/* id_aa64smfr0 */
+#define ID_AA64SMFR0_I16I64_SHIFT	52
+#define ID_AA64SMFR0_F64F64_SHIFT	48
+#define ID_AA64SMFR0_I8I32_SHIFT	36
+#define ID_AA64SMFR0_F16F32_SHIFT	35
+#define ID_AA64SMFR0_B16F32_SHIFT	34
+#define ID_AA64SMFR0_F32F32_SHIFT	32
+
+#define ID_AA64SMFR0_I16I64		0x4
+#define ID_AA64SMFR0_F64F64		0x1
+#define ID_AA64SMFR0_I8I32		0x4
+#define ID_AA64SMFR0_F16F32		0x1
+#define ID_AA64SMFR0_B16F32		0x1
+#define ID_AA64SMFR0_F32F32		0x1
+
 /* id_aa64mmfr0 */
 #define ID_AA64MMFR0_ECV_SHIFT		60
 #define ID_AA64MMFR0_FGT_SHIFT		56
@@ -881,6 +918,7 @@
 #endif
 
 /* id_aa64mmfr1 */
+#define ID_AA64MMFR1_HCX_SHIFT		40
 #define ID_AA64MMFR1_ETS_SHIFT		36
 #define ID_AA64MMFR1_TWED_SHIFT		32
 #define ID_AA64MMFR1_XNX_SHIFT		28
@@ -1072,6 +1110,19 @@
 #define ZCR_ELx_LEN_SIZE	9
 #define ZCR_ELx_LEN_MASK	0x1ff
 
+/*
+ * The SMCR_ELx_LEN_* definitions intentionally include bits [8:4] which
+ * are reserved by the SME architecture for future expansion of the LEN
+ * field, with compatible semantics.
+ */
+#define SMCR_ELx_LEN_SHIFT	0
+#define SMCR_ELx_LEN_SIZE	9
+#define SMCR_ELx_LEN_MASK	0x1ff
+
+#define CPACR_EL1_SMEN_EL1EN	(BIT(24)) /* enable EL1 access */
+#define CPACR_EL1_SMEN_EL0EN	(BIT(25)) /* enable EL0 access, if EL1EN set */
+#define CPACR_EL1_SMEN		(CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN)
+
 #define CPACR_EL1_ZEN_EL1EN	(BIT(16)) /* enable EL1 access */
 #define CPACR_EL1_ZEN_EL0EN	(BIT(17)) /* enable EL0 access, if EL1EN set */
 #define CPACR_EL1_ZEN		(CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)
@@ -1125,6 +1176,8 @@
 #define TRFCR_ELx_ExTRE			BIT(1)
 #define TRFCR_ELx_E0TRE			BIT(0)
 
+/* HCRX_EL2 definitions */
+#define HCRX_EL2_SMPME_MASK		(1 << 5)
 
 /* GIC Hypervisor interface registers */
 /* ICH_MISR_EL2 bit definitions */
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index b03e383d944a..cb8934e75654 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -816,6 +816,7 @@ static const char *esr_class_str[] = {
 	[ESR_ELx_EC_SVE]		= "SVE",
 	[ESR_ELx_EC_ERET]		= "ERET/ERETAA/ERETAB",
 	[ESR_ELx_EC_FPAC]		= "FPAC",
+	[ESR_ELx_EC_SME]		= "SME",
 	[ESR_ELx_EC_IMP_DEF]		= "EL3 IMP DEF",
 	[ESR_ELx_EC_IABT_LOW]		= "IABT (lower EL)",
 	[ESR_ELx_EC_IABT_CUR]		= "IABT (current EL)",
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 14/38] arm64/sme: Define macros for manually encoding SME instructions
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As with SVE rather than impose ambitious toolchain requirements for SME we
manually encode the few instructions which we require in order to perform
the work the kernel needs to do. That is currently:

 - Vector store and load for the ZA array.
 - Zeroing of the whole ZA array.

We do not use SMSTART and SMSTOP, instead saving and restoring SVCR via
system register access.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 44 +++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index b22538a6137e..bc45bb984c49 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -93,6 +93,12 @@
 	.endif
 .endm
 
+.macro _sme_check_wv v
+	.if (\v) < 12 || (\v) > 15
+		.error "Bad vector select register \v."
+	.endif
+.endm
+
 /* SVE instruction encodings for non-SVE-capable assemblers */
 /* (pre binutils 2.28, all kernel capable clang versions support SVE) */
 
@@ -174,6 +180,44 @@
 		| (\np)
 .endm
 
+/* SME instruction encodings for non-SME-capable assemblers */
+
+/*
+ * STR (vector from ZA array):
+ *	STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _sme_str_zav nw, nxbase, offset=0
+	_sme_check_wv \nw
+	_check_general_reg \nxbase
+	_check_num (\offset), -0x100, 0xff
+	.inst	0xe1200000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+/*
+ * LDR (vector to ZA array):
+ *	LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _sme_ldr_zav nw, nxbase, offset=0
+	_sme_check_wv \nw
+	_check_general_reg \nxbase
+	_check_num (\offset), -0x100, 0xff
+	.inst	0xe1000000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+/*
+ * Zero the entire ZA array
+ *	ZERO ZA
+ */
+.macro zero_za
+	.inst 0xc00800ff
+.endm
+
 .macro __for from:req, to:req
 	.if (\from) == (\to)
 		_for__body %\from
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 14/38] arm64/sme: Define macros for manually encoding SME instructions
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As with SVE rather than impose ambitious toolchain requirements for SME we
manually encode the few instructions which we require in order to perform
the work the kernel needs to do. That is currently:

 - Vector store and load for the ZA array.
 - Zeroing of the whole ZA array.

We do not use SMSTART and SMSTOP, instead saving and restoring SVCR via
system register access.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 44 +++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index b22538a6137e..bc45bb984c49 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -93,6 +93,12 @@
 	.endif
 .endm
 
+.macro _sme_check_wv v
+	.if (\v) < 12 || (\v) > 15
+		.error "Bad vector select register \v."
+	.endif
+.endm
+
 /* SVE instruction encodings for non-SVE-capable assemblers */
 /* (pre binutils 2.28, all kernel capable clang versions support SVE) */
 
@@ -174,6 +180,44 @@
 		| (\np)
 .endm
 
+/* SME instruction encodings for non-SME-capable assemblers */
+
+/*
+ * STR (vector from ZA array):
+ *	STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _sme_str_zav nw, nxbase, offset=0
+	_sme_check_wv \nw
+	_check_general_reg \nxbase
+	_check_num (\offset), -0x100, 0xff
+	.inst	0xe1200000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+/*
+ * LDR (vector to ZA array):
+ *	LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _sme_ldr_zav nw, nxbase, offset=0
+	_sme_check_wv \nw
+	_check_general_reg \nxbase
+	_check_num (\offset), -0x100, 0xff
+	.inst	0xe1000000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+/*
+ * Zero the entire ZA array
+ *	ZERO ZA
+ */
+.macro zero_za
+	.inst 0xc00800ff
+.endm
+
 .macro __for from:req, to:req
 	.if (\from) == (\to)
 		_for__body %\from
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 15/38] arm64/sme: Early CPU setup for SME
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

SME requires similar setup to that for SVE: disable traps to EL2 and
make sure that the maximum vector length is available to EL1, for SME we
have two traps - one for SME itself and one for TPIDR2.

In addition since we currently make no active use of priority control
for SCMUs we map all SME priorities lower ELs may configure to 0, the
architecture specified minimum priority, to ensure that nothing we
manage is able to configure itself to consume excessive resources.  This
will need to be revisited should there be a need to manage SME
priorities at runtime.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/el2_setup.h | 36 ++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 3198acb2aad8..895a27a1dcb5 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -143,6 +143,41 @@
 .Lskip_sve_\@:
 .endm
 
+/* SME register access and priority mapping */
+.macro __init_el2_nvhe_sme
+	mrs	x1, id_aa64pfr1_el1
+	ubfx	x1, x1, #ID_AA64PFR1_SME_SHIFT, #4
+	cbz	x1, .Lskip_sme_\@
+
+	bic	x0, x0, #CPTR_EL2_TSM		// Also disable SME traps
+	msr	cptr_el2, x0			// Disable copro. traps to EL2
+	isb
+
+	mrs	x1, sctlr_el2
+	orr	x1, x1, #SCTLR_ELx_ENTP2	// Disable TPIDR2 traps
+	msr	sctlr_el2, x1
+	isb
+
+	mov	x1, #SMCR_ELx_LEN_MASK		// Enable full SME vector
+	msr_s	SYS_SMCR_EL2, x1		// length for EL1.
+
+	mrs_s	x1, SYS_SMIDR_EL1		// Priority mapping supported?
+	ubfx    x1, x1, #SYS_SMIDR_EL1_SMPS_SHIFT, #1
+	cbz     x1, .Lskip_sme_\@
+
+	msr_s	SYS_SMPRIMAP_EL2, xzr		// Make all priorities equal
+
+	mrs	x1, id_aa64mmfr1_el1		// HCRX_EL2 present?
+	ubfx	x1, x1, #ID_AA64MMFR1_HCX_SHIFT, #4
+	cbz	x1, .Lskip_sme_\@
+
+	mrs_s	x1, SYS_HCRX_EL2
+	orr	x1, x1, #HCRX_EL2_SMPME_MASK	// Enable priority mapping
+	msr_s	SYS_HCRX_EL2, x1
+
+.Lskip_sme_\@:
+.endm
+
 /* Disable any fine grained traps */
 .macro __init_el2_fgt
 	mrs	x1, id_aa64mmfr0_el1
@@ -196,6 +231,7 @@
 	__init_el2_nvhe_idregs
 	__init_el2_nvhe_cptr
 	__init_el2_nvhe_sve
+	__init_el2_nvhe_sme
 	__init_el2_fgt
 	__init_el2_nvhe_prepare_eret
 .endm
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 15/38] arm64/sme: Early CPU setup for SME
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

SME requires similar setup to that for SVE: disable traps to EL2 and
make sure that the maximum vector length is available to EL1, for SME we
have two traps - one for SME itself and one for TPIDR2.

In addition since we currently make no active use of priority control
for SCMUs we map all SME priorities lower ELs may configure to 0, the
architecture specified minimum priority, to ensure that nothing we
manage is able to configure itself to consume excessive resources.  This
will need to be revisited should there be a need to manage SME
priorities at runtime.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/el2_setup.h | 36 ++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 3198acb2aad8..895a27a1dcb5 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -143,6 +143,41 @@
 .Lskip_sve_\@:
 .endm
 
+/* SME register access and priority mapping */
+.macro __init_el2_nvhe_sme
+	mrs	x1, id_aa64pfr1_el1
+	ubfx	x1, x1, #ID_AA64PFR1_SME_SHIFT, #4
+	cbz	x1, .Lskip_sme_\@
+
+	bic	x0, x0, #CPTR_EL2_TSM		// Also disable SME traps
+	msr	cptr_el2, x0			// Disable copro. traps to EL2
+	isb
+
+	mrs	x1, sctlr_el2
+	orr	x1, x1, #SCTLR_ELx_ENTP2	// Disable TPIDR2 traps
+	msr	sctlr_el2, x1
+	isb
+
+	mov	x1, #SMCR_ELx_LEN_MASK		// Enable full SME vector
+	msr_s	SYS_SMCR_EL2, x1		// length for EL1.
+
+	mrs_s	x1, SYS_SMIDR_EL1		// Priority mapping supported?
+	ubfx    x1, x1, #SYS_SMIDR_EL1_SMPS_SHIFT, #1
+	cbz     x1, .Lskip_sme_\@
+
+	msr_s	SYS_SMPRIMAP_EL2, xzr		// Make all priorities equal
+
+	mrs	x1, id_aa64mmfr1_el1		// HCRX_EL2 present?
+	ubfx	x1, x1, #ID_AA64MMFR1_HCX_SHIFT, #4
+	cbz	x1, .Lskip_sme_\@
+
+	mrs_s	x1, SYS_HCRX_EL2
+	orr	x1, x1, #HCRX_EL2_SMPME_MASK	// Enable priority mapping
+	msr_s	SYS_HCRX_EL2, x1
+
+.Lskip_sme_\@:
+.endm
+
 /* Disable any fine grained traps */
 .macro __init_el2_fgt
 	mrs	x1, id_aa64mmfr0_el1
@@ -196,6 +231,7 @@
 	__init_el2_nvhe_idregs
 	__init_el2_nvhe_cptr
 	__init_el2_nvhe_sve
+	__init_el2_nvhe_sme
 	__init_el2_fgt
 	__init_el2_nvhe_prepare_eret
 .endm
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 16/38] arm64/sme: Basic enumeration support
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

This patch introduces basic cpufeature support for discovering the presence
of the Scalable Matrix Extension and reporting hwcaps for the detected
features.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/arm64/elf_hwcaps.rst  | 29 ++++++++++++++++++++
 arch/arm64/include/asm/cpu.h        |  1 +
 arch/arm64/include/asm/cpufeature.h |  6 +++++
 arch/arm64/include/asm/fpsimd.h     |  1 +
 arch/arm64/include/asm/hwcap.h      |  7 +++++
 arch/arm64/include/uapi/asm/hwcap.h |  7 +++++
 arch/arm64/kernel/cpufeature.c      | 41 +++++++++++++++++++++++++++++
 arch/arm64/kernel/cpuinfo.c         |  8 ++++++
 arch/arm64/kernel/fpsimd.c          | 19 +++++++++++++
 arch/arm64/tools/cpucaps            |  1 +
 10 files changed, 120 insertions(+)

diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index ec1a5a63c1d0..39680ff764bb 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -247,6 +247,35 @@ HWCAP2_MTE
     Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
     by Documentation/arm64/memory-tagging-extension.rst.
 
+HWCAP2_SME
+
+    Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described
+    by Documentation/arm64/sme.rst.
+
+HWCAP2_SME_I16I64
+
+    Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111.
+
+HWCAP2_SME_F64F64
+
+    Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1.
+
+HWCAP2_SME_I8I32
+
+    Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111.
+
+HWCAP2_SME_F16F32
+
+    Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1.
+
+HWCAP2_SME_B16F32
+
+    Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1.
+
+HWCAP2_SME_F32F32
+
+    Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1.
+
 4. Unused AT_HWCAP bits
 -----------------------
 
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 0f6d16faa540..667b66fe1a53 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -57,6 +57,7 @@ struct cpuinfo_arm64 {
 	u64		reg_id_aa64pfr0;
 	u64		reg_id_aa64pfr1;
 	u64		reg_id_aa64zfr0;
+	u64		reg_id_aa64smfr0;
 
 	struct cpuinfo_32bit	aarch32;
 
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index ef6be92b1921..e1b745bf5fbe 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -727,6 +727,12 @@ static __always_inline bool system_supports_sve(void)
 		cpus_have_const_cap(ARM64_SVE);
 }
 
+static __always_inline bool system_supports_sme(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_SME) &&
+		cpus_have_const_cap(ARM64_SME);
+}
+
 static __always_inline bool system_supports_cnp(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_CNP) &&
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 802597140121..160f4f246db8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -74,6 +74,7 @@ extern void sve_set_vq(unsigned long vq_minus_1);
 
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
+extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
 extern u64 read_zcr_features(void);
 
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index 8c129db8232a..37605f4be103 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -105,6 +105,13 @@
 #define KERNEL_HWCAP_RNG		__khwcap2_feature(RNG)
 #define KERNEL_HWCAP_BTI		__khwcap2_feature(BTI)
 #define KERNEL_HWCAP_MTE		__khwcap2_feature(MTE)
+#define KERNEL_HWCAP_SME		__khwcap2_feature(SME)
+#define KERNEL_HWCAP_SME_I16I64		__khwcap2_feature(SME_I16I64)
+#define KERNEL_HWCAP_SME_F64F64		__khwcap2_feature(SME_F64F64)
+#define KERNEL_HWCAP_SME_I8I32		__khwcap2_feature(SME_I8I32)
+#define KERNEL_HWCAP_SME_F16F32		__khwcap2_feature(SME_F16F32)
+#define KERNEL_HWCAP_SME_B16F32		__khwcap2_feature(SME_B16F32)
+#define KERNEL_HWCAP_SME_F32F32		__khwcap2_feature(SME_F32F32)
 
 /*
  * This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index b8f41aa234ee..2d6bbee3c68a 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -75,5 +75,12 @@
 #define HWCAP2_RNG		(1 << 16)
 #define HWCAP2_BTI		(1 << 17)
 #define HWCAP2_MTE		(1 << 18)
+#define HWCAP2_SME		(1 << 19)
+#define HWCAP2_SME_I16I64	(1 << 20)
+#define HWCAP2_SME_F64F64	(1 << 21)
+#define HWCAP2_SME_I8I32	(1 << 22)
+#define HWCAP2_SME_F16F32	(1 << 23)
+#define HWCAP2_SME_B16F32	(1 << 24)
+#define HWCAP2_SME_F32F32	(1 << 25)
 
 #endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 405a65d7e618..9d3e87ba5d5a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -246,6 +246,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SME_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MPAMFRAC_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_RASFRAC_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE),
@@ -278,6 +279,22 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
 	ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_id_aa64smfr0[] = {
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I16I64_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F64F64_SHIFT, 1, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I8I32_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F16F32_SHIFT, 1, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_B16F32_SHIFT, 1, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F32F32_SHIFT, 1, 0),
+	ARM64_FTR_END,
+};
+
 static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_ECV_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_FGT_SHIFT, 4, 0),
@@ -624,6 +641,7 @@ static const struct __ftr_reg_entry {
 	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1,
 			       &id_aa64pfr1_override),
 	ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0),
+	ARM64_FTR_REG(SYS_ID_AA64SMFR0_EL1, ftr_id_aa64smfr0),
 
 	/* Op1 = 0, CRn = 0, CRm = 5 */
 	ARM64_FTR_REG(SYS_ID_AA64DFR0_EL1, ftr_id_aa64dfr0),
@@ -935,6 +953,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	init_cpu_ftr_reg(SYS_ID_AA64PFR0_EL1, info->reg_id_aa64pfr0);
 	init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1);
 	init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0);
+	init_cpu_ftr_reg(SYS_ID_AA64SMFR0_EL1, info->reg_id_aa64smfr0);
 
 	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
 		init_32bit_cpu_features(&info->aarch32);
@@ -2332,6 +2351,19 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		.min_field_value = 1,
 	},
+#ifdef CONFIG_ARM64_SME
+	{
+		.desc = "Scalable Matrix Extension",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_SME,
+		.sys_reg = SYS_ID_AA64PFR1_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64PFR1_SME_SHIFT,
+		.min_field_value = ID_AA64PFR1_SME,
+		.matches = has_cpuid_feature,
+		.cpu_enable = sme_kernel_enable,
+	},
+#endif /* CONFIG_ARM64_SME */
 	{},
 };
 
@@ -2451,6 +2483,15 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
 #ifdef CONFIG_ARM64_MTE
 	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE),
 #endif /* CONFIG_ARM64_MTE */
+#ifdef CONFIG_ARM64_SME
+	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SME_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_SME, CAP_HWCAP, KERNEL_HWCAP_SME),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I16I64_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_I16I64, CAP_HWCAP, KERNEL_HWCAP_SME_I16I64),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F64F64_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_F64F64, CAP_HWCAP, KERNEL_HWCAP_SME_F64F64),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I8I32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_I8I32, CAP_HWCAP, KERNEL_HWCAP_SME_I8I32),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F16F32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_F16F32, CAP_HWCAP, KERNEL_HWCAP_SME_F16F32),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_B16F32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_B16F32, CAP_HWCAP, KERNEL_HWCAP_SME_B16F32),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F32F32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_F32F32, CAP_HWCAP, KERNEL_HWCAP_SME_F32F32),
+#endif /* CONFIG_ARM64_SME */
 	{},
 };
 
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 87731fea5e41..9830fa0c7647 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -94,6 +94,13 @@ static const char *const hwcap_str[] = {
 	[KERNEL_HWCAP_RNG]		= "rng",
 	[KERNEL_HWCAP_BTI]		= "bti",
 	[KERNEL_HWCAP_MTE]		= "mte",
+	[KERNEL_HWCAP_SME]		= "sme",
+	[KERNEL_HWCAP_SME_I16I64]	= "smei16i64",
+	[KERNEL_HWCAP_SME_F64F64]	= "smef64f64",
+	[KERNEL_HWCAP_SME_I8I32]	= "smei8i32",
+	[KERNEL_HWCAP_SME_F16F32]	= "smef16f32",
+	[KERNEL_HWCAP_SME_B16F32]	= "smeb16f32",
+	[KERNEL_HWCAP_SME_F32F32]	= "smef32f32",
 };
 
 #ifdef CONFIG_COMPAT
@@ -396,6 +403,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1);
 	info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
 	info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
+	info->reg_id_aa64smfr0 = read_cpuid(ID_AA64SMFR0_EL1);
 
 	if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
 		info->reg_gmid = read_cpuid(GMID_EL1);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index b3d4786e2601..bf698c3eaed3 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -986,6 +986,21 @@ void fpsimd_release_task(struct task_struct *dead_task)
 
 #endif /* CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
+{
+	/* Set priority for all PEs to architecturally defined minimum */
+	write_sysreg_s(read_sysreg_s(SYS_SMPRI_EL1) & ~SMPRI_EL1_PRIORITY_MASK,
+		       SYS_SMPRI_EL1);
+
+	/* Allow SME in kernel */
+	write_sysreg(read_sysreg(CPACR_EL1) | CPACR_EL1_SMEN_EL1EN, CPACR_EL1);
+	isb();
+}
+
+#endif /* CONFIG_ARM64_SVE */
+
 /*
  * Trapped SVE access
  *
@@ -1528,6 +1543,10 @@ static int __init fpsimd_init(void)
 	if (!cpu_have_named_feature(ASIMD))
 		pr_notice("Advanced SIMD is not implemented\n");
 
+
+	if (cpu_have_named_feature(SME) && !cpu_have_named_feature(SVE))
+		pr_notice("SME is implemented but not SVE\n");
+
 	return sve_sysctl_init();
 }
 core_initcall(fpsimd_init);
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 49305c2e6dfd..a21a82d3cd64 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -39,6 +39,7 @@ HW_DBM
 KVM_PROTECTED_MODE
 MISMATCHED_CACHE_TYPE
 MTE
+SME
 SPECTRE_V2
 SPECTRE_V3A
 SPECTRE_V4
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 16/38] arm64/sme: Basic enumeration support
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

This patch introduces basic cpufeature support for discovering the presence
of the Scalable Matrix Extension and reporting hwcaps for the detected
features.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/arm64/elf_hwcaps.rst  | 29 ++++++++++++++++++++
 arch/arm64/include/asm/cpu.h        |  1 +
 arch/arm64/include/asm/cpufeature.h |  6 +++++
 arch/arm64/include/asm/fpsimd.h     |  1 +
 arch/arm64/include/asm/hwcap.h      |  7 +++++
 arch/arm64/include/uapi/asm/hwcap.h |  7 +++++
 arch/arm64/kernel/cpufeature.c      | 41 +++++++++++++++++++++++++++++
 arch/arm64/kernel/cpuinfo.c         |  8 ++++++
 arch/arm64/kernel/fpsimd.c          | 19 +++++++++++++
 arch/arm64/tools/cpucaps            |  1 +
 10 files changed, 120 insertions(+)

diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index ec1a5a63c1d0..39680ff764bb 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -247,6 +247,35 @@ HWCAP2_MTE
     Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
     by Documentation/arm64/memory-tagging-extension.rst.
 
+HWCAP2_SME
+
+    Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described
+    by Documentation/arm64/sme.rst.
+
+HWCAP2_SME_I16I64
+
+    Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111.
+
+HWCAP2_SME_F64F64
+
+    Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1.
+
+HWCAP2_SME_I8I32
+
+    Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111.
+
+HWCAP2_SME_F16F32
+
+    Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1.
+
+HWCAP2_SME_B16F32
+
+    Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1.
+
+HWCAP2_SME_F32F32
+
+    Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1.
+
 4. Unused AT_HWCAP bits
 -----------------------
 
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 0f6d16faa540..667b66fe1a53 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -57,6 +57,7 @@ struct cpuinfo_arm64 {
 	u64		reg_id_aa64pfr0;
 	u64		reg_id_aa64pfr1;
 	u64		reg_id_aa64zfr0;
+	u64		reg_id_aa64smfr0;
 
 	struct cpuinfo_32bit	aarch32;
 
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index ef6be92b1921..e1b745bf5fbe 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -727,6 +727,12 @@ static __always_inline bool system_supports_sve(void)
 		cpus_have_const_cap(ARM64_SVE);
 }
 
+static __always_inline bool system_supports_sme(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_SME) &&
+		cpus_have_const_cap(ARM64_SME);
+}
+
 static __always_inline bool system_supports_cnp(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_CNP) &&
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 802597140121..160f4f246db8 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -74,6 +74,7 @@ extern void sve_set_vq(unsigned long vq_minus_1);
 
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
+extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
 extern u64 read_zcr_features(void);
 
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index 8c129db8232a..37605f4be103 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -105,6 +105,13 @@
 #define KERNEL_HWCAP_RNG		__khwcap2_feature(RNG)
 #define KERNEL_HWCAP_BTI		__khwcap2_feature(BTI)
 #define KERNEL_HWCAP_MTE		__khwcap2_feature(MTE)
+#define KERNEL_HWCAP_SME		__khwcap2_feature(SME)
+#define KERNEL_HWCAP_SME_I16I64		__khwcap2_feature(SME_I16I64)
+#define KERNEL_HWCAP_SME_F64F64		__khwcap2_feature(SME_F64F64)
+#define KERNEL_HWCAP_SME_I8I32		__khwcap2_feature(SME_I8I32)
+#define KERNEL_HWCAP_SME_F16F32		__khwcap2_feature(SME_F16F32)
+#define KERNEL_HWCAP_SME_B16F32		__khwcap2_feature(SME_B16F32)
+#define KERNEL_HWCAP_SME_F32F32		__khwcap2_feature(SME_F32F32)
 
 /*
  * This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index b8f41aa234ee..2d6bbee3c68a 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -75,5 +75,12 @@
 #define HWCAP2_RNG		(1 << 16)
 #define HWCAP2_BTI		(1 << 17)
 #define HWCAP2_MTE		(1 << 18)
+#define HWCAP2_SME		(1 << 19)
+#define HWCAP2_SME_I16I64	(1 << 20)
+#define HWCAP2_SME_F64F64	(1 << 21)
+#define HWCAP2_SME_I8I32	(1 << 22)
+#define HWCAP2_SME_F16F32	(1 << 23)
+#define HWCAP2_SME_B16F32	(1 << 24)
+#define HWCAP2_SME_F32F32	(1 << 25)
 
 #endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 405a65d7e618..9d3e87ba5d5a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -246,6 +246,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SME_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MPAMFRAC_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_RASFRAC_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE),
@@ -278,6 +279,22 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
 	ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_id_aa64smfr0[] = {
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I16I64_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F64F64_SHIFT, 1, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I8I32_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F16F32_SHIFT, 1, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_B16F32_SHIFT, 1, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+		       FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F32F32_SHIFT, 1, 0),
+	ARM64_FTR_END,
+};
+
 static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_ECV_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_FGT_SHIFT, 4, 0),
@@ -624,6 +641,7 @@ static const struct __ftr_reg_entry {
 	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1,
 			       &id_aa64pfr1_override),
 	ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0),
+	ARM64_FTR_REG(SYS_ID_AA64SMFR0_EL1, ftr_id_aa64smfr0),
 
 	/* Op1 = 0, CRn = 0, CRm = 5 */
 	ARM64_FTR_REG(SYS_ID_AA64DFR0_EL1, ftr_id_aa64dfr0),
@@ -935,6 +953,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 	init_cpu_ftr_reg(SYS_ID_AA64PFR0_EL1, info->reg_id_aa64pfr0);
 	init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1);
 	init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0);
+	init_cpu_ftr_reg(SYS_ID_AA64SMFR0_EL1, info->reg_id_aa64smfr0);
 
 	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
 		init_32bit_cpu_features(&info->aarch32);
@@ -2332,6 +2351,19 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		.min_field_value = 1,
 	},
+#ifdef CONFIG_ARM64_SME
+	{
+		.desc = "Scalable Matrix Extension",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_SME,
+		.sys_reg = SYS_ID_AA64PFR1_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64PFR1_SME_SHIFT,
+		.min_field_value = ID_AA64PFR1_SME,
+		.matches = has_cpuid_feature,
+		.cpu_enable = sme_kernel_enable,
+	},
+#endif /* CONFIG_ARM64_SME */
 	{},
 };
 
@@ -2451,6 +2483,15 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
 #ifdef CONFIG_ARM64_MTE
 	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE),
 #endif /* CONFIG_ARM64_MTE */
+#ifdef CONFIG_ARM64_SME
+	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SME_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_SME, CAP_HWCAP, KERNEL_HWCAP_SME),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I16I64_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_I16I64, CAP_HWCAP, KERNEL_HWCAP_SME_I16I64),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F64F64_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_F64F64, CAP_HWCAP, KERNEL_HWCAP_SME_F64F64),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I8I32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_I8I32, CAP_HWCAP, KERNEL_HWCAP_SME_I8I32),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F16F32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_F16F32, CAP_HWCAP, KERNEL_HWCAP_SME_F16F32),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_B16F32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_B16F32, CAP_HWCAP, KERNEL_HWCAP_SME_B16F32),
+	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F32F32_SHIFT, FTR_UNSIGNED, ID_AA64SMFR0_F32F32, CAP_HWCAP, KERNEL_HWCAP_SME_F32F32),
+#endif /* CONFIG_ARM64_SME */
 	{},
 };
 
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 87731fea5e41..9830fa0c7647 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -94,6 +94,13 @@ static const char *const hwcap_str[] = {
 	[KERNEL_HWCAP_RNG]		= "rng",
 	[KERNEL_HWCAP_BTI]		= "bti",
 	[KERNEL_HWCAP_MTE]		= "mte",
+	[KERNEL_HWCAP_SME]		= "sme",
+	[KERNEL_HWCAP_SME_I16I64]	= "smei16i64",
+	[KERNEL_HWCAP_SME_F64F64]	= "smef64f64",
+	[KERNEL_HWCAP_SME_I8I32]	= "smei8i32",
+	[KERNEL_HWCAP_SME_F16F32]	= "smef16f32",
+	[KERNEL_HWCAP_SME_B16F32]	= "smeb16f32",
+	[KERNEL_HWCAP_SME_F32F32]	= "smef32f32",
 };
 
 #ifdef CONFIG_COMPAT
@@ -396,6 +403,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1);
 	info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
 	info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
+	info->reg_id_aa64smfr0 = read_cpuid(ID_AA64SMFR0_EL1);
 
 	if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
 		info->reg_gmid = read_cpuid(GMID_EL1);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index b3d4786e2601..bf698c3eaed3 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -986,6 +986,21 @@ void fpsimd_release_task(struct task_struct *dead_task)
 
 #endif /* CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
+{
+	/* Set priority for all PEs to architecturally defined minimum */
+	write_sysreg_s(read_sysreg_s(SYS_SMPRI_EL1) & ~SMPRI_EL1_PRIORITY_MASK,
+		       SYS_SMPRI_EL1);
+
+	/* Allow SME in kernel */
+	write_sysreg(read_sysreg(CPACR_EL1) | CPACR_EL1_SMEN_EL1EN, CPACR_EL1);
+	isb();
+}
+
+#endif /* CONFIG_ARM64_SVE */
+
 /*
  * Trapped SVE access
  *
@@ -1528,6 +1543,10 @@ static int __init fpsimd_init(void)
 	if (!cpu_have_named_feature(ASIMD))
 		pr_notice("Advanced SIMD is not implemented\n");
 
+
+	if (cpu_have_named_feature(SME) && !cpu_have_named_feature(SVE))
+		pr_notice("SME is implemented but not SVE\n");
+
 	return sve_sysctl_init();
 }
 core_initcall(fpsimd_init);
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 49305c2e6dfd..a21a82d3cd64 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -39,6 +39,7 @@ HW_DBM
 KVM_PROTECTED_MODE
 MISMATCHED_CACHE_TYPE
 MTE
+SME
 SPECTRE_V2
 SPECTRE_V3A
 SPECTRE_V4
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 17/38] arm64/sme: Identify supported SME vector lengths at boot
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The vector lengths used for SME are controlled through a similar set of
registers to those for SVE and enumerated using a similar algorithm with
some slight differences due to the fact that unlike SVE there are no
restrictions on which combinations of vector lengths can be supported
nor any mandatory vector lengths which must be implemented.  Add a new
vector type and implement support for enumerating it.

One slightly awkward feature is that we need to read the current vector
length using the SVE RVDL instruction while in streaming mode.  Rather
than add an ops structure we add special cases directly in the otherwise
generic vec_probe_vqs() function, this is a bit inelegant but it's the
only place where this is an issue.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/cpu.h        |   3 +
 arch/arm64/include/asm/cpufeature.h |   7 ++
 arch/arm64/include/asm/fpsimd.h     |  35 ++++++++
 arch/arm64/include/asm/processor.h  |   1 +
 arch/arm64/kernel/cpufeature.c      |  49 +++++++++++
 arch/arm64/kernel/cpuinfo.c         |   4 +
 arch/arm64/kernel/fpsimd.c          | 131 +++++++++++++++++++++++++++-
 7 files changed, 229 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 667b66fe1a53..707f30dccbf1 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -63,6 +63,9 @@ struct cpuinfo_arm64 {
 
 	/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */
 	u64		reg_zcr;
+
+	/* pseudo-SMCR for recording maximum ZCR_EL1 LEN value: */
+	u64		reg_smcr;
 };
 
 DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data);
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index e1b745bf5fbe..9a183267b341 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -619,6 +619,13 @@ static inline bool id_aa64pfr0_sve(u64 pfr0)
 	return val > 0;
 }
 
+static inline bool id_aa64pfr1_sme(u64 pfr1)
+{
+	u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_SME_SHIFT);
+
+	return val > 0;
+}
+
 static inline bool id_aa64pfr1_mte(u64 pfr1)
 {
 	u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_MTE_SHIFT);
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 160f4f246db8..f58c4be03ba2 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -77,6 +77,7 @@ extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
 extern u64 read_zcr_features(void);
+extern u64 read_smcr_features(void);
 
 /*
  * Helpers to translate bit indices in sve_vq_map to VQ values (and
@@ -131,6 +132,12 @@ static inline void sve_user_enable(void)
 			write_sysreg_s(__new, (reg));	\
 	} while (0)
 
+static inline void sme_set_svcr(u64 val)
+{
+	sysreg_clear_set_s(SYS_SVCR_EL0, SYS_SVCR_EL0_ZA_MASK |
+			   SYS_SVCR_EL0_SM_MASK, val);
+}
+
 /*
  * Probing and setup functions.
  * Calls to these functions must be serialised with one another.
@@ -175,6 +182,12 @@ static inline void write_vl(enum vec_type type, u64 val)
 		tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK;
 		write_sysreg_s(tmp | val, SYS_ZCR_EL1);
 		break;
+#endif
+#ifdef CONFIG_ARM64_SME
+	case ARM64_VEC_SME:
+		tmp = read_sysreg_s(SYS_SMCR_EL1) & ~SMCR_ELx_LEN_MASK;
+		write_sysreg_s(tmp | val, SYS_SMCR_EL1);
+		break;
 #endif
 	default:
 		WARN_ON_ONCE(1);
@@ -244,6 +257,28 @@ static inline void sve_setup(void) { }
 
 #endif /* ! CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+extern void __init sme_setup(void);
+
+static inline int sme_max_vl(void)
+{
+	return vec_max_vl(ARM64_VEC_SME);
+}
+
+static inline int sme_max_virtualisable_vl(void)
+{
+	return vec_max_virtualisable_vl(ARM64_VEC_SME);
+}
+
+#else
+
+static inline void sme_setup(void) { }
+static inline int sme_max_vl(void) { return 0; }
+static inline int sme_max_virtualisable_vl(void) { return 0; }
+
+#endif /* ! CONFIG_ARM64_SME */
+
 /* For use by EFI runtime services calls only */
 extern void __efi_fpsimd_begin(void);
 extern void __efi_fpsimd_end(void);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 9b854e8196df..575a1fe719b7 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -117,6 +117,7 @@ struct debug_info {
 
 enum vec_type {
 	ARM64_VEC_SVE = 0,
+	ARM64_VEC_SME,
 	ARM64_VEC_MAX,
 };
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9d3e87ba5d5a..84aec4704885 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -562,6 +562,12 @@ static const struct arm64_ftr_bits ftr_zcr[] = {
 	ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_smcr[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE,
+		SMCR_ELx_LEN_SHIFT, SMCR_ELx_LEN_SIZE, 0),	/* LEN */
+	ARM64_FTR_END,
+};
+
 /*
  * Common ftr bits for a 32bit register with all hidden, strict
  * attributes, with 4bit feature fields and a default safe value of
@@ -660,6 +666,7 @@ static const struct __ftr_reg_entry {
 
 	/* Op1 = 0, CRn = 1, CRm = 2 */
 	ARM64_FTR_REG(SYS_ZCR_EL1, ftr_zcr),
+	ARM64_FTR_REG(SYS_SMCR_EL1, ftr_smcr),
 
 	/* Op1 = 1, CRn = 0, CRm = 0 */
 	ARM64_FTR_REG(SYS_GMID_EL1, ftr_gmid),
@@ -963,6 +970,14 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 		vec_init_vq_map(ARM64_VEC_SVE);
 	}
 
+	if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
+		init_cpu_ftr_reg(SYS_SMCR_EL1, info->reg_smcr);
+		if (IS_ENABLED(CONFIG_ARM64_SME)) {
+			sme_kernel_enable(NULL);
+			vec_init_vq_map(ARM64_VEC_SME);
+		}
+	}
+
 	if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
 		init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid);
 
@@ -1187,6 +1202,9 @@ void update_cpu_features(int cpu,
 	taint |= check_update_ftr_reg(SYS_ID_AA64ZFR0_EL1, cpu,
 				      info->reg_id_aa64zfr0, boot->reg_id_aa64zfr0);
 
+	taint |= check_update_ftr_reg(SYS_ID_AA64SMFR0_EL1, cpu,
+				      info->reg_id_aa64smfr0, boot->reg_id_aa64smfr0);
+
 	if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
 		taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu,
 					info->reg_zcr, boot->reg_zcr);
@@ -1197,6 +1215,16 @@ void update_cpu_features(int cpu,
 			vec_update_vq_map(ARM64_VEC_SVE);
 	}
 
+	if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
+		taint |= check_update_ftr_reg(SYS_SMCR_EL1, cpu,
+					info->reg_smcr, boot->reg_smcr);
+
+		/* Probe vector lengths, unless we already gave up on SME */
+		if (id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1)) &&
+		    !system_capabilities_finalized())
+			vec_update_vq_map(ARM64_VEC_SME);
+	}
+
 	/*
 	 * The kernel uses the LDGM/STGM instructions and the number of tags
 	 * they read/write depends on the GMID_EL1.BS field. Check that the
@@ -2789,6 +2817,23 @@ static void verify_sve_features(void)
 	/* Add checks on other ZCR bits here if necessary */
 }
 
+static void verify_sme_features(void)
+{
+	u64 safe_smcr = read_sanitised_ftr_reg(SYS_SMCR_EL1);
+	u64 smcr = read_smcr_features();
+
+	unsigned int safe_len = safe_smcr & SMCR_ELx_LEN_MASK;
+	unsigned int len = smcr & SMCR_ELx_LEN_MASK;
+
+	if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SME)) {
+		pr_crit("CPU%d: SME: vector length support mismatch\n",
+			smp_processor_id());
+		cpu_die_early();
+	}
+
+	/* Add checks on other SMCR bits here if necessary */
+}
+
 static void verify_hyp_capabilities(void)
 {
 	u64 safe_mmfr1, mmfr0, mmfr1;
@@ -2841,6 +2886,9 @@ static void verify_local_cpu_capabilities(void)
 	if (system_supports_sve())
 		verify_sve_features();
 
+	if (system_supports_sme())
+		verify_sme_features();
+
 	if (is_hyp_mode_available())
 		verify_hyp_capabilities();
 }
@@ -2957,6 +3005,7 @@ void __init setup_cpu_features(void)
 		pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
 
 	sve_setup();
+	sme_setup();
 	minsigstksz_setup();
 
 	/* Advertise that we have computed the system capabilities */
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 9830fa0c7647..572da7d1f194 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -415,6 +415,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	    id_aa64pfr0_sve(info->reg_id_aa64pfr0))
 		info->reg_zcr = read_zcr_features();
 
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    id_aa64pfr1_sme(info->reg_id_aa64pfr1))
+		info->reg_smcr = read_smcr_features();
+
 	cpuinfo_detect_icache_policy(info);
 }
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index bf698c3eaed3..430c949a2438 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -132,6 +132,12 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
 		.max_virtualisable_vl	= SVE_VL_MIN,
 	},
 #endif
+#ifdef CONFIG_ARM64_SME
+	[ARM64_VEC_SME] = {
+		.type			= ARM64_VEC_SME,
+		.name			= "SME",
+	},
+#endif
 };
 
 static unsigned int vec_vl_inherit_flag(enum vec_type type)
@@ -182,6 +188,20 @@ extern void __percpu *efi_sve_state;
 
 #endif /* ! CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+static int get_sme_default_vl(void)
+{
+	return get_default_vl(ARM64_VEC_SME);
+}
+
+static void set_sme_default_vl(int val)
+{
+	set_default_vl(ARM64_VEC_SME, val);
+}
+
+#endif
+
 DEFINE_PER_CPU(bool, fpsimd_context_busy);
 EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy);
 
@@ -399,6 +419,8 @@ static unsigned int find_supported_vector_length(enum vec_type type,
 
 	if (vl > max_vl)
 		vl = max_vl;
+	if (vl < info->min_vl)
+		vl = info->min_vl;
 
 	bit = find_next_bit(info->vq_map, SVE_VQ_MAX,
 			    __vq_to_bit(sve_vq_from_vl(vl)));
@@ -761,12 +783,38 @@ static void vec_probe_vqs(struct vl_info *info,
 
 	bitmap_zero(map, SVE_VQ_MAX);
 
+	/*
+	 * Enter streaming mode for SME; we don't use an op as the
+	 * vector length info is used from KVM.
+	 */
+	switch (info->type) {
+	case ARM64_VEC_SME:
+		sme_set_svcr(SYS_SVCR_EL0_SM_MASK);
+		break;
+	default:
+		break;
+	}
+
 	for (vq = SVE_VQ_MAX; vq >= SVE_VQ_MIN; --vq) {
 		write_vl(info->type, vq - 1); /* self-syncing */
+
 		vl = sve_get_vl();
+
+		/* Minimum VL identified? */
+		if (sve_vq_from_vl(vl) > vq)
+			break;
+
 		vq = sve_vq_from_vl(vl); /* skip intervening lengths */
 		set_bit(__vq_to_bit(vq), map);
 	}
+
+	switch (info->type) {
+	case ARM64_VEC_SME:
+		sme_set_svcr(0);
+		break;
+	default:
+		break;
+	}
 }
 
 /*
@@ -999,7 +1047,88 @@ void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 	isb();
 }
 
-#endif /* CONFIG_ARM64_SVE */
+/*
+ * Read the pseudo-SMCR used by cpufeatures to identify the supported
+ * vector length.
+ *
+ * Use only if SME is present.
+ * This function clobbers the SVE vector length.
+ */
+u64 read_smcr_features(void)
+{
+	u64 smcr;
+	unsigned int vq_max;
+
+	sme_kernel_enable(NULL);
+	sme_set_svcr(SYS_SVCR_EL0_SM_MASK);
+
+	/*
+	 * Set the maximum possible VL, and write zeroes to all other
+	 * bits to see if they stick.
+	 */
+	write_sysreg_s(SMCR_ELx_LEN_MASK, SYS_SMCR_EL1);
+
+	smcr = read_sysreg_s(SYS_SMCR_EL1);
+	smcr &= ~(u64)SMCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
+	vq_max = sve_vq_from_vl(sve_get_vl());
+	smcr |= vq_max - 1; /* set LEN field to maximum effective value */
+
+	sme_set_svcr(0);
+
+	return smcr;
+}
+
+void __init sme_setup(void)
+{
+	struct vl_info *info = &vl_info[ARM64_VEC_SME];
+	u64 smcr;
+	int min_bit;
+
+	if (!system_supports_sme())
+		return;
+
+	/*
+	 * SME doesn't require any particular vector length be
+	 * supported but it does require at least one.  We should have
+	 * disabled the feature entirely while bringing up CPUs but
+	 * let's double check here.
+	 */
+	WARN_ON(bitmap_empty(info->vq_map, SVE_VQ_MAX));
+
+	min_bit = find_last_bit(info->vq_map, SVE_VQ_MAX);
+	info->min_vl = sve_vl_from_vq(__bit_to_vq(min_bit));
+
+	smcr = read_sanitised_ftr_reg(SYS_SMCR_EL1);
+	info->max_vl = sve_vl_from_vq((smcr & SMCR_ELx_LEN_MASK) + 1);
+
+	/*
+	 * Sanity-check that the max VL we determined through CPU features
+	 * corresponds properly to sme_vq_map.  If not, do our best:
+	 */
+	if (WARN_ON(info->max_vl != find_supported_vector_length(ARM64_VEC_SME,
+								 info->max_vl)))
+		info->max_vl = find_supported_vector_length(ARM64_VEC_SME,
+							    info->max_vl);
+
+	WARN_ON(info->min_vl > info->max_vl);
+
+	/*
+	 * For the default VL, pick the maximum supported value <= 32
+	 * (256 bits) if there is one since this is guaranteed not to
+	 * grow the signal frame when in streaming mode, otherwise the
+	 * minimum available VL will be used.
+	 */
+	set_sme_default_vl(find_supported_vector_length(ARM64_VEC_SME, 32));
+
+	pr_info("SME: minimum available vector length %u bytes per vector\n",
+		info->min_vl);
+	pr_info("SME: maximum available vector length %u bytes per vector\n",
+		info->max_vl);
+	pr_info("SME: default vector length %u bytes per vector\n",
+		get_sme_default_vl());
+}
+
+#endif /* CONFIG_ARM64_SME */
 
 /*
  * Trapped SVE access
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 17/38] arm64/sme: Identify supported SME vector lengths at boot
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The vector lengths used for SME are controlled through a similar set of
registers to those for SVE and enumerated using a similar algorithm with
some slight differences due to the fact that unlike SVE there are no
restrictions on which combinations of vector lengths can be supported
nor any mandatory vector lengths which must be implemented.  Add a new
vector type and implement support for enumerating it.

One slightly awkward feature is that we need to read the current vector
length using the SVE RVDL instruction while in streaming mode.  Rather
than add an ops structure we add special cases directly in the otherwise
generic vec_probe_vqs() function, this is a bit inelegant but it's the
only place where this is an issue.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/cpu.h        |   3 +
 arch/arm64/include/asm/cpufeature.h |   7 ++
 arch/arm64/include/asm/fpsimd.h     |  35 ++++++++
 arch/arm64/include/asm/processor.h  |   1 +
 arch/arm64/kernel/cpufeature.c      |  49 +++++++++++
 arch/arm64/kernel/cpuinfo.c         |   4 +
 arch/arm64/kernel/fpsimd.c          | 131 +++++++++++++++++++++++++++-
 7 files changed, 229 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index 667b66fe1a53..707f30dccbf1 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -63,6 +63,9 @@ struct cpuinfo_arm64 {
 
 	/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */
 	u64		reg_zcr;
+
+	/* pseudo-SMCR for recording maximum ZCR_EL1 LEN value: */
+	u64		reg_smcr;
 };
 
 DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data);
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index e1b745bf5fbe..9a183267b341 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -619,6 +619,13 @@ static inline bool id_aa64pfr0_sve(u64 pfr0)
 	return val > 0;
 }
 
+static inline bool id_aa64pfr1_sme(u64 pfr1)
+{
+	u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_SME_SHIFT);
+
+	return val > 0;
+}
+
 static inline bool id_aa64pfr1_mte(u64 pfr1)
 {
 	u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_MTE_SHIFT);
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 160f4f246db8..f58c4be03ba2 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -77,6 +77,7 @@ extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
 extern u64 read_zcr_features(void);
+extern u64 read_smcr_features(void);
 
 /*
  * Helpers to translate bit indices in sve_vq_map to VQ values (and
@@ -131,6 +132,12 @@ static inline void sve_user_enable(void)
 			write_sysreg_s(__new, (reg));	\
 	} while (0)
 
+static inline void sme_set_svcr(u64 val)
+{
+	sysreg_clear_set_s(SYS_SVCR_EL0, SYS_SVCR_EL0_ZA_MASK |
+			   SYS_SVCR_EL0_SM_MASK, val);
+}
+
 /*
  * Probing and setup functions.
  * Calls to these functions must be serialised with one another.
@@ -175,6 +182,12 @@ static inline void write_vl(enum vec_type type, u64 val)
 		tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK;
 		write_sysreg_s(tmp | val, SYS_ZCR_EL1);
 		break;
+#endif
+#ifdef CONFIG_ARM64_SME
+	case ARM64_VEC_SME:
+		tmp = read_sysreg_s(SYS_SMCR_EL1) & ~SMCR_ELx_LEN_MASK;
+		write_sysreg_s(tmp | val, SYS_SMCR_EL1);
+		break;
 #endif
 	default:
 		WARN_ON_ONCE(1);
@@ -244,6 +257,28 @@ static inline void sve_setup(void) { }
 
 #endif /* ! CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+extern void __init sme_setup(void);
+
+static inline int sme_max_vl(void)
+{
+	return vec_max_vl(ARM64_VEC_SME);
+}
+
+static inline int sme_max_virtualisable_vl(void)
+{
+	return vec_max_virtualisable_vl(ARM64_VEC_SME);
+}
+
+#else
+
+static inline void sme_setup(void) { }
+static inline int sme_max_vl(void) { return 0; }
+static inline int sme_max_virtualisable_vl(void) { return 0; }
+
+#endif /* ! CONFIG_ARM64_SME */
+
 /* For use by EFI runtime services calls only */
 extern void __efi_fpsimd_begin(void);
 extern void __efi_fpsimd_end(void);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 9b854e8196df..575a1fe719b7 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -117,6 +117,7 @@ struct debug_info {
 
 enum vec_type {
 	ARM64_VEC_SVE = 0,
+	ARM64_VEC_SME,
 	ARM64_VEC_MAX,
 };
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9d3e87ba5d5a..84aec4704885 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -562,6 +562,12 @@ static const struct arm64_ftr_bits ftr_zcr[] = {
 	ARM64_FTR_END,
 };
 
+static const struct arm64_ftr_bits ftr_smcr[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE,
+		SMCR_ELx_LEN_SHIFT, SMCR_ELx_LEN_SIZE, 0),	/* LEN */
+	ARM64_FTR_END,
+};
+
 /*
  * Common ftr bits for a 32bit register with all hidden, strict
  * attributes, with 4bit feature fields and a default safe value of
@@ -660,6 +666,7 @@ static const struct __ftr_reg_entry {
 
 	/* Op1 = 0, CRn = 1, CRm = 2 */
 	ARM64_FTR_REG(SYS_ZCR_EL1, ftr_zcr),
+	ARM64_FTR_REG(SYS_SMCR_EL1, ftr_smcr),
 
 	/* Op1 = 1, CRn = 0, CRm = 0 */
 	ARM64_FTR_REG(SYS_GMID_EL1, ftr_gmid),
@@ -963,6 +970,14 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
 		vec_init_vq_map(ARM64_VEC_SVE);
 	}
 
+	if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
+		init_cpu_ftr_reg(SYS_SMCR_EL1, info->reg_smcr);
+		if (IS_ENABLED(CONFIG_ARM64_SME)) {
+			sme_kernel_enable(NULL);
+			vec_init_vq_map(ARM64_VEC_SME);
+		}
+	}
+
 	if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
 		init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid);
 
@@ -1187,6 +1202,9 @@ void update_cpu_features(int cpu,
 	taint |= check_update_ftr_reg(SYS_ID_AA64ZFR0_EL1, cpu,
 				      info->reg_id_aa64zfr0, boot->reg_id_aa64zfr0);
 
+	taint |= check_update_ftr_reg(SYS_ID_AA64SMFR0_EL1, cpu,
+				      info->reg_id_aa64smfr0, boot->reg_id_aa64smfr0);
+
 	if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
 		taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu,
 					info->reg_zcr, boot->reg_zcr);
@@ -1197,6 +1215,16 @@ void update_cpu_features(int cpu,
 			vec_update_vq_map(ARM64_VEC_SVE);
 	}
 
+	if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
+		taint |= check_update_ftr_reg(SYS_SMCR_EL1, cpu,
+					info->reg_smcr, boot->reg_smcr);
+
+		/* Probe vector lengths, unless we already gave up on SME */
+		if (id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1)) &&
+		    !system_capabilities_finalized())
+			vec_update_vq_map(ARM64_VEC_SME);
+	}
+
 	/*
 	 * The kernel uses the LDGM/STGM instructions and the number of tags
 	 * they read/write depends on the GMID_EL1.BS field. Check that the
@@ -2789,6 +2817,23 @@ static void verify_sve_features(void)
 	/* Add checks on other ZCR bits here if necessary */
 }
 
+static void verify_sme_features(void)
+{
+	u64 safe_smcr = read_sanitised_ftr_reg(SYS_SMCR_EL1);
+	u64 smcr = read_smcr_features();
+
+	unsigned int safe_len = safe_smcr & SMCR_ELx_LEN_MASK;
+	unsigned int len = smcr & SMCR_ELx_LEN_MASK;
+
+	if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SME)) {
+		pr_crit("CPU%d: SME: vector length support mismatch\n",
+			smp_processor_id());
+		cpu_die_early();
+	}
+
+	/* Add checks on other SMCR bits here if necessary */
+}
+
 static void verify_hyp_capabilities(void)
 {
 	u64 safe_mmfr1, mmfr0, mmfr1;
@@ -2841,6 +2886,9 @@ static void verify_local_cpu_capabilities(void)
 	if (system_supports_sve())
 		verify_sve_features();
 
+	if (system_supports_sme())
+		verify_sme_features();
+
 	if (is_hyp_mode_available())
 		verify_hyp_capabilities();
 }
@@ -2957,6 +3005,7 @@ void __init setup_cpu_features(void)
 		pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
 
 	sve_setup();
+	sme_setup();
 	minsigstksz_setup();
 
 	/* Advertise that we have computed the system capabilities */
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 9830fa0c7647..572da7d1f194 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -415,6 +415,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
 	    id_aa64pfr0_sve(info->reg_id_aa64pfr0))
 		info->reg_zcr = read_zcr_features();
 
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    id_aa64pfr1_sme(info->reg_id_aa64pfr1))
+		info->reg_smcr = read_smcr_features();
+
 	cpuinfo_detect_icache_policy(info);
 }
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index bf698c3eaed3..430c949a2438 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -132,6 +132,12 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
 		.max_virtualisable_vl	= SVE_VL_MIN,
 	},
 #endif
+#ifdef CONFIG_ARM64_SME
+	[ARM64_VEC_SME] = {
+		.type			= ARM64_VEC_SME,
+		.name			= "SME",
+	},
+#endif
 };
 
 static unsigned int vec_vl_inherit_flag(enum vec_type type)
@@ -182,6 +188,20 @@ extern void __percpu *efi_sve_state;
 
 #endif /* ! CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+static int get_sme_default_vl(void)
+{
+	return get_default_vl(ARM64_VEC_SME);
+}
+
+static void set_sme_default_vl(int val)
+{
+	set_default_vl(ARM64_VEC_SME, val);
+}
+
+#endif
+
 DEFINE_PER_CPU(bool, fpsimd_context_busy);
 EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy);
 
@@ -399,6 +419,8 @@ static unsigned int find_supported_vector_length(enum vec_type type,
 
 	if (vl > max_vl)
 		vl = max_vl;
+	if (vl < info->min_vl)
+		vl = info->min_vl;
 
 	bit = find_next_bit(info->vq_map, SVE_VQ_MAX,
 			    __vq_to_bit(sve_vq_from_vl(vl)));
@@ -761,12 +783,38 @@ static void vec_probe_vqs(struct vl_info *info,
 
 	bitmap_zero(map, SVE_VQ_MAX);
 
+	/*
+	 * Enter streaming mode for SME; we don't use an op as the
+	 * vector length info is used from KVM.
+	 */
+	switch (info->type) {
+	case ARM64_VEC_SME:
+		sme_set_svcr(SYS_SVCR_EL0_SM_MASK);
+		break;
+	default:
+		break;
+	}
+
 	for (vq = SVE_VQ_MAX; vq >= SVE_VQ_MIN; --vq) {
 		write_vl(info->type, vq - 1); /* self-syncing */
+
 		vl = sve_get_vl();
+
+		/* Minimum VL identified? */
+		if (sve_vq_from_vl(vl) > vq)
+			break;
+
 		vq = sve_vq_from_vl(vl); /* skip intervening lengths */
 		set_bit(__vq_to_bit(vq), map);
 	}
+
+	switch (info->type) {
+	case ARM64_VEC_SME:
+		sme_set_svcr(0);
+		break;
+	default:
+		break;
+	}
 }
 
 /*
@@ -999,7 +1047,88 @@ void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 	isb();
 }
 
-#endif /* CONFIG_ARM64_SVE */
+/*
+ * Read the pseudo-SMCR used by cpufeatures to identify the supported
+ * vector length.
+ *
+ * Use only if SME is present.
+ * This function clobbers the SVE vector length.
+ */
+u64 read_smcr_features(void)
+{
+	u64 smcr;
+	unsigned int vq_max;
+
+	sme_kernel_enable(NULL);
+	sme_set_svcr(SYS_SVCR_EL0_SM_MASK);
+
+	/*
+	 * Set the maximum possible VL, and write zeroes to all other
+	 * bits to see if they stick.
+	 */
+	write_sysreg_s(SMCR_ELx_LEN_MASK, SYS_SMCR_EL1);
+
+	smcr = read_sysreg_s(SYS_SMCR_EL1);
+	smcr &= ~(u64)SMCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
+	vq_max = sve_vq_from_vl(sve_get_vl());
+	smcr |= vq_max - 1; /* set LEN field to maximum effective value */
+
+	sme_set_svcr(0);
+
+	return smcr;
+}
+
+void __init sme_setup(void)
+{
+	struct vl_info *info = &vl_info[ARM64_VEC_SME];
+	u64 smcr;
+	int min_bit;
+
+	if (!system_supports_sme())
+		return;
+
+	/*
+	 * SME doesn't require any particular vector length be
+	 * supported but it does require at least one.  We should have
+	 * disabled the feature entirely while bringing up CPUs but
+	 * let's double check here.
+	 */
+	WARN_ON(bitmap_empty(info->vq_map, SVE_VQ_MAX));
+
+	min_bit = find_last_bit(info->vq_map, SVE_VQ_MAX);
+	info->min_vl = sve_vl_from_vq(__bit_to_vq(min_bit));
+
+	smcr = read_sanitised_ftr_reg(SYS_SMCR_EL1);
+	info->max_vl = sve_vl_from_vq((smcr & SMCR_ELx_LEN_MASK) + 1);
+
+	/*
+	 * Sanity-check that the max VL we determined through CPU features
+	 * corresponds properly to sme_vq_map.  If not, do our best:
+	 */
+	if (WARN_ON(info->max_vl != find_supported_vector_length(ARM64_VEC_SME,
+								 info->max_vl)))
+		info->max_vl = find_supported_vector_length(ARM64_VEC_SME,
+							    info->max_vl);
+
+	WARN_ON(info->min_vl > info->max_vl);
+
+	/*
+	 * For the default VL, pick the maximum supported value <= 32
+	 * (256 bits) if there is one since this is guaranteed not to
+	 * grow the signal frame when in streaming mode, otherwise the
+	 * minimum available VL will be used.
+	 */
+	set_sme_default_vl(find_supported_vector_length(ARM64_VEC_SME, 32));
+
+	pr_info("SME: minimum available vector length %u bytes per vector\n",
+		info->min_vl);
+	pr_info("SME: maximum available vector length %u bytes per vector\n",
+		info->max_vl);
+	pr_info("SME: default vector length %u bytes per vector\n",
+		get_sme_default_vl());
+}
+
+#endif /* CONFIG_ARM64_SME */
 
 /*
  * Trapped SVE access
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 18/38] arm64/sme: Implement sysctl to set the default vector length
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As for SVE provide a sysctl which allows the default SME vector length to
be configured.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 430c949a2438..13df5920fe1e 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -482,6 +482,30 @@ static int __init sve_sysctl_init(void)
 static int __init sve_sysctl_init(void) { return 0; }
 #endif /* ! (CONFIG_ARM64_SVE && CONFIG_SYSCTL) */
 
+#if defined(CONFIG_ARM64_SME) && defined(CONFIG_SYSCTL)
+static struct ctl_table sme_default_vl_table[] = {
+	{
+		.procname	= "sme_default_vector_length",
+		.mode		= 0644,
+		.proc_handler	= vec_proc_do_default_vl,
+		.extra1		= &vl_info[ARM64_VEC_SME],
+	},
+	{ }
+};
+
+static int __init sme_sysctl_init(void)
+{
+	if (system_supports_sme())
+		if (!register_sysctl("abi", sme_default_vl_table))
+			return -EINVAL;
+
+	return 0;
+}
+
+#else /* ! (CONFIG_ARM64_SME && CONFIG_SYSCTL) */
+static int __init sme_sysctl_init(void) { return 0; }
+#endif /* ! (CONFIG_ARM64_SME && CONFIG_SYSCTL) */
+
 #define ZREG(sve_state, vq, n) ((char *)(sve_state) +		\
 	(SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
 
@@ -1676,6 +1700,9 @@ static int __init fpsimd_init(void)
 	if (cpu_have_named_feature(SME) && !cpu_have_named_feature(SVE))
 		pr_notice("SME is implemented but not SVE\n");
 
-	return sve_sysctl_init();
+	sve_sysctl_init();
+	sme_sysctl_init();
+
+	return 0;
 }
 core_initcall(fpsimd_init);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 18/38] arm64/sme: Implement sysctl to set the default vector length
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As for SVE provide a sysctl which allows the default SME vector length to
be configured.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 430c949a2438..13df5920fe1e 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -482,6 +482,30 @@ static int __init sve_sysctl_init(void)
 static int __init sve_sysctl_init(void) { return 0; }
 #endif /* ! (CONFIG_ARM64_SVE && CONFIG_SYSCTL) */
 
+#if defined(CONFIG_ARM64_SME) && defined(CONFIG_SYSCTL)
+static struct ctl_table sme_default_vl_table[] = {
+	{
+		.procname	= "sme_default_vector_length",
+		.mode		= 0644,
+		.proc_handler	= vec_proc_do_default_vl,
+		.extra1		= &vl_info[ARM64_VEC_SME],
+	},
+	{ }
+};
+
+static int __init sme_sysctl_init(void)
+{
+	if (system_supports_sme())
+		if (!register_sysctl("abi", sme_default_vl_table))
+			return -EINVAL;
+
+	return 0;
+}
+
+#else /* ! (CONFIG_ARM64_SME && CONFIG_SYSCTL) */
+static int __init sme_sysctl_init(void) { return 0; }
+#endif /* ! (CONFIG_ARM64_SME && CONFIG_SYSCTL) */
+
 #define ZREG(sve_state, vq, n) ((char *)(sve_state) +		\
 	(SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET))
 
@@ -1676,6 +1700,9 @@ static int __init fpsimd_init(void)
 	if (cpu_have_named_feature(SME) && !cpu_have_named_feature(SVE))
 		pr_notice("SME is implemented but not SVE\n");
 
-	return sve_sysctl_init();
+	sve_sysctl_init();
+	sme_sysctl_init();
+
+	return 0;
 }
 core_initcall(fpsimd_init);
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As for SVE provide a prctl() interface which allows processes to
configure their SME vector length.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/processor.h   |  4 +++-
 arch/arm64/include/asm/thread_info.h |  1 +
 arch/arm64/kernel/fpsimd.c           | 30 ++++++++++++++++++++++++++++
 include/uapi/linux/prctl.h           |  9 +++++++++
 kernel/sys.c                         |  6 ++++++
 5 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 575a1fe719b7..a62d2f8045bf 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -354,9 +354,11 @@ extern void __init minsigstksz_setup(void);
  */
 #include <asm/fpsimd.h>
 
-/* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */
+/* Userspace interface for PR_S[MV]E_{SET,GET}_VL prctl()s: */
 #define SVE_SET_VL(arg)	sve_set_current_vl(arg)
 #define SVE_GET_VL()	sve_get_current_vl()
+#define SME_SET_VL(arg)	sme_set_current_vl(arg)
+#define SME_GET_VL()	sme_get_current_vl()
 
 /* PR_PAC_RESET_KEYS prctl */
 #define PAC_RESET_KEYS(tsk, arg)	ptrauth_prctl_reset_keys(tsk, arg)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index d5c8ac81ce11..5c4355204f4a 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -81,6 +81,7 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
+#define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 13df5920fe1e..9511428c2e81 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -145,6 +145,8 @@ static unsigned int vec_vl_inherit_flag(enum vec_type type)
 	switch (type) {
 	case ARM64_VEC_SVE:
 		return TIF_SVE_VL_INHERIT;
+	case ARM64_VEC_SME:
+		return TIF_SME_VL_INHERIT;
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -800,6 +802,34 @@ int sve_get_current_vl(void)
 	return vec_prctl_status(ARM64_VEC_SVE, 0);
 }
 
+/* PR_SME_SET_VL */
+int sme_set_current_vl(unsigned long arg)
+{
+	unsigned long vl, flags;
+	int ret;
+
+	vl = arg & PR_SME_VL_LEN_MASK;
+	flags = arg & ~vl;
+
+	if (!system_supports_sme() || is_compat_task())
+		return -EINVAL;
+
+	ret = vec_set_vector_length(current, ARM64_VEC_SME, vl, flags);
+	if (ret)
+		return ret;
+
+	return vec_prctl_status(ARM64_VEC_SME, flags);
+}
+
+/* PR_SME_GET_VL */
+int sme_get_current_vl(void)
+{
+	if (!system_supports_sme() || is_compat_task())
+		return -EINVAL;
+
+	return vec_prctl_status(ARM64_VEC_SME, 0);
+}
+
 static void vec_probe_vqs(struct vl_info *info,
 			  DECLARE_BITMAP(map, SVE_VQ_MAX))
 {
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 43bd7f713c39..b3212d73c198 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -269,4 +269,13 @@ struct prctl_mm_map {
 # define PR_SCHED_CORE_SHARE_FROM	3 /* pull core_sched cookie to pid */
 # define PR_SCHED_CORE_MAX		4
 
+/* arm64 Scalable Matrix Extension controls */
+/* Flag values must be in sync with SVE versions */
+#define PR_SME_SET_VL			63	/* set task vector length */
+# define PR_SME_SET_VL_ONEXEC		(1 << 18) /* defer effect until exec */
+#define PR_SME_GET_VL			64	/* get task vector length */
+/* Bits common to PR_SME_SET_VL and PR_SME_GET_VL */
+# define PR_SME_VL_LEN_MASK		0xffff
+# define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 8fdac0d90504..bf45194ce03b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2463,6 +2463,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_SVE_GET_VL:
 		error = SVE_GET_VL();
 		break;
+	case PR_SME_SET_VL:
+		error = SME_SET_VL(arg2);
+		break;
+	case PR_SME_GET_VL:
+		error = SME_GET_VL();
+		break;
 	case PR_GET_SPECULATION_CTRL:
 		if (arg3 || arg4 || arg5)
 			return -EINVAL;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

As for SVE provide a prctl() interface which allows processes to
configure their SME vector length.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/processor.h   |  4 +++-
 arch/arm64/include/asm/thread_info.h |  1 +
 arch/arm64/kernel/fpsimd.c           | 30 ++++++++++++++++++++++++++++
 include/uapi/linux/prctl.h           |  9 +++++++++
 kernel/sys.c                         |  6 ++++++
 5 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 575a1fe719b7..a62d2f8045bf 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -354,9 +354,11 @@ extern void __init minsigstksz_setup(void);
  */
 #include <asm/fpsimd.h>
 
-/* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */
+/* Userspace interface for PR_S[MV]E_{SET,GET}_VL prctl()s: */
 #define SVE_SET_VL(arg)	sve_set_current_vl(arg)
 #define SVE_GET_VL()	sve_get_current_vl()
+#define SME_SET_VL(arg)	sme_set_current_vl(arg)
+#define SME_GET_VL()	sme_get_current_vl()
 
 /* PR_PAC_RESET_KEYS prctl */
 #define PAC_RESET_KEYS(tsk, arg)	ptrauth_prctl_reset_keys(tsk, arg)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index d5c8ac81ce11..5c4355204f4a 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -81,6 +81,7 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
+#define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 13df5920fe1e..9511428c2e81 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -145,6 +145,8 @@ static unsigned int vec_vl_inherit_flag(enum vec_type type)
 	switch (type) {
 	case ARM64_VEC_SVE:
 		return TIF_SVE_VL_INHERIT;
+	case ARM64_VEC_SME:
+		return TIF_SME_VL_INHERIT;
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -800,6 +802,34 @@ int sve_get_current_vl(void)
 	return vec_prctl_status(ARM64_VEC_SVE, 0);
 }
 
+/* PR_SME_SET_VL */
+int sme_set_current_vl(unsigned long arg)
+{
+	unsigned long vl, flags;
+	int ret;
+
+	vl = arg & PR_SME_VL_LEN_MASK;
+	flags = arg & ~vl;
+
+	if (!system_supports_sme() || is_compat_task())
+		return -EINVAL;
+
+	ret = vec_set_vector_length(current, ARM64_VEC_SME, vl, flags);
+	if (ret)
+		return ret;
+
+	return vec_prctl_status(ARM64_VEC_SME, flags);
+}
+
+/* PR_SME_GET_VL */
+int sme_get_current_vl(void)
+{
+	if (!system_supports_sme() || is_compat_task())
+		return -EINVAL;
+
+	return vec_prctl_status(ARM64_VEC_SME, 0);
+}
+
 static void vec_probe_vqs(struct vl_info *info,
 			  DECLARE_BITMAP(map, SVE_VQ_MAX))
 {
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 43bd7f713c39..b3212d73c198 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -269,4 +269,13 @@ struct prctl_mm_map {
 # define PR_SCHED_CORE_SHARE_FROM	3 /* pull core_sched cookie to pid */
 # define PR_SCHED_CORE_MAX		4
 
+/* arm64 Scalable Matrix Extension controls */
+/* Flag values must be in sync with SVE versions */
+#define PR_SME_SET_VL			63	/* set task vector length */
+# define PR_SME_SET_VL_ONEXEC		(1 << 18) /* defer effect until exec */
+#define PR_SME_GET_VL			64	/* get task vector length */
+/* Bits common to PR_SME_SET_VL and PR_SME_GET_VL */
+# define PR_SME_VL_LEN_MASK		0xffff
+# define PR_SME_VL_INHERIT		(1 << 17) /* inherit across exec */
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 8fdac0d90504..bf45194ce03b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2463,6 +2463,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_SVE_GET_VL:
 		error = SVE_GET_VL();
 		break;
+	case PR_SME_SET_VL:
+		error = SME_SET_VL(arg2);
+		break;
+	case PR_SME_GET_VL:
+		error = SME_GET_VL();
+		break;
 	case PR_GET_SPECULATION_CTRL:
 		if (arg3 || arg4 || arg5)
 			return -EINVAL;
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 20/38] arm64/sme: Implement support for TPIDR2
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The Scalable Matrix Extension introduces support for a new thread specific
data register TPIDR2 intended for use by libc. The kernel must save the
value of TPIDR2 on context switch and should ensure that all new threads
start off with a default value of 0. Add a field to the thread_struct to
store TPIDR2 and context switch it with the other thread specific data.

In case there are future extensions which also use TPIDR2 we introduce
system_supports_tpidr2() and use that rather than system_supports_sme()
for TPIDR2 handling.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h | 5 +++++
 arch/arm64/include/asm/processor.h  | 1 +
 arch/arm64/kernel/fpsimd.c          | 4 ++++
 arch/arm64/kernel/process.c         | 5 +++++
 4 files changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9a183267b341..8d0cff410b40 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -740,6 +740,11 @@ static __always_inline bool system_supports_sme(void)
 		cpus_have_const_cap(ARM64_SME);
 }
 
+static __always_inline bool system_supports_tpidr2(void)
+{
+	return system_supports_sme();
+}
+
 static __always_inline bool system_supports_cnp(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_CNP) &&
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index a62d2f8045bf..51eca2513cb5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -168,6 +168,7 @@ struct thread_struct {
 	u64			mte_ctrl;
 #endif
 	u64			sctlr_user;
+	u64			tpidr2_el0;
 };
 
 static inline unsigned int thread_get_vl(struct thread_struct *thread,
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 9511428c2e81..e673a1d6a618 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1099,6 +1099,10 @@ void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 	/* Allow SME in kernel */
 	write_sysreg(read_sysreg(CPACR_EL1) | CPACR_EL1_SMEN_EL1EN, CPACR_EL1);
 	isb();
+
+	/* Allow EL0 to access TPIDR2 */
+	write_sysreg(read_sysreg(SCTLR_EL1) | SCTLR_ELx_ENTP2, SCTLR_EL1);
+	isb();
 }
 
 /*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 40adb8cdbf5a..0640b51d4289 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -377,6 +377,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	 * as the final frame for the new task.
 	 */
 	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
+	p->thread.tpidr2_el0 = 0;
 
 	ptrace_hw_copy_thread(p);
 
@@ -386,6 +387,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 void tls_preserve_current_state(void)
 {
 	*task_user_tls(current) = read_sysreg(tpidr_el0);
+	if (system_supports_tpidr2())
+		current->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
 }
 
 static void tls_thread_switch(struct task_struct *next)
@@ -398,6 +401,8 @@ static void tls_thread_switch(struct task_struct *next)
 		write_sysreg(0, tpidrro_el0);
 
 	write_sysreg(*task_user_tls(next), tpidr_el0);
+	if (system_supports_tpidr2())
+		write_sysreg_s(next->thread.tpidr2_el0, SYS_TPIDR2_EL0);
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 20/38] arm64/sme: Implement support for TPIDR2
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The Scalable Matrix Extension introduces support for a new thread specific
data register TPIDR2 intended for use by libc. The kernel must save the
value of TPIDR2 on context switch and should ensure that all new threads
start off with a default value of 0. Add a field to the thread_struct to
store TPIDR2 and context switch it with the other thread specific data.

In case there are future extensions which also use TPIDR2 we introduce
system_supports_tpidr2() and use that rather than system_supports_sme()
for TPIDR2 handling.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h | 5 +++++
 arch/arm64/include/asm/processor.h  | 1 +
 arch/arm64/kernel/fpsimd.c          | 4 ++++
 arch/arm64/kernel/process.c         | 5 +++++
 4 files changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9a183267b341..8d0cff410b40 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -740,6 +740,11 @@ static __always_inline bool system_supports_sme(void)
 		cpus_have_const_cap(ARM64_SME);
 }
 
+static __always_inline bool system_supports_tpidr2(void)
+{
+	return system_supports_sme();
+}
+
 static __always_inline bool system_supports_cnp(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_CNP) &&
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index a62d2f8045bf..51eca2513cb5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -168,6 +168,7 @@ struct thread_struct {
 	u64			mte_ctrl;
 #endif
 	u64			sctlr_user;
+	u64			tpidr2_el0;
 };
 
 static inline unsigned int thread_get_vl(struct thread_struct *thread,
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 9511428c2e81..e673a1d6a618 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1099,6 +1099,10 @@ void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 	/* Allow SME in kernel */
 	write_sysreg(read_sysreg(CPACR_EL1) | CPACR_EL1_SMEN_EL1EN, CPACR_EL1);
 	isb();
+
+	/* Allow EL0 to access TPIDR2 */
+	write_sysreg(read_sysreg(SCTLR_EL1) | SCTLR_ELx_ENTP2, SCTLR_EL1);
+	isb();
 }
 
 /*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 40adb8cdbf5a..0640b51d4289 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -377,6 +377,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	 * as the final frame for the new task.
 	 */
 	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
+	p->thread.tpidr2_el0 = 0;
 
 	ptrace_hw_copy_thread(p);
 
@@ -386,6 +387,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 void tls_preserve_current_state(void)
 {
 	*task_user_tls(current) = read_sysreg(tpidr_el0);
+	if (system_supports_tpidr2())
+		current->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
 }
 
 static void tls_thread_switch(struct task_struct *next)
@@ -398,6 +401,8 @@ static void tls_thread_switch(struct task_struct *next)
 		write_sysreg(0, tpidrro_el0);
 
 	write_sysreg(*task_user_tls(next), tpidr_el0);
+	if (system_supports_tpidr2())
+		write_sysreg_s(next->thread.tpidr2_el0, SYS_TPIDR2_EL0);
 }
 
 /*
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 21/38] arm64/sme: Implement SVCR context switching
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In SME the use of both streaming SVE mode and ZA are tracked through
PSTATE.SM and PSTATE.ZA, visible through the system register SVCR.  In
order to context switch the floating point state for SME we need to
context switch the contents of this register as part of context
switching the floating point state.

Since changing the vector length exits streaming SVE mode and disables
ZA we also make sure we update SVCR appropraitely when setting vector
length, and similarly ensure that new threads have streaming SVE mode
and ZA disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/processor.h   |  1 +
 arch/arm64/include/asm/thread_info.h |  1 +
 arch/arm64/kernel/fpsimd.c           | 11 +++++++++++
 arch/arm64/kernel/process.c          |  2 ++
 4 files changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 51eca2513cb5..3c235e165725 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -168,6 +168,7 @@ struct thread_struct {
 	u64			mte_ctrl;
 #endif
 	u64			sctlr_user;
+	u64			svcr;
 	u64			tpidr2_el0;
 };
 
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 5c4355204f4a..03cb88f63317 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -81,6 +81,7 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
+#define TIF_SME			27	/* SME in use */
 #define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e673a1d6a618..25193fd04860 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -355,6 +355,9 @@ static void task_fpsimd_load(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
+	if (IS_ENABLED(CONFIG_ARM64_SME) && test_thread_flag(TIF_SME))
+		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
+
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
 		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
 		sve_load_state(sve_pffr(&current->thread),
@@ -380,6 +383,10 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    test_thread_flag(TIF_SME))
+		current->thread.svcr = read_sysreg_s(SYS_SVCR_EL0);
+
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
 	    test_thread_flag(TIF_SVE)) {
 		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
@@ -734,6 +741,10 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
 		sve_to_fpsimd(task);
 
+	if (system_supports_sme() && type == ARM64_VEC_SME)
+		task->thread.svcr &= ~(SYS_SVCR_EL0_SM_MASK |
+				       SYS_SVCR_EL0_ZA_MASK);
+
 	if (task == current)
 		put_cpu_fpsimd_context();
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0640b51d4289..c5544b669a58 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -307,6 +307,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
 
+	dst->thread.svcr = 0;
+
 	/* clear any pending asynchronous tag fault raised by the parent */
 	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 21/38] arm64/sme: Implement SVCR context switching
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In SME the use of both streaming SVE mode and ZA are tracked through
PSTATE.SM and PSTATE.ZA, visible through the system register SVCR.  In
order to context switch the floating point state for SME we need to
context switch the contents of this register as part of context
switching the floating point state.

Since changing the vector length exits streaming SVE mode and disables
ZA we also make sure we update SVCR appropraitely when setting vector
length, and similarly ensure that new threads have streaming SVE mode
and ZA disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/processor.h   |  1 +
 arch/arm64/include/asm/thread_info.h |  1 +
 arch/arm64/kernel/fpsimd.c           | 11 +++++++++++
 arch/arm64/kernel/process.c          |  2 ++
 4 files changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 51eca2513cb5..3c235e165725 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -168,6 +168,7 @@ struct thread_struct {
 	u64			mte_ctrl;
 #endif
 	u64			sctlr_user;
+	u64			svcr;
 	u64			tpidr2_el0;
 };
 
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 5c4355204f4a..03cb88f63317 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -81,6 +81,7 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
 #define TIF_SSBD		25	/* Wants SSB mitigation */
 #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
+#define TIF_SME			27	/* SME in use */
 #define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e673a1d6a618..25193fd04860 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -355,6 +355,9 @@ static void task_fpsimd_load(void)
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
+	if (IS_ENABLED(CONFIG_ARM64_SME) && test_thread_flag(TIF_SME))
+		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
+
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
 		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
 		sve_load_state(sve_pffr(&current->thread),
@@ -380,6 +383,10 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
+	if (IS_ENABLED(CONFIG_ARM64_SME) &&
+	    test_thread_flag(TIF_SME))
+		current->thread.svcr = read_sysreg_s(SYS_SVCR_EL0);
+
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
 	    test_thread_flag(TIF_SVE)) {
 		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
@@ -734,6 +741,10 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
 		sve_to_fpsimd(task);
 
+	if (system_supports_sme() && type == ARM64_VEC_SME)
+		task->thread.svcr &= ~(SYS_SVCR_EL0_SM_MASK |
+				       SYS_SVCR_EL0_ZA_MASK);
+
 	if (task == current)
 		put_cpu_fpsimd_context();
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0640b51d4289..c5544b669a58 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -307,6 +307,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
 
+	dst->thread.svcr = 0;
+
 	/* clear any pending asynchronous tag fault raised by the parent */
 	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 22/38] arm64/sme: Implement streaming SVE context switching
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

When in streaming mode we need to save and restore the streaming mode
SVE register state rather than the regular SVE register state. This uses
the streaming mode vector length and omits FFR but is otherwise identical,
if TIF_SVE is enabled when we are in streaming mode then streaming mode
takes precedence.

This does not handle use of streaming SVE state with KVM, ptrace or
signals. This will be updated in further patches.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  23 +++++-
 arch/arm64/include/asm/fpsimdmacros.h |  11 +++
 arch/arm64/include/asm/processor.h    |  10 +++
 arch/arm64/kernel/entry-fpsimd.S      |   9 ++
 arch/arm64/kernel/fpsimd.c            | 114 +++++++++++++++++++++-----
 arch/arm64/kvm/fpsimd.c               |   3 +-
 6 files changed, 147 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f58c4be03ba2..43737ca91f1a 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -46,11 +46,22 @@ extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
-				     void *sve_state, unsigned int sve_vl);
+				     void *sve_state, unsigned int sve_vl,
+				     unsigned int sme_vl);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
 
+static inline bool thread_sm_enabled(struct thread_struct *thread)
+{
+	return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK);
+}
+
+static inline bool thread_za_enabled(struct thread_struct *thread)
+{
+	return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_ZA_MASK);
+}
+
 /* Maximum VL that SVE/SME VL-agnostic software can transparently support */
 #define VL_ARCH_MAX 0x100
 
@@ -62,7 +73,14 @@ static inline size_t sve_ffr_offset(int vl)
 
 static inline void *sve_pffr(struct thread_struct *thread)
 {
-	return (char *)thread->sve_state + sve_ffr_offset(thread_get_sve_vl(thread));
+	unsigned int vl;
+
+	if (system_supports_sme() && thread_sm_enabled(thread))
+		vl = thread_get_sme_vl(thread);
+	else
+		vl = thread_get_sve_vl(thread);
+
+	return (char *)thread->sve_state + sve_ffr_offset(vl);
 }
 
 extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
@@ -71,6 +89,7 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
+extern void sme_set_vq(unsigned long vq_minus_1);
 
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index bc45bb984c49..c86fc2fc72e9 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -252,6 +252,17 @@
 921:
 .endm
 
+/* Update SMCR_EL1.LEN with the new VQ */
+.macro sme_load_vq xvqminus1, xtmp, xtmp2
+		mrs_s		\xtmp, SYS_SMCR_EL1
+		bic		\xtmp2, \xtmp, SMCR_ELx_LEN_MASK
+		orr		\xtmp2, \xtmp2, \xvqminus1
+		cmp		\xtmp2, \xtmp
+		b.eq		921f
+		msr_s		SYS_SMCR_EL1, \xtmp2	//self-synchronising
+921:
+.endm
+
 /* Preserve the first 128-bits of Znz and zero the rest. */
 .macro _sve_flush_z nz
 	_sve_check_zreg \nz
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 3c235e165725..338cb03811bd 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -183,6 +183,11 @@ static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
 	return thread_get_vl(thread, ARM64_VEC_SVE);
 }
 
+static inline unsigned int thread_get_sme_vl(struct thread_struct *thread)
+{
+	return thread_get_vl(thread, ARM64_VEC_SME);
+}
+
 unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
 void task_set_vl(struct task_struct *task, enum vec_type type,
 		 unsigned long vl);
@@ -196,6 +201,11 @@ static inline unsigned int task_get_sve_vl(const struct task_struct *task)
 	return task_get_vl(task, ARM64_VEC_SVE);
 }
 
+static inline unsigned int task_get_sme_vl(const struct task_struct *task)
+{
+	return task_get_vl(task, ARM64_VEC_SME);
+}
+
 static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
 {
 	task_set_vl(task, ARM64_VEC_SVE, vl);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 2339d370bfe1..55eb45b3faa9 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -86,3 +86,12 @@ SYM_FUNC_START(sve_flush_live)
 SYM_FUNC_END(sve_flush_live)
 
 #endif /* CONFIG_ARM64_SVE */
+
+#ifdef CONFIG_ARM64_SME
+
+SYM_FUNC_START(sme_set_vq)
+	sme_load_vq x0, x1, x2
+	ret
+SYM_FUNC_END(sme_set_vq)
+
+#endif /* CONFIG_ARM64_SME */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 25193fd04860..89b94e8c81fd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -118,6 +118,7 @@ struct fpsimd_last_state_struct {
 	struct user_fpsimd_state *st;
 	void *sve_state;
 	unsigned int sve_vl;
+	unsigned int sme_vl;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -296,17 +297,28 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
 	task->thread.vl_onexec[type] = vl;
 }
 
+/*
+ * TIF_SME controls whether a task can use SME without trapping while
+ * in userspace, when TIF_SME is set then we must have storage
+ * alocated in sve_state and za_state to store the contents of both ZA
+ * and the SVE registers for both streaming and non-streaming modes.
+ *
+ * If both SVCR.ZA and SVCR.SM are disabled then at any point we
+ * may disable TIF_SME and reenable traps.
+ */
+
+
 /*
  * TIF_SVE controls whether a task can use SVE without trapping while
- * in userspace, and also the way a task's FPSIMD/SVE state is stored
- * in thread_struct.
+ * in userspace, and also (together with TIF_SME) the way a task's
+ * FPSIMD/SVE state is stored in thread_struct.
  *
  * The kernel uses this flag to track whether a user task is actively
  * using SVE, and therefore whether full SVE register state needs to
  * be tracked.  If not, the cheaper FPSIMD context handling code can
  * be used instead of the more costly SVE equivalents.
  *
- *  * TIF_SVE set:
+ *  * TIF_SVE or SVCR.SM set:
  *
  *    The task can execute SVE instructions while in userspace without
  *    trapping to the kernel.
@@ -314,7 +326,8 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
  *    corresponding Zn), P0-P15 and FFR are encoded in in
  *    task->thread.sve_state, formatted appropriately for vector
- *    length task->thread.sve_vl.
+ *    length task->thread.sve_vl or, if SVCR.SM is set,
+ *    task->thread.sme_vl.
  *
  *    task->thread.sve_state must point to a valid buffer at least
  *    sve_state_size(task) bytes in size.
@@ -352,19 +365,40 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  */
 static void task_fpsimd_load(void)
 {
+	bool restore_sve_regs = false;
+	bool restore_ffr;
+
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (IS_ENABLED(CONFIG_ARM64_SME) && test_thread_flag(TIF_SME))
-		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
-
+	/* Check if we should restore SVE first */
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
 		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
+		restore_sve_regs = true;
+		restore_ffr = true;
+	}
+
+	/* Restore SME, override SVE register configuration if needed */
+	if (system_supports_sme()) {
+		unsigned long sme_vl = task_get_sme_vl(current);
+
+		if (test_thread_flag(TIF_SME))
+			sme_set_vq(sve_vq_from_vl(sme_vl) - 1);
+
+		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
+
+		if (thread_sm_enabled(&current->thread)) {
+			restore_sve_regs = true;
+			restore_ffr = false;
+		}
+	}
+
+	if (restore_sve_regs)
 		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr, true);
-	} else {
+			       &current->thread.uw.fpsimd_state.fpsr,
+			       restore_ffr);
+	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
-	}
 }
 
 /*
@@ -376,6 +410,9 @@ static void fpsimd_save(void)
 	struct fpsimd_last_state_struct const *last =
 		this_cpu_ptr(&fpsimd_last_state);
 	/* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */
+	bool save_sve_regs = false;
+	bool save_ffr;
+	unsigned int vl;
 
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
@@ -383,13 +420,37 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
-	if (IS_ENABLED(CONFIG_ARM64_SME) &&
-	    test_thread_flag(TIF_SME))
+	if (test_thread_flag(TIF_SVE)) {
+		save_sve_regs = true;
+		save_ffr = true;
+		vl = last->sve_vl;
+	}
+
+	if (system_supports_sme()) {
 		current->thread.svcr = read_sysreg_s(SYS_SVCR_EL0);
 
-	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
-	    test_thread_flag(TIF_SVE)) {
-		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
+		/* Are we still using SME at all? */
+		if (!(current->thread.svcr & (SYS_SVCR_EL0_ZA_MASK |
+					      SYS_SVCR_EL0_SM_MASK)))
+			clear_thread_flag(TIF_SME);
+
+		if (thread_za_enabled(&current->thread)) {
+			/* ZA state managment is not implemented yet */
+			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
+			return;
+		}
+
+		/* If we are in streaming mode override regular SVE. */
+		if (thread_sm_enabled(&current->thread)) {
+			save_sve_regs = true;
+			save_ffr = false;
+			vl = last->sme_vl;
+		}
+	}
+
+	if (IS_ENABLED(CONFIG_ARM64_SVE) && save_sve_regs) {
+		/* Get the configured VL from RDVL, will account for SM */
+		if (WARN_ON(sve_get_vl() != vl)) {
 			/*
 			 * Can't save the user regs, so current would
 			 * re-enter user with corrupt state.
@@ -400,8 +461,8 @@ static void fpsimd_save(void)
 		}
 
 		sve_save_state((char *)last->sve_state +
-					sve_ffr_offset(last->sve_vl),
-			       &last->st->fpsr, true);
+					sve_ffr_offset(vl),
+			       &last->st->fpsr, save_ffr);
 	} else {
 		fpsimd_save_state(last->st);
 	}
@@ -609,7 +670,14 @@ static void sve_to_fpsimd(struct task_struct *task)
  */
 size_t sve_state_size(struct task_struct const *task)
 {
-	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task_get_sve_vl(task)));
+	unsigned int vl = 0;
+
+	if (system_supports_sve())
+		vl = task_get_sve_vl(task);
+	if (system_supports_sme())
+		vl = max(vl, task_get_sme_vl(task));
+
+	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl));
 }
 
 /*
@@ -738,7 +806,8 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	}
 
 	fpsimd_flush_task_state(task);
-	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
+	    thread_sm_enabled(&task->thread))
 		sve_to_fpsimd(task);
 
 	if (system_supports_sme() && type == ARM64_VEC_SME)
@@ -1362,6 +1431,9 @@ void fpsimd_flush_thread(void)
 		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
 	}
 
+	if (system_supports_sme())
+		fpsimd_flush_thread_vl(ARM64_VEC_SME);
+
 	put_cpu_fpsimd_context();
 }
 
@@ -1405,6 +1477,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->st = &current->thread.uw.fpsimd_state;
 	last->sve_state = current->thread.sve_state;
 	last->sve_vl = task_get_sve_vl(current);
+	last->sme_vl = task_get_sme_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	if (system_supports_sve()) {
@@ -1419,7 +1492,7 @@ static void fpsimd_bind_task_to_cpu(void)
 }
 
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
-			      unsigned int sve_vl)
+			      unsigned int sve_vl, unsigned int sme_vl)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1430,6 +1503,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 	last->st = st;
 	last->sve_state = sve_state;
 	last->sve_vl = sve_vl;
+	last->sme_vl = sme_vl;
 }
 
 /*
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 5621020b28de..d96871002081 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -99,7 +99,8 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
 		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
 					 vcpu->arch.sve_state,
-					 vcpu->arch.sve_max_vl);
+					 vcpu->arch.sve_max_vl,
+					 0);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 22/38] arm64/sme: Implement streaming SVE context switching
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

When in streaming mode we need to save and restore the streaming mode
SVE register state rather than the regular SVE register state. This uses
the streaming mode vector length and omits FFR but is otherwise identical,
if TIF_SVE is enabled when we are in streaming mode then streaming mode
takes precedence.

This does not handle use of streaming SVE state with KVM, ptrace or
signals. This will be updated in further patches.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  23 +++++-
 arch/arm64/include/asm/fpsimdmacros.h |  11 +++
 arch/arm64/include/asm/processor.h    |  10 +++
 arch/arm64/kernel/entry-fpsimd.S      |   9 ++
 arch/arm64/kernel/fpsimd.c            | 114 +++++++++++++++++++++-----
 arch/arm64/kvm/fpsimd.c               |   3 +-
 6 files changed, 147 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f58c4be03ba2..43737ca91f1a 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -46,11 +46,22 @@ extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
-				     void *sve_state, unsigned int sve_vl);
+				     void *sve_state, unsigned int sve_vl,
+				     unsigned int sme_vl);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
 
+static inline bool thread_sm_enabled(struct thread_struct *thread)
+{
+	return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK);
+}
+
+static inline bool thread_za_enabled(struct thread_struct *thread)
+{
+	return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_ZA_MASK);
+}
+
 /* Maximum VL that SVE/SME VL-agnostic software can transparently support */
 #define VL_ARCH_MAX 0x100
 
@@ -62,7 +73,14 @@ static inline size_t sve_ffr_offset(int vl)
 
 static inline void *sve_pffr(struct thread_struct *thread)
 {
-	return (char *)thread->sve_state + sve_ffr_offset(thread_get_sve_vl(thread));
+	unsigned int vl;
+
+	if (system_supports_sme() && thread_sm_enabled(thread))
+		vl = thread_get_sme_vl(thread);
+	else
+		vl = thread_get_sve_vl(thread);
+
+	return (char *)thread->sve_state + sve_ffr_offset(vl);
 }
 
 extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
@@ -71,6 +89,7 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
+extern void sme_set_vq(unsigned long vq_minus_1);
 
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index bc45bb984c49..c86fc2fc72e9 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -252,6 +252,17 @@
 921:
 .endm
 
+/* Update SMCR_EL1.LEN with the new VQ */
+.macro sme_load_vq xvqminus1, xtmp, xtmp2
+		mrs_s		\xtmp, SYS_SMCR_EL1
+		bic		\xtmp2, \xtmp, SMCR_ELx_LEN_MASK
+		orr		\xtmp2, \xtmp2, \xvqminus1
+		cmp		\xtmp2, \xtmp
+		b.eq		921f
+		msr_s		SYS_SMCR_EL1, \xtmp2	//self-synchronising
+921:
+.endm
+
 /* Preserve the first 128-bits of Znz and zero the rest. */
 .macro _sve_flush_z nz
 	_sve_check_zreg \nz
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 3c235e165725..338cb03811bd 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -183,6 +183,11 @@ static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
 	return thread_get_vl(thread, ARM64_VEC_SVE);
 }
 
+static inline unsigned int thread_get_sme_vl(struct thread_struct *thread)
+{
+	return thread_get_vl(thread, ARM64_VEC_SME);
+}
+
 unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
 void task_set_vl(struct task_struct *task, enum vec_type type,
 		 unsigned long vl);
@@ -196,6 +201,11 @@ static inline unsigned int task_get_sve_vl(const struct task_struct *task)
 	return task_get_vl(task, ARM64_VEC_SVE);
 }
 
+static inline unsigned int task_get_sme_vl(const struct task_struct *task)
+{
+	return task_get_vl(task, ARM64_VEC_SME);
+}
+
 static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
 {
 	task_set_vl(task, ARM64_VEC_SVE, vl);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 2339d370bfe1..55eb45b3faa9 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -86,3 +86,12 @@ SYM_FUNC_START(sve_flush_live)
 SYM_FUNC_END(sve_flush_live)
 
 #endif /* CONFIG_ARM64_SVE */
+
+#ifdef CONFIG_ARM64_SME
+
+SYM_FUNC_START(sme_set_vq)
+	sme_load_vq x0, x1, x2
+	ret
+SYM_FUNC_END(sme_set_vq)
+
+#endif /* CONFIG_ARM64_SME */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 25193fd04860..89b94e8c81fd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -118,6 +118,7 @@ struct fpsimd_last_state_struct {
 	struct user_fpsimd_state *st;
 	void *sve_state;
 	unsigned int sve_vl;
+	unsigned int sme_vl;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -296,17 +297,28 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
 	task->thread.vl_onexec[type] = vl;
 }
 
+/*
+ * TIF_SME controls whether a task can use SME without trapping while
+ * in userspace, when TIF_SME is set then we must have storage
+ * alocated in sve_state and za_state to store the contents of both ZA
+ * and the SVE registers for both streaming and non-streaming modes.
+ *
+ * If both SVCR.ZA and SVCR.SM are disabled then at any point we
+ * may disable TIF_SME and reenable traps.
+ */
+
+
 /*
  * TIF_SVE controls whether a task can use SVE without trapping while
- * in userspace, and also the way a task's FPSIMD/SVE state is stored
- * in thread_struct.
+ * in userspace, and also (together with TIF_SME) the way a task's
+ * FPSIMD/SVE state is stored in thread_struct.
  *
  * The kernel uses this flag to track whether a user task is actively
  * using SVE, and therefore whether full SVE register state needs to
  * be tracked.  If not, the cheaper FPSIMD context handling code can
  * be used instead of the more costly SVE equivalents.
  *
- *  * TIF_SVE set:
+ *  * TIF_SVE or SVCR.SM set:
  *
  *    The task can execute SVE instructions while in userspace without
  *    trapping to the kernel.
@@ -314,7 +326,8 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  *    When stored, Z0-Z31 (incorporating Vn in bits[127:0] or the
  *    corresponding Zn), P0-P15 and FFR are encoded in in
  *    task->thread.sve_state, formatted appropriately for vector
- *    length task->thread.sve_vl.
+ *    length task->thread.sve_vl or, if SVCR.SM is set,
+ *    task->thread.sme_vl.
  *
  *    task->thread.sve_state must point to a valid buffer at least
  *    sve_state_size(task) bytes in size.
@@ -352,19 +365,40 @@ void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
  */
 static void task_fpsimd_load(void)
 {
+	bool restore_sve_regs = false;
+	bool restore_ffr;
+
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
 
-	if (IS_ENABLED(CONFIG_ARM64_SME) && test_thread_flag(TIF_SME))
-		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
-
+	/* Check if we should restore SVE first */
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
 		sve_set_vq(sve_vq_from_vl(task_get_sve_vl(current)) - 1);
+		restore_sve_regs = true;
+		restore_ffr = true;
+	}
+
+	/* Restore SME, override SVE register configuration if needed */
+	if (system_supports_sme()) {
+		unsigned long sme_vl = task_get_sme_vl(current);
+
+		if (test_thread_flag(TIF_SME))
+			sme_set_vq(sve_vq_from_vl(sme_vl) - 1);
+
+		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
+
+		if (thread_sm_enabled(&current->thread)) {
+			restore_sve_regs = true;
+			restore_ffr = false;
+		}
+	}
+
+	if (restore_sve_regs)
 		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr, true);
-	} else {
+			       &current->thread.uw.fpsimd_state.fpsr,
+			       restore_ffr);
+	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
-	}
 }
 
 /*
@@ -376,6 +410,9 @@ static void fpsimd_save(void)
 	struct fpsimd_last_state_struct const *last =
 		this_cpu_ptr(&fpsimd_last_state);
 	/* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */
+	bool save_sve_regs = false;
+	bool save_ffr;
+	unsigned int vl;
 
 	WARN_ON(!system_supports_fpsimd());
 	WARN_ON(!have_cpu_fpsimd_context());
@@ -383,13 +420,37 @@ static void fpsimd_save(void)
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
 		return;
 
-	if (IS_ENABLED(CONFIG_ARM64_SME) &&
-	    test_thread_flag(TIF_SME))
+	if (test_thread_flag(TIF_SVE)) {
+		save_sve_regs = true;
+		save_ffr = true;
+		vl = last->sve_vl;
+	}
+
+	if (system_supports_sme()) {
 		current->thread.svcr = read_sysreg_s(SYS_SVCR_EL0);
 
-	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
-	    test_thread_flag(TIF_SVE)) {
-		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
+		/* Are we still using SME at all? */
+		if (!(current->thread.svcr & (SYS_SVCR_EL0_ZA_MASK |
+					      SYS_SVCR_EL0_SM_MASK)))
+			clear_thread_flag(TIF_SME);
+
+		if (thread_za_enabled(&current->thread)) {
+			/* ZA state managment is not implemented yet */
+			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
+			return;
+		}
+
+		/* If we are in streaming mode override regular SVE. */
+		if (thread_sm_enabled(&current->thread)) {
+			save_sve_regs = true;
+			save_ffr = false;
+			vl = last->sme_vl;
+		}
+	}
+
+	if (IS_ENABLED(CONFIG_ARM64_SVE) && save_sve_regs) {
+		/* Get the configured VL from RDVL, will account for SM */
+		if (WARN_ON(sve_get_vl() != vl)) {
 			/*
 			 * Can't save the user regs, so current would
 			 * re-enter user with corrupt state.
@@ -400,8 +461,8 @@ static void fpsimd_save(void)
 		}
 
 		sve_save_state((char *)last->sve_state +
-					sve_ffr_offset(last->sve_vl),
-			       &last->st->fpsr, true);
+					sve_ffr_offset(vl),
+			       &last->st->fpsr, save_ffr);
 	} else {
 		fpsimd_save_state(last->st);
 	}
@@ -609,7 +670,14 @@ static void sve_to_fpsimd(struct task_struct *task)
  */
 size_t sve_state_size(struct task_struct const *task)
 {
-	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(task_get_sve_vl(task)));
+	unsigned int vl = 0;
+
+	if (system_supports_sve())
+		vl = task_get_sve_vl(task);
+	if (system_supports_sme())
+		vl = max(vl, task_get_sme_vl(task));
+
+	return SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl));
 }
 
 /*
@@ -738,7 +806,8 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	}
 
 	fpsimd_flush_task_state(task);
-	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+	if (test_and_clear_tsk_thread_flag(task, TIF_SVE) ||
+	    thread_sm_enabled(&task->thread))
 		sve_to_fpsimd(task);
 
 	if (system_supports_sme() && type == ARM64_VEC_SME)
@@ -1362,6 +1431,9 @@ void fpsimd_flush_thread(void)
 		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
 	}
 
+	if (system_supports_sme())
+		fpsimd_flush_thread_vl(ARM64_VEC_SME);
+
 	put_cpu_fpsimd_context();
 }
 
@@ -1405,6 +1477,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->st = &current->thread.uw.fpsimd_state;
 	last->sve_state = current->thread.sve_state;
 	last->sve_vl = task_get_sve_vl(current);
+	last->sme_vl = task_get_sme_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	if (system_supports_sve()) {
@@ -1419,7 +1492,7 @@ static void fpsimd_bind_task_to_cpu(void)
 }
 
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
-			      unsigned int sve_vl)
+			      unsigned int sve_vl, unsigned int sme_vl)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1430,6 +1503,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 	last->st = st;
 	last->sve_state = sve_state;
 	last->sve_vl = sve_vl;
+	last->sme_vl = sme_vl;
 }
 
 /*
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 5621020b28de..d96871002081 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -99,7 +99,8 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
 		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
 					 vcpu->arch.sve_state,
-					 vcpu->arch.sve_max_vl);
+					 vcpu->arch.sve_max_vl,
+					 0);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 23/38] arm64/sme: Implement ZA context switching
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Allocate space for storing ZA on first access to SME and use that to save
and restore ZA state when context switching. We do this by using the vector
form of the LDR and STR ZA instructions, these do not require streaming
mode and have implementation recommendations that they avoid contention
issues in shared SMCU implementations.

Since ZA is architecturally guaranteed to be zeroed when enabled we do not
need to explicitly zero ZA, either we will be restoring from a saved copy
or trapping on first use of SME so we know that ZA must be disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  5 ++++-
 arch/arm64/include/asm/fpsimdmacros.h | 22 ++++++++++++++++++++++
 arch/arm64/include/asm/processor.h    |  1 +
 arch/arm64/kernel/entry-fpsimd.S      | 22 ++++++++++++++++++++++
 arch/arm64/kernel/fpsimd.c            | 16 ++++++++++------
 arch/arm64/kvm/fpsimd.c               |  2 +-
 6 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 43737ca91f1a..45f7153067bb 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -47,7 +47,7 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
-				     unsigned int sme_vl);
+				     void *za_state, unsigned int sme_vl);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
@@ -90,6 +90,8 @@ extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
 extern void sme_set_vq(unsigned long vq_minus_1);
+extern void sme_save_state(void *state, unsigned int vq_minus_1);
+extern void sme_load_state(void const *state, unsigned int vq_minus_1);
 
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
@@ -119,6 +121,7 @@ static inline unsigned int __bit_to_vq(unsigned int bit)
 extern size_t sve_state_size(struct task_struct const *task);
 
 extern void sve_alloc(struct task_struct *task);
+extern void sme_alloc(struct task_struct *task);
 extern void fpsimd_release_task(struct task_struct *task);
 extern void fpsimd_sync_to_sve(struct task_struct *task);
 extern void sve_sync_to_fpsimd(struct task_struct *task);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index c86fc2fc72e9..146f906e9a86 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -309,3 +309,25 @@
 		ldr		w\nxtmp, [\xpfpsr, #4]
 		msr		fpcr, x\nxtmp
 .endm
+
+.macro sme_save_za nxbase, xvl, nw
+	mov	w\nw, #0
+
+423:
+	_sme_str_zav \nw, \nxbase
+	add	x\nxbase, x\nxbase, \xvl
+	add	x\nw, x\nw, #1
+	cmp	\xvl, x\nw
+	bne	423b
+.endm
+
+.macro sme_load_za nxbase, xvl, nw
+	mov	w\nw, #0
+
+423:
+	_sme_ldr_zav \nw, \nxbase
+	add	x\nxbase, x\nxbase, \xvl
+	add	x\nw, x\nw, #1
+	cmp	\xvl, x\nw
+	bne	423b
+.endm
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 338cb03811bd..e4688a58f365 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -153,6 +153,7 @@ struct thread_struct {
 
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
+	void			*za_state;	/* ZA register, if any */
 	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
 	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
 	unsigned long		fault_address;	/* fault info */
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 55eb45b3faa9..8ee5f32a81fd 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -94,4 +94,26 @@ SYM_FUNC_START(sme_set_vq)
 	ret
 SYM_FUNC_END(sme_set_vq)
 
+/*
+ * Save the SME state
+ *
+ * x0 - pointer to buffer for state
+ * x1 - Bytes per vector
+ */
+SYM_FUNC_START(sme_save_state)
+	sme_save_za 0, x1, 12
+	ret
+SYM_FUNC_END(sme_save_state)
+
+/*
+ * Load the SME state
+ *
+ * x0 - pointer to buffer for state
+ * x1 - bytes per vector
+ */
+SYM_FUNC_START(sme_load_state)
+	sme_load_za 0, x1, 12
+	ret
+SYM_FUNC_END(sme_load_state)
+
 #endif /* CONFIG_ARM64_SME */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 89b94e8c81fd..4ef95690b30e 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -117,6 +117,7 @@
 struct fpsimd_last_state_struct {
 	struct user_fpsimd_state *st;
 	void *sve_state;
+	void *za_state;
 	unsigned int sve_vl;
 	unsigned int sme_vl;
 };
@@ -387,6 +388,9 @@ static void task_fpsimd_load(void)
 
 		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
 
+		if (thread_za_enabled(&current->thread))
+			sme_load_state(current->thread.za_state, sme_vl);
+
 		if (thread_sm_enabled(&current->thread)) {
 			restore_sve_regs = true;
 			restore_ffr = false;
@@ -434,11 +438,8 @@ static void fpsimd_save(void)
 					      SYS_SVCR_EL0_SM_MASK)))
 			clear_thread_flag(TIF_SME);
 
-		if (thread_za_enabled(&current->thread)) {
-			/* ZA state managment is not implemented yet */
-			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
-			return;
-		}
+		if (thread_za_enabled(&current->thread))
+			sme_save_state(last->za_state, last->sme_vl);
 
 		/* If we are in streaming mode override regular SVE. */
 		if (thread_sm_enabled(&current->thread)) {
@@ -1476,6 +1477,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	WARN_ON(!system_supports_fpsimd());
 	last->st = &current->thread.uw.fpsimd_state;
 	last->sve_state = current->thread.sve_state;
+	last->za_state = current->thread.za_state;
 	last->sve_vl = task_get_sve_vl(current);
 	last->sme_vl = task_get_sme_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
@@ -1492,7 +1494,8 @@ static void fpsimd_bind_task_to_cpu(void)
 }
 
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
-			      unsigned int sve_vl, unsigned int sme_vl)
+			      unsigned int sve_vl, void *za_state,
+			      unsigned int sme_vl)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1502,6 +1505,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 
 	last->st = st;
 	last->sve_state = sve_state;
+	last->za_state = za_state;
 	last->sve_vl = sve_vl;
 	last->sme_vl = sme_vl;
 }
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index d96871002081..007b2e8b9ae9 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -100,7 +100,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
 					 vcpu->arch.sve_state,
 					 vcpu->arch.sve_max_vl,
-					 0);
+					 NULL, 0);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 23/38] arm64/sme: Implement ZA context switching
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Allocate space for storing ZA on first access to SME and use that to save
and restore ZA state when context switching. We do this by using the vector
form of the LDR and STR ZA instructions, these do not require streaming
mode and have implementation recommendations that they avoid contention
issues in shared SMCU implementations.

Since ZA is architecturally guaranteed to be zeroed when enabled we do not
need to explicitly zero ZA, either we will be restoring from a saved copy
or trapping on first use of SME so we know that ZA must be disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       |  5 ++++-
 arch/arm64/include/asm/fpsimdmacros.h | 22 ++++++++++++++++++++++
 arch/arm64/include/asm/processor.h    |  1 +
 arch/arm64/kernel/entry-fpsimd.S      | 22 ++++++++++++++++++++++
 arch/arm64/kernel/fpsimd.c            | 16 ++++++++++------
 arch/arm64/kvm/fpsimd.c               |  2 +-
 6 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 43737ca91f1a..45f7153067bb 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -47,7 +47,7 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
 extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
 				     void *sve_state, unsigned int sve_vl,
-				     unsigned int sme_vl);
+				     void *za_state, unsigned int sme_vl);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 extern void fpsimd_save_and_flush_cpu_state(void);
@@ -90,6 +90,8 @@ extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 extern void sve_set_vq(unsigned long vq_minus_1);
 extern void sme_set_vq(unsigned long vq_minus_1);
+extern void sme_save_state(void *state, unsigned int vq_minus_1);
+extern void sme_load_state(void const *state, unsigned int vq_minus_1);
 
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
@@ -119,6 +121,7 @@ static inline unsigned int __bit_to_vq(unsigned int bit)
 extern size_t sve_state_size(struct task_struct const *task);
 
 extern void sve_alloc(struct task_struct *task);
+extern void sme_alloc(struct task_struct *task);
 extern void fpsimd_release_task(struct task_struct *task);
 extern void fpsimd_sync_to_sve(struct task_struct *task);
 extern void sve_sync_to_fpsimd(struct task_struct *task);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index c86fc2fc72e9..146f906e9a86 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -309,3 +309,25 @@
 		ldr		w\nxtmp, [\xpfpsr, #4]
 		msr		fpcr, x\nxtmp
 .endm
+
+.macro sme_save_za nxbase, xvl, nw
+	mov	w\nw, #0
+
+423:
+	_sme_str_zav \nw, \nxbase
+	add	x\nxbase, x\nxbase, \xvl
+	add	x\nw, x\nw, #1
+	cmp	\xvl, x\nw
+	bne	423b
+.endm
+
+.macro sme_load_za nxbase, xvl, nw
+	mov	w\nw, #0
+
+423:
+	_sme_ldr_zav \nw, \nxbase
+	add	x\nxbase, x\nxbase, \xvl
+	add	x\nw, x\nw, #1
+	cmp	\xvl, x\nw
+	bne	423b
+.endm
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 338cb03811bd..e4688a58f365 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -153,6 +153,7 @@ struct thread_struct {
 
 	unsigned int		fpsimd_cpu;
 	void			*sve_state;	/* SVE registers, if any */
+	void			*za_state;	/* ZA register, if any */
 	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
 	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
 	unsigned long		fault_address;	/* fault info */
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 55eb45b3faa9..8ee5f32a81fd 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -94,4 +94,26 @@ SYM_FUNC_START(sme_set_vq)
 	ret
 SYM_FUNC_END(sme_set_vq)
 
+/*
+ * Save the SME state
+ *
+ * x0 - pointer to buffer for state
+ * x1 - Bytes per vector
+ */
+SYM_FUNC_START(sme_save_state)
+	sme_save_za 0, x1, 12
+	ret
+SYM_FUNC_END(sme_save_state)
+
+/*
+ * Load the SME state
+ *
+ * x0 - pointer to buffer for state
+ * x1 - bytes per vector
+ */
+SYM_FUNC_START(sme_load_state)
+	sme_load_za 0, x1, 12
+	ret
+SYM_FUNC_END(sme_load_state)
+
 #endif /* CONFIG_ARM64_SME */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 89b94e8c81fd..4ef95690b30e 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -117,6 +117,7 @@
 struct fpsimd_last_state_struct {
 	struct user_fpsimd_state *st;
 	void *sve_state;
+	void *za_state;
 	unsigned int sve_vl;
 	unsigned int sme_vl;
 };
@@ -387,6 +388,9 @@ static void task_fpsimd_load(void)
 
 		write_sysreg_s(current->thread.svcr, SYS_SVCR_EL0);
 
+		if (thread_za_enabled(&current->thread))
+			sme_load_state(current->thread.za_state, sme_vl);
+
 		if (thread_sm_enabled(&current->thread)) {
 			restore_sve_regs = true;
 			restore_ffr = false;
@@ -434,11 +438,8 @@ static void fpsimd_save(void)
 					      SYS_SVCR_EL0_SM_MASK)))
 			clear_thread_flag(TIF_SME);
 
-		if (thread_za_enabled(&current->thread)) {
-			/* ZA state managment is not implemented yet */
-			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
-			return;
-		}
+		if (thread_za_enabled(&current->thread))
+			sme_save_state(last->za_state, last->sme_vl);
 
 		/* If we are in streaming mode override regular SVE. */
 		if (thread_sm_enabled(&current->thread)) {
@@ -1476,6 +1477,7 @@ static void fpsimd_bind_task_to_cpu(void)
 	WARN_ON(!system_supports_fpsimd());
 	last->st = &current->thread.uw.fpsimd_state;
 	last->sve_state = current->thread.sve_state;
+	last->za_state = current->thread.za_state;
 	last->sve_vl = task_get_sve_vl(current);
 	last->sme_vl = task_get_sme_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
@@ -1492,7 +1494,8 @@ static void fpsimd_bind_task_to_cpu(void)
 }
 
 void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
-			      unsigned int sve_vl, unsigned int sme_vl)
+			      unsigned int sve_vl, void *za_state,
+			      unsigned int sme_vl)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1502,6 +1505,7 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
 
 	last->st = st;
 	last->sve_state = sve_state;
+	last->za_state = za_state;
 	last->sve_vl = sve_vl;
 	last->sme_vl = sme_vl;
 }
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index d96871002081..007b2e8b9ae9 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -100,7 +100,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
 					 vcpu->arch.sve_state,
 					 vcpu->arch.sve_max_vl,
-					 0);
+					 NULL, 0);
 
 		clear_thread_flag(TIF_FOREIGN_FPSTATE);
 		update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 24/38] arm64/sme: Implement traps and syscall handling for SME
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

By default all SME operations in userspace will trap.  When this happens
we allocate storage space for the SME register state, set up the SVE
registers and disable traps.  We do not need to initialize ZA since the
architecture guarantees that it will be zeroed when enabled and when we
trap ZA is disabled.

On syscall we exit streaming mode if we were previously in it and ensure
that all but the lower 128 bits of the registers are zeroed while
preserving the state of ZA. This follows the aarch64 PCS for SME, ZA
state is preserved over a function call and streaming mode is exited.
Since the traps for SME do not distinguish between streaming mode SVE
and ZA usage if ZA is in use rather than reenabling traps we instead
zero the parts of the SVE registers not shared with FPSIMD and leave SME
enabled, this simplifies handling SME traps. If ZA is not in use then we
reenable SME traps and fall through to normal handling of SVE.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/exception.h |   1 +
 arch/arm64/include/asm/fpsimd.h    |  10 ++
 arch/arm64/kernel/entry-common.c   |  10 ++
 arch/arm64/kernel/fpsimd.c         | 165 +++++++++++++++++++++++++----
 arch/arm64/kernel/process.c        |  12 ++-
 arch/arm64/kernel/syscall.c        |  49 +++++++--
 6 files changed, 218 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index 339477dca551..2add7f33b7c2 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -64,6 +64,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr,
 			struct pt_regs *regs);
 void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs);
 void do_sve_acc(unsigned int esr, struct pt_regs *regs);
+void do_sme_acc(unsigned int esr, struct pt_regs *regs);
 void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs);
 void do_sysinstr(unsigned int esr, struct pt_regs *regs);
 void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs);
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 45f7153067bb..b087f9868e7e 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -154,6 +154,16 @@ static inline void sve_user_enable(void)
 			write_sysreg_s(__new, (reg));	\
 	} while (0)
 
+static inline void sme_user_disable(void)
+{
+	sysreg_clear_set(cpacr_el1, CPACR_EL1_SMEN_EL0EN, 0);
+}
+
+static inline void sme_user_enable(void)
+{
+	sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_SMEN_EL0EN);
+}
+
 static inline void sme_set_svcr(u64 val)
 {
 	sysreg_clear_set_s(SYS_SVCR_EL0, SYS_SVCR_EL0_ZA_MASK |
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 32f9796c4ffe..ed34b783244f 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -516,6 +516,13 @@ static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr)
 	exit_to_user_mode(regs);
 }
 
+static void noinstr el0_sme_acc(struct pt_regs *regs, unsigned long esr)
+{
+	enter_from_user_mode(regs);
+	local_daif_restore(DAIF_PROCCTX);
+	do_sme_acc(esr, regs);
+}
+
 static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr)
 {
 	enter_from_user_mode(regs);
@@ -624,6 +631,9 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 	case ESR_ELx_EC_SVE:
 		el0_sve_acc(regs, esr);
 		break;
+	case ESR_ELx_EC_SME:
+		el0_sme_acc(regs, esr);
+		break;
 	case ESR_ELx_EC_FP_EXC64:
 		el0_fpsimd_exc(regs, esr);
 		break;
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 4ef95690b30e..e66ddb966192 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -204,6 +204,12 @@ static void set_sme_default_vl(int val)
 	set_default_vl(ARM64_VEC_SME, val);
 }
 
+static void sme_free(struct task_struct *);
+
+#else
+
+static inline void sme_free(struct task_struct *t) { }
+
 #endif
 
 DEFINE_PER_CPU(bool, fpsimd_context_busy);
@@ -811,18 +817,22 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	    thread_sm_enabled(&task->thread))
 		sve_to_fpsimd(task);
 
-	if (system_supports_sme() && type == ARM64_VEC_SME)
+	if (system_supports_sme() && type == ARM64_VEC_SME) {
 		task->thread.svcr &= ~(SYS_SVCR_EL0_SM_MASK |
 				       SYS_SVCR_EL0_ZA_MASK);
+		clear_thread_flag(TIF_SME);
+	}
 
 	if (task == current)
 		put_cpu_fpsimd_context();
 
 	/*
-	 * Force reallocation of task SVE state to the correct size
-	 * on next use:
+	 * Force reallocation of task SVE and SME state to the correct
+	 * size on next use:
 	 */
 	sve_free(task);
+	if (system_supports_sme() && type == ARM64_VEC_SME)
+		sme_free(task);
 
 	task_set_vl(task, type, vl);
 
@@ -1165,12 +1175,55 @@ void __init sve_setup(void)
 void fpsimd_release_task(struct task_struct *dead_task)
 {
 	__sve_free(dead_task);
+	sme_free(dead_task);
 }
 
 #endif /* CONFIG_ARM64_SVE */
 
 #ifdef CONFIG_ARM64_SME
 
+/* This will move to uapi/asm/sigcontext.h when signals are implemented */
+#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
+
+/*
+ * Return how many bytes of memory are required to store the full SME
+ * specific state (currently just ZA) for task, given task's currently
+ * configured vector length.
+ */
+size_t za_state_size(struct task_struct const *task)
+{
+	unsigned int vl = task_get_sme_vl(task);
+
+	return ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));
+}
+
+/*
+ * Ensure that task->thread.za_state is allocated and sufficiently large.
+ *
+ * This function should be used only in preparation for replacing
+ * task->thread.za_state with new data.  The memory is always zeroed
+ * here to prevent stale data from showing through: this is done in
+ * the interest of testability and predictability, the architecture
+ * guarantees that when ZA is enabled it will be zeroed.
+ */
+void sme_alloc(struct task_struct *task)
+{
+	if (task->thread.za_state) {
+		memset(task->thread.za_state, 0, za_state_size(task));
+		return;
+	}
+
+	/* This could potentially be up to 64K. */
+	task->thread.za_state =
+		kzalloc(za_state_size(task), GFP_KERNEL);
+}
+
+static void sme_free(struct task_struct *task)
+{
+	kfree(task->thread.za_state);
+	task->thread.za_state = NULL;
+}
+
 void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 {
 	/* Set priority for all PEs to architecturally defined minimum */
@@ -1269,6 +1322,26 @@ void __init sme_setup(void)
 
 #endif /* CONFIG_ARM64_SME */
 
+static void sve_init_regs(void)
+{
+	/*
+	 * Convert the FPSIMD state to SVE, zeroing all the state that
+	 * is not shared with FPSIMD. If (as is likely) the current
+	 * state is live in the registers then do this there and
+	 * update our metadata for the current task including
+	 * disabling the trap, otherwise update our in-memory copy.
+	 */
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+               unsigned long vq_minus_one =
+                       sve_vq_from_vl(task_get_sve_vl(current)) - 1;
+		sve_set_vq(vq_minus_one);
+		sve_flush_live(true, vq_minus_one);
+		fpsimd_bind_task_to_cpu();
+	} else {
+		fpsimd_to_sve(current);
+	}
+}
+
 /*
  * Trapped SVE access
  *
@@ -1299,23 +1372,68 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	if (test_and_set_thread_flag(TIF_SVE))
 		WARN_ON(1); /* SVE access shouldn't have trapped */
 
-	/*
-	 * Convert the FPSIMD state to SVE, zeroing all the state that
-	 * is not shared with FPSIMD. If (as is likely) the current
-	 * state is live in the registers then do this there and
-	 * update our metadata for the current task including
-	 * disabling the trap, otherwise update our in-memory copy.
-	 */
+	sve_init_regs();
+
+	put_cpu_fpsimd_context();
+}
+
+/*
+ * Trapped SME access
+ *
+ * Storage is allocated for the full SVE and SME state, the current
+ * FPSIMD register contents are migrated to SVE if SVE is not already
+ * active, and the access trap is disabled.
+ *
+ * TIF_SME should be clear on entry: otherwise, fpsimd_restore_current_state()
+ * would have disabled the SME access trap for userspace during
+ * ret_to_user, making an SVE access trap impossible in that case.
+ */
+void do_sme_acc(unsigned int esr, struct pt_regs *regs)
+{
+	/* Even if we chose not to use SME, the hardware could still trap: */
+	if (unlikely(!system_supports_sme()) || WARN_ON(is_compat_task())) {
+		force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0);
+		return;
+	}
+
+	sve_alloc(current);
+	sme_alloc(current);
+	if (!current->thread.sve_state || !current->thread.za_state) {
+		force_sig(SIGKILL);
+		return;
+	}
+
+	get_cpu_fpsimd_context();
+
+	/* With TIF_SME userspace shouldn't generate any traps */
+	if (test_and_set_thread_flag(TIF_SME))
+		WARN_ON(1);
+
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		unsigned long vq_minus_one =
-			sve_vq_from_vl(task_get_sve_vl(current)) - 1;
-		sve_set_vq(vq_minus_one);
-		sve_flush_live(true, vq_minus_one);
+			sve_vq_from_vl(task_get_sme_vl(current)) - 1;
+		sme_set_vq(vq_minus_one);
+
+		/*
+		 * SME was not enabled for userspace so SM and ZA must
+		 * be off
+		 */
+		sme_set_svcr(0);
+
 		fpsimd_bind_task_to_cpu();
-	} else {
-		fpsimd_to_sve(current);
 	}
 
+	/*
+	 * If SVE was not already active initialise the SVE registers,
+	 * any non-shared state between the streaming and regular SVE
+	 * registers is architecturally guaranteed to be zeroed when
+	 * we enter streaming mode.  We do not need to initialize ZA
+	 * since ZA must be disabled at this point and enabling ZA is
+	 * architecturally defined to zero ZA.
+	 */
+	if (system_supports_sve() && !test_thread_flag(TIF_SVE))
+		sve_init_regs();
+
 	put_cpu_fpsimd_context();
 }
 
@@ -1432,8 +1550,11 @@ void fpsimd_flush_thread(void)
 		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
 	}
 
-	if (system_supports_sme())
+	if (system_supports_sme()) {
+		clear_thread_flag(TIF_SME);
+		sme_free(current);
 		fpsimd_flush_thread_vl(ARM64_VEC_SME);
+	}
 
 	put_cpu_fpsimd_context();
 }
@@ -1482,14 +1603,22 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->sme_vl = task_get_sme_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
+	/*
+	 * Toggle SVE and SME trapping for userspace if needed, these
+	 * are serialsied by ret_to_user()
+	 */
 	if (system_supports_sve()) {
-		/* Toggle SVE trapping for userspace if needed */
 		if (test_thread_flag(TIF_SVE))
 			sve_user_enable();
 		else
 			sve_user_disable();
+	}
 
-		/* Serialised by exception return to user */
+	if (system_supports_sme()) {
+		if (test_thread_flag(TIF_SME))
+			sme_user_enable();
+		else
+			sme_user_disable();
 	}
 }
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index c5544b669a58..fa8d4cbce132 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -296,17 +296,19 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	BUILD_BUG_ON(!IS_ENABLED(CONFIG_THREAD_INFO_IN_TASK));
 
 	/*
-	 * Detach src's sve_state (if any) from dst so that it does not
-	 * get erroneously used or freed prematurely.  dst's sve_state
-	 * will be allocated on demand later on if dst uses SVE.
-	 * For consistency, also clear TIF_SVE here: this could be done
+	 * Detach src's sve/za_state (if any) from dst so that it does not
+	 * get erroneously used or freed prematurely.  dst's copies
+	 * will be allocated on demand later on if dst uses SVE/SME.
+	 * For consistency, also clear TIF_SVE/SME here: this could be done
 	 * later in copy_process(), but to avoid tripping up future
-	 * maintainers it is best not to leave TIF_SVE and sve_state in
+	 * maintainers it is best not to leave TIF flags and buffers in
 	 * an inconsistent state, even temporarily.
 	 */
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
 
+	dst->thread.za_state = NULL;
+	clear_tsk_thread_flag(dst, TIF_SME);
 	dst->thread.svcr = 0;
 
 	/* clear any pending asynchronous tag fault raised by the parent */
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 50a0f1a38e84..a91fa36d3fe9 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -158,26 +158,63 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	syscall_trace_exit(regs);
 }
 
-static inline void sve_user_discard(void)
+/*
+ * As per the ABI exit SME streaming mode and clear the SVE state not
+ * shared with FPSIMD on syscall entry.
+ */
+static inline void fp_user_discard(void)
 {
+	/*
+	 * If SME is active then:
+	 *  - Exit streaming mode if we were in it.
+	 *  - If ZA is not in use disable SME and fall through to disable
+	 *    SVE too.
+	 *  - If ZA is in use flush the non-shared SVE state but leave
+	 *    both SVE and SME active.
+	 */
+	if (system_supports_sme()) {
+		u64 svcr = read_sysreg_s(SYS_SVCR_EL0);
+
+		if (svcr & ~SYS_SVCR_EL0_SM_MASK) {
+			svcr &= ~SYS_SVCR_EL0_SM_MASK;
+			write_sysreg_s(svcr, SYS_SVCR_EL0);
+		}
+
+		if (svcr & SYS_SVCR_EL0_ZA_MASK) {
+			unsigned long sve_vq_minus_one =
+				sve_vq_from_vl(task_get_sve_vl(current)) - 1;
+			sve_flush_live(false, sve_vq_minus_one);
+			return;
+		} else {
+			clear_thread_flag(TIF_SME);
+			sme_user_disable();
+		}
+	}
+
 	if (!system_supports_sve())
 		return;
 
+	/*
+	 * If SME is not active then disable SVE, the registers will
+	 * be cleared when userspace next attempts to access them and
+	 * we do not need to track the SVE register state until then.
+	 */
 	clear_thread_flag(TIF_SVE);
 
 	/*
 	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
+	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which
+	 * only happens if a context switch or kernel_neon_begin() or
+	 * context modification (sigreturn, ptrace) intervenes.  So,
+	 * ensure that CPACR_EL1 is already correct for the fast-path
+	 * case.
 	 */
 	sve_user_disable();
 }
 
 void do_el0_svc(struct pt_regs *regs)
 {
-	sve_user_discard();
+	fp_user_discard();
 	el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 24/38] arm64/sme: Implement traps and syscall handling for SME
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

By default all SME operations in userspace will trap.  When this happens
we allocate storage space for the SME register state, set up the SVE
registers and disable traps.  We do not need to initialize ZA since the
architecture guarantees that it will be zeroed when enabled and when we
trap ZA is disabled.

On syscall we exit streaming mode if we were previously in it and ensure
that all but the lower 128 bits of the registers are zeroed while
preserving the state of ZA. This follows the aarch64 PCS for SME, ZA
state is preserved over a function call and streaming mode is exited.
Since the traps for SME do not distinguish between streaming mode SVE
and ZA usage if ZA is in use rather than reenabling traps we instead
zero the parts of the SVE registers not shared with FPSIMD and leave SME
enabled, this simplifies handling SME traps. If ZA is not in use then we
reenable SME traps and fall through to normal handling of SVE.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/exception.h |   1 +
 arch/arm64/include/asm/fpsimd.h    |  10 ++
 arch/arm64/kernel/entry-common.c   |  10 ++
 arch/arm64/kernel/fpsimd.c         | 165 +++++++++++++++++++++++++----
 arch/arm64/kernel/process.c        |  12 ++-
 arch/arm64/kernel/syscall.c        |  49 +++++++--
 6 files changed, 218 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index 339477dca551..2add7f33b7c2 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -64,6 +64,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr,
 			struct pt_regs *regs);
 void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs);
 void do_sve_acc(unsigned int esr, struct pt_regs *regs);
+void do_sme_acc(unsigned int esr, struct pt_regs *regs);
 void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs);
 void do_sysinstr(unsigned int esr, struct pt_regs *regs);
 void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs);
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 45f7153067bb..b087f9868e7e 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -154,6 +154,16 @@ static inline void sve_user_enable(void)
 			write_sysreg_s(__new, (reg));	\
 	} while (0)
 
+static inline void sme_user_disable(void)
+{
+	sysreg_clear_set(cpacr_el1, CPACR_EL1_SMEN_EL0EN, 0);
+}
+
+static inline void sme_user_enable(void)
+{
+	sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_SMEN_EL0EN);
+}
+
 static inline void sme_set_svcr(u64 val)
 {
 	sysreg_clear_set_s(SYS_SVCR_EL0, SYS_SVCR_EL0_ZA_MASK |
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 32f9796c4ffe..ed34b783244f 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -516,6 +516,13 @@ static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr)
 	exit_to_user_mode(regs);
 }
 
+static void noinstr el0_sme_acc(struct pt_regs *regs, unsigned long esr)
+{
+	enter_from_user_mode(regs);
+	local_daif_restore(DAIF_PROCCTX);
+	do_sme_acc(esr, regs);
+}
+
 static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr)
 {
 	enter_from_user_mode(regs);
@@ -624,6 +631,9 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 	case ESR_ELx_EC_SVE:
 		el0_sve_acc(regs, esr);
 		break;
+	case ESR_ELx_EC_SME:
+		el0_sme_acc(regs, esr);
+		break;
 	case ESR_ELx_EC_FP_EXC64:
 		el0_fpsimd_exc(regs, esr);
 		break;
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 4ef95690b30e..e66ddb966192 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -204,6 +204,12 @@ static void set_sme_default_vl(int val)
 	set_default_vl(ARM64_VEC_SME, val);
 }
 
+static void sme_free(struct task_struct *);
+
+#else
+
+static inline void sme_free(struct task_struct *t) { }
+
 #endif
 
 DEFINE_PER_CPU(bool, fpsimd_context_busy);
@@ -811,18 +817,22 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	    thread_sm_enabled(&task->thread))
 		sve_to_fpsimd(task);
 
-	if (system_supports_sme() && type == ARM64_VEC_SME)
+	if (system_supports_sme() && type == ARM64_VEC_SME) {
 		task->thread.svcr &= ~(SYS_SVCR_EL0_SM_MASK |
 				       SYS_SVCR_EL0_ZA_MASK);
+		clear_thread_flag(TIF_SME);
+	}
 
 	if (task == current)
 		put_cpu_fpsimd_context();
 
 	/*
-	 * Force reallocation of task SVE state to the correct size
-	 * on next use:
+	 * Force reallocation of task SVE and SME state to the correct
+	 * size on next use:
 	 */
 	sve_free(task);
+	if (system_supports_sme() && type == ARM64_VEC_SME)
+		sme_free(task);
 
 	task_set_vl(task, type, vl);
 
@@ -1165,12 +1175,55 @@ void __init sve_setup(void)
 void fpsimd_release_task(struct task_struct *dead_task)
 {
 	__sve_free(dead_task);
+	sme_free(dead_task);
 }
 
 #endif /* CONFIG_ARM64_SVE */
 
 #ifdef CONFIG_ARM64_SME
 
+/* This will move to uapi/asm/sigcontext.h when signals are implemented */
+#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
+
+/*
+ * Return how many bytes of memory are required to store the full SME
+ * specific state (currently just ZA) for task, given task's currently
+ * configured vector length.
+ */
+size_t za_state_size(struct task_struct const *task)
+{
+	unsigned int vl = task_get_sme_vl(task);
+
+	return ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));
+}
+
+/*
+ * Ensure that task->thread.za_state is allocated and sufficiently large.
+ *
+ * This function should be used only in preparation for replacing
+ * task->thread.za_state with new data.  The memory is always zeroed
+ * here to prevent stale data from showing through: this is done in
+ * the interest of testability and predictability, the architecture
+ * guarantees that when ZA is enabled it will be zeroed.
+ */
+void sme_alloc(struct task_struct *task)
+{
+	if (task->thread.za_state) {
+		memset(task->thread.za_state, 0, za_state_size(task));
+		return;
+	}
+
+	/* This could potentially be up to 64K. */
+	task->thread.za_state =
+		kzalloc(za_state_size(task), GFP_KERNEL);
+}
+
+static void sme_free(struct task_struct *task)
+{
+	kfree(task->thread.za_state);
+	task->thread.za_state = NULL;
+}
+
 void sme_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 {
 	/* Set priority for all PEs to architecturally defined minimum */
@@ -1269,6 +1322,26 @@ void __init sme_setup(void)
 
 #endif /* CONFIG_ARM64_SME */
 
+static void sve_init_regs(void)
+{
+	/*
+	 * Convert the FPSIMD state to SVE, zeroing all the state that
+	 * is not shared with FPSIMD. If (as is likely) the current
+	 * state is live in the registers then do this there and
+	 * update our metadata for the current task including
+	 * disabling the trap, otherwise update our in-memory copy.
+	 */
+	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+               unsigned long vq_minus_one =
+                       sve_vq_from_vl(task_get_sve_vl(current)) - 1;
+		sve_set_vq(vq_minus_one);
+		sve_flush_live(true, vq_minus_one);
+		fpsimd_bind_task_to_cpu();
+	} else {
+		fpsimd_to_sve(current);
+	}
+}
+
 /*
  * Trapped SVE access
  *
@@ -1299,23 +1372,68 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	if (test_and_set_thread_flag(TIF_SVE))
 		WARN_ON(1); /* SVE access shouldn't have trapped */
 
-	/*
-	 * Convert the FPSIMD state to SVE, zeroing all the state that
-	 * is not shared with FPSIMD. If (as is likely) the current
-	 * state is live in the registers then do this there and
-	 * update our metadata for the current task including
-	 * disabling the trap, otherwise update our in-memory copy.
-	 */
+	sve_init_regs();
+
+	put_cpu_fpsimd_context();
+}
+
+/*
+ * Trapped SME access
+ *
+ * Storage is allocated for the full SVE and SME state, the current
+ * FPSIMD register contents are migrated to SVE if SVE is not already
+ * active, and the access trap is disabled.
+ *
+ * TIF_SME should be clear on entry: otherwise, fpsimd_restore_current_state()
+ * would have disabled the SME access trap for userspace during
+ * ret_to_user, making an SVE access trap impossible in that case.
+ */
+void do_sme_acc(unsigned int esr, struct pt_regs *regs)
+{
+	/* Even if we chose not to use SME, the hardware could still trap: */
+	if (unlikely(!system_supports_sme()) || WARN_ON(is_compat_task())) {
+		force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0);
+		return;
+	}
+
+	sve_alloc(current);
+	sme_alloc(current);
+	if (!current->thread.sve_state || !current->thread.za_state) {
+		force_sig(SIGKILL);
+		return;
+	}
+
+	get_cpu_fpsimd_context();
+
+	/* With TIF_SME userspace shouldn't generate any traps */
+	if (test_and_set_thread_flag(TIF_SME))
+		WARN_ON(1);
+
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		unsigned long vq_minus_one =
-			sve_vq_from_vl(task_get_sve_vl(current)) - 1;
-		sve_set_vq(vq_minus_one);
-		sve_flush_live(true, vq_minus_one);
+			sve_vq_from_vl(task_get_sme_vl(current)) - 1;
+		sme_set_vq(vq_minus_one);
+
+		/*
+		 * SME was not enabled for userspace so SM and ZA must
+		 * be off
+		 */
+		sme_set_svcr(0);
+
 		fpsimd_bind_task_to_cpu();
-	} else {
-		fpsimd_to_sve(current);
 	}
 
+	/*
+	 * If SVE was not already active initialise the SVE registers,
+	 * any non-shared state between the streaming and regular SVE
+	 * registers is architecturally guaranteed to be zeroed when
+	 * we enter streaming mode.  We do not need to initialize ZA
+	 * since ZA must be disabled at this point and enabling ZA is
+	 * architecturally defined to zero ZA.
+	 */
+	if (system_supports_sve() && !test_thread_flag(TIF_SVE))
+		sve_init_regs();
+
 	put_cpu_fpsimd_context();
 }
 
@@ -1432,8 +1550,11 @@ void fpsimd_flush_thread(void)
 		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
 	}
 
-	if (system_supports_sme())
+	if (system_supports_sme()) {
+		clear_thread_flag(TIF_SME);
+		sme_free(current);
 		fpsimd_flush_thread_vl(ARM64_VEC_SME);
+	}
 
 	put_cpu_fpsimd_context();
 }
@@ -1482,14 +1603,22 @@ static void fpsimd_bind_task_to_cpu(void)
 	last->sme_vl = task_get_sme_vl(current);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
+	/*
+	 * Toggle SVE and SME trapping for userspace if needed, these
+	 * are serialsied by ret_to_user()
+	 */
 	if (system_supports_sve()) {
-		/* Toggle SVE trapping for userspace if needed */
 		if (test_thread_flag(TIF_SVE))
 			sve_user_enable();
 		else
 			sve_user_disable();
+	}
 
-		/* Serialised by exception return to user */
+	if (system_supports_sme()) {
+		if (test_thread_flag(TIF_SME))
+			sme_user_enable();
+		else
+			sme_user_disable();
 	}
 }
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index c5544b669a58..fa8d4cbce132 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -296,17 +296,19 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	BUILD_BUG_ON(!IS_ENABLED(CONFIG_THREAD_INFO_IN_TASK));
 
 	/*
-	 * Detach src's sve_state (if any) from dst so that it does not
-	 * get erroneously used or freed prematurely.  dst's sve_state
-	 * will be allocated on demand later on if dst uses SVE.
-	 * For consistency, also clear TIF_SVE here: this could be done
+	 * Detach src's sve/za_state (if any) from dst so that it does not
+	 * get erroneously used or freed prematurely.  dst's copies
+	 * will be allocated on demand later on if dst uses SVE/SME.
+	 * For consistency, also clear TIF_SVE/SME here: this could be done
 	 * later in copy_process(), but to avoid tripping up future
-	 * maintainers it is best not to leave TIF_SVE and sve_state in
+	 * maintainers it is best not to leave TIF flags and buffers in
 	 * an inconsistent state, even temporarily.
 	 */
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
 
+	dst->thread.za_state = NULL;
+	clear_tsk_thread_flag(dst, TIF_SME);
 	dst->thread.svcr = 0;
 
 	/* clear any pending asynchronous tag fault raised by the parent */
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 50a0f1a38e84..a91fa36d3fe9 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -158,26 +158,63 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	syscall_trace_exit(regs);
 }
 
-static inline void sve_user_discard(void)
+/*
+ * As per the ABI exit SME streaming mode and clear the SVE state not
+ * shared with FPSIMD on syscall entry.
+ */
+static inline void fp_user_discard(void)
 {
+	/*
+	 * If SME is active then:
+	 *  - Exit streaming mode if we were in it.
+	 *  - If ZA is not in use disable SME and fall through to disable
+	 *    SVE too.
+	 *  - If ZA is in use flush the non-shared SVE state but leave
+	 *    both SVE and SME active.
+	 */
+	if (system_supports_sme()) {
+		u64 svcr = read_sysreg_s(SYS_SVCR_EL0);
+
+		if (svcr & ~SYS_SVCR_EL0_SM_MASK) {
+			svcr &= ~SYS_SVCR_EL0_SM_MASK;
+			write_sysreg_s(svcr, SYS_SVCR_EL0);
+		}
+
+		if (svcr & SYS_SVCR_EL0_ZA_MASK) {
+			unsigned long sve_vq_minus_one =
+				sve_vq_from_vl(task_get_sve_vl(current)) - 1;
+			sve_flush_live(false, sve_vq_minus_one);
+			return;
+		} else {
+			clear_thread_flag(TIF_SME);
+			sme_user_disable();
+		}
+	}
+
 	if (!system_supports_sve())
 		return;
 
+	/*
+	 * If SME is not active then disable SVE, the registers will
+	 * be cleared when userspace next attempts to access them and
+	 * we do not need to track the SVE register state until then.
+	 */
 	clear_thread_flag(TIF_SVE);
 
 	/*
 	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
-	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
-	 * happens if a context switch or kernel_neon_begin() or context
-	 * modification (sigreturn, ptrace) intervenes.
-	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
+	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which
+	 * only happens if a context switch or kernel_neon_begin() or
+	 * context modification (sigreturn, ptrace) intervenes.  So,
+	 * ensure that CPACR_EL1 is already correct for the fast-path
+	 * case.
 	 */
 	sve_user_disable();
 }
 
 void do_el0_svc(struct pt_regs *regs)
 {
-	sve_user_discard();
+	fp_user_discard();
 	el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
 }
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 25/38] arm64/sme: Implement streaming SVE signal handling
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

When in streaming mode we have the same set of SVE registers as we do in
regular SVE mode with the exception of FFR and the use of the SME vector
length. Provide signal handling for these registers by taking one of the
reserved words in the SVE signal context as a flags field and defining a
flag with a flag which is set for streaming mode. When the flag is set the
vector length is set to the streaming mode vector length and we save and
restore streaming mode data. We support entering or leaving streaming mode
based on the value of the flag but do not support changing the vector
length, this is not currently supported SVE signal handling.

We could instead allocate a separate record in the signal frame for the
streaming mode SVE context but this inflates the size of the maximal signal
frame required and adds complication when validating signal frames from
userspace, especially given the current structure of the code.

Any implementation of support for streaming mode vectors in signals will
have some potential for causing issues for applications that attempt to
handle SVE vectors in signals, use streaming mode but do not understand
streaming mode in their signal handling code, it is hard to identify a
case that is clearly better than any other - they all have cases where
they could cause unexpected register corruption or faults.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/sigcontext.h | 16 ++++++--
 arch/arm64/kernel/signal.c               | 48 ++++++++++++++++++------
 2 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index 0c796c795dbe..3a3366d4fbc2 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -134,9 +134,12 @@ struct extra_context {
 struct sve_context {
 	struct _aarch64_ctx head;
 	__u16 vl;
-	__u16 __reserved[3];
+	__u16 flags;
+	__u16 __reserved[2];
 };
 
+#define SVE_SIG_FLAG_SM	0x1	/* Context describes streaming mode */
+
 #endif /* !__ASSEMBLY__ */
 
 #include <asm/sve_context.h>
@@ -186,9 +189,16 @@ struct sve_context {
  * sve_context.vl must equal the thread's current vector length when
  * doing a sigreturn.
  *
+ * On systems with support for SME the SVE register state may reflect either
+ * streaming or non-streaming mode.  In streaming mode the streaming mode
+ * vector length will be used and the flag SVE_SIG_FLAG_SM will be set in
+ * the flags field. It is permitted to enter or leave streaming mode in
+ * a signal return, applications should take care to ensure that any difference
+ * in vector length between the two modes is handled, including any resixing
+ * and movement of context blocks.
  *
- * Note: for all these macros, the "vq" argument denotes the SVE
- * vector length in quadwords (i.e., units of 128 bits).
+ * Note: for all these macros, the "vq" argument denotes the vector length
+ * in quadwords (i.e., units of 128 bits).
  *
  * The correct way to obtain vq is to use sve_vq_from_vl(vl).  The
  * result is valid if and only if sve_vl_valid(vl) is true.  This is
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 8f6372b44b65..fea0e1d30449 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -227,11 +227,17 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 {
 	int err = 0;
 	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
+	u16 flags = 0;
 	unsigned int vl = task_get_sve_vl(current);
 	unsigned int vq = 0;
 
-	if (test_thread_flag(TIF_SVE))
+	if (thread_sm_enabled(&current->thread)) {
+		vl = task_get_sme_vl(current);
 		vq = sve_vq_from_vl(vl);
+		flags |= SVE_SIG_FLAG_SM;
+	} else if (test_thread_flag(TIF_SVE)) {
+		vq = sve_vq_from_vl(vl);
+	}
 
 	memset(reserved, 0, sizeof(reserved));
 
@@ -239,6 +245,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 	__put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16),
 			 &ctx->head.size, err);
 	__put_user_error(vl, &ctx->vl, err);
+	__put_user_error(flags, &ctx->flags, err);
 	BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
 	err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
 
@@ -259,18 +266,28 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 static int restore_sve_fpsimd_context(struct user_ctxs *user)
 {
 	int err;
-	unsigned int vq;
+	unsigned int vl, vq;
 	struct user_fpsimd_state fpsimd;
 	struct sve_context sve;
 
 	if (__copy_from_user(&sve, user->sve, sizeof(sve)))
 		return -EFAULT;
 
-	if (sve.vl != task_get_sve_vl(current))
+	if (sve.flags & SVE_SIG_FLAG_SM) {
+		if (!system_supports_sme())
+			return -EINVAL;
+
+		vl = task_get_sme_vl(current);
+	} else {
+		vl = task_get_sve_vl(current);
+	}
+
+	if (sve.vl != vl)
 		return -EINVAL;
 
 	if (sve.head.size <= sizeof(*user->sve)) {
 		clear_thread_flag(TIF_SVE);
+		current->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK;
 		goto fpsimd_only;
 	}
 
@@ -302,7 +319,10 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	if (err)
 		return -EFAULT;
 
-	set_thread_flag(TIF_SVE);
+	if (sve.flags & SVE_SIG_FLAG_SM)
+		current->thread.svcr |= SYS_SVCR_EL0_SM_MASK;
+	else
+		set_thread_flag(TIF_SVE);
 
 fpsimd_only:
 	/* copy the FP and status/control registers */
@@ -394,7 +414,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
 			break;
 
 		case SVE_MAGIC:
-			if (!system_supports_sve())
+			if (!system_supports_sve() && !system_supports_sme())
 				goto invalid;
 
 			if (user->sve)
@@ -593,11 +613,16 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 	if (system_supports_sve()) {
 		unsigned int vq = 0;
 
-		if (add_all || test_thread_flag(TIF_SVE)) {
-			int vl = sve_max_vl();
+		if (add_all || test_thread_flag(TIF_SVE) ||
+		    thread_sm_enabled(&current->thread)) {
+			int vl = max(sve_max_vl(), sme_max_vl());
 
-			if (!add_all)
-				vl = task_get_sve_vl(current);
+			if (!add_all) {
+				if (thread_sm_enabled(&current->thread))
+					vl = task_get_sme_vl(current);
+				else
+					vl = task_get_sve_vl(current);
+			}
 
 			vq = sve_vq_from_vl(vl);
 		}
@@ -648,8 +673,9 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
 		__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
 	}
 
-	/* Scalable Vector Extension state, if present */
-	if (system_supports_sve() && err == 0 && user->sve_offset) {
+	/* Scalable Vector Extension state (including streaming), if present */
+	if ((system_supports_sve() || system_supports_sme()) &&
+	    err == 0 && user->sve_offset) {
 		struct sve_context __user *sve_ctx =
 			apply_user_offset(user, user->sve_offset);
 		err |= preserve_sve_context(sve_ctx);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 25/38] arm64/sme: Implement streaming SVE signal handling
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

When in streaming mode we have the same set of SVE registers as we do in
regular SVE mode with the exception of FFR and the use of the SME vector
length. Provide signal handling for these registers by taking one of the
reserved words in the SVE signal context as a flags field and defining a
flag with a flag which is set for streaming mode. When the flag is set the
vector length is set to the streaming mode vector length and we save and
restore streaming mode data. We support entering or leaving streaming mode
based on the value of the flag but do not support changing the vector
length, this is not currently supported SVE signal handling.

We could instead allocate a separate record in the signal frame for the
streaming mode SVE context but this inflates the size of the maximal signal
frame required and adds complication when validating signal frames from
userspace, especially given the current structure of the code.

Any implementation of support for streaming mode vectors in signals will
have some potential for causing issues for applications that attempt to
handle SVE vectors in signals, use streaming mode but do not understand
streaming mode in their signal handling code, it is hard to identify a
case that is clearly better than any other - they all have cases where
they could cause unexpected register corruption or faults.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/sigcontext.h | 16 ++++++--
 arch/arm64/kernel/signal.c               | 48 ++++++++++++++++++------
 2 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index 0c796c795dbe..3a3366d4fbc2 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -134,9 +134,12 @@ struct extra_context {
 struct sve_context {
 	struct _aarch64_ctx head;
 	__u16 vl;
-	__u16 __reserved[3];
+	__u16 flags;
+	__u16 __reserved[2];
 };
 
+#define SVE_SIG_FLAG_SM	0x1	/* Context describes streaming mode */
+
 #endif /* !__ASSEMBLY__ */
 
 #include <asm/sve_context.h>
@@ -186,9 +189,16 @@ struct sve_context {
  * sve_context.vl must equal the thread's current vector length when
  * doing a sigreturn.
  *
+ * On systems with support for SME the SVE register state may reflect either
+ * streaming or non-streaming mode.  In streaming mode the streaming mode
+ * vector length will be used and the flag SVE_SIG_FLAG_SM will be set in
+ * the flags field. It is permitted to enter or leave streaming mode in
+ * a signal return, applications should take care to ensure that any difference
+ * in vector length between the two modes is handled, including any resixing
+ * and movement of context blocks.
  *
- * Note: for all these macros, the "vq" argument denotes the SVE
- * vector length in quadwords (i.e., units of 128 bits).
+ * Note: for all these macros, the "vq" argument denotes the vector length
+ * in quadwords (i.e., units of 128 bits).
  *
  * The correct way to obtain vq is to use sve_vq_from_vl(vl).  The
  * result is valid if and only if sve_vl_valid(vl) is true.  This is
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 8f6372b44b65..fea0e1d30449 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -227,11 +227,17 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 {
 	int err = 0;
 	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
+	u16 flags = 0;
 	unsigned int vl = task_get_sve_vl(current);
 	unsigned int vq = 0;
 
-	if (test_thread_flag(TIF_SVE))
+	if (thread_sm_enabled(&current->thread)) {
+		vl = task_get_sme_vl(current);
 		vq = sve_vq_from_vl(vl);
+		flags |= SVE_SIG_FLAG_SM;
+	} else if (test_thread_flag(TIF_SVE)) {
+		vq = sve_vq_from_vl(vl);
+	}
 
 	memset(reserved, 0, sizeof(reserved));
 
@@ -239,6 +245,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 	__put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16),
 			 &ctx->head.size, err);
 	__put_user_error(vl, &ctx->vl, err);
+	__put_user_error(flags, &ctx->flags, err);
 	BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
 	err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
 
@@ -259,18 +266,28 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 static int restore_sve_fpsimd_context(struct user_ctxs *user)
 {
 	int err;
-	unsigned int vq;
+	unsigned int vl, vq;
 	struct user_fpsimd_state fpsimd;
 	struct sve_context sve;
 
 	if (__copy_from_user(&sve, user->sve, sizeof(sve)))
 		return -EFAULT;
 
-	if (sve.vl != task_get_sve_vl(current))
+	if (sve.flags & SVE_SIG_FLAG_SM) {
+		if (!system_supports_sme())
+			return -EINVAL;
+
+		vl = task_get_sme_vl(current);
+	} else {
+		vl = task_get_sve_vl(current);
+	}
+
+	if (sve.vl != vl)
 		return -EINVAL;
 
 	if (sve.head.size <= sizeof(*user->sve)) {
 		clear_thread_flag(TIF_SVE);
+		current->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK;
 		goto fpsimd_only;
 	}
 
@@ -302,7 +319,10 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
 	if (err)
 		return -EFAULT;
 
-	set_thread_flag(TIF_SVE);
+	if (sve.flags & SVE_SIG_FLAG_SM)
+		current->thread.svcr |= SYS_SVCR_EL0_SM_MASK;
+	else
+		set_thread_flag(TIF_SVE);
 
 fpsimd_only:
 	/* copy the FP and status/control registers */
@@ -394,7 +414,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
 			break;
 
 		case SVE_MAGIC:
-			if (!system_supports_sve())
+			if (!system_supports_sve() && !system_supports_sme())
 				goto invalid;
 
 			if (user->sve)
@@ -593,11 +613,16 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 	if (system_supports_sve()) {
 		unsigned int vq = 0;
 
-		if (add_all || test_thread_flag(TIF_SVE)) {
-			int vl = sve_max_vl();
+		if (add_all || test_thread_flag(TIF_SVE) ||
+		    thread_sm_enabled(&current->thread)) {
+			int vl = max(sve_max_vl(), sme_max_vl());
 
-			if (!add_all)
-				vl = task_get_sve_vl(current);
+			if (!add_all) {
+				if (thread_sm_enabled(&current->thread))
+					vl = task_get_sme_vl(current);
+				else
+					vl = task_get_sve_vl(current);
+			}
 
 			vq = sve_vq_from_vl(vl);
 		}
@@ -648,8 +673,9 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
 		__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
 	}
 
-	/* Scalable Vector Extension state, if present */
-	if (system_supports_sve() && err == 0 && user->sve_offset) {
+	/* Scalable Vector Extension state (including streaming), if present */
+	if ((system_supports_sve() || system_supports_sme()) &&
+	    err == 0 && user->sve_offset) {
 		struct sve_context __user *sve_ctx =
 			apply_user_offset(user, user->sve_offset);
 		err |= preserve_sve_context(sve_ctx);
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 26/38] arm64/sme: Implement ZA signal handling
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Implement support for ZA in signal handling in a very similar way to how
we implement support for SVE registers, using a signal context structure
with optional register state after it. Where present this register state
stores the ZA matrix as a series of horizontal vectors numbered from 0 to
VL/8 in the endinanness independent format used for vectors.

As with SVE we do not allow changes in the vector length during signal
return but we do allow ZA to be enabled or disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/sigcontext.h |  41 +++++++
 arch/arm64/kernel/fpsimd.c               |   3 -
 arch/arm64/kernel/signal.c               | 139 +++++++++++++++++++++++
 3 files changed, 180 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index 3a3366d4fbc2..d45bdf2c8b26 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -140,6 +140,14 @@ struct sve_context {
 
 #define SVE_SIG_FLAG_SM	0x1	/* Context describes streaming mode */
 
+#define ZA_MAGIC	0x54366345
+
+struct za_context {
+	struct _aarch64_ctx head;
+	__u16 vl;
+	__u16 __reserved[3];
+};
+
 #endif /* !__ASSEMBLY__ */
 
 #include <asm/sve_context.h>
@@ -259,4 +267,37 @@ struct sve_context {
 #define SVE_SIG_CONTEXT_SIZE(vq) \
 		(SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
 
+/*
+ * If the ZA register is enabled for the thread at signal delivery then,
+ * za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
+ * and the register data may be accessed using the ZA_SIG_*() macros.
+ *
+ * If za_context.head.size < ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
+ * then ZA was not enabled and no register data was included in which case
+ * ZA register was not enabled for the thread and no register data
+ * the ZA_SIG_*() macros should not be used except for this check.
+ *
+ * The same convention applies when returning from a signal: a caller
+ * will need to remove or resize the za_context block if it wants to
+ * enable the ZA register when it was previously non-live or vice-versa.
+ * This may require the caller to allocate fresh memory and/or move other
+ * context blocks in the signal frame.
+ *
+ * Changing the vector length during signal return is not permitted:
+ * za_context.vl must equal the thread's current SME vector length when
+ * doing a sigreturn.
+ */
+
+#define ZA_SIG_REGS_OFFSET					\
+	((sizeof(struct za_context) + (__SVE_VQ_BYTES - 1))	\
+		/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
+
+#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
+
+#define ZA_SIG_ZAV_OFFSET(vq, n) (ZA_SIG_REGS_OFFSET + \
+				  (SVE_SIG_ZREG_SIZE(vq) * n))
+
+#define ZA_SIG_CONTEXT_SIZE(vq) \
+		(ZA_SIG_REGS_OFFSET + ZA_SIG_REGS_SIZE(vq))
+
 #endif /* _UAPI__ASM_SIGCONTEXT_H */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e66ddb966192..e39413f43660 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1182,9 +1182,6 @@ void fpsimd_release_task(struct task_struct *dead_task)
 
 #ifdef CONFIG_ARM64_SME
 
-/* This will move to uapi/asm/sigcontext.h when signals are implemented */
-#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
-
 /*
  * Return how many bytes of memory are required to store the full SME
  * specific state (currently just ZA) for task, given task's currently
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index fea0e1d30449..c0007ddf0c06 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -57,6 +57,7 @@ struct rt_sigframe_user_layout {
 	unsigned long fpsimd_offset;
 	unsigned long esr_offset;
 	unsigned long sve_offset;
+	unsigned long za_offset;
 	unsigned long extra_offset;
 	unsigned long end_offset;
 };
@@ -219,6 +220,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
 struct user_ctxs {
 	struct fpsimd_context __user *fpsimd;
 	struct sve_context __user *sve;
+	struct za_context __user *za;
 };
 
 #ifdef CONFIG_ARM64_SVE
@@ -347,6 +349,101 @@ extern int restore_sve_fpsimd_context(struct user_ctxs *user);
 
 #endif /* ! CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+static int preserve_za_context(struct za_context __user *ctx)
+{
+	int err = 0;
+	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
+	unsigned int vl = task_get_sme_vl(current);
+	unsigned int vq;
+
+	if (thread_za_enabled(&current->thread))
+		vq = sve_vq_from_vl(vl);
+	else
+		vq = 0;
+
+	memset(reserved, 0, sizeof(reserved));
+
+	__put_user_error(ZA_MAGIC, &ctx->head.magic, err);
+	__put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16),
+			 &ctx->head.size, err);
+	__put_user_error(vl, &ctx->vl, err);
+	BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
+	err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
+
+	if (vq) {
+		/*
+		 * This assumes that the ZA state has already been saved to
+		 * the task struct by calling the function
+		 * fpsimd_signal_preserve_current_state().
+		 */
+		err |= __copy_to_user((char __user *)ctx + ZA_SIG_REGS_OFFSET,
+				      current->thread.za_state,
+				      ZA_SIG_REGS_SIZE(vq));
+	}
+
+	return err ? -EFAULT : 0;
+}
+
+static int restore_za_context(struct user_ctxs __user *user)
+{
+	int err;
+	unsigned int vq;
+	struct za_context za;
+
+	if (__copy_from_user(&za, user->za, sizeof(za)))
+		return -EFAULT;
+
+	if (za.vl != task_get_sme_vl(current))
+		return -EINVAL;
+
+	if (za.head.size <= sizeof(*user->za)) {
+		current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
+		return 0;
+	}
+
+	vq = sve_vq_from_vl(za.vl);
+
+	if (za.head.size < ZA_SIG_CONTEXT_SIZE(vq))
+		return -EINVAL;
+
+	/*
+	 * Careful: we are about __copy_from_user() directly into
+	 * thread.za_state with preemption enabled, so protection is
+	 * needed to prevent a racing context switch from writing stale
+	 * registers back over the new data.
+	 */
+
+	fpsimd_flush_task_state(current);
+	/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
+
+	sme_alloc(current);
+	if (!current->thread.za_state) {
+		current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
+		clear_thread_flag(TIF_SME);
+		return -ENOMEM;
+	}
+
+	err = __copy_from_user(current->thread.za_state,
+			       (char __user const *)user->za +
+					ZA_SIG_REGS_OFFSET,
+			       ZA_SIG_REGS_SIZE(vq));
+	if (err)
+		return -EFAULT;
+
+	set_thread_flag(TIF_SME);
+	current->thread.svcr |= SYS_SVCR_EL0_ZA_MASK;
+
+	return 0;
+}
+#else /* ! CONFIG_ARM64_SME */
+
+/* Turn any non-optimised out attempts to use these into a link error: */
+extern int preserve_za_context(void __user *ctx);
+extern int restore_za_context(struct user_ctxs *user);
+
+#endif /* ! CONFIG_ARM64_SME */
 
 static int parse_user_sigframe(struct user_ctxs *user,
 			       struct rt_sigframe __user *sf)
@@ -361,6 +458,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
 
 	user->fpsimd = NULL;
 	user->sve = NULL;
+	user->za = NULL;
 
 	if (!IS_ALIGNED((unsigned long)base, 16))
 		goto invalid;
@@ -426,6 +524,19 @@ static int parse_user_sigframe(struct user_ctxs *user,
 			user->sve = (struct sve_context __user *)head;
 			break;
 
+		case ZA_MAGIC:
+			if (!system_supports_sme())
+				goto invalid;
+
+			if (user->za)
+				goto invalid;
+
+			if (size < sizeof(*user->za))
+				goto invalid;
+
+			user->za = (struct za_context __user *)head;
+			break;
+
 		case EXTRA_MAGIC:
 			if (have_extra_context)
 				goto invalid;
@@ -549,6 +660,9 @@ static int restore_sigframe(struct pt_regs *regs,
 		}
 	}
 
+	if (err == 0 && system_supports_sme() && user.za)
+		err = restore_za_context(&user);
+
 	return err;
 }
 
@@ -633,6 +747,24 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 			return err;
 	}
 
+	if (system_supports_sme()) {
+		unsigned int vl;
+		unsigned int vq = 0;
+
+		if (add_all)
+			vl = sme_max_vl();
+		else
+			vl = task_get_sme_vl(current);
+
+		if (thread_za_enabled(&current->thread))
+			vq = sve_vq_from_vl(vl);
+
+		err = sigframe_alloc(user, &user->za_offset,
+				     ZA_SIG_CONTEXT_SIZE(vq));
+		if (err)
+			return err;
+	}
+
 	return sigframe_alloc_end(user);
 }
 
@@ -681,6 +813,13 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
 		err |= preserve_sve_context(sve_ctx);
 	}
 
+	/* ZA state if present */
+	if (system_supports_sme() && err == 0 && user->za_offset) {
+		struct za_context __user *za_ctx =
+			apply_user_offset(user, user->za_offset);
+		err |= preserve_za_context(za_ctx);
+	}
+
 	if (err == 0 && user->extra_offset) {
 		char __user *sfp = (char __user *)user->sigframe;
 		char __user *userp =
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 26/38] arm64/sme: Implement ZA signal handling
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Implement support for ZA in signal handling in a very similar way to how
we implement support for SVE registers, using a signal context structure
with optional register state after it. Where present this register state
stores the ZA matrix as a series of horizontal vectors numbered from 0 to
VL/8 in the endinanness independent format used for vectors.

As with SVE we do not allow changes in the vector length during signal
return but we do allow ZA to be enabled or disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/sigcontext.h |  41 +++++++
 arch/arm64/kernel/fpsimd.c               |   3 -
 arch/arm64/kernel/signal.c               | 139 +++++++++++++++++++++++
 3 files changed, 180 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index 3a3366d4fbc2..d45bdf2c8b26 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -140,6 +140,14 @@ struct sve_context {
 
 #define SVE_SIG_FLAG_SM	0x1	/* Context describes streaming mode */
 
+#define ZA_MAGIC	0x54366345
+
+struct za_context {
+	struct _aarch64_ctx head;
+	__u16 vl;
+	__u16 __reserved[3];
+};
+
 #endif /* !__ASSEMBLY__ */
 
 #include <asm/sve_context.h>
@@ -259,4 +267,37 @@ struct sve_context {
 #define SVE_SIG_CONTEXT_SIZE(vq) \
 		(SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
 
+/*
+ * If the ZA register is enabled for the thread at signal delivery then,
+ * za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
+ * and the register data may be accessed using the ZA_SIG_*() macros.
+ *
+ * If za_context.head.size < ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
+ * then ZA was not enabled and no register data was included in which case
+ * ZA register was not enabled for the thread and no register data
+ * the ZA_SIG_*() macros should not be used except for this check.
+ *
+ * The same convention applies when returning from a signal: a caller
+ * will need to remove or resize the za_context block if it wants to
+ * enable the ZA register when it was previously non-live or vice-versa.
+ * This may require the caller to allocate fresh memory and/or move other
+ * context blocks in the signal frame.
+ *
+ * Changing the vector length during signal return is not permitted:
+ * za_context.vl must equal the thread's current SME vector length when
+ * doing a sigreturn.
+ */
+
+#define ZA_SIG_REGS_OFFSET					\
+	((sizeof(struct za_context) + (__SVE_VQ_BYTES - 1))	\
+		/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
+
+#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
+
+#define ZA_SIG_ZAV_OFFSET(vq, n) (ZA_SIG_REGS_OFFSET + \
+				  (SVE_SIG_ZREG_SIZE(vq) * n))
+
+#define ZA_SIG_CONTEXT_SIZE(vq) \
+		(ZA_SIG_REGS_OFFSET + ZA_SIG_REGS_SIZE(vq))
+
 #endif /* _UAPI__ASM_SIGCONTEXT_H */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e66ddb966192..e39413f43660 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1182,9 +1182,6 @@ void fpsimd_release_task(struct task_struct *dead_task)
 
 #ifdef CONFIG_ARM64_SME
 
-/* This will move to uapi/asm/sigcontext.h when signals are implemented */
-#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
-
 /*
  * Return how many bytes of memory are required to store the full SME
  * specific state (currently just ZA) for task, given task's currently
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index fea0e1d30449..c0007ddf0c06 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -57,6 +57,7 @@ struct rt_sigframe_user_layout {
 	unsigned long fpsimd_offset;
 	unsigned long esr_offset;
 	unsigned long sve_offset;
+	unsigned long za_offset;
 	unsigned long extra_offset;
 	unsigned long end_offset;
 };
@@ -219,6 +220,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
 struct user_ctxs {
 	struct fpsimd_context __user *fpsimd;
 	struct sve_context __user *sve;
+	struct za_context __user *za;
 };
 
 #ifdef CONFIG_ARM64_SVE
@@ -347,6 +349,101 @@ extern int restore_sve_fpsimd_context(struct user_ctxs *user);
 
 #endif /* ! CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+static int preserve_za_context(struct za_context __user *ctx)
+{
+	int err = 0;
+	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
+	unsigned int vl = task_get_sme_vl(current);
+	unsigned int vq;
+
+	if (thread_za_enabled(&current->thread))
+		vq = sve_vq_from_vl(vl);
+	else
+		vq = 0;
+
+	memset(reserved, 0, sizeof(reserved));
+
+	__put_user_error(ZA_MAGIC, &ctx->head.magic, err);
+	__put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16),
+			 &ctx->head.size, err);
+	__put_user_error(vl, &ctx->vl, err);
+	BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
+	err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
+
+	if (vq) {
+		/*
+		 * This assumes that the ZA state has already been saved to
+		 * the task struct by calling the function
+		 * fpsimd_signal_preserve_current_state().
+		 */
+		err |= __copy_to_user((char __user *)ctx + ZA_SIG_REGS_OFFSET,
+				      current->thread.za_state,
+				      ZA_SIG_REGS_SIZE(vq));
+	}
+
+	return err ? -EFAULT : 0;
+}
+
+static int restore_za_context(struct user_ctxs __user *user)
+{
+	int err;
+	unsigned int vq;
+	struct za_context za;
+
+	if (__copy_from_user(&za, user->za, sizeof(za)))
+		return -EFAULT;
+
+	if (za.vl != task_get_sme_vl(current))
+		return -EINVAL;
+
+	if (za.head.size <= sizeof(*user->za)) {
+		current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
+		return 0;
+	}
+
+	vq = sve_vq_from_vl(za.vl);
+
+	if (za.head.size < ZA_SIG_CONTEXT_SIZE(vq))
+		return -EINVAL;
+
+	/*
+	 * Careful: we are about __copy_from_user() directly into
+	 * thread.za_state with preemption enabled, so protection is
+	 * needed to prevent a racing context switch from writing stale
+	 * registers back over the new data.
+	 */
+
+	fpsimd_flush_task_state(current);
+	/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
+
+	sme_alloc(current);
+	if (!current->thread.za_state) {
+		current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
+		clear_thread_flag(TIF_SME);
+		return -ENOMEM;
+	}
+
+	err = __copy_from_user(current->thread.za_state,
+			       (char __user const *)user->za +
+					ZA_SIG_REGS_OFFSET,
+			       ZA_SIG_REGS_SIZE(vq));
+	if (err)
+		return -EFAULT;
+
+	set_thread_flag(TIF_SME);
+	current->thread.svcr |= SYS_SVCR_EL0_ZA_MASK;
+
+	return 0;
+}
+#else /* ! CONFIG_ARM64_SME */
+
+/* Turn any non-optimised out attempts to use these into a link error: */
+extern int preserve_za_context(void __user *ctx);
+extern int restore_za_context(struct user_ctxs *user);
+
+#endif /* ! CONFIG_ARM64_SME */
 
 static int parse_user_sigframe(struct user_ctxs *user,
 			       struct rt_sigframe __user *sf)
@@ -361,6 +458,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
 
 	user->fpsimd = NULL;
 	user->sve = NULL;
+	user->za = NULL;
 
 	if (!IS_ALIGNED((unsigned long)base, 16))
 		goto invalid;
@@ -426,6 +524,19 @@ static int parse_user_sigframe(struct user_ctxs *user,
 			user->sve = (struct sve_context __user *)head;
 			break;
 
+		case ZA_MAGIC:
+			if (!system_supports_sme())
+				goto invalid;
+
+			if (user->za)
+				goto invalid;
+
+			if (size < sizeof(*user->za))
+				goto invalid;
+
+			user->za = (struct za_context __user *)head;
+			break;
+
 		case EXTRA_MAGIC:
 			if (have_extra_context)
 				goto invalid;
@@ -549,6 +660,9 @@ static int restore_sigframe(struct pt_regs *regs,
 		}
 	}
 
+	if (err == 0 && system_supports_sme() && user.za)
+		err = restore_za_context(&user);
+
 	return err;
 }
 
@@ -633,6 +747,24 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 			return err;
 	}
 
+	if (system_supports_sme()) {
+		unsigned int vl;
+		unsigned int vq = 0;
+
+		if (add_all)
+			vl = sme_max_vl();
+		else
+			vl = task_get_sme_vl(current);
+
+		if (thread_za_enabled(&current->thread))
+			vq = sve_vq_from_vl(vl);
+
+		err = sigframe_alloc(user, &user->za_offset,
+				     ZA_SIG_CONTEXT_SIZE(vq));
+		if (err)
+			return err;
+	}
+
 	return sigframe_alloc_end(user);
 }
 
@@ -681,6 +813,13 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
 		err |= preserve_sve_context(sve_ctx);
 	}
 
+	/* ZA state if present */
+	if (system_supports_sme() && err == 0 && user->za_offset) {
+		struct za_context __user *za_ctx =
+			apply_user_offset(user, user->za_offset);
+		err |= preserve_za_context(za_ctx);
+	}
+
 	if (err == 0 && user->extra_offset) {
 		char __user *sfp = (char __user *)user->sigframe;
 		char __user *userp =
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 27/38] arm64/sme: Implement ptrace support for streaming mode SVE registers
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The streaming mode SVE registers are represented using the same data
structures as for SVE but since the vector lengths supported and in use
may not be the same as SVE we represent them with a new type NT_ARM_SSVE.
Unfortunately we only have a single 16 bit reserved field available in
the header so there is no space to fit the current and maximum vector
length for both standard and streaming SVE mode without redefining the
structure in a way the creates a complicatd and fragile ABI. Since FFR
is not present in streaming mode it is read and written as zero.

Setting NT_ARM_SSVE registers will put the task into streaming mode,
similarly setting NT_ARM_SVE registers will exit it. Reads that do not
correspond to the current mode of the task will return the header with
no register data. For compatibility reasons on write setting no flag for
the register type will be interpreted as setting SVE registers, though
users can provide no register data as an alternative mechanism for doing
so.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/ptrace.h |  13 +-
 arch/arm64/kernel/fpsimd.c           |  21 ++-
 arch/arm64/kernel/ptrace.c           | 212 +++++++++++++++++++++------
 include/uapi/linux/elf.h             |   1 +
 4 files changed, 190 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 758ae984ff97..522b925a78c1 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -109,7 +109,7 @@ struct user_hwdebug_state {
 	}		dbg_regs[16];
 };
 
-/* SVE/FP/SIMD state (NT_ARM_SVE) */
+/* SVE/FP/SIMD state (NT_ARM_SVE & NT_ARM_SSVE) */
 
 struct user_sve_header {
 	__u32 size; /* total meaningful regset content in bytes */
@@ -220,6 +220,7 @@ struct user_sve_header {
 	(SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \
 		SVE_PT_SVE_PREGS_OFFSET(vq))
 
+/* For streaming mode SVE (SSVE) FFR must be read and written as zero */
 #define SVE_PT_SVE_FFR_OFFSET(vq) \
 	(SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq))
 
@@ -240,10 +241,12 @@ struct user_sve_header {
 			- SVE_PT_SVE_OFFSET + (__SVE_VQ_BYTES - 1))	\
 		/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
 
-#define SVE_PT_SIZE(vq, flags)						\
-	 (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ?		\
-		  SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags)	\
-		: SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags))
+#define SVE_PT_SIZE(vq, flags)						  \
+	 (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ?		  \
+		  SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags)	  \
+		: ((((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD ?  \
+		    SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags) \
+		  : SVE_PT_REGS_OFFSET)))
 
 /* pointer authentication masks (NT_ARM_PAC_MASK) */
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e39413f43660..118435dc1da4 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -629,14 +629,19 @@ static void __fpsimd_to_sve(void *sst, struct user_fpsimd_state const *fst,
  */
 static void fpsimd_to_sve(struct task_struct *task)
 {
-	unsigned int vq;
+	unsigned int vq, vl;
 	void *sst = task->thread.sve_state;
 	struct user_fpsimd_state const *fst = &task->thread.uw.fpsimd_state;
 
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task_get_sve_vl(task));
+	if (thread_sm_enabled(&task->thread))
+		vl = task_get_sme_vl(task);
+	else
+		vl = task_get_sve_vl(task);
+
+	vq = sve_vq_from_vl(vl);
 	__fpsimd_to_sve(sst, fst, vq);
 }
 
@@ -653,7 +658,7 @@ static void fpsimd_to_sve(struct task_struct *task)
  */
 static void sve_to_fpsimd(struct task_struct *task)
 {
-	unsigned int vq;
+	unsigned int vq, vl;
 	void const *sst = task->thread.sve_state;
 	struct user_fpsimd_state *fst = &task->thread.uw.fpsimd_state;
 	unsigned int i;
@@ -662,7 +667,12 @@ static void sve_to_fpsimd(struct task_struct *task)
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task_get_sve_vl(task));
+	if (thread_sm_enabled(&task->thread))
+		vl = task_get_sme_vl(task);
+	else
+		vl = task_get_sve_vl(task);
+
+	vq = sve_vq_from_vl(vl);
 	for (i = 0; i < SVE_NUM_ZREGS; ++i) {
 		p = (__uint128_t const *)ZREG(sst, vq, i);
 		fst->vregs[i] = arm64_le128_to_cpu(*p);
@@ -803,8 +813,7 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	/*
 	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
 	 * write any live register state back to task_struct, and convert to a
-	 * regular FPSIMD thread.  Since the vector length can only be changed
-	 * with a syscall we can't be in streaming mode while reconfiguring.
+	 * regular FPSIMD thread.
 	 */
 	if (task == current) {
 		get_cpu_fpsimd_context();
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 716dde289446..414126ce5897 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -714,21 +714,51 @@ static int system_call_set(struct task_struct *target,
 #ifdef CONFIG_ARM64_SVE
 
 static void sve_init_header_from_task(struct user_sve_header *header,
-				      struct task_struct *target)
+				      struct task_struct *target,
+				      enum vec_type type)
 {
 	unsigned int vq;
+	bool active;
+	bool fpsimd_only;
+	enum vec_type task_type;
 
 	memset(header, 0, sizeof(*header));
 
-	header->flags = test_tsk_thread_flag(target, TIF_SVE) ?
-		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
-	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
-		header->flags |= SVE_PT_VL_INHERIT;
+	/* Check if the requested registers are active for the task */
+	if (thread_sm_enabled(&target->thread))
+		task_type = ARM64_VEC_SME;
+	else
+		task_type = ARM64_VEC_SVE;
+	active = (task_type == type);
+
+	switch (type) {
+	case ARM64_VEC_SVE:
+		if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
+			header->flags |= SVE_PT_VL_INHERIT;
+		fpsimd_only = !test_tsk_thread_flag(target, TIF_SVE);
+		break;
+	case ARM64_VEC_SME:
+		if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT))
+			header->flags |= SVE_PT_VL_INHERIT;
+		fpsimd_only = false;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return;
+	}
 
-	header->vl = task_get_sve_vl(target);
+	if (active) {
+		if (fpsimd_only) {
+			header->flags |= SVE_PT_REGS_FPSIMD;
+		} else {
+			header->flags |= SVE_PT_REGS_SVE;
+		}
+	}
+
+	header->vl = task_get_vl(target, type);
 	vq = sve_vq_from_vl(header->vl);
 
-	header->max_vl = sve_max_vl();
+	header->max_vl = vec_max_vl(type);
 	header->size = SVE_PT_SIZE(vq, header->flags);
 	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
 				      SVE_PT_REGS_SVE);
@@ -739,19 +769,17 @@ static unsigned int sve_size_from_header(struct user_sve_header const *header)
 	return ALIGN(header->size, SVE_VQ_BYTES);
 }
 
-static int sve_get(struct task_struct *target,
-		   const struct user_regset *regset,
-		   struct membuf to)
+static int sve_get_common(struct task_struct *target,
+			  const struct user_regset *regset,
+			  struct membuf to,
+			  enum vec_type type)
 {
 	struct user_sve_header header;
 	unsigned int vq;
 	unsigned long start, end;
 
-	if (!system_supports_sve())
-		return -EINVAL;
-
 	/* Header */
-	sve_init_header_from_task(&header, target);
+	sve_init_header_from_task(&header, target, type);
 	vq = sve_vq_from_vl(header.vl);
 
 	membuf_write(&to, &header, sizeof(header));
@@ -759,49 +787,61 @@ static int sve_get(struct task_struct *target,
 	if (target == current)
 		fpsimd_preserve_current_state();
 
-	/* Registers: FPSIMD-only case */
-
 	BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
-	if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD)
+	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
+
+	switch ((header.flags & SVE_PT_REGS_MASK)) {
+	case SVE_PT_REGS_FPSIMD:
 		return __fpr_get(target, regset, to);
 
-	/* Otherwise: full SVE case */
+	case SVE_PT_REGS_SVE:
+		start = SVE_PT_SVE_OFFSET;
+		end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
+		membuf_write(&to, target->thread.sve_state, end - start);
 
-	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
-	start = SVE_PT_SVE_OFFSET;
-	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
-	membuf_write(&to, target->thread.sve_state, end - start);
+		start = end;
+		end = SVE_PT_SVE_FPSR_OFFSET(vq);
+		membuf_zero(&to, end - start);
 
-	start = end;
-	end = SVE_PT_SVE_FPSR_OFFSET(vq);
-	membuf_zero(&to, end - start);
+		/*
+		 * Copy fpsr, and fpcr which must follow contiguously in
+		 * struct fpsimd_state:
+		 */
+		start = end;
+		end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
+		membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr,
+			     end - start);
 
-	/*
-	 * Copy fpsr, and fpcr which must follow contiguously in
-	 * struct fpsimd_state:
-	 */
-	start = end;
-	end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
-	membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr, end - start);
+		start = end;
+		end = sve_size_from_header(&header);
+		return membuf_zero(&to, end - start);
 
-	start = end;
-	end = sve_size_from_header(&header);
-	return membuf_zero(&to, end - start);
+	default:
+		return 0;
+	}
 }
 
-static int sve_set(struct task_struct *target,
+static int sve_get(struct task_struct *target,
 		   const struct user_regset *regset,
-		   unsigned int pos, unsigned int count,
-		   const void *kbuf, const void __user *ubuf)
+		   struct membuf to)
+{
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	return sve_get_common(target, regset, to, ARM64_VEC_SVE);
+}
+
+static int sve_set_common(struct task_struct *target,
+			  const struct user_regset *regset,
+			  unsigned int pos, unsigned int count,
+			  const void *kbuf, const void __user *ubuf,
+			  enum vec_type type)
 {
 	int ret;
 	struct user_sve_header header;
 	unsigned int vq;
 	unsigned long start, end;
 
-	if (!system_supports_sve())
-		return -EINVAL;
-
 	/* Header */
 	if (count < sizeof(header))
 		return -EINVAL;
@@ -814,13 +854,37 @@ static int sve_set(struct task_struct *target,
 	 * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by
 	 * vec_set_vector_length(), which will also validate them for us:
 	 */
-	ret = vec_set_vector_length(target, ARM64_VEC_SVE, header.vl,
+	ret = vec_set_vector_length(target, type, header.vl,
 		((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);
 	if (ret)
 		goto out;
 
 	/* Actual VL set may be less than the user asked for: */
-	vq = sve_vq_from_vl(task_get_sve_vl(target));
+	vq = sve_vq_from_vl(task_get_vl(target, type));
+
+	/* Enter/exit streaming mode */
+	if (system_supports_sme()) {
+		u64 old_svcr = target->thread.svcr;
+
+		switch (type) {
+		case ARM64_VEC_SVE:
+			target->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK;
+			break;
+		case ARM64_VEC_SME:
+			target->thread.svcr |= SYS_SVCR_EL0_SM_MASK;
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return -EINVAL;
+		}
+
+		/*
+		 * If we switched then invalidate any existing SVE
+		 * state and ensure there's storage.
+		 */
+		if (target->thread.svcr != old_svcr)
+			sve_alloc(target);
+	}
 
 	/* Registers: FPSIMD-only case */
 
@@ -832,7 +896,10 @@ static int sve_set(struct task_struct *target,
 		goto out;
 	}
 
-	/* Otherwise: full SVE case */
+	/*
+	 * Otherwise: no registers or full SVE case.  For backwards
+	 * compatibility reasons we treat empty flags as SVE registers.
+	 */
 
 	/*
 	 * If setting a different VL from the requested VL and there is
@@ -853,8 +920,9 @@ static int sve_set(struct task_struct *target,
 
 	/*
 	 * Ensure target->thread.sve_state is up to date with target's
-	 * FPSIMD regs, so that a short copyin leaves trailing registers
-	 * unmodified.
+	 * FPSIMD regs, so that a short copyin leaves trailing
+	 * registers unmodified.  Always enable SVE even if going into
+	 * streaming mode.
 	 */
 	fpsimd_sync_to_sve(target);
 	set_tsk_thread_flag(target, TIF_SVE);
@@ -890,8 +958,46 @@ static int sve_set(struct task_struct *target,
 	return ret;
 }
 
+static int sve_set(struct task_struct *target,
+		   const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	return sve_set_common(target, regset, pos, count, kbuf, ubuf,
+			      ARM64_VEC_SVE);
+}
+
 #endif /* CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+static int ssve_get(struct task_struct *target,
+		   const struct user_regset *regset,
+		   struct membuf to)
+{
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	return sve_get_common(target, regset, to, ARM64_VEC_SME);
+}
+
+static int ssve_set(struct task_struct *target,
+		    const struct user_regset *regset,
+		    unsigned int pos, unsigned int count,
+		    const void *kbuf, const void __user *ubuf)
+{
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	return sve_set_common(target, regset, pos, count, kbuf, ubuf,
+			      ARM64_VEC_SME);
+}
+
+#endif /* CONFIG_ARM64_SME */
+
 #ifdef CONFIG_ARM64_PTR_AUTH
 static int pac_mask_get(struct task_struct *target,
 			const struct user_regset *regset,
@@ -1109,6 +1215,9 @@ enum aarch64_regset {
 #ifdef CONFIG_ARM64_SVE
 	REGSET_SVE,
 #endif
+#ifdef CONFIG_ARM64_SVE
+	REGSET_SSVE,
+#endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	REGSET_PAC_MASK,
 	REGSET_PAC_ENABLED_KEYS,
@@ -1189,6 +1298,17 @@ static const struct user_regset aarch64_regsets[] = {
 		.set = sve_set,
 	},
 #endif
+#ifdef CONFIG_ARM64_SME
+	[REGSET_SSVE] = { /* Streaming mode SVE */
+		.core_note_type = NT_ARM_SSVE,
+		.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
+				  SVE_VQ_BYTES),
+		.size = SVE_VQ_BYTES,
+		.align = SVE_VQ_BYTES,
+		.regset_get = ssve_get,
+		.set = ssve_set,
+	},
+#endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	[REGSET_PAC_MASK] = {
 		.core_note_type = NT_ARM_PAC_MASK,
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 61bf4774b8f2..61502388683f 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -427,6 +427,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_PACG_KEYS	0x408	/* ARM pointer authentication generic key */
 #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
 #define NT_ARM_PAC_ENABLED_KEYS	0x40a	/* arm64 ptr auth enabled keys (prctl()) */
+#define NT_ARM_SSVE	0x40b		/* ARM Streaming SVE registers */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 27/38] arm64/sme: Implement ptrace support for streaming mode SVE registers
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The streaming mode SVE registers are represented using the same data
structures as for SVE but since the vector lengths supported and in use
may not be the same as SVE we represent them with a new type NT_ARM_SSVE.
Unfortunately we only have a single 16 bit reserved field available in
the header so there is no space to fit the current and maximum vector
length for both standard and streaming SVE mode without redefining the
structure in a way the creates a complicatd and fragile ABI. Since FFR
is not present in streaming mode it is read and written as zero.

Setting NT_ARM_SSVE registers will put the task into streaming mode,
similarly setting NT_ARM_SVE registers will exit it. Reads that do not
correspond to the current mode of the task will return the header with
no register data. For compatibility reasons on write setting no flag for
the register type will be interpreted as setting SVE registers, though
users can provide no register data as an alternative mechanism for doing
so.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/ptrace.h |  13 +-
 arch/arm64/kernel/fpsimd.c           |  21 ++-
 arch/arm64/kernel/ptrace.c           | 212 +++++++++++++++++++++------
 include/uapi/linux/elf.h             |   1 +
 4 files changed, 190 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 758ae984ff97..522b925a78c1 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -109,7 +109,7 @@ struct user_hwdebug_state {
 	}		dbg_regs[16];
 };
 
-/* SVE/FP/SIMD state (NT_ARM_SVE) */
+/* SVE/FP/SIMD state (NT_ARM_SVE & NT_ARM_SSVE) */
 
 struct user_sve_header {
 	__u32 size; /* total meaningful regset content in bytes */
@@ -220,6 +220,7 @@ struct user_sve_header {
 	(SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \
 		SVE_PT_SVE_PREGS_OFFSET(vq))
 
+/* For streaming mode SVE (SSVE) FFR must be read and written as zero */
 #define SVE_PT_SVE_FFR_OFFSET(vq) \
 	(SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq))
 
@@ -240,10 +241,12 @@ struct user_sve_header {
 			- SVE_PT_SVE_OFFSET + (__SVE_VQ_BYTES - 1))	\
 		/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
 
-#define SVE_PT_SIZE(vq, flags)						\
-	 (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ?		\
-		  SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags)	\
-		: SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags))
+#define SVE_PT_SIZE(vq, flags)						  \
+	 (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ?		  \
+		  SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags)	  \
+		: ((((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD ?  \
+		    SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags) \
+		  : SVE_PT_REGS_OFFSET)))
 
 /* pointer authentication masks (NT_ARM_PAC_MASK) */
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e39413f43660..118435dc1da4 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -629,14 +629,19 @@ static void __fpsimd_to_sve(void *sst, struct user_fpsimd_state const *fst,
  */
 static void fpsimd_to_sve(struct task_struct *task)
 {
-	unsigned int vq;
+	unsigned int vq, vl;
 	void *sst = task->thread.sve_state;
 	struct user_fpsimd_state const *fst = &task->thread.uw.fpsimd_state;
 
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task_get_sve_vl(task));
+	if (thread_sm_enabled(&task->thread))
+		vl = task_get_sme_vl(task);
+	else
+		vl = task_get_sve_vl(task);
+
+	vq = sve_vq_from_vl(vl);
 	__fpsimd_to_sve(sst, fst, vq);
 }
 
@@ -653,7 +658,7 @@ static void fpsimd_to_sve(struct task_struct *task)
  */
 static void sve_to_fpsimd(struct task_struct *task)
 {
-	unsigned int vq;
+	unsigned int vq, vl;
 	void const *sst = task->thread.sve_state;
 	struct user_fpsimd_state *fst = &task->thread.uw.fpsimd_state;
 	unsigned int i;
@@ -662,7 +667,12 @@ static void sve_to_fpsimd(struct task_struct *task)
 	if (!system_supports_sve())
 		return;
 
-	vq = sve_vq_from_vl(task_get_sve_vl(task));
+	if (thread_sm_enabled(&task->thread))
+		vl = task_get_sme_vl(task);
+	else
+		vl = task_get_sve_vl(task);
+
+	vq = sve_vq_from_vl(vl);
 	for (i = 0; i < SVE_NUM_ZREGS; ++i) {
 		p = (__uint128_t const *)ZREG(sst, vq, i);
 		fst->vregs[i] = arm64_le128_to_cpu(*p);
@@ -803,8 +813,7 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type,
 	/*
 	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
 	 * write any live register state back to task_struct, and convert to a
-	 * regular FPSIMD thread.  Since the vector length can only be changed
-	 * with a syscall we can't be in streaming mode while reconfiguring.
+	 * regular FPSIMD thread.
 	 */
 	if (task == current) {
 		get_cpu_fpsimd_context();
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 716dde289446..414126ce5897 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -714,21 +714,51 @@ static int system_call_set(struct task_struct *target,
 #ifdef CONFIG_ARM64_SVE
 
 static void sve_init_header_from_task(struct user_sve_header *header,
-				      struct task_struct *target)
+				      struct task_struct *target,
+				      enum vec_type type)
 {
 	unsigned int vq;
+	bool active;
+	bool fpsimd_only;
+	enum vec_type task_type;
 
 	memset(header, 0, sizeof(*header));
 
-	header->flags = test_tsk_thread_flag(target, TIF_SVE) ?
-		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
-	if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
-		header->flags |= SVE_PT_VL_INHERIT;
+	/* Check if the requested registers are active for the task */
+	if (thread_sm_enabled(&target->thread))
+		task_type = ARM64_VEC_SME;
+	else
+		task_type = ARM64_VEC_SVE;
+	active = (task_type == type);
+
+	switch (type) {
+	case ARM64_VEC_SVE:
+		if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT))
+			header->flags |= SVE_PT_VL_INHERIT;
+		fpsimd_only = !test_tsk_thread_flag(target, TIF_SVE);
+		break;
+	case ARM64_VEC_SME:
+		if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT))
+			header->flags |= SVE_PT_VL_INHERIT;
+		fpsimd_only = false;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return;
+	}
 
-	header->vl = task_get_sve_vl(target);
+	if (active) {
+		if (fpsimd_only) {
+			header->flags |= SVE_PT_REGS_FPSIMD;
+		} else {
+			header->flags |= SVE_PT_REGS_SVE;
+		}
+	}
+
+	header->vl = task_get_vl(target, type);
 	vq = sve_vq_from_vl(header->vl);
 
-	header->max_vl = sve_max_vl();
+	header->max_vl = vec_max_vl(type);
 	header->size = SVE_PT_SIZE(vq, header->flags);
 	header->max_size = SVE_PT_SIZE(sve_vq_from_vl(header->max_vl),
 				      SVE_PT_REGS_SVE);
@@ -739,19 +769,17 @@ static unsigned int sve_size_from_header(struct user_sve_header const *header)
 	return ALIGN(header->size, SVE_VQ_BYTES);
 }
 
-static int sve_get(struct task_struct *target,
-		   const struct user_regset *regset,
-		   struct membuf to)
+static int sve_get_common(struct task_struct *target,
+			  const struct user_regset *regset,
+			  struct membuf to,
+			  enum vec_type type)
 {
 	struct user_sve_header header;
 	unsigned int vq;
 	unsigned long start, end;
 
-	if (!system_supports_sve())
-		return -EINVAL;
-
 	/* Header */
-	sve_init_header_from_task(&header, target);
+	sve_init_header_from_task(&header, target, type);
 	vq = sve_vq_from_vl(header.vl);
 
 	membuf_write(&to, &header, sizeof(header));
@@ -759,49 +787,61 @@ static int sve_get(struct task_struct *target,
 	if (target == current)
 		fpsimd_preserve_current_state();
 
-	/* Registers: FPSIMD-only case */
-
 	BUILD_BUG_ON(SVE_PT_FPSIMD_OFFSET != sizeof(header));
-	if ((header.flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD)
+	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
+
+	switch ((header.flags & SVE_PT_REGS_MASK)) {
+	case SVE_PT_REGS_FPSIMD:
 		return __fpr_get(target, regset, to);
 
-	/* Otherwise: full SVE case */
+	case SVE_PT_REGS_SVE:
+		start = SVE_PT_SVE_OFFSET;
+		end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
+		membuf_write(&to, target->thread.sve_state, end - start);
 
-	BUILD_BUG_ON(SVE_PT_SVE_OFFSET != sizeof(header));
-	start = SVE_PT_SVE_OFFSET;
-	end = SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq);
-	membuf_write(&to, target->thread.sve_state, end - start);
+		start = end;
+		end = SVE_PT_SVE_FPSR_OFFSET(vq);
+		membuf_zero(&to, end - start);
 
-	start = end;
-	end = SVE_PT_SVE_FPSR_OFFSET(vq);
-	membuf_zero(&to, end - start);
+		/*
+		 * Copy fpsr, and fpcr which must follow contiguously in
+		 * struct fpsimd_state:
+		 */
+		start = end;
+		end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
+		membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr,
+			     end - start);
 
-	/*
-	 * Copy fpsr, and fpcr which must follow contiguously in
-	 * struct fpsimd_state:
-	 */
-	start = end;
-	end = SVE_PT_SVE_FPCR_OFFSET(vq) + SVE_PT_SVE_FPCR_SIZE;
-	membuf_write(&to, &target->thread.uw.fpsimd_state.fpsr, end - start);
+		start = end;
+		end = sve_size_from_header(&header);
+		return membuf_zero(&to, end - start);
 
-	start = end;
-	end = sve_size_from_header(&header);
-	return membuf_zero(&to, end - start);
+	default:
+		return 0;
+	}
 }
 
-static int sve_set(struct task_struct *target,
+static int sve_get(struct task_struct *target,
 		   const struct user_regset *regset,
-		   unsigned int pos, unsigned int count,
-		   const void *kbuf, const void __user *ubuf)
+		   struct membuf to)
+{
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	return sve_get_common(target, regset, to, ARM64_VEC_SVE);
+}
+
+static int sve_set_common(struct task_struct *target,
+			  const struct user_regset *regset,
+			  unsigned int pos, unsigned int count,
+			  const void *kbuf, const void __user *ubuf,
+			  enum vec_type type)
 {
 	int ret;
 	struct user_sve_header header;
 	unsigned int vq;
 	unsigned long start, end;
 
-	if (!system_supports_sve())
-		return -EINVAL;
-
 	/* Header */
 	if (count < sizeof(header))
 		return -EINVAL;
@@ -814,13 +854,37 @@ static int sve_set(struct task_struct *target,
 	 * Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by
 	 * vec_set_vector_length(), which will also validate them for us:
 	 */
-	ret = vec_set_vector_length(target, ARM64_VEC_SVE, header.vl,
+	ret = vec_set_vector_length(target, type, header.vl,
 		((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);
 	if (ret)
 		goto out;
 
 	/* Actual VL set may be less than the user asked for: */
-	vq = sve_vq_from_vl(task_get_sve_vl(target));
+	vq = sve_vq_from_vl(task_get_vl(target, type));
+
+	/* Enter/exit streaming mode */
+	if (system_supports_sme()) {
+		u64 old_svcr = target->thread.svcr;
+
+		switch (type) {
+		case ARM64_VEC_SVE:
+			target->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK;
+			break;
+		case ARM64_VEC_SME:
+			target->thread.svcr |= SYS_SVCR_EL0_SM_MASK;
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return -EINVAL;
+		}
+
+		/*
+		 * If we switched then invalidate any existing SVE
+		 * state and ensure there's storage.
+		 */
+		if (target->thread.svcr != old_svcr)
+			sve_alloc(target);
+	}
 
 	/* Registers: FPSIMD-only case */
 
@@ -832,7 +896,10 @@ static int sve_set(struct task_struct *target,
 		goto out;
 	}
 
-	/* Otherwise: full SVE case */
+	/*
+	 * Otherwise: no registers or full SVE case.  For backwards
+	 * compatibility reasons we treat empty flags as SVE registers.
+	 */
 
 	/*
 	 * If setting a different VL from the requested VL and there is
@@ -853,8 +920,9 @@ static int sve_set(struct task_struct *target,
 
 	/*
 	 * Ensure target->thread.sve_state is up to date with target's
-	 * FPSIMD regs, so that a short copyin leaves trailing registers
-	 * unmodified.
+	 * FPSIMD regs, so that a short copyin leaves trailing
+	 * registers unmodified.  Always enable SVE even if going into
+	 * streaming mode.
 	 */
 	fpsimd_sync_to_sve(target);
 	set_tsk_thread_flag(target, TIF_SVE);
@@ -890,8 +958,46 @@ static int sve_set(struct task_struct *target,
 	return ret;
 }
 
+static int sve_set(struct task_struct *target,
+		   const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	if (!system_supports_sve())
+		return -EINVAL;
+
+	return sve_set_common(target, regset, pos, count, kbuf, ubuf,
+			      ARM64_VEC_SVE);
+}
+
 #endif /* CONFIG_ARM64_SVE */
 
+#ifdef CONFIG_ARM64_SME
+
+static int ssve_get(struct task_struct *target,
+		   const struct user_regset *regset,
+		   struct membuf to)
+{
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	return sve_get_common(target, regset, to, ARM64_VEC_SME);
+}
+
+static int ssve_set(struct task_struct *target,
+		    const struct user_regset *regset,
+		    unsigned int pos, unsigned int count,
+		    const void *kbuf, const void __user *ubuf)
+{
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	return sve_set_common(target, regset, pos, count, kbuf, ubuf,
+			      ARM64_VEC_SME);
+}
+
+#endif /* CONFIG_ARM64_SME */
+
 #ifdef CONFIG_ARM64_PTR_AUTH
 static int pac_mask_get(struct task_struct *target,
 			const struct user_regset *regset,
@@ -1109,6 +1215,9 @@ enum aarch64_regset {
 #ifdef CONFIG_ARM64_SVE
 	REGSET_SVE,
 #endif
+#ifdef CONFIG_ARM64_SVE
+	REGSET_SSVE,
+#endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	REGSET_PAC_MASK,
 	REGSET_PAC_ENABLED_KEYS,
@@ -1189,6 +1298,17 @@ static const struct user_regset aarch64_regsets[] = {
 		.set = sve_set,
 	},
 #endif
+#ifdef CONFIG_ARM64_SME
+	[REGSET_SSVE] = { /* Streaming mode SVE */
+		.core_note_type = NT_ARM_SSVE,
+		.n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
+				  SVE_VQ_BYTES),
+		.size = SVE_VQ_BYTES,
+		.align = SVE_VQ_BYTES,
+		.regset_get = ssve_get,
+		.set = ssve_set,
+	},
+#endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	[REGSET_PAC_MASK] = {
 		.core_note_type = NT_ARM_PAC_MASK,
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 61bf4774b8f2..61502388683f 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -427,6 +427,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_PACG_KEYS	0x408	/* ARM pointer authentication generic key */
 #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
 #define NT_ARM_PAC_ENABLED_KEYS	0x40a	/* arm64 ptr auth enabled keys (prctl()) */
+#define NT_ARM_SSVE	0x40b		/* ARM Streaming SVE registers */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 28/38] arm64/sme: Add ptrace support for ZA
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The ZA array can be read and written with the NT_ARM_ZA.  Similarly to
our interface for the SVE vector registers the regset consists of a
header with information on the current vector length followed by an
optional register data payload, represented as for signals as a series
of horizontal vectors from 0 to VL/8 in the endianness independent
format used for vectors.

On get if ZA is enabled then register data will be provided, otherwise
it will be omitted.  On set if register data is provided then ZA is
enabled and initialized using the provided data, otherwise it is
disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/ptrace.h |  56 +++++++++++
 arch/arm64/kernel/ptrace.c           | 144 +++++++++++++++++++++++++++
 include/uapi/linux/elf.h             |   1 +
 3 files changed, 201 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 522b925a78c1..7fa2f7036aa7 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -268,6 +268,62 @@ struct user_pac_generic_keys {
 	__uint128_t	apgakey;
 };
 
+/* ZA state (NT_ARM_ZA) */
+
+struct user_za_header {
+	__u32 size; /* total meaningful regset content in bytes */
+	__u32 max_size; /* maxmium possible size for this thread */
+	__u16 vl; /* current vector length */
+	__u16 max_vl; /* maximum possible vector length */
+	__u16 flags;
+	__u16 __reserved;
+};
+
+/*
+ * Common ZA_PT_* flags:
+ * These must be kept in sync with prctl interface in <linux/prctl.h>
+ */
+#define ZA_PT_VL_INHERIT		((1 << 17) /* PR_SME_VL_INHERIT */ >> 16)
+#define ZA_PT_VL_ONEXEC			((1 << 18) /* PR_SME_SET_VL_ONEXEC */ >> 16)
+
+
+/*
+ * The remainder of the ZA state follows struct user_za_header.  The
+ * total size of the ZA state (including header) depends on the
+ * metadata in the header:  ZA_PT_SIZE(vq, flags) gives the total size
+ * of the state in bytes, including the header.
+ *
+ * Refer to <asm/sigcontext.h> for details of how to pass the correct
+ * "vq" argument to these macros.
+ */
+
+/* Offset from the start of struct user_za_header to the register data */
+#define ZA_PT_ZA_OFFSET						\
+	((sizeof(struct user_za_header) + (__SVE_VQ_BYTES - 1))	\
+		/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
+
+/*
+ * The payload starts at offset ZA_PT_ZA_OFFSET, and is of size
+ * ZA_PT_ZA_SIZE(vq, flags).
+ *
+ * The ZA array is stored as a sequence of horizontal vectors ZAV of SVL/8
+ * bytes each, starting from vector 0.
+ *
+ * Additional data might be appended in the future.
+ *
+ * The ZA matrix is represented in memory in an endianness-invariant layout
+ * which differs from the layout used for the FPSIMD V-registers on big-endian
+ * systems: see sigcontext.h for more explanation.
+ */
+
+#define ZA_PT_ZAV_OFFSET(vq, n) \
+	(ZA_PT_ZA_OFFSET + ((vq * __SVE_VQ_BYTES) * n))
+
+#define ZA_PT_ZA_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
+
+#define ZA_PT_SIZE(vq)						\
+	(ZA_PT_ZA_OFFSET + ZA_PT_ZA_SIZE(vq))
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _UAPI__ASM_PTRACE_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 414126ce5897..702765f50a47 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -996,6 +996,141 @@ static int ssve_set(struct task_struct *target,
 			      ARM64_VEC_SME);
 }
 
+static int za_get(struct task_struct *target,
+		  const struct user_regset *regset,
+		  struct membuf to)
+{
+	struct user_za_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	/* Header */
+	memset(&header, 0, sizeof(header));
+
+	if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT))
+		header.flags |= ZA_PT_VL_INHERIT;
+
+	header.vl = task_get_sme_vl(target);
+	vq = sve_vq_from_vl(header.vl);
+	header.max_vl = sme_max_vl();
+	header.max_size = ZA_PT_SIZE(vq);
+
+	/* If ZA is not active there is only the header */
+	if (thread_za_enabled(&target->thread))
+		header.size = ZA_PT_SIZE(vq);
+	else
+		header.size = ZA_PT_ZA_OFFSET;
+
+	membuf_write(&to, &header, sizeof(header));
+
+	BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header));
+	end = ZA_PT_ZA_OFFSET;
+;
+	if (target == current)
+		fpsimd_preserve_current_state();
+
+	/* Any register data to include? */
+	if (thread_za_enabled(&target->thread)) {
+		start = end;
+		end = ZA_PT_SIZE(vq);
+		membuf_write(&to, target->thread.za_state, end - start);
+	}
+
+	/* Zero any trailing padding */
+	start = end;
+	end = ALIGN(header.size, SVE_VQ_BYTES);
+	return membuf_zero(&to, end - start);
+}
+
+static int za_set(struct task_struct *target,
+		  const struct user_regset *regset,
+		  unsigned int pos, unsigned int count,
+		  const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	struct user_za_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	/* Header */
+	if (count < sizeof(header))
+		return -EINVAL;
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header,
+				 0, sizeof(header));
+	if (ret)
+		goto out;
+
+	/*
+	 * All current ZA_PT_* flags are consumed by
+	 * vec_set_vector_length(), which will also validate them for
+	 * us:
+	 */
+	ret = vec_set_vector_length(target, ARM64_VEC_SME, header.vl,
+		((unsigned long)header.flags) << 16);
+	if (ret)
+		goto out;
+
+	/* Actual VL set may be less than the user asked for: */
+	vq = sve_vq_from_vl(task_get_sme_vl(target));
+
+	/* Ensure there is some SVE storage for streaming mode */
+	if (!target->thread.sve_state) {
+		sve_alloc(target);
+		if (!target->thread.sve_state) {
+			clear_thread_flag(TIF_SME);
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
+	/* Allocate/reinit ZA storage */
+	sme_alloc(target);
+	if (!target->thread.za_state) {
+		ret = -ENOMEM;
+		clear_tsk_thread_flag(target, TIF_SME);
+		goto out;
+	}
+
+	/* If there is no data then disable ZA */
+	if (!count) {
+		target->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
+		goto out;
+	}
+
+	/*
+	 * If setting a different VL from the requested VL and there is
+	 * register data, the data layout will be wrong: don't even
+	 * try to set the registers in this case.
+	 */
+	if (vq != sve_vq_from_vl(header.vl)) {
+		ret = -EIO;
+		goto out;
+	}
+
+	BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header));
+	start = ZA_PT_ZA_OFFSET;
+	end = ZA_PT_SIZE(vq);
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.za_state,
+				 start, end);
+	if (ret)
+		goto out;
+
+	/* Mark ZA as active and let userspace use it */
+	set_tsk_thread_flag(target, TIF_SME);
+	target->thread.svcr |= SYS_SVCR_EL0_ZA_MASK;
+
+out:
+	fpsimd_flush_task_state(target);
+	return ret;
+}
+
 #endif /* CONFIG_ARM64_SME */
 
 #ifdef CONFIG_ARM64_PTR_AUTH
@@ -1217,6 +1352,7 @@ enum aarch64_regset {
 #endif
 #ifdef CONFIG_ARM64_SVE
 	REGSET_SSVE,
+	REGSET_ZA,
 #endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	REGSET_PAC_MASK,
@@ -1308,6 +1444,14 @@ static const struct user_regset aarch64_regsets[] = {
 		.regset_get = ssve_get,
 		.set = ssve_set,
 	},
+	[REGSET_ZA] = { /* SME ZA */
+		.core_note_type = NT_ARM_ZA,
+		.n = DIV_ROUND_UP(ZA_PT_ZA_SIZE(SVE_VQ_MAX), SVE_VQ_BYTES),
+		.size = SVE_VQ_BYTES,
+		.align = SVE_VQ_BYTES,
+		.regset_get = za_get,
+		.set = za_set,
+	},
 #endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	[REGSET_PAC_MASK] = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 61502388683f..7ef574f3256a 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -428,6 +428,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
 #define NT_ARM_PAC_ENABLED_KEYS	0x40a	/* arm64 ptr auth enabled keys (prctl()) */
 #define NT_ARM_SSVE	0x40b		/* ARM Streaming SVE registers */
+#define NT_ARM_ZA	0x40c		/* ARM SME ZA registers */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 28/38] arm64/sme: Add ptrace support for ZA
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The ZA array can be read and written with the NT_ARM_ZA.  Similarly to
our interface for the SVE vector registers the regset consists of a
header with information on the current vector length followed by an
optional register data payload, represented as for signals as a series
of horizontal vectors from 0 to VL/8 in the endianness independent
format used for vectors.

On get if ZA is enabled then register data will be provided, otherwise
it will be omitted.  On set if register data is provided then ZA is
enabled and initialized using the provided data, otherwise it is
disabled.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/uapi/asm/ptrace.h |  56 +++++++++++
 arch/arm64/kernel/ptrace.c           | 144 +++++++++++++++++++++++++++
 include/uapi/linux/elf.h             |   1 +
 3 files changed, 201 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 522b925a78c1..7fa2f7036aa7 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -268,6 +268,62 @@ struct user_pac_generic_keys {
 	__uint128_t	apgakey;
 };
 
+/* ZA state (NT_ARM_ZA) */
+
+struct user_za_header {
+	__u32 size; /* total meaningful regset content in bytes */
+	__u32 max_size; /* maxmium possible size for this thread */
+	__u16 vl; /* current vector length */
+	__u16 max_vl; /* maximum possible vector length */
+	__u16 flags;
+	__u16 __reserved;
+};
+
+/*
+ * Common ZA_PT_* flags:
+ * These must be kept in sync with prctl interface in <linux/prctl.h>
+ */
+#define ZA_PT_VL_INHERIT		((1 << 17) /* PR_SME_VL_INHERIT */ >> 16)
+#define ZA_PT_VL_ONEXEC			((1 << 18) /* PR_SME_SET_VL_ONEXEC */ >> 16)
+
+
+/*
+ * The remainder of the ZA state follows struct user_za_header.  The
+ * total size of the ZA state (including header) depends on the
+ * metadata in the header:  ZA_PT_SIZE(vq, flags) gives the total size
+ * of the state in bytes, including the header.
+ *
+ * Refer to <asm/sigcontext.h> for details of how to pass the correct
+ * "vq" argument to these macros.
+ */
+
+/* Offset from the start of struct user_za_header to the register data */
+#define ZA_PT_ZA_OFFSET						\
+	((sizeof(struct user_za_header) + (__SVE_VQ_BYTES - 1))	\
+		/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
+
+/*
+ * The payload starts at offset ZA_PT_ZA_OFFSET, and is of size
+ * ZA_PT_ZA_SIZE(vq, flags).
+ *
+ * The ZA array is stored as a sequence of horizontal vectors ZAV of SVL/8
+ * bytes each, starting from vector 0.
+ *
+ * Additional data might be appended in the future.
+ *
+ * The ZA matrix is represented in memory in an endianness-invariant layout
+ * which differs from the layout used for the FPSIMD V-registers on big-endian
+ * systems: see sigcontext.h for more explanation.
+ */
+
+#define ZA_PT_ZAV_OFFSET(vq, n) \
+	(ZA_PT_ZA_OFFSET + ((vq * __SVE_VQ_BYTES) * n))
+
+#define ZA_PT_ZA_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
+
+#define ZA_PT_SIZE(vq)						\
+	(ZA_PT_ZA_OFFSET + ZA_PT_ZA_SIZE(vq))
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _UAPI__ASM_PTRACE_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 414126ce5897..702765f50a47 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -996,6 +996,141 @@ static int ssve_set(struct task_struct *target,
 			      ARM64_VEC_SME);
 }
 
+static int za_get(struct task_struct *target,
+		  const struct user_regset *regset,
+		  struct membuf to)
+{
+	struct user_za_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	/* Header */
+	memset(&header, 0, sizeof(header));
+
+	if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT))
+		header.flags |= ZA_PT_VL_INHERIT;
+
+	header.vl = task_get_sme_vl(target);
+	vq = sve_vq_from_vl(header.vl);
+	header.max_vl = sme_max_vl();
+	header.max_size = ZA_PT_SIZE(vq);
+
+	/* If ZA is not active there is only the header */
+	if (thread_za_enabled(&target->thread))
+		header.size = ZA_PT_SIZE(vq);
+	else
+		header.size = ZA_PT_ZA_OFFSET;
+
+	membuf_write(&to, &header, sizeof(header));
+
+	BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header));
+	end = ZA_PT_ZA_OFFSET;
+;
+	if (target == current)
+		fpsimd_preserve_current_state();
+
+	/* Any register data to include? */
+	if (thread_za_enabled(&target->thread)) {
+		start = end;
+		end = ZA_PT_SIZE(vq);
+		membuf_write(&to, target->thread.za_state, end - start);
+	}
+
+	/* Zero any trailing padding */
+	start = end;
+	end = ALIGN(header.size, SVE_VQ_BYTES);
+	return membuf_zero(&to, end - start);
+}
+
+static int za_set(struct task_struct *target,
+		  const struct user_regset *regset,
+		  unsigned int pos, unsigned int count,
+		  const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	struct user_za_header header;
+	unsigned int vq;
+	unsigned long start, end;
+
+	if (!system_supports_sme())
+		return -EINVAL;
+
+	/* Header */
+	if (count < sizeof(header))
+		return -EINVAL;
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &header,
+				 0, sizeof(header));
+	if (ret)
+		goto out;
+
+	/*
+	 * All current ZA_PT_* flags are consumed by
+	 * vec_set_vector_length(), which will also validate them for
+	 * us:
+	 */
+	ret = vec_set_vector_length(target, ARM64_VEC_SME, header.vl,
+		((unsigned long)header.flags) << 16);
+	if (ret)
+		goto out;
+
+	/* Actual VL set may be less than the user asked for: */
+	vq = sve_vq_from_vl(task_get_sme_vl(target));
+
+	/* Ensure there is some SVE storage for streaming mode */
+	if (!target->thread.sve_state) {
+		sve_alloc(target);
+		if (!target->thread.sve_state) {
+			clear_thread_flag(TIF_SME);
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
+	/* Allocate/reinit ZA storage */
+	sme_alloc(target);
+	if (!target->thread.za_state) {
+		ret = -ENOMEM;
+		clear_tsk_thread_flag(target, TIF_SME);
+		goto out;
+	}
+
+	/* If there is no data then disable ZA */
+	if (!count) {
+		target->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
+		goto out;
+	}
+
+	/*
+	 * If setting a different VL from the requested VL and there is
+	 * register data, the data layout will be wrong: don't even
+	 * try to set the registers in this case.
+	 */
+	if (vq != sve_vq_from_vl(header.vl)) {
+		ret = -EIO;
+		goto out;
+	}
+
+	BUILD_BUG_ON(ZA_PT_ZA_OFFSET != sizeof(header));
+	start = ZA_PT_ZA_OFFSET;
+	end = ZA_PT_SIZE(vq);
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.za_state,
+				 start, end);
+	if (ret)
+		goto out;
+
+	/* Mark ZA as active and let userspace use it */
+	set_tsk_thread_flag(target, TIF_SME);
+	target->thread.svcr |= SYS_SVCR_EL0_ZA_MASK;
+
+out:
+	fpsimd_flush_task_state(target);
+	return ret;
+}
+
 #endif /* CONFIG_ARM64_SME */
 
 #ifdef CONFIG_ARM64_PTR_AUTH
@@ -1217,6 +1352,7 @@ enum aarch64_regset {
 #endif
 #ifdef CONFIG_ARM64_SVE
 	REGSET_SSVE,
+	REGSET_ZA,
 #endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	REGSET_PAC_MASK,
@@ -1308,6 +1444,14 @@ static const struct user_regset aarch64_regsets[] = {
 		.regset_get = ssve_get,
 		.set = ssve_set,
 	},
+	[REGSET_ZA] = { /* SME ZA */
+		.core_note_type = NT_ARM_ZA,
+		.n = DIV_ROUND_UP(ZA_PT_ZA_SIZE(SVE_VQ_MAX), SVE_VQ_BYTES),
+		.size = SVE_VQ_BYTES,
+		.align = SVE_VQ_BYTES,
+		.regset_get = za_get,
+		.set = za_set,
+	},
 #endif
 #ifdef CONFIG_ARM64_PTR_AUTH
 	[REGSET_PAC_MASK] = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 61502388683f..7ef574f3256a 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -428,6 +428,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
 #define NT_ARM_PAC_ENABLED_KEYS	0x40a	/* arm64 ptr auth enabled keys (prctl()) */
 #define NT_ARM_SSVE	0x40b		/* ARM Streaming SVE registers */
+#define NT_ARM_ZA	0x40c		/* ARM SME ZA registers */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 29/38] arm64/sme: Disable streaming mode and ZA when flushing CPU state
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Both streaming mode and ZA may increase power consumption when they are
enabled and streaming mode makes many FPSIMD and SVE instructions undefined
which will cause problems for any kernel mode floating point so disable
both when we flush the CPU state. This covers both kernel_neon_begin() and
idle and after flushing the state a reload is always required anyway.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 118435dc1da4..c34d32360502 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1736,6 +1736,17 @@ static void fpsimd_flush_cpu_state(void)
 {
 	WARN_ON(!system_supports_fpsimd());
 	__this_cpu_write(fpsimd_last_state.st, NULL);
+
+	/*
+	 * Leaving streaming mode enabled will cause issues for any kernel
+	 * NEON and leaving streaming mode or ZA enabled may incrase power
+	 * consumption.
+	 */
+	if (system_supports_sme())
+		sysreg_clear_set_s(SYS_SVCR_EL0,
+				   SYS_SVCR_EL0_ZA_MASK | SYS_SVCR_EL0_ZA_MASK,
+				   0);
+
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 29/38] arm64/sme: Disable streaming mode and ZA when flushing CPU state
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Both streaming mode and ZA may increase power consumption when they are
enabled and streaming mode makes many FPSIMD and SVE instructions undefined
which will cause problems for any kernel mode floating point so disable
both when we flush the CPU state. This covers both kernel_neon_begin() and
idle and after flushing the state a reload is always required anyway.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 118435dc1da4..c34d32360502 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1736,6 +1736,17 @@ static void fpsimd_flush_cpu_state(void)
 {
 	WARN_ON(!system_supports_fpsimd());
 	__this_cpu_write(fpsimd_last_state.st, NULL);
+
+	/*
+	 * Leaving streaming mode enabled will cause issues for any kernel
+	 * NEON and leaving streaming mode or ZA enabled may incrase power
+	 * consumption.
+	 */
+	if (system_supports_sme())
+		sysreg_clear_set_s(SYS_SVCR_EL0,
+				   SYS_SVCR_EL0_ZA_MASK | SYS_SVCR_EL0_ZA_MASK,
+				   0);
+
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
 }
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 30/38] arm64/sme: Save and restore streaming mode over EFI runtime calls
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

When saving and restoring the floating point state over an EFI runtime
call ensure that we handle streaming mode, only handling FFR if we are not
in streaming mode and ensuring that we are in normal mode over the call
into runtime services.

We currently assume that ZA will not be modified by runtime services, the
specification is not yet finalised so this may need updating if that
changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 47 +++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index c34d32360502..62f8789f97cc 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1051,21 +1051,25 @@ int vec_verify_vq_map(enum vec_type type)
 
 static void __init sve_efi_setup(void)
 {
-	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
+	int max_vl = 0;
+	int i;
 
 	if (!IS_ENABLED(CONFIG_EFI))
 		return;
 
+	for (i = 0; i < ARRAY_SIZE(vl_info); i++)
+		max_vl = max(vl_info[i].max_vl, max_vl);
+
 	/*
 	 * alloc_percpu() warns and prints a backtrace if this goes wrong.
 	 * This is evidence of a crippled system and we are returning void,
 	 * so no attempt is made to handle this situation here.
 	 */
-	if (!sve_vl_valid(info->max_vl))
+	if (!sve_vl_valid(max_vl))
 		goto fail;
 
 	efi_sve_state = __alloc_percpu(
-		SVE_SIG_REGS_SIZE(sve_vq_from_vl(info->max_vl)), SVE_VQ_BYTES);
+		SVE_SIG_REGS_SIZE(sve_vq_from_vl(max_vl)), SVE_VQ_BYTES);
 	if (!efi_sve_state)
 		goto fail;
 
@@ -1824,6 +1828,7 @@ EXPORT_SYMBOL(kernel_neon_end);
 static DEFINE_PER_CPU(struct user_fpsimd_state, efi_fpsimd_state);
 static DEFINE_PER_CPU(bool, efi_fpsimd_state_used);
 static DEFINE_PER_CPU(bool, efi_sve_state_used);
+static DEFINE_PER_CPU(bool, efi_sm_state);
 
 /*
  * EFI runtime services support functions
@@ -1858,12 +1863,28 @@ void __efi_fpsimd_begin(void)
 		 */
 		if (system_supports_sve() && likely(efi_sve_state)) {
 			char *sve_state = this_cpu_ptr(efi_sve_state);
+			bool ffr = true;
+			u64 svcr;
 
 			__this_cpu_write(efi_sve_state_used, true);
 
+			/* If we are in streaming mode don't touch FFR */
+			if (system_supports_sme()) {
+				svcr = read_sysreg_s(SYS_SVCR_EL0);
+
+				ffr = svcr & SYS_SVCR_EL0_SM_MASK;
+
+				__this_cpu_write(efi_sm_state, ffr);
+			}
+
 			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
-				       true);
+				       ffr);
+
+			if (system_supports_sme())
+				sysreg_clear_set_s(SYS_SVCR_EL0,
+						   SYS_SVCR_EL0_SM_MASK, 0);
+
 		} else {
 			fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state));
 		}
@@ -1886,11 +1907,25 @@ void __efi_fpsimd_end(void)
 		if (system_supports_sve() &&
 		    likely(__this_cpu_read(efi_sve_state_used))) {
 			char const *sve_state = this_cpu_ptr(efi_sve_state);
+			bool ffr = true;
+
+			/*
+			 * Restore streaming mode; EFI calls are
+			 * normal function calls so should not return in
+			 * streaming mode.
+			 */
+			if (system_supports_sme()) {
+				if (__this_cpu_read(efi_sm_state)) {
+					sysreg_clear_set_s(SYS_SVCR_EL0,
+							   0,
+							   SYS_SVCR_EL0_SM_MASK);
+					ffr = false;
+				}
+			}
 
-			sve_set_vq(sve_vq_from_vl(sve_get_vl()) - 1);
 			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
-				       true);
+				       ffr);
 
 			__this_cpu_write(efi_sve_state_used, false);
 		} else {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 30/38] arm64/sme: Save and restore streaming mode over EFI runtime calls
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

When saving and restoring the floating point state over an EFI runtime
call ensure that we handle streaming mode, only handling FFR if we are not
in streaming mode and ensuring that we are in normal mode over the call
into runtime services.

We currently assume that ZA will not be modified by runtime services, the
specification is not yet finalised so this may need updating if that
changes.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/kernel/fpsimd.c | 47 +++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index c34d32360502..62f8789f97cc 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1051,21 +1051,25 @@ int vec_verify_vq_map(enum vec_type type)
 
 static void __init sve_efi_setup(void)
 {
-	struct vl_info *info = &vl_info[ARM64_VEC_SVE];
+	int max_vl = 0;
+	int i;
 
 	if (!IS_ENABLED(CONFIG_EFI))
 		return;
 
+	for (i = 0; i < ARRAY_SIZE(vl_info); i++)
+		max_vl = max(vl_info[i].max_vl, max_vl);
+
 	/*
 	 * alloc_percpu() warns and prints a backtrace if this goes wrong.
 	 * This is evidence of a crippled system and we are returning void,
 	 * so no attempt is made to handle this situation here.
 	 */
-	if (!sve_vl_valid(info->max_vl))
+	if (!sve_vl_valid(max_vl))
 		goto fail;
 
 	efi_sve_state = __alloc_percpu(
-		SVE_SIG_REGS_SIZE(sve_vq_from_vl(info->max_vl)), SVE_VQ_BYTES);
+		SVE_SIG_REGS_SIZE(sve_vq_from_vl(max_vl)), SVE_VQ_BYTES);
 	if (!efi_sve_state)
 		goto fail;
 
@@ -1824,6 +1828,7 @@ EXPORT_SYMBOL(kernel_neon_end);
 static DEFINE_PER_CPU(struct user_fpsimd_state, efi_fpsimd_state);
 static DEFINE_PER_CPU(bool, efi_fpsimd_state_used);
 static DEFINE_PER_CPU(bool, efi_sve_state_used);
+static DEFINE_PER_CPU(bool, efi_sm_state);
 
 /*
  * EFI runtime services support functions
@@ -1858,12 +1863,28 @@ void __efi_fpsimd_begin(void)
 		 */
 		if (system_supports_sve() && likely(efi_sve_state)) {
 			char *sve_state = this_cpu_ptr(efi_sve_state);
+			bool ffr = true;
+			u64 svcr;
 
 			__this_cpu_write(efi_sve_state_used, true);
 
+			/* If we are in streaming mode don't touch FFR */
+			if (system_supports_sme()) {
+				svcr = read_sysreg_s(SYS_SVCR_EL0);
+
+				ffr = svcr & SYS_SVCR_EL0_SM_MASK;
+
+				__this_cpu_write(efi_sm_state, ffr);
+			}
+
 			sve_save_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
-				       true);
+				       ffr);
+
+			if (system_supports_sme())
+				sysreg_clear_set_s(SYS_SVCR_EL0,
+						   SYS_SVCR_EL0_SM_MASK, 0);
+
 		} else {
 			fpsimd_save_state(this_cpu_ptr(&efi_fpsimd_state));
 		}
@@ -1886,11 +1907,25 @@ void __efi_fpsimd_end(void)
 		if (system_supports_sve() &&
 		    likely(__this_cpu_read(efi_sve_state_used))) {
 			char const *sve_state = this_cpu_ptr(efi_sve_state);
+			bool ffr = true;
+
+			/*
+			 * Restore streaming mode; EFI calls are
+			 * normal function calls so should not return in
+			 * streaming mode.
+			 */
+			if (system_supports_sme()) {
+				if (__this_cpu_read(efi_sm_state)) {
+					sysreg_clear_set_s(SYS_SVCR_EL0,
+							   0,
+							   SYS_SVCR_EL0_SM_MASK);
+					ffr = false;
+				}
+			}
 
-			sve_set_vq(sve_vq_from_vl(sve_get_vl()) - 1);
 			sve_load_state(sve_state + sve_ffr_offset(sve_max_vl()),
 				       &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
-				       true);
+				       ffr);
 
 			__this_cpu_write(efi_sve_state_used, false);
 		} else {
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 31/38] arm64/sme: Provide Kconfig for SME
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Now that basline support for the Scalable Matrix Extension (SME) is present
introduce the Kconfig option allowing it to be built. While there is no
requirement for a system with SME to support SVE at runtime the support for
streaming mode SVE is mostly shared with normal SVE so depend on SVE.

Since there is currently no support for KVM and no handling of either
streaming mode or ZA with KVM the option the feature is marked as being
incompatible with KVM and not enabled by default.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/Kconfig | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5c7ae4c3954b..f7004dd0a1b4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1719,6 +1719,17 @@ config ARM64_SVE
 	  booting the kernel.  If unsure and you are not observing these
 	  symptoms, you should assume that it is safe to say Y.
 
+config ARM64_SME
+	bool "ARM Scalable Matrix Extension support"
+	depends on ARM64_SVE
+	depends on !KVM
+	help
+	  The Scalable Matrix Extension (SME) is an extension to the AArch64
+	  execution state which utilises a substantial subset of the SVE
+	  instruction set, together with the addition of new architectural
+	  register state capable of holding two dimensional matrix tiles to
+	  enable various matrix operations.
+
 config ARM64_MODULE_PLTS
 	bool "Use PLTs to allow module memory to spill over into vmalloc area"
 	depends on MODULES
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 31/38] arm64/sme: Provide Kconfig for SME
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Now that basline support for the Scalable Matrix Extension (SME) is present
introduce the Kconfig option allowing it to be built. While there is no
requirement for a system with SME to support SVE at runtime the support for
streaming mode SVE is mostly shared with normal SVE so depend on SVE.

Since there is currently no support for KVM and no handling of either
streaming mode or ZA with KVM the option the feature is marked as being
incompatible with KVM and not enabled by default.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/Kconfig | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5c7ae4c3954b..f7004dd0a1b4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1719,6 +1719,17 @@ config ARM64_SVE
 	  booting the kernel.  If unsure and you are not observing these
 	  symptoms, you should assume that it is safe to say Y.
 
+config ARM64_SME
+	bool "ARM Scalable Matrix Extension support"
+	depends on ARM64_SVE
+	depends on !KVM
+	help
+	  The Scalable Matrix Extension (SME) is an extension to the AArch64
+	  execution state which utilises a substantial subset of the SVE
+	  instruction set, together with the addition of new architectural
+	  register state capable of holding two dimensional matrix tiles to
+	  enable various matrix operations.
+
 config ARM64_MODULE_PLTS
 	bool "Use PLTs to allow module memory to spill over into vmalloc area"
 	depends on MODULES
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 32/38] kselftest/arm64: Add tests for TPIDR2
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The Scalable Matrix Extension adds a new system register TPIDR2 intended to
be used by libc for its own thread specific use, add some kselftests which
exercise the ABI for it.

Since this test should with some adjustment work for TPIDR and any other
similar registers added in future add tests for it in a separate
directory rather than placing it with the other floating point tests,
nothing existing looked suitable so I created a new test directory
called "abi".

Since this feature is intended to be used by libc the test is built as
freestanding code using nolibc so we don't end up with the test program
and libc both trying to manage the register simultaneously and
distrupting each other. As a result of being written using nolibc rather
than using hwcaps to identify if SME is available in the system we check
for the default SME vector length configuration in proc, adding hwcap
support to nolibc seems like disproportionate effort and didn't feel
entirely idiomatic for what nolibc is trying to do.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/Makefile       |   2 +-
 tools/testing/selftests/arm64/abi/.gitignore |   1 +
 tools/testing/selftests/arm64/abi/Makefile   |  13 ++
 tools/testing/selftests/arm64/abi/tpidr2.c   | 204 +++++++++++++++++++
 4 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/abi/.gitignore
 create mode 100644 tools/testing/selftests/arm64/abi/Makefile
 create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c

diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
index ced910fb4019..1e8d9a8f59df 100644
--- a/tools/testing/selftests/arm64/Makefile
+++ b/tools/testing/selftests/arm64/Makefile
@@ -4,7 +4,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),aarch64 arm64))
-ARM64_SUBTARGETS ?= tags signal pauth fp mte bti
+ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi
 else
 ARM64_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/arm64/abi/.gitignore b/tools/testing/selftests/arm64/abi/.gitignore
new file mode 100644
index 000000000000..4b04670993ea
--- /dev/null
+++ b/tools/testing/selftests/arm64/abi/.gitignore
@@ -0,0 +1 @@
+tpidr2
diff --git a/tools/testing/selftests/arm64/abi/Makefile b/tools/testing/selftests/arm64/abi/Makefile
new file mode 100644
index 000000000000..c32fe00ee67f
--- /dev/null
+++ b/tools/testing/selftests/arm64/abi/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2021 ARM Limited
+
+TEST_GEN_PROGS := tpidr2
+
+include ../../lib.mk
+
+# Build with nolibc since TPIDR2 is intended to be actively managed by
+# libc and we're trying to test the functionality that it depends on here.
+$(OUTPUT)/tpidr2: tpidr2.c
+	$(CC) -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
+		-static -include ../../../../include/nolibc/nolibc.h \
+		 -Wall $^ -o $@ -lgcc
diff --git a/tools/testing/selftests/arm64/abi/tpidr2.c b/tools/testing/selftests/arm64/abi/tpidr2.c
new file mode 100644
index 000000000000..917962913257
--- /dev/null
+++ b/tools/testing/selftests/arm64/abi/tpidr2.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define SYS_TPIDR2 "S3_3_C13_C0_5"
+
+#define EXPECTED_TESTS 4
+
+static void putstr(const char *str)
+{
+	write(1, str, strlen(str));
+}
+
+static void putnum(unsigned int num)
+{
+	char c;
+
+	if (num / 10)
+		putnum(num / 10);
+
+	c = '0' + (num % 10);
+	write(1, &c, 1);
+}
+
+static int tests_run;
+static int tests_passed;
+static int tests_failed;
+static int tests_skipped;
+
+static void set_tpidr2(uint64_t val)
+{
+	asm volatile (
+		"msr	" SYS_TPIDR2 ", %0\n"
+		:
+		: "r"(val)
+		: "cc");
+}
+
+static uint64_t get_tpidr2(void)
+{
+	uint64_t val;
+
+	asm volatile (
+		"mrs	%0, " SYS_TPIDR2 "\n"
+		: "=r"(val)
+		:
+		: "cc");
+
+	return val;
+}
+
+static void print_summary(void)
+{
+	if (tests_passed + tests_failed + tests_skipped != EXPECTED_TESTS)
+		putstr("# UNEXPECTED TEST COUNT: ");
+
+	putstr("# Totals: pass:");
+	putnum(tests_passed);
+	putstr(" fail:");
+	putnum(tests_failed);
+	putstr(" xfail:0 xpass:0 skip:");
+	putnum(tests_skipped);
+	putstr(" error:0\n");
+}
+
+/* Processes should start with TPIDR2 == 0 */
+static int default_value(void)
+{
+	return get_tpidr2() == 0;
+}
+
+/* If we set TPIDR2 we should read that value */
+static int write_read(void)
+{
+	set_tpidr2(getpid());
+
+	return getpid() == get_tpidr2();
+}
+
+/* If we set a value we should read the same value after scheduling out */
+static int write_sleep_read(void)
+{
+	set_tpidr2(getpid());
+
+	msleep(100);
+
+	return getpid() == get_tpidr2();
+}
+
+/*
+ * If we fork the value in the parent should be unchanged and the
+ * child should start with 0 and be able to set its own value.
+ */
+static int write_fork_read(void)
+{
+	pid_t newpid, waiting;
+	int status;
+
+	set_tpidr2(getpid());
+
+	newpid = fork();
+	if (newpid == 0) {
+		/* In child */
+		if (get_tpidr2() != 0) {
+			putstr("# TPIDR2 non-zero in child: ");
+			putnum(get_tpidr2());
+			putstr("\n");
+			exit(0);
+		}
+
+		set_tpidr2(getpid());
+		if (get_tpidr2() == getpid()) {
+			exit(1);
+		} else {
+			putstr("# Failed to set TPIDR2 in child\n");
+			exit(0);
+		}
+	}
+	if (newpid < 0) {
+		putstr("# fork() failed: -");
+		putnum(-newpid);
+		putstr("\n");
+		return 0;
+	}
+
+	for (;;) {
+		waiting = waitpid(newpid, &status, 0);
+
+		if (waiting < 0) {
+			if (errno == EINTR)
+				continue;
+			putstr("# waitpid() failed: ");
+			putnum(errno);
+			putstr("\n");
+			return 0;
+		}
+		if (waiting != newpid) {
+			putstr("# waitpid() returned wrong PID\n");
+			return 0;
+		}
+
+		if (!WIFEXITED(status)) {
+			putstr("# child did not exit\n");
+			return 0;
+		}
+
+		if (getpid() != get_tpidr2()) {
+			putstr("# TPIDR2 corrupted in parent\n");
+			return 0;
+		}
+
+		return WEXITSTATUS(status);
+	}
+}
+
+#define run_test(name)			     \
+	if (name()) {			     \
+		tests_passed++;		     \
+	} else {			     \
+		tests_failed++;		     \
+		putstr("not ");		     \
+	}				     \
+	putstr("ok ");			     \
+	putnum(++tests_run);		     \
+	putstr(" " #name "\n");
+
+int main(int argc, char **argv)
+{
+	int ret, i;
+
+	putstr("TAP version 13\n");
+	putstr("1..");
+	putnum(EXPECTED_TESTS);
+	putstr("\n");
+
+	putstr("# PID: ");
+	putnum(getpid());
+	putstr("\n");
+
+	/*
+	 * This test is run with nolibc which doesn't support hwcap and
+	 * it's probably disproportionate to implement so instead check
+	 * for the default vector length configuration in /proc.
+	 */
+	ret = open("/proc/sys/abi/sme_default_vector_length", O_RDONLY, 0);
+	if (ret >= 0) {
+		run_test(default_value);
+		run_test(write_read);
+		run_test(write_sleep_read);
+		run_test(write_fork_read);
+	} else {
+		putstr("# SME support not present\n");
+
+		for (i = 0; i < EXPECTED_TESTS; i++) {
+			putstr("ok ");
+			putnum(i);
+			putstr(" skipped, TPIDR2 not supported\n");
+		}
+
+		tests_skipped += EXPECTED_TESTS;
+	}
+
+	print_summary();
+
+	return 0;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 32/38] kselftest/arm64: Add tests for TPIDR2
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

The Scalable Matrix Extension adds a new system register TPIDR2 intended to
be used by libc for its own thread specific use, add some kselftests which
exercise the ABI for it.

Since this test should with some adjustment work for TPIDR and any other
similar registers added in future add tests for it in a separate
directory rather than placing it with the other floating point tests,
nothing existing looked suitable so I created a new test directory
called "abi".

Since this feature is intended to be used by libc the test is built as
freestanding code using nolibc so we don't end up with the test program
and libc both trying to manage the register simultaneously and
distrupting each other. As a result of being written using nolibc rather
than using hwcaps to identify if SME is available in the system we check
for the default SME vector length configuration in proc, adding hwcap
support to nolibc seems like disproportionate effort and didn't feel
entirely idiomatic for what nolibc is trying to do.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/Makefile       |   2 +-
 tools/testing/selftests/arm64/abi/.gitignore |   1 +
 tools/testing/selftests/arm64/abi/Makefile   |  13 ++
 tools/testing/selftests/arm64/abi/tpidr2.c   | 204 +++++++++++++++++++
 4 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/abi/.gitignore
 create mode 100644 tools/testing/selftests/arm64/abi/Makefile
 create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c

diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
index ced910fb4019..1e8d9a8f59df 100644
--- a/tools/testing/selftests/arm64/Makefile
+++ b/tools/testing/selftests/arm64/Makefile
@@ -4,7 +4,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),aarch64 arm64))
-ARM64_SUBTARGETS ?= tags signal pauth fp mte bti
+ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi
 else
 ARM64_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/arm64/abi/.gitignore b/tools/testing/selftests/arm64/abi/.gitignore
new file mode 100644
index 000000000000..4b04670993ea
--- /dev/null
+++ b/tools/testing/selftests/arm64/abi/.gitignore
@@ -0,0 +1 @@
+tpidr2
diff --git a/tools/testing/selftests/arm64/abi/Makefile b/tools/testing/selftests/arm64/abi/Makefile
new file mode 100644
index 000000000000..c32fe00ee67f
--- /dev/null
+++ b/tools/testing/selftests/arm64/abi/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2021 ARM Limited
+
+TEST_GEN_PROGS := tpidr2
+
+include ../../lib.mk
+
+# Build with nolibc since TPIDR2 is intended to be actively managed by
+# libc and we're trying to test the functionality that it depends on here.
+$(OUTPUT)/tpidr2: tpidr2.c
+	$(CC) -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
+		-static -include ../../../../include/nolibc/nolibc.h \
+		 -Wall $^ -o $@ -lgcc
diff --git a/tools/testing/selftests/arm64/abi/tpidr2.c b/tools/testing/selftests/arm64/abi/tpidr2.c
new file mode 100644
index 000000000000..917962913257
--- /dev/null
+++ b/tools/testing/selftests/arm64/abi/tpidr2.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define SYS_TPIDR2 "S3_3_C13_C0_5"
+
+#define EXPECTED_TESTS 4
+
+static void putstr(const char *str)
+{
+	write(1, str, strlen(str));
+}
+
+static void putnum(unsigned int num)
+{
+	char c;
+
+	if (num / 10)
+		putnum(num / 10);
+
+	c = '0' + (num % 10);
+	write(1, &c, 1);
+}
+
+static int tests_run;
+static int tests_passed;
+static int tests_failed;
+static int tests_skipped;
+
+static void set_tpidr2(uint64_t val)
+{
+	asm volatile (
+		"msr	" SYS_TPIDR2 ", %0\n"
+		:
+		: "r"(val)
+		: "cc");
+}
+
+static uint64_t get_tpidr2(void)
+{
+	uint64_t val;
+
+	asm volatile (
+		"mrs	%0, " SYS_TPIDR2 "\n"
+		: "=r"(val)
+		:
+		: "cc");
+
+	return val;
+}
+
+static void print_summary(void)
+{
+	if (tests_passed + tests_failed + tests_skipped != EXPECTED_TESTS)
+		putstr("# UNEXPECTED TEST COUNT: ");
+
+	putstr("# Totals: pass:");
+	putnum(tests_passed);
+	putstr(" fail:");
+	putnum(tests_failed);
+	putstr(" xfail:0 xpass:0 skip:");
+	putnum(tests_skipped);
+	putstr(" error:0\n");
+}
+
+/* Processes should start with TPIDR2 == 0 */
+static int default_value(void)
+{
+	return get_tpidr2() == 0;
+}
+
+/* If we set TPIDR2 we should read that value */
+static int write_read(void)
+{
+	set_tpidr2(getpid());
+
+	return getpid() == get_tpidr2();
+}
+
+/* If we set a value we should read the same value after scheduling out */
+static int write_sleep_read(void)
+{
+	set_tpidr2(getpid());
+
+	msleep(100);
+
+	return getpid() == get_tpidr2();
+}
+
+/*
+ * If we fork the value in the parent should be unchanged and the
+ * child should start with 0 and be able to set its own value.
+ */
+static int write_fork_read(void)
+{
+	pid_t newpid, waiting;
+	int status;
+
+	set_tpidr2(getpid());
+
+	newpid = fork();
+	if (newpid == 0) {
+		/* In child */
+		if (get_tpidr2() != 0) {
+			putstr("# TPIDR2 non-zero in child: ");
+			putnum(get_tpidr2());
+			putstr("\n");
+			exit(0);
+		}
+
+		set_tpidr2(getpid());
+		if (get_tpidr2() == getpid()) {
+			exit(1);
+		} else {
+			putstr("# Failed to set TPIDR2 in child\n");
+			exit(0);
+		}
+	}
+	if (newpid < 0) {
+		putstr("# fork() failed: -");
+		putnum(-newpid);
+		putstr("\n");
+		return 0;
+	}
+
+	for (;;) {
+		waiting = waitpid(newpid, &status, 0);
+
+		if (waiting < 0) {
+			if (errno == EINTR)
+				continue;
+			putstr("# waitpid() failed: ");
+			putnum(errno);
+			putstr("\n");
+			return 0;
+		}
+		if (waiting != newpid) {
+			putstr("# waitpid() returned wrong PID\n");
+			return 0;
+		}
+
+		if (!WIFEXITED(status)) {
+			putstr("# child did not exit\n");
+			return 0;
+		}
+
+		if (getpid() != get_tpidr2()) {
+			putstr("# TPIDR2 corrupted in parent\n");
+			return 0;
+		}
+
+		return WEXITSTATUS(status);
+	}
+}
+
+#define run_test(name)			     \
+	if (name()) {			     \
+		tests_passed++;		     \
+	} else {			     \
+		tests_failed++;		     \
+		putstr("not ");		     \
+	}				     \
+	putstr("ok ");			     \
+	putnum(++tests_run);		     \
+	putstr(" " #name "\n");
+
+int main(int argc, char **argv)
+{
+	int ret, i;
+
+	putstr("TAP version 13\n");
+	putstr("1..");
+	putnum(EXPECTED_TESTS);
+	putstr("\n");
+
+	putstr("# PID: ");
+	putnum(getpid());
+	putstr("\n");
+
+	/*
+	 * This test is run with nolibc which doesn't support hwcap and
+	 * it's probably disproportionate to implement so instead check
+	 * for the default vector length configuration in /proc.
+	 */
+	ret = open("/proc/sys/abi/sme_default_vector_length", O_RDONLY, 0);
+	if (ret >= 0) {
+		run_test(default_value);
+		run_test(write_read);
+		run_test(write_sleep_read);
+		run_test(write_fork_read);
+	} else {
+		putstr("# SME support not present\n");
+
+		for (i = 0; i < EXPECTED_TESTS; i++) {
+			putstr("ok ");
+			putnum(i);
+			putstr(" skipped, TPIDR2 not supported\n");
+		}
+
+		tests_skipped += EXPECTED_TESTS;
+	}
+
+	print_summary();
+
+	return 0;
+}
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 33/38] kselftest/arm64: Extend vector configuration API tests to cover SME
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Provide RDVL helpers for SME and extend the main vector configuration tests
to cover SME.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore   |  1 +
 tools/testing/selftests/arm64/fp/Makefile     |  3 ++-
 tools/testing/selftests/arm64/fp/rdvl-sme.c   | 14 ++++++++++++++
 tools/testing/selftests/arm64/fp/rdvl.S       | 16 ++++++++++++++++
 tools/testing/selftests/arm64/fp/rdvl.h       |  1 +
 tools/testing/selftests/arm64/fp/vec-syscfg.c | 10 ++++++++++
 6 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index b67395903b9b..885dd592807b 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -1,4 +1,5 @@
 fpsimd-test
+rdvl-sme
 rdvl-sve
 sve-probe-vls
 sve-ptrace
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index 4367125b7c27..ff1c8fde3aed 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -3,7 +3,7 @@
 CFLAGS += -I../../../../../usr/include/
 TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg
 TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
-	rdvl-sve \
+	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
 	vlset
 
@@ -11,6 +11,7 @@ all: $(TEST_GEN_PROGS) $(TEST_PROGS_EXTENDED)
 
 fpsimd-test: fpsimd-test.o
 	$(CC) -nostdlib $^ -o $@
+rdvl-sme: rdvl-sme.o rdvl.o
 rdvl-sve: rdvl-sve.o rdvl.o
 sve-ptrace: sve-ptrace.o
 sve-probe-vls: sve-probe-vls.o rdvl.o
diff --git a/tools/testing/selftests/arm64/fp/rdvl-sme.c b/tools/testing/selftests/arm64/fp/rdvl-sme.c
new file mode 100644
index 000000000000..49b0b2e08bac
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/rdvl-sme.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <stdio.h>
+
+#include "rdvl.h"
+
+int main(void)
+{
+	int vl = rdvl_sme();
+
+	printf("%d\n", vl);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/arm64/fp/rdvl.S b/tools/testing/selftests/arm64/fp/rdvl.S
index c916c1c9defd..a9a028ba1b79 100644
--- a/tools/testing/selftests/arm64/fp/rdvl.S
+++ b/tools/testing/selftests/arm64/fp/rdvl.S
@@ -8,3 +8,19 @@ rdvl_sve:
 	hint	34			// BTI C
 	rdvl	x0, #1
 	ret
+
+.globl rdvl_sme
+rdvl_sme:
+	hint	34			// BTI C
+
+	// Enter streaming mode
+	mov	x16, #1
+	msr	S3_3_C4_C2_2, x16
+
+	rdvl	x0, #1
+
+	// Leave streaming mode
+	mov	x16, #0
+	msr	S3_3_C4_C2_2, x16
+
+	ret
diff --git a/tools/testing/selftests/arm64/fp/rdvl.h b/tools/testing/selftests/arm64/fp/rdvl.h
index 7c9d953fc9e7..5d323679fbc9 100644
--- a/tools/testing/selftests/arm64/fp/rdvl.h
+++ b/tools/testing/selftests/arm64/fp/rdvl.h
@@ -3,6 +3,7 @@
 #ifndef RDVL_H
 #define RDVL_H
 
+int rdvl_sme(void);
 int rdvl_sve(void);
 
 #endif
diff --git a/tools/testing/selftests/arm64/fp/vec-syscfg.c b/tools/testing/selftests/arm64/fp/vec-syscfg.c
index 272b888e018e..0b976eb1c1d1 100644
--- a/tools/testing/selftests/arm64/fp/vec-syscfg.c
+++ b/tools/testing/selftests/arm64/fp/vec-syscfg.c
@@ -53,6 +53,16 @@ static struct vec_data vec_data[] = {
 		.prctl_set = PR_SVE_SET_VL,
 		.default_vl_file = "/proc/sys/abi/sve_default_vector_length",
 	},
+	{
+		.name = "SME",
+		.hwcap_type = AT_HWCAP2,
+		.hwcap = HWCAP2_SME,
+		.rdvl = rdvl_sme,
+		.rdvl_binary = "./rdvl-sme",
+		.prctl_get = PR_SME_GET_VL,
+		.prctl_set = PR_SME_SET_VL,
+		.default_vl_file = "/proc/sys/abi/sme_default_vector_length",
+	},
 };
 
 static int stdio_read_integer(FILE *f, const char *what, int *val)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 33/38] kselftest/arm64: Extend vector configuration API tests to cover SME
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Provide RDVL helpers for SME and extend the main vector configuration tests
to cover SME.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore   |  1 +
 tools/testing/selftests/arm64/fp/Makefile     |  3 ++-
 tools/testing/selftests/arm64/fp/rdvl-sme.c   | 14 ++++++++++++++
 tools/testing/selftests/arm64/fp/rdvl.S       | 16 ++++++++++++++++
 tools/testing/selftests/arm64/fp/rdvl.h       |  1 +
 tools/testing/selftests/arm64/fp/vec-syscfg.c | 10 ++++++++++
 6 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index b67395903b9b..885dd592807b 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -1,4 +1,5 @@
 fpsimd-test
+rdvl-sme
 rdvl-sve
 sve-probe-vls
 sve-ptrace
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index 4367125b7c27..ff1c8fde3aed 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -3,7 +3,7 @@
 CFLAGS += -I../../../../../usr/include/
 TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg
 TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
-	rdvl-sve \
+	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
 	vlset
 
@@ -11,6 +11,7 @@ all: $(TEST_GEN_PROGS) $(TEST_PROGS_EXTENDED)
 
 fpsimd-test: fpsimd-test.o
 	$(CC) -nostdlib $^ -o $@
+rdvl-sme: rdvl-sme.o rdvl.o
 rdvl-sve: rdvl-sve.o rdvl.o
 sve-ptrace: sve-ptrace.o
 sve-probe-vls: sve-probe-vls.o rdvl.o
diff --git a/tools/testing/selftests/arm64/fp/rdvl-sme.c b/tools/testing/selftests/arm64/fp/rdvl-sme.c
new file mode 100644
index 000000000000..49b0b2e08bac
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/rdvl-sme.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <stdio.h>
+
+#include "rdvl.h"
+
+int main(void)
+{
+	int vl = rdvl_sme();
+
+	printf("%d\n", vl);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/arm64/fp/rdvl.S b/tools/testing/selftests/arm64/fp/rdvl.S
index c916c1c9defd..a9a028ba1b79 100644
--- a/tools/testing/selftests/arm64/fp/rdvl.S
+++ b/tools/testing/selftests/arm64/fp/rdvl.S
@@ -8,3 +8,19 @@ rdvl_sve:
 	hint	34			// BTI C
 	rdvl	x0, #1
 	ret
+
+.globl rdvl_sme
+rdvl_sme:
+	hint	34			// BTI C
+
+	// Enter streaming mode
+	mov	x16, #1
+	msr	S3_3_C4_C2_2, x16
+
+	rdvl	x0, #1
+
+	// Leave streaming mode
+	mov	x16, #0
+	msr	S3_3_C4_C2_2, x16
+
+	ret
diff --git a/tools/testing/selftests/arm64/fp/rdvl.h b/tools/testing/selftests/arm64/fp/rdvl.h
index 7c9d953fc9e7..5d323679fbc9 100644
--- a/tools/testing/selftests/arm64/fp/rdvl.h
+++ b/tools/testing/selftests/arm64/fp/rdvl.h
@@ -3,6 +3,7 @@
 #ifndef RDVL_H
 #define RDVL_H
 
+int rdvl_sme(void);
 int rdvl_sve(void);
 
 #endif
diff --git a/tools/testing/selftests/arm64/fp/vec-syscfg.c b/tools/testing/selftests/arm64/fp/vec-syscfg.c
index 272b888e018e..0b976eb1c1d1 100644
--- a/tools/testing/selftests/arm64/fp/vec-syscfg.c
+++ b/tools/testing/selftests/arm64/fp/vec-syscfg.c
@@ -53,6 +53,16 @@ static struct vec_data vec_data[] = {
 		.prctl_set = PR_SVE_SET_VL,
 		.default_vl_file = "/proc/sys/abi/sve_default_vector_length",
 	},
+	{
+		.name = "SME",
+		.hwcap_type = AT_HWCAP2,
+		.hwcap = HWCAP2_SME,
+		.rdvl = rdvl_sme,
+		.rdvl_binary = "./rdvl-sme",
+		.prctl_get = PR_SME_GET_VL,
+		.prctl_set = PR_SME_SET_VL,
+		.default_vl_file = "/proc/sys/abi/sme_default_vector_length",
+	},
 };
 
 static int stdio_read_integer(FILE *f, const char *what, int *val)
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 34/38] kselftest/arm64: sme: Provide streaming mode SVE stress test
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

One of the features of SME is the addition of streaming mode, in which we
have access to a set of streaming mode SVE registers at the SME vector
length. Since these are accessed using the SVE instructions let's reuse
the existing SVE stress test for testing with a compile time option for
controlling the few small differences needed:

 - Enter streaming mode immediately on starting the program.
 - In streaming mode FFR is removed so skip reading and writing FFR.

In order to avoid requiring a cutting edge toolchain with SME support
use the op/CR form for specifying SVCR.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore  |  1 +
 tools/testing/selftests/arm64/fp/Makefile    |  3 +
 tools/testing/selftests/arm64/fp/ssve-stress | 59 ++++++++++++++++++++
 tools/testing/selftests/arm64/fp/sve-test.S  | 30 ++++++++++
 4 files changed, 93 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index 885dd592807b..73c600e1ab81 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -4,5 +4,6 @@ rdvl-sve
 sve-probe-vls
 sve-ptrace
 sve-test
+ssve-test
 vec-syscfg
 vlset
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index ff1c8fde3aed..11dbe05c5070 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -5,6 +5,7 @@ TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg
 TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
 	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
+	ssve-test ssve-stress \
 	vlset
 
 all: $(TEST_GEN_PROGS) $(TEST_PROGS_EXTENDED)
@@ -17,6 +18,8 @@ sve-ptrace: sve-ptrace.o
 sve-probe-vls: sve-probe-vls.o rdvl.o
 sve-test: sve-test.o
 	$(CC) -nostdlib $^ -o $@
+ssve-test: sve-test.S
+	$(CC) -DSSVE -nostdlib $^ -o $@
 vec-syscfg: vec-syscfg.o rdvl.o
 vlset: vlset.o
 
diff --git a/tools/testing/selftests/arm64/fp/ssve-stress b/tools/testing/selftests/arm64/fp/ssve-stress
new file mode 100644
index 000000000000..e2bd2cc184ad
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/ssve-stress
@@ -0,0 +1,59 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright (C) 2015-2019 ARM Limited.
+# Original author: Dave Martin <Dave.Martin@arm.com>
+
+set -ue
+
+NR_CPUS=`nproc`
+
+pids=
+logs=
+
+cleanup () {
+	trap - INT TERM CHLD
+	set +e
+
+	if [ -n "$pids" ]; then
+		kill $pids
+		wait $pids
+		pids=
+	fi
+
+	if [ -n "$logs" ]; then
+		cat $logs
+		rm $logs
+		logs=
+	fi
+}
+
+interrupt () {
+	cleanup
+	exit 0
+}
+
+child_died () {
+	cleanup
+	exit 1
+}
+
+trap interrupt INT TERM EXIT
+
+for x in `seq 0 $((NR_CPUS * 4))`; do
+	log=`mktemp`
+	logs=$logs\ $log
+	./ssve-test >$log &
+	pids=$pids\ $!
+done
+
+# Wait for all child processes to be created:
+sleep 10
+
+while :; do
+	kill -USR1 $pids
+done &
+pids=$pids\ $!
+
+wait
+
+exit 1
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index e3e08d9c7020..fa52d6735b76 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -292,6 +292,7 @@ endfunction
 // We fill the upper lanes of FFR with zeros.
 // Beware: corrupts P0.
 function setup_ffr
+#ifndef SSVE
 	mov	x4, x30
 
 	and	w0, w0, #0x3
@@ -314,6 +315,9 @@ function setup_ffr
 	wrffr	p0.b
 
 	ret	x4
+#else
+	ret
+#endif
 endfunction
 
 // Fill x1 bytes starting at x0 with 0xae (for canary purposes)
@@ -423,6 +427,7 @@ endfunction
 // Beware -- corrupts P0.
 // Clobbers x0-x5.
 function check_ffr
+#ifndef SSVE
 	mov	x3, x30
 
 	ldr	x4, =scratch
@@ -443,6 +448,9 @@ function check_ffr
 	mov	x2, x5
 	mov	x30, x3
 	b	memcmp
+#else
+	ret
+#endif
 endfunction
 
 // Any SVE register modified here can cause corruption in the main
@@ -458,13 +466,26 @@ function irritator_handler
 	movi	v0.8b, #1
 	movi	v9.16b, #2
 	movi	v31.8b, #3
+#ifndef SSVE
 	// And P0
 	rdffr	p0.b
 	// And FFR
 	wrffr	p15.b
+#endif
+
+	ret
+endfunction
+
+#ifdef SSVE
+function enable_sm
+	// Set SVCR.SM to 1, equivalent to SMSTART SM but doesn't need a
+	// SME capable toolchain.
+	mov	x0, #1
+	msr	S3_3_C4_C2_2, x0
 
 	ret
 endfunction
+#endif
 
 function terminate_handler
 	mov	w21, w0
@@ -522,6 +543,11 @@ endfunction
 .globl _start
 function _start
 _start:
+#ifdef SSVE
+	puts	"Streaming mode "
+	bl	enable_sm
+#endif
+
 	// Sanity-check and report the vector length
 
 	rdvl	x19, #8
@@ -570,6 +596,10 @@ _start:
 	orr	w2, w2, #SA_NODEFER
 	bl	setsignal
 
+#ifdef SSVE
+	bl	enable_sm	// syscalls will have exited streaming mode
+#endif
+
 	mov	x22, #0		// generation number, increments per iteration
 .Ltest_loop:
 	rdvl	x0, #8
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 34/38] kselftest/arm64: sme: Provide streaming mode SVE stress test
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

One of the features of SME is the addition of streaming mode, in which we
have access to a set of streaming mode SVE registers at the SME vector
length. Since these are accessed using the SVE instructions let's reuse
the existing SVE stress test for testing with a compile time option for
controlling the few small differences needed:

 - Enter streaming mode immediately on starting the program.
 - In streaming mode FFR is removed so skip reading and writing FFR.

In order to avoid requiring a cutting edge toolchain with SME support
use the op/CR form for specifying SVCR.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore  |  1 +
 tools/testing/selftests/arm64/fp/Makefile    |  3 +
 tools/testing/selftests/arm64/fp/ssve-stress | 59 ++++++++++++++++++++
 tools/testing/selftests/arm64/fp/sve-test.S  | 30 ++++++++++
 4 files changed, 93 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index 885dd592807b..73c600e1ab81 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -4,5 +4,6 @@ rdvl-sve
 sve-probe-vls
 sve-ptrace
 sve-test
+ssve-test
 vec-syscfg
 vlset
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index ff1c8fde3aed..11dbe05c5070 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -5,6 +5,7 @@ TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg
 TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
 	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
+	ssve-test ssve-stress \
 	vlset
 
 all: $(TEST_GEN_PROGS) $(TEST_PROGS_EXTENDED)
@@ -17,6 +18,8 @@ sve-ptrace: sve-ptrace.o
 sve-probe-vls: sve-probe-vls.o rdvl.o
 sve-test: sve-test.o
 	$(CC) -nostdlib $^ -o $@
+ssve-test: sve-test.S
+	$(CC) -DSSVE -nostdlib $^ -o $@
 vec-syscfg: vec-syscfg.o rdvl.o
 vlset: vlset.o
 
diff --git a/tools/testing/selftests/arm64/fp/ssve-stress b/tools/testing/selftests/arm64/fp/ssve-stress
new file mode 100644
index 000000000000..e2bd2cc184ad
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/ssve-stress
@@ -0,0 +1,59 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright (C) 2015-2019 ARM Limited.
+# Original author: Dave Martin <Dave.Martin@arm.com>
+
+set -ue
+
+NR_CPUS=`nproc`
+
+pids=
+logs=
+
+cleanup () {
+	trap - INT TERM CHLD
+	set +e
+
+	if [ -n "$pids" ]; then
+		kill $pids
+		wait $pids
+		pids=
+	fi
+
+	if [ -n "$logs" ]; then
+		cat $logs
+		rm $logs
+		logs=
+	fi
+}
+
+interrupt () {
+	cleanup
+	exit 0
+}
+
+child_died () {
+	cleanup
+	exit 1
+}
+
+trap interrupt INT TERM EXIT
+
+for x in `seq 0 $((NR_CPUS * 4))`; do
+	log=`mktemp`
+	logs=$logs\ $log
+	./ssve-test >$log &
+	pids=$pids\ $!
+done
+
+# Wait for all child processes to be created:
+sleep 10
+
+while :; do
+	kill -USR1 $pids
+done &
+pids=$pids\ $!
+
+wait
+
+exit 1
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index e3e08d9c7020..fa52d6735b76 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -292,6 +292,7 @@ endfunction
 // We fill the upper lanes of FFR with zeros.
 // Beware: corrupts P0.
 function setup_ffr
+#ifndef SSVE
 	mov	x4, x30
 
 	and	w0, w0, #0x3
@@ -314,6 +315,9 @@ function setup_ffr
 	wrffr	p0.b
 
 	ret	x4
+#else
+	ret
+#endif
 endfunction
 
 // Fill x1 bytes starting at x0 with 0xae (for canary purposes)
@@ -423,6 +427,7 @@ endfunction
 // Beware -- corrupts P0.
 // Clobbers x0-x5.
 function check_ffr
+#ifndef SSVE
 	mov	x3, x30
 
 	ldr	x4, =scratch
@@ -443,6 +448,9 @@ function check_ffr
 	mov	x2, x5
 	mov	x30, x3
 	b	memcmp
+#else
+	ret
+#endif
 endfunction
 
 // Any SVE register modified here can cause corruption in the main
@@ -458,13 +466,26 @@ function irritator_handler
 	movi	v0.8b, #1
 	movi	v9.16b, #2
 	movi	v31.8b, #3
+#ifndef SSVE
 	// And P0
 	rdffr	p0.b
 	// And FFR
 	wrffr	p15.b
+#endif
+
+	ret
+endfunction
+
+#ifdef SSVE
+function enable_sm
+	// Set SVCR.SM to 1, equivalent to SMSTART SM but doesn't need a
+	// SME capable toolchain.
+	mov	x0, #1
+	msr	S3_3_C4_C2_2, x0
 
 	ret
 endfunction
+#endif
 
 function terminate_handler
 	mov	w21, w0
@@ -522,6 +543,11 @@ endfunction
 .globl _start
 function _start
 _start:
+#ifdef SSVE
+	puts	"Streaming mode "
+	bl	enable_sm
+#endif
+
 	// Sanity-check and report the vector length
 
 	rdvl	x19, #8
@@ -570,6 +596,10 @@ _start:
 	orr	w2, w2, #SA_NODEFER
 	bl	setsignal
 
+#ifdef SSVE
+	bl	enable_sm	// syscalls will have exited streaming mode
+#endif
+
 	mov	x22, #0		// generation number, increments per iteration
 .Ltest_loop:
 	rdvl	x0, #8
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 35/38] kselftest/arm64: Add stress test for SME ZA context switching
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Add a stress test for context switching of the ZA register state based on
the similar tests Dave Martin wrote for FPSIMD and SVE registers. The test
loops indefinitely writing a data pattern to ZA then reading it back and
verifying that it's what was expected.

Unlike the other tests we manually assemble the SME instructions since at
present no released toolchain has SME support integrated.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore |   1 +
 tools/testing/selftests/arm64/fp/Makefile   |   3 +
 tools/testing/selftests/arm64/fp/za-stress  |  59 +++
 tools/testing/selftests/arm64/fp/za-test.S  | 545 ++++++++++++++++++++
 4 files changed, 608 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/fp/za-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-test.S

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index 73c600e1ab81..1178fecc7aa1 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -7,3 +7,4 @@ sve-test
 ssve-test
 vec-syscfg
 vlset
+za-test
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index 11dbe05c5070..4f32cb1041a0 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -6,6 +6,7 @@ TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
 	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
 	ssve-test ssve-stress \
+	za-test za-stress \
 	vlset
 
 all: $(TEST_GEN_PROGS) $(TEST_PROGS_EXTENDED)
@@ -22,5 +23,7 @@ ssve-test: sve-test.S
 	$(CC) -DSSVE -nostdlib $^ -o $@
 vec-syscfg: vec-syscfg.o rdvl.o
 vlset: vlset.o
+za-test: za-test.o
+	$(CC) -nostdlib $^ -o $@
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/fp/za-stress b/tools/testing/selftests/arm64/fp/za-stress
new file mode 100644
index 000000000000..5ac386b55b95
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/za-stress
@@ -0,0 +1,59 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright (C) 2015-2019 ARM Limited.
+# Original author: Dave Martin <Dave.Martin@arm.com>
+
+set -ue
+
+NR_CPUS=`nproc`
+
+pids=
+logs=
+
+cleanup () {
+	trap - INT TERM CHLD
+	set +e
+
+	if [ -n "$pids" ]; then
+		kill $pids
+		wait $pids
+		pids=
+	fi
+
+	if [ -n "$logs" ]; then
+		cat $logs
+		rm $logs
+		logs=
+	fi
+}
+
+interrupt () {
+	cleanup
+	exit 0
+}
+
+child_died () {
+	cleanup
+	exit 1
+}
+
+trap interrupt INT TERM EXIT
+
+for x in `seq 0 $((NR_CPUS * 4))`; do
+	log=`mktemp`
+	logs=$logs\ $log
+	./za-test >$log &
+	pids=$pids\ $!
+done
+
+# Wait for all child processes to be created:
+sleep 10
+
+while :; do
+	kill -USR1 $pids
+done &
+pids=$pids\ $!
+
+wait
+
+exit 1
diff --git a/tools/testing/selftests/arm64/fp/za-test.S b/tools/testing/selftests/arm64/fp/za-test.S
new file mode 100644
index 000000000000..b9e0e2e07dad
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/za-test.S
@@ -0,0 +1,545 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2021 ARM Limited.
+// Original author: Mark Brown <broonie@kernel.org>
+//
+// Scalable Matrix Extension ZA context switch test
+// Repeatedly writes unique test patterns into each ZA tile
+// and reads them back to verify integrity.
+//
+// for x in `seq 1 NR_CPUS`; do sve-test & pids=$pids\ $! ; done
+// (leave it running for as long as you want...)
+// kill $pids
+
+#include <asm/unistd.h>
+#include "assembler.h"
+#include "asm-offsets.h"
+
+.arch_extension sve
+
+#define MAXVL     2048
+#define MAXVL_B   (MAXVL / 8)
+
+/*
+ * LDR (vector to ZA array):
+ *	LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _ldr_za nw, nxbase, offset=0
+	.inst	0xe1000000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+/*
+ * STR (vector from ZA array):
+ *	STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _str_za nw, nxbase, offset=0
+	.inst	0xe1200000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+// Print a single character x0 to stdout
+// Clobbers x0-x2,x8
+function putc
+	str	x0, [sp, #-16]!
+
+	mov	x0, #1			// STDOUT_FILENO
+	mov	x1, sp
+	mov	x2, #1
+	mov	x8, #__NR_write
+	svc	#0
+
+	add	sp, sp, #16
+	ret
+endfunction
+
+// Print a NUL-terminated string starting at address x0 to stdout
+// Clobbers x0-x3,x8
+function puts
+	mov	x1, x0
+
+	mov	x2, #0
+0:	ldrb	w3, [x0], #1
+	cbz	w3, 1f
+	add	x2, x2, #1
+	b	0b
+
+1:	mov	w0, #1			// STDOUT_FILENO
+	mov	x8, #__NR_write
+	svc	#0
+
+	ret
+endfunction
+
+// Utility macro to print a literal string
+// Clobbers x0-x4,x8
+.macro puts string
+	.pushsection .rodata.str1.1, "aMS", 1
+.L__puts_literal\@: .string "\string"
+	.popsection
+
+	ldr	x0, =.L__puts_literal\@
+	bl	puts
+.endm
+
+// Print an unsigned decimal number x0 to stdout
+// Clobbers x0-x4,x8
+function putdec
+	mov	x1, sp
+	str	x30, [sp, #-32]!	// Result can't be > 20 digits
+
+	mov	x2, #0
+	strb	w2, [x1, #-1]!		// Write the NUL terminator
+
+	mov	x2, #10
+0:	udiv	x3, x0, x2		// div-mod loop to generate the digits
+	msub	x0, x3, x2, x0
+	add	w0, w0, #'0'
+	strb	w0, [x1, #-1]!
+	mov	x0, x3
+	cbnz	x3, 0b
+
+	ldrb	w0, [x1]
+	cbnz	w0, 1f
+	mov	w0, #'0'		// Print "0" for 0, not ""
+	strb	w0, [x1, #-1]!
+
+1:	mov	x0, x1
+	bl	puts
+
+	ldr	x30, [sp], #32
+	ret
+endfunction
+
+// Print an unsigned decimal number x0 to stdout, followed by a newline
+// Clobbers x0-x5,x8
+function putdecn
+	mov	x5, x30
+
+	bl	putdec
+	mov	x0, #'\n'
+	bl	putc
+
+	ret	x5
+endfunction
+
+// Clobbers x0-x3,x8
+function puthexb
+	str	x30, [sp, #-0x10]!
+
+	mov	w3, w0
+	lsr	w0, w0, #4
+	bl	puthexnibble
+	mov	w0, w3
+
+	ldr	x30, [sp], #0x10
+	// fall through to puthexnibble
+endfunction
+// Clobbers x0-x2,x8
+function puthexnibble
+	and	w0, w0, #0xf
+	cmp	w0, #10
+	blo	1f
+	add	w0, w0, #'a' - ('9' + 1)
+1:	add	w0, w0, #'0'
+	b	putc
+endfunction
+
+// x0=data in, x1=size in, clobbers x0-x5,x8
+function dumphex
+	str	x30, [sp, #-0x10]!
+
+	mov	x4, x0
+	mov	x5, x1
+
+0:	subs	x5, x5, #1
+	b.lo	1f
+	ldrb	w0, [x4], #1
+	bl	puthexb
+	b	0b
+
+1:	ldr	x30, [sp], #0x10
+	ret
+endfunction
+
+// Declare some storage space to shadow ZA register contents and a
+// scratch buffer for a vector.
+.pushsection .text
+.data
+.align 4
+zaref:
+	.space	MAXVL_B * MAXVL_B
+scratch:
+	.space	MAXVL_B
+.popsection
+
+// Trivial memory copy: copy x2 bytes, starting at address x1, to address x0.
+// Clobbers x0-x3
+function memcpy
+	cmp	x2, #0
+	b.eq	1f
+0:	ldrb	w3, [x1], #1
+	strb	w3, [x0], #1
+	subs	x2, x2, #1
+	b.ne	0b
+1:	ret
+endfunction
+
+// Generate a test pattern for storage in ZA
+// x0: pid	(16 bits)
+// x1: row in ZA (8 bits)
+// x2: generation (4 bits)
+
+// These values are used to constuct a 32-bit pattern that is repeated in the
+// scratch buffer as many times as will fit:
+// bits 31:28	generation number (increments once per test_loop)
+// bits 27:22	32-bit lane index
+// bits 21:14	row number
+// bits 13: 0	pid
+
+function pattern
+	and	x0, x0, #0x1fff
+	orr	w1, w0, w1, lsl #13
+	orr	w2, w1, w2, lsl #28
+
+	ldr	x0, =scratch
+	mov	w1, #MAXVL_B / 4
+
+0:	str	w2, [x0], #4
+	add	w2, w2, #(1 << 22)
+	subs	w1, w1, #1
+	bne	0b
+
+	ret
+endfunction
+
+// Get the address of shadow data for ZA horizontal vector xn
+.macro _adrza xd, xn, nrtmp
+	ldr	\xd, =zaref
+	rdvl	x\nrtmp, #1
+	madd	\xd, x\nrtmp, \xn, \xd
+.endm
+
+// Set up test pattern in a ZA horizontal vector
+// x0: pid
+// x1: row number
+// x2: generation
+function setup_za
+	mov	x4, x30
+	mov	x12, x1			// Use x12 for vector select
+
+	bl	pattern			// Get pattern in scratch buffer
+	_adrza	x0, x12, 2		// Shadow buffer pointer to x0 and x5
+	mov	x5, x0
+	ldr	x1, =scratch
+	bl	memcpy			// length set up by pattern
+
+	_ldr_za 12, 5			// load vector w12 from pointer x5
+
+	ret	x4
+endfunction
+
+// Fill x1 bytes starting at x0 with 0xae (for canary purposes)
+// Clobbers x1, x2.
+function memfill_ae
+	mov	w2, #0xae
+	b	memfill
+endfunction
+
+// Fill x1 bytes starting at x0 with 0.
+// Clobbers x1, x2.
+function memclr
+	mov	w2, #0
+	b	memfill
+endfunction
+
+// Trivial memory fill: fill x1 bytes starting at address x0 with byte w2
+// Clobbers x1
+function memfill
+	cmp	x1, #0
+	b.eq	1f
+
+0:	strb	w2, [x0], #1
+	subs	x1, x1, #1
+	b.ne	0b
+
+1:	ret
+endfunction
+
+// Trivial memory compare: compare x2 bytes starting at address x0 with
+// bytes starting at address x1.
+// Returns only if all bytes match; otherwise, the program is aborted.
+// Clobbers x0-x5.
+function memcmp
+	cbz	x2, 2f
+
+	stp	x0, x1, [sp, #-0x20]!
+	str	x2, [sp, #0x10]
+
+	mov	x5, #0
+0:	ldrb	w3, [x0, x5]
+	ldrb	w4, [x1, x5]
+	add	x5, x5, #1
+	cmp	w3, w4
+	b.ne	1f
+	subs	x2, x2, #1
+	b.ne	0b
+
+1:	ldr	x2, [sp, #0x10]
+	ldp	x0, x1, [sp], #0x20
+	b.ne	barf
+
+2:	ret
+endfunction
+
+// Verify that a ZA vector matches its shadow in memory, else abort
+// x0: vector number
+// Clobbers x0-x7 and x12.
+function check_za
+	mov	x3, x30
+
+	mov	x12, x0
+	_adrza	x5, x0, 6		// pointer to expected value in x5
+	mov	x4, x0
+	ldr	x7, =scratch		// x7 is scratch
+
+	mov	x0, x7
+	mov	x1, x6
+	bl	memfill_ae
+
+	_str_za 12, 7			// save vector w12 to pointer x7
+
+	mov	x0, x5
+	mov	x1, x7
+	mov	x2, x6
+	mov	x30, x3
+	b	memcmp
+endfunction
+
+// Any SME register modified here can cause corruption in the main
+// thread -- but *only* the locations modified here.
+function irritator_handler
+	// Increment the irritation signal count (x23):
+	ldr	x0, [x2, #ucontext_regs + 8 * 23]
+	add	x0, x0, #1
+	str	x0, [x2, #ucontext_regs + 8 * 23]
+
+	// Corrupt some random ZA data
+#if 0
+	adr	x0, .text + (irritator_handler - .text) / 16 * 16
+	movi	v0.8b, #1
+	movi	v9.16b, #2
+	movi	v31.8b, #3
+#endif
+
+	ret
+endfunction
+
+function smstart
+	// Set SVCR.SM to 3, equivalent to SMSTART but doesn't need a
+	// SME capable toolchain.
+	mov	x0, #3
+	msr	S3_3_C4_C2_2, x0
+
+	ret
+endfunction
+
+function terminate_handler
+	mov	w21, w0
+	mov	x20, x2
+
+	puts	"Terminated by signal "
+	mov	w0, w21
+	bl	putdec
+	puts	", no error, iterations="
+	ldr	x0, [x20, #ucontext_regs + 8 * 22]
+	bl	putdec
+	puts	", signals="
+	ldr	x0, [x20, #ucontext_regs + 8 * 23]
+	bl	putdecn
+
+	mov	x0, #0
+	mov	x8, #__NR_exit
+	svc	#0
+endfunction
+
+// w0: signal number
+// x1: sa_action
+// w2: sa_flags
+// Clobbers x0-x6,x8
+function setsignal
+	str	x30, [sp, #-((sa_sz + 15) / 16 * 16 + 16)]!
+
+	mov	w4, w0
+	mov	x5, x1
+	mov	w6, w2
+
+	add	x0, sp, #16
+	mov	x1, #sa_sz
+	bl	memclr
+
+	mov	w0, w4
+	add	x1, sp, #16
+	str	w6, [x1, #sa_flags]
+	str	x5, [x1, #sa_handler]
+	mov	x2, #0
+	mov	x3, #sa_mask_sz
+	mov	x8, #__NR_rt_sigaction
+	svc	#0
+
+	cbz	w0, 1f
+
+	puts	"sigaction failure\n"
+	b	.Labort
+
+1:	ldr	x30, [sp], #((sa_sz + 15) / 16 * 16 + 16)
+	ret
+endfunction
+
+// Main program entry point
+.globl _start
+function _start
+_start:
+	puts	"Streaming mode "
+	bl	smstart
+
+	// Sanity-check and report the vector length
+
+	rdvl	x19, #8
+	cmp	x19, #128
+	b.lo	1f
+	cmp	x19, #2048
+	b.hi	1f
+	tst	x19, #(8 - 1)
+	b.eq	2f
+
+1:	puts	"bad vector length: "
+	mov	x0, x19
+	bl	putdecn
+	b	.Labort
+
+2:	puts	"vector length:\t"
+	mov	x0, x19
+	bl	putdec
+	puts	" bits\n"
+
+	// Obtain our PID, to ensure test pattern uniqueness between processes
+	mov	x8, #__NR_getpid
+	svc	#0
+	mov	x20, x0
+
+	puts	"PID:\t"
+	mov	x0, x20
+	bl	putdecn
+
+	mov	x23, #0		// Irritation signal count
+
+	mov	w0, #SIGINT
+	adr	x1, terminate_handler
+	mov	w2, #SA_SIGINFO
+	bl	setsignal
+
+	mov	w0, #SIGTERM
+	adr	x1, terminate_handler
+	mov	w2, #SA_SIGINFO
+	bl	setsignal
+
+	mov	w0, #SIGUSR1
+	adr	x1, irritator_handler
+	mov	w2, #SA_SIGINFO
+	orr	w2, w2, #SA_NODEFER
+	bl	setsignal
+
+	bl	smstart		// printing and signals dropped out of SM
+	mov	x22, #0		// generation number, increments per iteration
+.Ltest_loop:
+	rdvl	x0, #8
+	cmp	x0, x19
+	b.ne	vl_barf
+
+	rdvl	x21, #1		// Set up ZA & shadow with test pattern
+0:	mov	x0, x20
+	sub	x1, x21, #1
+	and	x2, x22, #0xf
+	bl	setup_za
+	subs	x21, x21, #1
+	bne	0b
+
+	mov	x8, #__NR_sched_yield	// Encourage preemption
+	svc	#0
+	bl	smstart		// syscall dropped out of SM
+
+	rdvl	x21, #1		// Set up ZA & shadow with test pattern
+0:	sub	x0, x21, #1
+	bl	check_za
+	subs	x21, x21, #1
+	bne	0b
+
+	add	x22, x22, #1	// Everything still working
+	b	.Ltest_loop
+
+.Labort:
+	mov	x0, #0
+	mov	x1, #SIGABRT
+	mov	x8, #__NR_kill
+	svc	#0
+endfunction
+
+function barf
+// fpsimd.c acitivty log dump hack
+//	ldr	w0, =0xdeadc0de
+//	mov	w8, #__NR_exit
+//	svc	#0
+// end hack
+	mov	x10, x0	// expected data
+	mov	x11, x1	// actual data
+	mov	x12, x2	// data size
+
+	puts	"Mismatch: PID="
+	mov	x0, x20
+	bl	putdec
+	puts	", iteration="
+	mov	x0, x22
+	bl	putdec
+	puts	", row="
+	mov	x0, x21
+	bl	putdecn
+	puts	"\tExpected ["
+	mov	x0, x10
+	mov	x1, x12
+	bl	dumphex
+	puts	"]\n\tGot      ["
+	mov	x0, x11
+	mov	x1, x12
+	bl	dumphex
+	puts	"]\n"
+
+	mov	x8, #__NR_getpid
+	svc	#0
+// fpsimd.c acitivty log dump hack
+//	ldr	w0, =0xdeadc0de
+//	mov	w8, #__NR_exit
+//	svc	#0
+// ^ end of hack
+	mov	x1, #SIGABRT
+	mov	x8, #__NR_kill
+	svc	#0
+//	mov	x8, #__NR_exit
+//	mov	x1, #1
+//	svc	#0
+endfunction
+
+function vl_barf
+	mov	x10, x0
+
+	puts	"Bad active VL: "
+	mov	x0, x10
+	bl	putdecn
+
+	mov	x8, #__NR_exit
+	mov	x1, #1
+	svc	#0
+endfunction
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 35/38] kselftest/arm64: Add stress test for SME ZA context switching
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Add a stress test for context switching of the ZA register state based on
the similar tests Dave Martin wrote for FPSIMD and SVE registers. The test
loops indefinitely writing a data pattern to ZA then reading it back and
verifying that it's what was expected.

Unlike the other tests we manually assemble the SME instructions since at
present no released toolchain has SME support integrated.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore |   1 +
 tools/testing/selftests/arm64/fp/Makefile   |   3 +
 tools/testing/selftests/arm64/fp/za-stress  |  59 +++
 tools/testing/selftests/arm64/fp/za-test.S  | 545 ++++++++++++++++++++
 4 files changed, 608 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/fp/za-stress
 create mode 100644 tools/testing/selftests/arm64/fp/za-test.S

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index 73c600e1ab81..1178fecc7aa1 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -7,3 +7,4 @@ sve-test
 ssve-test
 vec-syscfg
 vlset
+za-test
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index 11dbe05c5070..4f32cb1041a0 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -6,6 +6,7 @@ TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
 	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
 	ssve-test ssve-stress \
+	za-test za-stress \
 	vlset
 
 all: $(TEST_GEN_PROGS) $(TEST_PROGS_EXTENDED)
@@ -22,5 +23,7 @@ ssve-test: sve-test.S
 	$(CC) -DSSVE -nostdlib $^ -o $@
 vec-syscfg: vec-syscfg.o rdvl.o
 vlset: vlset.o
+za-test: za-test.o
+	$(CC) -nostdlib $^ -o $@
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/fp/za-stress b/tools/testing/selftests/arm64/fp/za-stress
new file mode 100644
index 000000000000..5ac386b55b95
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/za-stress
@@ -0,0 +1,59 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright (C) 2015-2019 ARM Limited.
+# Original author: Dave Martin <Dave.Martin@arm.com>
+
+set -ue
+
+NR_CPUS=`nproc`
+
+pids=
+logs=
+
+cleanup () {
+	trap - INT TERM CHLD
+	set +e
+
+	if [ -n "$pids" ]; then
+		kill $pids
+		wait $pids
+		pids=
+	fi
+
+	if [ -n "$logs" ]; then
+		cat $logs
+		rm $logs
+		logs=
+	fi
+}
+
+interrupt () {
+	cleanup
+	exit 0
+}
+
+child_died () {
+	cleanup
+	exit 1
+}
+
+trap interrupt INT TERM EXIT
+
+for x in `seq 0 $((NR_CPUS * 4))`; do
+	log=`mktemp`
+	logs=$logs\ $log
+	./za-test >$log &
+	pids=$pids\ $!
+done
+
+# Wait for all child processes to be created:
+sleep 10
+
+while :; do
+	kill -USR1 $pids
+done &
+pids=$pids\ $!
+
+wait
+
+exit 1
diff --git a/tools/testing/selftests/arm64/fp/za-test.S b/tools/testing/selftests/arm64/fp/za-test.S
new file mode 100644
index 000000000000..b9e0e2e07dad
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/za-test.S
@@ -0,0 +1,545 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2021 ARM Limited.
+// Original author: Mark Brown <broonie@kernel.org>
+//
+// Scalable Matrix Extension ZA context switch test
+// Repeatedly writes unique test patterns into each ZA tile
+// and reads them back to verify integrity.
+//
+// for x in `seq 1 NR_CPUS`; do sve-test & pids=$pids\ $! ; done
+// (leave it running for as long as you want...)
+// kill $pids
+
+#include <asm/unistd.h>
+#include "assembler.h"
+#include "asm-offsets.h"
+
+.arch_extension sve
+
+#define MAXVL     2048
+#define MAXVL_B   (MAXVL / 8)
+
+/*
+ * LDR (vector to ZA array):
+ *	LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _ldr_za nw, nxbase, offset=0
+	.inst	0xe1000000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+/*
+ * STR (vector from ZA array):
+ *	STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ */
+.macro _str_za nw, nxbase, offset=0
+	.inst	0xe1200000			\
+		| (((\nw) & 3) << 13)		\
+		| ((\nxbase) << 5)		\
+		| ((\offset) & 7)
+.endm
+
+// Print a single character x0 to stdout
+// Clobbers x0-x2,x8
+function putc
+	str	x0, [sp, #-16]!
+
+	mov	x0, #1			// STDOUT_FILENO
+	mov	x1, sp
+	mov	x2, #1
+	mov	x8, #__NR_write
+	svc	#0
+
+	add	sp, sp, #16
+	ret
+endfunction
+
+// Print a NUL-terminated string starting at address x0 to stdout
+// Clobbers x0-x3,x8
+function puts
+	mov	x1, x0
+
+	mov	x2, #0
+0:	ldrb	w3, [x0], #1
+	cbz	w3, 1f
+	add	x2, x2, #1
+	b	0b
+
+1:	mov	w0, #1			// STDOUT_FILENO
+	mov	x8, #__NR_write
+	svc	#0
+
+	ret
+endfunction
+
+// Utility macro to print a literal string
+// Clobbers x0-x4,x8
+.macro puts string
+	.pushsection .rodata.str1.1, "aMS", 1
+.L__puts_literal\@: .string "\string"
+	.popsection
+
+	ldr	x0, =.L__puts_literal\@
+	bl	puts
+.endm
+
+// Print an unsigned decimal number x0 to stdout
+// Clobbers x0-x4,x8
+function putdec
+	mov	x1, sp
+	str	x30, [sp, #-32]!	// Result can't be > 20 digits
+
+	mov	x2, #0
+	strb	w2, [x1, #-1]!		// Write the NUL terminator
+
+	mov	x2, #10
+0:	udiv	x3, x0, x2		// div-mod loop to generate the digits
+	msub	x0, x3, x2, x0
+	add	w0, w0, #'0'
+	strb	w0, [x1, #-1]!
+	mov	x0, x3
+	cbnz	x3, 0b
+
+	ldrb	w0, [x1]
+	cbnz	w0, 1f
+	mov	w0, #'0'		// Print "0" for 0, not ""
+	strb	w0, [x1, #-1]!
+
+1:	mov	x0, x1
+	bl	puts
+
+	ldr	x30, [sp], #32
+	ret
+endfunction
+
+// Print an unsigned decimal number x0 to stdout, followed by a newline
+// Clobbers x0-x5,x8
+function putdecn
+	mov	x5, x30
+
+	bl	putdec
+	mov	x0, #'\n'
+	bl	putc
+
+	ret	x5
+endfunction
+
+// Clobbers x0-x3,x8
+function puthexb
+	str	x30, [sp, #-0x10]!
+
+	mov	w3, w0
+	lsr	w0, w0, #4
+	bl	puthexnibble
+	mov	w0, w3
+
+	ldr	x30, [sp], #0x10
+	// fall through to puthexnibble
+endfunction
+// Clobbers x0-x2,x8
+function puthexnibble
+	and	w0, w0, #0xf
+	cmp	w0, #10
+	blo	1f
+	add	w0, w0, #'a' - ('9' + 1)
+1:	add	w0, w0, #'0'
+	b	putc
+endfunction
+
+// x0=data in, x1=size in, clobbers x0-x5,x8
+function dumphex
+	str	x30, [sp, #-0x10]!
+
+	mov	x4, x0
+	mov	x5, x1
+
+0:	subs	x5, x5, #1
+	b.lo	1f
+	ldrb	w0, [x4], #1
+	bl	puthexb
+	b	0b
+
+1:	ldr	x30, [sp], #0x10
+	ret
+endfunction
+
+// Declare some storage space to shadow ZA register contents and a
+// scratch buffer for a vector.
+.pushsection .text
+.data
+.align 4
+zaref:
+	.space	MAXVL_B * MAXVL_B
+scratch:
+	.space	MAXVL_B
+.popsection
+
+// Trivial memory copy: copy x2 bytes, starting at address x1, to address x0.
+// Clobbers x0-x3
+function memcpy
+	cmp	x2, #0
+	b.eq	1f
+0:	ldrb	w3, [x1], #1
+	strb	w3, [x0], #1
+	subs	x2, x2, #1
+	b.ne	0b
+1:	ret
+endfunction
+
+// Generate a test pattern for storage in ZA
+// x0: pid	(16 bits)
+// x1: row in ZA (8 bits)
+// x2: generation (4 bits)
+
+// These values are used to constuct a 32-bit pattern that is repeated in the
+// scratch buffer as many times as will fit:
+// bits 31:28	generation number (increments once per test_loop)
+// bits 27:22	32-bit lane index
+// bits 21:14	row number
+// bits 13: 0	pid
+
+function pattern
+	and	x0, x0, #0x1fff
+	orr	w1, w0, w1, lsl #13
+	orr	w2, w1, w2, lsl #28
+
+	ldr	x0, =scratch
+	mov	w1, #MAXVL_B / 4
+
+0:	str	w2, [x0], #4
+	add	w2, w2, #(1 << 22)
+	subs	w1, w1, #1
+	bne	0b
+
+	ret
+endfunction
+
+// Get the address of shadow data for ZA horizontal vector xn
+.macro _adrza xd, xn, nrtmp
+	ldr	\xd, =zaref
+	rdvl	x\nrtmp, #1
+	madd	\xd, x\nrtmp, \xn, \xd
+.endm
+
+// Set up test pattern in a ZA horizontal vector
+// x0: pid
+// x1: row number
+// x2: generation
+function setup_za
+	mov	x4, x30
+	mov	x12, x1			// Use x12 for vector select
+
+	bl	pattern			// Get pattern in scratch buffer
+	_adrza	x0, x12, 2		// Shadow buffer pointer to x0 and x5
+	mov	x5, x0
+	ldr	x1, =scratch
+	bl	memcpy			// length set up by pattern
+
+	_ldr_za 12, 5			// load vector w12 from pointer x5
+
+	ret	x4
+endfunction
+
+// Fill x1 bytes starting at x0 with 0xae (for canary purposes)
+// Clobbers x1, x2.
+function memfill_ae
+	mov	w2, #0xae
+	b	memfill
+endfunction
+
+// Fill x1 bytes starting at x0 with 0.
+// Clobbers x1, x2.
+function memclr
+	mov	w2, #0
+	b	memfill
+endfunction
+
+// Trivial memory fill: fill x1 bytes starting at address x0 with byte w2
+// Clobbers x1
+function memfill
+	cmp	x1, #0
+	b.eq	1f
+
+0:	strb	w2, [x0], #1
+	subs	x1, x1, #1
+	b.ne	0b
+
+1:	ret
+endfunction
+
+// Trivial memory compare: compare x2 bytes starting at address x0 with
+// bytes starting at address x1.
+// Returns only if all bytes match; otherwise, the program is aborted.
+// Clobbers x0-x5.
+function memcmp
+	cbz	x2, 2f
+
+	stp	x0, x1, [sp, #-0x20]!
+	str	x2, [sp, #0x10]
+
+	mov	x5, #0
+0:	ldrb	w3, [x0, x5]
+	ldrb	w4, [x1, x5]
+	add	x5, x5, #1
+	cmp	w3, w4
+	b.ne	1f
+	subs	x2, x2, #1
+	b.ne	0b
+
+1:	ldr	x2, [sp, #0x10]
+	ldp	x0, x1, [sp], #0x20
+	b.ne	barf
+
+2:	ret
+endfunction
+
+// Verify that a ZA vector matches its shadow in memory, else abort
+// x0: vector number
+// Clobbers x0-x7 and x12.
+function check_za
+	mov	x3, x30
+
+	mov	x12, x0
+	_adrza	x5, x0, 6		// pointer to expected value in x5
+	mov	x4, x0
+	ldr	x7, =scratch		// x7 is scratch
+
+	mov	x0, x7
+	mov	x1, x6
+	bl	memfill_ae
+
+	_str_za 12, 7			// save vector w12 to pointer x7
+
+	mov	x0, x5
+	mov	x1, x7
+	mov	x2, x6
+	mov	x30, x3
+	b	memcmp
+endfunction
+
+// Any SME register modified here can cause corruption in the main
+// thread -- but *only* the locations modified here.
+function irritator_handler
+	// Increment the irritation signal count (x23):
+	ldr	x0, [x2, #ucontext_regs + 8 * 23]
+	add	x0, x0, #1
+	str	x0, [x2, #ucontext_regs + 8 * 23]
+
+	// Corrupt some random ZA data
+#if 0
+	adr	x0, .text + (irritator_handler - .text) / 16 * 16
+	movi	v0.8b, #1
+	movi	v9.16b, #2
+	movi	v31.8b, #3
+#endif
+
+	ret
+endfunction
+
+function smstart
+	// Set SVCR.SM to 3, equivalent to SMSTART but doesn't need a
+	// SME capable toolchain.
+	mov	x0, #3
+	msr	S3_3_C4_C2_2, x0
+
+	ret
+endfunction
+
+function terminate_handler
+	mov	w21, w0
+	mov	x20, x2
+
+	puts	"Terminated by signal "
+	mov	w0, w21
+	bl	putdec
+	puts	", no error, iterations="
+	ldr	x0, [x20, #ucontext_regs + 8 * 22]
+	bl	putdec
+	puts	", signals="
+	ldr	x0, [x20, #ucontext_regs + 8 * 23]
+	bl	putdecn
+
+	mov	x0, #0
+	mov	x8, #__NR_exit
+	svc	#0
+endfunction
+
+// w0: signal number
+// x1: sa_action
+// w2: sa_flags
+// Clobbers x0-x6,x8
+function setsignal
+	str	x30, [sp, #-((sa_sz + 15) / 16 * 16 + 16)]!
+
+	mov	w4, w0
+	mov	x5, x1
+	mov	w6, w2
+
+	add	x0, sp, #16
+	mov	x1, #sa_sz
+	bl	memclr
+
+	mov	w0, w4
+	add	x1, sp, #16
+	str	w6, [x1, #sa_flags]
+	str	x5, [x1, #sa_handler]
+	mov	x2, #0
+	mov	x3, #sa_mask_sz
+	mov	x8, #__NR_rt_sigaction
+	svc	#0
+
+	cbz	w0, 1f
+
+	puts	"sigaction failure\n"
+	b	.Labort
+
+1:	ldr	x30, [sp], #((sa_sz + 15) / 16 * 16 + 16)
+	ret
+endfunction
+
+// Main program entry point
+.globl _start
+function _start
+_start:
+	puts	"Streaming mode "
+	bl	smstart
+
+	// Sanity-check and report the vector length
+
+	rdvl	x19, #8
+	cmp	x19, #128
+	b.lo	1f
+	cmp	x19, #2048
+	b.hi	1f
+	tst	x19, #(8 - 1)
+	b.eq	2f
+
+1:	puts	"bad vector length: "
+	mov	x0, x19
+	bl	putdecn
+	b	.Labort
+
+2:	puts	"vector length:\t"
+	mov	x0, x19
+	bl	putdec
+	puts	" bits\n"
+
+	// Obtain our PID, to ensure test pattern uniqueness between processes
+	mov	x8, #__NR_getpid
+	svc	#0
+	mov	x20, x0
+
+	puts	"PID:\t"
+	mov	x0, x20
+	bl	putdecn
+
+	mov	x23, #0		// Irritation signal count
+
+	mov	w0, #SIGINT
+	adr	x1, terminate_handler
+	mov	w2, #SA_SIGINFO
+	bl	setsignal
+
+	mov	w0, #SIGTERM
+	adr	x1, terminate_handler
+	mov	w2, #SA_SIGINFO
+	bl	setsignal
+
+	mov	w0, #SIGUSR1
+	adr	x1, irritator_handler
+	mov	w2, #SA_SIGINFO
+	orr	w2, w2, #SA_NODEFER
+	bl	setsignal
+
+	bl	smstart		// printing and signals dropped out of SM
+	mov	x22, #0		// generation number, increments per iteration
+.Ltest_loop:
+	rdvl	x0, #8
+	cmp	x0, x19
+	b.ne	vl_barf
+
+	rdvl	x21, #1		// Set up ZA & shadow with test pattern
+0:	mov	x0, x20
+	sub	x1, x21, #1
+	and	x2, x22, #0xf
+	bl	setup_za
+	subs	x21, x21, #1
+	bne	0b
+
+	mov	x8, #__NR_sched_yield	// Encourage preemption
+	svc	#0
+	bl	smstart		// syscall dropped out of SM
+
+	rdvl	x21, #1		// Set up ZA & shadow with test pattern
+0:	sub	x0, x21, #1
+	bl	check_za
+	subs	x21, x21, #1
+	bne	0b
+
+	add	x22, x22, #1	// Everything still working
+	b	.Ltest_loop
+
+.Labort:
+	mov	x0, #0
+	mov	x1, #SIGABRT
+	mov	x8, #__NR_kill
+	svc	#0
+endfunction
+
+function barf
+// fpsimd.c acitivty log dump hack
+//	ldr	w0, =0xdeadc0de
+//	mov	w8, #__NR_exit
+//	svc	#0
+// end hack
+	mov	x10, x0	// expected data
+	mov	x11, x1	// actual data
+	mov	x12, x2	// data size
+
+	puts	"Mismatch: PID="
+	mov	x0, x20
+	bl	putdec
+	puts	", iteration="
+	mov	x0, x22
+	bl	putdec
+	puts	", row="
+	mov	x0, x21
+	bl	putdecn
+	puts	"\tExpected ["
+	mov	x0, x10
+	mov	x1, x12
+	bl	dumphex
+	puts	"]\n\tGot      ["
+	mov	x0, x11
+	mov	x1, x12
+	bl	dumphex
+	puts	"]\n"
+
+	mov	x8, #__NR_getpid
+	svc	#0
+// fpsimd.c acitivty log dump hack
+//	ldr	w0, =0xdeadc0de
+//	mov	w8, #__NR_exit
+//	svc	#0
+// ^ end of hack
+	mov	x1, #SIGABRT
+	mov	x8, #__NR_kill
+	svc	#0
+//	mov	x8, #__NR_exit
+//	mov	x1, #1
+//	svc	#0
+endfunction
+
+function vl_barf
+	mov	x10, x0
+
+	puts	"Bad active VL: "
+	mov	x0, x10
+	bl	putdecn
+
+	mov	x8, #__NR_exit
+	mov	x1, #1
+	svc	#0
+endfunction
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 36/38] kselftest/arm64: signal: Add SME signal handling tests
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Add test cases for the SME signal handing ABI patterned off the SVE tests.
Due to the small size of the tests and the differences in ABI (especially
around needing to account for both streaming SVE and ZA) there is some code
duplication here.

We currently cover:
 - Reporting of the vector length.
 - Lack of support for changing vector length.
 - Presence and size of register state for streaming SVE and ZA.

As with the SVE tests we do not yet have any validation of register
contents.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../testing/selftests/arm64/signal/.gitignore |   2 +
 .../selftests/arm64/signal/test_signals.h     |   2 +
 .../arm64/signal/test_signals_utils.c         |   3 +
 .../testcases/fake_sigreturn_sme_change_vl.c  |  92 +++++++++++++
 .../selftests/arm64/signal/testcases/sme_vl.c |  70 ++++++++++
 .../arm64/signal/testcases/ssve_regs.c        | 129 ++++++++++++++++++
 6 files changed, 298 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c

diff --git a/tools/testing/selftests/arm64/signal/.gitignore b/tools/testing/selftests/arm64/signal/.gitignore
index c1742755abb9..4de8eb26d4de 100644
--- a/tools/testing/selftests/arm64/signal/.gitignore
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 mangle_*
 fake_sigreturn_*
+sme_*
+ssve_*
 sve_*
 !*.[ch]
diff --git a/tools/testing/selftests/arm64/signal/test_signals.h b/tools/testing/selftests/arm64/signal/test_signals.h
index ebe8694dbef0..d0523a50ee78 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.h
+++ b/tools/testing/selftests/arm64/signal/test_signals.h
@@ -34,11 +34,13 @@
 enum {
 	FSSBS_BIT,
 	FSVE_BIT,
+	FSME_BIT,
 	FMAX_END
 };
 
 #define FEAT_SSBS		(1UL << FSSBS_BIT)
 #define FEAT_SVE		(1UL << FSVE_BIT)
+#define FEAT_SME		(1UL << FSME_BIT)
 
 /*
  * A descriptor used to describe and configure a test case.
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 22722abc9dfa..d0ff0515fca4 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -27,6 +27,7 @@ static int sig_copyctx = SIGTRAP;
 static char const *const feats_names[FMAX_END] = {
 	" SSBS ",
 	" SVE ",
+	" SME ",
 };
 
 #define MAX_FEATS_SZ	128
@@ -266,6 +267,8 @@ int test_init(struct tdescr *td)
 			td->feats_supported |= FEAT_SSBS;
 		if (getauxval(AT_HWCAP) & HWCAP_SVE)
 			td->feats_supported |= FEAT_SVE;
+		if (getauxval(AT_HWCAP2) & HWCAP2_SME)
+			td->feats_supported |= FEAT_SME;
 		if (feats_ok(td)) {
 			fprintf(stderr,
 				"Required Features: [%s] supported\n",
diff --git a/tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c b/tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
new file mode 100644
index 000000000000..7ed762b7202f
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Attempt to change the streaming SVE vector length in a signal
+ * handler, this is not supported and is expected to segfault.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+struct fake_sigframe sf;
+static unsigned int vls[SVE_VQ_MAX];
+unsigned int nvls = 0;
+
+static bool sme_get_vls(struct tdescr *td)
+{
+	int vq, vl;
+
+	/*
+	 * Enumerate up to SVE_VQ_MAX vector lengths
+	 */
+	for (vq = SVE_VQ_MAX; vq > 0; --vq) {
+		vl = prctl(PR_SVE_SET_VL, vq * 16);
+		if (vl == -1)
+			return false;
+
+		vl &= PR_SME_VL_LEN_MASK;
+
+		/* Skip missing VLs */
+		vq = sve_vq_from_vl(vl);
+
+		vls[nvls++] = vl;
+	}
+
+	/* We need at least two VLs */
+	if (nvls < 2) {
+		fprintf(stderr, "Only %d VL supported\n", nvls);
+		return false;
+	}
+
+	return true;
+}
+
+static int fake_sigreturn_ssve_change_vl(struct tdescr *td,
+					 siginfo_t *si, ucontext_t *uc)
+{
+	size_t resv_sz, offset;
+	struct _aarch64_ctx *head = GET_SF_RESV_HEAD(sf);
+	struct sve_context *sve;
+
+	/* Get a signal context with a SME ZA frame in it */
+	if (!get_current_context(td, &sf.uc))
+		return 1;
+
+	resv_sz = GET_SF_RESV_SIZE(sf);
+	head = get_header(head, SVE_MAGIC, resv_sz, &offset);
+	if (!head) {
+		fprintf(stderr, "No SVE context\n");
+		return 1;
+	}
+
+	if (head->size != sizeof(struct sve_context)) {
+		fprintf(stderr, "Register data present, aborting\n");
+		return 1;
+	}
+
+	sve = (struct sve_context *)head;
+
+	/* No changes are supported; init left us at minimum VL so go to max */
+	fprintf(stderr, "Attempting to change VL from %d to %d\n",
+		sve->vl, vls[0]);
+	sve->vl = vls[0];
+
+	fake_sigreturn(&sf, sizeof(sf), 0);
+
+	return 1;
+}
+
+struct tdescr tde = {
+	.name = "FAKE_SIGRETURN_SSVE_CHANGE",
+	.descr = "Attempt to change Streaming SVE VL",
+	.feats_required = FEAT_SME,
+	.sig_ok = SIGSEGV,
+	.timeout = 3,
+	.init = sme_get_vls,
+	.run = fake_sigreturn_ssve_change_vl,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/sme_vl.c b/tools/testing/selftests/arm64/signal/testcases/sme_vl.c
new file mode 100644
index 000000000000..c40e339a1bc3
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/sme_vl.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Check that the SME vector length reported in signal contexts is the
+ * expected one.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+struct fake_sigframe sf;
+unsigned int vl;
+
+static bool get_sme_vl(struct tdescr *td)
+{
+	int ret = prctl(PR_SME_GET_VL);
+	if (ret == -1)
+		return false;
+
+	vl = ret;
+
+	return true;
+}
+
+static int sme_vl(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+	size_t resv_sz, offset;
+	struct _aarch64_ctx *head = GET_SF_RESV_HEAD(sf);
+	struct sve_context *sve;
+
+	/* Get a signal context which should have a SVE frame in it */
+	if (!get_current_context(td, &sf.uc))
+		return 1;
+
+	resv_sz = GET_SF_RESV_SIZE(sf);
+	head = get_header(head, SVE_MAGIC, resv_sz, &offset);
+	if (!head) {
+		fprintf(stderr, "No SVE context\n");
+		return 1;
+	}
+	sve = (struct sve_context *)head;
+
+	if (sve->vl != vl) {
+		fprintf(stderr, "SSVE sigframe VL %u, expected %u\n",
+			sve->vl, vl);
+		return 1;
+	} else {
+		fprintf(stderr, "got SSVE expected VL %u\n", vl);
+	}
+
+	/* Also check ZA VL */
+
+	td->pass = 1;
+
+	return 0;
+}
+
+struct tdescr tde = {
+	.name = "SME VL",
+	.descr = "Check that we get the right SME VL reported",
+	.feats_required = FEAT_SME,
+	.timeout = 3,
+	.init = get_sme_vl,
+	.run = sme_vl,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/ssve_regs.c b/tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
new file mode 100644
index 000000000000..44a08d43cd50
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Verify that the streaming SVE register context in signal frames is
+ * set up as expected.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+struct fake_sigframe sf;
+static unsigned int vls[SVE_VQ_MAX];
+unsigned int nvls = 0;
+
+static bool sme_get_vls(struct tdescr *td)
+{
+	int vq, vl;
+
+	/*
+	 * Enumerate up to SVE_VQ_MAX vector lengths
+	 */
+	for (vq = SVE_VQ_MAX; vq > 0; --vq) {
+		vl = prctl(PR_SVE_SET_VL, vq * 16);
+		if (vl == -1)
+			return false;
+
+		vl &= PR_SME_VL_LEN_MASK;
+
+		/* Skip missing VLs */
+		vq = sve_vq_from_vl(vl);
+
+		vls[nvls++] = vl;
+	}
+
+	/* We need at least one VL */
+	if (nvls < 1) {
+		fprintf(stderr, "Only %d VL supported\n", nvls);
+		return false;
+	}
+
+	return true;
+}
+
+static void setup_ssve_regs(void)
+{
+	/* SMSTART SM */
+	asm volatile(".inst 0x7f4303d5");
+
+	/* RDVL x16, #1 so we should have SVE regs; real data is TODO */
+	asm volatile(".inst 0x04bf5030" : : : "x16" );
+}
+
+static int do_one_sme_vl(struct tdescr *td, siginfo_t *si, ucontext_t *uc,
+			 unsigned int vl)
+{
+	size_t resv_sz, offset;
+	struct _aarch64_ctx *head = GET_SF_RESV_HEAD(sf);
+	struct sve_context *ssve;
+
+	fprintf(stderr, "Testing VL %d\n", vl);
+
+	if (prctl(PR_SME_SET_VL, vl) == -1) {
+		fprintf(stderr, "Failed to set VL\n");
+		return 1;
+	}
+
+	/* 
+	 * Get a signal context which should have a SVE frame and registers
+	 * in it.
+	 */
+	setup_ssve_regs();
+	if (!get_current_context(td, &sf.uc))
+		return 1;
+
+	resv_sz = GET_SF_RESV_SIZE(sf);
+	head = get_header(head, SVE_MAGIC, resv_sz, &offset);
+	if (!head) {
+		fprintf(stderr, "No SVE context\n");
+		return 1;
+	}
+
+	ssve = (struct sve_context *)head;
+	if (ssve->vl != vl) {
+		fprintf(stderr, "Got VL %d, expected %d\n", ssve->vl, vl);
+		return 1;
+	}
+
+	/* The actual size validation is done in get_current_context() */
+	fprintf(stderr, "Got expected size %u and VL %d\n",
+		head->size, ssve->vl);
+
+	return 0;
+}
+
+static int sme_regs(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+	int i;
+
+	for (i = 0; i < nvls; i++) {
+		/*
+		 * TODO: the signal test helpers can't currently cope
+		 * with signal frames bigger than struct sigcontext,
+		 * skip VLs that will trigger that.
+		 */
+		if (vls[i] > 64)
+			continue;
+
+		if (do_one_sme_vl(td, si, uc, vls[i]))
+			return 1;
+	}
+
+	td->pass = 1;
+
+	return 0;
+}
+
+struct tdescr tde = {
+	.name = "Streaming SVE registers",
+	.descr = "Check that we get the right Streaming SVE registers reported",
+	.feats_required = FEAT_SME,
+	.timeout = 3,
+	.init = sme_get_vls,
+	.run = sme_regs,
+};
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 36/38] kselftest/arm64: signal: Add SME signal handling tests
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Add test cases for the SME signal handing ABI patterned off the SVE tests.
Due to the small size of the tests and the differences in ABI (especially
around needing to account for both streaming SVE and ZA) there is some code
duplication here.

We currently cover:
 - Reporting of the vector length.
 - Lack of support for changing vector length.
 - Presence and size of register state for streaming SVE and ZA.

As with the SVE tests we do not yet have any validation of register
contents.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../testing/selftests/arm64/signal/.gitignore |   2 +
 .../selftests/arm64/signal/test_signals.h     |   2 +
 .../arm64/signal/test_signals_utils.c         |   3 +
 .../testcases/fake_sigreturn_sme_change_vl.c  |  92 +++++++++++++
 .../selftests/arm64/signal/testcases/sme_vl.c |  70 ++++++++++
 .../arm64/signal/testcases/ssve_regs.c        | 129 ++++++++++++++++++
 6 files changed, 298 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c

diff --git a/tools/testing/selftests/arm64/signal/.gitignore b/tools/testing/selftests/arm64/signal/.gitignore
index c1742755abb9..4de8eb26d4de 100644
--- a/tools/testing/selftests/arm64/signal/.gitignore
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 mangle_*
 fake_sigreturn_*
+sme_*
+ssve_*
 sve_*
 !*.[ch]
diff --git a/tools/testing/selftests/arm64/signal/test_signals.h b/tools/testing/selftests/arm64/signal/test_signals.h
index ebe8694dbef0..d0523a50ee78 100644
--- a/tools/testing/selftests/arm64/signal/test_signals.h
+++ b/tools/testing/selftests/arm64/signal/test_signals.h
@@ -34,11 +34,13 @@
 enum {
 	FSSBS_BIT,
 	FSVE_BIT,
+	FSME_BIT,
 	FMAX_END
 };
 
 #define FEAT_SSBS		(1UL << FSSBS_BIT)
 #define FEAT_SVE		(1UL << FSVE_BIT)
+#define FEAT_SME		(1UL << FSME_BIT)
 
 /*
  * A descriptor used to describe and configure a test case.
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c
index 22722abc9dfa..d0ff0515fca4 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.c
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c
@@ -27,6 +27,7 @@ static int sig_copyctx = SIGTRAP;
 static char const *const feats_names[FMAX_END] = {
 	" SSBS ",
 	" SVE ",
+	" SME ",
 };
 
 #define MAX_FEATS_SZ	128
@@ -266,6 +267,8 @@ int test_init(struct tdescr *td)
 			td->feats_supported |= FEAT_SSBS;
 		if (getauxval(AT_HWCAP) & HWCAP_SVE)
 			td->feats_supported |= FEAT_SVE;
+		if (getauxval(AT_HWCAP2) & HWCAP2_SME)
+			td->feats_supported |= FEAT_SME;
 		if (feats_ok(td)) {
 			fprintf(stderr,
 				"Required Features: [%s] supported\n",
diff --git a/tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c b/tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
new file mode 100644
index 000000000000..7ed762b7202f
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Attempt to change the streaming SVE vector length in a signal
+ * handler, this is not supported and is expected to segfault.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+struct fake_sigframe sf;
+static unsigned int vls[SVE_VQ_MAX];
+unsigned int nvls = 0;
+
+static bool sme_get_vls(struct tdescr *td)
+{
+	int vq, vl;
+
+	/*
+	 * Enumerate up to SVE_VQ_MAX vector lengths
+	 */
+	for (vq = SVE_VQ_MAX; vq > 0; --vq) {
+		vl = prctl(PR_SVE_SET_VL, vq * 16);
+		if (vl == -1)
+			return false;
+
+		vl &= PR_SME_VL_LEN_MASK;
+
+		/* Skip missing VLs */
+		vq = sve_vq_from_vl(vl);
+
+		vls[nvls++] = vl;
+	}
+
+	/* We need at least two VLs */
+	if (nvls < 2) {
+		fprintf(stderr, "Only %d VL supported\n", nvls);
+		return false;
+	}
+
+	return true;
+}
+
+static int fake_sigreturn_ssve_change_vl(struct tdescr *td,
+					 siginfo_t *si, ucontext_t *uc)
+{
+	size_t resv_sz, offset;
+	struct _aarch64_ctx *head = GET_SF_RESV_HEAD(sf);
+	struct sve_context *sve;
+
+	/* Get a signal context with a SME ZA frame in it */
+	if (!get_current_context(td, &sf.uc))
+		return 1;
+
+	resv_sz = GET_SF_RESV_SIZE(sf);
+	head = get_header(head, SVE_MAGIC, resv_sz, &offset);
+	if (!head) {
+		fprintf(stderr, "No SVE context\n");
+		return 1;
+	}
+
+	if (head->size != sizeof(struct sve_context)) {
+		fprintf(stderr, "Register data present, aborting\n");
+		return 1;
+	}
+
+	sve = (struct sve_context *)head;
+
+	/* No changes are supported; init left us at minimum VL so go to max */
+	fprintf(stderr, "Attempting to change VL from %d to %d\n",
+		sve->vl, vls[0]);
+	sve->vl = vls[0];
+
+	fake_sigreturn(&sf, sizeof(sf), 0);
+
+	return 1;
+}
+
+struct tdescr tde = {
+	.name = "FAKE_SIGRETURN_SSVE_CHANGE",
+	.descr = "Attempt to change Streaming SVE VL",
+	.feats_required = FEAT_SME,
+	.sig_ok = SIGSEGV,
+	.timeout = 3,
+	.init = sme_get_vls,
+	.run = fake_sigreturn_ssve_change_vl,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/sme_vl.c b/tools/testing/selftests/arm64/signal/testcases/sme_vl.c
new file mode 100644
index 000000000000..c40e339a1bc3
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/sme_vl.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Check that the SME vector length reported in signal contexts is the
+ * expected one.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+struct fake_sigframe sf;
+unsigned int vl;
+
+static bool get_sme_vl(struct tdescr *td)
+{
+	int ret = prctl(PR_SME_GET_VL);
+	if (ret == -1)
+		return false;
+
+	vl = ret;
+
+	return true;
+}
+
+static int sme_vl(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+	size_t resv_sz, offset;
+	struct _aarch64_ctx *head = GET_SF_RESV_HEAD(sf);
+	struct sve_context *sve;
+
+	/* Get a signal context which should have a SVE frame in it */
+	if (!get_current_context(td, &sf.uc))
+		return 1;
+
+	resv_sz = GET_SF_RESV_SIZE(sf);
+	head = get_header(head, SVE_MAGIC, resv_sz, &offset);
+	if (!head) {
+		fprintf(stderr, "No SVE context\n");
+		return 1;
+	}
+	sve = (struct sve_context *)head;
+
+	if (sve->vl != vl) {
+		fprintf(stderr, "SSVE sigframe VL %u, expected %u\n",
+			sve->vl, vl);
+		return 1;
+	} else {
+		fprintf(stderr, "got SSVE expected VL %u\n", vl);
+	}
+
+	/* Also check ZA VL */
+
+	td->pass = 1;
+
+	return 0;
+}
+
+struct tdescr tde = {
+	.name = "SME VL",
+	.descr = "Check that we get the right SME VL reported",
+	.feats_required = FEAT_SME,
+	.timeout = 3,
+	.init = get_sme_vl,
+	.run = sme_vl,
+};
diff --git a/tools/testing/selftests/arm64/signal/testcases/ssve_regs.c b/tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
new file mode 100644
index 000000000000..44a08d43cd50
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 ARM Limited
+ *
+ * Verify that the streaming SVE register context in signal frames is
+ * set up as expected.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/prctl.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+struct fake_sigframe sf;
+static unsigned int vls[SVE_VQ_MAX];
+unsigned int nvls = 0;
+
+static bool sme_get_vls(struct tdescr *td)
+{
+	int vq, vl;
+
+	/*
+	 * Enumerate up to SVE_VQ_MAX vector lengths
+	 */
+	for (vq = SVE_VQ_MAX; vq > 0; --vq) {
+		vl = prctl(PR_SVE_SET_VL, vq * 16);
+		if (vl == -1)
+			return false;
+
+		vl &= PR_SME_VL_LEN_MASK;
+
+		/* Skip missing VLs */
+		vq = sve_vq_from_vl(vl);
+
+		vls[nvls++] = vl;
+	}
+
+	/* We need at least one VL */
+	if (nvls < 1) {
+		fprintf(stderr, "Only %d VL supported\n", nvls);
+		return false;
+	}
+
+	return true;
+}
+
+static void setup_ssve_regs(void)
+{
+	/* SMSTART SM */
+	asm volatile(".inst 0x7f4303d5");
+
+	/* RDVL x16, #1 so we should have SVE regs; real data is TODO */
+	asm volatile(".inst 0x04bf5030" : : : "x16" );
+}
+
+static int do_one_sme_vl(struct tdescr *td, siginfo_t *si, ucontext_t *uc,
+			 unsigned int vl)
+{
+	size_t resv_sz, offset;
+	struct _aarch64_ctx *head = GET_SF_RESV_HEAD(sf);
+	struct sve_context *ssve;
+
+	fprintf(stderr, "Testing VL %d\n", vl);
+
+	if (prctl(PR_SME_SET_VL, vl) == -1) {
+		fprintf(stderr, "Failed to set VL\n");
+		return 1;
+	}
+
+	/* 
+	 * Get a signal context which should have a SVE frame and registers
+	 * in it.
+	 */
+	setup_ssve_regs();
+	if (!get_current_context(td, &sf.uc))
+		return 1;
+
+	resv_sz = GET_SF_RESV_SIZE(sf);
+	head = get_header(head, SVE_MAGIC, resv_sz, &offset);
+	if (!head) {
+		fprintf(stderr, "No SVE context\n");
+		return 1;
+	}
+
+	ssve = (struct sve_context *)head;
+	if (ssve->vl != vl) {
+		fprintf(stderr, "Got VL %d, expected %d\n", ssve->vl, vl);
+		return 1;
+	}
+
+	/* The actual size validation is done in get_current_context() */
+	fprintf(stderr, "Got expected size %u and VL %d\n",
+		head->size, ssve->vl);
+
+	return 0;
+}
+
+static int sme_regs(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+	int i;
+
+	for (i = 0; i < nvls; i++) {
+		/*
+		 * TODO: the signal test helpers can't currently cope
+		 * with signal frames bigger than struct sigcontext,
+		 * skip VLs that will trigger that.
+		 */
+		if (vls[i] > 64)
+			continue;
+
+		if (do_one_sme_vl(td, si, uc, vls[i]))
+			return 1;
+	}
+
+	td->pass = 1;
+
+	return 0;
+}
+
+struct tdescr tde = {
+	.name = "Streaming SVE registers",
+	.descr = "Check that we get the right Streaming SVE registers reported",
+	.feats_required = FEAT_SME,
+	.timeout = 3,
+	.init = sme_get_vls,
+	.run = sme_regs,
+};
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 37/38] selftests: arm64: Add streaming SVE to SVE ptrace tests
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In order to allow ptrace of streaming mode SVE registers we have added a
new regset for streaming mode which in isolation offers the same ABI as
regular SVE with a different vector type. Add this to the array of regsets
we handle, together with additional tests for the interoperation of the
two regsets.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/sve-ptrace.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index e200f7ed9572..654ac0f8f73f 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -28,6 +28,10 @@
 #define NT_ARM_SVE 0x405
 #endif
 
+#ifndef NT_ARM_SSVE
+#define NT_ARM_SSVE 0x40b
+#endif
+
 struct vec_type {
 	const char *name;
 	unsigned long hwcap_type;
@@ -44,6 +48,13 @@ static const struct vec_type vec_types[] = {
 		.regset = NT_ARM_SVE,
 		.prctl_set = PR_SVE_SET_VL,
 	},
+	{
+		.name = "Streaming SVE",
+		.hwcap_type = AT_HWCAP2,
+		.hwcap = HWCAP2_SME,
+		.regset = NT_ARM_SSVE,
+		.prctl_set = PR_SME_SET_VL,
+	},
 };
 
 #define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 37/38] selftests: arm64: Add streaming SVE to SVE ptrace tests
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

In order to allow ptrace of streaming mode SVE registers we have added a
new regset for streaming mode which in isolation offers the same ABI as
regular SVE with a different vector type. Add this to the array of regsets
we handle, together with additional tests for the interoperation of the
two regsets.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/sve-ptrace.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index e200f7ed9572..654ac0f8f73f 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -28,6 +28,10 @@
 #define NT_ARM_SVE 0x405
 #endif
 
+#ifndef NT_ARM_SSVE
+#define NT_ARM_SSVE 0x40b
+#endif
+
 struct vec_type {
 	const char *name;
 	unsigned long hwcap_type;
@@ -44,6 +48,13 @@ static const struct vec_type vec_types[] = {
 		.regset = NT_ARM_SVE,
 		.prctl_set = PR_SVE_SET_VL,
 	},
+	{
+		.name = "Streaming SVE",
+		.hwcap_type = AT_HWCAP2,
+		.hwcap = HWCAP2_SME,
+		.regset = NT_ARM_SSVE,
+		.prctl_set = PR_SME_SET_VL,
+	},
 };
 
 #define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 38/38] selftests: arm64: Add coverage for the ZA ptrace interface
  2021-09-30 18:11 ` Mark Brown
@ 2021-09-30 18:11   ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Add some basic coverage for the ZA ptrace interface, including walking
through all the vector lengths supported in the system. As with SVE we do
not currently validate the data all the way through to the running process.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore  |   1 +
 tools/testing/selftests/arm64/fp/Makefile    |   3 +-
 tools/testing/selftests/arm64/fp/za-ptrace.c | 353 +++++++++++++++++++
 3 files changed, 356 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index 1178fecc7aa1..59afc01f2019 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -7,4 +7,5 @@ sve-test
 ssve-test
 vec-syscfg
 vlset
+za-ptrace
 za-test
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index 4f32cb1041a0..f57ce07b2c91 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 CFLAGS += -I../../../../../usr/include/
-TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg
+TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg za-ptrace
 TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
 	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
@@ -25,5 +25,6 @@ vec-syscfg: vec-syscfg.o rdvl.o
 vlset: vlset.o
 za-test: za-test.o
 	$(CC) -nostdlib $^ -o $@
+za-ptrace: za-ptrace.o
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/fp/za-ptrace.c b/tools/testing/selftests/arm64/fp/za-ptrace.c
new file mode 100644
index 000000000000..6c172a5629dc
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/za-ptrace.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 ARM Limited.
+ */
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/uio.h>
+#include <sys/wait.h>
+#include <asm/sigcontext.h>
+#include <asm/ptrace.h>
+
+#include "../../kselftest.h"
+
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+
+/* <linux/elf.h> and <sys/auxv.h> don't like each other, so: */
+#ifndef NT_ARM_ZA
+#define NT_ARM_ZA 0x40c
+#endif
+
+#define EXPECTED_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
+
+static void fill_buf(char *buf, size_t size)
+{
+	int i;
+
+	for (i = 0; i < size; i++)
+		buf[i] = random();
+}
+
+static int do_child(void)
+{
+	if (ptrace(PTRACE_TRACEME, -1, NULL, NULL))
+		ksft_exit_fail_msg("PTRACE_TRACEME", strerror(errno));
+
+	if (raise(SIGSTOP))
+		ksft_exit_fail_msg("raise(SIGSTOP)", strerror(errno));
+
+	return EXIT_SUCCESS;
+}
+
+static struct user_za_header *get_za(pid_t pid, void **buf, size_t *size)
+{
+	struct user_za_header *za;
+	void *p;
+	size_t sz = sizeof(*za);
+	struct iovec iov;
+
+	while (1) {
+		if (*size < sz) {
+			p = realloc(*buf, sz);
+			if (!p) {
+				errno = ENOMEM;
+				goto error;
+			}
+
+			*buf = p;
+			*size = sz;
+		}
+
+		iov.iov_base = *buf;
+		iov.iov_len = sz;
+		if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_ZA, &iov))
+			goto error;
+
+		za = *buf;
+		if (za->size <= sz)
+			break;
+
+		sz = za->size;
+	}
+
+	return za;
+
+error:
+	return NULL;
+}
+
+static int set_za(pid_t pid, const struct user_za_header *za)
+{
+	struct iovec iov;
+
+	iov.iov_base = (void *)za;
+	iov.iov_len = za->size;
+	return ptrace(PTRACE_SETREGSET, pid, NT_ARM_ZA, &iov);
+}
+
+/* Validate attempting to set the specfied VL via ptrace */
+static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
+{
+	struct user_za_header za;
+	struct user_za_header *new_za = NULL;
+	size_t new_za_size = 0;
+	int ret, prctl_vl;
+
+	*supported = false;
+
+	/* Check if the VL is supported in this process */
+	prctl_vl = prctl(PR_SME_SET_VL, vl);
+	if (prctl_vl == -1)
+		ksft_exit_fail_msg("prctl(PR_SME_SET_VL) failed: %s (%d)\n",
+				   strerror(errno), errno);
+
+	/* If the VL is not supported then a supported VL will be returned */
+	*supported = (prctl_vl == vl);
+
+	/* Set the VL by doing a set with no register payload */
+	memset(&za, 0, sizeof(za));
+	za.size = sizeof(za);
+	za.vl = vl;
+	ret = set_za(child, &za);
+	if (ret != 0) {
+		ksft_test_result_fail("Failed to set VL %u\n", vl);
+		return;
+	}
+
+	/*
+	 * Read back the new register state and verify that we have the
+	 * same VL that we got from prctl() on ourselves.
+	 */
+	if (!get_za(child, (void **)&new_za, &new_za_size)) {
+		ksft_test_result_fail("Failed to read VL %u\n", vl);
+		return;
+	}
+
+	ksft_test_result(new_za->vl = prctl_vl, "Set VL %u\n", vl);
+
+	free(new_za);
+}
+
+/* Validate attempting to set no ZA data and read it back */
+static void ptrace_set_no_data(pid_t child, unsigned int vl)
+{
+	void *read_buf = NULL;
+	struct user_za_header write_za;
+	struct user_za_header *read_za;
+	size_t read_za_size = 0;
+	int ret;
+
+	/* Set up some data and write it out */
+	memset(&write_za, 0, sizeof(write_za));
+	write_za.size = ZA_PT_ZA_OFFSET;
+	write_za.vl = vl;
+
+	ret = set_za(child, &write_za);
+	if (ret != 0) {
+		ksft_test_result_fail("Failed to set VL %u no data\n", vl);
+		return;
+	}
+
+	/* Read the data back */
+	if (!get_za(child, (void **)&read_buf, &read_za_size)) {
+		ksft_test_result_fail("Failed to read VL %u no data\n", vl);
+		return;
+	}
+	read_za = read_buf;
+
+	/* We might read more data if there's extensions we don't know */
+	if (read_za->size < write_za.size) {
+		ksft_test_result_fail("VL %u wrote %d bytes, only read %d\n",
+				      vl, write_za.size, read_za->size);
+		goto out_read;
+	}
+
+	ksft_test_result(read_za->size == write_za.size,
+			 "Disabled ZA for VL %u\n", vl);
+
+out_read:
+	free(read_buf);
+}
+
+/* Validate attempting to set data and read it back */
+static void ptrace_set_get_data(pid_t child, unsigned int vl)
+{
+	void *write_buf;
+	void *read_buf = NULL;
+	struct user_za_header *write_za;
+	struct user_za_header *read_za;
+	size_t read_za_size = 0;
+	unsigned int vq = sve_vq_from_vl(vl);
+	int ret;
+	size_t data_size;
+
+	data_size = ZA_PT_SIZE(vq);
+	write_buf = malloc(data_size);
+	if (!write_buf) {
+		ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
+				      data_size, vl);
+		return;
+	}
+	write_za = write_buf;
+
+	/* Set up some data and write it out */
+	memset(write_za, 0, data_size);
+	write_za->size = data_size;
+	write_za->vl = vl;
+
+	fill_buf(write_buf + ZA_PT_ZA_OFFSET, ZA_PT_ZA_SIZE(vq));
+
+	ret = set_za(child, write_za);
+	if (ret != 0) {
+		ksft_test_result_fail("Failed to set VL %u data\n", vl);
+		goto out;
+	}
+
+	/* Read the data back */
+	if (!get_za(child, (void **)&read_buf, &read_za_size)) {
+		ksft_test_result_fail("Failed to read VL %u data\n", vl);
+		goto out;
+	}
+	read_za = read_buf;
+
+	/* We might read more data if there's extensions we don't know */
+	if (read_za->size < write_za->size) {
+		ksft_test_result_fail("VL %u wrote %d bytes, only read %d\n",
+				      vl, write_za->size, read_za->size);
+		goto out_read;
+	}
+
+	ksft_test_result(memcmp(write_buf + ZA_PT_ZA_OFFSET,
+				read_buf + ZA_PT_ZA_OFFSET,
+				ZA_PT_ZA_SIZE(vq)) == 0,
+			 "Data match for VL %u\n", vl);
+
+out_read:
+	free(read_buf);
+out:
+	free(write_buf);
+}
+
+static int do_parent(pid_t child)
+{
+	int ret = EXIT_FAILURE;
+	pid_t pid;
+	int status;
+	siginfo_t si;
+	unsigned int vq, vl;
+	bool vl_supported;
+
+	/* Attach to the child */
+	while (1) {
+		int sig;
+
+		pid = wait(&status);
+		if (pid == -1) {
+			perror("wait");
+			goto error;
+		}
+
+		/*
+		 * This should never happen but it's hard to flag in
+		 * the framework.
+		 */
+		if (pid != child)
+			continue;
+
+		if (WIFEXITED(status) || WIFSIGNALED(status))
+			ksft_exit_fail_msg("Child died unexpectedly\n");
+
+		if (!WIFSTOPPED(status))
+			goto error;
+
+		sig = WSTOPSIG(status);
+
+		if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &si)) {
+			if (errno == ESRCH)
+				goto disappeared;
+
+			if (errno == EINVAL) {
+				sig = 0; /* bust group-stop */
+				goto cont;
+			}
+
+			ksft_test_result_fail("PTRACE_GETSIGINFO: %s\n",
+					      strerror(errno));
+			goto error;
+		}
+
+		if (sig == SIGSTOP && si.si_code == SI_TKILL &&
+		    si.si_pid == pid)
+			break;
+
+	cont:
+		if (ptrace(PTRACE_CONT, pid, NULL, sig)) {
+			if (errno == ESRCH)
+				goto disappeared;
+
+			ksft_test_result_fail("PTRACE_CONT: %s\n",
+					      strerror(errno));
+			goto error;
+		}
+	}
+
+	/* Step through every possible VQ */
+	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
+		vl = sve_vl_from_vq(vq);
+
+		/* First, try to set this vector length */
+		ptrace_set_get_vl(child, vl, &vl_supported);
+
+		/* If the VL is supported validate data set/get */
+		if (vl_supported) {
+			ptrace_set_no_data(child, vl);
+			ptrace_set_get_data(child, vl);
+		} else {
+			ksft_test_result_skip("Disabled ZA for VL %u\n", vl);
+			ksft_test_result_skip("Get and set data for VL %u\n",
+					      vl);
+		}
+	}
+
+	ret = EXIT_SUCCESS;
+
+error:
+	kill(child, SIGKILL);
+
+disappeared:
+	return ret;
+}
+
+int main(void)
+{
+	int ret = EXIT_SUCCESS;
+	pid_t child;
+
+	srandom(getpid());
+
+	ksft_print_header();
+	ksft_set_plan(EXPECTED_TESTS);
+
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_SME))
+		ksft_exit_skip("SME not available\n");
+
+	child = fork();
+	if (!child)
+		return do_child();
+
+	if (do_parent(child))
+		ret = EXIT_FAILURE;
+
+	ksft_print_cnts();
+
+	return ret;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 143+ messages in thread

* [PATCH v1 38/38] selftests: arm64: Add coverage for the ZA ptrace interface
@ 2021-09-30 18:11   ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-09-30 18:11 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan
  Cc: Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest, Mark Brown

Add some basic coverage for the ZA ptrace interface, including walking
through all the vector lengths supported in the system. As with SVE we do
not currently validate the data all the way through to the running process.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/fp/.gitignore  |   1 +
 tools/testing/selftests/arm64/fp/Makefile    |   3 +-
 tools/testing/selftests/arm64/fp/za-ptrace.c | 353 +++++++++++++++++++
 3 files changed, 356 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c

diff --git a/tools/testing/selftests/arm64/fp/.gitignore b/tools/testing/selftests/arm64/fp/.gitignore
index 1178fecc7aa1..59afc01f2019 100644
--- a/tools/testing/selftests/arm64/fp/.gitignore
+++ b/tools/testing/selftests/arm64/fp/.gitignore
@@ -7,4 +7,5 @@ sve-test
 ssve-test
 vec-syscfg
 vlset
+za-ptrace
 za-test
diff --git a/tools/testing/selftests/arm64/fp/Makefile b/tools/testing/selftests/arm64/fp/Makefile
index 4f32cb1041a0..f57ce07b2c91 100644
--- a/tools/testing/selftests/arm64/fp/Makefile
+++ b/tools/testing/selftests/arm64/fp/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 CFLAGS += -I../../../../../usr/include/
-TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg
+TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg za-ptrace
 TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
 	rdvl-sme rdvl-sve \
 	sve-test sve-stress \
@@ -25,5 +25,6 @@ vec-syscfg: vec-syscfg.o rdvl.o
 vlset: vlset.o
 za-test: za-test.o
 	$(CC) -nostdlib $^ -o $@
+za-ptrace: za-ptrace.o
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/fp/za-ptrace.c b/tools/testing/selftests/arm64/fp/za-ptrace.c
new file mode 100644
index 000000000000..6c172a5629dc
--- /dev/null
+++ b/tools/testing/selftests/arm64/fp/za-ptrace.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 ARM Limited.
+ */
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/uio.h>
+#include <sys/wait.h>
+#include <asm/sigcontext.h>
+#include <asm/ptrace.h>
+
+#include "../../kselftest.h"
+
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+
+/* <linux/elf.h> and <sys/auxv.h> don't like each other, so: */
+#ifndef NT_ARM_ZA
+#define NT_ARM_ZA 0x40c
+#endif
+
+#define EXPECTED_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
+
+static void fill_buf(char *buf, size_t size)
+{
+	int i;
+
+	for (i = 0; i < size; i++)
+		buf[i] = random();
+}
+
+static int do_child(void)
+{
+	if (ptrace(PTRACE_TRACEME, -1, NULL, NULL))
+		ksft_exit_fail_msg("PTRACE_TRACEME", strerror(errno));
+
+	if (raise(SIGSTOP))
+		ksft_exit_fail_msg("raise(SIGSTOP)", strerror(errno));
+
+	return EXIT_SUCCESS;
+}
+
+static struct user_za_header *get_za(pid_t pid, void **buf, size_t *size)
+{
+	struct user_za_header *za;
+	void *p;
+	size_t sz = sizeof(*za);
+	struct iovec iov;
+
+	while (1) {
+		if (*size < sz) {
+			p = realloc(*buf, sz);
+			if (!p) {
+				errno = ENOMEM;
+				goto error;
+			}
+
+			*buf = p;
+			*size = sz;
+		}
+
+		iov.iov_base = *buf;
+		iov.iov_len = sz;
+		if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_ZA, &iov))
+			goto error;
+
+		za = *buf;
+		if (za->size <= sz)
+			break;
+
+		sz = za->size;
+	}
+
+	return za;
+
+error:
+	return NULL;
+}
+
+static int set_za(pid_t pid, const struct user_za_header *za)
+{
+	struct iovec iov;
+
+	iov.iov_base = (void *)za;
+	iov.iov_len = za->size;
+	return ptrace(PTRACE_SETREGSET, pid, NT_ARM_ZA, &iov);
+}
+
+/* Validate attempting to set the specfied VL via ptrace */
+static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
+{
+	struct user_za_header za;
+	struct user_za_header *new_za = NULL;
+	size_t new_za_size = 0;
+	int ret, prctl_vl;
+
+	*supported = false;
+
+	/* Check if the VL is supported in this process */
+	prctl_vl = prctl(PR_SME_SET_VL, vl);
+	if (prctl_vl == -1)
+		ksft_exit_fail_msg("prctl(PR_SME_SET_VL) failed: %s (%d)\n",
+				   strerror(errno), errno);
+
+	/* If the VL is not supported then a supported VL will be returned */
+	*supported = (prctl_vl == vl);
+
+	/* Set the VL by doing a set with no register payload */
+	memset(&za, 0, sizeof(za));
+	za.size = sizeof(za);
+	za.vl = vl;
+	ret = set_za(child, &za);
+	if (ret != 0) {
+		ksft_test_result_fail("Failed to set VL %u\n", vl);
+		return;
+	}
+
+	/*
+	 * Read back the new register state and verify that we have the
+	 * same VL that we got from prctl() on ourselves.
+	 */
+	if (!get_za(child, (void **)&new_za, &new_za_size)) {
+		ksft_test_result_fail("Failed to read VL %u\n", vl);
+		return;
+	}
+
+	ksft_test_result(new_za->vl = prctl_vl, "Set VL %u\n", vl);
+
+	free(new_za);
+}
+
+/* Validate attempting to set no ZA data and read it back */
+static void ptrace_set_no_data(pid_t child, unsigned int vl)
+{
+	void *read_buf = NULL;
+	struct user_za_header write_za;
+	struct user_za_header *read_za;
+	size_t read_za_size = 0;
+	int ret;
+
+	/* Set up some data and write it out */
+	memset(&write_za, 0, sizeof(write_za));
+	write_za.size = ZA_PT_ZA_OFFSET;
+	write_za.vl = vl;
+
+	ret = set_za(child, &write_za);
+	if (ret != 0) {
+		ksft_test_result_fail("Failed to set VL %u no data\n", vl);
+		return;
+	}
+
+	/* Read the data back */
+	if (!get_za(child, (void **)&read_buf, &read_za_size)) {
+		ksft_test_result_fail("Failed to read VL %u no data\n", vl);
+		return;
+	}
+	read_za = read_buf;
+
+	/* We might read more data if there's extensions we don't know */
+	if (read_za->size < write_za.size) {
+		ksft_test_result_fail("VL %u wrote %d bytes, only read %d\n",
+				      vl, write_za.size, read_za->size);
+		goto out_read;
+	}
+
+	ksft_test_result(read_za->size == write_za.size,
+			 "Disabled ZA for VL %u\n", vl);
+
+out_read:
+	free(read_buf);
+}
+
+/* Validate attempting to set data and read it back */
+static void ptrace_set_get_data(pid_t child, unsigned int vl)
+{
+	void *write_buf;
+	void *read_buf = NULL;
+	struct user_za_header *write_za;
+	struct user_za_header *read_za;
+	size_t read_za_size = 0;
+	unsigned int vq = sve_vq_from_vl(vl);
+	int ret;
+	size_t data_size;
+
+	data_size = ZA_PT_SIZE(vq);
+	write_buf = malloc(data_size);
+	if (!write_buf) {
+		ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
+				      data_size, vl);
+		return;
+	}
+	write_za = write_buf;
+
+	/* Set up some data and write it out */
+	memset(write_za, 0, data_size);
+	write_za->size = data_size;
+	write_za->vl = vl;
+
+	fill_buf(write_buf + ZA_PT_ZA_OFFSET, ZA_PT_ZA_SIZE(vq));
+
+	ret = set_za(child, write_za);
+	if (ret != 0) {
+		ksft_test_result_fail("Failed to set VL %u data\n", vl);
+		goto out;
+	}
+
+	/* Read the data back */
+	if (!get_za(child, (void **)&read_buf, &read_za_size)) {
+		ksft_test_result_fail("Failed to read VL %u data\n", vl);
+		goto out;
+	}
+	read_za = read_buf;
+
+	/* We might read more data if there's extensions we don't know */
+	if (read_za->size < write_za->size) {
+		ksft_test_result_fail("VL %u wrote %d bytes, only read %d\n",
+				      vl, write_za->size, read_za->size);
+		goto out_read;
+	}
+
+	ksft_test_result(memcmp(write_buf + ZA_PT_ZA_OFFSET,
+				read_buf + ZA_PT_ZA_OFFSET,
+				ZA_PT_ZA_SIZE(vq)) == 0,
+			 "Data match for VL %u\n", vl);
+
+out_read:
+	free(read_buf);
+out:
+	free(write_buf);
+}
+
+static int do_parent(pid_t child)
+{
+	int ret = EXIT_FAILURE;
+	pid_t pid;
+	int status;
+	siginfo_t si;
+	unsigned int vq, vl;
+	bool vl_supported;
+
+	/* Attach to the child */
+	while (1) {
+		int sig;
+
+		pid = wait(&status);
+		if (pid == -1) {
+			perror("wait");
+			goto error;
+		}
+
+		/*
+		 * This should never happen but it's hard to flag in
+		 * the framework.
+		 */
+		if (pid != child)
+			continue;
+
+		if (WIFEXITED(status) || WIFSIGNALED(status))
+			ksft_exit_fail_msg("Child died unexpectedly\n");
+
+		if (!WIFSTOPPED(status))
+			goto error;
+
+		sig = WSTOPSIG(status);
+
+		if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &si)) {
+			if (errno == ESRCH)
+				goto disappeared;
+
+			if (errno == EINVAL) {
+				sig = 0; /* bust group-stop */
+				goto cont;
+			}
+
+			ksft_test_result_fail("PTRACE_GETSIGINFO: %s\n",
+					      strerror(errno));
+			goto error;
+		}
+
+		if (sig == SIGSTOP && si.si_code == SI_TKILL &&
+		    si.si_pid == pid)
+			break;
+
+	cont:
+		if (ptrace(PTRACE_CONT, pid, NULL, sig)) {
+			if (errno == ESRCH)
+				goto disappeared;
+
+			ksft_test_result_fail("PTRACE_CONT: %s\n",
+					      strerror(errno));
+			goto error;
+		}
+	}
+
+	/* Step through every possible VQ */
+	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
+		vl = sve_vl_from_vq(vq);
+
+		/* First, try to set this vector length */
+		ptrace_set_get_vl(child, vl, &vl_supported);
+
+		/* If the VL is supported validate data set/get */
+		if (vl_supported) {
+			ptrace_set_no_data(child, vl);
+			ptrace_set_get_data(child, vl);
+		} else {
+			ksft_test_result_skip("Disabled ZA for VL %u\n", vl);
+			ksft_test_result_skip("Get and set data for VL %u\n",
+					      vl);
+		}
+	}
+
+	ret = EXIT_SUCCESS;
+
+error:
+	kill(child, SIGKILL);
+
+disappeared:
+	return ret;
+}
+
+int main(void)
+{
+	int ret = EXIT_SUCCESS;
+	pid_t child;
+
+	srandom(getpid());
+
+	ksft_print_header();
+	ksft_set_plan(EXPECTED_TESTS);
+
+	if (!(getauxval(AT_HWCAP2) & HWCAP2_SME))
+		ksft_exit_skip("SME not available\n");
+
+	child = fork();
+	if (!child)
+		return do_child();
+
+	if (do_parent(child))
+		ret = EXIT_FAILURE;
+
+	ksft_print_cnts();
+
+	return ret;
+}
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 06/38] arm64/sve: Put system wide vector length information into structs
  2021-09-30 18:11   ` Mark Brown
  (?)
@ 2021-10-01  3:13     ` kernel test robot
  -1 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01  3:13 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas, Will Deacon, Shuah Khan
  Cc: llvm, kbuild-all, Alan Hayward, Luis Machado, Salil Akerkar,
	Basant Kumar Dwivedi, Szabolcs Nagy, linux-arm-kernel,
	linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 6095 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arm64-randconfig-r024-20210930 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 962e503cc8bc411f7523cc393acae8aae425b1c4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/7e299f3a56180fd063d19d4fc4de16262d789820
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout 7e299f3a56180fd063d19d4fc4de16262d789820
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash arch/arm64/kernel/ kernel/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/arm64/kernel/fpsimd.c:124:39: error: array has incomplete element type 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                                         ^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
>> arch/arm64/kernel/fpsimd.c:376:19: error: incomplete definition of type 'struct vl_info'
           int max_vl = info->max_vl;
                        ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:379:12: error: incomplete definition of type 'struct vl_info'
                   vl = info->min_vl;
                        ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:382:16: error: incomplete definition of type 'struct vl_info'
                   max_vl = info->min_vl;
                            ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:387:26: error: incomplete definition of type 'struct vl_info'
           bit = find_next_bit(info->vq_map, SVE_VQ_MAX,
                               ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   5 errors generated.
--
>> arch/arm64/kernel/signal.c:597:13: error: implicit declaration of function 'sve_max_vl' [-Werror,-Wimplicit-function-declaration]
                           int vl = sve_max_vl();
                                    ^
   arch/arm64/kernel/signal.c:597:13: note: did you mean 'sve_get_vl'?
   arch/arm64/include/asm/fpsimd.h:72:21: note: 'sve_get_vl' declared here
   extern unsigned int sve_get_vl(void);
                       ^
   arch/arm64/kernel/signal.c:920:6: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes]
   void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
        ^
   arch/arm64/kernel/signal.c:920:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
   ^
   static 
   1 warning and 1 error generated.
--
>> arch/arm64/kernel/cpufeature.c:944:3: error: implicit declaration of function 'vec_init_vq_map' [-Werror,-Wimplicit-function-declaration]
                   vec_init_vq_map(ARM64_VEC_SVE);
                   ^
   arch/arm64/kernel/cpufeature.c:944:3: note: did you mean 'sve_init_vq_map'?
   arch/arm64/include/asm/fpsimd.h:237:20: note: 'sve_init_vq_map' declared here
   static inline void sve_init_vq_map(void) { }
                      ^
>> arch/arm64/kernel/cpufeature.c:1178:4: error: implicit declaration of function 'vec_update_vq_map' [-Werror,-Wimplicit-function-declaration]
                           vec_update_vq_map(ARM64_VEC_SVE);
                           ^
   arch/arm64/kernel/cpufeature.c:1178:4: note: did you mean 'sve_update_vq_map'?
   arch/arm64/include/asm/fpsimd.h:238:20: note: 'sve_update_vq_map' declared here
   static inline void sve_update_vq_map(void) { }
                      ^
>> arch/arm64/kernel/cpufeature.c:2742:24: error: implicit declaration of function 'vec_verify_vq_map' [-Werror,-Wimplicit-function-declaration]
           if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SVE)) {
                                 ^
   arch/arm64/kernel/cpufeature.c:2742:24: note: did you mean 'sve_verify_vq_map'?
   arch/arm64/include/asm/fpsimd.h:239:19: note: 'sve_verify_vq_map' declared here
   static inline int sve_verify_vq_map(void) { return 0; }
                     ^
   3 errors generated.


vim +124 arch/arm64/kernel/fpsimd.c

   123	
 > 124	__ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
   125	#ifdef CONFIG_ARM64_SVE
   126		[ARM64_VEC_SVE] = {
   127			.type			= ARM64_VEC_SVE,
   128			.name			= "SVE",
   129			.min_vl			= SVE_VL_MIN,
   130			.max_vl			= SVE_VL_MIN,
   131			.max_virtualisable_vl	= SVE_VL_MIN,
   132		},
   133	#endif
   134	};
   135	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42378 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 06/38] arm64/sve: Put system wide vector length information into structs
@ 2021-10-01  3:13     ` kernel test robot
  0 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01  3:13 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas, Will Deacon, Shuah Khan
  Cc: llvm, kbuild-all, Alan Hayward, Luis Machado, Salil Akerkar,
	Basant Kumar Dwivedi, Szabolcs Nagy, linux-arm-kernel,
	linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 6095 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arm64-randconfig-r024-20210930 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 962e503cc8bc411f7523cc393acae8aae425b1c4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/7e299f3a56180fd063d19d4fc4de16262d789820
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout 7e299f3a56180fd063d19d4fc4de16262d789820
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash arch/arm64/kernel/ kernel/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/arm64/kernel/fpsimd.c:124:39: error: array has incomplete element type 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                                         ^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
>> arch/arm64/kernel/fpsimd.c:376:19: error: incomplete definition of type 'struct vl_info'
           int max_vl = info->max_vl;
                        ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:379:12: error: incomplete definition of type 'struct vl_info'
                   vl = info->min_vl;
                        ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:382:16: error: incomplete definition of type 'struct vl_info'
                   max_vl = info->min_vl;
                            ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:387:26: error: incomplete definition of type 'struct vl_info'
           bit = find_next_bit(info->vq_map, SVE_VQ_MAX,
                               ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   5 errors generated.
--
>> arch/arm64/kernel/signal.c:597:13: error: implicit declaration of function 'sve_max_vl' [-Werror,-Wimplicit-function-declaration]
                           int vl = sve_max_vl();
                                    ^
   arch/arm64/kernel/signal.c:597:13: note: did you mean 'sve_get_vl'?
   arch/arm64/include/asm/fpsimd.h:72:21: note: 'sve_get_vl' declared here
   extern unsigned int sve_get_vl(void);
                       ^
   arch/arm64/kernel/signal.c:920:6: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes]
   void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
        ^
   arch/arm64/kernel/signal.c:920:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
   ^
   static 
   1 warning and 1 error generated.
--
>> arch/arm64/kernel/cpufeature.c:944:3: error: implicit declaration of function 'vec_init_vq_map' [-Werror,-Wimplicit-function-declaration]
                   vec_init_vq_map(ARM64_VEC_SVE);
                   ^
   arch/arm64/kernel/cpufeature.c:944:3: note: did you mean 'sve_init_vq_map'?
   arch/arm64/include/asm/fpsimd.h:237:20: note: 'sve_init_vq_map' declared here
   static inline void sve_init_vq_map(void) { }
                      ^
>> arch/arm64/kernel/cpufeature.c:1178:4: error: implicit declaration of function 'vec_update_vq_map' [-Werror,-Wimplicit-function-declaration]
                           vec_update_vq_map(ARM64_VEC_SVE);
                           ^
   arch/arm64/kernel/cpufeature.c:1178:4: note: did you mean 'sve_update_vq_map'?
   arch/arm64/include/asm/fpsimd.h:238:20: note: 'sve_update_vq_map' declared here
   static inline void sve_update_vq_map(void) { }
                      ^
>> arch/arm64/kernel/cpufeature.c:2742:24: error: implicit declaration of function 'vec_verify_vq_map' [-Werror,-Wimplicit-function-declaration]
           if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SVE)) {
                                 ^
   arch/arm64/kernel/cpufeature.c:2742:24: note: did you mean 'sve_verify_vq_map'?
   arch/arm64/include/asm/fpsimd.h:239:19: note: 'sve_verify_vq_map' declared here
   static inline int sve_verify_vq_map(void) { return 0; }
                     ^
   3 errors generated.


vim +124 arch/arm64/kernel/fpsimd.c

   123	
 > 124	__ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
   125	#ifdef CONFIG_ARM64_SVE
   126		[ARM64_VEC_SVE] = {
   127			.type			= ARM64_VEC_SVE,
   128			.name			= "SVE",
   129			.min_vl			= SVE_VL_MIN,
   130			.max_vl			= SVE_VL_MIN,
   131			.max_virtualisable_vl	= SVE_VL_MIN,
   132		},
   133	#endif
   134	};
   135	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42378 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 06/38] arm64/sve: Put system wide vector length information into structs
@ 2021-10-01  3:13     ` kernel test robot
  0 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01  3:13 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6216 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arm64-randconfig-r024-20210930 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 962e503cc8bc411f7523cc393acae8aae425b1c4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/7e299f3a56180fd063d19d4fc4de16262d789820
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout 7e299f3a56180fd063d19d4fc4de16262d789820
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash arch/arm64/kernel/ kernel/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/arm64/kernel/fpsimd.c:124:39: error: array has incomplete element type 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                                         ^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
>> arch/arm64/kernel/fpsimd.c:376:19: error: incomplete definition of type 'struct vl_info'
           int max_vl = info->max_vl;
                        ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:379:12: error: incomplete definition of type 'struct vl_info'
                   vl = info->min_vl;
                        ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:382:16: error: incomplete definition of type 'struct vl_info'
                   max_vl = info->min_vl;
                            ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   arch/arm64/kernel/fpsimd.c:387:26: error: incomplete definition of type 'struct vl_info'
           bit = find_next_bit(info->vq_map, SVE_VQ_MAX,
                               ~~~~^
   arch/arm64/kernel/fpsimd.c:124:24: note: forward declaration of 'struct vl_info'
   __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
                          ^
   5 errors generated.
--
>> arch/arm64/kernel/signal.c:597:13: error: implicit declaration of function 'sve_max_vl' [-Werror,-Wimplicit-function-declaration]
                           int vl = sve_max_vl();
                                    ^
   arch/arm64/kernel/signal.c:597:13: note: did you mean 'sve_get_vl'?
   arch/arm64/include/asm/fpsimd.h:72:21: note: 'sve_get_vl' declared here
   extern unsigned int sve_get_vl(void);
                       ^
   arch/arm64/kernel/signal.c:920:6: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes]
   void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
        ^
   arch/arm64/kernel/signal.c:920:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
   ^
   static 
   1 warning and 1 error generated.
--
>> arch/arm64/kernel/cpufeature.c:944:3: error: implicit declaration of function 'vec_init_vq_map' [-Werror,-Wimplicit-function-declaration]
                   vec_init_vq_map(ARM64_VEC_SVE);
                   ^
   arch/arm64/kernel/cpufeature.c:944:3: note: did you mean 'sve_init_vq_map'?
   arch/arm64/include/asm/fpsimd.h:237:20: note: 'sve_init_vq_map' declared here
   static inline void sve_init_vq_map(void) { }
                      ^
>> arch/arm64/kernel/cpufeature.c:1178:4: error: implicit declaration of function 'vec_update_vq_map' [-Werror,-Wimplicit-function-declaration]
                           vec_update_vq_map(ARM64_VEC_SVE);
                           ^
   arch/arm64/kernel/cpufeature.c:1178:4: note: did you mean 'sve_update_vq_map'?
   arch/arm64/include/asm/fpsimd.h:238:20: note: 'sve_update_vq_map' declared here
   static inline void sve_update_vq_map(void) { }
                      ^
>> arch/arm64/kernel/cpufeature.c:2742:24: error: implicit declaration of function 'vec_verify_vq_map' [-Werror,-Wimplicit-function-declaration]
           if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SVE)) {
                                 ^
   arch/arm64/kernel/cpufeature.c:2742:24: note: did you mean 'sve_verify_vq_map'?
   arch/arm64/include/asm/fpsimd.h:239:19: note: 'sve_verify_vq_map' declared here
   static inline int sve_verify_vq_map(void) { return 0; }
                     ^
   3 errors generated.


vim +124 arch/arm64/kernel/fpsimd.c

   123	
 > 124	__ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
   125	#ifdef CONFIG_ARM64_SVE
   126		[ARM64_VEC_SVE] = {
   127			.type			= ARM64_VEC_SVE,
   128			.name			= "SVE",
   129			.min_vl			= SVE_VL_MIN,
   130			.max_vl			= SVE_VL_MIN,
   131			.max_virtualisable_vl	= SVE_VL_MIN,
   132		},
   133	#endif
   134	};
   135	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 42378 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
  2021-09-30 18:11   ` Mark Brown
  (?)
@ 2021-10-01  5:20     ` kernel test robot
  -1 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01  5:20 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas, Will Deacon, Shuah Khan
  Cc: kbuild-all, Alan Hayward, Luis Machado, Salil Akerkar,
	Basant Kumar Dwivedi, Szabolcs Nagy, linux-arm-kernel,
	linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 11259 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arc-randconfig-r043-20210930 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sys.c: In function '__do_sys_prctl':
>> kernel/sys.c:2467:25: error: implicit declaration of function 'SME_SET_VL'; did you mean 'SVE_SET_VL'? [-Werror=implicit-function-declaration]
    2467 |                 error = SME_SET_VL(arg2);
         |                         ^~~~~~~~~~
         |                         SVE_SET_VL
>> kernel/sys.c:2470:25: error: implicit declaration of function 'SME_GET_VL'; did you mean 'SVE_GET_VL'? [-Werror=implicit-function-declaration]
    2470 |                 error = SME_GET_VL();
         |                         ^~~~~~~~~~
         |                         SVE_GET_VL
   In file included from include/linux/perf_event.h:25,
                    from kernel/sys.c:17:
   At top level:
   arch/arc/include/asm/perf_event.h:126:27: warning: 'arc_pmu_cache_map' defined but not used [-Wunused-const-variable=]
     126 | static const unsigned int arc_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
         |                           ^~~~~~~~~~~~~~~~~
   arch/arc/include/asm/perf_event.h:91:27: warning: 'arc_pmu_ev_hw_map' defined but not used [-Wunused-const-variable=]
      91 | static const char * const arc_pmu_ev_hw_map[] = {
         |                           ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +2467 kernel/sys.c

  2263	
  2264	SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
  2265			unsigned long, arg4, unsigned long, arg5)
  2266	{
  2267		struct task_struct *me = current;
  2268		unsigned char comm[sizeof(me->comm)];
  2269		long error;
  2270	
  2271		error = security_task_prctl(option, arg2, arg3, arg4, arg5);
  2272		if (error != -ENOSYS)
  2273			return error;
  2274	
  2275		error = 0;
  2276		switch (option) {
  2277		case PR_SET_PDEATHSIG:
  2278			if (!valid_signal(arg2)) {
  2279				error = -EINVAL;
  2280				break;
  2281			}
  2282			me->pdeath_signal = arg2;
  2283			break;
  2284		case PR_GET_PDEATHSIG:
  2285			error = put_user(me->pdeath_signal, (int __user *)arg2);
  2286			break;
  2287		case PR_GET_DUMPABLE:
  2288			error = get_dumpable(me->mm);
  2289			break;
  2290		case PR_SET_DUMPABLE:
  2291			if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) {
  2292				error = -EINVAL;
  2293				break;
  2294			}
  2295			set_dumpable(me->mm, arg2);
  2296			break;
  2297	
  2298		case PR_SET_UNALIGN:
  2299			error = SET_UNALIGN_CTL(me, arg2);
  2300			break;
  2301		case PR_GET_UNALIGN:
  2302			error = GET_UNALIGN_CTL(me, arg2);
  2303			break;
  2304		case PR_SET_FPEMU:
  2305			error = SET_FPEMU_CTL(me, arg2);
  2306			break;
  2307		case PR_GET_FPEMU:
  2308			error = GET_FPEMU_CTL(me, arg2);
  2309			break;
  2310		case PR_SET_FPEXC:
  2311			error = SET_FPEXC_CTL(me, arg2);
  2312			break;
  2313		case PR_GET_FPEXC:
  2314			error = GET_FPEXC_CTL(me, arg2);
  2315			break;
  2316		case PR_GET_TIMING:
  2317			error = PR_TIMING_STATISTICAL;
  2318			break;
  2319		case PR_SET_TIMING:
  2320			if (arg2 != PR_TIMING_STATISTICAL)
  2321				error = -EINVAL;
  2322			break;
  2323		case PR_SET_NAME:
  2324			comm[sizeof(me->comm) - 1] = 0;
  2325			if (strncpy_from_user(comm, (char __user *)arg2,
  2326					      sizeof(me->comm) - 1) < 0)
  2327				return -EFAULT;
  2328			set_task_comm(me, comm);
  2329			proc_comm_connector(me);
  2330			break;
  2331		case PR_GET_NAME:
  2332			get_task_comm(comm, me);
  2333			if (copy_to_user((char __user *)arg2, comm, sizeof(comm)))
  2334				return -EFAULT;
  2335			break;
  2336		case PR_GET_ENDIAN:
  2337			error = GET_ENDIAN(me, arg2);
  2338			break;
  2339		case PR_SET_ENDIAN:
  2340			error = SET_ENDIAN(me, arg2);
  2341			break;
  2342		case PR_GET_SECCOMP:
  2343			error = prctl_get_seccomp();
  2344			break;
  2345		case PR_SET_SECCOMP:
  2346			error = prctl_set_seccomp(arg2, (char __user *)arg3);
  2347			break;
  2348		case PR_GET_TSC:
  2349			error = GET_TSC_CTL(arg2);
  2350			break;
  2351		case PR_SET_TSC:
  2352			error = SET_TSC_CTL(arg2);
  2353			break;
  2354		case PR_TASK_PERF_EVENTS_DISABLE:
  2355			error = perf_event_task_disable();
  2356			break;
  2357		case PR_TASK_PERF_EVENTS_ENABLE:
  2358			error = perf_event_task_enable();
  2359			break;
  2360		case PR_GET_TIMERSLACK:
  2361			if (current->timer_slack_ns > ULONG_MAX)
  2362				error = ULONG_MAX;
  2363			else
  2364				error = current->timer_slack_ns;
  2365			break;
  2366		case PR_SET_TIMERSLACK:
  2367			if (arg2 <= 0)
  2368				current->timer_slack_ns =
  2369						current->default_timer_slack_ns;
  2370			else
  2371				current->timer_slack_ns = arg2;
  2372			break;
  2373		case PR_MCE_KILL:
  2374			if (arg4 | arg5)
  2375				return -EINVAL;
  2376			switch (arg2) {
  2377			case PR_MCE_KILL_CLEAR:
  2378				if (arg3 != 0)
  2379					return -EINVAL;
  2380				current->flags &= ~PF_MCE_PROCESS;
  2381				break;
  2382			case PR_MCE_KILL_SET:
  2383				current->flags |= PF_MCE_PROCESS;
  2384				if (arg3 == PR_MCE_KILL_EARLY)
  2385					current->flags |= PF_MCE_EARLY;
  2386				else if (arg3 == PR_MCE_KILL_LATE)
  2387					current->flags &= ~PF_MCE_EARLY;
  2388				else if (arg3 == PR_MCE_KILL_DEFAULT)
  2389					current->flags &=
  2390							~(PF_MCE_EARLY|PF_MCE_PROCESS);
  2391				else
  2392					return -EINVAL;
  2393				break;
  2394			default:
  2395				return -EINVAL;
  2396			}
  2397			break;
  2398		case PR_MCE_KILL_GET:
  2399			if (arg2 | arg3 | arg4 | arg5)
  2400				return -EINVAL;
  2401			if (current->flags & PF_MCE_PROCESS)
  2402				error = (current->flags & PF_MCE_EARLY) ?
  2403					PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE;
  2404			else
  2405				error = PR_MCE_KILL_DEFAULT;
  2406			break;
  2407		case PR_SET_MM:
  2408			error = prctl_set_mm(arg2, arg3, arg4, arg5);
  2409			break;
  2410		case PR_GET_TID_ADDRESS:
  2411			error = prctl_get_tid_address(me, (int __user * __user *)arg2);
  2412			break;
  2413		case PR_SET_CHILD_SUBREAPER:
  2414			me->signal->is_child_subreaper = !!arg2;
  2415			if (!arg2)
  2416				break;
  2417	
  2418			walk_process_tree(me, propagate_has_child_subreaper, NULL);
  2419			break;
  2420		case PR_GET_CHILD_SUBREAPER:
  2421			error = put_user(me->signal->is_child_subreaper,
  2422					 (int __user *)arg2);
  2423			break;
  2424		case PR_SET_NO_NEW_PRIVS:
  2425			if (arg2 != 1 || arg3 || arg4 || arg5)
  2426				return -EINVAL;
  2427	
  2428			task_set_no_new_privs(current);
  2429			break;
  2430		case PR_GET_NO_NEW_PRIVS:
  2431			if (arg2 || arg3 || arg4 || arg5)
  2432				return -EINVAL;
  2433			return task_no_new_privs(current) ? 1 : 0;
  2434		case PR_GET_THP_DISABLE:
  2435			if (arg2 || arg3 || arg4 || arg5)
  2436				return -EINVAL;
  2437			error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags);
  2438			break;
  2439		case PR_SET_THP_DISABLE:
  2440			if (arg3 || arg4 || arg5)
  2441				return -EINVAL;
  2442			if (mmap_write_lock_killable(me->mm))
  2443				return -EINTR;
  2444			if (arg2)
  2445				set_bit(MMF_DISABLE_THP, &me->mm->flags);
  2446			else
  2447				clear_bit(MMF_DISABLE_THP, &me->mm->flags);
  2448			mmap_write_unlock(me->mm);
  2449			break;
  2450		case PR_MPX_ENABLE_MANAGEMENT:
  2451		case PR_MPX_DISABLE_MANAGEMENT:
  2452			/* No longer implemented: */
  2453			return -EINVAL;
  2454		case PR_SET_FP_MODE:
  2455			error = SET_FP_MODE(me, arg2);
  2456			break;
  2457		case PR_GET_FP_MODE:
  2458			error = GET_FP_MODE(me);
  2459			break;
  2460		case PR_SVE_SET_VL:
  2461			error = SVE_SET_VL(arg2);
  2462			break;
  2463		case PR_SVE_GET_VL:
  2464			error = SVE_GET_VL();
  2465			break;
  2466		case PR_SME_SET_VL:
> 2467			error = SME_SET_VL(arg2);
  2468			break;
  2469		case PR_SME_GET_VL:
> 2470			error = SME_GET_VL();
  2471			break;
  2472		case PR_GET_SPECULATION_CTRL:
  2473			if (arg3 || arg4 || arg5)
  2474				return -EINVAL;
  2475			error = arch_prctl_spec_ctrl_get(me, arg2);
  2476			break;
  2477		case PR_SET_SPECULATION_CTRL:
  2478			if (arg4 || arg5)
  2479				return -EINVAL;
  2480			error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
  2481			break;
  2482		case PR_PAC_RESET_KEYS:
  2483			if (arg3 || arg4 || arg5)
  2484				return -EINVAL;
  2485			error = PAC_RESET_KEYS(me, arg2);
  2486			break;
  2487		case PR_PAC_SET_ENABLED_KEYS:
  2488			if (arg4 || arg5)
  2489				return -EINVAL;
  2490			error = PAC_SET_ENABLED_KEYS(me, arg2, arg3);
  2491			break;
  2492		case PR_PAC_GET_ENABLED_KEYS:
  2493			if (arg2 || arg3 || arg4 || arg5)
  2494				return -EINVAL;
  2495			error = PAC_GET_ENABLED_KEYS(me);
  2496			break;
  2497		case PR_SET_TAGGED_ADDR_CTRL:
  2498			if (arg3 || arg4 || arg5)
  2499				return -EINVAL;
  2500			error = SET_TAGGED_ADDR_CTRL(arg2);
  2501			break;
  2502		case PR_GET_TAGGED_ADDR_CTRL:
  2503			if (arg2 || arg3 || arg4 || arg5)
  2504				return -EINVAL;
  2505			error = GET_TAGGED_ADDR_CTRL();
  2506			break;
  2507		case PR_SET_IO_FLUSHER:
  2508			if (!capable(CAP_SYS_RESOURCE))
  2509				return -EPERM;
  2510	
  2511			if (arg3 || arg4 || arg5)
  2512				return -EINVAL;
  2513	
  2514			if (arg2 == 1)
  2515				current->flags |= PR_IO_FLUSHER;
  2516			else if (!arg2)
  2517				current->flags &= ~PR_IO_FLUSHER;
  2518			else
  2519				return -EINVAL;
  2520			break;
  2521		case PR_GET_IO_FLUSHER:
  2522			if (!capable(CAP_SYS_RESOURCE))
  2523				return -EPERM;
  2524	
  2525			if (arg2 || arg3 || arg4 || arg5)
  2526				return -EINVAL;
  2527	
  2528			error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
  2529			break;
  2530		case PR_SET_SYSCALL_USER_DISPATCH:
  2531			error = set_syscall_user_dispatch(arg2, arg3, arg4,
  2532							  (char __user *) arg5);
  2533			break;
  2534	#ifdef CONFIG_SCHED_CORE
  2535		case PR_SCHED_CORE:
  2536			error = sched_core_share_pid(arg2, arg3, arg4, arg5);
  2537			break;
  2538	#endif
  2539		default:
  2540			error = -EINVAL;
  2541			break;
  2542		}
  2543		return error;
  2544	}
  2545	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28992 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-01  5:20     ` kernel test robot
  0 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01  5:20 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas, Will Deacon, Shuah Khan
  Cc: kbuild-all, Alan Hayward, Luis Machado, Salil Akerkar,
	Basant Kumar Dwivedi, Szabolcs Nagy, linux-arm-kernel,
	linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 11259 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arc-randconfig-r043-20210930 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sys.c: In function '__do_sys_prctl':
>> kernel/sys.c:2467:25: error: implicit declaration of function 'SME_SET_VL'; did you mean 'SVE_SET_VL'? [-Werror=implicit-function-declaration]
    2467 |                 error = SME_SET_VL(arg2);
         |                         ^~~~~~~~~~
         |                         SVE_SET_VL
>> kernel/sys.c:2470:25: error: implicit declaration of function 'SME_GET_VL'; did you mean 'SVE_GET_VL'? [-Werror=implicit-function-declaration]
    2470 |                 error = SME_GET_VL();
         |                         ^~~~~~~~~~
         |                         SVE_GET_VL
   In file included from include/linux/perf_event.h:25,
                    from kernel/sys.c:17:
   At top level:
   arch/arc/include/asm/perf_event.h:126:27: warning: 'arc_pmu_cache_map' defined but not used [-Wunused-const-variable=]
     126 | static const unsigned int arc_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
         |                           ^~~~~~~~~~~~~~~~~
   arch/arc/include/asm/perf_event.h:91:27: warning: 'arc_pmu_ev_hw_map' defined but not used [-Wunused-const-variable=]
      91 | static const char * const arc_pmu_ev_hw_map[] = {
         |                           ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +2467 kernel/sys.c

  2263	
  2264	SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
  2265			unsigned long, arg4, unsigned long, arg5)
  2266	{
  2267		struct task_struct *me = current;
  2268		unsigned char comm[sizeof(me->comm)];
  2269		long error;
  2270	
  2271		error = security_task_prctl(option, arg2, arg3, arg4, arg5);
  2272		if (error != -ENOSYS)
  2273			return error;
  2274	
  2275		error = 0;
  2276		switch (option) {
  2277		case PR_SET_PDEATHSIG:
  2278			if (!valid_signal(arg2)) {
  2279				error = -EINVAL;
  2280				break;
  2281			}
  2282			me->pdeath_signal = arg2;
  2283			break;
  2284		case PR_GET_PDEATHSIG:
  2285			error = put_user(me->pdeath_signal, (int __user *)arg2);
  2286			break;
  2287		case PR_GET_DUMPABLE:
  2288			error = get_dumpable(me->mm);
  2289			break;
  2290		case PR_SET_DUMPABLE:
  2291			if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) {
  2292				error = -EINVAL;
  2293				break;
  2294			}
  2295			set_dumpable(me->mm, arg2);
  2296			break;
  2297	
  2298		case PR_SET_UNALIGN:
  2299			error = SET_UNALIGN_CTL(me, arg2);
  2300			break;
  2301		case PR_GET_UNALIGN:
  2302			error = GET_UNALIGN_CTL(me, arg2);
  2303			break;
  2304		case PR_SET_FPEMU:
  2305			error = SET_FPEMU_CTL(me, arg2);
  2306			break;
  2307		case PR_GET_FPEMU:
  2308			error = GET_FPEMU_CTL(me, arg2);
  2309			break;
  2310		case PR_SET_FPEXC:
  2311			error = SET_FPEXC_CTL(me, arg2);
  2312			break;
  2313		case PR_GET_FPEXC:
  2314			error = GET_FPEXC_CTL(me, arg2);
  2315			break;
  2316		case PR_GET_TIMING:
  2317			error = PR_TIMING_STATISTICAL;
  2318			break;
  2319		case PR_SET_TIMING:
  2320			if (arg2 != PR_TIMING_STATISTICAL)
  2321				error = -EINVAL;
  2322			break;
  2323		case PR_SET_NAME:
  2324			comm[sizeof(me->comm) - 1] = 0;
  2325			if (strncpy_from_user(comm, (char __user *)arg2,
  2326					      sizeof(me->comm) - 1) < 0)
  2327				return -EFAULT;
  2328			set_task_comm(me, comm);
  2329			proc_comm_connector(me);
  2330			break;
  2331		case PR_GET_NAME:
  2332			get_task_comm(comm, me);
  2333			if (copy_to_user((char __user *)arg2, comm, sizeof(comm)))
  2334				return -EFAULT;
  2335			break;
  2336		case PR_GET_ENDIAN:
  2337			error = GET_ENDIAN(me, arg2);
  2338			break;
  2339		case PR_SET_ENDIAN:
  2340			error = SET_ENDIAN(me, arg2);
  2341			break;
  2342		case PR_GET_SECCOMP:
  2343			error = prctl_get_seccomp();
  2344			break;
  2345		case PR_SET_SECCOMP:
  2346			error = prctl_set_seccomp(arg2, (char __user *)arg3);
  2347			break;
  2348		case PR_GET_TSC:
  2349			error = GET_TSC_CTL(arg2);
  2350			break;
  2351		case PR_SET_TSC:
  2352			error = SET_TSC_CTL(arg2);
  2353			break;
  2354		case PR_TASK_PERF_EVENTS_DISABLE:
  2355			error = perf_event_task_disable();
  2356			break;
  2357		case PR_TASK_PERF_EVENTS_ENABLE:
  2358			error = perf_event_task_enable();
  2359			break;
  2360		case PR_GET_TIMERSLACK:
  2361			if (current->timer_slack_ns > ULONG_MAX)
  2362				error = ULONG_MAX;
  2363			else
  2364				error = current->timer_slack_ns;
  2365			break;
  2366		case PR_SET_TIMERSLACK:
  2367			if (arg2 <= 0)
  2368				current->timer_slack_ns =
  2369						current->default_timer_slack_ns;
  2370			else
  2371				current->timer_slack_ns = arg2;
  2372			break;
  2373		case PR_MCE_KILL:
  2374			if (arg4 | arg5)
  2375				return -EINVAL;
  2376			switch (arg2) {
  2377			case PR_MCE_KILL_CLEAR:
  2378				if (arg3 != 0)
  2379					return -EINVAL;
  2380				current->flags &= ~PF_MCE_PROCESS;
  2381				break;
  2382			case PR_MCE_KILL_SET:
  2383				current->flags |= PF_MCE_PROCESS;
  2384				if (arg3 == PR_MCE_KILL_EARLY)
  2385					current->flags |= PF_MCE_EARLY;
  2386				else if (arg3 == PR_MCE_KILL_LATE)
  2387					current->flags &= ~PF_MCE_EARLY;
  2388				else if (arg3 == PR_MCE_KILL_DEFAULT)
  2389					current->flags &=
  2390							~(PF_MCE_EARLY|PF_MCE_PROCESS);
  2391				else
  2392					return -EINVAL;
  2393				break;
  2394			default:
  2395				return -EINVAL;
  2396			}
  2397			break;
  2398		case PR_MCE_KILL_GET:
  2399			if (arg2 | arg3 | arg4 | arg5)
  2400				return -EINVAL;
  2401			if (current->flags & PF_MCE_PROCESS)
  2402				error = (current->flags & PF_MCE_EARLY) ?
  2403					PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE;
  2404			else
  2405				error = PR_MCE_KILL_DEFAULT;
  2406			break;
  2407		case PR_SET_MM:
  2408			error = prctl_set_mm(arg2, arg3, arg4, arg5);
  2409			break;
  2410		case PR_GET_TID_ADDRESS:
  2411			error = prctl_get_tid_address(me, (int __user * __user *)arg2);
  2412			break;
  2413		case PR_SET_CHILD_SUBREAPER:
  2414			me->signal->is_child_subreaper = !!arg2;
  2415			if (!arg2)
  2416				break;
  2417	
  2418			walk_process_tree(me, propagate_has_child_subreaper, NULL);
  2419			break;
  2420		case PR_GET_CHILD_SUBREAPER:
  2421			error = put_user(me->signal->is_child_subreaper,
  2422					 (int __user *)arg2);
  2423			break;
  2424		case PR_SET_NO_NEW_PRIVS:
  2425			if (arg2 != 1 || arg3 || arg4 || arg5)
  2426				return -EINVAL;
  2427	
  2428			task_set_no_new_privs(current);
  2429			break;
  2430		case PR_GET_NO_NEW_PRIVS:
  2431			if (arg2 || arg3 || arg4 || arg5)
  2432				return -EINVAL;
  2433			return task_no_new_privs(current) ? 1 : 0;
  2434		case PR_GET_THP_DISABLE:
  2435			if (arg2 || arg3 || arg4 || arg5)
  2436				return -EINVAL;
  2437			error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags);
  2438			break;
  2439		case PR_SET_THP_DISABLE:
  2440			if (arg3 || arg4 || arg5)
  2441				return -EINVAL;
  2442			if (mmap_write_lock_killable(me->mm))
  2443				return -EINTR;
  2444			if (arg2)
  2445				set_bit(MMF_DISABLE_THP, &me->mm->flags);
  2446			else
  2447				clear_bit(MMF_DISABLE_THP, &me->mm->flags);
  2448			mmap_write_unlock(me->mm);
  2449			break;
  2450		case PR_MPX_ENABLE_MANAGEMENT:
  2451		case PR_MPX_DISABLE_MANAGEMENT:
  2452			/* No longer implemented: */
  2453			return -EINVAL;
  2454		case PR_SET_FP_MODE:
  2455			error = SET_FP_MODE(me, arg2);
  2456			break;
  2457		case PR_GET_FP_MODE:
  2458			error = GET_FP_MODE(me);
  2459			break;
  2460		case PR_SVE_SET_VL:
  2461			error = SVE_SET_VL(arg2);
  2462			break;
  2463		case PR_SVE_GET_VL:
  2464			error = SVE_GET_VL();
  2465			break;
  2466		case PR_SME_SET_VL:
> 2467			error = SME_SET_VL(arg2);
  2468			break;
  2469		case PR_SME_GET_VL:
> 2470			error = SME_GET_VL();
  2471			break;
  2472		case PR_GET_SPECULATION_CTRL:
  2473			if (arg3 || arg4 || arg5)
  2474				return -EINVAL;
  2475			error = arch_prctl_spec_ctrl_get(me, arg2);
  2476			break;
  2477		case PR_SET_SPECULATION_CTRL:
  2478			if (arg4 || arg5)
  2479				return -EINVAL;
  2480			error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
  2481			break;
  2482		case PR_PAC_RESET_KEYS:
  2483			if (arg3 || arg4 || arg5)
  2484				return -EINVAL;
  2485			error = PAC_RESET_KEYS(me, arg2);
  2486			break;
  2487		case PR_PAC_SET_ENABLED_KEYS:
  2488			if (arg4 || arg5)
  2489				return -EINVAL;
  2490			error = PAC_SET_ENABLED_KEYS(me, arg2, arg3);
  2491			break;
  2492		case PR_PAC_GET_ENABLED_KEYS:
  2493			if (arg2 || arg3 || arg4 || arg5)
  2494				return -EINVAL;
  2495			error = PAC_GET_ENABLED_KEYS(me);
  2496			break;
  2497		case PR_SET_TAGGED_ADDR_CTRL:
  2498			if (arg3 || arg4 || arg5)
  2499				return -EINVAL;
  2500			error = SET_TAGGED_ADDR_CTRL(arg2);
  2501			break;
  2502		case PR_GET_TAGGED_ADDR_CTRL:
  2503			if (arg2 || arg3 || arg4 || arg5)
  2504				return -EINVAL;
  2505			error = GET_TAGGED_ADDR_CTRL();
  2506			break;
  2507		case PR_SET_IO_FLUSHER:
  2508			if (!capable(CAP_SYS_RESOURCE))
  2509				return -EPERM;
  2510	
  2511			if (arg3 || arg4 || arg5)
  2512				return -EINVAL;
  2513	
  2514			if (arg2 == 1)
  2515				current->flags |= PR_IO_FLUSHER;
  2516			else if (!arg2)
  2517				current->flags &= ~PR_IO_FLUSHER;
  2518			else
  2519				return -EINVAL;
  2520			break;
  2521		case PR_GET_IO_FLUSHER:
  2522			if (!capable(CAP_SYS_RESOURCE))
  2523				return -EPERM;
  2524	
  2525			if (arg2 || arg3 || arg4 || arg5)
  2526				return -EINVAL;
  2527	
  2528			error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
  2529			break;
  2530		case PR_SET_SYSCALL_USER_DISPATCH:
  2531			error = set_syscall_user_dispatch(arg2, arg3, arg4,
  2532							  (char __user *) arg5);
  2533			break;
  2534	#ifdef CONFIG_SCHED_CORE
  2535		case PR_SCHED_CORE:
  2536			error = sched_core_share_pid(arg2, arg3, arg4, arg5);
  2537			break;
  2538	#endif
  2539		default:
  2540			error = -EINVAL;
  2541			break;
  2542		}
  2543		return error;
  2544	}
  2545	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28992 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-01  5:20     ` kernel test robot
  0 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01  5:20 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 11596 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arc-randconfig-r043-20210930 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   kernel/sys.c: In function '__do_sys_prctl':
>> kernel/sys.c:2467:25: error: implicit declaration of function 'SME_SET_VL'; did you mean 'SVE_SET_VL'? [-Werror=implicit-function-declaration]
    2467 |                 error = SME_SET_VL(arg2);
         |                         ^~~~~~~~~~
         |                         SVE_SET_VL
>> kernel/sys.c:2470:25: error: implicit declaration of function 'SME_GET_VL'; did you mean 'SVE_GET_VL'? [-Werror=implicit-function-declaration]
    2470 |                 error = SME_GET_VL();
         |                         ^~~~~~~~~~
         |                         SVE_GET_VL
   In file included from include/linux/perf_event.h:25,
                    from kernel/sys.c:17:
   At top level:
   arch/arc/include/asm/perf_event.h:126:27: warning: 'arc_pmu_cache_map' defined but not used [-Wunused-const-variable=]
     126 | static const unsigned int arc_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
         |                           ^~~~~~~~~~~~~~~~~
   arch/arc/include/asm/perf_event.h:91:27: warning: 'arc_pmu_ev_hw_map' defined but not used [-Wunused-const-variable=]
      91 | static const char * const arc_pmu_ev_hw_map[] = {
         |                           ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +2467 kernel/sys.c

  2263	
  2264	SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
  2265			unsigned long, arg4, unsigned long, arg5)
  2266	{
  2267		struct task_struct *me = current;
  2268		unsigned char comm[sizeof(me->comm)];
  2269		long error;
  2270	
  2271		error = security_task_prctl(option, arg2, arg3, arg4, arg5);
  2272		if (error != -ENOSYS)
  2273			return error;
  2274	
  2275		error = 0;
  2276		switch (option) {
  2277		case PR_SET_PDEATHSIG:
  2278			if (!valid_signal(arg2)) {
  2279				error = -EINVAL;
  2280				break;
  2281			}
  2282			me->pdeath_signal = arg2;
  2283			break;
  2284		case PR_GET_PDEATHSIG:
  2285			error = put_user(me->pdeath_signal, (int __user *)arg2);
  2286			break;
  2287		case PR_GET_DUMPABLE:
  2288			error = get_dumpable(me->mm);
  2289			break;
  2290		case PR_SET_DUMPABLE:
  2291			if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) {
  2292				error = -EINVAL;
  2293				break;
  2294			}
  2295			set_dumpable(me->mm, arg2);
  2296			break;
  2297	
  2298		case PR_SET_UNALIGN:
  2299			error = SET_UNALIGN_CTL(me, arg2);
  2300			break;
  2301		case PR_GET_UNALIGN:
  2302			error = GET_UNALIGN_CTL(me, arg2);
  2303			break;
  2304		case PR_SET_FPEMU:
  2305			error = SET_FPEMU_CTL(me, arg2);
  2306			break;
  2307		case PR_GET_FPEMU:
  2308			error = GET_FPEMU_CTL(me, arg2);
  2309			break;
  2310		case PR_SET_FPEXC:
  2311			error = SET_FPEXC_CTL(me, arg2);
  2312			break;
  2313		case PR_GET_FPEXC:
  2314			error = GET_FPEXC_CTL(me, arg2);
  2315			break;
  2316		case PR_GET_TIMING:
  2317			error = PR_TIMING_STATISTICAL;
  2318			break;
  2319		case PR_SET_TIMING:
  2320			if (arg2 != PR_TIMING_STATISTICAL)
  2321				error = -EINVAL;
  2322			break;
  2323		case PR_SET_NAME:
  2324			comm[sizeof(me->comm) - 1] = 0;
  2325			if (strncpy_from_user(comm, (char __user *)arg2,
  2326					      sizeof(me->comm) - 1) < 0)
  2327				return -EFAULT;
  2328			set_task_comm(me, comm);
  2329			proc_comm_connector(me);
  2330			break;
  2331		case PR_GET_NAME:
  2332			get_task_comm(comm, me);
  2333			if (copy_to_user((char __user *)arg2, comm, sizeof(comm)))
  2334				return -EFAULT;
  2335			break;
  2336		case PR_GET_ENDIAN:
  2337			error = GET_ENDIAN(me, arg2);
  2338			break;
  2339		case PR_SET_ENDIAN:
  2340			error = SET_ENDIAN(me, arg2);
  2341			break;
  2342		case PR_GET_SECCOMP:
  2343			error = prctl_get_seccomp();
  2344			break;
  2345		case PR_SET_SECCOMP:
  2346			error = prctl_set_seccomp(arg2, (char __user *)arg3);
  2347			break;
  2348		case PR_GET_TSC:
  2349			error = GET_TSC_CTL(arg2);
  2350			break;
  2351		case PR_SET_TSC:
  2352			error = SET_TSC_CTL(arg2);
  2353			break;
  2354		case PR_TASK_PERF_EVENTS_DISABLE:
  2355			error = perf_event_task_disable();
  2356			break;
  2357		case PR_TASK_PERF_EVENTS_ENABLE:
  2358			error = perf_event_task_enable();
  2359			break;
  2360		case PR_GET_TIMERSLACK:
  2361			if (current->timer_slack_ns > ULONG_MAX)
  2362				error = ULONG_MAX;
  2363			else
  2364				error = current->timer_slack_ns;
  2365			break;
  2366		case PR_SET_TIMERSLACK:
  2367			if (arg2 <= 0)
  2368				current->timer_slack_ns =
  2369						current->default_timer_slack_ns;
  2370			else
  2371				current->timer_slack_ns = arg2;
  2372			break;
  2373		case PR_MCE_KILL:
  2374			if (arg4 | arg5)
  2375				return -EINVAL;
  2376			switch (arg2) {
  2377			case PR_MCE_KILL_CLEAR:
  2378				if (arg3 != 0)
  2379					return -EINVAL;
  2380				current->flags &= ~PF_MCE_PROCESS;
  2381				break;
  2382			case PR_MCE_KILL_SET:
  2383				current->flags |= PF_MCE_PROCESS;
  2384				if (arg3 == PR_MCE_KILL_EARLY)
  2385					current->flags |= PF_MCE_EARLY;
  2386				else if (arg3 == PR_MCE_KILL_LATE)
  2387					current->flags &= ~PF_MCE_EARLY;
  2388				else if (arg3 == PR_MCE_KILL_DEFAULT)
  2389					current->flags &=
  2390							~(PF_MCE_EARLY|PF_MCE_PROCESS);
  2391				else
  2392					return -EINVAL;
  2393				break;
  2394			default:
  2395				return -EINVAL;
  2396			}
  2397			break;
  2398		case PR_MCE_KILL_GET:
  2399			if (arg2 | arg3 | arg4 | arg5)
  2400				return -EINVAL;
  2401			if (current->flags & PF_MCE_PROCESS)
  2402				error = (current->flags & PF_MCE_EARLY) ?
  2403					PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE;
  2404			else
  2405				error = PR_MCE_KILL_DEFAULT;
  2406			break;
  2407		case PR_SET_MM:
  2408			error = prctl_set_mm(arg2, arg3, arg4, arg5);
  2409			break;
  2410		case PR_GET_TID_ADDRESS:
  2411			error = prctl_get_tid_address(me, (int __user * __user *)arg2);
  2412			break;
  2413		case PR_SET_CHILD_SUBREAPER:
  2414			me->signal->is_child_subreaper = !!arg2;
  2415			if (!arg2)
  2416				break;
  2417	
  2418			walk_process_tree(me, propagate_has_child_subreaper, NULL);
  2419			break;
  2420		case PR_GET_CHILD_SUBREAPER:
  2421			error = put_user(me->signal->is_child_subreaper,
  2422					 (int __user *)arg2);
  2423			break;
  2424		case PR_SET_NO_NEW_PRIVS:
  2425			if (arg2 != 1 || arg3 || arg4 || arg5)
  2426				return -EINVAL;
  2427	
  2428			task_set_no_new_privs(current);
  2429			break;
  2430		case PR_GET_NO_NEW_PRIVS:
  2431			if (arg2 || arg3 || arg4 || arg5)
  2432				return -EINVAL;
  2433			return task_no_new_privs(current) ? 1 : 0;
  2434		case PR_GET_THP_DISABLE:
  2435			if (arg2 || arg3 || arg4 || arg5)
  2436				return -EINVAL;
  2437			error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags);
  2438			break;
  2439		case PR_SET_THP_DISABLE:
  2440			if (arg3 || arg4 || arg5)
  2441				return -EINVAL;
  2442			if (mmap_write_lock_killable(me->mm))
  2443				return -EINTR;
  2444			if (arg2)
  2445				set_bit(MMF_DISABLE_THP, &me->mm->flags);
  2446			else
  2447				clear_bit(MMF_DISABLE_THP, &me->mm->flags);
  2448			mmap_write_unlock(me->mm);
  2449			break;
  2450		case PR_MPX_ENABLE_MANAGEMENT:
  2451		case PR_MPX_DISABLE_MANAGEMENT:
  2452			/* No longer implemented: */
  2453			return -EINVAL;
  2454		case PR_SET_FP_MODE:
  2455			error = SET_FP_MODE(me, arg2);
  2456			break;
  2457		case PR_GET_FP_MODE:
  2458			error = GET_FP_MODE(me);
  2459			break;
  2460		case PR_SVE_SET_VL:
  2461			error = SVE_SET_VL(arg2);
  2462			break;
  2463		case PR_SVE_GET_VL:
  2464			error = SVE_GET_VL();
  2465			break;
  2466		case PR_SME_SET_VL:
> 2467			error = SME_SET_VL(arg2);
  2468			break;
  2469		case PR_SME_GET_VL:
> 2470			error = SME_GET_VL();
  2471			break;
  2472		case PR_GET_SPECULATION_CTRL:
  2473			if (arg3 || arg4 || arg5)
  2474				return -EINVAL;
  2475			error = arch_prctl_spec_ctrl_get(me, arg2);
  2476			break;
  2477		case PR_SET_SPECULATION_CTRL:
  2478			if (arg4 || arg5)
  2479				return -EINVAL;
  2480			error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
  2481			break;
  2482		case PR_PAC_RESET_KEYS:
  2483			if (arg3 || arg4 || arg5)
  2484				return -EINVAL;
  2485			error = PAC_RESET_KEYS(me, arg2);
  2486			break;
  2487		case PR_PAC_SET_ENABLED_KEYS:
  2488			if (arg4 || arg5)
  2489				return -EINVAL;
  2490			error = PAC_SET_ENABLED_KEYS(me, arg2, arg3);
  2491			break;
  2492		case PR_PAC_GET_ENABLED_KEYS:
  2493			if (arg2 || arg3 || arg4 || arg5)
  2494				return -EINVAL;
  2495			error = PAC_GET_ENABLED_KEYS(me);
  2496			break;
  2497		case PR_SET_TAGGED_ADDR_CTRL:
  2498			if (arg3 || arg4 || arg5)
  2499				return -EINVAL;
  2500			error = SET_TAGGED_ADDR_CTRL(arg2);
  2501			break;
  2502		case PR_GET_TAGGED_ADDR_CTRL:
  2503			if (arg2 || arg3 || arg4 || arg5)
  2504				return -EINVAL;
  2505			error = GET_TAGGED_ADDR_CTRL();
  2506			break;
  2507		case PR_SET_IO_FLUSHER:
  2508			if (!capable(CAP_SYS_RESOURCE))
  2509				return -EPERM;
  2510	
  2511			if (arg3 || arg4 || arg5)
  2512				return -EINVAL;
  2513	
  2514			if (arg2 == 1)
  2515				current->flags |= PR_IO_FLUSHER;
  2516			else if (!arg2)
  2517				current->flags &= ~PR_IO_FLUSHER;
  2518			else
  2519				return -EINVAL;
  2520			break;
  2521		case PR_GET_IO_FLUSHER:
  2522			if (!capable(CAP_SYS_RESOURCE))
  2523				return -EPERM;
  2524	
  2525			if (arg2 || arg3 || arg4 || arg5)
  2526				return -EINVAL;
  2527	
  2528			error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
  2529			break;
  2530		case PR_SET_SYSCALL_USER_DISPATCH:
  2531			error = set_syscall_user_dispatch(arg2, arg3, arg4,
  2532							  (char __user *) arg5);
  2533			break;
  2534	#ifdef CONFIG_SCHED_CORE
  2535		case PR_SCHED_CORE:
  2536			error = sched_core_share_pid(arg2, arg3, arg4, arg5);
  2537			break;
  2538	#endif
  2539		default:
  2540			error = -EINVAL;
  2541			break;
  2542		}
  2543		return error;
  2544	}
  2545	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 28992 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
  2021-10-01  5:20     ` kernel test robot
  (?)
@ 2021-10-01 12:40       ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-01 12:40 UTC (permalink / raw)
  To: kernel test robot
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, kbuild-all,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 517 bytes --]

On Fri, Oct 01, 2021 at 01:20:12PM +0800, kernel test robot wrote:

> config: arc-randconfig-r043-20210930 (attached as .config)
> compiler: arc-elf-gcc (GCC) 11.2.0

I suspect you're reporting architectures in alphabetical order - can I
suggest making a priority list of widely used architectures?  I know I
tend to zone out issues which look like they only occur on less actively
maintained architectures as some of them have odd architecture specific
issues that really need to be fixed at the architecture level.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-01 12:40       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-01 12:40 UTC (permalink / raw)
  To: kernel test robot
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, kbuild-all,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest


[-- Attachment #1.1: Type: text/plain, Size: 517 bytes --]

On Fri, Oct 01, 2021 at 01:20:12PM +0800, kernel test robot wrote:

> config: arc-randconfig-r043-20210930 (attached as .config)
> compiler: arc-elf-gcc (GCC) 11.2.0

I suspect you're reporting architectures in alphabetical order - can I
suggest making a priority list of widely used architectures?  I know I
tend to zone out issues which look like they only occur on less actively
maintained architectures as some of them have odd architecture specific
issues that really need to be fixed at the architecture level.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-01 12:40       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-01 12:40 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

On Fri, Oct 01, 2021 at 01:20:12PM +0800, kernel test robot wrote:

> config: arc-randconfig-r043-20210930 (attached as .config)
> compiler: arc-elf-gcc (GCC) 11.2.0

I suspect you're reporting architectures in alphabetical order - can I
suggest making a priority list of widely used architectures?  I know I
tend to zone out issues which look like they only occur on less actively
maintained architectures as some of them have odd architecture specific
issues that really need to be fixed at the architecture level.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
  2021-09-30 18:11   ` Mark Brown
  (?)
@ 2021-10-01 16:38     ` kernel test robot
  -1 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01 16:38 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas, Will Deacon, Shuah Khan
  Cc: llvm, kbuild-all, Alan Hayward, Luis Machado, Salil Akerkar,
	Basant Kumar Dwivedi, Szabolcs Nagy, linux-arm-kernel,
	linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 11327 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arm64-randconfig-r024-20210930 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 962e503cc8bc411f7523cc393acae8aae425b1c4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/sys.c:2467:11: error: implicit declaration of function 'sme_set_current_vl' [-Werror,-Wimplicit-function-declaration]
                   error = SME_SET_VL(arg2);
                           ^
   arch/arm64/include/asm/processor.h:360:25: note: expanded from macro 'SME_SET_VL'
   #define SME_SET_VL(arg) sme_set_current_vl(arg)
                           ^
   kernel/sys.c:2467:11: note: did you mean 'sve_set_current_vl'?
   arch/arm64/include/asm/processor.h:360:25: note: expanded from macro 'SME_SET_VL'
   #define SME_SET_VL(arg) sme_set_current_vl(arg)
                           ^
   arch/arm64/include/asm/fpsimd.h:236:19: note: 'sve_set_current_vl' declared here
   static inline int sve_set_current_vl(unsigned long arg)
                     ^
>> kernel/sys.c:2470:11: error: implicit declaration of function 'sme_get_current_vl' [-Werror,-Wimplicit-function-declaration]
                   error = SME_GET_VL();
                           ^
   arch/arm64/include/asm/processor.h:361:22: note: expanded from macro 'SME_GET_VL'
   #define SME_GET_VL()    sme_get_current_vl()
                           ^
   2 errors generated.


vim +/sme_set_current_vl +2467 kernel/sys.c

  2263	
  2264	SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
  2265			unsigned long, arg4, unsigned long, arg5)
  2266	{
  2267		struct task_struct *me = current;
  2268		unsigned char comm[sizeof(me->comm)];
  2269		long error;
  2270	
  2271		error = security_task_prctl(option, arg2, arg3, arg4, arg5);
  2272		if (error != -ENOSYS)
  2273			return error;
  2274	
  2275		error = 0;
  2276		switch (option) {
  2277		case PR_SET_PDEATHSIG:
  2278			if (!valid_signal(arg2)) {
  2279				error = -EINVAL;
  2280				break;
  2281			}
  2282			me->pdeath_signal = arg2;
  2283			break;
  2284		case PR_GET_PDEATHSIG:
  2285			error = put_user(me->pdeath_signal, (int __user *)arg2);
  2286			break;
  2287		case PR_GET_DUMPABLE:
  2288			error = get_dumpable(me->mm);
  2289			break;
  2290		case PR_SET_DUMPABLE:
  2291			if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) {
  2292				error = -EINVAL;
  2293				break;
  2294			}
  2295			set_dumpable(me->mm, arg2);
  2296			break;
  2297	
  2298		case PR_SET_UNALIGN:
  2299			error = SET_UNALIGN_CTL(me, arg2);
  2300			break;
  2301		case PR_GET_UNALIGN:
  2302			error = GET_UNALIGN_CTL(me, arg2);
  2303			break;
  2304		case PR_SET_FPEMU:
  2305			error = SET_FPEMU_CTL(me, arg2);
  2306			break;
  2307		case PR_GET_FPEMU:
  2308			error = GET_FPEMU_CTL(me, arg2);
  2309			break;
  2310		case PR_SET_FPEXC:
  2311			error = SET_FPEXC_CTL(me, arg2);
  2312			break;
  2313		case PR_GET_FPEXC:
  2314			error = GET_FPEXC_CTL(me, arg2);
  2315			break;
  2316		case PR_GET_TIMING:
  2317			error = PR_TIMING_STATISTICAL;
  2318			break;
  2319		case PR_SET_TIMING:
  2320			if (arg2 != PR_TIMING_STATISTICAL)
  2321				error = -EINVAL;
  2322			break;
  2323		case PR_SET_NAME:
  2324			comm[sizeof(me->comm) - 1] = 0;
  2325			if (strncpy_from_user(comm, (char __user *)arg2,
  2326					      sizeof(me->comm) - 1) < 0)
  2327				return -EFAULT;
  2328			set_task_comm(me, comm);
  2329			proc_comm_connector(me);
  2330			break;
  2331		case PR_GET_NAME:
  2332			get_task_comm(comm, me);
  2333			if (copy_to_user((char __user *)arg2, comm, sizeof(comm)))
  2334				return -EFAULT;
  2335			break;
  2336		case PR_GET_ENDIAN:
  2337			error = GET_ENDIAN(me, arg2);
  2338			break;
  2339		case PR_SET_ENDIAN:
  2340			error = SET_ENDIAN(me, arg2);
  2341			break;
  2342		case PR_GET_SECCOMP:
  2343			error = prctl_get_seccomp();
  2344			break;
  2345		case PR_SET_SECCOMP:
  2346			error = prctl_set_seccomp(arg2, (char __user *)arg3);
  2347			break;
  2348		case PR_GET_TSC:
  2349			error = GET_TSC_CTL(arg2);
  2350			break;
  2351		case PR_SET_TSC:
  2352			error = SET_TSC_CTL(arg2);
  2353			break;
  2354		case PR_TASK_PERF_EVENTS_DISABLE:
  2355			error = perf_event_task_disable();
  2356			break;
  2357		case PR_TASK_PERF_EVENTS_ENABLE:
  2358			error = perf_event_task_enable();
  2359			break;
  2360		case PR_GET_TIMERSLACK:
  2361			if (current->timer_slack_ns > ULONG_MAX)
  2362				error = ULONG_MAX;
  2363			else
  2364				error = current->timer_slack_ns;
  2365			break;
  2366		case PR_SET_TIMERSLACK:
  2367			if (arg2 <= 0)
  2368				current->timer_slack_ns =
  2369						current->default_timer_slack_ns;
  2370			else
  2371				current->timer_slack_ns = arg2;
  2372			break;
  2373		case PR_MCE_KILL:
  2374			if (arg4 | arg5)
  2375				return -EINVAL;
  2376			switch (arg2) {
  2377			case PR_MCE_KILL_CLEAR:
  2378				if (arg3 != 0)
  2379					return -EINVAL;
  2380				current->flags &= ~PF_MCE_PROCESS;
  2381				break;
  2382			case PR_MCE_KILL_SET:
  2383				current->flags |= PF_MCE_PROCESS;
  2384				if (arg3 == PR_MCE_KILL_EARLY)
  2385					current->flags |= PF_MCE_EARLY;
  2386				else if (arg3 == PR_MCE_KILL_LATE)
  2387					current->flags &= ~PF_MCE_EARLY;
  2388				else if (arg3 == PR_MCE_KILL_DEFAULT)
  2389					current->flags &=
  2390							~(PF_MCE_EARLY|PF_MCE_PROCESS);
  2391				else
  2392					return -EINVAL;
  2393				break;
  2394			default:
  2395				return -EINVAL;
  2396			}
  2397			break;
  2398		case PR_MCE_KILL_GET:
  2399			if (arg2 | arg3 | arg4 | arg5)
  2400				return -EINVAL;
  2401			if (current->flags & PF_MCE_PROCESS)
  2402				error = (current->flags & PF_MCE_EARLY) ?
  2403					PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE;
  2404			else
  2405				error = PR_MCE_KILL_DEFAULT;
  2406			break;
  2407		case PR_SET_MM:
  2408			error = prctl_set_mm(arg2, arg3, arg4, arg5);
  2409			break;
  2410		case PR_GET_TID_ADDRESS:
  2411			error = prctl_get_tid_address(me, (int __user * __user *)arg2);
  2412			break;
  2413		case PR_SET_CHILD_SUBREAPER:
  2414			me->signal->is_child_subreaper = !!arg2;
  2415			if (!arg2)
  2416				break;
  2417	
  2418			walk_process_tree(me, propagate_has_child_subreaper, NULL);
  2419			break;
  2420		case PR_GET_CHILD_SUBREAPER:
  2421			error = put_user(me->signal->is_child_subreaper,
  2422					 (int __user *)arg2);
  2423			break;
  2424		case PR_SET_NO_NEW_PRIVS:
  2425			if (arg2 != 1 || arg3 || arg4 || arg5)
  2426				return -EINVAL;
  2427	
  2428			task_set_no_new_privs(current);
  2429			break;
  2430		case PR_GET_NO_NEW_PRIVS:
  2431			if (arg2 || arg3 || arg4 || arg5)
  2432				return -EINVAL;
  2433			return task_no_new_privs(current) ? 1 : 0;
  2434		case PR_GET_THP_DISABLE:
  2435			if (arg2 || arg3 || arg4 || arg5)
  2436				return -EINVAL;
  2437			error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags);
  2438			break;
  2439		case PR_SET_THP_DISABLE:
  2440			if (arg3 || arg4 || arg5)
  2441				return -EINVAL;
  2442			if (mmap_write_lock_killable(me->mm))
  2443				return -EINTR;
  2444			if (arg2)
  2445				set_bit(MMF_DISABLE_THP, &me->mm->flags);
  2446			else
  2447				clear_bit(MMF_DISABLE_THP, &me->mm->flags);
  2448			mmap_write_unlock(me->mm);
  2449			break;
  2450		case PR_MPX_ENABLE_MANAGEMENT:
  2451		case PR_MPX_DISABLE_MANAGEMENT:
  2452			/* No longer implemented: */
  2453			return -EINVAL;
  2454		case PR_SET_FP_MODE:
  2455			error = SET_FP_MODE(me, arg2);
  2456			break;
  2457		case PR_GET_FP_MODE:
  2458			error = GET_FP_MODE(me);
  2459			break;
  2460		case PR_SVE_SET_VL:
  2461			error = SVE_SET_VL(arg2);
  2462			break;
  2463		case PR_SVE_GET_VL:
  2464			error = SVE_GET_VL();
  2465			break;
  2466		case PR_SME_SET_VL:
> 2467			error = SME_SET_VL(arg2);
  2468			break;
  2469		case PR_SME_GET_VL:
> 2470			error = SME_GET_VL();
  2471			break;
  2472		case PR_GET_SPECULATION_CTRL:
  2473			if (arg3 || arg4 || arg5)
  2474				return -EINVAL;
  2475			error = arch_prctl_spec_ctrl_get(me, arg2);
  2476			break;
  2477		case PR_SET_SPECULATION_CTRL:
  2478			if (arg4 || arg5)
  2479				return -EINVAL;
  2480			error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
  2481			break;
  2482		case PR_PAC_RESET_KEYS:
  2483			if (arg3 || arg4 || arg5)
  2484				return -EINVAL;
  2485			error = PAC_RESET_KEYS(me, arg2);
  2486			break;
  2487		case PR_PAC_SET_ENABLED_KEYS:
  2488			if (arg4 || arg5)
  2489				return -EINVAL;
  2490			error = PAC_SET_ENABLED_KEYS(me, arg2, arg3);
  2491			break;
  2492		case PR_PAC_GET_ENABLED_KEYS:
  2493			if (arg2 || arg3 || arg4 || arg5)
  2494				return -EINVAL;
  2495			error = PAC_GET_ENABLED_KEYS(me);
  2496			break;
  2497		case PR_SET_TAGGED_ADDR_CTRL:
  2498			if (arg3 || arg4 || arg5)
  2499				return -EINVAL;
  2500			error = SET_TAGGED_ADDR_CTRL(arg2);
  2501			break;
  2502		case PR_GET_TAGGED_ADDR_CTRL:
  2503			if (arg2 || arg3 || arg4 || arg5)
  2504				return -EINVAL;
  2505			error = GET_TAGGED_ADDR_CTRL();
  2506			break;
  2507		case PR_SET_IO_FLUSHER:
  2508			if (!capable(CAP_SYS_RESOURCE))
  2509				return -EPERM;
  2510	
  2511			if (arg3 || arg4 || arg5)
  2512				return -EINVAL;
  2513	
  2514			if (arg2 == 1)
  2515				current->flags |= PR_IO_FLUSHER;
  2516			else if (!arg2)
  2517				current->flags &= ~PR_IO_FLUSHER;
  2518			else
  2519				return -EINVAL;
  2520			break;
  2521		case PR_GET_IO_FLUSHER:
  2522			if (!capable(CAP_SYS_RESOURCE))
  2523				return -EPERM;
  2524	
  2525			if (arg2 || arg3 || arg4 || arg5)
  2526				return -EINVAL;
  2527	
  2528			error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
  2529			break;
  2530		case PR_SET_SYSCALL_USER_DISPATCH:
  2531			error = set_syscall_user_dispatch(arg2, arg3, arg4,
  2532							  (char __user *) arg5);
  2533			break;
  2534	#ifdef CONFIG_SCHED_CORE
  2535		case PR_SCHED_CORE:
  2536			error = sched_core_share_pid(arg2, arg3, arg4, arg5);
  2537			break;
  2538	#endif
  2539		default:
  2540			error = -EINVAL;
  2541			break;
  2542		}
  2543		return error;
  2544	}
  2545	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42378 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-01 16:38     ` kernel test robot
  0 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01 16:38 UTC (permalink / raw)
  To: Mark Brown, Catalin Marinas, Will Deacon, Shuah Khan
  Cc: llvm, kbuild-all, Alan Hayward, Luis Machado, Salil Akerkar,
	Basant Kumar Dwivedi, Szabolcs Nagy, linux-arm-kernel,
	linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 11327 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arm64-randconfig-r024-20210930 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 962e503cc8bc411f7523cc393acae8aae425b1c4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/sys.c:2467:11: error: implicit declaration of function 'sme_set_current_vl' [-Werror,-Wimplicit-function-declaration]
                   error = SME_SET_VL(arg2);
                           ^
   arch/arm64/include/asm/processor.h:360:25: note: expanded from macro 'SME_SET_VL'
   #define SME_SET_VL(arg) sme_set_current_vl(arg)
                           ^
   kernel/sys.c:2467:11: note: did you mean 'sve_set_current_vl'?
   arch/arm64/include/asm/processor.h:360:25: note: expanded from macro 'SME_SET_VL'
   #define SME_SET_VL(arg) sme_set_current_vl(arg)
                           ^
   arch/arm64/include/asm/fpsimd.h:236:19: note: 'sve_set_current_vl' declared here
   static inline int sve_set_current_vl(unsigned long arg)
                     ^
>> kernel/sys.c:2470:11: error: implicit declaration of function 'sme_get_current_vl' [-Werror,-Wimplicit-function-declaration]
                   error = SME_GET_VL();
                           ^
   arch/arm64/include/asm/processor.h:361:22: note: expanded from macro 'SME_GET_VL'
   #define SME_GET_VL()    sme_get_current_vl()
                           ^
   2 errors generated.


vim +/sme_set_current_vl +2467 kernel/sys.c

  2263	
  2264	SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
  2265			unsigned long, arg4, unsigned long, arg5)
  2266	{
  2267		struct task_struct *me = current;
  2268		unsigned char comm[sizeof(me->comm)];
  2269		long error;
  2270	
  2271		error = security_task_prctl(option, arg2, arg3, arg4, arg5);
  2272		if (error != -ENOSYS)
  2273			return error;
  2274	
  2275		error = 0;
  2276		switch (option) {
  2277		case PR_SET_PDEATHSIG:
  2278			if (!valid_signal(arg2)) {
  2279				error = -EINVAL;
  2280				break;
  2281			}
  2282			me->pdeath_signal = arg2;
  2283			break;
  2284		case PR_GET_PDEATHSIG:
  2285			error = put_user(me->pdeath_signal, (int __user *)arg2);
  2286			break;
  2287		case PR_GET_DUMPABLE:
  2288			error = get_dumpable(me->mm);
  2289			break;
  2290		case PR_SET_DUMPABLE:
  2291			if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) {
  2292				error = -EINVAL;
  2293				break;
  2294			}
  2295			set_dumpable(me->mm, arg2);
  2296			break;
  2297	
  2298		case PR_SET_UNALIGN:
  2299			error = SET_UNALIGN_CTL(me, arg2);
  2300			break;
  2301		case PR_GET_UNALIGN:
  2302			error = GET_UNALIGN_CTL(me, arg2);
  2303			break;
  2304		case PR_SET_FPEMU:
  2305			error = SET_FPEMU_CTL(me, arg2);
  2306			break;
  2307		case PR_GET_FPEMU:
  2308			error = GET_FPEMU_CTL(me, arg2);
  2309			break;
  2310		case PR_SET_FPEXC:
  2311			error = SET_FPEXC_CTL(me, arg2);
  2312			break;
  2313		case PR_GET_FPEXC:
  2314			error = GET_FPEXC_CTL(me, arg2);
  2315			break;
  2316		case PR_GET_TIMING:
  2317			error = PR_TIMING_STATISTICAL;
  2318			break;
  2319		case PR_SET_TIMING:
  2320			if (arg2 != PR_TIMING_STATISTICAL)
  2321				error = -EINVAL;
  2322			break;
  2323		case PR_SET_NAME:
  2324			comm[sizeof(me->comm) - 1] = 0;
  2325			if (strncpy_from_user(comm, (char __user *)arg2,
  2326					      sizeof(me->comm) - 1) < 0)
  2327				return -EFAULT;
  2328			set_task_comm(me, comm);
  2329			proc_comm_connector(me);
  2330			break;
  2331		case PR_GET_NAME:
  2332			get_task_comm(comm, me);
  2333			if (copy_to_user((char __user *)arg2, comm, sizeof(comm)))
  2334				return -EFAULT;
  2335			break;
  2336		case PR_GET_ENDIAN:
  2337			error = GET_ENDIAN(me, arg2);
  2338			break;
  2339		case PR_SET_ENDIAN:
  2340			error = SET_ENDIAN(me, arg2);
  2341			break;
  2342		case PR_GET_SECCOMP:
  2343			error = prctl_get_seccomp();
  2344			break;
  2345		case PR_SET_SECCOMP:
  2346			error = prctl_set_seccomp(arg2, (char __user *)arg3);
  2347			break;
  2348		case PR_GET_TSC:
  2349			error = GET_TSC_CTL(arg2);
  2350			break;
  2351		case PR_SET_TSC:
  2352			error = SET_TSC_CTL(arg2);
  2353			break;
  2354		case PR_TASK_PERF_EVENTS_DISABLE:
  2355			error = perf_event_task_disable();
  2356			break;
  2357		case PR_TASK_PERF_EVENTS_ENABLE:
  2358			error = perf_event_task_enable();
  2359			break;
  2360		case PR_GET_TIMERSLACK:
  2361			if (current->timer_slack_ns > ULONG_MAX)
  2362				error = ULONG_MAX;
  2363			else
  2364				error = current->timer_slack_ns;
  2365			break;
  2366		case PR_SET_TIMERSLACK:
  2367			if (arg2 <= 0)
  2368				current->timer_slack_ns =
  2369						current->default_timer_slack_ns;
  2370			else
  2371				current->timer_slack_ns = arg2;
  2372			break;
  2373		case PR_MCE_KILL:
  2374			if (arg4 | arg5)
  2375				return -EINVAL;
  2376			switch (arg2) {
  2377			case PR_MCE_KILL_CLEAR:
  2378				if (arg3 != 0)
  2379					return -EINVAL;
  2380				current->flags &= ~PF_MCE_PROCESS;
  2381				break;
  2382			case PR_MCE_KILL_SET:
  2383				current->flags |= PF_MCE_PROCESS;
  2384				if (arg3 == PR_MCE_KILL_EARLY)
  2385					current->flags |= PF_MCE_EARLY;
  2386				else if (arg3 == PR_MCE_KILL_LATE)
  2387					current->flags &= ~PF_MCE_EARLY;
  2388				else if (arg3 == PR_MCE_KILL_DEFAULT)
  2389					current->flags &=
  2390							~(PF_MCE_EARLY|PF_MCE_PROCESS);
  2391				else
  2392					return -EINVAL;
  2393				break;
  2394			default:
  2395				return -EINVAL;
  2396			}
  2397			break;
  2398		case PR_MCE_KILL_GET:
  2399			if (arg2 | arg3 | arg4 | arg5)
  2400				return -EINVAL;
  2401			if (current->flags & PF_MCE_PROCESS)
  2402				error = (current->flags & PF_MCE_EARLY) ?
  2403					PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE;
  2404			else
  2405				error = PR_MCE_KILL_DEFAULT;
  2406			break;
  2407		case PR_SET_MM:
  2408			error = prctl_set_mm(arg2, arg3, arg4, arg5);
  2409			break;
  2410		case PR_GET_TID_ADDRESS:
  2411			error = prctl_get_tid_address(me, (int __user * __user *)arg2);
  2412			break;
  2413		case PR_SET_CHILD_SUBREAPER:
  2414			me->signal->is_child_subreaper = !!arg2;
  2415			if (!arg2)
  2416				break;
  2417	
  2418			walk_process_tree(me, propagate_has_child_subreaper, NULL);
  2419			break;
  2420		case PR_GET_CHILD_SUBREAPER:
  2421			error = put_user(me->signal->is_child_subreaper,
  2422					 (int __user *)arg2);
  2423			break;
  2424		case PR_SET_NO_NEW_PRIVS:
  2425			if (arg2 != 1 || arg3 || arg4 || arg5)
  2426				return -EINVAL;
  2427	
  2428			task_set_no_new_privs(current);
  2429			break;
  2430		case PR_GET_NO_NEW_PRIVS:
  2431			if (arg2 || arg3 || arg4 || arg5)
  2432				return -EINVAL;
  2433			return task_no_new_privs(current) ? 1 : 0;
  2434		case PR_GET_THP_DISABLE:
  2435			if (arg2 || arg3 || arg4 || arg5)
  2436				return -EINVAL;
  2437			error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags);
  2438			break;
  2439		case PR_SET_THP_DISABLE:
  2440			if (arg3 || arg4 || arg5)
  2441				return -EINVAL;
  2442			if (mmap_write_lock_killable(me->mm))
  2443				return -EINTR;
  2444			if (arg2)
  2445				set_bit(MMF_DISABLE_THP, &me->mm->flags);
  2446			else
  2447				clear_bit(MMF_DISABLE_THP, &me->mm->flags);
  2448			mmap_write_unlock(me->mm);
  2449			break;
  2450		case PR_MPX_ENABLE_MANAGEMENT:
  2451		case PR_MPX_DISABLE_MANAGEMENT:
  2452			/* No longer implemented: */
  2453			return -EINVAL;
  2454		case PR_SET_FP_MODE:
  2455			error = SET_FP_MODE(me, arg2);
  2456			break;
  2457		case PR_GET_FP_MODE:
  2458			error = GET_FP_MODE(me);
  2459			break;
  2460		case PR_SVE_SET_VL:
  2461			error = SVE_SET_VL(arg2);
  2462			break;
  2463		case PR_SVE_GET_VL:
  2464			error = SVE_GET_VL();
  2465			break;
  2466		case PR_SME_SET_VL:
> 2467			error = SME_SET_VL(arg2);
  2468			break;
  2469		case PR_SME_GET_VL:
> 2470			error = SME_GET_VL();
  2471			break;
  2472		case PR_GET_SPECULATION_CTRL:
  2473			if (arg3 || arg4 || arg5)
  2474				return -EINVAL;
  2475			error = arch_prctl_spec_ctrl_get(me, arg2);
  2476			break;
  2477		case PR_SET_SPECULATION_CTRL:
  2478			if (arg4 || arg5)
  2479				return -EINVAL;
  2480			error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
  2481			break;
  2482		case PR_PAC_RESET_KEYS:
  2483			if (arg3 || arg4 || arg5)
  2484				return -EINVAL;
  2485			error = PAC_RESET_KEYS(me, arg2);
  2486			break;
  2487		case PR_PAC_SET_ENABLED_KEYS:
  2488			if (arg4 || arg5)
  2489				return -EINVAL;
  2490			error = PAC_SET_ENABLED_KEYS(me, arg2, arg3);
  2491			break;
  2492		case PR_PAC_GET_ENABLED_KEYS:
  2493			if (arg2 || arg3 || arg4 || arg5)
  2494				return -EINVAL;
  2495			error = PAC_GET_ENABLED_KEYS(me);
  2496			break;
  2497		case PR_SET_TAGGED_ADDR_CTRL:
  2498			if (arg3 || arg4 || arg5)
  2499				return -EINVAL;
  2500			error = SET_TAGGED_ADDR_CTRL(arg2);
  2501			break;
  2502		case PR_GET_TAGGED_ADDR_CTRL:
  2503			if (arg2 || arg3 || arg4 || arg5)
  2504				return -EINVAL;
  2505			error = GET_TAGGED_ADDR_CTRL();
  2506			break;
  2507		case PR_SET_IO_FLUSHER:
  2508			if (!capable(CAP_SYS_RESOURCE))
  2509				return -EPERM;
  2510	
  2511			if (arg3 || arg4 || arg5)
  2512				return -EINVAL;
  2513	
  2514			if (arg2 == 1)
  2515				current->flags |= PR_IO_FLUSHER;
  2516			else if (!arg2)
  2517				current->flags &= ~PR_IO_FLUSHER;
  2518			else
  2519				return -EINVAL;
  2520			break;
  2521		case PR_GET_IO_FLUSHER:
  2522			if (!capable(CAP_SYS_RESOURCE))
  2523				return -EPERM;
  2524	
  2525			if (arg2 || arg3 || arg4 || arg5)
  2526				return -EINVAL;
  2527	
  2528			error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
  2529			break;
  2530		case PR_SET_SYSCALL_USER_DISPATCH:
  2531			error = set_syscall_user_dispatch(arg2, arg3, arg4,
  2532							  (char __user *) arg5);
  2533			break;
  2534	#ifdef CONFIG_SCHED_CORE
  2535		case PR_SCHED_CORE:
  2536			error = sched_core_share_pid(arg2, arg3, arg4, arg5);
  2537			break;
  2538	#endif
  2539		default:
  2540			error = -EINVAL;
  2541			break;
  2542		}
  2543		return error;
  2544	}
  2545	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 42378 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-01 16:38     ` kernel test robot
  0 siblings, 0 replies; 143+ messages in thread
From: kernel test robot @ 2021-10-01 16:38 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 11667 bytes --]

Hi Mark,

I love your patch! Yet something to improve:

[auto build test ERROR on 8694e5e6388695195a32bd5746635ca166a8df56]

url:    https://github.com/0day-ci/linux/commits/Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
base:   8694e5e6388695195a32bd5746635ca166a8df56
config: arm64-randconfig-r024-20210930 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 962e503cc8bc411f7523cc393acae8aae425b1c4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Brown/arm64-sme-Initial-support-for-the-Scalable-Matrix-Extension/20211001-021749
        git checkout f3b1bea56a1628668cc399d8eaae7ea4cacd8186
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> kernel/sys.c:2467:11: error: implicit declaration of function 'sme_set_current_vl' [-Werror,-Wimplicit-function-declaration]
                   error = SME_SET_VL(arg2);
                           ^
   arch/arm64/include/asm/processor.h:360:25: note: expanded from macro 'SME_SET_VL'
   #define SME_SET_VL(arg) sme_set_current_vl(arg)
                           ^
   kernel/sys.c:2467:11: note: did you mean 'sve_set_current_vl'?
   arch/arm64/include/asm/processor.h:360:25: note: expanded from macro 'SME_SET_VL'
   #define SME_SET_VL(arg) sme_set_current_vl(arg)
                           ^
   arch/arm64/include/asm/fpsimd.h:236:19: note: 'sve_set_current_vl' declared here
   static inline int sve_set_current_vl(unsigned long arg)
                     ^
>> kernel/sys.c:2470:11: error: implicit declaration of function 'sme_get_current_vl' [-Werror,-Wimplicit-function-declaration]
                   error = SME_GET_VL();
                           ^
   arch/arm64/include/asm/processor.h:361:22: note: expanded from macro 'SME_GET_VL'
   #define SME_GET_VL()    sme_get_current_vl()
                           ^
   2 errors generated.


vim +/sme_set_current_vl +2467 kernel/sys.c

  2263	
  2264	SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
  2265			unsigned long, arg4, unsigned long, arg5)
  2266	{
  2267		struct task_struct *me = current;
  2268		unsigned char comm[sizeof(me->comm)];
  2269		long error;
  2270	
  2271		error = security_task_prctl(option, arg2, arg3, arg4, arg5);
  2272		if (error != -ENOSYS)
  2273			return error;
  2274	
  2275		error = 0;
  2276		switch (option) {
  2277		case PR_SET_PDEATHSIG:
  2278			if (!valid_signal(arg2)) {
  2279				error = -EINVAL;
  2280				break;
  2281			}
  2282			me->pdeath_signal = arg2;
  2283			break;
  2284		case PR_GET_PDEATHSIG:
  2285			error = put_user(me->pdeath_signal, (int __user *)arg2);
  2286			break;
  2287		case PR_GET_DUMPABLE:
  2288			error = get_dumpable(me->mm);
  2289			break;
  2290		case PR_SET_DUMPABLE:
  2291			if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) {
  2292				error = -EINVAL;
  2293				break;
  2294			}
  2295			set_dumpable(me->mm, arg2);
  2296			break;
  2297	
  2298		case PR_SET_UNALIGN:
  2299			error = SET_UNALIGN_CTL(me, arg2);
  2300			break;
  2301		case PR_GET_UNALIGN:
  2302			error = GET_UNALIGN_CTL(me, arg2);
  2303			break;
  2304		case PR_SET_FPEMU:
  2305			error = SET_FPEMU_CTL(me, arg2);
  2306			break;
  2307		case PR_GET_FPEMU:
  2308			error = GET_FPEMU_CTL(me, arg2);
  2309			break;
  2310		case PR_SET_FPEXC:
  2311			error = SET_FPEXC_CTL(me, arg2);
  2312			break;
  2313		case PR_GET_FPEXC:
  2314			error = GET_FPEXC_CTL(me, arg2);
  2315			break;
  2316		case PR_GET_TIMING:
  2317			error = PR_TIMING_STATISTICAL;
  2318			break;
  2319		case PR_SET_TIMING:
  2320			if (arg2 != PR_TIMING_STATISTICAL)
  2321				error = -EINVAL;
  2322			break;
  2323		case PR_SET_NAME:
  2324			comm[sizeof(me->comm) - 1] = 0;
  2325			if (strncpy_from_user(comm, (char __user *)arg2,
  2326					      sizeof(me->comm) - 1) < 0)
  2327				return -EFAULT;
  2328			set_task_comm(me, comm);
  2329			proc_comm_connector(me);
  2330			break;
  2331		case PR_GET_NAME:
  2332			get_task_comm(comm, me);
  2333			if (copy_to_user((char __user *)arg2, comm, sizeof(comm)))
  2334				return -EFAULT;
  2335			break;
  2336		case PR_GET_ENDIAN:
  2337			error = GET_ENDIAN(me, arg2);
  2338			break;
  2339		case PR_SET_ENDIAN:
  2340			error = SET_ENDIAN(me, arg2);
  2341			break;
  2342		case PR_GET_SECCOMP:
  2343			error = prctl_get_seccomp();
  2344			break;
  2345		case PR_SET_SECCOMP:
  2346			error = prctl_set_seccomp(arg2, (char __user *)arg3);
  2347			break;
  2348		case PR_GET_TSC:
  2349			error = GET_TSC_CTL(arg2);
  2350			break;
  2351		case PR_SET_TSC:
  2352			error = SET_TSC_CTL(arg2);
  2353			break;
  2354		case PR_TASK_PERF_EVENTS_DISABLE:
  2355			error = perf_event_task_disable();
  2356			break;
  2357		case PR_TASK_PERF_EVENTS_ENABLE:
  2358			error = perf_event_task_enable();
  2359			break;
  2360		case PR_GET_TIMERSLACK:
  2361			if (current->timer_slack_ns > ULONG_MAX)
  2362				error = ULONG_MAX;
  2363			else
  2364				error = current->timer_slack_ns;
  2365			break;
  2366		case PR_SET_TIMERSLACK:
  2367			if (arg2 <= 0)
  2368				current->timer_slack_ns =
  2369						current->default_timer_slack_ns;
  2370			else
  2371				current->timer_slack_ns = arg2;
  2372			break;
  2373		case PR_MCE_KILL:
  2374			if (arg4 | arg5)
  2375				return -EINVAL;
  2376			switch (arg2) {
  2377			case PR_MCE_KILL_CLEAR:
  2378				if (arg3 != 0)
  2379					return -EINVAL;
  2380				current->flags &= ~PF_MCE_PROCESS;
  2381				break;
  2382			case PR_MCE_KILL_SET:
  2383				current->flags |= PF_MCE_PROCESS;
  2384				if (arg3 == PR_MCE_KILL_EARLY)
  2385					current->flags |= PF_MCE_EARLY;
  2386				else if (arg3 == PR_MCE_KILL_LATE)
  2387					current->flags &= ~PF_MCE_EARLY;
  2388				else if (arg3 == PR_MCE_KILL_DEFAULT)
  2389					current->flags &=
  2390							~(PF_MCE_EARLY|PF_MCE_PROCESS);
  2391				else
  2392					return -EINVAL;
  2393				break;
  2394			default:
  2395				return -EINVAL;
  2396			}
  2397			break;
  2398		case PR_MCE_KILL_GET:
  2399			if (arg2 | arg3 | arg4 | arg5)
  2400				return -EINVAL;
  2401			if (current->flags & PF_MCE_PROCESS)
  2402				error = (current->flags & PF_MCE_EARLY) ?
  2403					PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE;
  2404			else
  2405				error = PR_MCE_KILL_DEFAULT;
  2406			break;
  2407		case PR_SET_MM:
  2408			error = prctl_set_mm(arg2, arg3, arg4, arg5);
  2409			break;
  2410		case PR_GET_TID_ADDRESS:
  2411			error = prctl_get_tid_address(me, (int __user * __user *)arg2);
  2412			break;
  2413		case PR_SET_CHILD_SUBREAPER:
  2414			me->signal->is_child_subreaper = !!arg2;
  2415			if (!arg2)
  2416				break;
  2417	
  2418			walk_process_tree(me, propagate_has_child_subreaper, NULL);
  2419			break;
  2420		case PR_GET_CHILD_SUBREAPER:
  2421			error = put_user(me->signal->is_child_subreaper,
  2422					 (int __user *)arg2);
  2423			break;
  2424		case PR_SET_NO_NEW_PRIVS:
  2425			if (arg2 != 1 || arg3 || arg4 || arg5)
  2426				return -EINVAL;
  2427	
  2428			task_set_no_new_privs(current);
  2429			break;
  2430		case PR_GET_NO_NEW_PRIVS:
  2431			if (arg2 || arg3 || arg4 || arg5)
  2432				return -EINVAL;
  2433			return task_no_new_privs(current) ? 1 : 0;
  2434		case PR_GET_THP_DISABLE:
  2435			if (arg2 || arg3 || arg4 || arg5)
  2436				return -EINVAL;
  2437			error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags);
  2438			break;
  2439		case PR_SET_THP_DISABLE:
  2440			if (arg3 || arg4 || arg5)
  2441				return -EINVAL;
  2442			if (mmap_write_lock_killable(me->mm))
  2443				return -EINTR;
  2444			if (arg2)
  2445				set_bit(MMF_DISABLE_THP, &me->mm->flags);
  2446			else
  2447				clear_bit(MMF_DISABLE_THP, &me->mm->flags);
  2448			mmap_write_unlock(me->mm);
  2449			break;
  2450		case PR_MPX_ENABLE_MANAGEMENT:
  2451		case PR_MPX_DISABLE_MANAGEMENT:
  2452			/* No longer implemented: */
  2453			return -EINVAL;
  2454		case PR_SET_FP_MODE:
  2455			error = SET_FP_MODE(me, arg2);
  2456			break;
  2457		case PR_GET_FP_MODE:
  2458			error = GET_FP_MODE(me);
  2459			break;
  2460		case PR_SVE_SET_VL:
  2461			error = SVE_SET_VL(arg2);
  2462			break;
  2463		case PR_SVE_GET_VL:
  2464			error = SVE_GET_VL();
  2465			break;
  2466		case PR_SME_SET_VL:
> 2467			error = SME_SET_VL(arg2);
  2468			break;
  2469		case PR_SME_GET_VL:
> 2470			error = SME_GET_VL();
  2471			break;
  2472		case PR_GET_SPECULATION_CTRL:
  2473			if (arg3 || arg4 || arg5)
  2474				return -EINVAL;
  2475			error = arch_prctl_spec_ctrl_get(me, arg2);
  2476			break;
  2477		case PR_SET_SPECULATION_CTRL:
  2478			if (arg4 || arg5)
  2479				return -EINVAL;
  2480			error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
  2481			break;
  2482		case PR_PAC_RESET_KEYS:
  2483			if (arg3 || arg4 || arg5)
  2484				return -EINVAL;
  2485			error = PAC_RESET_KEYS(me, arg2);
  2486			break;
  2487		case PR_PAC_SET_ENABLED_KEYS:
  2488			if (arg4 || arg5)
  2489				return -EINVAL;
  2490			error = PAC_SET_ENABLED_KEYS(me, arg2, arg3);
  2491			break;
  2492		case PR_PAC_GET_ENABLED_KEYS:
  2493			if (arg2 || arg3 || arg4 || arg5)
  2494				return -EINVAL;
  2495			error = PAC_GET_ENABLED_KEYS(me);
  2496			break;
  2497		case PR_SET_TAGGED_ADDR_CTRL:
  2498			if (arg3 || arg4 || arg5)
  2499				return -EINVAL;
  2500			error = SET_TAGGED_ADDR_CTRL(arg2);
  2501			break;
  2502		case PR_GET_TAGGED_ADDR_CTRL:
  2503			if (arg2 || arg3 || arg4 || arg5)
  2504				return -EINVAL;
  2505			error = GET_TAGGED_ADDR_CTRL();
  2506			break;
  2507		case PR_SET_IO_FLUSHER:
  2508			if (!capable(CAP_SYS_RESOURCE))
  2509				return -EPERM;
  2510	
  2511			if (arg3 || arg4 || arg5)
  2512				return -EINVAL;
  2513	
  2514			if (arg2 == 1)
  2515				current->flags |= PR_IO_FLUSHER;
  2516			else if (!arg2)
  2517				current->flags &= ~PR_IO_FLUSHER;
  2518			else
  2519				return -EINVAL;
  2520			break;
  2521		case PR_GET_IO_FLUSHER:
  2522			if (!capable(CAP_SYS_RESOURCE))
  2523				return -EPERM;
  2524	
  2525			if (arg2 || arg3 || arg4 || arg5)
  2526				return -EINVAL;
  2527	
  2528			error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
  2529			break;
  2530		case PR_SET_SYSCALL_USER_DISPATCH:
  2531			error = set_syscall_user_dispatch(arg2, arg3, arg4,
  2532							  (char __user *) arg5);
  2533			break;
  2534	#ifdef CONFIG_SCHED_CORE
  2535		case PR_SCHED_CORE:
  2536			error = sched_core_share_pid(arg2, arg3, arg4, arg5);
  2537			break;
  2538	#endif
  2539		default:
  2540			error = -EINVAL;
  2541			break;
  2542		}
  2543		return error;
  2544	}
  2545	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 42378 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [kbuild-all] Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
  2021-10-01 12:40       ` Mark Brown
  (?)
@ 2021-10-08  1:32         ` Chen, Rong A
  -1 siblings, 0 replies; 143+ messages in thread
From: Chen, Rong A @ 2021-10-08  1:32 UTC (permalink / raw)
  To: Mark Brown, kernel test robot
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, kbuild-all,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest



On 10/1/2021 8:40 PM, Mark Brown wrote:
> On Fri, Oct 01, 2021 at 01:20:12PM +0800, kernel test robot wrote:
> 
>> config: arc-randconfig-r043-20210930 (attached as .config)
>> compiler: arc-elf-gcc (GCC) 11.2.0
> 
> I suspect you're reporting architectures in alphabetical order - can I
> suggest making a priority list of widely used architectures?  I know I
> tend to zone out issues which look like they only occur on less actively
> maintained architectures as some of them have odd architecture specific
> issues that really need to be fixed at the architecture level.
> 
> 

Hi Mark,

Thanks for your advice, we run the tests on different architectures
randomly for now, and we'll consider a priority list of widely used
architectures.

Best Regards,
Rong Chen





^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [kbuild-all] Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-08  1:32         ` Chen, Rong A
  0 siblings, 0 replies; 143+ messages in thread
From: Chen, Rong A @ 2021-10-08  1:32 UTC (permalink / raw)
  To: Mark Brown, kernel test robot
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, kbuild-all,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest



On 10/1/2021 8:40 PM, Mark Brown wrote:
> On Fri, Oct 01, 2021 at 01:20:12PM +0800, kernel test robot wrote:
> 
>> config: arc-randconfig-r043-20210930 (attached as .config)
>> compiler: arc-elf-gcc (GCC) 11.2.0
> 
> I suspect you're reporting architectures in alphabetical order - can I
> suggest making a priority list of widely used architectures?  I know I
> tend to zone out issues which look like they only occur on less actively
> maintained architectures as some of them have odd architecture specific
> issues that really need to be fixed at the architecture level.
> 
> 

Hi Mark,

Thanks for your advice, we run the tests on different architectures
randomly for now, and we'll consider a priority list of widely used
architectures.

Best Regards,
Rong Chen





_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s
@ 2021-10-08  1:32         ` Chen, Rong A
  0 siblings, 0 replies; 143+ messages in thread
From: Chen, Rong A @ 2021-10-08  1:32 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 800 bytes --]



On 10/1/2021 8:40 PM, Mark Brown wrote:
> On Fri, Oct 01, 2021 at 01:20:12PM +0800, kernel test robot wrote:
> 
>> config: arc-randconfig-r043-20210930 (attached as .config)
>> compiler: arc-elf-gcc (GCC) 11.2.0
> 
> I suspect you're reporting architectures in alphabetical order - can I
> suggest making a priority list of widely used architectures?  I know I
> tend to zone out issues which look like they only occur on less actively
> maintained architectures as some of them have odd architecture specific
> issues that really need to be fixed at the architecture level.
> 
> 

Hi Mark,

Thanks for your advice, we run the tests on different architectures
randomly for now, and we'll consider a priority list of widely used
architectures.

Best Regards,
Rong Chen




^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-08 14:11     ` Alan Hayward
  -1 siblings, 0 replies; 143+ messages in thread
From: Alan Hayward @ 2021-10-08 14:11 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd

FWIW, I’ve looked through the series changes from a GDB perspective.
So, more the interfaces, less the lower level implementation details. What I've
read looks fine, given this builds on the existing SVE implementation. Although
I think effectively managing the different modes in the debugger may be tricky.
The .rst files are a huge help too.

What is returned if SME is in streaming mode and I call GETREGSET with NT_ARM_SVE ?
What is returned if SME is not in streaming mode and I call GETREGSET with NT_ARM_SSVE ?
Can NT_ARM_SSVE return a fpsimd?

(One minor typo below too)


Alan.

> On 30 Sep 2021, at 19:11, Mark Brown <broonie@kernel.org> wrote:
> 
> Provide ABI documentation for SME similar to that for SVE. Due to the very
> large overlap around streaming SVE mode in both implementation and
> interfaces documentation for streaming mode SVE is added to the SVE
> document rather than the SME one.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
> Documentation/arm64/index.rst |   1 +
> Documentation/arm64/sme.rst   | 427 ++++++++++++++++++++++++++++++++++
> Documentation/arm64/sve.rst   |  62 ++++-
> 3 files changed, 480 insertions(+), 10 deletions(-)
> create mode 100644 Documentation/arm64/sme.rst
> 
> diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
> index 4f840bac083e..ae21f8118830 100644
> --- a/Documentation/arm64/index.rst
> +++ b/Documentation/arm64/index.rst
> @@ -21,6 +21,7 @@ ARM64 Architecture
>     perf
>     pointer-authentication
>     silicon-errata
> +    sme
>     sve
>     tagged-address-abi
>     tagged-pointers
> diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
> new file mode 100644
> index 000000000000..7a69bf637afd
> --- /dev/null
> +++ b/Documentation/arm64/sme.rst
> @@ -0,0 +1,427 @@
> +===================================================
> +Scalable Matrix Extension support for AArch64 Linux
> +===================================================
> +
> +This document outlines briefly the interface provided to userspace by Linux in
> +order to support use of the ARM Scalable Matrix Extension (SME).
> +
> +This is an outline of the most important features and issues only and not
> +intended to be exhaustive.  It should be read in conjunction with the SVE
> +documentation in sve.rst which provides details on the Streaming SVE mode
> +included in SME.
> +
> +This document does not aim to describe the SME architecture or programmer's
> +model.  To aid understanding, a minimal description of relevant programmer's
> +model features for SME is included in Appendix A.
> +
> +
> +1.  General
> +-----------
> +
> +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> +  register state are tracked per thread.
> +
> +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector

SME not SVE?

> +  AT_HWCAP2 entry.  Presence of this flag implies the presence of the SME
> +  instructions and registers, and the Linux-specific system interfaces
> +  described in this document.  SME is reported in /proc/cpuinfo as "sme".
> +
> +* Support for the execution of SME instructions in userspace can also be
> +  detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
> +  instruction, and checking that the value of the SME field is nonzero. [3]
> +
> +  It does not guarantee the presence of the system interfaces described in the
> +  following sections: software that needs to verify that those interfaces are
> +  present must check for HWCAP2_SME instead.
> +
> +* There are a number of optional SME features, presence of these is reported
> +  through AT_HWCAP2 through:
> +
> +	HWCAP2_SME_I16I64
> +	HWCAP2_SME_F64F64
> +	HWCAP2_SME_I8I32
> +	HWCAP2_SME_F16F32
> +	HWCAP2_SME_B16F32
> +	HWCAP2_SME_F32F32
> +
> +  This list may be extended over time as the SME architecture evolves.
> +
> +  These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1,
> +  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
> +  cpu-feature-registers.txt fo+
> r details.
> +* Debuggers should restrict themselves to interacting with the target via the
> +  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets.  The recommended way
> +  of detecting support for these regsets is to connect to a target process
> +  first and then attempt a
> +
> +	ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
> +
> +* Whenever ZA register values are exchanged in memory between userspace and
> +  the kernel, the register value is encoded in memory as a series of horizontal
> +  vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
> +  used for SVE vectors.
> +
> +
> +2.  Vector lengths
> +------------------
> +
> +SME defines a second vector length similar to the SVE vector length which is
> +controls the size of the streaming mode SVE vectors and the ZA matrix array.
> +The ZA matrix is square with each side having as many bytes as a SVE vector.
> +
> +
> +3.  Sharing of streaming and non-streaming mode SVE state
> +---------------------------------------------------------
> +
> +It is implementation defined which if any parts of the SVE state are shared
> +between streaming and non-streaming modes.  When switching between modes
> +via software interfaces such as ptrace if no register content is provided as
> +part of switching no state will be assumed to be shared and everything will
> +be zeroed.
> +
> +
> +4.  System call behaviour
> +-------------------------
> +
> +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
> +  ZA matrix are preserved.
> +
> +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
> +  as normal.
> +
> +* Neither the SVE registers nor ZA are used to pass arguments to or receive
> +  results from any syscall.
> +
> +* On creation fork() or clone() the newly created process will have PSTATE.SM
> +  and PSTATE.ZA cleared.
> +
> +* All other SME state of a thread, including the currently configured vector
> +  length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector
> +  length (if any), is preserved across all syscalls, subject to the specific
> +  exceptions for execve() described in section 6.
> +
> +
> +5.  Signal handling
> +-------------------
> +
> +* A new signal frame record za_context encodes the ZA register contents on
> +  signal delivery. [1]
> +
> +* The signal frame record for ZA always contains basic metadata, in particular
> +  the thread's vector length (in za_context.vl).
> +
> +* The ZA matrix may or may not be included in the record, depending on
> +  the value of PSTATE.ZA.  The registers are present if  and only if:
> +  za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
> +  in which case PSTATE.ZA == 1.
> +
> +* If matrix data is present, the remainder of the record has a vl-dependent
> +  size and layout.  Macros ZA_SIG_* are defined [1] to facilitate access to
> +  them.
> +
> +* The matrix is stored as a series of horizontal vectors in the same format as
> +  is used for SVE vectors.
> +
> +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra
> +  space is allocated on the stack, an extra_context record is written in
> +  __reserved[] referencing this space.  za_context is then written in the
> +  extra space.  Refer to [1] for further details about this mechanism.
> +
> +
> +5.  Signal return
> +-----------------
> +
> +When returning from a signal handler:
> +
> +* If there is no za_context record in the signal frame, or if the record is
> +  present but contains no register data as desribed in the previous section,
> +  then ZA is disabled.
> +
> +* If za_context is present in the signal frame and contains matrix data then
> +  PSTATE.ZA is set to 1 and ZA is populated with the specified data.
> +
> +* The vector length cannot be changed via signal return.  If za_context.vl in
> +  the signal frame does not match the current vector length, the signal return
> +  attempt is treated as illegal, resulting in a forced SIGSEGV.
> +
> +
> +6.  prctl extensions
> +--------------------
> +
> +Some new prctl() calls are added to allow programs to manage the SVE vector
> +length:
> +
> +prctl(PR_SME_SET_VL, unsigned long arg)
> +
> +    Sets the vector length of the calling thread and related flags, where
> +    arg == vl | flags.  Other threads of the calling process are unaffected.
> +
> +    vl is the desired vector length, where sve_vl_valid(vl) must be true.
> +
> +    flags:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Inherit the current vector length across execve().  Otherwise, the
> +	    vector length is reset to the system default at execve().  (See
> +	    Section 9.)
> +
> +	PR_SME_SET_VL_ONEXEC
> +
> +	    Defer the requested vector length change until the next execve()
> +	    performed by this thread.
> +
> +	    The effect is equivalent to implicit exceution of the following
> +	    call immediately after the next execve() (if any) by the thread:
> +
> +		prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC)
> +
> +	    This allows launching of a new program with a different vector
> +	    length, while avoiding runtime side effects in the caller.
> +
> +	    Without PR_SME_SET_VL_ONEXEC, the requested change takes effect
> +	    immediately.
> +
> +
> +    Return value: a nonnegative on success, or a negative value on error:
> +	EINVAL: SME not supported, invalid vector length requested, or
> +	    invalid flags.
> +
> +
> +    On success:
> +
> +    * Either the calling thread's vector length or the deferred vector length
> +      to be applied at the next execve() by the thread (dependent on whether
> +      PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value
> +      supported by the system that is less than or equal to vl.  If vl ==
> +      SVE_VL_MAX, the value set will be the largest value supported by the
> +      system.
> +
> +    * Any previously outstanding deferred vector length change in the calling
> +      thread is cancelled.
> +
> +    * The returned value describes the resulting configuration, encoded as for
> +      PR_SME_GET_VL.  The vector length reported in this value is the new
> +      current vector length for this thread if PR_SME_SET_VL_ONEXEC was not
> +      present in arg; otherwise, the reported vector length is the deferred
> +      vector length that will be applied at the next execve() by the calling
> +      thread.
> +
> +    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
> +      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
> +      unspecified, including both streaming and non-streaming SVE state.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +    * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +
> +prctl(PR_SME_GET_VL)
> +
> +    Gets the vector length of the calling thread.
> +
> +    The following flag may be OR-ed into the result:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Vector length will be inherited across execve().
> +
> +    There is no way to determine whether there is an outstanding deferred
> +    vector length change (which would only normally be the case between a
> +    fork() or vfork() and the corresponding execve() in typical use).
> +
> +    To extract the vector length from the result, and it with
> +    PR_SME_VL_LEN_MASK.
> +
> +    Return value: a nonnegative value on success, or a negative value on error:
> +	EINVAL: SME not supported.
> +
> +
> +7.  ptrace extensions
> +---------------------
> +
> +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE
> +  state via PTRACE_GETREGSET and  PTRACE_SETREGSET, this is documented in
> +  sve.rst.
> +
> +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via
> +  PTRACE_GETREGSET and PTRACE_SETREGSET.
> +
> +  Refer to [2] for definitions.
> +
> +The regset data starts with struct user_za_header, containing:
> +
> +    size
> +
> +	Size of the complete regset, in bytes.
> +	This depends on vl and possibly on other things in the future.
> +
> +	If a call to PTRACE_GETREGSET requests less data than the value of
> +	size, the caller can allocate a larger buffer and retry in order to
> +	read the complete regset.
> +
> +    max_size
> +
> +	Maximum size in bytes that the regset can grow to for the target
> +	thread.  The regset won't grow bigger than this even if the target
> +	thread changes its vector length etc.
> +
> +    vl
> +
> +	Target thread's current streaming vector length, in bytes.
> +
> +    max_vl
> +
> +	Maximum possible streaming vector length for the target thread.
> +
> +    flags
> +
> +	Zero or more of the following flags, which have the same
> +	meaning and behaviour as the corresponding PR_SET_VL_* flags:
> +
> +	    SME_PT_VL_INHERIT
> +
> +	    SME_PT_VL_ONEXEC (SETREGSET only).
> +
> +* The effects of changing the vector length and/or flags are equivalent to
> +  those documented for PR_SME_SET_VL.
> +
> +  The caller must make a further GETREGSET call if it needs to know what VL is
> +  actually set by SETREGSET, unless is it known in advance that the requested
> +  VL is supported.
> +
> +* The size and layout of the payload depends on the header fields.  The
> +  SME_PT_ZA_*() macros are provided to facilitate access to the data.
> +
> +* In either case, for SETREGSET it is permissible to omit the payload, in which
> +  case the vector length and flags are changed and PSTATE.ZA is set to 0
> +  (along with any consequences of those changes).  If a payload is provided
> +  then PSTATE.ZA will be set to 1.
> +
> +* For SETREGSET, if the requested VL is not supported, the effect will be the
> +  same as if the payload were omitted, except that an EIO error is reported.
> +  No attempt is made to translate the payload data to the correct layout
> +  for the vector length actually set.  It is up to the caller to translate the
> +  payload layout for the actual VL and retry.
> +
> +* The effect of writing a partial, incomplete payload is unspecified.
> +
> +
> +8.  ELF coredump extensions
> +---------------------------
> +
> +* NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
> +
> +* A NT_ARM_ZA note will be added to each coredump for each thread of the
> +  dumped process.  The contents will be equivalent to the data that would have
> +  been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
> +  when the coredump was generated.
> +
> +
> +9.  System runtime configuration
> +--------------------------------
> +
> +* To mitigate the ABI impact of expansion of the signal frame, a policy
> +  mechanism is provided for administrators, distro maintainers and developers
> +  to set the default vector length for userspace processes:
> +
> +/proc/sys/abi/sme_default_vector_length
> +
> +    Writing the text representation of an integer to this file sets the system
> +    default vector length to the specified value, unless the value is greater
> +    than the maximum vector length supported by the system in which case the
> +    default vector length is set to that maximum.
> +
> +    The result can be determined by reopening the file and reading its
> +    contents.
> +
> +    At boot, the default vector length is initially set to 32 or the maximum
> +    supported vector length, whichever is smaller and supported.  This
> +    determines the initial vector length of the init process (PID 1).
> +
> +    Reading this file returns the current system default vector length.
> +
> +* At every execve() call, the new vector length of the new process is set to
> +  the system default vector length, unless
> +
> +    * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the
> +      calling thread, or
> +
> +    * a deferred vector length change is pending, established via the
> +      PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC).
> +
> +* Modifying the system default vector length does not affect the vector length
> +  of any existing process or thread that does not make an execve() call.
> +
> +
> +Appendix A.  SME programmer's model (informative)
> +=================================================
> +
> +This section provides a minimal description of the additions made by SVE to the
> +ARMv8-A programmer's model that are relevant to this document.
> +
> +Note: This section is for information only and not intended to be complete or
> +to replace any architectural specification.
> +
> +A.1.  Registers
> +---------------
> +
> +In A64 state, SME adds the following:
> +
> +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE
> +  features are available.  When supported EL0 software may enter and leave
> +  streaming mode at any time.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  streaming mode only when it is actively being used.
> +
> +* A new vector length controlling the size of ZA and the Z registers when in
> +  streaming mode, separately to the vector length used for SVE when not in
> +  streaming mode.  There is no requirement that either the currently selected
> +  vector length or the set of vector lengths supported for the two modes in
> +  a given system have any relationship.  The streaming mode vector length
> +  is referred to as SVL.
> +
> +* A new ZA matrix register.  This is a square matrix of SVLxSVL bits.  Most
> +  operations on ZA require that streaming mode be enabled but ZA can be
> +  enabled without streaming mode in order to load, save and retain data.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  ZA only when it is actively being used.
> +
> +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
> +  SMSTOP instructions or by access to the SVCR system register:
> +
> +  * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid
> +    data while if it is 0 then ZA can not be accessed.  When PSTATE.ZA is
> +    changed from 0 to 1 all bits in ZA are cleared.
> +
> +  * PSTATE.SM, if this is 1 then the PE is in streaming mode.  When the value
> +    of PSTATE.SM is changed then it is implementationd defined if the subset
> +    of the floating point register bits valid in both modes may be retained.
> +    Any other bits will be cleared.
> +
> +
> +References
> +==========
> +
> +[1] arch/arm64/include/uapi/asm/sigcontext.h
> +    AArch64 Linux signal ABI definitions
> +
> +[2] arch/arm64/include/uapi/asm/ptrace.h
> +    AArch64 Linux ptrace ABI definitions
> +
> +[3] Documentation/arm64/cpu-feature-registers.rst
> +
> +[4] ARM IHI0055C
> +    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
> +    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
> +    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
> diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
> index 03137154299e..574ad883b0f3 100644
> --- a/Documentation/arm64/sve.rst
> +++ b/Documentation/arm64/sve.rst
> @@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
> Date:   4 August 2017
> 
> This document outlines briefly the interface provided to userspace by Linux in
> -order to support use of the ARM Scalable Vector Extension (SVE).
> +order to support use of the ARM Scalable Vector Extension (SVE), including
> +interactions with Streaming SVE mode added by the Scalable Matrix Extension
> +(SME).
> 
> This is an outline of the most important features and issues only and not
> intended to be exhaustive.
> @@ -23,6 +25,9 @@ model features for SVE is included in Appendix A.
> * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
>   tracked per-thread.
> 
> +* In streaming mode FFR is not accessible, when these interfaces are used to
> +  access streaming mode FFR is read and written as zero.
> +
> * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
>   AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
>   instructions and registers, and the Linux-specific system interfaces
> @@ -53,10 +58,19 @@ model features for SVE is included in Appendix A.
>   which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
>   cpu-feature-registers.txt for details.
> 
> +* On hardware that supports the SME extensions, HWCAP2_SME will also be
> +  reported in the AT_HWCAP2 aux vector entry.  Among other things SME adds
> +  streaming mode which provides a subset of the SVE feature set using a
> +  separate SME vector length and the same Z/V registers.  See sme.rst
> +  for more details.
> +
> * Debuggers should restrict themselves to interacting with the target via the
>   NT_ARM_SVE regset.  The recommended way of detecting support for this regset
>   is to connect to a target process first and then attempt a
> -  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
> +  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).  Note that when SME is
> +  present and streaming SVE mode is in use the FPSIMD subset of registers
> +  will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
> +  in the target.
> 
> * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
>   between userspace and the kernel, the register value is encoded in memory in
> @@ -126,6 +140,11 @@ the SVE instruction set architecture.
>   are only present in fpsimd_context.  For convenience, the content of V0..V31
>   is duplicated between sve_context and fpsimd_context.
> 
> +* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
> +  if set indicates that the thread is in streaming mode and the vector length
> +  and register data (if present) describe the streaming SVE data and vector
> +  length.
> +
> * The signal frame record for SVE always contains basic metadata, in particular
>   the thread's vector length (in sve_context.vl).
> 
> @@ -170,6 +189,11 @@ When returning from a signal handler:
>   the signal frame does not match the current vector length, the signal return
>   attempt is treated as illegal, resulting in a forced SIGSEGV.
> 
> +* It is permitted to enter or leave streaming mode by setting or clearing
> +  the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
> +  when doing so sve_context.vl and any register data are appropriate for the
> +  vector length in the new mode.
> +
> 
> 6.  prctl extensions
> --------------------
> @@ -265,8 +289,14 @@ prctl(PR_SVE_GET_VL)
> 7.  ptrace extensions
> ---------------------
> 
> -* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
> -  PTRACE_SETREGSET.
> +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
> +  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
> +  streaming mode SVE registers and NT_ARM_SVE describes the
> +  non-streaming mode SVE registers.
> +
> +  In this description a register set is referred to as being "live" when
> +  the target is in the appropriate streaming or non-streaming mode and is
> +  using data beyond the subset shared with the FPSIMD Vn registers.
> 
>   Refer to [2] for definitions.
> 
> @@ -297,7 +327,7 @@ The regset data starts with struct user_sve_header, containing:
> 
>     flags
> 
> -	either
> +	at most one of
> 
> 	    SVE_PT_REGS_FPSIMD
> 
> @@ -331,6 +361,10 @@ The regset data starts with struct user_sve_header, containing:
> 
> 	    SVE_PT_VL_ONEXEC (SETREGSET only).
> 
> +	If neither FPSIMD nor SVE flags are provided then no register
> +	payload is available, this is only possible when SME is implemented.
> +
> +
> * The effects of changing the vector length and/or flags are equivalent to
>   those documented for PR_SVE_SET_VL.
> 
> @@ -355,17 +389,25 @@ The regset data starts with struct user_sve_header, containing:
>   unspecified.  It is up to the caller to translate the payload layout
>   for the actual VL and retry.
> 
> +* Where SME is implemented it is not possible to GETREGSET the register
> +  state for normal SVE when in streaming mode, nor the streaming mode
> +  register state when in normal mode, regardless of the implementation defined
> +  behaviour of the hardware for sharing data between the two modes.
> +
> +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
> +  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
> +  if the target was not in streaming mode.
> +
> * The effect of writing a partial, incomplete payload is unspecified.
> 
> 
> 8.  ELF coredump extensions
> ---------------------------
> 
> -* A NT_ARM_SVE note will be added to each coredump for each thread of the
> -  dumped process.  The contents will be equivalent to the data that would have
> -  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
> -  when the coredump was generated.
> -
> +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
> 
> 9.  System runtime configuration
> --------------------------------
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-08 14:11     ` Alan Hayward
  0 siblings, 0 replies; 143+ messages in thread
From: Alan Hayward @ 2021-10-08 14:11 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd

FWIW, I’ve looked through the series changes from a GDB perspective.
So, more the interfaces, less the lower level implementation details. What I've
read looks fine, given this builds on the existing SVE implementation. Although
I think effectively managing the different modes in the debugger may be tricky.
The .rst files are a huge help too.

What is returned if SME is in streaming mode and I call GETREGSET with NT_ARM_SVE ?
What is returned if SME is not in streaming mode and I call GETREGSET with NT_ARM_SSVE ?
Can NT_ARM_SSVE return a fpsimd?

(One minor typo below too)


Alan.

> On 30 Sep 2021, at 19:11, Mark Brown <broonie@kernel.org> wrote:
> 
> Provide ABI documentation for SME similar to that for SVE. Due to the very
> large overlap around streaming SVE mode in both implementation and
> interfaces documentation for streaming mode SVE is added to the SVE
> document rather than the SME one.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
> Documentation/arm64/index.rst |   1 +
> Documentation/arm64/sme.rst   | 427 ++++++++++++++++++++++++++++++++++
> Documentation/arm64/sve.rst   |  62 ++++-
> 3 files changed, 480 insertions(+), 10 deletions(-)
> create mode 100644 Documentation/arm64/sme.rst
> 
> diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
> index 4f840bac083e..ae21f8118830 100644
> --- a/Documentation/arm64/index.rst
> +++ b/Documentation/arm64/index.rst
> @@ -21,6 +21,7 @@ ARM64 Architecture
>     perf
>     pointer-authentication
>     silicon-errata
> +    sme
>     sve
>     tagged-address-abi
>     tagged-pointers
> diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
> new file mode 100644
> index 000000000000..7a69bf637afd
> --- /dev/null
> +++ b/Documentation/arm64/sme.rst
> @@ -0,0 +1,427 @@
> +===================================================
> +Scalable Matrix Extension support for AArch64 Linux
> +===================================================
> +
> +This document outlines briefly the interface provided to userspace by Linux in
> +order to support use of the ARM Scalable Matrix Extension (SME).
> +
> +This is an outline of the most important features and issues only and not
> +intended to be exhaustive.  It should be read in conjunction with the SVE
> +documentation in sve.rst which provides details on the Streaming SVE mode
> +included in SME.
> +
> +This document does not aim to describe the SME architecture or programmer's
> +model.  To aid understanding, a minimal description of relevant programmer's
> +model features for SME is included in Appendix A.
> +
> +
> +1.  General
> +-----------
> +
> +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> +  register state are tracked per thread.
> +
> +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector

SME not SVE?

> +  AT_HWCAP2 entry.  Presence of this flag implies the presence of the SME
> +  instructions and registers, and the Linux-specific system interfaces
> +  described in this document.  SME is reported in /proc/cpuinfo as "sme".
> +
> +* Support for the execution of SME instructions in userspace can also be
> +  detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
> +  instruction, and checking that the value of the SME field is nonzero. [3]
> +
> +  It does not guarantee the presence of the system interfaces described in the
> +  following sections: software that needs to verify that those interfaces are
> +  present must check for HWCAP2_SME instead.
> +
> +* There are a number of optional SME features, presence of these is reported
> +  through AT_HWCAP2 through:
> +
> +	HWCAP2_SME_I16I64
> +	HWCAP2_SME_F64F64
> +	HWCAP2_SME_I8I32
> +	HWCAP2_SME_F16F32
> +	HWCAP2_SME_B16F32
> +	HWCAP2_SME_F32F32
> +
> +  This list may be extended over time as the SME architecture evolves.
> +
> +  These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1,
> +  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
> +  cpu-feature-registers.txt fo+
> r details.
> +* Debuggers should restrict themselves to interacting with the target via the
> +  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets.  The recommended way
> +  of detecting support for these regsets is to connect to a target process
> +  first and then attempt a
> +
> +	ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
> +
> +* Whenever ZA register values are exchanged in memory between userspace and
> +  the kernel, the register value is encoded in memory as a series of horizontal
> +  vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
> +  used for SVE vectors.
> +
> +
> +2.  Vector lengths
> +------------------
> +
> +SME defines a second vector length similar to the SVE vector length which is
> +controls the size of the streaming mode SVE vectors and the ZA matrix array.
> +The ZA matrix is square with each side having as many bytes as a SVE vector.
> +
> +
> +3.  Sharing of streaming and non-streaming mode SVE state
> +---------------------------------------------------------
> +
> +It is implementation defined which if any parts of the SVE state are shared
> +between streaming and non-streaming modes.  When switching between modes
> +via software interfaces such as ptrace if no register content is provided as
> +part of switching no state will be assumed to be shared and everything will
> +be zeroed.
> +
> +
> +4.  System call behaviour
> +-------------------------
> +
> +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
> +  ZA matrix are preserved.
> +
> +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
> +  as normal.
> +
> +* Neither the SVE registers nor ZA are used to pass arguments to or receive
> +  results from any syscall.
> +
> +* On creation fork() or clone() the newly created process will have PSTATE.SM
> +  and PSTATE.ZA cleared.
> +
> +* All other SME state of a thread, including the currently configured vector
> +  length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector
> +  length (if any), is preserved across all syscalls, subject to the specific
> +  exceptions for execve() described in section 6.
> +
> +
> +5.  Signal handling
> +-------------------
> +
> +* A new signal frame record za_context encodes the ZA register contents on
> +  signal delivery. [1]
> +
> +* The signal frame record for ZA always contains basic metadata, in particular
> +  the thread's vector length (in za_context.vl).
> +
> +* The ZA matrix may or may not be included in the record, depending on
> +  the value of PSTATE.ZA.  The registers are present if  and only if:
> +  za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
> +  in which case PSTATE.ZA == 1.
> +
> +* If matrix data is present, the remainder of the record has a vl-dependent
> +  size and layout.  Macros ZA_SIG_* are defined [1] to facilitate access to
> +  them.
> +
> +* The matrix is stored as a series of horizontal vectors in the same format as
> +  is used for SVE vectors.
> +
> +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra
> +  space is allocated on the stack, an extra_context record is written in
> +  __reserved[] referencing this space.  za_context is then written in the
> +  extra space.  Refer to [1] for further details about this mechanism.
> +
> +
> +5.  Signal return
> +-----------------
> +
> +When returning from a signal handler:
> +
> +* If there is no za_context record in the signal frame, or if the record is
> +  present but contains no register data as desribed in the previous section,
> +  then ZA is disabled.
> +
> +* If za_context is present in the signal frame and contains matrix data then
> +  PSTATE.ZA is set to 1 and ZA is populated with the specified data.
> +
> +* The vector length cannot be changed via signal return.  If za_context.vl in
> +  the signal frame does not match the current vector length, the signal return
> +  attempt is treated as illegal, resulting in a forced SIGSEGV.
> +
> +
> +6.  prctl extensions
> +--------------------
> +
> +Some new prctl() calls are added to allow programs to manage the SVE vector
> +length:
> +
> +prctl(PR_SME_SET_VL, unsigned long arg)
> +
> +    Sets the vector length of the calling thread and related flags, where
> +    arg == vl | flags.  Other threads of the calling process are unaffected.
> +
> +    vl is the desired vector length, where sve_vl_valid(vl) must be true.
> +
> +    flags:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Inherit the current vector length across execve().  Otherwise, the
> +	    vector length is reset to the system default at execve().  (See
> +	    Section 9.)
> +
> +	PR_SME_SET_VL_ONEXEC
> +
> +	    Defer the requested vector length change until the next execve()
> +	    performed by this thread.
> +
> +	    The effect is equivalent to implicit exceution of the following
> +	    call immediately after the next execve() (if any) by the thread:
> +
> +		prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC)
> +
> +	    This allows launching of a new program with a different vector
> +	    length, while avoiding runtime side effects in the caller.
> +
> +	    Without PR_SME_SET_VL_ONEXEC, the requested change takes effect
> +	    immediately.
> +
> +
> +    Return value: a nonnegative on success, or a negative value on error:
> +	EINVAL: SME not supported, invalid vector length requested, or
> +	    invalid flags.
> +
> +
> +    On success:
> +
> +    * Either the calling thread's vector length or the deferred vector length
> +      to be applied at the next execve() by the thread (dependent on whether
> +      PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value
> +      supported by the system that is less than or equal to vl.  If vl ==
> +      SVE_VL_MAX, the value set will be the largest value supported by the
> +      system.
> +
> +    * Any previously outstanding deferred vector length change in the calling
> +      thread is cancelled.
> +
> +    * The returned value describes the resulting configuration, encoded as for
> +      PR_SME_GET_VL.  The vector length reported in this value is the new
> +      current vector length for this thread if PR_SME_SET_VL_ONEXEC was not
> +      present in arg; otherwise, the reported vector length is the deferred
> +      vector length that will be applied at the next execve() by the calling
> +      thread.
> +
> +    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
> +      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
> +      unspecified, including both streaming and non-streaming SVE state.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +    * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +
> +prctl(PR_SME_GET_VL)
> +
> +    Gets the vector length of the calling thread.
> +
> +    The following flag may be OR-ed into the result:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Vector length will be inherited across execve().
> +
> +    There is no way to determine whether there is an outstanding deferred
> +    vector length change (which would only normally be the case between a
> +    fork() or vfork() and the corresponding execve() in typical use).
> +
> +    To extract the vector length from the result, and it with
> +    PR_SME_VL_LEN_MASK.
> +
> +    Return value: a nonnegative value on success, or a negative value on error:
> +	EINVAL: SME not supported.
> +
> +
> +7.  ptrace extensions
> +---------------------
> +
> +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE
> +  state via PTRACE_GETREGSET and  PTRACE_SETREGSET, this is documented in
> +  sve.rst.
> +
> +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via
> +  PTRACE_GETREGSET and PTRACE_SETREGSET.
> +
> +  Refer to [2] for definitions.
> +
> +The regset data starts with struct user_za_header, containing:
> +
> +    size
> +
> +	Size of the complete regset, in bytes.
> +	This depends on vl and possibly on other things in the future.
> +
> +	If a call to PTRACE_GETREGSET requests less data than the value of
> +	size, the caller can allocate a larger buffer and retry in order to
> +	read the complete regset.
> +
> +    max_size
> +
> +	Maximum size in bytes that the regset can grow to for the target
> +	thread.  The regset won't grow bigger than this even if the target
> +	thread changes its vector length etc.
> +
> +    vl
> +
> +	Target thread's current streaming vector length, in bytes.
> +
> +    max_vl
> +
> +	Maximum possible streaming vector length for the target thread.
> +
> +    flags
> +
> +	Zero or more of the following flags, which have the same
> +	meaning and behaviour as the corresponding PR_SET_VL_* flags:
> +
> +	    SME_PT_VL_INHERIT
> +
> +	    SME_PT_VL_ONEXEC (SETREGSET only).
> +
> +* The effects of changing the vector length and/or flags are equivalent to
> +  those documented for PR_SME_SET_VL.
> +
> +  The caller must make a further GETREGSET call if it needs to know what VL is
> +  actually set by SETREGSET, unless is it known in advance that the requested
> +  VL is supported.
> +
> +* The size and layout of the payload depends on the header fields.  The
> +  SME_PT_ZA_*() macros are provided to facilitate access to the data.
> +
> +* In either case, for SETREGSET it is permissible to omit the payload, in which
> +  case the vector length and flags are changed and PSTATE.ZA is set to 0
> +  (along with any consequences of those changes).  If a payload is provided
> +  then PSTATE.ZA will be set to 1.
> +
> +* For SETREGSET, if the requested VL is not supported, the effect will be the
> +  same as if the payload were omitted, except that an EIO error is reported.
> +  No attempt is made to translate the payload data to the correct layout
> +  for the vector length actually set.  It is up to the caller to translate the
> +  payload layout for the actual VL and retry.
> +
> +* The effect of writing a partial, incomplete payload is unspecified.
> +
> +
> +8.  ELF coredump extensions
> +---------------------------
> +
> +* NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
> +
> +* A NT_ARM_ZA note will be added to each coredump for each thread of the
> +  dumped process.  The contents will be equivalent to the data that would have
> +  been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
> +  when the coredump was generated.
> +
> +
> +9.  System runtime configuration
> +--------------------------------
> +
> +* To mitigate the ABI impact of expansion of the signal frame, a policy
> +  mechanism is provided for administrators, distro maintainers and developers
> +  to set the default vector length for userspace processes:
> +
> +/proc/sys/abi/sme_default_vector_length
> +
> +    Writing the text representation of an integer to this file sets the system
> +    default vector length to the specified value, unless the value is greater
> +    than the maximum vector length supported by the system in which case the
> +    default vector length is set to that maximum.
> +
> +    The result can be determined by reopening the file and reading its
> +    contents.
> +
> +    At boot, the default vector length is initially set to 32 or the maximum
> +    supported vector length, whichever is smaller and supported.  This
> +    determines the initial vector length of the init process (PID 1).
> +
> +    Reading this file returns the current system default vector length.
> +
> +* At every execve() call, the new vector length of the new process is set to
> +  the system default vector length, unless
> +
> +    * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the
> +      calling thread, or
> +
> +    * a deferred vector length change is pending, established via the
> +      PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC).
> +
> +* Modifying the system default vector length does not affect the vector length
> +  of any existing process or thread that does not make an execve() call.
> +
> +
> +Appendix A.  SME programmer's model (informative)
> +=================================================
> +
> +This section provides a minimal description of the additions made by SVE to the
> +ARMv8-A programmer's model that are relevant to this document.
> +
> +Note: This section is for information only and not intended to be complete or
> +to replace any architectural specification.
> +
> +A.1.  Registers
> +---------------
> +
> +In A64 state, SME adds the following:
> +
> +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE
> +  features are available.  When supported EL0 software may enter and leave
> +  streaming mode at any time.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  streaming mode only when it is actively being used.
> +
> +* A new vector length controlling the size of ZA and the Z registers when in
> +  streaming mode, separately to the vector length used for SVE when not in
> +  streaming mode.  There is no requirement that either the currently selected
> +  vector length or the set of vector lengths supported for the two modes in
> +  a given system have any relationship.  The streaming mode vector length
> +  is referred to as SVL.
> +
> +* A new ZA matrix register.  This is a square matrix of SVLxSVL bits.  Most
> +  operations on ZA require that streaming mode be enabled but ZA can be
> +  enabled without streaming mode in order to load, save and retain data.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  ZA only when it is actively being used.
> +
> +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
> +  SMSTOP instructions or by access to the SVCR system register:
> +
> +  * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid
> +    data while if it is 0 then ZA can not be accessed.  When PSTATE.ZA is
> +    changed from 0 to 1 all bits in ZA are cleared.
> +
> +  * PSTATE.SM, if this is 1 then the PE is in streaming mode.  When the value
> +    of PSTATE.SM is changed then it is implementationd defined if the subset
> +    of the floating point register bits valid in both modes may be retained.
> +    Any other bits will be cleared.
> +
> +
> +References
> +==========
> +
> +[1] arch/arm64/include/uapi/asm/sigcontext.h
> +    AArch64 Linux signal ABI definitions
> +
> +[2] arch/arm64/include/uapi/asm/ptrace.h
> +    AArch64 Linux ptrace ABI definitions
> +
> +[3] Documentation/arm64/cpu-feature-registers.rst
> +
> +[4] ARM IHI0055C
> +    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
> +    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
> +    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
> diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
> index 03137154299e..574ad883b0f3 100644
> --- a/Documentation/arm64/sve.rst
> +++ b/Documentation/arm64/sve.rst
> @@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
> Date:   4 August 2017
> 
> This document outlines briefly the interface provided to userspace by Linux in
> -order to support use of the ARM Scalable Vector Extension (SVE).
> +order to support use of the ARM Scalable Vector Extension (SVE), including
> +interactions with Streaming SVE mode added by the Scalable Matrix Extension
> +(SME).
> 
> This is an outline of the most important features and issues only and not
> intended to be exhaustive.
> @@ -23,6 +25,9 @@ model features for SVE is included in Appendix A.
> * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
>   tracked per-thread.
> 
> +* In streaming mode FFR is not accessible, when these interfaces are used to
> +  access streaming mode FFR is read and written as zero.
> +
> * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
>   AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
>   instructions and registers, and the Linux-specific system interfaces
> @@ -53,10 +58,19 @@ model features for SVE is included in Appendix A.
>   which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
>   cpu-feature-registers.txt for details.
> 
> +* On hardware that supports the SME extensions, HWCAP2_SME will also be
> +  reported in the AT_HWCAP2 aux vector entry.  Among other things SME adds
> +  streaming mode which provides a subset of the SVE feature set using a
> +  separate SME vector length and the same Z/V registers.  See sme.rst
> +  for more details.
> +
> * Debuggers should restrict themselves to interacting with the target via the
>   NT_ARM_SVE regset.  The recommended way of detecting support for this regset
>   is to connect to a target process first and then attempt a
> -  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
> +  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).  Note that when SME is
> +  present and streaming SVE mode is in use the FPSIMD subset of registers
> +  will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
> +  in the target.
> 
> * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
>   between userspace and the kernel, the register value is encoded in memory in
> @@ -126,6 +140,11 @@ the SVE instruction set architecture.
>   are only present in fpsimd_context.  For convenience, the content of V0..V31
>   is duplicated between sve_context and fpsimd_context.
> 
> +* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
> +  if set indicates that the thread is in streaming mode and the vector length
> +  and register data (if present) describe the streaming SVE data and vector
> +  length.
> +
> * The signal frame record for SVE always contains basic metadata, in particular
>   the thread's vector length (in sve_context.vl).
> 
> @@ -170,6 +189,11 @@ When returning from a signal handler:
>   the signal frame does not match the current vector length, the signal return
>   attempt is treated as illegal, resulting in a forced SIGSEGV.
> 
> +* It is permitted to enter or leave streaming mode by setting or clearing
> +  the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
> +  when doing so sve_context.vl and any register data are appropriate for the
> +  vector length in the new mode.
> +
> 
> 6.  prctl extensions
> --------------------
> @@ -265,8 +289,14 @@ prctl(PR_SVE_GET_VL)
> 7.  ptrace extensions
> ---------------------
> 
> -* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
> -  PTRACE_SETREGSET.
> +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
> +  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
> +  streaming mode SVE registers and NT_ARM_SVE describes the
> +  non-streaming mode SVE registers.
> +
> +  In this description a register set is referred to as being "live" when
> +  the target is in the appropriate streaming or non-streaming mode and is
> +  using data beyond the subset shared with the FPSIMD Vn registers.
> 
>   Refer to [2] for definitions.
> 
> @@ -297,7 +327,7 @@ The regset data starts with struct user_sve_header, containing:
> 
>     flags
> 
> -	either
> +	at most one of
> 
> 	    SVE_PT_REGS_FPSIMD
> 
> @@ -331,6 +361,10 @@ The regset data starts with struct user_sve_header, containing:
> 
> 	    SVE_PT_VL_ONEXEC (SETREGSET only).
> 
> +	If neither FPSIMD nor SVE flags are provided then no register
> +	payload is available, this is only possible when SME is implemented.
> +
> +
> * The effects of changing the vector length and/or flags are equivalent to
>   those documented for PR_SVE_SET_VL.
> 
> @@ -355,17 +389,25 @@ The regset data starts with struct user_sve_header, containing:
>   unspecified.  It is up to the caller to translate the payload layout
>   for the actual VL and retry.
> 
> +* Where SME is implemented it is not possible to GETREGSET the register
> +  state for normal SVE when in streaming mode, nor the streaming mode
> +  register state when in normal mode, regardless of the implementation defined
> +  behaviour of the hardware for sharing data between the two modes.
> +
> +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
> +  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
> +  if the target was not in streaming mode.
> +
> * The effect of writing a partial, incomplete payload is unspecified.
> 
> 
> 8.  ELF coredump extensions
> ---------------------------
> 
> -* A NT_ARM_SVE note will be added to each coredump for each thread of the
> -  dumped process.  The contents will be equivalent to the data that would have
> -  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
> -  when the coredump was generated.
> -
> +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
> 
> 9.  System runtime configuration
> --------------------------------
> -- 
> 2.20.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-08 14:11     ` Alan Hayward
@ 2021-10-08 15:28       ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-08 15:28 UTC (permalink / raw)
  To: Alan Hayward
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd

[-- Attachment #1: Type: text/plain, Size: 1886 bytes --]

On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:

> read looks fine, given this builds on the existing SVE implementation. Although
> I think effectively managing the different modes in the debugger may be tricky.

Sadly I don't see any way of handling the hardware in a way that
doesn't present some kind of hassle.

> The .rst files are a huge help too.

Glad to hear it.

> What is returned if SME is in streaming mode and I call GETREGSET with NT_ARM_SVE ?
> What is returned if SME is not in streaming mode and I call GETREGSET with NT_ARM_SSVE ?

In both cases you'll get the user_sve_header with no register
payload and neither of the register types flagged.  I'll make
this a bit more explicit in the documentation, in the SVE
documentation it currently just talks about no register data
being available but doesn't ever actually explicitly say why that
would happen like we do for ZA, it's currently not super helpful.

> Can NT_ARM_SSVE return a fpsimd?

It's documented that way for simplicity but in the current
implementation it won't ever actually do so in practice.  The
only case where I could see that it might happen would be if we
change the syscalls to stay in streaming mode over syscall, in
that case we could do as we do for SVE and preserve FPSIMD
registers only.  At present we drop out of streaming mode if we
get a syscall with it enabled so it's a non-issue, if people
agree that that's the right thing for the syscalls then we should
update the documentation to specify this since otherwise we'll
doubtless catch someone by surprise if we ever manage to start
doing it in the future.

> > +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
> 
> SME not SVE?

Ack, yes.  Well, I guess given that SME should never appear in a
system without SVE the statement is true but...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-08 15:28       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-08 15:28 UTC (permalink / raw)
  To: Alan Hayward
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd


[-- Attachment #1.1: Type: text/plain, Size: 1886 bytes --]

On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:

> read looks fine, given this builds on the existing SVE implementation. Although
> I think effectively managing the different modes in the debugger may be tricky.

Sadly I don't see any way of handling the hardware in a way that
doesn't present some kind of hassle.

> The .rst files are a huge help too.

Glad to hear it.

> What is returned if SME is in streaming mode and I call GETREGSET with NT_ARM_SVE ?
> What is returned if SME is not in streaming mode and I call GETREGSET with NT_ARM_SSVE ?

In both cases you'll get the user_sve_header with no register
payload and neither of the register types flagged.  I'll make
this a bit more explicit in the documentation, in the SVE
documentation it currently just talks about no register data
being available but doesn't ever actually explicitly say why that
would happen like we do for ZA, it's currently not super helpful.

> Can NT_ARM_SSVE return a fpsimd?

It's documented that way for simplicity but in the current
implementation it won't ever actually do so in practice.  The
only case where I could see that it might happen would be if we
change the syscalls to stay in streaming mode over syscall, in
that case we could do as we do for SVE and preserve FPSIMD
registers only.  At present we drop out of streaming mode if we
get a syscall with it enabled so it's a non-issue, if people
agree that that's the right thing for the syscalls then we should
update the documentation to specify this since otherwise we'll
doubtless catch someone by surprise if we ever manage to start
doing it in the future.

> > +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
> 
> SME not SVE?

Ack, yes.  Well, I guess given that SME should never appear in a
system without SVE the statement is true but...

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-08 15:28       ` Mark Brown
@ 2021-10-08 16:45         ` Alan Hayward
  -1 siblings, 0 replies; 143+ messages in thread
From: Alan Hayward @ 2021-10-08 16:45 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd



> On 8 Oct 2021, at 16:28, Mark Brown <broonie@kernel.org> wrote:
> 
> On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:
> 
>> read looks fine, given this builds on the existing SVE implementation. Although
>> I think effectively managing the different modes in the debugger may be tricky.
> 
> Sadly I don't see any way of handling the hardware in a way that
> doesn't present some kind of hassle.

Agreed - complexity here is from the hardware, so needs handling by the debugger regardless.

> 
>> The .rst files are a huge help too.
> 
> Glad to hear it.
> 
>> What is returned if SME is in streaming mode and I call GETREGSET with NT_ARM_SVE ?
>> What is returned if SME is not in streaming mode and I call GETREGSET with NT_ARM_SSVE ?
> 
> In both cases you'll get the user_sve_header with no register
> payload and neither of the register types flagged.  I'll make
> this a bit more explicit in the documentation, in the SVE
> documentation it currently just talks about no register data
> being available but doesn't ever actually explicitly say why that
> would happen like we do for ZA, it's currently not super helpful.

Ok, that works for me.

> 
>> Can NT_ARM_SSVE return a fpsimd?
> 
> It's documented that way for simplicity but in the current
> implementation it won't ever actually do so in practice.  The
> only case where I could see that it might happen would be if we
> change the syscalls to stay in streaming mode over syscall, in
> that case we could do as we do for SVE and preserve FPSIMD
> registers only.  At present we drop out of streaming mode if we
> get a syscall with it enabled so it's a non-issue, if people
> agree that that's the right thing for the syscalls then we should
> update the documentation to specify this since otherwise we'll
> doubtless catch someone by surprise if we ever manage to start
> doing it in the future.

….or it’ll end up executing some code that was written to cope with fpsimd, but has never been
run. Might be worth making it explicit in the documentation.


> 
>>> +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
>> 
>> SME not SVE?
> 
> Ack, yes.  Well, I guess given that SME should never appear in a
> system without SVE the statement is true but...


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-08 16:45         ` Alan Hayward
  0 siblings, 0 replies; 143+ messages in thread
From: Alan Hayward @ 2021-10-08 16:45 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd



> On 8 Oct 2021, at 16:28, Mark Brown <broonie@kernel.org> wrote:
> 
> On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:
> 
>> read looks fine, given this builds on the existing SVE implementation. Although
>> I think effectively managing the different modes in the debugger may be tricky.
> 
> Sadly I don't see any way of handling the hardware in a way that
> doesn't present some kind of hassle.

Agreed - complexity here is from the hardware, so needs handling by the debugger regardless.

> 
>> The .rst files are a huge help too.
> 
> Glad to hear it.
> 
>> What is returned if SME is in streaming mode and I call GETREGSET with NT_ARM_SVE ?
>> What is returned if SME is not in streaming mode and I call GETREGSET with NT_ARM_SSVE ?
> 
> In both cases you'll get the user_sve_header with no register
> payload and neither of the register types flagged.  I'll make
> this a bit more explicit in the documentation, in the SVE
> documentation it currently just talks about no register data
> being available but doesn't ever actually explicitly say why that
> would happen like we do for ZA, it's currently not super helpful.

Ok, that works for me.

> 
>> Can NT_ARM_SSVE return a fpsimd?
> 
> It's documented that way for simplicity but in the current
> implementation it won't ever actually do so in practice.  The
> only case where I could see that it might happen would be if we
> change the syscalls to stay in streaming mode over syscall, in
> that case we could do as we do for SVE and preserve FPSIMD
> registers only.  At present we drop out of streaming mode if we
> get a syscall with it enabled so it's a non-issue, if people
> agree that that's the right thing for the syscalls then we should
> update the documentation to specify this since otherwise we'll
> doubtless catch someone by surprise if we ever manage to start
> doing it in the future.

….or it’ll end up executing some code that was written to cope with fpsimd, but has never been
run. Might be worth making it explicit in the documentation.


> 
>>> +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
>> 
>> SME not SVE?
> 
> Ack, yes.  Well, I guess given that SME should never appear in a
> system without SVE the statement is true but...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-08 16:45         ` Alan Hayward
@ 2021-10-08 17:04           ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-08 17:04 UTC (permalink / raw)
  To: Alan Hayward
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On Fri, Oct 08, 2021 at 04:45:44PM +0000, Alan Hayward wrote:
> > On 8 Oct 2021, at 16:28, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:

> >> Can NT_ARM_SSVE return a fpsimd?

> > It's documented that way for simplicity but in the current
> > implementation it won't ever actually do so in practice.  The
> > only case where I could see that it might happen would be if we
> > change the syscalls to stay in streaming mode over syscall, in
> > that case we could do as we do for SVE and preserve FPSIMD
> > registers only.  At present we drop out of streaming mode if we
> > get a syscall with it enabled so it's a non-issue, if people
> > agree that that's the right thing for the syscalls then we should
> > update the documentation to specify this since otherwise we'll
> > doubtless catch someone by surprise if we ever manage to start
> > doing it in the future.

> ….or it’ll end up executing some code that was written to cope with fpsimd, but has never been
> run. Might be worth making it explicit in the documentation.

I will if/when it gets fixed that way.  Actually, while looking
at that code I was tempted to remove the support for returning
FPSIMD only registers via NT_ARM_SVE entirely and just always
convert to SVE format - I'm not sure what the use case was there?
It's not a pressing thing but it seemed like it was a bit of an
implementation detail that we were revealing.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-08 17:04           ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-08 17:04 UTC (permalink / raw)
  To: Alan Hayward
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd


[-- Attachment #1.1: Type: text/plain, Size: 1486 bytes --]

On Fri, Oct 08, 2021 at 04:45:44PM +0000, Alan Hayward wrote:
> > On 8 Oct 2021, at 16:28, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:

> >> Can NT_ARM_SSVE return a fpsimd?

> > It's documented that way for simplicity but in the current
> > implementation it won't ever actually do so in practice.  The
> > only case where I could see that it might happen would be if we
> > change the syscalls to stay in streaming mode over syscall, in
> > that case we could do as we do for SVE and preserve FPSIMD
> > registers only.  At present we drop out of streaming mode if we
> > get a syscall with it enabled so it's a non-issue, if people
> > agree that that's the right thing for the syscalls then we should
> > update the documentation to specify this since otherwise we'll
> > doubtless catch someone by surprise if we ever manage to start
> > doing it in the future.

> ….or it’ll end up executing some code that was written to cope with fpsimd, but has never been
> run. Might be worth making it explicit in the documentation.

I will if/when it gets fixed that way.  Actually, while looking
at that code I was tempted to remove the support for returning
FPSIMD only registers via NT_ARM_SVE entirely and just always
convert to SVE format - I'm not sure what the use case was there?
It's not a pressing thing but it seemed like it was a bit of an
implementation detail that we were revealing.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 01/38] arm64/fp: Reindent fpsimd_save()
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11  9:39     ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11  9:39 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:07 +0100
Mark Brown <broonie@kernel.org> wrote:

> Currently all the active code in fpsimd_save() is inside a check for
> TIF_FOREIGN_FPSTATE. Reduce the indentation level by changing to return
> from the function if TIF_FOREIGN_FPSTATE is set.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
Hi Mark

Trivial formatting thing inline, but otherwise this is clearly a noop
refactor as expected.

> ---
>  arch/arm64/kernel/fpsimd.c | 38 ++++++++++++++++++++------------------
>  1 file changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ff4962750b3d..995f8801602b 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -308,24 +308,26 @@ static void fpsimd_save(void)
>  	WARN_ON(!system_supports_fpsimd());
>  	WARN_ON(!have_cpu_fpsimd_context());
>  
> -	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> -		if (IS_ENABLED(CONFIG_ARM64_SVE) &&
> -		    test_thread_flag(TIF_SVE)) {
> -			if (WARN_ON(sve_get_vl() != last->sve_vl)) {
> -				/*
> -				 * Can't save the user regs, so current would
> -				 * re-enter user with corrupt state.
> -				 * There's no way to recover, so kill it:
> -				 */
> -				force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
> -				return;
> -			}
> -
> -			sve_save_state((char *)last->sve_state +
> -						sve_ffr_offset(last->sve_vl),
> -				       &last->st->fpsr);
> -		} else
> -			fpsimd_save_state(last->st);
> +	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
> +		return;
> +
> +	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
> +	    test_thread_flag(TIF_SVE)) {

Trivial, but - the above now fits nicely on oneline under 80 chars.

Looks like you have another instance of this in patch 21 in roughly the same place.

Then you drop at least some of this code later. hohum, maybe not worth tidying up.


> +		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
> +			/*
> +			 * Can't save the user regs, so current would
> +			 * re-enter user with corrupt state.
> +			 * There's no way to recover, so kill it:
> +			 */
> +			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
> +			return;
> +		}
> +
> +		sve_save_state((char *)last->sve_state +
> +					sve_ffr_offset(last->sve_vl),
> +			       &last->st->fpsr);
> +	} else {
> +		fpsimd_save_state(last->st);
>  	}
>  }
>  


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 01/38] arm64/fp: Reindent fpsimd_save()
@ 2021-10-11  9:39     ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11  9:39 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:07 +0100
Mark Brown <broonie@kernel.org> wrote:

> Currently all the active code in fpsimd_save() is inside a check for
> TIF_FOREIGN_FPSTATE. Reduce the indentation level by changing to return
> from the function if TIF_FOREIGN_FPSTATE is set.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
Hi Mark

Trivial formatting thing inline, but otherwise this is clearly a noop
refactor as expected.

> ---
>  arch/arm64/kernel/fpsimd.c | 38 ++++++++++++++++++++------------------
>  1 file changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ff4962750b3d..995f8801602b 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -308,24 +308,26 @@ static void fpsimd_save(void)
>  	WARN_ON(!system_supports_fpsimd());
>  	WARN_ON(!have_cpu_fpsimd_context());
>  
> -	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> -		if (IS_ENABLED(CONFIG_ARM64_SVE) &&
> -		    test_thread_flag(TIF_SVE)) {
> -			if (WARN_ON(sve_get_vl() != last->sve_vl)) {
> -				/*
> -				 * Can't save the user regs, so current would
> -				 * re-enter user with corrupt state.
> -				 * There's no way to recover, so kill it:
> -				 */
> -				force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
> -				return;
> -			}
> -
> -			sve_save_state((char *)last->sve_state +
> -						sve_ffr_offset(last->sve_vl),
> -				       &last->st->fpsr);
> -		} else
> -			fpsimd_save_state(last->st);
> +	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
> +		return;
> +
> +	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
> +	    test_thread_flag(TIF_SVE)) {

Trivial, but - the above now fits nicely on oneline under 80 chars.

Looks like you have another instance of this in patch 21 in roughly the same place.

Then you drop at least some of this code later. hohum, maybe not worth tidying up.


> +		if (WARN_ON(sve_get_vl() != last->sve_vl)) {
> +			/*
> +			 * Can't save the user regs, so current would
> +			 * re-enter user with corrupt state.
> +			 * There's no way to recover, so kill it:
> +			 */
> +			force_signal_inject(SIGKILL, SI_KERNEL, 0, 0);
> +			return;
> +		}
> +
> +		sve_save_state((char *)last->sve_state +
> +					sve_ffr_offset(last->sve_vl),
> +			       &last->st->fpsr);
> +	} else {
> +		fpsimd_save_state(last->st);
>  	}
>  }
>  


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11 10:20     ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 10:20 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:14 +0100
Mark Brown <broonie@kernel.org> wrote:

> As for SVE we will track a per task SME vector length for tasks. Convert
> the existing storage for the vector length into an array and update
> fpsimd_flush_task() to initialise this in a function.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>

I'm clearly having a trivial comment day.  Given reduction in indenting
it would be nice perhaps to reformat comments to take that into account.

I'm also unconvinced the trivial wrappers are worthwhile.  (maybe you drop
those later?)

Jonathan

> ---
>  arch/arm64/include/asm/processor.h   | 44 +++++++++++--
>  arch/arm64/include/asm/thread_info.h |  2 +-
>  arch/arm64/kernel/fpsimd.c           | 97 ++++++++++++++++------------
>  3 files changed, 95 insertions(+), 48 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index fb0608fe9ded..9b854e8196df 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -152,8 +152,8 @@ struct thread_struct {
>  
>  	unsigned int		fpsimd_cpu;
>  	void			*sve_state;	/* SVE registers, if any */
> -	unsigned int		sve_vl;		/* SVE vector length */
> -	unsigned int		sve_vl_onexec;	/* SVE vl after next exec */
> +	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
> +	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
>  	unsigned long		fault_address;	/* fault info */
>  	unsigned long		fault_code;	/* ESR_EL1 value */
>  	struct debug_info	debug;		/* debugging */
> @@ -169,15 +169,45 @@ struct thread_struct {
>  	u64			sctlr_user;
>  };
>  
> +static inline unsigned int thread_get_vl(struct thread_struct *thread,
> +					 enum vec_type type)
> +{
> +	return thread->vl[type];
> +}
> +
>  static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
>  {
> -	return thread->sve_vl;
> +	return thread_get_vl(thread, ARM64_VEC_SVE);
> +}
> +
> +unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
> +void task_set_vl(struct task_struct *task, enum vec_type type,
> +		 unsigned long vl);
> +void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
> +			unsigned long vl);
> +unsigned int task_get_vl_onexec(const struct task_struct *task,
> +				enum vec_type type);
> +
> +static inline unsigned int task_get_sve_vl(const struct task_struct *task)
> +{
> +	return task_get_vl(task, ARM64_VEC_SVE);
>  }
>  
> -unsigned int task_get_sve_vl(const struct task_struct *task);
> -void task_set_sve_vl(struct task_struct *task, unsigned long vl);
> -unsigned int task_get_sve_vl_onexec(const struct task_struct *task);
> -void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl);
> +static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
> +{
> +	task_set_vl(task, ARM64_VEC_SVE, vl);

Are these really worth having?  They make the refactor simpler perhaps, but
don't add any meaning in of themselves over just having the calls inline.

> +}
> +
> +static inline unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
> +{
> +	return task_get_vl_onexec(task, ARM64_VEC_SVE);
> +}
> +
> +static inline void task_set_sve_vl_onexec(struct task_struct *task,
> +					  unsigned long vl)
> +{
> +	task_set_vl_onexec(task, ARM64_VEC_SVE, vl);
> +}
>  
>  #define SCTLR_USER_MASK                                                        \
>  	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB |   \
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 6623c99f0984..d5c8ac81ce11 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -78,7 +78,7 @@ int arch_dup_task_struct(struct task_struct *dst,
>  #define TIF_SINGLESTEP		21
>  #define TIF_32BIT		22	/* 32bit process */
>  #define TIF_SVE			23	/* Scalable Vector Extension in use */
> -#define TIF_SVE_VL_INHERIT	24	/* Inherit sve_vl_onexec across exec */
> +#define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
>  #define TIF_SSBD		25	/* Wants SSB mitigation */
>  #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
>  
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 44bb3203c9d1..814080209093 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -133,6 +133,17 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
>  #endif
>  };
>  
> +static unsigned int vec_vl_inherit_flag(enum vec_type type)
> +{
> +	switch (type) {
> +	case ARM64_VEC_SVE:
> +		return TIF_SVE_VL_INHERIT;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +}
> +
>  struct vl_config {
>  	int __default_vl;		/* Default VL for tasks */
>  };
> @@ -239,24 +250,27 @@ static void sve_free(struct task_struct *task)
>  	__sve_free(task);
>  }
>  
> -unsigned int task_get_sve_vl(const struct task_struct *task)
> +unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
>  {
> -	return task->thread.sve_vl;
> +	return task->thread.vl[type];
>  }
>  
> -void task_set_sve_vl(struct task_struct *task, unsigned long vl)
> +void task_set_vl(struct task_struct *task, enum vec_type type,
> +		 unsigned long vl)
>  {
> -	task->thread.sve_vl = vl;
> +	task->thread.vl[type] = vl;
>  }
>  
> -unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
> +unsigned int task_get_vl_onexec(const struct task_struct *task,
> +				enum vec_type type)
>  {
> -	return task->thread.sve_vl_onexec;
> +	return task->thread.vl_onexec[type];
>  }
>  
> -void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl)
> +void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
> +			unsigned long vl)
>  {
> -	task->thread.sve_vl_onexec = vl;
> +	task->thread.vl_onexec[type] = vl;
>  }
>  
>  /*
> @@ -1074,10 +1088,43 @@ void fpsimd_thread_switch(struct task_struct *next)
>  	__put_cpu_fpsimd_context();
>  }
>  
> -void fpsimd_flush_thread(void)
> +static void fpsimd_flush_thread_vl(enum vec_type type)
>  {
>  	int vl, supported_vl;
>  
> +	/*
> +	 * Reset the task vector length as required.  This is where we
> +	 * ensure that all user tasks have a valid vector length

Nice to rewrap this to 80 chars whilst you are here?

> +	 * configured: no kernel task can become a user task without
> +	 * an exec and hence a call to this function.  By the time the
> +	 * first call to this function is made, all early hardware
> +	 * probing is complete, so __sve_default_vl should be valid.
> +	 * If a bug causes this to go wrong, we make some noise and
> +	 * try to fudge thread.sve_vl to a safe value here.
> +	 */
> +	vl = task_get_vl_onexec(current, type);
> +	if (!vl)
> +		vl = get_default_vl(type);
> +
> +	if (WARN_ON(!sve_vl_valid(vl)))
> +		vl = SVE_VL_MIN;
> +
> +	supported_vl = find_supported_vector_length(type, vl);
> +	if (WARN_ON(supported_vl != vl))
> +		vl = supported_vl;
> +
> +	task_set_vl(current, type, vl);
> +
> +	/*
> +	 * If the task is not set to inherit, ensure that the vector
> +	 * length will be reset by a subsequent exec:
> +	 */
> +	if (!test_thread_flag(vec_vl_inherit_flag(type)))
> +		task_set_vl_onexec(current, type, 0);
> +}
> +
> +void fpsimd_flush_thread(void)
> +{
>  	if (!system_supports_fpsimd())
>  		return;
>  
> @@ -1090,37 +1137,7 @@ void fpsimd_flush_thread(void)
>  	if (system_supports_sve()) {
>  		clear_thread_flag(TIF_SVE);
>  		sve_free(current);
> -
> -		/*
> -		 * Reset the task vector length as required.
> -		 * This is where we ensure that all user tasks have a valid
> -		 * vector length configured: no kernel task can become a user
> -		 * task without an exec and hence a call to this function.
> -		 * By the time the first call to this function is made, all
> -		 * early hardware probing is complete, so __sve_default_vl
> -		 * should be valid.
> -		 * If a bug causes this to go wrong, we make some noise and
> -		 * try to fudge thread.sve_vl to a safe value here.
> -		 */
> -		vl = task_get_sve_vl_onexec(current);
> -		if (!vl)
> -			vl = get_sve_default_vl();
> -
> -		if (WARN_ON(!sve_vl_valid(vl)))
> -			vl = SVE_VL_MIN;
> -
> -		supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
> -		if (WARN_ON(supported_vl != vl))
> -			vl = supported_vl;
> -
> -		task_set_sve_vl(current, vl);
> -
> -		/*
> -		 * If the task is not set to inherit, ensure that the vector
> -		 * length will be reset by a subsequent exec:
> -		 */
> -		if (!test_thread_flag(TIF_SVE_VL_INHERIT))
> -			task_set_sve_vl_onexec(current, 0);
> +		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
>  	}
>  
>  	put_cpu_fpsimd_context();


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
@ 2021-10-11 10:20     ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 10:20 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:14 +0100
Mark Brown <broonie@kernel.org> wrote:

> As for SVE we will track a per task SME vector length for tasks. Convert
> the existing storage for the vector length into an array and update
> fpsimd_flush_task() to initialise this in a function.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>

I'm clearly having a trivial comment day.  Given reduction in indenting
it would be nice perhaps to reformat comments to take that into account.

I'm also unconvinced the trivial wrappers are worthwhile.  (maybe you drop
those later?)

Jonathan

> ---
>  arch/arm64/include/asm/processor.h   | 44 +++++++++++--
>  arch/arm64/include/asm/thread_info.h |  2 +-
>  arch/arm64/kernel/fpsimd.c           | 97 ++++++++++++++++------------
>  3 files changed, 95 insertions(+), 48 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index fb0608fe9ded..9b854e8196df 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -152,8 +152,8 @@ struct thread_struct {
>  
>  	unsigned int		fpsimd_cpu;
>  	void			*sve_state;	/* SVE registers, if any */
> -	unsigned int		sve_vl;		/* SVE vector length */
> -	unsigned int		sve_vl_onexec;	/* SVE vl after next exec */
> +	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
> +	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
>  	unsigned long		fault_address;	/* fault info */
>  	unsigned long		fault_code;	/* ESR_EL1 value */
>  	struct debug_info	debug;		/* debugging */
> @@ -169,15 +169,45 @@ struct thread_struct {
>  	u64			sctlr_user;
>  };
>  
> +static inline unsigned int thread_get_vl(struct thread_struct *thread,
> +					 enum vec_type type)
> +{
> +	return thread->vl[type];
> +}
> +
>  static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
>  {
> -	return thread->sve_vl;
> +	return thread_get_vl(thread, ARM64_VEC_SVE);
> +}
> +
> +unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
> +void task_set_vl(struct task_struct *task, enum vec_type type,
> +		 unsigned long vl);
> +void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
> +			unsigned long vl);
> +unsigned int task_get_vl_onexec(const struct task_struct *task,
> +				enum vec_type type);
> +
> +static inline unsigned int task_get_sve_vl(const struct task_struct *task)
> +{
> +	return task_get_vl(task, ARM64_VEC_SVE);
>  }
>  
> -unsigned int task_get_sve_vl(const struct task_struct *task);
> -void task_set_sve_vl(struct task_struct *task, unsigned long vl);
> -unsigned int task_get_sve_vl_onexec(const struct task_struct *task);
> -void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl);
> +static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
> +{
> +	task_set_vl(task, ARM64_VEC_SVE, vl);

Are these really worth having?  They make the refactor simpler perhaps, but
don't add any meaning in of themselves over just having the calls inline.

> +}
> +
> +static inline unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
> +{
> +	return task_get_vl_onexec(task, ARM64_VEC_SVE);
> +}
> +
> +static inline void task_set_sve_vl_onexec(struct task_struct *task,
> +					  unsigned long vl)
> +{
> +	task_set_vl_onexec(task, ARM64_VEC_SVE, vl);
> +}
>  
>  #define SCTLR_USER_MASK                                                        \
>  	(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB |   \
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 6623c99f0984..d5c8ac81ce11 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -78,7 +78,7 @@ int arch_dup_task_struct(struct task_struct *dst,
>  #define TIF_SINGLESTEP		21
>  #define TIF_32BIT		22	/* 32bit process */
>  #define TIF_SVE			23	/* Scalable Vector Extension in use */
> -#define TIF_SVE_VL_INHERIT	24	/* Inherit sve_vl_onexec across exec */
> +#define TIF_SVE_VL_INHERIT	24	/* Inherit SVE vl_onexec across exec */
>  #define TIF_SSBD		25	/* Wants SSB mitigation */
>  #define TIF_TAGGED_ADDR		26	/* Allow tagged user addresses */
>  
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 44bb3203c9d1..814080209093 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -133,6 +133,17 @@ __ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
>  #endif
>  };
>  
> +static unsigned int vec_vl_inherit_flag(enum vec_type type)
> +{
> +	switch (type) {
> +	case ARM64_VEC_SVE:
> +		return TIF_SVE_VL_INHERIT;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +}
> +
>  struct vl_config {
>  	int __default_vl;		/* Default VL for tasks */
>  };
> @@ -239,24 +250,27 @@ static void sve_free(struct task_struct *task)
>  	__sve_free(task);
>  }
>  
> -unsigned int task_get_sve_vl(const struct task_struct *task)
> +unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
>  {
> -	return task->thread.sve_vl;
> +	return task->thread.vl[type];
>  }
>  
> -void task_set_sve_vl(struct task_struct *task, unsigned long vl)
> +void task_set_vl(struct task_struct *task, enum vec_type type,
> +		 unsigned long vl)
>  {
> -	task->thread.sve_vl = vl;
> +	task->thread.vl[type] = vl;
>  }
>  
> -unsigned int task_get_sve_vl_onexec(const struct task_struct *task)
> +unsigned int task_get_vl_onexec(const struct task_struct *task,
> +				enum vec_type type)
>  {
> -	return task->thread.sve_vl_onexec;
> +	return task->thread.vl_onexec[type];
>  }
>  
> -void task_set_sve_vl_onexec(struct task_struct *task, unsigned long vl)
> +void task_set_vl_onexec(struct task_struct *task, enum vec_type type,
> +			unsigned long vl)
>  {
> -	task->thread.sve_vl_onexec = vl;
> +	task->thread.vl_onexec[type] = vl;
>  }
>  
>  /*
> @@ -1074,10 +1088,43 @@ void fpsimd_thread_switch(struct task_struct *next)
>  	__put_cpu_fpsimd_context();
>  }
>  
> -void fpsimd_flush_thread(void)
> +static void fpsimd_flush_thread_vl(enum vec_type type)
>  {
>  	int vl, supported_vl;
>  
> +	/*
> +	 * Reset the task vector length as required.  This is where we
> +	 * ensure that all user tasks have a valid vector length

Nice to rewrap this to 80 chars whilst you are here?

> +	 * configured: no kernel task can become a user task without
> +	 * an exec and hence a call to this function.  By the time the
> +	 * first call to this function is made, all early hardware
> +	 * probing is complete, so __sve_default_vl should be valid.
> +	 * If a bug causes this to go wrong, we make some noise and
> +	 * try to fudge thread.sve_vl to a safe value here.
> +	 */
> +	vl = task_get_vl_onexec(current, type);
> +	if (!vl)
> +		vl = get_default_vl(type);
> +
> +	if (WARN_ON(!sve_vl_valid(vl)))
> +		vl = SVE_VL_MIN;
> +
> +	supported_vl = find_supported_vector_length(type, vl);
> +	if (WARN_ON(supported_vl != vl))
> +		vl = supported_vl;
> +
> +	task_set_vl(current, type, vl);
> +
> +	/*
> +	 * If the task is not set to inherit, ensure that the vector
> +	 * length will be reset by a subsequent exec:
> +	 */
> +	if (!test_thread_flag(vec_vl_inherit_flag(type)))
> +		task_set_vl_onexec(current, type, 0);
> +}
> +
> +void fpsimd_flush_thread(void)
> +{
>  	if (!system_supports_fpsimd())
>  		return;
>  
> @@ -1090,37 +1137,7 @@ void fpsimd_flush_thread(void)
>  	if (system_supports_sve()) {
>  		clear_thread_flag(TIF_SVE);
>  		sve_free(current);
> -
> -		/*
> -		 * Reset the task vector length as required.
> -		 * This is where we ensure that all user tasks have a valid
> -		 * vector length configured: no kernel task can become a user
> -		 * task without an exec and hence a call to this function.
> -		 * By the time the first call to this function is made, all
> -		 * early hardware probing is complete, so __sve_default_vl
> -		 * should be valid.
> -		 * If a bug causes this to go wrong, we make some noise and
> -		 * try to fudge thread.sve_vl to a safe value here.
> -		 */
> -		vl = task_get_sve_vl_onexec(current);
> -		if (!vl)
> -			vl = get_sve_default_vl();
> -
> -		if (WARN_ON(!sve_vl_valid(vl)))
> -			vl = SVE_VL_MIN;
> -
> -		supported_vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
> -		if (WARN_ON(supported_vl != vl))
> -			vl = supported_vl;
> -
> -		task_set_sve_vl(current, vl);
> -
> -		/*
> -		 * If the task is not set to inherit, ensure that the vector
> -		 * length will be reset by a subsequent exec:
> -		 */
> -		if (!test_thread_flag(TIF_SVE_VL_INHERIT))
> -			task_set_sve_vl_onexec(current, 0);
> +		fpsimd_flush_thread_vl(ARM64_VEC_SVE);
>  	}
>  
>  	put_cpu_fpsimd_context();


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 10/38] arm64/sve: Generalise vector length configuration prctl() for SME
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11 10:27     ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 10:27 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:16 +0100
Mark Brown <broonie@kernel.org> wrote:

> In preparation for adding SME support update the bulk of the implementation
> for the vector length configuration prctl() calls to be independent of
> vector type.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>

And let the trivial continue (which is just a reflection of the fact I don't have
enough of a grasp on this to find anything substantial - or there is nothing there
to find :)

...

>  	/*
> -	 * Clamp to the maximum vector length that VL-agnostic SVE code can
> -	 * work with.  A flag may be assigned in the future to allow setting
> -	 * of larger vector lengths without confusing older software.
> +	 * Clamp to the maximum vector length that VL-agnostic code
> +	 * can work with.  A flag may be assigned in the future to
> +	 * allow setting of larger vector lengths without confusing
> +	 * older software.

Why the oddly short wrapping at sub 70 chars?

>  	 */
> -	if (vl > SVE_VL_ARCH_MAX)
> -		vl = SVE_VL_ARCH_MAX;
> +	if (vl > VL_ARCH_MAX)
> +		vl = VL_ARCH_MAX;

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 10/38] arm64/sve: Generalise vector length configuration prctl() for SME
@ 2021-10-11 10:27     ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 10:27 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:16 +0100
Mark Brown <broonie@kernel.org> wrote:

> In preparation for adding SME support update the bulk of the implementation
> for the vector length configuration prctl() calls to be independent of
> vector type.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>

And let the trivial continue (which is just a reflection of the fact I don't have
enough of a grasp on this to find anything substantial - or there is nothing there
to find :)

...

>  	/*
> -	 * Clamp to the maximum vector length that VL-agnostic SVE code can
> -	 * work with.  A flag may be assigned in the future to allow setting
> -	 * of larger vector lengths without confusing older software.
> +	 * Clamp to the maximum vector length that VL-agnostic code
> +	 * can work with.  A flag may be assigned in the future to
> +	 * allow setting of larger vector lengths without confusing
> +	 * older software.

Why the oddly short wrapping at sub 70 chars?

>  	 */
> -	if (vl > SVE_VL_ARCH_MAX)
> -		vl = SVE_VL_ARCH_MAX;
> +	if (vl > VL_ARCH_MAX)
> +		vl = VL_ARCH_MAX;

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11 11:05     ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 11:05 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:18 +0100
Mark Brown <broonie@kernel.org> wrote:

> Provide ABI documentation for SME similar to that for SVE. Due to the very
> large overlap around streaming SVE mode in both implementation and
> interfaces documentation for streaming mode SVE is added to the SVE
> document rather than the SME one.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
A few more editorial bits to add to what Alan pointed out.

thanks,

Jonathan

> ---
>  Documentation/arm64/index.rst |   1 +
>  Documentation/arm64/sme.rst   | 427 ++++++++++++++++++++++++++++++++++
>  Documentation/arm64/sve.rst   |  62 ++++-
>  3 files changed, 480 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/arm64/sme.rst
> 
> diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
> index 4f840bac083e..ae21f8118830 100644
> --- a/Documentation/arm64/index.rst
> +++ b/Documentation/arm64/index.rst
> @@ -21,6 +21,7 @@ ARM64 Architecture
>      perf
>      pointer-authentication
>      silicon-errata
> +    sme
>      sve
>      tagged-address-abi
>      tagged-pointers
> diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
> new file mode 100644
> index 000000000000..7a69bf637afd
> --- /dev/null
> +++ b/Documentation/arm64/sme.rst
> @@ -0,0 +1,427 @@
> +===================================================
> +Scalable Matrix Extension support for AArch64 Linux
> +===================================================
> +
> +This document outlines briefly the interface provided to userspace by Linux in
> +order to support use of the ARM Scalable Matrix Extension (SME).
> +
> +This is an outline of the most important features and issues only and not
> +intended to be exhaustive.  It should be read in conjunction with the SVE
> +documentation in sve.rst which provides details on the Streaming SVE mode
> +included in SME.
> +
> +This document does not aim to describe the SME architecture or programmer's
> +model.  To aid understanding, a minimal description of relevant programmer's
> +model features for SME is included in Appendix A.
> +
> +
> +1.  General
> +-----------
> +
> +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> +  register state are tracked per thread.
> +
> +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
> +  AT_HWCAP2 entry.  Presence of this flag implies the presence of the SME
> +  instructions and registers, and the Linux-specific system interfaces
> +  described in this document.  SME is reported in /proc/cpuinfo as "sme".
> +
> +* Support for the execution of SME instructions in userspace can also be
> +  detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
> +  instruction, and checking that the value of the SME field is nonzero. [3]
> +
> +  It does not guarantee the presence of the system interfaces described in the
> +  following sections: software that needs to verify that those interfaces are
> +  present must check for HWCAP2_SME instead.
> +
> +* There are a number of optional SME features, presence of these is reported
> +  through AT_HWCAP2 through:
> +
> +	HWCAP2_SME_I16I64
> +	HWCAP2_SME_F64F64
> +	HWCAP2_SME_I8I32
> +	HWCAP2_SME_F16F32
> +	HWCAP2_SME_B16F32
> +	HWCAP2_SME_F32F32
> +
> +  This list may be extended over time as the SME architecture evolves.
> +
> +  These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1,
> +  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
> +  cpu-feature-registers.txt for details.
> +
> +* Debuggers should restrict themselves to interacting with the target via the
> +  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets.  The recommended way
> +  of detecting support for these regsets is to connect to a target process
> +  first and then attempt a
> +
> +	ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
> +
> +* Whenever ZA register values are exchanged in memory between userspace and
> +  the kernel, the register value is encoded in memory as a series of horizontal
> +  vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
> +  used for SVE vectors.
> +
> +
> +2.  Vector lengths
> +------------------
> +
> +SME defines a second vector length similar to the SVE vector length which is
> +controls the size of the streaming mode SVE vectors and the ZA matrix array.
> +The ZA matrix is square with each side having as many bytes as a SVE vector.
> +
> +
> +3.  Sharing of streaming and non-streaming mode SVE state
> +---------------------------------------------------------
> +
> +It is implementation defined which if any parts of the SVE state are shared
> +between streaming and non-streaming modes.  When switching between modes
> +via software interfaces such as ptrace if no register content is provided as
> +part of switching no state will be assumed to be shared and everything will
> +be zeroed.
> +
> +
> +4.  System call behaviour
> +-------------------------
> +
> +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
> +  ZA matrix are preserved.
> +
> +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
> +  as normal.
> +
> +* Neither the SVE registers nor ZA are used to pass arguments to or receive
> +  results from any syscall.
> +
> +* On creation fork() or clone() the newly created process will have PSTATE.SM
> +  and PSTATE.ZA cleared.
> +
> +* All other SME state of a thread, including the currently configured vector
> +  length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector
> +  length (if any), is preserved across all syscalls, subject to the specific
> +  exceptions for execve() described in section 6.
> +
> +
> +5.  Signal handling
> +-------------------
> +
> +* A new signal frame record za_context encodes the ZA register contents on
> +  signal delivery. [1]
> +
> +* The signal frame record for ZA always contains basic metadata, in particular
> +  the thread's vector length (in za_context.vl).
> +
> +* The ZA matrix may or may not be included in the record, depending on
> +  the value of PSTATE.ZA.  The registers are present if  and only if:

                                                          ^^

> +  za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
> +  in which case PSTATE.ZA == 1.
> +
> +* If matrix data is present, the remainder of the record has a vl-dependent
> +  size and layout.  Macros ZA_SIG_* are defined [1] to facilitate access to
> +  them.
> +
> +* The matrix is stored as a series of horizontal vectors in the same format as
> +  is used for SVE vectors.
> +
> +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra
> +  space is allocated on the stack, an extra_context record is written in
> +  __reserved[] referencing this space.  za_context is then written in the
> +  extra space.  Refer to [1] for further details about this mechanism.
> +
> +
> +5.  Signal return
> +-----------------
> +
> +When returning from a signal handler:
> +
> +* If there is no za_context record in the signal frame, or if the record is
> +  present but contains no register data as desribed in the previous section,
> +  then ZA is disabled.
> +
> +* If za_context is present in the signal frame and contains matrix data then
> +  PSTATE.ZA is set to 1 and ZA is populated with the specified data.
> +
> +* The vector length cannot be changed via signal return.  If za_context.vl in
> +  the signal frame does not match the current vector length, the signal return
> +  attempt is treated as illegal, resulting in a forced SIGSEGV.
> +
> +
> +6.  prctl extensions
> +--------------------
> +
> +Some new prctl() calls are added to allow programs to manage the SVE vector

Another SVE rather than SME?

> +length:
> +
> +prctl(PR_SME_SET_VL, unsigned long arg)
> +
> +    Sets the vector length of the calling thread and related flags, where
> +    arg == vl | flags.  Other threads of the calling process are unaffected.
> +
> +    vl is the desired vector length, where sve_vl_valid(vl) must be true.
> +
> +    flags:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Inherit the current vector length across execve().  Otherwise, the
> +	    vector length is reset to the system default at execve().  (See
> +	    Section 9.)
> +
> +	PR_SME_SET_VL_ONEXEC
> +
> +	    Defer the requested vector length change until the next execve()
> +	    performed by this thread.
> +
> +	    The effect is equivalent to implicit exceution of the following
> +	    call immediately after the next execve() (if any) by the thread:
> +
> +		prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC)
> +
> +	    This allows launching of a new program with a different vector
> +	    length, while avoiding runtime side effects in the caller.
> +
> +	    Without PR_SME_SET_VL_ONEXEC, the requested change takes effect
> +	    immediately.
> +
> +
> +    Return value: a nonnegative on success, or a negative value on error:
> +	EINVAL: SME not supported, invalid vector length requested, or
> +	    invalid flags.
> +
> +
> +    On success:
> +
> +    * Either the calling thread's vector length or the deferred vector length
> +      to be applied at the next execve() by the thread (dependent on whether
> +      PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value
> +      supported by the system that is less than or equal to vl.  If vl ==
> +      SVE_VL_MAX, the value set will be the largest value supported by the
> +      system.
> +
> +    * Any previously outstanding deferred vector length change in the calling
> +      thread is cancelled.
> +
> +    * The returned value describes the resulting configuration, encoded as for
> +      PR_SME_GET_VL.  The vector length reported in this value is the new
> +      current vector length for this thread if PR_SME_SET_VL_ONEXEC was not
> +      present in arg; otherwise, the reported vector length is the deferred
> +      vector length that will be applied at the next execve() by the calling
> +      thread.
> +
> +    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
> +      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
> +      unspecified, including both streaming and non-streaming SVE state.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +    * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +
> +prctl(PR_SME_GET_VL)
> +
> +    Gets the vector length of the calling thread.
> +
> +    The following flag may be OR-ed into the result:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Vector length will be inherited across execve().
> +
> +    There is no way to determine whether there is an outstanding deferred
> +    vector length change (which would only normally be the case between a
> +    fork() or vfork() and the corresponding execve() in typical use).
> +
> +    To extract the vector length from the result, and it with
> +    PR_SME_VL_LEN_MASK.
> +
> +    Return value: a nonnegative value on success, or a negative value on error:
> +	EINVAL: SME not supported.
> +
> +
> +7.  ptrace extensions
> +---------------------
> +
> +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE
> +  state via PTRACE_GETREGSET and  PTRACE_SETREGSET, this is documented in
> +  sve.rst.
> +
> +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via
> +  PTRACE_GETREGSET and PTRACE_SETREGSET.
> +
> +  Refer to [2] for definitions.
> +
> +The regset data starts with struct user_za_header, containing:
> +
> +    size
> +
> +	Size of the complete regset, in bytes.
> +	This depends on vl and possibly on other things in the future.
> +
> +	If a call to PTRACE_GETREGSET requests less data than the value of
> +	size, the caller can allocate a larger buffer and retry in order to
> +	read the complete regset.
> +
> +    max_size
> +
> +	Maximum size in bytes that the regset can grow to for the target
> +	thread.  The regset won't grow bigger than this even if the target
> +	thread changes its vector length etc.
> +
> +    vl
> +
> +	Target thread's current streaming vector length, in bytes.
> +
> +    max_vl
> +
> +	Maximum possible streaming vector length for the target thread.
> +
> +    flags
> +
> +	Zero or more of the following flags, which have the same
> +	meaning and behaviour as the corresponding PR_SET_VL_* flags:
> +
> +	    SME_PT_VL_INHERIT
> +
> +	    SME_PT_VL_ONEXEC (SETREGSET only).
> +
> +* The effects of changing the vector length and/or flags are equivalent to
> +  those documented for PR_SME_SET_VL.
> +
> +  The caller must make a further GETREGSET call if it needs to know what VL is
> +  actually set by SETREGSET, unless is it known in advance that the requested
> +  VL is supported.
> +
> +* The size and layout of the payload depends on the header fields.  The
> +  SME_PT_ZA_*() macros are provided to facilitate access to the data.
> +
> +* In either case, for SETREGSET it is permissible to omit the payload, in which
> +  case the vector length and flags are changed and PSTATE.ZA is set to 0
> +  (along with any consequences of those changes).  If a payload is provided
> +  then PSTATE.ZA will be set to 1.
> +
> +* For SETREGSET, if the requested VL is not supported, the effect will be the
> +  same as if the payload were omitted, except that an EIO error is reported.
> +  No attempt is made to translate the payload data to the correct layout
> +  for the vector length actually set.  It is up to the caller to translate the
> +  payload layout for the actual VL and retry.
> +
> +* The effect of writing a partial, incomplete payload is unspecified.
> +
> +
> +8.  ELF coredump extensions
> +---------------------------
> +
> +* NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
> +
> +* A NT_ARM_ZA note will be added to each coredump for each thread of the
> +  dumped process.  The contents will be equivalent to the data that would have
> +  been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
> +  when the coredump was generated.
> +
> +
> +9.  System runtime configuration
> +--------------------------------
> +
> +* To mitigate the ABI impact of expansion of the signal frame, a policy
> +  mechanism is provided for administrators, distro maintainers and developers
> +  to set the default vector length for userspace processes:
> +
> +/proc/sys/abi/sme_default_vector_length
> +
> +    Writing the text representation of an integer to this file sets the system
> +    default vector length to the specified value, unless the value is greater
> +    than the maximum vector length supported by the system in which case the
> +    default vector length is set to that maximum.
> +
> +    The result can be determined by reopening the file and reading its
> +    contents.
> +
> +    At boot, the default vector length is initially set to 32 or the maximum
> +    supported vector length, whichever is smaller and supported.  This
> +    determines the initial vector length of the init process (PID 1).
> +
> +    Reading this file returns the current system default vector length.
> +
> +* At every execve() call, the new vector length of the new process is set to
> +  the system default vector length, unless
> +
> +    * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the
> +      calling thread, or
> +
> +    * a deferred vector length change is pending, established via the
> +      PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC).
> +
> +* Modifying the system default vector length does not affect the vector length
> +  of any existing process or thread that does not make an execve() call.
> +
> +
> +Appendix A.  SME programmer's model (informative)
> +=================================================
> +
> +This section provides a minimal description of the additions made by SVE to the
> +ARMv8-A programmer's model that are relevant to this document.
> +
> +Note: This section is for information only and not intended to be complete or
> +to replace any architectural specification.
> +
> +A.1.  Registers
> +---------------
> +
> +In A64 state, SME adds the following:
> +
> +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE
> +  features are available.  When supported EL0 software may enter and leave
> +  streaming mode at any time.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  streaming mode only when it is actively being used.
> +
> +* A new vector length controlling the size of ZA and the Z registers when in
> +  streaming mode, separately to the vector length used for SVE when not in
> +  streaming mode.  There is no requirement that either the currently selected
> +  vector length or the set of vector lengths supported for the two modes in
> +  a given system have any relationship.  The streaming mode vector length
> +  is referred to as SVL.
> +
> +* A new ZA matrix register.  This is a square matrix of SVLxSVL bits.  Most
> +  operations on ZA require that streaming mode be enabled but ZA can be
> +  enabled without streaming mode in order to load, save and retain data.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  ZA only when it is actively being used.
> +
> +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
> +  SMSTOP instructions or by access to the SVCR system register:
> +
> +  * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid
> +    data while if it is 0 then ZA can not be accessed.  When PSTATE.ZA is
> +    changed from 0 to 1 all bits in ZA are cleared.
> +
> +  * PSTATE.SM, if this is 1 then the PE is in streaming mode.  When the value
> +    of PSTATE.SM is changed then it is implementationd defined if the subset
> +    of the floating point register bits valid in both modes may be retained.
> +    Any other bits will be cleared.
> +
> +
> +References
> +==========
> +
> +[1] arch/arm64/include/uapi/asm/sigcontext.h
> +    AArch64 Linux signal ABI definitions
> +
> +[2] arch/arm64/include/uapi/asm/ptrace.h
> +    AArch64 Linux ptrace ABI definitions
> +
> +[3] Documentation/arm64/cpu-feature-registers.rst
> +
> +[4] ARM IHI0055C
> +    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
> +    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
> +    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
> diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
> index 03137154299e..574ad883b0f3 100644
> --- a/Documentation/arm64/sve.rst
> +++ b/Documentation/arm64/sve.rst
> @@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
>  Date:   4 August 2017
>  
>  This document outlines briefly the interface provided to userspace by Linux in
> -order to support use of the ARM Scalable Vector Extension (SVE).
> +order to support use of the ARM Scalable Vector Extension (SVE), including
> +interactions with Streaming SVE mode added by the Scalable Matrix Extension
> +(SME).
>  
>  This is an outline of the most important features and issues only and not
>  intended to be exhaustive.
> @@ -23,6 +25,9 @@ model features for SVE is included in Appendix A.
>  * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
>    tracked per-thread.
>  
> +* In streaming mode FFR is not accessible, when these interfaces are used to
> +  access streaming mode FFR is read and written as zero.
> +
>  * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
>    AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
>    instructions and registers, and the Linux-specific system interfaces
> @@ -53,10 +58,19 @@ model features for SVE is included in Appendix A.
>    which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
>    cpu-feature-registers.txt for details.
>  
> +* On hardware that supports the SME extensions, HWCAP2_SME will also be
> +  reported in the AT_HWCAP2 aux vector entry.  Among other things SME adds
> +  streaming mode which provides a subset of the SVE feature set using a
> +  separate SME vector length and the same Z/V registers.  See sme.rst
> +  for more details.
> +
>  * Debuggers should restrict themselves to interacting with the target via the
>    NT_ARM_SVE regset.  The recommended way of detecting support for this regset
>    is to connect to a target process first and then attempt a
> -  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
> +  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).  Note that when SME is
> +  present and streaming SVE mode is in use the FPSIMD subset of registers
> +  will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
> +  in the target.
>  
>  * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
>    between userspace and the kernel, the register value is encoded in memory in
> @@ -126,6 +140,11 @@ the SVE instruction set architecture.
>    are only present in fpsimd_context.  For convenience, the content of V0..V31
>    is duplicated between sve_context and fpsimd_context.
>  
> +* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
> +  if set indicates that the thread is in streaming mode and the vector length
> +  and register data (if present) describe the streaming SVE data and vector
> +  length.
> +
>  * The signal frame record for SVE always contains basic metadata, in particular
>    the thread's vector length (in sve_context.vl).
>  
> @@ -170,6 +189,11 @@ When returning from a signal handler:
>    the signal frame does not match the current vector length, the signal return
>    attempt is treated as illegal, resulting in a forced SIGSEGV.
>  
> +* It is permitted to enter or leave streaming mode by setting or clearing
> +  the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
> +  when doing so sve_context.vl and any register data are appropriate for the
> +  vector length in the new mode.
> +
>  
>  6.  prctl extensions
>  --------------------
> @@ -265,8 +289,14 @@ prctl(PR_SVE_GET_VL)
>  7.  ptrace extensions
>  ---------------------
>  
> -* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
> -  PTRACE_SETREGSET.
> +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
> +  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
> +  streaming mode SVE registers and NT_ARM_SVE describes the
> +  non-streaming mode SVE registers.
> +
> +  In this description a register set is referred to as being "live" when
> +  the target is in the appropriate streaming or non-streaming mode and is
> +  using data beyond the subset shared with the FPSIMD Vn registers.
>  
>    Refer to [2] for definitions.
>  
> @@ -297,7 +327,7 @@ The regset data starts with struct user_sve_header, containing:
>  
>      flags
>  
> -	either
> +	at most one of
>  
>  	    SVE_PT_REGS_FPSIMD
>  
> @@ -331,6 +361,10 @@ The regset data starts with struct user_sve_header, containing:
>  
>  	    SVE_PT_VL_ONEXEC (SETREGSET only).
>  
> +	If neither FPSIMD nor SVE flags are provided then no register
> +	payload is available, this is only possible when SME is implemented.
> +
> +
>  * The effects of changing the vector length and/or flags are equivalent to
>    those documented for PR_SVE_SET_VL.
>  
> @@ -355,17 +389,25 @@ The regset data starts with struct user_sve_header, containing:
>    unspecified.  It is up to the caller to translate the payload layout
>    for the actual VL and retry.
>  
> +* Where SME is implemented it is not possible to GETREGSET the register
> +  state for normal SVE when in streaming mode, nor the streaming mode
> +  register state when in normal mode, regardless of the implementation defined
> +  behaviour of the hardware for sharing data between the two modes.
> +
> +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
> +  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
> +  if the target was not in streaming mode.
> +
>  * The effect of writing a partial, incomplete payload is unspecified.
>  
>  
>  8.  ELF coredump extensions
>  ---------------------------
>  
> -* A NT_ARM_SVE note will be added to each coredump for each thread of the
> -  dumped process.  The contents will be equivalent to the data that would have
> -  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
> -  when the coredump was generated.
> -
> +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
>  
>  9.  System runtime configuration
>  --------------------------------


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 11:05     ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 11:05 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:18 +0100
Mark Brown <broonie@kernel.org> wrote:

> Provide ABI documentation for SME similar to that for SVE. Due to the very
> large overlap around streaming SVE mode in both implementation and
> interfaces documentation for streaming mode SVE is added to the SVE
> document rather than the SME one.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
A few more editorial bits to add to what Alan pointed out.

thanks,

Jonathan

> ---
>  Documentation/arm64/index.rst |   1 +
>  Documentation/arm64/sme.rst   | 427 ++++++++++++++++++++++++++++++++++
>  Documentation/arm64/sve.rst   |  62 ++++-
>  3 files changed, 480 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/arm64/sme.rst
> 
> diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
> index 4f840bac083e..ae21f8118830 100644
> --- a/Documentation/arm64/index.rst
> +++ b/Documentation/arm64/index.rst
> @@ -21,6 +21,7 @@ ARM64 Architecture
>      perf
>      pointer-authentication
>      silicon-errata
> +    sme
>      sve
>      tagged-address-abi
>      tagged-pointers
> diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
> new file mode 100644
> index 000000000000..7a69bf637afd
> --- /dev/null
> +++ b/Documentation/arm64/sme.rst
> @@ -0,0 +1,427 @@
> +===================================================
> +Scalable Matrix Extension support for AArch64 Linux
> +===================================================
> +
> +This document outlines briefly the interface provided to userspace by Linux in
> +order to support use of the ARM Scalable Matrix Extension (SME).
> +
> +This is an outline of the most important features and issues only and not
> +intended to be exhaustive.  It should be read in conjunction with the SVE
> +documentation in sve.rst which provides details on the Streaming SVE mode
> +included in SME.
> +
> +This document does not aim to describe the SME architecture or programmer's
> +model.  To aid understanding, a minimal description of relevant programmer's
> +model features for SME is included in Appendix A.
> +
> +
> +1.  General
> +-----------
> +
> +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> +  register state are tracked per thread.
> +
> +* The presence of SVE is reported to userspace via HWCAP2_SME in the aux vector
> +  AT_HWCAP2 entry.  Presence of this flag implies the presence of the SME
> +  instructions and registers, and the Linux-specific system interfaces
> +  described in this document.  SME is reported in /proc/cpuinfo as "sme".
> +
> +* Support for the execution of SME instructions in userspace can also be
> +  detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
> +  instruction, and checking that the value of the SME field is nonzero. [3]
> +
> +  It does not guarantee the presence of the system interfaces described in the
> +  following sections: software that needs to verify that those interfaces are
> +  present must check for HWCAP2_SME instead.
> +
> +* There are a number of optional SME features, presence of these is reported
> +  through AT_HWCAP2 through:
> +
> +	HWCAP2_SME_I16I64
> +	HWCAP2_SME_F64F64
> +	HWCAP2_SME_I8I32
> +	HWCAP2_SME_F16F32
> +	HWCAP2_SME_B16F32
> +	HWCAP2_SME_F32F32
> +
> +  This list may be extended over time as the SME architecture evolves.
> +
> +  These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1,
> +  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
> +  cpu-feature-registers.txt for details.
> +
> +* Debuggers should restrict themselves to interacting with the target via the
> +  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets.  The recommended way
> +  of detecting support for these regsets is to connect to a target process
> +  first and then attempt a
> +
> +	ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
> +
> +* Whenever ZA register values are exchanged in memory between userspace and
> +  the kernel, the register value is encoded in memory as a series of horizontal
> +  vectors from 0 to VL/8-1 stored in the same endianness invariant format as is
> +  used for SVE vectors.
> +
> +
> +2.  Vector lengths
> +------------------
> +
> +SME defines a second vector length similar to the SVE vector length which is
> +controls the size of the streaming mode SVE vectors and the ZA matrix array.
> +The ZA matrix is square with each side having as many bytes as a SVE vector.
> +
> +
> +3.  Sharing of streaming and non-streaming mode SVE state
> +---------------------------------------------------------
> +
> +It is implementation defined which if any parts of the SVE state are shared
> +between streaming and non-streaming modes.  When switching between modes
> +via software interfaces such as ptrace if no register content is provided as
> +part of switching no state will be assumed to be shared and everything will
> +be zeroed.
> +
> +
> +4.  System call behaviour
> +-------------------------
> +
> +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
> +  ZA matrix are preserved.
> +
> +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled
> +  as normal.
> +
> +* Neither the SVE registers nor ZA are used to pass arguments to or receive
> +  results from any syscall.
> +
> +* On creation fork() or clone() the newly created process will have PSTATE.SM
> +  and PSTATE.ZA cleared.
> +
> +* All other SME state of a thread, including the currently configured vector
> +  length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector
> +  length (if any), is preserved across all syscalls, subject to the specific
> +  exceptions for execve() described in section 6.
> +
> +
> +5.  Signal handling
> +-------------------
> +
> +* A new signal frame record za_context encodes the ZA register contents on
> +  signal delivery. [1]
> +
> +* The signal frame record for ZA always contains basic metadata, in particular
> +  the thread's vector length (in za_context.vl).
> +
> +* The ZA matrix may or may not be included in the record, depending on
> +  the value of PSTATE.ZA.  The registers are present if  and only if:

                                                          ^^

> +  za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
> +  in which case PSTATE.ZA == 1.
> +
> +* If matrix data is present, the remainder of the record has a vl-dependent
> +  size and layout.  Macros ZA_SIG_* are defined [1] to facilitate access to
> +  them.
> +
> +* The matrix is stored as a series of horizontal vectors in the same format as
> +  is used for SVE vectors.
> +
> +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra
> +  space is allocated on the stack, an extra_context record is written in
> +  __reserved[] referencing this space.  za_context is then written in the
> +  extra space.  Refer to [1] for further details about this mechanism.
> +
> +
> +5.  Signal return
> +-----------------
> +
> +When returning from a signal handler:
> +
> +* If there is no za_context record in the signal frame, or if the record is
> +  present but contains no register data as desribed in the previous section,
> +  then ZA is disabled.
> +
> +* If za_context is present in the signal frame and contains matrix data then
> +  PSTATE.ZA is set to 1 and ZA is populated with the specified data.
> +
> +* The vector length cannot be changed via signal return.  If za_context.vl in
> +  the signal frame does not match the current vector length, the signal return
> +  attempt is treated as illegal, resulting in a forced SIGSEGV.
> +
> +
> +6.  prctl extensions
> +--------------------
> +
> +Some new prctl() calls are added to allow programs to manage the SVE vector

Another SVE rather than SME?

> +length:
> +
> +prctl(PR_SME_SET_VL, unsigned long arg)
> +
> +    Sets the vector length of the calling thread and related flags, where
> +    arg == vl | flags.  Other threads of the calling process are unaffected.
> +
> +    vl is the desired vector length, where sve_vl_valid(vl) must be true.
> +
> +    flags:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Inherit the current vector length across execve().  Otherwise, the
> +	    vector length is reset to the system default at execve().  (See
> +	    Section 9.)
> +
> +	PR_SME_SET_VL_ONEXEC
> +
> +	    Defer the requested vector length change until the next execve()
> +	    performed by this thread.
> +
> +	    The effect is equivalent to implicit exceution of the following
> +	    call immediately after the next execve() (if any) by the thread:
> +
> +		prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC)
> +
> +	    This allows launching of a new program with a different vector
> +	    length, while avoiding runtime side effects in the caller.
> +
> +	    Without PR_SME_SET_VL_ONEXEC, the requested change takes effect
> +	    immediately.
> +
> +
> +    Return value: a nonnegative on success, or a negative value on error:
> +	EINVAL: SME not supported, invalid vector length requested, or
> +	    invalid flags.
> +
> +
> +    On success:
> +
> +    * Either the calling thread's vector length or the deferred vector length
> +      to be applied at the next execve() by the thread (dependent on whether
> +      PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value
> +      supported by the system that is less than or equal to vl.  If vl ==
> +      SVE_VL_MAX, the value set will be the largest value supported by the
> +      system.
> +
> +    * Any previously outstanding deferred vector length change in the calling
> +      thread is cancelled.
> +
> +    * The returned value describes the resulting configuration, encoded as for
> +      PR_SME_GET_VL.  The vector length reported in this value is the new
> +      current vector length for this thread if PR_SME_SET_VL_ONEXEC was not
> +      present in arg; otherwise, the reported vector length is the deferred
> +      vector length that will be applied at the next execve() by the calling
> +      thread.
> +
> +    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
> +      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
> +      unspecified, including both streaming and non-streaming SVE state.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +    * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared.
> +      Calling PR_SME_SET_VL with vl equal to the thread's current vector
> +      length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
> +      does not constitute a change to the vector length for this purpose.
> +
> +
> +prctl(PR_SME_GET_VL)
> +
> +    Gets the vector length of the calling thread.
> +
> +    The following flag may be OR-ed into the result:
> +
> +	PR_SME_VL_INHERIT
> +
> +	    Vector length will be inherited across execve().
> +
> +    There is no way to determine whether there is an outstanding deferred
> +    vector length change (which would only normally be the case between a
> +    fork() or vfork() and the corresponding execve() in typical use).
> +
> +    To extract the vector length from the result, and it with
> +    PR_SME_VL_LEN_MASK.
> +
> +    Return value: a nonnegative value on success, or a negative value on error:
> +	EINVAL: SME not supported.
> +
> +
> +7.  ptrace extensions
> +---------------------
> +
> +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE
> +  state via PTRACE_GETREGSET and  PTRACE_SETREGSET, this is documented in
> +  sve.rst.
> +
> +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via
> +  PTRACE_GETREGSET and PTRACE_SETREGSET.
> +
> +  Refer to [2] for definitions.
> +
> +The regset data starts with struct user_za_header, containing:
> +
> +    size
> +
> +	Size of the complete regset, in bytes.
> +	This depends on vl and possibly on other things in the future.
> +
> +	If a call to PTRACE_GETREGSET requests less data than the value of
> +	size, the caller can allocate a larger buffer and retry in order to
> +	read the complete regset.
> +
> +    max_size
> +
> +	Maximum size in bytes that the regset can grow to for the target
> +	thread.  The regset won't grow bigger than this even if the target
> +	thread changes its vector length etc.
> +
> +    vl
> +
> +	Target thread's current streaming vector length, in bytes.
> +
> +    max_vl
> +
> +	Maximum possible streaming vector length for the target thread.
> +
> +    flags
> +
> +	Zero or more of the following flags, which have the same
> +	meaning and behaviour as the corresponding PR_SET_VL_* flags:
> +
> +	    SME_PT_VL_INHERIT
> +
> +	    SME_PT_VL_ONEXEC (SETREGSET only).
> +
> +* The effects of changing the vector length and/or flags are equivalent to
> +  those documented for PR_SME_SET_VL.
> +
> +  The caller must make a further GETREGSET call if it needs to know what VL is
> +  actually set by SETREGSET, unless is it known in advance that the requested
> +  VL is supported.
> +
> +* The size and layout of the payload depends on the header fields.  The
> +  SME_PT_ZA_*() macros are provided to facilitate access to the data.
> +
> +* In either case, for SETREGSET it is permissible to omit the payload, in which
> +  case the vector length and flags are changed and PSTATE.ZA is set to 0
> +  (along with any consequences of those changes).  If a payload is provided
> +  then PSTATE.ZA will be set to 1.
> +
> +* For SETREGSET, if the requested VL is not supported, the effect will be the
> +  same as if the payload were omitted, except that an EIO error is reported.
> +  No attempt is made to translate the payload data to the correct layout
> +  for the vector length actually set.  It is up to the caller to translate the
> +  payload layout for the actual VL and retry.
> +
> +* The effect of writing a partial, incomplete payload is unspecified.
> +
> +
> +8.  ELF coredump extensions
> +---------------------------
> +
> +* NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
> +
> +* A NT_ARM_ZA note will be added to each coredump for each thread of the
> +  dumped process.  The contents will be equivalent to the data that would have
> +  been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
> +  when the coredump was generated.
> +
> +
> +9.  System runtime configuration
> +--------------------------------
> +
> +* To mitigate the ABI impact of expansion of the signal frame, a policy
> +  mechanism is provided for administrators, distro maintainers and developers
> +  to set the default vector length for userspace processes:
> +
> +/proc/sys/abi/sme_default_vector_length
> +
> +    Writing the text representation of an integer to this file sets the system
> +    default vector length to the specified value, unless the value is greater
> +    than the maximum vector length supported by the system in which case the
> +    default vector length is set to that maximum.
> +
> +    The result can be determined by reopening the file and reading its
> +    contents.
> +
> +    At boot, the default vector length is initially set to 32 or the maximum
> +    supported vector length, whichever is smaller and supported.  This
> +    determines the initial vector length of the init process (PID 1).
> +
> +    Reading this file returns the current system default vector length.
> +
> +* At every execve() call, the new vector length of the new process is set to
> +  the system default vector length, unless
> +
> +    * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the
> +      calling thread, or
> +
> +    * a deferred vector length change is pending, established via the
> +      PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC).
> +
> +* Modifying the system default vector length does not affect the vector length
> +  of any existing process or thread that does not make an execve() call.
> +
> +
> +Appendix A.  SME programmer's model (informative)
> +=================================================
> +
> +This section provides a minimal description of the additions made by SVE to the
> +ARMv8-A programmer's model that are relevant to this document.
> +
> +Note: This section is for information only and not intended to be complete or
> +to replace any architectural specification.
> +
> +A.1.  Registers
> +---------------
> +
> +In A64 state, SME adds the following:
> +
> +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE
> +  features are available.  When supported EL0 software may enter and leave
> +  streaming mode at any time.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  streaming mode only when it is actively being used.
> +
> +* A new vector length controlling the size of ZA and the Z registers when in
> +  streaming mode, separately to the vector length used for SVE when not in
> +  streaming mode.  There is no requirement that either the currently selected
> +  vector length or the set of vector lengths supported for the two modes in
> +  a given system have any relationship.  The streaming mode vector length
> +  is referred to as SVL.
> +
> +* A new ZA matrix register.  This is a square matrix of SVLxSVL bits.  Most
> +  operations on ZA require that streaming mode be enabled but ZA can be
> +  enabled without streaming mode in order to load, save and retain data.
> +
> +  For best system performance it is strongly encourage for software to enable
> +  ZA only when it is actively being used.
> +
> +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
> +  SMSTOP instructions or by access to the SVCR system register:
> +
> +  * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid
> +    data while if it is 0 then ZA can not be accessed.  When PSTATE.ZA is
> +    changed from 0 to 1 all bits in ZA are cleared.
> +
> +  * PSTATE.SM, if this is 1 then the PE is in streaming mode.  When the value
> +    of PSTATE.SM is changed then it is implementationd defined if the subset
> +    of the floating point register bits valid in both modes may be retained.
> +    Any other bits will be cleared.
> +
> +
> +References
> +==========
> +
> +[1] arch/arm64/include/uapi/asm/sigcontext.h
> +    AArch64 Linux signal ABI definitions
> +
> +[2] arch/arm64/include/uapi/asm/ptrace.h
> +    AArch64 Linux ptrace ABI definitions
> +
> +[3] Documentation/arm64/cpu-feature-registers.rst
> +
> +[4] ARM IHI0055C
> +    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
> +    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
> +    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
> diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
> index 03137154299e..574ad883b0f3 100644
> --- a/Documentation/arm64/sve.rst
> +++ b/Documentation/arm64/sve.rst
> @@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
>  Date:   4 August 2017
>  
>  This document outlines briefly the interface provided to userspace by Linux in
> -order to support use of the ARM Scalable Vector Extension (SVE).
> +order to support use of the ARM Scalable Vector Extension (SVE), including
> +interactions with Streaming SVE mode added by the Scalable Matrix Extension
> +(SME).
>  
>  This is an outline of the most important features and issues only and not
>  intended to be exhaustive.
> @@ -23,6 +25,9 @@ model features for SVE is included in Appendix A.
>  * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
>    tracked per-thread.
>  
> +* In streaming mode FFR is not accessible, when these interfaces are used to
> +  access streaming mode FFR is read and written as zero.
> +
>  * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
>    AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
>    instructions and registers, and the Linux-specific system interfaces
> @@ -53,10 +58,19 @@ model features for SVE is included in Appendix A.
>    which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
>    cpu-feature-registers.txt for details.
>  
> +* On hardware that supports the SME extensions, HWCAP2_SME will also be
> +  reported in the AT_HWCAP2 aux vector entry.  Among other things SME adds
> +  streaming mode which provides a subset of the SVE feature set using a
> +  separate SME vector length and the same Z/V registers.  See sme.rst
> +  for more details.
> +
>  * Debuggers should restrict themselves to interacting with the target via the
>    NT_ARM_SVE regset.  The recommended way of detecting support for this regset
>    is to connect to a target process first and then attempt a
> -  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
> +  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).  Note that when SME is
> +  present and streaming SVE mode is in use the FPSIMD subset of registers
> +  will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
> +  in the target.
>  
>  * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
>    between userspace and the kernel, the register value is encoded in memory in
> @@ -126,6 +140,11 @@ the SVE instruction set architecture.
>    are only present in fpsimd_context.  For convenience, the content of V0..V31
>    is duplicated between sve_context and fpsimd_context.
>  
> +* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
> +  if set indicates that the thread is in streaming mode and the vector length
> +  and register data (if present) describe the streaming SVE data and vector
> +  length.
> +
>  * The signal frame record for SVE always contains basic metadata, in particular
>    the thread's vector length (in sve_context.vl).
>  
> @@ -170,6 +189,11 @@ When returning from a signal handler:
>    the signal frame does not match the current vector length, the signal return
>    attempt is treated as illegal, resulting in a forced SIGSEGV.
>  
> +* It is permitted to enter or leave streaming mode by setting or clearing
> +  the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
> +  when doing so sve_context.vl and any register data are appropriate for the
> +  vector length in the new mode.
> +
>  
>  6.  prctl extensions
>  --------------------
> @@ -265,8 +289,14 @@ prctl(PR_SVE_GET_VL)
>  7.  ptrace extensions
>  ---------------------
>  
> -* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
> -  PTRACE_SETREGSET.
> +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
> +  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
> +  streaming mode SVE registers and NT_ARM_SVE describes the
> +  non-streaming mode SVE registers.
> +
> +  In this description a register set is referred to as being "live" when
> +  the target is in the appropriate streaming or non-streaming mode and is
> +  using data beyond the subset shared with the FPSIMD Vn registers.
>  
>    Refer to [2] for definitions.
>  
> @@ -297,7 +327,7 @@ The regset data starts with struct user_sve_header, containing:
>  
>      flags
>  
> -	either
> +	at most one of
>  
>  	    SVE_PT_REGS_FPSIMD
>  
> @@ -331,6 +361,10 @@ The regset data starts with struct user_sve_header, containing:
>  
>  	    SVE_PT_VL_ONEXEC (SETREGSET only).
>  
> +	If neither FPSIMD nor SVE flags are provided then no register
> +	payload is available, this is only possible when SME is implemented.
> +
> +
>  * The effects of changing the vector length and/or flags are equivalent to
>    those documented for PR_SVE_SET_VL.
>  
> @@ -355,17 +389,25 @@ The regset data starts with struct user_sve_header, containing:
>    unspecified.  It is up to the caller to translate the payload layout
>    for the actual VL and retry.
>  
> +* Where SME is implemented it is not possible to GETREGSET the register
> +  state for normal SVE when in streaming mode, nor the streaming mode
> +  register state when in normal mode, regardless of the implementation defined
> +  behaviour of the hardware for sharing data between the two modes.
> +
> +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
> +  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
> +  if the target was not in streaming mode.
> +
>  * The effect of writing a partial, incomplete payload is unspecified.
>  
>  
>  8.  ELF coredump extensions
>  ---------------------------
>  
> -* A NT_ARM_SVE note will be added to each coredump for each thread of the
> -  dumped process.  The contents will be equivalent to the data that would have
> -  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
> -  when the coredump was generated.
> -
> +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
> +  each thread of the dumped process.  The contents will be equivalent to the
> +  data that would have been read if a PTRACE_GETREGSET of the corresponding
> +  type were executed for each thread when the coredump was generated.
>  
>  9.  System runtime configuration
>  --------------------------------


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-08 17:04           ` Mark Brown
@ 2021-10-11 11:15             ` Alan Hayward
  -1 siblings, 0 replies; 143+ messages in thread
From: Alan Hayward @ 2021-10-11 11:15 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd



> On 8 Oct 2021, at 18:04, Mark Brown <broonie@kernel.org> wrote:
> 
> On Fri, Oct 08, 2021 at 04:45:44PM +0000, Alan Hayward wrote:
>>> On 8 Oct 2021, at 16:28, Mark Brown <broonie@kernel.org> wrote:
>>> On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:
> 
>>>> Can NT_ARM_SSVE return a fpsimd?
> 
>>> It's documented that way for simplicity but in the current
>>> implementation it won't ever actually do so in practice.  The
>>> only case where I could see that it might happen would be if we
>>> change the syscalls to stay in streaming mode over syscall, in
>>> that case we could do as we do for SVE and preserve FPSIMD
>>> registers only.  At present we drop out of streaming mode if we
>>> get a syscall with it enabled so it's a non-issue, if people
>>> agree that that's the right thing for the syscalls then we should
>>> update the documentation to specify this since otherwise we'll
>>> doubtless catch someone by surprise if we ever manage to start
>>> doing it in the future.
> 
>> ….or it’ll end up executing some code that was written to cope with fpsimd, but has never been
>> run. Might be worth making it explicit in the documentation.
> 
> I will if/when it gets fixed that way.  Actually, while looking
> at that code I was tempted to remove the support for returning
> FPSIMD only registers via NT_ARM_SVE entirely and just always
> convert to SVE format - I'm not sure what the use case was there?
> It's not a pressing thing but it seemed like it was a bit of an
> implementation detail that we were revealing.


What about in the write registers case?

With the existing code:
Read SVE registers with ptrace. Get FPSIMD. Update FPSIMD with new values. Write back to ptrace.
In that scenario, the internal SVE states stays “inactive” in the kernel.

If ptrace always returned an SVE structure, then writing back with the same structure would cause the
SVE state to become active. It’s a small difference, but we really don’t want the debugger to effect things.

You could solve that by adding an SVE_STATE_INACTIVE to the header flags. But, either way, the
debugger would then need to support the existing method and the new method (because the debugger
has no control on what OS version it’s running on).

I’d recommend leaving as is.


Alan.


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 11:15             ` Alan Hayward
  0 siblings, 0 replies; 143+ messages in thread
From: Alan Hayward @ 2021-10-11 11:15 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd



> On 8 Oct 2021, at 18:04, Mark Brown <broonie@kernel.org> wrote:
> 
> On Fri, Oct 08, 2021 at 04:45:44PM +0000, Alan Hayward wrote:
>>> On 8 Oct 2021, at 16:28, Mark Brown <broonie@kernel.org> wrote:
>>> On Fri, Oct 08, 2021 at 02:11:46PM +0000, Alan Hayward wrote:
> 
>>>> Can NT_ARM_SSVE return a fpsimd?
> 
>>> It's documented that way for simplicity but in the current
>>> implementation it won't ever actually do so in practice.  The
>>> only case where I could see that it might happen would be if we
>>> change the syscalls to stay in streaming mode over syscall, in
>>> that case we could do as we do for SVE and preserve FPSIMD
>>> registers only.  At present we drop out of streaming mode if we
>>> get a syscall with it enabled so it's a non-issue, if people
>>> agree that that's the right thing for the syscalls then we should
>>> update the documentation to specify this since otherwise we'll
>>> doubtless catch someone by surprise if we ever manage to start
>>> doing it in the future.
> 
>> ….or it’ll end up executing some code that was written to cope with fpsimd, but has never been
>> run. Might be worth making it explicit in the documentation.
> 
> I will if/when it gets fixed that way.  Actually, while looking
> at that code I was tempted to remove the support for returning
> FPSIMD only registers via NT_ARM_SVE entirely and just always
> convert to SVE format - I'm not sure what the use case was there?
> It's not a pressing thing but it seemed like it was a bit of an
> implementation detail that we were revealing.


What about in the write registers case?

With the existing code:
Read SVE registers with ptrace. Get FPSIMD. Update FPSIMD with new values. Write back to ptrace.
In that scenario, the internal SVE states stays “inactive” in the kernel.

If ptrace always returned an SVE structure, then writing back with the same structure would cause the
SVE state to become active. It’s a small difference, but we really don’t want the debugger to effect things.

You could solve that by adding an SVE_STATE_INACTIVE to the header flags. But, either way, the
debugger would then need to support the existing method and the new method (because the debugger
has no control on what OS version it’s running on).

I’d recommend leaving as is.


Alan.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-11 11:15             ` Alan Hayward
@ 2021-10-11 11:48               ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 11:48 UTC (permalink / raw)
  To: Alan Hayward
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

On Mon, Oct 11, 2021 at 11:15:09AM +0000, Alan Hayward wrote:
> > On 8 Oct 2021, at 18:04, Mark Brown <broonie@kernel.org> wrote:

> > I will if/when it gets fixed that way.  Actually, while looking
> > at that code I was tempted to remove the support for returning
> > FPSIMD only registers via NT_ARM_SVE entirely and just always
> > convert to SVE format - I'm not sure what the use case was there?
> > It's not a pressing thing but it seemed like it was a bit of an
> > implementation detail that we were revealing.

> What about in the write registers case?

Both register sets accept FPSIMD registers for writes - we can't remove
that for the SVE register set given that it's ABI.

> With the existing code:
> Read SVE registers with ptrace. Get FPSIMD. Update FPSIMD with new values. Write back to ptrace.
> In that scenario, the internal SVE states stays “inactive” in the kernel.

Right.

> If ptrace always returned an SVE structure, then writing back with the same structure would cause the
> SVE state to become active. It’s a small difference, but we really don’t want the debugger to effect things.

With the existing code - with the wider availability of SVE hardware
we'll see soon we might be looking at changing how the kernel handles
enabling and disabling SVE which might mean we want to change how that
looks internally anyway.  Part of what has me thinking about this is
that what we've got at the minute is quite tied to the implementation
and we might want to change that.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 11:48               ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 11:48 UTC (permalink / raw)
  To: Alan Hayward
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Luis Machado, Salil Akerkar, Basant KumarDwivedi, Szabolcs Nagy,
	linux-arm-kernel, linux-kselftest, nd


[-- Attachment #1.1: Type: text/plain, Size: 1535 bytes --]

On Mon, Oct 11, 2021 at 11:15:09AM +0000, Alan Hayward wrote:
> > On 8 Oct 2021, at 18:04, Mark Brown <broonie@kernel.org> wrote:

> > I will if/when it gets fixed that way.  Actually, while looking
> > at that code I was tempted to remove the support for returning
> > FPSIMD only registers via NT_ARM_SVE entirely and just always
> > convert to SVE format - I'm not sure what the use case was there?
> > It's not a pressing thing but it seemed like it was a bit of an
> > implementation detail that we were revealing.

> What about in the write registers case?

Both register sets accept FPSIMD registers for writes - we can't remove
that for the SVE register set given that it's ABI.

> With the existing code:
> Read SVE registers with ptrace. Get FPSIMD. Update FPSIMD with new values. Write back to ptrace.
> In that scenario, the internal SVE states stays “inactive” in the kernel.

Right.

> If ptrace always returned an SVE structure, then writing back with the same structure would cause the
> SVE state to become active. It’s a small difference, but we really don’t want the debugger to effect things.

With the existing code - with the wider availability of SVE hardware
we'll see soon we might be looking at changing how the kernel handles
enabling and disabling SVE which might mean we want to change how that
looks internally anyway.  Part of what has me thinking about this is
that what we've got at the minute is quite tied to the implementation
and we might want to change that.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 21/38] arm64/sme: Implement SVCR context switching
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11 12:15     ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 12:15 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:27 +0100
Mark Brown <broonie@kernel.org> wrote:

> In SME the use of both streaming SVE mode and ZA are tracked through
> PSTATE.SM and PSTATE.ZA, visible through the system register SVCR.  In
> order to context switch the floating point state for SME we need to
> context switch the contents of this register as part of context
> switching the floating point state.
> 
> Since changing the vector length exits streaming SVE mode and disables
> ZA we also make sure we update SVCR appropraitely when setting vector

appropriately

> length, and similarly ensure that new threads have streaming SVE mode
> and ZA disabled.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
...

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 21/38] arm64/sme: Implement SVCR context switching
@ 2021-10-11 12:15     ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 12:15 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:27 +0100
Mark Brown <broonie@kernel.org> wrote:

> In SME the use of both streaming SVE mode and ZA are tracked through
> PSTATE.SM and PSTATE.ZA, visible through the system register SVCR.  In
> order to context switch the floating point state for SME we need to
> context switch the contents of this register as part of context
> switching the floating point state.
> 
> Since changing the vector length exits streaming SVE mode and disables
> ZA we also make sure we update SVCR appropraitely when setting vector

appropriately

> length, and similarly ensure that new threads have streaming SVE mode
> and ZA disabled.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 23/38] arm64/sme: Implement ZA context switching
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11 12:27     ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 12:27 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:29 +0100
Mark Brown <broonie@kernel.org> wrote:

> Allocate space for storing ZA on first access to SME and use that to save
> and restore ZA state when context switching. We do this by using the vector
> form of the LDR and STR ZA instructions, these do not require streaming
> mode and have implementation recommendations that they avoid contention
> issues in shared SMCU implementations.
> 
> Since ZA is architecturally guaranteed to be zeroed when enabled we do not
> need to explicitly zero ZA, either we will be restoring from a saved copy
> or trapping on first use of SME so we know that ZA must be disabled.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>

sme_alloc() forwards definition should be in the next patch.
> ---
>  arch/arm64/include/asm/fpsimd.h       |  5 ++++-
>  arch/arm64/include/asm/fpsimdmacros.h | 22 ++++++++++++++++++++++
>  arch/arm64/include/asm/processor.h    |  1 +
>  arch/arm64/kernel/entry-fpsimd.S      | 22 ++++++++++++++++++++++
>  arch/arm64/kernel/fpsimd.c            | 16 ++++++++++------
>  arch/arm64/kvm/fpsimd.c               |  2 +-
>  6 files changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 43737ca91f1a..45f7153067bb 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -47,7 +47,7 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
>  
>  extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
>  				     void *sve_state, unsigned int sve_vl,
> -				     unsigned int sme_vl);
> +				     void *za_state, unsigned int sme_vl);
>  
>  extern void fpsimd_flush_task_state(struct task_struct *target);
>  extern void fpsimd_save_and_flush_cpu_state(void);
> @@ -90,6 +90,8 @@ extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>  extern void sve_set_vq(unsigned long vq_minus_1);
>  extern void sme_set_vq(unsigned long vq_minus_1);
> +extern void sme_save_state(void *state, unsigned int vq_minus_1);
> +extern void sme_load_state(void const *state, unsigned int vq_minus_1);
>  
>  struct arm64_cpu_capabilities;
>  extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
> @@ -119,6 +121,7 @@ static inline unsigned int __bit_to_vq(unsigned int bit)
>  extern size_t sve_state_size(struct task_struct const *task);
>  
>  extern void sve_alloc(struct task_struct *task);
> +extern void sme_alloc(struct task_struct *task);

Should be in the next patch where this function is introduced.

>  extern void fpsimd_release_task(struct task_struct *task);
>  extern void fpsimd_sync_to_sve(struct task_struct *task);
>  extern void sve_sync_to_fpsimd(struct task_struct *task);

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 23/38] arm64/sme: Implement ZA context switching
@ 2021-10-11 12:27     ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 12:27 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:29 +0100
Mark Brown <broonie@kernel.org> wrote:

> Allocate space for storing ZA on first access to SME and use that to save
> and restore ZA state when context switching. We do this by using the vector
> form of the LDR and STR ZA instructions, these do not require streaming
> mode and have implementation recommendations that they avoid contention
> issues in shared SMCU implementations.
> 
> Since ZA is architecturally guaranteed to be zeroed when enabled we do not
> need to explicitly zero ZA, either we will be restoring from a saved copy
> or trapping on first use of SME so we know that ZA must be disabled.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>

sme_alloc() forwards definition should be in the next patch.
> ---
>  arch/arm64/include/asm/fpsimd.h       |  5 ++++-
>  arch/arm64/include/asm/fpsimdmacros.h | 22 ++++++++++++++++++++++
>  arch/arm64/include/asm/processor.h    |  1 +
>  arch/arm64/kernel/entry-fpsimd.S      | 22 ++++++++++++++++++++++
>  arch/arm64/kernel/fpsimd.c            | 16 ++++++++++------
>  arch/arm64/kvm/fpsimd.c               |  2 +-
>  6 files changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 43737ca91f1a..45f7153067bb 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -47,7 +47,7 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
>  
>  extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
>  				     void *sve_state, unsigned int sve_vl,
> -				     unsigned int sme_vl);
> +				     void *za_state, unsigned int sme_vl);
>  
>  extern void fpsimd_flush_task_state(struct task_struct *target);
>  extern void fpsimd_save_and_flush_cpu_state(void);
> @@ -90,6 +90,8 @@ extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>  extern void sve_set_vq(unsigned long vq_minus_1);
>  extern void sme_set_vq(unsigned long vq_minus_1);
> +extern void sme_save_state(void *state, unsigned int vq_minus_1);
> +extern void sme_load_state(void const *state, unsigned int vq_minus_1);
>  
>  struct arm64_cpu_capabilities;
>  extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
> @@ -119,6 +121,7 @@ static inline unsigned int __bit_to_vq(unsigned int bit)
>  extern size_t sve_state_size(struct task_struct const *task);
>  
>  extern void sve_alloc(struct task_struct *task);
> +extern void sme_alloc(struct task_struct *task);

Should be in the next patch where this function is introduced.

>  extern void fpsimd_release_task(struct task_struct *task);
>  extern void fpsimd_sync_to_sve(struct task_struct *task);
>  extern void sve_sync_to_fpsimd(struct task_struct *task);

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 24/38] arm64/sme: Implement traps and syscall handling for SME
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11 12:37     ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 12:37 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:30 +0100
Mark Brown <broonie@kernel.org> wrote:

> By default all SME operations in userspace will trap.  When this happens
> we allocate storage space for the SME register state, set up the SVE
> registers and disable traps.  We do not need to initialize ZA since the
> architecture guarantees that it will be zeroed when enabled and when we
> trap ZA is disabled.
> 
> On syscall we exit streaming mode if we were previously in it and ensure
> that all but the lower 128 bits of the registers are zeroed while
> preserving the state of ZA. This follows the aarch64 PCS for SME, ZA
> state is preserved over a function call and streaming mode is exited.
> Since the traps for SME do not distinguish between streaming mode SVE
> and ZA usage if ZA is in use rather than reenabling traps we instead
> zero the parts of the SVE registers not shared with FPSIMD and leave SME
> enabled, this simplifies handling SME traps. If ZA is not in use then we
> reenable SME traps and fall through to normal handling of SVE.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
...  continuing the trivial theme of my review today...


>  
>  	/*
>  	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
> -	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
> -	 * happens if a context switch or kernel_neon_begin() or context
> -	 * modification (sigreturn, ptrace) intervenes.
> -	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
> +	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which
> +	 * only happens if a context switch or kernel_neon_begin() or

Why the rewrap here?

> +	 * context modification (sigreturn, ptrace) intervenes.  So,
> +	 * ensure that CPACR_EL1 is already correct for the fast-path
> +	 * case.
>  	 */
>  	sve_user_disable();
>  }
>  
>  void do_el0_svc(struct pt_regs *regs)
>  {
> -	sve_user_discard();
> +	fp_user_discard();
>  	el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
>  }
>  


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 24/38] arm64/sme: Implement traps and syscall handling for SME
@ 2021-10-11 12:37     ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 12:37 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Thu, 30 Sep 2021 19:11:30 +0100
Mark Brown <broonie@kernel.org> wrote:

> By default all SME operations in userspace will trap.  When this happens
> we allocate storage space for the SME register state, set up the SVE
> registers and disable traps.  We do not need to initialize ZA since the
> architecture guarantees that it will be zeroed when enabled and when we
> trap ZA is disabled.
> 
> On syscall we exit streaming mode if we were previously in it and ensure
> that all but the lower 128 bits of the registers are zeroed while
> preserving the state of ZA. This follows the aarch64 PCS for SME, ZA
> state is preserved over a function call and streaming mode is exited.
> Since the traps for SME do not distinguish between streaming mode SVE
> and ZA usage if ZA is in use rather than reenabling traps we instead
> zero the parts of the SVE registers not shared with FPSIMD and leave SME
> enabled, this simplifies handling SME traps. If ZA is not in use then we
> reenable SME traps and fall through to normal handling of SVE.
> 
> Signed-off-by: Mark Brown <broonie@kernel.org>
...  continuing the trivial theme of my review today...


>  
>  	/*
>  	 * task_fpsimd_load() won't be called to update CPACR_EL1 in
> -	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which only
> -	 * happens if a context switch or kernel_neon_begin() or context
> -	 * modification (sigreturn, ptrace) intervenes.
> -	 * So, ensure that CPACR_EL1 is already correct for the fast-path case.
> +	 * ret_to_user unless TIF_FOREIGN_FPSTATE is still set, which
> +	 * only happens if a context switch or kernel_neon_begin() or

Why the rewrap here?

> +	 * context modification (sigreturn, ptrace) intervenes.  So,
> +	 * ensure that CPACR_EL1 is already correct for the fast-path
> +	 * case.
>  	 */
>  	sve_user_disable();
>  }
>  
>  void do_el0_svc(struct pt_regs *regs)
>  {
> -	sve_user_discard();
> +	fp_user_discard();
>  	el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
>  }
>  


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 01/38] arm64/fp: Reindent fpsimd_save()
  2021-10-11  9:39     ` Jonathan Cameron
@ 2021-10-11 13:02       ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:02 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 745 bytes --]

On Mon, Oct 11, 2021 at 10:39:59AM +0100, Jonathan Cameron wrote:

> > +	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
> > +	    test_thread_flag(TIF_SVE)) {

> Trivial, but - the above now fits nicely on oneline under 80 chars.

> Looks like you have another instance of this in patch 21 in roughly the same place.

> Then you drop at least some of this code later. hohum, maybe not worth tidying up.

Yes, indeed.  I think I deliberately didn't reindent to help make it a
bit more obviously mechanical given that as you've noticed there's going
to be meaningful changes replacing the actual code later.  Given that
this is at the start of a very large series I'll just leave things as
they are unless Catalin or Will really wants the reformatting here.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 01/38] arm64/fp: Reindent fpsimd_save()
@ 2021-10-11 13:02       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:02 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest


[-- Attachment #1.1: Type: text/plain, Size: 745 bytes --]

On Mon, Oct 11, 2021 at 10:39:59AM +0100, Jonathan Cameron wrote:

> > +	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
> > +	    test_thread_flag(TIF_SVE)) {

> Trivial, but - the above now fits nicely on oneline under 80 chars.

> Looks like you have another instance of this in patch 21 in roughly the same place.

> Then you drop at least some of this code later. hohum, maybe not worth tidying up.

Yes, indeed.  I think I deliberately didn't reindent to help make it a
bit more obviously mechanical given that as you've noticed there's going
to be meaningful changes replacing the actual code later.  Given that
this is at the start of a very large series I'll just leave things as
they are unless Catalin or Will really wants the reformatting here.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
  2021-10-11 10:20     ` Jonathan Cameron
@ 2021-10-11 13:14       ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:14 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]

On Mon, Oct 11, 2021 at 11:20:57AM +0100, Jonathan Cameron wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > As for SVE we will track a per task SME vector length for tasks. Convert
> > the existing storage for the vector length into an array and update
> > fpsimd_flush_task() to initialise this in a function.

> I'm clearly having a trivial comment day.  Given reduction in indenting
> it would be nice perhaps to reformat comments to take that into account.

Again I'll have done this to make it clearer that things are just being
moved about - as a reviewer I do find things like that very helpful.

> I'm also unconvinced the trivial wrappers are worthwhile.  (maybe you drop
> those later?)

They do hide what we're doing from the rest of the series which makes
the whole thing a lot easier to work with, especially if we change what
data structures we use or there's some debate as to the names of the
constants.  A bunch of them directly map onto existing trivial wrappers
too, there's been some stylistic preference for that.

If people want to remove the wrappers I'd propose leaving them for now
then adding a patch afterwards which removes them, or at least waiting
until we've got very firm agreement from Catalin and Will on the data
structures and constants.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
@ 2021-10-11 13:14       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:14 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest


[-- Attachment #1.1: Type: text/plain, Size: 1278 bytes --]

On Mon, Oct 11, 2021 at 11:20:57AM +0100, Jonathan Cameron wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > As for SVE we will track a per task SME vector length for tasks. Convert
> > the existing storage for the vector length into an array and update
> > fpsimd_flush_task() to initialise this in a function.

> I'm clearly having a trivial comment day.  Given reduction in indenting
> it would be nice perhaps to reformat comments to take that into account.

Again I'll have done this to make it clearer that things are just being
moved about - as a reviewer I do find things like that very helpful.

> I'm also unconvinced the trivial wrappers are worthwhile.  (maybe you drop
> those later?)

They do hide what we're doing from the rest of the series which makes
the whole thing a lot easier to work with, especially if we change what
data structures we use or there's some debate as to the names of the
constants.  A bunch of them directly map onto existing trivial wrappers
too, there's been some stylistic preference for that.

If people want to remove the wrappers I'd propose leaving them for now
then adding a patch afterwards which removes them, or at least waiting
until we've got very firm agreement from Catalin and Will on the data
structures and constants.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-09-30 18:11   ` Mark Brown
@ 2021-10-11 13:17     ` Szabolcs Nagy
  -1 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-11 13:17 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest

The 09/30/2021 19:11, Mark Brown wrote:
> --- /dev/null
> +++ b/Documentation/arm64/sme.rst
> @@ -0,0 +1,427 @@
> +===================================================
> +Scalable Matrix Extension support for AArch64 Linux
> +===================================================
> +
> +This document outlines briefly the interface provided to userspace by Linux in
> +order to support use of the ARM Scalable Matrix Extension (SME).
> +
> +This is an outline of the most important features and issues only and not
> +intended to be exhaustive.  It should be read in conjunction with the SVE
> +documentation in sve.rst which provides details on the Streaming SVE mode
> +included in SME.
> +
> +This document does not aim to describe the SME architecture or programmer's
> +model.  To aid understanding, a minimal description of relevant programmer's
> +model features for SME is included in Appendix A.
> +
> +
> +1.  General
> +-----------
> +
> +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> +  register state are tracked per thread.

can you add a note that there is a new TPIDR2_EL0
per thread and on thread creation it's 0 (and i
guess unchanged on fork).

at least this abi makes most sense to me.

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 13:17     ` Szabolcs Nagy
  0 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-11 13:17 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest

The 09/30/2021 19:11, Mark Brown wrote:
> --- /dev/null
> +++ b/Documentation/arm64/sme.rst
> @@ -0,0 +1,427 @@
> +===================================================
> +Scalable Matrix Extension support for AArch64 Linux
> +===================================================
> +
> +This document outlines briefly the interface provided to userspace by Linux in
> +order to support use of the ARM Scalable Matrix Extension (SME).
> +
> +This is an outline of the most important features and issues only and not
> +intended to be exhaustive.  It should be read in conjunction with the SVE
> +documentation in sve.rst which provides details on the Streaming SVE mode
> +included in SME.
> +
> +This document does not aim to describe the SME architecture or programmer's
> +model.  To aid understanding, a minimal description of relevant programmer's
> +model features for SME is included in Appendix A.
> +
> +
> +1.  General
> +-----------
> +
> +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> +  register state are tracked per thread.

can you add a note that there is a new TPIDR2_EL0
per thread and on thread creation it's 0 (and i
guess unchanged on fork).

at least this abi makes most sense to me.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
  2021-10-11 13:14       ` Mark Brown
@ 2021-10-11 13:18         ` Jonathan Cameron
  -1 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 13:18 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Mon, 11 Oct 2021 14:14:16 +0100
Mark Brown <broonie@kernel.org> wrote:

> On Mon, Oct 11, 2021 at 11:20:57AM +0100, Jonathan Cameron wrote:
> > Mark Brown <broonie@kernel.org> wrote:  
> 
> > > As for SVE we will track a per task SME vector length for tasks. Convert
> > > the existing storage for the vector length into an array and update
> > > fpsimd_flush_task() to initialise this in a function.  
> 
> > I'm clearly having a trivial comment day.  Given reduction in indenting
> > it would be nice perhaps to reformat comments to take that into account.  
> 
> Again I'll have done this to make it clearer that things are just being
> moved about - as a reviewer I do find things like that very helpful.
> 
> > I'm also unconvinced the trivial wrappers are worthwhile.  (maybe you drop
> > those later?)  
> 
> They do hide what we're doing from the rest of the series which makes
> the whole thing a lot easier to work with, especially if we change what
> data structures we use or there's some debate as to the names of the
> constants.  A bunch of them directly map onto existing trivial wrappers
> too, there's been some stylistic preference for that.
> 
> If people want to remove the wrappers I'd propose leaving them for now
> then adding a patch afterwards which removes them, or at least waiting
> until we've got very firm agreement from Catalin and Will on the data
> structures and constants.
> 

Makes sense.

J

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array
@ 2021-10-11 13:18         ` Jonathan Cameron
  0 siblings, 0 replies; 143+ messages in thread
From: Jonathan Cameron @ 2021-10-11 13:18 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

On Mon, 11 Oct 2021 14:14:16 +0100
Mark Brown <broonie@kernel.org> wrote:

> On Mon, Oct 11, 2021 at 11:20:57AM +0100, Jonathan Cameron wrote:
> > Mark Brown <broonie@kernel.org> wrote:  
> 
> > > As for SVE we will track a per task SME vector length for tasks. Convert
> > > the existing storage for the vector length into an array and update
> > > fpsimd_flush_task() to initialise this in a function.  
> 
> > I'm clearly having a trivial comment day.  Given reduction in indenting
> > it would be nice perhaps to reformat comments to take that into account.  
> 
> Again I'll have done this to make it clearer that things are just being
> moved about - as a reviewer I do find things like that very helpful.
> 
> > I'm also unconvinced the trivial wrappers are worthwhile.  (maybe you drop
> > those later?)  
> 
> They do hide what we're doing from the rest of the series which makes
> the whole thing a lot easier to work with, especially if we change what
> data structures we use or there's some debate as to the names of the
> constants.  A bunch of them directly map onto existing trivial wrappers
> too, there's been some stylistic preference for that.
> 
> If people want to remove the wrappers I'd propose leaving them for now
> then adding a patch afterwards which removes them, or at least waiting
> until we've got very firm agreement from Catalin and Will on the data
> structures and constants.
> 

Makes sense.

J

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 10/38] arm64/sve: Generalise vector length configuration prctl() for SME
  2021-10-11 10:27     ` Jonathan Cameron
@ 2021-10-11 13:18       ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:18 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 880 bytes --]

On Mon, Oct 11, 2021 at 11:27:50AM +0100, Jonathan Cameron wrote:
> Mark Brown <broonie@kernel.org> wrote:

> >  	/*
> > -	 * Clamp to the maximum vector length that VL-agnostic SVE code can
> > -	 * work with.  A flag may be assigned in the future to allow setting
> > -	 * of larger vector lengths without confusing older software.
> > +	 * Clamp to the maximum vector length that VL-agnostic code
> > +	 * can work with.  A flag may be assigned in the future to
> > +	 * allow setting of larger vector lengths without confusing
> > +	 * older software.

> Why the oddly short wrapping at sub 70 chars?

This is the default for my editor in Linux kernel style, I guess among
other things it keeps things readable even when they're quoted several
times during review.  If there is a particular magic number that Catalin
and Will have in mind I can go through and reflow to that.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 10/38] arm64/sve: Generalise vector length configuration prctl() for SME
@ 2021-10-11 13:18       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:18 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest


[-- Attachment #1.1: Type: text/plain, Size: 880 bytes --]

On Mon, Oct 11, 2021 at 11:27:50AM +0100, Jonathan Cameron wrote:
> Mark Brown <broonie@kernel.org> wrote:

> >  	/*
> > -	 * Clamp to the maximum vector length that VL-agnostic SVE code can
> > -	 * work with.  A flag may be assigned in the future to allow setting
> > -	 * of larger vector lengths without confusing older software.
> > +	 * Clamp to the maximum vector length that VL-agnostic code
> > +	 * can work with.  A flag may be assigned in the future to
> > +	 * allow setting of larger vector lengths without confusing
> > +	 * older software.

> Why the oddly short wrapping at sub 70 chars?

This is the default for my editor in Linux kernel style, I guess among
other things it keeps things readable even when they're quoted several
times during review.  If there is a particular magic number that Catalin
and Will have in mind I can go through and reflow to that.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-11 11:05     ` Jonathan Cameron
@ 2021-10-11 13:20       ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:20 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 491 bytes --]

On Mon, Oct 11, 2021 at 12:05:07PM +0100, Jonathan Cameron wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > Signed-off-by: Mark Brown <broonie@kernel.org>

> A few more editorial bits to add to what Alan pointed out.

I think I updated everything, thanks.  Please delete unneeded context
from mails when replying.  Doing this makes it much easier to find your
reply in the message, helping ensure it won't be missed by people
scrolling through the irrelevant quoted material.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 13:20       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:20 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	Szabolcs Nagy, linux-arm-kernel, linux-kselftest


[-- Attachment #1.1: Type: text/plain, Size: 491 bytes --]

On Mon, Oct 11, 2021 at 12:05:07PM +0100, Jonathan Cameron wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > Signed-off-by: Mark Brown <broonie@kernel.org>

> A few more editorial bits to add to what Alan pointed out.

I think I updated everything, thanks.  Please delete unneeded context
from mails when replying.  Doing this makes it much easier to find your
reply in the message, helping ensure it won't be missed by people
scrolling through the irrelevant quoted material.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-11 13:17     ` Szabolcs Nagy
@ 2021-10-11 13:23       ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:23 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

On Mon, Oct 11, 2021 at 02:17:30PM +0100, Szabolcs Nagy wrote:

> > +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> > +  register state are tracked per thread.

> can you add a note that there is a new TPIDR2_EL0
> per thread and on thread creation it's 0 (and i
> guess unchanged on fork).

Sure.  It's actually reset to 0 on fork() - would that be a problem?

> at least this abi makes most sense to me.

OK, thanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 13:23       ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 13:23 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest


[-- Attachment #1.1: Type: text/plain, Size: 445 bytes --]

On Mon, Oct 11, 2021 at 02:17:30PM +0100, Szabolcs Nagy wrote:

> > +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> > +  register state are tracked per thread.

> can you add a note that there is a new TPIDR2_EL0
> per thread and on thread creation it's 0 (and i
> guess unchanged on fork).

Sure.  It's actually reset to 0 on fork() - would that be a problem?

> at least this abi makes most sense to me.

OK, thanks.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-11 13:23       ` Mark Brown
@ 2021-10-11 14:19         ` Szabolcs Nagy
  -1 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-11 14:19 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

The 10/11/2021 14:23, Mark Brown wrote:
> On Mon, Oct 11, 2021 at 02:17:30PM +0100, Szabolcs Nagy wrote:
> 
> > > +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> > > +  register state are tracked per thread.
> 
> > can you add a note that there is a new TPIDR2_EL0
> > per thread and on thread creation it's 0 (and i
> > guess unchanged on fork).
> 
> Sure.  It's actually reset to 0 on fork() - would that be a problem?

yes, i think fork should not change it. (just like the
normal thread pointer is unchanged in the child process)

tpidr2 is used in the sme pcs by the lazy za save scheme.
(cc += Richard Sandiford, who wrote the specs)

if fork resets tpidr2 then returning to a stack frame
that set tpidr2 could cause clobbered za state.
(userspace can fix this up, but if we want this case
to work then it's better to fix it on the kernel side)


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 14:19         ` Szabolcs Nagy
  0 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-11 14:19 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

The 10/11/2021 14:23, Mark Brown wrote:
> On Mon, Oct 11, 2021 at 02:17:30PM +0100, Szabolcs Nagy wrote:
> 
> > > +* PSTATE.SM and PSTATE.ZA, the streaming mode vector length and the ZA
> > > +  register state are tracked per thread.
> 
> > can you add a note that there is a new TPIDR2_EL0
> > per thread and on thread creation it's 0 (and i
> > guess unchanged on fork).
> 
> Sure.  It's actually reset to 0 on fork() - would that be a problem?

yes, i think fork should not change it. (just like the
normal thread pointer is unchanged in the child process)

tpidr2 is used in the sme pcs by the lazy za save scheme.
(cc += Richard Sandiford, who wrote the specs)

if fork resets tpidr2 then returning to a stack frame
that set tpidr2 could cause clobbered za state.
(userspace can fix this up, but if we want this case
to work then it's better to fix it on the kernel side)


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-11 14:19         ` Szabolcs Nagy
@ 2021-10-11 20:10           ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 20:10 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 394 bytes --]

On Mon, Oct 11, 2021 at 03:19:37PM +0100, Szabolcs Nagy wrote:

> if fork resets tpidr2 then returning to a stack frame
> that set tpidr2 could cause clobbered za state.
> (userspace can fix this up, but if we want this case
> to work then it's better to fix it on the kernel side)

OK, that makes sense.  I've changed the code and the kselftest so that
TPIDR2 is preserved on thread creation.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-11 20:10           ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-11 20:10 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford


[-- Attachment #1.1: Type: text/plain, Size: 394 bytes --]

On Mon, Oct 11, 2021 at 03:19:37PM +0100, Szabolcs Nagy wrote:

> if fork resets tpidr2 then returning to a stack frame
> that set tpidr2 could cause clobbered za state.
> (userspace can fix this up, but if we want this case
> to work then it's better to fix it on the kernel side)

OK, that makes sense.  I've changed the code and the kselftest so that
TPIDR2 is preserved on thread creation.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-11 20:10           ` Mark Brown
@ 2021-10-12  8:23             ` Szabolcs Nagy
  -1 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-12  8:23 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

The 10/11/2021 21:10, Mark Brown wrote:
> On Mon, Oct 11, 2021 at 03:19:37PM +0100, Szabolcs Nagy wrote:
> 
> > if fork resets tpidr2 then returning to a stack frame
> > that set tpidr2 could cause clobbered za state.
> > (userspace can fix this up, but if we want this case
> > to work then it's better to fix it on the kernel side)
> 
> OK, that makes sense.  I've changed the code and the kselftest so that
> TPIDR2 is preserved on thread creation.

does thread creation have to work the same way as fork?

(in a pthread_create child we want tpidr2 to be 0,
since it represents thread specific data. in a fork
child we want to preserve tpidr2 to mirror the
state of the parent as much as possible)

for sme pcs to work, the libc has to be updated anyway,
so we can change the clone child to set tpidr2. however
that's not atomic wrt thread creation. i don't think we
aim to support async-signal-safe za usage in the pcs so
this is not a problem, but if the kernel did the work
then the situation would be clearer in userspace.

i'm not sure when to do tpidr2=0 exactly, but something
like CLONE_SETTLS is set or child runs on a new stack
would work for me. if that's too ugly then preserving
tpidr2 in the child is fine.


^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-12  8:23             ` Szabolcs Nagy
  0 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-12  8:23 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

The 10/11/2021 21:10, Mark Brown wrote:
> On Mon, Oct 11, 2021 at 03:19:37PM +0100, Szabolcs Nagy wrote:
> 
> > if fork resets tpidr2 then returning to a stack frame
> > that set tpidr2 could cause clobbered za state.
> > (userspace can fix this up, but if we want this case
> > to work then it's better to fix it on the kernel side)
> 
> OK, that makes sense.  I've changed the code and the kselftest so that
> TPIDR2 is preserved on thread creation.

does thread creation have to work the same way as fork?

(in a pthread_create child we want tpidr2 to be 0,
since it represents thread specific data. in a fork
child we want to preserve tpidr2 to mirror the
state of the parent as much as possible)

for sme pcs to work, the libc has to be updated anyway,
so we can change the clone child to set tpidr2. however
that's not atomic wrt thread creation. i don't think we
aim to support async-signal-safe za usage in the pcs so
this is not a problem, but if the kernel did the work
then the situation would be clearer in userspace.

i'm not sure when to do tpidr2=0 exactly, but something
like CLONE_SETTLS is set or child runs on a new stack
would work for me. if that's too ugly then preserving
tpidr2 in the child is fine.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-12  8:23             ` Szabolcs Nagy
@ 2021-10-13 18:37               ` Mark Brown
  -1 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-13 18:37 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 858 bytes --]

On Tue, Oct 12, 2021 at 09:23:21AM +0100, Szabolcs Nagy wrote:
> The 10/11/2021 21:10, Mark Brown wrote:

> > OK, that makes sense.  I've changed the code and the kselftest so that
> > TPIDR2 is preserved on thread creation.

> does thread creation have to work the same way as fork?

> (in a pthread_create child we want tpidr2 to be 0,
> since it represents thread specific data. in a fork
> child we want to preserve tpidr2 to mirror the
> state of the parent as much as possible)

...

> i'm not sure when to do tpidr2=0 exactly, but something
> like CLONE_SETTLS is set or child runs on a new stack
> would work for me. if that's too ugly then preserving
> tpidr2 in the child is fine.

Resetting it on CLONE_SETTLS is straightforward to implement so if that
works for you it sounds good to me, I've got it implemented locally
already with a test case.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-13 18:37               ` Mark Brown
  0 siblings, 0 replies; 143+ messages in thread
From: Mark Brown @ 2021-10-13 18:37 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford


[-- Attachment #1.1: Type: text/plain, Size: 858 bytes --]

On Tue, Oct 12, 2021 at 09:23:21AM +0100, Szabolcs Nagy wrote:
> The 10/11/2021 21:10, Mark Brown wrote:

> > OK, that makes sense.  I've changed the code and the kselftest so that
> > TPIDR2 is preserved on thread creation.

> does thread creation have to work the same way as fork?

> (in a pthread_create child we want tpidr2 to be 0,
> since it represents thread specific data. in a fork
> child we want to preserve tpidr2 to mirror the
> state of the parent as much as possible)

...

> i'm not sure when to do tpidr2=0 exactly, but something
> like CLONE_SETTLS is set or child runs on a new stack
> would work for me. if that's too ugly then preserving
> tpidr2 in the child is fine.

Resetting it on CLONE_SETTLS is straightforward to implement so if that
works for you it sounds good to me, I've got it implemented locally
already with a test case.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
  2021-10-13 18:37               ` Mark Brown
@ 2021-10-14  9:57                 ` Szabolcs Nagy
  -1 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-14  9:57 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

The 10/13/2021 19:37, Mark Brown wrote:
> On Tue, Oct 12, 2021 at 09:23:21AM +0100, Szabolcs Nagy wrote:
> > The 10/11/2021 21:10, Mark Brown wrote:
> 
> > > OK, that makes sense.  I've changed the code and the kselftest so that
> > > TPIDR2 is preserved on thread creation.
> 
> > does thread creation have to work the same way as fork?
> 
> > (in a pthread_create child we want tpidr2 to be 0,
> > since it represents thread specific data. in a fork
> > child we want to preserve tpidr2 to mirror the
> > state of the parent as much as possible)
> 
> ...
> 
> > i'm not sure when to do tpidr2=0 exactly, but something
> > like CLONE_SETTLS is set or child runs on a new stack
> > would work for me. if that's too ugly then preserving
> > tpidr2 in the child is fine.
> 
> Resetting it on CLONE_SETTLS is straightforward to implement so if that
> works for you it sounds good to me, I've got it implemented locally
> already with a test case.

thanks, i think that makes sense and works for thread implementations.

^ permalink raw reply	[flat|nested] 143+ messages in thread

* Re: [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME
@ 2021-10-14  9:57                 ` Szabolcs Nagy
  0 siblings, 0 replies; 143+ messages in thread
From: Szabolcs Nagy @ 2021-10-14  9:57 UTC (permalink / raw)
  To: Mark Brown
  Cc: Catalin Marinas, Will Deacon, Shuah Khan, Shuah Khan,
	Alan Hayward, Luis Machado, Salil Akerkar, Basant Kumar Dwivedi,
	linux-arm-kernel, linux-kselftest, Richard Sandiford

The 10/13/2021 19:37, Mark Brown wrote:
> On Tue, Oct 12, 2021 at 09:23:21AM +0100, Szabolcs Nagy wrote:
> > The 10/11/2021 21:10, Mark Brown wrote:
> 
> > > OK, that makes sense.  I've changed the code and the kselftest so that
> > > TPIDR2 is preserved on thread creation.
> 
> > does thread creation have to work the same way as fork?
> 
> > (in a pthread_create child we want tpidr2 to be 0,
> > since it represents thread specific data. in a fork
> > child we want to preserve tpidr2 to mirror the
> > state of the parent as much as possible)
> 
> ...
> 
> > i'm not sure when to do tpidr2=0 exactly, but something
> > like CLONE_SETTLS is set or child runs on a new stack
> > would work for me. if that's too ugly then preserving
> > tpidr2 in the child is fine.
> 
> Resetting it on CLONE_SETTLS is straightforward to implement so if that
> works for you it sounds good to me, I've got it implemented locally
> already with a test case.

thanks, i think that makes sense and works for thread implementations.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 143+ messages in thread

end of thread, other threads:[~2021-10-14  9:59 UTC | newest]

Thread overview: 143+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-30 18:11 [PATCH v1 00/38] arm64/sme: Initial support for the Scalable Matrix Extension Mark Brown
2021-09-30 18:11 ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 01/38] arm64/fp: Reindent fpsimd_save() Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-11  9:39   ` Jonathan Cameron
2021-10-11  9:39     ` Jonathan Cameron
2021-10-11 13:02     ` Mark Brown
2021-10-11 13:02       ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 02/38] arm64/sve: Remove sve_load_from_fpsimd_state() Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 03/38] arm64/sve: Make access to FFR optional Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 04/38] arm64/sve: Rename find_supported_vector_length() Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 05/38] arm64/sve: Use accessor functions for vector lengths in thread_struct Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 06/38] arm64/sve: Put system wide vector length information into structs Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-01  3:13   ` kernel test robot
2021-10-01  3:13     ` kernel test robot
2021-10-01  3:13     ` kernel test robot
2021-09-30 18:11 ` [PATCH v1 07/38] arm64/sve: Explicitly load vector length when restoring SVE state Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 08/38] arm64/sve: Track vector lengths for tasks in an array Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-11 10:20   ` Jonathan Cameron
2021-10-11 10:20     ` Jonathan Cameron
2021-10-11 13:14     ` Mark Brown
2021-10-11 13:14       ` Mark Brown
2021-10-11 13:18       ` Jonathan Cameron
2021-10-11 13:18         ` Jonathan Cameron
2021-09-30 18:11 ` [PATCH v1 09/38] arm64/sve: Make sysctl interface for SVE reusable by SME Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 10/38] arm64/sve: Generalise vector length configuration prctl() for SME Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-11 10:27   ` Jonathan Cameron
2021-10-11 10:27     ` Jonathan Cameron
2021-10-11 13:18     ` Mark Brown
2021-10-11 13:18       ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 11/38] selftests: arm64: Parameterise ptrace vector length information Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 12/38] arm64/sme: Provide ABI documentation for SME Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-08 14:11   ` Alan Hayward
2021-10-08 14:11     ` Alan Hayward
2021-10-08 15:28     ` Mark Brown
2021-10-08 15:28       ` Mark Brown
2021-10-08 16:45       ` Alan Hayward
2021-10-08 16:45         ` Alan Hayward
2021-10-08 17:04         ` Mark Brown
2021-10-08 17:04           ` Mark Brown
2021-10-11 11:15           ` Alan Hayward
2021-10-11 11:15             ` Alan Hayward
2021-10-11 11:48             ` Mark Brown
2021-10-11 11:48               ` Mark Brown
2021-10-11 11:05   ` Jonathan Cameron
2021-10-11 11:05     ` Jonathan Cameron
2021-10-11 13:20     ` Mark Brown
2021-10-11 13:20       ` Mark Brown
2021-10-11 13:17   ` Szabolcs Nagy
2021-10-11 13:17     ` Szabolcs Nagy
2021-10-11 13:23     ` Mark Brown
2021-10-11 13:23       ` Mark Brown
2021-10-11 14:19       ` Szabolcs Nagy
2021-10-11 14:19         ` Szabolcs Nagy
2021-10-11 20:10         ` Mark Brown
2021-10-11 20:10           ` Mark Brown
2021-10-12  8:23           ` Szabolcs Nagy
2021-10-12  8:23             ` Szabolcs Nagy
2021-10-13 18:37             ` Mark Brown
2021-10-13 18:37               ` Mark Brown
2021-10-14  9:57               ` Szabolcs Nagy
2021-10-14  9:57                 ` Szabolcs Nagy
2021-09-30 18:11 ` [PATCH v1 13/38] arm64/sme: System register and exception syndrome definitions Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 14/38] arm64/sme: Define macros for manually encoding SME instructions Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 15/38] arm64/sme: Early CPU setup for SME Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 16/38] arm64/sme: Basic enumeration support Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 17/38] arm64/sme: Identify supported SME vector lengths at boot Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 18/38] arm64/sme: Implement sysctl to set the default vector length Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 19/38] arm64/sme: Implement vector length configuration prctl()s Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-01  5:20   ` kernel test robot
2021-10-01  5:20     ` kernel test robot
2021-10-01  5:20     ` kernel test robot
2021-10-01 12:40     ` Mark Brown
2021-10-01 12:40       ` Mark Brown
2021-10-01 12:40       ` Mark Brown
2021-10-08  1:32       ` [kbuild-all] " Chen, Rong A
2021-10-08  1:32         ` Chen, Rong A
2021-10-08  1:32         ` [kbuild-all] " Chen, Rong A
2021-10-01 16:38   ` kernel test robot
2021-10-01 16:38     ` kernel test robot
2021-10-01 16:38     ` kernel test robot
2021-09-30 18:11 ` [PATCH v1 20/38] arm64/sme: Implement support for TPIDR2 Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 21/38] arm64/sme: Implement SVCR context switching Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-11 12:15   ` Jonathan Cameron
2021-10-11 12:15     ` Jonathan Cameron
2021-09-30 18:11 ` [PATCH v1 22/38] arm64/sme: Implement streaming SVE " Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 23/38] arm64/sme: Implement ZA " Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-11 12:27   ` Jonathan Cameron
2021-10-11 12:27     ` Jonathan Cameron
2021-09-30 18:11 ` [PATCH v1 24/38] arm64/sme: Implement traps and syscall handling for SME Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-10-11 12:37   ` Jonathan Cameron
2021-10-11 12:37     ` Jonathan Cameron
2021-09-30 18:11 ` [PATCH v1 25/38] arm64/sme: Implement streaming SVE signal handling Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 26/38] arm64/sme: Implement ZA " Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 27/38] arm64/sme: Implement ptrace support for streaming mode SVE registers Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 28/38] arm64/sme: Add ptrace support for ZA Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 29/38] arm64/sme: Disable streaming mode and ZA when flushing CPU state Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 30/38] arm64/sme: Save and restore streaming mode over EFI runtime calls Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 31/38] arm64/sme: Provide Kconfig for SME Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 32/38] kselftest/arm64: Add tests for TPIDR2 Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 33/38] kselftest/arm64: Extend vector configuration API tests to cover SME Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 34/38] kselftest/arm64: sme: Provide streaming mode SVE stress test Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 35/38] kselftest/arm64: Add stress test for SME ZA context switching Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 36/38] kselftest/arm64: signal: Add SME signal handling tests Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 37/38] selftests: arm64: Add streaming SVE to SVE ptrace tests Mark Brown
2021-09-30 18:11   ` Mark Brown
2021-09-30 18:11 ` [PATCH v1 38/38] selftests: arm64: Add coverage for the ZA ptrace interface Mark Brown
2021-09-30 18:11   ` Mark Brown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.