All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/10] arm64/sve: Add userspace vector length control API
@ 2017-01-12 11:25 ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Yao Qi, Szabolcs Nagy, Joseph Myers,
	Alan Hayward, Torvald Riegel, Christoffer Dall

This is an add-on series for the Scalable Vector Extension (SVE) core
patches [1], adding an interface to allow userspace tasks to control
what vector length they use for SVE instructions.

(For an architectural overview of SVE, and an explanation of what a
"vector length" is, see [2].)

The amount of SVE register state depends on the vector length, so using
large vector lengths with SVE can requre the userspace signal frame to
grow.  This leads to some ABI impacts in common with other arches that
may have to grow their signal frame.


In this series I do not dictate any particular policy for the SVE vector
length: the kernel provides a default, but userspace can set the vector
length as it likes, provided the hardware supports the chosen length.

For example, userspace may pick a length that the images being loaded
have been validated against or optimised for.


Overview
========

The API proposed here consists of prctl() calls to allow a task to
set/query its vector length and related control flags, and corresponding
ptrace extensions to allow a debugger to do the same for a traced task:

prctl():
 * PR_SVE_SET_VL
 * PR_SVE_GET_VL

ptrace() NT_ARM_SVE regset:
 * user_sve_header.flags & SVE_PT_VL_*
 * user_sve_header.vl

(I follow the existing convention of not including the arch name in
prctl names).

The context switch logic is also extended to set the correct vector
length when scheduling a task in, since the vector length may now differ
between tasks.


The expected users of this API are libc startup, dynamic linker and
runtime environment plumbing code.  I don't expect ordinary user code to
change its own vector length on-the-fly, partly because it is generally
The Wrong Thing To Do with respect to the SVE programming model, and
partly because of ABI subtleties which make it difficult to do this
correctly.


ABI Impact
==========

The current arm64 signal frame size is not sufficient to save all SVE
state for larger vector lengths.

This won't affect existing binary distros, since the signal frame is not
extended unless some SVE instructions are executed by the user task.

However, non-SVE code executing in the same processes as SVE-aware code
may start to see the kernel using more than MINSIGSTKSZ bytes of stack
to deliver a signal, which may lead to stack overruns.  Other ABI
breakages are also possible if we were to simply increase the
MINSIGSTKSZ #define.  SVE aware code will need to move to a new
mechanism to discover the signal frame size: perhaps a new prctl() (not
implemented in this series).

As a temporary workaround, I added a Kconfig option in [1] to clamp the
vector length to a safe maximum that hides this effect, but this was
only intended as a short-term kludge.

This series removes the Kconfig kludge and introduces a new runtime
mechanism:

# echo <vector length in bytes> >/proc/cpu/sve_default_vector_length

will now set the default vector length for newly-exec'd processes.  This
is initialised to the ABI-safe value 512 at boot (or the maximum value
supported by the hardware, if smaller).  Administrators / distro
maintainers / developers can set this to something different in boot
scripts if they are comfortable doing so, or to see what happens.

We _could_ increase the kernel default in the future when and if we are
satisfied that the change is sufficiently low-impact.


User tasks can always override the default via prctl(): the logic is
that non-SVE-aware code doesn't know how to change the vector length,
and so won't do that anyway.  SVE-aware code is presumed to understand
the consequences.

The vector length can be made inheritable (allowing implementation of
taskset-like tools, or running a testsuite with a particular vector
length) or not (for general-purpose processes; in which case the vector
length is reset to the default across exec).


[1] arm64: Scalable Vector Extension core support
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470507.html

[2] Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A architecture
https://www.community.arm.com/processors/b/blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

Note: The size of an SVE vector register (the "vector length") is
choosable per hardware implementation, and the ISA allows software to be
coded independently of the actual vector length in use.  The vector
length can also be selected explicitly by software within the limits
supported by the hardware -- this is expected to be useful in some
situations.  This series exposes this control to userspace.


Dave Martin (10):
  prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls
  arm64/sve: Track vector length for each task
  arm64/sve: Set CPU vector length to match current task
  arm64/sve: Factor out clearing of tasks' SVE regs
  arm64/sve: Wire up vector length control prctl() calls
  arm64/sve: Disallow VL setting for individual threads by default
  arm64/sve: Add vector length inheritance control
  arm64/sve: ptrace: Wire up vector length control and reporting
  arm64/sve: Enable default vector length control via procfs
  Revert "arm64/sve: Limit vector length to 512 bits by default"

 arch/arm64/Kconfig                    |  35 ----
 arch/arm64/include/asm/fpsimd.h       |  25 ++-
 arch/arm64/include/asm/fpsimdmacros.h |   7 +-
 arch/arm64/include/asm/processor.h    |  12 ++
 arch/arm64/include/uapi/asm/ptrace.h  |   5 +
 arch/arm64/kernel/entry-fpsimd.S      |   2 +-
 arch/arm64/kernel/fpsimd.c            | 334 +++++++++++++++++++++++++++++++---
 arch/arm64/kernel/ptrace.c            |  27 +--
 arch/arm64/kernel/signal.c            |  15 +-
 arch/arm64/mm/proc.S                  |   5 -
 include/uapi/linux/prctl.h            |  10 +
 kernel/sys.c                          |  12 ++
 12 files changed, 407 insertions(+), 82 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 00/10] arm64/sve: Add userspace vector length control API
@ 2017-01-12 11:25 ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

This is an add-on series for the Scalable Vector Extension (SVE) core
patches [1], adding an interface to allow userspace tasks to control
what vector length they use for SVE instructions.

(For an architectural overview of SVE, and an explanation of what a
"vector length" is, see [2].)

The amount of SVE register state depends on the vector length, so using
large vector lengths with SVE can requre the userspace signal frame to
grow.  This leads to some ABI impacts in common with other arches that
may have to grow their signal frame.


In this series I do not dictate any particular policy for the SVE vector
length: the kernel provides a default, but userspace can set the vector
length as it likes, provided the hardware supports the chosen length.

For example, userspace may pick a length that the images being loaded
have been validated against or optimised for.


Overview
========

The API proposed here consists of prctl() calls to allow a task to
set/query its vector length and related control flags, and corresponding
ptrace extensions to allow a debugger to do the same for a traced task:

prctl():
 * PR_SVE_SET_VL
 * PR_SVE_GET_VL

ptrace() NT_ARM_SVE regset:
 * user_sve_header.flags & SVE_PT_VL_*
 * user_sve_header.vl

(I follow the existing convention of not including the arch name in
prctl names).

The context switch logic is also extended to set the correct vector
length when scheduling a task in, since the vector length may now differ
between tasks.


The expected users of this API are libc startup, dynamic linker and
runtime environment plumbing code.  I don't expect ordinary user code to
change its own vector length on-the-fly, partly because it is generally
The Wrong Thing To Do with respect to the SVE programming model, and
partly because of ABI subtleties which make it difficult to do this
correctly.


ABI Impact
==========

The current arm64 signal frame size is not sufficient to save all SVE
state for larger vector lengths.

This won't affect existing binary distros, since the signal frame is not
extended unless some SVE instructions are executed by the user task.

However, non-SVE code executing in the same processes as SVE-aware code
may start to see the kernel using more than MINSIGSTKSZ bytes of stack
to deliver a signal, which may lead to stack overruns.  Other ABI
breakages are also possible if we were to simply increase the
MINSIGSTKSZ #define.  SVE aware code will need to move to a new
mechanism to discover the signal frame size: perhaps a new prctl() (not
implemented in this series).

As a temporary workaround, I added a Kconfig option in [1] to clamp the
vector length to a safe maximum that hides this effect, but this was
only intended as a short-term kludge.

This series removes the Kconfig kludge and introduces a new runtime
mechanism:

# echo <vector length in bytes> >/proc/cpu/sve_default_vector_length

will now set the default vector length for newly-exec'd processes.  This
is initialised to the ABI-safe value 512 at boot (or the maximum value
supported by the hardware, if smaller).  Administrators / distro
maintainers / developers can set this to something different in boot
scripts if they are comfortable doing so, or to see what happens.

We _could_ increase the kernel default in the future when and if we are
satisfied that the change is sufficiently low-impact.


User tasks can always override the default via prctl(): the logic is
that non-SVE-aware code doesn't know how to change the vector length,
and so won't do that anyway.  SVE-aware code is presumed to understand
the consequences.

The vector length can be made inheritable (allowing implementation of
taskset-like tools, or running a testsuite with a particular vector
length) or not (for general-purpose processes; in which case the vector
length is reset to the default across exec).


[1] arm64: Scalable Vector Extension core support
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470507.html

[2] Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A architecture
https://www.community.arm.com/processors/b/blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

Note: The size of an SVE vector register (the "vector length") is
choosable per hardware implementation, and the ISA allows software to be
coded independently of the actual vector length in use.  The vector
length can also be selected explicitly by software within the limits
supported by the hardware -- this is expected to be useful in some
situations.  This series exposes this control to userspace.


Dave Martin (10):
  prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls
  arm64/sve: Track vector length for each task
  arm64/sve: Set CPU vector length to match current task
  arm64/sve: Factor out clearing of tasks' SVE regs
  arm64/sve: Wire up vector length control prctl() calls
  arm64/sve: Disallow VL setting for individual threads by default
  arm64/sve: Add vector length inheritance control
  arm64/sve: ptrace: Wire up vector length control and reporting
  arm64/sve: Enable default vector length control via procfs
  Revert "arm64/sve: Limit vector length to 512 bits by default"

 arch/arm64/Kconfig                    |  35 ----
 arch/arm64/include/asm/fpsimd.h       |  25 ++-
 arch/arm64/include/asm/fpsimdmacros.h |   7 +-
 arch/arm64/include/asm/processor.h    |  12 ++
 arch/arm64/include/uapi/asm/ptrace.h  |   5 +
 arch/arm64/kernel/entry-fpsimd.S      |   2 +-
 arch/arm64/kernel/fpsimd.c            | 334 +++++++++++++++++++++++++++++++---
 arch/arm64/kernel/ptrace.c            |  27 +--
 arch/arm64/kernel/signal.c            |  15 +-
 arch/arm64/mm/proc.S                  |   5 -
 include/uapi/linux/prctl.h            |  10 +
 kernel/sys.c                          |  12 ++
 12 files changed, 407 insertions(+), 82 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 00/10] arm64/sve: Add userspace vector length control API
@ 2017-01-12 11:25 ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:25 UTC (permalink / raw)
  To: linux-arm-kernel

This is an add-on series for the Scalable Vector Extension (SVE) core
patches [1], adding an interface to allow userspace tasks to control
what vector length they use for SVE instructions.

(For an architectural overview of SVE, and an explanation of what a
"vector length" is, see [2].)

The amount of SVE register state depends on the vector length, so using
large vector lengths with SVE can requre the userspace signal frame to
grow.  This leads to some ABI impacts in common with other arches that
may have to grow their signal frame.


In this series I do not dictate any particular policy for the SVE vector
length: the kernel provides a default, but userspace can set the vector
length as it likes, provided the hardware supports the chosen length.

For example, userspace may pick a length that the images being loaded
have been validated against or optimised for.


Overview
========

The API proposed here consists of prctl() calls to allow a task to
set/query its vector length and related control flags, and corresponding
ptrace extensions to allow a debugger to do the same for a traced task:

prctl():
 * PR_SVE_SET_VL
 * PR_SVE_GET_VL

ptrace() NT_ARM_SVE regset:
 * user_sve_header.flags & SVE_PT_VL_*
 * user_sve_header.vl

(I follow the existing convention of not including the arch name in
prctl names).

The context switch logic is also extended to set the correct vector
length when scheduling a task in, since the vector length may now differ
between tasks.


The expected users of this API are libc startup, dynamic linker and
runtime environment plumbing code.  I don't expect ordinary user code to
change its own vector length on-the-fly, partly because it is generally
The Wrong Thing To Do with respect to the SVE programming model, and
partly because of ABI subtleties which make it difficult to do this
correctly.


ABI Impact
==========

The current arm64 signal frame size is not sufficient to save all SVE
state for larger vector lengths.

This won't affect existing binary distros, since the signal frame is not
extended unless some SVE instructions are executed by the user task.

However, non-SVE code executing in the same processes as SVE-aware code
may start to see the kernel using more than MINSIGSTKSZ bytes of stack
to deliver a signal, which may lead to stack overruns.  Other ABI
breakages are also possible if we were to simply increase the
MINSIGSTKSZ #define.  SVE aware code will need to move to a new
mechanism to discover the signal frame size: perhaps a new prctl() (not
implemented in this series).

As a temporary workaround, I added a Kconfig option in [1] to clamp the
vector length to a safe maximum that hides this effect, but this was
only intended as a short-term kludge.

This series removes the Kconfig kludge and introduces a new runtime
mechanism:

# echo <vector length in bytes> >/proc/cpu/sve_default_vector_length

will now set the default vector length for newly-exec'd processes.  This
is initialised to the ABI-safe value 512 at boot (or the maximum value
supported by the hardware, if smaller).  Administrators / distro
maintainers / developers can set this to something different in boot
scripts if they are comfortable doing so, or to see what happens.

We _could_ increase the kernel default in the future when and if we are
satisfied that the change is sufficiently low-impact.


User tasks can always override the default via prctl(): the logic is
that non-SVE-aware code doesn't know how to change the vector length,
and so won't do that anyway.  SVE-aware code is presumed to understand
the consequences.

The vector length can be made inheritable (allowing implementation of
taskset-like tools, or running a testsuite with a particular vector
length) or not (for general-purpose processes; in which case the vector
length is reset to the default across exec).


[1] arm64: Scalable Vector Extension core support
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470507.html

[2] Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A architecture
https://www.community.arm.com/processors/b/blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

Note: The size of an SVE vector register (the "vector length") is
choosable per hardware implementation, and the ISA allows software to be
coded independently of the actual vector length in use.  The vector
length can also be selected explicitly by software within the limits
supported by the hardware -- this is expected to be useful in some
situations.  This series exposes this control to userspace.


Dave Martin (10):
  prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls
  arm64/sve: Track vector length for each task
  arm64/sve: Set CPU vector length to match current task
  arm64/sve: Factor out clearing of tasks' SVE regs
  arm64/sve: Wire up vector length control prctl() calls
  arm64/sve: Disallow VL setting for individual threads by default
  arm64/sve: Add vector length inheritance control
  arm64/sve: ptrace: Wire up vector length control and reporting
  arm64/sve: Enable default vector length control via procfs
  Revert "arm64/sve: Limit vector length to 512 bits by default"

 arch/arm64/Kconfig                    |  35 ----
 arch/arm64/include/asm/fpsimd.h       |  25 ++-
 arch/arm64/include/asm/fpsimdmacros.h |   7 +-
 arch/arm64/include/asm/processor.h    |  12 ++
 arch/arm64/include/uapi/asm/ptrace.h  |   5 +
 arch/arm64/kernel/entry-fpsimd.S      |   2 +-
 arch/arm64/kernel/fpsimd.c            | 334 +++++++++++++++++++++++++++++++---
 arch/arm64/kernel/ptrace.c            |  27 +--
 arch/arm64/kernel/signal.c            |  15 +-
 arch/arm64/mm/proc.S                  |   5 -
 include/uapi/linux/prctl.h            |  10 +
 kernel/sys.c                          |  12 ++
 12 files changed, 407 insertions(+), 82 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 01/10] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

This patch adds a do-nothing skeleton for the arm64 Scalable Vector
Extension control prctls.

These prctls are only avilable with

    CONFIG_ARM64=y
    CONFIG_ARM64_SVE=y

Otherwise they will compile out and return -EINVAL if attempted
from userspace.

The backend functions sve_{set,get}_task_vl() will be fleshed out
with actual functionality in subsequent patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---

There will probably also be a call to discover the signal frame size
(MINSIGSTKSZ equivalent) for the current configuration, something like

 * PR_GET_MINSIGSTKSZ

This series doesn't add this yet: only the PR_SVE_{SET,GET}_VL prctls
are provided.


 arch/arm64/include/asm/fpsimd.h    | 19 +++++++++++++++++++
 arch/arm64/include/asm/processor.h |  4 ++++
 arch/arm64/kernel/fpsimd.c         | 13 +++++++++++++
 include/uapi/linux/prctl.h         |  4 ++++
 kernel/sys.c                       | 12 ++++++++++++
 5 files changed, 52 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 88bcf69..e13259e 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -17,6 +17,7 @@
 #define __ASM_FP_H
 
 #include <asm/ptrace.h>
+#include <asm/errno.h>
 
 #ifndef __ASSEMBLY__
 
@@ -110,12 +111,30 @@ extern unsigned int sve_get_vl(void);
 extern void fpsimd_sync_to_sve(struct task_struct *task);
 
 #ifdef CONFIG_ARM64_SVE
+
 extern void fpsimd_sync_to_fpsimd(struct task_struct *task);
 extern void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task);
+extern int sve_set_task_vl(struct task_struct *task,
+			   unsigned long vector_length, unsigned long flags);
+extern int sve_get_task_vl(struct task_struct *task);
+
 #else /* !CONFIG_ARM64_SVE */
+
 static void __maybe_unused fpsimd_sync_to_fpsimd(struct task_struct *task) { }
 static void __maybe_unused fpsimd_sync_from_fpsimd_zeropad(
 	struct task_struct *task) { }
+
+static int __maybe_unused sve_set_task_vl(struct task_struct *task,
+	unsigned long vector_length, unsigned long flags)
+{
+	return -EINVAL;
+}
+
+static int __maybe_unused sve_get_task_vl(struct task_struct *task)
+{
+	return -EINVAL;
+}
+
 #endif /* !CONFIG_ARM64_SVE */
 
 #endif
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 747c65a..6f0f300 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -190,4 +190,8 @@ int cpu_enable_pan(void *__unused);
 int cpu_enable_uao(void *__unused);
 int cpu_enable_cache_maint_trap(void *__unused);
 
+#define SVE_SET_VL(task, vector_length, flags) \
+	sve_set_task_vl(task, vector_length, flags)
+#define SVE_GET_VL(task) sve_get_task_vl(task)
+
 #endif /* __ASM_PROCESSOR_H */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 2c113e4..bdacfcd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -468,6 +468,19 @@ void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task)
 	__fpsimd_sync_from_fpsimd_zeropad(task, sve_vq_from_vl(vl));
 }
 
+/* PR_SVE_SET_VL */
+int sve_set_task_vl(struct task_struct *task,
+		    unsigned long vector_length, unsigned long flags)
+{
+	return -EINVAL;
+}
+
+/* PR_SVE_GET_VL */
+int sve_get_task_vl(struct task_struct *task)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_ARM64_SVE */
 
 #ifdef CONFIG_KERNEL_MODE_NEON
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..e32e2da 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* arm64 Scalable Vector Extension controls */
+#define PR_SVE_SET_VL			48	/* set task vector length */
+#define PR_SVE_GET_VL			49	/* get task vector length */
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 842914e..5f50385 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -103,6 +103,12 @@
 #ifndef SET_FP_MODE
 # define SET_FP_MODE(a,b)	(-EINVAL)
 #endif
+#ifndef SVE_SET_VL
+# define SVE_SET_VL(a,b,c)	(-EINVAL)
+#endif
+#ifndef SVE_GET_VL
+# define SVE_GET_VL(a)		(-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2261,6 +2267,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+	case PR_SVE_SET_VL:
+		error = SVE_SET_VL(me, arg2, arg3);
+		break;
+	case PR_SVE_GET_VL:
+		error = SVE_GET_VL(me);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 01/10] prctl: Add skeleton for PR_SVE_{SET, GET}_VL controls
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds a do-nothing skeleton for the arm64 Scalable Vector
Extension control prctls.

These prctls are only avilable with

    CONFIG_ARM64=y
    CONFIG_ARM64_SVE=y

Otherwise they will compile out and return -EINVAL if attempted
from userspace.

The backend functions sve_{set,get}_task_vl() will be fleshed out
with actual functionality in subsequent patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---

There will probably also be a call to discover the signal frame size
(MINSIGSTKSZ equivalent) for the current configuration, something like

 * PR_GET_MINSIGSTKSZ

This series doesn't add this yet: only the PR_SVE_{SET,GET}_VL prctls
are provided.


 arch/arm64/include/asm/fpsimd.h    | 19 +++++++++++++++++++
 arch/arm64/include/asm/processor.h |  4 ++++
 arch/arm64/kernel/fpsimd.c         | 13 +++++++++++++
 include/uapi/linux/prctl.h         |  4 ++++
 kernel/sys.c                       | 12 ++++++++++++
 5 files changed, 52 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 88bcf69..e13259e 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -17,6 +17,7 @@
 #define __ASM_FP_H
 
 #include <asm/ptrace.h>
+#include <asm/errno.h>
 
 #ifndef __ASSEMBLY__
 
@@ -110,12 +111,30 @@ extern unsigned int sve_get_vl(void);
 extern void fpsimd_sync_to_sve(struct task_struct *task);
 
 #ifdef CONFIG_ARM64_SVE
+
 extern void fpsimd_sync_to_fpsimd(struct task_struct *task);
 extern void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task);
+extern int sve_set_task_vl(struct task_struct *task,
+			   unsigned long vector_length, unsigned long flags);
+extern int sve_get_task_vl(struct task_struct *task);
+
 #else /* !CONFIG_ARM64_SVE */
+
 static void __maybe_unused fpsimd_sync_to_fpsimd(struct task_struct *task) { }
 static void __maybe_unused fpsimd_sync_from_fpsimd_zeropad(
 	struct task_struct *task) { }
+
+static int __maybe_unused sve_set_task_vl(struct task_struct *task,
+	unsigned long vector_length, unsigned long flags)
+{
+	return -EINVAL;
+}
+
+static int __maybe_unused sve_get_task_vl(struct task_struct *task)
+{
+	return -EINVAL;
+}
+
 #endif /* !CONFIG_ARM64_SVE */
 
 #endif
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 747c65a..6f0f300 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -190,4 +190,8 @@ int cpu_enable_pan(void *__unused);
 int cpu_enable_uao(void *__unused);
 int cpu_enable_cache_maint_trap(void *__unused);
 
+#define SVE_SET_VL(task, vector_length, flags) \
+	sve_set_task_vl(task, vector_length, flags)
+#define SVE_GET_VL(task) sve_get_task_vl(task)
+
 #endif /* __ASM_PROCESSOR_H */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 2c113e4..bdacfcd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -468,6 +468,19 @@ void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task)
 	__fpsimd_sync_from_fpsimd_zeropad(task, sve_vq_from_vl(vl));
 }
 
+/* PR_SVE_SET_VL */
+int sve_set_task_vl(struct task_struct *task,
+		    unsigned long vector_length, unsigned long flags)
+{
+	return -EINVAL;
+}
+
+/* PR_SVE_GET_VL */
+int sve_get_task_vl(struct task_struct *task)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_ARM64_SVE */
 
 #ifdef CONFIG_KERNEL_MODE_NEON
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..e32e2da 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* arm64 Scalable Vector Extension controls */
+#define PR_SVE_SET_VL			48	/* set task vector length */
+#define PR_SVE_GET_VL			49	/* get task vector length */
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 842914e..5f50385 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -103,6 +103,12 @@
 #ifndef SET_FP_MODE
 # define SET_FP_MODE(a,b)	(-EINVAL)
 #endif
+#ifndef SVE_SET_VL
+# define SVE_SET_VL(a,b,c)	(-EINVAL)
+#endif
+#ifndef SVE_GET_VL
+# define SVE_GET_VL(a)		(-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2261,6 +2267,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+	case PR_SVE_SET_VL:
+		error = SVE_SET_VL(me, arg2, arg3);
+		break;
+	case PR_SVE_GET_VL:
+		error = SVE_GET_VL(me);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 02/10] arm64/sve: Track vector length for each task
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

In preparation for allowing each task to have its own independent
vector length, this patch adds a sve_vl field to thread_struct to
track it, and interrogates this instead of interrogating the
hardware when knowledge of the task's vector length is needed.

The hardware supported vector length is not known straight out of
boot, so init_task and other kernel tasks forked early may lack
this knowledge.

We only need this knowledge when in the context of a user task that
has SVE state (or that has just trapped while attempting to have
SVE state).  So, we can hook into exec() to set task vector length
if it wasn't known at boot/fork time, before the task enters
userspace.

There is no way to change sve_vl for a task yet, so all tasks still
execute with the hardware vector length.  Subsequent patches will
enable changing the vector length for tasks.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/kernel/fpsimd.c         | 44 ++++++++++++++++++++++++++------------
 arch/arm64/kernel/ptrace.c         |  4 ++--
 arch/arm64/kernel/signal.c         | 15 +++++++++----
 4 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 6f0f300..96eada9 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -83,6 +83,7 @@ struct thread_struct {
 	unsigned long		tp2_value;
 #endif
 	struct fpsimd_state	fpsimd_state;
+	u16			sve_vl;		/* SVE vector length */
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index bdacfcd..6065707 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -99,6 +99,9 @@ void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
 	WARN_ON(1);
 }
 
+/* Maximum supported vector length across all CPUs (initially poisoned) */
+int sve_max_vl = -1;
+
 #ifdef CONFIG_ARM64_SVE
 
 static void task_fpsimd_to_sve(struct task_struct *task);
@@ -156,9 +159,9 @@ void *__task_sve_state(struct task_struct *task)
 
 static void *__task_pffr(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 
-	BUG_ON(vl % 16);
+	BUG_ON(!sve_vl_valid(vl));
 	return (char *)__task_sve_state(task) + 34 * vl;
 }
 
@@ -272,6 +275,18 @@ void fpsimd_flush_thread(void)
 		memset(__task_sve_state(current), 0,
 		       arch_task_struct_size -
 		       ((char *)__task_sve_state(current) - (char *)current));
+
+		/*
+		 * User tasks must have a valid vector length set, but tasks
+		 * forked early (e.g., init) may not have one yet.
+		 * By now, we will know what the hardware supports, so set the
+		 * task vector length if it doesn't have one:
+		 */
+		if (!current->thread.sve_vl) {
+			BUG_ON(!sve_vl_valid(sve_max_vl));
+
+			current->thread.sve_vl = sve_max_vl;
+		}
 	}
 
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
@@ -310,16 +325,15 @@ static void __task_sve_to_fpsimd(struct task_struct *task, unsigned int vq)
 
 static void task_sve_to_fpsimd(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 	unsigned int vq;
 
 	if (!(elf_hwcap & HWCAP_SVE))
 		return;
 
-	BUG_ON(vl % 16);
-	vq = vl / 16;
-	BUG_ON(vq < 1 || vq > 16);
+	BUG_ON(!sve_vl_valid(vl));
 
+	vq = sve_vq_from_vl(vl);
 	__task_sve_to_fpsimd(task, vq);
 }
 
@@ -373,16 +387,15 @@ static void __task_fpsimd_to_sve(struct task_struct *task, unsigned int vq)
 
 static void task_fpsimd_to_sve(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 	unsigned int vq;
 
 	if (!(elf_hwcap & HWCAP_SVE))
 		return;
 
-	BUG_ON(vl % 16);
-	vq = vl / 16;
-	BUG_ON(vq < 1 || vq > 16);
+	BUG_ON(!sve_vl_valid(vl));
 
+	vq = sve_vq_from_vl(vl);
 	__task_fpsimd_to_sve(task, vq);
 }
 
@@ -463,8 +476,9 @@ static void __fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task,
 
 void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 
+	BUG_ON(!sve_vl_valid(vl));
 	__fpsimd_sync_from_fpsimd_zeropad(task, sve_vq_from_vl(vl));
 }
 
@@ -606,11 +620,13 @@ void __init fpsimd_init_task_struct_size(void)
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
 	    ((read_cpuid(ID_AA64PFR0_EL1) >> ID_AA64PFR0_SVE_SHIFT)
 	     & 0xf) == 1) {
-		arch_task_struct_size = sizeof(struct task_struct) +
-			35 * sve_get_vl();
+		/* FIXME: This should be the minimum across all CPUs */
+		sve_max_vl = sve_get_vl();
 
+		arch_task_struct_size = sizeof(struct task_struct) +
+			35 * sve_max_vl;
 		pr_info("SVE: enabled with maximum %u bits per vector\n",
-			sve_get_vl() * 8);
+			sve_max_vl * 8);
 	}
 }
 
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index fb8cef63..32debb8 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -736,7 +736,7 @@ static int sve_get(struct task_struct *target,
 	/* Header */
 	memset(&header, 0, sizeof(header));
 
-	header.vl = sve_get_vl();
+	header.vl = target->thread.sve_vl;
 
 	BUG_ON(!sve_vl_valid(header.vl));
 	vq = sve_vq_from_vl(header.vl);
@@ -830,7 +830,7 @@ static int sve_set(struct task_struct *target,
 	if (ret)
 		goto out;
 
-	if (header.vl != sve_get_vl())
+	if (header.vl != target->thread.sve_vl)
 		return -EINVAL;
 
 	BUG_ON(!sve_vl_valid(header.vl));
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 9ef9953..7615c7d 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -222,8 +222,11 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 {
 	int err = 0;
 	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
-	unsigned int vl = sve_get_vl();
-	unsigned int vq = sve_vq_from_vl(vl);
+	unsigned int vl = current->thread.sve_vl;
+	unsigned int vq;
+
+	BUG_ON(!sve_vl_valid(vl));
+	vq = sve_vq_from_vl(vl);
 
 	memset(reserved, 0, sizeof(reserved));
 
@@ -253,7 +256,7 @@ static int __restore_sve_fpsimd_context(struct user_ctxs *user,
 		__task_sve_state(current);
 	struct fpsimd_state fpsimd;
 
-	if (vl != sve_get_vl())
+	if (vl != current->thread.sve_vl)
 		return -EINVAL;
 
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
@@ -543,7 +546,11 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
 	}
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
-		unsigned int vq = sve_vq_from_vl(sve_get_vl());
+		unsigned int vl = current->thread.sve_vl;
+		unsigned int vq;
+
+		BUG_ON(!sve_vl_valid(vl));
+		vq = sve_vq_from_vl(vl);
 
 		BUG_ON(!(elf_hwcap & HWCAP_SVE));
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 02/10] arm64/sve: Track vector length for each task
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

In preparation for allowing each task to have its own independent
vector length, this patch adds a sve_vl field to thread_struct to
track it, and interrogates this instead of interrogating the
hardware when knowledge of the task's vector length is needed.

The hardware supported vector length is not known straight out of
boot, so init_task and other kernel tasks forked early may lack
this knowledge.

We only need this knowledge when in the context of a user task that
has SVE state (or that has just trapped while attempting to have
SVE state).  So, we can hook into exec() to set task vector length
if it wasn't known at boot/fork time, before the task enters
userspace.

There is no way to change sve_vl for a task yet, so all tasks still
execute with the hardware vector length.  Subsequent patches will
enable changing the vector length for tasks.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/kernel/fpsimd.c         | 44 ++++++++++++++++++++++++++------------
 arch/arm64/kernel/ptrace.c         |  4 ++--
 arch/arm64/kernel/signal.c         | 15 +++++++++----
 4 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 6f0f300..96eada9 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -83,6 +83,7 @@ struct thread_struct {
 	unsigned long		tp2_value;
 #endif
 	struct fpsimd_state	fpsimd_state;
+	u16			sve_vl;		/* SVE vector length */
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index bdacfcd..6065707 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -99,6 +99,9 @@ void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
 	WARN_ON(1);
 }
 
+/* Maximum supported vector length across all CPUs (initially poisoned) */
+int sve_max_vl = -1;
+
 #ifdef CONFIG_ARM64_SVE
 
 static void task_fpsimd_to_sve(struct task_struct *task);
@@ -156,9 +159,9 @@ void *__task_sve_state(struct task_struct *task)
 
 static void *__task_pffr(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 
-	BUG_ON(vl % 16);
+	BUG_ON(!sve_vl_valid(vl));
 	return (char *)__task_sve_state(task) + 34 * vl;
 }
 
@@ -272,6 +275,18 @@ void fpsimd_flush_thread(void)
 		memset(__task_sve_state(current), 0,
 		       arch_task_struct_size -
 		       ((char *)__task_sve_state(current) - (char *)current));
+
+		/*
+		 * User tasks must have a valid vector length set, but tasks
+		 * forked early (e.g., init) may not have one yet.
+		 * By now, we will know what the hardware supports, so set the
+		 * task vector length if it doesn't have one:
+		 */
+		if (!current->thread.sve_vl) {
+			BUG_ON(!sve_vl_valid(sve_max_vl));
+
+			current->thread.sve_vl = sve_max_vl;
+		}
 	}
 
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
@@ -310,16 +325,15 @@ static void __task_sve_to_fpsimd(struct task_struct *task, unsigned int vq)
 
 static void task_sve_to_fpsimd(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 	unsigned int vq;
 
 	if (!(elf_hwcap & HWCAP_SVE))
 		return;
 
-	BUG_ON(vl % 16);
-	vq = vl / 16;
-	BUG_ON(vq < 1 || vq > 16);
+	BUG_ON(!sve_vl_valid(vl));
 
+	vq = sve_vq_from_vl(vl);
 	__task_sve_to_fpsimd(task, vq);
 }
 
@@ -373,16 +387,15 @@ static void __task_fpsimd_to_sve(struct task_struct *task, unsigned int vq)
 
 static void task_fpsimd_to_sve(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 	unsigned int vq;
 
 	if (!(elf_hwcap & HWCAP_SVE))
 		return;
 
-	BUG_ON(vl % 16);
-	vq = vl / 16;
-	BUG_ON(vq < 1 || vq > 16);
+	BUG_ON(!sve_vl_valid(vl));
 
+	vq = sve_vq_from_vl(vl);
 	__task_fpsimd_to_sve(task, vq);
 }
 
@@ -463,8 +476,9 @@ static void __fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task,
 
 void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task)
 {
-	unsigned int vl = sve_get_vl();
+	unsigned int vl = task->thread.sve_vl;
 
+	BUG_ON(!sve_vl_valid(vl));
 	__fpsimd_sync_from_fpsimd_zeropad(task, sve_vq_from_vl(vl));
 }
 
@@ -606,11 +620,13 @@ void __init fpsimd_init_task_struct_size(void)
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
 	    ((read_cpuid(ID_AA64PFR0_EL1) >> ID_AA64PFR0_SVE_SHIFT)
 	     & 0xf) == 1) {
-		arch_task_struct_size = sizeof(struct task_struct) +
-			35 * sve_get_vl();
+		/* FIXME: This should be the minimum across all CPUs */
+		sve_max_vl = sve_get_vl();
 
+		arch_task_struct_size = sizeof(struct task_struct) +
+			35 * sve_max_vl;
 		pr_info("SVE: enabled with maximum %u bits per vector\n",
-			sve_get_vl() * 8);
+			sve_max_vl * 8);
 	}
 }
 
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index fb8cef63..32debb8 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -736,7 +736,7 @@ static int sve_get(struct task_struct *target,
 	/* Header */
 	memset(&header, 0, sizeof(header));
 
-	header.vl = sve_get_vl();
+	header.vl = target->thread.sve_vl;
 
 	BUG_ON(!sve_vl_valid(header.vl));
 	vq = sve_vq_from_vl(header.vl);
@@ -830,7 +830,7 @@ static int sve_set(struct task_struct *target,
 	if (ret)
 		goto out;
 
-	if (header.vl != sve_get_vl())
+	if (header.vl != target->thread.sve_vl)
 		return -EINVAL;
 
 	BUG_ON(!sve_vl_valid(header.vl));
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 9ef9953..7615c7d 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -222,8 +222,11 @@ static int preserve_sve_context(struct sve_context __user *ctx)
 {
 	int err = 0;
 	u16 reserved[ARRAY_SIZE(ctx->__reserved)];
-	unsigned int vl = sve_get_vl();
-	unsigned int vq = sve_vq_from_vl(vl);
+	unsigned int vl = current->thread.sve_vl;
+	unsigned int vq;
+
+	BUG_ON(!sve_vl_valid(vl));
+	vq = sve_vq_from_vl(vl);
 
 	memset(reserved, 0, sizeof(reserved));
 
@@ -253,7 +256,7 @@ static int __restore_sve_fpsimd_context(struct user_ctxs *user,
 		__task_sve_state(current);
 	struct fpsimd_state fpsimd;
 
-	if (vl != sve_get_vl())
+	if (vl != current->thread.sve_vl)
 		return -EINVAL;
 
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
@@ -543,7 +546,11 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user)
 	}
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && test_thread_flag(TIF_SVE)) {
-		unsigned int vq = sve_vq_from_vl(sve_get_vl());
+		unsigned int vl = current->thread.sve_vl;
+		unsigned int vq;
+
+		BUG_ON(!sve_vl_valid(vl));
+		vq = sve_vq_from_vl(vl);
 
 		BUG_ON(!(elf_hwcap & HWCAP_SVE));
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 03/10] arm64/sve: Set CPU vector length to match current task
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

This patch adds the necessary code to configure the CPU for the
task's vector length, whenever SVE state is loaded for a task.

No special action is needed on sched-out: only user tasks with SVE
state care what the CPU's VL is set to.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h       |  3 ++-
 arch/arm64/include/asm/fpsimdmacros.h |  7 ++++++-
 arch/arm64/kernel/entry-fpsimd.S      |  2 +-
 arch/arm64/kernel/fpsimd.c            | 10 +++++++---
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index e13259e..22d09c0 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -100,7 +100,8 @@ extern void __init fpsimd_init_task_struct_size(void);
 
 extern void *__task_sve_state(struct task_struct *task);
 extern void sve_save_state(void *state, u32 *pfpsr);
-extern void sve_load_state(void const *state, u32 const *pfpsr);
+extern void sve_load_state(void const *state, u32 const *pfpsr,
+			   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
 /*
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index e2bb032..7b53bc1 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -254,7 +254,12 @@
 	.purgem savep
 .endm
 
-.macro sve_load nb, xpfpsr, ntmp
+.macro sve_load nb, xpfpsr, xvqminus1 ntmp
+	mrs_s	x\ntmp, ZCR_EL1
+	bic	x\ntmp, x\ntmp, ZCR_EL1_LEN_MASK
+	orr	x\ntmp, x\ntmp, \xvqminus1
+	msr_s	ZCR_EL1, x\ntmp	// self-synchronising
+
 	.macro loadz n
 		_zldrv	\n, \nb, (\n) - 34
 	.endm
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 5dcec55..0a3fdcb 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -73,7 +73,7 @@ ENTRY(sve_save_state)
 ENDPROC(sve_save_state)
 
 ENTRY(sve_load_state)
-	sve_load 0, x1, 2
+	sve_load 0, x1, x2, 3
 	ret
 ENDPROC(sve_load_state)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 6065707..ab2bb62 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -175,10 +175,14 @@ extern void *__task_pffr(struct task_struct *task);
 static void task_fpsimd_load(struct task_struct *task)
 {
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
-	    test_tsk_thread_flag(task, TIF_SVE))
+	    test_tsk_thread_flag(task, TIF_SVE)) {
+		unsigned int vl = task->thread.sve_vl;
+
+		BUG_ON(!sve_vl_valid(vl));
 		sve_load_state(__task_pffr(task),
-			       &task->thread.fpsimd_state.fpsr);
-	else
+			       &task->thread.fpsimd_state.fpsr,
+			       sve_vq_from_vl(vl) - 1);
+	} else
 		fpsimd_load_state(&task->thread.fpsimd_state);
 
 	/*
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 03/10] arm64/sve: Set CPU vector length to match current task
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds the necessary code to configure the CPU for the
task's vector length, whenever SVE state is loaded for a task.

No special action is needed on sched-out: only user tasks with SVE
state care what the CPU's VL is set to.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h       |  3 ++-
 arch/arm64/include/asm/fpsimdmacros.h |  7 ++++++-
 arch/arm64/kernel/entry-fpsimd.S      |  2 +-
 arch/arm64/kernel/fpsimd.c            | 10 +++++++---
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index e13259e..22d09c0 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -100,7 +100,8 @@ extern void __init fpsimd_init_task_struct_size(void);
 
 extern void *__task_sve_state(struct task_struct *task);
 extern void sve_save_state(void *state, u32 *pfpsr);
-extern void sve_load_state(void const *state, u32 const *pfpsr);
+extern void sve_load_state(void const *state, u32 const *pfpsr,
+			   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
 
 /*
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index e2bb032..7b53bc1 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -254,7 +254,12 @@
 	.purgem savep
 .endm
 
-.macro sve_load nb, xpfpsr, ntmp
+.macro sve_load nb, xpfpsr, xvqminus1 ntmp
+	mrs_s	x\ntmp, ZCR_EL1
+	bic	x\ntmp, x\ntmp, ZCR_EL1_LEN_MASK
+	orr	x\ntmp, x\ntmp, \xvqminus1
+	msr_s	ZCR_EL1, x\ntmp	// self-synchronising
+
 	.macro loadz n
 		_zldrv	\n, \nb, (\n) - 34
 	.endm
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 5dcec55..0a3fdcb 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -73,7 +73,7 @@ ENTRY(sve_save_state)
 ENDPROC(sve_save_state)
 
 ENTRY(sve_load_state)
-	sve_load 0, x1, 2
+	sve_load 0, x1, x2, 3
 	ret
 ENDPROC(sve_load_state)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 6065707..ab2bb62 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -175,10 +175,14 @@ extern void *__task_pffr(struct task_struct *task);
 static void task_fpsimd_load(struct task_struct *task)
 {
 	if (IS_ENABLED(CONFIG_ARM64_SVE) &&
-	    test_tsk_thread_flag(task, TIF_SVE))
+	    test_tsk_thread_flag(task, TIF_SVE)) {
+		unsigned int vl = task->thread.sve_vl;
+
+		BUG_ON(!sve_vl_valid(vl));
 		sve_load_state(__task_pffr(task),
-			       &task->thread.fpsimd_state.fpsr);
-	else
+			       &task->thread.fpsimd_state.fpsr,
+			       sve_vq_from_vl(vl) - 1);
+	} else
 		fpsimd_load_state(&task->thread.fpsimd_state);
 
 	/*
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 04/10] arm64/sve: Factor out clearing of tasks' SVE regs
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Yao Qi, Szabolcs Nagy, Joseph Myers,
	Alan Hayward, Torvald Riegel, Christoffer Dall

The patch factors out the code that clears a task's SVE regs on
exec(), so that we can reuse it in subsequent patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ab2bb62..54d7ed0 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -260,6 +260,19 @@ void fpsimd_thread_switch(struct task_struct *next)
 	}
 }
 
+static void clear_sve_regs(struct task_struct *task)
+{
+	BUG_ON(task == current && preemptible());
+
+	BUG_ON((char *)__task_sve_state(task) < (char *)task);
+	BUG_ON(arch_task_struct_size <
+	       ((char *)__task_sve_state(task) - (char *)task));
+
+	memset(__task_sve_state(task), 0,
+	       arch_task_struct_size -
+	       ((char *)__task_sve_state(task) - (char *)task));
+}
+
 void fpsimd_flush_thread(void)
 {
 	if (!system_supports_fpsimd())
@@ -272,13 +285,7 @@ void fpsimd_flush_thread(void)
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && (elf_hwcap & HWCAP_SVE)) {
-		BUG_ON((char *)__task_sve_state(current) < (char *)current);
-		BUG_ON(arch_task_struct_size <
-		       ((char *)__task_sve_state(current) - (char *)current));
-
-		memset(__task_sve_state(current), 0,
-		       arch_task_struct_size -
-		       ((char *)__task_sve_state(current) - (char *)current));
+		clear_sve_regs(current);
 
 		/*
 		 * User tasks must have a valid vector length set, but tasks
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 04/10] arm64/sve: Factor out clearing of tasks' SVE regs
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

The patch factors out the code that clears a task's SVE regs on
exec(), so that we can reuse it in subsequent patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ab2bb62..54d7ed0 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -260,6 +260,19 @@ void fpsimd_thread_switch(struct task_struct *next)
 	}
 }
 
+static void clear_sve_regs(struct task_struct *task)
+{
+	BUG_ON(task == current && preemptible());
+
+	BUG_ON((char *)__task_sve_state(task) < (char *)task);
+	BUG_ON(arch_task_struct_size <
+	       ((char *)__task_sve_state(task) - (char *)task));
+
+	memset(__task_sve_state(task), 0,
+	       arch_task_struct_size -
+	       ((char *)__task_sve_state(task) - (char *)task));
+}
+
 void fpsimd_flush_thread(void)
 {
 	if (!system_supports_fpsimd())
@@ -272,13 +285,7 @@ void fpsimd_flush_thread(void)
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && (elf_hwcap & HWCAP_SVE)) {
-		BUG_ON((char *)__task_sve_state(current) < (char *)current);
-		BUG_ON(arch_task_struct_size <
-		       ((char *)__task_sve_state(current) - (char *)current));
-
-		memset(__task_sve_state(current), 0,
-		       arch_task_struct_size -
-		       ((char *)__task_sve_state(current) - (char *)current));
+		clear_sve_regs(current);
 
 		/*
 		 * User tasks must have a valid vector length set, but tasks
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 04/10] arm64/sve: Factor out clearing of tasks' SVE regs
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

The patch factors out the code that clears a task's SVE regs on
exec(), so that we can reuse it in subsequent patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ab2bb62..54d7ed0 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -260,6 +260,19 @@ void fpsimd_thread_switch(struct task_struct *next)
 	}
 }
 
+static void clear_sve_regs(struct task_struct *task)
+{
+	BUG_ON(task == current && preemptible());
+
+	BUG_ON((char *)__task_sve_state(task) < (char *)task);
+	BUG_ON(arch_task_struct_size <
+	       ((char *)__task_sve_state(task) - (char *)task));
+
+	memset(__task_sve_state(task), 0,
+	       arch_task_struct_size -
+	       ((char *)__task_sve_state(task) - (char *)task));
+}
+
 void fpsimd_flush_thread(void)
 {
 	if (!system_supports_fpsimd())
@@ -272,13 +285,7 @@ void fpsimd_flush_thread(void)
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 
 	if (IS_ENABLED(CONFIG_ARM64_SVE) && (elf_hwcap & HWCAP_SVE)) {
-		BUG_ON((char *)__task_sve_state(current) < (char *)current);
-		BUG_ON(arch_task_struct_size <
-		       ((char *)__task_sve_state(current) - (char *)current));
-
-		memset(__task_sve_state(current), 0,
-		       arch_task_struct_size -
-		       ((char *)__task_sve_state(current) - (char *)current));
+		clear_sve_regs(current);
 
 		/*
 		 * User tasks must have a valid vector length set, but tasks
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 05/10] arm64/sve: Wire up vector length control prctl() calls
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

This patch provides implementation for the PR_SVE_SET_VL and
PR_SVE_GET_VL prctls, which allow a task to set and query its
vector length and associated control flags (although no flags are
defined in this patch).

Currently any thread can set its VL, allowing a mix of VLs within
a single process -- this behaviour will be refined in susequent
patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h |  2 ++
 arch/arm64/kernel/fpsimd.c      | 65 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 22d09c0..1ec2363 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -103,6 +103,8 @@ extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
+extern int sve_set_vector_length(struct task_struct *task,
+				 unsigned long vl, unsigned long flags);
 
 /*
  * FPSIMD/SVE synchronisation helpers for ptrace:
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 54d7ed0..5f2c24a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -493,17 +493,78 @@ void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task)
 	__fpsimd_sync_from_fpsimd_zeropad(task, sve_vq_from_vl(vl));
 }
 
+int sve_set_vector_length(struct task_struct *task,
+			  unsigned long vl, unsigned long flags)
+{
+	BUG_ON(task == current && preemptible());
+
+	if (flags)
+		return -EINVAL; /* No flags defined yet */
+
+	if (!sve_vl_valid(vl))
+		return -EINVAL;
+
+	if (vl > sve_max_vl) {
+		BUG_ON(!sve_vl_valid(sve_max_vl));
+		vl = sve_max_vl;
+	}
+
+	/*
+	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
+	 * write any live register state back to task_struct, and convert to a
+	 * non-SVE thread.
+	 */
+	if (vl != task->thread.sve_vl) {
+		if (task == current &&
+		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+			task_fpsimd_save(current);
+
+		if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+			task_sve_to_fpsimd(task);
+
+		/*
+		 * To avoid surprises, also zero out the SVE regs storage.
+		 * This means that the P-regs, FFR and high bits of Z-regs
+		 * will read as zero on next access:
+		 */
+		clear_sve_regs(task);
+	}
+
+	task->thread.sve_vl = vl;
+
+	fpsimd_flush_task_state(task);
+
+	return 0;
+}
+
 /* PR_SVE_SET_VL */
 int sve_set_task_vl(struct task_struct *task,
 		    unsigned long vector_length, unsigned long flags)
 {
-	return -EINVAL;
+	int ret;
+
+	if (!(elf_hwcap & HWCAP_SVE))
+		return -EINVAL;
+
+	BUG_ON(task != current);
+
+	preempt_disable();
+	ret = sve_set_vector_length(current, vector_length, flags);
+	preempt_enable();
+
+	if (ret)
+		return ret;
+
+	return task->thread.sve_vl;
 }
 
 /* PR_SVE_GET_VL */
 int sve_get_task_vl(struct task_struct *task)
 {
-	return -EINVAL;
+	if (!(elf_hwcap & HWCAP_SVE))
+		return -EINVAL;
+
+	return task->thread.sve_vl;
 }
 
 #endif /* CONFIG_ARM64_SVE */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 05/10] arm64/sve: Wire up vector length control prctl() calls
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

This patch provides implementation for the PR_SVE_SET_VL and
PR_SVE_GET_VL prctls, which allow a task to set and query its
vector length and associated control flags (although no flags are
defined in this patch).

Currently any thread can set its VL, allowing a mix of VLs within
a single process -- this behaviour will be refined in susequent
patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h |  2 ++
 arch/arm64/kernel/fpsimd.c      | 65 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 22d09c0..1ec2363 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -103,6 +103,8 @@ extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
+extern int sve_set_vector_length(struct task_struct *task,
+				 unsigned long vl, unsigned long flags);
 
 /*
  * FPSIMD/SVE synchronisation helpers for ptrace:
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 54d7ed0..5f2c24a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -493,17 +493,78 @@ void fpsimd_sync_from_fpsimd_zeropad(struct task_struct *task)
 	__fpsimd_sync_from_fpsimd_zeropad(task, sve_vq_from_vl(vl));
 }
 
+int sve_set_vector_length(struct task_struct *task,
+			  unsigned long vl, unsigned long flags)
+{
+	BUG_ON(task == current && preemptible());
+
+	if (flags)
+		return -EINVAL; /* No flags defined yet */
+
+	if (!sve_vl_valid(vl))
+		return -EINVAL;
+
+	if (vl > sve_max_vl) {
+		BUG_ON(!sve_vl_valid(sve_max_vl));
+		vl = sve_max_vl;
+	}
+
+	/*
+	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
+	 * write any live register state back to task_struct, and convert to a
+	 * non-SVE thread.
+	 */
+	if (vl != task->thread.sve_vl) {
+		if (task == current &&
+		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+			task_fpsimd_save(current);
+
+		if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+			task_sve_to_fpsimd(task);
+
+		/*
+		 * To avoid surprises, also zero out the SVE regs storage.
+		 * This means that the P-regs, FFR and high bits of Z-regs
+		 * will read as zero on next access:
+		 */
+		clear_sve_regs(task);
+	}
+
+	task->thread.sve_vl = vl;
+
+	fpsimd_flush_task_state(task);
+
+	return 0;
+}
+
 /* PR_SVE_SET_VL */
 int sve_set_task_vl(struct task_struct *task,
 		    unsigned long vector_length, unsigned long flags)
 {
-	return -EINVAL;
+	int ret;
+
+	if (!(elf_hwcap & HWCAP_SVE))
+		return -EINVAL;
+
+	BUG_ON(task != current);
+
+	preempt_disable();
+	ret = sve_set_vector_length(current, vector_length, flags);
+	preempt_enable();
+
+	if (ret)
+		return ret;
+
+	return task->thread.sve_vl;
 }
 
 /* PR_SVE_GET_VL */
 int sve_get_task_vl(struct task_struct *task)
 {
-	return -EINVAL;
+	if (!(elf_hwcap & HWCAP_SVE))
+		return -EINVAL;
+
+	return task->thread.sve_vl;
 }
 
 #endif /* CONFIG_ARM64_SVE */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 06/10] arm64/sve: Disallow VL setting for individual threads by default
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

General-purpose code in userspace is not expected to work correctly
if multiple threads are allowed to run concurrently with different
vector lengths in a single process.

This patch adds an explicit flag PR_SVE_SET_VL_THREAD to request
this behaviour.  Without the flag, vector length setting is
permitted only for a single-threaded process (which matches the
expected usage model of setting the vector length at process
startup).

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 12 +++++++++++-
 include/uapi/linux/prctl.h |  1 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 5f2c24a..32caca3 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -21,6 +21,7 @@
 #include <linux/cpu_pm.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/prctl.h>
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/hardirq.h>
@@ -498,8 +499,17 @@ int sve_set_vector_length(struct task_struct *task,
 {
 	BUG_ON(task == current && preemptible());
 
+	/*
+	 * To avoid accidents, forbid setting for individual threads of a
+	 * multithreaded process.  User code that knows what it's doing can
+	 * pass PR_SVE_SET_VL_THREAD to override this restriction:
+	 */
+	if (!(flags & PR_SVE_SET_VL_THREAD) && get_nr_threads(task) != 1)
+		return -EINVAL;
+
+	flags &= ~(unsigned long)PR_SVE_SET_VL_THREAD;
 	if (flags)
-		return -EINVAL; /* No flags defined yet */
+		return -EINVAL; /* No other flags defined yet */
 
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index e32e2da..c55530b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -199,6 +199,7 @@ struct prctl_mm_map {
 
 /* arm64 Scalable Vector Extension controls */
 #define PR_SVE_SET_VL			48	/* set task vector length */
+# define PR_SVE_SET_VL_THREAD		(1 << 1) /* set just this thread */
 #define PR_SVE_GET_VL			49	/* get task vector length */
 
 #endif /* _LINUX_PRCTL_H */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 06/10] arm64/sve: Disallow VL setting for individual threads by default
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

General-purpose code in userspace is not expected to work correctly
if multiple threads are allowed to run concurrently with different
vector lengths in a single process.

This patch adds an explicit flag PR_SVE_SET_VL_THREAD to request
this behaviour.  Without the flag, vector length setting is
permitted only for a single-threaded process (which matches the
expected usage model of setting the vector length at process
startup).

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 12 +++++++++++-
 include/uapi/linux/prctl.h |  1 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 5f2c24a..32caca3 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -21,6 +21,7 @@
 #include <linux/cpu_pm.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/prctl.h>
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/hardirq.h>
@@ -498,8 +499,17 @@ int sve_set_vector_length(struct task_struct *task,
 {
 	BUG_ON(task == current && preemptible());
 
+	/*
+	 * To avoid accidents, forbid setting for individual threads of a
+	 * multithreaded process.  User code that knows what it's doing can
+	 * pass PR_SVE_SET_VL_THREAD to override this restriction:
+	 */
+	if (!(flags & PR_SVE_SET_VL_THREAD) && get_nr_threads(task) != 1)
+		return -EINVAL;
+
+	flags &= ~(unsigned long)PR_SVE_SET_VL_THREAD;
 	if (flags)
-		return -EINVAL; /* No flags defined yet */
+		return -EINVAL; /* No other flags defined yet */
 
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index e32e2da..c55530b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -199,6 +199,7 @@ struct prctl_mm_map {
 
 /* arm64 Scalable Vector Extension controls */
 #define PR_SVE_SET_VL			48	/* set task vector length */
+# define PR_SVE_SET_VL_THREAD		(1 << 1) /* set just this thread */
 #define PR_SVE_GET_VL			49	/* get task vector length */
 
 #endif /* _LINUX_PRCTL_H */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 07/10] arm64/sve: Add vector length inheritance control
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

Currently the vector length is inherited across both fork() and
exec().

Inheritance across fork() is desirable both for creating a copy of
a process (traditional fork) or creating a thread (where we want
all threads to share the same VL by default).

Inheritance across exec() is less desirable, because of the ABI
impact of large vector lengths on the size of the signal frame --
when running a new binary, there is no guarantee that the new
binary is compatible with these ABI changes.

This flag makes the vector length non-inherited by default.
Instead, the vector length is reset to a system default value,
unless the THREAD_VL_INHERIT flag has been set for the thread.

THREAD_VL_INHERIT is currently sticky: i.e., if set, it gets
inherited too.  This behaviour may be refined in future if it is
not flexible enough.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/processor.h |  7 +++++++
 arch/arm64/kernel/fpsimd.c         | 33 ++++++++++++++++++++++++++-------
 include/uapi/linux/prctl.h         |  5 +++++
 3 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 96eada9..45ef11d 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -27,6 +27,7 @@
 
 #ifdef __KERNEL__
 
+#include <linux/prctl.h>
 #include <linux/string.h>
 
 #include <asm/alternative.h>
@@ -84,11 +85,17 @@ struct thread_struct {
 #endif
 	struct fpsimd_state	fpsimd_state;
 	u16			sve_vl;		/* SVE vector length */
+	u16			sve_flags;	/* SVE related flags */
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
 };
 
+/* Flags for sve_flags (intentionally defined to match the prctl flags) */
+
+/* Inherit sve_vl and sve_flags across execve(): */
+#define THREAD_VL_INHERIT	PR_SVE_SET_VL_INHERIT
+
 #ifdef CONFIG_COMPAT
 #define task_user_tls(t)						\
 ({									\
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 32caca3..f010a1c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -292,12 +292,15 @@ void fpsimd_flush_thread(void)
 		 * User tasks must have a valid vector length set, but tasks
 		 * forked early (e.g., init) may not have one yet.
 		 * By now, we will know what the hardware supports, so set the
-		 * task vector length if it doesn't have one:
+		 * task vector length to default if doesn't have one, or if
+		 * the thread wasn't configured to inherit SVE configuration:
 		 */
-		if (!current->thread.sve_vl) {
+		if (!current->thread.sve_vl ||
+		    !(current->thread.sve_flags & THREAD_VL_INHERIT)) {
 			BUG_ON(!sve_vl_valid(sve_max_vl));
 
 			current->thread.sve_vl = sve_max_vl;
+			current->thread.sve_flags = 0;
 		}
 	}
 
@@ -506,10 +509,10 @@ int sve_set_vector_length(struct task_struct *task,
 	 */
 	if (!(flags & PR_SVE_SET_VL_THREAD) && get_nr_threads(task) != 1)
 		return -EINVAL;
-
 	flags &= ~(unsigned long)PR_SVE_SET_VL_THREAD;
-	if (flags)
-		return -EINVAL; /* No other flags defined yet */
+
+	if (flags & ~(unsigned long)PR_SVE_SET_VL_INHERIT)
+		return -EINVAL;
 
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
@@ -542,11 +545,27 @@ int sve_set_vector_length(struct task_struct *task,
 
 	task->thread.sve_vl = vl;
 
+	/* The THREAD_VL_* flag encodings match the relevant PR_* flags: */
+	task->thread.sve_flags = flags;
+
 	fpsimd_flush_task_state(task);
 
 	return 0;
 }
 
+/*
+ * Encode the current vector length and flags for return.
+ * This is only required for prctl(): ptrace has separate fields
+ */
+static int sve_prctl_status(struct task_struct const *task)
+{
+	int ret = task->thread.sve_vl;
+
+	ret |= task->thread.sve_vl << 16;
+
+	return ret;
+}
+
 /* PR_SVE_SET_VL */
 int sve_set_task_vl(struct task_struct *task,
 		    unsigned long vector_length, unsigned long flags)
@@ -565,7 +584,7 @@ int sve_set_task_vl(struct task_struct *task,
 	if (ret)
 		return ret;
 
-	return task->thread.sve_vl;
+	return sve_prctl_status(task);
 }
 
 /* PR_SVE_GET_VL */
@@ -574,7 +593,7 @@ int sve_get_task_vl(struct task_struct *task)
 	if (!(elf_hwcap & HWCAP_SVE))
 		return -EINVAL;
 
-	return task->thread.sve_vl;
+	return sve_prctl_status(task);
 }
 
 #endif /* CONFIG_ARM64_SVE */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c55530b..cff6214 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -200,6 +200,11 @@ struct prctl_mm_map {
 /* arm64 Scalable Vector Extension controls */
 #define PR_SVE_SET_VL			48	/* set task vector length */
 # define PR_SVE_SET_VL_THREAD		(1 << 1) /* set just this thread */
+# define PR_SVE_SET_VL_INHERIT		(1 << 2) /* inherit across exec */
 #define PR_SVE_GET_VL			49	/* get task vector length */
+/* Decode helpers for the return value from PR_SVE_GET_VL: */
+# define PR_SVE_GET_VL_LEN(ret)		((ret) & 0x3fff) /* vector length */
+# define PR_SVE_GET_VL_INHERIT		(PR_SVE_SET_VL_INHERIT << 16)
+/* For conveinence, PR_SVE_SET_VL returns the result in the same encoding */
 
 #endif /* _LINUX_PRCTL_H */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 07/10] arm64/sve: Add vector length inheritance control
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

Currently the vector length is inherited across both fork() and
exec().

Inheritance across fork() is desirable both for creating a copy of
a process (traditional fork) or creating a thread (where we want
all threads to share the same VL by default).

Inheritance across exec() is less desirable, because of the ABI
impact of large vector lengths on the size of the signal frame --
when running a new binary, there is no guarantee that the new
binary is compatible with these ABI changes.

This flag makes the vector length non-inherited by default.
Instead, the vector length is reset to a system default value,
unless the THREAD_VL_INHERIT flag has been set for the thread.

THREAD_VL_INHERIT is currently sticky: i.e., if set, it gets
inherited too.  This behaviour may be refined in future if it is
not flexible enough.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/processor.h |  7 +++++++
 arch/arm64/kernel/fpsimd.c         | 33 ++++++++++++++++++++++++++-------
 include/uapi/linux/prctl.h         |  5 +++++
 3 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 96eada9..45ef11d 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -27,6 +27,7 @@
 
 #ifdef __KERNEL__
 
+#include <linux/prctl.h>
 #include <linux/string.h>
 
 #include <asm/alternative.h>
@@ -84,11 +85,17 @@ struct thread_struct {
 #endif
 	struct fpsimd_state	fpsimd_state;
 	u16			sve_vl;		/* SVE vector length */
+	u16			sve_flags;	/* SVE related flags */
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
 };
 
+/* Flags for sve_flags (intentionally defined to match the prctl flags) */
+
+/* Inherit sve_vl and sve_flags across execve(): */
+#define THREAD_VL_INHERIT	PR_SVE_SET_VL_INHERIT
+
 #ifdef CONFIG_COMPAT
 #define task_user_tls(t)						\
 ({									\
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 32caca3..f010a1c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -292,12 +292,15 @@ void fpsimd_flush_thread(void)
 		 * User tasks must have a valid vector length set, but tasks
 		 * forked early (e.g., init) may not have one yet.
 		 * By now, we will know what the hardware supports, so set the
-		 * task vector length if it doesn't have one:
+		 * task vector length to default if doesn't have one, or if
+		 * the thread wasn't configured to inherit SVE configuration:
 		 */
-		if (!current->thread.sve_vl) {
+		if (!current->thread.sve_vl ||
+		    !(current->thread.sve_flags & THREAD_VL_INHERIT)) {
 			BUG_ON(!sve_vl_valid(sve_max_vl));
 
 			current->thread.sve_vl = sve_max_vl;
+			current->thread.sve_flags = 0;
 		}
 	}
 
@@ -506,10 +509,10 @@ int sve_set_vector_length(struct task_struct *task,
 	 */
 	if (!(flags & PR_SVE_SET_VL_THREAD) && get_nr_threads(task) != 1)
 		return -EINVAL;
-
 	flags &= ~(unsigned long)PR_SVE_SET_VL_THREAD;
-	if (flags)
-		return -EINVAL; /* No other flags defined yet */
+
+	if (flags & ~(unsigned long)PR_SVE_SET_VL_INHERIT)
+		return -EINVAL;
 
 	if (!sve_vl_valid(vl))
 		return -EINVAL;
@@ -542,11 +545,27 @@ int sve_set_vector_length(struct task_struct *task,
 
 	task->thread.sve_vl = vl;
 
+	/* The THREAD_VL_* flag encodings match the relevant PR_* flags: */
+	task->thread.sve_flags = flags;
+
 	fpsimd_flush_task_state(task);
 
 	return 0;
 }
 
+/*
+ * Encode the current vector length and flags for return.
+ * This is only required for prctl(): ptrace has separate fields
+ */
+static int sve_prctl_status(struct task_struct const *task)
+{
+	int ret = task->thread.sve_vl;
+
+	ret |= task->thread.sve_vl << 16;
+
+	return ret;
+}
+
 /* PR_SVE_SET_VL */
 int sve_set_task_vl(struct task_struct *task,
 		    unsigned long vector_length, unsigned long flags)
@@ -565,7 +584,7 @@ int sve_set_task_vl(struct task_struct *task,
 	if (ret)
 		return ret;
 
-	return task->thread.sve_vl;
+	return sve_prctl_status(task);
 }
 
 /* PR_SVE_GET_VL */
@@ -574,7 +593,7 @@ int sve_get_task_vl(struct task_struct *task)
 	if (!(elf_hwcap & HWCAP_SVE))
 		return -EINVAL;
 
-	return task->thread.sve_vl;
+	return sve_prctl_status(task);
 }
 
 #endif /* CONFIG_ARM64_SVE */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c55530b..cff6214 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -200,6 +200,11 @@ struct prctl_mm_map {
 /* arm64 Scalable Vector Extension controls */
 #define PR_SVE_SET_VL			48	/* set task vector length */
 # define PR_SVE_SET_VL_THREAD		(1 << 1) /* set just this thread */
+# define PR_SVE_SET_VL_INHERIT		(1 << 2) /* inherit across exec */
 #define PR_SVE_GET_VL			49	/* get task vector length */
+/* Decode helpers for the return value from PR_SVE_GET_VL: */
+# define PR_SVE_GET_VL_LEN(ret)		((ret) & 0x3fff) /* vector length */
+# define PR_SVE_GET_VL_INHERIT		(PR_SVE_SET_VL_INHERIT << 16)
+/* For conveinence, PR_SVE_SET_VL returns the result in the same encoding */
 
 #endif /* _LINUX_PRCTL_H */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

This patch adds support for manipulating a task's vector length at
runtime via ptrace.

As a simplification, we turn the task back into an FPSIMD-only task
when changing the vector length.  If the register data is written
too, we then turn the task back into an SVE task, with changed
task_struct layout for the SVE data, before the actual data writing
is done.

Because the vector length is now variable, sve_get() now needs to
return the real maximum for user_sve_header.max_vl, since .vl may
be less than this (that's the whole point).

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h      |  1 +
 arch/arm64/include/uapi/asm/ptrace.h |  5 +++++
 arch/arm64/kernel/ptrace.c           | 25 +++++++++++++++----------
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 1ec2363..0f1b068 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -105,6 +105,7 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
 extern unsigned int sve_get_vl(void);
 extern int sve_set_vector_length(struct task_struct *task,
 				 unsigned long vl, unsigned long flags);
+extern int sve_max_vl;
 
 /*
  * FPSIMD/SVE synchronisation helpers for ptrace:
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 48b57a0..bcb542d 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -64,6 +64,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/prctl.h>
+
 /*
  * User structures for general purpose, floating point and debug registers.
  */
@@ -108,6 +110,9 @@ struct user_sve_header {
 #define SVE_PT_REGS_FPSIMD		0
 #define SVE_PT_REGS_SVE			SVE_PT_REGS_MASK
 
+#define SVE_PT_VL_THREAD		PR_SVE_SET_VL_THREAD
+#define SVE_PT_VL_INHERIT		PR_SVE_SET_VL_INHERIT
+
 
 /*
  * The remainder of the SVE state follows struct user_sve_header.  The
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 32debb8..7e40039 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -741,14 +741,15 @@ static int sve_get(struct task_struct *target,
 	BUG_ON(!sve_vl_valid(header.vl));
 	vq = sve_vq_from_vl(header.vl);
 
-	/* Until runtime or per-task vector length changing is supported: */
-	header.max_vl = header.vl;
+	BUG_ON(!sve_vl_valid(sve_max_vl));
+	header.max_vl = sve_max_vl;
 
 	header.flags = test_tsk_thread_flag(target, TIF_SVE) ?
 		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
 
 	header.size = SVE_PT_SIZE(vq, header.flags);
-	header.max_size = SVE_PT_SIZE(vq, SVE_PT_REGS_SVE);
+	header.max_size = SVE_PT_SIZE(sve_vq_from_vl(header.max_vl),
+				      SVE_PT_REGS_SVE);
 
 	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header,
 				  0, sizeof(header));
@@ -830,14 +831,18 @@ static int sve_set(struct task_struct *target,
 	if (ret)
 		goto out;
 
-	if (header.vl != target->thread.sve_vl)
-		return -EINVAL;
-
-	BUG_ON(!sve_vl_valid(header.vl));
-	vq = sve_vq_from_vl(header.vl);
+	/*
+	 * Apart from PT_SVE_REGS_MASK, all PT_SVE_* flags are consumed by
+	 * sve_set_vector_length(), which will also validate them for us:
+	 */
+	ret = sve_set_vector_length(target, header.vl,
+				    header.flags & ~SVE_PT_REGS_MASK);
+	if (ret)
+		goto out;
 
-	if (header.flags & ~SVE_PT_REGS_MASK)
-		return -EINVAL;
+	/* Actual VL set may be less than the user asked for: */
+	BUG_ON(!sve_vl_valid(target->thread.sve_vl));
+	vq = sve_vq_from_vl(target->thread.sve_vl);
 
 	/* Registers: FPSIMD-only case */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for manipulating a task's vector length at
runtime via ptrace.

As a simplification, we turn the task back into an FPSIMD-only task
when changing the vector length.  If the register data is written
too, we then turn the task back into an SVE task, with changed
task_struct layout for the SVE data, before the actual data writing
is done.

Because the vector length is now variable, sve_get() now needs to
return the real maximum for user_sve_header.max_vl, since .vl may
be less than this (that's the whole point).

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h      |  1 +
 arch/arm64/include/uapi/asm/ptrace.h |  5 +++++
 arch/arm64/kernel/ptrace.c           | 25 +++++++++++++++----------
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 1ec2363..0f1b068 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -105,6 +105,7 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
 extern unsigned int sve_get_vl(void);
 extern int sve_set_vector_length(struct task_struct *task,
 				 unsigned long vl, unsigned long flags);
+extern int sve_max_vl;
 
 /*
  * FPSIMD/SVE synchronisation helpers for ptrace:
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 48b57a0..bcb542d 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -64,6 +64,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/prctl.h>
+
 /*
  * User structures for general purpose, floating point and debug registers.
  */
@@ -108,6 +110,9 @@ struct user_sve_header {
 #define SVE_PT_REGS_FPSIMD		0
 #define SVE_PT_REGS_SVE			SVE_PT_REGS_MASK
 
+#define SVE_PT_VL_THREAD		PR_SVE_SET_VL_THREAD
+#define SVE_PT_VL_INHERIT		PR_SVE_SET_VL_INHERIT
+
 
 /*
  * The remainder of the SVE state follows struct user_sve_header.  The
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 32debb8..7e40039 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -741,14 +741,15 @@ static int sve_get(struct task_struct *target,
 	BUG_ON(!sve_vl_valid(header.vl));
 	vq = sve_vq_from_vl(header.vl);
 
-	/* Until runtime or per-task vector length changing is supported: */
-	header.max_vl = header.vl;
+	BUG_ON(!sve_vl_valid(sve_max_vl));
+	header.max_vl = sve_max_vl;
 
 	header.flags = test_tsk_thread_flag(target, TIF_SVE) ?
 		SVE_PT_REGS_SVE : SVE_PT_REGS_FPSIMD;
 
 	header.size = SVE_PT_SIZE(vq, header.flags);
-	header.max_size = SVE_PT_SIZE(vq, SVE_PT_REGS_SVE);
+	header.max_size = SVE_PT_SIZE(sve_vq_from_vl(header.max_vl),
+				      SVE_PT_REGS_SVE);
 
 	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &header,
 				  0, sizeof(header));
@@ -830,14 +831,18 @@ static int sve_set(struct task_struct *target,
 	if (ret)
 		goto out;
 
-	if (header.vl != target->thread.sve_vl)
-		return -EINVAL;
-
-	BUG_ON(!sve_vl_valid(header.vl));
-	vq = sve_vq_from_vl(header.vl);
+	/*
+	 * Apart from PT_SVE_REGS_MASK, all PT_SVE_* flags are consumed by
+	 * sve_set_vector_length(), which will also validate them for us:
+	 */
+	ret = sve_set_vector_length(target, header.vl,
+				    header.flags & ~SVE_PT_REGS_MASK);
+	if (ret)
+		goto out;
 
-	if (header.flags & ~SVE_PT_REGS_MASK)
-		return -EINVAL;
+	/* Actual VL set may be less than the user asked for: */
+	BUG_ON(!sve_vl_valid(target->thread.sve_vl));
+	vq = sve_vq_from_vl(target->thread.sve_vl);
 
 	/* Registers: FPSIMD-only case */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 09/10] arm64/sve: Enable default vector length control via procfs
  2017-01-12 11:25 ` Dave Martin
@ 2017-01-12 11:26   ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

This patch makes the default SVE vector length at exec() for user tasks
controllable via procfs, in /proc/cpu/sve_default_vector_length.

Limited effort is made to return sensible errors when writing the
procfs file, and anyway the value gets silently clamped to the
maximum VL supported by the platform: users should close and reopen
the file and read back to see the result.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---

Caveats:

 * /proc may not be the correct place for this (but it's not clear
   where in sysfs it should go either, and it's a global control with
   no corresponding kobject)

 * There seems to be no suitable framework for writable controls of
   this sort in /proc (unlike debugfs or sysfs).  So, I had to write a
   suspiciously large amount of code -- I may have missed an obvious
   trick :P


 arch/arm64/kernel/fpsimd.c | 160 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 158 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f010a1c..3c0e08c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -22,8 +22,12 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/prctl.h>
+#include <linux/proc_fs.h>
 #include <linux/sched.h>
+#include <linux/seq_file.h>
 #include <linux/signal.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
 #include <linux/hardirq.h>
 
 #include <asm/fpsimd.h>
@@ -102,9 +106,132 @@ void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
 
 /* Maximum supported vector length across all CPUs (initially poisoned) */
 int sve_max_vl = -1;
+/* Default VL for tasks that don't set it explicitly: */
+int sve_default_vl = -1;
 
 #ifdef CONFIG_ARM64_SVE
 
+#ifdef CONFIG_PROC_FS
+
+struct default_vl_write_state {
+	bool invalid;
+	size_t len;
+	char buf[40];	/* enough for "0x" + 64-bit hex integer + NUL */
+};
+
+static int sve_default_vl_show(struct seq_file *s, void *data)
+{
+	seq_printf(s, "%d\n", sve_default_vl);
+	return 0;
+}
+
+static ssize_t sve_default_vl_write(struct file *f, const char __user *buf,
+				    size_t size, loff_t *pos)
+{
+	struct default_vl_write_state *state =
+		((struct seq_file *)f->private_data)->private;
+	long ret;
+
+	if (!size)
+		return 0;
+
+	if (*pos > sizeof(state->buf) ||
+	    size >= sizeof(state->buf) - *pos) {
+		ret = -ENOSPC;
+		goto error;
+	}
+
+	ret = copy_from_user(state->buf + *pos, buf, size);
+	if (ret > 0)
+		ret = -EINVAL;
+	if (ret)
+		goto error;
+
+	*pos += size;
+	if (*pos > state->len)
+		state->len = *pos;
+
+	return size;
+
+error:
+	state->invalid = true;
+	return ret;
+}
+
+static int sve_default_vl_release(struct inode *i, struct file *f)
+{
+	int ret = 0;
+	int t;
+	unsigned long value;
+	struct default_vl_write_state *state =
+		((struct seq_file *)f->private_data)->private;
+
+	if (!(f->f_mode & FMODE_WRITE))
+		goto out;
+
+	if (state->invalid)
+		goto out;
+
+	if (state->len >= sizeof(state->buf)) {
+		WARN_ON(1);
+		state->len = sizeof(state->buf) - 1;
+	}
+
+	state->buf[state->len] = '\0';
+	t = kstrtoul(state->buf, 0, &value);
+	if (t)
+		ret = t;
+
+	if (!sve_vl_valid(value))
+		ret = -EINVAL;
+
+	if (!sve_vl_valid(sve_max_vl)) {
+		WARN_ON(1);
+		ret = -EINVAL;
+	}
+
+	if (value > sve_max_vl)
+		value = sve_max_vl;
+
+	if (!ret)
+		sve_default_vl = value;
+
+out:
+	t = seq_release_private(i, f);
+	return ret ? ret : t;
+}
+
+static int sve_default_vl_open(struct inode *i, struct file *f)
+{
+	struct default_vl_write_state *data = NULL;
+	int ret;
+
+	if (f->f_mode & FMODE_WRITE) {
+		data = kzalloc(sizeof(*data), GFP_KERNEL);
+		if (!data)
+			return -ENOMEM;
+
+		data->invalid = false;
+		data->len = 0;
+	}
+
+	ret = single_open(f, sve_default_vl_show, data);
+	if (ret)
+		kfree(data);
+
+	return ret;
+}
+
+static const struct file_operations sve_default_vl_fops = {
+	.open		= sve_default_vl_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= sve_default_vl_release,
+	.write		= sve_default_vl_write,
+};
+
+#endif /* !CONFIG_PROC_FS */
+
 static void task_fpsimd_to_sve(struct task_struct *task);
 static void task_fpsimd_load(struct task_struct *task);
 static void task_fpsimd_save(struct task_struct *task);
@@ -299,7 +426,7 @@ void fpsimd_flush_thread(void)
 		    !(current->thread.sve_flags & THREAD_VL_INHERIT)) {
 			BUG_ON(!sve_vl_valid(sve_max_vl));
 
-			current->thread.sve_vl = sve_max_vl;
+			current->thread.sve_vl = sve_default_vl;
 			current->thread.sve_flags = 0;
 		}
 	}
@@ -723,6 +850,14 @@ void __init fpsimd_init_task_struct_size(void)
 	     & 0xf) == 1) {
 		/* FIXME: This should be the minimum across all CPUs */
 		sve_max_vl = sve_get_vl();
+		sve_default_vl = sve_max_vl;
+
+		/*
+		 * To avoid enlarging the signal frame by default, clamp to
+		 * 512 bits until/unless overridden by userspace:
+		 */
+		if (sve_default_vl > 512 / 8)
+			sve_default_vl = 512 / 8;
 
 		arch_task_struct_size = sizeof(struct task_struct) +
 			35 * sve_max_vl;
@@ -734,6 +869,27 @@ void __init fpsimd_init_task_struct_size(void)
 /*
  * FP/SIMD support code initialisation.
  */
+static int __init sve_procfs_init(void)
+{
+#if defined(CONFIG_ARM64_SVE) && defined(CONFIG_PROC_FS)
+	struct proc_dir_entry *dir;
+
+	if (!(elf_hwcap & HWCAP_SVE))
+		return 0;
+
+	/* This should be moved elsewhere is anything else ever uses it: */
+	dir = proc_mkdir("cpu", NULL);
+	if (!dir)
+		return -ENOMEM;
+
+	if (!proc_create("sve_default_vector_length",
+			 S_IRUGO | S_IWUSR, dir, &sve_default_vl_fops))
+		return -ENOMEM;
+#endif
+
+	return 0;
+}
+
 static int __init fpsimd_init(void)
 {
 	if (elf_hwcap & HWCAP_FP) {
@@ -749,6 +905,6 @@ static int __init fpsimd_init(void)
 	if (!(elf_hwcap & HWCAP_SVE))
 		pr_info("Scalable Vector Extension available\n");
 
-	return 0;
+	return sve_procfs_init();
 }
 late_initcall(fpsimd_init);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 09/10] arm64/sve: Enable default vector length control via procfs
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

This patch makes the default SVE vector length at exec() for user tasks
controllable via procfs, in /proc/cpu/sve_default_vector_length.

Limited effort is made to return sensible errors when writing the
procfs file, and anyway the value gets silently clamped to the
maximum VL supported by the platform: users should close and reopen
the file and read back to see the result.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---

Caveats:

 * /proc may not be the correct place for this (but it's not clear
   where in sysfs it should go either, and it's a global control with
   no corresponding kobject)

 * There seems to be no suitable framework for writable controls of
   this sort in /proc (unlike debugfs or sysfs).  So, I had to write a
   suspiciously large amount of code -- I may have missed an obvious
   trick :P


 arch/arm64/kernel/fpsimd.c | 160 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 158 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f010a1c..3c0e08c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -22,8 +22,12 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/prctl.h>
+#include <linux/proc_fs.h>
 #include <linux/sched.h>
+#include <linux/seq_file.h>
 #include <linux/signal.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
 #include <linux/hardirq.h>
 
 #include <asm/fpsimd.h>
@@ -102,9 +106,132 @@ void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
 
 /* Maximum supported vector length across all CPUs (initially poisoned) */
 int sve_max_vl = -1;
+/* Default VL for tasks that don't set it explicitly: */
+int sve_default_vl = -1;
 
 #ifdef CONFIG_ARM64_SVE
 
+#ifdef CONFIG_PROC_FS
+
+struct default_vl_write_state {
+	bool invalid;
+	size_t len;
+	char buf[40];	/* enough for "0x" + 64-bit hex integer + NUL */
+};
+
+static int sve_default_vl_show(struct seq_file *s, void *data)
+{
+	seq_printf(s, "%d\n", sve_default_vl);
+	return 0;
+}
+
+static ssize_t sve_default_vl_write(struct file *f, const char __user *buf,
+				    size_t size, loff_t *pos)
+{
+	struct default_vl_write_state *state =
+		((struct seq_file *)f->private_data)->private;
+	long ret;
+
+	if (!size)
+		return 0;
+
+	if (*pos > sizeof(state->buf) ||
+	    size >= sizeof(state->buf) - *pos) {
+		ret = -ENOSPC;
+		goto error;
+	}
+
+	ret = copy_from_user(state->buf + *pos, buf, size);
+	if (ret > 0)
+		ret = -EINVAL;
+	if (ret)
+		goto error;
+
+	*pos += size;
+	if (*pos > state->len)
+		state->len = *pos;
+
+	return size;
+
+error:
+	state->invalid = true;
+	return ret;
+}
+
+static int sve_default_vl_release(struct inode *i, struct file *f)
+{
+	int ret = 0;
+	int t;
+	unsigned long value;
+	struct default_vl_write_state *state =
+		((struct seq_file *)f->private_data)->private;
+
+	if (!(f->f_mode & FMODE_WRITE))
+		goto out;
+
+	if (state->invalid)
+		goto out;
+
+	if (state->len >= sizeof(state->buf)) {
+		WARN_ON(1);
+		state->len = sizeof(state->buf) - 1;
+	}
+
+	state->buf[state->len] = '\0';
+	t = kstrtoul(state->buf, 0, &value);
+	if (t)
+		ret = t;
+
+	if (!sve_vl_valid(value))
+		ret = -EINVAL;
+
+	if (!sve_vl_valid(sve_max_vl)) {
+		WARN_ON(1);
+		ret = -EINVAL;
+	}
+
+	if (value > sve_max_vl)
+		value = sve_max_vl;
+
+	if (!ret)
+		sve_default_vl = value;
+
+out:
+	t = seq_release_private(i, f);
+	return ret ? ret : t;
+}
+
+static int sve_default_vl_open(struct inode *i, struct file *f)
+{
+	struct default_vl_write_state *data = NULL;
+	int ret;
+
+	if (f->f_mode & FMODE_WRITE) {
+		data = kzalloc(sizeof(*data), GFP_KERNEL);
+		if (!data)
+			return -ENOMEM;
+
+		data->invalid = false;
+		data->len = 0;
+	}
+
+	ret = single_open(f, sve_default_vl_show, data);
+	if (ret)
+		kfree(data);
+
+	return ret;
+}
+
+static const struct file_operations sve_default_vl_fops = {
+	.open		= sve_default_vl_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= sve_default_vl_release,
+	.write		= sve_default_vl_write,
+};
+
+#endif /* !CONFIG_PROC_FS */
+
 static void task_fpsimd_to_sve(struct task_struct *task);
 static void task_fpsimd_load(struct task_struct *task);
 static void task_fpsimd_save(struct task_struct *task);
@@ -299,7 +426,7 @@ void fpsimd_flush_thread(void)
 		    !(current->thread.sve_flags & THREAD_VL_INHERIT)) {
 			BUG_ON(!sve_vl_valid(sve_max_vl));
 
-			current->thread.sve_vl = sve_max_vl;
+			current->thread.sve_vl = sve_default_vl;
 			current->thread.sve_flags = 0;
 		}
 	}
@@ -723,6 +850,14 @@ void __init fpsimd_init_task_struct_size(void)
 	     & 0xf) == 1) {
 		/* FIXME: This should be the minimum across all CPUs */
 		sve_max_vl = sve_get_vl();
+		sve_default_vl = sve_max_vl;
+
+		/*
+		 * To avoid enlarging the signal frame by default, clamp to
+		 * 512 bits until/unless overridden by userspace:
+		 */
+		if (sve_default_vl > 512 / 8)
+			sve_default_vl = 512 / 8;
 
 		arch_task_struct_size = sizeof(struct task_struct) +
 			35 * sve_max_vl;
@@ -734,6 +869,27 @@ void __init fpsimd_init_task_struct_size(void)
 /*
  * FP/SIMD support code initialisation.
  */
+static int __init sve_procfs_init(void)
+{
+#if defined(CONFIG_ARM64_SVE) && defined(CONFIG_PROC_FS)
+	struct proc_dir_entry *dir;
+
+	if (!(elf_hwcap & HWCAP_SVE))
+		return 0;
+
+	/* This should be moved elsewhere is anything else ever uses it: */
+	dir = proc_mkdir("cpu", NULL);
+	if (!dir)
+		return -ENOMEM;
+
+	if (!proc_create("sve_default_vector_length",
+			 S_IRUGO | S_IWUSR, dir, &sve_default_vl_fops))
+		return -ENOMEM;
+#endif
+
+	return 0;
+}
+
 static int __init fpsimd_init(void)
 {
 	if (elf_hwcap & HWCAP_FP) {
@@ -749,6 +905,6 @@ static int __init fpsimd_init(void)
 	if (!(elf_hwcap & HWCAP_SVE))
 		pr_info("Scalable Vector Extension available\n");
 
-	return 0;
+	return sve_procfs_init();
 }
 late_initcall(fpsimd_init);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 10/10] Revert "arm64/sve: Limit vector length to 512 bits by default"
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Yao Qi, Szabolcs Nagy, Joseph Myers,
	Alan Hayward, Torvald Riegel, Christoffer Dall

This reverts commit cdedb254ccddd9c6a8db91b3737690327e48dfe7.

Vector length limiting via Kconfig is no longer required now that
the default vector length can be manipulated at runtime via procfs.
---
 arch/arm64/Kconfig   | 35 -----------------------------------
 arch/arm64/mm/proc.S |  5 -----
 2 files changed, 40 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 98c0934..bced568 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -900,41 +900,6 @@ config ARM64_SVE
 
 	  To enable use of this extension on CPUs that implement it, say Y.
 
-if ARM64_SVE
-
-config ARM64_SVE_FULL_VECTOR_LENGTH
-	bool "Enable full hardware vector length for userspace"
-	default n
-	help
-	  SVE vector lengths greater than 512 bits impact the size of signal
-	  frames and therefore the size requirements for any userspace stack
-	  onto which a signal may be delivered.  Using larger vector lengths
-	  may therefore cause problems for some software.  For this reason, the
-	  kernel currently limits the maximum vector length for userspace
-	  software to 512 bits by default.
-
-	  Enabling this option removes the limit, so that the full vector
-	  length implemented by the hardware is made available to userspace.
-
-	  Be aware: in general, software that (a) does not use SVE (including
-	  via libraries), or (b) does not handle signals, or (c) uses default
-	  process/thread stack sizes and does not use sigaltstack(2) should be
-	  unaffected by enabling larger vectors.  Software that does not meet
-	  these criteria or that relies on certain legacy uses of the ucontext
-	  API may be affected however.
-
-	  This is a transitional compatibility option only and will be replaced
-	  by a userspace ABI extension in the future.  Do not assume that this
-	  option will be available with compatible effect in future Linux
-	  releases.
-
-	  If you are developing software that uses SVE and understand the
-	  implications, you can consider saying Y here.
-
-	  If unsure, say N.
-
-endif
-
 config ARM64_MODULE_CMODEL_LARGE
 	bool
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index da2d602..aa96b4c 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -200,11 +200,6 @@ ENTRY(__cpu_setup)
 
 	mrs_s	x5, ZIDR_EL1			// SVE: Enable full vector len
 	and	x5, x5, #ZCR_EL1_LEN_MASK	// initially
-#ifndef CONFIG_ARM64_SVE_FULL_VECTOR_LENGTH
-	mov	x6, #(512 / 128 - 1)		// Clamp VL to 512 bits
-	cmp	x5, x6
-	csel	x5, x5, x6, lo
-#endif
 	msr_s	ZCR_EL1, x5
 
 	b	2f
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 10/10] Revert "arm64/sve: Limit vector length to 512 bits by default"
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Alan Hayward, Christoffer Dall,
	linux-arch, libc-alpha, Florian Weimer, Joseph Myers,
	Szabolcs Nagy, Torvald Riegel, gdb, Yao Qi

This reverts commit cdedb254ccddd9c6a8db91b3737690327e48dfe7.

Vector length limiting via Kconfig is no longer required now that
the default vector length can be manipulated at runtime via procfs.
---
 arch/arm64/Kconfig   | 35 -----------------------------------
 arch/arm64/mm/proc.S |  5 -----
 2 files changed, 40 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 98c0934..bced568 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -900,41 +900,6 @@ config ARM64_SVE
 
 	  To enable use of this extension on CPUs that implement it, say Y.
 
-if ARM64_SVE
-
-config ARM64_SVE_FULL_VECTOR_LENGTH
-	bool "Enable full hardware vector length for userspace"
-	default n
-	help
-	  SVE vector lengths greater than 512 bits impact the size of signal
-	  frames and therefore the size requirements for any userspace stack
-	  onto which a signal may be delivered.  Using larger vector lengths
-	  may therefore cause problems for some software.  For this reason, the
-	  kernel currently limits the maximum vector length for userspace
-	  software to 512 bits by default.
-
-	  Enabling this option removes the limit, so that the full vector
-	  length implemented by the hardware is made available to userspace.
-
-	  Be aware: in general, software that (a) does not use SVE (including
-	  via libraries), or (b) does not handle signals, or (c) uses default
-	  process/thread stack sizes and does not use sigaltstack(2) should be
-	  unaffected by enabling larger vectors.  Software that does not meet
-	  these criteria or that relies on certain legacy uses of the ucontext
-	  API may be affected however.
-
-	  This is a transitional compatibility option only and will be replaced
-	  by a userspace ABI extension in the future.  Do not assume that this
-	  option will be available with compatible effect in future Linux
-	  releases.
-
-	  If you are developing software that uses SVE and understand the
-	  implications, you can consider saying Y here.
-
-	  If unsure, say N.
-
-endif
-
 config ARM64_MODULE_CMODEL_LARGE
 	bool
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index da2d602..aa96b4c 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -200,11 +200,6 @@ ENTRY(__cpu_setup)
 
 	mrs_s	x5, ZIDR_EL1			// SVE: Enable full vector len
 	and	x5, x5, #ZCR_EL1_LEN_MASK	// initially
-#ifndef CONFIG_ARM64_SVE_FULL_VECTOR_LENGTH
-	mov	x6, #(512 / 128 - 1)		// Clamp VL to 512 bits
-	cmp	x5, x6
-	csel	x5, x5, x6, lo
-#endif
 	msr_s	ZCR_EL1, x5
 
 	b	2f
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 10/10] Revert "arm64/sve: Limit vector length to 512 bits by default"
@ 2017-01-12 11:26   ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-12 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

This reverts commit cdedb254ccddd9c6a8db91b3737690327e48dfe7.

Vector length limiting via Kconfig is no longer required now that
the default vector length can be manipulated at runtime via procfs.
---
 arch/arm64/Kconfig   | 35 -----------------------------------
 arch/arm64/mm/proc.S |  5 -----
 2 files changed, 40 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 98c0934..bced568 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -900,41 +900,6 @@ config ARM64_SVE
 
 	  To enable use of this extension on CPUs that implement it, say Y.
 
-if ARM64_SVE
-
-config ARM64_SVE_FULL_VECTOR_LENGTH
-	bool "Enable full hardware vector length for userspace"
-	default n
-	help
-	  SVE vector lengths greater than 512 bits impact the size of signal
-	  frames and therefore the size requirements for any userspace stack
-	  onto which a signal may be delivered.  Using larger vector lengths
-	  may therefore cause problems for some software.  For this reason, the
-	  kernel currently limits the maximum vector length for userspace
-	  software to 512 bits by default.
-
-	  Enabling this option removes the limit, so that the full vector
-	  length implemented by the hardware is made available to userspace.
-
-	  Be aware: in general, software that (a) does not use SVE (including
-	  via libraries), or (b) does not handle signals, or (c) uses default
-	  process/thread stack sizes and does not use sigaltstack(2) should be
-	  unaffected by enabling larger vectors.  Software that does not meet
-	  these criteria or that relies on certain legacy uses of the ucontext
-	  API may be affected however.
-
-	  This is a transitional compatibility option only and will be replaced
-	  by a userspace ABI extension in the future.  Do not assume that this
-	  option will be available with compatible effect in future Linux
-	  releases.
-
-	  If you are developing software that uses SVE and understand the
-	  implications, you can consider saying Y here.
-
-	  If unsure, say N.
-
-endif
-
 config ARM64_MODULE_CMODEL_LARGE
 	bool
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index da2d602..aa96b4c 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -200,11 +200,6 @@ ENTRY(__cpu_setup)
 
 	mrs_s	x5, ZIDR_EL1			// SVE: Enable full vector len
 	and	x5, x5, #ZCR_EL1_LEN_MASK	// initially
-#ifndef CONFIG_ARM64_SVE_FULL_VECTOR_LENGTH
-	mov	x6, #(512 / 128 - 1)		// Clamp VL to 512 bits
-	cmp	x5, x6
-	csel	x5, x5, x6, lo
-#endif
 	msr_s	ZCR_EL1, x5
 
 	b	2f
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 06/10] arm64/sve: Disallow VL setting for individual threads by default
  2017-01-12 11:26   ` Dave Martin
@ 2017-01-16 11:34     ` Yao Qi
  -1 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 11:34 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, Ard Biesheuvel, Marc Zyngier, Alan Hayward,
	Christoffer Dall, linux-arch, libc-alpha, Florian Weimer,
	Joseph Myers, Szabolcs Nagy, Torvald Riegel, gdb

On 17-01-12 11:26:05, Dave Martin wrote:
> General-purpose code in userspace is not expected to work correctly
> if multiple threads are allowed to run concurrently with different
> vector lengths in a single process.
> 
> This patch adds an explicit flag PR_SVE_SET_VL_THREAD to request
> this behaviour.  Without the flag, vector length setting is
> permitted only for a single-threaded process (which matches the
> expected usage model of setting the vector length at process
> startup).

Hi Dave,
PR_SVE_SET_VL_THREAD can be arch-independent, IMO, because prctl
needs a scope.  Looks some of them are system-wide, some of them are
about threads within the same process (like, PR_MPX_ENABLE_MANAGEMENT).
IOW, PR_SVE_SET_VL_THREAD can be general flag, to indicate the scope
of each new ptrcl command is per-thread.

I happen to see PR_SET_FP_MODE in man pages, which is about setting
FP register modes in runtime.  It is a little similar to setting VL in
this patch.  However the doc doesn't mention the effect or the scope
of this command.

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 06/10] arm64/sve: Disallow VL setting for individual threads by default
@ 2017-01-16 11:34     ` Yao Qi
  0 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 17-01-12 11:26:05, Dave Martin wrote:
> General-purpose code in userspace is not expected to work correctly
> if multiple threads are allowed to run concurrently with different
> vector lengths in a single process.
> 
> This patch adds an explicit flag PR_SVE_SET_VL_THREAD to request
> this behaviour.  Without the flag, vector length setting is
> permitted only for a single-threaded process (which matches the
> expected usage model of setting the vector length at process
> startup).

Hi Dave,
PR_SVE_SET_VL_THREAD can be arch-independent, IMO, because prctl
needs a scope.  Looks some of them are system-wide, some of them are
about threads within the same process (like, PR_MPX_ENABLE_MANAGEMENT).
IOW, PR_SVE_SET_VL_THREAD can be general flag, to indicate the scope
of each new ptrcl command is per-thread.

I happen to see PR_SET_FP_MODE in man pages, which is about setting
FP register modes in runtime.  It is a little similar to setting VL in
this patch.  However the doc doesn't mention the effect or the scope
of this command.

-- 
Yao (??)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-12 11:26   ` Dave Martin
@ 2017-01-16 12:20     ` Yao Qi
  -1 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 12:20 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, Ard Biesheuvel, Marc Zyngier, Alan Hayward,
	Christoffer Dall, linux-arch, libc-alpha, Florian Weimer,
	Joseph Myers, Szabolcs Nagy, Torvald Riegel, gdb

On 17-01-12 11:26:07, Dave Martin wrote:
> This patch adds support for manipulating a task's vector length at
> runtime via ptrace.
> 

I hope kernel doesn't provide such interface to ptracer to change vector
length.  The vector length is sort of a read-only property of thread/process/
program to debugger, unless we really have a clear requirement to modify
vector length in debugging.  I may miss something because I haven't debug
SVE code yet.

> As a simplification, we turn the task back into an FPSIMD-only task
> when changing the vector length.  If the register data is written
> too, we then turn the task back into an SVE task, with changed
> task_struct layout for the SVE data, before the actual data writing
> is done.
> 
> Because the vector length is now variable, sve_get() now needs to
> return the real maximum for user_sve_header.max_vl, since .vl may
> be less than this (that's the whole point).
> 

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-16 12:20     ` Yao Qi
  0 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 12:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 17-01-12 11:26:07, Dave Martin wrote:
> This patch adds support for manipulating a task's vector length at
> runtime via ptrace.
> 

I hope kernel doesn't provide such interface to ptracer to change vector
length.  The vector length is sort of a read-only property of thread/process/
program to debugger, unless we really have a clear requirement to modify
vector length in debugging.  I may miss something because I haven't debug
SVE code yet.

> As a simplification, we turn the task back into an FPSIMD-only task
> when changing the vector length.  If the register data is written
> too, we then turn the task back into an SVE task, with changed
> task_struct layout for the SVE data, before the actual data writing
> is done.
> 
> Because the vector length is now variable, sve_get() now needs to
> return the real maximum for user_sve_header.max_vl, since .vl may
> be less than this (that's the whole point).
> 

-- 
Yao (??)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 06/10] arm64/sve: Disallow VL setting for individual threads by default
  2017-01-16 11:34     ` Yao Qi
@ 2017-01-16 12:23       ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 12:23 UTC (permalink / raw)
  To: Yao Qi
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Joseph Myers, Szabolcs Nagy, linux-arm-kernel,
	Alan Hayward, Torvald Riegel, Christoffer Dall

On Mon, Jan 16, 2017 at 11:34:39AM +0000, Yao Qi wrote:
> On 17-01-12 11:26:05, Dave Martin wrote:
> > General-purpose code in userspace is not expected to work correctly
> > if multiple threads are allowed to run concurrently with different
> > vector lengths in a single process.
> > 
> > This patch adds an explicit flag PR_SVE_SET_VL_THREAD to request
> > this behaviour.  Without the flag, vector length setting is
> > permitted only for a single-threaded process (which matches the
> > expected usage model of setting the vector length at process
> > startup).

To be clear, PR_SVE_SET_VL_THREAD is not persistent, it just overrides
the default one-thread-per-process restriction for this prctl call.

The idea is that if someone writes some code to set the VL and then
moves the code to a multithreaded environment, by default it will stop
working.  This is a hint that some actual work is likely needed to
port their code to work with multiple threads.

> Hi Dave,
> PR_SVE_SET_VL_THREAD can be arch-independent, IMO, because prctl
> needs a scope.  Looks some of them are system-wide, some of them are
> about threads within the same process (like, PR_MPX_ENABLE_MANAGEMENT).
> IOW, PR_SVE_SET_VL_THREAD can be general flag, to indicate the scope
> of each new ptrcl command is per-thread.

This can't be backported to the existing prctls because that would
change their behaviour.   Rather, what each prctl applies (thread or
process) is part of the definition of that particular prctl.

Since there are no other prctl() calls that can apply per-thread or
per-process, or that differ only in this regard, is seems a bit esoteric
to try to apply this concept across all prctls... ?

Which prctl()s are system-wide?  I didn't see any, but I may have missed
something.

> I happen to see PR_SET_FP_MODE in man pages, which is about setting
> FP register modes in runtime.  It is a little similar to setting VL in
> this patch.  However the doc doesn't mention the effect or the scope
> of this command.

The various FP/SIMD twiddling prctls() all seem to be arch-specific.
PR_SET_FP_MODE only exists for mips.

Unless the semantics are really the same, I'm not too keen on an arm64
prctl with the same name.

Putting "ARM64" in the name of the new prctls might be clearer, but
nobody seemed to care so far...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 06/10] arm64/sve: Disallow VL setting for individual threads by default
@ 2017-01-16 12:23       ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 12:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 16, 2017 at 11:34:39AM +0000, Yao Qi wrote:
> On 17-01-12 11:26:05, Dave Martin wrote:
> > General-purpose code in userspace is not expected to work correctly
> > if multiple threads are allowed to run concurrently with different
> > vector lengths in a single process.
> > 
> > This patch adds an explicit flag PR_SVE_SET_VL_THREAD to request
> > this behaviour.  Without the flag, vector length setting is
> > permitted only for a single-threaded process (which matches the
> > expected usage model of setting the vector length at process
> > startup).

To be clear, PR_SVE_SET_VL_THREAD is not persistent, it just overrides
the default one-thread-per-process restriction for this prctl call.

The idea is that if someone writes some code to set the VL and then
moves the code to a multithreaded environment, by default it will stop
working.  This is a hint that some actual work is likely needed to
port their code to work with multiple threads.

> Hi Dave,
> PR_SVE_SET_VL_THREAD can be arch-independent, IMO, because prctl
> needs a scope.  Looks some of them are system-wide, some of them are
> about threads within the same process (like, PR_MPX_ENABLE_MANAGEMENT).
> IOW, PR_SVE_SET_VL_THREAD can be general flag, to indicate the scope
> of each new ptrcl command is per-thread.

This can't be backported to the existing prctls because that would
change their behaviour.   Rather, what each prctl applies (thread or
process) is part of the definition of that particular prctl.

Since there are no other prctl() calls that can apply per-thread or
per-process, or that differ only in this regard, is seems a bit esoteric
to try to apply this concept across all prctls... ?

Which prctl()s are system-wide?  I didn't see any, but I may have missed
something.

> I happen to see PR_SET_FP_MODE in man pages, which is about setting
> FP register modes in runtime.  It is a little similar to setting VL in
> this patch.  However the doc doesn't mention the effect or the scope
> of this command.

The various FP/SIMD twiddling prctls() all seem to be arch-specific.
PR_SET_FP_MODE only exists for mips.

Unless the semantics are really the same, I'm not too keen on an arm64
prctl with the same name.

Putting "ARM64" in the name of the new prctls might be clearer, but
nobody seemed to care so far...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 07/10] arm64/sve: Add vector length inheritance control
  2017-01-12 11:26   ` Dave Martin
@ 2017-01-16 12:27     ` Yao Qi
  -1 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 12:27 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, Ard Biesheuvel, Marc Zyngier, Alan Hayward,
	Christoffer Dall, linux-arch, libc-alpha, Florian Weimer,
	Joseph Myers, Szabolcs Nagy, Torvald Riegel, gdb

On 17-01-12 11:26:06, Dave Martin wrote:
> Currently the vector length is inherited across both fork() and
> exec().
> 
> Inheritance across fork() is desirable both for creating a copy of
> a process (traditional fork) or creating a thread (where we want
> all threads to share the same VL by default).
> 
> Inheritance across exec() is less desirable, because of the ABI
> impact of large vector lengths on the size of the signal frame --
> when running a new binary, there is no guarantee that the new
> binary is compatible with these ABI changes.
> 
> This flag makes the vector length non-inherited by default.

Can we make vector length inherited across fork but non-inherited
across exec by default?

> Instead, the vector length is reset to a system default value,
> unless the THREAD_VL_INHERIT flag has been set for the thread.
> 
> THREAD_VL_INHERIT is currently sticky: i.e., if set, it gets
> inherited too.  This behaviour may be refined in future if it is
> not flexible enough.
> 

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 07/10] arm64/sve: Add vector length inheritance control
@ 2017-01-16 12:27     ` Yao Qi
  0 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 12:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 17-01-12 11:26:06, Dave Martin wrote:
> Currently the vector length is inherited across both fork() and
> exec().
> 
> Inheritance across fork() is desirable both for creating a copy of
> a process (traditional fork) or creating a thread (where we want
> all threads to share the same VL by default).
> 
> Inheritance across exec() is less desirable, because of the ABI
> impact of large vector lengths on the size of the signal frame --
> when running a new binary, there is no guarantee that the new
> binary is compatible with these ABI changes.
> 
> This flag makes the vector length non-inherited by default.

Can we make vector length inherited across fork but non-inherited
across exec by default?

> Instead, the vector length is reset to a system default value,
> unless the THREAD_VL_INHERIT flag has been set for the thread.
> 
> THREAD_VL_INHERIT is currently sticky: i.e., if set, it gets
> inherited too.  This behaviour may be refined in future if it is
> not flexible enough.
> 

-- 
Yao (??)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-16 12:20     ` Yao Qi
@ 2017-01-16 13:32       ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 13:32 UTC (permalink / raw)
  To: Yao Qi
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Joseph Myers, Szabolcs Nagy, linux-arm-kernel,
	Alan Hayward, Torvald Riegel, Christoffer Dall

On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> On 17-01-12 11:26:07, Dave Martin wrote:
> > This patch adds support for manipulating a task's vector length at
> > runtime via ptrace.
> > 
> 
> I hope kernel doesn't provide such interface to ptracer to change vector
> length.

It does, with this patch, beacuse...

> The vector length is sort of a read-only property of thread/process/
> program to debugger, unless we really have a clear requirement to modify
> vector length in debugging.  I may miss something because I haven't debug
> SVE code yet.

...the vector length is no longer read-only for the task, thanks to
the new prctls().

This does add complexity, but I figured that any programmer's model
state that the thread can modify for itself should be modifiable by the
debugger, if for no other reason than the user may want to experiment to
see what happens.  Without a ptrace interface, it would be necessary
to inject a prctl() call into the target, which is possible but awkward.

gdb must already re-detect the vector length on stop, since the target
could have called the prctl() in the meantime.

Access via ptrace also allows things like trapping on exec, fork or
clone and changing the vector length for the new process or thread
before it starts to run.  I'm guessing here, but such a scenario seems
legitimate (?)

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-16 13:32       ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 13:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> On 17-01-12 11:26:07, Dave Martin wrote:
> > This patch adds support for manipulating a task's vector length at
> > runtime via ptrace.
> > 
> 
> I hope kernel doesn't provide such interface to ptracer to change vector
> length.

It does, with this patch, beacuse...

> The vector length is sort of a read-only property of thread/process/
> program to debugger, unless we really have a clear requirement to modify
> vector length in debugging.  I may miss something because I haven't debug
> SVE code yet.

...the vector length is no longer read-only for the task, thanks to
the new prctls().

This does add complexity, but I figured that any programmer's model
state that the thread can modify for itself should be modifiable by the
debugger, if for no other reason than the user may want to experiment to
see what happens.  Without a ptrace interface, it would be necessary
to inject a prctl() call into the target, which is possible but awkward.

gdb must already re-detect the vector length on stop, since the target
could have called the prctl() in the meantime.

Access via ptrace also allows things like trapping on exec, fork or
clone and changing the vector length for the new process or thread
before it starts to run.  I'm guessing here, but such a scenario seems
legitimate (?)

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 07/10] arm64/sve: Add vector length inheritance control
  2017-01-16 12:27     ` Yao Qi
@ 2017-01-16 13:34       ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 13:34 UTC (permalink / raw)
  To: Yao Qi
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Joseph Myers, Szabolcs Nagy, linux-arm-kernel,
	Alan Hayward, Torvald Riegel, Christoffer Dall

On Mon, Jan 16, 2017 at 12:27:55PM +0000, Yao Qi wrote:
> On 17-01-12 11:26:06, Dave Martin wrote:
> > Currently the vector length is inherited across both fork() and
> > exec().
> > 
> > Inheritance across fork() is desirable both for creating a copy of
> > a process (traditional fork) or creating a thread (where we want
> > all threads to share the same VL by default).
> > 
> > Inheritance across exec() is less desirable, because of the ABI
> > impact of large vector lengths on the size of the signal frame --
> > when running a new binary, there is no guarantee that the new
> > binary is compatible with these ABI changes.
> > 
> > This flag makes the vector length non-inherited by default.
> 
> Can we make vector length inherited across fork but non-inherited
> across exec by default?

That is the behaviour: I always inherit across fork/clone, since
you are still running the same binary after those.

I could word the commit message a bit more clearly here.

Cheers
---Dave 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 07/10] arm64/sve: Add vector length inheritance control
@ 2017-01-16 13:34       ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 16, 2017 at 12:27:55PM +0000, Yao Qi wrote:
> On 17-01-12 11:26:06, Dave Martin wrote:
> > Currently the vector length is inherited across both fork() and
> > exec().
> > 
> > Inheritance across fork() is desirable both for creating a copy of
> > a process (traditional fork) or creating a thread (where we want
> > all threads to share the same VL by default).
> > 
> > Inheritance across exec() is less desirable, because of the ABI
> > impact of large vector lengths on the size of the signal frame --
> > when running a new binary, there is no guarantee that the new
> > binary is compatible with these ABI changes.
> > 
> > This flag makes the vector length non-inherited by default.
> 
> Can we make vector length inherited across fork but non-inherited
> across exec by default?

That is the behaviour: I always inherit across fork/clone, since
you are still running the same binary after those.

I could word the commit message a bit more clearly here.

Cheers
---Dave 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-16 13:32       ` Dave Martin
@ 2017-01-16 15:11         ` Yao Qi
  -1 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 15:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Joseph Myers, Szabolcs Nagy, linux-arm-kernel,
	Alan Hayward, Torvald Riegel, Christoffer Dall

On 17-01-16 13:32:31, Dave Martin wrote:
> On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> > On 17-01-12 11:26:07, Dave Martin wrote:
> > > This patch adds support for manipulating a task's vector length at
> > > runtime via ptrace.
> > > 
> > 
> > I hope kernel doesn't provide such interface to ptracer to change vector
> > length.
> 
> It does, with this patch, beacuse...
> 
> > The vector length is sort of a read-only property of thread/process/
> > program to debugger, unless we really have a clear requirement to modify
> > vector length in debugging.  I may miss something because I haven't debug
> > SVE code yet.
> 
> ...the vector length is no longer read-only for the task, thanks to
> the new prctls().

What I meant "read-only" is that debugger can't change it, while the program
itself can change it via prctl().

> 
> This does add complexity, but I figured that any programmer's model
> state that the thread can modify for itself should be modifiable by the
> debugger, if for no other reason than the user may want to experiment to
> see what happens.  Without a ptrace interface, it would be necessary
> to inject a prctl() call into the target, which is possible but awkward.

We only need such interface if it is useful, see more below.

Suppose it is useful to change vector length through ptrace, we should align
ptrace interface to prctl() as much as possible.  Looks that both prctl
change and ptrace change can go through sve_set_vector_length, easy to keep
two consistent.

> 
> gdb must already re-detect the vector length on stop, since the target
> could have called the prctl() in the meantime.

Yes, gdb assumes the vector length may be changed, so it re-detects on
every stop, but I don't see the need for gdb to change the vector length.

> 
> Access via ptrace also allows things like trapping on exec, fork or
> clone and changing the vector length for the new process or thread
> before it starts to run.  I'm guessing here, but such a scenario seems
> legitimate (?)
> 

Yes, these cases are valid, but the usefulness is still questionable to
me.  I just doubt that SVE developers do need to change vector length
when they are debugging code.  Note that it is not my strong objection
to this patch, if kernel people believe this is useful, I am fine with
it.

-- 
Yao (齐尧)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-16 15:11         ` Yao Qi
  0 siblings, 0 replies; 49+ messages in thread
From: Yao Qi @ 2017-01-16 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 17-01-16 13:32:31, Dave Martin wrote:
> On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> > On 17-01-12 11:26:07, Dave Martin wrote:
> > > This patch adds support for manipulating a task's vector length at
> > > runtime via ptrace.
> > > 
> > 
> > I hope kernel doesn't provide such interface to ptracer to change vector
> > length.
> 
> It does, with this patch, beacuse...
> 
> > The vector length is sort of a read-only property of thread/process/
> > program to debugger, unless we really have a clear requirement to modify
> > vector length in debugging.  I may miss something because I haven't debug
> > SVE code yet.
> 
> ...the vector length is no longer read-only for the task, thanks to
> the new prctls().

What I meant "read-only" is that debugger can't change it, while the program
itself can change it via prctl().

> 
> This does add complexity, but I figured that any programmer's model
> state that the thread can modify for itself should be modifiable by the
> debugger, if for no other reason than the user may want to experiment to
> see what happens.  Without a ptrace interface, it would be necessary
> to inject a prctl() call into the target, which is possible but awkward.

We only need such interface if it is useful, see more below.

Suppose it is useful to change vector length through ptrace, we should align
ptrace interface to prctl() as much as possible.  Looks that both prctl
change and ptrace change can go through sve_set_vector_length, easy to keep
two consistent.

> 
> gdb must already re-detect the vector length on stop, since the target
> could have called the prctl() in the meantime.

Yes, gdb assumes the vector length may be changed, so it re-detects on
every stop, but I don't see the need for gdb to change the vector length.

> 
> Access via ptrace also allows things like trapping on exec, fork or
> clone and changing the vector length for the new process or thread
> before it starts to run.  I'm guessing here, but such a scenario seems
> legitimate (?)
> 

Yes, these cases are valid, but the usefulness is still questionable to
me.  I just doubt that SVE developers do need to change vector length
when they are debugging code.  Note that it is not my strong objection
to this patch, if kernel people believe this is useful, I am fine with
it.

-- 
Yao (??)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-16 15:11         ` Yao Qi
@ 2017-01-16 15:47           ` Pedro Alves
  -1 siblings, 0 replies; 49+ messages in thread
From: Pedro Alves @ 2017-01-16 15:47 UTC (permalink / raw)
  To: Yao Qi, Dave Martin
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Joseph Myers, Szabolcs Nagy, linux-arm-kernel,
	Alan Hayward, Torvald Riegel, Christoffer Dall

On 01/16/2017 03:11 PM, Yao Qi wrote:
> 
>> > 
>> > gdb must already re-detect the vector length on stop, since the target
>> > could have called the prctl() in the meantime.
> Yes, gdb assumes the vector length may be changed, so it re-detects on
> every stop, but I don't see the need for gdb to change the vector length.
> 

Do we need to consider inferior function calls here?

Say the program is stopped in code that assumes "vector length N", and
the user does "print some_function_that_assumes_some_other_vector_length ()".

Is that a use case we need to cover?

If so, to make it work correctly, the debugger needs to be able to change the
vector length to the length assumed by that called function, and then
restore it back after the call completes (or is aborted).

I have no idea whether the debugger will be able to figure
out a function's assumed vector length from debug info or some such.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-16 15:47           ` Pedro Alves
  0 siblings, 0 replies; 49+ messages in thread
From: Pedro Alves @ 2017-01-16 15:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/16/2017 03:11 PM, Yao Qi wrote:
> 
>> > 
>> > gdb must already re-detect the vector length on stop, since the target
>> > could have called the prctl() in the meantime.
> Yes, gdb assumes the vector length may be changed, so it re-detects on
> every stop, but I don't see the need for gdb to change the vector length.
> 

Do we need to consider inferior function calls here?

Say the program is stopped in code that assumes "vector length N", and
the user does "print some_function_that_assumes_some_other_vector_length ()".

Is that a use case we need to cover?

If so, to make it work correctly, the debugger needs to be able to change the
vector length to the length assumed by that called function, and then
restore it back after the call completes (or is aborted).

I have no idea whether the debugger will be able to figure
out a function's assumed vector length from debug info or some such.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-16 15:47           ` Pedro Alves
@ 2017-01-16 16:31             ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 16:31 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Yao Qi, linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Christoffer Dall, Szabolcs Nagy,
	linux-arm-kernel, Alan Hayward, Torvald Riegel, Joseph Myers

On Mon, Jan 16, 2017 at 03:47:55PM +0000, Pedro Alves wrote:
> On 01/16/2017 03:11 PM, Yao Qi wrote:
> > 
> >> > 
> >> > gdb must already re-detect the vector length on stop, since the target
> >> > could have called the prctl() in the meantime.
> > Yes, gdb assumes the vector length may be changed, so it re-detects on
> > every stop, but I don't see the need for gdb to change the vector length.
> > 
> 
> Do we need to consider inferior function calls here?
> 
> Say the program is stopped in code that assumes "vector length N", and
> the user does "print some_function_that_assumes_some_other_vector_length ()".
> 
> Is that a use case we need to cover?
> 
> If so, to make it work correctly, the debugger needs to be able to change the
> vector length to the length assumed by that called function, and then
> restore it back after the call completes (or is aborted).
> 
> I have no idea whether the debugger will be able to figure
> out a function's assumed vector length from debug info or some such.

I think the proposed ptrace interface can support this -- i.e., it
should provide everything needed to save off the VL and register state,
switch VL, do something else, then restore the VL and state (if not,
that's a bug).

My current position is that determining what vector length is
required by what function or binary is a 100% userspace problem, though.

ELF/DWARF could have annotations about this, though it wouldn't
necessarily be per-function -- you might require a whole image to be
built for the same vector length (if any).

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-16 16:31             ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-16 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 16, 2017 at 03:47:55PM +0000, Pedro Alves wrote:
> On 01/16/2017 03:11 PM, Yao Qi wrote:
> > 
> >> > 
> >> > gdb must already re-detect the vector length on stop, since the target
> >> > could have called the prctl() in the meantime.
> > Yes, gdb assumes the vector length may be changed, so it re-detects on
> > every stop, but I don't see the need for gdb to change the vector length.
> > 
> 
> Do we need to consider inferior function calls here?
> 
> Say the program is stopped in code that assumes "vector length N", and
> the user does "print some_function_that_assumes_some_other_vector_length ()".
> 
> Is that a use case we need to cover?
> 
> If so, to make it work correctly, the debugger needs to be able to change the
> vector length to the length assumed by that called function, and then
> restore it back after the call completes (or is aborted).
> 
> I have no idea whether the debugger will be able to figure
> out a function's assumed vector length from debug info or some such.

I think the proposed ptrace interface can support this -- i.e., it
should provide everything needed to save off the VL and register state,
switch VL, do something else, then restore the VL and state (if not,
that's a bug).

My current position is that determining what vector length is
required by what function or binary is a 100% userspace problem, though.

ELF/DWARF could have annotations about this, though it wouldn't
necessarily be per-function -- you might require a whole image to be
built for the same vector length (if any).

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-16 15:11         ` Yao Qi
@ 2017-01-17 10:03           ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-17 10:03 UTC (permalink / raw)
  To: Yao Qi
  Cc: linux-arch, Florian Weimer, libc-alpha, Ard Biesheuvel,
	Marc Zyngier, gdb, Christoffer Dall, Szabolcs Nagy,
	linux-arm-kernel, Alan Hayward, Torvald Riegel, Joseph Myers

On Mon, Jan 16, 2017 at 03:11:56PM +0000, Yao Qi wrote:
> On 17-01-16 13:32:31, Dave Martin wrote:
> > On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> > > On 17-01-12 11:26:07, Dave Martin wrote:
> > > > This patch adds support for manipulating a task's vector length at
> > > > runtime via ptrace.
> > > > 
> > > 
> > > I hope kernel doesn't provide such interface to ptracer to change vector
> > > length.
> > 
> > It does, with this patch, beacuse...
> > 
> > > The vector length is sort of a read-only property of thread/process/
> > > program to debugger, unless we really have a clear requirement to modify
> > > vector length in debugging.  I may miss something because I haven't debug
> > > SVE code yet.
> > 
> > ...the vector length is no longer read-only for the task, thanks to
> > the new prctls().
> 
> What I meant "read-only" is that debugger can't change it, while the program
> itself can change it via prctl().

I see.

> > 
> > This does add complexity, but I figured that any programmer's model
> > state that the thread can modify for itself should be modifiable by the
> > debugger, if for no other reason than the user may want to experiment to
> > see what happens.  Without a ptrace interface, it would be necessary
> > to inject a prctl() call into the target, which is possible but awkward.
> 
> We only need such interface if it is useful, see more below.
> 
> Suppose it is useful to change vector length through ptrace, we should align
> ptrace interface to prctl() as much as possible.  Looks that both prctl
> change and ptrace change can go through sve_set_vector_length, easy to keep
> two consistent.
> 
> > 
> > gdb must already re-detect the vector length on stop, since the target
> > could have called the prctl() in the meantime.
> 
> Yes, gdb assumes the vector length may be changed, so it re-detects on
> every stop, but I don't see the need for gdb to change the vector length.
> 
> > 
> > Access via ptrace also allows things like trapping on exec, fork or
> > clone and changing the vector length for the new process or thread
> > before it starts to run.  I'm guessing here, but such a scenario seems
> > legitimate (?)
> > 
> 
> Yes, these cases are valid, but the usefulness is still questionable to
> me.  I just doubt that SVE developers do need to change vector length
> when they are debugging code.  Note that it is not my strong objection
> to this patch, if kernel people believe this is useful, I am fine with
> it.

That's fair.  I'll leave the patch there for now and see if anyone else
has a comment to make, but it could be removed without affecting
anything else.

Are there situations in which injecting a function call into the target
won't work, i.e., where we couldn't do:

set prctl(...)

?

Using the prctl interface this way, it would also be preferable to refer
to the #defines by name.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-17 10:03           ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-17 10:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 16, 2017 at 03:11:56PM +0000, Yao Qi wrote:
> On 17-01-16 13:32:31, Dave Martin wrote:
> > On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> > > On 17-01-12 11:26:07, Dave Martin wrote:
> > > > This patch adds support for manipulating a task's vector length at
> > > > runtime via ptrace.
> > > > 
> > > 
> > > I hope kernel doesn't provide such interface to ptracer to change vector
> > > length.
> > 
> > It does, with this patch, beacuse...
> > 
> > > The vector length is sort of a read-only property of thread/process/
> > > program to debugger, unless we really have a clear requirement to modify
> > > vector length in debugging.  I may miss something because I haven't debug
> > > SVE code yet.
> > 
> > ...the vector length is no longer read-only for the task, thanks to
> > the new prctls().
> 
> What I meant "read-only" is that debugger can't change it, while the program
> itself can change it via prctl().

I see.

> > 
> > This does add complexity, but I figured that any programmer's model
> > state that the thread can modify for itself should be modifiable by the
> > debugger, if for no other reason than the user may want to experiment to
> > see what happens.  Without a ptrace interface, it would be necessary
> > to inject a prctl() call into the target, which is possible but awkward.
> 
> We only need such interface if it is useful, see more below.
> 
> Suppose it is useful to change vector length through ptrace, we should align
> ptrace interface to prctl() as much as possible.  Looks that both prctl
> change and ptrace change can go through sve_set_vector_length, easy to keep
> two consistent.
> 
> > 
> > gdb must already re-detect the vector length on stop, since the target
> > could have called the prctl() in the meantime.
> 
> Yes, gdb assumes the vector length may be changed, so it re-detects on
> every stop, but I don't see the need for gdb to change the vector length.
> 
> > 
> > Access via ptrace also allows things like trapping on exec, fork or
> > clone and changing the vector length for the new process or thread
> > before it starts to run.  I'm guessing here, but such a scenario seems
> > legitimate (?)
> > 
> 
> Yes, these cases are valid, but the usefulness is still questionable to
> me.  I just doubt that SVE developers do need to change vector length
> when they are debugging code.  Note that it is not my strong objection
> to this patch, if kernel people believe this is useful, I am fine with
> it.

That's fair.  I'll leave the patch there for now and see if anyone else
has a comment to make, but it could be removed without affecting
anything else.

Are there situations in which injecting a function call into the target
won't work, i.e., where we couldn't do:

set prctl(...)

?

Using the prctl interface this way, it would also be preferable to refer
to the #defines by name.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-17 10:03           ` Dave Martin
@ 2017-01-17 13:31             ` Alan Hayward
  -1 siblings, 0 replies; 49+ messages in thread
From: Alan Hayward @ 2017-01-17 13:31 UTC (permalink / raw)
  To: Dave P Martin
  Cc: Yao Qi, linux-arch, Florian Weimer, libc-alpha, ard.biesheuvel,
	Marc Zyngier, gdb, Christoffer Dall, Szabolcs Nagy,
	linux-arm-kernel, Torvald Riegel, Joseph Myers, nd


> On 17 Jan 2017, at 10:03, Dave Martin <Dave.Martin@arm.com> wrote:
> 
> On Mon, Jan 16, 2017 at 03:11:56PM +0000, Yao Qi wrote:
>> On 17-01-16 13:32:31, Dave Martin wrote:
>>> On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
>>>> On 17-01-12 11:26:07, Dave Martin wrote:
>>>>> This patch adds support for manipulating a task's vector length at
>>>>> runtime via ptrace.
>>>>> 
>>>> 
>>>> I hope kernel doesn't provide such interface to ptracer to change vector
>>>> length.
>>> 
>>> It does, with this patch, beacuse...
>>> 
>>>> The vector length is sort of a read-only property of thread/process/
>>>> program to debugger, unless we really have a clear requirement to modify
>>>> vector length in debugging.  I may miss something because I haven't debug
>>>> SVE code yet.
>>> 
>>> ...the vector length is no longer read-only for the task, thanks to
>>> the new prctls().
>> 
>> What I meant "read-only" is that debugger can't change it, while the program
>> itself can change it via prctl().
> 
> I see.
> 
>>> 
>>> This does add complexity, but I figured that any programmer's model
>>> state that the thread can modify for itself should be modifiable by the
>>> debugger, if for no other reason than the user may want to experiment to
>>> see what happens.  Without a ptrace interface, it would be necessary
>>> to inject a prctl() call into the target, which is possible but awkward.
>> 
>> We only need such interface if it is useful, see more below.
>> 
>> Suppose it is useful to change vector length through ptrace, we should align
>> ptrace interface to prctl() as much as possible.  Looks that both prctl
>> change and ptrace change can go through sve_set_vector_length, easy to keep
>> two consistent.
>> 
>>> 
>>> gdb must already re-detect the vector length on stop, since the target
>>> could have called the prctl() in the meantime.
>> 
>> Yes, gdb assumes the vector length may be changed, so it re-detects on
>> every stop, but I don't see the need for gdb to change the vector length.
>> 
>>> 
>>> Access via ptrace also allows things like trapping on exec, fork or
>>> clone and changing the vector length for the new process or thread
>>> before it starts to run.  I'm guessing here, but such a scenario seems
>>> legitimate (?)
>>> 
>> 
>> Yes, these cases are valid, but the usefulness is still questionable to
>> me.  I just doubt that SVE developers do need to change vector length
>> when they are debugging code.  Note that it is not my strong objection
>> to this patch, if kernel people believe this is useful, I am fine with
>> it.
> 
> That's fair.  I'll leave the patch there for now and see if anyone else
> has a comment to make, but it could be removed without affecting
> anything else.
> 

I would say that whilst it is a very dangerous thing to do and has many
consequences, there is a requirement for a gdb user to be able to change VL
whilst debugging a running process, and I don’t think we should see
changing VL as much different from changing a register value on the fly.

Say you have a loop in assembly you are trying to debug - you might write
to $x2 and then single step to see how this effects the result. With SVE
code you might want to see how different VL values will effect the layout
of results in the vectors, how it effects the predicates and how it changes
the number of iterations the loop makes. Of course, once you exit the
loop all bets are off - just like if you had been changing register values.

The current proposal for gdb is that we will show $VL in the list of
registers, therefore for consistency it’d make sense for the gdb user to
be able to set it as if it was just another register. For this we need a
simple way to change the VL in another process, and I think ptrace() is
the easiest way (given that prctl() only changes its own process).


> Are there situations in which injecting a function call into the target
> won't work, i.e., where we couldn't do:
> 
> set prctl(...)
> 
> ?
> 
> Using the prctl interface this way, it would also be preferable to refer
> to the #defines by name.
> 
> Cheers
> —Dave


Thanks,
Alan.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-17 13:31             ` Alan Hayward
  0 siblings, 0 replies; 49+ messages in thread
From: Alan Hayward @ 2017-01-17 13:31 UTC (permalink / raw)
  To: linux-arm-kernel


> On 17 Jan 2017, at 10:03, Dave Martin <Dave.Martin@arm.com> wrote:
> 
> On Mon, Jan 16, 2017 at 03:11:56PM +0000, Yao Qi wrote:
>> On 17-01-16 13:32:31, Dave Martin wrote:
>>> On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
>>>> On 17-01-12 11:26:07, Dave Martin wrote:
>>>>> This patch adds support for manipulating a task's vector length at
>>>>> runtime via ptrace.
>>>>> 
>>>> 
>>>> I hope kernel doesn't provide such interface to ptracer to change vector
>>>> length.
>>> 
>>> It does, with this patch, beacuse...
>>> 
>>>> The vector length is sort of a read-only property of thread/process/
>>>> program to debugger, unless we really have a clear requirement to modify
>>>> vector length in debugging.  I may miss something because I haven't debug
>>>> SVE code yet.
>>> 
>>> ...the vector length is no longer read-only for the task, thanks to
>>> the new prctls().
>> 
>> What I meant "read-only" is that debugger can't change it, while the program
>> itself can change it via prctl().
> 
> I see.
> 
>>> 
>>> This does add complexity, but I figured that any programmer's model
>>> state that the thread can modify for itself should be modifiable by the
>>> debugger, if for no other reason than the user may want to experiment to
>>> see what happens.  Without a ptrace interface, it would be necessary
>>> to inject a prctl() call into the target, which is possible but awkward.
>> 
>> We only need such interface if it is useful, see more below.
>> 
>> Suppose it is useful to change vector length through ptrace, we should align
>> ptrace interface to prctl() as much as possible.  Looks that both prctl
>> change and ptrace change can go through sve_set_vector_length, easy to keep
>> two consistent.
>> 
>>> 
>>> gdb must already re-detect the vector length on stop, since the target
>>> could have called the prctl() in the meantime.
>> 
>> Yes, gdb assumes the vector length may be changed, so it re-detects on
>> every stop, but I don't see the need for gdb to change the vector length.
>> 
>>> 
>>> Access via ptrace also allows things like trapping on exec, fork or
>>> clone and changing the vector length for the new process or thread
>>> before it starts to run.  I'm guessing here, but such a scenario seems
>>> legitimate (?)
>>> 
>> 
>> Yes, these cases are valid, but the usefulness is still questionable to
>> me.  I just doubt that SVE developers do need to change vector length
>> when they are debugging code.  Note that it is not my strong objection
>> to this patch, if kernel people believe this is useful, I am fine with
>> it.
> 
> That's fair.  I'll leave the patch there for now and see if anyone else
> has a comment to make, but it could be removed without affecting
> anything else.
> 

I would say that whilst it is a very dangerous thing to do and has many
consequences, there is a requirement for a gdb user to be able to change VL
whilst debugging a running process, and I don?t think we should see
changing VL as much different from changing a register value on the fly.

Say you have a loop in assembly you are trying to debug - you might write
to $x2 and then single step to see how this effects the result. With SVE
code you might want to see how different VL values will effect the layout
of results in the vectors, how it effects the predicates and how it changes
the number of iterations the loop makes. Of course, once you exit the
loop all bets are off - just like if you had been changing register values.

The current proposal for gdb is that we will show $VL in the list of
registers, therefore for consistency it?d make sense for the gdb user to
be able to set it as if it was just another register. For this we need a
simple way to change the VL in another process, and I think ptrace() is
the easiest way (given that prctl() only changes its own process).


> Are there situations in which injecting a function call into the target
> won't work, i.e., where we couldn't do:
> 
> set prctl(...)
> 
> ?
> 
> Using the prctl interface this way, it would also be preferable to refer
> to the #defines by name.
> 
> Cheers
> ?Dave


Thanks,
Alan.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
  2017-01-17 13:31             ` Alan Hayward
@ 2017-01-19 17:11               ` Dave Martin
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-19 17:11 UTC (permalink / raw)
  To: Alan Hayward
  Cc: linux-arch, Florian Weimer, nd, libc-alpha, ard.biesheuvel,
	Marc Zyngier, gdb, Joseph Myers, Yao Qi, Szabolcs Nagy,
	linux-arm-kernel, Torvald Riegel, Christoffer Dall

On Tue, Jan 17, 2017 at 01:31:03PM +0000, Alan Hayward wrote:
> 
> > On 17 Jan 2017, at 10:03, Dave Martin <Dave.Martin@arm.com> wrote:
> > 
> > On Mon, Jan 16, 2017 at 03:11:56PM +0000, Yao Qi wrote:
> >> On 17-01-16 13:32:31, Dave Martin wrote:
> >>> On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> >>>> On 17-01-12 11:26:07, Dave Martin wrote:
> >>>>> This patch adds support for manipulating a task's vector length at
> >>>>> runtime via ptrace.
> >>>>> 
> >>>> 
> >>>> I hope kernel doesn't provide such interface to ptracer to change vector
> >>>> length.
> >>> 
> >>> It does, with this patch, beacuse...
> >>> 
> >>>> The vector length is sort of a read-only property of thread/process/
> >>>> program to debugger, unless we really have a clear requirement to modify
> >>>> vector length in debugging.  I may miss something because I haven't debug
> >>>> SVE code yet.
> >>> 
> >>> ...the vector length is no longer read-only for the task, thanks to
> >>> the new prctls().
> >> 
> >> What I meant "read-only" is that debugger can't change it, while the program
> >> itself can change it via prctl().
> > 
> > I see.
> > 
> >>> 
> >>> This does add complexity, but I figured that any programmer's model
> >>> state that the thread can modify for itself should be modifiable by the
> >>> debugger, if for no other reason than the user may want to experiment to
> >>> see what happens.  Without a ptrace interface, it would be necessary
> >>> to inject a prctl() call into the target, which is possible but awkward.
> >> 
> >> We only need such interface if it is useful, see more below.
> >> 
> >> Suppose it is useful to change vector length through ptrace, we should align
> >> ptrace interface to prctl() as much as possible.  Looks that both prctl
> >> change and ptrace change can go through sve_set_vector_length, easy to keep
> >> two consistent.
> >> 
> >>> 
> >>> gdb must already re-detect the vector length on stop, since the target
> >>> could have called the prctl() in the meantime.
> >> 
> >> Yes, gdb assumes the vector length may be changed, so it re-detects on
> >> every stop, but I don't see the need for gdb to change the vector length.
> >> 
> >>> 
> >>> Access via ptrace also allows things like trapping on exec, fork or
> >>> clone and changing the vector length for the new process or thread
> >>> before it starts to run.  I'm guessing here, but such a scenario seems
> >>> legitimate (?)
> >>> 
> >> 
> >> Yes, these cases are valid, but the usefulness is still questionable to
> >> me.  I just doubt that SVE developers do need to change vector length
> >> when they are debugging code.  Note that it is not my strong objection
> >> to this patch, if kernel people believe this is useful, I am fine with
> >> it.
> > 
> > That's fair.  I'll leave the patch there for now and see if anyone else
> > has a comment to make, but it could be removed without affecting
> > anything else.
> > 
> 
> I would say that whilst it is a very dangerous thing to do and has many

ptrace is inherently dangerous for the target task... that's rather the
point.

> consequences, there is a requirement for a gdb user to be able to change VL
> whilst debugging a running process, and I don’t think we should see
> changing VL as much different from changing a register value on the fly.
> 
> Say you have a loop in assembly you are trying to debug - you might write
> to $x2 and then single step to see how this effects the result. With SVE
> code you might want to see how different VL values will effect the layout
> of results in the vectors, how it effects the predicates and how it changes
> the number of iterations the loop makes. Of course, once you exit the
> loop all bets are off - just like if you had been changing register values.
> 
> The current proposal for gdb is that we will show $VL in the list of
> registers, therefore for consistency it’d make sense for the gdb user to
> be able to set it as if it was just another register. For this we need a
> simple way to change the VL in another process, and I think ptrace() is
> the easiest way (given that prctl() only changes its own process).

OK, I'll keep it for now, unless somebody has a strong objection.

It doesn't affect the underlying plumbing much -- doing this via
ptrace() is actually the simpler of the two options, since the task
is stopped and thus less synchronisation is needed.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting
@ 2017-01-19 17:11               ` Dave Martin
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Martin @ 2017-01-19 17:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 01:31:03PM +0000, Alan Hayward wrote:
> 
> > On 17 Jan 2017, at 10:03, Dave Martin <Dave.Martin@arm.com> wrote:
> > 
> > On Mon, Jan 16, 2017 at 03:11:56PM +0000, Yao Qi wrote:
> >> On 17-01-16 13:32:31, Dave Martin wrote:
> >>> On Mon, Jan 16, 2017 at 12:20:38PM +0000, Yao Qi wrote:
> >>>> On 17-01-12 11:26:07, Dave Martin wrote:
> >>>>> This patch adds support for manipulating a task's vector length at
> >>>>> runtime via ptrace.
> >>>>> 
> >>>> 
> >>>> I hope kernel doesn't provide such interface to ptracer to change vector
> >>>> length.
> >>> 
> >>> It does, with this patch, beacuse...
> >>> 
> >>>> The vector length is sort of a read-only property of thread/process/
> >>>> program to debugger, unless we really have a clear requirement to modify
> >>>> vector length in debugging.  I may miss something because I haven't debug
> >>>> SVE code yet.
> >>> 
> >>> ...the vector length is no longer read-only for the task, thanks to
> >>> the new prctls().
> >> 
> >> What I meant "read-only" is that debugger can't change it, while the program
> >> itself can change it via prctl().
> > 
> > I see.
> > 
> >>> 
> >>> This does add complexity, but I figured that any programmer's model
> >>> state that the thread can modify for itself should be modifiable by the
> >>> debugger, if for no other reason than the user may want to experiment to
> >>> see what happens.  Without a ptrace interface, it would be necessary
> >>> to inject a prctl() call into the target, which is possible but awkward.
> >> 
> >> We only need such interface if it is useful, see more below.
> >> 
> >> Suppose it is useful to change vector length through ptrace, we should align
> >> ptrace interface to prctl() as much as possible.  Looks that both prctl
> >> change and ptrace change can go through sve_set_vector_length, easy to keep
> >> two consistent.
> >> 
> >>> 
> >>> gdb must already re-detect the vector length on stop, since the target
> >>> could have called the prctl() in the meantime.
> >> 
> >> Yes, gdb assumes the vector length may be changed, so it re-detects on
> >> every stop, but I don't see the need for gdb to change the vector length.
> >> 
> >>> 
> >>> Access via ptrace also allows things like trapping on exec, fork or
> >>> clone and changing the vector length for the new process or thread
> >>> before it starts to run.  I'm guessing here, but such a scenario seems
> >>> legitimate (?)
> >>> 
> >> 
> >> Yes, these cases are valid, but the usefulness is still questionable to
> >> me.  I just doubt that SVE developers do need to change vector length
> >> when they are debugging code.  Note that it is not my strong objection
> >> to this patch, if kernel people believe this is useful, I am fine with
> >> it.
> > 
> > That's fair.  I'll leave the patch there for now and see if anyone else
> > has a comment to make, but it could be removed without affecting
> > anything else.
> > 
> 
> I would say that whilst it is a very dangerous thing to do and has many

ptrace is inherently dangerous for the target task... that's rather the
point.

> consequences, there is a requirement for a gdb user to be able to change VL
> whilst debugging a running process, and I don?t think we should see
> changing VL as much different from changing a register value on the fly.
> 
> Say you have a loop in assembly you are trying to debug - you might write
> to $x2 and then single step to see how this effects the result. With SVE
> code you might want to see how different VL values will effect the layout
> of results in the vectors, how it effects the predicates and how it changes
> the number of iterations the loop makes. Of course, once you exit the
> loop all bets are off - just like if you had been changing register values.
> 
> The current proposal for gdb is that we will show $VL in the list of
> registers, therefore for consistency it?d make sense for the gdb user to
> be able to set it as if it was just another register. For this we need a
> simple way to change the VL in another process, and I think ptrace() is
> the easiest way (given that prctl() only changes its own process).

OK, I'll keep it for now, unless somebody has a strong objection.

It doesn't affect the underlying plumbing much -- doing this via
ptrace() is actually the simpler of the two options, since the task
is stopped and thus less synchronisation is needed.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2017-01-19 17:11 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12 11:25 [RFC PATCH 00/10] arm64/sve: Add userspace vector length control API Dave Martin
2017-01-12 11:25 ` Dave Martin
2017-01-12 11:25 ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 01/10] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls Dave Martin
2017-01-12 11:26   ` [RFC PATCH 01/10] prctl: Add skeleton for PR_SVE_{SET, GET}_VL controls Dave Martin
2017-01-12 11:26 ` [RFC PATCH 02/10] arm64/sve: Track vector length for each task Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 03/10] arm64/sve: Set CPU vector length to match current task Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 04/10] arm64/sve: Factor out clearing of tasks' SVE regs Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 05/10] arm64/sve: Wire up vector length control prctl() calls Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 06/10] arm64/sve: Disallow VL setting for individual threads by default Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-16 11:34   ` Yao Qi
2017-01-16 11:34     ` Yao Qi
2017-01-16 12:23     ` Dave Martin
2017-01-16 12:23       ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 07/10] arm64/sve: Add vector length inheritance control Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-16 12:27   ` Yao Qi
2017-01-16 12:27     ` Yao Qi
2017-01-16 13:34     ` Dave Martin
2017-01-16 13:34       ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 08/10] arm64/sve: ptrace: Wire up vector length control and reporting Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-16 12:20   ` Yao Qi
2017-01-16 12:20     ` Yao Qi
2017-01-16 13:32     ` Dave Martin
2017-01-16 13:32       ` Dave Martin
2017-01-16 15:11       ` Yao Qi
2017-01-16 15:11         ` Yao Qi
2017-01-16 15:47         ` Pedro Alves
2017-01-16 15:47           ` Pedro Alves
2017-01-16 16:31           ` Dave Martin
2017-01-16 16:31             ` Dave Martin
2017-01-17 10:03         ` Dave Martin
2017-01-17 10:03           ` Dave Martin
2017-01-17 13:31           ` Alan Hayward
2017-01-17 13:31             ` Alan Hayward
2017-01-19 17:11             ` Dave Martin
2017-01-19 17:11               ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 09/10] arm64/sve: Enable default vector length control via procfs Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-12 11:26 ` [RFC PATCH 10/10] Revert "arm64/sve: Limit vector length to 512 bits by default" Dave Martin
2017-01-12 11:26   ` Dave Martin
2017-01-12 11:26   ` Dave Martin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.