All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/9] riscv: Userspace pointer masking and tagged address ABI
@ 2024-03-19 21:58 ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Andrew Jones

RISC-V defines three extensions for pointer masking[1]:
 - Smmpm: configured in M-mode, affects M-mode
 - Smnpm: configured in M-mode, affects the next lower mode (S or U-mode)
 - Ssnpm: configured in S-mode, affects the next lower mode (U-mode)

This series adds support for configuring Smnpm or Ssnpm (depending on
which mode the kernel is running in) to allow pointer masking in
userspace by extending the existing PR_SET_TAGGED_ADDR_CTRL API from
arm64. Unlike arm64 TBI, userspace pointer masking is not enabled by
default on RISC-V. Additionally, the tag width (referred to as PMLEN) is
variable, so userspace needs to ask the kernel for a specific tag width
(which is interpreted as a minimum number of tag bits).

This series also adds support for a tagged address ABI similar to arm64.
Since accesses from the kernel to user memory use the kernel's pointer
masking configuration, not the user's, the kernel must untag user
pointers in software before dereferencing them.

This series can be tested in QEMU by applying a patch set[2].

KASAN support is not included here because there is not yet any standard
way for the kernel to ask firmware to enable pointer masking in S-mode.

[1]: https://github.com/riscv/riscv-j-extension/raw/a1e68469c60/zjpm-spec.pdf
[2]: https://patchwork.kernel.org/project/qemu-devel/list/?series=822467&archive=both


Samuel Holland (9):
  dt-bindings: riscv: Add pointer masking ISA extensions
  riscv: Add ISA extension parsing for pointer masking
  riscv: Add CSR definitions for pointer masking
  riscv: Define is_compat_thread()
  riscv: Split per-CPU and per-thread envcfg bits
  riscv: Add support for userspace pointer masking
  riscv: Add support for the tagged address ABI
  riscv: Allow ptrace control of the tagged address ABI
  selftests: riscv: Add a pointer masking test

 .../devicetree/bindings/riscv/extensions.yaml |  18 +
 arch/riscv/Kconfig                            |   8 +
 arch/riscv/include/asm/compat.h               |  16 +
 arch/riscv/include/asm/cpufeature.h           |   2 +
 arch/riscv/include/asm/csr.h                  |  16 +
 arch/riscv/include/asm/hwcap.h                |   5 +
 arch/riscv/include/asm/processor.h            |  10 +
 arch/riscv/include/asm/switch_to.h            |  12 +
 arch/riscv/include/asm/uaccess.h              |  40 ++-
 arch/riscv/kernel/cpufeature.c                |   7 +-
 arch/riscv/kernel/process.c                   | 154 +++++++++
 arch/riscv/kernel/ptrace.c                    |  42 +++
 include/uapi/linux/elf.h                      |   1 +
 include/uapi/linux/prctl.h                    |   3 +
 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/tags/Makefile   |  10 +
 .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++
 17 files changed, 646 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/riscv/tags/Makefile
 create mode 100644 tools/testing/selftests/riscv/tags/pointer_masking.c

-- 
2.43.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [RFC PATCH 0/9] riscv: Userspace pointer masking and tagged address ABI
@ 2024-03-19 21:58 ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Andrew Jones

RISC-V defines three extensions for pointer masking[1]:
 - Smmpm: configured in M-mode, affects M-mode
 - Smnpm: configured in M-mode, affects the next lower mode (S or U-mode)
 - Ssnpm: configured in S-mode, affects the next lower mode (U-mode)

This series adds support for configuring Smnpm or Ssnpm (depending on
which mode the kernel is running in) to allow pointer masking in
userspace by extending the existing PR_SET_TAGGED_ADDR_CTRL API from
arm64. Unlike arm64 TBI, userspace pointer masking is not enabled by
default on RISC-V. Additionally, the tag width (referred to as PMLEN) is
variable, so userspace needs to ask the kernel for a specific tag width
(which is interpreted as a minimum number of tag bits).

This series also adds support for a tagged address ABI similar to arm64.
Since accesses from the kernel to user memory use the kernel's pointer
masking configuration, not the user's, the kernel must untag user
pointers in software before dereferencing them.

This series can be tested in QEMU by applying a patch set[2].

KASAN support is not included here because there is not yet any standard
way for the kernel to ask firmware to enable pointer masking in S-mode.

[1]: https://github.com/riscv/riscv-j-extension/raw/a1e68469c60/zjpm-spec.pdf
[2]: https://patchwork.kernel.org/project/qemu-devel/list/?series=822467&archive=both


Samuel Holland (9):
  dt-bindings: riscv: Add pointer masking ISA extensions
  riscv: Add ISA extension parsing for pointer masking
  riscv: Add CSR definitions for pointer masking
  riscv: Define is_compat_thread()
  riscv: Split per-CPU and per-thread envcfg bits
  riscv: Add support for userspace pointer masking
  riscv: Add support for the tagged address ABI
  riscv: Allow ptrace control of the tagged address ABI
  selftests: riscv: Add a pointer masking test

 .../devicetree/bindings/riscv/extensions.yaml |  18 +
 arch/riscv/Kconfig                            |   8 +
 arch/riscv/include/asm/compat.h               |  16 +
 arch/riscv/include/asm/cpufeature.h           |   2 +
 arch/riscv/include/asm/csr.h                  |  16 +
 arch/riscv/include/asm/hwcap.h                |   5 +
 arch/riscv/include/asm/processor.h            |  10 +
 arch/riscv/include/asm/switch_to.h            |  12 +
 arch/riscv/include/asm/uaccess.h              |  40 ++-
 arch/riscv/kernel/cpufeature.c                |   7 +-
 arch/riscv/kernel/process.c                   | 154 +++++++++
 arch/riscv/kernel/ptrace.c                    |  42 +++
 include/uapi/linux/elf.h                      |   1 +
 include/uapi/linux/prctl.h                    |   3 +
 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/tags/Makefile   |  10 +
 .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++
 17 files changed, 646 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/riscv/tags/Makefile
 create mode 100644 tools/testing/selftests/riscv/tags/pointer_masking.c

-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [RFC PATCH 1/9] dt-bindings: riscv: Add pointer masking ISA extensions
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Paul Walmsley

The RISC-V Pointer Masking specification defines three extensions:
Smmpm, Smnpm, and Ssnpm. Document the behavior as of the current draft
of the specification, which is version 0.8.4.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 .../devicetree/bindings/riscv/extensions.yaml  | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 63d81dc895e5..bb7d5d84f31f 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -128,6 +128,18 @@ properties:
             changes to interrupts as frozen at commit ccbddab ("Merge pull
             request #42 from riscv/jhauser-2023-RC4") of riscv-aia.
 
+        - const: smmpm
+          description: |
+            The standard Smmpm extension for M-mode pointer masking as defined
+            at commit a1e68469c60 ("Minor correction to pointer masking spec.")
+            of riscv-j-extension.
+
+        - const: smnpm
+          description: |
+            The standard Smnpm extension for next-mode pointer masking as defined
+            at commit a1e68469c60 ("Minor correction to pointer masking spec.")
+            of riscv-j-extension.
+
         - const: smstateen
           description: |
             The standard Smstateen extension for controlling access to CSRs
@@ -147,6 +159,12 @@ properties:
             and mode-based filtering as ratified at commit 01d1df0 ("Add ability
             to manually trigger workflow. (#2)") of riscv-count-overflow.
 
+        - const: ssnpm
+          description: |
+            The standard Ssnpm extension for next-mode pointer masking as defined
+            at commit a1e68469c60 ("Minor correction to pointer masking spec.")
+            of riscv-j-extension.
+
         - const: sstc
           description: |
             The standard Sstc supervisor-level extension for time compare as
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 1/9] dt-bindings: riscv: Add pointer masking ISA extensions
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Paul Walmsley

The RISC-V Pointer Masking specification defines three extensions:
Smmpm, Smnpm, and Ssnpm. Document the behavior as of the current draft
of the specification, which is version 0.8.4.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 .../devicetree/bindings/riscv/extensions.yaml  | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 63d81dc895e5..bb7d5d84f31f 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -128,6 +128,18 @@ properties:
             changes to interrupts as frozen at commit ccbddab ("Merge pull
             request #42 from riscv/jhauser-2023-RC4") of riscv-aia.
 
+        - const: smmpm
+          description: |
+            The standard Smmpm extension for M-mode pointer masking as defined
+            at commit a1e68469c60 ("Minor correction to pointer masking spec.")
+            of riscv-j-extension.
+
+        - const: smnpm
+          description: |
+            The standard Smnpm extension for next-mode pointer masking as defined
+            at commit a1e68469c60 ("Minor correction to pointer masking spec.")
+            of riscv-j-extension.
+
         - const: smstateen
           description: |
             The standard Smstateen extension for controlling access to CSRs
@@ -147,6 +159,12 @@ properties:
             and mode-based filtering as ratified at commit 01d1df0 ("Add ability
             to manually trigger workflow. (#2)") of riscv-count-overflow.
 
+        - const: ssnpm
+          description: |
+            The standard Ssnpm extension for next-mode pointer masking as defined
+            at commit a1e68469c60 ("Minor correction to pointer masking spec.")
+            of riscv-j-extension.
+
         - const: sstc
           description: |
             The standard Sstc supervisor-level extension for time compare as
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 2/9] riscv: Add ISA extension parsing for pointer masking
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Andrew Jones

The RISC-V Pointer Masking specification defines three extensions:
Smmpm, Smnpm, and Ssnpm. Add support for parsing each of them.

Smmpm implies the existence of the mseccfg CSR. As it is the only user
of this CSR so far, there is no need for an Xlinuxmseccfg extension.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/hwcap.h | 5 +++++
 arch/riscv/kernel/cpufeature.c | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 1f2d2599c655..1a21dfc47f08 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -80,6 +80,9 @@
 #define RISCV_ISA_EXT_ZFA		71
 #define RISCV_ISA_EXT_ZTSO		72
 #define RISCV_ISA_EXT_ZACAS		73
+#define RISCV_ISA_EXT_SMMPM		74
+#define RISCV_ISA_EXT_SMNPM		75
+#define RISCV_ISA_EXT_SSNPM		76
 
 #define RISCV_ISA_EXT_XLINUXENVCFG	127
 
@@ -88,8 +91,10 @@
 
 #ifdef CONFIG_RISCV_M_MODE
 #define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SMAIA
+#define RISCV_ISA_EXT_SxNPM		RISCV_ISA_EXT_SMNPM
 #else
 #define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SSAIA
+#define RISCV_ISA_EXT_SxNPM		RISCV_ISA_EXT_SSNPM
 #endif
 
 #endif /* _ASM_RISCV_HWCAP_H */
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 79a5a35fab96..d1846aab1f78 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -311,9 +311,12 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_BUNDLE(zvksg, riscv_zvksg_bundled_exts),
 	__RISCV_ISA_EXT_DATA(zvkt, RISCV_ISA_EXT_ZVKT),
 	__RISCV_ISA_EXT_DATA(smaia, RISCV_ISA_EXT_SMAIA),
+	__RISCV_ISA_EXT_DATA(smmpm, RISCV_ISA_EXT_SMMPM),
+	__RISCV_ISA_EXT_SUPERSET(smnpm, RISCV_ISA_EXT_SMNPM, riscv_xlinuxenvcfg_exts),
 	__RISCV_ISA_EXT_DATA(smstateen, RISCV_ISA_EXT_SMSTATEEN),
 	__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
+	__RISCV_ISA_EXT_SUPERSET(ssnpm, RISCV_ISA_EXT_SSNPM, riscv_xlinuxenvcfg_exts),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
 	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 2/9] riscv: Add ISA extension parsing for pointer masking
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Andrew Jones

The RISC-V Pointer Masking specification defines three extensions:
Smmpm, Smnpm, and Ssnpm. Add support for parsing each of them.

Smmpm implies the existence of the mseccfg CSR. As it is the only user
of this CSR so far, there is no need for an Xlinuxmseccfg extension.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/hwcap.h | 5 +++++
 arch/riscv/kernel/cpufeature.c | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 1f2d2599c655..1a21dfc47f08 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -80,6 +80,9 @@
 #define RISCV_ISA_EXT_ZFA		71
 #define RISCV_ISA_EXT_ZTSO		72
 #define RISCV_ISA_EXT_ZACAS		73
+#define RISCV_ISA_EXT_SMMPM		74
+#define RISCV_ISA_EXT_SMNPM		75
+#define RISCV_ISA_EXT_SSNPM		76
 
 #define RISCV_ISA_EXT_XLINUXENVCFG	127
 
@@ -88,8 +91,10 @@
 
 #ifdef CONFIG_RISCV_M_MODE
 #define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SMAIA
+#define RISCV_ISA_EXT_SxNPM		RISCV_ISA_EXT_SMNPM
 #else
 #define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SSAIA
+#define RISCV_ISA_EXT_SxNPM		RISCV_ISA_EXT_SSNPM
 #endif
 
 #endif /* _ASM_RISCV_HWCAP_H */
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 79a5a35fab96..d1846aab1f78 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -311,9 +311,12 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_BUNDLE(zvksg, riscv_zvksg_bundled_exts),
 	__RISCV_ISA_EXT_DATA(zvkt, RISCV_ISA_EXT_ZVKT),
 	__RISCV_ISA_EXT_DATA(smaia, RISCV_ISA_EXT_SMAIA),
+	__RISCV_ISA_EXT_DATA(smmpm, RISCV_ISA_EXT_SMMPM),
+	__RISCV_ISA_EXT_SUPERSET(smnpm, RISCV_ISA_EXT_SMNPM, riscv_xlinuxenvcfg_exts),
 	__RISCV_ISA_EXT_DATA(smstateen, RISCV_ISA_EXT_SMSTATEEN),
 	__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
+	__RISCV_ISA_EXT_SUPERSET(ssnpm, RISCV_ISA_EXT_SSNPM, riscv_xlinuxenvcfg_exts),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
 	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 3/9] riscv: Add CSR definitions for pointer masking
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Andrew Jones,
	Greentime Hu

Pointer masking is controlled via a two-bit PMM field, which appears in
various CSRs depending on which extensions are implemented. Smmpm
defines the field in mseccfg; Smnpm defines the field in menvcfg; Ssnpm
defines the field in senvcfg and (if present) henvcfg and hstatus.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/csr.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 2468c55933cd..1d5a6d73482c 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -119,6 +119,10 @@
 
 /* HSTATUS flags */
 #ifdef CONFIG_64BIT
+#define HSTATUS_PMM		_AC(0x3000000000000, UL)
+#define HSTATUS_PMM_PMLEN_0	_AC(0x0000000000000, UL)
+#define HSTATUS_PMM_PMLEN_7	_AC(0x2000000000000, UL)
+#define HSTATUS_PMM_PMLEN_16	_AC(0x3000000000000, UL)
 #define HSTATUS_VSXL		_AC(0x300000000, UL)
 #define HSTATUS_VSXL_SHIFT	32
 #endif
@@ -194,6 +198,10 @@
 /* xENVCFG flags */
 #define ENVCFG_STCE			(_AC(1, ULL) << 63)
 #define ENVCFG_PBMTE			(_AC(1, ULL) << 62)
+#define ENVCFG_PMM			_AC(0x300000000, ULL)
+#define ENVCFG_PMM_PMLEN_0		_AC(0x000000000, ULL)
+#define ENVCFG_PMM_PMLEN_7		_AC(0x200000000, ULL)
+#define ENVCFG_PMM_PMLEN_16		_AC(0x300000000, ULL)
 #define ENVCFG_CBZE			(_AC(1, UL) << 7)
 #define ENVCFG_CBCFE			(_AC(1, UL) << 6)
 #define ENVCFG_CBIE_SHIFT		4
@@ -215,6 +223,12 @@
 #define SMSTATEEN0_SSTATEEN0_SHIFT	63
 #define SMSTATEEN0_SSTATEEN0		(_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT)
 
+/* mseccfg bits */
+#define MSECCFG_PMM			ENVCFG_PMM
+#define MSECCFG_PMM_PMLEN_0		ENVCFG_PMM_PMLEN_0
+#define MSECCFG_PMM_PMLEN_7		ENVCFG_PMM_PMLEN_7
+#define MSECCFG_PMM_PMLEN_16		ENVCFG_PMM_PMLEN_16
+
 /* symbolic CSR names: */
 #define CSR_CYCLE		0xc00
 #define CSR_TIME		0xc01
@@ -381,6 +395,8 @@
 #define CSR_MIP			0x344
 #define CSR_PMPCFG0		0x3a0
 #define CSR_PMPADDR0		0x3b0
+#define CSR_MSECCFG		0x747
+#define CSR_MSECCFGH		0x757
 #define CSR_MVENDORID		0xf11
 #define CSR_MARCHID		0xf12
 #define CSR_MIMPID		0xf13
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 3/9] riscv: Add CSR definitions for pointer masking
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Andrew Jones,
	Greentime Hu

Pointer masking is controlled via a two-bit PMM field, which appears in
various CSRs depending on which extensions are implemented. Smmpm
defines the field in mseccfg; Smnpm defines the field in menvcfg; Ssnpm
defines the field in senvcfg and (if present) henvcfg and hstatus.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/csr.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 2468c55933cd..1d5a6d73482c 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -119,6 +119,10 @@
 
 /* HSTATUS flags */
 #ifdef CONFIG_64BIT
+#define HSTATUS_PMM		_AC(0x3000000000000, UL)
+#define HSTATUS_PMM_PMLEN_0	_AC(0x0000000000000, UL)
+#define HSTATUS_PMM_PMLEN_7	_AC(0x2000000000000, UL)
+#define HSTATUS_PMM_PMLEN_16	_AC(0x3000000000000, UL)
 #define HSTATUS_VSXL		_AC(0x300000000, UL)
 #define HSTATUS_VSXL_SHIFT	32
 #endif
@@ -194,6 +198,10 @@
 /* xENVCFG flags */
 #define ENVCFG_STCE			(_AC(1, ULL) << 63)
 #define ENVCFG_PBMTE			(_AC(1, ULL) << 62)
+#define ENVCFG_PMM			_AC(0x300000000, ULL)
+#define ENVCFG_PMM_PMLEN_0		_AC(0x000000000, ULL)
+#define ENVCFG_PMM_PMLEN_7		_AC(0x200000000, ULL)
+#define ENVCFG_PMM_PMLEN_16		_AC(0x300000000, ULL)
 #define ENVCFG_CBZE			(_AC(1, UL) << 7)
 #define ENVCFG_CBCFE			(_AC(1, UL) << 6)
 #define ENVCFG_CBIE_SHIFT		4
@@ -215,6 +223,12 @@
 #define SMSTATEEN0_SSTATEEN0_SHIFT	63
 #define SMSTATEEN0_SSTATEEN0		(_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT)
 
+/* mseccfg bits */
+#define MSECCFG_PMM			ENVCFG_PMM
+#define MSECCFG_PMM_PMLEN_0		ENVCFG_PMM_PMLEN_0
+#define MSECCFG_PMM_PMLEN_7		ENVCFG_PMM_PMLEN_7
+#define MSECCFG_PMM_PMLEN_16		ENVCFG_PMM_PMLEN_16
+
 /* symbolic CSR names: */
 #define CSR_CYCLE		0xc00
 #define CSR_TIME		0xc01
@@ -381,6 +395,8 @@
 #define CSR_MIP			0x344
 #define CSR_PMPCFG0		0x3a0
 #define CSR_PMPADDR0		0x3b0
+#define CSR_MSECCFG		0x747
+#define CSR_MSECCFGH		0x757
 #define CSR_MVENDORID		0xf11
 #define CSR_MARCHID		0xf12
 #define CSR_MIMPID		0xf13
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 4/9] riscv: Define is_compat_thread()
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Paul Walmsley

This allows checking if some thread other than current is 32-bit.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/compat.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/riscv/include/asm/compat.h b/arch/riscv/include/asm/compat.h
index 2ac955b51148..233c439c12d7 100644
--- a/arch/riscv/include/asm/compat.h
+++ b/arch/riscv/include/asm/compat.h
@@ -12,11 +12,18 @@
 #include <linux/sched/task_stack.h>
 #include <asm-generic/compat.h>
 
+#ifdef CONFIG_COMPAT
+
 static inline int is_compat_task(void)
 {
 	return test_thread_flag(TIF_32BIT);
 }
 
+static inline int is_compat_thread(struct thread_info *thread)
+{
+	return test_ti_thread_flag(thread, TIF_32BIT);
+}
+
 struct compat_user_regs_struct {
 	compat_ulong_t pc;
 	compat_ulong_t ra;
@@ -126,4 +133,13 @@ static inline void cregs_to_regs(struct compat_user_regs_struct *cregs,
 	regs->t6	= (unsigned long) cregs->t6;
 };
 
+#else
+
+static inline int is_compat_thread(struct thread_info *thread)
+{
+	return 0;
+}
+
+#endif
+
 #endif /* __ASM_COMPAT_H */
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 4/9] riscv: Define is_compat_thread()
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Paul Walmsley

This allows checking if some thread other than current is 32-bit.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/compat.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/riscv/include/asm/compat.h b/arch/riscv/include/asm/compat.h
index 2ac955b51148..233c439c12d7 100644
--- a/arch/riscv/include/asm/compat.h
+++ b/arch/riscv/include/asm/compat.h
@@ -12,11 +12,18 @@
 #include <linux/sched/task_stack.h>
 #include <asm-generic/compat.h>
 
+#ifdef CONFIG_COMPAT
+
 static inline int is_compat_task(void)
 {
 	return test_thread_flag(TIF_32BIT);
 }
 
+static inline int is_compat_thread(struct thread_info *thread)
+{
+	return test_ti_thread_flag(thread, TIF_32BIT);
+}
+
 struct compat_user_regs_struct {
 	compat_ulong_t pc;
 	compat_ulong_t ra;
@@ -126,4 +133,13 @@ static inline void cregs_to_regs(struct compat_user_regs_struct *cregs,
 	regs->t6	= (unsigned long) cregs->t6;
 };
 
+#else
+
+static inline int is_compat_thread(struct thread_info *thread)
+{
+	return 0;
+}
+
+#endif
+
 #endif /* __ASM_COMPAT_H */
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Andrew Jones, Guo Ren,
	Heiko Stuebner, Paul Walmsley

Some envcfg bits need to be controlled on a per-thread basis, such as
the pointer masking mode. However, the envcfg CSR value cannot simply be
stored in struct thread_struct, because some hardware may implement a
different subset of envcfg CSR bits is across CPUs. As a result, we need
to combine the per-CPU and per-thread bits whenever we switch threads.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/cpufeature.h |  2 ++
 arch/riscv/include/asm/processor.h  |  1 +
 arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
 arch/riscv/kernel/cpufeature.c      |  4 +++-
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 0bd11862b760..b1ad8d0b4599 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
 /* Per-cpu ISA extensions. */
 extern struct riscv_isainfo hart_isa[NR_CPUS];
 
+DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
+
 void riscv_user_isa_enable(void);
 
 #ifdef CONFIG_RISCV_MISALIGNED
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index a8509cc31ab2..06b87402a4d8 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -118,6 +118,7 @@ struct thread_struct {
 	unsigned long s[12];	/* s[0]: frame pointer */
 	struct __riscv_d_ext_state fstate;
 	unsigned long bad_cause;
+	unsigned long envcfg;
 	u32 riscv_v_flags;
 	u32 vstate_ctrl;
 	struct __riscv_v_ext_state vstate;
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 7efdb0584d47..256a354a5c4a 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
 #define __switch_to_fpu(__prev, __next) do { } while (0)
 #endif
 
+static inline void sync_envcfg(struct task_struct *task)
+{
+	csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
+}
+
+static inline void __switch_to_envcfg(struct task_struct *next)
+{
+	if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
+		sync_envcfg(next);
+}
+
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
 
@@ -80,6 +91,7 @@ do {							\
 		__switch_to_fpu(__prev, __next);	\
 	if (has_vector())					\
 		__switch_to_vector(__prev, __next);	\
+	__switch_to_envcfg(__next);			\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index d1846aab1f78..32aaaf41f8a8 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -44,6 +44,8 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
 /* Per-cpu ISA extensions. */
 struct riscv_isainfo hart_isa[NR_CPUS];
 
+DEFINE_PER_CPU(unsigned long, riscv_cpu_envcfg);
+
 /* Performance information */
 DEFINE_PER_CPU(long, misaligned_access_speed);
 
@@ -978,7 +980,7 @@ arch_initcall(check_unaligned_access_all_cpus);
 void riscv_user_isa_enable(void)
 {
 	if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
-		csr_set(CSR_ENVCFG, ENVCFG_CBZE);
+		this_cpu_or(riscv_cpu_envcfg, ENVCFG_CBZE);
 }
 
 #ifdef CONFIG_RISCV_ALTERNATIVE
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Andrew Jones, Guo Ren,
	Heiko Stuebner, Paul Walmsley

Some envcfg bits need to be controlled on a per-thread basis, such as
the pointer masking mode. However, the envcfg CSR value cannot simply be
stored in struct thread_struct, because some hardware may implement a
different subset of envcfg CSR bits is across CPUs. As a result, we need
to combine the per-CPU and per-thread bits whenever we switch threads.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/cpufeature.h |  2 ++
 arch/riscv/include/asm/processor.h  |  1 +
 arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
 arch/riscv/kernel/cpufeature.c      |  4 +++-
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 0bd11862b760..b1ad8d0b4599 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
 /* Per-cpu ISA extensions. */
 extern struct riscv_isainfo hart_isa[NR_CPUS];
 
+DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
+
 void riscv_user_isa_enable(void);
 
 #ifdef CONFIG_RISCV_MISALIGNED
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index a8509cc31ab2..06b87402a4d8 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -118,6 +118,7 @@ struct thread_struct {
 	unsigned long s[12];	/* s[0]: frame pointer */
 	struct __riscv_d_ext_state fstate;
 	unsigned long bad_cause;
+	unsigned long envcfg;
 	u32 riscv_v_flags;
 	u32 vstate_ctrl;
 	struct __riscv_v_ext_state vstate;
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 7efdb0584d47..256a354a5c4a 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
 #define __switch_to_fpu(__prev, __next) do { } while (0)
 #endif
 
+static inline void sync_envcfg(struct task_struct *task)
+{
+	csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
+}
+
+static inline void __switch_to_envcfg(struct task_struct *next)
+{
+	if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
+		sync_envcfg(next);
+}
+
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
 
@@ -80,6 +91,7 @@ do {							\
 		__switch_to_fpu(__prev, __next);	\
 	if (has_vector())					\
 		__switch_to_vector(__prev, __next);	\
+	__switch_to_envcfg(__next);			\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index d1846aab1f78..32aaaf41f8a8 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -44,6 +44,8 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
 /* Per-cpu ISA extensions. */
 struct riscv_isainfo hart_isa[NR_CPUS];
 
+DEFINE_PER_CPU(unsigned long, riscv_cpu_envcfg);
+
 /* Performance information */
 DEFINE_PER_CPU(long, misaligned_access_speed);
 
@@ -978,7 +980,7 @@ arch_initcall(check_unaligned_access_all_cpus);
 void riscv_user_isa_enable(void)
 {
 	if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
-		csr_set(CSR_ENVCFG, ENVCFG_CBZE);
+		this_cpu_or(riscv_cpu_envcfg, ENVCFG_CBZE);
 }
 
 #ifdef CONFIG_RISCV_ALTERNATIVE
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 6/9] riscv: Add support for userspace pointer masking
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Guo Ren, Paul Walmsley,
	Stefan Roesch

RISC-V supports pointer masking with a variable number of tag bits
("PMLEN") and which is configured at the next higher privilege level.

Wire up the PR_SET_TAGGED_ADDR_CTRL and PR_GET_TAGGED_ADDR_CTRL prctls
so userspace can request a minimum number of tag bits and determine the
actual number of tag bits. As with PR_TAGGED_ADDR_ENABLE, the pointer
masking configuration is thread-scoped, inherited on clone() and fork()
and cleared on exec().

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/Kconfig                 |   8 +++
 arch/riscv/include/asm/processor.h |   8 +++
 arch/riscv/kernel/process.c        | 107 +++++++++++++++++++++++++++++
 include/uapi/linux/prctl.h         |   3 +
 4 files changed, 126 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e3142ce531a0..a1a1585120f0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -479,6 +479,14 @@ config RISCV_ISA_C
 
 	  If you don't know what to do here, say Y.
 
+config RISCV_ISA_POINTER_MASKING
+	bool "Smmpm, Smnpm, and Ssnpm extensions for pointer masking"
+	depends on 64BIT
+	default y
+	help
+	  Add support to dynamically detect the presence of the Smmpm, Smnpm,
+	  and Ssnpm extensions (pointer masking) and enable their usage.
+
 config RISCV_ISA_SVNAPOT
 	bool "Svnapot extension support for supervisor mode NAPOT pages"
 	depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 06b87402a4d8..64b34e839802 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -185,6 +185,14 @@ extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);
 #define GET_UNALIGN_CTL(tsk, addr)	get_unalign_ctl((tsk), (addr))
 #define SET_UNALIGN_CTL(tsk, val)	set_unalign_ctl((tsk), (val))
 
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+/* PR_{SET,GET}_TAGGED_ADDR_CTRL prctl */
+long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg);
+long get_tagged_addr_ctrl(struct task_struct *task);
+#define SET_TAGGED_ADDR_CTRL(arg)	set_tagged_addr_ctrl(current, arg)
+#define GET_TAGGED_ADDR_CTRL()		get_tagged_addr_ctrl(current)
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_RISCV_PROCESSOR_H */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 92922dbd5b5c..3578e75f4aa4 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -7,6 +7,7 @@
  * Copyright (C) 2017 SiFive
  */
 
+#include <linux/bitfield.h>
 #include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/sched.h>
@@ -154,6 +155,18 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 #endif
 }
 
+static void flush_tagged_addr_state(void)
+{
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM))
+		return;
+
+	current->thread.envcfg &= ~ENVCFG_PMM;
+
+	sync_envcfg(current);
+#endif
+}
+
 void flush_thread(void)
 {
 #ifdef CONFIG_FPU
@@ -173,6 +186,7 @@ void flush_thread(void)
 	memset(&current->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
 	clear_tsk_thread_flag(current, TIF_RISCV_V_DEFER_RESTORE);
 #endif
+	flush_tagged_addr_state();
 }
 
 void arch_release_task_struct(struct task_struct *tsk)
@@ -236,3 +250,96 @@ void __init arch_task_cache_init(void)
 {
 	riscv_v_setup_ctx_cache();
 }
+
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+static bool have_user_pmlen_7;
+static bool have_user_pmlen_16;
+
+long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
+{
+	unsigned long valid_mask = PR_PMLEN_MASK;
+	struct thread_info *ti = task_thread_info(task);
+	u8 pmlen;
+
+	if (is_compat_thread(ti))
+		return -EINVAL;
+
+	if (arg & ~valid_mask)
+		return -EINVAL;
+
+	pmlen = FIELD_GET(PR_PMLEN_MASK, arg);
+	if (pmlen > 16) {
+		return -EINVAL;
+	} else if (pmlen > 7) {
+		if (have_user_pmlen_16)
+			pmlen = 16;
+		else
+			return -EINVAL;
+	} else if (pmlen > 0) {
+		/*
+		 * Prefer the smallest PMLEN that satisfies the user's request,
+		 * in case choosing a larger PMLEN has a performance impact.
+		 */
+		if (have_user_pmlen_7)
+			pmlen = 7;
+		else if (have_user_pmlen_16)
+			pmlen = 16;
+		else
+			return -EINVAL;
+	}
+
+	task->thread.envcfg &= ~ENVCFG_PMM;
+	if (pmlen == 7)
+		task->thread.envcfg |= ENVCFG_PMM_PMLEN_7;
+	else if (pmlen == 16)
+		task->thread.envcfg |= ENVCFG_PMM_PMLEN_16;
+
+	if (task == current)
+		sync_envcfg(current);
+
+	return 0;
+}
+
+long get_tagged_addr_ctrl(struct task_struct *task)
+{
+	struct thread_info *ti = task_thread_info(task);
+	long ret = 0;
+
+	if (is_compat_thread(ti))
+		return -EINVAL;
+
+	switch (task->thread.envcfg & ENVCFG_PMM) {
+	case ENVCFG_PMM_PMLEN_7:
+		ret |= FIELD_PREP(PR_PMLEN_MASK, 7);
+		break;
+	case ENVCFG_PMM_PMLEN_16:
+		ret |= FIELD_PREP(PR_PMLEN_MASK, 16);
+		break;
+	}
+
+	return ret;
+}
+
+static bool try_to_set_pmm(unsigned long value)
+{
+	csr_set(CSR_ENVCFG, value);
+	return (csr_read_clear(CSR_ENVCFG, ENVCFG_PMM) & ENVCFG_PMM) == value;
+}
+
+static int __init tagged_addr_init(void)
+{
+	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM))
+		return 0;
+
+	/*
+	 * envcfg.PMM is a WARL field. Detect which values are supported.
+	 * Assume the supported PMLEN values are the same on all harts.
+	 */
+	csr_clear(CSR_ENVCFG, ENVCFG_PMM);
+	have_user_pmlen_7 = try_to_set_pmm(ENVCFG_PMM_PMLEN_7);
+	have_user_pmlen_16 = try_to_set_pmm(ENVCFG_PMM_PMLEN_16);
+
+	return 0;
+}
+core_initcall(tagged_addr_init);
+#endif	/* CONFIG_RISCV_ISA_POINTER_MASKING */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 370ed14b1ae0..488b0d8e8495 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -244,6 +244,9 @@ struct prctl_mm_map {
 # define PR_MTE_TAG_MASK		(0xffffUL << PR_MTE_TAG_SHIFT)
 /* Unused; kept only for source compatibility */
 # define PR_MTE_TCF_SHIFT		1
+/* RISC-V pointer masking tag length */
+# define PR_PMLEN_SHIFT			24
+# define PR_PMLEN_MASK			(0x7fUL << PR_PMLEN_SHIFT)
 
 /* Control reclaim behavior when allocating memory */
 #define PR_SET_IO_FLUSHER		57
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 6/9] riscv: Add support for userspace pointer masking
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Guo Ren, Paul Walmsley,
	Stefan Roesch

RISC-V supports pointer masking with a variable number of tag bits
("PMLEN") and which is configured at the next higher privilege level.

Wire up the PR_SET_TAGGED_ADDR_CTRL and PR_GET_TAGGED_ADDR_CTRL prctls
so userspace can request a minimum number of tag bits and determine the
actual number of tag bits. As with PR_TAGGED_ADDR_ENABLE, the pointer
masking configuration is thread-scoped, inherited on clone() and fork()
and cleared on exec().

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/Kconfig                 |   8 +++
 arch/riscv/include/asm/processor.h |   8 +++
 arch/riscv/kernel/process.c        | 107 +++++++++++++++++++++++++++++
 include/uapi/linux/prctl.h         |   3 +
 4 files changed, 126 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e3142ce531a0..a1a1585120f0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -479,6 +479,14 @@ config RISCV_ISA_C
 
 	  If you don't know what to do here, say Y.
 
+config RISCV_ISA_POINTER_MASKING
+	bool "Smmpm, Smnpm, and Ssnpm extensions for pointer masking"
+	depends on 64BIT
+	default y
+	help
+	  Add support to dynamically detect the presence of the Smmpm, Smnpm,
+	  and Ssnpm extensions (pointer masking) and enable their usage.
+
 config RISCV_ISA_SVNAPOT
 	bool "Svnapot extension support for supervisor mode NAPOT pages"
 	depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 06b87402a4d8..64b34e839802 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -185,6 +185,14 @@ extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);
 #define GET_UNALIGN_CTL(tsk, addr)	get_unalign_ctl((tsk), (addr))
 #define SET_UNALIGN_CTL(tsk, val)	set_unalign_ctl((tsk), (val))
 
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+/* PR_{SET,GET}_TAGGED_ADDR_CTRL prctl */
+long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg);
+long get_tagged_addr_ctrl(struct task_struct *task);
+#define SET_TAGGED_ADDR_CTRL(arg)	set_tagged_addr_ctrl(current, arg)
+#define GET_TAGGED_ADDR_CTRL()		get_tagged_addr_ctrl(current)
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_RISCV_PROCESSOR_H */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 92922dbd5b5c..3578e75f4aa4 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -7,6 +7,7 @@
  * Copyright (C) 2017 SiFive
  */
 
+#include <linux/bitfield.h>
 #include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/sched.h>
@@ -154,6 +155,18 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 #endif
 }
 
+static void flush_tagged_addr_state(void)
+{
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM))
+		return;
+
+	current->thread.envcfg &= ~ENVCFG_PMM;
+
+	sync_envcfg(current);
+#endif
+}
+
 void flush_thread(void)
 {
 #ifdef CONFIG_FPU
@@ -173,6 +186,7 @@ void flush_thread(void)
 	memset(&current->thread.vstate, 0, sizeof(struct __riscv_v_ext_state));
 	clear_tsk_thread_flag(current, TIF_RISCV_V_DEFER_RESTORE);
 #endif
+	flush_tagged_addr_state();
 }
 
 void arch_release_task_struct(struct task_struct *tsk)
@@ -236,3 +250,96 @@ void __init arch_task_cache_init(void)
 {
 	riscv_v_setup_ctx_cache();
 }
+
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+static bool have_user_pmlen_7;
+static bool have_user_pmlen_16;
+
+long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
+{
+	unsigned long valid_mask = PR_PMLEN_MASK;
+	struct thread_info *ti = task_thread_info(task);
+	u8 pmlen;
+
+	if (is_compat_thread(ti))
+		return -EINVAL;
+
+	if (arg & ~valid_mask)
+		return -EINVAL;
+
+	pmlen = FIELD_GET(PR_PMLEN_MASK, arg);
+	if (pmlen > 16) {
+		return -EINVAL;
+	} else if (pmlen > 7) {
+		if (have_user_pmlen_16)
+			pmlen = 16;
+		else
+			return -EINVAL;
+	} else if (pmlen > 0) {
+		/*
+		 * Prefer the smallest PMLEN that satisfies the user's request,
+		 * in case choosing a larger PMLEN has a performance impact.
+		 */
+		if (have_user_pmlen_7)
+			pmlen = 7;
+		else if (have_user_pmlen_16)
+			pmlen = 16;
+		else
+			return -EINVAL;
+	}
+
+	task->thread.envcfg &= ~ENVCFG_PMM;
+	if (pmlen == 7)
+		task->thread.envcfg |= ENVCFG_PMM_PMLEN_7;
+	else if (pmlen == 16)
+		task->thread.envcfg |= ENVCFG_PMM_PMLEN_16;
+
+	if (task == current)
+		sync_envcfg(current);
+
+	return 0;
+}
+
+long get_tagged_addr_ctrl(struct task_struct *task)
+{
+	struct thread_info *ti = task_thread_info(task);
+	long ret = 0;
+
+	if (is_compat_thread(ti))
+		return -EINVAL;
+
+	switch (task->thread.envcfg & ENVCFG_PMM) {
+	case ENVCFG_PMM_PMLEN_7:
+		ret |= FIELD_PREP(PR_PMLEN_MASK, 7);
+		break;
+	case ENVCFG_PMM_PMLEN_16:
+		ret |= FIELD_PREP(PR_PMLEN_MASK, 16);
+		break;
+	}
+
+	return ret;
+}
+
+static bool try_to_set_pmm(unsigned long value)
+{
+	csr_set(CSR_ENVCFG, value);
+	return (csr_read_clear(CSR_ENVCFG, ENVCFG_PMM) & ENVCFG_PMM) == value;
+}
+
+static int __init tagged_addr_init(void)
+{
+	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM))
+		return 0;
+
+	/*
+	 * envcfg.PMM is a WARL field. Detect which values are supported.
+	 * Assume the supported PMLEN values are the same on all harts.
+	 */
+	csr_clear(CSR_ENVCFG, ENVCFG_PMM);
+	have_user_pmlen_7 = try_to_set_pmm(ENVCFG_PMM_PMLEN_7);
+	have_user_pmlen_16 = try_to_set_pmm(ENVCFG_PMM_PMLEN_16);
+
+	return 0;
+}
+core_initcall(tagged_addr_init);
+#endif	/* CONFIG_RISCV_ISA_POINTER_MASKING */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 370ed14b1ae0..488b0d8e8495 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -244,6 +244,9 @@ struct prctl_mm_map {
 # define PR_MTE_TAG_MASK		(0xffffUL << PR_MTE_TAG_SHIFT)
 /* Unused; kept only for source compatibility */
 # define PR_MTE_TCF_SHIFT		1
+/* RISC-V pointer masking tag length */
+# define PR_PMLEN_SHIFT			24
+# define PR_PMLEN_MASK			(0x7fUL << PR_PMLEN_SHIFT)
 
 /* Control reclaim behavior when allocating memory */
 #define PR_SET_IO_FLUSHER		57
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 7/9] riscv: Add support for the tagged address ABI
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Greentime Hu

When pointer masking is enabled for userspace, the kernel can accept
tagged pointers as arguments to some system calls. Allow this by
untagging the pointers in access_ok() and the uaccess routines. The
software untagging in the uaccess routines is required because U-mode
and S-mode have entirely separate pointer masking configurations.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/processor.h |  1 +
 arch/riscv/include/asm/uaccess.h   | 40 +++++++++++++++++++++---
 arch/riscv/kernel/process.c        | 49 +++++++++++++++++++++++++++++-
 3 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 64b34e839802..cdc8569b2118 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -124,6 +124,7 @@ struct thread_struct {
 	struct __riscv_v_ext_state vstate;
 	unsigned long align_ctl;
 	struct __riscv_v_ext_state kernel_vstate;
+	u8 pmlen;
 };
 
 /* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index ec0cab9fbddd..ed282dcf9a6d 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -9,8 +9,38 @@
 #define _ASM_RISCV_UACCESS_H
 
 #include <asm/asm-extable.h>
+#include <asm/cpufeature.h>
 #include <asm/pgtable.h>		/* for TASK_SIZE */
 
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+static inline unsigned long __untagged_addr(unsigned long addr)
+{
+	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM)) {
+		u8 shift = current->thread.pmlen;
+
+		/*
+		 * Virtual addresses are sign-extended, while
+		 * physical addresses are zero-extended.
+		 */
+		if (IS_ENABLED(CONFIG_MMU))
+			return (long)(addr << shift) >> shift;
+		else
+			return (addr << shift) >> shift;
+	}
+
+	return addr;
+}
+
+#define untagged_addr(addr) ({					\
+	unsigned long __addr = (__force unsigned long)(addr);	\
+	(__force __typeof__(addr))__untagged_addr(__addr);	\
+})
+
+#define access_ok(addr, size) likely(__access_ok(untagged_addr(addr), size))
+#else
+#define untagged_addr(addr) addr
+#endif
+
 /*
  * User space memory access functions
  */
@@ -130,7 +160,7 @@ do {								\
  */
 #define __get_user(x, ptr)					\
 ({								\
-	const __typeof__(*(ptr)) __user *__gu_ptr = (ptr);	\
+	const __typeof__(*(ptr)) __user *__gu_ptr = untagged_addr(ptr); \
 	long __gu_err = 0;					\
 								\
 	__chk_user_ptr(__gu_ptr);				\
@@ -246,7 +276,7 @@ do {								\
  */
 #define __put_user(x, ptr)					\
 ({								\
-	__typeof__(*(ptr)) __user *__gu_ptr = (ptr);		\
+	__typeof__(*(ptr)) __user *__gu_ptr = untagged_addr(ptr); \
 	__typeof__(*__gu_ptr) __val = (x);			\
 	long __pu_err = 0;					\
 								\
@@ -293,13 +323,13 @@ unsigned long __must_check __asm_copy_from_user(void *to,
 static inline unsigned long
 raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-	return __asm_copy_from_user(to, from, n);
+	return __asm_copy_from_user(to, untagged_addr(from), n);
 }
 
 static inline unsigned long
 raw_copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	return __asm_copy_to_user(to, from, n);
+	return __asm_copy_to_user(untagged_addr(to), from, n);
 }
 
 extern long strncpy_from_user(char *dest, const char __user *src, long count);
@@ -314,7 +344,7 @@ unsigned long __must_check clear_user(void __user *to, unsigned long n)
 {
 	might_fault();
 	return access_ok(to, n) ?
-		__clear_user(to, n) : n;
+		__clear_user(untagged_addr(to), n) : n;
 }
 
 #define __get_kernel_nofault(dst, src, type, err_label)			\
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 3578e75f4aa4..36129040b7bd 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -162,6 +162,7 @@ static void flush_tagged_addr_state(void)
 		return;
 
 	current->thread.envcfg &= ~ENVCFG_PMM;
+	current->thread.pmlen = 0;
 
 	sync_envcfg(current);
 #endif
@@ -255,9 +256,14 @@ void __init arch_task_cache_init(void)
 static bool have_user_pmlen_7;
 static bool have_user_pmlen_16;
 
+/*
+ * Control the relaxed ABI allowing tagged user addresses into the kernel.
+ */
+static unsigned int tagged_addr_disabled;
+
 long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
 {
-	unsigned long valid_mask = PR_PMLEN_MASK;
+	unsigned long valid_mask = PR_PMLEN_MASK | PR_TAGGED_ADDR_ENABLE;
 	struct thread_info *ti = task_thread_info(task);
 	u8 pmlen;
 
@@ -288,12 +294,25 @@ long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
 			return -EINVAL;
 	}
 
+	/*
+	 * Do not allow the enabling of the tagged address ABI if globally
+	 * disabled via sysctl abi.tagged_addr_disabled, if pointer masking
+	 * is disabled for userspace.
+	 */
+	if (arg & PR_TAGGED_ADDR_ENABLE && (tagged_addr_disabled || !pmlen))
+		return -EINVAL;
+
 	task->thread.envcfg &= ~ENVCFG_PMM;
 	if (pmlen == 7)
 		task->thread.envcfg |= ENVCFG_PMM_PMLEN_7;
 	else if (pmlen == 16)
 		task->thread.envcfg |= ENVCFG_PMM_PMLEN_16;
 
+	if (arg & PR_TAGGED_ADDR_ENABLE)
+		task->thread.pmlen = pmlen;
+	else
+		task->thread.pmlen = 0;
+
 	if (task == current)
 		sync_envcfg(current);
 
@@ -308,6 +327,13 @@ long get_tagged_addr_ctrl(struct task_struct *task)
 	if (is_compat_thread(ti))
 		return -EINVAL;
 
+	if (task->thread.pmlen)
+		ret = PR_TAGGED_ADDR_ENABLE;
+
+	/*
+	 * The task's pmlen is only set if the tagged address ABI is enabled,
+	 * so the effective PMLEN must be extracted from envcfg.PMM.
+	 */
 	switch (task->thread.envcfg & ENVCFG_PMM) {
 	case ENVCFG_PMM_PMLEN_7:
 		ret |= FIELD_PREP(PR_PMLEN_MASK, 7);
@@ -326,6 +352,24 @@ static bool try_to_set_pmm(unsigned long value)
 	return (csr_read_clear(CSR_ENVCFG, ENVCFG_PMM) & ENVCFG_PMM) == value;
 }
 
+/*
+ * Global sysctl to disable the tagged user addresses support. This control
+ * only prevents the tagged address ABI enabling via prctl() and does not
+ * disable it for tasks that already opted in to the relaxed ABI.
+ */
+
+static struct ctl_table tagged_addr_sysctl_table[] = {
+	{
+		.procname	= "tagged_addr_disabled",
+		.mode		= 0644,
+		.data		= &tagged_addr_disabled,
+		.maxlen		= sizeof(int),
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+};
+
 static int __init tagged_addr_init(void)
 {
 	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM))
@@ -339,6 +383,9 @@ static int __init tagged_addr_init(void)
 	have_user_pmlen_7 = try_to_set_pmm(ENVCFG_PMM_PMLEN_7);
 	have_user_pmlen_16 = try_to_set_pmm(ENVCFG_PMM_PMLEN_16);
 
+	if (!register_sysctl("abi", tagged_addr_sysctl_table))
+		return -EINVAL;
+
 	return 0;
 }
 core_initcall(tagged_addr_init);
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 7/9] riscv: Add support for the tagged address ABI
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Greentime Hu

When pointer masking is enabled for userspace, the kernel can accept
tagged pointers as arguments to some system calls. Allow this by
untagging the pointers in access_ok() and the uaccess routines. The
software untagging in the uaccess routines is required because U-mode
and S-mode have entirely separate pointer masking configurations.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/include/asm/processor.h |  1 +
 arch/riscv/include/asm/uaccess.h   | 40 +++++++++++++++++++++---
 arch/riscv/kernel/process.c        | 49 +++++++++++++++++++++++++++++-
 3 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 64b34e839802..cdc8569b2118 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -124,6 +124,7 @@ struct thread_struct {
 	struct __riscv_v_ext_state vstate;
 	unsigned long align_ctl;
 	struct __riscv_v_ext_state kernel_vstate;
+	u8 pmlen;
 };
 
 /* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index ec0cab9fbddd..ed282dcf9a6d 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -9,8 +9,38 @@
 #define _ASM_RISCV_UACCESS_H
 
 #include <asm/asm-extable.h>
+#include <asm/cpufeature.h>
 #include <asm/pgtable.h>		/* for TASK_SIZE */
 
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+static inline unsigned long __untagged_addr(unsigned long addr)
+{
+	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM)) {
+		u8 shift = current->thread.pmlen;
+
+		/*
+		 * Virtual addresses are sign-extended, while
+		 * physical addresses are zero-extended.
+		 */
+		if (IS_ENABLED(CONFIG_MMU))
+			return (long)(addr << shift) >> shift;
+		else
+			return (addr << shift) >> shift;
+	}
+
+	return addr;
+}
+
+#define untagged_addr(addr) ({					\
+	unsigned long __addr = (__force unsigned long)(addr);	\
+	(__force __typeof__(addr))__untagged_addr(__addr);	\
+})
+
+#define access_ok(addr, size) likely(__access_ok(untagged_addr(addr), size))
+#else
+#define untagged_addr(addr) addr
+#endif
+
 /*
  * User space memory access functions
  */
@@ -130,7 +160,7 @@ do {								\
  */
 #define __get_user(x, ptr)					\
 ({								\
-	const __typeof__(*(ptr)) __user *__gu_ptr = (ptr);	\
+	const __typeof__(*(ptr)) __user *__gu_ptr = untagged_addr(ptr); \
 	long __gu_err = 0;					\
 								\
 	__chk_user_ptr(__gu_ptr);				\
@@ -246,7 +276,7 @@ do {								\
  */
 #define __put_user(x, ptr)					\
 ({								\
-	__typeof__(*(ptr)) __user *__gu_ptr = (ptr);		\
+	__typeof__(*(ptr)) __user *__gu_ptr = untagged_addr(ptr); \
 	__typeof__(*__gu_ptr) __val = (x);			\
 	long __pu_err = 0;					\
 								\
@@ -293,13 +323,13 @@ unsigned long __must_check __asm_copy_from_user(void *to,
 static inline unsigned long
 raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-	return __asm_copy_from_user(to, from, n);
+	return __asm_copy_from_user(to, untagged_addr(from), n);
 }
 
 static inline unsigned long
 raw_copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	return __asm_copy_to_user(to, from, n);
+	return __asm_copy_to_user(untagged_addr(to), from, n);
 }
 
 extern long strncpy_from_user(char *dest, const char __user *src, long count);
@@ -314,7 +344,7 @@ unsigned long __must_check clear_user(void __user *to, unsigned long n)
 {
 	might_fault();
 	return access_ok(to, n) ?
-		__clear_user(to, n) : n;
+		__clear_user(untagged_addr(to), n) : n;
 }
 
 #define __get_kernel_nofault(dst, src, type, err_label)			\
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 3578e75f4aa4..36129040b7bd 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -162,6 +162,7 @@ static void flush_tagged_addr_state(void)
 		return;
 
 	current->thread.envcfg &= ~ENVCFG_PMM;
+	current->thread.pmlen = 0;
 
 	sync_envcfg(current);
 #endif
@@ -255,9 +256,14 @@ void __init arch_task_cache_init(void)
 static bool have_user_pmlen_7;
 static bool have_user_pmlen_16;
 
+/*
+ * Control the relaxed ABI allowing tagged user addresses into the kernel.
+ */
+static unsigned int tagged_addr_disabled;
+
 long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
 {
-	unsigned long valid_mask = PR_PMLEN_MASK;
+	unsigned long valid_mask = PR_PMLEN_MASK | PR_TAGGED_ADDR_ENABLE;
 	struct thread_info *ti = task_thread_info(task);
 	u8 pmlen;
 
@@ -288,12 +294,25 @@ long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
 			return -EINVAL;
 	}
 
+	/*
+	 * Do not allow the enabling of the tagged address ABI if globally
+	 * disabled via sysctl abi.tagged_addr_disabled, if pointer masking
+	 * is disabled for userspace.
+	 */
+	if (arg & PR_TAGGED_ADDR_ENABLE && (tagged_addr_disabled || !pmlen))
+		return -EINVAL;
+
 	task->thread.envcfg &= ~ENVCFG_PMM;
 	if (pmlen == 7)
 		task->thread.envcfg |= ENVCFG_PMM_PMLEN_7;
 	else if (pmlen == 16)
 		task->thread.envcfg |= ENVCFG_PMM_PMLEN_16;
 
+	if (arg & PR_TAGGED_ADDR_ENABLE)
+		task->thread.pmlen = pmlen;
+	else
+		task->thread.pmlen = 0;
+
 	if (task == current)
 		sync_envcfg(current);
 
@@ -308,6 +327,13 @@ long get_tagged_addr_ctrl(struct task_struct *task)
 	if (is_compat_thread(ti))
 		return -EINVAL;
 
+	if (task->thread.pmlen)
+		ret = PR_TAGGED_ADDR_ENABLE;
+
+	/*
+	 * The task's pmlen is only set if the tagged address ABI is enabled,
+	 * so the effective PMLEN must be extracted from envcfg.PMM.
+	 */
 	switch (task->thread.envcfg & ENVCFG_PMM) {
 	case ENVCFG_PMM_PMLEN_7:
 		ret |= FIELD_PREP(PR_PMLEN_MASK, 7);
@@ -326,6 +352,24 @@ static bool try_to_set_pmm(unsigned long value)
 	return (csr_read_clear(CSR_ENVCFG, ENVCFG_PMM) & ENVCFG_PMM) == value;
 }
 
+/*
+ * Global sysctl to disable the tagged user addresses support. This control
+ * only prevents the tagged address ABI enabling via prctl() and does not
+ * disable it for tasks that already opted in to the relaxed ABI.
+ */
+
+static struct ctl_table tagged_addr_sysctl_table[] = {
+	{
+		.procname	= "tagged_addr_disabled",
+		.mode		= 0644,
+		.data		= &tagged_addr_disabled,
+		.maxlen		= sizeof(int),
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+};
+
 static int __init tagged_addr_init(void)
 {
 	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SxNPM))
@@ -339,6 +383,9 @@ static int __init tagged_addr_init(void)
 	have_user_pmlen_7 = try_to_set_pmm(ENVCFG_PMM_PMLEN_7);
 	have_user_pmlen_16 = try_to_set_pmm(ENVCFG_PMM_PMLEN_16);
 
+	if (!register_sysctl("abi", tagged_addr_sysctl_table))
+		return -EINVAL;
+
 	return 0;
 }
 core_initcall(tagged_addr_init);
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 8/9] riscv: Allow ptrace control of the tagged address ABI
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Alejandro Colomar, Oleg Nesterov,
	Paul Walmsley

This allows a tracer to control the ABI of the tracee, as on arm64.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/kernel/ptrace.c | 42 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 43 insertions(+)

diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index e8515aa9d80b..3d414db2118b 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -28,6 +28,9 @@ enum riscv_regset {
 #ifdef CONFIG_RISCV_ISA_V
 	REGSET_V,
 #endif
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+	REGSET_TAGGED_ADDR_CTRL,
+#endif
 };
 
 static int riscv_gpr_get(struct task_struct *target,
@@ -152,6 +155,35 @@ static int riscv_vr_set(struct task_struct *target,
 }
 #endif
 
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+static int tagged_addr_ctrl_get(struct task_struct *target,
+				const struct user_regset *regset,
+				struct membuf to)
+{
+	long ctrl = get_tagged_addr_ctrl(target);
+
+	if (IS_ERR_VALUE(ctrl))
+		return ctrl;
+
+	return membuf_write(&to, &ctrl, sizeof(ctrl));
+}
+
+static int tagged_addr_ctrl_set(struct task_struct *target,
+				const struct user_regset *regset,
+				unsigned int pos, unsigned int count,
+				const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	long ctrl;
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &ctrl, 0, -1);
+	if (ret)
+		return ret;
+
+	return set_tagged_addr_ctrl(target, ctrl);
+}
+#endif
+
 static const struct user_regset riscv_user_regset[] = {
 	[REGSET_X] = {
 		.core_note_type = NT_PRSTATUS,
@@ -182,6 +214,16 @@ static const struct user_regset riscv_user_regset[] = {
 		.set = riscv_vr_set,
 	},
 #endif
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+	[REGSET_TAGGED_ADDR_CTRL] = {
+		.core_note_type = NT_RISCV_TAGGED_ADDR_CTRL,
+		.n = 1,
+		.size = sizeof(long),
+		.align = sizeof(long),
+		.regset_get = tagged_addr_ctrl_get,
+		.set = tagged_addr_ctrl_set,
+	},
+#endif
 };
 
 static const struct user_regset_view riscv_user_native_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 9417309b7230..90806024fed6 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -447,6 +447,7 @@ typedef struct elf64_shdr {
 #define NT_MIPS_MSA	0x802		/* MIPS SIMD registers */
 #define NT_RISCV_CSR	0x900		/* RISC-V Control and Status Registers */
 #define NT_RISCV_VECTOR	0x901		/* RISC-V vector registers */
+#define NT_RISCV_TAGGED_ADDR_CTRL 0x902	/* RISC-V tagged address control (prctl()) */
 #define NT_LOONGARCH_CPUCFG	0xa00	/* LoongArch CPU config registers */
 #define NT_LOONGARCH_CSR	0xa01	/* LoongArch control and status registers */
 #define NT_LOONGARCH_LSX	0xa02	/* LoongArch Loongson SIMD Extension registers */
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 8/9] riscv: Allow ptrace control of the tagged address ABI
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Alejandro Colomar, Oleg Nesterov,
	Paul Walmsley

This allows a tracer to control the ABI of the tracee, as on arm64.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 arch/riscv/kernel/ptrace.c | 42 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 43 insertions(+)

diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index e8515aa9d80b..3d414db2118b 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -28,6 +28,9 @@ enum riscv_regset {
 #ifdef CONFIG_RISCV_ISA_V
 	REGSET_V,
 #endif
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+	REGSET_TAGGED_ADDR_CTRL,
+#endif
 };
 
 static int riscv_gpr_get(struct task_struct *target,
@@ -152,6 +155,35 @@ static int riscv_vr_set(struct task_struct *target,
 }
 #endif
 
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+static int tagged_addr_ctrl_get(struct task_struct *target,
+				const struct user_regset *regset,
+				struct membuf to)
+{
+	long ctrl = get_tagged_addr_ctrl(target);
+
+	if (IS_ERR_VALUE(ctrl))
+		return ctrl;
+
+	return membuf_write(&to, &ctrl, sizeof(ctrl));
+}
+
+static int tagged_addr_ctrl_set(struct task_struct *target,
+				const struct user_regset *regset,
+				unsigned int pos, unsigned int count,
+				const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	long ctrl;
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &ctrl, 0, -1);
+	if (ret)
+		return ret;
+
+	return set_tagged_addr_ctrl(target, ctrl);
+}
+#endif
+
 static const struct user_regset riscv_user_regset[] = {
 	[REGSET_X] = {
 		.core_note_type = NT_PRSTATUS,
@@ -182,6 +214,16 @@ static const struct user_regset riscv_user_regset[] = {
 		.set = riscv_vr_set,
 	},
 #endif
+#ifdef CONFIG_RISCV_ISA_POINTER_MASKING
+	[REGSET_TAGGED_ADDR_CTRL] = {
+		.core_note_type = NT_RISCV_TAGGED_ADDR_CTRL,
+		.n = 1,
+		.size = sizeof(long),
+		.align = sizeof(long),
+		.regset_get = tagged_addr_ctrl_get,
+		.set = tagged_addr_ctrl_set,
+	},
+#endif
 };
 
 static const struct user_regset_view riscv_user_native_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 9417309b7230..90806024fed6 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -447,6 +447,7 @@ typedef struct elf64_shdr {
 #define NT_MIPS_MSA	0x802		/* MIPS SIMD registers */
 #define NT_RISCV_CSR	0x900		/* RISC-V Control and Status Registers */
 #define NT_RISCV_VECTOR	0x901		/* RISC-V vector registers */
+#define NT_RISCV_TAGGED_ADDR_CTRL 0x902	/* RISC-V tagged address control (prctl()) */
 #define NT_LOONGARCH_CPUCFG	0xa00	/* LoongArch CPU config registers */
 #define NT_LOONGARCH_CSR	0xa01	/* LoongArch control and status registers */
 #define NT_LOONGARCH_LSX	0xa02	/* LoongArch Loongson SIMD Extension registers */
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 9/9] selftests: riscv: Add a pointer masking test
  2024-03-19 21:58 ` Samuel Holland
@ 2024-03-19 21:58   ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Shuah Khan

This test covers the behavior of the PR_SET_TAGGED_ADDR_CTRL and
PR_GET_TAGGED_ADDR_CTRL prctl() operations, their effects on the
userspace ABI, and their effects on the system call ABI.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/tags/Makefile   |  10 +
 .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++
 3 files changed, 318 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/tags/Makefile
 create mode 100644 tools/testing/selftests/riscv/tags/pointer_masking.c

diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 4a9ff515a3a0..6e7e6621a71a 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= hwprobe vector mm
+RISCV_SUBTARGETS ?= hwprobe mm tags vector
 else
 RISCV_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/riscv/tags/Makefile b/tools/testing/selftests/riscv/tags/Makefile
new file mode 100644
index 000000000000..ed82ff9c664e
--- /dev/null
+++ b/tools/testing/selftests/riscv/tags/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS += -I$(top_srcdir)/tools/include
+
+TEST_GEN_PROGS := pointer_masking
+
+include ../../lib.mk
+
+$(OUTPUT)/pointer_masking: pointer_masking.c
+	$(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/tags/pointer_masking.c b/tools/testing/selftests/riscv/tags/pointer_masking.c
new file mode 100644
index 000000000000..c9f66e8436ab
--- /dev/null
+++ b/tools/testing/selftests/riscv/tags/pointer_masking.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <errno.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../../kselftest.h"
+
+#ifndef PR_PMLEN_SHIFT
+#define PR_PMLEN_SHIFT			24
+#endif
+#ifndef PR_PMLEN_MASK
+#define PR_PMLEN_MASK			(0x7fUL << PR_PMLEN_SHIFT)
+#endif
+
+static int dev_zero;
+
+static sigjmp_buf jmpbuf;
+
+static void sigsegv_handler(int sig)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static int min_pmlen;
+static int max_pmlen;
+
+static inline bool valid_pmlen(int pmlen)
+{
+	return pmlen == 0 || pmlen == 7 || pmlen == 16;
+}
+
+static void test_pmlen(void)
+{
+	ksft_print_msg("Testing available PMLEN values\n");
+
+	for (int request = 0; request <= 16; request++) {
+		int pmlen, ret;
+
+		ret = prctl(PR_SET_TAGGED_ADDR_CTRL, request << PR_PMLEN_SHIFT, 0, 0, 0);
+		if (ret) {
+			ksft_test_result_skip("PMLEN=%d PR_GET_TAGGED_ADDR_CTRL\n", request);
+			ksft_test_result_skip("PMLEN=%d constraint\n", request);
+			ksft_test_result_skip("PMLEN=%d validity\n", request);
+			continue;
+		}
+
+		ret = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
+		ksft_test_result(ret >= 0, "PMLEN=%d PR_GET_TAGGED_ADDR_CTRL\n", request);
+		if (ret < 0) {
+			ksft_test_result_skip("PMLEN=%d constraint\n", request);
+			ksft_test_result_skip("PMLEN=%d validity\n", request);
+			continue;
+		}
+
+		pmlen = (ret & PR_PMLEN_MASK) >> PR_PMLEN_SHIFT;
+		ksft_test_result(pmlen >= request, "PMLEN=%d constraint\n", request);
+		ksft_test_result(valid_pmlen(pmlen), "PMLEN=%d validity\n", request);
+
+		if (min_pmlen == 0)
+			min_pmlen = pmlen;
+		if (max_pmlen < pmlen)
+			max_pmlen = pmlen;
+	}
+
+	if (max_pmlen == 0)
+		ksft_exit_fail_msg("Failed to enable pointer masking\n");
+}
+
+static int set_tagged_addr_ctrl(int pmlen, bool tagged_addr_abi)
+{
+	int arg, ret;
+
+	arg = pmlen << PR_PMLEN_SHIFT | tagged_addr_abi;
+	ret = prctl(PR_SET_TAGGED_ADDR_CTRL, arg, 0, 0, 0);
+	if (!ret) {
+		ret = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
+		if (ret == arg)
+			return 0;
+	}
+
+	return ret < 0 ? -errno : -ENODATA;
+}
+
+static void test_dereference_pmlen(int pmlen)
+{
+	static volatile int i;
+	volatile int *p;
+	int ret;
+
+	ret = set_tagged_addr_ctrl(pmlen, false);
+	if (ret)
+		return ksft_test_result_error("PMLEN=%d setup (%d)\n", pmlen, ret);
+
+	i = pmlen;
+
+	if (pmlen) {
+		p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen);
+
+		/* These dereferences should succeed. */
+		if (sigsetjmp(jmpbuf, 1))
+			return ksft_test_result_fail("PMLEN=%d valid tag\n", pmlen);
+		if (*p != pmlen)
+			return ksft_test_result_fail("PMLEN=%d bad value\n", pmlen);
+		*p++;
+	}
+
+	p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen - 1);
+
+	/* These dereferences should raise SIGSEGV. */
+	if (sigsetjmp(jmpbuf, 1))
+		return ksft_test_result_pass("PMLEN=%d dereference\n", pmlen);
+	*p++;
+	ksft_test_result_fail("PMLEN=%d invalid tag\n", pmlen);
+}
+
+static void test_dereference(void)
+{
+	ksft_print_msg("Testing userspace pointer dereference\n");
+
+	signal(SIGSEGV, sigsegv_handler);
+
+	test_dereference_pmlen(0);
+	test_dereference_pmlen(min_pmlen);
+	test_dereference_pmlen(max_pmlen);
+
+	signal(SIGSEGV, SIG_DFL);
+}
+
+static void test_fork_exec(void)
+{
+	int ret, status;
+
+	ksft_print_msg("Testing fork/exec behavior\n");
+
+	ret = set_tagged_addr_ctrl(min_pmlen, false);
+	if (ret)
+		return ksft_test_result_error("setup (%d)\n", ret);
+
+	if (fork()) {
+		wait(&status);
+		ksft_test_result(WIFEXITED(status) && WEXITSTATUS(status) == 0,
+				 "dereference after fork\n");
+	} else {
+		static volatile int i;
+		volatile int *p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - min_pmlen);
+
+		exit(*p);
+	}
+
+	if (fork()) {
+		wait(&status);
+		ksft_test_result(WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV,
+				 "dereference after fork+exec\n");
+	} else {
+		execl("/proc/self/exe", "", NULL);
+	}
+}
+
+static void test_tagged_addr_abi_sysctl(void)
+{
+	char value;
+	int fd;
+
+	ksft_print_msg("Testing tagged address ABI sysctl\n");
+
+	fd = open("/proc/sys/abi/tagged_addr_disabled", O_WRONLY);
+	if (fd < 0) {
+		ksft_test_result_skip("failed to open sysctl file\n");
+		ksft_test_result_skip("failed to open sysctl file\n");
+		return;
+	}
+
+	value = '1';
+	pwrite(fd, &value, 1, 0);
+	ksft_test_result(set_tagged_addr_ctrl(min_pmlen, true) == -EINVAL,
+			 "sysctl disabled\n");
+
+	value = '0';
+	pwrite(fd, &value, 1, 0);
+	ksft_test_result(set_tagged_addr_ctrl(min_pmlen, true) == 0,
+			 "sysctl enabled\n");
+
+	set_tagged_addr_ctrl(0, false);
+
+	close(fd);
+}
+
+static void test_tagged_addr_abi_pmlen(int pmlen)
+{
+	int i, *p, ret;
+
+	i = ~pmlen;
+
+	if (pmlen) {
+		p = (int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen);
+
+		ret = set_tagged_addr_ctrl(pmlen, false);
+		if (ret)
+			return ksft_test_result_error("PMLEN=%d ABI disabled setup (%d)\n",
+						      pmlen, ret);
+
+		ret = write(dev_zero, p, sizeof(*p));
+		if (ret >= 0 || errno != EFAULT)
+			return ksft_test_result_fail("PMLEN=%d ABI disabled write\n", pmlen);
+
+		ret = read(dev_zero, p, sizeof(*p));
+		if (ret >= 0 || errno != EFAULT)
+			return ksft_test_result_fail("PMLEN=%d ABI disabled read\n", pmlen);
+
+		if (i != ~pmlen)
+			return ksft_test_result_fail("PMLEN=%d ABI disabled value\n", pmlen);
+
+		ret = set_tagged_addr_ctrl(pmlen, true);
+		if (ret)
+			return ksft_test_result_error("PMLEN=%d ABI enabled setup (%d)\n",
+						      pmlen, ret);
+
+		ret = write(dev_zero, p, sizeof(*p));
+		if (ret != sizeof(*p))
+			return ksft_test_result_fail("PMLEN=%d ABI enabled write\n", pmlen);
+
+		ret = read(dev_zero, p, sizeof(*p));
+		if (ret != sizeof(*p))
+			return ksft_test_result_fail("PMLEN=%d ABI enabled read\n", pmlen);
+
+		if (i)
+			return ksft_test_result_fail("PMLEN=%d ABI enabled value\n", pmlen);
+
+		i = ~pmlen;
+	} else {
+		/* The tagged address ABI cannot be enabled when PMLEN == 0. */
+		ret = set_tagged_addr_ctrl(pmlen, true);
+		if (ret != -EINVAL)
+			return ksft_test_result_error("PMLEN=%d ABI setup (%d)\n",
+						      pmlen, ret);
+	}
+
+	p = (int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen - 1);
+
+	ret = write(dev_zero, p, sizeof(*p));
+	if (ret >= 0 || errno != EFAULT)
+		return ksft_test_result_fail("PMLEN=%d invalid tag write (%d)\n", pmlen, errno);
+
+	ret = read(dev_zero, p, sizeof(*p));
+	if (ret >= 0 || errno != EFAULT)
+		return ksft_test_result_fail("PMLEN=%d invalid tag read\n", pmlen);
+
+	if (i != ~pmlen)
+		return ksft_test_result_fail("PMLEN=%d invalid tag value\n", pmlen);
+
+	ksft_test_result_pass("PMLEN=%d tagged address ABI\n", pmlen);
+}
+
+static void test_tagged_addr_abi(void)
+{
+	ksft_print_msg("Testing tagged address ABI\n");
+
+	test_tagged_addr_abi_pmlen(0);
+	test_tagged_addr_abi_pmlen(min_pmlen);
+	test_tagged_addr_abi_pmlen(max_pmlen);
+}
+
+static struct test_info {
+	unsigned int nr_tests;
+	void (*test_fn)(void);
+} tests[] = {
+	{ .nr_tests = 17 * 3, test_pmlen },
+	{ .nr_tests = 3, test_dereference },
+	{ .nr_tests = 2, test_fork_exec },
+	{ .nr_tests = 2, test_tagged_addr_abi_sysctl },
+	{ .nr_tests = 3, test_tagged_addr_abi },
+};
+
+int main(int argc, char **argv)
+{
+	unsigned int plan = 0;
+
+	/* Check if this is the child process after execl(). */
+	if (!argv[0][0]) {
+		static volatile int i;
+		volatile int *p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - 7);
+
+		return *p;
+	}
+
+	dev_zero = open("/dev/zero", O_RDWR);
+	if (dev_zero < 0)
+		return 1;
+
+	ksft_print_header();
+
+	for (int i = 0; i < ARRAY_SIZE(tests); ++i)
+		plan += tests[i].nr_tests;
+
+	ksft_set_plan(plan);
+
+	for (int i = 0; i < ARRAY_SIZE(tests); ++i)
+		tests[i].test_fn();
+
+	ksft_finished();
+}
-- 
2.43.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC PATCH 9/9] selftests: riscv: Add a pointer masking test
@ 2024-03-19 21:58   ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-19 21:58 UTC (permalink / raw)
  To: Palmer Dabbelt, linux-riscv
  Cc: devicetree, Catalin Marinas, linux-kernel, tech-j-ext,
	Conor Dooley, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Samuel Holland, Albert Ou, Shuah Khan

This test covers the behavior of the PR_SET_TAGGED_ADDR_CTRL and
PR_GET_TAGGED_ADDR_CTRL prctl() operations, their effects on the
userspace ABI, and their effects on the system call ABI.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/tags/Makefile   |  10 +
 .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++
 3 files changed, 318 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/tags/Makefile
 create mode 100644 tools/testing/selftests/riscv/tags/pointer_masking.c

diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 4a9ff515a3a0..6e7e6621a71a 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= hwprobe vector mm
+RISCV_SUBTARGETS ?= hwprobe mm tags vector
 else
 RISCV_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/riscv/tags/Makefile b/tools/testing/selftests/riscv/tags/Makefile
new file mode 100644
index 000000000000..ed82ff9c664e
--- /dev/null
+++ b/tools/testing/selftests/riscv/tags/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS += -I$(top_srcdir)/tools/include
+
+TEST_GEN_PROGS := pointer_masking
+
+include ../../lib.mk
+
+$(OUTPUT)/pointer_masking: pointer_masking.c
+	$(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/tags/pointer_masking.c b/tools/testing/selftests/riscv/tags/pointer_masking.c
new file mode 100644
index 000000000000..c9f66e8436ab
--- /dev/null
+++ b/tools/testing/selftests/riscv/tags/pointer_masking.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <errno.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../../kselftest.h"
+
+#ifndef PR_PMLEN_SHIFT
+#define PR_PMLEN_SHIFT			24
+#endif
+#ifndef PR_PMLEN_MASK
+#define PR_PMLEN_MASK			(0x7fUL << PR_PMLEN_SHIFT)
+#endif
+
+static int dev_zero;
+
+static sigjmp_buf jmpbuf;
+
+static void sigsegv_handler(int sig)
+{
+	siglongjmp(jmpbuf, 1);
+}
+
+static int min_pmlen;
+static int max_pmlen;
+
+static inline bool valid_pmlen(int pmlen)
+{
+	return pmlen == 0 || pmlen == 7 || pmlen == 16;
+}
+
+static void test_pmlen(void)
+{
+	ksft_print_msg("Testing available PMLEN values\n");
+
+	for (int request = 0; request <= 16; request++) {
+		int pmlen, ret;
+
+		ret = prctl(PR_SET_TAGGED_ADDR_CTRL, request << PR_PMLEN_SHIFT, 0, 0, 0);
+		if (ret) {
+			ksft_test_result_skip("PMLEN=%d PR_GET_TAGGED_ADDR_CTRL\n", request);
+			ksft_test_result_skip("PMLEN=%d constraint\n", request);
+			ksft_test_result_skip("PMLEN=%d validity\n", request);
+			continue;
+		}
+
+		ret = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
+		ksft_test_result(ret >= 0, "PMLEN=%d PR_GET_TAGGED_ADDR_CTRL\n", request);
+		if (ret < 0) {
+			ksft_test_result_skip("PMLEN=%d constraint\n", request);
+			ksft_test_result_skip("PMLEN=%d validity\n", request);
+			continue;
+		}
+
+		pmlen = (ret & PR_PMLEN_MASK) >> PR_PMLEN_SHIFT;
+		ksft_test_result(pmlen >= request, "PMLEN=%d constraint\n", request);
+		ksft_test_result(valid_pmlen(pmlen), "PMLEN=%d validity\n", request);
+
+		if (min_pmlen == 0)
+			min_pmlen = pmlen;
+		if (max_pmlen < pmlen)
+			max_pmlen = pmlen;
+	}
+
+	if (max_pmlen == 0)
+		ksft_exit_fail_msg("Failed to enable pointer masking\n");
+}
+
+static int set_tagged_addr_ctrl(int pmlen, bool tagged_addr_abi)
+{
+	int arg, ret;
+
+	arg = pmlen << PR_PMLEN_SHIFT | tagged_addr_abi;
+	ret = prctl(PR_SET_TAGGED_ADDR_CTRL, arg, 0, 0, 0);
+	if (!ret) {
+		ret = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
+		if (ret == arg)
+			return 0;
+	}
+
+	return ret < 0 ? -errno : -ENODATA;
+}
+
+static void test_dereference_pmlen(int pmlen)
+{
+	static volatile int i;
+	volatile int *p;
+	int ret;
+
+	ret = set_tagged_addr_ctrl(pmlen, false);
+	if (ret)
+		return ksft_test_result_error("PMLEN=%d setup (%d)\n", pmlen, ret);
+
+	i = pmlen;
+
+	if (pmlen) {
+		p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen);
+
+		/* These dereferences should succeed. */
+		if (sigsetjmp(jmpbuf, 1))
+			return ksft_test_result_fail("PMLEN=%d valid tag\n", pmlen);
+		if (*p != pmlen)
+			return ksft_test_result_fail("PMLEN=%d bad value\n", pmlen);
+		*p++;
+	}
+
+	p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen - 1);
+
+	/* These dereferences should raise SIGSEGV. */
+	if (sigsetjmp(jmpbuf, 1))
+		return ksft_test_result_pass("PMLEN=%d dereference\n", pmlen);
+	*p++;
+	ksft_test_result_fail("PMLEN=%d invalid tag\n", pmlen);
+}
+
+static void test_dereference(void)
+{
+	ksft_print_msg("Testing userspace pointer dereference\n");
+
+	signal(SIGSEGV, sigsegv_handler);
+
+	test_dereference_pmlen(0);
+	test_dereference_pmlen(min_pmlen);
+	test_dereference_pmlen(max_pmlen);
+
+	signal(SIGSEGV, SIG_DFL);
+}
+
+static void test_fork_exec(void)
+{
+	int ret, status;
+
+	ksft_print_msg("Testing fork/exec behavior\n");
+
+	ret = set_tagged_addr_ctrl(min_pmlen, false);
+	if (ret)
+		return ksft_test_result_error("setup (%d)\n", ret);
+
+	if (fork()) {
+		wait(&status);
+		ksft_test_result(WIFEXITED(status) && WEXITSTATUS(status) == 0,
+				 "dereference after fork\n");
+	} else {
+		static volatile int i;
+		volatile int *p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - min_pmlen);
+
+		exit(*p);
+	}
+
+	if (fork()) {
+		wait(&status);
+		ksft_test_result(WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV,
+				 "dereference after fork+exec\n");
+	} else {
+		execl("/proc/self/exe", "", NULL);
+	}
+}
+
+static void test_tagged_addr_abi_sysctl(void)
+{
+	char value;
+	int fd;
+
+	ksft_print_msg("Testing tagged address ABI sysctl\n");
+
+	fd = open("/proc/sys/abi/tagged_addr_disabled", O_WRONLY);
+	if (fd < 0) {
+		ksft_test_result_skip("failed to open sysctl file\n");
+		ksft_test_result_skip("failed to open sysctl file\n");
+		return;
+	}
+
+	value = '1';
+	pwrite(fd, &value, 1, 0);
+	ksft_test_result(set_tagged_addr_ctrl(min_pmlen, true) == -EINVAL,
+			 "sysctl disabled\n");
+
+	value = '0';
+	pwrite(fd, &value, 1, 0);
+	ksft_test_result(set_tagged_addr_ctrl(min_pmlen, true) == 0,
+			 "sysctl enabled\n");
+
+	set_tagged_addr_ctrl(0, false);
+
+	close(fd);
+}
+
+static void test_tagged_addr_abi_pmlen(int pmlen)
+{
+	int i, *p, ret;
+
+	i = ~pmlen;
+
+	if (pmlen) {
+		p = (int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen);
+
+		ret = set_tagged_addr_ctrl(pmlen, false);
+		if (ret)
+			return ksft_test_result_error("PMLEN=%d ABI disabled setup (%d)\n",
+						      pmlen, ret);
+
+		ret = write(dev_zero, p, sizeof(*p));
+		if (ret >= 0 || errno != EFAULT)
+			return ksft_test_result_fail("PMLEN=%d ABI disabled write\n", pmlen);
+
+		ret = read(dev_zero, p, sizeof(*p));
+		if (ret >= 0 || errno != EFAULT)
+			return ksft_test_result_fail("PMLEN=%d ABI disabled read\n", pmlen);
+
+		if (i != ~pmlen)
+			return ksft_test_result_fail("PMLEN=%d ABI disabled value\n", pmlen);
+
+		ret = set_tagged_addr_ctrl(pmlen, true);
+		if (ret)
+			return ksft_test_result_error("PMLEN=%d ABI enabled setup (%d)\n",
+						      pmlen, ret);
+
+		ret = write(dev_zero, p, sizeof(*p));
+		if (ret != sizeof(*p))
+			return ksft_test_result_fail("PMLEN=%d ABI enabled write\n", pmlen);
+
+		ret = read(dev_zero, p, sizeof(*p));
+		if (ret != sizeof(*p))
+			return ksft_test_result_fail("PMLEN=%d ABI enabled read\n", pmlen);
+
+		if (i)
+			return ksft_test_result_fail("PMLEN=%d ABI enabled value\n", pmlen);
+
+		i = ~pmlen;
+	} else {
+		/* The tagged address ABI cannot be enabled when PMLEN == 0. */
+		ret = set_tagged_addr_ctrl(pmlen, true);
+		if (ret != -EINVAL)
+			return ksft_test_result_error("PMLEN=%d ABI setup (%d)\n",
+						      pmlen, ret);
+	}
+
+	p = (int *)((uintptr_t)&i | 1UL << __riscv_xlen - pmlen - 1);
+
+	ret = write(dev_zero, p, sizeof(*p));
+	if (ret >= 0 || errno != EFAULT)
+		return ksft_test_result_fail("PMLEN=%d invalid tag write (%d)\n", pmlen, errno);
+
+	ret = read(dev_zero, p, sizeof(*p));
+	if (ret >= 0 || errno != EFAULT)
+		return ksft_test_result_fail("PMLEN=%d invalid tag read\n", pmlen);
+
+	if (i != ~pmlen)
+		return ksft_test_result_fail("PMLEN=%d invalid tag value\n", pmlen);
+
+	ksft_test_result_pass("PMLEN=%d tagged address ABI\n", pmlen);
+}
+
+static void test_tagged_addr_abi(void)
+{
+	ksft_print_msg("Testing tagged address ABI\n");
+
+	test_tagged_addr_abi_pmlen(0);
+	test_tagged_addr_abi_pmlen(min_pmlen);
+	test_tagged_addr_abi_pmlen(max_pmlen);
+}
+
+static struct test_info {
+	unsigned int nr_tests;
+	void (*test_fn)(void);
+} tests[] = {
+	{ .nr_tests = 17 * 3, test_pmlen },
+	{ .nr_tests = 3, test_dereference },
+	{ .nr_tests = 2, test_fork_exec },
+	{ .nr_tests = 2, test_tagged_addr_abi_sysctl },
+	{ .nr_tests = 3, test_tagged_addr_abi },
+};
+
+int main(int argc, char **argv)
+{
+	unsigned int plan = 0;
+
+	/* Check if this is the child process after execl(). */
+	if (!argv[0][0]) {
+		static volatile int i;
+		volatile int *p = (volatile int *)((uintptr_t)&i | 1UL << __riscv_xlen - 7);
+
+		return *p;
+	}
+
+	dev_zero = open("/dev/zero", O_RDWR);
+	if (dev_zero < 0)
+		return 1;
+
+	ksft_print_header();
+
+	for (int i = 0; i < ARRAY_SIZE(tests); ++i)
+		plan += tests[i].nr_tests;
+
+	ksft_set_plan(plan);
+
+	for (int i = 0; i < ARRAY_SIZE(tests); ++i)
+		tests[i].test_fn();
+
+	ksft_finished();
+}
-- 
2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-19 21:58   ` Samuel Holland
@ 2024-03-19 23:55     ` Deepak Gupta
  -1 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-19 23:55 UTC (permalink / raw)
  To: samuel.holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
<samuel.holland=sifive.com@lists.riscv.org> wrote:
>
> Some envcfg bits need to be controlled on a per-thread basis, such as
> the pointer masking mode. However, the envcfg CSR value cannot simply be
> stored in struct thread_struct, because some hardware may implement a
> different subset of envcfg CSR bits is across CPUs. As a result, we need
> to combine the per-CPU and per-thread bits whenever we switch threads.
>

Why not do something like this

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index b3400517b0a9..01ba87954da2 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -202,6 +202,8 @@
 #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
 #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
 #define ENVCFG_FIOM                    _AC(0x1, UL)
+/* by default all threads should be able to zero cache */
+#define ENVCFG_BASE                    ENVCFG_CBZE

 /* Smstateen bits */
 #define SMSTATEEN0_AIA_IMSIC_SHIFT     58
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 4f21d970a129..2420123444c4 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -152,6 +152,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
        else
                regs->status |= SR_UXL_64;
 #endif
+       current->thread_info.envcfg = ENVCFG_BASE;
 }

And instead of context switching in `_switch_to`,
In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.

This construction avoids
- declaring per cpu riscv_cpu_envcfg
- syncing up
- collection of *envcfg bits.


> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
>  arch/riscv/include/asm/cpufeature.h |  2 ++
>  arch/riscv/include/asm/processor.h  |  1 +
>  arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
>  arch/riscv/kernel/cpufeature.c      |  4 +++-
>  4 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> index 0bd11862b760..b1ad8d0b4599 100644
> --- a/arch/riscv/include/asm/cpufeature.h
> +++ b/arch/riscv/include/asm/cpufeature.h
> @@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
>  /* Per-cpu ISA extensions. */
>  extern struct riscv_isainfo hart_isa[NR_CPUS];
>
> +DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
> +
>  void riscv_user_isa_enable(void);
>
>  #ifdef CONFIG_RISCV_MISALIGNED
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index a8509cc31ab2..06b87402a4d8 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -118,6 +118,7 @@ struct thread_struct {
>         unsigned long s[12];    /* s[0]: frame pointer */
>         struct __riscv_d_ext_state fstate;
>         unsigned long bad_cause;
> +       unsigned long envcfg;
>         u32 riscv_v_flags;
>         u32 vstate_ctrl;
>         struct __riscv_v_ext_state vstate;
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 7efdb0584d47..256a354a5c4a 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>  #endif
>
> +static inline void sync_envcfg(struct task_struct *task)
> +{
> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> +}
> +
> +static inline void __switch_to_envcfg(struct task_struct *next)
> +{
> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))

I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
even if ALTERNATIVES was turned on.
Can you check disasm on your end as well.  IMHO, `entry.S` is a better
place to pick up *envcfg.

> +               sync_envcfg(next);
> +}
> +
>  extern struct task_struct *__switch_to(struct task_struct *,
>                                        struct task_struct *);
>
> @@ -80,6 +91,7 @@ do {                                                  \
>                 __switch_to_fpu(__prev, __next);        \
>         if (has_vector())                                       \
>                 __switch_to_vector(__prev, __next);     \
> +       __switch_to_envcfg(__next);                     \
>         ((last) = __switch_to(__prev, __next));         \
>  } while (0)
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index d1846aab1f78..32aaaf41f8a8 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -44,6 +44,8 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
>  /* Per-cpu ISA extensions. */
>  struct riscv_isainfo hart_isa[NR_CPUS];
>
> +DEFINE_PER_CPU(unsigned long, riscv_cpu_envcfg);
> +
>  /* Performance information */
>  DEFINE_PER_CPU(long, misaligned_access_speed);
>
> @@ -978,7 +980,7 @@ arch_initcall(check_unaligned_access_all_cpus);
>  void riscv_user_isa_enable(void)
>  {
>         if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
> -               csr_set(CSR_ENVCFG, ENVCFG_CBZE);
> +               this_cpu_or(riscv_cpu_envcfg, ENVCFG_CBZE);
>  }
>
>  #ifdef CONFIG_RISCV_ALTERNATIVE
> --
> 2.43.1
>
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#659): https://lists.riscv.org/g/tech-j-ext/message/659
> Mute This Topic: https://lists.riscv.org/mt/105033914/7300952
> Group Owner: tech-j-ext+owner@lists.riscv.org
> Unsubscribe: https://lists.riscv.org/g/tech-j-ext/unsub [debug@rivosinc.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
>

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-19 23:55     ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-19 23:55 UTC (permalink / raw)
  To: samuel.holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
<samuel.holland=sifive.com@lists.riscv.org> wrote:
>
> Some envcfg bits need to be controlled on a per-thread basis, such as
> the pointer masking mode. However, the envcfg CSR value cannot simply be
> stored in struct thread_struct, because some hardware may implement a
> different subset of envcfg CSR bits is across CPUs. As a result, we need
> to combine the per-CPU and per-thread bits whenever we switch threads.
>

Why not do something like this

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index b3400517b0a9..01ba87954da2 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -202,6 +202,8 @@
 #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
 #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
 #define ENVCFG_FIOM                    _AC(0x1, UL)
+/* by default all threads should be able to zero cache */
+#define ENVCFG_BASE                    ENVCFG_CBZE

 /* Smstateen bits */
 #define SMSTATEEN0_AIA_IMSIC_SHIFT     58
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 4f21d970a129..2420123444c4 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -152,6 +152,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
        else
                regs->status |= SR_UXL_64;
 #endif
+       current->thread_info.envcfg = ENVCFG_BASE;
 }

And instead of context switching in `_switch_to`,
In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.

This construction avoids
- declaring per cpu riscv_cpu_envcfg
- syncing up
- collection of *envcfg bits.


> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
>  arch/riscv/include/asm/cpufeature.h |  2 ++
>  arch/riscv/include/asm/processor.h  |  1 +
>  arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
>  arch/riscv/kernel/cpufeature.c      |  4 +++-
>  4 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> index 0bd11862b760..b1ad8d0b4599 100644
> --- a/arch/riscv/include/asm/cpufeature.h
> +++ b/arch/riscv/include/asm/cpufeature.h
> @@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
>  /* Per-cpu ISA extensions. */
>  extern struct riscv_isainfo hart_isa[NR_CPUS];
>
> +DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
> +
>  void riscv_user_isa_enable(void);
>
>  #ifdef CONFIG_RISCV_MISALIGNED
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index a8509cc31ab2..06b87402a4d8 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -118,6 +118,7 @@ struct thread_struct {
>         unsigned long s[12];    /* s[0]: frame pointer */
>         struct __riscv_d_ext_state fstate;
>         unsigned long bad_cause;
> +       unsigned long envcfg;
>         u32 riscv_v_flags;
>         u32 vstate_ctrl;
>         struct __riscv_v_ext_state vstate;
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 7efdb0584d47..256a354a5c4a 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>  #endif
>
> +static inline void sync_envcfg(struct task_struct *task)
> +{
> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> +}
> +
> +static inline void __switch_to_envcfg(struct task_struct *next)
> +{
> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))

I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
even if ALTERNATIVES was turned on.
Can you check disasm on your end as well.  IMHO, `entry.S` is a better
place to pick up *envcfg.

> +               sync_envcfg(next);
> +}
> +
>  extern struct task_struct *__switch_to(struct task_struct *,
>                                        struct task_struct *);
>
> @@ -80,6 +91,7 @@ do {                                                  \
>                 __switch_to_fpu(__prev, __next);        \
>         if (has_vector())                                       \
>                 __switch_to_vector(__prev, __next);     \
> +       __switch_to_envcfg(__next);                     \
>         ((last) = __switch_to(__prev, __next));         \
>  } while (0)
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index d1846aab1f78..32aaaf41f8a8 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -44,6 +44,8 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
>  /* Per-cpu ISA extensions. */
>  struct riscv_isainfo hart_isa[NR_CPUS];
>
> +DEFINE_PER_CPU(unsigned long, riscv_cpu_envcfg);
> +
>  /* Performance information */
>  DEFINE_PER_CPU(long, misaligned_access_speed);
>
> @@ -978,7 +980,7 @@ arch_initcall(check_unaligned_access_all_cpus);
>  void riscv_user_isa_enable(void)
>  {
>         if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
> -               csr_set(CSR_ENVCFG, ENVCFG_CBZE);
> +               this_cpu_or(riscv_cpu_envcfg, ENVCFG_CBZE);
>  }
>
>  #ifdef CONFIG_RISCV_ALTERNATIVE
> --
> 2.43.1
>
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#659): https://lists.riscv.org/g/tech-j-ext/message/659
> Mute This Topic: https://lists.riscv.org/mt/105033914/7300952
> Group Owner: tech-j-ext+owner@lists.riscv.org
> Unsubscribe: https://lists.riscv.org/g/tech-j-ext/unsub [debug@rivosinc.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-19 23:55     ` Deepak Gupta
@ 2024-03-20  2:20       ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-20  2:20 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

Hi Deepak,

On 2024-03-19 6:55 PM, Deepak Gupta wrote:
> On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
> <samuel.holland=sifive.com@lists.riscv.org> wrote:
>>
>> Some envcfg bits need to be controlled on a per-thread basis, such as
>> the pointer masking mode. However, the envcfg CSR value cannot simply be
>> stored in struct thread_struct, because some hardware may implement a
>> different subset of envcfg CSR bits is across CPUs. As a result, we need
>> to combine the per-CPU and per-thread bits whenever we switch threads.
>>
> 
> Why not do something like this
> 
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index b3400517b0a9..01ba87954da2 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -202,6 +202,8 @@
>  #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
>  #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
>  #define ENVCFG_FIOM                    _AC(0x1, UL)
> +/* by default all threads should be able to zero cache */
> +#define ENVCFG_BASE                    ENVCFG_CBZE

Linux does not assume Sstrict, so without Zicboz being present in DT/ACPI, we
have no idea what the CBZE bit does--there's no guarantee it has the standard
meaning--so it's not safe to set the bit unconditionally. If that policy
changes, we could definitely simplify the code.

>  /* Smstateen bits */
>  #define SMSTATEEN0_AIA_IMSIC_SHIFT     58
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 4f21d970a129..2420123444c4 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -152,6 +152,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
>         else
>                 regs->status |= SR_UXL_64;
>  #endif
> +       current->thread_info.envcfg = ENVCFG_BASE;
>  }
> 
> And instead of context switching in `_switch_to`,
> In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.

The immediate reason is that writing envcfg in ret_from_exception() adds cycles
to every IRQ and system call exit, even though most of them will not change the
envcfg value. This is especially the case when returning from an IRQ/exception
back to S-mode, since envcfg has zero effect there.

The CSRs that are read/written in entry.S are generally those where the value
can be updated by hardware, as part of taking an exception. But envcfg never
changes on its own. The kernel knows exactly when its value will change, and
those places are:

 1) Task switch, i.e. switch_to()
 2) execve(), i.e. start_thread() or flush_thread()
 3) A system call that specifically affects a feature controlled by envcfg

So that's where this series writes it. There are a couple of minor tradeoffs
about when exactly to do the write:

- We could drop the sync_envcfg() calls outside of switch_to() by reading the
  current CSR value when scheduling out a thread, but again that adds overhead
  to the fast path to remove a tiny bit of code in the prctl() handlers.
- We don't need to write envcfg when switching to a kernel thread, only when
  switching to a user thread, because kernel threads never leave S-mode, so
  envcfg doesn't affect them. But checking the thread type takes many more
  instructions than just writing the CSR.

Overall, the optimal implementation will approximate the rule of only writing
envcfg when its value changes.

> This construction avoids
> - declaring per cpu riscv_cpu_envcfg

This is really a separate concern than when we write envcfg. The per-CPU
variable is only necessary to support hardware where a subset of harts support
Zicboz. Since the riscv_cpu_has_extension_[un]likely() helpers were added
specifically for Zicboz, I assume this is an important use case, and dropping
support for this hardware would be a regression. After all, hwprobe() allows
userspace to see that Zicboz is implemented at a per-CPU level. Maybe Andrew can
weigh in on that.

If we decide to enable Zicboz only when all harts support it, or we decide it's
safe to attempt to set the envcfg.CBZE bit on harts that do not declare support
for Zicboz, then we could drop the percpu variable.

> - syncing up
> - collection of *envcfg bits.
> 
> 
>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>> ---
>>
>>  arch/riscv/include/asm/cpufeature.h |  2 ++
>>  arch/riscv/include/asm/processor.h  |  1 +
>>  arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
>>  arch/riscv/kernel/cpufeature.c      |  4 +++-
>>  4 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
>> index 0bd11862b760..b1ad8d0b4599 100644
>> --- a/arch/riscv/include/asm/cpufeature.h
>> +++ b/arch/riscv/include/asm/cpufeature.h
>> @@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
>>  /* Per-cpu ISA extensions. */
>>  extern struct riscv_isainfo hart_isa[NR_CPUS];
>>
>> +DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
>> +
>>  void riscv_user_isa_enable(void);
>>
>>  #ifdef CONFIG_RISCV_MISALIGNED
>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>> index a8509cc31ab2..06b87402a4d8 100644
>> --- a/arch/riscv/include/asm/processor.h
>> +++ b/arch/riscv/include/asm/processor.h
>> @@ -118,6 +118,7 @@ struct thread_struct {
>>         unsigned long s[12];    /* s[0]: frame pointer */
>>         struct __riscv_d_ext_state fstate;
>>         unsigned long bad_cause;
>> +       unsigned long envcfg;
>>         u32 riscv_v_flags;
>>         u32 vstate_ctrl;
>>         struct __riscv_v_ext_state vstate;
>> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
>> index 7efdb0584d47..256a354a5c4a 100644
>> --- a/arch/riscv/include/asm/switch_to.h
>> +++ b/arch/riscv/include/asm/switch_to.h
>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>>  #endif
>>
>> +static inline void sync_envcfg(struct task_struct *task)
>> +{
>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
>> +}
>> +
>> +static inline void __switch_to_envcfg(struct task_struct *next)
>> +{
>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> 
> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> even if ALTERNATIVES was turned on.
> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> place to pick up *envcfg.

The branchiness is sort of expected, since that function is implemented by
switching on/off a branch instruction, so the alternate code is necessarily a
separate basic block. It's a tradeoff so we don't have to write assembly code
for every bit of code that depends on an extension. However, the cost should be
somewhat lowered since the branch is unconditional and so entirely predictable.

If the branch turns out to be problematic for performance, then we could use
ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.

>> +               sync_envcfg(next);
>> +}
>> +
>>  extern struct task_struct *__switch_to(struct task_struct *,
>>                                        struct task_struct *);
>>
>> @@ -80,6 +91,7 @@ do {                                                  \
>>                 __switch_to_fpu(__prev, __next);        \
>>         if (has_vector())                                       \
>>                 __switch_to_vector(__prev, __next);     \
>> +       __switch_to_envcfg(__next);                     \
>>         ((last) = __switch_to(__prev, __next));         \
>>  } while (0)
>>
>> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> index d1846aab1f78..32aaaf41f8a8 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -44,6 +44,8 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
>>  /* Per-cpu ISA extensions. */
>>  struct riscv_isainfo hart_isa[NR_CPUS];
>>
>> +DEFINE_PER_CPU(unsigned long, riscv_cpu_envcfg);
>> +
>>  /* Performance information */
>>  DEFINE_PER_CPU(long, misaligned_access_speed);
>>
>> @@ -978,7 +980,7 @@ arch_initcall(check_unaligned_access_all_cpus);
>>  void riscv_user_isa_enable(void)
>>  {
>>         if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
>> -               csr_set(CSR_ENVCFG, ENVCFG_CBZE);
>> +               this_cpu_or(riscv_cpu_envcfg, ENVCFG_CBZE);

If we drop the percpu variable, this becomes

	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_ZICBOZ))
		current->thread.envcfg |= ENVCFG_CBZE;

since the init thread's envcfg gets copied to all other threads via fork(), and
we can drop the call to riscv_user_isa_enable() from smp_callin(). Or if we
decide CBZE is always safe to set, then the function is even simpler:

	current->thread.envcfg = ENVCFG_CBZE;

Regards,
Samuel

>>  }
>>
>>  #ifdef CONFIG_RISCV_ALTERNATIVE
>> --
>> 2.43.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-20  2:20       ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-20  2:20 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

Hi Deepak,

On 2024-03-19 6:55 PM, Deepak Gupta wrote:
> On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
> <samuel.holland=sifive.com@lists.riscv.org> wrote:
>>
>> Some envcfg bits need to be controlled on a per-thread basis, such as
>> the pointer masking mode. However, the envcfg CSR value cannot simply be
>> stored in struct thread_struct, because some hardware may implement a
>> different subset of envcfg CSR bits is across CPUs. As a result, we need
>> to combine the per-CPU and per-thread bits whenever we switch threads.
>>
> 
> Why not do something like this
> 
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index b3400517b0a9..01ba87954da2 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -202,6 +202,8 @@
>  #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
>  #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
>  #define ENVCFG_FIOM                    _AC(0x1, UL)
> +/* by default all threads should be able to zero cache */
> +#define ENVCFG_BASE                    ENVCFG_CBZE

Linux does not assume Sstrict, so without Zicboz being present in DT/ACPI, we
have no idea what the CBZE bit does--there's no guarantee it has the standard
meaning--so it's not safe to set the bit unconditionally. If that policy
changes, we could definitely simplify the code.

>  /* Smstateen bits */
>  #define SMSTATEEN0_AIA_IMSIC_SHIFT     58
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 4f21d970a129..2420123444c4 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -152,6 +152,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
>         else
>                 regs->status |= SR_UXL_64;
>  #endif
> +       current->thread_info.envcfg = ENVCFG_BASE;
>  }
> 
> And instead of context switching in `_switch_to`,
> In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.

The immediate reason is that writing envcfg in ret_from_exception() adds cycles
to every IRQ and system call exit, even though most of them will not change the
envcfg value. This is especially the case when returning from an IRQ/exception
back to S-mode, since envcfg has zero effect there.

The CSRs that are read/written in entry.S are generally those where the value
can be updated by hardware, as part of taking an exception. But envcfg never
changes on its own. The kernel knows exactly when its value will change, and
those places are:

 1) Task switch, i.e. switch_to()
 2) execve(), i.e. start_thread() or flush_thread()
 3) A system call that specifically affects a feature controlled by envcfg

So that's where this series writes it. There are a couple of minor tradeoffs
about when exactly to do the write:

- We could drop the sync_envcfg() calls outside of switch_to() by reading the
  current CSR value when scheduling out a thread, but again that adds overhead
  to the fast path to remove a tiny bit of code in the prctl() handlers.
- We don't need to write envcfg when switching to a kernel thread, only when
  switching to a user thread, because kernel threads never leave S-mode, so
  envcfg doesn't affect them. But checking the thread type takes many more
  instructions than just writing the CSR.

Overall, the optimal implementation will approximate the rule of only writing
envcfg when its value changes.

> This construction avoids
> - declaring per cpu riscv_cpu_envcfg

This is really a separate concern than when we write envcfg. The per-CPU
variable is only necessary to support hardware where a subset of harts support
Zicboz. Since the riscv_cpu_has_extension_[un]likely() helpers were added
specifically for Zicboz, I assume this is an important use case, and dropping
support for this hardware would be a regression. After all, hwprobe() allows
userspace to see that Zicboz is implemented at a per-CPU level. Maybe Andrew can
weigh in on that.

If we decide to enable Zicboz only when all harts support it, or we decide it's
safe to attempt to set the envcfg.CBZE bit on harts that do not declare support
for Zicboz, then we could drop the percpu variable.

> - syncing up
> - collection of *envcfg bits.
> 
> 
>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>> ---
>>
>>  arch/riscv/include/asm/cpufeature.h |  2 ++
>>  arch/riscv/include/asm/processor.h  |  1 +
>>  arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
>>  arch/riscv/kernel/cpufeature.c      |  4 +++-
>>  4 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
>> index 0bd11862b760..b1ad8d0b4599 100644
>> --- a/arch/riscv/include/asm/cpufeature.h
>> +++ b/arch/riscv/include/asm/cpufeature.h
>> @@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
>>  /* Per-cpu ISA extensions. */
>>  extern struct riscv_isainfo hart_isa[NR_CPUS];
>>
>> +DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
>> +
>>  void riscv_user_isa_enable(void);
>>
>>  #ifdef CONFIG_RISCV_MISALIGNED
>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>> index a8509cc31ab2..06b87402a4d8 100644
>> --- a/arch/riscv/include/asm/processor.h
>> +++ b/arch/riscv/include/asm/processor.h
>> @@ -118,6 +118,7 @@ struct thread_struct {
>>         unsigned long s[12];    /* s[0]: frame pointer */
>>         struct __riscv_d_ext_state fstate;
>>         unsigned long bad_cause;
>> +       unsigned long envcfg;
>>         u32 riscv_v_flags;
>>         u32 vstate_ctrl;
>>         struct __riscv_v_ext_state vstate;
>> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
>> index 7efdb0584d47..256a354a5c4a 100644
>> --- a/arch/riscv/include/asm/switch_to.h
>> +++ b/arch/riscv/include/asm/switch_to.h
>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>>  #endif
>>
>> +static inline void sync_envcfg(struct task_struct *task)
>> +{
>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
>> +}
>> +
>> +static inline void __switch_to_envcfg(struct task_struct *next)
>> +{
>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> 
> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> even if ALTERNATIVES was turned on.
> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> place to pick up *envcfg.

The branchiness is sort of expected, since that function is implemented by
switching on/off a branch instruction, so the alternate code is necessarily a
separate basic block. It's a tradeoff so we don't have to write assembly code
for every bit of code that depends on an extension. However, the cost should be
somewhat lowered since the branch is unconditional and so entirely predictable.

If the branch turns out to be problematic for performance, then we could use
ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.

>> +               sync_envcfg(next);
>> +}
>> +
>>  extern struct task_struct *__switch_to(struct task_struct *,
>>                                        struct task_struct *);
>>
>> @@ -80,6 +91,7 @@ do {                                                  \
>>                 __switch_to_fpu(__prev, __next);        \
>>         if (has_vector())                                       \
>>                 __switch_to_vector(__prev, __next);     \
>> +       __switch_to_envcfg(__next);                     \
>>         ((last) = __switch_to(__prev, __next));         \
>>  } while (0)
>>
>> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> index d1846aab1f78..32aaaf41f8a8 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -44,6 +44,8 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
>>  /* Per-cpu ISA extensions. */
>>  struct riscv_isainfo hart_isa[NR_CPUS];
>>
>> +DEFINE_PER_CPU(unsigned long, riscv_cpu_envcfg);
>> +
>>  /* Performance information */
>>  DEFINE_PER_CPU(long, misaligned_access_speed);
>>
>> @@ -978,7 +980,7 @@ arch_initcall(check_unaligned_access_all_cpus);
>>  void riscv_user_isa_enable(void)
>>  {
>>         if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
>> -               csr_set(CSR_ENVCFG, ENVCFG_CBZE);
>> +               this_cpu_or(riscv_cpu_envcfg, ENVCFG_CBZE);

If we drop the percpu variable, this becomes

	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_ZICBOZ))
		current->thread.envcfg |= ENVCFG_CBZE;

since the init thread's envcfg gets copied to all other threads via fork(), and
we can drop the call to riscv_user_isa_enable() from smp_callin(). Or if we
decide CBZE is always safe to set, then the function is even simpler:

	current->thread.envcfg = ENVCFG_CBZE;

Regards,
Samuel

>>  }
>>
>>  #ifdef CONFIG_RISCV_ALTERNATIVE
>> --
>> 2.43.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-20  2:20       ` Samuel Holland
@ 2024-03-20  4:39         ` Deepak Gupta
  -1 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-20  4:39 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

Hi Samuel,

Thanks for your response.

On Tue, Mar 19, 2024 at 7:21 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Deepak,
>
> On 2024-03-19 6:55 PM, Deepak Gupta wrote:
> > On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
> > <samuel.holland=sifive.com@lists.riscv.org> wrote:
> >>
> >> Some envcfg bits need to be controlled on a per-thread basis, such as
> >> the pointer masking mode. However, the envcfg CSR value cannot simply be
> >> stored in struct thread_struct, because some hardware may implement a
> >> different subset of envcfg CSR bits is across CPUs. As a result, we need
> >> to combine the per-CPU and per-thread bits whenever we switch threads.
> >>
> >
> > Why not do something like this
> >
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index b3400517b0a9..01ba87954da2 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -202,6 +202,8 @@
> >  #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
> >  #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
> >  #define ENVCFG_FIOM                    _AC(0x1, UL)
> > +/* by default all threads should be able to zero cache */
> > +#define ENVCFG_BASE                    ENVCFG_CBZE
>
> Linux does not assume Sstrict, so without Zicboz being present in DT/ACPI, we
> have no idea what the CBZE bit does--there's no guarantee it has the standard
> meaning--so it's not safe to set the bit unconditionally. If that policy
> changes, we could definitely simplify the code.
>

Yeah, it makes sense.

> >  /* Smstateen bits */
> >  #define SMSTATEEN0_AIA_IMSIC_SHIFT     58
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 4f21d970a129..2420123444c4 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -152,6 +152,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> >         else
> >                 regs->status |= SR_UXL_64;
> >  #endif
> > +       current->thread_info.envcfg = ENVCFG_BASE;
> >  }
> >
> > And instead of context switching in `_switch_to`,
> > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>
> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
> to every IRQ and system call exit, even though most of them will not change the
> envcfg value. This is especially the case when returning from an IRQ/exception
> back to S-mode, since envcfg has zero effect there.
>
> The CSRs that are read/written in entry.S are generally those where the value
> can be updated by hardware, as part of taking an exception. But envcfg never
> changes on its own. The kernel knows exactly when its value will change, and
> those places are:
>
>  1) Task switch, i.e. switch_to()
>  2) execve(), i.e. start_thread() or flush_thread()
>  3) A system call that specifically affects a feature controlled by envcfg

Yeah I was optimizing for a single place to write instead of
sprinkling at multiple places.
But I see your argument. That's fine.

>
> So that's where this series writes it. There are a couple of minor tradeoffs
> about when exactly to do the write:
>
> - We could drop the sync_envcfg() calls outside of switch_to() by reading the
>   current CSR value when scheduling out a thread, but again that adds overhead
>   to the fast path to remove a tiny bit of code in the prctl() handlers.
> - We don't need to write envcfg when switching to a kernel thread, only when
>   switching to a user thread, because kernel threads never leave S-mode, so
>   envcfg doesn't affect them. But checking the thread type takes many more
>   instructions than just writing the CSR.
>
> Overall, the optimal implementation will approximate the rule of only writing
> envcfg when its value changes.
>
> > This construction avoids
> > - declaring per cpu riscv_cpu_envcfg
>
> This is really a separate concern than when we write envcfg. The per-CPU
> variable is only necessary to support hardware where a subset of harts support
> Zicboz. Since the riscv_cpu_has_extension_[un]likely() helpers were added
> specifically for Zicboz, I assume this is an important use case, and dropping
> support for this hardware would be a regression. After all, hwprobe() allows
> userspace to see that Zicboz is implemented at a per-CPU level. Maybe Andrew can
> weigh in on that.

I am not sure of the practicality of this heterogeneity for Zicboz and
for that matter any of the upcoming
features that'll be enabled via senvcfg (control flow integrity,
pointer masking, etc).

As an example if cache zeroing instructions are used by app binary, I
expect it to be used in following
manner

 - Explicitly inserting cbo.zero by application developer
 - Some compiler flag which ensures that structures larger than cache
line gets zeroed by cbo.zero

In either of the cases, the developer is not expecting to target it to
a specific hart on SoC and instead expect it to work.
There might be libraries (installed via sudo apt get) with cache zero
support in them which may run in different address spaces.
Should the library be aware of the CPU on which it's running. Now
whoever is running these binaries should be aware which CPUs
they get assigned to in order to avoid faults?

That seems excessive, doesn't it?

>
> If we decide to enable Zicboz only when all harts support it, or we decide it's
> safe to attempt to set the envcfg.CBZE bit on harts that do not declare support
> for Zicboz, then we could drop the percpu variable.
>
> > - syncing up
> > - collection of *envcfg bits.
> >
> >
> >> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> >> ---
> >>
> >>  arch/riscv/include/asm/cpufeature.h |  2 ++
> >>  arch/riscv/include/asm/processor.h  |  1 +
> >>  arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
> >>  arch/riscv/kernel/cpufeature.c      |  4 +++-
> >>  4 files changed, 18 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> >> index 0bd11862b760..b1ad8d0b4599 100644
> >> --- a/arch/riscv/include/asm/cpufeature.h
> >> +++ b/arch/riscv/include/asm/cpufeature.h
> >> @@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
> >>  /* Per-cpu ISA extensions. */
> >>  extern struct riscv_isainfo hart_isa[NR_CPUS];
> >>
> >> +DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
> >> +
> >>  void riscv_user_isa_enable(void);
> >>
> >>  #ifdef CONFIG_RISCV_MISALIGNED
> >> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> >> index a8509cc31ab2..06b87402a4d8 100644
> >> --- a/arch/riscv/include/asm/processor.h
> >> +++ b/arch/riscv/include/asm/processor.h
> >> @@ -118,6 +118,7 @@ struct thread_struct {
> >>         unsigned long s[12];    /* s[0]: frame pointer */
> >>         struct __riscv_d_ext_state fstate;
> >>         unsigned long bad_cause;
> >> +       unsigned long envcfg;
> >>         u32 riscv_v_flags;
> >>         u32 vstate_ctrl;
> >>         struct __riscv_v_ext_state vstate;
> >> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> >> index 7efdb0584d47..256a354a5c4a 100644
> >> --- a/arch/riscv/include/asm/switch_to.h
> >> +++ b/arch/riscv/include/asm/switch_to.h
> >> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
> >>  #define __switch_to_fpu(__prev, __next) do { } while (0)
> >>  #endif
> >>
> >> +static inline void sync_envcfg(struct task_struct *task)
> >> +{
> >> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> >> +}
> >> +
> >> +static inline void __switch_to_envcfg(struct task_struct *next)
> >> +{
> >> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> >
> > I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> > even if ALTERNATIVES was turned on.
> > Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> > place to pick up *envcfg.
>
> The branchiness is sort of expected, since that function is implemented by
> switching on/off a branch instruction, so the alternate code is necessarily a
> separate basic block. It's a tradeoff so we don't have to write assembly code
> for every bit of code that depends on an extension. However, the cost should be
> somewhat lowered since the branch is unconditional and so entirely predictable.
>
> If the branch turns out to be problematic for performance, then we could use
> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.

Yeah I lean towards using alternatives directly.

>
> >> +               sync_envcfg(next);
> >> +}

>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-20  4:39         ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-20  4:39 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

Hi Samuel,

Thanks for your response.

On Tue, Mar 19, 2024 at 7:21 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Hi Deepak,
>
> On 2024-03-19 6:55 PM, Deepak Gupta wrote:
> > On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
> > <samuel.holland=sifive.com@lists.riscv.org> wrote:
> >>
> >> Some envcfg bits need to be controlled on a per-thread basis, such as
> >> the pointer masking mode. However, the envcfg CSR value cannot simply be
> >> stored in struct thread_struct, because some hardware may implement a
> >> different subset of envcfg CSR bits is across CPUs. As a result, we need
> >> to combine the per-CPU and per-thread bits whenever we switch threads.
> >>
> >
> > Why not do something like this
> >
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index b3400517b0a9..01ba87954da2 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -202,6 +202,8 @@
> >  #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
> >  #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
> >  #define ENVCFG_FIOM                    _AC(0x1, UL)
> > +/* by default all threads should be able to zero cache */
> > +#define ENVCFG_BASE                    ENVCFG_CBZE
>
> Linux does not assume Sstrict, so without Zicboz being present in DT/ACPI, we
> have no idea what the CBZE bit does--there's no guarantee it has the standard
> meaning--so it's not safe to set the bit unconditionally. If that policy
> changes, we could definitely simplify the code.
>

Yeah, it makes sense.

> >  /* Smstateen bits */
> >  #define SMSTATEEN0_AIA_IMSIC_SHIFT     58
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 4f21d970a129..2420123444c4 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -152,6 +152,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
> >         else
> >                 regs->status |= SR_UXL_64;
> >  #endif
> > +       current->thread_info.envcfg = ENVCFG_BASE;
> >  }
> >
> > And instead of context switching in `_switch_to`,
> > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>
> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
> to every IRQ and system call exit, even though most of them will not change the
> envcfg value. This is especially the case when returning from an IRQ/exception
> back to S-mode, since envcfg has zero effect there.
>
> The CSRs that are read/written in entry.S are generally those where the value
> can be updated by hardware, as part of taking an exception. But envcfg never
> changes on its own. The kernel knows exactly when its value will change, and
> those places are:
>
>  1) Task switch, i.e. switch_to()
>  2) execve(), i.e. start_thread() or flush_thread()
>  3) A system call that specifically affects a feature controlled by envcfg

Yeah I was optimizing for a single place to write instead of
sprinkling at multiple places.
But I see your argument. That's fine.

>
> So that's where this series writes it. There are a couple of minor tradeoffs
> about when exactly to do the write:
>
> - We could drop the sync_envcfg() calls outside of switch_to() by reading the
>   current CSR value when scheduling out a thread, but again that adds overhead
>   to the fast path to remove a tiny bit of code in the prctl() handlers.
> - We don't need to write envcfg when switching to a kernel thread, only when
>   switching to a user thread, because kernel threads never leave S-mode, so
>   envcfg doesn't affect them. But checking the thread type takes many more
>   instructions than just writing the CSR.
>
> Overall, the optimal implementation will approximate the rule of only writing
> envcfg when its value changes.
>
> > This construction avoids
> > - declaring per cpu riscv_cpu_envcfg
>
> This is really a separate concern than when we write envcfg. The per-CPU
> variable is only necessary to support hardware where a subset of harts support
> Zicboz. Since the riscv_cpu_has_extension_[un]likely() helpers were added
> specifically for Zicboz, I assume this is an important use case, and dropping
> support for this hardware would be a regression. After all, hwprobe() allows
> userspace to see that Zicboz is implemented at a per-CPU level. Maybe Andrew can
> weigh in on that.

I am not sure of the practicality of this heterogeneity for Zicboz and
for that matter any of the upcoming
features that'll be enabled via senvcfg (control flow integrity,
pointer masking, etc).

As an example if cache zeroing instructions are used by app binary, I
expect it to be used in following
manner

 - Explicitly inserting cbo.zero by application developer
 - Some compiler flag which ensures that structures larger than cache
line gets zeroed by cbo.zero

In either of the cases, the developer is not expecting to target it to
a specific hart on SoC and instead expect it to work.
There might be libraries (installed via sudo apt get) with cache zero
support in them which may run in different address spaces.
Should the library be aware of the CPU on which it's running. Now
whoever is running these binaries should be aware which CPUs
they get assigned to in order to avoid faults?

That seems excessive, doesn't it?

>
> If we decide to enable Zicboz only when all harts support it, or we decide it's
> safe to attempt to set the envcfg.CBZE bit on harts that do not declare support
> for Zicboz, then we could drop the percpu variable.
>
> > - syncing up
> > - collection of *envcfg bits.
> >
> >
> >> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> >> ---
> >>
> >>  arch/riscv/include/asm/cpufeature.h |  2 ++
> >>  arch/riscv/include/asm/processor.h  |  1 +
> >>  arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
> >>  arch/riscv/kernel/cpufeature.c      |  4 +++-
> >>  4 files changed, 18 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> >> index 0bd11862b760..b1ad8d0b4599 100644
> >> --- a/arch/riscv/include/asm/cpufeature.h
> >> +++ b/arch/riscv/include/asm/cpufeature.h
> >> @@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
> >>  /* Per-cpu ISA extensions. */
> >>  extern struct riscv_isainfo hart_isa[NR_CPUS];
> >>
> >> +DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
> >> +
> >>  void riscv_user_isa_enable(void);
> >>
> >>  #ifdef CONFIG_RISCV_MISALIGNED
> >> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> >> index a8509cc31ab2..06b87402a4d8 100644
> >> --- a/arch/riscv/include/asm/processor.h
> >> +++ b/arch/riscv/include/asm/processor.h
> >> @@ -118,6 +118,7 @@ struct thread_struct {
> >>         unsigned long s[12];    /* s[0]: frame pointer */
> >>         struct __riscv_d_ext_state fstate;
> >>         unsigned long bad_cause;
> >> +       unsigned long envcfg;
> >>         u32 riscv_v_flags;
> >>         u32 vstate_ctrl;
> >>         struct __riscv_v_ext_state vstate;
> >> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> >> index 7efdb0584d47..256a354a5c4a 100644
> >> --- a/arch/riscv/include/asm/switch_to.h
> >> +++ b/arch/riscv/include/asm/switch_to.h
> >> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
> >>  #define __switch_to_fpu(__prev, __next) do { } while (0)
> >>  #endif
> >>
> >> +static inline void sync_envcfg(struct task_struct *task)
> >> +{
> >> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> >> +}
> >> +
> >> +static inline void __switch_to_envcfg(struct task_struct *next)
> >> +{
> >> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> >
> > I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> > even if ALTERNATIVES was turned on.
> > Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> > place to pick up *envcfg.
>
> The branchiness is sort of expected, since that function is implemented by
> switching on/off a branch instruction, so the alternate code is necessarily a
> separate basic block. It's a tradeoff so we don't have to write assembly code
> for every bit of code that depends on an extension. However, the cost should be
> somewhat lowered since the branch is unconditional and so entirely predictable.
>
> If the branch turns out to be problematic for performance, then we could use
> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.

Yeah I lean towards using alternatives directly.

>
> >> +               sync_envcfg(next);
> >> +}

>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-20  2:20       ` Samuel Holland
@ 2024-03-20  8:06         ` Conor Dooley
  -1 siblings, 0 replies; 54+ messages in thread
From: Conor Dooley @ 2024-03-20  8:06 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Deepak Gupta, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

[-- Attachment #1: Type: text/plain, Size: 2467 bytes --]

On Tue, Mar 19, 2024 at 09:20:59PM -0500, Samuel Holland wrote:
> On 2024-03-19 6:55 PM, Deepak Gupta wrote:
> > On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
> > <samuel.holland=sifive.com@lists.riscv.org> wrote:
> >>
> >> Some envcfg bits need to be controlled on a per-thread basis, such as
> >> the pointer masking mode. However, the envcfg CSR value cannot simply be
> >> stored in struct thread_struct, because some hardware may implement a
> >> different subset of envcfg CSR bits is across CPUs. As a result, we need
> >> to combine the per-CPU and per-thread bits whenever we switch threads.
> >>
> > 
> > Why not do something like this
> > 
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index b3400517b0a9..01ba87954da2 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -202,6 +202,8 @@
> >  #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
> >  #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
> >  #define ENVCFG_FIOM                    _AC(0x1, UL)
> > +/* by default all threads should be able to zero cache */
> > +#define ENVCFG_BASE                    ENVCFG_CBZE
> 
> Linux does not assume Sstrict, so without Zicboz being present in DT/ACPI, we
> have no idea what the CBZE bit does--there's no guarantee it has the standard
> meaning--so it's not safe to set the bit unconditionally. If that policy
> changes, we could definitely simplify the code.

The wording for that "extension", if two lines in the profiles doc makes
something an extension is:
"No non-conforming extensions are present. Attempts to execute unimplemented
opcodes or access unimplemented CSRs in the standard or reserved encoding
spaces raises an illegal instruction exception that results in a contained
trap to the supervisor-mode trap handler."

I know we have had new extensions come along and mark previously fair
game interrupts for vendors as out of bounds. I wonder if there's a risk
of that happening with CSRs or opcodes too (or maybe it has happened and
I cannot recall).

Going back to the interrupts - is the Andes PMU non-conforming because
it uses an interrupt that was declared as vendor usable but is now part
of the standard space because of AIA? If it is, then the meaning of
Sstrict could vary wildly based on the set of extensions (and their
versions for specs). That sounds like a lot of fun.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-20  8:06         ` Conor Dooley
  0 siblings, 0 replies; 54+ messages in thread
From: Conor Dooley @ 2024-03-20  8:06 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Deepak Gupta, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley


[-- Attachment #1.1: Type: text/plain, Size: 2467 bytes --]

On Tue, Mar 19, 2024 at 09:20:59PM -0500, Samuel Holland wrote:
> On 2024-03-19 6:55 PM, Deepak Gupta wrote:
> > On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
> > <samuel.holland=sifive.com@lists.riscv.org> wrote:
> >>
> >> Some envcfg bits need to be controlled on a per-thread basis, such as
> >> the pointer masking mode. However, the envcfg CSR value cannot simply be
> >> stored in struct thread_struct, because some hardware may implement a
> >> different subset of envcfg CSR bits is across CPUs. As a result, we need
> >> to combine the per-CPU and per-thread bits whenever we switch threads.
> >>
> > 
> > Why not do something like this
> > 
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index b3400517b0a9..01ba87954da2 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -202,6 +202,8 @@
> >  #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
> >  #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
> >  #define ENVCFG_FIOM                    _AC(0x1, UL)
> > +/* by default all threads should be able to zero cache */
> > +#define ENVCFG_BASE                    ENVCFG_CBZE
> 
> Linux does not assume Sstrict, so without Zicboz being present in DT/ACPI, we
> have no idea what the CBZE bit does--there's no guarantee it has the standard
> meaning--so it's not safe to set the bit unconditionally. If that policy
> changes, we could definitely simplify the code.

The wording for that "extension", if two lines in the profiles doc makes
something an extension is:
"No non-conforming extensions are present. Attempts to execute unimplemented
opcodes or access unimplemented CSRs in the standard or reserved encoding
spaces raises an illegal instruction exception that results in a contained
trap to the supervisor-mode trap handler."

I know we have had new extensions come along and mark previously fair
game interrupts for vendors as out of bounds. I wonder if there's a risk
of that happening with CSRs or opcodes too (or maybe it has happened and
I cannot recall).

Going back to the interrupts - is the Andes PMU non-conforming because
it uses an interrupt that was declared as vendor usable but is now part
of the standard space because of AIA? If it is, then the meaning of
Sstrict could vary wildly based on the set of extensions (and their
versions for specs). That sounds like a lot of fun.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 9/9] selftests: riscv: Add a pointer masking test
  2024-03-19 21:58   ` Samuel Holland
@ 2024-03-20 17:21     ` Conor Dooley
  -1 siblings, 0 replies; 54+ messages in thread
From: Conor Dooley @ 2024-03-20 17:21 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, kasan-dev, Evgenii Stepanov,
	Krzysztof Kozlowski, Rob Herring, Albert Ou, Shuah Khan

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]

On Tue, Mar 19, 2024 at 02:58:35PM -0700, Samuel Holland wrote:
> This test covers the behavior of the PR_SET_TAGGED_ADDR_CTRL and
> PR_GET_TAGGED_ADDR_CTRL prctl() operations, their effects on the
> userspace ABI, and their effects on the system call ABI.
> 
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
> 
>  tools/testing/selftests/riscv/Makefile        |   2 +-
>  tools/testing/selftests/riscv/tags/Makefile   |  10 +
>  .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++

I dunno much about selftests, but this patch seems to produce some
warnings about gitignores with allmodconfig:
tools/testing/selftests/riscv/tags/Makefile: warning: ignored by one of the .gitignore files
tools/testing/selftests/riscv/tags/pointer_masking.c: warning: ignored by one of the .gitignore files

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 9/9] selftests: riscv: Add a pointer masking test
@ 2024-03-20 17:21     ` Conor Dooley
  0 siblings, 0 replies; 54+ messages in thread
From: Conor Dooley @ 2024-03-20 17:21 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, kasan-dev, Evgenii Stepanov,
	Krzysztof Kozlowski, Rob Herring, Albert Ou, Shuah Khan


[-- Attachment #1.1: Type: text/plain, Size: 842 bytes --]

On Tue, Mar 19, 2024 at 02:58:35PM -0700, Samuel Holland wrote:
> This test covers the behavior of the PR_SET_TAGGED_ADDR_CTRL and
> PR_GET_TAGGED_ADDR_CTRL prctl() operations, their effects on the
> userspace ABI, and their effects on the system call ABI.
> 
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
> 
>  tools/testing/selftests/riscv/Makefile        |   2 +-
>  tools/testing/selftests/riscv/tags/Makefile   |  10 +
>  .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++

I dunno much about selftests, but this patch seems to produce some
warnings about gitignores with allmodconfig:
tools/testing/selftests/riscv/tags/Makefile: warning: ignored by one of the .gitignore files
tools/testing/selftests/riscv/tags/pointer_masking.c: warning: ignored by one of the .gitignore files

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 9/9] selftests: riscv: Add a pointer masking test
  2024-03-20 17:21     ` Conor Dooley
@ 2024-03-20 18:04       ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-20 18:04 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, kasan-dev, Evgenii Stepanov,
	Krzysztof Kozlowski, Rob Herring, Albert Ou, Shuah Khan

Hi Conor,

On 2024-03-20 12:21 PM, Conor Dooley wrote:
> On Tue, Mar 19, 2024 at 02:58:35PM -0700, Samuel Holland wrote:
>> This test covers the behavior of the PR_SET_TAGGED_ADDR_CTRL and
>> PR_GET_TAGGED_ADDR_CTRL prctl() operations, their effects on the
>> userspace ABI, and their effects on the system call ABI.
>>
>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>> ---
>>
>>  tools/testing/selftests/riscv/Makefile        |   2 +-
>>  tools/testing/selftests/riscv/tags/Makefile   |  10 +
>>  .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++
> 
> I dunno much about selftests, but this patch seems to produce some
> warnings about gitignores with allmodconfig:
> tools/testing/selftests/riscv/tags/Makefile: warning: ignored by one of the .gitignore files
> tools/testing/selftests/riscv/tags/pointer_masking.c: warning: ignored by one of the .gitignore files

This is because the "tags" directory name is ignored by the top-level
.gitignore. I chose the name to match tools/testing/selftests/arm64/tags, but I
am fine with renaming it to avoid the warning.

Regards,
Samuel


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC PATCH 9/9] selftests: riscv: Add a pointer masking test
@ 2024-03-20 18:04       ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-20 18:04 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, kasan-dev, Evgenii Stepanov,
	Krzysztof Kozlowski, Rob Herring, Albert Ou, Shuah Khan

Hi Conor,

On 2024-03-20 12:21 PM, Conor Dooley wrote:
> On Tue, Mar 19, 2024 at 02:58:35PM -0700, Samuel Holland wrote:
>> This test covers the behavior of the PR_SET_TAGGED_ADDR_CTRL and
>> PR_GET_TAGGED_ADDR_CTRL prctl() operations, their effects on the
>> userspace ABI, and their effects on the system call ABI.
>>
>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>> ---
>>
>>  tools/testing/selftests/riscv/Makefile        |   2 +-
>>  tools/testing/selftests/riscv/tags/Makefile   |  10 +
>>  .../selftests/riscv/tags/pointer_masking.c    | 307 ++++++++++++++++++
> 
> I dunno much about selftests, but this patch seems to produce some
> warnings about gitignores with allmodconfig:
> tools/testing/selftests/riscv/tags/Makefile: warning: ignored by one of the .gitignore files
> tools/testing/selftests/riscv/tags/pointer_masking.c: warning: ignored by one of the .gitignore files

This is because the "tags" directory name is ignored by the top-level
.gitignore. I chose the name to match tools/testing/selftests/arm64/tags, but I
am fine with renaming it to avoid the warning.

Regards,
Samuel


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
       [not found]       ` <17BE5F38AFE245E5.29196@lists.riscv.org>
@ 2024-03-20 23:27           ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-20 23:27 UTC (permalink / raw)
  To: debug
  Cc: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Andrew Jones, Guo Ren, Heiko Stuebner, Paul Walmsley

> > >
> > > And instead of context switching in `_switch_to`,
> > > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
> >
> > The immediate reason is that writing envcfg in ret_from_exception() adds cycles
> > to every IRQ and system call exit, even though most of them will not change the
> > envcfg value. This is especially the case when returning from an IRQ/exception
> > back to S-mode, since envcfg has zero effect there.
> >
> > The CSRs that are read/written in entry.S are generally those where the value
> > can be updated by hardware, as part of taking an exception. But envcfg never
> > changes on its own. The kernel knows exactly when its value will change, and
> > those places are:
> >
> >  1) Task switch, i.e. switch_to()
> >  2) execve(), i.e. start_thread() or flush_thread()
> >  3) A system call that specifically affects a feature controlled by envcfg
>
> Yeah I was optimizing for a single place to write instead of
> sprinkling at multiple places.
> But I see your argument. That's fine.
>

Because this is RFC and we are discussing it. I thought a little bit
more about this.

If we were to go with the above approach that essentially requires
whenever a envcfg bit changes, `sync_envcfg`
has to be called to reflect the correct value.

What if some of these features enable/disable are exposed to `ptrace`
(gdb, etc use cases) for enable/disable.
How will syncing work then ?

I can see the reasoning behind saving some cycles during trap return.
But `senvcfg` is not actually a user state, it
controls the execution environment configuration for user mode. I
think the best place for this CSR to be written is
trap return and writing at a single place from a single image on stack
reduces chances of bugs and errors. And allows
`senvcfg` features to be exposed to other kernel flows (like `ptrace`)

We can figure out ways on how to optimize in trap return path to avoid
writing it if we entered and exiting on the same
task.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-20 23:27           ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-20 23:27 UTC (permalink / raw)
  To: debug
  Cc: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Andrew Jones, Guo Ren, Heiko Stuebner, Paul Walmsley

> > >
> > > And instead of context switching in `_switch_to`,
> > > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
> >
> > The immediate reason is that writing envcfg in ret_from_exception() adds cycles
> > to every IRQ and system call exit, even though most of them will not change the
> > envcfg value. This is especially the case when returning from an IRQ/exception
> > back to S-mode, since envcfg has zero effect there.
> >
> > The CSRs that are read/written in entry.S are generally those where the value
> > can be updated by hardware, as part of taking an exception. But envcfg never
> > changes on its own. The kernel knows exactly when its value will change, and
> > those places are:
> >
> >  1) Task switch, i.e. switch_to()
> >  2) execve(), i.e. start_thread() or flush_thread()
> >  3) A system call that specifically affects a feature controlled by envcfg
>
> Yeah I was optimizing for a single place to write instead of
> sprinkling at multiple places.
> But I see your argument. That's fine.
>

Because this is RFC and we are discussing it. I thought a little bit
more about this.

If we were to go with the above approach that essentially requires
whenever a envcfg bit changes, `sync_envcfg`
has to be called to reflect the correct value.

What if some of these features enable/disable are exposed to `ptrace`
(gdb, etc use cases) for enable/disable.
How will syncing work then ?

I can see the reasoning behind saving some cycles during trap return.
But `senvcfg` is not actually a user state, it
controls the execution environment configuration for user mode. I
think the best place for this CSR to be written is
trap return and writing at a single place from a single image on stack
reduces chances of bugs and errors. And allows
`senvcfg` features to be exposed to other kernel flows (like `ptrace`)

We can figure out ways on how to optimize in trap return path to avoid
writing it if we entered and exiting on the same
task.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-20  4:39         ` Deepak Gupta
@ 2024-03-22  0:13           ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-22  0:13 UTC (permalink / raw)
  To: Deepak Gupta, Conor Dooley, Palmer Dabbelt
  Cc: linux-riscv, devicetree, Catalin Marinas, linux-kernel,
	tech-j-ext, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Andrew Jones, Guo Ren, Heiko Stuebner,
	Paul Walmsley

On 2024-03-19 11:39 PM, Deepak Gupta wrote:
>>>> --- a/arch/riscv/include/asm/switch_to.h
>>>> +++ b/arch/riscv/include/asm/switch_to.h
>>>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
>>>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>>>>  #endif
>>>>
>>>> +static inline void sync_envcfg(struct task_struct *task)
>>>> +{
>>>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
>>>> +}
>>>> +
>>>> +static inline void __switch_to_envcfg(struct task_struct *next)
>>>> +{
>>>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
>>>
>>> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
>>> even if ALTERNATIVES was turned on.
>>> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
>>> place to pick up *envcfg.
>>
>> The branchiness is sort of expected, since that function is implemented by
>> switching on/off a branch instruction, so the alternate code is necessarily a
>> separate basic block. It's a tradeoff so we don't have to write assembly code
>> for every bit of code that depends on an extension. However, the cost should be
>> somewhat lowered since the branch is unconditional and so entirely predictable.
>>
>> If the branch turns out to be problematic for performance, then we could use
>> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.
> 
> Yeah I lean towards using alternatives directly.

One thing to note here: we can't use alternatives directly if the behavior needs
to be different on different harts (i.e. a subset of harts implement the envcfg
CSR). I think we need some policy about which ISA extensions are allowed to be
asymmetric across harts, or else we add too much complexity.

Regards,
Samuel


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-22  0:13           ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-22  0:13 UTC (permalink / raw)
  To: Deepak Gupta, Conor Dooley, Palmer Dabbelt
  Cc: linux-riscv, devicetree, Catalin Marinas, linux-kernel,
	tech-j-ext, kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski,
	Rob Herring, Andrew Jones, Guo Ren, Heiko Stuebner,
	Paul Walmsley

On 2024-03-19 11:39 PM, Deepak Gupta wrote:
>>>> --- a/arch/riscv/include/asm/switch_to.h
>>>> +++ b/arch/riscv/include/asm/switch_to.h
>>>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
>>>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>>>>  #endif
>>>>
>>>> +static inline void sync_envcfg(struct task_struct *task)
>>>> +{
>>>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
>>>> +}
>>>> +
>>>> +static inline void __switch_to_envcfg(struct task_struct *next)
>>>> +{
>>>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
>>>
>>> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
>>> even if ALTERNATIVES was turned on.
>>> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
>>> place to pick up *envcfg.
>>
>> The branchiness is sort of expected, since that function is implemented by
>> switching on/off a branch instruction, so the alternate code is necessarily a
>> separate basic block. It's a tradeoff so we don't have to write assembly code
>> for every bit of code that depends on an extension. However, the cost should be
>> somewhat lowered since the branch is unconditional and so entirely predictable.
>>
>> If the branch turns out to be problematic for performance, then we could use
>> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.
> 
> Yeah I lean towards using alternatives directly.

One thing to note here: we can't use alternatives directly if the behavior needs
to be different on different harts (i.e. a subset of harts implement the envcfg
CSR). I think we need some policy about which ISA extensions are allowed to be
asymmetric across harts, or else we add too much complexity.

Regards,
Samuel


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-20 23:27           ` Deepak Gupta
@ 2024-03-22  3:43             ` Samuel Holland
  -1 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-22  3:43 UTC (permalink / raw)
  To: Deepak Gupta, Andrew Jones
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Guo Ren,
	Heiko Stuebner, Paul Walmsley

Hi Deepak,

On 2024-03-20 6:27 PM, Deepak Gupta wrote:
>>>> And instead of context switching in `_switch_to`,
>>>> In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>>>
>>> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
>>> to every IRQ and system call exit, even though most of them will not change the
>>> envcfg value. This is especially the case when returning from an IRQ/exception
>>> back to S-mode, since envcfg has zero effect there.
>>>
>>> The CSRs that are read/written in entry.S are generally those where the value
>>> can be updated by hardware, as part of taking an exception. But envcfg never
>>> changes on its own. The kernel knows exactly when its value will change, and
>>> those places are:
>>>
>>>  1) Task switch, i.e. switch_to()
>>>  2) execve(), i.e. start_thread() or flush_thread()
>>>  3) A system call that specifically affects a feature controlled by envcfg
>>
>> Yeah I was optimizing for a single place to write instead of
>> sprinkling at multiple places.
>> But I see your argument. That's fine.
>>
> 
> Because this is RFC and we are discussing it. I thought a little bit
> more about this.

Thanks for your comments and the discussion! I know several in-progress features
depend on envcfg, so hopefully we can agree on a design acceptable to everyone.

> If we were to go with the above approach that essentially requires
> whenever a envcfg bit changes, `sync_envcfg`
> has to be called to reflect the correct value.

sync_envcfg() is only needed if the task being updated is `current`. Would it be
more acceptable if this happened inside a helper function? Something like:

static inline void envcfg_update_bits(struct task_struct *task,
				      unsigned long mask, unsigned long val)
{
	unsigned long envcfg;

	envcfg = (task->thread.envcfg & ~mask) | val;
	task->thread.envcfg = envcfg;
	if (task == current)
		csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | envcfg);
}

> What if some of these features enable/disable are exposed to `ptrace`
> (gdb, etc use cases) for enable/disable.
> How will syncing work then ?

ptrace_check_attach() ensures the tracee is scheduled out while a ptrace
operation is running, so there is no need to sync anything. Any changes to
task->thread.envcfg are written to the CSR when the tracee is scheduled back in.

> I can see the reasoning behind saving some cycles during trap return.
> But `senvcfg` is not actually a user state, it
> controls the execution environment configuration for user mode. I
> think the best place for this CSR to be written is
> trap return and writing at a single place from a single image on stack
> reduces chances of bugs and errors. And allows
> `senvcfg` features to be exposed to other kernel flows (like `ptrace`)

If ptrace is accessing a process, then task->thread.envcfg is always up to date.
The only complication is that the per-CPU bits need to be ORed back in to get
the real CSR value for another process, but this again is unrelated to whether
the CSR is written in switch_to() or ret_from_exception().

> We can figure out ways on how to optimize in trap return path to avoid
> writing it if we entered and exiting on the same
> task.

Optimizing out the CSR write when the task did not switch requires knowing if
the current task's envcfg was changed during this trip to S-mode... and this
starts looking similar to sync_envcfg().

Regards,
Samuel


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-22  3:43             ` Samuel Holland
  0 siblings, 0 replies; 54+ messages in thread
From: Samuel Holland @ 2024-03-22  3:43 UTC (permalink / raw)
  To: Deepak Gupta, Andrew Jones
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Guo Ren,
	Heiko Stuebner, Paul Walmsley

Hi Deepak,

On 2024-03-20 6:27 PM, Deepak Gupta wrote:
>>>> And instead of context switching in `_switch_to`,
>>>> In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>>>
>>> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
>>> to every IRQ and system call exit, even though most of them will not change the
>>> envcfg value. This is especially the case when returning from an IRQ/exception
>>> back to S-mode, since envcfg has zero effect there.
>>>
>>> The CSRs that are read/written in entry.S are generally those where the value
>>> can be updated by hardware, as part of taking an exception. But envcfg never
>>> changes on its own. The kernel knows exactly when its value will change, and
>>> those places are:
>>>
>>>  1) Task switch, i.e. switch_to()
>>>  2) execve(), i.e. start_thread() or flush_thread()
>>>  3) A system call that specifically affects a feature controlled by envcfg
>>
>> Yeah I was optimizing for a single place to write instead of
>> sprinkling at multiple places.
>> But I see your argument. That's fine.
>>
> 
> Because this is RFC and we are discussing it. I thought a little bit
> more about this.

Thanks for your comments and the discussion! I know several in-progress features
depend on envcfg, so hopefully we can agree on a design acceptable to everyone.

> If we were to go with the above approach that essentially requires
> whenever a envcfg bit changes, `sync_envcfg`
> has to be called to reflect the correct value.

sync_envcfg() is only needed if the task being updated is `current`. Would it be
more acceptable if this happened inside a helper function? Something like:

static inline void envcfg_update_bits(struct task_struct *task,
				      unsigned long mask, unsigned long val)
{
	unsigned long envcfg;

	envcfg = (task->thread.envcfg & ~mask) | val;
	task->thread.envcfg = envcfg;
	if (task == current)
		csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | envcfg);
}

> What if some of these features enable/disable are exposed to `ptrace`
> (gdb, etc use cases) for enable/disable.
> How will syncing work then ?

ptrace_check_attach() ensures the tracee is scheduled out while a ptrace
operation is running, so there is no need to sync anything. Any changes to
task->thread.envcfg are written to the CSR when the tracee is scheduled back in.

> I can see the reasoning behind saving some cycles during trap return.
> But `senvcfg` is not actually a user state, it
> controls the execution environment configuration for user mode. I
> think the best place for this CSR to be written is
> trap return and writing at a single place from a single image on stack
> reduces chances of bugs and errors. And allows
> `senvcfg` features to be exposed to other kernel flows (like `ptrace`)

If ptrace is accessing a process, then task->thread.envcfg is always up to date.
The only complication is that the per-CPU bits need to be ORed back in to get
the real CSR value for another process, but this again is unrelated to whether
the CSR is written in switch_to() or ret_from_exception().

> We can figure out ways on how to optimize in trap return path to avoid
> writing it if we entered and exiting on the same
> task.

Optimizing out the CSR write when the task did not switch requires knowing if
the current task's envcfg was changed during this trip to S-mode... and this
starts looking similar to sync_envcfg().

Regards,
Samuel


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-20  2:20       ` Samuel Holland
@ 2024-03-22  7:58         ` Andrew Jones
  -1 siblings, 0 replies; 54+ messages in thread
From: Andrew Jones @ 2024-03-22  7:58 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Deepak Gupta, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 09:20:59PM -0500, Samuel Holland wrote:
...
> This is really a separate concern than when we write envcfg. The per-CPU
> variable is only necessary to support hardware where a subset of harts support
> Zicboz. Since the riscv_cpu_has_extension_[un]likely() helpers were added
> specifically for Zicboz, I assume this is an important use case, and dropping
> support for this hardware would be a regression. After all, hwprobe() allows
> userspace to see that Zicboz is implemented at a per-CPU level. Maybe Andrew can
> weigh in on that.
>

Hi Samuel,

I've approached Zicboz the same way I would approach all extensions, which
is to be per-hart. I'm not currently aware of a platform that is / will be
composed of harts where some have Zicboz and others don't, but there's
nothing stopping a platform like that from being built. I realize this
adds complexity that we may not want to manage in Linux without an actual
use case requiring it. I wouldn't be opposed to keeping things simple for
now, only bringing in complexity when needed (for this extension or for a
future extension with envcfg bits), but we should ensure we make it clear
that we're making those simplifications now based on assumptions, and we
may need to change things later.

Thanks,
drew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-22  7:58         ` Andrew Jones
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Jones @ 2024-03-22  7:58 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Deepak Gupta, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 09:20:59PM -0500, Samuel Holland wrote:
...
> This is really a separate concern than when we write envcfg. The per-CPU
> variable is only necessary to support hardware where a subset of harts support
> Zicboz. Since the riscv_cpu_has_extension_[un]likely() helpers were added
> specifically for Zicboz, I assume this is an important use case, and dropping
> support for this hardware would be a regression. After all, hwprobe() allows
> userspace to see that Zicboz is implemented at a per-CPU level. Maybe Andrew can
> weigh in on that.
>

Hi Samuel,

I've approached Zicboz the same way I would approach all extensions, which
is to be per-hart. I'm not currently aware of a platform that is / will be
composed of harts where some have Zicboz and others don't, but there's
nothing stopping a platform like that from being built. I realize this
adds complexity that we may not want to manage in Linux without an actual
use case requiring it. I wouldn't be opposed to keeping things simple for
now, only bringing in complexity when needed (for this extension or for a
future extension with envcfg bits), but we should ensure we make it clear
that we're making those simplifications now based on assumptions, and we
may need to change things later.

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-20  4:39         ` Deepak Gupta
@ 2024-03-22  8:09           ` Andrew Jones
  -1 siblings, 0 replies; 54+ messages in thread
From: Andrew Jones @ 2024-03-22  8:09 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 09:39:52PM -0700, Deepak Gupta wrote:
...
> I am not sure of the practicality of this heterogeneity for Zicboz and
> for that matter any of the upcoming
> features that'll be enabled via senvcfg (control flow integrity,
> pointer masking, etc).
> 
> As an example if cache zeroing instructions are used by app binary, I
> expect it to be used in following
> manner
> 
>  - Explicitly inserting cbo.zero by application developer
>  - Some compiler flag which ensures that structures larger than cache
> line gets zeroed by cbo.zero
> 
> In either of the cases, the developer is not expecting to target it to
> a specific hart on SoC and instead expect it to work.
> There might be libraries (installed via sudo apt get) with cache zero
> support in them which may run in different address spaces.
> Should the library be aware of the CPU on which it's running. Now
> whoever is running these binaries should be aware which CPUs
> they get assigned to in order to avoid faults?
> 
> That seems excessive, doesn't it?
>

It might be safe to assume extensions like Zicboz will be on all harts if
any, but I wouldn't expect all extensions in the future to be present on
all available harts. For example, some Arm big.LITTLE boards only have
virt extensions on big CPUs. When a VMM wants to launch a guest it must
be aware of which CPUs it will use for the VCPU threads. For riscv, we
have the which-cpus variant of the hwprobe syscall to try and make this
type of thing easier to manage, but I agree it will still be a pain for
software since it will need to make that query and then set its affinity,
which is something it hasn't needed to do before.

Thanks,
drew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-22  8:09           ` Andrew Jones
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Jones @ 2024-03-22  8:09 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 09:39:52PM -0700, Deepak Gupta wrote:
...
> I am not sure of the practicality of this heterogeneity for Zicboz and
> for that matter any of the upcoming
> features that'll be enabled via senvcfg (control flow integrity,
> pointer masking, etc).
> 
> As an example if cache zeroing instructions are used by app binary, I
> expect it to be used in following
> manner
> 
>  - Explicitly inserting cbo.zero by application developer
>  - Some compiler flag which ensures that structures larger than cache
> line gets zeroed by cbo.zero
> 
> In either of the cases, the developer is not expecting to target it to
> a specific hart on SoC and instead expect it to work.
> There might be libraries (installed via sudo apt get) with cache zero
> support in them which may run in different address spaces.
> Should the library be aware of the CPU on which it's running. Now
> whoever is running these binaries should be aware which CPUs
> they get assigned to in order to avoid faults?
> 
> That seems excessive, doesn't it?
>

It might be safe to assume extensions like Zicboz will be on all harts if
any, but I wouldn't expect all extensions in the future to be present on
all available harts. For example, some Arm big.LITTLE boards only have
virt extensions on big CPUs. When a VMM wants to launch a guest it must
be aware of which CPUs it will use for the VCPU threads. For riscv, we
have the which-cpus variant of the hwprobe syscall to try and make this
type of thing easier to manage, but I agree it will still be a pain for
software since it will need to make that query and then set its affinity,
which is something it hasn't needed to do before.

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-22  8:09           ` Andrew Jones
@ 2024-03-22 16:52             ` Deepak Gupta
  -1 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-22 16:52 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Fri, Mar 22, 2024 at 1:09 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Tue, Mar 19, 2024 at 09:39:52PM -0700, Deepak Gupta wrote:
> ...
> > I am not sure of the practicality of this heterogeneity for Zicboz and
> > for that matter any of the upcoming
> > features that'll be enabled via senvcfg (control flow integrity,
> > pointer masking, etc).
> >
> > As an example if cache zeroing instructions are used by app binary, I
> > expect it to be used in following
> > manner
> >
> >  - Explicitly inserting cbo.zero by application developer
> >  - Some compiler flag which ensures that structures larger than cache
> > line gets zeroed by cbo.zero
> >
> > In either of the cases, the developer is not expecting to target it to
> > a specific hart on SoC and instead expect it to work.
> > There might be libraries (installed via sudo apt get) with cache zero
> > support in them which may run in different address spaces.
> > Should the library be aware of the CPU on which it's running. Now
> > whoever is running these binaries should be aware which CPUs
> > they get assigned to in order to avoid faults?
> >
> > That seems excessive, doesn't it?
> >
>
> It might be safe to assume extensions like Zicboz will be on all harts if
> any, but I wouldn't expect all extensions in the future to be present on
> all available harts. For example, some Arm big.LITTLE boards only have
> virt extensions on big CPUs. When a VMM wants to launch a guest it must
> be aware of which CPUs it will use for the VCPU threads. For riscv, we
> have the which-cpus variant of the hwprobe syscall to try and make this
> type of thing easier to manage, but I agree it will still be a pain for
> software since it will need to make that query and then set its affinity,
> which is something it hasn't needed to do before.
>

Sure, the future may be a world where heterogeneous ISA is a thing. But
that's not the present. Let's not try to build for something which
doesn't exist.
It has been (heterogeneous ISA) tried earlier many times and mostly have
fallen flat (remember on Intel alder lake, Intel had to ship a ucode patch to
disable AVX512 exactly for same reason)
https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/2

As and when ISA features get enabled, they get compiled into libraries/binaries
and end user many times use things like `taskset` to set affinity
without even realizing
there is some weirdness going on under the hood. For majority of use
cases -- heterogeneous
ISA doesn't make sense. Sure if someone is willing to build a custom
SoC with heterogeneous
ISA for their strict usecase, they control their software and hardware
and thus they can do that.
But littering linux kernel to support wierd usecases and putting a
burden of that on majority of
usecases and software is not wise.

If something like this has to be done, I expect first that it doesn't
force end users to learn
about ISA differences between harts on their system and then figure
out which installed
packages have which ISA features compiled in. This is like walking on
eggshells from the end
user perspective. Sure, end user can be extremely intelligent / smart
and figure it all out but
that population is rare and that rare population can develop their
custom kernel and libc
patches to do something like this.

This is a good science project to support heterogeneous ISA but
practically not viable unless
there is a high level end user use case.

> Thanks,
> drew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-22 16:52             ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-22 16:52 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Fri, Mar 22, 2024 at 1:09 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Tue, Mar 19, 2024 at 09:39:52PM -0700, Deepak Gupta wrote:
> ...
> > I am not sure of the practicality of this heterogeneity for Zicboz and
> > for that matter any of the upcoming
> > features that'll be enabled via senvcfg (control flow integrity,
> > pointer masking, etc).
> >
> > As an example if cache zeroing instructions are used by app binary, I
> > expect it to be used in following
> > manner
> >
> >  - Explicitly inserting cbo.zero by application developer
> >  - Some compiler flag which ensures that structures larger than cache
> > line gets zeroed by cbo.zero
> >
> > In either of the cases, the developer is not expecting to target it to
> > a specific hart on SoC and instead expect it to work.
> > There might be libraries (installed via sudo apt get) with cache zero
> > support in them which may run in different address spaces.
> > Should the library be aware of the CPU on which it's running. Now
> > whoever is running these binaries should be aware which CPUs
> > they get assigned to in order to avoid faults?
> >
> > That seems excessive, doesn't it?
> >
>
> It might be safe to assume extensions like Zicboz will be on all harts if
> any, but I wouldn't expect all extensions in the future to be present on
> all available harts. For example, some Arm big.LITTLE boards only have
> virt extensions on big CPUs. When a VMM wants to launch a guest it must
> be aware of which CPUs it will use for the VCPU threads. For riscv, we
> have the which-cpus variant of the hwprobe syscall to try and make this
> type of thing easier to manage, but I agree it will still be a pain for
> software since it will need to make that query and then set its affinity,
> which is something it hasn't needed to do before.
>

Sure, the future may be a world where heterogeneous ISA is a thing. But
that's not the present. Let's not try to build for something which
doesn't exist.
It has been (heterogeneous ISA) tried earlier many times and mostly have
fallen flat (remember on Intel alder lake, Intel had to ship a ucode patch to
disable AVX512 exactly for same reason)
https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/2

As and when ISA features get enabled, they get compiled into libraries/binaries
and end user many times use things like `taskset` to set affinity
without even realizing
there is some weirdness going on under the hood. For majority of use
cases -- heterogeneous
ISA doesn't make sense. Sure if someone is willing to build a custom
SoC with heterogeneous
ISA for their strict usecase, they control their software and hardware
and thus they can do that.
But littering linux kernel to support wierd usecases and putting a
burden of that on majority of
usecases and software is not wise.

If something like this has to be done, I expect first that it doesn't
force end users to learn
about ISA differences between harts on their system and then figure
out which installed
packages have which ISA features compiled in. This is like walking on
eggshells from the end
user perspective. Sure, end user can be extremely intelligent / smart
and figure it all out but
that population is rare and that rare population can develop their
custom kernel and libc
patches to do something like this.

This is a good science project to support heterogeneous ISA but
practically not viable unless
there is a high level end user use case.

> Thanks,
> drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-22  0:13           ` Samuel Holland
@ 2024-03-22 17:13             ` Deepak Gupta
  -1 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-22 17:13 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Conor Dooley, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Thu, Mar 21, 2024 at 5:13 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> On 2024-03-19 11:39 PM, Deepak Gupta wrote:
> >>>> --- a/arch/riscv/include/asm/switch_to.h
> >>>> +++ b/arch/riscv/include/asm/switch_to.h
> >>>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
> >>>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
> >>>>  #endif
> >>>>
> >>>> +static inline void sync_envcfg(struct task_struct *task)
> >>>> +{
> >>>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> >>>> +}
> >>>> +
> >>>> +static inline void __switch_to_envcfg(struct task_struct *next)
> >>>> +{
> >>>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> >>>
> >>> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> >>> even if ALTERNATIVES was turned on.
> >>> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> >>> place to pick up *envcfg.
> >>
> >> The branchiness is sort of expected, since that function is implemented by
> >> switching on/off a branch instruction, so the alternate code is necessarily a
> >> separate basic block. It's a tradeoff so we don't have to write assembly code
> >> for every bit of code that depends on an extension. However, the cost should be
> >> somewhat lowered since the branch is unconditional and so entirely predictable.
> >>
> >> If the branch turns out to be problematic for performance, then we could use
> >> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.
> >
> > Yeah I lean towards using alternatives directly.
>
> One thing to note here: we can't use alternatives directly if the behavior needs
> to be different on different harts (i.e. a subset of harts implement the envcfg
> CSR). I think we need some policy about which ISA extensions are allowed to be
> asymmetric across harts, or else we add too much complexity.

As I've responded on the same thread . We are adding too much
complexity by assuming
that heterogeneous ISA exists (which it doesn't today). And even if it
exists, it wouldn't work.
Nobody wants to spend a lot of time figuring out which harts have
which ISA and which
packages are compiled with which ISA. Most of the end users do `sudo
apt get install blah blah`
And then expect it to just work. It doesn't work for other
architectures and even when someone
tried, they had to disable certain ISA features to make sure that all
cores have the same ISA feature
(search AVX12 Intel Alder Lake Disable).

>
> Regards,
> Samuel
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-22 17:13             ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-22 17:13 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Conor Dooley, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Thu, Mar 21, 2024 at 5:13 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> On 2024-03-19 11:39 PM, Deepak Gupta wrote:
> >>>> --- a/arch/riscv/include/asm/switch_to.h
> >>>> +++ b/arch/riscv/include/asm/switch_to.h
> >>>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
> >>>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
> >>>>  #endif
> >>>>
> >>>> +static inline void sync_envcfg(struct task_struct *task)
> >>>> +{
> >>>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> >>>> +}
> >>>> +
> >>>> +static inline void __switch_to_envcfg(struct task_struct *next)
> >>>> +{
> >>>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> >>>
> >>> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> >>> even if ALTERNATIVES was turned on.
> >>> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> >>> place to pick up *envcfg.
> >>
> >> The branchiness is sort of expected, since that function is implemented by
> >> switching on/off a branch instruction, so the alternate code is necessarily a
> >> separate basic block. It's a tradeoff so we don't have to write assembly code
> >> for every bit of code that depends on an extension. However, the cost should be
> >> somewhat lowered since the branch is unconditional and so entirely predictable.
> >>
> >> If the branch turns out to be problematic for performance, then we could use
> >> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.
> >
> > Yeah I lean towards using alternatives directly.
>
> One thing to note here: we can't use alternatives directly if the behavior needs
> to be different on different harts (i.e. a subset of harts implement the envcfg
> CSR). I think we need some policy about which ISA extensions are allowed to be
> asymmetric across harts, or else we add too much complexity.

As I've responded on the same thread . We are adding too much
complexity by assuming
that heterogeneous ISA exists (which it doesn't today). And even if it
exists, it wouldn't work.
Nobody wants to spend a lot of time figuring out which harts have
which ISA and which
packages are compiled with which ISA. Most of the end users do `sudo
apt get install blah blah`
And then expect it to just work. It doesn't work for other
architectures and even when someone
tried, they had to disable certain ISA features to make sure that all
cores have the same ISA feature
(search AVX12 Intel Alder Lake Disable).

>
> Regards,
> Samuel
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-22 17:13             ` Deepak Gupta
@ 2024-03-23  9:35               ` Andrew Jones
  -1 siblings, 0 replies; 54+ messages in thread
From: Andrew Jones @ 2024-03-23  9:35 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Samuel Holland, Conor Dooley, Palmer Dabbelt, linux-riscv,
	devicetree, Catalin Marinas, linux-kernel, tech-j-ext, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Guo Ren,
	Heiko Stuebner, Paul Walmsley

On Fri, Mar 22, 2024 at 10:13:48AM -0700, Deepak Gupta wrote:
> On Thu, Mar 21, 2024 at 5:13 PM Samuel Holland
> <samuel.holland@sifive.com> wrote:
> >
> > On 2024-03-19 11:39 PM, Deepak Gupta wrote:
> > >>>> --- a/arch/riscv/include/asm/switch_to.h
> > >>>> +++ b/arch/riscv/include/asm/switch_to.h
> > >>>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
> > >>>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
> > >>>>  #endif
> > >>>>
> > >>>> +static inline void sync_envcfg(struct task_struct *task)
> > >>>> +{
> > >>>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> > >>>> +}
> > >>>> +
> > >>>> +static inline void __switch_to_envcfg(struct task_struct *next)
> > >>>> +{
> > >>>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> > >>>
> > >>> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> > >>> even if ALTERNATIVES was turned on.
> > >>> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> > >>> place to pick up *envcfg.
> > >>
> > >> The branchiness is sort of expected, since that function is implemented by
> > >> switching on/off a branch instruction, so the alternate code is necessarily a
> > >> separate basic block. It's a tradeoff so we don't have to write assembly code
> > >> for every bit of code that depends on an extension. However, the cost should be
> > >> somewhat lowered since the branch is unconditional and so entirely predictable.
> > >>
> > >> If the branch turns out to be problematic for performance, then we could use
> > >> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.
> > >
> > > Yeah I lean towards using alternatives directly.
> >
> > One thing to note here: we can't use alternatives directly if the behavior needs
> > to be different on different harts (i.e. a subset of harts implement the envcfg
> > CSR). I think we need some policy about which ISA extensions are allowed to be
> > asymmetric across harts, or else we add too much complexity.
> 
> As I've responded on the same thread . We are adding too much
> complexity by assuming
> that heterogeneous ISA exists (which it doesn't today). And even if it
> exists, it wouldn't work.
> Nobody wants to spend a lot of time figuring out which harts have
> which ISA and which
> packages are compiled with which ISA. Most of the end users do `sudo
> apt get install blah blah`
> And then expect it to just work.

That will still work if the applications and libraries installed are
heterogeneous-platform aware, i.e. they do the figuring out which harts
have which extensions themselves. Applications/libraries should already
be probing for ISA extensions before using them. It's not a huge leap to
also check which harts support those extensions and then ensure affinity
is set appropriately.

> It doesn't work for other
> architectures and even when someone
> tried, they had to disable certain ISA features to make sure that all
> cores have the same ISA feature
> (search AVX12 Intel Alder Lake Disable).

The RISC-V software ecosystem is still being developed. We have an
opportunity to drop assumptions made by other architectures.


As I said in a different reply, it's reasonable for Linux to not add the
complexity until a use case comes along that Linux would like to support,
but I think it would be premature for Linux to put a stake in the sand.

So, how about we add code that confirms Zicboz is on all harts. If any
hart does not have it, then we complain loudly and disable it on all
the other harts. If it was just a hardware description bug, then it'll
get fixed. If there's actually a platform which doesn't have Zicboz
on all harts, then, when the issue is reported, we can decide to not
support it, support it with defconfig, or support it under a Kconfig
guard which must be enabled by the user.

Thanks,
drew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-23  9:35               ` Andrew Jones
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Jones @ 2024-03-23  9:35 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Samuel Holland, Conor Dooley, Palmer Dabbelt, linux-riscv,
	devicetree, Catalin Marinas, linux-kernel, tech-j-ext, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Guo Ren,
	Heiko Stuebner, Paul Walmsley

On Fri, Mar 22, 2024 at 10:13:48AM -0700, Deepak Gupta wrote:
> On Thu, Mar 21, 2024 at 5:13 PM Samuel Holland
> <samuel.holland@sifive.com> wrote:
> >
> > On 2024-03-19 11:39 PM, Deepak Gupta wrote:
> > >>>> --- a/arch/riscv/include/asm/switch_to.h
> > >>>> +++ b/arch/riscv/include/asm/switch_to.h
> > >>>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
> > >>>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
> > >>>>  #endif
> > >>>>
> > >>>> +static inline void sync_envcfg(struct task_struct *task)
> > >>>> +{
> > >>>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> > >>>> +}
> > >>>> +
> > >>>> +static inline void __switch_to_envcfg(struct task_struct *next)
> > >>>> +{
> > >>>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> > >>>
> > >>> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> > >>> even if ALTERNATIVES was turned on.
> > >>> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> > >>> place to pick up *envcfg.
> > >>
> > >> The branchiness is sort of expected, since that function is implemented by
> > >> switching on/off a branch instruction, so the alternate code is necessarily a
> > >> separate basic block. It's a tradeoff so we don't have to write assembly code
> > >> for every bit of code that depends on an extension. However, the cost should be
> > >> somewhat lowered since the branch is unconditional and so entirely predictable.
> > >>
> > >> If the branch turns out to be problematic for performance, then we could use
> > >> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.
> > >
> > > Yeah I lean towards using alternatives directly.
> >
> > One thing to note here: we can't use alternatives directly if the behavior needs
> > to be different on different harts (i.e. a subset of harts implement the envcfg
> > CSR). I think we need some policy about which ISA extensions are allowed to be
> > asymmetric across harts, or else we add too much complexity.
> 
> As I've responded on the same thread . We are adding too much
> complexity by assuming
> that heterogeneous ISA exists (which it doesn't today). And even if it
> exists, it wouldn't work.
> Nobody wants to spend a lot of time figuring out which harts have
> which ISA and which
> packages are compiled with which ISA. Most of the end users do `sudo
> apt get install blah blah`
> And then expect it to just work.

That will still work if the applications and libraries installed are
heterogeneous-platform aware, i.e. they do the figuring out which harts
have which extensions themselves. Applications/libraries should already
be probing for ISA extensions before using them. It's not a huge leap to
also check which harts support those extensions and then ensure affinity
is set appropriately.

> It doesn't work for other
> architectures and even when someone
> tried, they had to disable certain ISA features to make sure that all
> cores have the same ISA feature
> (search AVX12 Intel Alder Lake Disable).

The RISC-V software ecosystem is still being developed. We have an
opportunity to drop assumptions made by other architectures.


As I said in a different reply, it's reasonable for Linux to not add the
complexity until a use case comes along that Linux would like to support,
but I think it would be premature for Linux to put a stake in the sand.

So, how about we add code that confirms Zicboz is on all harts. If any
hart does not have it, then we complain loudly and disable it on all
the other harts. If it was just a hardware description bug, then it'll
get fixed. If there's actually a platform which doesn't have Zicboz
on all harts, then, when the issue is reported, we can decide to not
support it, support it with defconfig, or support it under a Kconfig
guard which must be enabled by the user.

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-23  9:35               ` Andrew Jones
@ 2024-03-23 20:37                 ` Deepak Gupta
  -1 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-23 20:37 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Samuel Holland, Conor Dooley, Palmer Dabbelt, linux-riscv,
	devicetree, Catalin Marinas, linux-kernel, tech-j-ext, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Guo Ren,
	Heiko Stuebner, Paul Walmsley

On Sat, Mar 23, 2024 at 2:35 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Fri, Mar 22, 2024 at 10:13:48AM -0700, Deepak Gupta wrote:

> > > > Yeah I lean towards using alternatives directly.
> > >
> > > One thing to note here: we can't use alternatives directly if the behavior needs
> > > to be different on different harts (i.e. a subset of harts implement the envcfg
> > > CSR). I think we need some policy about which ISA extensions are allowed to be
> > > asymmetric across harts, or else we add too much complexity.
> >
> > As I've responded on the same thread . We are adding too much
> > complexity by assuming
> > that heterogeneous ISA exists (which it doesn't today). And even if it
> > exists, it wouldn't work.
> > Nobody wants to spend a lot of time figuring out which harts have
> > which ISA and which
> > packages are compiled with which ISA. Most of the end users do `sudo
> > apt get install blah blah`
> > And then expect it to just work.
>
> That will still work if the applications and libraries installed are
> heterogeneous-platform aware, i.e. they do the figuring out which harts
> have which extensions themselves. Applications/libraries should already
> be probing for ISA extensions before using them. It's not a huge leap to
> also check which harts support those extensions and then ensure affinity
> is set appropriately.

How ?
It's a single image of a library that will be loaded in multiple address spaces.
You expect all code pages to do COW for multiple address spaces or
expect to have
per task variables to choose different code paths in the library based
on address space its
running in ?
On top of that, the library/application developer doesn't know how the
end user is going to use them.
End users (sysadmin, etc)  just might use taskset to put affinity on
tasks without being aware.
I just don't see the motivation in an application developer/library
developer to do something
like this. No application/library developer has time for this. Putting
a lot of burden on application
developers is mostly a nuisance considering they don't have to think
about these nuisance
when they expect the same code to be deployed on non-riscv architectures.

One good example of putting unnecessary burden on app/library
developer is Intel SGX
This is exactly the reason Intel SGX failed. Application developers
don't have time to develop
confidential compute version of the application for a specific CPU
while on other CPUs carry
a different version of application. But at the same time virtual
machine confidential compute is
better approach where all complicated decision making is delegated to
operating system
developer and application/library developers are empowered to only
think about their stuff.

>
> > It doesn't work for other
> > architectures and even when someone
> > tried, they had to disable certain ISA features to make sure that all
> > cores have the same ISA feature
> > (search AVX12 Intel Alder Lake Disable).
>
> The RISC-V software ecosystem is still being developed. We have an
> opportunity to drop assumptions made by other architectures.

It doesn't mean that it should try to make the same mistakes which
others have done.

If there is a motivation and use case from end user perspective, please provide.
Otherwise no point doing something which is just a science thought
exercise and no concrete use case.

Please note that these arguments are against Heterogeneous ISA on cores.
From power and efficiency perspective cores can still be heterogeneous.

>
>
> As I said in a different reply, it's reasonable for Linux to not add the
> complexity until a use case comes along that Linux would like to support,
> but I think it would be premature for Linux to put a stake in the sand.
>
> So, how about we add code that confirms Zicboz is on all harts. If any
> hart does not have it, then we complain loudly and disable it on all
> the other harts. If it was just a hardware description bug, then it'll
> get fixed. If there's actually a platform which doesn't have Zicboz
> on all harts, then, when the issue is reported, we can decide to not
> support it, support it with defconfig, or support it under a Kconfig
> guard which must be enabled by the user.
>
> Thanks,
> drew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-23 20:37                 ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-23 20:37 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Samuel Holland, Conor Dooley, Palmer Dabbelt, linux-riscv,
	devicetree, Catalin Marinas, linux-kernel, tech-j-ext, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Guo Ren,
	Heiko Stuebner, Paul Walmsley

On Sat, Mar 23, 2024 at 2:35 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Fri, Mar 22, 2024 at 10:13:48AM -0700, Deepak Gupta wrote:

> > > > Yeah I lean towards using alternatives directly.
> > >
> > > One thing to note here: we can't use alternatives directly if the behavior needs
> > > to be different on different harts (i.e. a subset of harts implement the envcfg
> > > CSR). I think we need some policy about which ISA extensions are allowed to be
> > > asymmetric across harts, or else we add too much complexity.
> >
> > As I've responded on the same thread . We are adding too much
> > complexity by assuming
> > that heterogeneous ISA exists (which it doesn't today). And even if it
> > exists, it wouldn't work.
> > Nobody wants to spend a lot of time figuring out which harts have
> > which ISA and which
> > packages are compiled with which ISA. Most of the end users do `sudo
> > apt get install blah blah`
> > And then expect it to just work.
>
> That will still work if the applications and libraries installed are
> heterogeneous-platform aware, i.e. they do the figuring out which harts
> have which extensions themselves. Applications/libraries should already
> be probing for ISA extensions before using them. It's not a huge leap to
> also check which harts support those extensions and then ensure affinity
> is set appropriately.

How ?
It's a single image of a library that will be loaded in multiple address spaces.
You expect all code pages to do COW for multiple address spaces or
expect to have
per task variables to choose different code paths in the library based
on address space its
running in ?
On top of that, the library/application developer doesn't know how the
end user is going to use them.
End users (sysadmin, etc)  just might use taskset to put affinity on
tasks without being aware.
I just don't see the motivation in an application developer/library
developer to do something
like this. No application/library developer has time for this. Putting
a lot of burden on application
developers is mostly a nuisance considering they don't have to think
about these nuisance
when they expect the same code to be deployed on non-riscv architectures.

One good example of putting unnecessary burden on app/library
developer is Intel SGX
This is exactly the reason Intel SGX failed. Application developers
don't have time to develop
confidential compute version of the application for a specific CPU
while on other CPUs carry
a different version of application. But at the same time virtual
machine confidential compute is
better approach where all complicated decision making is delegated to
operating system
developer and application/library developers are empowered to only
think about their stuff.

>
> > It doesn't work for other
> > architectures and even when someone
> > tried, they had to disable certain ISA features to make sure that all
> > cores have the same ISA feature
> > (search AVX12 Intel Alder Lake Disable).
>
> The RISC-V software ecosystem is still being developed. We have an
> opportunity to drop assumptions made by other architectures.

It doesn't mean that it should try to make the same mistakes which
others have done.

If there is a motivation and use case from end user perspective, please provide.
Otherwise no point doing something which is just a science thought
exercise and no concrete use case.

Please note that these arguments are against Heterogeneous ISA on cores.
From power and efficiency perspective cores can still be heterogeneous.

>
>
> As I said in a different reply, it's reasonable for Linux to not add the
> complexity until a use case comes along that Linux would like to support,
> but I think it would be premature for Linux to put a stake in the sand.
>
> So, how about we add code that confirms Zicboz is on all harts. If any
> hart does not have it, then we complain loudly and disable it on all
> the other harts. If it was just a hardware description bug, then it'll
> get fixed. If there's actually a platform which doesn't have Zicboz
> on all harts, then, when the issue is reported, we can decide to not
> support it, support it with defconfig, or support it under a Kconfig
> guard which must be enabled by the user.
>
> Thanks,
> drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
  2024-03-20  2:20       ` Samuel Holland
@ 2024-03-28  1:58         ` Deepak Gupta
  -1 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-28  1:58 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 7:21 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> >         else
> >                 regs->status |= SR_UXL_64;
> >  #endif
> > +       current->thread_info.envcfg = ENVCFG_BASE;
> >  }
> >
> > And instead of context switching in `_switch_to`,
> > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>
> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
> to every IRQ and system call exit, even though most of them will not change the
> envcfg value. This is especially the case when returning from an IRQ/exception
> back to S-mode, since envcfg has zero effect there.
>

A quick observation:
So I tried this on my setup. When I put `senvcfg` writes in
`__switch_to ` path, qemu suddenly
just tanks and takes a lot of time to boot up as opposed to when
`senvcfg` was in trap return path.
In my case entire userspace (all processes) have cfi enabled for them
via `senvcfg` and it gets
context switched. Not sure it's specific to my setup. I don't think it
should be an issue on actual
hardware.

Still debugging why it slows down my qemu drastically when same writes
to same CSR
are moved from `ret_from_exception` to `switch_to`

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-28  1:58         ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-28  1:58 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Palmer Dabbelt, linux-riscv, devicetree, Catalin Marinas,
	linux-kernel, tech-j-ext, Conor Dooley, kasan-dev,
	Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring, Andrew Jones,
	Guo Ren, Heiko Stuebner, Paul Walmsley

On Tue, Mar 19, 2024 at 7:21 PM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> >         else
> >                 regs->status |= SR_UXL_64;
> >  #endif
> > +       current->thread_info.envcfg = ENVCFG_BASE;
> >  }
> >
> > And instead of context switching in `_switch_to`,
> > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>
> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
> to every IRQ and system call exit, even though most of them will not change the
> envcfg value. This is especially the case when returning from an IRQ/exception
> back to S-mode, since envcfg has zero effect there.
>

A quick observation:
So I tried this on my setup. When I put `senvcfg` writes in
`__switch_to ` path, qemu suddenly
just tanks and takes a lot of time to boot up as opposed to when
`senvcfg` was in trap return path.
In my case entire userspace (all processes) have cfi enabled for them
via `senvcfg` and it gets
context switched. Not sure it's specific to my setup. I don't think it
should be an issue on actual
hardware.

Still debugging why it slows down my qemu drastically when same writes
to same CSR
are moved from `ret_from_exception` to `switch_to`

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
       [not found]       ` <17C0CB122DBB0EAE.6770@lists.riscv.org>
@ 2024-03-28 19:34           ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-28 19:34 UTC (permalink / raw)
  To: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Andrew Jones, Guo Ren, Heiko Stuebner, Paul Walmsley

On Wed, Mar 27, 2024 at 06:58:45PM -0700, Deepak Gupta via lists.riscv.org wrote:
>On Tue, Mar 19, 2024 at 7:21 PM Samuel Holland
><samuel.holland@sifive.com> wrote:
>>
>> >         else
>> >                 regs->status |= SR_UXL_64;
>> >  #endif
>> > +       current->thread_info.envcfg = ENVCFG_BASE;
>> >  }
>> >
>> > And instead of context switching in `_switch_to`,
>> > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>>
>> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
>> to every IRQ and system call exit, even though most of them will not change the
>> envcfg value. This is especially the case when returning from an IRQ/exception
>> back to S-mode, since envcfg has zero effect there.
>>
>
>A quick observation:
>So I tried this on my setup. When I put `senvcfg` writes in
>`__switch_to ` path, qemu suddenly
>just tanks and takes a lot of time to boot up as opposed to when
>`senvcfg` was in trap return path.
>In my case entire userspace (all processes) have cfi enabled for them
>via `senvcfg` and it gets
>context switched. Not sure it's specific to my setup. I don't think it
>should be an issue on actual
>hardware.
>
>Still debugging why it slows down my qemu drastically when same writes
>to same CSR
>are moved from `ret_from_exception` to `switch_to`

Nevermind and sorry for the bother. An issue on my setup.

>
>
>-=-=-=-=-=-=-=-=-=-=-=-
>Links: You receive all messages sent to this group.
>View/Reply Online (#680): https://lists.riscv.org/g/tech-j-ext/message/680
>Mute This Topic: https://lists.riscv.org/mt/105033914/7300952
>Group Owner: tech-j-ext+owner@lists.riscv.org
>Unsubscribe: https://lists.riscv.org/g/tech-j-ext/unsub [debug@rivosinc.com]
>-=-=-=-=-=-=-=-=-=-=-=-
>
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits
@ 2024-03-28 19:34           ` Deepak Gupta
  0 siblings, 0 replies; 54+ messages in thread
From: Deepak Gupta @ 2024-03-28 19:34 UTC (permalink / raw)
  To: Samuel Holland, Palmer Dabbelt, linux-riscv, devicetree,
	Catalin Marinas, linux-kernel, tech-j-ext, Conor Dooley,
	kasan-dev, Evgenii Stepanov, Krzysztof Kozlowski, Rob Herring,
	Andrew Jones, Guo Ren, Heiko Stuebner, Paul Walmsley

On Wed, Mar 27, 2024 at 06:58:45PM -0700, Deepak Gupta via lists.riscv.org wrote:
>On Tue, Mar 19, 2024 at 7:21 PM Samuel Holland
><samuel.holland@sifive.com> wrote:
>>
>> >         else
>> >                 regs->status |= SR_UXL_64;
>> >  #endif
>> > +       current->thread_info.envcfg = ENVCFG_BASE;
>> >  }
>> >
>> > And instead of context switching in `_switch_to`,
>> > In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>>
>> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
>> to every IRQ and system call exit, even though most of them will not change the
>> envcfg value. This is especially the case when returning from an IRQ/exception
>> back to S-mode, since envcfg has zero effect there.
>>
>
>A quick observation:
>So I tried this on my setup. When I put `senvcfg` writes in
>`__switch_to ` path, qemu suddenly
>just tanks and takes a lot of time to boot up as opposed to when
>`senvcfg` was in trap return path.
>In my case entire userspace (all processes) have cfi enabled for them
>via `senvcfg` and it gets
>context switched. Not sure it's specific to my setup. I don't think it
>should be an issue on actual
>hardware.
>
>Still debugging why it slows down my qemu drastically when same writes
>to same CSR
>are moved from `ret_from_exception` to `switch_to`

Nevermind and sorry for the bother. An issue on my setup.

>
>
>-=-=-=-=-=-=-=-=-=-=-=-
>Links: You receive all messages sent to this group.
>View/Reply Online (#680): https://lists.riscv.org/g/tech-j-ext/message/680
>Mute This Topic: https://lists.riscv.org/mt/105033914/7300952
>Group Owner: tech-j-ext+owner@lists.riscv.org
>Unsubscribe: https://lists.riscv.org/g/tech-j-ext/unsub [debug@rivosinc.com]
>-=-=-=-=-=-=-=-=-=-=-=-
>
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2024-03-28 19:34 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-19 21:58 [RFC PATCH 0/9] riscv: Userspace pointer masking and tagged address ABI Samuel Holland
2024-03-19 21:58 ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 1/9] dt-bindings: riscv: Add pointer masking ISA extensions Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 2/9] riscv: Add ISA extension parsing for pointer masking Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 3/9] riscv: Add CSR definitions " Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 4/9] riscv: Define is_compat_thread() Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 23:55   ` [RISC-V] [tech-j-ext] " Deepak Gupta
2024-03-19 23:55     ` Deepak Gupta
2024-03-20  2:20     ` Samuel Holland
2024-03-20  2:20       ` Samuel Holland
2024-03-20  4:39       ` Deepak Gupta
2024-03-20  4:39         ` Deepak Gupta
2024-03-22  0:13         ` Samuel Holland
2024-03-22  0:13           ` Samuel Holland
2024-03-22 17:13           ` Deepak Gupta
2024-03-22 17:13             ` Deepak Gupta
2024-03-23  9:35             ` Andrew Jones
2024-03-23  9:35               ` Andrew Jones
2024-03-23 20:37               ` Deepak Gupta
2024-03-23 20:37                 ` Deepak Gupta
2024-03-22  8:09         ` Andrew Jones
2024-03-22  8:09           ` Andrew Jones
2024-03-22 16:52           ` Deepak Gupta
2024-03-22 16:52             ` Deepak Gupta
2024-03-20  8:06       ` Conor Dooley
2024-03-20  8:06         ` Conor Dooley
     [not found]       ` <17BE5F38AFE245E5.29196@lists.riscv.org>
2024-03-20 23:27         ` Deepak Gupta
2024-03-20 23:27           ` Deepak Gupta
2024-03-22  3:43           ` Samuel Holland
2024-03-22  3:43             ` Samuel Holland
2024-03-22  7:58       ` Andrew Jones
2024-03-22  7:58         ` Andrew Jones
2024-03-28  1:58       ` Deepak Gupta
2024-03-28  1:58         ` Deepak Gupta
     [not found]       ` <17C0CB122DBB0EAE.6770@lists.riscv.org>
2024-03-28 19:34         ` Deepak Gupta
2024-03-28 19:34           ` Deepak Gupta
2024-03-19 21:58 ` [RFC PATCH 6/9] riscv: Add support for userspace pointer masking Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 7/9] riscv: Add support for the tagged address ABI Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 8/9] riscv: Allow ptrace control of " Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-19 21:58 ` [RFC PATCH 9/9] selftests: riscv: Add a pointer masking test Samuel Holland
2024-03-19 21:58   ` Samuel Holland
2024-03-20 17:21   ` Conor Dooley
2024-03-20 17:21     ` Conor Dooley
2024-03-20 18:04     ` Samuel Holland
2024-03-20 18:04       ` Samuel Holland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.