linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/4] membarrier: riscv: Core serializing command
@ 2024-01-10 14:55 Andrea Parri
  2024-01-10 14:55 ` [PATCH v3 1/4] membarrier: riscv: Add full memory barrier in switch_mm() Andrea Parri
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 14:55 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel, Andrea Parri

Changes since v2 ([1]):
  - amended inline comments
  - drop ARCH_HAS_MEMBARRIER, create membarrier.rst

Changes since v1 ([2]):
  - add smp_mb() in switch_mm()
  - introduce ARCH_HAS_MEMBARRIER, amend documentation

Changes since RFC ([3]):
  - introduce prepare_sync_core_cmd()
  - fix nosmp builds

[1] https://lore.kernel.org/lkml/20231211094414.8078-1-parri.andrea@gmail.com/
[2] https://lore.kernel.org/lkml/20231127103235.28442-1-parri.andrea@gmail.com/
[3] https://lore.kernel.org/lkml/20230803040111.5101-1-parri.andrea@gmail.com/

Andrea Parri (4):
  membarrier: riscv: Add full memory barrier in switch_mm()
  membarrier: Create Documentation/scheduler/membarrier.rst
  locking: Introduce prepare_sync_core_cmd()
  membarrier: riscv: Provide core serializing command

 .../membarrier-sync-core/arch-support.txt     | 18 ++++++-
 Documentation/scheduler/index.rst             |  1 +
 Documentation/scheduler/membarrier.rst        | 37 ++++++++++++++
 MAINTAINERS                                   |  4 +-
 arch/riscv/Kconfig                            |  4 ++
 arch/riscv/include/asm/membarrier.h           | 50 +++++++++++++++++++
 arch/riscv/include/asm/sync_core.h            | 29 +++++++++++
 arch/riscv/mm/context.c                       |  2 +
 include/linux/sync_core.h                     | 16 +++++-
 init/Kconfig                                  |  3 ++
 kernel/sched/core.c                           | 16 ++++--
 kernel/sched/membarrier.c                     | 13 +++--
 12 files changed, 183 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/scheduler/membarrier.rst
 create mode 100644 arch/riscv/include/asm/membarrier.h
 create mode 100644 arch/riscv/include/asm/sync_core.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/4] membarrier: riscv: Add full memory barrier in switch_mm()
  2024-01-10 14:55 [PATCH v3 0/4] membarrier: riscv: Core serializing command Andrea Parri
@ 2024-01-10 14:55 ` Andrea Parri
  2024-01-10 14:55 ` [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst Andrea Parri
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 14:55 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel, Andrea Parri

The membarrier system call requires a full memory barrier after storing
to rq->curr, before going back to user-space.  The barrier is only
needed when switching between processes: the barrier is implied by
mmdrop() when switching from kernel to userspace, and it's not needed
when switching from userspace to kernel.

Rely on the feature/mechanism ARCH_HAS_MEMBARRIER_CALLBACKS and on the
primitive membarrier_arch_switch_mm(), already adopted by the PowerPC
architecture, to insert the required barrier.
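
For background (not part of this patch), the user-space contract that this
barrier upholds can be sketched as follows; the command names come from the
membarrier(2) UAPI, and the snippet is illustrative only (error handling
omitted):

  #include <linux/membarrier.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int membarrier(int cmd, unsigned int flags, int cpu_id)
  {
      return syscall(__NR_membarrier, cmd, flags, cpu_id);
  }

  int main(void)
  {
      /* Register once per process. */
      membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0);

      /*
       * Acts as a full memory barrier with respect to all other threads
       * of the process: each of them has either already executed a full
       * barrier or will do so before returning to user-space -- the
       * latter is what the switch_mm() barrier added here provides.
       */
      membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0);
      return 0;
  }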

Fixes: fab957c11efe2f ("RISC-V: Atomic and Locking Code")
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 MAINTAINERS                         |  2 +-
 arch/riscv/Kconfig                  |  1 +
 arch/riscv/include/asm/membarrier.h | 31 +++++++++++++++++++++++++++++
 arch/riscv/mm/context.c             |  2 ++
 kernel/sched/core.c                 |  5 +++--
 5 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100644 arch/riscv/include/asm/membarrier.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a7c4cf8201e01..0f8cec504b2ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13815,7 +13815,7 @@ M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
 M:	"Paul E. McKenney" <paulmck@kernel.org>
 L:	linux-kernel@vger.kernel.org
 S:	Supported
-F:	arch/powerpc/include/asm/membarrier.h
+F:	arch/*/include/asm/membarrier.h
 F:	include/uapi/linux/membarrier.h
 F:	kernel/sched/membarrier.c
 
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index cd4c9a204d08c..33d9ea5fa392f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MMIOWB
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PMEM_API
diff --git a/arch/riscv/include/asm/membarrier.h b/arch/riscv/include/asm/membarrier.h
new file mode 100644
index 0000000000000..6c016ebb5020a
--- /dev/null
+++ b/arch/riscv/include/asm/membarrier.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_RISCV_MEMBARRIER_H
+#define _ASM_RISCV_MEMBARRIER_H
+
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+					     struct mm_struct *next,
+					     struct task_struct *tsk)
+{
+	/*
+	 * Only need the full barrier when switching between processes.
+	 * Barrier when switching from kernel to userspace is not
+	 * required here, given that it is implied by mmdrop(). Barrier
+	 * when switching from userspace to kernel is not needed after
+	 * store to rq->curr.
+	 */
+	if (IS_ENABLED(CONFIG_SMP) &&
+	    likely(!(atomic_read(&next->membarrier_state) &
+		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
+		return;
+
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * after storing to rq->curr, before going back to user-space.
+	 * Matches a full barrier in the proximity of the membarrier
+	 * system call entry.
+	 */
+	smp_mb();
+}
+
+#endif /* _ASM_RISCV_MEMBARRIER_H */
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 217fd4de61342..ba8eb3944687c 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -323,6 +323,8 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	if (unlikely(prev == next))
 		return;
 
+	membarrier_arch_switch_mm(prev, next, task);
+
 	/*
 	 * Mark the current MM context as inactive, and the next as
 	 * active.  This is at least used by the icache flushing
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a708d225c28e8..711dc753f7216 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		 *
 		 * Here are the schemes providing that barrier on the
 		 * various architectures:
-		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
-		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
+		 *   RISC-V.  switch_mm() relies on membarrier_arch_switch_mm()
+		 *   on PowerPC and on RISC-V.
 		 * - finish_lock_switch() for weakly-ordered
 		 *   architectures where spin_unlock is a full barrier,
 		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst
  2024-01-10 14:55 [PATCH v3 0/4] membarrier: riscv: Core serializing command Andrea Parri
  2024-01-10 14:55 ` [PATCH v3 1/4] membarrier: riscv: Add full memory barrier in switch_mm() Andrea Parri
@ 2024-01-10 14:55 ` Andrea Parri
  2024-01-10 18:15   ` Randy Dunlap
  2024-01-24 16:13   ` Mathieu Desnoyers
  2024-01-10 14:55 ` [PATCH v3 3/4] locking: Introduce prepare_sync_core_cmd() Andrea Parri
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 14:55 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel, Andrea Parri

To gather the architecture requirements of the "private/global
expedited" membarrier commands.  The file will be expanded to
integrate further information about the membarrier syscall (as
needed/desired in the future).  While at it, amend some related
inline comments in the membarrier codebase.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 Documentation/scheduler/index.rst      |  1 +
 Documentation/scheduler/membarrier.rst | 37 ++++++++++++++++++++++++++
 MAINTAINERS                            |  1 +
 kernel/sched/core.c                    |  7 ++++-
 kernel/sched/membarrier.c              |  8 +++---
 5 files changed, 49 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/scheduler/membarrier.rst

diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 3170747226f6d..43bd8a145b7a9 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -7,6 +7,7 @@ Scheduler
 
 
     completion
+    membarrier
     sched-arch
     sched-bwc
     sched-deadline
diff --git a/Documentation/scheduler/membarrier.rst b/Documentation/scheduler/membarrier.rst
new file mode 100644
index 0000000000000..ab7ee3824b407
--- /dev/null
+++ b/Documentation/scheduler/membarrier.rst
@@ -0,0 +1,37 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+membarrier() System Call
+========================
+
+MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements
+=====================================================================
+
+Memory barriers before updating rq->curr
+----------------------------------------
+
+The command requires each architecture to have a full memory barrier after
+coming from user-space, before updating rq->curr.  This barrier is implied
+by the sequence rq_lock(); smp_mb__after_spinlock() in __schedule().  The
+barrier matches a full barrier in the proximity of the membarrier system
+call exit, cf. membarrier_{private,global}_expedited().
+
+Memory barriers after updating rq->curr
+---------------------------------------
+
+The command requires each architecture to have a full memory barrier after
+updating rq->curr, before returning to user-space.  The schemes providing
+this barrier on the various architectures are as follows.
+
+ - alpha, arc, arm, hexagon, mips rely on the full barrier implied by
+   spin_unlock() in finish_lock_switch().
+
+ - arm64 relies on the full barrier implied by switch_to().
+
+ - powerpc, riscv, s390, sparc, x86 rely on the full barrier implied by
+   switch_mm(), if mm is not NULL; they rely on the full barrier implied
+   by mmdrop(), otherwise.  On powerpc and riscv, switch_mm() relies on
+   membarrier_arch_switch_mm().
+
+The barrier matches a full barrier in the proximity of the membarrier system
+call entry, cf. membarrier_{private,global}_expedited().
diff --git a/MAINTAINERS b/MAINTAINERS
index 0f8cec504b2ba..6bce0aeecb4f2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13815,6 +13815,7 @@ M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
 M:	"Paul E. McKenney" <paulmck@kernel.org>
 L:	linux-kernel@vger.kernel.org
 S:	Supported
+F:	Documentation/scheduler/membarrier.rst
 F:	arch/*/include/asm/membarrier.h
 F:	include/uapi/linux/membarrier.h
 F:	kernel/sched/membarrier.c
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 711dc753f7216..b51bc86f8340c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6599,7 +6599,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	 *     if (signal_pending_state())	    if (p->state & @state)
 	 *
 	 * Also, the membarrier system call requires a full memory barrier
-	 * after coming from user-space, before storing to rq->curr.
+	 * after coming from user-space, before storing to rq->curr; this
+	 * barrier matches a full barrier in the proximity of the membarrier
+	 * system call exit.
 	 */
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
@@ -6677,6 +6679,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		 *   architectures where spin_unlock is a full barrier,
 		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
 		 *   is a RELEASE barrier),
+		 *
+		 * The barrier matches a full barrier in the proximity of
+		 * the membarrier system call entry.
 		 */
 		++*switch_count;
 
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 2ad881d07752c..f3d91628d6b8a 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -251,7 +251,7 @@ static int membarrier_global_expedited(void)
 		return 0;
 
 	/*
-	 * Matches memory barriers around rq->curr modification in
+	 * Matches memory barriers after rq->curr modification in
 	 * scheduler.
 	 */
 	smp_mb();	/* system call entry is not a mb. */
@@ -300,7 +300,7 @@ static int membarrier_global_expedited(void)
 
 	/*
 	 * Memory barrier on the caller thread _after_ we finished
-	 * waiting for the last IPI. Matches memory barriers around
+	 * waiting for the last IPI. Matches memory barriers before
 	 * rq->curr modification in scheduler.
 	 */
 	smp_mb();	/* exit from system call is not a mb */
@@ -339,7 +339,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		return 0;
 
 	/*
-	 * Matches memory barriers around rq->curr modification in
+	 * Matches memory barriers after rq->curr modification in
 	 * scheduler.
 	 */
 	smp_mb();	/* system call entry is not a mb. */
@@ -415,7 +415,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 
 	/*
 	 * Memory barrier on the caller thread _after_ we finished
-	 * waiting for the last IPI. Matches memory barriers around
+	 * waiting for the last IPI. Matches memory barriers before
 	 * rq->curr modification in scheduler.
 	 */
 	smp_mb();	/* exit from system call is not a mb */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/4] locking: Introduce prepare_sync_core_cmd()
  2024-01-10 14:55 [PATCH v3 0/4] membarrier: riscv: Core serializing command Andrea Parri
  2024-01-10 14:55 ` [PATCH v3 1/4] membarrier: riscv: Add full memory barrier in switch_mm() Andrea Parri
  2024-01-10 14:55 ` [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst Andrea Parri
@ 2024-01-10 14:55 ` Andrea Parri
  2024-01-10 14:55 ` [PATCH v3 4/4] membarrier: riscv: Provide core serializing command Andrea Parri
  2024-01-24 14:13 ` [PATCH v3 0/4] membarrier: riscv: Core " Andrea Parri
  4 siblings, 0 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 14:55 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel, Andrea Parri

Introduce an architecture function that architectures can use to set
up ("prepare") SYNC_CORE commands.

The function will be used by RISC-V to update its "deferred icache-
flush" data structures (icache_stale_mask).

Architectures defining prepare_sync_core_cmd() static inline need to
select ARCH_HAS_PREPARE_SYNC_CORE_CMD.
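
As a forward reference, the RISC-V implementation introduced in the next
patch of this series boils down to:

  /* arch/riscv/include/asm/sync_core.h */
  static inline void prepare_sync_core_cmd(struct mm_struct *mm)
  {
      /*
       * Record that every CPU must issue a core serializing (fetch
       * synchronizing) instruction on its next switch to @mm.
       */
      cpumask_setall(&mm->context.icache_stale_mask);
  }

together with "select ARCH_HAS_PREPARE_SYNC_CORE_CMD" in the architecture's
Kconfig entry.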

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/sync_core.h | 16 +++++++++++++++-
 init/Kconfig              |  3 +++
 kernel/sched/membarrier.c |  1 +
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h
index 013da4b8b3272..67bb9794b8758 100644
--- a/include/linux/sync_core.h
+++ b/include/linux/sync_core.h
@@ -17,5 +17,19 @@ static inline void sync_core_before_usermode(void)
 }
 #endif
 
-#endif /* _LINUX_SYNC_CORE_H */
+#ifdef CONFIG_ARCH_HAS_PREPARE_SYNC_CORE_CMD
+#include <asm/sync_core.h>
+#else
+/*
+ * This is a dummy prepare_sync_core_cmd() implementation that can be used on
+ * all architectures which provide unconditional core serializing instructions
+ * in switch_mm().
+ * If your architecture doesn't provide such core serializing instructions in
+ * switch_mm(), you may need to write your own functions.
+ */
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+}
+#endif
 
+#endif /* _LINUX_SYNC_CORE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 9ffb103fc927b..87daf50838f02 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1972,6 +1972,9 @@ source "kernel/Kconfig.locks"
 config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	bool
 
+config ARCH_HAS_PREPARE_SYNC_CORE_CMD
+	bool
+
 config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	bool
 
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index f3d91628d6b8a..6d1f31b3a967b 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -320,6 +320,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
 			return -EPERM;
 		ipi_func = ipi_sync_core;
+		prepare_sync_core_cmd(mm);
 	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
 		if (!IS_ENABLED(CONFIG_RSEQ))
 			return -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/4] membarrier: riscv: Provide core serializing command
  2024-01-10 14:55 [PATCH v3 0/4] membarrier: riscv: Core serializing command Andrea Parri
                   ` (2 preceding siblings ...)
  2024-01-10 14:55 ` [PATCH v3 3/4] locking: Introduce prepare_sync_core_cmd() Andrea Parri
@ 2024-01-10 14:55 ` Andrea Parri
  2024-01-10 19:27   ` Stefan O'Rear
  2024-01-24 16:18   ` Mathieu Desnoyers
  2024-01-24 14:13 ` [PATCH v3 0/4] membarrier: riscv: Core " Andrea Parri
  4 siblings, 2 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 14:55 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel, Andrea Parri

RISC-V uses xRET instructions on return from interrupt and to go back
to user-space; the xRET instruction is not core serializing.

Use FENCE.I for providing core serialization as follows:

 - by calling sync_core_before_usermode() on return from interrupt (cf.
   ipi_sync_core()),

 - via switch_mm() and sync_core_before_usermode() (respectively, for
   uthread->uthread and kthread->uthread transitions) to go back to
   user-space.

On RISC-V, the serialization in switch_mm() is activated by resetting
the icache_stale_mask of the mm at prepare_sync_core_cmd().
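
For context, switch_mm() already contains a deferred icache-flush mechanism
along these lines (simplified sketch of the pre-existing code in
arch/riscv/mm/context.c; details may differ):

  static inline void flush_icache_deferred(struct mm_struct *mm, unsigned int cpu)
  {
  #ifdef CONFIG_SMP
      cpumask_t *mask = &mm->context.icache_stale_mask;

      if (cpumask_test_cpu(cpu, mask)) {
          cpumask_clear_cpu(cpu, mask);
          /* Order prior remote stores before the local fetch sync. */
          smp_mb();
          local_flush_icache_all();   /* executes FENCE.I */
      }
  #endif
  }

Setting all bits of icache_stale_mask at prepare_sync_core_cmd() thus ensures
that the next switch_mm() on every CPU issues the FENCE.I required by the
SYNC_CORE command.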

Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 .../membarrier-sync-core/arch-support.txt     | 18 +++++++++++-
 MAINTAINERS                                   |  1 +
 arch/riscv/Kconfig                            |  3 ++
 arch/riscv/include/asm/membarrier.h           | 19 ++++++++++++
 arch/riscv/include/asm/sync_core.h            | 29 +++++++++++++++++++
 kernel/sched/core.c                           |  4 +++
 kernel/sched/membarrier.c                     |  4 +++
 7 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/sync_core.h

diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index d96b778b87ed8..a163170fc0f48 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -10,6 +10,22 @@
 # Rely on implicit context synchronization as a result of exception return
 # when returning from IPI handler, and when returning to user-space.
 #
+# * riscv
+#
+# riscv uses xRET as return from interrupt and to return to user-space.
+#
+# Given that xRET is not core serializing, we rely on FENCE.I for providing
+# core serialization:
+#
+#  - by calling sync_core_before_usermode() on return from interrupt (cf.
+#    ipi_sync_core()),
+#
+#  - via switch_mm() and sync_core_before_usermode() (respectively, for
+#    uthread->uthread and kthread->uthread transitions) to go back to
+#    user-space.
+#
+#  The serialization in switch_mm() is activated by prepare_sync_core_cmd().
+#
 # * x86
 #
 # x86-32 uses IRET as return from interrupt, which takes care of the IPI.
@@ -43,7 +59,7 @@
     |    openrisc: | TODO |
     |      parisc: | TODO |
     |     powerpc: |  ok  |
-    |       riscv: | TODO |
+    |       riscv: |  ok  |
     |        s390: |  ok  |
     |          sh: | TODO |
     |       sparc: | TODO |
diff --git a/MAINTAINERS b/MAINTAINERS
index 6bce0aeecb4f2..e4ca6288ea3d1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13817,6 +13817,7 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	Documentation/scheduler/membarrier.rst
 F:	arch/*/include/asm/membarrier.h
+F:	arch/*/include/asm/sync_core.h
 F:	include/uapi/linux/membarrier.h
 F:	kernel/sched/membarrier.c
 
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 33d9ea5fa392f..2ad63a216d69a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,14 +28,17 @@ config RISCV
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MMIOWB
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PMEM_API
+	select ARCH_HAS_PREPARE_SYNC_CORE_CMD
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SET_DIRECT_MAP if MMU
 	select ARCH_HAS_SET_MEMORY if MMU
 	select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
 	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
+	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/riscv/include/asm/membarrier.h b/arch/riscv/include/asm/membarrier.h
index 6c016ebb5020a..47b240d0d596a 100644
--- a/arch/riscv/include/asm/membarrier.h
+++ b/arch/riscv/include/asm/membarrier.h
@@ -22,6 +22,25 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 	/*
 	 * The membarrier system call requires a full memory barrier
 	 * after storing to rq->curr, before going back to user-space.
+	 *
+	 * This barrier is also needed for the SYNC_CORE command when
+	 * switching between processes; in particular, on a transition
+	 * from a thread belonging to another mm to a thread belonging
+	 * to the mm for which a membarrier SYNC_CORE is done on CPU0:
+	 *
+	 *   - [CPU0] sets all bits in the mm icache_stale_mask (in
+	 *     prepare_sync_core_cmd());
+	 *
+	 *   - [CPU1] stores to rq->curr (by the scheduler);
+	 *
+	 *   - [CPU0] loads rq->curr within membarrier and observes
+	 *     cpu_rq(1)->curr->mm != mm, so the IPI is skipped on
+	 *     CPU1; this means membarrier relies on switch_mm() to
+	 *     issue the sync-core;
+	 *
+	 *   - [CPU1] switch_mm() loads icache_stale_mask; if the bit
+	 *     is zero, switch_mm() may incorrectly skip the sync-core.
+	 *
 	 * Matches a full barrier in the proximity of the membarrier
 	 * system call entry.
 	 */
diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h
new file mode 100644
index 0000000000000..9153016da8f14
--- /dev/null
+++ b/arch/riscv/include/asm/sync_core.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_SYNC_CORE_H
+#define _ASM_RISCV_SYNC_CORE_H
+
+/*
+ * RISC-V implements return to user-space through an xRET instruction,
+ * which is not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+	asm volatile ("fence.i" ::: "memory");
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Ensure the next switch_mm() on every CPU issues a core serializing
+ * instruction for the given @mm.
+ */
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+	cpumask_setall(&mm->context.icache_stale_mask);
+}
+#else
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+}
+#endif /* CONFIG_SMP */
+
+#endif /* _ASM_RISCV_SYNC_CORE_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b51bc86f8340c..82de2b7d253cd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6682,6 +6682,10 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		 *
 		 * The barrier matches a full barrier in the proximity of
 		 * the membarrier system call entry.
+		 *
+		 * On RISC-V, this barrier pairing is also needed for the
+		 * SYNC_CORE command when switching between processes, cf.
+		 * the inline comments in membarrier_arch_switch_mm().
 		 */
 		++*switch_count;
 
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 6d1f31b3a967b..703e8d80a576d 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -342,6 +342,10 @@ static int membarrier_private_expedited(int flags, int cpu_id)
 	/*
 	 * Matches memory barriers after rq->curr modification in
 	 * scheduler.
+	 *
+	 * On RISC-V, this barrier pairing is also needed for the
+	 * SYNC_CORE command when switching between processes, cf.
+	 * the inline comments in membarrier_arch_switch_mm().
 	 */
 	smp_mb();	/* system call entry is not a mb. */
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst
  2024-01-10 14:55 ` [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst Andrea Parri
@ 2024-01-10 18:15   ` Randy Dunlap
  2024-01-10 19:05     ` Andrea Parri
  2024-01-24 16:13   ` Mathieu Desnoyers
  1 sibling, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2024-01-10 18:15 UTC (permalink / raw)
  To: Andrea Parri, paul.walmsley, palmer, aou, mathieu.desnoyers,
	paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel

Hi,

On 1/10/24 06:55, Andrea Parri wrote:
> To gather the architecture requirements of the "private/global
> expedited" membarrier commands.  The file will be expanded to
> integrate further information about the membarrier syscall (as
> needed/desired in the future).  While at it, amend some related
> inline comments in the membarrier codebase.
> 
> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  Documentation/scheduler/index.rst      |  1 +
>  Documentation/scheduler/membarrier.rst | 37 ++++++++++++++++++++++++++
>  MAINTAINERS                            |  1 +
>  kernel/sched/core.c                    |  7 ++++-
>  kernel/sched/membarrier.c              |  8 +++---
>  5 files changed, 49 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/scheduler/membarrier.rst
> 
> diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
> index 3170747226f6d..43bd8a145b7a9 100644
> --- a/Documentation/scheduler/index.rst
> +++ b/Documentation/scheduler/index.rst
> @@ -7,6 +7,7 @@ Scheduler
>  
>  
>      completion
> +    membarrier
>      sched-arch
>      sched-bwc
>      sched-deadline
> diff --git a/Documentation/scheduler/membarrier.rst b/Documentation/scheduler/membarrier.rst
> new file mode 100644
> index 0000000000000..ab7ee3824b407
> --- /dev/null
> +++ b/Documentation/scheduler/membarrier.rst
> @@ -0,0 +1,37 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +========================
> +membarrier() System Call
> +========================
> +
> +MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements
> +=====================================================================
> +
> +Memory barriers before updating rq->curr
> +----------------------------------------
> +
> +The command requires each architecture to have a full memory barrier after
> +coming from user-space, before updating rq->curr.  This barrier is implied
> +by the sequence rq_lock(); smp_mb__after_spinlock() in __schedule().  The
> +barrier matches a full barrier in the proximity of the membarrier system
> +call exit, cf. membarrier_{private,global}_expedited().
> +

What does "The command" refer to above and below, please?

Thanks.

> +Memory barriers after updating rq->curr
> +---------------------------------------
> +
> +The command requires each architecture to have a full memory barrier after
> +updating rq->curr, before returning to user-space.  The schemes providing
> +this barrier on the various architectures are as follows.
> +
> + - alpha, arc, arm, hexagon, mips rely on the full barrier implied by
> +   spin_unlock() in finish_lock_switch().
> +
> + - arm64 relies on the full barrier implied by switch_to().
> +
> + - powerpc, riscv, s390, sparc, x86 rely on the full barrier implied by
> +   switch_mm(), if mm is not NULL; they rely on the full barrier implied
> +   by mmdrop(), otherwise.  On powerpc and riscv, switch_mm() relies on
> +   membarrier_arch_switch_mm().
> +
> +The barrier matches a full barrier in the proximity of the membarrier system
> +call entry, cf. membarrier_{private,global}_expedited().


-- 
#Randy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst
  2024-01-10 18:15   ` Randy Dunlap
@ 2024-01-10 19:05     ` Andrea Parri
  2024-01-10 19:08       ` Randy Dunlap
  0 siblings, 1 reply; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 19:05 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet,
	mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel

Hi Randy,

> > +MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements
> > +=====================================================================
> > +
> > +Memory barriers before updating rq->curr
> > +----------------------------------------
> > +
> > +The command requires each architecture to have a full memory barrier after
> > +coming from user-space, before updating rq->curr.  This barrier is implied
> > +by the sequence rq_lock(); smp_mb__after_spinlock() in __schedule().  The
> > +barrier matches a full barrier in the proximity of the membarrier system
> > +call exit, cf. membarrier_{private,global}_expedited().
> > +
> 
> What does "The command" refer to above and below, please?

The term was meant to refer to any of MEMBARRIER_CMD_PRIVATE_EXPEDITED and
MEMBARRIER_CMD_GLOBAL_EXPEDITED (from the section title); FWIW, this seems
to align with the terminology adopted in MEMBARRIER(2) for example.

Mmh, unless I get a better idea, I'll expand those occurrences to:

  "The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDIDED
   require [...]"

  Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst
  2024-01-10 19:05     ` Andrea Parri
@ 2024-01-10 19:08       ` Randy Dunlap
  2024-01-10 19:19         ` Andrea Parri
  0 siblings, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2024-01-10 19:08 UTC (permalink / raw)
  To: Andrea Parri
  Cc: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet,
	mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel



On 1/10/24 11:05, Andrea Parri wrote:
> Hi Randy,
> 
>>> +MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements
>>> +=====================================================================
>>> +
>>> +Memory barriers before updating rq->curr
>>> +----------------------------------------
>>> +
>>> +The command requires each architecture to have a full memory barrier after
>>> +coming from user-space, before updating rq->curr.  This barrier is implied
>>> +by the sequence rq_lock(); smp_mb__after_spinlock() in __schedule().  The
>>> +barrier matches a full barrier in the proximity of the membarrier system
>>> +call exit, cf. membarrier_{private,global}_expedited().
>>> +
>>
>> What does "The command" refer to above and below, please?
> 
> The term was meant to refer to any of MEMBARRIER_CMD_PRIVATE_EXPEDITED and
> MEMBARRIER_CMD_GLOBAL_EXPEDITED (from the section title); FWIW, this seems
> to align with the terminology adopted in MEMBARRIER(2) for example.

I see.

> Mmh, unless I get a better idea, I'll expand those occurrences to:
> 
>   "The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDIDED
>    require [...]"
                                                                            _EXPEDITED

OK, that's better IMO. Thanks.

-- 
#Randy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst
  2024-01-10 19:08       ` Randy Dunlap
@ 2024-01-10 19:19         ` Andrea Parri
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 19:19 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet,
	mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel

> > Mmh, unless I get a better idea, I'll expand those occurrences to:
> > 
> >   "The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDIDED
> >    require [...]"
>                                                                             _EXPEDITED

Oops. :-|  Thanks.

  Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] membarrier: riscv: Provide core serializing command
  2024-01-10 14:55 ` [PATCH v3 4/4] membarrier: riscv: Provide core serializing command Andrea Parri
@ 2024-01-10 19:27   ` Stefan O'Rear
  2024-01-10 22:34     ` Andrea Parri
  2024-01-24 16:18   ` Mathieu Desnoyers
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan O'Rear @ 2024-01-10 19:27 UTC (permalink / raw)
  To: Andrea Parri, paul.walmsley, Palmer Dabbelt, Albert Ou,
	mathieu.desnoyers, paulmck, Jonathan Corbet
  Cc: mmaas, Hans Boehm, striker, charlie, rehn, linux-riscv,
	linux-doc, linux-kernel

On Wed, Jan 10, 2024, at 9:55 AM, Andrea Parri wrote:
> RISC-V uses xRET instructions on return from interrupt and to go back
> to user-space; the xRET instruction is not core serializing.
>
> Use FENCE.I for providing core serialization as follows:
>
>  - by calling sync_core_before_usermode() on return from interrupt (cf.
>    ipi_sync_core()),
>
>  - via switch_mm() and sync_core_before_usermode() (respectively, for
>    uthread->uthread and kthread->uthread transitions) to go back to
>    user-space.
>
> On RISC-V, the serialization in switch_mm() is activated by resetting
> the icache_stale_mask of the mm at prepare_sync_core_cmd().
>
> Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  .../membarrier-sync-core/arch-support.txt     | 18 +++++++++++-
>  MAINTAINERS                                   |  1 +
>  arch/riscv/Kconfig                            |  3 ++
>  arch/riscv/include/asm/membarrier.h           | 19 ++++++++++++
>  arch/riscv/include/asm/sync_core.h            | 29 +++++++++++++++++++
>  kernel/sched/core.c                           |  4 +++
>  kernel/sched/membarrier.c                     |  4 +++
>  7 files changed, 77 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/include/asm/sync_core.h
>
> diff --git 
> a/Documentation/features/sched/membarrier-sync-core/arch-support.txt 
> b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> index d96b778b87ed8..a163170fc0f48 100644
> --- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> +++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> @@ -10,6 +10,22 @@
>  # Rely on implicit context synchronization as a result of exception 
> return
>  # when returning from IPI handler, and when returning to user-space.
>  #
> +# * riscv
> +#
> +# riscv uses xRET as return from interrupt and to return to user-space.
> +#
> +# Given that xRET is not core serializing, we rely on FENCE.I for 
> providing
> +# core serialization:

"core serialization" is a meaningless sequence of words for RISC-V users,
and an extremely strange way to describe running fence.i on all remote
cores.  fence.i is a _fence_; it is not required to affect a core pipeline
beyond what is needed to ensure that all instruction fetches after the
barrier completes see writes performed before the barrier.

The feature seems useful, but it should document what it does using
terminology actually used in the RISC-V specifications.
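
Concretely, the architectural guarantee is the one a local code-patching
sequence relies on, roughly (illustrative sketch, not from the spec):

  #include <stdint.h>

  /*
   * Patch one instruction word and make sure this hart fetches the new
   * encoding.  FENCE.I orders subsequent instruction fetches on the local
   * hart after prior stores; it says nothing about other harts, which is
   * why the cross-CPU IPI/switch_mm() machinery is still needed.
   */
  static void patch_insn_and_sync(uint32_t *slot, uint32_t new_insn)
  {
      *slot = new_insn;                       /* store to instruction memory */
      asm volatile("fence.i" ::: "memory");   /* fetch barrier on this hart */
  }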

> +#
> +#  - by calling sync_core_before_usermode() on return from interrupt 
> (cf.
> +#    ipi_sync_core()),
> +#
> +#  - via switch_mm() and sync_core_before_usermode() (respectively, for
> +#    uthread->uthread and kthread->uthread transitions) to go back to
> +#    user-space.
> +#
> +#  The serialization in switch_mm() is activated by 
> prepare_sync_core_cmd().
> +#
>  # * x86
>  #
>  # x86-32 uses IRET as return from interrupt, which takes care of the 
> IPI.
> @@ -43,7 +59,7 @@
>      |    openrisc: | TODO |
>      |      parisc: | TODO |
>      |     powerpc: |  ok  |
> -    |       riscv: | TODO |
> +    |       riscv: |  ok  |
>      |        s390: |  ok  |
>      |          sh: | TODO |
>      |       sparc: | TODO |
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6bce0aeecb4f2..e4ca6288ea3d1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13817,6 +13817,7 @@ L:	linux-kernel@vger.kernel.org
>  S:	Supported
>  F:	Documentation/scheduler/membarrier.rst
>  F:	arch/*/include/asm/membarrier.h
> +F:	arch/*/include/asm/sync_core.h
>  F:	include/uapi/linux/membarrier.h
>  F:	kernel/sched/membarrier.c
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 33d9ea5fa392f..2ad63a216d69a 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -28,14 +28,17 @@ config RISCV
>  	select ARCH_HAS_GIGANTIC_PAGE
>  	select ARCH_HAS_KCOV
>  	select ARCH_HAS_MEMBARRIER_CALLBACKS
> +	select ARCH_HAS_MEMBARRIER_SYNC_CORE
>  	select ARCH_HAS_MMIOWB
>  	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>  	select ARCH_HAS_PMEM_API
> +	select ARCH_HAS_PREPARE_SYNC_CORE_CMD
>  	select ARCH_HAS_PTE_SPECIAL
>  	select ARCH_HAS_SET_DIRECT_MAP if MMU
>  	select ARCH_HAS_SET_MEMORY if MMU
>  	select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
>  	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
> +	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
>  	select ARCH_HAS_SYSCALL_WRAPPER
>  	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>  	select ARCH_HAS_UBSAN_SANITIZE_ALL
> diff --git a/arch/riscv/include/asm/membarrier.h 
> b/arch/riscv/include/asm/membarrier.h
> index 6c016ebb5020a..47b240d0d596a 100644
> --- a/arch/riscv/include/asm/membarrier.h
> +++ b/arch/riscv/include/asm/membarrier.h
> @@ -22,6 +22,25 @@ static inline void membarrier_arch_switch_mm(struct 
> mm_struct *prev,
>  	/*
>  	 * The membarrier system call requires a full memory barrier
>  	 * after storing to rq->curr, before going back to user-space.
> +	 *
> +	 * This barrier is also needed for the SYNC_CORE command when
> +	 * switching between processes; in particular, on a transition
> +	 * from a thread belonging to another mm to a thread belonging
> +	 * to the mm for which a membarrier SYNC_CORE is done on CPU0:
> +	 *
> +	 *   - [CPU0] sets all bits in the mm icache_stale_mask (in
> +	 *     prepare_sync_core_cmd());
> +	 *
> +	 *   - [CPU1] stores to rq->curr (by the scheduler);
> +	 *
> +	 *   - [CPU0] loads rq->curr within membarrier and observes
> +	 *     cpu_rq(1)->curr->mm != mm, so the IPI is skipped on
> +	 *     CPU1; this means membarrier relies on switch_mm() to
> +	 *     issue the sync-core;
> +	 *
> +	 *   - [CPU1] switch_mm() loads icache_stale_mask; if the bit
> +	 *     is zero, switch_mm() may incorrectly skip the sync-core.
> +	 *
>  	 * Matches a full barrier in the proximity of the membarrier
>  	 * system call entry.
>  	 */
> diff --git a/arch/riscv/include/asm/sync_core.h 
> b/arch/riscv/include/asm/sync_core.h
> new file mode 100644
> index 0000000000000..9153016da8f14
> --- /dev/null
> +++ b/arch/riscv/include/asm/sync_core.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_RISCV_SYNC_CORE_H
> +#define _ASM_RISCV_SYNC_CORE_H
> +
> +/*
> + * RISC-V implements return to user-space through an xRET instruction,
> + * which is not core serializing.
> + */
> +static inline void sync_core_before_usermode(void)
> +{
> +	asm volatile ("fence.i" ::: "memory");
> +}

Not standard terminology.

> +
> +#ifdef CONFIG_SMP
> +/*
> + * Ensure the next switch_mm() on every CPU issues a core serializing
> + * instruction for the given @mm.
> + */
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +	cpumask_setall(&mm->context.icache_stale_mask);
> +}
> +#else
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +}
> +#endif /* CONFIG_SMP */
> +
> +#endif /* _ASM_RISCV_SYNC_CORE_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b51bc86f8340c..82de2b7d253cd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6682,6 +6682,10 @@ static void __sched notrace __schedule(unsigned 
> int sched_mode)
>  		 *
>  		 * The barrier matches a full barrier in the proximity of
>  		 * the membarrier system call entry.
> +		 *
> +		 * On RISC-V, this barrier pairing is also needed for the
> +		 * SYNC_CORE command when switching between processes, cf.
> +		 * the inline comments in membarrier_arch_switch_mm().
>  		 */
>  		++*switch_count;
> 
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 6d1f31b3a967b..703e8d80a576d 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -342,6 +342,10 @@ static int membarrier_private_expedited(int flags, 
> int cpu_id)
>  	/*
>  	 * Matches memory barriers after rq->curr modification in
>  	 * scheduler.
> +	 *
> +	 * On RISC-V, this barrier pairing is also needed for the
> +	 * SYNC_CORE command when switching between processes, cf.
> +	 * the inline comments in membarrier_arch_switch_mm().
>  	 */
>  	smp_mb();	/* system call entry is not a mb. */
> 
> -- 
> 2.34.1

-s

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] membarrier: riscv: Provide core serializing command
  2024-01-10 19:27   ` Stefan O'Rear
@ 2024-01-10 22:34     ` Andrea Parri
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-10 22:34 UTC (permalink / raw)
  To: Stefan O'Rear
  Cc: paul.walmsley, Palmer Dabbelt, Albert Ou, mathieu.desnoyers,
	paulmck, Jonathan Corbet, mmaas, Hans Boehm, striker, charlie,
	rehn, linux-riscv, linux-doc, linux-kernel

Hi Stefan,


> "core serialization" is a meaningless sequence of words for RISC-V users,

The expression is inherited from MEMBARRIER(2).  Quoting from the RFC
discussion (cf. [3] in the cover letter),

  "RISC-V does not have "core serializing instructions", meaning
  that there is no occurrence of such a term in the RISC-V ISA. The
  discussion and git history about the SYNC_CORE command suggested
  the implementation below: a FENCE.I instruction [...]"


> The feature seems useful, but it should document what it does using
> terminology actually used in the RISC-V specifications.

In _current_ RISC-V parlance, it's pretty clear: we are doing FENCE.I.
As Palmer and others mentioned in the RFC, there're proposals for ISA
extensions aiming to "replace" FENCE.I, but those are still WIP. (*)

  Andrea

(*) https://github.com/riscv/riscv-j-extension

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/4] membarrier: riscv: Core serializing command
  2024-01-10 14:55 [PATCH v3 0/4] membarrier: riscv: Core serializing command Andrea Parri
                   ` (3 preceding siblings ...)
  2024-01-10 14:55 ` [PATCH v3 4/4] membarrier: riscv: Provide core serializing command Andrea Parri
@ 2024-01-24 14:13 ` Andrea Parri
  4 siblings, 0 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-24 14:13 UTC (permalink / raw)
  To: paul.walmsley, palmer, aou, mathieu.desnoyers, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel

On Wed, Jan 10, 2024 at 03:55:29PM +0100, Andrea Parri wrote:
> Changes since v2 ([1]):
>   - amended inline comments
>   - drop ARCH_HAS_MEMBARRIER, create membarrier.rst
> 
> Changes since v1 ([2]):
>   - add smp_mb() in switch_mm()
>   - introduce ARCH_HAS_MEMBARRIER, amend documentation
> 
> Changes since RFC ([3]):
>   - introduce prepare_sync_core_cmd()
>   - fix nosmp builds
> 
> [1] https://lore.kernel.org/lkml/20231211094414.8078-1-parri.andrea@gmail.com/
> [2] https://lore.kernel.org/lkml/20231127103235.28442-1-parri.andrea@gmail.com/
> [3] https://lore.kernel.org/lkml/20230803040111.5101-1-parri.andrea@gmail.com/
> 
> Andrea Parri (4):
>   membarrier: riscv: Add full memory barrier in switch_mm()
>   membarrier: Create Documentation/scheduler/membarrier.rst
>   locking: Introduce prepare_sync_core_cmd()
>   membarrier: riscv: Provide core serializing command

Gentle ping to the riscv & membarrier people who have survived the merge
window: any other thoughts on this series?  Suggestions for a v4?

  Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst
  2024-01-10 14:55 ` [PATCH v3 2/4] membarrier: Create Documentation/scheduler/membarrier.rst Andrea Parri
  2024-01-10 18:15   ` Randy Dunlap
@ 2024-01-24 16:13   ` Mathieu Desnoyers
  1 sibling, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2024-01-24 16:13 UTC (permalink / raw)
  To: Andrea Parri, paul.walmsley, palmer, aou, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel

On 2024-01-10 09:55, Andrea Parri wrote:
> To gather the architecture requirements of the "private/global
> expedited" membarrier commands.  The file will be expanded to
> integrate further information about the membarrier syscall (as
> needed/desired in the future).  While at it, amend some related
> inline comments in the membarrier codebase.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> 
> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>   Documentation/scheduler/index.rst      |  1 +
>   Documentation/scheduler/membarrier.rst | 37 ++++++++++++++++++++++++++
>   MAINTAINERS                            |  1 +
>   kernel/sched/core.c                    |  7 ++++-
>   kernel/sched/membarrier.c              |  8 +++---
>   5 files changed, 49 insertions(+), 5 deletions(-)
>   create mode 100644 Documentation/scheduler/membarrier.rst
> 
> diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
> index 3170747226f6d..43bd8a145b7a9 100644
> --- a/Documentation/scheduler/index.rst
> +++ b/Documentation/scheduler/index.rst
> @@ -7,6 +7,7 @@ Scheduler
>   
>   
>       completion
> +    membarrier
>       sched-arch
>       sched-bwc
>       sched-deadline
> diff --git a/Documentation/scheduler/membarrier.rst b/Documentation/scheduler/membarrier.rst
> new file mode 100644
> index 0000000000000..ab7ee3824b407
> --- /dev/null
> +++ b/Documentation/scheduler/membarrier.rst
> @@ -0,0 +1,37 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +========================
> +membarrier() System Call
> +========================
> +
> +MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements
> +=====================================================================
> +
> +Memory barriers before updating rq->curr
> +----------------------------------------
> +
> +The command requires each architecture to have a full memory barrier after
> +coming from user-space, before updating rq->curr.  This barrier is implied
> +by the sequence rq_lock(); smp_mb__after_spinlock() in __schedule().  The
> +barrier matches a full barrier in the proximity of the membarrier system
> +call exit, cf. membarrier_{private,global}_expedited().
> +
> +Memory barriers after updating rq->curr
> +---------------------------------------
> +
> +The command requires each architecture to have a full memory barrier after
> +updating rq->curr, before returning to user-space.  The schemes providing
> +this barrier on the various architectures are as follows.
> +
> + - alpha, arc, arm, hexagon, mips rely on the full barrier implied by
> +   spin_unlock() in finish_lock_switch().
> +
> + - arm64 relies on the full barrier implied by switch_to().
> +
> + - powerpc, riscv, s390, sparc, x86 rely on the full barrier implied by
> +   switch_mm(), if mm is not NULL; they rely on the full barrier implied
> +   by mmdrop(), otherwise.  On powerpc and riscv, switch_mm() relies on
> +   membarrier_arch_switch_mm().
> +
> +The barrier matches a full barrier in the proximity of the membarrier system
> +call entry, cf. membarrier_{private,global}_expedited().
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0f8cec504b2ba..6bce0aeecb4f2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13815,6 +13815,7 @@ M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>   M:	"Paul E. McKenney" <paulmck@kernel.org>
>   L:	linux-kernel@vger.kernel.org
>   S:	Supported
> +F:	Documentation/scheduler/membarrier.rst
>   F:	arch/*/include/asm/membarrier.h
>   F:	include/uapi/linux/membarrier.h
>   F:	kernel/sched/membarrier.c
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 711dc753f7216..b51bc86f8340c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6599,7 +6599,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>   	 *     if (signal_pending_state())	    if (p->state & @state)
>   	 *
>   	 * Also, the membarrier system call requires a full memory barrier
> -	 * after coming from user-space, before storing to rq->curr.
> +	 * after coming from user-space, before storing to rq->curr; this
> +	 * barrier matches a full barrier in the proximity of the membarrier
> +	 * system call exit.
>   	 */
>   	rq_lock(rq, &rf);
>   	smp_mb__after_spinlock();
> @@ -6677,6 +6679,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>   		 *   architectures where spin_unlock is a full barrier,
>   		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
>   		 *   is a RELEASE barrier),
> +		 *
> +		 * The barrier matches a full barrier in the proximity of
> +		 * the membarrier system call entry.
>   		 */
>   		++*switch_count;
>   
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 2ad881d07752c..f3d91628d6b8a 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -251,7 +251,7 @@ static int membarrier_global_expedited(void)
>   		return 0;
>   
>   	/*
> -	 * Matches memory barriers around rq->curr modification in
> +	 * Matches memory barriers after rq->curr modification in
>   	 * scheduler.
>   	 */
>   	smp_mb();	/* system call entry is not a mb. */
> @@ -300,7 +300,7 @@ static int membarrier_global_expedited(void)
>   
>   	/*
>   	 * Memory barrier on the caller thread _after_ we finished
> -	 * waiting for the last IPI. Matches memory barriers around
> +	 * waiting for the last IPI. Matches memory barriers before
>   	 * rq->curr modification in scheduler.
>   	 */
>   	smp_mb();	/* exit from system call is not a mb */
> @@ -339,7 +339,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
>   		return 0;
>   
>   	/*
> -	 * Matches memory barriers around rq->curr modification in
> +	 * Matches memory barriers after rq->curr modification in
>   	 * scheduler.
>   	 */
>   	smp_mb();	/* system call entry is not a mb. */
> @@ -415,7 +415,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
>   
>   	/*
>   	 * Memory barrier on the caller thread _after_ we finished
> -	 * waiting for the last IPI. Matches memory barriers around
> +	 * waiting for the last IPI. Matches memory barriers before
>   	 * rq->curr modification in scheduler.
>   	 */
>   	smp_mb();	/* exit from system call is not a mb */

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] membarrier: riscv: Provide core serializing command
  2024-01-10 14:55 ` [PATCH v3 4/4] membarrier: riscv: Provide core serializing command Andrea Parri
  2024-01-10 19:27   ` Stefan O'Rear
@ 2024-01-24 16:18   ` Mathieu Desnoyers
  2024-01-24 18:44     ` Andrea Parri
  1 sibling, 1 reply; 17+ messages in thread
From: Mathieu Desnoyers @ 2024-01-24 16:18 UTC (permalink / raw)
  To: Andrea Parri, paul.walmsley, palmer, aou, paulmck, corbet
  Cc: mmaas, hboehm, striker, charlie, rehn, linux-riscv, linux-doc,
	linux-kernel

On 2024-01-10 09:55, Andrea Parri wrote:
[...]
> 
> diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> index d96b778b87ed8..a163170fc0f48 100644
> --- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> +++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
> @@ -10,6 +10,22 @@
>   # Rely on implicit context synchronization as a result of exception return
>   # when returning from IPI handler, and when returning to user-space.
>   #
> +# * riscv
> +#
> +# riscv uses xRET as return from interrupt and to return to user-space.
> +#
> +# Given that xRET is not core serializing, we rely on FENCE.I for providing
> +# core serialization:
> +#
> +#  - by calling sync_core_before_usermode() on return from interrupt (cf.
> +#    ipi_sync_core()),
> +#
> +#  - via switch_mm() and sync_core_before_usermode() (respectively, for
> +#    uthread->uthread and kthread->uthread transitions) to go back to
> +#    user-space.

I don't quite get the meaning of the sentence above. There seems to be a
missing marker before "to go back".

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] membarrier: riscv: Provide core serializing command
  2024-01-24 16:18   ` Mathieu Desnoyers
@ 2024-01-24 18:44     ` Andrea Parri
  2024-01-24 18:56       ` Mathieu Desnoyers
  0 siblings, 1 reply; 17+ messages in thread
From: Andrea Parri @ 2024-01-24 18:44 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: paul.walmsley, palmer, aou, paulmck, corbet, mmaas, hboehm,
	striker, charlie, rehn, linux-riscv, linux-doc, linux-kernel

> > +# riscv uses xRET as return from interrupt and to return to user-space.
> > +#
> > +# Given that xRET is not core serializing, we rely on FENCE.I for providing
> > +# core serialization:
> > +#
> > +#  - by calling sync_core_before_usermode() on return from interrupt (cf.
> > +#    ipi_sync_core()),
> > +#
> > +#  - via switch_mm() and sync_core_before_usermode() (respectively, for
> > +#    uthread->uthread and kthread->uthread transitions) to go back to
> > +#    user-space.
> 
> I don't quite get the meaning of the sentence above. There seems to be a
> missing marker before "to go back".

Let's see.  Without the round brackets, the last part becomes:

  - via switch_mm() and sync_core_before_usermode() to go back to
    user-space.

This is indeed what I meant to say.  What am I missing?

  Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] membarrier: riscv: Provide core serializing command
  2024-01-24 18:44     ` Andrea Parri
@ 2024-01-24 18:56       ` Mathieu Desnoyers
  2024-01-24 21:43         ` Andrea Parri
  0 siblings, 1 reply; 17+ messages in thread
From: Mathieu Desnoyers @ 2024-01-24 18:56 UTC (permalink / raw)
  To: Andrea Parri
  Cc: paul.walmsley, palmer, aou, paulmck, corbet, mmaas, hboehm,
	striker, charlie, rehn, linux-riscv, linux-doc, linux-kernel

On 2024-01-24 13:44, Andrea Parri wrote:
>>> +# riscv uses xRET as return from interrupt and to return to user-space.
>>> +#
>>> +# Given that xRET is not core serializing, we rely on FENCE.I for providing
>>> +# core serialization:
>>> +#
>>> +#  - by calling sync_core_before_usermode() on return from interrupt (cf.
>>> +#    ipi_sync_core()),
>>> +#
>>> +#  - via switch_mm() and sync_core_before_usermode() (respectively, for
>>> +#    uthread->uthread and kthread->uthread transitions) to go back to
>>> +#    user-space.
>>
>> I don't quite get the meaning of the sentence above. There seems to be a
>> missing marker before "to go back".
> 
> Let's see.  Without the round brackets, the last part becomes:
> 
>    - via switch_mm() and sync_core_before_usermode() to go back to
>      user-space.
> 
> This is indeed what I meant to say.  What am I missing?

Would it still fit your intent if we say "before returning to
user-space" rather than "to go back to user-space" ?

Because the switch_mm(), for instance, does not happen exactly on
return to user-space, but rather when the scheduler switches tasks.
Therefore, I think that stating that core serialization needs to
happen before returning to user-space is clearer than stating that
it happens "when" we go back to user-space.

Also, on another topic, did you find a way forward with respect to
the different choice of words between the membarrier man page and
documentation vs the RISC-V official semantics regarding "core
serializing" vs FENCE.I?

Thanks,

Mathieu


> 
>    Andrea

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/4] membarrier: riscv: Provide core serializing command
  2024-01-24 18:56       ` Mathieu Desnoyers
@ 2024-01-24 21:43         ` Andrea Parri
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Parri @ 2024-01-24 21:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: paul.walmsley, palmer, aou, paulmck, corbet, mmaas, hboehm,
	striker, charlie, rehn, linux-riscv, linux-doc, linux-kernel

On Wed, Jan 24, 2024 at 01:56:39PM -0500, Mathieu Desnoyers wrote:
> On 2024-01-24 13:44, Andrea Parri wrote:
> > > > +# riscv uses xRET as return from interrupt and to return to user-space.
> > > > +#
> > > > +# Given that xRET is not core serializing, we rely on FENCE.I for providing
> > > > +# core serialization:
> > > > +#
> > > > +#  - by calling sync_core_before_usermode() on return from interrupt (cf.
> > > > +#    ipi_sync_core()),
> > > > +#
> > > > +#  - via switch_mm() and sync_core_before_usermode() (respectively, for
> > > > +#    uthread->uthread and kthread->uthread transitions) to go back to
> > > > +#    user-space.
> > > 
> > > I don't quite get the meaning of the sentence above. There seems to be a
> > > missing marker before "to go back".
> > 
> > Let's see.  Without the round brackets, the last part becomes:
> > 
> >    - via switch_mm() and sync_core_before_usermode() to go back to
> >      user-space.
> > 
> > This is indeed what I meant to say.  What am I missing?
> 
> Would it still fit your intent if we say "before returning to
> user-space" rather than "to go back to user-space" ?

Yes, works for me.  Will change in v4.


> Because the switch_mm(), for instance, does not happen exactly on
> return to user-space, but rather when the scheduler switches tasks.
> Therefore, I think that stating that core serialization needs to
> happen before returning to user-space is clearer than stating that
> it happens "when" we go back to user-space.
> 
> Also, on another topic, did you find a way forward with respect of
> the different choice of words between the membarrier man page and
> documentation vs the RISC-V official semantic with respect to "core
> serializing" vs FENCE.I ?

The way forward I envision involves the continuous (iterative)
discussion/review of the respective documentation and use-cases/litmus
tests/models/etc.

AFAICS, that is not that different from discussions about smp_mb() (as in
memory-barriers.txt) vs. FENCE RW,RW (RISC-V ISA manual) - only time will
tell.

  Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread
