* [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Oskolkov @ 2020-08-11  0:09 UTC
  To: Mathieu Desnoyers, Paul E. McKenney, Peter Zijlstra, Boqun Feng,
	linux-kernel
  Cc: Paul Turner, Chris Kennelly, Peter Oskolkov, Peter Oskolkov

This patchset is based on Google-internal RSEQ
work done by Paul Turner and Andrew Hunter.

When working with per-CPU RSEQ-based memory allocations,
it is sometimes important to make sure that a global
memory location is no longer accessed from RSEQ critical
sections. For example, there can be two per-CPU lists,
one "active" and accessed per-CPU, while the other one
is inactive and worked on asynchronously "off CPU" (e.g.
garbage collected). At some point the two lists are
swapped, and a fast RCU-like mechanism is required to
make sure that the previously active list is no longer
accessed.

This patch introduces such a mechanism: in short, the
membarrier() syscall issues an IPI to a CPU, restarting
any RSEQ critical section that is active on that CPU.
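
A minimal userspace sketch of the intended usage pattern (the
list type, active_list_ptr/new_list and collect() below are
illustrative only, not part of this patch; error handling
omitted):

  /* Once, at startup: */
  syscall(__NR_membarrier,
          MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0);

  /* Swap the lists and fence concurrent rseq readers: */
  struct percpu_list *old = (struct percpu_list *)
          atomic_load(&active_list_ptr);

  atomic_store(&active_list_ptr, (intptr_t)new_list);
  /* @flags == -1: restart rseq critical sections on all CPUs. */
  syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, -1);

  /*
   * Any rseq critical section that could still observe "old" has
   * been restarted and will re-read active_list_ptr, so "old" can
   * now be worked on (e.g. garbage collected) safely.
   */
  collect(old);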

v1->v2:
  - removed the ability to IPI all CPUs in a single syscall;
  - use task->mm rather than task->group_leader to identify
    tasks belonging to the same process.
v2->v3:
  - re-added the ability to IPI all CPUs in a single syscall;
  - integrated with membarrier_private_expedited() to
    make sure only CPUs running tasks with the same mm as
    the current task are interrupted;
  - also added MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ;
  - flags in membarrier_private_expedited are never actually
    bit flags but always distinct values (i.e. no two flags
    are ever combined), so I changed the bit tests to full
    equality comparisons for simplicity (otherwise the code
    would have to handle several bits being set at once).

The second patch in the patchset adds a selftest
of this feature.

Signed-off-by: Peter Oskolkov <posk@google.com>
---
 include/linux/sched/mm.h        |  3 ++
 include/uapi/linux/membarrier.h | 24 ++++++++++++++
 kernel/sched/membarrier.c       | 55 ++++++++++++++++++++++++++++-----
 3 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 6be66f52a2ad..477d5526e053 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -352,10 +352,13 @@ enum {
 	MEMBARRIER_STATE_GLOBAL_EXPEDITED			= (1U << 3),
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY	= (1U << 4),
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE		= (1U << 5),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY		= (1U << 6),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ			= (1U << 7),
 };
 
 enum {
 	MEMBARRIER_FLAG_SYNC_CORE	= (1U << 0),
+	MEMBARRIER_FLAG_RSEQ		= (1U << 1),
 };
 
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 5891d7614c8c..f6bafe0676ba 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -114,6 +114,28 @@
  *                          If this command is not implemented by an
  *                          architecture, -EINVAL is returned.
  *                          Returns 0 on success.
+ * @MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+ *                          In addition to providing the memory ordering
+ *                          guarantees described in
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE,
+ *                          ensure that, upon return from system call,
+ *                          all running thread siblings of the caller
+ *                          thread have any currently running rseq
+ *                          critical sections restarted if the @flags
+ *                          parameter is -1; if @flags is >= 0, this
+ *                          operation is performed only on the CPU
+ *                          indicated by @flags. If this command is
+ *                          not implemented by an architecture, -EINVAL
+ *                          is returned. A process needs to register its
+ *                          intent to use the private expedited rseq
+ *                          command prior to using it, otherwise
+ *                          this command returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
+ *                          Register the process intent to use
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
+ *                          If this command is not implemented by an
+ *                          architecture, -EINVAL is returned.
+ *                          Returns 0 on success.
  * @MEMBARRIER_CMD_SHARED:
  *                          Alias to MEMBARRIER_CMD_GLOBAL. Provided for
  *                          header backward compatibility.
@@ -131,6 +153,8 @@ enum membarrier_cmd {
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED		= (1 << 4),
 	MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE		= (1 << 5),
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE	= (1 << 6),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ			= (1 << 7),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ		= (1 << 8),
 
 	/* Alias for header backward compatibility. */
 	MEMBARRIER_CMD_SHARED			= MEMBARRIER_CMD_GLOBAL,
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 168479a7d61b..54c1093ebc87 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -18,6 +18,14 @@
 #define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK	0
 #endif
 
+#ifdef CONFIG_RSEQ
+#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK		\
+	(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ			\
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ)
+#else
+#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK	0
+#endif
+
 #define MEMBARRIER_CMD_BITMASK						\
 	(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED	\
 	| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED			\
@@ -27,6 +35,12 @@
 
 static void ipi_mb(void *info)
 {
+	int *flags = info;
+
+#ifdef CONFIG_RSEQ
+	if (flags && (*flags == MEMBARRIER_FLAG_RSEQ))
+		rseq_preempt(current);
+#endif
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
@@ -129,19 +143,26 @@ static int membarrier_global_expedited(void)
 	return 0;
 }
 
-static int membarrier_private_expedited(int flags)
+static int membarrier_private_expedited(int flags, int cpu_id)
 {
 	int cpu;
 	cpumask_var_t tmpmask;
 	struct mm_struct *mm = current->mm;
 
-	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+	if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
 		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
 			return -EINVAL;
 		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
 			return -EPERM;
+	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
+		if (!IS_ENABLED(CONFIG_RSEQ))
+			return -EINVAL;
+		if (!(atomic_read(&mm->membarrier_state) &
+		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY))
+			return -EPERM;
 	} else {
+		BUG_ON(flags != 0);
 		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
 			return -EPERM;
@@ -174,6 +195,8 @@ static int membarrier_private_expedited(int flags)
 		 */
 		if (cpu == raw_smp_processor_id())
 			continue;
+		if (cpu_id >= 0 && cpu != cpu_id)
+			continue;
 		p = rcu_dereference(cpu_rq(cpu)->curr);
 		if (p && p->mm == mm)
 			__cpumask_set_cpu(cpu, tmpmask);
@@ -181,7 +204,7 @@ static int membarrier_private_expedited(int flags)
 	rcu_read_unlock();
 
 	preempt_disable();
-	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+	smp_call_function_many(tmpmask, ipi_mb, &flags, 1);
 	preempt_enable();
 
 	free_cpumask_var(tmpmask);
@@ -283,11 +306,18 @@ static int membarrier_register_private_expedited(int flags)
 	    set_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED,
 	    ret;
 
-	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+	if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
 		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
 			return -EINVAL;
 		ready_state =
 			MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
+		if (!IS_ENABLED(CONFIG_RSEQ))
+			return -EINVAL;
+		ready_state =
+			MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY;
+	} else {
+		BUG_ON(flags != 0);
 	}
 
 	/*
@@ -299,6 +329,8 @@ static int membarrier_register_private_expedited(int flags)
 		return 0;
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
 		set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE;
+	if (flags & MEMBARRIER_FLAG_RSEQ)
+		set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ;
 	atomic_or(set_state, &mm->membarrier_state);
 	ret = sync_runqueues_membarrier_state(mm);
 	if (ret)
@@ -311,7 +343,10 @@ static int membarrier_register_private_expedited(int flags)
 /**
  * sys_membarrier - issue memory barriers on a set of threads
  * @cmd:   Takes command values defined in enum membarrier_cmd.
- * @flags: Currently needs to be 0. For future extensions.
+ * @flags: Currently needs to be 0 for all commands other than
+ *         MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ: in the latter
+ *         case it indicates the CPU on which to interrupt (= restart)
+ *         the RSEQ critical section, or -1 to restart on all CPUs.
  *
  * If this system call is not implemented, -ENOSYS is returned. If the
  * command specified does not exist, not available on the running
@@ -339,7 +374,7 @@ static int membarrier_register_private_expedited(int flags)
  */
 SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 {
-	if (unlikely(flags))
+	if (unlikely(flags) && cmd != MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
 		return -EINVAL;
 	switch (cmd) {
 	case MEMBARRIER_CMD_QUERY:
@@ -362,13 +397,17 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 	case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
 		return membarrier_register_global_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		return membarrier_private_expedited(0);
+		return membarrier_private_expedited(0, -1);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
 		return membarrier_register_private_expedited(0);
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
-		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE, -1);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
 		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+		return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ, flags);
+	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
+		return membarrier_register_private_expedited(MEMBARRIER_FLAG_RSEQ);
 	default:
 		return -EINVAL;
 	}
-- 
2.28.0.236.gb10cc79966-goog



* [PATCH 2/2 v3] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Oskolkov @ 2020-08-11  0:09 UTC
  To: Mathieu Desnoyers, Paul E. McKenney, Peter Zijlstra, Boqun Feng,
	linux-kernel
  Cc: Paul Turner, Chris Kennelly, Peter Oskolkov, Peter Oskolkov

Based on Google-internal RSEQ work done by
Paul Turner and Andrew Hunter.

This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
The test quite often fails without the previous patch in this patchset,
but consistently passes with it.

v3: added rseq_offset_deref_addv() to x86_64 to make the test
    more explicit; on other architectures I kept using existing
    rseq_cmpeqv_cmpeqv_storev() as I have no easy way to test
    there. Added a comment explaining why the test works this way.

Signed-off-by: Peter Oskolkov <posk@google.com>
---
 .../selftests/rseq/basic_percpu_ops_test.c    | 196 ++++++++++++++++++
 tools/testing/selftests/rseq/rseq-x86.h       |  55 +++++
 2 files changed, 251 insertions(+)

diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
index eb3f6db36d36..c9784a3d19fb 100644
--- a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -3,16 +3,22 @@
 #include <assert.h>
 #include <pthread.h>
 #include <sched.h>
+#include <stdatomic.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <stddef.h>
+#include <syscall.h>
+#include <unistd.h>
 
 #include "rseq.h"
 
 #define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
 
+#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ	(1<<7)
+#define MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ	(1<<8)
+
 struct percpu_lock_entry {
 	intptr_t v;
 } __attribute__((aligned(128)));
@@ -289,6 +295,194 @@ void test_percpu_list(void)
 	assert(sum == expected_sum);
 }
 
+struct test_membarrier_thread_args {
+	int stop;
+	intptr_t percpu_list_ptr;
+};
+
+/* Worker threads modify data in their "active" percpu lists. */
+void *test_membarrier_worker_thread(void *arg)
+{
+	struct test_membarrier_thread_args *args =
+		(struct test_membarrier_thread_args *)arg;
+	const int iters = 10 * 1000 * 1000;
+	int i;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	/* Wait for initialization. */
+	while (!atomic_load(&args->percpu_list_ptr)) {}
+
+	for (i = 0; i < iters; ++i) {
+		int ret;
+
+		do {
+			int cpu = rseq_cpu_start();
+#if defined(__x86_64__)
+			/* For x86_64, we have rseq_offset_deref_addv. */
+			ret = rseq_offset_deref_addv(&args->percpu_list_ptr,
+							128 * cpu, 1, cpu);
+#else
+			/*
+			 *  For other architectures, we rely on the fact that
+			 *  the manager thread keeps list_ptr alive, so we can
+			 *  use rseq_cmpeqv_cmpeqv_storev to make sure that
+			 *  the list_ptr we read outside of the rseq critical
+			 *  section is still "active".
+			 */
+			struct percpu_list *list_ptr = (struct percpu_list *)
+					atomic_load(&args->percpu_list_ptr);
+
+			struct percpu_list_node *node = list_ptr->c[cpu].head;
+			const intptr_t prev = node->data;
+
+			ret = rseq_cmpeqv_cmpeqv_storev(&node->data, prev,
+					&args->percpu_list_ptr,
+					(intptr_t)list_ptr, prev + 1, cpu);
+#endif
+		} while (rseq_unlikely(ret));
+	}
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	return NULL;
+}
+
+void test_membarrier_init_percpu_list(struct percpu_list *list)
+{
+	int i;
+
+	memset(list, 0, sizeof(*list));
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		struct percpu_list_node *node;
+
+		node = malloc(sizeof(*node));
+		assert(node);
+		node->data = 0;
+		node->next = NULL;
+		list->c[i].head = node;
+	}
+}
+
+void test_membarrier_free_percpu_list(struct percpu_list *list)
+{
+	int i;
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		free(list->c[i].head);
+}
+
+static int sys_membarrier(int cmd, int flags)
+{
+	return syscall(__NR_membarrier, cmd, flags);
+}
+
+/*
+ * The manager thread swaps per-cpu lists that worker threads see,
+ * and validates that there are no unexpected modifications.
+ */
+void *test_membarrier_manager_thread(void *arg)
+{
+	struct test_membarrier_thread_args *args =
+		(struct test_membarrier_thread_args *)arg;
+	struct percpu_list list_a, list_b;
+	intptr_t expect_a = 0, expect_b = 0;
+	int cpu_a = 0, cpu_b = 0;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	/* Init lists. */
+	test_membarrier_init_percpu_list(&list_a);
+	test_membarrier_init_percpu_list(&list_b);
+
+	atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
+
+	while (!atomic_load(&args->stop)) {
+		/* list_a is "active". */
+		cpu_a = rand() % CPU_SETSIZE;
+		/*
+		 * As list_b is "inactive", we should never see changes
+		 * to list_b.
+		 */
+		if (expect_b != atomic_load(&list_b.c[cpu_b].head->data)) {
+			fprintf(stderr, "Membarrier test failed\n");
+			abort();
+		}
+
+		/* Make list_b "active". */
+		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_b);
+		sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, cpu_a);
+		/*
+		 * CPU cpu_a should now only modify list_b, so the
+		 * values in list_a should be stable.
+		 */
+		expect_a = atomic_load(&list_a.c[cpu_a].head->data);
+
+		cpu_b = rand() % CPU_SETSIZE;
+		/*
+		 * As list_a is "inactive", we should never see changes
+		 * to list_a.
+		 */
+		if (expect_a != atomic_load(&list_a.c[cpu_a].head->data)) {
+			fprintf(stderr, "Membarrier test failed\n");
+			abort();
+		}
+
+		/* Make list_a "active". */
+		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
+		sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, cpu_b);
+		/* Remember a value from list_b. */
+		expect_b = atomic_load(&list_b.c[cpu_b].head->data);
+	}
+
+	test_membarrier_free_percpu_list(&list_a);
+	test_membarrier_free_percpu_list(&list_b);
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	return NULL;
+}
+
+/* Test the MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ membarrier command. */
+void test_membarrier(void)
+{
+	struct test_membarrier_thread_args thread_args;
+	pthread_t worker_threads[CPU_SETSIZE];
+	pthread_t manager_thread;
+	int i;
+
+	sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0);
+
+	thread_args.stop = 0;
+	thread_args.percpu_list_ptr = 0;
+	pthread_create(&manager_thread, NULL,
+		       test_membarrier_manager_thread, &thread_args);
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		pthread_create(&worker_threads[i], NULL,
+		       test_membarrier_worker_thread, &thread_args);
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		pthread_join(worker_threads[i], NULL);
+
+	atomic_store(&thread_args.stop, 1);
+	pthread_join(manager_thread, NULL);
+}
+
 int main(int argc, char **argv)
 {
 	if (rseq_register_current_thread()) {
@@ -300,6 +494,8 @@ int main(int argc, char **argv)
 	test_percpu_spinlock();
 	printf("percpu_list\n");
 	test_percpu_list();
+	printf("membarrier\n");
+	test_membarrier();
 	if (rseq_unregister_current_thread()) {
 		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
 			errno, strerror(errno));
diff --git a/tools/testing/selftests/rseq/rseq-x86.h b/tools/testing/selftests/rseq/rseq-x86.h
index b2da6004fe30..3ed13a6a47e3 100644
--- a/tools/testing/selftests/rseq/rseq-x86.h
+++ b/tools/testing/selftests/rseq/rseq-x86.h
@@ -279,6 +279,61 @@ int rseq_addv(intptr_t *v, intptr_t count, int cpu)
 #endif
 }
 
+/*
+ *   pval = *(*ptr + off);
+ *  *pval += inc;
+ */
+static inline __attribute__((always_inline))
+int rseq_offset_deref_addv(intptr_t *ptr, off_t off, intptr_t inc, int cpu)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+#ifdef RSEQ_COMPARE_TWICE
+		RSEQ_ASM_DEFINE_EXIT_POINT(1f, %l[error1])
+#endif
+		/* Start rseq by storing table entry pointer into rseq_cs. */
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, RSEQ_CS_OFFSET(%[rseq_abi]))
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, RSEQ_CPU_ID_OFFSET(%[rseq_abi]), 4f)
+		RSEQ_INJECT_ASM(3)
+#ifdef RSEQ_COMPARE_TWICE
+		RSEQ_ASM_CMP_CPU_ID(cpu_id, RSEQ_CPU_ID_OFFSET(%[rseq_abi]), %l[error1])
+#endif
+		/* get the per-cpu slot address: rbx = *ptr + off */
+		"movq %[ptr], %%rbx\n\t"
+		"addq %[off], %%rbx\n\t"
+		/* load pval: rcx = *(*ptr + off) */
+		"movq (%%rbx), %%rcx\n\t"
+		/* *pval += inc */
+		"addq %[inc], (%%rcx)\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [cpu_id]		"r" (cpu),
+		  [rseq_abi]		"r" (&__rseq_abi),
+		  /* final store input */
+		  [ptr]			"m" (*ptr),
+		  [off]			"er" (off),
+		  [inc]			"er" (inc)
+		: "memory", "cc", "rax", "rbx", "rcx"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+#ifdef RSEQ_COMPARE_TWICE
+		  , error1
+#endif
+	);
+	return 0;
+abort:
+	RSEQ_INJECT_FAILED
+	return -1;
+#ifdef RSEQ_COMPARE_TWICE
+error1:
+	rseq_bug("cpu_id comparison failed");
+#endif
+}
+
 static inline __attribute__((always_inline))
 int rseq_cmpeqv_trystorev_storev(intptr_t *v, intptr_t expect,
 				 intptr_t *v2, intptr_t newv2,
-- 
2.28.0.236.gb10cc79966-goog



* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Zijlstra @ 2020-08-11  6:27 UTC
  To: Peter Oskolkov
  Cc: Mathieu Desnoyers, Paul E . McKenney, Boqun Feng, linux-kernel,
	Paul Turner, Chris Kennelly, Peter Oskolkov

On Mon, Aug 10, 2020 at 05:09:58PM -0700, Peter Oskolkov wrote:
> @@ -27,6 +35,12 @@
>  
>  static void ipi_mb(void *info)
>  {

The #ifdef wants to be here, otherwise you'll get a compile warning for
!RSEQ builds.

> +	int *flags = info;
> +
> +#ifdef CONFIG_RSEQ
> +	if (flags && (*flags == MEMBARRIER_FLAG_RSEQ))
> +		rseq_preempt(current);
> +#endif
>  	smp_mb();	/* IPIs should be serializing but paranoid. */
>  }

But yes, this looks much better.
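
That is, something like this (sketch; the declaration moves under the
#ifdef so !RSEQ builds don't see an unused variable):

static void ipi_mb(void *info)
{
#ifdef CONFIG_RSEQ
	int *flags = info;

	if (flags && (*flags == MEMBARRIER_FLAG_RSEQ))
		rseq_preempt(current);
#endif
	smp_mb();	/* IPIs should be serializing but paranoid. */
}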

Mathieu did mention a few other points that I didn't see addressed:

 - he didn't like abusing the @flags syscall argument for a CPUid;
 - he wondered if we should support SYNC_CORE + RSEQ.


Not sure we can easily change the syscall at this point, but the latter
point could be addressed with something like this.

---
Index: linux-2.6/kernel/sched/membarrier.c
===================================================================
--- linux-2.6.orig/kernel/sched/membarrier.c
+++ linux-2.6/kernel/sched/membarrier.c
@@ -374,8 +374,26 @@ static int membarrier_register_private_e
  */
 SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 {
+	int cflags = 0, cpuid = -1;
+
 	if (unlikely(flags) && cmd != MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
 		return -EINVAL;
+
+	if (cmd & (MEMBARRIER_CMD_PRIVATE_EXPEDITED |
+		   MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE |
+		   MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)) {
+
+		if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
+			cflags |= MEMBARRIER_FLAG_RSEQ;
+
+		if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+			cflags |= MEMBARRIER_FLAG_SYNC_CORE;
+			cpuid = flags;
+		}
+
+		cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED;
+	}
+
 	switch (cmd) {
 	case MEMBARRIER_CMD_QUERY:
 	{
@@ -396,18 +414,16 @@ SYSCALL_DEFINE2(membarrier, int, cmd, in
 		return membarrier_global_expedited();
 	case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
 		return membarrier_register_global_expedited();
-	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		return membarrier_private_expedited(0, -1);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
 		return membarrier_register_private_expedited(0);
-	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
-		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE, -1);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
 		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
-	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
-		return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ, flags);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
 		return membarrier_register_private_expedited(MEMBARRIER_FLAG_RSEQ);
+
+	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
+		return membarrier_private_expedited(cflags, cpuid);
+
 	default:
 		return -EINVAL;
 	}



* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Zijlstra @ 2020-08-11  7:54 UTC
  To: Peter Oskolkov
  Cc: Mathieu Desnoyers, Paul E . McKenney, Boqun Feng, linux-kernel,
	Paul Turner, Chris Kennelly, Peter Oskolkov

On Tue, Aug 11, 2020 at 08:27:33AM +0200, Peter Zijlstra wrote:
>  SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>  {
> +	int cflags = 0, cpuid = -1;
> +
>  	if (unlikely(flags) && cmd != MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
>  		return -EINVAL;
> +
> +	if (cmd & (MEMBARRIER_CMD_PRIVATE_EXPEDITED |
> +		   MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE |
> +		   MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)) {
> +
> +		if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
> +			cflags |= MEMBARRIER_FLAG_RSEQ;
> +
> +		if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
> +			cflags |= MEMBARRIER_FLAG_SYNC_CORE;
> +			cpuid = flags;
> +		}
> +
> +		cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED;
> +	}

This of course fails to check if other bits are set, and it goes really
'funny' if you use cpuid != -1.

That all needs a little more thought.


* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Oskolkov @ 2020-08-11 21:08 UTC
  To: Peter Zijlstra
  Cc: Peter Oskolkov, Mathieu Desnoyers, Paul E . McKenney, Boqun Feng,
	Linux Kernel Mailing List, Paul Turner, Chris Kennelly

On Mon, Aug 10, 2020 at 11:27 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Aug 10, 2020 at 05:09:58PM -0700, Peter Oskolkov wrote:
> > @@ -27,6 +35,12 @@
> >
> >  static void ipi_mb(void *info)
> >  {
>
> The #ifdef wants to behere, otherwise you'll get a compile warning for
> !RSEQ builds.

Ack. Will do in the next version - for now waiting for the rest to be
worked out.

[...]

>
> Mathieu did mention a few other points that I didn't see addressed:
>
>  - he didn't like abusing the @flags syscall argument for a CPUid;

@flags is not used now; maybe just rename it to something more
generic? @param? Or @options? Or maybe more specific, like @cpu_id?

>  - he wondered if we should support SYNC_CORE + RSEQ.

It seems to me that CMD_PRIVATE_EXPEDITED_RSEQ is basically
CMD_PRIVATE_EXPEDITED_SYNC_CORE with the extra "restart RSEQ CSs"
behavior. Am I missing something? If not, what is the point of
complicating the code as suggested below? Maybe just renaming
CMD_PRIVATE_EXPEDITED_RSEQ to CMD_PRIVATE_EXPEDITED_SYNC_CORE_RSEQ
will do?

>
>
> Not sure we can easily change the syscall at this point, but the latter
> point could be addressed with something like this.
>
> ---
> Index: linux-2.6/kernel/sched/membarrier.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/membarrier.c
> +++ linux-2.6/kernel/sched/membarrier.c
> @@ -374,8 +374,26 @@ static int membarrier_register_private_e
>   */
>  SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>  {
> +       int cflags = 0, cpuid = -1;
> +
>         if (unlikely(flags) && cmd != MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
>                 return -EINVAL;
> +
> +       if (cmd & (MEMBARRIER_CMD_PRIVATE_EXPEDITED |
> +                  MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE |
> +                  MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)) {
> +
> +               if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
> +                       cflags |= MEMBARRIER_FLAG_RSEQ;
> +
> +               if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
> +                       cflags |= MEMBARRIER_FLAG_SYNC_CORE;
> +                       cpuid = flags;
> +               }
> +
> +               cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED;
> +       }
> +
>         switch (cmd) {
>         case MEMBARRIER_CMD_QUERY:
>         {
> @@ -396,18 +414,16 @@ SYSCALL_DEFINE2(membarrier, int, cmd, in
>                 return membarrier_global_expedited();
>         case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
>                 return membarrier_register_global_expedited();
> -       case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
> -               return membarrier_private_expedited(0, -1);
>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
>                 return membarrier_register_private_expedited(0);
> -       case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
> -               return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE, -1);
>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
>                 return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
> -       case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
> -               return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ, flags);
>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
>                 return membarrier_register_private_expedited(MEMBARRIER_FLAG_RSEQ);
> +
> +       case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
> +               return membarrier_private_expedited(cflags, cpuid);
> +
>         default:
>                 return -EINVAL;
>         }
>


* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Mathieu Desnoyers @ 2020-08-12 18:30 UTC
  To: Peter Oskolkov
  Cc: Peter Zijlstra, Peter Oskolkov, paulmck, Boqun Feng,
	linux-kernel, Paul Turner, Chris Kennelly



----- On Aug 11, 2020, at 5:08 PM, Peter Oskolkov posk@posk.io wrote:

> On Mon, Aug 10, 2020 at 11:27 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Mon, Aug 10, 2020 at 05:09:58PM -0700, Peter Oskolkov wrote:
>> > @@ -27,6 +35,12 @@
>> >
>> >  static void ipi_mb(void *info)
>> >  {
>>
>> The #ifdef wants to behere, otherwise you'll get a compile warning for
>> !RSEQ builds.
> 
> Ack. Will do in the next version - for now waiting for the rest to be
> worked out.
> 
> [...]
> 
>>
>> Mathieu did mention a few other points that I didn't see addressed:
>>
>>  - he didn't like abusing the @flags syscall argument for a CPUid;
> 
> @flags is not used now; maybe just rename it to something more
> generic? @param? Or @options? Or maybe more specific, like @cpu_id?

"flags" is there to allow extensibility without requiring to add new
membarrier commands for every change. Even though it is not used now,
I don't think re-purposing it is a good idea. What is wrong with just
adding an additional "cpu" parameter to the system call ?
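
(Concretely, I mean something along these lines, as a sketch:

	SYSCALL_DEFINE3(membarrier, int, cmd, int, flags, int, cpu)

which leaves @flags free for future extensions.)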

A "flags" parameter is very common for system calls. I don't see why
we should change its name, especially given it is already exposed and
documented as "flags" in man pages.

> 
>>  - he wondered if we should support SYNC_CORE + RSEQ.
> 
> It seems to me that CMD_PRIVATE_EXPEDITED_RSEQ is basically
> CMD_PRIVATE_EXPEDITED_SYNC_CORE with the extra "restart RSEQ CSs"
> behavior. Am I missing something?

No. The "sync-core" is about doing context synchronization for JIts, and
is not implemented on all architectures today. RSEQ however is available
on a wider range of architectures.

> If not, what is the point of
> complicating the code as suggested below? Maybe just renaming
> CMD_PRIVATE_EXPEDITED_RSEQ to CMD_PRIVATE_EXPEDITED_SYNC_CORE_RSEQ
> will do?

We basically have the following feature matrix:

- private / global
- expedited / non-expedited
- sync-core / non-sync-core
- rseq-fence / non-rseq-fence

For a total of 16 (2^4) combinations if we want to support them
all.

We can continue to add separate commands for new combinations, but if we
want to allow them to be combined, using flags rather than adding extra
commands would have the advantage of keeping the number of commands
manageable.

However, if there is no actual use-case for combining a membarrier sync-core
and a membarrier rseq-fence, then it limits the number of commands and maybe
then it's acceptable to add the rseq-fence as a separate membarrier command.

I prefer to have this discussion now rather than once we get to the point of
having 40 membarrier commands for all possible combinations.

Thanks,

Mathieu

> 
>>
>>
>> Not sure we can easily change the syscall at this point, but the latter
>> point could be addressed with something like this.
>>
>> ---
>> Index: linux-2.6/kernel/sched/membarrier.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/sched/membarrier.c
>> +++ linux-2.6/kernel/sched/membarrier.c
>> @@ -374,8 +374,26 @@ static int membarrier_register_private_e
>>   */
>>  SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>>  {
>> +       int cflags = 0, cpuid = -1;
>> +
>>         if (unlikely(flags) && cmd != MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
>>                 return -EINVAL;
>> +
>> +       if (cmd & (MEMBARRIER_CMD_PRIVATE_EXPEDITED |
>> +                  MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE |
>> +                  MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)) {
>> +
>> +               if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ)
>> +                       cflags |= MEMBARRIER_FLAG_RSEQ;
>> +
>> +               if (cmd & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
>> +                       cflags |= MEMBARRIER_FLAG_SYNC_CORE;
>> +                       cpuid = flags;
>> +               }
>> +
>> +               cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED;
>> +       }
>> +
>>         switch (cmd) {
>>         case MEMBARRIER_CMD_QUERY:
>>         {
>> @@ -396,18 +414,16 @@ SYSCALL_DEFINE2(membarrier, int, cmd, in
>>                 return membarrier_global_expedited();
>>         case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
>>                 return membarrier_register_global_expedited();
>> -       case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
>> -               return membarrier_private_expedited(0, -1);
>>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
>>                 return membarrier_register_private_expedited(0);
>> -       case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
>> -               return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE,
>> -1);
>>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
>>                 return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
>> -       case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
>> -               return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ,
>> flags);
>>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
>>                 return membarrier_register_private_expedited(MEMBARRIER_FLAG_RSEQ);
>> +
>> +       case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
>> +               return membarrier_private_expedited(cflags, cpuid);
>> +
>>         default:
>>                 return -EINVAL;
>>         }

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Oskolkov @ 2020-08-12 18:48 UTC
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Peter Oskolkov, paulmck, Boqun Feng,
	linux-kernel, Paul Turner, Chris Kennelly

On Wed, Aug 12, 2020 at 11:30 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:

[...]

> "flags" is there to allow extensibility without requiring to add new
> membarrier commands for every change. Even though it is not used now,
> I don't think re-purposing it is a good idea. What is wrong with just
> adding an additional "cpu" parameter to the system call ?

Can we do that? I thought adding an additional parameter means adding
another syscall (ABI => parameter types/count cannot change?)

> A "flags" parameter is very common for system calls. I don't see why
> we should change its name, especially given it is already exposed and
> documented as "flags" in man pages.
>

[...]

> We basically have the following feature matrix:
>
> - private / global
> - expedited / non-expedited
> - sync-core / non-sync-core
> - rseq-fence / non-rseq-fence
>
> For a total of about 16 combinations in total if we want to support them
> all.
>
> We can continue to add separate commands for new combinations, but if we
> want to allow them to be combined, using flags rather than adding extra
> commands would have the advantage of keeping the number of commands
> manageable.
>
> However, if there is no actual use-case for combining a membarrier sync-core
> and a membarrier rseq-fence, then it limits the number of commands and maybe
> then it's acceptable to add the rseq-fence as a separate membarrier command.
>
> I prefer to have this discussion now rather than once we get to the point of
> having 40 membarrier commands for all possible combinations.

All commands are currently distinct bits, but are treated as separate commands.
One way of doing what you suggest is to allow some commands to be bitwise-ORed.

So, for example, the user could call

membarrier(CMD_PRIVATE_EXPEDITED_SYNC_CORE | CMD_PRIVATE_EXPEDITED_RSEQ, cpu_id)

Is this what you have in mind?

[...]

Thanks,
Peter


* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Mathieu Desnoyers @ 2020-08-12 19:44 UTC
  To: Peter Oskolkov, linux-arch
  Cc: Peter Zijlstra, Peter Oskolkov, paulmck, Boqun Feng,
	linux-kernel, Paul Turner, Chris Kennelly

----- On Aug 12, 2020, at 2:48 PM, Peter Oskolkov posk@posk.io wrote:

> On Wed, Aug 12, 2020 at 11:30 AM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> 
> [...]
> 
>> "flags" is there to allow extensibility without requiring to add new
>> membarrier commands for every change. Even though it is not used now,
>> I don't think re-purposing it is a good idea. What is wrong with just
>> adding an additional "cpu" parameter to the system call ?
> 
> Can we do that? I thought adding an additional parameter means adding
> another syscall (ABI => parameter types/count cannot change?)

I was under the impression that adding parameters to a system call
for new flags (or commands) was not an issue. One example is the
clone system call which expects the ctid argument if the
CLONE_CHILD_CLEARTID flag is set. Or maybe it was OK at some earlier
point in time, but it's not OK anymore ? (CCing linux-arch to ask for
advice)

> 
>> A "flags" parameter is very common for system calls. I don't see why
>> we should change its name, especially given it is already exposed and
>> documented as "flags" in man pages.
>>
> 
> [...]
> 
>> We basically have the following feature matrix:
>>
>> - private / global
>> - expedited / non-expedited
>> - sync-core / non-sync-core
>> - rseq-fence / non-rseq-fence
>>
>> For a total of about 16 combinations in total if we want to support them
>> all.
>>
>> We can continue to add separate commands for new combinations, but if we
>> want to allow them to be combined, using flags rather than adding extra
>> commands would have the advantage of keeping the number of commands
>> manageable.
>>
>> However, if there is no actual use-case for combining a membarrier sync-core
>> and a membarrier rseq-fence, then it limits the number of commands and maybe
>> then it's acceptable to add the rseq-fence as a separate membarrier command.
>>
>> I prefer to have this discussion now rather than once we get to the point of
>> having 40 membarrier commands for all possible combinations.
> 
> All commands are currently distinct bits, but are treated as separate commands.

Indeed! I forgot about that. It was done so we can return a mask of supported
commands with the MEMBARRIER_CMD_QUERY for feature discoverability. Those were
never meant to be OR'd though, because then it's hard for user-space to
discover which command combinations are allowed.
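
(For reference, feature discovery typically looks something like this
sketch:

	int mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ))
		/* command unsupported, fall back */;

which only stays simple if each supported command is a distinct,
individually queryable bit.)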

> One way of doing what you suggest is to allow some commands to be bitwise-ORed.
> 
> So, for example, the user could call
> 
> membarrier(CMD_PRIVATE_EXPEDITED_SYNC_CORE | CMD_PRIVATE_EXPEDITED_RSEQ, cpu_id)
> 
> Is this what you have in mind?

Not really. This would not take care of the fact that we would end up multiplying
the number of commands as we allow combinations. E.g. if we ever want to have RSEQ
work in private and global, and in non-expedited and expedited, we end up needing:

- CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ
- CMD_PRIVATE_EXPEDITED_RSEQ
- CMD_PRIVATE_RSEQ
- CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ
- CMD_GLOBAL_EXPEDITED_RSEQ
- CMD_GLOBAL_RSEQ

The only thing we would save by OR'ing it with the SYNC_CORE command is the additional
list:

- CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_SYNC_CORE
- CMD_PRIVATE_EXPEDITED_RSEQ_SYNC_CORE
- CMD_PRIVATE_RSEQ_SYNC_CORE
- CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ_SYNC_CORE
- CMD_GLOBAL_EXPEDITED_RSEQ_SYNC_CORE
- CMD_GLOBAL_RSEQ_SYNC_CORE

But unless we receive feedback that doing a membarrier with RSEQ+sync_core all in
one go is a significant use-case, I am tempted to leave out that scenario for now.
If we go for new commands, this means we could add (for private-expedited-rseq):

- MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ,
- MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,

I do however have use-cases for using RSEQ across shared memory (between
processes). Not currently for a rseq-fence, but for rseq acting as per-cpu
atomic operations. If I ever end up needing rseq-fence across shared memory,
that would result in the following new commands:

- MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ,
- MEMBARRIER_CMD_GLOBAL_EXPEDITED_RSEQ,

The remaining open question is whether it would be OK to define a new
membarrier flag=MEMBARRIER_FLAG_CPU, which would expect an additional
@cpu parameter.

Thanks,

Mathieu

> 
> [...]
> 
> Thanks,
> Peter

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 2/2 v3] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Mathieu Desnoyers @ 2020-08-12 20:11 UTC
  To: Peter Oskolkov
  Cc: paulmck, Peter Zijlstra, Boqun Feng, linux-kernel, Paul Turner,
	Chris Kennelly, Peter Oskolkov

----- On Aug 10, 2020, at 8:09 PM, Peter Oskolkov posk@google.com wrote:

> Based on Google-internal RSEQ work done by
> Paul Turner and Andrew Hunter.
> 
> This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
> The test quite often fails without the previous patch in this patchset,
> but consistently passes with it.
> 
> v3: added rseq_offset_deref_addv() to x86_64 to make the test
>    more explicit; on other architectures I kept using existing
>    rseq_cmpeqv_cmpeqv_storev() as I have no easy way to test
>    there. Added a comment explaining why the test works this way.
> 
> Signed-off-by: Peter Oskolkov <posk@google.com>
> ---
> .../selftests/rseq/basic_percpu_ops_test.c    | 196 ++++++++++++++++++
> tools/testing/selftests/rseq/rseq-x86.h       |  55 +++++
> 2 files changed, 251 insertions(+)
> 
> diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> index eb3f6db36d36..c9784a3d19fb 100644
> --- a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> +++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> @@ -3,16 +3,22 @@
> #include <assert.h>
> #include <pthread.h>
> #include <sched.h>
> +#include <stdatomic.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <stddef.h>
> +#include <syscall.h>
> +#include <unistd.h>
> 
> #include "rseq.h"
> 
> #define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
> 
> +#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ	(1<<7)
> +#define MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ	(1<<8)
> +

No need to define membarrier commands here if we include <linux/membarrier.h>

> struct percpu_lock_entry {
> 	intptr_t v;
> } __attribute__((aligned(128)));
> @@ -289,6 +295,194 @@ void test_percpu_list(void)
> 	assert(sum == expected_sum);
> }
> 
> +struct test_membarrier_thread_args {
> +	int stop;
> +	intptr_t percpu_list_ptr;
> +};
> +
> +/* Worker threads modify data in their "active" percpu lists. */
> +void *test_membarrier_worker_thread(void *arg)
> +{
> +	struct test_membarrier_thread_args *args =
> +		(struct test_membarrier_thread_args *)arg;
> +	const int iters = 10 * 1000 * 1000;
> +	int i;
> +
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	/* Wait for initialization. */
> +	while (!atomic_load(&args->percpu_list_ptr)) {}
> +
> +	for (i = 0; i < iters; ++i) {
> +		int ret;
> +
> +		do {
> +			int cpu = rseq_cpu_start();
> +#if defined(__x86_64__)
> +			/* For x86_64, we have rseq_offset_deref_addv. */
> +			ret = rseq_offset_deref_addv(&args->percpu_list_ptr,
> +							128 * cpu, 1, cpu);
> +#else
> +			/*
> +			 *  For other architectures, we rely on the fact that
> +			 *  the manager thread keeps list_ptr alive, so we can
> +			 *  use rseq_cmpeqv_cmpeqv_storev to make sure
> +			 *  list_ptr we got outside of rseq cs is still
> +			 *  "active".
> +			 */
> +			struct percpu_list *list_ptr = (struct percpu_list *)
> +					atomic_load(&args->percpu_list_ptr);
> +
> +			struct percpu_list_node *node = list_ptr->c[cpu].head;
> +			const intptr_t prev = node->data;
> +
> +			ret = rseq_cmpeqv_cmpeqv_storev(&node->data, prev,
> +					&args->percpu_list_ptr,
> +					(intptr_t)list_ptr, prev + 1, cpu);
> +#endif

Please don't special-case the implementation of a test per architecture.

We should instead "skip" (or even fail) the test on architectures that do
not support this, as an incentive for architecture maintainers to implement
the missing APIs in the test.

One way to do this would be to define RSEQ_ARCH_HAS_OFFSET_DEREF_ADDV in the
architecture header, and skip the test if the define is not present.
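
E.g. something like this (sketch):

	/* In rseq-x86.h, next to rseq_offset_deref_addv(): */
	#define RSEQ_ARCH_HAS_OFFSET_DEREF_ADDV

	/* At the top of the test function: */
	#ifndef RSEQ_ARCH_HAS_OFFSET_DEREF_ADDV
		fprintf(stderr, "rseq_offset_deref_addv unimplemented, skipping membarrier test.\n");
		return;
	#endif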

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Oskolkov @ 2020-08-20 17:42 UTC
  To: Mathieu Desnoyers
  Cc: linux-arch, Peter Zijlstra, Peter Oskolkov, paulmck, Boqun Feng,
	linux-kernel, Paul Turner, Chris Kennelly

On Wed, Aug 12, 2020 at 12:44 PM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
[...]
>
> > One way of doing what you suggest is to allow some commands to be bitwise-ORed.
> >
> > So, for example, the user could call
> >
> > membarrier(CMD_PRIVATE_EXPEDITED_SYNC_CORE | CMD_PRIVATE_EXPEDITED_RSEQ, cpu_id)
> >
> > Is this what you have in mind?
>
> Not really. This would not take care of the fact that we would end up multiplying
> the number of commands as we allow combinations. E.g. if we ever want to have RSEQ
> work in private and global, and in non-expedited and expedited, we end up needing:
>
> - CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ
> - CMD_PRIVATE_EXPEDITED_RSEQ
> - CMD_PRIVATE_RSEQ
> - CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ
> - CMD_GLOBAL_EXPEDITED_RSEQ
> - CMD_GLOBAL_RSEQ
>
> The only thing we would save by OR'ing it with the SYNC_CORE command is the additional
> list:
>
> - CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_SYNC_CORE
> - CMD_PRIVATE_EXPEDITED_RSEQ_SYNC_CORE
> - CMD_PRIVATE_RSEQ_SYNC_CORE
> - CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ_SYNC_CORE
> - CMD_GLOBAL_EXPEDITED_RSEQ_SYNC_CORE
> - CMD_GLOBAL_RSEQ_SYNC_CORE
>
> But unless we receive feedback that doing a membarrier with RSEQ+sync_core all in
> one go is a significant use-case, I am tempted to leave out that scenario for now.
> If we go for new commands, this means we could add (for private-expedited-rseq):
>
> - MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ,
> - MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
>
> I do however have use-cases for using RSEQ across shared memory (between
> processes). Not currently for a rseq-fence, but for rseq acting as per-cpu
> atomic operations. If I ever end up needing rseq-fence across shared memory,
> that would result in the following new commands:
>
> - MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ,
> - MEMBARRIER_CMD_GLOBAL_EXPEDITED_RSEQ,
>
> The remaining open question is whether it would be OK to define a new
> membarrier flag=MEMBARRIER_FLAG_CPU, which would expect an additional
> @cpu parameter.

Hi Mathieu,

I do not think there is any reason to wait for additional feedback, so I believe
we should finalize the API/ABI.

I see two issues to resolve:
1: how to combine commands:
  - you do not like adding new commands that are combinations of existing ones;
  - you do not like ORing.
=> I'm not sure what other options we have here?

2: should @flags be repurposed for cpu_id, or MEMBARRIER_FLAG_CPU
   added with a new syscall parameter.
=> I'm still not sure a new parameter can be cleanly added, but I can try
   it in the next patchset if you prefer it this way.

Please let me know your thoughts.

Thanks,
Peter


* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Mathieu Desnoyers @ 2020-08-25 16:58 UTC
  To: Peter Oskolkov
  Cc: linux-arch, Peter Zijlstra, Peter Oskolkov, paulmck, Boqun Feng,
	linux-kernel, Paul Turner, Chris Kennelly

----- On Aug 20, 2020, at 1:42 PM, Peter Oskolkov posk@posk.io wrote:

> On Wed, Aug 12, 2020 at 12:44 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
> [...]
>>
>> > One way of doing what you suggest is to allow some commands to be bitwise-ORed.
>> >
>> > So, for example, the user could call
>> >
>> > membarrier(CMD_PRIVATE_EXPEDITED_SYNC_CORE | CMD_PRIVATE_EXPEDITED_RSEQ, cpu_id)
>> >
>> > Is this what you have in mind?
>>
>> Not really. This would not take care of the fact that we would end up
>> multiplying
>> the number of commands as we allow combinations. E.g. if we ever want to have
>> RSEQ
>> work in private and global, and in non-expedited and expedited, we end up
>> needing:
>>
>> - CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ
>> - CMD_PRIVATE_EXPEDITED_RSEQ
>> - CMD_PRIVATE_RSEQ
>> - CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ
>> - CMD_GLOBAL_EXPEDITED_RSEQ
>> - CMD_GLOBAL_RSEQ
>>
>> The only thing we would save by OR'ing it with the SYNC_CORE command is the
>> additional
>> list:
>>
>> - CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_SYNC_CORE
>> - CMD_PRIVATE_EXPEDITED_RSEQ_SYNC_CORE
>> - CMD_PRIVATE_RSEQ_SYNC_CORE
>> - CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ_SYNC_CORE
>> - CMD_GLOBAL_EXPEDITED_RSEQ_SYNC_CORE
>> - CMD_GLOBAL_RSEQ_SYNC_CORE
>>
>> But unless we receive feedback that doing a membarrier with RSEQ+sync_core all
>> in
>> one go is a significant use-case, I am tempted to leave out that scenario for
>> now.
>> If we go for new commands, this means we could add (for private-expedited-rseq):
>>
>> - MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ,
>> - MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
>>
>> I do however have use-cases for using RSEQ across shared memory (between
>> processes). Not currently for a rseq-fence, but for rseq acting as per-cpu
>> atomic operations. If I ever end up needing rseq-fence across shared memory,
>> that would result in the following new commands:
>>
>> - MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED_RSEQ,
>> - MEMBARRIER_CMD_GLOBAL_EXPEDITED_RSEQ,
>>
>> The remaining open question is whether it would be OK to define a new
>> membarrier flag=MEMBARRIER_FLAG_CPU, which would expect an additional
>> @cpu parameter.
> 
> Hi  Mathieu,
> 
> I do not think there is any reason to wait for additional feedback, so I believe
> we should finalize the API/ABI.
> 
> I see two issues to resolve:
> 1: how to combine commands:
>  - you do not like adding new commands that are combinations of existing ones;
>  - you do not like ORing.
> => I'm not sure what other options we have here?

Concretely speaking, let's just add a new membarrier command for the use-case
at hand. All other ways of doing things we have discussed are tricky to expose
in a way that is discoverable by user-space through the QUERY command
(using a flag, or OR'ing many commands together).

> 
> 2: should @flags be repurposed for cpu_id, or MEMBARRIER_FLAG_CPU
>   added with a new syscall parameter.
> => I'm still not sure a new parameter can be cleanly added, but I can try
>   it in the next patchset if you prefer it this way.

Yes please, it's easy to implement and we'll quickly see if anyone yells. If
it turns out to be a bad idea, you can always blame me. ;-)

In summary:

- We add 2 new membarrier commands:
  - MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ
  - MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ

- We reserve a membarrier flag:

enum membarrier_flag {
  MEMBARRIER_FLAG_CPU = (1 << 0),
};

So for CMD_PRIVATE_EXPEDITED_RSEQ, if flags & MEMBARRIER_FLAG_CPU is true,
then we expect the additional "int cpu" parameter (3rd parameter). Else the cpu
parameter is unused.
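
For illustration, restarting rseq critical sections on a single CPU
would then look like this (sketch of the proposed 3-argument form):

	syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
		MEMBARRIER_FLAG_CPU, cpu);

and a process-wide rseq fence would pass flags == 0 and leave the cpu
argument unused.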

Are you OK with this approach ?

Thanks,

Mathieu

> 
> Please let me know your thoughts.
> 
> Thanks,
> Peter

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 1/2 v3] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
From: Peter Oskolkov @ 2020-08-25 17:22 UTC
  To: Mathieu Desnoyers
  Cc: linux-arch, Peter Zijlstra, Peter Oskolkov, paulmck, Boqun Feng,
	linux-kernel, Paul Turner, Chris Kennelly

On Tue, Aug 25, 2020 at 9:58 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
[...]
>
> Concretely speaking, let's just add a new membarrier command for the use-case
> at hand. All other ways of doing things we have discussed are tricky to expose
> in a way that is discoverable by user-space through the QUERY command. (using
> a flag, or OR'ing many commands together)
>
> >
> > 2: should @flags be repurposed for cpu_id, or MEMBARRIER_FLAG_CPU
> >   added with a new syscall parameter.
> > => I'm still not sure a new parameter can be cleanly added, but I can try
> >   it in the next patchset if you prefer it this way.
>
> Yes please, it's easy to implement and we'll quickly see if anyone yells. If
> it turns out to be a bad idea, you can always blame me. ;-)
>
> In summary:
>
> - We add 2 new membarrier commands:
>   - MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ
>   - MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
>
> - We reserve a membarrier flag:
>
> enum membarrier_flag {
>   MEMBARRIER_FLAG_CPU = (1 << 0),
> }
>
> So for CMD_PRIVATE_EXPEDITED_RSEQ, if flags & MEMBARRIER_FLAG_CPU is true,
> then we expect the additional "int cpu" parameter (3rd parameter). Else the cpu
> parameter is unused.
>
> Are you OK with this approach ?

Yes, thanks for looking into this. I'll send a v4 later this week.

Thanks,
Peter

[...]

