* [PATCH 1/2 v2] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Peter Oskolkov @ 2020-08-06 17:05 UTC
  To: Mathieu Desnoyers, Paul E . McKenney, Peter Zijlstra, Boqun Feng,
	linux-kernel
  Cc: Paul Turner, Chris Kennelly, Peter Oskolkov, Peter Oskolkov

This patchset is based on Google-internal RSEQ
work done by Paul Turner and Andrew Hunter.

When working with per-CPU RSEQ-based memory allocations,
it is sometimes important to make sure that a global
memory location is no longer accessed from RSEQ critical
sections. For example, there can be two per-CPU lists,
one "active" and accessed per-CPU, while the other is
inactive and worked on asynchronously "off CPU" (e.g.
having garbage collection performed on it). At some
point the two lists are swapped, and a fast RCU-like
mechanism is required to make sure that the previously
active list is no longer accessed.

This patch introduces such a mechanism: in short, the
membarrier() syscall issues an IPI to a CPU, restarting
a potentially active RSEQ critical section on that CPU.
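
For illustration, here is a minimal userspace sketch of the intended
swap-and-quiesce pattern. It is not part of the patch: the names
swap_and_quiesce, active_list and nr_cpus are made up, and the
quiescence guarantee assumes, as the selftest in patch 2/2 does, that
workers re-validate the list pointer inside their critical section
before committing:

#include <stdatomic.h>
#include <syscall.h>
#include <unistd.h>

#define MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU	(1 << 7)

struct percpu_list;	/* per-CPU data accessed from rseq critical sections */

static _Atomic(struct percpu_list *) active_list;

static void swap_and_quiesce(struct percpu_list *new_list, int nr_cpus)
{
	int cpu;

	/* Publish the new list; workers pick it up on their next re-read. */
	atomic_store(&active_list, new_list);

	/*
	 * Restart any rseq critical section that may still be using the
	 * old list; a restarted worker must re-read active_list before
	 * its next commit can succeed.
	 */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		syscall(__NR_membarrier,
			MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu);

	/* The old list can now be worked on "off CPU". */
}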

v1->v2:
  - removed the ability to IPI all CPUs in a single syscall;
  - use task->mm rather than task->group_leader to identify
    tasks belonging to the same process.

The second patch in the patchset adds a selftest
of this feature.

Signed-off-by: Peter Oskolkov <posk@google.com>
---
 include/uapi/linux/membarrier.h |  5 ++++
 kernel/sched/membarrier.c       | 43 +++++++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 5891d7614c8c..ce4628ea17fa 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -114,6 +114,10 @@
  *                          If this command is not implemented by an
  *                          architecture, -EINVAL is returned.
  *                          Returns 0 on success.
+ * @MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU:
+ *                          If a thread belonging to the current process
+ *                          is currently in an RSEQ critical section on the
+ *                          CPU identified by the flags parameter, restart it.
  * @MEMBARRIER_CMD_SHARED:
  *                          Alias to MEMBARRIER_CMD_GLOBAL. Provided for
  *                          header backward compatibility.
@@ -131,6 +135,7 @@ enum membarrier_cmd {
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED		= (1 << 4),
 	MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE		= (1 << 5),
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE	= (1 << 6),
+	MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU		= (1 << 7),
 
 	/* Alias for header backward compatibility. */
 	MEMBARRIER_CMD_SHARED			= MEMBARRIER_CMD_GLOBAL,
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 168479a7d61b..1cdfde23696c 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -18,11 +18,19 @@
 #define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK	0
 #endif
 
+#ifdef CONFIG_RSEQ
+#define MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU_BITMASK		\
+	MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
+#else
+#define MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU_BITMASK	0
+#endif
+
 #define MEMBARRIER_CMD_BITMASK						\
 	(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED	\
 	| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED			\
 	| MEMBARRIER_CMD_PRIVATE_EXPEDITED				\
 	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED			\
+	| MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU_BITMASK		\
 	| MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK)
 
 static void ipi_mb(void *info)
@@ -308,10 +316,39 @@ static int membarrier_register_private_expedited(int flags)
 	return 0;
 }
 
+#ifdef CONFIG_RSEQ
+static void membarrier_rseq_ipi(void *arg)
+{
+	if (current->mm != arg)  /* Not our process. */
+		return;
+	if (!current->rseq)  /* RSEQ not set up for the current task/thread. */
+		return;
+
+	rseq_preempt(current);
+}
+#endif
+
+static int membarrier_private_restart_rseq_on_cpu(int cpu_id)
+{
+#ifdef CONFIG_RSEQ
+	/* syscalls are not allowed inside rseq critical sections. */
+	if (cpu_id == raw_smp_processor_id())
+		return 0;
+
+	return smp_call_function_single(cpu_id, membarrier_rseq_ipi,
+					current->mm, true);
+#else
+	return 0;
+#endif
+}
+
 /**
  * sys_membarrier - issue memory barriers on a set of threads
  * @cmd:   Takes command values defined in enum membarrier_cmd.
- * @flags: Currently needs to be 0. For future extensions.
+ * @flags: Currently needs to be 0 for all commands other than
+ *         MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU: in the latter
+ *         case it indicates the CPU on which to interrupt (= restart)
+ *         the RSEQ critical section.
  *
  * If this system call is not implemented, -ENOSYS is returned. If the
  * command specified does not exist, not available on the running
@@ -339,7 +376,7 @@ static int membarrier_register_private_expedited(int flags)
  */
 SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 {
-	if (unlikely(flags))
+	if (unlikely(flags && cmd != MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU))
 		return -EINVAL;
 	switch (cmd) {
 	case MEMBARRIER_CMD_QUERY:
@@ -369,6 +406,8 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
 		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+	case MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU:
+		return membarrier_private_restart_rseq_on_cpu(flags);
 	default:
 		return -EINVAL;
 	}
-- 
2.28.0.163.g6104cc2f0b6-goog


* [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Peter Oskolkov @ 2020-08-06 17:05 UTC
  To: Mathieu Desnoyers, Paul E . McKenney, Peter Zijlstra, Boqun Feng,
	linux-kernel
  Cc: Paul Turner, Chris Kennelly, Peter Oskolkov, Peter Oskolkov

Based on Google-internal RSEQ work done by
Paul Turner and Andrew Hunter.

This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU.
The test quite often fails without the previous patch in this patchset,
but consistently passes with it.

Signed-off-by: Peter Oskolkov <posk@google.com>
---
 .../selftests/rseq/basic_percpu_ops_test.c    | 181 ++++++++++++++++++
 1 file changed, 181 insertions(+)

diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
index eb3f6db36d36..147c80deac19 100644
--- a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
+++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
@@ -3,16 +3,21 @@
 #include <assert.h>
 #include <pthread.h>
 #include <sched.h>
+#include <stdatomic.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <stddef.h>
+#include <syscall.h>
+#include <unistd.h>
 
 #include "rseq.h"
 
 #define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
 
+#define MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU	(1<<7)
+
 struct percpu_lock_entry {
 	intptr_t v;
 } __attribute__((aligned(128)));
@@ -289,6 +294,180 @@ void test_percpu_list(void)
 	assert(sum == expected_sum);
 }
 
+struct test_membarrier_thread_args {
+	int stop;
+	intptr_t percpu_list_ptr;
+};
+
+/* Worker threads modify data in their "active" percpu lists. */
+void *test_membarrier_worker_thread(void *arg)
+{
+	struct test_membarrier_thread_args *args =
+		(struct test_membarrier_thread_args *)arg;
+	const int iters = 10 * 1000 * 1000;
+	int i;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	for (i = 0; i < iters; ++i) {
+		while (true) {
+			int cpu, ret;
+			struct percpu_list *list_ptr = (struct percpu_list *)
+				atomic_load(&args->percpu_list_ptr);
+
+			if (!list_ptr)
+				continue;  /* Not yet initialized. */
+
+			cpu = rseq_cpu_start();
+			struct percpu_list_node *node = list_ptr->c[cpu].head;
+			const intptr_t prev = node->data;
+
+			ret = rseq_cmpeqv_cmpeqv_storev(&node->data, prev,
+					&args->percpu_list_ptr,
+					(intptr_t)list_ptr, prev + 1, cpu);
+			if (!ret)
+				break;  /* Success. */
+		}
+	}
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	return NULL;
+}
+
+void test_membarrier_init_percpu_list(struct percpu_list *list)
+{
+	int i;
+
+	memset(list, 0, sizeof(*list));
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		struct percpu_list_node *node;
+
+		node = malloc(sizeof(*node));
+		assert(node);
+		node->data = 0;
+		node->next = NULL;
+		list->c[i].head = node;
+	}
+}
+
+void test_membarrier_free_percpu_list(struct percpu_list *list)
+{
+	int i;
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		free(list->c[i].head);
+}
+
+static int sys_membarrier(int cmd, int flags)
+{
+	return syscall(__NR_membarrier, cmd, flags);
+}
+
+/*
+ * The manager thread swaps per-cpu lists that worker threads see,
+ * and validates that there are no unexpected modifications.
+ */
+void *test_membarrier_manager_thread(void *arg)
+{
+	struct test_membarrier_thread_args *args =
+		(struct test_membarrier_thread_args *)arg;
+	struct percpu_list list_a, list_b;
+	intptr_t expect_a = 0, expect_b = 0;
+	int cpu_a = 0, cpu_b = 0;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+
+	/* Init lists. */
+	test_membarrier_init_percpu_list(&list_a);
+	test_membarrier_init_percpu_list(&list_b);
+
+	atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
+
+	while (!atomic_load(&args->stop)) {
+		/* list_a is "active". */
+		cpu_a = rand() % CPU_SETSIZE;
+		/*
+		 * As list_b is "inactive", we should never see changes
+		 * to list_b.
+		 */
+		if (expect_b != atomic_load(&list_b.c[cpu_b].head->data)) {
+			fprintf(stderr, "Membarrier test failed\n");
+			abort();
+		}
+
+		/* Make list_b "active". */
+		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_b);
+		sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_a);
+		/*
+		 * CPU A should now only modify list_b, so the values
+		 * in list_a should be stable.
+		 */
+		expect_a = atomic_load(&list_a.c[cpu_a].head->data);
+
+		cpu_b = rand() % CPU_SETSIZE;
+		/*
+		 * As list_a is "inactive", we should never see changes
+		 * to list_a.
+		 */
+		if (expect_a != atomic_load(&list_a.c[cpu_a].head->data)) {
+			fprintf(stderr, "Membarrier test failed\n");
+			abort();
+		}
+
+		/* Make list_a "active". */
+		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
+		sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_b);
+		/* Remember a value from list_b. */
+		expect_b = atomic_load(&list_b.c[cpu_b].head->data);
+	}
+
+	test_membarrier_free_percpu_list(&list_a);
+	test_membarrier_free_percpu_list(&list_b);
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	return NULL;
+}
+
+/* Test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU membarrier command. */
+void test_membarrier(void)
+{
+	struct test_membarrier_thread_args thread_args;
+	pthread_t worker_threads[CPU_SETSIZE];
+	pthread_t manager_thread;
+	int i;
+
+	thread_args.stop = 0;
+	thread_args.percpu_list_ptr = 0;
+	pthread_create(&manager_thread, NULL,
+		       test_membarrier_manager_thread, &thread_args);
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		pthread_create(&worker_threads[i], NULL,
+		       test_membarrier_worker_thread, &thread_args);
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		pthread_join(worker_threads[i], NULL);
+
+	atomic_store(&thread_args.stop, 1);
+	pthread_join(manager_thread, NULL);
+}
+
 int main(int argc, char **argv)
 {
 	if (rseq_register_current_thread()) {
@@ -300,6 +479,8 @@ int main(int argc, char **argv)
 	test_percpu_spinlock();
 	printf("percpu_list\n");
 	test_percpu_list();
+	printf("membarrier\n");
+	test_membarrier();
 	if (rseq_unregister_current_thread()) {
 		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
 			errno, strerror(errno));
-- 
2.28.0.163.g6104cc2f0b6-goog


* Re: [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Boqun Feng @ 2020-08-07  0:27 UTC
  To: Peter Oskolkov
  Cc: Mathieu Desnoyers, Paul E . McKenney, Peter Zijlstra,
	linux-kernel, Paul Turner, Chris Kennelly, Peter Oskolkov

On Thu, Aug 06, 2020 at 10:05:44AM -0700, Peter Oskolkov wrote:
> Based on Google-internal RSEQ work done by
> Paul Turner and Andrew Hunter.
> 
> This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU.
> The test quite often fails without the previous patch in this patchset,
> but consistently passes with it.
> 
> Signed-off-by: Peter Oskolkov <posk@google.com>
> ---
>  .../selftests/rseq/basic_percpu_ops_test.c    | 181 ++++++++++++++++++
>  1 file changed, 181 insertions(+)
> 
> diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> index eb3f6db36d36..147c80deac19 100644
> --- a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> +++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> @@ -3,16 +3,21 @@
>  #include <assert.h>
>  #include <pthread.h>
>  #include <sched.h>
> +#include <stdatomic.h>
>  #include <stdint.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <stddef.h>
> +#include <syscall.h>
> +#include <unistd.h>
>  
>  #include "rseq.h"
>  
>  #define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
>  
> +#define MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU	(1<<7)
> +
>  struct percpu_lock_entry {
>  	intptr_t v;
>  } __attribute__((aligned(128)));
> @@ -289,6 +294,180 @@ void test_percpu_list(void)
>  	assert(sum == expected_sum);
>  }
>  
> +struct test_membarrier_thread_args {
> +	int stop;
> +	intptr_t percpu_list_ptr;
> +};
> +
> +/* Worker threads modify data in their "active" percpu lists. */
> +void *test_membarrier_worker_thread(void *arg)
> +{
> +	struct test_membarrier_thread_args *args =
> +		(struct test_membarrier_thread_args *)arg;
> +	const int iters = 10 * 1000 * 1000;
> +	int i;
> +
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	for (i = 0; i < iters; ++i) {
> +		while (true) {
> +			int cpu, ret;
> +			struct percpu_list *list_ptr = (struct percpu_list *)
> +				atomic_load(&args->percpu_list_ptr);
> +

What if the manager thread updates ->percpu_list_ptr and calls
membarrier() here? I.e.

	CPU0			CPU1
				list_ptr = atomic_load(&args->percpu_list_ptr); // read list_b
	
	atomic_store(&args->percpu_list_ptr, list_a);
	sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, 1); // send ipi to restart rseq.cs on CPU1

				<got IPI, but not in a rseq.cs, so nothing to do>
				cpu = rseq_cpu_start(); // start a rseq.cs and accessing list_b!

The thing is, atomic_load() is a reference to ->percpu_list_ptr, which
is outside the rseq.cs; simply restarting rseq doesn't kill this
reference.

Am I missing something subtle?

Regards,
Boqun

> +			if (!list_ptr)
> +				continue;  /* Not yet initialized. */
> +
> +			cpu = rseq_cpu_start();
> +			struct percpu_list_node *node = list_ptr->c[cpu].head;
> +			const intptr_t prev = node->data;
> +
> +			ret = rseq_cmpeqv_cmpeqv_storev(&node->data, prev,
> +					&args->percpu_list_ptr,
> +					(intptr_t)list_ptr, prev + 1, cpu);
> +			if (!ret)
> +				break;  /* Success. */
> +		}
> +	}
> +
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +	return NULL;
> +}
> +
> +void test_membarrier_init_percpu_list(struct percpu_list *list)
> +{
> +	int i;
> +
> +	memset(list, 0, sizeof(*list));
> +	for (i = 0; i < CPU_SETSIZE; i++) {
> +		struct percpu_list_node *node;
> +
> +		node = malloc(sizeof(*node));
> +		assert(node);
> +		node->data = 0;
> +		node->next = NULL;
> +		list->c[i].head = node;
> +	}
> +}
> +
> +void test_membarrier_free_percpu_list(struct percpu_list *list)
> +{
> +	int i;
> +
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		free(list->c[i].head);
> +}
> +
> +static int sys_membarrier(int cmd, int flags)
> +{
> +	return syscall(__NR_membarrier, cmd, flags);
> +}
> +
> +/*
> + * The manager thread swaps per-cpu lists that worker threads see,
> + * and validates that there are no unexpected modifications.
> + */
> +void *test_membarrier_manager_thread(void *arg)
> +{
> +	struct test_membarrier_thread_args *args =
> +		(struct test_membarrier_thread_args *)arg;
> +	struct percpu_list list_a, list_b;
> +	intptr_t expect_a = 0, expect_b = 0;
> +	int cpu_a = 0, cpu_b = 0;
> +
> +	if (rseq_register_current_thread()) {
> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +
> +	/* Init lists. */
> +	test_membarrier_init_percpu_list(&list_a);
> +	test_membarrier_init_percpu_list(&list_b);
> +
> +	atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
> +
> +	while (!atomic_load(&args->stop)) {
> +		/* list_a is "active". */
> +		cpu_a = rand() % CPU_SETSIZE;
> +		/*
> +		 * As list_b is "inactive", we should never see changes
> +		 * to list_b.
> +		 */
> +		if (expect_b != atomic_load(&list_b.c[cpu_b].head->data)) {
> +			fprintf(stderr, "Membarrier test failed\n");
> +			abort();
> +		}
> +
> +		/* Make list_b "active". */
> +		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_b);
> +		sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_a);
> +		/*
> +		 * CPU A should now only modify list_b, so the values
> +		 * in list_a should be stable.
> +		 */
> +		expect_a = atomic_load(&list_a.c[cpu_a].head->data);
> +
> +		cpu_b = rand() % CPU_SETSIZE;
> +		/*
> +		 * As list_a is "inactive", we should never see changes
> +		 * to list_a.
> +		 */
> +		if (expect_a != atomic_load(&list_a.c[cpu_a].head->data)) {
> +			fprintf(stderr, "Membarrier test failed\n");
> +			abort();
> +		}
> +
> +		/* Make list_a "active". */
> +		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
> +		sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_b);
> +		/* Remember a value from list_b. */
> +		expect_b = atomic_load(&list_b.c[cpu_b].head->data);
> +	}
> +
> +	test_membarrier_free_percpu_list(&list_a);
> +	test_membarrier_free_percpu_list(&list_b);
> +
> +	if (rseq_unregister_current_thread()) {
> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> +			errno, strerror(errno));
> +		abort();
> +	}
> +	return NULL;
> +}
> +
> +/* Test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU membarrier command. */
> +void test_membarrier(void)
> +{
> +	struct test_membarrier_thread_args thread_args;
> +	pthread_t worker_threads[CPU_SETSIZE];
> +	pthread_t manager_thread;
> +	int i;
> +
> +	thread_args.stop = 0;
> +	thread_args.percpu_list_ptr = 0;
> +	pthread_create(&manager_thread, NULL,
> +		       test_membarrier_manager_thread, &thread_args);
> +
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		pthread_create(&worker_threads[i], NULL,
> +		       test_membarrier_worker_thread, &thread_args);
> +
> +	for (i = 0; i < CPU_SETSIZE; i++)
> +		pthread_join(worker_threads[i], NULL);
> +
> +	atomic_store(&thread_args.stop, 1);
> +	pthread_join(manager_thread, NULL);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	if (rseq_register_current_thread()) {
> @@ -300,6 +479,8 @@ int main(int argc, char **argv)
>  	test_percpu_spinlock();
>  	printf("percpu_list\n");
>  	test_percpu_list();
> +	printf("membarrier\n");
> +	test_membarrier();
>  	if (rseq_unregister_current_thread()) {
>  		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
>  			errno, strerror(errno));
> -- 
> 2.28.0.163.g6104cc2f0b6-goog
> 

* Re: [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Mathieu Desnoyers @ 2020-08-07 12:54 UTC
  To: Boqun Feng
  Cc: Peter Oskolkov, paulmck, Peter Zijlstra, linux-kernel,
	Paul Turner, Chris Kennelly, Peter Oskolkov

----- On Aug 6, 2020, at 8:27 PM, Boqun Feng boqun.feng@gmail.com wrote:

> On Thu, Aug 06, 2020 at 10:05:44AM -0700, Peter Oskolkov wrote:
>> Based on Google-internal RSEQ work done by
>> Paul Turner and Andrew Hunter.
>> 
>> This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU.
>> The test quite often fails without the previous patch in this patchset,
>> but consistently passes with it.
>> 
>> Signed-off-by: Peter Oskolkov <posk@google.com>
>> ---
>>  .../selftests/rseq/basic_percpu_ops_test.c    | 181 ++++++++++++++++++
>>  1 file changed, 181 insertions(+)
>> 
>> diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
>> b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
>> index eb3f6db36d36..147c80deac19 100644
>> --- a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
>> +++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
>> @@ -3,16 +3,21 @@
>>  #include <assert.h>
>>  #include <pthread.h>
>>  #include <sched.h>
>> +#include <stdatomic.h>
>>  #include <stdint.h>
>>  #include <stdio.h>
>>  #include <stdlib.h>
>>  #include <string.h>
>>  #include <stddef.h>
>> +#include <syscall.h>
>> +#include <unistd.h>
>>  
>>  #include "rseq.h"
>>  
>>  #define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))
>>  
>> +#define MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU	(1<<7)
>> +
>>  struct percpu_lock_entry {
>>  	intptr_t v;
>>  } __attribute__((aligned(128)));
>> @@ -289,6 +294,180 @@ void test_percpu_list(void)
>>  	assert(sum == expected_sum);
>>  }
>>  
>> +struct test_membarrier_thread_args {
>> +	int stop;
>> +	intptr_t percpu_list_ptr;
>> +};
>> +
>> +/* Worker threads modify data in their "active" percpu lists. */
>> +void *test_membarrier_worker_thread(void *arg)
>> +{
>> +	struct test_membarrier_thread_args *args =
>> +		(struct test_membarrier_thread_args *)arg;
>> +	const int iters = 10 * 1000 * 1000;
>> +	int i;
>> +
>> +	if (rseq_register_current_thread()) {
>> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
>> +			errno, strerror(errno));
>> +		abort();
>> +	}
>> +
>> +	for (i = 0; i < iters; ++i) {
>> +		while (true) {
>> +			int cpu, ret;
>> +			struct percpu_list *list_ptr = (struct percpu_list *)
>> +				atomic_load(&args->percpu_list_ptr);
>> +
> 
> What if the manager thread updates ->percpu_list_ptr and calls
> membarrier() here? I.e.
> 
>	CPU0			CPU1
>				list_ptr = atomic_load(&args->percpu_list_ptr); // read list_b
>	
>	atomic_store(&args->percpu_list_ptr, list_a);
>	sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, 1); // send ipi to
>	restart rseq.cs on CPU1
> 
>				<got IPI, but not in a rseq.cs, so nothing to do>
>				cpu = rseq_cpu_start(); // start a rseq.cs and accessing list_b!
> 
> The thing is, atomic_load() is a reference to ->percpu_list_ptr, which
> is outside the rseq.cs; simply restarting rseq doesn't kill this
> reference.
> 
> Am I missing something subtle?

I'm with you on this; something looks fishy. It would be good to use
delay-inducing testing methods like rseq parametrized selftests to
increase the odds of hitting this race more reliably.

Thanks,

Mathieu

> 
> Regards,
> Boqun
> 
>> +			if (!list_ptr)
>> +				continue;  /* Not yet initialized. */
>> +
>> +			cpu = rseq_cpu_start();
>> +			struct percpu_list_node *node = list_ptr->c[cpu].head;
>> +			const intptr_t prev = node->data;
>> +
>> +			ret = rseq_cmpeqv_cmpeqv_storev(&node->data, prev,
>> +					&args->percpu_list_ptr,
>> +					(intptr_t)list_ptr, prev + 1, cpu);
>> +			if (!ret)
>> +				break;  /* Success. */
>> +		}
>> +	}
>> +
>> +	if (rseq_unregister_current_thread()) {
>> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d):
>> %s\n",
>> +			errno, strerror(errno));
>> +		abort();
>> +	}
>> +	return NULL;
>> +}
>> +
>> +void test_membarrier_init_percpu_list(struct percpu_list *list)
>> +{
>> +	int i;
>> +
>> +	memset(list, 0, sizeof(*list));
>> +	for (i = 0; i < CPU_SETSIZE; i++) {
>> +		struct percpu_list_node *node;
>> +
>> +		node = malloc(sizeof(*node));
>> +		assert(node);
>> +		node->data = 0;
>> +		node->next = NULL;
>> +		list->c[i].head = node;
>> +	}
>> +}
>> +
>> +void test_membarrier_free_percpu_list(struct percpu_list *list)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < CPU_SETSIZE; i++)
>> +		free(list->c[i].head);
>> +}
>> +
>> +static int sys_membarrier(int cmd, int flags)
>> +{
>> +	return syscall(__NR_membarrier, cmd, flags);
>> +}
>> +
>> +/*
>> + * The manager thread swaps per-cpu lists that worker threads see,
>> + * and validates that there are no unexpected modifications.
>> + */
>> +void *test_membarrier_manager_thread(void *arg)
>> +{
>> +	struct test_membarrier_thread_args *args =
>> +		(struct test_membarrier_thread_args *)arg;
>> +	struct percpu_list list_a, list_b;
>> +	intptr_t expect_a = 0, expect_b = 0;
>> +	int cpu_a = 0, cpu_b = 0;
>> +
>> +	if (rseq_register_current_thread()) {
>> +		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
>> +			errno, strerror(errno));
>> +		abort();
>> +	}
>> +
>> +	/* Init lists. */
>> +	test_membarrier_init_percpu_list(&list_a);
>> +	test_membarrier_init_percpu_list(&list_b);
>> +
>> +	atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
>> +
>> +	while (!atomic_load(&args->stop)) {
>> +		/* list_a is "active". */
>> +		cpu_a = rand() % CPU_SETSIZE;
>> +		/*
>> +		 * As list_b is "inactive", we should never see changes
>> +		 * to list_b.
>> +		 */
>> +		if (expect_b != atomic_load(&list_b.c[cpu_b].head->data)) {
>> +			fprintf(stderr, "Membarrier test failed\n");
>> +			abort();
>> +		}
>> +
>> +		/* Make list_b "active". */
>> +		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_b);
>> +		sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_a);
>> +		/*
>> +		 * CPU A should now only modify list_b, so the values
>> +		 * in list_a should be stable.
>> +		 */
>> +		expect_a = atomic_load(&list_a.c[cpu_a].head->data);
>> +
>> +		cpu_b = rand() % CPU_SETSIZE;
>> +		/*
>> +		 * As list_a is "inactive", we should never see changes
>> +		 * to list_a.
>> +		 */
>> +		if (expect_a != atomic_load(&list_a.c[cpu_a].head->data)) {
>> +			fprintf(stderr, "Membarrier test failed\n");
>> +			abort();
>> +		}
>> +
>> +		/* Make list_a "active". */
>> +		atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
>> +		sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_b);
>> +		/* Remember a value from list_b. */
>> +		expect_b = atomic_load(&list_b.c[cpu_b].head->data);
>> +	}
>> +
>> +	test_membarrier_free_percpu_list(&list_a);
>> +	test_membarrier_free_percpu_list(&list_b);
>> +
>> +	if (rseq_unregister_current_thread()) {
>> +		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d):
>> %s\n",
>> +			errno, strerror(errno));
>> +		abort();
>> +	}
>> +	return NULL;
>> +}
>> +
>> +/* Test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU membarrier command. */
>> +void test_membarrier(void)
>> +{
>> +	struct test_membarrier_thread_args thread_args;
>> +	pthread_t worker_threads[CPU_SETSIZE];
>> +	pthread_t manager_thread;
>> +	int i;
>> +
>> +	thread_args.stop = 0;
>> +	thread_args.percpu_list_ptr = 0;
>> +	pthread_create(&manager_thread, NULL,
>> +		       test_membarrier_manager_thread, &thread_args);
>> +
>> +	for (i = 0; i < CPU_SETSIZE; i++)
>> +		pthread_create(&worker_threads[i], NULL,
>> +		       test_membarrier_worker_thread, &thread_args);
>> +
>> +	for (i = 0; i < CPU_SETSIZE; i++)
>> +		pthread_join(worker_threads[i], NULL);
>> +
>> +	atomic_store(&thread_args.stop, 1);
>> +	pthread_join(manager_thread, NULL);
>> +}
>> +
>>  int main(int argc, char **argv)
>>  {
>>  	if (rseq_register_current_thread()) {
>> @@ -300,6 +479,8 @@ int main(int argc, char **argv)
>>  	test_percpu_spinlock();
>>  	printf("percpu_list\n");
>>  	test_percpu_list();
>> +	printf("membarrier\n");
>> +	test_membarrier();
>>  	if (rseq_unregister_current_thread()) {
>>  		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
>>  			errno, strerror(errno));
>> --
>> 2.28.0.163.g6104cc2f0b6-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [PATCH 1/2 v2] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: peterz @ 2020-08-07 13:37 UTC
  To: Peter Oskolkov
  Cc: Mathieu Desnoyers, Paul E . McKenney, Boqun Feng, linux-kernel,
	Paul Turner, Chris Kennelly, Peter Oskolkov

On Thu, Aug 06, 2020 at 10:05:43AM -0700, Peter Oskolkov wrote:
> +#ifdef CONFIG_RSEQ
> +static void membarrier_rseq_ipi(void *arg)
> +{
> +	if (current->mm != arg)  /* Not our process. */
> +		return;
> +	if (!current->rseq)  /* RSEQ not set up for the current task/thread. */
> +		return;
> +
> +	rseq_preempt(current);
> +}
> +#endif
> +
> +static int membarrier_private_restart_rseq_on_cpu(int cpu_id)
> +{
> +#ifdef CONFIG_RSEQ
> +	/* syscalls are not allowed inside rseq critical sections. */
> +	if (cpu_id == raw_smp_processor_id())
> +		return 0;
> +
> +	return smp_call_function_single(cpu_id, membarrier_rseq_ipi,
> +					current->mm, true);
> +#else
> +	return 0;
> +#endif
> +}

I'm thinking even this is a problem: we can end up sending IPIs to CPUs
outside our partition (they might be NOHZ_FULL) and that's a no-no too.

Something like so perhaps... that really limits it to CPUs that match
our mm.

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 6be66f52a2ad..bee5e98e6774 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -356,6 +356,7 @@ enum {
 
 enum {
 	MEMBARRIER_FLAG_SYNC_CORE	= (1U << 0),
+	MEMBARRIER_FLAG_RSEQ		= (1U << 1),
 };
 
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 168479a7d61b..4d9b22c2f5e2 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -27,6 +27,11 @@
 
 static void ipi_mb(void *info)
 {
+	int *flags = info;
+
+	if (flags && (*flags & MEMBARRIER_FLAG_RSEQ))
+		rseq_preempt(current);
+
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
@@ -129,11 +134,11 @@ static int membarrier_global_expedited(void)
 	return 0;
 }
 
-static int membarrier_private_expedited(int flags)
+static int membarrier_private_expedited(int flags, int cpu_id)
 {
-	int cpu;
-	cpumask_var_t tmpmask;
 	struct mm_struct *mm = current->mm;
+	cpumask_var_t tmpmask;
+	int cpu;
 
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
 		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
@@ -174,6 +179,10 @@ static int membarrier_private_expedited(int flags)
 		 */
 		if (cpu == raw_smp_processor_id())
 			continue;
+
+		if (cpu_id >= 0 && cpu != cpu_id)
+			continue;
+
 		p = rcu_dereference(cpu_rq(cpu)->curr);
 		if (p && p->mm == mm)
 			__cpumask_set_cpu(cpu, tmpmask);
@@ -181,7 +190,7 @@ static int membarrier_private_expedited(int flags)
 	rcu_read_unlock();
 
 	preempt_disable();
-	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+	smp_call_function_many(tmpmask, ipi_mb, &flags, 1);
 	preempt_enable();
 
 	free_cpumask_var(tmpmask);
@@ -362,11 +371,13 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 	case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
 		return membarrier_register_global_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		return membarrier_private_expedited(0);
+		return membarrier_private_expedited(0, -1);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
 		return membarrier_register_private_expedited(0);
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
-		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE, -1);
+	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+		return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ, flags);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
 		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
 	default:

* Re: [PATCH 1/2 v2] rseq/membarrier: add MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Peter Oskolkov @ 2020-08-07 17:50 UTC
  To: Peter Zijlstra
  Cc: Peter Oskolkov, Mathieu Desnoyers, Paul E . McKenney, Boqun Feng,
	Linux Kernel Mailing List, Paul Turner, Chris Kennelly

On Fri, Aug 7, 2020 at 6:38 AM <peterz@infradead.org> wrote:
>

[...]

> I'm thinking even this is a problem: we can end up sending IPIs to CPUs
> outside our partition (they might be NOHZ_FULL) and that's a no-no too.
>
> Something like so perhaps... that really limits it to CPUs that match
> our mm.

Thanks for the suggestion - I'll prepare a v3 based on your and
Mathieu's feedback.

>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 6be66f52a2ad..bee5e98e6774 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -356,6 +356,7 @@ enum {
>
>  enum {
>         MEMBARRIER_FLAG_SYNC_CORE       = (1U << 0),
> +       MEMBARRIER_FLAG_RSEQ            = (1U << 1),
>  };
>
>  #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 168479a7d61b..4d9b22c2f5e2 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -27,6 +27,11 @@
>
>  static void ipi_mb(void *info)
>  {
> +       int *flags = info;
> +
> +       if (flags && (*flags & MEMBARRIER_FLAG_RSEQ))
> +               rseq_preempt(current);
> +
>         smp_mb();       /* IPIs should be serializing but paranoid. */
>  }
>
> @@ -129,11 +134,11 @@ static int membarrier_global_expedited(void)
>         return 0;
>  }
>
> -static int membarrier_private_expedited(int flags)
> +static int membarrier_private_expedited(int flags, int cpu_id)
>  {
> -       int cpu;
> -       cpumask_var_t tmpmask;
>         struct mm_struct *mm = current->mm;
> +       cpumask_var_t tmpmask;
> +       int cpu;
>
>         if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
>                 if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
> @@ -174,6 +179,10 @@ static int membarrier_private_expedited(int flags)
>                  */
>                 if (cpu == raw_smp_processor_id())
>                         continue;
> +
> +               if (cpu_id >= 0 && cpu != cpu_id)
> +                       continue;
> +
>                 p = rcu_dereference(cpu_rq(cpu)->curr);
>                 if (p && p->mm == mm)
>                         __cpumask_set_cpu(cpu, tmpmask);
> @@ -181,7 +190,7 @@ static int membarrier_private_expedited(int flags)
>         rcu_read_unlock();
>
>         preempt_disable();
> -       smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
> +       smp_call_function_many(tmpmask, ipi_mb, &flags, 1);
>         preempt_enable();
>
>         free_cpumask_var(tmpmask);
> @@ -362,11 +371,13 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>         case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
>                 return membarrier_register_global_expedited();
>         case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
> -               return membarrier_private_expedited(0);
> +               return membarrier_private_expedited(0, -1);
>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
>                 return membarrier_register_private_expedited(0);
>         case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
> -               return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
> +               return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE, -1);
> +       case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
> +               return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ, flags);
>         case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
>                 return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
>         default:

* Re: [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Peter Oskolkov @ 2020-08-07 17:55 UTC
  To: Boqun Feng
  Cc: Peter Oskolkov, Mathieu Desnoyers, Paul E . McKenney,
	Peter Zijlstra, Linux Kernel Mailing List, Paul Turner,
	Chris Kennelly

On Thu, Aug 6, 2020 at 5:27 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Thu, Aug 06, 2020 at 10:05:44AM -0700, Peter Oskolkov wrote:
> > Based on Google-internal RSEQ work done by
> > Paul Turner and Andrew Hunter.
> >
> > This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU.
> > The test quite often fails without the previous patch in this patchset,
> > but consistently passes with it.
> >
> > Signed-off-by: Peter Oskolkov <posk@google.com>
> > ---
> >  .../selftests/rseq/basic_percpu_ops_test.c    | 181 ++++++++++++++++++
> >  1 file changed, 181 insertions(+)
> >
> > diff --git a/tools/testing/selftests/rseq/basic_percpu_ops_test.c b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> > index eb3f6db36d36..147c80deac19 100644
> > --- a/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> > +++ b/tools/testing/selftests/rseq/basic_percpu_ops_test.c
> > @@ -3,16 +3,21 @@
> >  #include <assert.h>
> >  #include <pthread.h>
> >  #include <sched.h>
> > +#include <stdatomic.h>
> >  #include <stdint.h>
> >  #include <stdio.h>
> >  #include <stdlib.h>
> >  #include <string.h>
> >  #include <stddef.h>
> > +#include <syscall.h>
> > +#include <unistd.h>
> >
> >  #include "rseq.h"
> >
> >  #define ARRAY_SIZE(arr)      (sizeof(arr) / sizeof((arr)[0]))
> >
> > +#define MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU   (1<<7)
> > +
> >  struct percpu_lock_entry {
> >       intptr_t v;
> >  } __attribute__((aligned(128)));
> > @@ -289,6 +294,180 @@ void test_percpu_list(void)
> >       assert(sum == expected_sum);
> >  }
> >
> > +struct test_membarrier_thread_args {
> > +     int stop;
> > +     intptr_t percpu_list_ptr;
> > +};
> > +
> > +/* Worker threads modify data in their "active" percpu lists. */
> > +void *test_membarrier_worker_thread(void *arg)
> > +{
> > +     struct test_membarrier_thread_args *args =
> > +             (struct test_membarrier_thread_args *)arg;
> > +     const int iters = 10 * 1000 * 1000;
> > +     int i;
> > +
> > +     if (rseq_register_current_thread()) {
> > +             fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> > +                     errno, strerror(errno));
> > +             abort();
> > +     }
> > +
> > +     for (i = 0; i < iters; ++i) {
> > +             while (true) {
> > +                     int cpu, ret;
> > +                     struct percpu_list *list_ptr = (struct percpu_list *)
> > +                             atomic_load(&args->percpu_list_ptr);
> > +
>
> What if the manager thread updates ->percpu_list_ptr and calls
> membarrier() here? I.e.
>
>         CPU0                    CPU1
>                                 list_ptr = atomic_load(&args->percpu_list_ptr); // read list_b
>
>         atomic_store(&args->percpu_list_ptr, list_a);
>         sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, 1); // send ipi to restart rseq.cs on CPU1
>
>                                 <got IPI, but not in a rseq.cs, so nothing to do>
>                                 cpu = rseq_cpu_start(); // start a rseq.cs and accessing list_b!
>
> The thing is, atomic_load() is a reference to ->percpu_list_ptr, which
> is outside the rseq.cs; simply restarting rseq doesn't kill this
> reference.
>
> Am I missing something subtle?

rseq_cmpeqv_cmpeqv_storev() is used below to make sure the reference is
the one that should be used; if it is no longer "active", the
iteration is restarted.
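
For reference, a hedged C sketch of what that helper checks; the real
rseq_cmpeqv_cmpeqv_storev() is per-arch inline assembly in the
selftests' rseq headers. Conceptually, everything below runs inside a
single rseq critical section pinned to @cpu, and any abort (preemption,
signal delivery, migration, or the restart IPI) also yields a non-zero
return, so the worker loop above only commits when the list pointer it
loaded is still the active one:

static int cmpeqv_cmpeqv_storev_sketch(intptr_t *v, intptr_t expect,
				       intptr_t *v2, intptr_t expect2,
				       intptr_t newv, int cpu)
{
	/* begin rseq.cs pinned to @cpu */
	if (*v != expect || *v2 != expect2)
		return 1;	/* mismatch: caller re-loads the list pointer */
	*v = newv;		/* single-store commit */
	/* end rseq.cs; an abort branches out and returns non-zero */
	return 0;
}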

>
> Regards,
> Boqun
>
> > +                     if (!list_ptr)
> > +                             continue;  /* Not yet initialized. */
> > +
> > +                     cpu = rseq_cpu_start();
> > +                     struct percpu_list_node *node = list_ptr->c[cpu].head;
> > +                     const intptr_t prev = node->data;
> > +
> > +                     ret = rseq_cmpeqv_cmpeqv_storev(&node->data, prev,
> > +                                     &args->percpu_list_ptr,
> > +                                     (intptr_t)list_ptr, prev + 1, cpu);
> > +                     if (!ret)
> > +                             break;  /* Success. */
> > +             }
> > +     }
> > +
> > +     if (rseq_unregister_current_thread()) {
> > +             fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> > +                     errno, strerror(errno));
> > +             abort();
> > +     }
> > +     return NULL;
> > +}
> > +
> > +void test_membarrier_init_percpu_list(struct percpu_list *list)
> > +{
> > +     int i;
> > +
> > +     memset(list, 0, sizeof(*list));
> > +     for (i = 0; i < CPU_SETSIZE; i++) {
> > +             struct percpu_list_node *node;
> > +
> > +             node = malloc(sizeof(*node));
> > +             assert(node);
> > +             node->data = 0;
> > +             node->next = NULL;
> > +             list->c[i].head = node;
> > +     }
> > +}
> > +
> > +void test_membarrier_free_percpu_list(struct percpu_list *list)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < CPU_SETSIZE; i++)
> > +             free(list->c[i].head);
> > +}
> > +
> > +static int sys_membarrier(int cmd, int flags)
> > +{
> > +     return syscall(__NR_membarrier, cmd, flags);
> > +}
> > +
> > +/*
> > + * The manager thread swaps per-cpu lists that worker threads see,
> > + * and validates that there are no unexpected modifications.
> > + */
> > +void *test_membarrier_manager_thread(void *arg)
> > +{
> > +     struct test_membarrier_thread_args *args =
> > +             (struct test_membarrier_thread_args *)arg;
> > +     struct percpu_list list_a, list_b;
> > +     intptr_t expect_a = 0, expect_b = 0;
> > +     int cpu_a = 0, cpu_b = 0;
> > +
> > +     if (rseq_register_current_thread()) {
> > +             fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
> > +                     errno, strerror(errno));
> > +             abort();
> > +     }
> > +
> > +     /* Init lists. */
> > +     test_membarrier_init_percpu_list(&list_a);
> > +     test_membarrier_init_percpu_list(&list_b);
> > +
> > +     atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
> > +
> > +     while (!atomic_load(&args->stop)) {
> > +             /* list_a is "active". */
> > +             cpu_a = rand() % CPU_SETSIZE;
> > +             /*
> > +              * As list_b is "inactive", we should never see changes
> > +              * to list_b.
> > +              */
> > +             if (expect_b != atomic_load(&list_b.c[cpu_b].head->data)) {
> > +                     fprintf(stderr, "Membarrier test failed\n");
> > +                     abort();
> > +             }
> > +
> > +             /* Make list_b "active". */
> > +             atomic_store(&args->percpu_list_ptr, (intptr_t)&list_b);
> > +             sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_a);
> > +             /*
> > +              * CPU A should now only modify list_b, so the values
> > +              * in list_a should be stable.
> > +              */
> > +             expect_a = atomic_load(&list_a.c[cpu_a].head->data);
> > +
> > +             cpu_b = rand() % CPU_SETSIZE;
> > +             /*
> > +              * As list_a is "inactive", we should never see changes
> > +              * to list_a.
> > +              */
> > +             if (expect_a != atomic_load(&list_a.c[cpu_a].head->data)) {
> > +                     fprintf(stderr, "Membarrier test failed\n");
> > +                     abort();
> > +             }
> > +
> > +             /* Make list_a "active". */
> > +             atomic_store(&args->percpu_list_ptr, (intptr_t)&list_a);
> > +             sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, cpu_b);
> > +             /* Remember a value from list_b. */
> > +             expect_b = atomic_load(&list_b.c[cpu_b].head->data);
> > +     }
> > +
> > +     test_membarrier_free_percpu_list(&list_a);
> > +     test_membarrier_free_percpu_list(&list_b);
> > +
> > +     if (rseq_unregister_current_thread()) {
> > +             fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> > +                     errno, strerror(errno));
> > +             abort();
> > +     }
> > +     return NULL;
> > +}
> > +
> > +/* Test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU membarrier command. */
> > +void test_membarrier(void)
> > +{
> > +     struct test_membarrier_thread_args thread_args;
> > +     pthread_t worker_threads[CPU_SETSIZE];
> > +     pthread_t manager_thread;
> > +     int i;
> > +
> > +     thread_args.stop = 0;
> > +     thread_args.percpu_list_ptr = 0;
> > +     pthread_create(&manager_thread, NULL,
> > +                    test_membarrier_manager_thread, &thread_args);
> > +
> > +     for (i = 0; i < CPU_SETSIZE; i++)
> > +             pthread_create(&worker_threads[i], NULL,
> > +                    test_membarrier_worker_thread, &thread_args);
> > +
> > +     for (i = 0; i < CPU_SETSIZE; i++)
> > +             pthread_join(worker_threads[i], NULL);
> > +
> > +     atomic_store(&thread_args.stop, 1);
> > +     pthread_join(manager_thread, NULL);
> > +}
> > +
> >  int main(int argc, char **argv)
> >  {
> >       if (rseq_register_current_thread()) {
> > @@ -300,6 +479,8 @@ int main(int argc, char **argv)
> >       test_percpu_spinlock();
> >       printf("percpu_list\n");
> >       test_percpu_list();
> > +     printf("membarrier\n");
> > +     test_membarrier();
> >       if (rseq_unregister_current_thread()) {
> >               fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
> >                       errno, strerror(errno));
> > --
> > 2.28.0.163.g6104cc2f0b6-goog
> >

* Re: [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Mathieu Desnoyers @ 2020-08-07 18:25 UTC
  To: Peter Oskolkov
  Cc: Boqun Feng, Peter Oskolkov, paulmck, Peter Zijlstra,
	linux-kernel, Paul Turner, Chris Kennelly

----- On Aug 7, 2020, at 1:55 PM, Peter Oskolkov posk@posk.io wrote:

> On Thu, Aug 6, 2020 at 5:27 PM Boqun Feng <boqun.feng@gmail.com> wrote:
[...]
>> What if the manager thread updates ->percpu_list_ptr and calls
>> membarrier() here? I.e.
>>
>>         CPU0                    CPU1
>>                                 list_ptr = atomic_load(&args->percpu_list_ptr); // read list_b
>>
>>         atomic_store(&args->percpu_list_ptr, list_a);
>>         sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, 1); // send ipi to
>>         restart rseq.cs on CPU1
>>
>>                                 <got IPI, but not in a rseq.cs, so nothing to do>
>>                                 cpu = rseq_cpu_start(); // start a rseq.cs and accessing list_b!
>>
>> The thing is, atomic_load() is a reference to ->percpu_list_ptr, which
>> is outside the rseq.cs; simply restarting rseq doesn't kill this
>> reference.
>>
>> Am I missing something subtle?
> 
> rseq_cmpeqv_cmpeqv_store is used below to make sure the reference is
> the one that should be used; if it is no longer "active", the
> iteration is restarted.

I suspect it "works" because the manager thread does not free and
repurpose the memory where list_a is allocated, nor does it store to
its list head (which would corrupt the pointer dereferenced by CPU 1
in the scenario above). This shares similarities with type-safe memory
allocation (see SLAB_TYPESAFE_BY_RCU).

Even though it is not documented as such (or otherwise) in the example code,
I feel this example looks like it guarantees that the manager thread "owns"
list_a after the rseq-fence, when in fact it can still be read by the rseq
critical sections.

AFAIU moving the atomic_load(&args->percpu_list_ptr) into the critical section
should entirely solve this and guarantee exclusive access to the old list
after the manager's rseq-fence. I wonder why this simpler approach is not
favored ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Peter Oskolkov @ 2020-08-07 18:47 UTC
  To: Mathieu Desnoyers
  Cc: Boqun Feng, Peter Oskolkov, paulmck, Peter Zijlstra,
	linux-kernel, Paul Turner, Chris Kennelly

On Fri, Aug 7, 2020 at 11:25 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Aug 7, 2020, at 1:55 PM, Peter Oskolkov posk@posk.io wrote:
>
> > On Thu, Aug 6, 2020 at 5:27 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> [...]
> >> What if the manager thread updates ->percpu_list_ptr and calls
> >> membarrier() here? I.e.
> >>
> >>         CPU0                    CPU1
> >>                                 list_ptr = atomic_load(&args->percpu_list_ptr); // read list_b
> >>
> >>         atomic_store(&args->percpu_list_ptr, list_a);
> >>         sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, 1); // send ipi to
> >>         restart rseq.cs on CPU1
> >>
> >>                                 <got IPI, but not in a rseq.cs, so nothing to do>
> >>                                 cpu = rseq_cpu_start(); // start a rseq.cs and accessing list_b!
> >>
> >> The thing is, atomic_load() is a reference to ->percpu_list_ptr, which
> >> is outside the rseq.cs; simply restarting rseq doesn't kill this
> >> reference.
> >>
> >> Am I missing something subtle?
> >
> > rseq_cmpeqv_cmpeqv_store is used below to make sure the reference is
> > the one that should be used; if it is no longer "active", the
> > iteration is restarted.
>
> I suspect it "works" because the manager thread does not free and
> repurpose the memory where list_a is allocated, nor does it store to
> its list head (which would corrupt the pointer dereferenced by CPU 1
> in the scenario above). This shares similarities with type-safe memory
> allocation (see SLAB_TYPESAFE_BY_RCU).
>
> Even though it is not documented as such (or otherwise) in the example code,
> I feel this example looks like it guarantees that the manager thread "owns"
> list_a after the rseq-fence, when in fact it can still be read by the rseq
> critical sections.
>
> AFAIU moving the atomic_load(&args->percpu_list_ptr) into the critical section
> should entirely solve this and guarantee exclusive access to the old list
> after the manager's rseq-fence. I wonder why this simpler approach is not
> favored ?

I think the test code mimics our actual production code, where the concerns
you outlined are not particularly relevant. I'll see if the test can
be simplified in v3 along the lines you suggested.

Thanks,
Peter

>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

* Re: [PATCH 2/2 v2] rseq/selftests: test MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU
From: Mathieu Desnoyers @ 2020-08-07 20:29 UTC
  To: Peter Oskolkov
  Cc: Boqun Feng, Peter Oskolkov, paulmck, Peter Zijlstra,
	linux-kernel, Paul Turner, Chris Kennelly

----- On Aug 7, 2020, at 2:47 PM, Peter Oskolkov posk@posk.io wrote:

> On Fri, Aug 7, 2020 at 11:25 AM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> ----- On Aug 7, 2020, at 1:55 PM, Peter Oskolkov posk@posk.io wrote:
>>
>> > On Thu, Aug 6, 2020 at 5:27 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>> [...]
>> >> What if the manager thread updates ->percpu_list_ptr and calls
>> >> membarrier() here? I.e.
>> >>
>> >>         CPU0                    CPU1
>> >>                                 list_ptr = atomic_load(&args->percpu_list_ptr); // read list_b
>> >>
>> >>         atomic_store(&args->percpu_list_ptr, list_a);
>> >>         sys_membarrier(MEMBARRIER_CMD_PRIVATE_RESTART_RSEQ_ON_CPU, 1); // send ipi to
>> >>         restart rseq.cs on CPU1
>> >>
>> >>                                 <got IPI, but not in a rseq.cs, so nothing to do>
>> >>                                 cpu = rseq_cpu_start(); // start a rseq.cs and accessing list_b!
>> >>
>> >> The thing is, atomic_load() is a reference to ->percpu_list_ptr, which
>> >> is outside the rseq.cs; simply restarting rseq doesn't kill this
>> >> reference.
>> >>
>> >> Am I missing something subtle?
>> >
>> > rseq_cmpeqv_cmpeqv_store is used below to make sure the reference is
>> > the one that should be used; if it is no longer "active", the
>> > iteration is restarted.
>>
>> I suspect it "works" because the manager thread does not free and
>> repurpose the memory where list_a is allocated, nor does it store to
>> its list head (which would corrupt the pointer dereferenced by CPU 1
>> in the scenario above). This shares similarities with type-safe memory
>> allocation (see SLAB_TYPESAFE_BY_RCU).
>>
>> Even though it is not documented as such (or otherwise) in the example code,
>> I feel this example looks like it guarantees that the manager thread "owns"
>> list_a after the rseq-fence, when in fact it can still be read by the rseq
>> critical sections.
>>
>> AFAIU moving the atomic_load(&args->percpu_list_ptr) into the critical section
>> should entirely solve this and guarantee exclusive access to the old list
>> after the manager's rseq-fence. I wonder why this simpler approach is not
>> favored ?
> 
> I think the test code mimics our actual production code, where the concerns
> you outlined are not particularly relevant. I'll see if the test can
> be simplified
> in v3 along the lines you suggested.

In order to implement that, you'll need to extend the rseq per-arch
macros. Here is one I did for x86 (but not all other arch) which dereferences
a pointer, adds an offset to the resulting address, and loads the contents
of that memory location, all within a rseq critical section. See
https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/include/rseq/rseq-x86.h#n1292

int rseq_deref_loadoffp(intptr_t *p, off_t voffp, intptr_t *load, int cpu)

I did that following a discussion with Paul Turner about the requirements for the
rseq fence.

For the use-case you have in this example, you will probably want to create a new

int rseq_deref_offset_addv(intptr_t *p, off_t voffp, intptr_t count, int cpu)

Which dereferences the list pointer and adds an offset within the critical section,
and then increments the value at that memory location as a commit.

offsetof() is very useful to generate the voffp argument.
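
For illustration, a hedged sketch of how the worker's fast path might
look with the proposed helper. rseq_deref_offset_addv() does not exist
yet, and the 'count' field is hypothetical: it assumes the per-CPU
counter is embedded directly in struct percpu_list, one dereference
plus a constant offset away from percpu_list_ptr, instead of sitting
behind a separate node pointer (offsetof() with a runtime array index
relies on a GCC extension):

static void worker_increment(struct test_membarrier_thread_args *args)
{
	int cpu, ret;

	do {
		cpu = rseq_cpu_start();
		/*
		 * The pointer load, the offset addition and the
		 * increment all execute within one rseq critical
		 * section, so the manager's restart fence guarantees
		 * the old list is unreferenced once the fence returns.
		 */
		ret = rseq_deref_offset_addv(&args->percpu_list_ptr,
				offsetof(struct percpu_list,
					 c[cpu].count),
				1, cpu);
	} while (ret);	/* aborted, restarted or mismatched: retry */
}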

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
