linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
@ 2019-03-05 19:47 Mathieu Desnoyers
  2019-03-05 19:47 ` [PATCH for 5.1 1/3] rseq: cleanup: Reflect removal of event counter in comments Mathieu Desnoyers
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-03-05 19:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-api, Peter Zijlstra, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers

Those changes aiming at 5.1 include one comment cleanup, the removal of
the rseq_len field from the task struct which serves no purpose
considering that the struct size is fixed by the ABI, and a selftest
improvement adapting the number of threads to the number of detected
CPUs, which is nicer for smaller systems.

Thanks,

Mathieu

Mathieu Desnoyers (3):
  rseq: cleanup: Reflect removal of event counter in comments
  rseq: cleanup: remove rseq_len from task_struct
  rseq/selftests: Adapt number of threads to the number of detected cpus

 arch/arm/kernel/signal.c                       | 3 +--
 arch/x86/kernel/signal.c                       | 5 +----
 include/linux/sched.h                          | 4 ----
 kernel/rseq.c                                  | 9 +++------
 tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
 5 files changed, 10 insertions(+), 18 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH for 5.1 1/3] rseq: cleanup: Reflect removal of event counter in comments
  2019-03-05 19:47 [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
@ 2019-03-05 19:47 ` Mathieu Desnoyers
  2019-04-19 10:43   ` [tip:core/rseq] rseq: Clean up comments by reflecting removal of event counter tip-bot for Mathieu Desnoyers
  2019-03-05 19:47 ` [PATCH for 5.1 2/3] rseq: cleanup: remove rseq_len from task_struct Mathieu Desnoyers
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-03-05 19:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-api, Peter Zijlstra, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers

The "event counter" was removed from rseq before it was merged upstream.
However, a few comments in the source code still refer to it. Adapt the
comments to match reality.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
---
 arch/arm/kernel/signal.c | 3 +--
 arch/x86/kernel/signal.c | 5 +----
 kernel/rseq.c            | 3 +--
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 76bb8de6bf6b..be5edfdde558 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -549,8 +549,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	int ret;
 
 	/*
-	 * Increment event counter and perform fixup for the pre-signal
-	 * frame.
+	 * Perform fixup for the pre-signal frame.
 	 */
 	rseq_signal_deliver(ksig, regs);
 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 08dfd4c1a4f9..22c233b509da 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -688,10 +688,7 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 	sigset_t *set = sigmask_to_save();
 	compat_sigset_t *cset = (compat_sigset_t *) set;
 
-	/*
-	 * Increment event counter and perform fixup for the pre-signal
-	 * frame.
-	 */
+	/* Perform fixup for the pre-signal frame. */
 	rseq_signal_deliver(ksig, regs);
 
 	/* Set up the stack frame */
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 25e9a7b60eba..849afe749131 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -254,8 +254,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
  * - signal delivery,
  * and return to user-space.
  *
- * This is how we can ensure that the entire rseq critical section,
- * consisting of both the C part and the assembly instruction sequence,
+ * This is how we can ensure that the entire rseq critical section
  * will issue the commit instruction only if executed atomically with
  * respect to other threads scheduled on the same CPU, and with respect
  * to signal handlers.
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for 5.1 2/3] rseq: cleanup: remove rseq_len from task_struct
  2019-03-05 19:47 [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
  2019-03-05 19:47 ` [PATCH for 5.1 1/3] rseq: cleanup: Reflect removal of event counter in comments Mathieu Desnoyers
@ 2019-03-05 19:47 ` Mathieu Desnoyers
  2019-04-19 10:43   ` [tip:core/rseq] rseq: Remove superfluous " tip-bot for Mathieu Desnoyers
  2019-03-05 19:47 ` [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus Mathieu Desnoyers
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-03-05 19:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-api, Peter Zijlstra, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers

The rseq system call, when invoked with flags of "0" or
"RSEQ_FLAG_UNREGISTER" values, expects the rseq_len parameter to
be equal to sizeof(struct rseq), which is fixed-size and fixed-layout,
specified in uapi linux/rseq.h.

Expecting a fixed size for rseq_len is a design choice that ensures
multiple libraries and application defining __rseq_abi in the same
process agree on its exact size.

Considering that this size is and will always be the same value, there
is no point in saving this value within task_struct rseq_len. Remove
this field from task_struct.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
---
 include/linux/sched.h | 4 ----
 kernel/rseq.c         | 6 ++----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f9b43c989577..e355688fa82c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1059,7 +1059,6 @@ struct task_struct {
 
 #ifdef CONFIG_RSEQ
 	struct rseq __user *rseq;
-	u32 rseq_len;
 	u32 rseq_sig;
 	/*
 	 * RmW on rseq_event_mask must be performed atomically
@@ -1852,12 +1851,10 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 {
 	if (clone_flags & CLONE_THREAD) {
 		t->rseq = NULL;
-		t->rseq_len = 0;
 		t->rseq_sig = 0;
 		t->rseq_event_mask = 0;
 	} else {
 		t->rseq = current->rseq;
-		t->rseq_len = current->rseq_len;
 		t->rseq_sig = current->rseq_sig;
 		t->rseq_event_mask = current->rseq_event_mask;
 	}
@@ -1866,7 +1863,6 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 static inline void rseq_execve(struct task_struct *t)
 {
 	t->rseq = NULL;
-	t->rseq_len = 0;
 	t->rseq_sig = 0;
 	t->rseq_event_mask = 0;
 }
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 849afe749131..9424ee90589e 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -313,7 +313,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		/* Unregister rseq for current thread. */
 		if (current->rseq != rseq || !current->rseq)
 			return -EINVAL;
-		if (current->rseq_len != rseq_len)
+		if (rseq_len != sizeof(*rseq))
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -321,7 +321,6 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		if (ret)
 			return ret;
 		current->rseq = NULL;
-		current->rseq_len = 0;
 		current->rseq_sig = 0;
 		return 0;
 	}
@@ -335,7 +334,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		 * the provided address differs from the prior
 		 * one.
 		 */
-		if (current->rseq != rseq || current->rseq_len != rseq_len)
+		if (current->rseq != rseq || rseq_len != sizeof(*rseq))
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -353,7 +352,6 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 	if (!access_ok(rseq, rseq_len))
 		return -EFAULT;
 	current->rseq = rseq;
-	current->rseq_len = rseq_len;
 	current->rseq_sig = sig;
 	/*
 	 * If rseq was previously inactive, and has just been
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-03-05 19:47 [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
  2019-03-05 19:47 ` [PATCH for 5.1 1/3] rseq: cleanup: Reflect removal of event counter in comments Mathieu Desnoyers
  2019-03-05 19:47 ` [PATCH for 5.1 2/3] rseq: cleanup: remove rseq_len from task_struct Mathieu Desnoyers
@ 2019-03-05 19:47 ` Mathieu Desnoyers
  2019-04-19 10:38   ` Ingo Molnar
                     ` (2 more replies)
  2019-03-05 20:18 ` [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-03-05 19:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-api, Peter Zijlstra, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers, Shuah Khan,
	linux-kselftest

On smaller systems, running a test with 200 threads can take a long
time on machines with smaller number of CPUs.

Detect the number of online cpus at test runtime, and multiply that
by 6 to have 6 rseq threads per cpu preempting each other.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
index 3acd6d75ff9f..e426304fd4a0 100755
--- a/tools/testing/selftests/rseq/run_param_test.sh
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0+ or MIT
 
+NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
+
 EXTRA_ARGS=${@}
 
 OLDIFS="$IFS"
@@ -28,15 +30,16 @@ IFS="$OLDIFS"
 
 REPS=1000
 SLOW_REPS=100
+NR_THREADS=$((6*${NR_CPUS}))
 
 function do_tests()
 {
 	local i=0
 	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
 		echo "Running test ${TEST_NAME[$i]}"
-		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
 		echo "Running compare-twice test ${TEST_NAME[$i]}"
-		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
 		let "i++"
 	done
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-05 19:47 [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2019-03-05 19:47 ` [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus Mathieu Desnoyers
@ 2019-03-05 20:18 ` Mathieu Desnoyers
  2019-03-05 21:58   ` Peter Zijlstra
  2019-03-05 21:49 ` Peter Zijlstra
  2019-04-19 10:41 ` Ingo Molnar
  5 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-03-05 20:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-api, Peter Zijlstra, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
	Joel Fernandes, Carlos O'Donell, Florian Weimer

----- On Mar 5, 2019, at 2:47 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> Those changes aiming at 5.1 include one comment cleanup, the removal of
> the rseq_len field from the task struct which serves no purpose
> considering that the struct size is fixed by the ABI, and a selftest
> improvement adapting the number of threads to the number of detected
> CPUs, which is nicer for smaller systems.

For those interested, here is a status update on how things are evolving
in terms of rseq Linux ecosystem integration:

I've been working with glibc maintainers for the past months to get rseq
registration integrated into glibc. The patchset is awaiting feedback
from glibc maintainers at this point. An important part of that integration
is the user-level ABI defining interaction between the executable and
libraries wishing to register rseq within the same process. This is needed
because the rseq system call only supports a single rseq registration per
thread (this was done on purpose). If all goes well we should see rseq
integration in glibc as part of the glibc 2.30 release in August 2019.

For those interested in upcoming rseq kernel patches I have ready, those are
available at [1].

The main reason why we're not seeing more users of rseq out there right now
is because the user-level ABI to interact between libc, applications, and
early adopter libraries needs to be specified before projects start using
rseq. An early adopter of rseq should not be incompatible with a glibc
which introduces rseq registration support.

Once glibc integration is done, here are a few things I have ready:

* NUMA node ID in TLS

Having the NUMA node ID available in a TLS variable would allow glibc to
perform interesting NUMA performance improvements within its locking
implementation, so I have a patch adding NUMA node ID support to rseq
as a new rseq system call flag.

* Adaptative mutex improvements

I have done a prototype using rseq to implement an adaptative mutex which
can detect preemption using a rseq critical section. This ensures the
thread doesn't continue to busy-loop after it returns from preemption, and
calls sys_futex() instead. This is part of a user-space prototype branch [2],
and does not require any kernel change.

* cpu_opv system call

Use-cases requiring access to remote per-cpu data such as memory migration
in a memory allocator and access to specific per-cpu buffers from a
ring-buffer consumer will end up requiring this additional system call.
I'm awaiting a broader adoption of rseq (which depends on glibc integration)
for simpler use-cases before pushing the cpu_opv system call again.

Thanks,

Mathieu

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git/
[2] https://github.com/compudj/rseq-test/blob/adapt-lock/test-rseq-adaptative-lock.c

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-05 19:47 [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2019-03-05 20:18 ` [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
@ 2019-03-05 21:49 ` Peter Zijlstra
  2019-04-19 10:41 ` Ingo Molnar
  5 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-03-05 21:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, linux-kernel, linux-api, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H . Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes

On Tue, Mar 05, 2019 at 02:47:52PM -0500, Mathieu Desnoyers wrote:
> Mathieu Desnoyers (3):
>   rseq: cleanup: Reflect removal of event counter in comments
>   rseq: cleanup: remove rseq_len from task_struct
>   rseq/selftests: Adapt number of threads to the number of detected cpus
> 
>  arch/arm/kernel/signal.c                       | 3 +--
>  arch/x86/kernel/signal.c                       | 5 +----
>  include/linux/sched.h                          | 4 ----
>  kernel/rseq.c                                  | 9 +++------
>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>  5 files changed, 10 insertions(+), 18 deletions(-)

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-05 20:18 ` [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
@ 2019-03-05 21:58   ` Peter Zijlstra
  2019-03-05 22:32     ` Mathieu Desnoyers
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-03-05 21:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, linux-kernel, linux-api, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
	Joel Fernandes, Carlos O'Donell, Florian Weimer

On Tue, Mar 05, 2019 at 03:18:35PM -0500, Mathieu Desnoyers wrote:
> * NUMA node ID in TLS
> 
> Having the NUMA node ID available in a TLS variable would allow glibc to
> perform interesting NUMA performance improvements within its locking
> implementation, so I have a patch adding NUMA node ID support to rseq
> as a new rseq system call flag.

Details? There's just not much room in the futex word, and futexes
themselves are not numa aware.

Before all this spectre nonsense; tglx and me were looking at a futex2
syscall that would, among other things, cure this.

> * Adaptative mutex improvements
> 
> I have done a prototype using rseq to implement an adaptative mutex which
> can detect preemption using a rseq critical section. This ensures the
> thread doesn't continue to busy-loop after it returns from preemption, and
> calls sys_futex() instead. This is part of a user-space prototype branch [2],
> and does not require any kernel change.

I'm still not convinced that is actually the right way to go about
things. The kernel heuristic is spin while the _owner_ runs, and we
don't get preempted, obviously.

And the only userspace spinning that makes sense is to cover the cost of
the syscall. Now Obviously PTI wrecked everything, but before that
syscalls were actually plenty fast and you didn't need many cmpxchg
cycles to amortize the syscall itself -- which could then do kernel side
adaptive spinning (when required).



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-05 21:58   ` Peter Zijlstra
@ 2019-03-05 22:32     ` Mathieu Desnoyers
  2019-03-06  8:21       ` Peter Zijlstra
  2019-03-06  8:30       ` Peter Zijlstra
  0 siblings, 2 replies; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-03-05 22:32 UTC (permalink / raw)
  To: Peter Zijlstra, H.J. Lu, libc-alpha
  Cc: Thomas Gleixner, linux-kernel, linux-api, Paul E . McKenney,
	Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner,
	Andrew Morton, Russell King, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk,
	Joel Fernandes, Carlos O'Donell, Florian Weimer

----- On Mar 5, 2019, at 4:58 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Mar 05, 2019 at 03:18:35PM -0500, Mathieu Desnoyers wrote:
>> * NUMA node ID in TLS
>> 
>> Having the NUMA node ID available in a TLS variable would allow glibc to
>> perform interesting NUMA performance improvements within its locking
>> implementation, so I have a patch adding NUMA node ID support to rseq
>> as a new rseq system call flag.
> 
> Details? There's just not much room in the futex word, and futexes
> themselves are not numa aware.

It was discussed in this libc-alpha mailing list thread:

https://public-inbox.org/libc-alpha/CAMe9rOo7i_-keOooa0D+P_wzatVCdKkTRiFiJ-cxpnvi+eApuQ@mail.gmail.com/

(adding the relevant people in CC)

I'd like to hear in more details on how they intend to design
NUMA-aware spinlocks within glibc. All I know is that quick
access to the node ID would help for this.

I would suspect we could split a lock into per-numa-node locks.
Grabbing the local numa lock would then allow grabbing the global
lock. This should help reducing remote NUMA accesses on the global
lock in contended cases, but I'm really just guessing here.

> 
> Before all this spectre nonsense; tglx and me were looking at a futex2
> syscall that would, among other things, cure this.

The email thread I point to above talks about "spinlocks", so I'm not
sure whether their intent is to apply this to mutexes as well.

> 
>> * Adaptative mutex improvements
>> 
>> I have done a prototype using rseq to implement an adaptative mutex which
>> can detect preemption using a rseq critical section. This ensures the
>> thread doesn't continue to busy-loop after it returns from preemption, and
>> calls sys_futex() instead. This is part of a user-space prototype branch [2],
>> and does not require any kernel change.
> 
> I'm still not convinced that is actually the right way to go about
> things. The kernel heuristic is spin while the _owner_ runs, and we
> don't get preempted, obviously.
> 
> And the only userspace spinning that makes sense is to cover the cost of
> the syscall. Now Obviously PTI wrecked everything, but before that
> syscalls were actually plenty fast and you didn't need many cmpxchg
> cycles to amortize the syscall itself -- which could then do kernel side
> adaptive spinning (when required).

Indeed with PTI the system calls are back to their slow self. ;)

You point about owner is interesting. Perhaps there is one tweak that I
should add in there. We could write the owner thread ID in the lock word.

When trying to grab a lock, one of a few situations can happen:

- It's unlocked, so we grab it by storing our thread ID,
- It's locked, and we can fetch the CPU number of the thread owning it
  if we can access its (struct rseq *)->cpu_id through a lookup using its
  thread ID, We can then check whether it's the same CPU we are running on.
  - If so, we _know_ we should let the owner run, so we call futex right away,
    no spinning. We can even boost it for priority inheritance mutexes,
  - If it's owned by a thread which was last running on a different CPU,
    then it may make sense to actively try to grab the lock by spinning
    up to a certain number of loops (which can be either fixed or adaptative).
    After that limit, call futex. If preempted while looping, call futex.

Do you see this as an improvement over what exists today, or am I
on the wrong track ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-05 22:32     ` Mathieu Desnoyers
@ 2019-03-06  8:21       ` Peter Zijlstra
  2019-03-06  8:30       ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-03-06  8:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: H.J. Lu, libc-alpha, Thomas Gleixner, linux-kernel, linux-api,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Carlos O'Donell,
	Florian Weimer

On Tue, Mar 05, 2019 at 05:32:10PM -0500, Mathieu Desnoyers wrote:
> ----- On Mar 5, 2019, at 4:58 PM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Tue, Mar 05, 2019 at 03:18:35PM -0500, Mathieu Desnoyers wrote:
> >> * NUMA node ID in TLS
> >> 
> >> Having the NUMA node ID available in a TLS variable would allow glibc to
> >> perform interesting NUMA performance improvements within its locking
> >> implementation, so I have a patch adding NUMA node ID support to rseq
> >> as a new rseq system call flag.
> > 
> > Details? There's just not much room in the futex word, and futexes
> > themselves are not numa aware.
> 
> It was discussed in this libc-alpha mailing list thread:
> 
> https://public-inbox.org/libc-alpha/CAMe9rOo7i_-keOooa0D+P_wzatVCdKkTRiFiJ-cxpnvi+eApuQ@mail.gmail.com/
> 
> (adding the relevant people in CC)
> 
> I'd like to hear in more details on how they intend to design
> NUMA-aware spinlocks within glibc. All I know is that quick
> access to the node ID would help for this.

Userspace spinlocks are a trainwreck anyway. The only case where they
can possibly work is when there's only a single thread on every cpu.
Pretty much any other scenario is fail; see why we have paravirt
spinlocks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-05 22:32     ` Mathieu Desnoyers
  2019-03-06  8:21       ` Peter Zijlstra
@ 2019-03-06  8:30       ` Peter Zijlstra
  2019-03-06 17:00         ` Mathieu Desnoyers
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-03-06  8:30 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: H.J. Lu, libc-alpha, Thomas Gleixner, linux-kernel, linux-api,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, Carlos O'Donell,
	Florian Weimer

On Tue, Mar 05, 2019 at 05:32:10PM -0500, Mathieu Desnoyers wrote:

> >> * Adaptative mutex improvements
> >> 
> >> I have done a prototype using rseq to implement an adaptative mutex which
> >> can detect preemption using a rseq critical section. This ensures the
> >> thread doesn't continue to busy-loop after it returns from preemption, and
> >> calls sys_futex() instead. This is part of a user-space prototype branch [2],
> >> and does not require any kernel change.
> > 
> > I'm still not convinced that is actually the right way to go about
> > things. The kernel heuristic is spin while the _owner_ runs, and we
> > don't get preempted, obviously.
> > 
> > And the only userspace spinning that makes sense is to cover the cost of
> > the syscall. Now Obviously PTI wrecked everything, but before that
> > syscalls were actually plenty fast and you didn't need many cmpxchg
> > cycles to amortize the syscall itself -- which could then do kernel side
> > adaptive spinning (when required).
> 
> Indeed with PTI the system calls are back to their slow self. ;)
> 
> You point about owner is interesting. Perhaps there is one tweak that I
> should add in there. We could write the owner thread ID in the lock word.

This is already required for PI (and I think robust) futexes. There have
been proposals for FUTEX_LOCK and FUTEX_UNLOCK (!PI) primitives that
require the same.

Waiman had some patches; but I think all went under because 'important'
stuff happened.

> When trying to grab a lock, one of a few situations can happen:
> 
> - It's unlocked, so we grab it by storing our thread ID,
> - It's locked, and we can fetch the CPU number of the thread owning it
>   if we can access its (struct rseq *)->cpu_id through a lookup using its
>   thread ID, We can then check whether it's the same CPU we are running on.

That might just work with threads (private futexes; which are the
majority these these I think), but will obviously not work with regular
(shared) futexes.

>   - If so, we _know_ we should let the owner run, so we call futex right away,
>     no spinning. We can even boost it for priority inheritance mutexes,
>   - If it's owned by a thread which was last running on a different CPU,
>     then it may make sense to actively try to grab the lock by spinning
>     up to a certain number of loops (which can be either fixed or adaptative).
>     After that limit, call futex. If preempted while looping, call futex.
> 
> Do you see this as an improvement over what exists today, or am I
> on the wrong track ?

That's probably better than what they have today. Last time I looked at
libc pthread I got really sad -- arguably that was a long time ago, and
some of that stuff is because POSIX, but still.

Some day we should redesign all that.. futex2 etc.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-06  8:30       ` Peter Zijlstra
@ 2019-03-06 17:00         ` Mathieu Desnoyers
  0 siblings, 0 replies; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-03-06 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H.J. Lu, libc-alpha, Thomas Gleixner, linux-kernel, linux-api,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, carlos, Florian Weimer

----- On Mar 6, 2019, at 3:30 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Mar 05, 2019 at 05:32:10PM -0500, Mathieu Desnoyers wrote:
> 
>> >> * Adaptative mutex improvements
>> >> 
>> >> I have done a prototype using rseq to implement an adaptative mutex which
>> >> can detect preemption using a rseq critical section. This ensures the
>> >> thread doesn't continue to busy-loop after it returns from preemption, and
>> >> calls sys_futex() instead. This is part of a user-space prototype branch [2],
>> >> and does not require any kernel change.
>> > 
>> > I'm still not convinced that is actually the right way to go about
>> > things. The kernel heuristic is spin while the _owner_ runs, and we
>> > don't get preempted, obviously.
>> > 
>> > And the only userspace spinning that makes sense is to cover the cost of
>> > the syscall. Now Obviously PTI wrecked everything, but before that
>> > syscalls were actually plenty fast and you didn't need many cmpxchg
>> > cycles to amortize the syscall itself -- which could then do kernel side
>> > adaptive spinning (when required).
>> 
>> Indeed with PTI the system calls are back to their slow self. ;)
>> 
>> You point about owner is interesting. Perhaps there is one tweak that I
>> should add in there. We could write the owner thread ID in the lock word.
> 
> This is already required for PI (and I think robust) futexes. There have
> been proposals for FUTEX_LOCK and FUTEX_UNLOCK (!PI) primitives that
> require the same.
> 
> Waiman had some patches; but I think all went under because 'important'
> stuff happened.
> 
>> When trying to grab a lock, one of a few situations can happen:
>> 
>> - It's unlocked, so we grab it by storing our thread ID,
>> - It's locked, and we can fetch the CPU number of the thread owning it
>>   if we can access its (struct rseq *)->cpu_id through a lookup using its
>>   thread ID, We can then check whether it's the same CPU we are running on.
> 
> That might just work with threads (private futexes; which are the
> majority these these I think), but will obviously not work with regular
> (shared) futexes.

If we have enough space available either in the lock word or just nearby,
we could write the CPU number that was current when the thread owning
the lock grabbed it. Considering that it should be infrequent that threads
get migrated to other CPUs while holding the lock, it might be a good enough
heuristic to figure out whether a thread needs to busy-wait or immediately
call futex.

Writing the CPU number would work both for private and shared futexes.

> 
>>   - If so, we _know_ we should let the owner run, so we call futex right away,
>>     no spinning. We can even boost it for priority inheritance mutexes,
>>   - If it's owned by a thread which was last running on a different CPU,
>>     then it may make sense to actively try to grab the lock by spinning
>>     up to a certain number of loops (which can be either fixed or adaptative).
>>     After that limit, call futex. If preempted while looping, call futex.
>> 
>> Do you see this as an improvement over what exists today, or am I
>> on the wrong track ?
> 
> That's probably better than what they have today. Last time I looked at
> libc pthread I got really sad -- arguably that was a long time ago, and
> some of that stuff is because POSIX, but still.
> 
> Some day we should redesign all that.. futex2 etc.

It sounds like an interesting topic to bring up at the next LPC! In the
meantime, a good start would be to state the desiderata of what requirements
should be covered by this redesign.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-03-05 19:47 ` [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus Mathieu Desnoyers
@ 2019-04-19 10:38   ` Ingo Molnar
  2019-04-19 12:41     ` Mathieu Desnoyers
  2019-04-19 10:44   ` [tip:core/rseq] " tip-bot for Mathieu Desnoyers
  2019-04-19 10:46   ` [tip:core/rseq] rseq/selftests: Adapt number of threads to the number of detected CPUs tip-bot for Mathieu Desnoyers
  2 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2019-04-19 10:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H . Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Joel Fernandes, Shuah Khan,
	linux-kselftest


* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> On smaller systems, running a test with 200 threads can take a long
> time on machines with smaller number of CPUs.
> 
> Detect the number of online cpus at test runtime, and multiply that
> by 6 to have 6 rseq threads per cpu preempting each other.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Joel Fernandes <joelaf@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Dave Watson <davejwatson@fb.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: linux-kselftest@vger.kernel.org
> Cc: "H . Peter Anvin" <hpa@zytor.com>
> Cc: Chris Lameter <cl@linux.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Paul Turner <pjt@google.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Maurer <bmaurer@fb.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
> index 3acd6d75ff9f..e426304fd4a0 100755
> --- a/tools/testing/selftests/rseq/run_param_test.sh
> +++ b/tools/testing/selftests/rseq/run_param_test.sh
> @@ -1,6 +1,8 @@
>  #!/bin/bash
>  # SPDX-License-Identifier: GPL-2.0+ or MIT
>  
> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
> +
>  EXTRA_ARGS=${@}
>  
>  OLDIFS="$IFS"
> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>  
>  REPS=1000
>  SLOW_REPS=100
> +NR_THREADS=$((6*${NR_CPUS}))
>  
>  function do_tests()
>  {
>  	local i=0
>  	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>  		echo "Running test ${TEST_NAME[$i]}"
> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
>  		echo "Running compare-twice test ${TEST_NAME[$i]}"
> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
>  		let "i++"
>  	done
>  }

BTW., when trying to build the rseq self-tests I get this build failure:

  dagon:~/tip/tools/testing/selftests/rseq> make
  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared -fPIC rseq.c -lpthread -o /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_percpu_ops_test.c -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined reference to `.L8'
  /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined reference to `.L49'
  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined reference to `.L57'
  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to `.L49'
  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to `.L55'
  collect2: error: ld returned 1 exit status
  make: *** [Makefile:22: /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1

Is this a known problem, or do I miss something from my build environment 
perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-03-05 19:47 [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2019-03-05 21:49 ` Peter Zijlstra
@ 2019-04-19 10:41 ` Ingo Molnar
  2019-04-19 12:42   ` Mathieu Desnoyers
  5 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2019-04-19 10:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H . Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer,
	Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
	Will Deacon, Michael Kerrisk, Joel Fernandes


* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Those changes aiming at 5.1 include one comment cleanup, the removal of
> the rseq_len field from the task struct which serves no purpose
> considering that the struct size is fixed by the ABI, and a selftest
> improvement adapting the number of threads to the number of detected
> CPUs, which is nicer for smaller systems.
> 
> Thanks,
> 
> Mathieu
> 
> Mathieu Desnoyers (3):
>   rseq: cleanup: Reflect removal of event counter in comments
>   rseq: cleanup: remove rseq_len from task_struct
>   rseq/selftests: Adapt number of threads to the number of detected cpus
> 
>  arch/arm/kernel/signal.c                       | 3 +--
>  arch/x86/kernel/signal.c                       | 5 +----
>  include/linux/sched.h                          | 4 ----
>  kernel/rseq.c                                  | 9 +++------
>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>  5 files changed, 10 insertions(+), 18 deletions(-)

Looks good, I've applied these to tip:core/rseq to make sure they don't 
miss the v5.2 merge window.

(Let me know if you wanted to handle this differently.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [tip:core/rseq] rseq: Clean up comments by reflecting removal of event counter
  2019-03-05 19:47 ` [PATCH for 5.1 1/3] rseq: cleanup: Reflect removal of event counter in comments Mathieu Desnoyers
@ 2019-04-19 10:43   ` tip-bot for Mathieu Desnoyers
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mathieu Desnoyers @ 2019-04-19 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: joelaf, pjt, bmaurer, peterz, torvalds, hpa, will.deacon,
	paulmck, tglx, josh, catalin.marinas, mathieu.desnoyers, cl,
	davejwatson, linux, mingo, rostedt, mtk.manpages, luto, akpm,
	boqun.feng, linux-kernel

Commit-ID:  bff9504bfc9c5c6610b42d47f689f350fd969eb8
Gitweb:     https://git.kernel.org/tip/bff9504bfc9c5c6610b42d47f689f350fd969eb8
Author:     Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate: Tue, 5 Mar 2019 14:47:53 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 19 Apr 2019 12:39:31 +0200

rseq: Clean up comments by reflecting removal of event counter

The "event counter" was removed from rseq before it was merged upstream.
However, a few comments in the source code still refer to it. Adapt the
comments to match reality.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/20190305194755.2602-2-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/arm/kernel/signal.c | 3 +--
 arch/x86/kernel/signal.c | 5 +----
 kernel/rseq.c            | 3 +--
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 76bb8de6bf6b..be5edfdde558 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -549,8 +549,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	int ret;
 
 	/*
-	 * Increment event counter and perform fixup for the pre-signal
-	 * frame.
+	 * Perform fixup for the pre-signal frame.
 	 */
 	rseq_signal_deliver(ksig, regs);
 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 08dfd4c1a4f9..22c233b509da 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -688,10 +688,7 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 	sigset_t *set = sigmask_to_save();
 	compat_sigset_t *cset = (compat_sigset_t *) set;
 
-	/*
-	 * Increment event counter and perform fixup for the pre-signal
-	 * frame.
-	 */
+	/* Perform fixup for the pre-signal frame. */
 	rseq_signal_deliver(ksig, regs);
 
 	/* Set up the stack frame */
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 25e9a7b60eba..849afe749131 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -254,8 +254,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
  * - signal delivery,
  * and return to user-space.
  *
- * This is how we can ensure that the entire rseq critical section,
- * consisting of both the C part and the assembly instruction sequence,
+ * This is how we can ensure that the entire rseq critical section
  * will issue the commit instruction only if executed atomically with
  * respect to other threads scheduled on the same CPU, and with respect
  * to signal handlers.

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:core/rseq] rseq: Remove superfluous rseq_len from task_struct
  2019-03-05 19:47 ` [PATCH for 5.1 2/3] rseq: cleanup: remove rseq_len from task_struct Mathieu Desnoyers
@ 2019-04-19 10:43   ` tip-bot for Mathieu Desnoyers
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mathieu Desnoyers @ 2019-04-19 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux, mtk.manpages, catalin.marinas, boqun.feng, davejwatson,
	peterz, pjt, akpm, josh, cl, linux-kernel, will.deacon, joelaf,
	torvalds, hpa, mathieu.desnoyers, bmaurer, tglx, rostedt,
	paulmck, mingo, luto

Commit-ID:  83b0b15bcb0f700e7c1d070aae2e7841170a4c33
Gitweb:     https://git.kernel.org/tip/83b0b15bcb0f700e7c1d070aae2e7841170a4c33
Author:     Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate: Tue, 5 Mar 2019 14:47:54 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 19 Apr 2019 12:39:32 +0200

rseq: Remove superfluous rseq_len from task_struct

The rseq system call, when invoked with flags of "0" or
"RSEQ_FLAG_UNREGISTER" values, expects the rseq_len parameter to
be equal to sizeof(struct rseq), which is fixed-size and fixed-layout,
specified in uapi linux/rseq.h.

Expecting a fixed size for rseq_len is a design choice that ensures
multiple libraries and application defining __rseq_abi in the same
process agree on its exact size.

Considering that this size is and will always be the same value, there
is no point in saving this value within task_struct rseq_len. Remove
this field from task_struct.

No change in functionality intended.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/20190305194755.2602-3-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h | 4 ----
 kernel/rseq.c         | 6 ++----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1549584a1538..50606a6e73d6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1057,7 +1057,6 @@ struct task_struct {
 
 #ifdef CONFIG_RSEQ
 	struct rseq __user *rseq;
-	u32 rseq_len;
 	u32 rseq_sig;
 	/*
 	 * RmW on rseq_event_mask must be performed atomically
@@ -1855,12 +1854,10 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 {
 	if (clone_flags & CLONE_THREAD) {
 		t->rseq = NULL;
-		t->rseq_len = 0;
 		t->rseq_sig = 0;
 		t->rseq_event_mask = 0;
 	} else {
 		t->rseq = current->rseq;
-		t->rseq_len = current->rseq_len;
 		t->rseq_sig = current->rseq_sig;
 		t->rseq_event_mask = current->rseq_event_mask;
 	}
@@ -1869,7 +1866,6 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 static inline void rseq_execve(struct task_struct *t)
 {
 	t->rseq = NULL;
-	t->rseq_len = 0;
 	t->rseq_sig = 0;
 	t->rseq_event_mask = 0;
 }
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 849afe749131..9424ee90589e 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -313,7 +313,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		/* Unregister rseq for current thread. */
 		if (current->rseq != rseq || !current->rseq)
 			return -EINVAL;
-		if (current->rseq_len != rseq_len)
+		if (rseq_len != sizeof(*rseq))
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -321,7 +321,6 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		if (ret)
 			return ret;
 		current->rseq = NULL;
-		current->rseq_len = 0;
 		current->rseq_sig = 0;
 		return 0;
 	}
@@ -335,7 +334,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 		 * the provided address differs from the prior
 		 * one.
 		 */
-		if (current->rseq != rseq || current->rseq_len != rseq_len)
+		if (current->rseq != rseq || rseq_len != sizeof(*rseq))
 			return -EINVAL;
 		if (current->rseq_sig != sig)
 			return -EPERM;
@@ -353,7 +352,6 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
 	if (!access_ok(rseq, rseq_len))
 		return -EFAULT;
 	current->rseq = rseq;
-	current->rseq_len = rseq_len;
 	current->rseq_sig = sig;
 	/*
 	 * If rseq was previously inactive, and has just been

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:core/rseq] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-03-05 19:47 ` [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus Mathieu Desnoyers
  2019-04-19 10:38   ` Ingo Molnar
@ 2019-04-19 10:44   ` tip-bot for Mathieu Desnoyers
  2019-04-19 10:46   ` [tip:core/rseq] rseq/selftests: Adapt number of threads to the number of detected CPUs tip-bot for Mathieu Desnoyers
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mathieu Desnoyers @ 2019-04-19 10:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: josh, catalin.marinas, joelaf, paulmck, hpa, mtk.manpages, tglx,
	mathieu.desnoyers, bmaurer, rostedt, davejwatson, boqun.feng,
	pjt, will.deacon, torvalds, linux, cl, akpm, shuah, linux-kernel,
	mingo, luto, peterz

Commit-ID:  90083998312324599cb2a546020755cdc934de0f
Gitweb:     https://git.kernel.org/tip/90083998312324599cb2a546020755cdc934de0f
Author:     Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate: Tue, 5 Mar 2019 14:47:55 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 19 Apr 2019 12:39:32 +0200

rseq/selftests: Adapt number of threads to the number of detected cpus

On smaller systems, running a test with 200 threads can take a long
time on machines with smaller number of CPUs.

Detect the number of online cpus at test runtime, and multiply that
by 6 to have 6 rseq threads per cpu preempting each other.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Link: http://lkml.kernel.org/r/20190305194755.2602-4-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
index 3acd6d75ff9f..e426304fd4a0 100755
--- a/tools/testing/selftests/rseq/run_param_test.sh
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0+ or MIT
 
+NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
+
 EXTRA_ARGS=${@}
 
 OLDIFS="$IFS"
@@ -28,15 +30,16 @@ IFS="$OLDIFS"
 
 REPS=1000
 SLOW_REPS=100
+NR_THREADS=$((6*${NR_CPUS}))
 
 function do_tests()
 {
 	local i=0
 	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
 		echo "Running test ${TEST_NAME[$i]}"
-		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
 		echo "Running compare-twice test ${TEST_NAME[$i]}"
-		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
 		let "i++"
 	done
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:core/rseq] rseq/selftests: Adapt number of threads to the number of detected CPUs
  2019-03-05 19:47 ` [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus Mathieu Desnoyers
  2019-04-19 10:38   ` Ingo Molnar
  2019-04-19 10:44   ` [tip:core/rseq] " tip-bot for Mathieu Desnoyers
@ 2019-04-19 10:46   ` tip-bot for Mathieu Desnoyers
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mathieu Desnoyers @ 2019-04-19 10:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: davejwatson, mingo, cl, linux-kernel, linux, mathieu.desnoyers,
	rostedt, joelaf, catalin.marinas, mtk.manpages, luto, boqun.feng,
	pjt, bmaurer, shuah, paulmck, akpm, will.deacon, torvalds, tglx,
	peterz, hpa, josh

Commit-ID:  03a33ecc26af28ca9535de2b0612c54deafa9728
Gitweb:     https://git.kernel.org/tip/03a33ecc26af28ca9535de2b0612c54deafa9728
Author:     Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate: Tue, 5 Mar 2019 14:47:55 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 19 Apr 2019 12:42:14 +0200

rseq/selftests: Adapt number of threads to the number of detected CPUs

On smaller systems, running a test with 200 threads can take a long
time on machines with smaller number of CPUs.

Detect the number of online CPUs at test runtime, and multiply that
by 6 to have 6 rseq threads per CPU preempting each other.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Link: http://lkml.kernel.org/r/20190305194755.2602-4-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
index 3acd6d75ff9f..e426304fd4a0 100755
--- a/tools/testing/selftests/rseq/run_param_test.sh
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0+ or MIT
 
+NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
+
 EXTRA_ARGS=${@}
 
 OLDIFS="$IFS"
@@ -28,15 +30,16 @@ IFS="$OLDIFS"
 
 REPS=1000
 SLOW_REPS=100
+NR_THREADS=$((6*${NR_CPUS}))
 
 function do_tests()
 {
 	local i=0
 	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
 		echo "Running test ${TEST_NAME[$i]}"
-		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
 		echo "Running compare-twice test ${TEST_NAME[$i]}"
-		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
 		let "i++"
 	done
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 10:38   ` Ingo Molnar
@ 2019-04-19 12:41     ` Mathieu Desnoyers
  2019-04-19 12:55       ` Mathieu Desnoyers
  0 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-04-19 12:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, shuah, linux-kselftest

----- On Apr 19, 2019, at 6:38 AM, Ingo Molnar mingo@kernel.org wrote:

> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> On smaller systems, running a test with 200 threads can take a long
>> time on machines with smaller number of CPUs.
>> 
>> Detect the number of online cpus at test runtime, and multiply that
>> by 6 to have 6 rseq threads per cpu preempting each other.
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Shuah Khan <shuah@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Joel Fernandes <joelaf@google.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Dave Watson <davejwatson@fb.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Andi Kleen <andi@firstfloor.org>
>> Cc: linux-kselftest@vger.kernel.org
>> Cc: "H . Peter Anvin" <hpa@zytor.com>
>> Cc: Chris Lameter <cl@linux.com>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>> Cc: Paul Turner <pjt@google.com>
>> Cc: Boqun Feng <boqun.feng@gmail.com>
>> Cc: Josh Triplett <josh@joshtriplett.org>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Ben Maurer <bmaurer@fb.com>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> ---
>>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/testing/selftests/rseq/run_param_test.sh
>> b/tools/testing/selftests/rseq/run_param_test.sh
>> index 3acd6d75ff9f..e426304fd4a0 100755
>> --- a/tools/testing/selftests/rseq/run_param_test.sh
>> +++ b/tools/testing/selftests/rseq/run_param_test.sh
>> @@ -1,6 +1,8 @@
>>  #!/bin/bash
>>  # SPDX-License-Identifier: GPL-2.0+ or MIT
>>  
>> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
>> +
>>  EXTRA_ARGS=${@}
>>  
>>  OLDIFS="$IFS"
>> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>>  
>>  REPS=1000
>>  SLOW_REPS=100
>> +NR_THREADS=$((6*${NR_CPUS}))
>>  
>>  function do_tests()
>>  {
>>  	local i=0
>>  	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>>  		echo "Running test ${TEST_NAME[$i]}"
>> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
>> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS}
>> || exit 1
>>  		echo "Running compare-twice test ${TEST_NAME[$i]}"
>> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} ||
>> exit 1
>> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@}
>> ${EXTRA_ARGS} || exit 1
>>  		let "i++"
>>  	done
>>  }
> 
> BTW., when trying to build the rseq self-tests I get this build failure:
> 
>  dagon:~/tip/tools/testing/selftests/rseq> make
>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared
>  -fPIC rseq.c -lpthread -o
>  /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c
>  -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
>  basic_percpu_ops_test.c -lpthread -lrseq -o
>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined
>  reference to `.L8'
>  /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84:
>  undefined reference to `.L49'
>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined
>  reference to `.L57'
>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to
>  `.L49'
>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to
>  `.L55'
>  collect2: error: ld returned 1 exit status
>  make: *** [Makefile:22:
>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1
> 
> Is this a known problem, or do I miss something from my build environment
> perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).

It works fine with gcc-7 (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3))
but indeed I get the same failure with gcc-8 (gcc version 8.0.1 20180414
(experimental) [trunk revision 259383] (Ubuntu 8-20180414-1ubuntu2)).

Thanks for reporting! I will investigate.

Mathieu


> 
> Thanks,
> 
> 	Ingo

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1
  2019-04-19 10:41 ` Ingo Molnar
@ 2019-04-19 12:42   ` Mathieu Desnoyers
  0 siblings, 0 replies; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-04-19 12:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes

----- On Apr 19, 2019, at 6:41 AM, Ingo Molnar mingo@kernel.org wrote:

> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> Those changes aiming at 5.1 include one comment cleanup, the removal of
>> the rseq_len field from the task struct which serves no purpose
>> considering that the struct size is fixed by the ABI, and a selftest
>> improvement adapting the number of threads to the number of detected
>> CPUs, which is nicer for smaller systems.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> Mathieu Desnoyers (3):
>>   rseq: cleanup: Reflect removal of event counter in comments
>>   rseq: cleanup: remove rseq_len from task_struct
>>   rseq/selftests: Adapt number of threads to the number of detected cpus
>> 
>>  arch/arm/kernel/signal.c                       | 3 +--
>>  arch/x86/kernel/signal.c                       | 5 +----
>>  include/linux/sched.h                          | 4 ----
>>  kernel/rseq.c                                  | 9 +++------
>>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>  5 files changed, 10 insertions(+), 18 deletions(-)
> 
> Looks good, I've applied these to tip:core/rseq to make sure they don't
> miss the v5.2 merge window.
> 
> (Let me know if you wanted to handle this differently.)

That's fine by me!

Thanks,

Mathieu


> 
> Thanks,
> 
> 	Ingo

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 12:41     ` Mathieu Desnoyers
@ 2019-04-19 12:55       ` Mathieu Desnoyers
  2019-04-19 13:42         ` Mathieu Desnoyers
  0 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-04-19 12:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, shuah, linux-kselftest

----- On Apr 19, 2019, at 8:41 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Apr 19, 2019, at 6:38 AM, Ingo Molnar mingo@kernel.org wrote:
> 
>> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> 
>>> On smaller systems, running a test with 200 threads can take a long
>>> time on machines with smaller number of CPUs.
>>> 
>>> Detect the number of online cpus at test runtime, and multiply that
>>> by 6 to have 6 rseq threads per cpu preempting each other.
>>> 
>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> Cc: Shuah Khan <shuah@kernel.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Joel Fernandes <joelaf@google.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Dave Watson <davejwatson@fb.com>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Andi Kleen <andi@firstfloor.org>
>>> Cc: linux-kselftest@vger.kernel.org
>>> Cc: "H . Peter Anvin" <hpa@zytor.com>
>>> Cc: Chris Lameter <cl@linux.com>
>>> Cc: Russell King <linux@arm.linux.org.uk>
>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>>> Cc: Paul Turner <pjt@google.com>
>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>> Cc: Josh Triplett <josh@joshtriplett.org>
>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>> Cc: Ben Maurer <bmaurer@fb.com>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>> ---
>>>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/tools/testing/selftests/rseq/run_param_test.sh
>>> b/tools/testing/selftests/rseq/run_param_test.sh
>>> index 3acd6d75ff9f..e426304fd4a0 100755
>>> --- a/tools/testing/selftests/rseq/run_param_test.sh
>>> +++ b/tools/testing/selftests/rseq/run_param_test.sh
>>> @@ -1,6 +1,8 @@
>>>  #!/bin/bash
>>>  # SPDX-License-Identifier: GPL-2.0+ or MIT
>>>  
>>> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
>>> +
>>>  EXTRA_ARGS=${@}
>>>  
>>>  OLDIFS="$IFS"
>>> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>>>  
>>>  REPS=1000
>>>  SLOW_REPS=100
>>> +NR_THREADS=$((6*${NR_CPUS}))
>>>  
>>>  function do_tests()
>>>  {
>>>  	local i=0
>>>  	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>>>  		echo "Running test ${TEST_NAME[$i]}"
>>> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
>>> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS}
>>> || exit 1
>>>  		echo "Running compare-twice test ${TEST_NAME[$i]}"
>>> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} ||
>>> exit 1
>>> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@}
>>> ${EXTRA_ARGS} || exit 1
>>>  		let "i++"
>>>  	done
>>>  }
>> 
>> BTW., when trying to build the rseq self-tests I get this build failure:
>> 
>>  dagon:~/tip/tools/testing/selftests/rseq> make
>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared
>>  -fPIC rseq.c -lpthread -o
>>  /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c
>>  -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
>>  basic_percpu_ops_test.c -lpthread -lrseq -o
>>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
>>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
>>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined
>>  reference to `.L8'
>>  /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84:
>>  undefined reference to `.L49'
>>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
>>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined
>>  reference to `.L57'
>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to
>>  `.L49'
>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to
>>  `.L55'
>>  collect2: error: ld returned 1 exit status
>>  make: *** [Makefile:22:
>>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1
>> 
>> Is this a known problem, or do I miss something from my build environment
>> perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).
> 
> It works fine with gcc-7 (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3))
> but indeed I get the same failure with gcc-8 (gcc version 8.0.1 20180414
> (experimental) [trunk revision 259383] (Ubuntu 8-20180414-1ubuntu2)).
> 
> Thanks for reporting! I will investigate.

It looks like gcc-8 optimize away the target of asm goto labels when
there are more than one of them on x86-64. I'll try to come up with
a simpler reproducer.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 12:55       ` Mathieu Desnoyers
@ 2019-04-19 13:42         ` Mathieu Desnoyers
  2019-04-19 13:48           ` Mathieu Desnoyers
  0 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-04-19 13:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, shuah, linux-kselftest

----- On Apr 19, 2019, at 8:55 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Apr 19, 2019, at 8:41 AM, Mathieu Desnoyers
> mathieu.desnoyers@efficios.com wrote:
> 
>> ----- On Apr 19, 2019, at 6:38 AM, Ingo Molnar mingo@kernel.org wrote:
>> 
>>> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>> 
>>>> On smaller systems, running a test with 200 threads can take a long
>>>> time on machines with smaller number of CPUs.
>>>> 
>>>> Detect the number of online cpus at test runtime, and multiply that
>>>> by 6 to have 6 rseq threads per cpu preempting each other.
>>>> 
>>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>> Cc: Shuah Khan <shuah@kernel.org>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Joel Fernandes <joelaf@google.com>
>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Dave Watson <davejwatson@fb.com>
>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>> Cc: Andi Kleen <andi@firstfloor.org>
>>>> Cc: linux-kselftest@vger.kernel.org
>>>> Cc: "H . Peter Anvin" <hpa@zytor.com>
>>>> Cc: Chris Lameter <cl@linux.com>
>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>>>> Cc: Paul Turner <pjt@google.com>
>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>> Cc: Josh Triplett <josh@joshtriplett.org>
>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>> Cc: Ben Maurer <bmaurer@fb.com>
>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>>> ---
>>>>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/tools/testing/selftests/rseq/run_param_test.sh
>>>> b/tools/testing/selftests/rseq/run_param_test.sh
>>>> index 3acd6d75ff9f..e426304fd4a0 100755
>>>> --- a/tools/testing/selftests/rseq/run_param_test.sh
>>>> +++ b/tools/testing/selftests/rseq/run_param_test.sh
>>>> @@ -1,6 +1,8 @@
>>>>  #!/bin/bash
>>>>  # SPDX-License-Identifier: GPL-2.0+ or MIT
>>>>  
>>>> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
>>>> +
>>>>  EXTRA_ARGS=${@}
>>>>  
>>>>  OLDIFS="$IFS"
>>>> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>>>>  
>>>>  REPS=1000
>>>>  SLOW_REPS=100
>>>> +NR_THREADS=$((6*${NR_CPUS}))
>>>>  
>>>>  function do_tests()
>>>>  {
>>>>  	local i=0
>>>>  	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>>>>  		echo "Running test ${TEST_NAME[$i]}"
>>>> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
>>>> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS}
>>>> || exit 1
>>>>  		echo "Running compare-twice test ${TEST_NAME[$i]}"
>>>> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} ||
>>>> exit 1
>>>> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@}
>>>> ${EXTRA_ARGS} || exit 1
>>>>  		let "i++"
>>>>  	done
>>>>  }
>>> 
>>> BTW., when trying to build the rseq self-tests I get this build failure:
>>> 
>>>  dagon:~/tip/tools/testing/selftests/rseq> make
>>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared
>>>  -fPIC rseq.c -lpthread -o
>>>  /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
>>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c
>>>  -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
>>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
>>>  basic_percpu_ops_test.c -lpthread -lrseq -o
>>>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
>>>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined
>>>  reference to `.L8'
>>>  /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84:
>>>  undefined reference to `.L49'
>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
>>>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined
>>>  reference to `.L57'
>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to
>>>  `.L49'
>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to
>>>  `.L55'
>>>  collect2: error: ld returned 1 exit status
>>>  make: *** [Makefile:22:
>>>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1
>>> 
>>> Is this a known problem, or do I miss something from my build environment
>>> perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).
>> 
>> It works fine with gcc-7 (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3))
>> but indeed I get the same failure with gcc-8 (gcc version 8.0.1 20180414
>> (experimental) [trunk revision 259383] (Ubuntu 8-20180414-1ubuntu2)).
>> 
>> Thanks for reporting! I will investigate.
> 
> It looks like gcc-8 optimize away the target of asm goto labels when
> there are more than one of them on x86-64. I'll try to come up with
> a simpler reproducer.

It appears to be related to gcc-8 mishandling combination of
asm goto and thread-local storage input operands on x86-64.
Here is a simple reproducer:

__thread int var;
  
static int fct(void)
{
        asm goto (      "jmp %l[testlabel]\n\t"
                        : : [var] "m" (var) : : testlabel);
        return 0;
testlabel:
        return 1;
}

int main()
{
        return fct();
}

building with gcc-7 -O2 is fine. Building with gcc-8 -O0 is
fine too. Building with gcc-8 -O1 and -O2 fails with:

/tmp/ccuXTFfs.o: In function `main':
test-asm-goto.c:(.text.startup+0x1): undefined reference to `.L2'
collect2: error: ld returned 1 exit status

With gcc-7 -O2, the assembly of main has the .L2 label:

main:
.LFB1:
        .cfi_startproc
#APP
# 5 "test-asm-goto.c" 1
        jmp .L2

# 0 "" 2
#NO_APP
.L4:
.L3:
        xorl    %eax, %eax
        ret
.L2:
        movl    $1, %eax
        ret
        .cfi_endproc

However, with gcc-8 -O2, it's missing:

main:
.LFB1:
        .cfi_startproc
.L3:
#APP
# 5 "test-asm-goto.c" 1
        jmp .L2

# 0 "" 2
#NO_APP
        xorl    %eax, %eax
        ret
        .cfi_endproc

It looks like we have a compiler issue. :-/

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 13:42         ` Mathieu Desnoyers
@ 2019-04-19 13:48           ` Mathieu Desnoyers
  2019-04-19 14:17             ` shuah
  0 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-04-19 13:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, shuah, linux-kselftest

----- On Apr 19, 2019, at 9:42 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Apr 19, 2019, at 8:55 AM, Mathieu Desnoyers
> mathieu.desnoyers@efficios.com wrote:
> 
>> ----- On Apr 19, 2019, at 8:41 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>> 
>>> ----- On Apr 19, 2019, at 6:38 AM, Ingo Molnar mingo@kernel.org wrote:
>>> 
>>>> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>> 
>>>>> On smaller systems, running a test with 200 threads can take a long
>>>>> time on machines with smaller number of CPUs.
>>>>> 
>>>>> Detect the number of online cpus at test runtime, and multiply that
>>>>> by 6 to have 6 rseq threads per cpu preempting each other.
>>>>> 
>>>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>> Cc: Shuah Khan <shuah@kernel.org>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Joel Fernandes <joelaf@google.com>
>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Cc: Dave Watson <davejwatson@fb.com>
>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>> Cc: Andi Kleen <andi@firstfloor.org>
>>>>> Cc: linux-kselftest@vger.kernel.org
>>>>> Cc: "H . Peter Anvin" <hpa@zytor.com>
>>>>> Cc: Chris Lameter <cl@linux.com>
>>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>>> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>>>>> Cc: Paul Turner <pjt@google.com>
>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>> Cc: Josh Triplett <josh@joshtriplett.org>
>>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>> Cc: Ben Maurer <bmaurer@fb.com>
>>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>>>> ---
>>>>>  tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/tools/testing/selftests/rseq/run_param_test.sh
>>>>> b/tools/testing/selftests/rseq/run_param_test.sh
>>>>> index 3acd6d75ff9f..e426304fd4a0 100755
>>>>> --- a/tools/testing/selftests/rseq/run_param_test.sh
>>>>> +++ b/tools/testing/selftests/rseq/run_param_test.sh
>>>>> @@ -1,6 +1,8 @@
>>>>>  #!/bin/bash
>>>>>  # SPDX-License-Identifier: GPL-2.0+ or MIT
>>>>>  
>>>>> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
>>>>> +
>>>>>  EXTRA_ARGS=${@}
>>>>>  
>>>>>  OLDIFS="$IFS"
>>>>> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>>>>>  
>>>>>  REPS=1000
>>>>>  SLOW_REPS=100
>>>>> +NR_THREADS=$((6*${NR_CPUS}))
>>>>>  
>>>>>  function do_tests()
>>>>>  {
>>>>>  	local i=0
>>>>>  	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>>>>>  		echo "Running test ${TEST_NAME[$i]}"
>>>>> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
>>>>> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS}
>>>>> || exit 1
>>>>>  		echo "Running compare-twice test ${TEST_NAME[$i]}"
>>>>> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} ||
>>>>> exit 1
>>>>> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@}
>>>>> ${EXTRA_ARGS} || exit 1
>>>>>  		let "i++"
>>>>>  	done
>>>>>  }
>>>> 
>>>> BTW., when trying to build the rseq self-tests I get this build failure:
>>>> 
>>>>  dagon:~/tip/tools/testing/selftests/rseq> make
>>>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared
>>>>  -fPIC rseq.c -lpthread -o
>>>>  /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
>>>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c
>>>>  -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
>>>>  gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
>>>>  basic_percpu_ops_test.c -lpthread -lrseq -o
>>>>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
>>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
>>>>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined
>>>>  reference to `.L8'
>>>>  /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84:
>>>>  undefined reference to `.L49'
>>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
>>>>  /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined
>>>>  reference to `.L57'
>>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
>>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to
>>>>  `.L49'
>>>>  /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to
>>>>  `.L55'
>>>>  collect2: error: ld returned 1 exit status
>>>>  make: *** [Makefile:22:
>>>>  /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1
>>>> 
>>>> Is this a known problem, or do I miss something from my build environment
>>>> perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).
>>> 
>>> It works fine with gcc-7 (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3))
>>> but indeed I get the same failure with gcc-8 (gcc version 8.0.1 20180414
>>> (experimental) [trunk revision 259383] (Ubuntu 8-20180414-1ubuntu2)).
>>> 
>>> Thanks for reporting! I will investigate.
>> 
>> It looks like gcc-8 optimize away the target of asm goto labels when
>> there are more than one of them on x86-64. I'll try to come up with
>> a simpler reproducer.
> 
> It appears to be related to gcc-8 mishandling combination of
> asm goto and thread-local storage input operands on x86-64.
> Here is a simple reproducer:
> 
> __thread int var;
>  
> static int fct(void)
> {
>        asm goto (      "jmp %l[testlabel]\n\t"
>                        : : [var] "m" (var) : : testlabel);
>        return 0;
> testlabel:

FWIW, if I add an empty

   asm volatile ("");

here after the label, gcc-8 -O2 builds "something" which is
a bogus assembler (an endless loop) :

main:
.LFB24:
        .cfi_startproc
.L2:
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
#APP
# 6 "test-asm-goto.c" 1
        jmp .L2

# 0 "" 2
#NO_APP
        movl    %fs:var@tpoff, %edx
        leaq    .LC0(%rip), %rsi
        movl    $1, %edi
        xorl    %eax, %eax
        call    __printf_chk@PLT
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

Thoughts ?

Thanks,

Mathieu

>        return 1;
> }
> 
> int main()
> {
>        return fct();
> }
> 
> building with gcc-7 -O2 is fine. Building with gcc-8 -O0 is
> fine too. Building with gcc-8 -O1 and -O2 fails with:
> 
> /tmp/ccuXTFfs.o: In function `main':
> test-asm-goto.c:(.text.startup+0x1): undefined reference to `.L2'
> collect2: error: ld returned 1 exit status
> 
> With gcc-7 -O2, the assembly of main has the .L2 label:
> 
> main:
> .LFB1:
>        .cfi_startproc
> #APP
> # 5 "test-asm-goto.c" 1
>        jmp .L2
> 
> # 0 "" 2
> #NO_APP
> .L4:
> .L3:
>        xorl    %eax, %eax
>        ret
> .L2:
>        movl    $1, %eax
>        ret
>        .cfi_endproc
> 
> However, with gcc-8 -O2, it's missing:
> 
> main:
> .LFB1:
>        .cfi_startproc
> .L3:
> #APP
> # 5 "test-asm-goto.c" 1
>        jmp .L2
> 
> # 0 "" 2
> #NO_APP
>        xorl    %eax, %eax
>        ret
>        .cfi_endproc
> 
> It looks like we have a compiler issue. :-/
> 
> Thanks,
> 
> Mathieu
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 13:48           ` Mathieu Desnoyers
@ 2019-04-19 14:17             ` shuah
  2019-04-19 14:40               ` Mathieu Desnoyers
  0 siblings, 1 reply; 27+ messages in thread
From: shuah @ 2019-04-19 14:17 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, linux-kselftest, shuah

On 4/19/19 7:48 AM, Mathieu Desnoyers wrote:
> ----- On Apr 19, 2019, at 9:42 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> 
>> ----- On Apr 19, 2019, at 8:55 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>>
>>> ----- On Apr 19, 2019, at 8:41 AM, Mathieu Desnoyers
>>> mathieu.desnoyers@efficios.com wrote:
>>>
>>>> ----- On Apr 19, 2019, at 6:38 AM, Ingo Molnar mingo@kernel.org wrote:
>>>>
>>>>> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>>>
>>>>>> On smaller systems, running a test with 200 threads can take a long
>>>>>> time on machines with smaller number of CPUs.
>>>>>>
>>>>>> Detect the number of online cpus at test runtime, and multiply that
>>>>>> by 6 to have 6 rseq threads per cpu preempting each other.
>>>>>>
>>>>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>> Cc: Shuah Khan <shuah@kernel.org>
>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>> Cc: Joel Fernandes <joelaf@google.com>
>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>>> Cc: Dave Watson <davejwatson@fb.com>
>>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>>> Cc: Andi Kleen <andi@firstfloor.org>
>>>>>> Cc: linux-kselftest@vger.kernel.org
>>>>>> Cc: "H . Peter Anvin" <hpa@zytor.com>
>>>>>> Cc: Chris Lameter <cl@linux.com>
>>>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>>>> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>>>>>> Cc: Paul Turner <pjt@google.com>
>>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>>> Cc: Josh Triplett <josh@joshtriplett.org>
>>>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>>> Cc: Ben Maurer <bmaurer@fb.com>
>>>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>> ---
>>>>>>   tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>>>>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/testing/selftests/rseq/run_param_test.sh
>>>>>> b/tools/testing/selftests/rseq/run_param_test.sh
>>>>>> index 3acd6d75ff9f..e426304fd4a0 100755
>>>>>> --- a/tools/testing/selftests/rseq/run_param_test.sh
>>>>>> +++ b/tools/testing/selftests/rseq/run_param_test.sh
>>>>>> @@ -1,6 +1,8 @@
>>>>>>   #!/bin/bash
>>>>>>   # SPDX-License-Identifier: GPL-2.0+ or MIT
>>>>>>   
>>>>>> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
>>>>>> +
>>>>>>   EXTRA_ARGS=${@}
>>>>>>   
>>>>>>   OLDIFS="$IFS"
>>>>>> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>>>>>>   
>>>>>>   REPS=1000
>>>>>>   SLOW_REPS=100
>>>>>> +NR_THREADS=$((6*${NR_CPUS}))
>>>>>>   
>>>>>>   function do_tests()
>>>>>>   {
>>>>>>   	local i=0
>>>>>>   	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>>>>>>   		echo "Running test ${TEST_NAME[$i]}"
>>>>>> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
>>>>>> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS}
>>>>>> || exit 1
>>>>>>   		echo "Running compare-twice test ${TEST_NAME[$i]}"
>>>>>> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} ||
>>>>>> exit 1
>>>>>> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@}
>>>>>> ${EXTRA_ARGS} || exit 1
>>>>>>   		let "i++"
>>>>>>   	done
>>>>>>   }
>>>>>
>>>>> BTW., when trying to build the rseq self-tests I get this build failure:
>>>>>
>>>>>   dagon:~/tip/tools/testing/selftests/rseq> make
>>>>>   gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared
>>>>>   -fPIC rseq.c -lpthread -o
>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
>>>>>   gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c
>>>>>   -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
>>>>>   gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
>>>>>   basic_percpu_ops_test.c -lpthread -lrseq -o
>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined
>>>>>   reference to `.L8'
>>>>>   /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84:
>>>>>   undefined reference to `.L49'
>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined
>>>>>   reference to `.L57'
>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to
>>>>>   `.L49'
>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to
>>>>>   `.L55'
>>>>>   collect2: error: ld returned 1 exit status
>>>>>   make: *** [Makefile:22:
>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1
>>>>>
>>>>> Is this a known problem, or do I miss something from my build environment
>>>>> perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).
>>>>
>>>> It works fine with gcc-7 (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3))
>>>> but indeed I get the same failure with gcc-8 (gcc version 8.0.1 20180414
>>>> (experimental) [trunk revision 259383] (Ubuntu 8-20180414-1ubuntu2)).
>>>>
>>>> Thanks for reporting! I will investigate.
>>>
>>> It looks like gcc-8 optimize away the target of asm goto labels when
>>> there are more than one of them on x86-64. I'll try to come up with
>>> a simpler reproducer.
>>
>> It appears to be related to gcc-8 mishandling combination of
>> asm goto and thread-local storage input operands on x86-64.
>> Here is a simple reproducer:
>>
>> __thread int var;
>>   
>> static int fct(void)
>> {
>>         asm goto (      "jmp %l[testlabel]\n\t"
>>                         : : [var] "m" (var) : : testlabel);
>>         return 0;
>> testlabel:
> 
> FWIW, if I add an empty
> 
>     asm volatile ("");
> 
> here after the label, gcc-8 -O2 builds "something" which is
> a bogus assembler (an endless loop) :
> 
> main:
> .LFB24:
>          .cfi_startproc
> .L2:
>          subq    $8, %rsp
>          .cfi_def_cfa_offset 16
> #APP
> # 6 "test-asm-goto.c" 1
>          jmp .L2
> 
> # 0 "" 2
> #NO_APP
>          movl    %fs:var@tpoff, %edx
>          leaq    .LC0(%rip), %rsi
>          movl    $1, %edi
>          xorl    %eax, %eax
>          call    __printf_chk@PLT
>          xorl    %eax, %eax
>          addq    $8, %rsp
>          .cfi_def_cfa_offset 8
>          ret
>          .cfi_endproc
> 
> Thoughts ?
> 

Didn't see problems when I tested it before applying it to
linux-kselftest next.

I have gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)

thanks,
-- Shuah



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 14:17             ` shuah
@ 2019-04-19 14:40               ` Mathieu Desnoyers
  2019-04-19 18:57                 ` shuah
  0 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-04-19 14:40 UTC (permalink / raw)
  To: shuah, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, linux-kselftest

----- On Apr 19, 2019, at 10:17 AM, shuah shuah@kernel.org wrote:

> On 4/19/19 7:48 AM, Mathieu Desnoyers wrote:
>> ----- On Apr 19, 2019, at 9:42 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>> 
>>> ----- On Apr 19, 2019, at 8:55 AM, Mathieu Desnoyers
>>> mathieu.desnoyers@efficios.com wrote:
>>>
>>>> ----- On Apr 19, 2019, at 8:41 AM, Mathieu Desnoyers
>>>> mathieu.desnoyers@efficios.com wrote:
>>>>
>>>>> ----- On Apr 19, 2019, at 6:38 AM, Ingo Molnar mingo@kernel.org wrote:
>>>>>
>>>>>> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>>>>
>>>>>>> On smaller systems, running a test with 200 threads can take a long
>>>>>>> time on machines with smaller number of CPUs.
>>>>>>>
>>>>>>> Detect the number of online cpus at test runtime, and multiply that
>>>>>>> by 6 to have 6 rseq threads per cpu preempting each other.
>>>>>>>
>>>>>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>>> Cc: Shuah Khan <shuah@kernel.org>
>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>> Cc: Joel Fernandes <joelaf@google.com>
>>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>>>> Cc: Dave Watson <davejwatson@fb.com>
>>>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>>>> Cc: Andi Kleen <andi@firstfloor.org>
>>>>>>> Cc: linux-kselftest@vger.kernel.org
>>>>>>> Cc: "H . Peter Anvin" <hpa@zytor.com>
>>>>>>> Cc: Chris Lameter <cl@linux.com>
>>>>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>>>>> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>>>>>>> Cc: Paul Turner <pjt@google.com>
>>>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>>>> Cc: Josh Triplett <josh@joshtriplett.org>
>>>>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>>>> Cc: Ben Maurer <bmaurer@fb.com>
>>>>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>> ---
>>>>>>>   tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>>>>>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>> b/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>> index 3acd6d75ff9f..e426304fd4a0 100755
>>>>>>> --- a/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>> +++ b/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>   #!/bin/bash
>>>>>>>   # SPDX-License-Identifier: GPL-2.0+ or MIT
>>>>>>>   
>>>>>>> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
>>>>>>> +
>>>>>>>   EXTRA_ARGS=${@}
>>>>>>>   
>>>>>>>   OLDIFS="$IFS"
>>>>>>> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>>>>>>>   
>>>>>>>   REPS=1000
>>>>>>>   SLOW_REPS=100
>>>>>>> +NR_THREADS=$((6*${NR_CPUS}))
>>>>>>>   
>>>>>>>   function do_tests()
>>>>>>>   {
>>>>>>>   	local i=0
>>>>>>>   	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>>>>>>>   		echo "Running test ${TEST_NAME[$i]}"
>>>>>>> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
>>>>>>> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS}
>>>>>>> || exit 1
>>>>>>>   		echo "Running compare-twice test ${TEST_NAME[$i]}"
>>>>>>> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} ||
>>>>>>> exit 1
>>>>>>> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@}
>>>>>>> ${EXTRA_ARGS} || exit 1
>>>>>>>   		let "i++"
>>>>>>>   	done
>>>>>>>   }
>>>>>>
>>>>>> BTW., when trying to build the rseq self-tests I get this build failure:
>>>>>>
>>>>>>   dagon:~/tip/tools/testing/selftests/rseq> make
>>>>>>   gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared
>>>>>>   -fPIC rseq.c -lpthread -o
>>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
>>>>>>   gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c
>>>>>>   -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
>>>>>>   gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
>>>>>>   basic_percpu_ops_test.c -lpthread -lrseq -o
>>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
>>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
>>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined
>>>>>>   reference to `.L8'
>>>>>>   /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84:
>>>>>>   undefined reference to `.L49'
>>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
>>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined
>>>>>>   reference to `.L57'
>>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
>>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to
>>>>>>   `.L49'
>>>>>>   /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to
>>>>>>   `.L55'
>>>>>>   collect2: error: ld returned 1 exit status
>>>>>>   make: *** [Makefile:22:
>>>>>>   /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1
>>>>>>
>>>>>> Is this a known problem, or do I miss something from my build environment
>>>>>> perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).
>>>>>
>>>>> It works fine with gcc-7 (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3))
>>>>> but indeed I get the same failure with gcc-8 (gcc version 8.0.1 20180414
>>>>> (experimental) [trunk revision 259383] (Ubuntu 8-20180414-1ubuntu2)).
>>>>>
>>>>> Thanks for reporting! I will investigate.
>>>>
>>>> It looks like gcc-8 optimize away the target of asm goto labels when
>>>> there are more than one of them on x86-64. I'll try to come up with
>>>> a simpler reproducer.
>>>
>>> It appears to be related to gcc-8 mishandling combination of
>>> asm goto and thread-local storage input operands on x86-64.
>>> Here is a simple reproducer:
>>>
>>> __thread int var;
>>>   
>>> static int fct(void)
>>> {
>>>         asm goto (      "jmp %l[testlabel]\n\t"
>>>                         : : [var] "m" (var) : : testlabel);
>>>         return 0;
>>> testlabel:
>> 
>> FWIW, if I add an empty
>> 
>>     asm volatile ("");
>> 
>> here after the label, gcc-8 -O2 builds "something" which is
>> a bogus assembler (an endless loop) :
>> 
>> main:
>> .LFB24:
>>          .cfi_startproc
>> .L2:
>>          subq    $8, %rsp
>>          .cfi_def_cfa_offset 16
>> #APP
>> # 6 "test-asm-goto.c" 1
>>          jmp .L2
>> 
>> # 0 "" 2
>> #NO_APP
>>          movl    %fs:var@tpoff, %edx
>>          leaq    .LC0(%rip), %rsi
>>          movl    $1, %edi
>>          xorl    %eax, %eax
>>          call    __printf_chk@PLT
>>          xorl    %eax, %eax
>>          addq    $8, %rsp
>>          .cfi_def_cfa_offset 8
>>          ret
>>          .cfi_endproc
>> 
>> Thoughts ?
>> 
> 
> Didn't see problems when I tested it before applying it to
> linux-kselftest next.
> 
> I have gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)

It really appears to be an optimization bug in gcc-8. Considering that
bogus compilers are released in the wild, we can hardly justify using
the compiler feature that triggers the bogus behavior, even if it gets
fixed in the future.

I've prepared a patch that changes the way the __rseq_abi fields are
passed to the inline asm. I pass the address of the __rseq_abi TLS
as a register input operand rather than each individual field as "m"
operand.

I will submit it in a separate thread.

By the way, it affects both x86-32 (building with gcc-8 -m32) and x86-64.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 14:40               ` Mathieu Desnoyers
@ 2019-04-19 18:57                 ` shuah
  2019-04-19 20:59                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 27+ messages in thread
From: shuah @ 2019-04-19 18:57 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, linux-api, Peter Zijlstra,
	Paul E . McKenney, Boqun Feng, Andy Lutomirski, Dave Watson,
	Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Joel Fernandes, linux-kselftest, shuah

On 4/19/19 8:40 AM, Mathieu Desnoyers wrote:
> ----- On Apr 19, 2019, at 10:17 AM, shuah shuah@kernel.org wrote:
> 
>> On 4/19/19 7:48 AM, Mathieu Desnoyers wrote:
>>> ----- On Apr 19, 2019, at 9:42 AM, Mathieu Desnoyers
>>> mathieu.desnoyers@efficios.com wrote:
>>>
>>>> ----- On Apr 19, 2019, at 8:55 AM, Mathieu Desnoyers
>>>> mathieu.desnoyers@efficios.com wrote:
>>>>
>>>>> ----- On Apr 19, 2019, at 8:41 AM, Mathieu Desnoyers
>>>>> mathieu.desnoyers@efficios.com wrote:
>>>>>
>>>>>> ----- On Apr 19, 2019, at 6:38 AM, Ingo Molnar mingo@kernel.org wrote:
>>>>>>
>>>>>>> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>>>>>
>>>>>>>> On smaller systems, running a test with 200 threads can take a long
>>>>>>>> time on machines with smaller number of CPUs.
>>>>>>>>
>>>>>>>> Detect the number of online cpus at test runtime, and multiply that
>>>>>>>> by 6 to have 6 rseq threads per cpu preempting each other.
>>>>>>>>
>>>>>>>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>>>> Cc: Shuah Khan <shuah@kernel.org>
>>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>> Cc: Joel Fernandes <joelaf@google.com>
>>>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>>>>> Cc: Dave Watson <davejwatson@fb.com>
>>>>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>>>>> Cc: Andi Kleen <andi@firstfloor.org>
>>>>>>>> Cc: linux-kselftest@vger.kernel.org
>>>>>>>> Cc: "H . Peter Anvin" <hpa@zytor.com>
>>>>>>>> Cc: Chris Lameter <cl@linux.com>
>>>>>>>> Cc: Russell King <linux@arm.linux.org.uk>
>>>>>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>>>>>> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
>>>>>>>> Cc: Paul Turner <pjt@google.com>
>>>>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>>>>> Cc: Josh Triplett <josh@joshtriplett.org>
>>>>>>>> Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>>>>> Cc: Ben Maurer <bmaurer@fb.com>
>>>>>>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>>> ---
>>>>>>>>    tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
>>>>>>>>    1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>>> b/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>>> index 3acd6d75ff9f..e426304fd4a0 100755
>>>>>>>> --- a/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>>> +++ b/tools/testing/selftests/rseq/run_param_test.sh
>>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>>    #!/bin/bash
>>>>>>>>    # SPDX-License-Identifier: GPL-2.0+ or MIT
>>>>>>>>    
>>>>>>>> +NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
>>>>>>>> +
>>>>>>>>    EXTRA_ARGS=${@}
>>>>>>>>    
>>>>>>>>    OLDIFS="$IFS"
>>>>>>>> @@ -28,15 +30,16 @@ IFS="$OLDIFS"
>>>>>>>>    
>>>>>>>>    REPS=1000
>>>>>>>>    SLOW_REPS=100
>>>>>>>> +NR_THREADS=$((6*${NR_CPUS}))
>>>>>>>>    
>>>>>>>>    function do_tests()
>>>>>>>>    {
>>>>>>>>    	local i=0
>>>>>>>>    	while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
>>>>>>>>    		echo "Running test ${TEST_NAME[$i]}"
>>>>>>>> -		./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
>>>>>>>> +		./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS}
>>>>>>>> || exit 1
>>>>>>>>    		echo "Running compare-twice test ${TEST_NAME[$i]}"
>>>>>>>> -		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} ||
>>>>>>>> exit 1
>>>>>>>> +		./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@}
>>>>>>>> ${EXTRA_ARGS} || exit 1
>>>>>>>>    		let "i++"
>>>>>>>>    	done
>>>>>>>>    }
>>>>>>>
>>>>>>> BTW., when trying to build the rseq self-tests I get this build failure:
>>>>>>>
>>>>>>>    dagon:~/tip/tools/testing/selftests/rseq> make
>>>>>>>    gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ -shared
>>>>>>>    -fPIC rseq.c -lpthread -o
>>>>>>>    /home/mingo/tip/tools/testing/selftests/rseq/librseq.so
>>>>>>>    gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./ basic_test.c
>>>>>>>    -lpthread -lrseq -o /home/mingo/tip/tools/testing/selftests/rseq/basic_test
>>>>>>>    gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
>>>>>>>    basic_percpu_ops_test.c -lpthread -lrseq -o
>>>>>>>    /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test
>>>>>>>    /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpeqv_storev':
>>>>>>>    /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84: undefined
>>>>>>>    reference to `.L8'
>>>>>>>    /usr/bin/ld: /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:84:
>>>>>>>    undefined reference to `.L49'
>>>>>>>    /usr/bin/ld: /tmp/ccuHTWnZ.o: in function `rseq_cmpnev_storeoffp_load':
>>>>>>>    /home/mingo/tip/tools/testing/selftests/rseq/./rseq-x86.h:141: undefined
>>>>>>>    reference to `.L57'
>>>>>>>    /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x8): undefined reference to `.L8'
>>>>>>>    /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x14): undefined reference to
>>>>>>>    `.L49'
>>>>>>>    /usr/bin/ld: /tmp/ccuHTWnZ.o:(__rseq_failure+0x20): undefined reference to
>>>>>>>    `.L55'
>>>>>>>    collect2: error: ld returned 1 exit status
>>>>>>>    make: *** [Makefile:22:
>>>>>>>    /home/mingo/tip/tools/testing/selftests/rseq/basic_percpu_ops_test] Error 1
>>>>>>>
>>>>>>> Is this a known problem, or do I miss something from my build environment
>>>>>>> perhaps? Vanilla 64-bit Ubuntu 18.10 (Cosmic).
>>>>>>
>>>>>> It works fine with gcc-7 (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3))
>>>>>> but indeed I get the same failure with gcc-8 (gcc version 8.0.1 20180414
>>>>>> (experimental) [trunk revision 259383] (Ubuntu 8-20180414-1ubuntu2)).
>>>>>>
>>>>>> Thanks for reporting! I will investigate.
>>>>>
>>>>> It looks like gcc-8 optimize away the target of asm goto labels when
>>>>> there are more than one of them on x86-64. I'll try to come up with
>>>>> a simpler reproducer.
>>>>
>>>> It appears to be related to gcc-8 mishandling combination of
>>>> asm goto and thread-local storage input operands on x86-64.
>>>> Here is a simple reproducer:
>>>>
>>>> __thread int var;
>>>>    
>>>> static int fct(void)
>>>> {
>>>>          asm goto (      "jmp %l[testlabel]\n\t"
>>>>                          : : [var] "m" (var) : : testlabel);
>>>>          return 0;
>>>> testlabel:
>>>
>>> FWIW, if I add an empty
>>>
>>>      asm volatile ("");
>>>
>>> here after the label, gcc-8 -O2 builds "something" which is
>>> a bogus assembler (an endless loop) :
>>>
>>> main:
>>> .LFB24:
>>>           .cfi_startproc
>>> .L2:
>>>           subq    $8, %rsp
>>>           .cfi_def_cfa_offset 16
>>> #APP
>>> # 6 "test-asm-goto.c" 1
>>>           jmp .L2
>>>
>>> # 0 "" 2
>>> #NO_APP
>>>           movl    %fs:var@tpoff, %edx
>>>           leaq    .LC0(%rip), %rsi
>>>           movl    $1, %edi
>>>           xorl    %eax, %eax
>>>           call    __printf_chk@PLT
>>>           xorl    %eax, %eax
>>>           addq    $8, %rsp
>>>           .cfi_def_cfa_offset 8
>>>           ret
>>>           .cfi_endproc
>>>
>>> Thoughts ?
>>>
>>
>> Didn't see problems when I tested it before applying it to
>> linux-kselftest next.
>>
>> I have gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
> 
> It really appears to be an optimization bug in gcc-8. Considering that
> bogus compilers are released in the wild, we can hardly justify using
> the compiler feature that triggers the bogus behavior, even if it gets
> fixed in the future.
> 
> I've prepared a patch that changes the way the __rseq_abi fields are
> passed to the inline asm. I pass the address of the __rseq_abi TLS
> as a register input operand rather than each individual field as "m"
> operand.
> 
> I will submit it in a separate thread.
> 
> By the way, it affects both x86-32 (building with gcc-8 -m32) and x86-64.
> 

Should I drop this patch that is currently in linux-kseltest next? Just
confirming if your new patch is supposed to be applied on top of this
one or not?

thanks,
-- Shuah


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 18:57                 ` shuah
@ 2019-04-19 20:59                   ` Mathieu Desnoyers
  2019-04-19 21:03                     ` shuah
  0 siblings, 1 reply; 27+ messages in thread
From: Mathieu Desnoyers @ 2019-04-19 20:59 UTC (permalink / raw)
  To: shuah
  Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, linux-api,
	Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, Paul Turner, Andrew Morton, Russell King,
	Ingo Molnar, H. Peter Anvin, Andi Kleen, Chris Lameter,
	Ben Maurer, rostedt, Josh Triplett, Linus Torvalds,
	Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes,
	linux-kselftest

----- On Apr 19, 2019, at 2:57 PM, shuah shuah@kernel.org wrote:
[...]
> Should I drop this patch that is currently in linux-kseltest next? Just
> confirming if your new patch is supposed to be applied on top of this
> one or not?

We should keep this patch in linux-kselftest next. The fix applies on top
of it.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus
  2019-04-19 20:59                   ` Mathieu Desnoyers
@ 2019-04-19 21:03                     ` shuah
  0 siblings, 0 replies; 27+ messages in thread
From: shuah @ 2019-04-19 21:03 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, linux-api,
	Peter Zijlstra, Paul E . McKenney, Boqun Feng, Andy Lutomirski,
	Dave Watson, Paul Turner, Andrew Morton, Russell King,
	Ingo Molnar, H. Peter Anvin, Andi Kleen, Chris Lameter,
	Ben Maurer, rostedt, Josh Triplett, Linus Torvalds,
	Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes,
	linux-kselftest, shuah

On 4/19/19 2:59 PM, Mathieu Desnoyers wrote:
> ----- On Apr 19, 2019, at 2:57 PM, shuah shuah@kernel.org wrote:
> [...]
>> Should I drop this patch that is currently in linux-kseltest next? Just
>> confirming if your new patch is supposed to be applied on top of this
>> one or not?
> 
> We should keep this patch in linux-kselftest next. The fix applies on top
> of it.
> 

Will do. Thanks for a quick response.

-- Shuah

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-04-19 21:03 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-05 19:47 [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
2019-03-05 19:47 ` [PATCH for 5.1 1/3] rseq: cleanup: Reflect removal of event counter in comments Mathieu Desnoyers
2019-04-19 10:43   ` [tip:core/rseq] rseq: Clean up comments by reflecting removal of event counter tip-bot for Mathieu Desnoyers
2019-03-05 19:47 ` [PATCH for 5.1 2/3] rseq: cleanup: remove rseq_len from task_struct Mathieu Desnoyers
2019-04-19 10:43   ` [tip:core/rseq] rseq: Remove superfluous " tip-bot for Mathieu Desnoyers
2019-03-05 19:47 ` [PATCH for 5.1 3/3] rseq/selftests: Adapt number of threads to the number of detected cpus Mathieu Desnoyers
2019-04-19 10:38   ` Ingo Molnar
2019-04-19 12:41     ` Mathieu Desnoyers
2019-04-19 12:55       ` Mathieu Desnoyers
2019-04-19 13:42         ` Mathieu Desnoyers
2019-04-19 13:48           ` Mathieu Desnoyers
2019-04-19 14:17             ` shuah
2019-04-19 14:40               ` Mathieu Desnoyers
2019-04-19 18:57                 ` shuah
2019-04-19 20:59                   ` Mathieu Desnoyers
2019-04-19 21:03                     ` shuah
2019-04-19 10:44   ` [tip:core/rseq] " tip-bot for Mathieu Desnoyers
2019-04-19 10:46   ` [tip:core/rseq] rseq/selftests: Adapt number of threads to the number of detected CPUs tip-bot for Mathieu Desnoyers
2019-03-05 20:18 ` [PATCH for 5.1 0/3] Restartable Sequences updates for 5.1 Mathieu Desnoyers
2019-03-05 21:58   ` Peter Zijlstra
2019-03-05 22:32     ` Mathieu Desnoyers
2019-03-06  8:21       ` Peter Zijlstra
2019-03-06  8:30       ` Peter Zijlstra
2019-03-06 17:00         ` Mathieu Desnoyers
2019-03-05 21:49 ` Peter Zijlstra
2019-04-19 10:41 ` Ingo Molnar
2019-04-19 12:42   ` Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).