* [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
@ 2015-12-16 22:09 Mathieu Desnoyers
  2015-12-17 12:54 ` Mathieu Desnoyers
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-12-16 22:09 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: Mathieu Desnoyers, Michael Jeanson, Ralf Baechle, linux-mips,
	linux-kernel, James E.J. Bottomley, Helge Deller, linux-parisc

When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
signal handler. This spurious ENOSYS behavior causes hangs in liburcu
0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
same behavior. This might affect earlier kernels.

This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
nevertheless, we should try to handle this kernel bug more gracefully
than a user-space hang due to unexpected spurious ENOSYS return value.

Therefore, fall back on the "async-safe" version of compat_futex in those
situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
the nice property of being OK to use concurrently with other FUTEX_WAKE
and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.

We suspect that parisc might be affected by a similar issue (Debian
build bots reported a similar hang on both mips and parisc), but we do
not have access to the hardware required to test this hypothesis.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Michael Jeanson <mjeanson@efficios.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: linux-mips@linux-mips.org
CC: linux-kernel@vger.kernel.org
CC: "James E.J. Bottomley" <jejb@parisc-linux.org>
CC: Helge Deller <deller@gmx.de>
CC: linux-parisc@vger.kernel.org
---
 compat_futex.c |  2 ++
 urcu/futex.h   | 12 +++++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/compat_futex.c b/compat_futex.c
index b7f78f0..9e918fe 100644
--- a/compat_futex.c
+++ b/compat_futex.c
@@ -111,6 +111,8 @@ end:
  * _ASYNC SIGNAL-SAFE_.
  * For now, timeout, uaddr2 and val3 are unused.
  * Waiter will busy-loop trying to read the condition.
+ * It is OK to use compat_futex_async() on a futex address on which
+ * futex() WAKE operations are also performed.
  */
 
 int compat_futex_async(int32_t *uaddr, int op, int32_t val,
diff --git a/urcu/futex.h b/urcu/futex.h
index 4d16cfa..a17eda8 100644
--- a/urcu/futex.h
+++ b/urcu/futex.h
@@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op, int32_t val,
 
 	ret = futex(uaddr, op, val, timeout, uaddr2, val3);
 	if (caa_unlikely(ret < 0 && errno == ENOSYS)) {
-		return compat_futex_noasync(uaddr, op, val, timeout,
+		/*
+		 * The fallback on ENOSYS is the async-safe version of
+		 * the compat futex implementation, because the
+		 * async-safe compat implementation allows being used
+		 * concurrently with calls to futex(). Indeed, sys_futex
+		 * FUTEX_WAIT, on some architectures (e.g. mips), within
+		 * a given process, spuriously return ENOSYS due to
+		 * signal restart bugs on some kernel versions (e.g.
+		 * Linux kernel 3.18 and possibly earlier).
+		 */
+		return compat_futex_async(uaddr, op, val, timeout,
 				uaddr2, val3);
 	}
 	return ret;
-- 
2.1.4



* Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-16 22:09 [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug Mathieu Desnoyers
@ 2015-12-17 12:54 ` Mathieu Desnoyers
  2015-12-17 13:16   ` Ed Swierk
  2015-12-17 16:22   ` Aw: " Helge Deller
  0 siblings, 2 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-12-17 12:54 UTC (permalink / raw)
  To: Paul E. McKenney, Jon Bernard
  Cc: Michael Jeanson, Ralf Baechle, linux-mips, linux-kernel,
	James E.J. Bottomley, Helge Deller, linux-parisc, Ed Swierk,
	Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 3768 bytes --]

----- On Dec 16, 2015, at 5:09 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
> Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
> same behavior. This might affect earlier kernels.
> 
> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
> nevertheless, we should try to handle this kernel bug more gracefully
> than a user-space hang due to unexpected spurious ENOSYS return value.

It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
Linux kernel upstream fix commit is:
e967ef02 "MIPS: Fix restart of indirect syscalls"

I've created a small test program that could also be used on parisc
to check if it suffers from the same issue (see attached).

On bogus mips kernels, we see the following output:
[OK] Test program with pid: 5748 SIGUSR1 handler
[FAIL] futex returns -1, Function not implemented

Let me know if someone can try it out on a parisc kernel.

Thanks!

Mathieu

> 
> Therefore, fallback on the "async-safe" version of compat_futex in those
> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
> the nice property of being OK to use concurrently with other FUTEX_WAKE
> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
> 
> We suspect that parisc might be affected by a similar issue (Debian
> build bots reported a similar hang on both mips and parisc), but we do
> not have access to the hardware required to test this hypothesis.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Michael Jeanson <mjeanson@efficios.com>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CC: Ralf Baechle <ralf@linux-mips.org>
> CC: linux-mips@linux-mips.org
> CC: linux-kernel@vger.kernel.org
> CC: "James E.J. Bottomley" <jejb@parisc-linux.org>
> CC: Helge Deller <deller@gmx.de>
> CC: linux-parisc@vger.kernel.org
> ---
> compat_futex.c |  2 ++
> urcu/futex.h   | 12 +++++++++++-
> 2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/compat_futex.c b/compat_futex.c
> index b7f78f0..9e918fe 100644
> --- a/compat_futex.c
> +++ b/compat_futex.c
> @@ -111,6 +111,8 @@ end:
>  * _ASYNC SIGNAL-SAFE_.
>  * For now, timeout, uaddr2 and val3 are unused.
>  * Waiter will busy-loop trying to read the condition.
> + * It is OK to use compat_futex_async() on a futex address on which
> + * futex() WAKE operations are also performed.
>  */
> 
> int compat_futex_async(int32_t *uaddr, int op, int32_t val,
> diff --git a/urcu/futex.h b/urcu/futex.h
> index 4d16cfa..a17eda8 100644
> --- a/urcu/futex.h
> +++ b/urcu/futex.h
> @@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op,
> int32_t val,
> 
> 	ret = futex(uaddr, op, val, timeout, uaddr2, val3);
> 	if (caa_unlikely(ret < 0 && errno == ENOSYS)) {
> -		return compat_futex_noasync(uaddr, op, val, timeout,
> +		/*
> +		 * The fallback on ENOSYS is the async-safe version of
> +		 * the compat futex implementation, because the
> +		 * async-safe compat implementation allows being used
> +		 * concurrently with calls to futex(). Indeed, sys_futex
> +		 * FUTEX_WAIT, on some architectures (e.g. mips), within
> +		 * a given process, spuriously return ENOSYS due to
> +		 * signal restart bugs on some kernel versions (e.g.
> +		 * Linux kernel 3.18 and possibly earlier).
> +		 */
> +		return compat_futex_async(uaddr, op, val, timeout,
> 				uaddr2, val3);
> 	}
> 	return ret;
> --
> 2.1.4

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: test-sigrestart-futex.c --]
[-- Type: text/x-c++src; name=test-sigrestart-futex.c, Size: 1440 bytes --]

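#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>	/* int32_t */
#include <string.h>	/* strerror() */
#include <time.h>	/* struct timespec */
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/syscall.h>

/* Futex word: FUTEX_WAIT blocks for as long as it stays at -1. */
static int32_t value = -1;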

#define FUTEX_WAIT		0
#define FUTEX_WAKE		1

static int futex(int32_t *uaddr, int op, int32_t val,
		const struct timespec *timeout, int32_t *uaddr2, int32_t val3)
{
	return syscall(__NR_futex, uaddr, op, val, timeout,
			uaddr2, val3);
}

static void sighandler(int signo, siginfo_t *siginfo, void *context)
{
	fprintf(stderr, "[OK] Test program with pid: %d SIGUSR1 handler\n",
		getpid());
}

int main(int argc, char **argv)
{
	struct sigaction act;
	pid_t pid;
	int ret;

	fprintf(stderr, "Testing futex sigrestart. Stop with CTRL-c.\n");
	act.sa_sigaction = sighandler;
	act.sa_flags = SA_SIGINFO | SA_RESTART;
	//act.sa_flags = SA_SIGINFO;
	sigemptyset(&act.sa_mask);
	ret = sigaction(SIGUSR1, &act, NULL);
	if (ret)
		abort();

	pid = fork();
	if (pid > 0) {
		/* parent */
		for (;;) {
			ret = kill(pid, SIGUSR1);
			if (ret) {
				perror("kill");
				abort();
			}
			sleep(1);
		}
	} else {
		if (pid < 0) {
			abort();
		}
		/* child */
		for (;;) {
			ret = futex(&value, FUTEX_WAIT, -1, NULL, NULL, 0);
			if (ret < 0) {
				fprintf(stderr, "[FAIL] futex returns %d, %s\n",
					ret, strerror(errno));
			} else {
				fprintf(stderr, "[FAIL] futex returns %d (unexpected)\n",
					ret);
			}
		}
	}

	return 0;
}


* Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-17 12:54 ` Mathieu Desnoyers
@ 2015-12-17 13:16   ` Ed Swierk
  2015-12-17 16:22   ` Aw: " Helge Deller
  1 sibling, 0 replies; 9+ messages in thread
From: Ed Swierk @ 2015-12-17 13:16 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Jon Bernard, Michael Jeanson, Ralf Baechle,
	linux-mips, linux-kernel, James E.J. Bottomley, Helge Deller,
	linux-parisc, Greg Kroah-Hartman

I believe e967ef02 "MIPS: Fix restart of indirect syscalls" should be
backported to all stable kernels.

It would be a surprising coincidence if parisc suffers from the same problem.

--Ed


On Thu, Dec 17, 2015 at 4:54 AM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
> ----- On Dec 16, 2015, at 5:09 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
>
>> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
>> Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
>> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
>> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
>> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
>> same behavior. This might affect earlier kernels.
>>
>> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
>> nevertheless, we should try to handle this kernel bug more gracefully
>> than a user-space hang due to unexpected spurious ENOSYS return value.
>
> It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
> Linux kernel upstream fix commit is:
> e967ef02 "MIPS: Fix restart of indirect syscalls"
>
> I've created a small test program that could also be used on parisc
> to check if it suffers from the same issue (see attached).
>
> On bogus mips kernels, we see the following output:
> [OK] Test program with pid: 5748 SIGUSR1 handler
> [FAIL] futex returns -1, Function not implemented
>
> Let me know if someone can try it out on a parisc kernel.
>
> Thanks!
>
> Mathieu
>
>>
>> Therefore, fallback on the "async-safe" version of compat_futex in those
>> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
>> the nice property of being OK to use concurrently with other FUTEX_WAKE
>> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
>>
>> We suspect that parisc might be affected by a similar issue (Debian
>> build bots reported a similar hang on both mips and parisc), but we do
>> not have access to the hardware required to test this hypothesis.
>>
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> CC: Michael Jeanson <mjeanson@efficios.com>
>> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> CC: Ralf Baechle <ralf@linux-mips.org>
>> CC: linux-mips@linux-mips.org
>> CC: linux-kernel@vger.kernel.org
>> CC: "James E.J. Bottomley" <jejb@parisc-linux.org>
>> CC: Helge Deller <deller@gmx.de>
>> CC: linux-parisc@vger.kernel.org
>> ---
>> compat_futex.c |  2 ++
>> urcu/futex.h   | 12 +++++++++++-
>> 2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/compat_futex.c b/compat_futex.c
>> index b7f78f0..9e918fe 100644
>> --- a/compat_futex.c
>> +++ b/compat_futex.c
>> @@ -111,6 +111,8 @@ end:
>>  * _ASYNC SIGNAL-SAFE_.
>>  * For now, timeout, uaddr2 and val3 are unused.
>>  * Waiter will busy-loop trying to read the condition.
>> + * It is OK to use compat_futex_async() on a futex address on which
>> + * futex() WAKE operations are also performed.
>>  */
>>
>> int compat_futex_async(int32_t *uaddr, int op, int32_t val,
>> diff --git a/urcu/futex.h b/urcu/futex.h
>> index 4d16cfa..a17eda8 100644
>> --- a/urcu/futex.h
>> +++ b/urcu/futex.h
>> @@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op,
>> int32_t val,
>>
>>       ret = futex(uaddr, op, val, timeout, uaddr2, val3);
>>       if (caa_unlikely(ret < 0 && errno == ENOSYS)) {
>> -             return compat_futex_noasync(uaddr, op, val, timeout,
>> +             /*
>> +              * The fallback on ENOSYS is the async-safe version of
>> +              * the compat futex implementation, because the
>> +              * async-safe compat implementation allows being used
>> +              * concurrently with calls to futex(). Indeed, sys_futex
>> +              * FUTEX_WAIT, on some architectures (e.g. mips), within
>> +              * a given process, spuriously return ENOSYS due to
>> +              * signal restart bugs on some kernel versions (e.g.
>> +              * Linux kernel 3.18 and possibly earlier).
>> +              */
>> +             return compat_futex_async(uaddr, op, val, timeout,
>>                               uaddr2, val3);
>>       }
>>       return ret;
>> --
>> 2.1.4
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com


* Aw: Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-17 12:54 ` Mathieu Desnoyers
  2015-12-17 13:16   ` Ed Swierk
@ 2015-12-17 16:22   ` Helge Deller
  2015-12-18 19:58     ` Mathieu Desnoyers
  1 sibling, 1 reply; 9+ messages in thread
From: Helge Deller @ 2015-12-17 16:22 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Jon Bernard, Michael Jeanson, Ralf Baechle,
	linux-mips, linux-kernel, James E.J. Bottomley, linux-parisc,
	Ed Swierk, Greg Kroah-Hartman

Hello Mathieu,

> > When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
> > Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
> > FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
> > signal handler. This spurious ENOSYS behavior causes hangs in liburcu
> > 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
> > same behavior. This might affect earlier kernels.
> > 
> > This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
> > nevertheless, we should try to handle this kernel bug more gracefully
> > than a user-space hang due to unexpected spurious ENOSYS return value.
> 
> It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
> Linux kernel upstream fix commit is:
> e967ef02 "MIPS: Fix restart of indirect syscalls"

But that patch fixes mips only.
 
> I've created a small test program that could also be used on parisc
> to check if it suffers from the same issue (see attached).
> 
> On bogus mips kernels, we see the following output:
> [OK] Test program with pid: 5748 SIGUSR1 handler
> [FAIL] futex returns -1, Function not implemented

I tested it on a recent 4.2 kernel on parisc.
It fails as you describe:

Testing futex sigrestart. Stop with CTRL-c.
[OK] Test program with pid: 1361 SIGUSR1 handler
[OK] Test program with pid: 1361 SIGUSR1 handler
[FAIL] futex returns -1, Function not implemented
[OK] Test program with pid: 1361 SIGUSR1 handler
[FAIL] futex returns -1, Function not implemented

strace gives:
[pid  1329] futex(0x1210c, FUTEX_WAIT, -1, NULL <unfinished ...>
[pid  1328] nanosleep({1, 0},  <unfinished ...>
[pid  1329] <... futex resumed> )       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid  1329] write(2, "[FAIL] futex returns -1, Functio"..., 50[FAIL] futex returns -1, Function not implemented)


> > Therefore, fallback on the "async-safe" version of compat_futex in those
> > situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
> > the nice property of being OK to use concurrently with other FUTEX_WAKE
> > and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
> > 
> > We suspect that parisc might be affected by a similar issue (Debian
> > build bots reported a similar hang on both mips and parisc), but we do
> > not have access to the hardware required to test this hypothesis.

If you want access to a machine, let me know.
I'll try the patch below as well..

> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > CC: Michael Jeanson <mjeanson@efficios.com>
> > CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > CC: Ralf Baechle <ralf@linux-mips.org>
> > CC: linux-mips@linux-mips.org
> > CC: linux-kernel@vger.kernel.org
> > CC: "James E.J. Bottomley" <jejb@parisc-linux.org>
> > CC: Helge Deller <deller@gmx.de>
> > CC: linux-parisc@vger.kernel.org
> > ---
> > compat_futex.c |  2 ++
> > urcu/futex.h   | 12 +++++++++++-
> > 2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/compat_futex.c b/compat_futex.c
> > index b7f78f0..9e918fe 100644
> > --- a/compat_futex.c
> > +++ b/compat_futex.c
> > @@ -111,6 +111,8 @@ end:
> >  * _ASYNC SIGNAL-SAFE_.
> >  * For now, timeout, uaddr2 and val3 are unused.
> >  * Waiter will busy-loop trying to read the condition.
> > + * It is OK to use compat_futex_async() on a futex address on which
> > + * futex() WAKE operations are also performed.
> >  */
> > 
> > int compat_futex_async(int32_t *uaddr, int op, int32_t val,
> > diff --git a/urcu/futex.h b/urcu/futex.h
> > index 4d16cfa..a17eda8 100644
> > --- a/urcu/futex.h
> > +++ b/urcu/futex.h
> > @@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op,
> > int32_t val,
> > 
> > 	ret = futex(uaddr, op, val, timeout, uaddr2, val3);
> > 	if (caa_unlikely(ret < 0 && errno == ENOSYS)) {
> > -		return compat_futex_noasync(uaddr, op, val, timeout,
> > +		/*
> > +		 * The fallback on ENOSYS is the async-safe version of
> > +		 * the compat futex implementation, because the
> > +		 * async-safe compat implementation allows being used
> > +		 * concurrently with calls to futex(). Indeed, sys_futex
> > +		 * FUTEX_WAIT, on some architectures (e.g. mips), within
> > +		 * a given process, spuriously return ENOSYS due to
> > +		 * signal restart bugs on some kernel versions (e.g.
> > +		 * Linux kernel 3.18 and possibly earlier).
> > +		 */
> > +		return compat_futex_async(uaddr, op, val, timeout,
> > 				uaddr2, val3);
> > 	}
> > 	return ret;


* Re: Aw: Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-17 16:22   ` Aw: " Helge Deller
@ 2015-12-18 19:58     ` Mathieu Desnoyers
  2015-12-18 20:42       ` Helge Deller
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-12-18 19:58 UTC (permalink / raw)
  To: Helge Deller
  Cc: Paul E. McKenney, Jon Bernard, Michael Jeanson, Ralf Baechle,
	linux-mips, linux-kernel, James E.J. Bottomley, linux-parisc,
	Ed Swierk, Greg Kroah-Hartman

----- On Dec 17, 2015, at 11:22 AM, Helge Deller deller@gmx.de wrote:

> Hello Mathieu,
> 
>> > When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
>> > Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
>> > FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
>> > signal handler. This spurious ENOSYS behavior causes hangs in liburcu
>> > 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
>> > same behavior. This might affect earlier kernels.
>> > 
>> > This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
>> > nevertheless, we should try to handle this kernel bug more gracefully
>> > than a user-space hang due to unexpected spurious ENOSYS return value.
>> 
>> It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
>> Linux kernel upstream fix commit is:
>> e967ef02 "MIPS: Fix restart of indirect syscalls"
> 
> But that patch fixes mips only.

Indeed, I do not expect this commit to have any effect on parisc.

> 
>> I've created a small test program that could also be used on parisc
>> to check if it suffers from the same issue (see attached).
>> 
>> On bogus mips kernels, we see the following output:
>> [OK] Test program with pid: 5748 SIGUSR1 handler
>> [FAIL] futex returns -1, Function not implemented
> 
> I tested it on a recent 4.2 kernel on parisc.
> It fails as you describe:
> 
> Testing futex sigrestart. Stop with CTRL-c.
> [OK] Test program with pid: 1361 SIGUSR1 handler
> [OK] Test program with pid: 1361 SIGUSR1 handler
> [FAIL] futex returns -1, Function not implemented
> [OK] Test program with pid: 1361 SIGUSR1 handler
> [FAIL] futex returns -1, Function not implemented
> 
> strace gives:
> [pid  1329] futex(0x1210c, FUTEX_WAIT, -1, NULL <unfinished ...>
> [pid  1328] nanosleep({1, 0},  <unfinished ...>
> [pid  1329] <... futex resumed> )       = ? ERESTARTSYS (To be restarted if
> SA_RESTART is set)
> [pid  1329] write(2, "[FAIL] futex returns -1, Functio"..., 50[FAIL] futex
> returns -1, Function not implemented)

Looks like parisc has an issue very similar to the one that
has been fixed on MIPS by e967ef02 "MIPS: Fix restart of indirect syscalls".

> 
> 
>> > Therefore, fallback on the "async-safe" version of compat_futex in those
>> > situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
>> > the nice property of being OK to use concurrently with other FUTEX_WAKE
>> > and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
>> > 
>> > We suspect that parisc might be affected by a similar issue (Debian
>> > build bots reported a similar hang on both mips and parisc), but we do
>> > not have access to the hardware required to test this hypothesis.
> 
> If you want access to a machine, let me know.
> I'll try the patch below as well..

This would be very useful indeed, just to make sure our approach to
futex fallback in liburcu works fine on parisc.

I'm no parisc assembly expert, but I suspect the issue
would be quite similar to the one already fixed on MIPS. The
existing fix for MIPS would be a good starting point to see if
something similar is missing on parisc.

When time allows, we should consider cleaning up my test case for
restart of indirect system calls and add it to kselftest. It's
the second architecture that has the same defect, which means this
behavior is seldom tested.
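
As a rough idea of what a cleaned-up, self-terminating variant of the test could look like, here is a minimal sketch. It is only an assumption of how such a test might be structured, not the attached program: the names count_spurious_enosys(), on_alarm() and futex_wait() are hypothetical, and it replaces the fork()/kill() pair with an interval timer so that a single process can interrupt its own FUTEX_WAIT. After the requested number of signals, the handler changes the futex word, so the restarted FUTEX_WAIT fails with EAGAIN and the loop ends.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/futex.h>

/* Futex word: FUTEX_WAIT blocks for as long as it stays at -1. */
static volatile int32_t futex_word = -1;
/* Number of signals still to be handled before the test ends. */
static volatile sig_atomic_t signals_left;

/* SA_RESTART handler (hypothetical name). */
static void on_alarm(int signo)
{
	(void) signo;
	/* On the last signal, make the restarted FUTEX_WAIT fail with EAGAIN. */
	if (signals_left > 0 && --signals_left == 0)
		futex_word = 0;
}

static long futex_wait(volatile int32_t *uaddr, int32_t val)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
}

/*
 * Interrupt a blocked FUTEX_WAIT nr_signals times and return how many
 * times it came back with ENOSYS. Returns -1 on setup error.
 */
int count_spurious_enosys(int nr_signals)
{
	struct itimerval tick = { { 0, 10000 }, { 0, 10000 } };	/* 10 ms */
	struct itimerval stop = { { 0, 0 }, { 0, 0 } };
	struct sigaction act, old_act;
	int spurious = 0;

	if (nr_signals <= 0)
		return -1;
	futex_word = -1;
	signals_left = nr_signals;

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_alarm;
	act.sa_flags = SA_RESTART;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGALRM, &act, &old_act))
		return -1;
	if (setitimer(ITIMER_REAL, &tick, NULL)) {
		sigaction(SIGALRM, &old_act, NULL);
		return -1;
	}

	/* A correct kernel restarts the wait silently after each signal. */
	while (futex_word == -1) {
		if (futex_wait(&futex_word, -1) < 0 && errno == ENOSYS)
			spurious++;
	}

	setitimer(ITIMER_REAL, &stop, NULL);
	sigaction(SIGALRM, &old_act, NULL);
	return spurious;
}
```

A kselftest or LTP wrapper could then report a failure whenever count_spurious_enosys(10) returns a non-zero value.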

Thanks,

Mathieu

> 
>> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> > CC: Michael Jeanson <mjeanson@efficios.com>
>> > CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> > CC: Ralf Baechle <ralf@linux-mips.org>
>> > CC: linux-mips@linux-mips.org
>> > CC: linux-kernel@vger.kernel.org
>> > CC: "James E.J. Bottomley" <jejb@parisc-linux.org>
>> > CC: Helge Deller <deller@gmx.de>
>> > CC: linux-parisc@vger.kernel.org
>> > ---
>> > compat_futex.c |  2 ++
>> > urcu/futex.h   | 12 +++++++++++-
>> > 2 files changed, 13 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/compat_futex.c b/compat_futex.c
>> > index b7f78f0..9e918fe 100644
>> > --- a/compat_futex.c
>> > +++ b/compat_futex.c
>> > @@ -111,6 +111,8 @@ end:
>> >  * _ASYNC SIGNAL-SAFE_.
>> >  * For now, timeout, uaddr2 and val3 are unused.
>> >  * Waiter will busy-loop trying to read the condition.
>> > + * It is OK to use compat_futex_async() on a futex address on which
>> > + * futex() WAKE operations are also performed.
>> >  */
>> > 
>> > int compat_futex_async(int32_t *uaddr, int op, int32_t val,
>> > diff --git a/urcu/futex.h b/urcu/futex.h
>> > index 4d16cfa..a17eda8 100644
>> > --- a/urcu/futex.h
>> > +++ b/urcu/futex.h
>> > @@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op,
>> > int32_t val,
>> > 
>> > 	ret = futex(uaddr, op, val, timeout, uaddr2, val3);
>> > 	if (caa_unlikely(ret < 0 && errno == ENOSYS)) {
>> > -		return compat_futex_noasync(uaddr, op, val, timeout,
>> > +		/*
>> > +		 * The fallback on ENOSYS is the async-safe version of
>> > +		 * the compat futex implementation, because the
>> > +		 * async-safe compat implementation allows being used
>> > +		 * concurrently with calls to futex(). Indeed, sys_futex
>> > +		 * FUTEX_WAIT, on some architectures (e.g. mips), within
>> > +		 * a given process, spuriously return ENOSYS due to
>> > +		 * signal restart bugs on some kernel versions (e.g.
>> > +		 * Linux kernel 3.18 and possibly earlier).
>> > +		 */
>> > +		return compat_futex_async(uaddr, op, val, timeout,
>> > 				uaddr2, val3);
>> > 	}
>> > 	return ret;

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Aw: Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-18 19:58     ` Mathieu Desnoyers
@ 2015-12-18 20:42       ` Helge Deller
  2015-12-19 10:37         ` Helge Deller
  0 siblings, 1 reply; 9+ messages in thread
From: Helge Deller @ 2015-12-18 20:42 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Jon Bernard, Michael Jeanson, Ralf Baechle,
	linux-mips, linux-kernel, James E.J. Bottomley, linux-parisc,
	Ed Swierk, Greg Kroah-Hartman

Hi Mathieu,

On 18.12.2015 20:58, Mathieu Desnoyers wrote:
> ----- On Dec 17, 2015, at 11:22 AM, Helge Deller deller@gmx.de wrote:
> 
>> Hello Mathieu,
>>
>>>> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
>>>> Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
>>>> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
>>>> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
>>>> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
>>>> same behavior. This might affect earlier kernels.
>>>>
>>>> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
>>>> nevertheless, we should try to handle this kernel bug more gracefully
>>>> than a user-space hang due to unexpected spurious ENOSYS return value.
>>>
>>> It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
>>> Linux kernel upstream fix commit is:
>>> e967ef02 "MIPS: Fix restart of indirect syscalls"
>>
>> But that patch fixes mips only.
> 
> Indeed, I do not expect this commit to have any effect on parisc.
> 
>>
>>> I've created a small test program that could also be used on parisc
>>> to check if it suffers from the same issue (see attached).
>>>
>>> On bogus mips kernels, we see the following output:
>>> [OK] Test program with pid: 5748 SIGUSR1 handler
>>> [FAIL] futex returns -1, Function not implemented
>>
>> I tested it on a recent 4.2 kernel on parisc.
>> It fails as you describe:
>>
>> Testing futex sigrestart. Stop with CTRL-c.
>> [OK] Test program with pid: 1361 SIGUSR1 handler
>> [OK] Test program with pid: 1361 SIGUSR1 handler
>> [FAIL] futex returns -1, Function not implemented
>> [OK] Test program with pid: 1361 SIGUSR1 handler
>> [FAIL] futex returns -1, Function not implemented
>>
>> strace gives:
>> [pid  1329] futex(0x1210c, FUTEX_WAIT, -1, NULL <unfinished ...>
>> [pid  1328] nanosleep({1, 0},  <unfinished ...>
>> [pid  1329] <... futex resumed> )       = ? ERESTARTSYS (To be restarted if
>> SA_RESTART is set)
>> [pid  1329] write(2, "[FAIL] futex returns -1, Functio"..., 50[FAIL] futex
>> returns -1, Function not implemented)
> 
> Looks like parisc has an issue very similar to the one that
> has been fixed on MIPS by e967ef02 "MIPS: Fix restart of indirect syscalls".

Yes.

>>>> Therefore, fallback on the "async-safe" version of compat_futex in those
>>>> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
>>>> the nice property of being OK to use concurrently with other FUTEX_WAKE
>>>> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
>>>>
>>>> We suspect that parisc might be affected by a similar issue (Debian
>>>> build bots reported a similar hang on both mips and parisc), but we do
>>>> not have access to the hardware required to test this hypothesis.
>>
>> If you want access to a machine, let me know.
>> I'll try the patch below as well..
> 
> This would be very useful indeed, just to make sure our approach to
> futex fallback in liburcu works fine on parisc.

Yes, but will take me some time...

> I'm no parisc assembly expert, but I suspect the issue
> would be quite similar to the one already fixed on MIPS. The
> existing fix for MIPS would be a good starting point to see if
> something similar is missing on parisc.

Yes, I've already started to look into the parisc assembly parts.
The problem seems to be the same in both cases: the syscall number is not
preserved during a syscall restart.
We have problems with pthread cancellation in glibc too, maybe
it's related to this bug.

> When time allows, we should consider cleaning up my test case for
> restart of indirect system calls and add it to kselftest. 

I was thinking of adding it to the Linux Test Project (LTP) :-)

> It's
> the second architecture that has the same defect, which means this
> behavior is seldom tested.

Helge



* Re: Aw: Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-18 20:42       ` Helge Deller
@ 2015-12-19 10:37         ` Helge Deller
  2015-12-20 14:11           ` Mathieu Desnoyers
  0 siblings, 1 reply; 9+ messages in thread
From: Helge Deller @ 2015-12-19 10:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Jon Bernard, Michael Jeanson, Ralf Baechle,
	linux-mips, linux-kernel, James E.J. Bottomley, linux-parisc,
	Ed Swierk, Greg Kroah-Hartman

Hi Mathieu,

On 18.12.2015 21:42, Helge Deller wrote:
> On 18.12.2015 20:58, Mathieu Desnoyers wrote:
>>>>> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
>>>>> Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
>>>>> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
>>>>> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
>>>>> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
>>>>> same behavior. This might affect earlier kernels.
>>>>>
>>>>> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
>>>>> nevertheless, we should try to handle this kernel bug more gracefully
>>>>> than a user-space hang due to unexpected spurious ENOSYS return value.
>>>>
>>>> It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
>>>> Linux kernel upstream fix commit is:
>>>> e967ef02 "MIPS: Fix restart of indirect syscalls"

>> Looks like parisc has an issue very similar to the one that
>> has been fixed on MIPS by e967ef02 "MIPS: Fix restart of indirect syscalls".

Yes, parisc is affected the same way.
I've posted a patch to the parisc mailing list which fixes this issue for
parisc and which I plan to push into stable kernels:
http://thread.gmane.org/gmane.linux.ports.parisc/26243

Regarding your patch for liburcu:

>>>>> Therefore, fallback on the "async-safe" version of compat_futex in those
>>>>> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
>>>>> the nice property of being OK to use concurrently with other FUTEX_WAKE
>>>>> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.

I've tested your patch. It does not produce any regressions on parisc, but I can't
say for sure whether it really works. ENOSYS is returned randomly, so maybe I never
hit a situation where your patch was actually used.
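
For reference, the fallback described in the quoted patch text can be sketched roughly as follows. This is not the actual liburcu code: the names `futex_wait_poll` and `futex_wait_compat` are hypothetical, and the 10 ms polling interval is an assumption. The sketch only shows the idea of retrying with a busy-wait when FUTEX_WAIT unexpectedly fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/futex.h>

/*
 * Hypothetical busy-wait fallback: sleep in short steps until *uaddr no
 * longer holds val. It needs no kernel futex support, so it can run
 * alongside other threads that still use FUTEX_WAIT and FUTEX_WAKE.
 */
static int futex_wait_poll(int32_t *uaddr, int32_t val)
{
	assert(uaddr != NULL);
	while (__atomic_load_n(uaddr, __ATOMIC_SEQ_CST) == val)
		poll(NULL, 0, 10);  /* sleep about 10 ms, then re-check */
	return 0;
}

/*
 * Hypothetical wrapper: try the real FUTEX_WAIT first and fall back to
 * polling only when the kernel reports ENOSYS.
 */
static int futex_wait_compat(int32_t *uaddr, int32_t val)
{
	long ret;

	assert(uaddr != NULL);
	ret = syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
	if (ret < 0 && errno == ENOSYS)
		return futex_wait_poll(uaddr, val);
	return (int) ret;
}
```

Because the fallback only re-reads the word, it is taken solely on the ENOSYS path. That would explain why a run in which ENOSYS never happens to occur does not exercise it.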

Helge


* Re: Aw: Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-19 10:37         ` Helge Deller
@ 2015-12-20 14:11           ` Mathieu Desnoyers
  2015-12-20 15:37             ` Helge Deller
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-12-20 14:11 UTC (permalink / raw)
  To: Helge Deller
  Cc: Paul E. McKenney, Jon Bernard, Michael Jeanson, Ralf Baechle,
	linux-mips, linux-kernel, James E.J. Bottomley, linux-parisc,
	Ed Swierk, Greg Kroah-Hartman

----- On Dec 19, 2015, at 5:37 AM, Helge Deller deller@gmx.de wrote:

> Hi Mathieu,
> 
> On 18.12.2015 21:42, Helge Deller wrote:
>> On 18.12.2015 20:58, Mathieu Desnoyers wrote:
>>>>>> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
>>>>>> Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
>>>>>> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
>>>>>> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
>>>>>> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
>>>>>> same behavior. This might affect earlier kernels.
>>>>>>
>>>>>> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
>>>>>> nevertheless, we should try to handle this kernel bug more gracefully
>>>>>> than a user-space hang due to unexpected spurious ENOSYS return value.
>>>>>
>>>>> It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
>>>>> Linux kernel upstream fix commit is:
>>>>> e967ef02 "MIPS: Fix restart of indirect syscalls"
> 
>>> Looks like parisc has an issue very similar to the one that
>>> has been fixed on MIPS by e967ef02 "MIPS: Fix restart of indirect syscalls".
> 
> Yes, parisc is affected the same way.
> I've posted a patch to the parisc mailing list which fixes this issue for
> parisc and which I plan to push into stable kernels:
> http://thread.gmane.org/gmane.linux.ports.parisc/26243
> 
> Regarding your patch for liburcu:
> 
>>>>>> Therefore, fallback on the "async-safe" version of compat_futex in those
>>>>>> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
>>>>>> the nice property of being OK to use concurrently with other FUTEX_WAKE
>>>>>> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
> 
> I've tested your patch. It does not produce any regressions on parisc, but I
> can't say for sure whether it really works. ENOSYS is returned randomly, so maybe
> I never hit a situation where your patch was actually used.

If you ran make check and make regtest, and nothing
fails or hangs, you should be OK. liburcu runs very heavy
stress tests, which make it likely to hit race conditions
repeatedly.

Thanks!

Mathieu

> 
> Helge

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Aw: Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug
  2015-12-20 14:11           ` Mathieu Desnoyers
@ 2015-12-20 15:37             ` Helge Deller
  0 siblings, 0 replies; 9+ messages in thread
From: Helge Deller @ 2015-12-20 15:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Jon Bernard, Michael Jeanson, Ralf Baechle,
	linux-mips, linux-kernel, James E.J. Bottomley, linux-parisc,
	Ed Swierk, Greg Kroah-Hartman

On 20.12.2015 15:11, Mathieu Desnoyers wrote:
> ----- On Dec 19, 2015, at 5:37 AM, Helge Deller deller@gmx.de wrote:
> 
>> Hi Mathieu,
>>
>> On 18.12.2015 21:42, Helge Deller wrote:
>>> On 18.12.2015 20:58, Mathieu Desnoyers wrote:
>>>>>>> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
>>>>>>> Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
>>>>>>> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
>>>>>>> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
>>>>>>> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
>>>>>>> same behavior. This might affect earlier kernels.
>>>>>>>
>>>>>>> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
>>>>>>> nevertheless, we should try to handle this kernel bug more gracefully
>>>>>>> than a user-space hang due to unexpected spurious ENOSYS return value.
>>>>>>
>>>>>> It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
>>>>>> Linux kernel upstream fix commit is:
>>>>>> e967ef02 "MIPS: Fix restart of indirect syscalls"
>>
>>>> Looks like parisc has an issue very similar to the one that
>>>> has been fixed on MIPS by e967ef02 "MIPS: Fix restart of indirect syscalls".
>>
>> Yes, parisc is affected the same way.
>> I've posted a patch to the parisc mailing list which fixes this issue for
>> parisc and which I plan to push into stable kernels:
>> http://thread.gmane.org/gmane.linux.ports.parisc/26243
>>
>> Regarding your patch for liburcu:
>>
>>>>>>> Therefore, fallback on the "async-safe" version of compat_futex in those
>>>>>>> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
>>>>>>> the nice property of being OK to use concurrently with other FUTEX_WAKE
>>>>>>> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
>>
>> I've tested your patch. It does not produce any regressions on parisc, but I
>> can't say for sure whether it really works. ENOSYS is returned randomly, so maybe
>> I never hit a situation where your patch was actually used.
> 
> If you ran make check and make regtest, and nothing
> fails/hangs, you should be OK.

Yes, I did run both.

> liburcu runs very heavy
> stress tests, which make it likely to hit race conditions
> repeatedly.

Helge


Thread overview: 9+ messages
2015-12-16 22:09 [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug Mathieu Desnoyers
2015-12-17 12:54 ` Mathieu Desnoyers
2015-12-17 13:16   ` Ed Swierk
2015-12-17 16:22   ` Aw: " Helge Deller
2015-12-18 19:58     ` Mathieu Desnoyers
2015-12-18 20:42       ` Helge Deller
2015-12-19 10:37         ` Helge Deller
2015-12-20 14:11           ` Mathieu Desnoyers
2015-12-20 15:37             ` Helge Deller
