linux-kernel.vger.kernel.org archive mirror
* Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
@ 2019-11-12 17:40 Harris, Robert
  2019-11-13  7:29 ` Harris, Robert
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Harris, Robert @ 2019-11-12 17:40 UTC (permalink / raw)
  To: tglx, mingo, peterz, dvhart; +Cc: linux-kernel, Harris, Robert

I am investigating an issue on 4.9.184 in which futex() returns EPERM
intermittently for

futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)

The failure affects an application in an AWS lambda;  traditional
debugging approaches vary from difficult to impossible.  I cannot
reproduce the problem at will, instrument the kernel, install a new
kernel or get an application core dump.

Understanding the circumstances under which EPERM can be returned for
FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
mode.  I have spent some time looking through futex.c but have not
found anything yet.  I would be grateful for a hint from someone more
knowledgeable.

Please address/cc me on any reply.

Thanks,

Robert Harris
Confidentiality Notice | This email and any included attachments may be privileged, confidential and/or otherwise protected from disclosure. Access to this email by anyone other than the intended recipient is unauthorized. If you believe you have received this email in error, please contact the sender immediately and delete all copies. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.


* Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
  2019-11-12 17:40 Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM Harris, Robert
@ 2019-11-13  7:29 ` Harris, Robert
  2019-11-13  9:04 ` Thomas Gleixner
  2019-11-13 13:29 ` Mikael Pettersson
  2 siblings, 0 replies; 7+ messages in thread
From: Harris, Robert @ 2019-11-13  7:29 UTC (permalink / raw)
  To: tglx, mingo, peterz, dvhart; +Cc: linux-kernel



> On 12 Nov 2019, at 17:40, Harris, Robert <robert.harris@alertlogic.com> wrote:
>
> I am investigating an issue on 4.9.184 in which futex() returns EPERM
> intermittently for
>
> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
>
> The failure affects an application in an AWS lambda;  traditional
> debugging approaches vary from difficult to impossible.  I cannot
> reproduce the problem at will, instrument the kernel, install a new
> kernel or get an application core dump.
>
> Understanding the circumstances under which EPERM can be returned for
> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
> mode.  I have spent some time looking through futex.c but have not
> found anything yet.  I would be grateful for a hint from someone more
> knowledgeable.
>
> Please address/cc me on any reply.

To be clear, I do mean that futex() is returning -1 and setting errno
to EPERM.

Robert

* Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
  2019-11-12 17:40 Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM Harris, Robert
  2019-11-13  7:29 ` Harris, Robert
@ 2019-11-13  9:04 ` Thomas Gleixner
  2019-11-13 10:15   ` Harris, Robert
  2019-11-13 13:29 ` Mikael Pettersson
  2 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2019-11-13  9:04 UTC (permalink / raw)
  To: Harris, Robert; +Cc: Ingo Molnar, peterz, dvhart, linux-kernel

On Tue, 12 Nov 2019, Harris, Robert wrote:

> I am investigating an issue on 4.9.184 in which futex() returns EPERM
> intermittently for
> 
> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
> 
> The failure affects an application in an AWS lambda;  traditional
> debugging approaches vary from difficult to impossible.  I cannot
> reproduce the problem at will, instrument the kernel, install a new
> kernel or get an application core dump.
> 
> Understanding the circumstances under which EPERM can be returned for
> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
> mode.  I have spent some time looking through futex.c but have not
> found anything yet.  I would be grateful for a hint from someone more
> knowledgeable.

sys_futex(FUTEX_WAIT_PRIVATE) does not return -EPERM. Only the PI variants
do that.

Thanks,

	tglx


* Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
  2019-11-13  9:04 ` Thomas Gleixner
@ 2019-11-13 10:15   ` Harris, Robert
  2019-11-13 21:34     ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Harris, Robert @ 2019-11-13 10:15 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, peterz, dvhart, linux-kernel, Harris, Robert



> On 13 Nov 2019, at 09:04, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, 12 Nov 2019, Harris, Robert wrote:
>
>> I am investigating an issue on 4.9.184 in which futex() returns EPERM
>> intermittently for
>>
>> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
>>
>> The failure affects an application in an AWS lambda;  traditional
>> debugging approaches vary from difficult to impossible.  I cannot
>> reproduce the problem at will, instrument the kernel, install a new
>> kernel or get an application core dump.
>>
>> Understanding the circumstances under which EPERM can be returned for
>> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
>> mode.  I have spent some time looking through futex.c but have not
>> found anything yet.  I would be grateful for a hint from someone more
>> knowledgeable.
>
> sys_futex(FUTEX_WAIT_PRIVATE) does not return -EPERM. Only the PI variants
> do that.

In that case I would appreciate a second pair of eyes.  The error I see
(intermittently) is

pthread/ethr_event.c:164: Fatal error in wait__(): Operation not permitted (1)

which comes from

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/lib_src/pthread/ethr_event.c#L152-L164

> res = ETHR_FUTEX__(&e->futex,
>                    ETHR_FUTEX_WAIT__,
>                    ETHR_EVENT_OFF_WAITER__,
>                    tsp);
> switch (res) {
> case EINTR:
> case ETIMEDOUT:
>     return res;
> case 0:
> case EWOULDBLOCK:
>     break;
> default:
>     ETHR_FATAL_ERROR__(res);

where

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/include/internal/ethread.h#L259-L260

> #define ETHR_FATAL_ERROR__(ERR) \
>   ethr_fatal_error__(__FILE__, __LINE__, __func__, (ERR))

and

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/lib_src/common/ethr_aux.c#L725-L741

> ETHR_IMPL_NORETURN__ ethr_fatal_error__(const char *file,
>                                         int line,
>                                         const char *func,
>                                         int err)
> {
>     char *errstr;
>     if (err == ENOTSUP)
>         errstr = "Operation not supported";
>     else {
>         errstr = strerror(err);
>         if (!errstr)
>             errstr = "Unknown error";
>     }
>     fprintf(stderr, "%s:%d: Fatal error in %s(): %s (%d)\n",
>             file, line, func, errstr, err);
>     ethr_abort__();
> }

and

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/include/internal/pthread/ethr_event.h#L38-L58

> #if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
> #  define ETHR_FUTEX_WAIT__ FUTEX_WAIT_PRIVATE
> #  define ETHR_FUTEX_WAKE__ FUTEX_WAKE_PRIVATE
> #else
> #  define ETHR_FUTEX_WAIT__ FUTEX_WAIT
> #  define ETHR_FUTEX_WAKE__ FUTEX_WAKE
> #endif
>
> typedef struct {
>     ethr_atomic32_t futex;
> } ethr_event;
>
> #define ETHR_FUTEX__(FTX, OP, VAL, TIMEOUT)\
>   (-1 == syscall(__NR_futex,\
>                  (void *) ethr_atomic32_addr((FTX)),\
>                  (OP),\
>                  (int) (VAL),\
>                  (TIMEOUT),\
>                  NULL,\
>                  0)\
>    ? errno : 0)
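
For readers following along, the wait case of the ETHR_FUTEX__ macro above can be restated as a plain function. This is only a minimal sketch, not code from the Erlang/OTP tree: the names futex_wait_private and futex_wait_private_demo are hypothetical, and it assumes a Linux system that provides <linux/futex.h> and SYS_futex. Like the macro, it returns 0 on success or the errno value otherwise, which is how an EPERM from the syscall would reach the switch in wait__() as res.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper mirroring ETHR_FUTEX__ for the wait case:
 * returns 0 on success, otherwise the errno set by syscall(2). */
int futex_wait_private(int32_t *uaddr, int32_t expected,
                       const struct timespec *timeout)
{
    long rc = syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE,
                      expected, timeout, NULL, 0);
    return rc == -1 ? errno : 0;
}

/* Exercises two documented outcomes without needing a second thread. */
void futex_wait_private_demo(void)
{
    int32_t word = 1;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    /* *uaddr != expected: the kernel refuses to sleep (EAGAIN). */
    assert(futex_wait_private(&word, 0, &ts) == EAGAIN);

    /* *uaddr == expected and nobody wakes us: the relative timeout expires. */
    assert(futex_wait_private(&word, 1, &ts) == ETIMEDOUT);
}
```

On Linux EWOULDBLOCK has the same value as EAGAIN, so both outcomes in the demo fall into cases that the quoted switch handles; EPERM is not among them and ends up in the default branch.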

To be sure:

>    0x0000000000687e65 <+325>:   mov    $0x80,%edx
>    0x0000000000687e6a <+330>:   mov    $0xca,%edi
>    0x0000000000687e6f <+335>:   callq  0x443ab0 <syscall@plt>
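
The two immediates in the disassembly can be checked against the kernel headers. The following is a small sketch with a hypothetical helper name, assuming an x86-64 Linux build (syscall numbers differ between architectures). Under the System V calling convention the first argument of syscall() is passed in %edi and the third in %edx, so 0xca should be the syscall number and 0x80 the futex op.

```c
#include <assert.h>
#include <linux/futex.h>
#include <sys/syscall.h>

/* Hypothetical helper: returns 1 if the values loaded into %edi and %edx
 * before the call to syscall() denote futex(..., FUTEX_WAIT_PRIVATE, ...). */
int is_futex_wait_private_call(long edi, long edx)
{
    return edi == __NR_futex && edx == (FUTEX_WAIT | FUTEX_PRIVATE_FLAG);
}
```

On x86-64, is_futex_wait_private_call(0xca, 0x80) is expected to hold, since __NR_futex is 202 there and FUTEX_WAIT_PRIVATE is FUTEX_WAIT (0) ORed with FUTEX_PRIVATE_FLAG (128).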

Thanks,

Robert



* Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
  2019-11-12 17:40 Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM Harris, Robert
  2019-11-13  7:29 ` Harris, Robert
  2019-11-13  9:04 ` Thomas Gleixner
@ 2019-11-13 13:29 ` Mikael Pettersson
  2019-11-13 17:03   ` Harris, Robert
  2 siblings, 1 reply; 7+ messages in thread
From: Mikael Pettersson @ 2019-11-13 13:29 UTC (permalink / raw)
  To: Harris, Robert; +Cc: tglx, mingo, peterz, dvhart, linux-kernel

On Tue, Nov 12, 2019 at 6:43 PM Harris, Robert
<robert.harris@alertlogic.com> wrote:
>
> I am investigating an issue on 4.9.184 in which futex() returns EPERM
> intermittently for
>
> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
>
> The failure affects an application in an AWS lambda;  traditional
> debugging approaches vary from difficult to impossible.  I cannot
> reproduce the problem at will, instrument the kernel, install a new
> kernel or get an application core dump.
>
> Understanding the circumstances under which EPERM can be returned for
> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
> mode.  I have spent some time looking through futex.c but have not
> found anything yet.  I would be grateful for a hint from someone more
> knowledgeable.


I just wanted to add that a colleague of mine reported the exact same
issue to me two days ago: a highly threaded application (the Erlang
VM) running in AWS lambda, futex wait calls occasionally failing with
EPERM.  I don't have more specifics than that, I've asked for kernel
version and the exact parameters in the failed futex call.

(Third attempt, really sorry about the noise, gmail's UI sucks.)


* Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
  2019-11-13 13:29 ` Mikael Pettersson
@ 2019-11-13 17:03   ` Harris, Robert
  0 siblings, 0 replies; 7+ messages in thread
From: Harris, Robert @ 2019-11-13 17:03 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: tglx, mingo, peterz, dvhart, linux-kernel, Harris, Robert



> On 13 Nov 2019, at 13:29, Mikael Pettersson <mikpelinux@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 6:43 PM Harris, Robert
> <robert.harris@alertlogic.com> wrote:
>>
>> I am investigating an issue on 4.9.184 in which futex() returns EPERM
>> intermittently for
>>
>> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
>>
>> The failure affects an application in an AWS lambda;  traditional
>> debugging approaches vary from difficult to impossible.  I cannot
>> reproduce the problem at will, instrument the kernel, install a new
>> kernel or get an application core dump.
>>
>> Understanding the circumstances under which EPERM can be returned for
>> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
>> mode.  I have spent some time looking through futex.c but have not
>> found anything yet.  I would be grateful for a hint from someone more
>> knowledgeable.
>
>
> I just wanted to add that a colleague of mine reported the exact same
> issue to me two days ago: a highly threaded application (the Erlang
> VM) running in AWS lambda, futex wait calls occasionally failing with
> EPERM.  I don't have more specifics than that, I've asked for kernel
> version and the exact parameters in the failed futex call.

Thanks, that's a great data point.  One of my outstanding questions had
been "why does this happen to only us?"

When I look at the timings I can say with some confidence that the
problem stopped for us minutes after

2017 on 2019-10-23 in us-east-1
2030 on 2019-10-24 in eu-west-1
1817 on 2019-10-25 in us-west-2

(all times UTC).  I've logged a ticket with Amazon to find out what
changed.

Robert

* Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
  2019-11-13 10:15   ` Harris, Robert
@ 2019-11-13 21:34     ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2019-11-13 21:34 UTC (permalink / raw)
  To: Harris, Robert; +Cc: Ingo Molnar, peterz, dvhart, linux-kernel

On Wed, 13 Nov 2019, Harris, Robert wrote:
> > On 13 Nov 2019, at 09:04, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 12 Nov 2019, Harris, Robert wrote:
> >> Understanding the circumstances under which EPERM can be returned for
> >> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
> >> mode.  I have spent some time looking through futex.c but have not
> >> found anything yet.  I would be grateful for a hint from someone more
> >> knowledgeable.
> >
> > sys_futex(FUTEX_WAIT_PRIVATE) does not return -EPERM. Only the PI variants
> > do that.
> 
> In that case I would appreciate a second pair of eyes.  The error I see
> (intermittently) is

The code looks innocent enough. As I don't know whether the kernel version
you mentioned is a vanilla 4.9.184 from the stable tree or some patched-up
frankenkernel which pretends to have this version number, I can't be sure
that this is an issue in that particular kernel.

In the vanilla 4.9.184 I really can't find how that would return EPERM for
regular futexes.

Thanks,

	tglx




end of thread, other threads:[~2019-11-13 21:34 UTC | newest]

Thread overview: 7+ messages
2019-11-12 17:40 Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM Harris, Robert
2019-11-13  7:29 ` Harris, Robert
2019-11-13  9:04 ` Thomas Gleixner
2019-11-13 10:15   ` Harris, Robert
2019-11-13 21:34     ` Thomas Gleixner
2019-11-13 13:29 ` Mikael Pettersson
2019-11-13 17:03   ` Harris, Robert
