* FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: gregkh @ 2018-12-24 11:52 UTC
  To: tglx, dvhart, heiko.carstens, mingo, peterz, sashal, stli; +Cc: stable


The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From da791a667536bf8322042e38ca85d55a78d3c273 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 10 Dec 2018 14:35:14 +0100
Subject: [PATCH] futex: Cure exit race

Stefan reported that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():

 CPU0				CPU1

 sys_exit()			sys_futex()
  do_exit()			 futex_lock_pi()
   exit_signals(tsk)		  No waiters:
    tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
  mm_release(tsk)		  Set waiter bit
   exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
      Set owner died		  attach_to_pi_owner() {
    *uaddr = 0xC0000000;	   tsk = get_task(PID);
   }				   if (!tsk->flags & PF_EXITING) {
  ...				     attach();
  tsk->flags |= PF_EXITPIDONE;	   } else {
				     if (!(tsk->flags & PF_EXITPIDONE))
				       return -EAGAIN;
				     return -ESRCH; <--- FAIL
				   }

ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.

Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
are set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.

If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.

Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de

diff --git a/kernel/futex.c b/kernel/futex.c
index f423f9b6577e..5cc8083a4c89 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1148,11 +1148,65 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	return ret;
 }
 
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+			    struct task_struct *tsk)
+{
+	u32 uval2;
+
+	/*
+	 * If PF_EXITPIDONE is not yet set, then try again.
+	 */
+	if (tsk && !(tsk->flags & PF_EXITPIDONE))
+		return -EAGAIN;
+
+	/*
+	 * Reread the user space value to handle the following situation:
+	 *
+	 * CPU0				CPU1
+	 *
+	 * sys_exit()			sys_futex()
+	 *  do_exit()			 futex_lock_pi()
+	 *                                futex_lock_pi_atomic()
+	 *   exit_signals(tsk)		    No waiters:
+	 *    tsk->flags |= PF_EXITING;	    *uaddr == 0x00000PID
+	 *  mm_release(tsk)		    Set waiter bit
+	 *   exit_robust_list(tsk) {	    *uaddr = 0x80000PID;
+	 *      Set owner died		    attach_to_pi_owner() {
+	 *    *uaddr = 0xC0000000;	     tsk = get_task(PID);
+	 *   }				     if (!tsk->flags & PF_EXITING) {
+	 *  ...				       attach();
+	 *  tsk->flags |= PF_EXITPIDONE;     } else {
+	 *				       if (!(tsk->flags & PF_EXITPIDONE))
+	 *				         return -EAGAIN;
+	 *				       return -ESRCH; <--- FAIL
+	 *				     }
+	 *
+	 * Returning ESRCH unconditionally is wrong here because the
+	 * user space value has been changed by the exiting task.
+	 *
+	 * The same logic applies to the case where the exiting task is
+	 * already gone.
+	 */
+	if (get_futex_value_locked(&uval2, uaddr))
+		return -EFAULT;
+
+	/* If the user space value has changed, try again. */
+	if (uval2 != uval)
+		return -EAGAIN;
+
+	/*
+	 * The exiting task did not have a robust list, the robust list was
+	 * corrupted or the user space value in *uaddr is simply bogus.
+	 * Give up and tell user space.
+	 */
+	return -ESRCH;
+}
+
 /*
  * Lookup the task for the TID provided from user space and attach to
  * it after doing proper sanity checks.
  */
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
 			      struct futex_pi_state **ps)
 {
 	pid_t pid = uval & FUTEX_TID_MASK;
@@ -1162,12 +1216,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 	/*
 	 * We are the first waiter - try to look up the real owner and attach
 	 * the new pi_state to it, but bail out when TID = 0 [1]
+	 *
+	 * The !pid check is paranoid. None of the call sites should end up
+	 * with pid == 0, but better safe than sorry. Let the caller retry
 	 */
 	if (!pid)
-		return -ESRCH;
+		return -EAGAIN;
 	p = find_get_task_by_vpid(pid);
 	if (!p)
-		return -ESRCH;
+		return handle_exit_race(uaddr, uval, NULL);
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		put_task_struct(p);
@@ -1187,7 +1244,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 		 * set, we know that the task has finished the
 		 * cleanup:
 		 */
-		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+		int ret = handle_exit_race(uaddr, uval, p);
 
 		raw_spin_unlock_irq(&p->pi_lock);
 		put_task_struct(p);
@@ -1244,7 +1301,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
 	 * We are the first waiter - try to look up the owner based on
 	 * @uval and attach to it.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, uval, key, ps);
 }
 
 static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1352,7 +1409,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 	 * attach to the owner. If that fails, no harm done, we only
 	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, newval, key, ps);
 }
 
 /**

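To follow the futex word values in the race diagram above, it helps to decode them bit by bit. This is a minimal user-space sketch, not kernel code: the MODEL_* constants are local copies of the values of FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from <linux/futex.h>, and model_owner_death() is a hypothetical helper that only reproduces the 0x80000PID to 0xC0000000 transition described in the commit message (waiters bit kept, owner-died bit set, TID cleared).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK
 * from <linux/futex.h>, so the sketch builds without kernel headers.
 */
#define MODEL_FUTEX_WAITERS	0x80000000u
#define MODEL_FUTEX_OWNER_DIED	0x40000000u
#define MODEL_FUTEX_TID_MASK	0x3fffffffu

/* TID of the current owner, or 0 if the word records no owner. */
static uint32_t futex_owner_tid(uint32_t uval)
{
	return uval & MODEL_FUTEX_TID_MASK;
}

/* True if a waiter has set the FUTEX_WAITERS bit. */
static bool futex_has_waiters(uint32_t uval)
{
	return (uval & MODEL_FUTEX_WAITERS) != 0;
}

/* True if robust-list cleanup marked the previous owner as dead. */
static bool futex_owner_died(uint32_t uval)
{
	return (uval & MODEL_FUTEX_OWNER_DIED) != 0;
}

/*
 * Hypothetical helper: the word an exiting owner leaves behind, as in
 * the diagram (0x80000PID becomes 0xC0000000): keep the waiters bit,
 * set the owner-died bit and clear the TID.
 */
static uint32_t model_owner_death(uint32_t uval)
{
	return (uval & MODEL_FUTEX_WAITERS) | MODEL_FUTEX_OWNER_DIED;
}
```

With a hypothetical owner TID of 0x1234, the waiter's write produces 0x80001234; after model_owner_death() the word is 0xC0000000, whose TID field is 0 and whose owner-died bit is set. That is the "valid state" the commit message says the kernel has to handle instead of reporting ESRCH.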

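The decision made by the new handle_exit_race() helper can be traced with a small user-space model. This is an illustrative sketch only: model_handle_exit_race() and its boolean parameters are hypothetical stand-ins for the task lookup result, the PF_EXITPIDONE flag and a failing get_futex_value_locked(), and no locking is modelled. Note that futex_lock_pi_atomic() now passes newval instead of uval, presumably because newval (with FUTEX_WAITERS set) is what the waiter has just stored to *uaddr, so that is the value the reread has to be compared against.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of handle_exit_race(); not kernel code.
 *   uval               value the waiter believes is in *uaddr
 *   reread_val         value returned by rereading *uaddr
 *   reread_faulted     models get_futex_value_locked() failing
 *   owner_found        task for the TID still exists (tsk != NULL)
 *   owner_exit_pi_done models PF_EXITPIDONE being set on that task
 */
static int model_handle_exit_race(uint32_t uval, uint32_t reread_val,
				  bool reread_faulted, bool owner_found,
				  bool owner_exit_pi_done)
{
	/* Owner is exiting but has not finished PI cleanup: retry. */
	if (owner_found && !owner_exit_pi_done)
		return -EAGAIN;

	/* Rereading the user space value faulted. */
	if (reread_faulted)
		return -EFAULT;

	/* The exiting owner changed the word (e.g. set OWNER_DIED): retry. */
	if (reread_val != uval)
		return -EAGAIN;

	/* Unchanged word with no live owner: indistinguishable from bogus
	 * user space, so report ESRCH. */
	return -ESRCH;
}
```

For the interleaving in the diagram, a first attempt that sees PF_EXITING without PF_EXITPIDONE retries. Once PF_EXITPIDONE is set, the reread returns 0xC0000000 rather than the 0x80000PID the waiter wrote, so the result is -EAGAIN instead of the old unconditional -ESRCH, and the retry handles the FUTEX_OWNER_DIED case.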

* Re: FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: Sudip Mukherjee @ 2019-02-17 11:34 UTC
  To: gregkh; +Cc: tglx, dvhart, heiko.carstens, mingo, peterz, sashal, stli, stable

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

Hi Greg,

On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
> 
> The patch below does not apply to the 4.14-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.

The attached backported patch should apply to the 4.14-stable tree.

I think we have a real use case that is triggering this error, and I was
still in the middle of debugging it, but my initial analysis showed that
the userspace thread was stuck in an indefinite loop.
I have a reliable reproducer of the problem and will set up a test
tomorrow to confirm.

--
Regards
Sudip

[-- Attachment #2: 0001-futex-Cure-exit-race.patch --]
[-- Type: text/x-diff, Size: 6364 bytes --]

From 03cf90bf8a29dfd2bc3202ff8d322e9498058228 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 10 Dec 2018 14:35:14 +0100
Subject: [PATCH] futex: Cure exit race

commit da791a667536bf8322042e38ca85d55a78d3c273 upstream

Stefan reported that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():

 CPU0				CPU1

 sys_exit()			sys_futex()
  do_exit()			 futex_lock_pi()
   exit_signals(tsk)		  No waiters:
    tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
  mm_release(tsk)		  Set waiter bit
   exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
      Set owner died		  attach_to_pi_owner() {
    *uaddr = 0xC0000000;	   tsk = get_task(PID);
   }				   if (!tsk->flags & PF_EXITING) {
  ...				     attach();
  tsk->flags |= PF_EXITPIDONE;	   } else {
				     if (!(tsk->flags & PF_EXITPIDONE))
				       return -EAGAIN;
				     return -ESRCH; <--- FAIL
				   }

ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.

Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
are set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.

If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.

Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
---
 kernel/futex.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 6 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index abe04a2bb5b9..29d708d0b3d1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1166,11 +1166,65 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	return ret;
 }
 
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+			    struct task_struct *tsk)
+{
+	u32 uval2;
+
+	/*
+	 * If PF_EXITPIDONE is not yet set, then try again.
+	 */
+	if (tsk && !(tsk->flags & PF_EXITPIDONE))
+		return -EAGAIN;
+
+	/*
+	 * Reread the user space value to handle the following situation:
+	 *
+	 * CPU0				CPU1
+	 *
+	 * sys_exit()			sys_futex()
+	 *  do_exit()			 futex_lock_pi()
+	 *                                futex_lock_pi_atomic()
+	 *   exit_signals(tsk)		    No waiters:
+	 *    tsk->flags |= PF_EXITING;	    *uaddr == 0x00000PID
+	 *  mm_release(tsk)		    Set waiter bit
+	 *   exit_robust_list(tsk) {	    *uaddr = 0x80000PID;
+	 *      Set owner died		    attach_to_pi_owner() {
+	 *    *uaddr = 0xC0000000;	     tsk = get_task(PID);
+	 *   }				     if (!tsk->flags & PF_EXITING) {
+	 *  ...				       attach();
+	 *  tsk->flags |= PF_EXITPIDONE;     } else {
+	 *				       if (!(tsk->flags & PF_EXITPIDONE))
+	 *				         return -EAGAIN;
+	 *				       return -ESRCH; <--- FAIL
+	 *				     }
+	 *
+	 * Returning ESRCH unconditionally is wrong here because the
+	 * user space value has been changed by the exiting task.
+	 *
+	 * The same logic applies to the case where the exiting task is
+	 * already gone.
+	 */
+	if (get_futex_value_locked(&uval2, uaddr))
+		return -EFAULT;
+
+	/* If the user space value has changed, try again. */
+	if (uval2 != uval)
+		return -EAGAIN;
+
+	/*
+	 * The exiting task did not have a robust list, the robust list was
+	 * corrupted or the user space value in *uaddr is simply bogus.
+	 * Give up and tell user space.
+	 */
+	return -ESRCH;
+}
+
 /*
  * Lookup the task for the TID provided from user space and attach to
  * it after doing proper sanity checks.
  */
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
 			      struct futex_pi_state **ps)
 {
 	pid_t pid = uval & FUTEX_TID_MASK;
@@ -1180,12 +1234,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 	/*
 	 * We are the first waiter - try to look up the real owner and attach
 	 * the new pi_state to it, but bail out when TID = 0 [1]
+	 *
+	 * The !pid check is paranoid. None of the call sites should end up
+	 * with pid == 0, but better safe than sorry. Let the caller retry
 	 */
 	if (!pid)
-		return -ESRCH;
+		return -EAGAIN;
 	p = futex_find_get_task(pid);
 	if (!p)
-		return -ESRCH;
+		return handle_exit_race(uaddr, uval, NULL);
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		put_task_struct(p);
@@ -1205,7 +1262,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 		 * set, we know that the task has finished the
 		 * cleanup:
 		 */
-		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+		int ret = handle_exit_race(uaddr, uval, p);
 
 		raw_spin_unlock_irq(&p->pi_lock);
 		put_task_struct(p);
@@ -1262,7 +1319,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
 	 * We are the first waiter - try to look up the owner based on
 	 * @uval and attach to it.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, uval, key, ps);
 }
 
 static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1370,7 +1427,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 	 * attach to the owner. If that fails, no harm done, we only
 	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, newval, key, ps);
 }
 
 /**
-- 
2.11.0



* Re: FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: Thomas Gleixner @ 2019-02-17 11:53 UTC
  To: Sudip Mukherjee
  Cc: gregkh, dvhart, heiko.carstens, mingo, peterz, sashal, stli, stable

On Sun, 17 Feb 2019, Sudip Mukherjee wrote:

> Hi Greg,
> 
> On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
> > 
> > The patch below does not apply to the 4.14-stable tree.
> > If someone wants it applied there, or to any other stable or longterm
> > tree, then please email the backport, including the original git commit
> > id to <stable@vger.kernel.org>.
> 
> The attached backported patch should apply to 4.14-stable tree.
> 
> I think we have a real usecase which is triggering this error and I was
> still in the middle of debugging that. But my initial analysis was
> showing that the userspace thread was stuck in the indefinite loop.
> I have a reliable reproducer of the problem and will setup a test
> tomorrow and confirm.

There are more patches in that area and you also need a fixed glibc.

Thanks,

	tglx


* Re: FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: Sudip Mukherjee @ 2019-02-17 12:27 UTC
  To: Thomas Gleixner
  Cc: Greg Kroah-Hartman, Darren Hart, Heiko Carstens, Ingo Molnar,
	Peter Zijlstra (Intel),
	Sasha Levin, stli, Stable

Hi Thomas,

On Sun, Feb 17, 2019 at 11:53 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
>
> > Hi Greg,
> >
> > On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
> > >
> >
<snip>
> > I think we have a real usecase which is triggering this error and I was
> > still in the middle of debugging that. But my initial analysis was
> > showing that the userspace thread was stuck in the indefinite loop.
> > I have a reliable reproducer of the problem and will setup a test
> > tomorrow and confirm.
>
> There are more patches in that area and you also need a fixed glibc.

I can see 1a1fb985f2e2 ("futex: Handle early deadlock return
correctly") is already there in 4.14-stable.
Is anything else missing, other than this one?

glibc might be a problem, but let's see what can be done.


-- 
Regards
Sudip


* Re: FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: Thomas Gleixner @ 2019-02-17 17:59 UTC
  To: Sudip Mukherjee
  Cc: Greg Kroah-Hartman, Darren Hart, Heiko Carstens, Ingo Molnar,
	Peter Zijlstra (Intel),
	Sasha Levin, stli, Stable

On Sun, 17 Feb 2019, Sudip Mukherjee wrote:

> Hi Thomas,
> 
> On Sun, Feb 17, 2019 at 11:53 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
> >
> > > Hi Greg,
> > >
> > > On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
> > > >
> > >
> <snip>
> > > I think we have a real usecase which is triggering this error and I was
> > > still in the middle of debugging that. But my initial analysis was
> > > showing that the userspace thread was stuck in the indefinite loop.
> > > I have a reliable reproducer of the problem and will setup a test
> > > tomorrow and confirm.
> >
> > There are more patches in that area and you also need a fixed glibc.
> 
> I can see 1a1fb985f2e2 ("futex: Handle early deadlock return
> correctly") is already there in 4.14-stable.
> Is anything else missing, other than this one?
> 
> glibc might be a problem, but lets see what can be done.

Those two are the kernel side of affairs I think.

The relevant glibc commits are:

 8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
 823624bdc47f1f80109c9c52dee7939b9386d708

Thanks,

	tglx




* Re: FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: Stefan Liebler @ 2019-02-18  8:26 UTC
  To: Thomas Gleixner, Sudip Mukherjee
  Cc: Greg Kroah-Hartman, Darren Hart, Heiko Carstens, Ingo Molnar,
	Peter Zijlstra (Intel),
	Sasha Levin, Stable

Hi Sudip,


On 02/17/2019 06:59 PM, Thomas Gleixner wrote:
> On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
> 
>> Hi Thomas,
>>
>> On Sun, Feb 17, 2019 at 11:53 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
>>>
>>>> Hi Greg,
>>>>
>>>> On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
>>>>>
>>>>
>> <snip>
>>>> I think we have a real usecase which is triggering this error and I was
>>>> still in the middle of debugging that. But my initial analysis was
>>>> showing that the userspace thread was stuck in the indefinite loop.
=> This behaviour depends on how assert is configured in the glibc build.
See the glibc code in nptl/pthread_mutex_lock.c (you will encounter either
an abort due to the assert or an indefinite loop):
		/* ESRCH can happen only for non-robust PI mutexes where
		   the owner of the lock died.  */
		assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);

		/* Delay the thread indefinitely.  */
		while (1)
		  __pause_nocancel ();
>>>> I have a reliable reproducer of the problem and will setup a test
>>>> tomorrow and confirm.
>>>
>>> There are more patches in that area and you also need a fixed glibc.
>>
>> I can see 1a1fb985f2e2 ("futex: Handle early deadlock return
>> correctly") is already there in 4.14-stable.
>> Is anything else missing, other than this one?
>>
>> glibc might be a problem, but lets see what can be done.
> 
> Those two are the kernel side of affairs I think.
> 
> The relevant glibc commits are:
> 
>   8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
=> Needed for pthread_mutex_lock / pthread_mutex_timedlock (included in glibc
release 2.25)

>   823624bdc47f1f80109c9c52dee7939b9386d708
=> Needed for pthread_mutex_trylock (will be included in the next glibc release,
2.30, and has been backported to the glibc release branches 2.25 ... 2.29)

Bye
Stefan
> 
> Thanks,
> 
> 	tglx
> 
> 

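Stefan's explanation can be summarised as a small decision function. This is a simplified, hypothetical model of only the excerpt quoted above (the enum and classify_pi_lock_failure() are not glibc names, and other error codes handled nearby in glibc are deliberately left out): for a robust mutex, an ESRCH from the kernel trips the assert when assertions are active, and otherwise the thread falls through to the indefinite __pause_nocancel() loop, which would be consistent with the hang Sudip observed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of the quoted glibc excerpt. */
enum lock_failure_outcome {
	OUTCOME_NOT_THIS_PATH,	/* error not covered by this model */
	OUTCOME_ASSERT_ABORT,	/* assert() fails and the process aborts */
	OUTCOME_PAUSE_FOREVER	/* falls through to while (1) __pause_nocancel() */
};

/*
 * Models: assert (err != ESRCH || !robust); while (1) pause;
 * 'asserts_enabled' stands in for how assert is configured in the
 * glibc build.
 */
static enum lock_failure_outcome
classify_pi_lock_failure(int err, bool robust, bool asserts_enabled)
{
	if (err != ESRCH)
		return OUTCOME_NOT_THIS_PATH;

	/* The assert condition is false only for robust mutexes. */
	if (robust && asserts_enabled)
		return OUTCOME_ASSERT_ABORT;

	/* Assert passed or compiled out: delay the thread indefinitely. */
	return OUTCOME_PAUSE_FOREVER;
}
```

The design point is that the same kernel error produces two very different user-visible symptoms (an abort or a silent hang), depending only on the glibc build.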


* Re: FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: Greg KH @ 2019-02-18 12:20 UTC
  To: Sudip Mukherjee
  Cc: tglx, dvhart, heiko.carstens, mingo, peterz, sashal, stli, stable

On Sun, Feb 17, 2019 at 11:34:30AM +0000, Sudip Mukherjee wrote:
> Hi Greg,
> 
> On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
> > 
> > The patch below does not apply to the 4.14-stable tree.
> > If someone wants it applied there, or to any other stable or longterm
> > tree, then please email the backport, including the original git commit
> > id to <stable@vger.kernel.org>.
> 
> The attached backported patch should apply to 4.14-stable tree.
> 
> I think we have a real usecase which is triggering this error and I was
> still in the middle of debugging that. But my initial analysis was
> showing that the userspace thread was stuck in the indefinite loop.
> I have a reliable reproducer of the problem and will setup a test
> tomorrow and confirm.

Now applied, thanks.

greg k-h


* Re: FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.14-stable tree
From: Sudip Mukherjee @ 2019-02-19 10:34 UTC
  To: Stefan Liebler
  Cc: Thomas Gleixner, Greg Kroah-Hartman, Darren Hart, Heiko Carstens,
	Ingo Molnar, Peter Zijlstra (Intel),
	Sasha Levin, Stable

On Mon, Feb 18, 2019 at 8:26 AM Stefan Liebler <stli@linux.ibm.com> wrote:
>
> Hi Sudip,
>
>
> On 02/17/2019 06:59 PM, Thomas Gleixner wrote:
> > On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
> >
> >> Hi Thomas,
> >>
> >> On Sun, Feb 17, 2019 at 11:53 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>
> >>> On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
> >>>
> >>>> Hi Greg,
> >>>>
> >>>> On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@linuxfoundation.org wrote:
> >>>>>
> >>>>
> >> <snip>
> >>>> I think we have a real usecase which is triggering this error and I was
> >>>> still in the middle of debugging that. But my initial analysis was
> >>>> showing that the userspace thread was stuck in the indefinite loop.
> => This behaviour depends on the configuration of assert.
> See glibc code in nptl/pthread_mutex_lock.c (you will encounter either
> an abort due to assert or an indefinite loop):
>                 /* ESRCH can happen only for non-robust PI mutexes where
>                    the owner of the lock died.  */
>                 assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
>
>                 /* Delay the thread indefinitely.  */
>                 while (1)
>                   __pause_nocancel ();
> >>>> I have a reliable reproducer of the problem and will setup a test
> >>>> tomorrow and confirm.
> >>>
> >>> There are more patches in that area and you also need a fixed glibc.
> >>
> >> I can see 1a1fb985f2e2 ("futex: Handle early deadlock return
> >> correctly") is already there in 4.14-stable.
> >> Is anything else missing, other than this one?
> >>
> >> glibc might be a problem, but lets see what can be done.
> >
> > Those two are the kernel side of affairs I think.
> >
> > The relevant glibc commits are:
> >
> >   8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
> => Needed for pthread_mutex_lock / pthread_mutex_timedlock (within glibc
> release 2.25)
>
> >   823624bdc47f1f80109c9c52dee7939b9386d708
> => Needed for pthread_mutex_trylock (will be within next glibc release
> 2.30, but is backported to glibc release branches 2.25 ... 2.29)

Thanks.
With only the kernel changes applied, the problem was not resolved.
With both the kernel changes and the glibc changes applied, the problem
improved significantly. But since we are using an ancient version of
eglibc, I do not expect it to get better than this.


-- 
Regards
Sudip
