linux-kernel.vger.kernel.org archive mirror
* [PATCH -tip 0/2] futex: Two pi fixes
@ 2021-03-15  5:02 Davidlohr Bueso
  2021-03-15  5:02 ` [PATCH 1/2] futex: Fix irq mismatch in exit_pi_state_list() Davidlohr Bueso
  2021-03-15  5:02 ` [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault Davidlohr Bueso
  0 siblings, 2 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-03-15  5:02 UTC (permalink / raw)
  To: tglx, mingo; +Cc: peterz, dvhart, linux-kernel, dave

Hi,

Some unrelated fixlets found via code inspection. Please consider for v5.13.

Thanks!

Davidlohr Bueso (2):
  futex: Fix irq mismatch in exit_pi_state_list()
  futex: Leave the pi lock stealer in a consistent state upon successful
    fault

 kernel/futex.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

-- 
2.26.2



* [PATCH 1/2] futex: Fix irq mismatch in exit_pi_state_list()
  2021-03-15  5:02 [PATCH -tip 0/2] futex: Two pi fixes Davidlohr Bueso
@ 2021-03-15  5:02 ` Davidlohr Bueso
  2021-03-15 13:12   ` Peter Zijlstra
  2021-03-15  5:02 ` [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault Davidlohr Bueso
  1 sibling, 1 reply; 9+ messages in thread
From: Davidlohr Bueso @ 2021-03-15  5:02 UTC (permalink / raw)
  To: tglx, mingo; +Cc: peterz, dvhart, linux-kernel, dave, Davidlohr Bueso

The pi_mutex->wait_lock is irq safe and needs to enable local
interrupts upon unlocking, matching its corresponding
raw_spin_lock_irq().

Fixes: c74aef2d06a9f (futex: Fix pi_state->owner serialization)
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/futex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 475055715371..ded7af2ba87f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -885,7 +885,7 @@ static void exit_pi_state_list(struct task_struct *curr)
 		 */
 		if (head->next != next) {
 			/* retain curr->pi_lock for the loop invariant */
-			raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
+			raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 			spin_unlock(&hb->lock);
 			put_pi_state(pi_state);
 			continue;
-- 
2.26.2



* [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault
  2021-03-15  5:02 [PATCH -tip 0/2] futex: Two pi fixes Davidlohr Bueso
  2021-03-15  5:02 ` [PATCH 1/2] futex: Fix irq mismatch in exit_pi_state_list() Davidlohr Bueso
@ 2021-03-15  5:02 ` Davidlohr Bueso
  2021-03-16 11:20   ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Davidlohr Bueso @ 2021-03-15  5:02 UTC (permalink / raw)
  To: tglx, mingo; +Cc: peterz, dvhart, linux-kernel, dave, Davidlohr Bueso

Before 34b1a1ce145 (futex: Handle faults correctly for PI futexes) any
concurrent pi_state->owner fixup would assume that the task that fixed
things on our behalf also correctly updated the userspace value. This
is no longer always the case. A lock stealer can return from a
successful FUTEX_LOCK_PI operation after racing, during a fault, with
an enqueued top waiter that is in an immutable state. The uval TID is
then never updated for the stealer, which breaks otherwise expected
(and valid) semantics and confuses the stealer task.

Starting with pi_state->owner == victim:

victim							stealer
futex_lock_pi() {
  queue_me(&q, hb);
  rt_mutex_timed_futex_lock() {
							futex_lock_pi() {
							  // lock steal
							  rt_mutex_timed_futex_lock();
    // timeout
  }

  spin_lock(q.lock_ptr);
  fixup_owner(!locked) {
    fixup_pi_state_owner(NULL) {
      oldowner = pi_state->owner
      newowner = stealer;
      handle_err:
      //drop locks

      ret = fault_in_user_writeable() {			spin_lock(q.lock_ptr);
							fixup_owner(locked) {
      } // -EFAULT					    fixup_pi_state_owner(current) {
							      oldowner = pi_state->owner
							      newowner = current;
							      handle_err:
							      // drop locks
							      ret = fault_in_user_writeable() {

      // take locks
      if (pi_state->owner != oldowner) // false

      pi_state_update_owner(rt_mutex_owner());

							       } // SUCCESS
   }
   // all locks dropped					       // take locks
							       if (pi_state->owner != oldowner) // success
}								 return 1;

This leaves: (pi_state->owner == pi_mutex owner == stealer) AND (uval TID == victim).
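
Editorial sketch, not part of the original patch: the end state of the
race above can be modeled in plain userspace C. Everything here is a toy
model under stated assumptions. The struct, the TID values and the
function names are hypothetical, and no real futex or rt_mutex code is
involved. It only shows which fields end up out of sync, and how a retry
by the stealer would bring them back in line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TIDs, chosen only for illustration. */
enum { VICTIM_TID = 100, STEALER_TID = 200 };

/* Toy view of the three pieces of state discussed in the changelog. */
struct pi_model {
	int pi_state_owner;  /* models pi_state->owner */
	int rt_mutex_owner;  /* models the pi_mutex (rt_mutex) owner */
	int uval_tid;        /* models the TID in the userspace futex word */
};

static bool pi_model_consistent(const struct pi_model *m)
{
	return m->pi_state_owner == m->rt_mutex_owner &&
	       m->pi_state_owner == m->uval_tid;
}

/* Victim side: the fault failed, so only kernel state is updated. */
static void victim_fixup_after_efault(struct pi_model *m)
{
	m->pi_state_owner = m->rt_mutex_owner;  /* uval is left untouched */
}

/* Stealer side: the fault succeeded, then the owner is re-checked. */
static int stealer_fixup_after_fault(struct pi_model *m, int oldowner,
				     bool retry_on_change)
{
	if (m->pi_state_owner != oldowner && !retry_on_change)
		return 1;  /* report "locked" without touching uval */

	/* Modeled retry: write the user value, then the kernel owner. */
	m->uval_tid = STEALER_TID;
	m->pi_state_owner = STEALER_TID;
	return 1;
}

/* Runs the modeled race and reports whether all three fields agree. */
bool race_ends_consistent(bool retry_on_change)
{
	/* State right after the steal: only the rt_mutex owner changed. */
	struct pi_model m = { VICTIM_TID, STEALER_TID, VICTIM_TID };
	int oldowner = m.pi_state_owner;  /* both tasks sampled the victim */
	int locked;

	victim_fixup_after_efault(&m);
	locked = stealer_fixup_after_fault(&m, oldowner, retry_on_change);
	assert(locked == 1);  /* the stealer reports success either way */

	return pi_model_consistent(&m);
}
```

In this model, race_ends_consistent(false) corresponds to the state
described above, while race_ends_consistent(true) corresponds to the
stealer retrying and fixing up the user value itself.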

This patch makes the lock stealer retry when, after a successful fault,
it sees that someone changed pi_state->owner while all locks were
dropped. This lets the stealer fix up the user state of the lock itself,
albeit in the reverse of the traditional order (userspace is normally
updated before pi_state), but this is an extraordinary scenario.

For normal fixups, this adds some unnecessary overhead by having to
deal with the userspace value when things are already ok, but this case
is pretty rare and we have already given up every inch of performance
by releasing all locks for faulting/blocking.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/futex.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index ded7af2ba87f..95ce10c4e33d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2460,7 +2460,6 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 
 	case -EAGAIN:
 		cond_resched();
-		err = 0;
 		break;
 
 	default:
@@ -2474,11 +2473,22 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 	/*
 	 * Check if someone else fixed it for us:
 	 */
-	if (pi_state->owner != oldowner)
+	if (pi_state->owner != oldowner) {
+		/*
+		 * The change might have come from the rare immutable
+		 * state below, which leaves the userspace value out of
+		 * sync. But if we are the lock stealer and can update
+		 * the uval, do so, instead of reporting a successful
+		 * lock operation with an invalid user state.
+		 */
+		if (!err && argowner == current)
+			goto retry;
+
 		return argowner == current;
+	}
 
 	/* Retry if err was -EAGAIN or the fault in succeeded */
-	if (!err)
+	if (err == -EAGAIN || !err)
 		goto retry;
 
 	/*
-- 
2.26.2



* Re: [PATCH 1/2] futex: Fix irq mismatch in exit_pi_state_list()
  2021-03-15  5:02 ` [PATCH 1/2] futex: Fix irq mismatch in exit_pi_state_list() Davidlohr Bueso
@ 2021-03-15 13:12   ` Peter Zijlstra
  2021-03-15 19:03     ` Davidlohr Bueso
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-03-15 13:12 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: tglx, mingo, dvhart, linux-kernel, Davidlohr Bueso

On Sun, Mar 14, 2021 at 10:02:23PM -0700, Davidlohr Bueso wrote:
> The pi_mutex->wait_lock is irq safe and needs to enable local
> interrupts upon unlocking, matching its corresponding
> raw_spin_lock_irq().
> 
> Fixes: c74aef2d06a9f (futex: Fix pi_state->owner serialization)
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  kernel/futex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 475055715371..ded7af2ba87f 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -885,7 +885,7 @@ static void exit_pi_state_list(struct task_struct *curr)
>  		 */
>  		if (head->next != next) {
>  			/* retain curr->pi_lock for the loop invariant */
> -			raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
> +			raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
>  			spin_unlock(&hb->lock);
>  			put_pi_state(pi_state);
>  			continue;

This seems broken, afaict we own:

  &hb->lock
  &pi_state->pi_mutex.wait_lock
  &curr->pi_lock

And we're only releasing:

  &hb->lock
  &pi_state->pi_mutex.wait_lock

Which leaves us holding:

  &curr->pi_lock

which is also an IRQ safe lock, so enabling IRQs would be BAD.

Or am I reading this wrong?
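
Editorial sketch, not part of the original message: the objection above
can be modeled in plain userspace C. The model tracks only one CPU's
interrupt flag and the number of IRQ-safe locks held on it. The model_*
helpers and retry_branch_is_unsafe() are hypothetical names, the
acquisition order is an assumption based on the description in this
thread, and hb->lock is left out to keep the model small.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. */
struct cpu_model {
	bool irqs_enabled;
	int  irqsafe_locks_held;
};

/* Models raw_spin_lock_irq(): disable interrupts, then take the lock. */
static void model_lock_irq(struct cpu_model *cpu)
{
	cpu->irqs_enabled = false;
	cpu->irqsafe_locks_held++;
}

/* Models raw_spin_lock(): take the lock, leave the interrupt flag alone. */
static void model_lock(struct cpu_model *cpu)
{
	cpu->irqsafe_locks_held++;
}

/* Models raw_spin_unlock(): drop the lock, leave the interrupt flag alone. */
static void model_unlock(struct cpu_model *cpu)
{
	cpu->irqsafe_locks_held--;
}

/* Models raw_spin_unlock_irq(): drop the lock, then enable interrupts. */
static void model_unlock_irq(struct cpu_model *cpu)
{
	cpu->irqsafe_locks_held--;
	cpu->irqs_enabled = true;
}

/* Unsafe means: an IRQ-safe lock is held while interrupts are enabled. */
static bool model_is_unsafe(const struct cpu_model *cpu)
{
	return cpu->irqs_enabled && cpu->irqsafe_locks_held > 0;
}

/*
 * Walks the branch touched by the patch: wait_lock and curr->pi_lock are
 * held, and only wait_lock is dropped before continuing the loop.
 */
bool retry_branch_is_unsafe(bool use_unlock_irq)
{
	struct cpu_model cpu = { .irqs_enabled = true, .irqsafe_locks_held = 0 };

	model_lock_irq(&cpu);  /* pi_mutex.wait_lock */
	model_lock(&cpu);      /* curr->pi_lock, nested inside wait_lock */
	assert(!cpu.irqs_enabled);

	if (use_unlock_irq)
		model_unlock_irq(&cpu);  /* what the patch proposes */
	else
		model_unlock(&cpu);      /* what the existing code does */

	/* curr->pi_lock is still held at this point. */
	return model_is_unsafe(&cpu);
}
```

In this model, retry_branch_is_unsafe(true) reports the problem raised
above: curr->pi_lock is still held when interrupts come back on.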


* Re: [PATCH 1/2] futex: Fix irq mismatch in exit_pi_state_list()
  2021-03-15 13:12   ` Peter Zijlstra
@ 2021-03-15 19:03     ` Davidlohr Bueso
  0 siblings, 0 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-03-15 19:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, mingo, dvhart, linux-kernel, Davidlohr Bueso

On Mon, 15 Mar 2021, Peter Zijlstra wrote:

>Or am I reading this wrong?

No, I read it wrong. Please ignore this patch, there are quite a few
cases that do this trickery.

Thanks,
Davidlohr


* Re: [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault
  2021-03-15  5:02 ` [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault Davidlohr Bueso
@ 2021-03-16 11:20   ` Peter Zijlstra
  2021-03-16 18:03     ` Davidlohr Bueso
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-03-16 11:20 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: tglx, mingo, dvhart, linux-kernel, Davidlohr Bueso

On Sun, Mar 14, 2021 at 10:02:24PM -0700, Davidlohr Bueso wrote:
> Before 34b1a1ce145 (futex: Handle faults correctly for PI futexes) any
> concurrent pi_state->owner fixup would assume that the task that fixed
> things on our behalf also correctly updated the userspace value. This
> is not always the case anymore, and can result in scenarios where a lock
> stealer returns a successful FUTEX_LOCK_PI operation but raced during a fault
> with an enqueued top waiter in an immutable state so the uval TID was
> not updated for the stealer, breaking otherwise expected (and valid)
> semantics and confusing the stealer task:


> ---
>  kernel/futex.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index ded7af2ba87f..95ce10c4e33d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2460,7 +2460,6 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
>  
>  	case -EAGAIN:
>  		cond_resched();
> -		err = 0;
>  		break;
>  
>  	default:
> @@ -2474,11 +2473,22 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
>  	/*
>  	 * Check if someone else fixed it for us:
>  	 */
> -	if (pi_state->owner != oldowner)
> +	if (pi_state->owner != oldowner) {
> +		/*
> +		 * The change might have come from the rare immutable
> +		 * state below, which leaves the userspace value out of
> +		 * sync. But if we are the lock stealer and can update
> +		 * the uval, do so, instead of reporting a successful
> +		 * lock operation with an invalid user state.
> +		 */
> +		if (!err && argowner == current)
> +			goto retry;
> +
>  		return argowner == current;
> +	}
>  
>  	/* Retry if err was -EAGAIN or the fault in succeeded */
> -	if (!err)
> +	if (err == -EAGAIN || !err)
>  		goto retry;
>  

IIRC we made the explicit choice to never loop here. That saves having
to worry about getting stuck in in-kernel loops.

Userspace triggering the case where the futex goes corrupt is UB, after
that we have no obligation for anything to still work. It's on them,
they get to deal with the bits remaining.



* Re: [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault
  2021-03-16 11:20   ` Peter Zijlstra
@ 2021-03-16 18:03     ` Davidlohr Bueso
  2021-03-16 19:48       ` Thomas Gleixner
  2021-03-16 20:12       ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-03-16 18:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, mingo, dvhart, linux-kernel, Davidlohr Bueso

On Tue, 16 Mar 2021, Peter Zijlstra wrote:
>
>IIRC we made the explicit choice to never loop here. That saves having
>to worry about getting stuck in in-kernel loops.
>
>Userspace triggering the case where the futex goes corrupt is UB, after
>that we have no obligation for anything to still work. It's on them,
>they get to deal with the bits remaining.

I was kind of expecting this answer, honestly. After all, we are warned
about violations of rule [10]:

  * [10] There is no transient state which leaves owner and user space
  *      TID out of sync. Except one error case where the kernel is denied
  *      write access to the user address, see fixup_pi_state_owner().

(btw, should we actually WARN_ON_ONCE this case such that the user is
well aware things are screwed up?)
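
Editorial sketch, not part of the original message: rule [10] amounts to
comparing the TID bits of the user value against the kernel-side owner.
The constants below mirror the values in the futex uapi header but are
redefined under MODEL_ names so the snippet stands alone, and
uval_matches_owner() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the futex word used by PI futexes. */
#define MODEL_FUTEX_WAITERS    0x80000000u
#define MODEL_FUTEX_OWNER_DIED 0x40000000u
#define MODEL_FUTEX_TID_MASK   0x3fffffffu

/* True when the TID stored in uval names the given owner. */
bool uval_matches_owner(uint32_t uval, uint32_t owner_tid)
{
	return (uval & MODEL_FUTEX_TID_MASK) == owner_tid;
}

/* Small self-check with made-up TIDs (100 and 200). */
void uval_selftest(void)
{
	/* Flag bits do not affect the comparison. */
	assert(uval_matches_owner(MODEL_FUTEX_WAITERS | 200u, 200u));
	assert(uval_matches_owner(MODEL_FUTEX_OWNER_DIED | 200u, 200u));
	/* A stale TID in uval is the out-of-sync case from rule [10]. */
	assert(!uval_matches_owner(MODEL_FUTEX_WAITERS | 100u, 200u));
}
```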

However, as 34b1a1ce145 describes, we cared enough about users to
protect them against spurious runaway tasks. And this is why I decided
to even send the patch; it fixes, without sacrificing performance or
adding complexity, a potentially user visible issue which could be
due to programming error. And unlike 34b1a1ce145, where a stealer that
cannot fault ends up dropping the lock, here the stealer can actually
amend things and not break semantics because of another task's stupidity.
But yeah, this could also be considered in the category of inept attempts
to fix a rotten situation.

Thanks,
Davidlohr


* Re: [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault
  2021-03-16 18:03     ` Davidlohr Bueso
@ 2021-03-16 19:48       ` Thomas Gleixner
  2021-03-16 20:12       ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2021-03-16 19:48 UTC (permalink / raw)
  To: Davidlohr Bueso, Peter Zijlstra
  Cc: mingo, dvhart, linux-kernel, Davidlohr Bueso

On Tue, Mar 16 2021 at 11:03, Davidlohr Bueso wrote:
> On Tue, 16 Mar 2021, Peter Zijlstra wrote:
>>IIRC we made the explicit choice to never loop here. That saves having
>>to worry about getting stuck in in-kernel loops.
>>
>>Userspace triggering the case where the futex goes corrupt is UB, after
>>that we have no obligation for anything to still work. It's on them,
>>they get to deal with the bits remaining.
>
> I was kind of expecting this answer, honestly. After all, we are warned
> about violations to the 10th:
>
>   * [10] There is no transient state which leaves owner and user space
>   *      TID out of sync. Except one error case where the kernel is denied
>   *      write access to the user address, see fixup_pi_state_owner().
>
> (btw, should we actually WARN_ON_ONCE this case such that the user is
> well aware things are screwed up?)
>
> However, as 34b1a1ce145 describes, it was cared enough about users to
> protect them against spurious runaway tasks. And this is why I decided
> to even send the patch; it fixes, without sacrificing performance or
> additional complexity, a potentially user visible issue which could be
> due to programming error. And unlike 34b1a1ce145, where a stealer that
> cannot fault ends up dropping the lock, here the stealer can actually
> amend things and not break semantics because of another task's stupidity.
> But yeah, this could also be considered in the category of inept attempts
> to fix a rotten situation.

It's one of the 'Doctor it hurts when I shoot myself in the foot' cases :)


* Re: [PATCH 2/2] futex: Leave the pi lock stealer in a consistent state upon successful fault
  2021-03-16 18:03     ` Davidlohr Bueso
  2021-03-16 19:48       ` Thomas Gleixner
@ 2021-03-16 20:12       ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2021-03-16 20:12 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: tglx, mingo, dvhart, linux-kernel, Davidlohr Bueso

On Tue, Mar 16, 2021 at 11:03:05AM -0700, Davidlohr Bueso wrote:
> On Tue, 16 Mar 2021, Peter Zijlstra wrote:
> > 
> > IIRC we made the explicit choice to never loop here. That saves having
> > to worry about getting stuck in in-kernel loops.
> > 
> > Userspace triggering the case where the futex goes corrupt is UB, after
> > that we have no obligation for anything to still work. It's on them,
> > they get to deal with the bits remaining.
> 
> I was kind of expecting this answer, honestly. After all, we are warned
> about violations to the 10th:
> 
>  * [10] There is no transient state which leaves owner and user space
>  *      TID out of sync. Except one error case where the kernel is denied
>  *      write access to the user address, see fixup_pi_state_owner().
> 
> (btw, should we actually WARN_ON_ONCE this case such that the user is
> well aware things are screwed up?)

I'm not sure WARN is appropriate, it is something unpriv userspace
can trigger at will.

