From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1161592AbbBDU0w (ORCPT <rfc822;w@1wt.eu>);
	Wed, 4 Feb 2015 15:26:52 -0500
Received: from mx1.redhat.com ([209.132.183.28]:59990 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1161055AbbBDU0u (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Wed, 4 Feb 2015 15:26:50 -0500
Date: Wed, 4 Feb 2015 21:25:09 +0100
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>, Thomas Gleixner <tglx@linutronix.de>,
        Jerome Marchand <jmarchan@redhat.com>,
        Larry Woodman <lwoodman@redhat.com>, Mateusz Guzik <mguzik@redhat.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to
	filter out kthreads
Message-ID: <20150204202509.GA1502@redhat.com>
References: <20150202140515.GA26398@redhat.com> <20150202151159.GE26304@twins.programming.kicks-ass.net> <20150203200916.GA10545@redhat.com> <20150204111212.GF2896@worktop.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150204111212.GF2896@worktop.programming.kicks-ass.net>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/04, Peter Zijlstra wrote:
>
> On Tue, Feb 03, 2015 at 09:09:16PM +0100, Oleg Nesterov wrote:
> > Btw, do you agree with 1/1? Can you ack/nack it?
>
> Done!

Thanks ;)

> > I think that attach_to_pi_owner() should never check PF_EXITING and never
> > return -EAGAIN. It should either proceed and add pi_state to the list or
> > return -ESRCH if exit_pi_state_list() was called.
> >
> > Do you agree?
>
> Yes.

OK, great.

> > Perhaps we can set PF_EXITPIDONE lockless and avoid the unconditional
> > lock(pi_lock) but this is minor.
>
> Agreed, lets first fix things. We can optimize later.

Yes, agreed. and BTW the current list_empty(&tsk->pi_state_list) check
in mm_release() doesn't look right in theory. It seems that we need
another barrier before this check and list_empty_careful(). Nevermind,
this is only theoretical and we have another unlock_wait(pi_lock) in
do_exit().

> I'm not entire sure why we need two PF flags for this; once PF_EXITING
> is set userspace is _dead_ and it doesn't make sense to keep adding
> (futex) PI-state to the task.

This is what I _seem_ to understand: exit_robust_list(). Although I am
not sure this all is by design...

And this is the reason why I still can't finish the patch. Perhaps I am
totally confused, but I think there is yet another problem here.

Please forget about PF_EXIT.*. attach_to_pi_owner() returns -ESRCH if
futex_find_get_task() and even this looks wrong. Because handle_futex_death()
updates *uaddr lockless and does nothing if "pi". This means that the owner
of PI + robust mutex can go away (or just set PF_EXITPIDONE) and the caller
of futex_lock_pi() can miss unlock.

Peter, could you confirm that this problem does exist, or I missed something?

If yes. perhaps we need another get_futex_value_locked() before "return ESRCH",
or perhaps something like the (ugly) patch below. I'll try to think again...

Oleg.

--- x/kernel/futex.c
+++ x/kernel/futex.c
@@ -2815,12 +2815,20 @@ retry:
 		if (nval != uval)
 			goto retry;
 
-		/*
-		 * Wake robust non-PI futexes here. The wakeup of
-		 * PI futexes happens in exit_pi_state():
-		 */
-		if (!pi && (uval & FUTEX_WAITERS))
-			futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+		if (uval & FUTEX_WAITERS) {
+			if (pi) {
+				/*
+				 * Wake robust non-PI futexes here. The wakeup
+				 * of PI futexes happens in exit_pi_stale_list().
+				 * Sync with potential attachers to this list.
+				 */
+				get_futex_key(..., &key, ...);
+				hb = hash_futex(&key);
+				spin_unlock_wait(&hb->lock);
+			} else {
+				futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+			}
+		}
 	}
 	return 0;
 }