From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756060AbbBFRGN (ORCPT <rfc822;w@1wt.eu>);
	Fri, 6 Feb 2015 12:06:13 -0500
Received: from mx1.redhat.com ([209.132.183.28]:33647 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752777AbbBFRGL (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 6 Feb 2015 12:06:11 -0500
Date: Fri, 6 Feb 2015 18:04:25 +0100
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>, Thomas Gleixner <tglx@linutronix.de>,
        Jerome Marchand <jmarchan@redhat.com>,
        Larry Woodman <lwoodman@redhat.com>, Mateusz Guzik <mguzik@redhat.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to
	filter out kthreads
Message-ID: <20150206170425.GA7493@redhat.com>
References: <20150202140515.GA26398@redhat.com> <20150202151159.GE26304@twins.programming.kicks-ass.net> <20150203200916.GA10545@redhat.com> <20150204111212.GF2896@worktop.programming.kicks-ass.net> <20150204202509.GA1502@redhat.com> <20150205162725.GK5029@twins.programming.kicks-ass.net> <20150205181014.GA20244@redhat.com> <20150206104658.GI23123@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150206104658.GI23123@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Peter, I am spamming you again and again, but I haven't even started on
the patches. It turns out I can't do anything until devconf.cz finishes
next week.

On 02/06, Peter Zijlstra wrote:
>
> On Thu, Feb 05, 2015 at 07:10:14PM +0100, Oleg Nesterov wrote:
>
> > So I think that in this case we either need to recheck that *uaddr is still the
> > same (and turn -ESRCH into -EAGAIN otherwise), or change handle_futex_death() to
> > serialize with X so that it can proceed and attach pi_state.
> >
> > No?
>
> I _think_ you're right, doing -ESRCH is wrong without first looking to
> see if uval changed and gained an FUTEX_OWNER_DIED.

OK, thanks.

> I don't think making handle_futex_death() wait on hb lock works because
> of the -EAGAIN loop releasing that lock.

I think this should work... The EAGAIN loop will either notice the change
in *uaddr or it will attach to the pi list successfully. But please ignore;
even if I am right, I do not like this change either.


And there is another thing which looks like a design bug to me. I understand
that it is too late and pointless to complain, and we probably can't change
the current behaviour, but I simply can't resist...

Suppose that a task T takes a PI futex (non-robust) and exits. Another task
does futex(FUTEX_LOCK_PI).

Now, if futex() is called after T exits, it returns -ESRCH; this is correct.
But if it is called before T exits, it succeeds while (I think) it should not.
fixup_owner() treats pi_state->owner == NULL pretty much as "unlocked".

IOW,
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <sys/wait.h>
	#include <assert.h>

	#define FUTEX_LOCK_PI	6

	int main(void)
	{
		int mutex, pid, err;

		pid = fork();
		if (!pid) {
			/* child: the "owner" T, still alive when the parent locks */
			sleep(1);
			return 0;
		}

		/* pretend the child holds the (non-robust) PI futex */
		mutex = pid;
		/* called before the child exits; this succeeds */
		err = syscall(__NR_futex, &mutex, FUTEX_LOCK_PI, 0, 0, 0);
		printf("err=%d %x -> %x %m\n", err, pid, mutex);

		assert(wait(NULL) == pid);
		return 0;
	}

I don't understand why syscall(FUTEX_LOCK_PI) succeeds in this case.
To me it should fail with -ESRCH, this would be much more consistent
imho.

And this means that "PI" implies "robust" to some degree. OK, maybe
this is fine. But if this is fine, why can't we do the same if, say,
futex_find_get_task() returns NULL?

To me the rule should be simple. If the owner dies the next LOCK_PI
should succeed if and only if the futex was robust.

Or at least this should not depend on timing.
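
To make the timing dependence concrete, the other ordering can be sketched
the same way. This is only a minimal sketch: the helper name
lock_pi_after_owner_exit() is made up for illustration, and it assumes that
the child's TID is not reused between waitpid() and the futex call. The
owner exits and is reaped before FUTEX_LOCK_PI is issued, so the TID stored
in the futex word can no longer be looked up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>

#define FUTEX_LOCK_PI	6

/*
 * Hypothetical helper, not part of the test above: the "owner" exits and
 * is reaped first, and only then do we try to lock a PI futex whose value
 * still names it. Returns 0 on success or -errno on failure.
 */
int lock_pi_after_owner_exit(void)
{
	uint32_t mutex;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -errno;
	if (!pid)
		_exit(0);		/* the owner T exits at once */

	/* Reap T so that its TID can no longer be found. */
	if (waitpid(pid, NULL, 0) != pid)
		return -errno;

	mutex = pid;			/* pretend T took the futex */
	if (syscall(__NR_futex, &mutex, FUTEX_LOCK_PI, 0, NULL, NULL, 0) < 0)
		return -errno;
	return 0;
}
```

Going by the description above, this ordering should return -ESRCH, while
the test with sleep(1) in the child returns 0 for the very same non-robust
futex.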

Oleg.