Date: Wed, 4 Feb 2015 12:12:12 +0100
From: Peter Zijlstra
To: Oleg Nesterov
Cc: Darren Hart, Thomas Gleixner, Jerome Marchand, Larry Woodman, Mateusz Guzik, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to filter out kthreads
Message-ID: <20150204111212.GF2896@worktop.programming.kicks-ass.net>
References: <20150202140515.GA26398@redhat.com> <20150202151159.GE26304@twins.programming.kicks-ass.net> <20150203200916.GA10545@redhat.com>
In-Reply-To: <20150203200916.GA10545@redhat.com>

On Tue, Feb 03, 2015 at 09:09:16PM +0100, Oleg Nesterov wrote:
> Btw, do you agree with 1/1? Can you ack/nack it?

Done!

> On 02/02, Peter Zijlstra wrote:
> >
> > On Mon, Feb 02, 2015 at 03:05:15PM +0100, Oleg Nesterov wrote:
> > >
> > > And another question. Let's forget about this ->mm check. I simply cannot
> > > understand this
> > >
> > >     ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
> > >
> > > I must have missed something but this looks buggy: I do not see any
> > > preemption point in this "retry" loop. Suppose that max_cpus=1 and rt_task()
> > > preempts the non-rt PF_EXITING owner. Looks like futex_lock_pi() can spin
> > > forever in this case? (OK, ignoring RT throttling).
> >
> > So yes, I do like your proposal of putting PF_EXITPIDONE under the
> > ->pi_lock section that handles exit_pi_state_list().
>
> Probably I was not clear... Let me try again just in case.
>
> I believe that the whole "spin waiting for PF_EXITING -> PF_EXITPIDONE
> transition" idea is simply wrong. See the test-case I sent.
>
> I think that attach_to_pi_owner() should never check PF_EXITING and never
> return -EAGAIN. It should either proceed and add pi_state to the list or
> return -ESRCH if exit_pi_state_list() was called.
>
> Do you agree?

Yes.

> Perhaps we can set PF_EXITPIDONE lockless and avoid the unconditional
> lock(pi_lock) but this is minor.

Agreed, let's fix things first. We can optimize later.

> The main problem is that I fail to understand why this logic was added
> in the first place... To avoid the race with exit_robust_list()? I do
> not see why this is needed...

exit_pi_state_list() I think, but 778e9a9c3e71 ("pi-futex: fix exit races
and locking problems") is a big and somewhat confusing patch. I'm not quite
sure why/how all that happened either; it was before I got sucked into all
this.

I'm not entirely sure why we need two PF flags for this; once PF_EXITING is
set, userspace is _dead_ and it doesn't make sense to keep adding (futex)
PI-state to the task.

> > As for the recursive fault, I think the safer option is to set
> > EXITPIDONE and not register more PI states, as opposed to allowing more
> > and more states to be added. Yes, we'll leak whatever currently is there,
> > but no point in allowing it to get worse.
>
> Not sure I understand... If you mean recursive do_exit() then yes, I think
> that we should simply set EXITPIDONE lockless in a best-effort manner; this
> is what the current code does. Just the comment should be updated in any
> case imo.

Yes, the "Fixing recursive fault..." branch; you had an XXX explain comment
there. I think we agree there.

> But mostly I was confused by the pseudo-code below. Heh, because I thought
> that it describes the changes in kernel/futex.c you think we should do.
> Now that I finally realized that it outlines the current code, I am
> unconfused a bit ;)

Yes, it was an attempt to show what the current code does, which is
confusing enough in itself.
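The direction agreed on in this thread can be illustrated with a small userspace model. In that direction, attach_to_pi_owner() no longer looks at PF_EXITING or returns -EAGAIN. Instead it either queues the pi_state on the owner or fails with -ESRCH once exit_pi_state_list() has run, and PF_EXITPIDONE is set under ->pi_lock. This is a sketch under stated assumptions, not kernel code and not the eventual patch. The model_* names, the struct layout, and the pthread mutex standing in for ->pi_lock are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the agreed direction; the model_* names
 * and layout are illustrative and are not the kernel's definitions.
 */
#define MODEL_PF_EXITPIDONE 0x1u

struct model_pi_state {
	struct model_pi_state *next;
};

struct model_task {
	pthread_mutex_t pi_lock;		/* stands in for p->pi_lock */
	unsigned int flags;			/* EXITPIDONE is set only under pi_lock */
	struct model_pi_state *pi_state_list;	/* stands in for p->pi_state_list */
};

static void model_task_init(struct model_task *p)
{
	pthread_mutex_init(&p->pi_lock, NULL);
	p->flags = 0;
	p->pi_state_list = NULL;
}

/*
 * Attach side: no PF_EXITING check and no -EAGAIN retry. Either the
 * pi_state is queued on the owner, or the owner's exit cleanup has
 * already run and we fail with -ESRCH.
 */
static int model_attach_to_pi_owner(struct model_task *p,
				    struct model_pi_state *ps)
{
	int ret = 0;

	pthread_mutex_lock(&p->pi_lock);
	if (p->flags & MODEL_PF_EXITPIDONE) {
		ret = -ESRCH;
	} else {
		ps->next = p->pi_state_list;
		p->pi_state_list = ps;
	}
	pthread_mutex_unlock(&p->pi_lock);
	return ret;
}

/*
 * Exit side: set EXITPIDONE and drain the list in one critical section,
 * so nothing can be queued after the drain. Returns the number drained.
 */
static int model_exit_pi_state_list(struct model_task *p)
{
	int drained = 0;

	pthread_mutex_lock(&p->pi_lock);
	p->flags |= MODEL_PF_EXITPIDONE;
	while (p->pi_state_list != NULL) {
		struct model_pi_state *ps = p->pi_state_list;

		p->pi_state_list = ps->next;
		ps->next = NULL;	/* real per-entry cleanup would go here */
		drained++;
	}
	assert(p->pi_state_list == NULL);
	pthread_mutex_unlock(&p->pi_lock);
	return drained;
}
```

The model sets the flag and drains the list inside the same critical section. As a result, an attacher either gets onto the list before the drain, and the drain then handles it, or it sees EXITPIDONE and returns -ESRCH. No waiter ever has to spin on a PF_EXITING -> PF_EXITPIDONE transition.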