From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161592AbbBDU0w (ORCPT ); Wed, 4 Feb 2015 15:26:52 -0500 Received: from mx1.redhat.com ([209.132.183.28]:59990 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161055AbbBDU0u (ORCPT ); Wed, 4 Feb 2015 15:26:50 -0500 Date: Wed, 4 Feb 2015 21:25:09 +0100 From: Oleg Nesterov To: Peter Zijlstra Cc: Darren Hart , Thomas Gleixner , Jerome Marchand , Larry Woodman , Mateusz Guzik , linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/1] futex: check PF_KTHREAD rather than !p->mm to filter out kthreads Message-ID: <20150204202509.GA1502@redhat.com> References: <20150202140515.GA26398@redhat.com> <20150202151159.GE26304@twins.programming.kicks-ass.net> <20150203200916.GA10545@redhat.com> <20150204111212.GF2896@worktop.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150204111212.GF2896@worktop.programming.kicks-ass.net> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/04, Peter Zijlstra wrote: > > On Tue, Feb 03, 2015 at 09:09:16PM +0100, Oleg Nesterov wrote: > > Btw, do you agree with 1/1? Can you ack/nack it? > > Done! Thanks ;) > > I think that attach_to_pi_owner() should never check PF_EXITING and never > > return -EAGAIN. It should either proceed and add pi_state to the list or > > return -ESRCH if exit_pi_state_list() was called. > > > > Do you agree? > > Yes. OK, great. > > Perhaps we can set PF_EXITPIDONE lockless and avoid the unconditional > > lock(pi_lock) but this is minor. > > Agreed, lets first fix things. We can optimize later. Yes, agreed. and BTW the current list_empty(&tsk->pi_state_list) check in mm_release() doesn't look right in theory. It seems that we need another barrier before this check and list_empty_careful(). Nevermind, this is only theoretical and we have another unlock_wait(pi_lock) in do_exit(). > I'm not entire sure why we need two PF flags for this; once PF_EXITING > is set userspace is _dead_ and it doesn't make sense to keep adding > (futex) PI-state to the task. This is what I _seem_ to understand: exit_robust_list(). Although I am not sure this all is by design... And this is the reason why I still can't finish the patch. Perhaps I am totally confused, but I think there is yet another problem here. Please forget about PF_EXIT.*. attach_to_pi_owner() returns -ESRCH if futex_find_get_task() and even this looks wrong. Because handle_futex_death() updates *uaddr lockless and does nothing if "pi". This means that the owner of PI + robust mutex can go away (or just set PF_EXITPIDONE) and the caller of futex_lock_pi() can miss unlock. Peter, could you confirm that this problem does exist, or I missed something? If yes. perhaps we need another get_futex_value_locked() before "return ESRCH", or perhaps something like the (ugly) patch below. I'll try to think again... Oleg. --- x/kernel/futex.c +++ x/kernel/futex.c @@ -2815,12 +2815,20 @@ retry: if (nval != uval) goto retry; - /* - * Wake robust non-PI futexes here. The wakeup of - * PI futexes happens in exit_pi_state(): - */ - if (!pi && (uval & FUTEX_WAITERS)) - futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + if (uval & FUTEX_WAITERS) { + if (pi) { + /* + * Wake robust non-PI futexes here. The wakeup + * of PI futexes happens in exit_pi_stale_list(). + * Sync with potential attachers to this list. + */ + get_futex_key(..., &key, ...); + hb = hash_futex(&key); + spin_unlock_wait(&hb->lock); + } else { + futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + } + } } return 0; }