From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=0iDx=RZ=vger.kernel.org=linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8A51BC43381
	for <linux-fsdevel@archiver.kernel.org>; Fri, 22 Mar 2019 20:27:53 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 5F4F821841
	for <linux-fsdevel@archiver.kernel.org>; Fri, 22 Mar 2019 20:27:53 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1727519AbfCVU1w (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Fri, 22 Mar 2019 16:27:52 -0400
Received: from fieldses.org ([173.255.197.46]:52760 "EHLO fieldses.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1727373AbfCVU1w (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
        Fri, 22 Mar 2019 16:27:52 -0400
Received: by fieldses.org (Postfix, from userid 2815)
        id 4E43C1CE6; Fri, 22 Mar 2019 16:27:51 -0400 (EDT)
Date:   Fri, 22 Mar 2019 16:27:51 -0400
From:   "J. Bruce Fields" <bfields@fieldses.org>
To:     NeilBrown <neilb@suse.com>
Cc:     Jeff Layton <jlayton@kernel.org>, linux-fsdevel@vger.kernel.org,
        asn@redhat.com
Subject: Re: [PATCH] locks: ignore same lock in blocked_lock_hash
Message-ID: <20190322202751.GB10961@fieldses.org>
References: <20190321112744.7832-1-jlayton@kernel.org>
 <20190322004233.GA28329@fieldses.org>
 <87sgvf6560.fsf@notabene.neil.brown.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87sgvf6560.fsf@notabene.neil.brown.name>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Mar 22, 2019 at 12:58:47PM +1100, NeilBrown wrote:
> On Thu, Mar 21 2019, J. Bruce Fields wrote:
> 
> > On Thu, Mar 21, 2019 at 07:27:44AM -0400, Jeff Layton wrote:
> >> Andreas reported that he was seeing the tdbtorture test fail in
> >> some cases with -EDEADLCK when it wasn't before. Some debugging
> >> showed that deadlock detection was sometimes discovering the
> >> caller's lock request itself in a dependency chain.
> >> 
> >> If posix_locks_deadlock() fails to find a deadlock, the caller_fl
> >> will be passed to __locks_insert_block(), and this wakes up all
> >> locks that are blocked on caller_fl, clearing the fl_blocker link.
> >> 
> >> So if posix_locks_deadlock() finds caller_fl while searching for
> >> a deadlock,
> >
> > I'm feeling dense.  Could you step me through the scenario in a little
> > more detail?
> 
> Three processes: A, B, C.  One byte in one file: X
> 
> A gets a read lock on X.  Doesn't block, so nothing goes in the hash
>   table.
> 
> B gets a read lock on X.  Doesn't block.
> 
> C tries to get a write lock on X.  It cannot so it can block on either A
>   or B.  It chooses B.  It check if this might cause a deadlock, but the
>   hash is empty so it doesn't.  It gets added to the hash. "C -> B"
> 
> A tries to get a write lock on X. It cannot because B has a read lock,
>   so it needs to block on B, or some child.  C is a child and A's
>   request conflicts with C's request, so A blocks on C.
>   It first checks: Does A->C cause a loop in the hash-table?  No. So
>   it proceeds to block and adds  "A->C" to the hash table.
> 
> B drops its read lock. This wakes up the first child: C.
> 
> C tries (again) to get a write lock on X.  It cannot because A still
>   holds a read lock, so it will have to block on A.  First it must
>   check if "C->A" will cause a loop in the hash-table ... Yes it will
>   because "A->C" is already there - error!
> 
> This is a false-positive because the very next thing to be done after
> posix_locks_deadlock() (while still holding all spinlocks), is
> __locks_insert_block(C->A) which calls __locks_wake_up_blocks(C) which
> calls __locks_delete_block() for any pending block ??->C and so will
> remove A->C from the hash.
> 
> Maybe we could call __locks_wake_up_block() *before* calling
> posix_locks_deadlock().  That *might* make more sense. 
>   
> 
> >
> > Also, how do we know this catches every such case?
> 
> The only links in the hash table that are about to be removed are those
> that record that something is blocked by C (the lock we are trying to
> get).  So these are the only ones that it is reasonable to special-case
> in posix_locks_deadlock().
> 
> >
> > And why aren't we unhashing blocks when we wake them up?
> 
> We do - that is exactly when we unhash them.  The problem is they we
> wake them up *after* we check for errors, rather than before.

Thanks!

OK, somehow I overlooked
__locks_wake_up_blocks->__locks_delete_block->locks_delete_global_block.

And I missed that all this was happening while still under the
blocked_lock_lock, so we don't have to worry about other tasks leaving
locks in this state.

It's possible I'm still being a little dense, but the new check could
really use a careful comment (maybe even as detailed as the above).

--b.