Date: Tue, 17 Feb 2015 17:21:02 -0500
From: "J. Bruce Fields"
To: Jeff Layton
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linus Torvalds, "Kirill A. Shutemov", Christoph Hellwig,
	Dave Chinner, Sasha Levin
Subject: Re: [PATCH 2/4] locks: remove conditional lock release in middle of
 flock_lock_file
Message-ID: <20150217222102.GB3667@fieldses.org>
References: <1424177190-14252-1-git-send-email-jeff.layton@primarydata.com>
 <1424177190-14252-3-git-send-email-jeff.layton@primarydata.com>
 <20150217171017.GA27900@fieldses.org>
 <20150217125649.5015cfe4@tlielax.poochiereds.net>
 <20150217191136.GD27900@fieldses.org>
In-Reply-To: <20150217191136.GD27900@fieldses.org>

On Tue, Feb 17, 2015 at 02:11:36PM -0500, J. Bruce Fields wrote:
> On Tue, Feb 17, 2015 at 12:56:49PM -0500, Jeff Layton wrote:
> > On Tue, 17 Feb 2015 12:10:17 -0500
> > "J. Bruce Fields" wrote:
> > 
> > > On Tue, Feb 17, 2015 at 07:46:28AM -0500, Jeff Layton wrote:
> > > > As Linus pointed out:
> > > >
> > > > Say we have an existing flock, and now do a new one that conflicts. I
> > > > see what looks like three separate bugs.
> > > >
> > > >  - We go through the first loop, find a lock of another type, and
> > > >    delete it in preparation for replacing it
> > > >
> > > >  - we *drop* the lock context spinlock.
> > > >
> > > >  - BUG #1?
> > > >    So now there is no lock at all, and somebody can come in
> > > >    and see that unlocked state. Is that really valid?
> > > >
> > > >  - another thread comes in while the first thread dropped the lock
> > > >    context lock, and wants to add its own lock. It doesn't see the
> > > >    deleted or pending locks, so it just adds it
> > > >
> > > >  - the first thread gets the context spinlock again, and adds the lock
> > > >    that replaced the original
> > > >
> > > >  - BUG #2? So now there are *two* locks on the thing, and the next
> > > >    time you do an unlock (or when you close the file), it will only
> > > >    remove/replace the first one.
> > > >
> > > > ...remove the "drop the spinlock" code in the middle of this function as
> > > > it has always been suspicious.
> > > 
> > > Looking back through a historical git repo, purely out of curiosity--the
> > > cond_resched was previously a yield, and then I *think* bug #2 was
> > > introduced in 2002 by a "[PATCH] fs/locks.c: more cleanup". Before that
> > > it retried the first loop after the yield.
> > > 
> > > Before that the yield was in locks_wake_up_blocks, removed by:
> > > 
> > > commit 7962ad19e6300531784722c16849837864304d84
> > > Author: Matthew Wilcox
> > > Date:   Sat Jun 8 02:09:24 2002 -0700
> > > 
> > >     [PATCH] fs/locks.c: Only yield once for flocks
> > > 
> > >     This patch removes the annoying and confusing `wait' argument
> > >     from many places.  The only change in behaviour is that we now
> > >     yield once when unblocking other BSD-style flocks instead of
> > >     once for each lock.
> > > 
> > >     This slightly improves the semantics for userspace.  Before,
> > >     when we had two tasks waiting on a lock, the first one would
> > >     receive the lock.  Now, the one with the highest priority
> > >     receives the lock.
> > > 
> > > So this really was intended to guarantee other waiters the lock before
> > > allowing an upgrade.  Could that actually have worked?
> > > 
> > > The non-atomic behavior is documented in flock(2), which says it's
> > > "original BSD behavior".
> > > 
> > > A comment at the top of locks.c says this is to avoid deadlock.  Hm, so,
> > > are we introducing a deadlock?:
> > > 
> > > 	1. Two processes both get shared locks on different filps.
> > > 	2. Both request exclusive locks.
> > > 
> > > Now they're stuck, whereas previously they might have recovered?
> > > 
> > > --b.
> > 
> > Yes, I see that now. It might be best to preserve the existing behavior
> > for that reason then. I'd rather not introduce any behavioral changes
> > in this set if we can help it, particularly if there are userland apps
> > that might rely on it.
> > 
> > It may be best then to keep this patch, but drop patch #3.
> 
> Unfortunately it's this patch that I'm worried about.

(Urp, never mind, you and Linus were right here.)

> Also these patches are introducing some failure in my locking tests
> (probably unrelated, I doubt I ever wrote a test for this case).  I'll
> investigate.

I didn't try to figure out why, but after dropping patch 3 everything
does pass.

--b.

> 
> --b.
> 
> > That should be enough to ensure that we don't end up with two
> > different locks on the same file description, which is clearly a bug.
> > 
> > It might also not hurt to follow HCH's earlier advice and make
> > locks_remove_flock just iterate over the list and unconditionally
> > delete any lock in the case where the ->flock operation isn't set.
> > 
> > In principle we shouldn't need that once we apply patch #2, but it
> > would probably be simpler than dealing with flock_lock_file in that
> > case.
> > 
> > > > 
> > > > Reported-by: Linus Torvalds
> > > > Signed-off-by: Jeff Layton
> > > > ---
> > > >  fs/locks.c | 10 ----------
> > > >  1 file changed, 10 deletions(-)
> > > > 
> > > > diff --git a/fs/locks.c b/fs/locks.c
> > > > index 7998f670812c..00c105f499a2 100644
> > > > --- a/fs/locks.c
> > > > +++ b/fs/locks.c
> > > > @@ -901,16 +901,6 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
> > > >  		goto out;
> > > >  	}
> > > >  
> > > > -	/*
> > > > -	 * If a higher-priority process was blocked on the old file lock,
> > > > -	 * give it the opportunity to lock the file.
> > > > -	 */
> > > > -	if (found) {
> > > > -		spin_unlock(&ctx->flc_lock);
> > > > -		cond_resched();
> > > > -		spin_lock(&ctx->flc_lock);
> > > > -	}
> > > > -
> > > >  find_conflict:
> > > >  	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
> > > >  		if (!flock_locks_conflict(request, fl))
> > > > -- 
> > > > 2.1.0
> > 
> > -- 
> > Jeff Layton