All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
To: Mike Marshall <hubcap-gqc3UtWaqJ5Wk0Htik3J/w@public.gmane.org>
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Sasha Levin <sasha.levin-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v3 00/13] locks: saner method for managing file locks
Date: Mon, 2 Feb 2015 15:42:43 -0500	[thread overview]
Message-ID: <20150202154243.7023391f@tlielax.poochiereds.net> (raw)
In-Reply-To: <CAOg9mSQSW68-aCh529BdJJ-G3xeWGY3NsNoizPu5wNdrZz88vA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, 2 Feb 2015 15:29:33 -0500
Mike Marshall <hubcap-gqc3UtWaqJ5Wk0Htik3J/w@public.gmane.org> wrote:

> I applied Jeff's patch to my orangefs-3.19-rc5 tree. Orangefs
> doesn't yet support the lock callout for file_operations, but
> we have been experimenting with some ideas that would allow
> Orangefs to honor locks in our distributed environment: basically
> posix locks for each kernel client plus meta data in our userspace
> servers.
> 
> Anyhow, the lock callout in the Orangefs patch I've posted
> just returns ENOSYS. I have added posix locks in a current
> test, so now I have:
> 
> int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
> {
>         int rc;
> 
>         rc = posix_lock_file(filp, fl, NULL);
> 
>         return rc;
> }
> 
> So... while this isn't safe for our real world distributed environment,
> it makes the kernel with this version of the Orangefs kernel client
> loaded on it think that locks work.
> 
> Besides observing that locks seem to work by running some programs
> that need locks (like svn/sqlite) I also ran xfstest generic/131.
> 
> Without Jeff's patch, xfstest generic/131 fails:
> 
>   29:Verify that F_GETLK for F_WRLCK doesn't require that file be
>      opened for write
> 
> Same with Jeff's patch.
> 
> Jeff's patch applied painlessly, and my simple tests didn't uncover
> any problems with Jeff's implementation of separate lists for different
> lock types, so that's good.
> 
> What surprised me, though, is that the posix lock code failed
> to get all the way through xfstest generic/131... maybe you
> all knew that already?
> 
> -Mike
> 

Hmm... I haven't seen 131 fail like this, but your code above looks
wrong. You're ignoring the "cmd" argument, so even F_GETLK requests are
setting a lock.

I think you might want to do something like:

int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
{
	if (cmd == F_GETLK)
		return posix_test_lock(filp, fl);
	return posix_lock_file(filp, fl, fl);
}

...untested of course, but you get the idea.


> On Thu, Jan 22, 2015 at 9:27 AM, Jeff Layton
> <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org> wrote:
> > v3:
> > - break out a ceph locking cleanup patch into a separate one earlier
> >   in the series
> > - switch back to using the i_lock to protect assignment of i_flctx.
> >   Using cmpxchg() was subject to races that I couldn't quite get a
> >   grip on. Eventually I may switch it back, but it probably doesn't
> >   provide much benefit.
> > - add a patch to clean up some comments that refer to i_flock
> > - use list_empty_careful in lockless checks rather than list_empty
> >
> > v2:
> > - bugfix to the flc_posix list ordering that broke the split/merge code
> > - don't use i_lock to manage the i_flctx pointer. Do so locklessly
> >   via cmpxchg().
> > - reordering of the patches to make the set bisectable. As a result
> >   the new spinlock is not introduced until near the end of the set
> > - some cleanup of the lm_change operation was added, made possible
> >   by the move to standard list_heads for tracking locks
> > - it takes greater pains to avoid spinlocking by checking when the
> >   lists are empty in addition to whether the i_flctx pointer is NULL
> > - a patch was added to keep count of the number of locks, so we can
> >   avoid having to do count/alloc/populate in ceph and cifs
> >
> > This is the third iteration of this patchset. The second was posted
> > on January 8th, under the cover letter entitled:
> >
> >     [PATCH v2 00/10] locks: saner method for managing file locks
> >
> > The big change here is that it once again uses the i_lock to protect the
> > i_flctx pointer assignment. In principle we should be able to use
> > cmpxchg instead, but that seems leave open a race that causes
> > locks_remove_file to miss recently-added locks. Given that is not a
> > super latency-sensitive codepath, an extra spinlock shouldn't make much
> > difference.
> >
> > Many thanks to Sasha Levin who helped me nail a race that would leave
> > leftover locks on the i_flock list, and David Howells who convinced
> > me to just go ahead back to using the i_lock to protect that field.
> >
> > There are some other minor changes, but overall it's pretty similar
> > to the last set. I still plan to merge this for v3.20.
> >
> > ------------------------[snip]-------------------------
> >
> > We currently manage all file_locks via a singly-linked list. This is
> > problematic for a number of reasons:
> >
> > - we have to protect all file locks with the same spinlock (or
> >   equivalent). Currently that uses the i_lock, but Christoph has voiced
> >   objections due to the potential for contention with other i_lock
> >   users. He'd like to see us move to using a different lock.
> >
> > - we have to walk through irrelevant file locks in order to get to the
> >   ones we're interested in. For instance, POSIX locks are at the end
> >   of the list, so we have to skip over all of the flock locks and
> >   leases before we can work with them.
> >
> > - the singly-linked list is primitive and difficult to work with. We
> >   have to keep track of the "before" pointer and it's easy to get that
> >   wrong.
> >
> > Cleaning all of this up is complicated by the fact that no one really
> > wants to grow struct inode in order to do so. We have a single pointer
> > in the inode now and I don't think we want to use any more.
> >
> > One possibility that Trond raised was to move this to an hlist, but
> > that doesn't do anything about the desire for a new spinlock.
> >
> > This patchset takes the approach of replacing the i_flock list with a
> > new struct file_lock_context that is allocated when we intend to add a
> > new file lock to an inode. The file_lock_context is only freed when we
> > destroy the inode.
> >
> > Within that, we have separate (and standard!) lists for each lock type,
> > and a dedicated spinlock for managing those lists. In principle we could
> > even consider adding separate locks for each, but I didn't bother with
> > that for now.
> >
> > For now, the code is still pretty "raw" and isn't bisectable. This is
> > just a RFC for the basic approach. This is probably v3.19 material at
> > best.
> >
> > Anyone have thoughts or comments on the basic approach?
> >
> > Jeff Layton (13):
> >   locks: add new struct list_head to struct file_lock
> >   locks: have locks_release_file use flock_lock_file to release generic
> >     flock locks
> >   locks: add a new struct file_locking_context pointer to struct inode
> >   ceph: move spinlocking into ceph_encode_locks_to_buffer and
> >     ceph_count_locks
> >   locks: move flock locks to file_lock_context
> >   locks: convert posix locks to file_lock_context
> >   locks: convert lease handling to file_lock_context
> >   locks: remove i_flock field from struct inode
> >   locks: add a dedicated spinlock to protect i_flctx lists
> >   locks: clean up the lm_change prototype
> >   locks: keep a count of locks on the flctx lists
> >   locks: consolidate NULL i_flctx checks in locks_remove_file
> >   locks: update comments that refer to inode->i_flock
> >
> >  fs/ceph/locks.c      |  64 +++---
> >  fs/ceph/mds_client.c |   4 -
> >  fs/cifs/file.c       |  34 +--
> >  fs/inode.c           |   3 +-
> >  fs/lockd/svcsubs.c   |  26 ++-
> >  fs/locks.c           | 569 +++++++++++++++++++++++++++------------------------
> >  fs/nfs/delegation.c  |  23 ++-
> >  fs/nfs/nfs4state.c   |  70 ++++---
> >  fs/nfs/pagelist.c    |   6 +-
> >  fs/nfs/write.c       |  41 +++-
> >  fs/nfsd/nfs4state.c  |  21 +-
> >  fs/read_write.c      |   2 +-
> >  include/linux/fs.h   |  52 +++--
> >  13 files changed, 510 insertions(+), 405 deletions(-)
> >
> > --
> > 2.1.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Jeff Layton <jeff.layton@primarydata.com>
To: Mike Marshall <hubcap@omnibond.com>
Cc: linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	David Howells <dhowells@redhat.com>,
	linux-kernel@vger.kernel.org, linux-cifs@vger.kernel.org,
	linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org
Subject: Re: [PATCH v3 00/13] locks: saner method for managing file locks
Date: Mon, 2 Feb 2015 15:42:43 -0500	[thread overview]
Message-ID: <20150202154243.7023391f@tlielax.poochiereds.net> (raw)
In-Reply-To: <CAOg9mSQSW68-aCh529BdJJ-G3xeWGY3NsNoizPu5wNdrZz88vA@mail.gmail.com>

On Mon, 2 Feb 2015 15:29:33 -0500
Mike Marshall <hubcap@omnibond.com> wrote:

> I applied Jeff's patch to my orangefs-3.19-rc5 tree. Orangefs
> doesn't yet support the lock callout for file_operations, but
> we have been experimenting with some ideas that would allow
> Orangefs to honor locks in our distributed environment: basically
> posix locks for each kernel client plus meta data in our userspace
> servers.
> 
> Anyhow, the lock callout in the Orangefs patch I've posted
> just returns ENOSYS. I have added posix locks in a current
> test, so now I have:
> 
> int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
> {
>         int rc;
> 
>         rc = posix_lock_file(filp, fl, NULL);
> 
>         return rc;
> }
> 
> So... while this isn't safe for our real world distributed environment,
> it makes the kernel with this version of the Orangefs kernel client
> loaded on it think that locks work.
> 
> Besides observing that locks seem to work by running some programs
> that need locks (like svn/sqlite) I also ran xfstest generic/131.
> 
> Without Jeff's patch, xfstest generic/131 fails:
> 
>   29:Verify that F_GETLK for F_WRLCK doesn't require that file be
>      opened for write
> 
> Same with Jeff's patch.
> 
> Jeff's patch applied painlessly, and my simple tests didn't uncover
> any problems with Jeff's implementation of separate lists for different
> lock types, so that's good.
> 
> What surprised me, though, is that the posix lock code failed
> to get all the way through xfstest generic/131... maybe you
> all knew that already?
> 
> -Mike
> 

Hmm... I haven't seen 131 fail like this, but your code above looks
wrong. You're ignoring the "cmd" argument, so even F_GETLK requests are
setting a lock.

I think you might want to do something like:

int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
{
	if (cmd == F_GETLK)
		return posix_test_lock(filp, fl);
	return posix_lock_file(filp, fl, fl);
}

...untested of course, but you get the idea.


> On Thu, Jan 22, 2015 at 9:27 AM, Jeff Layton
> <jeff.layton@primarydata.com> wrote:
> > v3:
> > - break out a ceph locking cleanup patch into a separate one earlier
> >   in the series
> > - switch back to using the i_lock to protect assignment of i_flctx.
> >   Using cmpxchg() was subject to races that I couldn't quite get a
> >   grip on. Eventually I may switch it back, but it probably doesn't
> >   provide much benefit.
> > - add a patch to clean up some comments that refer to i_flock
> > - use list_empty_careful in lockless checks rather than list_empty
> >
> > v2:
> > - bugfix to the flc_posix list ordering that broke the split/merge code
> > - don't use i_lock to manage the i_flctx pointer. Do so locklessly
> >   via cmpxchg().
> > - reordering of the patches to make the set bisectable. As a result
> >   the new spinlock is not introduced until near the end of the set
> > - some cleanup of the lm_change operation was added, made possible
> >   by the move to standard list_heads for tracking locks
> > - it takes greater pains to avoid spinlocking by checking when the
> >   lists are empty in addition to whether the i_flctx pointer is NULL
> > - a patch was added to keep count of the number of locks, so we can
> >   avoid having to do count/alloc/populate in ceph and cifs
> >
> > This is the third iteration of this patchset. The second was posted
> > on January 8th, under the cover letter entitled:
> >
> >     [PATCH v2 00/10] locks: saner method for managing file locks
> >
> > The big change here is that it once again uses the i_lock to protect the
> > i_flctx pointer assignment. In principle we should be able to use
> > cmpxchg instead, but that seems leave open a race that causes
> > locks_remove_file to miss recently-added locks. Given that is not a
> > super latency-sensitive codepath, an extra spinlock shouldn't make much
> > difference.
> >
> > Many thanks to Sasha Levin who helped me nail a race that would leave
> > leftover locks on the i_flock list, and David Howells who convinced
> > me to just go ahead back to using the i_lock to protect that field.
> >
> > There are some other minor changes, but overall it's pretty similar
> > to the last set. I still plan to merge this for v3.20.
> >
> > ------------------------[snip]-------------------------
> >
> > We currently manage all file_locks via a singly-linked list. This is
> > problematic for a number of reasons:
> >
> > - we have to protect all file locks with the same spinlock (or
> >   equivalent). Currently that uses the i_lock, but Christoph has voiced
> >   objections due to the potential for contention with other i_lock
> >   users. He'd like to see us move to using a different lock.
> >
> > - we have to walk through irrelevant file locks in order to get to the
> >   ones we're interested in. For instance, POSIX locks are at the end
> >   of the list, so we have to skip over all of the flock locks and
> >   leases before we can work with them.
> >
> > - the singly-linked list is primitive and difficult to work with. We
> >   have to keep track of the "before" pointer and it's easy to get that
> >   wrong.
> >
> > Cleaning all of this up is complicated by the fact that no one really
> > wants to grow struct inode in order to do so. We have a single pointer
> > in the inode now and I don't think we want to use any more.
> >
> > One possibility that Trond raised was to move this to an hlist, but
> > that doesn't do anything about the desire for a new spinlock.
> >
> > This patchset takes the approach of replacing the i_flock list with a
> > new struct file_lock_context that is allocated when we intend to add a
> > new file lock to an inode. The file_lock_context is only freed when we
> > destroy the inode.
> >
> > Within that, we have separate (and standard!) lists for each lock type,
> > and a dedicated spinlock for managing those lists. In principle we could
> > even consider adding separate locks for each, but I didn't bother with
> > that for now.
> >
> > For now, the code is still pretty "raw" and isn't bisectable. This is
> > just a RFC for the basic approach. This is probably v3.19 material at
> > best.
> >
> > Anyone have thoughts or comments on the basic approach?
> >
> > Jeff Layton (13):
> >   locks: add new struct list_head to struct file_lock
> >   locks: have locks_release_file use flock_lock_file to release generic
> >     flock locks
> >   locks: add a new struct file_locking_context pointer to struct inode
> >   ceph: move spinlocking into ceph_encode_locks_to_buffer and
> >     ceph_count_locks
> >   locks: move flock locks to file_lock_context
> >   locks: convert posix locks to file_lock_context
> >   locks: convert lease handling to file_lock_context
> >   locks: remove i_flock field from struct inode
> >   locks: add a dedicated spinlock to protect i_flctx lists
> >   locks: clean up the lm_change prototype
> >   locks: keep a count of locks on the flctx lists
> >   locks: consolidate NULL i_flctx checks in locks_remove_file
> >   locks: update comments that refer to inode->i_flock
> >
> >  fs/ceph/locks.c      |  64 +++---
> >  fs/ceph/mds_client.c |   4 -
> >  fs/cifs/file.c       |  34 +--
> >  fs/inode.c           |   3 +-
> >  fs/lockd/svcsubs.c   |  26 ++-
> >  fs/locks.c           | 569 +++++++++++++++++++++++++++------------------------
> >  fs/nfs/delegation.c  |  23 ++-
> >  fs/nfs/nfs4state.c   |  70 ++++---
> >  fs/nfs/pagelist.c    |   6 +-
> >  fs/nfs/write.c       |  41 +++-
> >  fs/nfsd/nfs4state.c  |  21 +-
> >  fs/read_write.c      |   2 +-
> >  include/linux/fs.h   |  52 +++--
> >  13 files changed, 510 insertions(+), 405 deletions(-)
> >
> > --
> > 2.1.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Jeff Layton <jeff.layton@primarydata.com>

  parent reply	other threads:[~2015-02-02 20:42 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-22 14:27 [PATCH v3 00/13] locks: saner method for managing file locks Jeff Layton
2015-01-22 14:27 ` [PATCH v3 02/13] locks: have locks_release_file use flock_lock_file to release generic flock locks Jeff Layton
2015-01-22 14:27 ` [PATCH v3 03/13] locks: add a new struct file_locking_context pointer to struct inode Jeff Layton
2015-01-22 14:27 ` [PATCH v3 04/13] ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks Jeff Layton
2015-01-22 14:27 ` [PATCH v3 06/13] locks: convert posix locks to file_lock_context Jeff Layton
2015-01-22 14:27 ` [PATCH v3 07/13] locks: convert lease handling " Jeff Layton
2015-01-22 14:27 ` [PATCH v3 08/13] locks: remove i_flock field from struct inode Jeff Layton
     [not found] ` <1421936877-27529-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
2015-01-22 14:27   ` [PATCH v3 01/13] locks: add new struct list_head to struct file_lock Jeff Layton
2015-01-22 14:27     ` Jeff Layton
2015-01-22 14:27   ` [PATCH v3 05/13] locks: move flock locks to file_lock_context Jeff Layton
2015-01-22 14:27     ` Jeff Layton
2015-01-22 14:27   ` [PATCH v3 09/13] locks: add a dedicated spinlock to protect i_flctx lists Jeff Layton
2015-01-22 14:27     ` Jeff Layton
2015-01-22 14:27   ` [PATCH v3 10/13] locks: clean up the lm_change prototype Jeff Layton
2015-01-22 14:27     ` Jeff Layton
2015-01-22 14:27   ` [PATCH v3 13/13] locks: update comments that refer to inode->i_flock Jeff Layton
2015-01-22 14:27     ` Jeff Layton
2015-02-02 20:29   ` [PATCH v3 00/13] locks: saner method for managing file locks Mike Marshall
2015-02-02 20:29     ` Mike Marshall
     [not found]     ` <CAOg9mSQSW68-aCh529BdJJ-G3xeWGY3NsNoizPu5wNdrZz88vA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-02-02 20:42       ` Jeff Layton [this message]
2015-02-02 20:42         ` Jeff Layton
2015-02-03 18:01         ` Mike Marshall
2015-01-22 14:27 ` [PATCH v3 11/13] locks: keep a count of locks on the flctx lists Jeff Layton
2015-01-22 14:27 ` [PATCH v3 12/13] locks: consolidate NULL i_flctx checks in locks_remove_file Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150202154243.7023391f@tlielax.poochiereds.net \
    --to=jeff.layton-7i+n7zu2hftekmmhf/gkza@public.gmane.org \
    --cc=ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=hubcap-gqc3UtWaqJ5Wk0Htik3J/w@public.gmane.org \
    --cc=linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=sasha.levin-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.