Re: [RFC PATCH] filelock: new helper: vfs_file_has_locks

From: Xiubo Li <xiubli@redhat.com>
To: Jeff Layton <jlayton@kernel.org>, chuck.lever@oracle.com
Cc: linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC PATCH] filelock: new helper: vfs_file_has_locks
Date: Wed, 16 Nov 2022 14:49:34 +0800	[thread overview]
Message-ID: <a8c94ba5-c01f-3bb6-0b35-2aee06b9d6e7@redhat.com> (raw)
In-Reply-To: <4a8720c8a24a9b06adc40fdada9c621fd5d849df.camel@kernel.org>

On 15/11/2022 22:40, Jeff Layton wrote:
> On Tue, 2022-11-15 at 13:43 +0800, Xiubo Li wrote:
>> On 15/11/2022 03:46, Jeff Layton wrote:
>>> On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote:
>>>> Ceph has a need to know whether a particular file has any locks set on
>>>> it. It's currently tracking that by a num_locks field in its
>>>> filp->private_data, but that's problematic as it tries to decrement this
>>>> field when releasing locks and that can race with the file being torn
>>>> down.
>>>>
>>>> Add a new vfs_file_has_locks helper that will scan the flock and posix
>>>> lists, and return true if any of the locks have a fl_file that matches
>>>> the given one. Ceph can then call this instead of doing its own
>>>> tracking.
>>>>
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>>    fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
>>>>    include/linux/fs.h |  1 +
>>>>    2 files changed, 37 insertions(+)
>>>>
>>>> Xiubo,
>>>>
>>>> Here's what I was thinking instead of trying to track this within ceph.
>>>> Most inodes never have locks set, so in most cases this will be a NULL
>>>> pointer check.
>>>>
>>>>
>>>>
>>> I went ahead and added a slightly updated version of this this to my
>>> locks-next branch for now, but...
>>>
>>> Thinking about this more...I'm not sure this whole concept of what the
>>> ceph code is trying to do makes sense. Locks only conflict if they have
>>> different owners, and POSIX locks are owned by the process. Consider
>>> this scenario (obviously, this is not a problem with OFD locks).
>>>
>>> A process has the same file open via two different fds. It sets lock A
>>> from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via
>>> fd 2. The two locks will be merged, because they don't conflict (because
>>> it's the same process).
>>>
>>> Against which fd should the merged lock record be counted?
>> Thanks Jeff.
>>
>> For the above example as you mentioned, from my reading of the lock code
>> after being merged it will always keep the old file_lock's fl_file.
>>
>> There is another case that if the Inode already has LockA and LockB:
>>
>> Lock A --> [0, 9] --> fileA
>>
>> Lock B --> [15, 20] --> fileB
>>
>> And then LockC comes:
>>
>> Lock C --> [8, 16] --> fileC
>>
>> Then the inode will only have the LockB:
>>
>> Lock B --> [0, 20] --> fileB.
>>
>> So the exiting ceph code seems buggy!
>>
> Yeah, there are a number of ways to end up with a different fl_file than
> you started with.
>   
>>> Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when
>>> the fd hasn't had any locks explicitly set on it?
>> Maybe we should check whether any POSIX lock exist, if so we should
>> check CEPH_I_ERROR_FILELOCK always. Or we need to check it depending on
>> each fd ?
>>
>>
> It was originally added here:
>
> commit ff5d913dfc7142974eb1694d5fd6284658e46bc6
> Author: Yan, Zheng <zyan@redhat.com>
> Date:   Thu Jul 25 20:16:45 2019 +0800
>
>      ceph: return -EIO if read/write against filp that lost file locks
>      
>      After mds evicts session, file locks get lost sliently. It's not safe to
>      let programs continue to do read/write.
>      
>      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
>      Reviewed-by: Jeff Layton <jlayton@kernel.org>
>      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>
> So I guess with the current code if you have the file open and set a
> lock on it, you'll get back EIO when you try to get caps for it, but if
> you never set a lock on the fd, then you wouldn't get an error. We don't
> reliably keep track of what fd was used to set a lock (as noted above),
> so we can't really do what Zheng was trying to do here.
>
> Having a file where some openers use locking and others don't is a
> really odd usage pattern though. Locks are like stoplights -- they only
> work if everyone pays attention to them.
>
> I think we should probably switch ceph_get_caps to just check whether
> any locks are set on the file. If there are POSIX/OFD/FLOCK locks on the
> file at the time, we should set CHECK_FILELOCK, regardless of what fd
> was used to set the lock.
>
> In practical terms, we probably want a vfs_inode_has_locks function,
> that just tests whether the flc_posix and flc_flock lists are empty.

Jeff,

Yeah, this sounds good to me.

> Maybe something like this instead? Then ceph could call this from
> ceph_get_caps and set CHECK_FILELOCK if it returns true.
>
> -------------8<---------------
>
> [PATCH] filelock: new helper: vfs_inode_has_locks
>
> Ceph has a need to know whether a particular inode has any locks set on
> it. It's currently tracking that by a num_locks field in its
> filp->private_data, but that's problematic as it tries to decrement this
> field when releasing locks and that can race with the file being torn
> down.
>
> Add a new vfs_inode_has_locks helper that just returns whether any locks
> are currently held on the inode.
>
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/locks.c         | 23 +++++++++++++++++++++++
>   include/linux/fs.h |  1 +
>   2 files changed, 24 insertions(+)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 5876c8ff0edc..9ccf89b6c95d 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2672,6 +2672,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
>   }
>   EXPORT_SYMBOL_GPL(vfs_cancel_lock);
>   
> +/**
> + * vfs_inode_has_locks - are any file locks held on @inode?
> + * @inode: inode to check for locks
> + *
> + * Return true if there are any FL_POSIX or FL_FLOCK locks currently
> + * set on @inode.
> + */
> +bool vfs_inode_has_locks(struct inode *inode)
> +{
> +	struct file_lock_context *ctx;
> +	bool ret;
> +
> +	ctx = smp_load_acquire(&inode->i_flctx);
> +	if (!ctx)
> +		return false;
> +
> +	spin_lock(&ctx->flc_lock);
> +	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
> +	spin_unlock(&ctx->flc_lock);

BTW, is the spin_lock/spin_unlock here really needed ?

> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
> +
>   #ifdef CONFIG_PROC_FS
>   #include <linux/proc_fs.h>
>   #include <linux/seq_file.h>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e654435f1651..d6cb42b7e91c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
>   extern int vfs_test_lock(struct file *, struct file_lock *);
>   extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>   extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> +bool vfs_inode_has_locks(struct inode *inode);
>   extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
>   extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>   extern void lease_get_mtime(struct inode *, struct timespec64 *time);

All the others LGTM.

Thanks.

- Xiubo