From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Layton Subject: Re: [PATCH 08/12] locks: break delegations on unlink Date: Tue, 9 Jul 2013 09:05:06 -0400 Message-ID: <20130709090506.71c96841@tlielax.poochiereds.net> References: <1372882356-14168-1-git-send-email-bfields@redhat.com> <1372882356-14168-9-git-send-email-bfields@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Al Viro , linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Howells , Tyler Hicks , Dustin Kirkland To: "J. Bruce Fields" Return-path: In-Reply-To: <1372882356-14168-9-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Sender: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: linux-fsdevel.vger.kernel.org On Wed, 3 Jul 2013 16:12:32 -0400 "J. Bruce Fields" wrote: > From: "J. Bruce Fields" > > We need to break delegations on any operation that changes the set of > links pointing to an inode. Start with unlink. > > Such operations also hold the i_mutex on a parent directory. Breaking a > delegation may require waiting for a timeout (by default 90 seconds) in > the case of a unresponsive NFS client. To avoid blocking all directory > operations, we therefore drop locks before waiting for the delegation. > The logic then looks like: > > acquire locks > ... > test for delegation; if found: > take reference on inode > release locks > wait for delegation break > drop reference on inode > retry > > It is possible this could never terminate. (Even if we take precautions > to prevent another delegation being acquired on the same inode, we could > get a different inode on each retry.) But this seems very unlikely. > > The initial test for a delegation happens after the lock on the target > inode is acquired, but the directory inode may have been acquired > further up the call stack. We therefore add a "struct inode **" > argument to any intervening functions, which we use to pass the inode > back up to the caller in the case it needs a delegation synchronously > broken. > > Cc: David Howells > Cc: Tyler Hicks > Cc: Dustin Kirkland > Signed-off-by: J. Bruce Fields > --- > drivers/base/devtmpfs.c | 2 +- > fs/cachefiles/namei.c | 2 +- > fs/ecryptfs/inode.c | 2 +- > fs/namei.c | 24 +++++++++++++++++++++--- > fs/nfsd/vfs.c | 2 +- > include/linux/fs.h | 2 +- > ipc/mqueue.c | 2 +- > 7 files changed, 27 insertions(+), 9 deletions(-) > > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c > index 7413d06..1b8490e 100644 > --- a/drivers/base/devtmpfs.c > +++ b/drivers/base/devtmpfs.c > @@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev) > mutex_lock(&dentry->d_inode->i_mutex); > notify_change(dentry, &newattrs); > mutex_unlock(&dentry->d_inode->i_mutex); > - err = vfs_unlink(parent.dentry->d_inode, dentry); > + err = vfs_unlink(parent.dentry->d_inode, dentry, NULL); > if (!err || err == -ENOENT) > deleted = 1; > } > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c > index 8c01c5fc..d61d884 100644 > --- a/fs/cachefiles/namei.c > +++ b/fs/cachefiles/namei.c > @@ -294,7 +294,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache, > if (ret < 0) { > cachefiles_io_error(cache, "Unlink security error"); > } else { > - ret = vfs_unlink(dir->d_inode, rep); > + ret = vfs_unlink(dir->d_inode, rep, NULL); > > if (preemptive) > cachefiles_mark_object_buried(cache, rep); > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c > index 5eab400..af42d88 100644 > --- a/fs/ecryptfs/inode.c > +++ b/fs/ecryptfs/inode.c > @@ -153,7 +153,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry, > > dget(lower_dentry); > lower_dir_dentry = lock_parent(lower_dentry); > - rc = vfs_unlink(lower_dir_inode, lower_dentry); > + rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL); > if (rc) { > printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc); > goto out_unlock; > diff --git a/fs/namei.c b/fs/namei.c > index 7e76fe1..cba3db1 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3384,7 +3384,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname) > return do_rmdir(AT_FDCWD, pathname); > } > > -int vfs_unlink(struct inode *dir, struct dentry *dentry) > +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode) nit: this might be a good time to add a kerneldoc header on this function. The delegated_inode thing might not be clear to the uninitiated. > { > struct inode *target = dentry->d_inode; > int error = may_delete(dir, dentry, 0); > @@ -3401,11 +3401,20 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry) > else { > error = security_inode_unlink(dir, dentry); > if (!error) { > + error = break_deleg(target, O_WRONLY|O_NONBLOCK); > + if (error) { > + if (error == -EWOULDBLOCK && delegated_inode) { > + *delegated_inode = target; > + ihold(target); > + } > + goto out; > + } > error = dir->i_op->unlink(dir, dentry); > if (!error) > dont_mount(dentry); > } > } > +out: > mutex_unlock(&target->i_mutex); > > /* We don't d_delete() NFS sillyrenamed files--they still exist. */ > @@ -3430,6 +3439,7 @@ static long do_unlinkat(int dfd, const char __user *pathname) > struct dentry *dentry; > struct nameidata nd; > struct inode *inode = NULL; > + struct inode *delegated_inode = NULL; > unsigned int lookup_flags = 0; > retry: > name = user_path_parent(dfd, pathname, &nd, lookup_flags); > @@ -3444,7 +3454,7 @@ retry: > error = mnt_want_write(nd.path.mnt); > if (error) > goto exit1; > - > +retry_deleg: > mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT); > dentry = lookup_hash(&nd); > error = PTR_ERR(dentry); > @@ -3459,13 +3469,21 @@ retry: > error = security_path_unlink(&nd.path, dentry); > if (error) > goto exit2; > - error = vfs_unlink(nd.path.dentry->d_inode, dentry); > + error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode); > exit2: > dput(dentry); > } > mutex_unlock(&nd.path.dentry->d_inode->i_mutex); > if (inode) > iput(inode); /* truncate the inode here */ > + inode = NULL; > + if (delegated_inode) { > + error = break_deleg(delegated_inode, O_WRONLY); > + iput(delegated_inode); > + delegated_inode = NULL; > + if (!error) > + goto retry_deleg; > + } > mnt_drop_write(nd.path.mnt); > exit1: > path_put(&nd.path); > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 84ce601..6ccaca2 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1882,7 +1882,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, > if (host_err) > goto out_put; > if (type != S_IFDIR) > - host_err = vfs_unlink(dirp, rdentry); > + host_err = vfs_unlink(dirp, rdentry, NULL); > else > host_err = vfs_rmdir(dirp, rdentry); > if (!host_err) > diff --git a/include/linux/fs.h b/include/linux/fs.h > index c6cc686..f951588 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1463,7 +1463,7 @@ extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t); > extern int vfs_symlink(struct inode *, struct dentry *, const char *); > extern int vfs_link(struct dentry *, struct inode *, struct dentry *); > extern int vfs_rmdir(struct inode *, struct dentry *); > -extern int vfs_unlink(struct inode *, struct dentry *); > +extern int vfs_unlink(struct inode *, struct dentry *, struct inode **); > extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *); > > /* > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index e4e47f6..384eb35 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -884,7 +884,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name) > err = -ENOENT; > } else { > ihold(inode); > - err = vfs_unlink(dentry->d_parent->d_inode, dentry); > + err = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); > } > dput(dentry); > We probably also ought to eyeball some of these other cases where you passing in NULL as the deleg_inode too. It's probably reasonable in most cases -- exporting a filesystem that you also mount using ecryptfs seems silly, but you never know... Looks reasonable otherwise, if a little convoluted. -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: linux-nfs-owner@vger.kernel.org Received: from mx1.redhat.com ([209.132.183.28]:5150 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753542Ab3GINFN (ORCPT ); Tue, 9 Jul 2013 09:05:13 -0400 Date: Tue, 9 Jul 2013 09:05:06 -0400 From: Jeff Layton To: "J. Bruce Fields" Cc: Al Viro , linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Howells , Tyler Hicks , Dustin Kirkland Subject: Re: [PATCH 08/12] locks: break delegations on unlink Message-ID: <20130709090506.71c96841@tlielax.poochiereds.net> In-Reply-To: <1372882356-14168-9-git-send-email-bfields@redhat.com> References: <1372882356-14168-1-git-send-email-bfields@redhat.com> <1372882356-14168-9-git-send-email-bfields@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-nfs-owner@vger.kernel.org List-ID: On Wed, 3 Jul 2013 16:12:32 -0400 "J. Bruce Fields" wrote: > From: "J. Bruce Fields" > > We need to break delegations on any operation that changes the set of > links pointing to an inode. Start with unlink. > > Such operations also hold the i_mutex on a parent directory. Breaking a > delegation may require waiting for a timeout (by default 90 seconds) in > the case of a unresponsive NFS client. To avoid blocking all directory > operations, we therefore drop locks before waiting for the delegation. > The logic then looks like: > > acquire locks > ... > test for delegation; if found: > take reference on inode > release locks > wait for delegation break > drop reference on inode > retry > > It is possible this could never terminate. (Even if we take precautions > to prevent another delegation being acquired on the same inode, we could > get a different inode on each retry.) But this seems very unlikely. > > The initial test for a delegation happens after the lock on the target > inode is acquired, but the directory inode may have been acquired > further up the call stack. We therefore add a "struct inode **" > argument to any intervening functions, which we use to pass the inode > back up to the caller in the case it needs a delegation synchronously > broken. > > Cc: David Howells > Cc: Tyler Hicks > Cc: Dustin Kirkland > Signed-off-by: J. Bruce Fields > --- > drivers/base/devtmpfs.c | 2 +- > fs/cachefiles/namei.c | 2 +- > fs/ecryptfs/inode.c | 2 +- > fs/namei.c | 24 +++++++++++++++++++++--- > fs/nfsd/vfs.c | 2 +- > include/linux/fs.h | 2 +- > ipc/mqueue.c | 2 +- > 7 files changed, 27 insertions(+), 9 deletions(-) > > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c > index 7413d06..1b8490e 100644 > --- a/drivers/base/devtmpfs.c > +++ b/drivers/base/devtmpfs.c > @@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev) > mutex_lock(&dentry->d_inode->i_mutex); > notify_change(dentry, &newattrs); > mutex_unlock(&dentry->d_inode->i_mutex); > - err = vfs_unlink(parent.dentry->d_inode, dentry); > + err = vfs_unlink(parent.dentry->d_inode, dentry, NULL); > if (!err || err == -ENOENT) > deleted = 1; > } > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c > index 8c01c5fc..d61d884 100644 > --- a/fs/cachefiles/namei.c > +++ b/fs/cachefiles/namei.c > @@ -294,7 +294,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache, > if (ret < 0) { > cachefiles_io_error(cache, "Unlink security error"); > } else { > - ret = vfs_unlink(dir->d_inode, rep); > + ret = vfs_unlink(dir->d_inode, rep, NULL); > > if (preemptive) > cachefiles_mark_object_buried(cache, rep); > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c > index 5eab400..af42d88 100644 > --- a/fs/ecryptfs/inode.c > +++ b/fs/ecryptfs/inode.c > @@ -153,7 +153,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry, > > dget(lower_dentry); > lower_dir_dentry = lock_parent(lower_dentry); > - rc = vfs_unlink(lower_dir_inode, lower_dentry); > + rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL); > if (rc) { > printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc); > goto out_unlock; > diff --git a/fs/namei.c b/fs/namei.c > index 7e76fe1..cba3db1 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3384,7 +3384,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname) > return do_rmdir(AT_FDCWD, pathname); > } > > -int vfs_unlink(struct inode *dir, struct dentry *dentry) > +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode) nit: this might be a good time to add a kerneldoc header on this function. The delegated_inode thing might not be clear to the uninitiated. > { > struct inode *target = dentry->d_inode; > int error = may_delete(dir, dentry, 0); > @@ -3401,11 +3401,20 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry) > else { > error = security_inode_unlink(dir, dentry); > if (!error) { > + error = break_deleg(target, O_WRONLY|O_NONBLOCK); > + if (error) { > + if (error == -EWOULDBLOCK && delegated_inode) { > + *delegated_inode = target; > + ihold(target); > + } > + goto out; > + } > error = dir->i_op->unlink(dir, dentry); > if (!error) > dont_mount(dentry); > } > } > +out: > mutex_unlock(&target->i_mutex); > > /* We don't d_delete() NFS sillyrenamed files--they still exist. */ > @@ -3430,6 +3439,7 @@ static long do_unlinkat(int dfd, const char __user *pathname) > struct dentry *dentry; > struct nameidata nd; > struct inode *inode = NULL; > + struct inode *delegated_inode = NULL; > unsigned int lookup_flags = 0; > retry: > name = user_path_parent(dfd, pathname, &nd, lookup_flags); > @@ -3444,7 +3454,7 @@ retry: > error = mnt_want_write(nd.path.mnt); > if (error) > goto exit1; > - > +retry_deleg: > mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT); > dentry = lookup_hash(&nd); > error = PTR_ERR(dentry); > @@ -3459,13 +3469,21 @@ retry: > error = security_path_unlink(&nd.path, dentry); > if (error) > goto exit2; > - error = vfs_unlink(nd.path.dentry->d_inode, dentry); > + error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode); > exit2: > dput(dentry); > } > mutex_unlock(&nd.path.dentry->d_inode->i_mutex); > if (inode) > iput(inode); /* truncate the inode here */ > + inode = NULL; > + if (delegated_inode) { > + error = break_deleg(delegated_inode, O_WRONLY); > + iput(delegated_inode); > + delegated_inode = NULL; > + if (!error) > + goto retry_deleg; > + } > mnt_drop_write(nd.path.mnt); > exit1: > path_put(&nd.path); > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 84ce601..6ccaca2 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1882,7 +1882,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, > if (host_err) > goto out_put; > if (type != S_IFDIR) > - host_err = vfs_unlink(dirp, rdentry); > + host_err = vfs_unlink(dirp, rdentry, NULL); > else > host_err = vfs_rmdir(dirp, rdentry); > if (!host_err) > diff --git a/include/linux/fs.h b/include/linux/fs.h > index c6cc686..f951588 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1463,7 +1463,7 @@ extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t); > extern int vfs_symlink(struct inode *, struct dentry *, const char *); > extern int vfs_link(struct dentry *, struct inode *, struct dentry *); > extern int vfs_rmdir(struct inode *, struct dentry *); > -extern int vfs_unlink(struct inode *, struct dentry *); > +extern int vfs_unlink(struct inode *, struct dentry *, struct inode **); > extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *); > > /* > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index e4e47f6..384eb35 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -884,7 +884,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name) > err = -ENOENT; > } else { > ihold(inode); > - err = vfs_unlink(dentry->d_parent->d_inode, dentry); > + err = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL); > } > dput(dentry); > We probably also ought to eyeball some of these other cases where you passing in NULL as the deleg_inode too. It's probably reasonable in most cases -- exporting a filesystem that you also mount using ecryptfs seems silly, but you never know... Looks reasonable otherwise, if a little convoluted. -- Jeff Layton