* [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
@ 2015-09-03 17:33 Jeff Layton
  2015-09-03 18:43 ` J. Bruce Fields
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2015-09-03 17:33 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

There are cases with NFSv3 where the client doesn't actually care about
WCC attributes in replies. If the server is mainly acting as a DS for
flexfiles, then the client just throws out those attributes anyway.
Also, in the case where the client is primarily doing direct I/O,
post-op attributes aren't terribly useful.

Another reason to allow turning these off is that NFS will flush all
buffered writes prior to issuing a GETATTR, and it also takes the
i_mutex in its ->getattr operation.

If we're doing a vfs_getattr after most RPCs, then we can end up
deadlocking or (at best) prematurely flushing buffered writes, which
kills performance.

This patch adds a new export option, "nowcc", that disables the return
of WCC attributes in NFSv3 replies. I also have a userland patch that
adds support for the same option to nfs-utils, which I'll send along as
well.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/export.c                 | 1 +
 fs/nfsd/nfs3xdr.c                | 5 ++++-
 fs/nfsd/nfsfh.c                  | 4 ++++
 fs/nfsd/nfsfh.h                  | 5 ++++-
 include/uapi/linux/nfsd/export.h | 3 ++-
 5 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index b4d84b579f20..97258009ce1e 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -1092,6 +1092,7 @@ static struct flags {
 	{ NFSEXP_NOAUTHNLM, {"insecure_locks", ""}},
 	{ NFSEXP_V4ROOT, {"v4root", ""}},
 	{ NFSEXP_PNFS, {"pnfs", ""}},
+	{ NFSEXP_NOWCC, {"nowcc", ""}},
 	{ 0, {"", ""}}
 };
 
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 01dcd494f781..c30c8c604e2a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -203,7 +203,7 @@ static __be32 *
 encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
 {
 	struct dentry *dentry = fhp->fh_dentry;
-	if (dentry && d_really_is_positive(dentry)) {
+	if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
 	        __be32 err;
 		struct kstat stat;
 
@@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
 {
 	__be32 err;
 
+	if (fhp->fh_no_wcc)
+		return;
+
 	if (fhp->fh_post_saved)
 		printk("nfsd: inode locked twice during operation.\n");
 
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 350041a40fe5..32093b7dce55 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -267,6 +267,9 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 
 	fhp->fh_dentry = dentry;
 	fhp->fh_export = exp;
+	if (exp->ex_flags & NFSEXP_NOWCC && rqstp->rq_vers == 3)
+		fhp->fh_no_wcc = true;
+
 	return 0;
 out:
 	exp_put(exp);
@@ -641,6 +644,7 @@ fh_put(struct svc_fh *fhp)
 		exp_put(exp);
 		fhp->fh_export = NULL;
 	}
+	fhp->fh_no_wcc = false;
 	return;
 }
 
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 1e90dad4926b..9ddead4d98f8 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -32,6 +32,7 @@ typedef struct svc_fh {
 
 	unsigned char		fh_locked;	/* inode locked by us */
 	unsigned char		fh_want_write;	/* remount protection taken */
+	bool			fh_no_wcc;	/* no wcc data needed */
 
 #ifdef CONFIG_NFSD_V3
 	unsigned char		fh_post_saved;	/* post-op attrs saved */
@@ -51,7 +52,6 @@ typedef struct svc_fh {
 	struct kstat		fh_post_attr;	/* full attrs after operation */
 	u64			fh_post_change; /* nfsv4 change; see above */
 #endif /* CONFIG_NFSD_V3 */
-
 } svc_fh;
 
 enum nfsd_fsid {
@@ -225,6 +225,9 @@ fill_pre_wcc(struct svc_fh *fhp)
 {
 	struct inode    *inode;
 
+	if (fhp->fh_no_wcc)
+		return;
+
 	inode = d_inode(fhp->fh_dentry);
 	if (!fhp->fh_pre_saved) {
 		fhp->fh_pre_mtime = inode->i_mtime;
diff --git a/include/uapi/linux/nfsd/export.h b/include/uapi/linux/nfsd/export.h
index 0df7bd5d2fb1..4c132290f414 100644
--- a/include/uapi/linux/nfsd/export.h
+++ b/include/uapi/linux/nfsd/export.h
@@ -51,9 +51,10 @@
  */
 #define	NFSEXP_V4ROOT		0x10000
 #define NFSEXP_PNFS		0x20000
+#define NFSEXP_NOWCC		0x40000
 
 /* All flags that we claim to support.  (Note we don't support NOACL.) */
-#define NFSEXP_ALLFLAGS		0x3FE7F
+#define NFSEXP_ALLFLAGS		0x7FE7F
 
 /* The flags that may vary depending on security flavor: */
 #define NFSEXP_SECINFO_FLAGS	(NFSEXP_READONLY | NFSEXP_ROOTSQUASH \
-- 
2.4.3
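
For anyone following the flag logic, the gating added by the patch above
can be sketched in userland roughly as follows. This is only a model
under stated assumptions, not kernel code: the model_* names and structs
are hypothetical stand-ins, and only the NFSEXP_NOWCC value and the two
NFSEXP_ALLFLAGS masks are copied from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the include/uapi/linux/nfsd/export.h hunk. */
#define NFSEXP_NOWCC		0x40000
#define OLD_NFSEXP_ALLFLAGS	0x3FE7F	/* mask before the patch */
#define NEW_NFSEXP_ALLFLAGS	0x7FE7F	/* mask after the patch */

/* The new mask should be the old mask plus exactly the NOWCC bit. */
static_assert((OLD_NFSEXP_ALLFLAGS | NFSEXP_NOWCC) == NEW_NFSEXP_ALLFLAGS,
	      "ALLFLAGS must grow by exactly the NOWCC bit");

/* Hypothetical, stripped-down stand-ins for svc_export and svc_fh. */
struct model_export {
	int ex_flags;
};

struct model_fh {
	bool fh_no_wcc;
};

/* Mirrors the nfsd_set_fh_dentry() hunk: only v3 requests on a
 * "nowcc" export have WCC collection switched off. */
void model_set_fh(struct model_fh *fhp, const struct model_export *exp,
		  int rq_vers)
{
	if ((exp->ex_flags & NFSEXP_NOWCC) && rq_vers == 3)
		fhp->fh_no_wcc = true;
}

/* Mirrors the early returns in fill_pre_wcc()/fill_post_wcc():
 * returns true when attributes would still be collected. */
bool model_wants_wcc(const struct model_fh *fhp)
{
	return !fhp->fh_no_wcc;
}

/* Mirrors the fh_put() hunk: the flag is reset when the handle is
 * released so that it does not leak into a later use. */
void model_fh_put(struct model_fh *fhp)
{
	fhp->fh_no_wcc = false;
}
```

In this model, calling model_set_fh() with rq_vers == 3 on an export
whose ex_flags includes NFSEXP_NOWCC makes model_wants_wcc() return
false until model_fh_put() runs; any other protocol version leaves the
flag untouched.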



* Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
  2015-09-03 17:33 [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies Jeff Layton
@ 2015-09-03 18:43 ` J. Bruce Fields
  2015-09-03 18:52   ` Jeff Layton
  0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2015-09-03 18:43 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> There are cases with NFSv3 where the client doesn't actually care about
> WCC attributes in replies. If the server is mainly acting as a DS for
> flexfiles, then the client just throws out those attributes anyway.
> Also, in the case where the client is primarily doing direct I/O, post
> op attributes aren't terribly useful
>
> Another reason to allow turning these off is that NFS will flush all
> buffered writes prior to issuing a GETATTR, and it also takes the
> i_mutex in its ->getattr operation.
>
> If we're doing a vfs_getattr after most RPCs, then we can end up
> deadlocking or (at best) prematurely flushing buffered writes, which
> kills performance.

So you're talking about the NFS re-export case?  Do we know of any other
case when a ->getattr is so expensive?

--b.


* Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
  2015-09-03 18:43 ` J. Bruce Fields
@ 2015-09-03 18:52   ` Jeff Layton
  2015-09-03 19:19     ` J. Bruce Fields
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2015-09-03 18:52 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Thu, 3 Sep 2015 14:43:27 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> > There are cases with NFSv3 where the client doesn't actually care about
> > WCC attributes in replies. If the server is mainly acting as a DS for
> > flexfiles, then the client just throws out those attributes anyway.
> > Also, in the case where the client is primarily doing direct I/O, post
> > op attributes aren't terribly useful
> >
> > Another reason to allow turning these off is that NFS will flush all
> > buffered writes prior to issuing a GETATTR, and it also takes the
> > i_mutex in its ->getattr operation.
> >
> > If we're doing a vfs_getattr after most RPCs, then we can end up
> > deadlocking or (at best) prematurely flushing buffered writes, which
> > kills performance.
> 
> So you're talking about the NFS re-export case?  Do we know of any other
> case when a ->getattr is so expensive?
> 

That's the main one that I have experience with, but getattr can be
pretty expensive in clustered filesystems. For instance, on ceph:

        err = ceph_do_getattr(inode, CEPH_STAT_CAP_INODE_ALL, false);
        if (!err) {
                generic_fillattr(inode, stat);
                stat->ino = ceph_translate_ino(inode->i_sb, inode->i_ino);


...and it looks like ceph_do_getattr can issue a request on the network
(though I'm not familiar with that code and I imagine that it's
sometimes optimized out).




-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
  2015-09-03 18:52   ` Jeff Layton
@ 2015-09-03 19:19     ` J. Bruce Fields
  2015-09-03 19:54       ` Jeff Layton
  0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2015-09-03 19:19 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Thu, Sep 03, 2015 at 02:52:25PM -0400, Jeff Layton wrote:
> On Thu, 3 Sep 2015 14:43:27 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> > > There are cases with NFSv3 where the client doesn't actually care about
> > > WCC attributes in replies. If the server is mainly acting as a DS for
> > > flexfiles, then the client just throws out those attributes anyway.
> > > Also, in the case where the client is primarily doing direct I/O, post
> > > op attributes aren't terribly useful
> > >
> > > Another reason to allow turning these off is that NFS will flush all
> > > buffered writes prior to issuing a GETATTR, and it also takes the
> > > i_mutex in its ->getattr operation.
> > >
> > > If we're doing a vfs_getattr after most RPCs, then we can end up
> > > deadlocking or (at best) prematurely flushing buffered writes, which
> > > kills performance.
> > 
> > So you're talking about the NFS re-export case?  Do we know of any other
> > case when a ->getattr is so expensive?
> > 
> 
> That's the main one that I have experience with, but getattr can be
> pretty expensive in clustered filesystems. For instance, on ceph:
> 
>         err = ceph_do_getattr(inode, CEPH_STAT_CAP_INODE_ALL, false);
>         if (!err) {
>                 generic_fillattr(inode, stat);
>                 stat->ino = ceph_translate_ino(inode->i_sb, inode->i_ino);
> 
> 
> ...and it looks like ceph_do_getattr can issue a request on the network
> (though I'm not familiar with that code and I imagine that it's
> sometimes optimized out).

OK.  Could we get something like that in the changelog?  The change
really needs to stand on the non-NFS case alone as long as NFS
reexport's not upstream.  For that reason (and because without the
context that second paragraph's kind of confusing), it'd be helpful to
preface the NFS discussion by something like "In the (out-of-tree) NFS
re-export case".

What's keeping that out of upstream, anyway?  Apparently there's some
use case, and if it's inspiring a lot of changes in generic code, then
it'd simplify life to have it upstream.

--b.


* Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
  2015-09-03 19:19     ` J. Bruce Fields
@ 2015-09-03 19:54       ` Jeff Layton
  2015-09-03 20:20         ` J. Bruce Fields
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2015-09-03 19:54 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Thu, 3 Sep 2015 15:19:14 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Sep 03, 2015 at 02:52:25PM -0400, Jeff Layton wrote:
> > On Thu, 3 Sep 2015 14:43:27 -0400
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> > > > There are cases with NFSv3 where the client doesn't actually care about
> > > > WCC attributes in replies. If the server is mainly acting as a DS for
> > > > flexfiles, then the client just throws out those attributes anyway.
> > > > Also, in the case where the client is primarily doing direct I/O, post
> > > > op attributes aren't terribly useful
> > > >
> > > > Another reason to allow turning these off is that NFS will flush all
> > > > buffered writes prior to issuing a GETATTR, and it also takes the
> > > > i_mutex in its ->getattr operation.
> > > >
> > > > If we're doing a vfs_getattr after most RPCs, then we can end up
> > > > deadlocking or (at best) prematurely flushing buffered writes, which
> > > > kills performance.
> > > 
> > > So you're talking about the NFS re-export case?  Do we know of any other
> > > case when a ->getattr is so expensive?
> > > 
> > 
> > That's the main one that I have experience with, but getattr can be
> > pretty expensive in clustered filesystems. For instance, on ceph:
> > 
> >         err = ceph_do_getattr(inode, CEPH_STAT_CAP_INODE_ALL, false);
> >         if (!err) {
> >                 generic_fillattr(inode, stat);
> >                 stat->ino = ceph_translate_ino(inode->i_sb, inode->i_ino);
> > 
> > 
> > ...and it looks like ceph_do_getattr can issue a request on the network
> > (though I'm not familiar with that code and I imagine that it's
> > sometimes optimized out).
> 
> OK.  Could we get something like that in the changelog?  The change
> really needs to stand on the non-NFS case alone as long as NFS
> reexport's not upstream.  For that reason (and because without the
> context that second paragraph's kind of confusing), it'd be helpful to
> > preface the NFS discussion by something like "In the (out-of-tree) NFS
> re-export case".
> 

Yeah, no problem. I'll respin the changelog on both patches and resend
within the next day or two.

> What's keeping that out of upstream, anyway?  Apparently there's some
> use case, and if it's inspiring a lot of changes in generic code, then
> it'd simplify life to have it upstream.
> 
> 

There are several problems. Here are a few, but there are others:

1) it is at least somewhat of a potential security concern. By mounting
on a box that has access and then reexporting it, you can circumvent
the export restrictions on the original server. Granted you can do that
today with samba or something, but still -- it's a little sketchy.

2) getattrs: We're working around the problem with this new export
option, but if you don't use that then you can potentially deadlock
with NFS. It wants to take the i_mutex in its ->getattr operation but
knfsd calls vfs_getattr with that held to do post-op attrs. My initial
workaround was to drop the i_mutex before calling fh_getattr instead of
after, but then I hit the performance problem I described.

3) locking: proxying v3 locking is a painful mess. If the reexporter
reboots, it'll lose its lease on the main server, which will kick out
all of its state. At that point you can end up with another client
racing and getting your lock before the reexporter can come back up and
reclaim it.
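
The locking conflict in item 2 can be illustrated with a small userland
sketch. It is an analogy only, under stated assumptions: a pthread mutex
stands in for the i_mutex, the model_* functions are hypothetical, and
trylock is used so that the sketch reports the conflict (EBUSY) instead
of hanging the way a real self-deadlock would.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the inode's i_mutex. */
static pthread_mutex_t model_i_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Models a ->getattr that needs the lock itself. Returns 0 on
 * success, or EBUSY if the lock is already held. */
int model_getattr(void)
{
	int rc = pthread_mutex_trylock(&model_i_mutex);

	if (rc != 0)
		return rc;
	/* ... attributes would be fetched here ... */
	rc = pthread_mutex_unlock(&model_i_mutex);
	assert(rc == 0);
	return 0;
}

/* Models collecting post-op attrs while the lock is still held:
 * the inner getattr cannot get the lock. */
int model_post_op_attrs_locked(void)
{
	int rc;

	pthread_mutex_lock(&model_i_mutex);
	rc = model_getattr();
	pthread_mutex_unlock(&model_i_mutex);
	return rc;
}

/* Models the initial workaround: drop the lock first, then call
 * getattr. This avoids the conflict. */
int model_post_op_attrs_unlocked(void)
{
	pthread_mutex_lock(&model_i_mutex);
	/* ... the operation itself would run here ... */
	pthread_mutex_unlock(&model_i_mutex);
	return model_getattr();
}
```

With a default (non-recursive) mutex, the locked variant is expected to
return EBUSY and the unlocked variant 0. The sketch says nothing about
the performance cost of the extra flush, which is the other half of the
problem.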

Our main use-case for this is pretty limited and doesn't involve file
locking (so far!). We'll probably have to code up some mechanism to
deal with that at some point, but that won't be an in-kernel solution.

Once we ship this thing, the kernel changes will be GPLed of course for
anyone who wants to use them, but I'm not sure any of us really want the
headache of merging this upstream as a first-class feature.



-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
  2015-09-03 19:54       ` Jeff Layton
@ 2015-09-03 20:20         ` J. Bruce Fields
  2015-09-03 20:35           ` Jeff Layton
  0 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2015-09-03 20:20 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Thu, Sep 03, 2015 at 03:54:17PM -0400, Jeff Layton wrote:
> On Thu, 3 Sep 2015 15:19:14 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Thu, Sep 03, 2015 at 02:52:25PM -0400, Jeff Layton wrote:
> > > On Thu, 3 Sep 2015 14:43:27 -0400
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > 
> > > > On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> > > > > There are cases with NFSv3 where the client doesn't actually care about
> > > > > WCC attributes in replies. If the server is mainly acting as a DS for
> > > > > flexfiles, then the client just throws out those attributes anyway.
> > > > > Also, in the case where the client is primarily doing direct I/O, post
> > > > > > op attributes aren't terribly useful.
> > > > >
> > > > > Another reason to allow turning these off is that NFS will flush all
> > > > > buffered writes prior to issuing a GETATTR, and it also takes the
> > > > > i_mutex in its ->getattr operation.
> > > > >
> > > > > If we're doing a vfs_getattr after most RPCs, then we can end up
> > > > > deadlocking or (at best) prematurely flushing buffered writes, which
> > > > > kills performance.
> > > > 
> > > > So you're talking about the NFS re-export case?  Do we know of any other
> > > > case when a ->getattr is so expensive?
> > > > 
> > > 
> > > That's the main one that I have experience with, but getattr can be
> > > pretty expensive in clustered filesystems. For instance, on ceph:
> > > 
> > >         err = ceph_do_getattr(inode, CEPH_STAT_CAP_INODE_ALL, false);
> > >         if (!err) {
> > >                 generic_fillattr(inode, stat);
> > >                 stat->ino = ceph_translate_ino(inode->i_sb, inode->i_ino);
> > > 
> > > 
> > > ...and it looks like ceph_do_getattr can issue a request on the network
> > > (though I'm not familiar with that code and I imagine that it's
> > > sometimes optimized out).
> > 
> > OK.  Could we get something like that in the changelog?  The change
> > really needs to stand on the non-NFS case alone as long as NFS
> > reexport's not upstream.  For that reason (and because without the
> > context that second paragraph's kind of confusing), it'd be helpful to
> > > preface the NFS discussion by something like "In the (out-of-tree) NFS
> > re-export case".
> > 
> 
> Yeah, no problem. I'll respin the changelog on both patches and resend
> within the next day or two.
> 
> > What's keeping that out of upstream, anyway?  Apparently there's some
> > use case, and if it's inspiring a lot of changes in generic code, then
> > it'd simplify life to have it upstream.
> > 
> > 
> 
> There are several problems. Here are a few, but there are others:
> 
> 1) it is at least somewhat of a potential security concern. By mounting
> on a box that has access and then reexporting it, you can circumvent
> the export restrictions on the original server. Granted you can do that
> today with samba or something, but still -- it's a little sketchy.

Or with ganesha, or you could run a web server, or just mail the file
contents to someone....  This just isn't the way to enforce anything.

I've heard Trond argue something like this before, but I think his point
was a little different: not that we have to deny reexports for security
reasons, but just that such a policy-circumventing use case isn't worth
supporting.  So he wasn't interested in reexports as long as that was
the only use case he'd heard about.

> 2) getattrs: We're working around the problem with this new export
> option, but if you don't use that then you can potentially deadlock
> with NFS. It wants to take the i_mutex in its ->getattr operation but
> knfsd calls vfs_getattr with that held to do post-op attrs. My initial
> workaround was to drop the i_mutex before calling fh_getattr instead of
> after, but then I hit the performance problem I described.
> 
> 3) locking: proxying v3 locking is a painful mess. If the reexporter
> reboots, it'll lose its lease on the main server, which will kick out
> all of its state. At that point you can end up with another client
> racing and getting your lock before the reexporter can come back up and
> reclaim it.
> 
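
The self-deadlock in 2) can be modelled in userspace. What follows is a
minimal sketch, assuming a toy non-recursive lock in place of i_mutex and a
->getattr that needs that lock, as described for NFS. Every toy_* name and
TOY_EDEADLK are made up for illustration; none of this is knfsd code.

```c
#include <stdbool.h>

#define TOY_EDEADLK 35	/* stand-in error code, value picked for the example */

/* Toy inode: a single flag models a non-recursive i_mutex. */
struct toy_inode {
	bool i_mutex_held;
	int getattr_calls;	/* how many getattrs actually ran */
};

static bool toy_mutex_trylock(struct toy_inode *inode)
{
	if (inode->i_mutex_held)
		return false;
	inode->i_mutex_held = true;
	return true;
}

static void toy_mutex_unlock(struct toy_inode *inode)
{
	inode->i_mutex_held = false;
}

/*
 * Models a ->getattr that takes i_mutex. A real mutex_lock() would block
 * forever if the caller already held it; the toy reports that instead.
 */
static int toy_getattr(struct toy_inode *inode)
{
	if (!toy_mutex_trylock(inode))
		return -TOY_EDEADLK;
	inode->getattr_calls++;
	toy_mutex_unlock(inode);
	return 0;
}

/* Ordering described in 2): post-op attrs fetched with i_mutex held. */
static int toy_post_op_locked(struct toy_inode *inode)
{
	int err;

	if (!toy_mutex_trylock(inode))
		return -TOY_EDEADLK;
	err = toy_getattr(inode);
	toy_mutex_unlock(inode);
	return err;
}

/* The initial workaround: drop i_mutex before fetching the attrs. */
static int toy_post_op_unlock_first(struct toy_inode *inode)
{
	if (!toy_mutex_trylock(inode))
		return -TOY_EDEADLK;
	toy_mutex_unlock(inode);
	return toy_getattr(inode);
}

/* With nowcc: no post-op getattr is issued at all. */
static int toy_post_op_nowcc(struct toy_inode *inode)
{
	if (!toy_mutex_trylock(inode))
		return -TOY_EDEADLK;
	toy_mutex_unlock(inode);
	return 0;
}
```

In this model toy_post_op_locked() fails with -TOY_EDEADLK,
toy_post_op_unlock_first() succeeds but still pays for one getattr per
call, and toy_post_op_nowcc() succeeds without issuing any getattr. The
model says nothing about the cost of the flush itself.
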
> Our main use-case for this is pretty limited and doesn't involve file
> locking (so far!).

So this is the interesting part, I guess.

--b.

> We'll probably have to code up some mechanism to
> deal with that at some point, but that won't be an in-kernel solution.
> 
> Once we ship this thing, the kernel changes will be GPLed of course for
> anyone who wants to use them, but I'm not sure any of us really want the
> headache of merging this upstream as a first-class feature.
> 


* Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
  2015-09-03 20:20         ` J. Bruce Fields
@ 2015-09-03 20:35           ` Jeff Layton
  2015-09-11 13:35             ` J. Bruce Fields
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2015-09-03 20:35 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Thu, 3 Sep 2015 16:20:16 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Sep 03, 2015 at 03:54:17PM -0400, Jeff Layton wrote:
> > On Thu, 3 Sep 2015 15:19:14 -0400
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Thu, Sep 03, 2015 at 02:52:25PM -0400, Jeff Layton wrote:
> > > > On Thu, 3 Sep 2015 14:43:27 -0400
> > > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > > 
> > > > > On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> > > > > > There are cases with NFSv3 where the client doesn't actually care about
> > > > > > WCC attributes in replies. If the server is mainly acting as a DS for
> > > > > > flexfiles, then the client just throws out those attributes anyway.
> > > > > > Also, in the case where the client is primarily doing direct I/O, post
> > > > > > > op attributes aren't terribly useful.
> > > > > >
> > > > > > Another reason to allow turning these off is that NFS will flush all
> > > > > > buffered writes prior to issuing a GETATTR, and it also takes the
> > > > > > i_mutex in its ->getattr operation.
> > > > > >
> > > > > > If we're doing a vfs_getattr after most RPCs, then we can end up
> > > > > > deadlocking or (at best) prematurely flushing buffered writes, which
> > > > > > kills performance.
> > > > > 
> > > > > So you're talking about the NFS re-export case?  Do we know of any other
> > > > > case when a ->getattr is so expensive?
> > > > > 
> > > > 
> > > > That's the main one that I have experience with, but getattr can be
> > > > pretty expensive in clustered filesystems. For instance, on ceph:
> > > > 
> > > >         err = ceph_do_getattr(inode, CEPH_STAT_CAP_INODE_ALL, false);
> > > >         if (!err) {
> > > >                 generic_fillattr(inode, stat);
> > > >                 stat->ino = ceph_translate_ino(inode->i_sb, inode->i_ino);
> > > > 
> > > > 
> > > > ...and it looks like ceph_do_getattr can issue a request on the network
> > > > (though I'm not familiar with that code and I imagine that it's
> > > > sometimes optimized out).
> > > 
> > > OK.  Could we get something like that in the changelog?  The change
> > > really needs to stand on the non-NFS case alone as long as NFS
> > > reexport's not upstream.  For that reason (and because without the
> > > context that second paragraph's kind of confusing), it'd be helpful to
> > > > preface the NFS discussion by something like "In the (out-of-tree) NFS
> > > re-export case".
> > > 
> > 
> > Yeah, no problem. I'll respin the changelog on both patches and resend
> > within the next day or two.
> > 
> > > What's keeping that out of upstream, anyway?  Apparently there's some
> > > use case, and if it's inspiring a lot of changes in generic code, then
> > > it'd simplify life to have it upstream.
> > > 
> > > 
> > 
> > There are several problems. Here are a few, but there are others:
> > 
> > 1) it is at least somewhat of a potential security concern. By mounting
> > on a box that has access and then reexporting it, you can circumvent
> > the export restrictions on the original server. Granted you can do that
> > today with samba or something, but still -- it's a little sketchy.
> 
> Or with ganesha, or you could run a web server, or just mail the file
> contents to someone....  This just isn't the way to enforce anything.
> 
> I've heard Trond argue something like this before, but I think his point
> was a little different: not that we have to deny reexports for security
> reasons, but just that such a policy-circumventing use case isn't worth
> supporting.  So he wasn't interested in reexports as long as that was
> the only use case he'd heard about.
> 

Agreed. We can't really enforce anything with this, but it's not a use
case we really want to be an enabler of.

> > 2) getattrs: We're working around the problem with this new export
> > option, but if you don't use that then you can potentially deadlock
> > with NFS. It wants to take the i_mutex in its ->getattr operation but
> > knfsd calls vfs_getattr with that held to do post-op attrs. My initial
> > workaround was to drop the i_mutex before calling fh_getattr instead of
> > after, but then I hit the performance problem I described.
> > 
> > 3) locking: proxying v3 locking is a painful mess. If the reexporter
> > reboots, it'll lose its lease on the main server, which will kick out
> > all of its state. At that point you can end up with another client
> > racing and getting your lock before the reexporter can come back up and
> > reclaim it.
> > 
> > Our main use-case for this is pretty limited and doesn't involve file
> > locking (so far!).
> 
> So this is the interesting part, I guess.--b.
> 

Yeah. The locking one is a real bugger. We have a potential design for
a solution, but I'm not sure it'll be something we can open source.

-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
  2015-09-03 20:35           ` Jeff Layton
@ 2015-09-11 13:35             ` J. Bruce Fields
  0 siblings, 0 replies; 8+ messages in thread
From: J. Bruce Fields @ 2015-09-11 13:35 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Thu, Sep 03, 2015 at 04:35:37PM -0400, Jeff Layton wrote:
> On Thu, 3 Sep 2015 16:20:16 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Thu, Sep 03, 2015 at 03:54:17PM -0400, Jeff Layton wrote:
> > > 2) getattrs: We're working around the problem with this new export
> > > option, but if you don't use that then you can potentially deadlock
> > > with NFS. It wants to take the i_mutex in its ->getattr operation but
> > > knfsd calls vfs_getattr with that held to do post-op attrs. My initial
> > > workaround was to drop the i_mutex before calling fh_getattr instead of
> > > after, but then I hit the performance problem I described.
> > > 
> > > 3) locking: proxying v3 locking is a painful mess. If the reexporter
> > > reboots, it'll lose its lease on the main server, which will kick out
> > > all of its state. At that point you can end up with another client
> > > racing and getting your lock before the reexporter can come back up and
> > > reclaim it.
> > > 
> > > Our main use-case for this is pretty limited and doesn't involve file
> > > locking (so far!).
> > 
> > > So this is the interesting part, I guess.--b.
> > 
> 
> Yeah. The locking one is a real bugger. We have a potential design for
> a solution, but I'm not sure it'll be something we can open source.

Actually it wasn't the locking I was curious about so much as the use
case and whether it's something anyone else would care about.

--b.


Thread overview: 8+ messages
2015-09-03 17:33 [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies Jeff Layton
2015-09-03 18:43 ` J. Bruce Fields
2015-09-03 18:52   ` Jeff Layton
2015-09-03 19:19     ` J. Bruce Fields
2015-09-03 19:54       ` Jeff Layton
2015-09-03 20:20         ` J. Bruce Fields
2015-09-03 20:35           ` Jeff Layton
2015-09-11 13:35             ` J. Bruce Fields
