[PATCH] nfs: only do COMMIT for range written with direct I/O

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] nfs: only do COMMIT for range written with direct I/O
@ 2011-11-30 14:07 Jeff Layton
  2011-11-30 14:56 ` Malahal Naineni
  2011-12-05 17:11 ` Jeff Layton
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff Layton @ 2011-11-30 14:07 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs, khoa

When given a range to write with unstable writes, the current code
always does a COMMIT of the entire file afterward. This is potentially
expensive on some servers and unnecessary. Instead, just do a COMMIT for
the offset and count that was written.

Khoa, who reported this bug, stated that this made a big difference in
performance in their environment, which I believe involves GPFS on the
server. He didn't pass along any hard numbers so I can't quantify the
gain, but it stands to reason that clustered filesystems might suffer
more contention issues when issuing a commit over the whole file.

Reported-by: Khoa Huynh <khoa@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/direct.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1940f1a..33f2be7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -77,6 +77,7 @@ struct nfs_direct_req {
 	/* completion state */
 	atomic_t		io_count;	/* i/os we're waiting for */
 	spinlock_t		lock;		/* protect completion state */
+	loff_t			pos;		/* offset into file */
 	ssize_t			count,		/* bytes actually processed */
 				error;		/* any reported error */
 	struct completion	completion;	/* wait for i/o completion */
@@ -584,8 +585,8 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 	data->cred = msg.rpc_cred;
 
 	data->args.fh = NFS_FH(data->inode);
-	data->args.offset = 0;
-	data->args.count = 0;
+	data->args.offset = dreq->pos;
+	data->args.count = dreq->count;
 	data->args.context = dreq->ctx;
 	data->args.lock_context = dreq->l_ctx;
 	data->res.count = 0;
@@ -877,6 +878,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
 		sync = NFS_FILE_SYNC;
 
 	dreq->inode = inode;
+	dreq->pos = pos;
 	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
 	dreq->l_ctx = nfs_get_lock_context(dreq->ctx);
 	if (dreq->l_ctx == NULL)
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfs: only do COMMIT for range written with direct I/O
  2011-11-30 14:07 [PATCH] nfs: only do COMMIT for range written with direct I/O Jeff Layton
@ 2011-11-30 14:56 ` Malahal Naineni
  2011-11-30 19:25   ` Jeff Layton
  2011-12-05 17:11 ` Jeff Layton
  1 sibling, 1 reply; 6+ messages in thread
From: Malahal Naineni @ 2011-11-30 14:56 UTC (permalink / raw)
  To: linux-nfs

Jeff Layton [jlayton@redhat.com] wrote:
> When given a range to write with unstable writes, the current code
> always does a COMMIT of the entire file afterward. This is potentially
> expensive on some servers and unnecessary. Instead, just do a COMMIT for
> the offset and count that was written.
> 
> Khoa, who reported this bug, stated that this made a big difference in
> performance in their environment, which I believe involves GPFS on the
> server. He didn't pass along any hard numbers so I can't quantify the
> gain, but it stands to reason that clustered filesystems might suffer
> more contention issues when issuing a commit over the whole file.
> 
> Reported-by: Khoa Huynh <khoa@us.ibm.com>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/nfs/direct.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 1940f1a..33f2be7 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -77,6 +77,7 @@ struct nfs_direct_req {
>  	/* completion state */
>  	atomic_t		io_count;	/* i/os we're waiting for */
>  	spinlock_t		lock;		/* protect completion state */
> +	loff_t			pos;		/* offset into file */

Why not call it "offset"? Should we set this to zero in
nfs_direct_req_alloc (just like count is set to zero there)?

Thanks, Malahal.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfs: only do COMMIT for range written with direct I/O
  2011-11-30 14:56 ` Malahal Naineni
@ 2011-11-30 19:25   ` Jeff Layton
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2011-11-30 19:25 UTC (permalink / raw)
  To: Malahal Naineni; +Cc: linux-nfs

On Wed, 30 Nov 2011 08:56:57 -0600
Malahal Naineni <malahal@us.ibm.com> wrote:

> Jeff Layton [jlayton@redhat.com] wrote:
> > When given a range to write with unstable writes, the current code
> > always does a COMMIT of the entire file afterward. This is potentially
> > expensive on some servers and unnecessary. Instead, just do a COMMIT for
> > the offset and count that was written.
> > 
> > Khoa, who reported this bug, stated that this made a big difference in
> > performance in their environment, which I believe involves GPFS on the
> > server. He didn't pass along any hard numbers so I can't quantify the
> > gain, but it stands to reason that clustered filesystems might suffer
> > more contention issues when issuing a commit over the whole file.
> > 
> > Reported-by: Khoa Huynh <khoa@us.ibm.com>
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/nfs/direct.c |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index 1940f1a..33f2be7 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -77,6 +77,7 @@ struct nfs_direct_req {
> >  	/* completion state */
> >  	atomic_t		io_count;	/* i/os we're waiting for */
> >  	spinlock_t		lock;		/* protect completion state */
> > +	loff_t			pos;		/* offset into file */
> 
> Why not call it "offset"?

Why should we call it "offset"? The rest of the code refers to this
value as "pos". Essentially this patch replaces a field that was
removed from this struct several years ago. At the time, that field was
also called "pos".

> Should we set this to zero in
> nfs_direct_req_alloc (just like count is set to zero there)?
> 

Yes, that certainly wouldn't hurt. I'll plan to add that into the next
respin which I'll plan to send out once I've collected some more
feedback.

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfs: only do COMMIT for range written with direct I/O
  2011-11-30 14:07 [PATCH] nfs: only do COMMIT for range written with direct I/O Jeff Layton
  2011-11-30 14:56 ` Malahal Naineni
@ 2011-12-05 17:11 ` Jeff Layton
  2011-12-05 18:06   ` Myklebust, Trond
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2011-12-05 17:11 UTC (permalink / raw)
  To: Jeff Layton; +Cc: trond.myklebust, linux-nfs, khoa

On Wed, 30 Nov 2011 09:07:54 -0500
Jeff Layton <jlayton@redhat.com> wrote:

> When given a range to write with unstable writes, the current code
> always does a COMMIT of the entire file afterward. This is potentially
> expensive on some servers and unnecessary. Instead, just do a COMMIT for
> the offset and count that was written.
> 
> Khoa, who reported this bug, stated that this made a big difference in
> performance in their environment, which I believe involves GPFS on the
> server. He didn't pass along any hard numbers so I can't quantify the
> gain, but it stands to reason that clustered filesystems might suffer
> more contention issues when issuing a commit over the whole file.
> 
> Reported-by: Khoa Huynh <khoa@us.ibm.com>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/nfs/direct.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 1940f1a..33f2be7 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -77,6 +77,7 @@ struct nfs_direct_req {
>  	/* completion state */
>  	atomic_t		io_count;	/* i/os we're waiting for */
>  	spinlock_t		lock;		/* protect completion state */
> +	loff_t			pos;		/* offset into file */
>  	ssize_t			count,		/* bytes actually processed */
>  				error;		/* any reported error */
>  	struct completion	completion;	/* wait for i/o completion */
> @@ -584,8 +585,8 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
>  	data->cred = msg.rpc_cred;
>  
>  	data->args.fh = NFS_FH(data->inode);
> -	data->args.offset = 0;
> -	data->args.count = 0;
> +	data->args.offset = dreq->pos;
> +	data->args.count = dreq->count;
>  	data->args.context = dreq->ctx;
>  	data->args.lock_context = dreq->l_ctx;
>  	data->res.count = 0;
> @@ -877,6 +878,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
>  		sync = NFS_FILE_SYNC;
>  
>  	dreq->inode = inode;
> +	dreq->pos = pos;
>  	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
>  	dreq->l_ctx = nfs_get_lock_context(dreq->ctx);
>  	if (dreq->l_ctx == NULL)

Khoa found that he made a mistake when testing this originally, and any
benefit that the patch provides seems to be negligible. I still think
it's safe and reasonable to only issue a commit for the range that was
written, but there doesn't seem to be any compelling need for this
patch right now.

Trond, do you have an opinion here? Should we go ahead and commit this
patch or something like it, or leave well-enough alone?
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] nfs: only do COMMIT for range written with direct I/O
  2011-12-05 17:11 ` Jeff Layton
@ 2011-12-05 18:06   ` Myklebust, Trond
  2011-12-13 15:02     ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Myklebust, Trond @ 2011-12-05 18:06 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs, khoa

> -----Original Message-----
> From: Jeff Layton [mailto:jlayton@redhat.com]
> Sent: Monday, December 05, 2011 12:11 PM
> To: Jeff Layton
> Cc: Myklebust, Trond; linux-nfs@vger.kernel.org; khoa@us.ibm.com
> Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct
I/O
> 
> On Wed, 30 Nov 2011 09:07:54 -0500
> Jeff Layton <jlayton@redhat.com> wrote:
> 
> > When given a range to write with unstable writes, the current code
> > always does a COMMIT of the entire file afterward. This is
potentially
> > expensive on some servers and unnecessary. Instead, just do a COMMIT
> > for the offset and count that was written.
> >
> > Khoa, who reported this bug, stated that this made a big difference
in
> > performance in their environment, which I believe involves GPFS on
the
> > server. He didn't pass along any hard numbers so I can't quantify
the
> > gain, but it stands to reason that clustered filesystems might
suffer
> > more contention issues when issuing a commit over the whole file.
> >
> > Reported-by: Khoa Huynh <khoa@us.ibm.com>
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> 
> Khoa found that he made a mistake when testing this originally, and
any
> benefit that the patch provides seems to be negligible. I still think
it's safe
> and reasonable to only issue a commit for the range that was written,
but
> there doesn't seem to be any compelling need for this patch right now.
> 
> Trond, do you have an opinion here? Should we go ahead and commit this
> patch or something like it, or leave well-enough alone?

I'd prefer to wait until I see a tangible benefit. I know that recent
kernels do have support for a COMMIT range on the Linux kernel server
side, so maybe it is just a question of shooting up an ext4 or XFS based
server and running a few tests with a large O_DIRECT writer on one
client, and a smaller O_DIRECT writer on another...

Cheers
  Trond

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfs: only do COMMIT for range written with direct I/O
  2011-12-05 18:06   ` Myklebust, Trond
@ 2011-12-13 15:02     ` Jeff Layton
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2011-12-13 15:02 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: linux-nfs, khoa

On Mon, 5 Dec 2011 10:06:49 -0800
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> > -----Original Message-----
> > From: Jeff Layton [mailto:jlayton@redhat.com]
> > Sent: Monday, December 05, 2011 12:11 PM
> > To: Jeff Layton
> > Cc: Myklebust, Trond; linux-nfs@vger.kernel.org; khoa@us.ibm.com
> > Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct
> I/O
> > 
> > On Wed, 30 Nov 2011 09:07:54 -0500
> > Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > > When given a range to write with unstable writes, the current code
> > > always does a COMMIT of the entire file afterward. This is
> potentially
> > > expensive on some servers and unnecessary. Instead, just do a COMMIT
> > > for the offset and count that was written.
> > >
> > > Khoa, who reported this bug, stated that this made a big difference
> in
> > > performance in their environment, which I believe involves GPFS on
> the
> > > server. He didn't pass along any hard numbers so I can't quantify
> the
> > > gain, but it stands to reason that clustered filesystems might
> suffer
> > > more contention issues when issuing a commit over the whole file.
> > >
> > > Reported-by: Khoa Huynh <khoa@us.ibm.com>
> > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > 
> > Khoa found that he made a mistake when testing this originally, and
> any
> > benefit that the patch provides seems to be negligible. I still think
> it's safe
> > and reasonable to only issue a commit for the range that was written,
> but
> > there doesn't seem to be any compelling need for this patch right now.
> > 
> > Trond, do you have an opinion here? Should we go ahead and commit this
> > patch or something like it, or leave well-enough alone?
> 
> I'd prefer to wait until I see a tangible benefit. I know that recent
> kernels do have support for a COMMIT range on the Linux kernel server
> side, so maybe it is just a question of shooting up an ext4 or XFS based
> server and running a few tests with a large O_DIRECT writer on one
> client, and a smaller O_DIRECT writer on another...
> 

Khoa went back and did his tests and they confirm that there was no
tangible benefit in his environment. I did some minimal testing on
other rigs too, but also didn't see any clear benefit.

Of course, this is highly server-dependent, so there may be places
where this matters more, but for now I don't know of any. At this
point, I suggest we drop this patch for now. If we can note some
tangible benefit later, we can resurrect it then.

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-12-13 15:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-30 14:07 [PATCH] nfs: only do COMMIT for range written with direct I/O Jeff Layton
2011-11-30 14:56 ` Malahal Naineni
2011-11-30 19:25   ` Jeff Layton
2011-12-05 17:11 ` Jeff Layton
2011-12-05 18:06   ` Myklebust, Trond
2011-12-13 15:02     ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.