All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fix nfsd stable write implementation
@ 2012-10-26 21:06 ` J. Bruce Fields
  0 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2012-10-26 21:06 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Peter Staubach, J. Bruce Fields

From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Peter pointed out to me that the nfs server is implementing stable
writes by setting the O_SYNC flag.  I can't see why we couldn't write
and then sync instead, but I don't know this stuff as well as I should;
does the following look reasonable to people?

--b.

J. Bruce Fields (2):
  nfsd: assume writeable exportabled filesystems have f_sync
  nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes

 fs/nfsd/vfs.c |   26 ++++++--------------------
 1 file changed, 6 insertions(+), 20 deletions(-)

-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 0/2] fix nfsd stable write implementation
@ 2012-10-26 21:06 ` J. Bruce Fields
  0 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2012-10-26 21:06 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel; +Cc: Peter Staubach, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

Peter pointed out to me that the nfs server is implementing stable
writes by setting the O_SYNC flag.  I can't see why we couldn't write
and then sync instead, but I don't know this stuff as well as I should;
does the following look reasonable to people?

--b.

J. Bruce Fields (2):
  nfsd: assume writeable exportabled filesystems have f_sync
  nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes

 fs/nfsd/vfs.c |   26 ++++++--------------------
 1 file changed, 6 insertions(+), 20 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] nfsd: assume writeable exportabled filesystems have f_sync
  2012-10-26 21:06 ` J. Bruce Fields
@ 2012-10-26 21:06     ` J. Bruce Fields
  -1 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2012-10-26 21:06 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Peter Staubach, J. Bruce Fields

From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

I don't really see how you could claim to support nfsd and not support
fsync somehow.

And in practice a quick look through the exportable filesystems suggests
the only ones without an ->fsync are read-only (efs, isofs, squashfs) or
in-memory (shmem).

Also, performing a write and then returning an error if the sync fails
(as we would do here in the wgather case) seems unhelpful to clients.

Also remove an incorrect comment.

Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/nfsd/vfs.c |   13 -------------
 1 file changed, 13 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c120b48..ed3eb59 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1020,21 +1020,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	inode = dentry->d_inode;
 	exp   = fhp->fh_export;
 
-	/*
-	 * Request sync writes if
-	 *  -	the sync export option has been set, or
-	 *  -	the client requested O_SYNC behavior (NFSv3 feature).
-	 *  -   The file system doesn't support fsync().
-	 * When NFSv2 gathered writes have been configured for this volume,
-	 * flushing the data to disk is handled separately below.
-	 */
 	use_wgather = (rqstp->rq_vers == 2) && EX_WGATHER(exp);
 
-	if (!file->f_op->fsync) {/* COMMIT3 cannot work */
-	       stable = 2;
-	       *stablep = 2; /* FILE_SYNC */
-	}
-
 	if (!EX_ISSYNC(exp))
 		stable = 0;
 	if (stable && !use_wgather) {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 1/2] nfsd: assume writeable exportabled filesystems have f_sync
@ 2012-10-26 21:06     ` J. Bruce Fields
  0 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2012-10-26 21:06 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel; +Cc: Peter Staubach, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

I don't really see how you could claim to support nfsd and not support
fsync somehow.

And in practice a quick look through the exportable filesystems suggests
the only ones without an ->fsync are read-only (efs, isofs, squashfs) or
in-memory (shmem).

Also, performing a write and then returning an error if the sync fails
(as we would do here in the wgather case) seems unhelpful to clients.

Also remove an incorrect comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/vfs.c |   13 -------------
 1 file changed, 13 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c120b48..ed3eb59 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1020,21 +1020,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	inode = dentry->d_inode;
 	exp   = fhp->fh_export;
 
-	/*
-	 * Request sync writes if
-	 *  -	the sync export option has been set, or
-	 *  -	the client requested O_SYNC behavior (NFSv3 feature).
-	 *  -   The file system doesn't support fsync().
-	 * When NFSv2 gathered writes have been configured for this volume,
-	 * flushing the data to disk is handled separately below.
-	 */
 	use_wgather = (rqstp->rq_vers == 2) && EX_WGATHER(exp);
 
-	if (!file->f_op->fsync) {/* COMMIT3 cannot work */
-	       stable = 2;
-	       *stablep = 2; /* FILE_SYNC */
-	}
-
 	if (!EX_ISSYNC(exp))
 		stable = 0;
 	if (stable && !use_wgather) {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes
  2012-10-26 21:06 ` J. Bruce Fields
@ 2012-10-26 21:06     ` J. Bruce Fields
  -1 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2012-10-26 21:06 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
  Cc: Peter Staubach, J. Bruce Fields

From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

NFSv4 shares the same struct file across multiple writes.  (And we'd
like NFSv2 and NFSv3 to do that as well some day.)

So setting O_SYNC on the struct file as a way to request a synchronous
write doesn't work.

Instead, do a vfs_fsync_range() in that case.

Reported-by: Peter Staubach <pstaubach-83r9SdEf25FBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/nfsd/vfs.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ed3eb59..b584205 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1024,11 +1024,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 
 	if (!EX_ISSYNC(exp))
 		stable = 0;
-	if (stable && !use_wgather) {
-		spin_lock(&file->f_lock);
-		file->f_flags |= O_SYNC;
-		spin_unlock(&file->f_lock);
-	}
 
 	/* Write the data. */
 	oldfs = get_fs(); set_fs(KERNEL_DS);
@@ -1044,8 +1039,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	if (inode->i_mode & (S_ISUID | S_ISGID))
 		kill_suid(dentry);
 
-	if (stable && use_wgather)
-		host_err = wait_for_concurrent_writes(file);
+	if (stable) {
+		if (use_wgather)
+			host_err = wait_for_concurrent_writes(file);
+		else
+			host_err = vfs_fsync_range(file, offset, offset+*cnt, 0);
+	}
 
 out_nfserr:
 	dprintk("nfsd: write complete host_err=%d\n", host_err);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes
@ 2012-10-26 21:06     ` J. Bruce Fields
  0 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2012-10-26 21:06 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel; +Cc: Peter Staubach, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

NFSv4 shares the same struct file across multiple writes.  (And we'd
like NFSv2 and NFSv3 to do that as well some day.)

So setting O_SYNC on the struct file as a way to request a synchronous
write doesn't work.

Instead, do a vfs_fsync_range() in that case.

Reported-by: Peter Staubach <pstaubach@exagrid.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/vfs.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ed3eb59..b584205 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1024,11 +1024,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 
 	if (!EX_ISSYNC(exp))
 		stable = 0;
-	if (stable && !use_wgather) {
-		spin_lock(&file->f_lock);
-		file->f_flags |= O_SYNC;
-		spin_unlock(&file->f_lock);
-	}
 
 	/* Write the data. */
 	oldfs = get_fs(); set_fs(KERNEL_DS);
@@ -1044,8 +1039,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	if (inode->i_mode & (S_ISUID | S_ISGID))
 		kill_suid(dentry);
 
-	if (stable && use_wgather)
-		host_err = wait_for_concurrent_writes(file);
+	if (stable) {
+		if (use_wgather)
+			host_err = wait_for_concurrent_writes(file);
+		else
+			host_err = vfs_fsync_range(file, offset, offset+*cnt, 0);
+	}
 
 out_nfserr:
 	dprintk("nfsd: write complete host_err=%d\n", host_err);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] fix nfsd stable write implementation
  2012-10-26 21:06 ` J. Bruce Fields
@ 2012-10-29 23:28     ` NeilBrown
  -1 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2012-10-29 23:28 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Peter Staubach

[-- Attachment #1: Type: text/plain, Size: 1467 bytes --]

On Fri, 26 Oct 2012 17:06:55 -0400 "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> Peter pointed out to me that the nfs server is implementing stable
> writes by setting the O_SYNC flag.  I can't see why we couldn't write
> and then sync instead, but I don't know this stuff as well as I should;
> does the following look reasonable to people?

Bruce changed the code to implement stable writes by calling
vfs_fsync_range().  I can't see why we couldn't use O_SYNC instead.

It seems like you are making a change just for the sake of making a change.
Is there some reason that you think a separate 'sync' is more efficient than
using O_SYNC ?

As a general principle, I think it is best to give the file system as much
information as possible to that it can make whatever optimisation decisions
that it wants to.

Setting O_SYNC gives the filesystem more information than not, because it
allows it to change the behaviour of the 'write' request... though I don't
know if any filesystem actually uses the information.

Why the change?

NeilBrown

> 
> --b.
> 
> J. Bruce Fields (2):
>   nfsd: assume writeable exportabled filesystems have f_sync
>   nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes
> 
>  fs/nfsd/vfs.c |   26 ++++++--------------------
>  1 file changed, 6 insertions(+), 20 deletions(-)
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] fix nfsd stable write implementation
@ 2012-10-29 23:28     ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2012-10-29 23:28 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-fsdevel, Peter Staubach

[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]

On Fri, 26 Oct 2012 17:06:55 -0400 "J. Bruce Fields" <bfields@redhat.com>
wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> Peter pointed out to me that the nfs server is implementing stable
> writes by setting the O_SYNC flag.  I can't see why we couldn't write
> and then sync instead, but I don't know this stuff as well as I should;
> does the following look reasonable to people?

Bruce changed the code to implement stable writes by calling
vfs_fsync_range().  I can't see why we couldn't use O_SYNC instead.

It seems like you are making a change just for the sake of making a change.
Is there some reason that you think a separate 'sync' is more efficient than
using O_SYNC ?

As a general principle, I think it is best to give the file system as much
information as possible to that it can make whatever optimisation decisions
that it wants to.

Setting O_SYNC gives the filesystem more information than not, because it
allows it to change the behaviour of the 'write' request... though I don't
know if any filesystem actually uses the information.

Why the change?

NeilBrown

> 
> --b.
> 
> J. Bruce Fields (2):
>   nfsd: assume writeable exportabled filesystems have f_sync
>   nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes
> 
>  fs/nfsd/vfs.c |   26 ++++++--------------------
>  1 file changed, 6 insertions(+), 20 deletions(-)
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] fix nfsd stable write implementation
  2012-10-29 23:28     ` NeilBrown
  (?)
@ 2012-10-30 14:07     ` J. Bruce Fields
       [not found]       ` <20121030140725.GC24618-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  -1 siblings, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2012-10-30 14:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, linux-nfs, linux-fsdevel, Peter Staubach

On Tue, Oct 30, 2012 at 10:28:33AM +1100, NeilBrown wrote:
> On Fri, 26 Oct 2012 17:06:55 -0400 "J. Bruce Fields" <bfields@redhat.com>
> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > Peter pointed out to me that the nfs server is implementing stable
> > writes by setting the O_SYNC flag.  I can't see why we couldn't write
> > and then sync instead, but I don't know this stuff as well as I should;
> > does the following look reasonable to people?
> 
> Bruce changed the code to implement stable writes by calling
> vfs_fsync_range().  I can't see why we couldn't use O_SYNC instead.
> 
> It seems like you are making a change just for the sake of making a change.
> Is there some reason that you think a separate 'sync' is more efficient than
> using O_SYNC ?

Oh, sorry, see the changelog on the second patch: the problem is that
the struct file can be shared across multiple writes in the NFSv4 case,
so a single stable write could make all subsequent writes synchronous.

(And some day people would like filehandle caching for v2/v3, in which
case we'll run into the same problem.)

> As a general principle, I think it is best to give the file system as much
> information as possible to that it can make whatever optimisation decisions
> that it wants to.
> 
> Setting O_SYNC gives the filesystem more information than not, because it
> allows it to change the behaviour of the 'write' request... though I don't
> know if any filesystem actually uses the information.

I'm not sure how to figure out if that's a real problem or not.

--b.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] fix nfsd stable write implementation
  2012-10-30 14:07     ` J. Bruce Fields
@ 2012-10-30 20:30           ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2012-10-30 20:30 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: J. Bruce Fields, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Peter Staubach

[-- Attachment #1: Type: text/plain, Size: 2036 bytes --]

On Tue, 30 Oct 2012 10:07:25 -0400 "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
wrote:

> On Tue, Oct 30, 2012 at 10:28:33AM +1100, NeilBrown wrote:
> > On Fri, 26 Oct 2012 17:06:55 -0400 "J. Bruce Fields" <bfields-H+wXaHxf7aJhl2p70BpVqQ@public.gmane.orgm>
> > wrote:
> > 
> > > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > 
> > > Peter pointed out to me that the nfs server is implementing stable
> > > writes by setting the O_SYNC flag.  I can't see why we couldn't write
> > > and then sync instead, but I don't know this stuff as well as I should;
> > > does the following look reasonable to people?
> > 
> > Bruce changed the code to implement stable writes by calling
> > vfs_fsync_range().  I can't see why we couldn't use O_SYNC instead.
> > 
> > It seems like you are making a change just for the sake of making a change.
> > Is there some reason that you think a separate 'sync' is more efficient than
> > using O_SYNC ?
> 
> Oh, sorry, see the changelog on the second patch: the problem is that
> the struct file can be shared across multiple writes in the NFSv4 case,
> so a single stable write could make all subsequent writes synchronous.
> 
> (And some day people would like filehandle caching for v2/v3, in which
> case we'll run into the same problem.)

Ahh, I missed that.
Makes sense then, thanks.

> 
> > As a general principle, I think it is best to give the file system as much
> > information as possible to that it can make whatever optimisation decisions
> > that it wants to.
> > 
> > Setting O_SYNC gives the filesystem more information than not, because it
> > allows it to change the behaviour of the 'write' request... though I don't
> > know if any filesystem actually uses the information.
> 
> I'm not sure how to figure out if that's a real problem or not.

Search all filesystems for special handling of O_SYNC or O_DSYNC?

I strongly suspect that it is not a real problem.

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] fix nfsd stable write implementation
@ 2012-10-30 20:30           ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2012-10-30 20:30 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: J. Bruce Fields, linux-nfs, linux-fsdevel, Peter Staubach

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

On Tue, 30 Oct 2012 10:07:25 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Tue, Oct 30, 2012 at 10:28:33AM +1100, NeilBrown wrote:
> > On Fri, 26 Oct 2012 17:06:55 -0400 "J. Bruce Fields" <bfields@redhat.com>
> > wrote:
> > 
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > Peter pointed out to me that the nfs server is implementing stable
> > > writes by setting the O_SYNC flag.  I can't see why we couldn't write
> > > and then sync instead, but I don't know this stuff as well as I should;
> > > does the following look reasonable to people?
> > 
> > Bruce changed the code to implement stable writes by calling
> > vfs_fsync_range().  I can't see why we couldn't use O_SYNC instead.
> > 
> > It seems like you are making a change just for the sake of making a change.
> > Is there some reason that you think a separate 'sync' is more efficient than
> > using O_SYNC ?
> 
> Oh, sorry, see the changelog on the second patch: the problem is that
> the struct file can be shared across multiple writes in the NFSv4 case,
> so a single stable write could make all subsequent writes synchronous.
> 
> (And some day people would like filehandle caching for v2/v3, in which
> case we'll run into the same problem.)

Ahh, I missed that.
Makes sense then, thanks.

> 
> > As a general principle, I think it is best to give the file system as much
> > information as possible to that it can make whatever optimisation decisions
> > that it wants to.
> > 
> > Setting O_SYNC gives the filesystem more information than not, because it
> > allows it to change the behaviour of the 'write' request... though I don't
> > know if any filesystem actually uses the information.
> 
> I'm not sure how to figure out if that's a real problem or not.

Search all filesystems for special handling of O_SYNC or O_DSYNC?

I strongly suspect that it is not a real problem.

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] fix nfsd stable write implementation
  2012-10-30 20:30           ` NeilBrown
  (?)
@ 2012-11-08  0:20           ` J. Bruce Fields
  -1 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2012-11-08  0:20 UTC (permalink / raw)
  To: NeilBrown; +Cc: J. Bruce Fields, linux-nfs, linux-fsdevel, Peter Staubach

On Wed, Oct 31, 2012 at 07:30:27AM +1100, NeilBrown wrote:
> On Tue, 30 Oct 2012 10:07:25 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
> 
> > On Tue, Oct 30, 2012 at 10:28:33AM +1100, NeilBrown wrote:
> > > On Fri, 26 Oct 2012 17:06:55 -0400 "J. Bruce Fields" <bfields@redhat.com>
> > > wrote:
> > > 
> > > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > > 
> > > > Peter pointed out to me that the nfs server is implementing stable
> > > > writes by setting the O_SYNC flag.  I can't see why we couldn't write
> > > > and then sync instead, but I don't know this stuff as well as I should;
> > > > does the following look reasonable to people?
> > > 
> > > Bruce changed the code to implement stable writes by calling
> > > vfs_fsync_range().  I can't see why we couldn't use O_SYNC instead.
> > > 
> > > It seems like you are making a change just for the sake of making a change.
> > > Is there some reason that you think a separate 'sync' is more efficient than
> > > using O_SYNC ?
> > 
> > Oh, sorry, see the changelog on the second patch: the problem is that
> > the struct file can be shared across multiple writes in the NFSv4 case,
> > so a single stable write could make all subsequent writes synchronous.
> > 
> > (And some day people would like filehandle caching for v2/v3, in which
> > case we'll run into the same problem.)
> 
> Ahh, I missed that.
> Makes sense then, thanks.
> 
> > 
> > > As a general principle, I think it is best to give the file system as much
> > > information as possible to that it can make whatever optimisation decisions
> > > that it wants to.
> > > 
> > > Setting O_SYNC gives the filesystem more information than not, because it
> > > allows it to change the behaviour of the 'write' request... though I don't
> > > know if any filesystem actually uses the information.
> > 
> > I'm not sure how to figure out if that's a real problem or not.
> 
> Search all filesystems for special handling of O_SYNC or O_DSYNC?
> 
> I strongly suspect that it is not a real problem.

OK, committing for 3.8.

--b.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-11-08  0:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-26 21:06 [PATCH 0/2] fix nfsd stable write implementation J. Bruce Fields
2012-10-26 21:06 ` J. Bruce Fields
     [not found] ` <1351285617-20450-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-26 21:06   ` [PATCH 1/2] nfsd: assume writeable exportabled filesystems have f_sync J. Bruce Fields
2012-10-26 21:06     ` J. Bruce Fields
2012-10-26 21:06   ` [PATCH 2/2] nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes J. Bruce Fields
2012-10-26 21:06     ` J. Bruce Fields
2012-10-29 23:28   ` [PATCH 0/2] fix nfsd stable write implementation NeilBrown
2012-10-29 23:28     ` NeilBrown
2012-10-30 14:07     ` J. Bruce Fields
     [not found]       ` <20121030140725.GC24618-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2012-10-30 20:30         ` NeilBrown
2012-10-30 20:30           ` NeilBrown
2012-11-08  0:20           ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.