All of lore.kernel.org
 help / color / mirror / Atom feed
* fallocate vs O_(D)SYNC
@ 2011-11-16  8:42 ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16  8:42 UTC (permalink / raw)
  To: linux-btrfs, linux-ext4, mfasheh, jlbec, cluster-devel

It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
make sure the size update transaction made it to disk.

Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
operation (it adds new blocks that return zeroes) that seems like a
fairly nasty surprise for O_SYNC users.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16  8:42 ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16  8:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
make sure the size update transaction made it to disk.

Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
operation (it adds new blocks that return zeroes) that seems like a
fairly nasty surprise for O_SYNC users.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16  8:42 ` [Cluster-devel] " Christoph Hellwig
@ 2011-11-16  9:43   ` Steven Whitehouse
  -1 siblings, 0 replies; 27+ messages in thread
From: Steven Whitehouse @ 2011-11-16  9:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, linux-ext4, mfasheh, jlbec, cluster-devel

Hi,

On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> make sure the size update transaction made it to disk.
> 
> Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> operation (it adds new blocks that return zeroes) that seems like a
> fairly nasty surprise for O_SYNC users.
> 


In GFS2 we zero out the data blocks as we go (since our metadata doesn't
allow us to mark blocks as zeroed at alloc time) and also because we are
mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
on our rindex system file in order to ensure that there is always enough
space to expand a filesystem.

So there is no danger of having non-zeroed blocks appearing later, as
that is done before the metadata change.

Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
call, so that fsync should pick that up and ensure that the metadata has
been written back. So we should thus have both data and metadata stable
on disk.

Do you have some evidence that this is not happening?

Steve.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16  9:43   ` Steven Whitehouse
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Whitehouse @ 2011-11-16  9:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> make sure the size update transaction made it to disk.
> 
> Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> operation (it adds new blocks that return zeroes) that seems like a
> fairly nasty surprise for O_SYNC users.
> 


In GFS2 we zero out the data blocks as we go (since our metadata doesn't
allow us to mark blocks as zeroed at alloc time) and also because we are
mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
on our rindex system file in order to ensure that there is always enough
space to expand a filesystem.

So there is no danger of having non-zeroed blocks appearing later, as
that is done before the metadata change.

Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
call, so that fsync should pick that up and ensure that the metadata has
been written back. So we should thus have both data and metadata stable
on disk.

Do you have some evidence that this is not happening?

Steve.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16  9:43   ` Steven Whitehouse
@ 2011-11-16 10:54     ` Jan Kara
  -1 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2011-11-16 10:54 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Christoph Hellwig, linux-btrfs, linux-ext4, mfasheh, jlbec,
	cluster-devel

  Hello,

On Wed 16-11-11 09:43:08, Steven Whitehouse wrote:
> On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> > It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> > make sure the size update transaction made it to disk.
> > 
> > Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> > operation (it adds new blocks that return zeroes) that seems like a
> > fairly nasty surprise for O_SYNC users.
> 
> In GFS2 we zero out the data blocks as we go (since our metadata doesn't
> allow us to mark blocks as zeroed at alloc time) and also because we are
> mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
> on our rindex system file in order to ensure that there is always enough
> space to expand a filesystem.
> 
> So there is no danger of having non-zeroed blocks appearing later, as
> that is done before the metadata change.
> 
> Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
> call, so that fsync should pick that up and ensure that the metadata has
> been written back. So we should thus have both data and metadata stable
> on disk.
> 
> Do you have some evidence that this is not happening?
  Yeah, only that nobody calls that fsync() automatically if the fd is
O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
most filesystems? That would match how we treat O_SYNC for other operations
as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
with this.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16 10:54     ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2011-11-16 10:54 UTC (permalink / raw)
  To: cluster-devel.redhat.com

  Hello,

On Wed 16-11-11 09:43:08, Steven Whitehouse wrote:
> On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> > It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> > make sure the size update transaction made it to disk.
> > 
> > Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> > operation (it adds new blocks that return zeroes) that seems like a
> > fairly nasty surprise for O_SYNC users.
> 
> In GFS2 we zero out the data blocks as we go (since our metadata doesn't
> allow us to mark blocks as zeroed at alloc time) and also because we are
> mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
> on our rindex system file in order to ensure that there is always enough
> space to expand a filesystem.
> 
> So there is no danger of having non-zeroed blocks appearing later, as
> that is done before the metadata change.
> 
> Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
> call, so that fsync should pick that up and ensure that the metadata has
> been written back. So we should thus have both data and metadata stable
> on disk.
> 
> Do you have some evidence that this is not happening?
  Yeah, only that nobody calls that fsync() automatically if the fd is
O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
most filesystems? That would match how we treat O_SYNC for other operations
as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
with this.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 10:54     ` Jan Kara
@ 2011-11-16 11:20       ` Steven Whitehouse
  -1 siblings, 0 replies; 27+ messages in thread
From: Steven Whitehouse @ 2011-11-16 11:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-btrfs, linux-ext4, mfasheh, jlbec,
	cluster-devel

Hi,

On Wed, 2011-11-16 at 11:54 +0100, Jan Kara wrote:
> Hello,
> 
> On Wed 16-11-11 09:43:08, Steven Whitehouse wrote:
> > On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> > > It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> > > make sure the size update transaction made it to disk.
> > > 
> > > Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> > > operation (it adds new blocks that return zeroes) that seems like a
> > > fairly nasty surprise for O_SYNC users.
> > 
> > In GFS2 we zero out the data blocks as we go (since our metadata doesn't
> > allow us to mark blocks as zeroed at alloc time) and also because we are
> > mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
> > on our rindex system file in order to ensure that there is always enough
> > space to expand a filesystem.
> > 
> > So there is no danger of having non-zeroed blocks appearing later, as
> > that is done before the metadata change.
> > 
> > Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
> > call, so that fsync should pick that up and ensure that the metadata has
> > been written back. So we should thus have both data and metadata stable
> > on disk.
> > 
> > Do you have some evidence that this is not happening?
>   Yeah, only that nobody calls that fsync() automatically if the fd is
> O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> most filesystems? That would match how we treat O_SYNC for other operations
> as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> with this.
> 
> 								Honza

Ah, I see now. Sorry, I missed the original point. So that would just be
a VFS addition to check the O_(D)SYNC flag as you suggest. I've no
objections to that, it makes sense to me,

Steve.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16 11:20       ` Steven Whitehouse
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Whitehouse @ 2011-11-16 11:20 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On Wed, 2011-11-16 at 11:54 +0100, Jan Kara wrote:
> Hello,
> 
> On Wed 16-11-11 09:43:08, Steven Whitehouse wrote:
> > On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> > > It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> > > make sure the size update transaction made it to disk.
> > > 
> > > Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> > > operation (it adds new blocks that return zeroes) that seems like a
> > > fairly nasty surprise for O_SYNC users.
> > 
> > In GFS2 we zero out the data blocks as we go (since our metadata doesn't
> > allow us to mark blocks as zeroed at alloc time) and also because we are
> > mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
> > on our rindex system file in order to ensure that there is always enough
> > space to expand a filesystem.
> > 
> > So there is no danger of having non-zeroed blocks appearing later, as
> > that is done before the metadata change.
> > 
> > Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
> > call, so that fsync should pick that up and ensure that the metadata has
> > been written back. So we should thus have both data and metadata stable
> > on disk.
> > 
> > Do you have some evidence that this is not happening?
>   Yeah, only that nobody calls that fsync() automatically if the fd is
> O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> most filesystems? That would match how we treat O_SYNC for other operations
> as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> with this.
> 
> 								Honza

Ah, I see now. Sorry, I missed the original point. So that would just be
a VFS addition to check the O_(D)SYNC flag as you suggest. I've no
objections to that, it makes sense to me,

Steve.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: fallocate vs O_(D)SYNC
  2011-11-16  8:42 ` [Cluster-devel] " Christoph Hellwig
  (?)
  (?)
@ 2011-11-16 11:28 ` Zheng Liu
  -1 siblings, 0 replies; 27+ messages in thread
From: Zheng Liu @ 2011-11-16 11:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-btrfs, linux-ext4, mfasheh, jlbec, cluster-devel, jack,
	swhiteho, tytso

On Wed, Nov 16, 2011 at 03:42:56AM -0500, Christoph Hellwig wrote:
> It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> make sure the size update transaction made it to disk.
> 
> Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> operation (it adds new blocks that return zeroes) that seems like a
> fairly nasty surprise for O_SYNC users.

Hi all,

This patch should be fix this problem in ext4.

From: Zheng Liu <wenqing.lz@taobao.com>

Make sure the transaction to be commited if O_(D)SYNC flag is set in
ext4_fallocate().

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/extents.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..f47e3ad 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4356,6 +4356,8 @@ retry:
                        ret = PTR_ERR(handle);
                        break;
                }
+               if (file->f_flags & O_SYNC)
+                       ext4_handle_sync(handle);
                ret = ext4_map_blocks(handle, inode, &map, flags);
                if (ret <= 0) {
 #ifdef EXT4FS_DEBUG
-- 
1.7.4.1


> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 10:54     ` Jan Kara
@ 2011-11-16 12:45       ` Christoph Hellwig
  -1 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16 12:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: Steven Whitehouse, Christoph Hellwig, linux-btrfs, linux-ext4,
	mfasheh, jlbec, cluster-devel

On Wed, Nov 16, 2011 at 11:54:13AM +0100, Jan Kara wrote:
>   Yeah, only that nobody calls that fsync() automatically if the fd is
> O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> most filesystems? That would match how we treat O_SYNC for other operations
> as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> with this.

This would work fine with XFS and be equivalent to what it does for
O_DSYNC now.  But I'd rather see every filesystem do the right thing
and make sure the update actually is on disk when doing O_(D)SYNC
operations.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16 12:45       ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16 12:45 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Nov 16, 2011 at 11:54:13AM +0100, Jan Kara wrote:
>   Yeah, only that nobody calls that fsync() automatically if the fd is
> O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> most filesystems? That would match how we treat O_SYNC for other operations
> as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> with this.

This would work fine with XFS and be equivalent to what it does for
O_DSYNC now.  But I'd rather see every filesystem do the right thing
and make sure the update actually is on disk when doing O_(D)SYNC
operations.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 12:45       ` Christoph Hellwig
@ 2011-11-16 13:39         ` Jan Kara
  -1 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2011-11-16 13:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Steven Whitehouse, linux-btrfs, linux-ext4, mfasheh,
	jlbec, cluster-devel

On Wed 16-11-11 07:45:50, Christoph Hellwig wrote:
> On Wed, Nov 16, 2011 at 11:54:13AM +0100, Jan Kara wrote:
> >   Yeah, only that nobody calls that fsync() automatically if the fd is
> > O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> > fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> > most filesystems? That would match how we treat O_SYNC for other operations
> > as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> > with this.
> 
> This would work fine with XFS and be equivalent to what it does for
> O_DSYNC now.  But I'd rather see every filesystem do the right thing
> and make sure the update actually is on disk when doing O_(D)SYNC
> operations.
  OK, I don't really have a strong opinion here. Are you afraid that just
calling fsync() need not be enough to push all updates fallocate did to
disk?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16 13:39         ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2011-11-16 13:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed 16-11-11 07:45:50, Christoph Hellwig wrote:
> On Wed, Nov 16, 2011 at 11:54:13AM +0100, Jan Kara wrote:
> >   Yeah, only that nobody calls that fsync() automatically if the fd is
> > O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> > fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> > most filesystems? That would match how we treat O_SYNC for other operations
> > as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> > with this.
> 
> This would work fine with XFS and be equivalent to what it does for
> O_DSYNC now.  But I'd rather see every filesystem do the right thing
> and make sure the update actually is on disk when doing O_(D)SYNC
> operations.
  OK, I don't really have a strong opinion here. Are you afraid that just
calling fsync() need not be enough to push all updates fallocate did to
disk?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 13:39         ` Jan Kara
@ 2011-11-16 13:42           ` Christoph Hellwig
  -1 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16 13:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Steven Whitehouse, linux-btrfs, linux-ext4,
	mfasheh, jlbec, cluster-devel

On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > This would work fine with XFS and be equivalent to what it does for
> > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > and make sure the update actually is on disk when doing O_(D)SYNC
> > operations.
>   OK, I don't really have a strong opinion here. Are you afraid that just
> calling fsync() need not be enough to push all updates fallocate did to
> disk?

No, the point is that you should not have to call fsync when doing
O_SYNC I/O.  That's the whole point of it.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16 13:42           ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16 13:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > This would work fine with XFS and be equivalent to what it does for
> > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > and make sure the update actually is on disk when doing O_(D)SYNC
> > operations.
>   OK, I don't really have a strong opinion here. Are you afraid that just
> calling fsync() need not be enough to push all updates fallocate did to
> disk?

No, the point is that you should not have to call fsync when doing
O_SYNC I/O.  That's the whole point of it.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 13:42           ` Christoph Hellwig
@ 2011-11-16 15:57             ` Jan Kara
  -1 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2011-11-16 15:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Steven Whitehouse, linux-btrfs, linux-ext4, mfasheh,
	jlbec, cluster-devel

On Wed 16-11-11 08:42:34, Christoph Hellwig wrote:
> On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > > This would work fine with XFS and be equivalent to what it does for
> > > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > > and make sure the update actually is on disk when doing O_(D)SYNC
> > > operations.
> >   OK, I don't really have a strong opinion here. Are you afraid that just
> > calling fsync() need not be enough to push all updates fallocate did to
> > disk?
> 
> No, the point is that you should not have to call fsync when doing
> O_SYNC I/O.  That's the whole point of it.
  I agree with you that userspace shouldn't have to call fsync. What I
meant is that sys_fallocate() or do_fallocate() can call
generic_write_sync(file, pos, len), and that would be completely
transparent to userspace.

									Honza

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16 15:57             ` Jan Kara
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2011-11-16 15:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed 16-11-11 08:42:34, Christoph Hellwig wrote:
> On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > > This would work fine with XFS and be equivalent to what it does for
> > > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > > and make sure the update actually is on disk when doing O_(D)SYNC
> > > operations.
> >   OK, I don't really have a strong opinion here. Are you afraid that just
> > calling fsync() need not be enough to push all updates fallocate did to
> > disk?
> 
> No, the point is that you should not have to call fsync when doing
> O_SYNC I/O.  That's the whole point of it.
  I agree with you that userspace shouldn't have to call fsync. What I
meant is that sys_fallocate() or do_fallocate() can call
generic_write_sync(file, pos, len), and that would be completely
transparent to userspace.

									Honza



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 15:57             ` Jan Kara
@ 2011-11-16 16:16               ` Christoph Hellwig
  -1 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16 16:16 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Steven Whitehouse, linux-btrfs, linux-ext4,
	mfasheh, jlbec, cluster-devel

On Wed, Nov 16, 2011 at 04:57:55PM +0100, Jan Kara wrote:
>   I agree with you that userspace shouldn't have to call fsync. What I
> meant is that sys_fallocate() or do_fallocate() can call
> generic_write_sync(file, pos, len), and that would be completely
> transparent to userspace.

That's different from how everything else in the I/O path works.
If filessystem want to use it, that's fine, but I suspect most could
do it more efficiently.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-16 16:16               ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-11-16 16:16 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Nov 16, 2011 at 04:57:55PM +0100, Jan Kara wrote:
>   I agree with you that userspace shouldn't have to call fsync. What I
> meant is that sys_fallocate() or do_fallocate() can call
> generic_write_sync(file, pos, len), and that would be completely
> transparent to userspace.

That's different from how everything else in the I/O path works.
If filessystem want to use it, that's fine, but I suspect most could
do it more efficiently.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 15:57             ` Jan Kara
  (?)
  (?)
@ 2011-11-16 16:18             ` Chris Mason
  2011-11-16 19:35               ` Mark Fasheh
  -1 siblings, 1 reply; 27+ messages in thread
From: Chris Mason @ 2011-11-16 16:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Steven Whitehouse, linux-btrfs, linux-ext4,
	mfasheh, jlbec, cluster-devel

On Wed, Nov 16, 2011 at 04:57:55PM +0100, Jan Kara wrote:
> On Wed 16-11-11 08:42:34, Christoph Hellwig wrote:
> > On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > > > This would work fine with XFS and be equivalent to what it does for
> > > > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > > > and make sure the update actually is on disk when doing O_(D)SYNC
> > > > operations.
> > >   OK, I don't really have a strong opinion here. Are you afraid that just
> > > calling fsync() need not be enough to push all updates fallocate did to
> > > disk?
> > 
> > No, the point is that you should not have to call fsync when doing
> > O_SYNC I/O.  That's the whole point of it.
>   I agree with you that userspace shouldn't have to call fsync. What I
> meant is that sys_fallocate() or do_fallocate() can call
> generic_write_sync(file, pos, len), and that would be completely
> transparent to userspace.

We should do it per FS though, I'll patch up btrfs.

-chris

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 16:18             ` Chris Mason
@ 2011-11-16 19:35               ` Mark Fasheh
  2011-11-16 20:03                 ` Mark Fasheh
  2011-11-16 20:03                 ` Mark Fasheh
  0 siblings, 2 replies; 27+ messages in thread
From: Mark Fasheh @ 2011-11-16 19:35 UTC (permalink / raw)
  To: Chris Mason, Jan Kara, Christoph Hellwig, Steven Whitehouse,
	linux-btrfs, linux-ext4

On Wed, Nov 16, 2011 at 11:18:06AM -0500, Chris Mason wrote:
> On Wed, Nov 16, 2011 at 04:57:55PM +0100, Jan Kara wrote:
> > On Wed 16-11-11 08:42:34, Christoph Hellwig wrote:
> > > On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > > > > This would work fine with XFS and be equivalent to what it does for
> > > > > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > > > > and make sure the update actually is on disk when doing O_(D)SYNC
> > > > > operations.
> > > >   OK, I don't really have a strong opinion here. Are you afraid that just
> > > > calling fsync() need not be enough to push all updates fallocate did to
> > > > disk?
> > > 
> > > No, the point is that you should not have to call fsync when doing
> > > O_SYNC I/O.  That's the whole point of it.
> >   I agree with you that userspace shouldn't have to call fsync. What I
> > meant is that sys_fallocate() or do_fallocate() can call
> > generic_write_sync(file, pos, len), and that would be completely
> > transparent to userspace.
> 
> We should do it per FS though, I'll patch up btrfs.

I agree about doing it per FS. Ocfs2 just needs a one-liner to mark the
journal transaction as synchronous.
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 19:35               ` Mark Fasheh
@ 2011-11-16 20:03                 ` Mark Fasheh
  2011-11-17 10:16                     ` [Cluster-devel] " Joel Becker
  2011-11-16 20:03                 ` Mark Fasheh
  1 sibling, 1 reply; 27+ messages in thread
From: Mark Fasheh @ 2011-11-16 20:03 UTC (permalink / raw)
  To: Chris Mason, Jan Kara, Christoph Hellwig, Steven Whitehouse,
	linux-btrfs, linux-ext4

On Wed, Nov 16, 2011 at 11:35:40AM -0800, Mark Fasheh wrote:
> > We should do it per FS though, I'll patch up btrfs.
> 
> I agree about doing it per FS. Ocfs2 just needs a one-liner to mark the
> journal transaction as synchronous.

Joel, here's an (untested) patch to fix this in Ocfs2.
	--Mark

--
Mark Fasheh

From: Mark Fasheh <mfasheh@suse.com>

ocfs2: honor O_(D)SYNC flag in fallocate

We need to sync the transaction which updates i_size if the file is marked
as needing sync semantics.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/ocfs2/file.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index de4ea1a..cac00b4 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1950,6 +1950,9 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 	if (ret < 0)
 		mlog_errno(ret);
 
+	if (file->f_flags & O_SYNC)
+		handle->h_sync = 1;
+
 	ocfs2_commit_trans(osb, handle);
 
 out_inode_unlock:
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Cluster-devel] fallocate vs O_(D)SYNC
  2011-11-16 19:35               ` Mark Fasheh
  2011-11-16 20:03                 ` Mark Fasheh
@ 2011-11-16 20:03                 ` Mark Fasheh
  1 sibling, 0 replies; 27+ messages in thread
From: Mark Fasheh @ 2011-11-16 20:03 UTC (permalink / raw)
  To: Chris Mason, Jan Kara, Christoph Hellwig, Steven Whitehouse,
	linux-btrfs, linux-ext4

On Wed, Nov 16, 2011 at 11:35:40AM -0800, Mark Fasheh wrote:
> > We should do it per FS though, I'll patch up btrfs.
> 
> I agree about doing it per FS. Ocfs2 just needs a one-liner to mark the
> journal transaction as synchronous.

Joel, here's an (untested) patch to fix this in Ocfs2.
	--Mark

--
Mark Fasheh

From: Mark Fasheh <mfasheh@suse.com>

ocfs2: honor O_(D)SYNC flag in fallocate

We need to sync the transaction which updates i_size if the file is marked
as needing sync semantics.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/ocfs2/file.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index de4ea1a..cac00b4 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1950,6 +1950,9 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 	if (ret < 0)
 		mlog_errno(ret);
 
+	if (file->f_flags & O_SYNC)
+		handle->h_sync = 1;
+
 	ocfs2_commit_trans(osb, handle);
 
 out_inode_unlock:
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: fallocate vs O_(D)SYNC
  2011-11-16 20:03                 ` Mark Fasheh
@ 2011-11-17 10:16                     ` Joel Becker
  0 siblings, 0 replies; 27+ messages in thread
From: Joel Becker @ 2011-11-17 10:16 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: cluster-devel, Jan Kara, Chris Mason, linux-ext4, linux-btrfs

On Wed, Nov 16, 2011 at 12:03:10PM -0800, Mark Fasheh wrote:
> On Wed, Nov 16, 2011 at 11:35:40AM -0800, Mark Fasheh wrote:
> > > We should do it per FS though, I'll patch up btrfs.
> > 
> > I agree about doing it per FS. Ocfs2 just needs a one-liner to mark the
> > journal transaction as synchronous.
> 
> Joel, here's an (untested) patch to fix this in Ocfs2.
> 	--Mark
> 
> --
> Mark Fasheh
> 
> From: Mark Fasheh <mfasheh@suse.com>
> 
> ocfs2: honor O_(D)SYNC flag in fallocate
> 
> We need to sync the transaction which updates i_size if the file is marked
> as needing sync semantics.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>

	Makes sense to me.  

Joel

> ---
>  fs/ocfs2/file.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index de4ea1a..cac00b4 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1950,6 +1950,9 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  	if (ret < 0)
>  		mlog_errno(ret);
>  
> +	if (file->f_flags & O_SYNC)
> +		handle->h_sync = 1;
> +
>  	ocfs2_commit_trans(osb, handle);
>  
>  out_inode_unlock:
> -- 
> 1.7.6
> 

-- 

Life's Little Instruction Book #347

	"Never waste the oppourtunity to tell someone you love them."

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-17 10:16                     ` Joel Becker
  0 siblings, 0 replies; 27+ messages in thread
From: Joel Becker @ 2011-11-17 10:16 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Nov 16, 2011 at 12:03:10PM -0800, Mark Fasheh wrote:
> On Wed, Nov 16, 2011 at 11:35:40AM -0800, Mark Fasheh wrote:
> > > We should do it per FS though, I'll patch up btrfs.
> > 
> > I agree about doing it per FS. Ocfs2 just needs a one-liner to mark the
> > journal transaction as synchronous.
> 
> Joel, here's an (untested) patch to fix this in Ocfs2.
> 	--Mark
> 
> --
> Mark Fasheh
> 
> From: Mark Fasheh <mfasheh@suse.com>
> 
> ocfs2: honor O_(D)SYNC flag in fallocate
> 
> We need to sync the transaction which updates i_size if the file is marked
> as needing sync semantics.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>

	Makes sense to me.  

Joel

> ---
>  fs/ocfs2/file.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index de4ea1a..cac00b4 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1950,6 +1950,9 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  	if (ret < 0)
>  		mlog_errno(ret);
>  
> +	if (file->f_flags & O_SYNC)
> +		handle->h_sync = 1;
> +
>  	ocfs2_commit_trans(osb, handle);
>  
>  out_inode_unlock:
> -- 
> 1.7.6
> 

-- 

Life's Little Instruction Book #347

	"Never waste the oppourtunity to tell someone you love them."

			http://www.jlbec.org/
			jlbec at evilplan.org



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: fallocate vs O_(D)SYNC
  2011-11-16 15:57             ` Jan Kara
@ 2011-11-18 12:09               ` Steven Whitehouse
  -1 siblings, 0 replies; 27+ messages in thread
From: Steven Whitehouse @ 2011-11-18 12:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: cluster-devel, linux-ext4, linux-btrfs

Hi,

Here is what I'm planning for GFS2:


Add sync of metadata after fallocate for O_SYNC files to ensure that we
meet expectations for everything being on disk in this case.
Unfortunately, the offset and len parameters are modified during the
course of the fallocate function, so I've had to add a couple of new
variables to call generic_write_sync() at the end.

I know that potentially this will sync data as well within the range,
but I think that is a fairly harmless side-effect overall, since we
would not normally expect there to be any dirty data within the range in
question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Benjamin Marzinski <bmarzins@redhat.com>

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 6336bc6..9b6c6ac 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -752,6 +752,8 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 	loff_t bytes, max_bytes;
 	struct gfs2_alloc *al;
 	int error;
+	const loff_t pos = offset;
+	const loff_t count = len;
 	loff_t bsize_mask = ~((loff_t)sdp->sd_sb.sb_bsize - 1);
 	loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift;
 	loff_t max_chunk_size = UINT_MAX & bsize_mask;
@@ -834,6 +836,9 @@ retry:
 		gfs2_quota_unlock(ip);
 		gfs2_alloc_put(ip);
 	}
+
+	if (error == 0)
+		error = generic_write_sync(file, pos, count);
 	goto out_unlock;
 
 out_trans_fail:

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Cluster-devel] fallocate vs O_(D)SYNC
@ 2011-11-18 12:09               ` Steven Whitehouse
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Whitehouse @ 2011-11-18 12:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

Here is what I'm planning for GFS2:


Add sync of metadata after fallocate for O_SYNC files to ensure that we
meet expectations for everything being on disk in this case.
Unfortunately, the offset and len parameters are modified during the
course of the fallocate function, so I've had to add a couple of new
variables to call generic_write_sync() at the end.

I know that potentially this will sync data as well within the range,
but I think that is a fairly harmless side-effect overall, since we
would not normally expect there to be any dirty data within the range in
question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Benjamin Marzinski <bmarzins@redhat.com>

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 6336bc6..9b6c6ac 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -752,6 +752,8 @@ static long gfs2_fallocate(struct file *file, int mode, loff_t offset,
 	loff_t bytes, max_bytes;
 	struct gfs2_alloc *al;
 	int error;
+	const loff_t pos = offset;
+	const loff_t count = len;
 	loff_t bsize_mask = ~((loff_t)sdp->sd_sb.sb_bsize - 1);
 	loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift;
 	loff_t max_chunk_size = UINT_MAX & bsize_mask;
@@ -834,6 +836,9 @@ retry:
 		gfs2_quota_unlock(ip);
 		gfs2_alloc_put(ip);
 	}
+
+	if (error == 0)
+		error = generic_write_sync(file, pos, count);
 	goto out_unlock;
 
 out_trans_fail:




^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-11-18 12:09 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-16  8:42 fallocate vs O_(D)SYNC Christoph Hellwig
2011-11-16  8:42 ` [Cluster-devel] " Christoph Hellwig
2011-11-16  9:43 ` Steven Whitehouse
2011-11-16  9:43   ` Steven Whitehouse
2011-11-16 10:54   ` Jan Kara
2011-11-16 10:54     ` Jan Kara
2011-11-16 11:20     ` Steven Whitehouse
2011-11-16 11:20       ` Steven Whitehouse
2011-11-16 12:45     ` Christoph Hellwig
2011-11-16 12:45       ` Christoph Hellwig
2011-11-16 13:39       ` Jan Kara
2011-11-16 13:39         ` Jan Kara
2011-11-16 13:42         ` Christoph Hellwig
2011-11-16 13:42           ` Christoph Hellwig
2011-11-16 15:57           ` Jan Kara
2011-11-16 15:57             ` Jan Kara
2011-11-16 16:16             ` Christoph Hellwig
2011-11-16 16:16               ` Christoph Hellwig
2011-11-16 16:18             ` Chris Mason
2011-11-16 19:35               ` Mark Fasheh
2011-11-16 20:03                 ` Mark Fasheh
2011-11-17 10:16                   ` Joel Becker
2011-11-17 10:16                     ` [Cluster-devel] " Joel Becker
2011-11-16 20:03                 ` Mark Fasheh
2011-11-18 12:09             ` Steven Whitehouse
2011-11-18 12:09               ` [Cluster-devel] " Steven Whitehouse
2011-11-16 11:28 ` Zheng Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.