* [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4)
From: Jan Kara @ 2009-09-02 13:59 UTC
  To: linux-fsdevel; +Cc: LKML, hch

  Hi,

  here is a new version of my O_SYNC cleanup patches. There are two minor
changes since last time: XFS now uses filemap_write_and_wait(), as Christoph
asked, and the generic syncing function has been renamed to vfs_fsync_range().
The flags controlling it are gone; it now takes just a single datasync flag,
like the other syncing functions.
  If no one objects, I think the patch series is ready to be put in linux-next.

								Honza

* [PATCH 01/16] vfs: Introduce filemap_fdatawait_range
From: Jan Kara @ 2009-09-02 13:59 UTC
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara

This simple helper saves some filesystems the conversion from byte offsets
to page numbers and also makes the fdata* interface more complete.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/fs.h |    2 ++
 mm/filemap.c       |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 73e9b64..dde471a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2076,6 +2076,8 @@ extern int write_inode_now(struct inode *, int);
 extern int filemap_fdatawrite(struct address_space *);
 extern int filemap_flush(struct address_space *);
 extern int filemap_fdatawait(struct address_space *);
+extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
+				   loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index ccea3b6..65b2e50 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -307,6 +307,26 @@ int wait_on_page_writeback_range(struct address_space *mapping,
 }
 
 /**
+ * filemap_fdatawait_range - wait for all under-writeback pages to complete in a given range
+ * @mapping: address space structure to wait for
+ * @start:	offset in bytes where the range starts
+ * @end:	offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the given address space
+ * in the given range and wait for all of them.
+ *
+ * This is just a simple wrapper so that callers don't have to convert offsets
+ * to page indexes themselves
+ */
+int filemap_fdatawait_range(struct address_space *mapping, loff_t start,
+			    loff_t end)
+{
+	return wait_on_page_writeback_range(mapping, start >> PAGE_CACHE_SHIFT,
+					    end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL(filemap_fdatawait_range);
+
+/**
  * sync_page_range - write and wait on all pages in the passed range
  * @inode:	target inode
  * @mapping:	target address_space
-- 
1.6.0.2
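
For readers following along, the conversion this helper hides can be
sketched in user space. This is only an illustration under stated
assumptions, not kernel code: PAGE_SHIFT_ASSUMED (4 KiB pages) and
byte_range_to_pages() are hypothetical names, and unsigned 64-bit offsets
stand in for loff_t. Both ends of the range are inclusive, as in the
kernel-doc comment of the patch.

```c
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages (the kernel uses PAGE_CACHE_SHIFT). */
#define PAGE_SHIFT_ASSUMED 12

struct page_range {
	uint64_t first;	/* index of the first page touched */
	uint64_t last;	/* index of the last page touched (inclusive) */
};

/*
 * Hypothetical helper mirroring the shifts in filemap_fdatawait_range():
 * both byte offsets are inclusive, so each is simply shifted down.
 */
static struct page_range byte_range_to_pages(uint64_t start, uint64_t end)
{
	struct page_range r;

	r.first = start >> PAGE_SHIFT_ASSUMED;
	r.last = end >> PAGE_SHIFT_ASSUMED;
	return r;
}
```

With these assumptions, bytes 4095 through 4096 straddle a page boundary
and map to pages 0 and 1, while a single byte at offset 8192 maps to page 2
only.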


* [PATCH 02/16] vfs: Export __generic_file_aio_write() and add some comments
From: Jan Kara @ 2009-09-02 13:59 UTC
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, ocfs2-devel, Joel Becker

Rename __generic_file_aio_write_nolock() to __generic_file_aio_write(), add
comments to write helpers explaining how they should be used and export
__generic_file_aio_write() since it will be used by some filesystems.

CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/fs.h |    2 +
 mm/filemap.c       |   57 +++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index dde471a..4f4e7f6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2195,6 +2195,8 @@ extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
 extern int file_read_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size);
 int generic_write_checks(struct file *file, loff_t *pos, size_t *count, int isblk);
 extern ssize_t generic_file_aio_read(struct kiocb *, const struct iovec *, unsigned long, loff_t);
+extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long,
+		loff_t *);
 extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t);
 extern ssize_t generic_file_aio_write_nolock(struct kiocb *, const struct iovec *,
 		unsigned long, loff_t);
diff --git a/mm/filemap.c b/mm/filemap.c
index 65b2e50..554a396 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2368,9 +2368,27 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 }
 EXPORT_SYMBOL(generic_file_buffered_write);
 
-static ssize_t
-__generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
-				unsigned long nr_segs, loff_t *ppos)
+/**
+ * __generic_file_aio_write - write data to a file
+ * @iocb:	IO state structure (file, offset, etc.)
+ * @iov:	vector with data to write
+ * @nr_segs:	number of segments in the vector
+ * @ppos:	position where to write
+ *
+ * This function does all the work needed for actually writing data to a
+ * file. It does all basic checks, removes SUID from the file, updates
+ * modification times and calls proper subroutines depending on whether we
+ * do direct IO or a standard buffered write.
+ *
+ * It expects i_mutex to be grabbed unless we work on a block device or similar
+ * object which does not need locking at all.
+ *
+ * This function does *not* take care of syncing data in case of O_SYNC write.
+ * A caller has to handle it. This is mainly due to the fact that we want to
+ * avoid syncing under i_mutex.
+ */
+ssize_t __generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
+				 unsigned long nr_segs, loff_t *ppos)
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space * mapping = file->f_mapping;
@@ -2467,7 +2485,23 @@ out:
 	current->backing_dev_info = NULL;
 	return written ? written : err;
 }
+EXPORT_SYMBOL(__generic_file_aio_write);
+
 
+/**
+ * generic_file_aio_write_nolock - write data, usually to a device
+ * @iocb:	IO state structure
+ * @iov:	vector with data to write
+ * @nr_segs:	number of segments in the vector
+ * @pos:	position in file where to write
+ *
+ * This is a wrapper around __generic_file_aio_write() which takes care of
+ * syncing the file in case of O_SYNC file. It does not take i_mutex for the
+ * write itself but may do so during syncing. It is meant for users like block
+ * devices which do not need i_mutex during write. If your filesystem needs to
+ * do a write but already holds i_mutex, use __generic_file_aio_write()
+ * directly and then sync the file like generic_file_aio_write().
+ */
 ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
 		const struct iovec *iov, unsigned long nr_segs, loff_t pos)
 {
@@ -2478,8 +2512,7 @@ ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
 
 	BUG_ON(iocb->ki_pos != pos);
 
-	ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
-			&iocb->ki_pos);
+	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 
 	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
 		ssize_t err;
@@ -2492,6 +2525,17 @@ ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
 }
 EXPORT_SYMBOL(generic_file_aio_write_nolock);
 
+/**
+ * generic_file_aio_write - write data to a file
+ * @iocb:	IO state structure
+ * @iov:	vector with data to write
+ * @nr_segs:	number of segments in the vector
+ * @pos:	position in file where to write
+ *
+ * This is a wrapper around __generic_file_aio_write() to be used by most
+ * filesystems. It takes care of syncing the file in case of O_SYNC file
+ * and acquires i_mutex as needed.
+ */
 ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		unsigned long nr_segs, loff_t pos)
 {
@@ -2503,8 +2547,7 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	BUG_ON(iocb->ki_pos != pos);
 
 	mutex_lock(&inode->i_mutex);
-	ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
-			&iocb->ki_pos);
+	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
 
 	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
-- 
1.6.0.2
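
The calling convention described in the new comments (write under i_mutex,
sync only after dropping it) can be modelled with a toy user-space sketch.
Everything here is hypothetical: struct toy_inode and the toy_* functions
are stand-ins invented for illustration, a plain flag replaces the real
mutex, and none of this is the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inode; every name here is hypothetical. */
struct toy_inode {
	bool locked;		/* models i_mutex being held */
	bool sync_under_lock;	/* set if a sync ever ran with the lock held */
	size_t dirty;		/* bytes written but not yet synced */
};

/* Models __generic_file_aio_write(): expects the lock, never syncs. */
static long toy_write_locked(struct toy_inode *ino, size_t len)
{
	if (!ino->locked)
		return -1;	/* caller broke the locking rule */
	ino->dirty += len;
	return (long)len;
}

/* Models the sync step, which should run outside the lock. */
static void toy_sync(struct toy_inode *ino)
{
	if (ino->locked)
		ino->sync_under_lock = true;
	ino->dirty = 0;
}

/* Models generic_file_aio_write(): lock, write, unlock, then sync. */
static long toy_write(struct toy_inode *ino, size_t len, bool o_sync)
{
	long ret;

	ino->locked = true;
	ret = toy_write_locked(ino, len);
	ino->locked = false;

	if (ret > 0 && o_sync)
		toy_sync(ino);
	return ret;
}
```

A filesystem that already holds the lock would correspond to calling
toy_write_locked() directly and then doing the toy_sync() step itself after
unlocking, which is the pattern the kernel-doc comment recommends.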


* [PATCH 03/16] vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
From: Jan Kara @ 2009-09-02 13:59 UTC
  To: linux-fsdevel
  Cc: LKML, hch, Jan Kara, ocfs2-devel, Joel Becker, Felix Blyakher, xfs

generic_file_direct_write() and generic_file_buffered_write() called
generic_osync_inode() when they were invoked on an O_SYNC file or IS_SYNC
inode. This is superfluous, since generic_file_aio_write() does the syncing as
well. XFS and OCFS2, which call these functions directly, also handle syncing
themselves. So let's have a single place where syncing happens:
generic_file_aio_write().

We slightly change the behavior for buffered writes by syncing only the range
of the file that was written, but that should be all that is required.

CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c |   35 ++++++-----------------------------
 1 files changed, 6 insertions(+), 29 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 554a396..f863e1d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2187,20 +2187,7 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 		}
 		*ppos = end;
 	}
-
-	/*
-	 * Sync the fs metadata but not the minor inode changes and
-	 * of course not the data as we did direct DMA for the IO.
-	 * i_mutex is held, which protects generic_osync_inode() from
-	 * livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
-	 */
 out:
-	if ((written >= 0 || written == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
-		int err = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (err < 0)
-			written = err;
-	}
 	return written;
 }
 EXPORT_SYMBOL(generic_file_direct_write);
@@ -2332,8 +2319,6 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
-	const struct address_space_operations *a_ops = mapping->a_ops;
-	struct inode *inode = mapping->host;
 	ssize_t status;
 	struct iov_iter i;
 
@@ -2343,16 +2328,6 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	if (likely(status >= 0)) {
 		written += status;
 		*ppos = pos + status;
-
-		/*
-		 * For now, when the user asks for O_SYNC, we'll actually give
-		 * O_DSYNC
-		 */
-		if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
-			if (!a_ops->writepage || !is_sync_kiocb(iocb))
-				status = generic_osync_inode(inode, mapping,
-						OSYNC_METADATA|OSYNC_DATA);
-		}
   	}
 	
 	/*
@@ -2514,11 +2489,12 @@ ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
 
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 
-	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if ((ret > 0 || ret == -EIOCBQUEUED) &&
+	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
 		ssize_t err;
 
 		err = sync_page_range_nolock(inode, mapping, pos, ret);
-		if (err < 0)
+		if (err < 0 && ret > 0)
 			ret = err;
 	}
 	return ret;
@@ -2550,11 +2526,12 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
 
-	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if ((ret > 0 || ret == -EIOCBQUEUED) &&
+	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
 		ssize_t err;
 
 		err = sync_page_range(inode, mapping, pos, ret);
-		if (err < 0)
+		if (err < 0 && ret > 0)
 			ret = err;
 	}
 	return ret;
-- 
1.6.0.2
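
The return-value handling in the last two hunks is easy to misread, so here
is a small user-space model of it. It is a sketch under stated assumptions:
toy_finish_write() and TOY_EIOCBQUEUED are hypothetical names, the sync
result is passed in as a parameter instead of being computed, and the
constant only restates the kernel-internal EIOCBQUEUED value for
illustration.

```c
#include <stdbool.h>

/* EIOCBQUEUED is kernel-internal; the value is restated here only for the sketch. */
#define TOY_EIOCBQUEUED 529

/*
 * Hypothetical model of the tail of generic_file_aio_write() after this patch.
 * ret:      result of the write (bytes written, or a negative error)
 * is_sync:  true for an O_SYNC file or IS_SYNC inode
 * sync_err: what the sync step would return (0 or a negative error)
 */
static long toy_finish_write(long ret, bool is_sync, long sync_err)
{
	if ((ret > 0 || ret == -TOY_EIOCBQUEUED) && is_sync) {
		/* A sync failure replaces ret only for completed writes. */
		if (sync_err < 0 && ret > 0)
			ret = sync_err;
	}
	return ret;
}
```

In this model a queued AIO write keeps its -EIOCBQUEUED result even when
the sync step reports an error, whereas a completed synchronous write
returns the sync error instead of the byte count.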


* [PATCH 03/16] vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
@ 2009-09-02 13:59   ` Jan Kara
  0 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jan Kara, Joel Becker, LKML, xfs, hch, ocfs2-devel

generic_file_direct_write() and generic_file_buffered_write() called
generic_osync_inode() if it was called on O_SYNC file or IS_SYNC inode. But
this is superfluous since generic_file_aio_write() does the syncing as well.
Also XFS and OCFS2 which call these functions directly handle syncing
themselves. So let's have a single place where syncing happens:
generic_file_aio_write().

We slightly change the behavior by syncing only the range of file to which the
write happened for buffered writes but that should be all that is required.

CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c |   35 ++++++-----------------------------
 1 files changed, 6 insertions(+), 29 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 554a396..f863e1d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2187,20 +2187,7 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 		}
 		*ppos = end;
 	}
-
-	/*
-	 * Sync the fs metadata but not the minor inode changes and
-	 * of course not the data as we did direct DMA for the IO.
-	 * i_mutex is held, which protects generic_osync_inode() from
-	 * livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
-	 */
 out:
-	if ((written >= 0 || written == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
-		int err = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (err < 0)
-			written = err;
-	}
 	return written;
 }
 EXPORT_SYMBOL(generic_file_direct_write);
@@ -2332,8 +2319,6 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
-	const struct address_space_operations *a_ops = mapping->a_ops;
-	struct inode *inode = mapping->host;
 	ssize_t status;
 	struct iov_iter i;
 
@@ -2343,16 +2328,6 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	if (likely(status >= 0)) {
 		written += status;
 		*ppos = pos + status;
-
-		/*
-		 * For now, when the user asks for O_SYNC, we'll actually give
-		 * O_DSYNC
-		 */
-		if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
-			if (!a_ops->writepage || !is_sync_kiocb(iocb))
-				status = generic_osync_inode(inode, mapping,
-						OSYNC_METADATA|OSYNC_DATA);
-		}
   	}
 	
 	/*
@@ -2514,11 +2489,12 @@ ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
 
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 
-	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if ((ret > 0 || ret == -EIOCBQUEUED) &&
+	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
 		ssize_t err;
 
 		err = sync_page_range_nolock(inode, mapping, pos, ret);
-		if (err < 0)
+		if (err < 0 && ret > 0)
 			ret = err;
 	}
 	return ret;
@@ -2550,11 +2526,12 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
 
-	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if ((ret > 0 || ret == -EIOCBQUEUED) &&
+	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
 		ssize_t err;
 
 		err = sync_page_range(inode, mapping, pos, ret);
-		if (err < 0)
+		if (err < 0 && ret > 0)
 			ret = err;
 	}
 	return ret;
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 04/16] pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (2 preceding siblings ...)
  2009-09-02 13:59   ` Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 13:59   ` Jan Kara
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, Evgeniy Polyakov

Use the new helper __generic_file_aio_write(). Since the filesystem takes care
of syncing by itself afterwards, no further changes are needed.

CC: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/staging/pohmelfs/inode.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/pohmelfs/inode.c b/drivers/staging/pohmelfs/inode.c
index 7b60579..17801a5 100644
--- a/drivers/staging/pohmelfs/inode.c
+++ b/drivers/staging/pohmelfs/inode.c
@@ -921,7 +921,7 @@ ssize_t pohmelfs_write(struct file *file, const char __user *buf,
 	if (ret)
 		goto err_out_unlock;
 
-	ret = generic_file_aio_write_nolock(&kiocb, &iov, 1, pos);
+	ret = __generic_file_aio_write(&kiocb, &iov, 1, &kiocb.ki_pos);
 	*ppos = kiocb.ki_pos;
 
 	mutex_unlock(&inode->i_mutex);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 05/16] ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
  2009-09-02 13:59 ` [PATCH 01/16] vfs: Introduce filemap_fdatawait_range Jan Kara
@ 2009-09-02 13:59   ` Jan Kara
  2009-09-02 13:59   ` Jan Kara
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, ocfs2-devel

Use the new helper. We have to submit data pages ourselves in the case of an
O_SYNC write because __generic_file_aio_write() does not do it for us. OCFS2
developers might think about moving the sync out of i_mutex, which seems to be
easily possible, but that is out of scope for this patch.
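
As an illustration only, the ordering described above (submit the data pages,
commit the journal if size or cluster count changed, then wait for the pages)
can be modelled in userspace. Everything in this sketch is hypothetical:
submit_range(), commit_journal() and wait_range() are stand-ins that merely
log a letter, not the kernel functions, and reporting the first failure from
any step is an assumption of the model rather than a statement about the
patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace model only. The three helpers below are hypothetical stand-ins
 * for "start writeback", "force journal commit" and "wait for writeback".
 * Each appends one letter to call_log so the call order can be inspected.
 */
static char call_log[8];
static int fail_step;	/* 0 = none, 1 = submit, 2 = commit, 3 = wait */

static int record(char step, int step_no)
{
	size_t n = strlen(call_log);

	assert(n + 1 < sizeof(call_log));
	call_log[n] = step;
	call_log[n + 1] = '\0';
	return fail_step == step_no ? -5 : 0;	/* -5 models an I/O error */
}

static int submit_range(long long start, long long end)
{
	assert(start <= end);
	return record('S', 1);
}

static int commit_journal(void)
{
	return record('C', 2);
}

static int wait_range(long long start, long long end)
{
	assert(start <= end);
	return record('W', 3);
}

/* Returns count on success or the first negative error (model assumption). */
long long model_osync_write(long long pos, long long count,
			    int metadata_changed)
{
	long long end = pos + count - 1;	/* inclusive last byte */
	int ret;

	call_log[0] = '\0';
	ret = submit_range(pos, end);
	if (!ret && metadata_changed)
		ret = commit_journal();
	if (!ret)
		ret = wait_range(pos, end);
	return ret < 0 ? ret : count;
}
```

In this model a write that changes metadata logs "SCW", one that does not
logs "SW", and a failing step stops the sequence before the later steps run.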

CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/file.c |   22 ++++++++++++----------
 1 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index aa501d3..6002273 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1871,8 +1871,7 @@ relock:
 			goto out_dio;
 		}
 	} else {
-		written = generic_file_aio_write_nolock(iocb, iov, nr_segs,
-							*ppos);
+		written = __generic_file_aio_write(iocb, iov, nr_segs, ppos);
 	}
 
 out_dio:
@@ -1880,18 +1879,21 @@ out_dio:
 	BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
 
 	if ((file->f_flags & O_SYNC && !direct_io) || IS_SYNC(inode)) {
-		/*
-		 * The generic write paths have handled getting data
-		 * to disk, but since we don't make use of the dirty
-		 * inode list, a manual journal commit is necessary
-		 * here.
-		 */
-		if (old_size != i_size_read(inode) ||
-		    old_clusters != OCFS2_I(inode)->ip_clusters) {
+		ret = filemap_fdatawrite_range(file->f_mapping, pos,
+					       pos + count - 1);
+		if (ret < 0)
+			written = ret;
+
+		if (!ret && (old_size != i_size_read(inode) ||
+		    old_clusters != OCFS2_I(inode)->ip_clusters)) {
 			ret = jbd2_journal_force_commit(osb->journal->j_journal);
 			if (ret < 0)
 				written = ret;
 		}
+
+		if (!ret)
+			ret = filemap_fdatawait_range(file->f_mapping, pos,
+						      pos + count - 1);
 	}
 
 	/* 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 06/16] vfs: Rename generic_file_aio_write_nolock
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (4 preceding siblings ...)
  2009-09-02 13:59   ` Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 21:47   ` Christoph Hellwig
  2009-09-02 13:59   ` Jan Kara
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara

generic_file_aio_write_nolock() is now used only by block devices and the raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
device_aio_write().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/char/raw.c |    2 +-
 fs/block_dev.c     |    2 +-
 include/linux/fs.h |    4 ++--
 mm/filemap.c       |    9 ++++-----
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/char/raw.c b/drivers/char/raw.c
index 05f9d18..c43c7a7 100644
--- a/drivers/char/raw.c
+++ b/drivers/char/raw.c
@@ -246,7 +246,7 @@ static const struct file_operations raw_fops = {
 	.read	=	do_sync_read,
 	.aio_read = 	generic_file_aio_read,
 	.write	=	do_sync_write,
-	.aio_write = 	generic_file_aio_write_nolock,
+	.aio_write =	device_aio_write,
 	.open	=	raw_open,
 	.release=	raw_release,
 	.ioctl	=	raw_ioctl,
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 94dfda2..67fc1c9 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1436,7 +1436,7 @@ const struct file_operations def_blk_fops = {
 	.read		= do_sync_read,
 	.write		= do_sync_write,
   	.aio_read	= generic_file_aio_read,
-  	.aio_write	= generic_file_aio_write_nolock,
+	.aio_write	= device_aio_write,
 	.mmap		= generic_file_mmap,
 	.fsync		= block_fsync,
 	.unlocked_ioctl	= block_ioctl,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f4e7f6..bc7f0f1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2198,8 +2198,8 @@ extern ssize_t generic_file_aio_read(struct kiocb *, const struct iovec *, unsig
 extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long,
 		loff_t *);
 extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t);
-extern ssize_t generic_file_aio_write_nolock(struct kiocb *, const struct iovec *,
-		unsigned long, loff_t);
+extern ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
+				unsigned long nr_segs, loff_t pos);
 extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
 		unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
diff --git a/mm/filemap.c b/mm/filemap.c
index f863e1d..3955f7e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2462,9 +2462,8 @@ out:
 }
 EXPORT_SYMBOL(__generic_file_aio_write);
 
-
 /**
- * generic_file_aio_write_nolock - write data, usually to a device
+ * device_aio_write - write data
  * @iocb:	IO state structure
  * @iov:	vector with data to write
  * @nr_segs:	number of segments in the vector
@@ -2477,8 +2476,8 @@ EXPORT_SYMBOL(__generic_file_aio_write);
  * do a write but already holds i_mutex, use __generic_file_aio_write()
  * directly and then sync the file like generic_file_aio_write().
  */
-ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
-		const struct iovec *iov, unsigned long nr_segs, loff_t pos)
+ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
+			 unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
@@ -2499,7 +2498,7 @@ ssize_t generic_file_aio_write_nolock(struct kiocb *iocb,
 	}
 	return ret;
 }
-EXPORT_SYMBOL(generic_file_aio_write_nolock);
+EXPORT_SYMBOL(device_aio_write);
 
 /**
  * generic_file_aio_write - write data to a file
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 07/16] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
  2009-09-02 13:59 ` [PATCH 01/16] vfs: Introduce filemap_fdatawait_range Jan Kara
@ 2009-09-02 13:59   ` Jan Kara
  2009-09-02 13:59   ` Jan Kara
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: LKML, hch, Jan Kara, Evgeniy Polyakov, ocfs2-devel, Joel Becker,
	Felix Blyakher, xfs, Anton Altaparmakov, linux-ntfs-dev,
	OGAWA Hirofumi, linux-ext4, tytso

Introduce a new function for generic inode syncing (vfs_fsync_range) and use
it from the fsync() path. Also introduce a new helper for syncing after a sync
write (generic_write_sync), built on the generic function.
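
For illustration, the byte range such a helper hands to the range-based sync
can be modelled in userspace as below. This is a sketch, not the kernel code:
struct byte_range, write_sync_range() and whole_file_range() are hypothetical
names, and the two booleans stand in for the O_SYNC and IS_SYNC checks. The
end offset is inclusive, matching the @end description in the kerneldoc of
vfs_fsync_range().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model type: the range a sync would cover, if any. */
struct byte_range {
	bool needed;		/* false when the file is not synchronous */
	long long start;	/* first byte to sync */
	long long end;		/* last byte to sync (inclusive) */
};

/* Range to sync after writing count bytes at pos. */
struct byte_range write_sync_range(bool o_sync, bool inode_sync,
				   long long pos, long long count)
{
	struct byte_range r = { false, 0, 0 };

	assert(pos >= 0 && count > 0);
	if (!o_sync && !inode_sync)
		return r;		/* nothing to do for async files */
	r.needed = true;
	r.start = pos;
	r.end = pos + count - 1;	/* inclusive end offset */
	return r;
}

/* A plain fsync covers the whole file: 0 .. LLONG_MAX. */
struct byte_range whole_file_range(void)
{
	struct byte_range r = { true, 0, LLONG_MAX };

	return r;
}
```

For example, a 512-byte synchronous write at offset 4096 yields the range
4096..4607 in this model.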

Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.
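
A rough userspace sketch of how the generic write functions combine the write
result with the sync result, as read from the mm/filemap.c hunks of this
patch: the sync is attempted after a successful or queued write, and a sync
error replaces the result only when bytes were actually written. The names
finish_write() and MODEL_EIOCBQUEUED are hypothetical, and the numeric value
of the constant is a placeholder; only its sign and distinctness matter here.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the kernel's EIOCBQUEUED in this userspace model. */
#define MODEL_EIOCBQUEUED 529

/*
 * ret:         result of the write step (bytes written or negative error)
 * sync_err:    what the sync step would return if it were run
 * sync_called: set to 1 when the sync step would be attempted
 */
long finish_write(long ret, long sync_err, int *sync_called)
{
	assert(sync_called != NULL);
	*sync_called = 0;

	if (ret > 0 || ret == -MODEL_EIOCBQUEUED) {
		*sync_called = 1;
		/* A queued AIO keeps its "queued" status even if sync fails. */
		if (sync_err < 0 && ret > 0)
			ret = sync_err;
	}
	return ret;
}
```

In this model a failed write or a zero-length write never triggers the sync,
and a queued request is returned unchanged whatever the sync reports.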

CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/splice.c        |   22 +++++---------------
 fs/sync.c          |   55 +++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h |    3 ++
 mm/filemap.c       |   18 +++++-----------
 4 files changed, 63 insertions(+), 35 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 73766d2..8190237 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -976,25 +976,15 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 	if (ret > 0) {
 		unsigned long nr_pages;
+		int err;
 
-		*ppos += ret;
 		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 
-		/*
-		 * If file or inode is SYNC and we actually wrote some data,
-		 * sync it.
-		 */
-		if (unlikely((out->f_flags & O_SYNC) || IS_SYNC(inode))) {
-			int err;
-
-			mutex_lock(&inode->i_mutex);
-			err = generic_osync_inode(inode, mapping,
-						  OSYNC_METADATA|OSYNC_DATA);
-			mutex_unlock(&inode->i_mutex);
-
-			if (err)
-				ret = err;
-		}
+		err = generic_write_sync(out, *ppos, ret);
+		if (err)
+			ret = err;
+		else
+			*ppos += ret;
 		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
 
diff --git a/fs/sync.c b/fs/sync.c
index 3422ba6..6fe72e6 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -176,19 +176,23 @@ int file_fsync(struct file *filp, struct dentry *dentry, int datasync)
 }
 
 /**
- * vfs_fsync - perform a fsync or fdatasync on a file
+ * vfs_fsync_range - helper to sync a range of data & metadata to disk
  * @file:		file to sync
  * @dentry:		dentry of @file
- * @data:		only perform a fdatasync operation
+ * @start:		offset in bytes of the beginning of data range to sync
+ * @end:		offset in bytes of the end of data range (inclusive)
+ * @datasync:		perform only datasync
  *
- * Write back data and metadata for @file to disk.  If @datasync is
- * set only metadata needed to access modified file data is written.
+ * Write back data in range @start..@end and metadata for @file to disk.  If
+ * @datasync is set only metadata needed to access modified file data is
+ * written.
  *
  * In case this function is called from nfsd @file may be %NULL and
  * only @dentry is set.  This can only happen when the filesystem
  * implements the export_operations API.
  */
-int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
+int vfs_fsync_range(struct file *file, struct dentry *dentry, loff_t start,
+		    loff_t end, int datasync)
 {
 	const struct file_operations *fop;
 	struct address_space *mapping;
@@ -212,7 +216,7 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 		goto out;
 	}
 
-	ret = filemap_fdatawrite(mapping);
+	ret = filemap_fdatawrite_range(mapping, start, end);
 
 	/*
 	 * We need to protect against concurrent writers, which could cause
@@ -223,12 +227,32 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	if (!ret)
 		ret = err;
 	mutex_unlock(&mapping->host->i_mutex);
-	err = filemap_fdatawait(mapping);
+
+	err = filemap_fdatawait_range(mapping, start, end);
 	if (!ret)
 		ret = err;
 out:
 	return ret;
 }
+EXPORT_SYMBOL(vfs_fsync_range);
+
+/**
+ * vfs_fsync - perform a fsync or fdatasync on a file
+ * @file:		file to sync
+ * @dentry:		dentry of @file
+ * @datasync:		only perform a fdatasync operation
+ *
+ * Write back data and metadata for @file to disk.  If @datasync is
+ * set only metadata needed to access modified file data is written.
+ *
+ * In case this function is called from nfsd @file may be %NULL and
+ * only @dentry is set.  This can only happen when the filesystem
+ * implements the export_operations API.
+ */
+int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
+{
+	return vfs_fsync_range(file, dentry, 0, LLONG_MAX, datasync);
+}
 EXPORT_SYMBOL(vfs_fsync);
 
 static int do_fsync(unsigned int fd, int datasync)
@@ -254,6 +278,23 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
 	return do_fsync(fd, 1);
 }
 
+/**
+ * generic_write_sync - perform syncing after a write if file / inode is sync
+ * @file:	file to which the write happened
+ * @pos:	offset where the write started
+ * @count:	length of the write
+ *
+ * This is just a simple wrapper about our general syncing function.
+ */
+int generic_write_sync(struct file *file, loff_t pos, loff_t count)
+{
+	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
+		return 0;
+	return vfs_fsync_range(file, file->f_path.dentry, pos,
+			       pos + count - 1, 1);
+}
+EXPORT_SYMBOL(generic_write_sync);
+
 /*
  * sys_sync_file_range() permits finely controlled syncing over a segment of
  * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bc7f0f1..18acaec 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2088,7 +2088,10 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 
+extern int vfs_fsync_range(struct file *file, struct dentry *dentry,
+			   loff_t start, loff_t end, int datasync);
 extern int vfs_fsync(struct file *file, struct dentry *dentry, int datasync);
+extern int generic_write_sync(struct file *file, loff_t pos, loff_t count);
 extern void sync_supers(void);
 extern void emergency_sync(void);
 extern void emergency_remount(void);
diff --git a/mm/filemap.c b/mm/filemap.c
index 3955f7e..70988a1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -39,11 +39,10 @@
 /*
  * FIXME: remove all knowledge of the buffer layer from the core VM
  */
-#include <linux/buffer_head.h> /* for generic_osync_inode */
+#include <linux/buffer_head.h> /* for try_to_free_buffers */
 
 #include <asm/mman.h>
 
-
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -2480,19 +2479,16 @@ ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
 			 unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
 	ssize_t ret;
 
 	BUG_ON(iocb->ki_pos != pos);
 
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 
-	if ((ret > 0 || ret == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if (ret > 0 || ret == -EIOCBQUEUED) {
 		ssize_t err;
 
-		err = sync_page_range_nolock(inode, mapping, pos, ret);
+		err = generic_write_sync(file, pos, ret);
 		if (err < 0 && ret > 0)
 			ret = err;
 	}
@@ -2515,8 +2511,7 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
+	struct inode *inode = file->f_mapping->host;
 	ssize_t ret;
 
 	BUG_ON(iocb->ki_pos != pos);
@@ -2525,11 +2520,10 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
 
-	if ((ret > 0 || ret == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if (ret > 0 || ret == -EIOCBQUEUED) {
 		ssize_t err;
 
-		err = sync_page_range(inode, mapping, pos, ret);
+		err = generic_write_sync(file, pos, ret);
 		if (err < 0 && ret > 0)
 			ret = err;
 	}
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 07/16] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
@ 2009-09-02 13:59   ` Jan Kara
  0 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: tytso, linux-ext4, Jan Kara, linux-ntfs-dev, Joel Becker, LKML,
	Anton Altaparmakov, OGAWA Hirofumi, Evgeniy Polyakov, xfs, hch,
	ocfs2-devel

Introduce new function for generic inode syncing (vfs_fsync_range) and use
it from fsync() path. Introduce also new helper for syncing after a sync
write (generic_write_sync) using the generic function.

Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.

CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/splice.c        |   22 +++++---------------
 fs/sync.c          |   55 +++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h |    3 ++
 mm/filemap.c       |   18 +++++-----------
 4 files changed, 63 insertions(+), 35 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 73766d2..8190237 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -976,25 +976,15 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 	if (ret > 0) {
 		unsigned long nr_pages;
+		int err;
 
-		*ppos += ret;
 		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 
-		/*
-		 * If file or inode is SYNC and we actually wrote some data,
-		 * sync it.
-		 */
-		if (unlikely((out->f_flags & O_SYNC) || IS_SYNC(inode))) {
-			int err;
-
-			mutex_lock(&inode->i_mutex);
-			err = generic_osync_inode(inode, mapping,
-						  OSYNC_METADATA|OSYNC_DATA);
-			mutex_unlock(&inode->i_mutex);
-
-			if (err)
-				ret = err;
-		}
+		err = generic_write_sync(out, *ppos, ret);
+		if (err)
+			ret = err;
+		else
+			*ppos += ret;
 		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
 
diff --git a/fs/sync.c b/fs/sync.c
index 3422ba6..6fe72e6 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -176,19 +176,23 @@ int file_fsync(struct file *filp, struct dentry *dentry, int datasync)
 }
 
 /**
- * vfs_fsync - perform a fsync or fdatasync on a file
+ * vfs_fsync_range - helper to sync a range of data & metadata to disk
  * @file:		file to sync
  * @dentry:		dentry of @file
- * @data:		only perform a fdatasync operation
+ * @start:		offset in bytes of the beginning of data range to sync
+ * @end:		offset in bytes of the end of data range (inclusive)
+ * @datasync:		perform only datasync
  *
- * Write back data and metadata for @file to disk.  If @datasync is
- * set only metadata needed to access modified file data is written.
+ * Write back data in range @start..@end and metadata for @file to disk.  If
+ * @datasync is set only metadata needed to access modified file data is
+ * written.
  *
  * In case this function is called from nfsd @file may be %NULL and
  * only @dentry is set.  This can only happen when the filesystem
  * implements the export_operations API.
  */
-int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
+int vfs_fsync_range(struct file *file, struct dentry *dentry, loff_t start,
+		    loff_t end, int datasync)
 {
 	const struct file_operations *fop;
 	struct address_space *mapping;
@@ -212,7 +216,7 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 		goto out;
 	}
 
-	ret = filemap_fdatawrite(mapping);
+	ret = filemap_fdatawrite_range(mapping, start, end);
 
 	/*
 	 * We need to protect against concurrent writers, which could cause
@@ -223,12 +227,32 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	if (!ret)
 		ret = err;
 	mutex_unlock(&mapping->host->i_mutex);
-	err = filemap_fdatawait(mapping);
+
+	err = filemap_fdatawait_range(mapping, start, end);
 	if (!ret)
 		ret = err;
 out:
 	return ret;
 }
+EXPORT_SYMBOL(vfs_fsync_range);
+
+/**
+ * vfs_fsync - perform a fsync or fdatasync on a file
+ * @file:		file to sync
+ * @dentry:		dentry of @file
+ * @datasync:		only perform a fdatasync operation
+ *
+ * Write back data and metadata for @file to disk.  If @datasync is
+ * set only metadata needed to access modified file data is written.
+ *
+ * In case this function is called from nfsd @file may be %NULL and
+ * only @dentry is set.  This can only happen when the filesystem
+ * implements the export_operations API.
+ */
+int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
+{
+	return vfs_fsync_range(file, dentry, 0, LLONG_MAX, datasync);
+}
 EXPORT_SYMBOL(vfs_fsync);
 
 static int do_fsync(unsigned int fd, int datasync)
@@ -254,6 +278,23 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
 	return do_fsync(fd, 1);
 }
 
+/**
+ * generic_write_sync - perform syncing after a write if file / inode is sync
+ * @file:	file to which the write happened
+ * @pos:	offset where the write started
+ * @count:	length of the write
+ *
+ * This is just a simple wrapper about our general syncing function.
+ */
+int generic_write_sync(struct file *file, loff_t pos, loff_t count)
+{
+	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
+		return 0;
+	return vfs_fsync_range(file, file->f_path.dentry, pos,
+			       pos + count - 1, 1);
+}
+EXPORT_SYMBOL(generic_write_sync);
+
 /*
  * sys_sync_file_range() permits finely controlled syncing over a segment of
  * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bc7f0f1..18acaec 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2088,7 +2088,10 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 
+extern int vfs_fsync_range(struct file *file, struct dentry *dentry,
+			   loff_t start, loff_t end, int datasync);
 extern int vfs_fsync(struct file *file, struct dentry *dentry, int datasync);
+extern int generic_write_sync(struct file *file, loff_t pos, loff_t count);
 extern void sync_supers(void);
 extern void emergency_sync(void);
 extern void emergency_remount(void);
diff --git a/mm/filemap.c b/mm/filemap.c
index 3955f7e..70988a1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -39,11 +39,10 @@
 /*
  * FIXME: remove all knowledge of the buffer layer from the core VM
  */
-#include <linux/buffer_head.h> /* for generic_osync_inode */
+#include <linux/buffer_head.h> /* for try_to_free_buffers */
 
 #include <asm/mman.h>
 
-
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -2480,19 +2479,16 @@ ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
 			 unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
 	ssize_t ret;
 
 	BUG_ON(iocb->ki_pos != pos);
 
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 
-	if ((ret > 0 || ret == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if (ret > 0 || ret == -EIOCBQUEUED) {
 		ssize_t err;
 
-		err = sync_page_range_nolock(inode, mapping, pos, ret);
+		err = generic_write_sync(file, pos, ret);
 		if (err < 0 && ret > 0)
 			ret = err;
 	}
@@ -2515,8 +2511,7 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
+	struct inode *inode = file->f_mapping->host;
 	ssize_t ret;
 
 	BUG_ON(iocb->ki_pos != pos);
@@ -2525,11 +2520,10 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
 
-	if ((ret > 0 || ret == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if (ret > 0 || ret == -EIOCBQUEUED) {
 		ssize_t err;
 
-		err = sync_page_range(inode, mapping, pos, ret);
+		err = generic_write_sync(file, pos, ret);
 		if (err < 0 && ret > 0)
 			ret = err;
 	}
-- 
1.6.0.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 08/16] ext2: Update comment about generic_osync_inode
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (6 preceding siblings ...)
  2009-09-02 13:59   ` Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 13:59 ` [PATCH 09/16] ext3: Remove syncing logic from ext3_file_write Jan Kara
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, linux-ext4

We rely on generic_write_sync() now.

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/inode.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index e271303..1c1638f 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -482,7 +482,7 @@ static int ext2_alloc_branch(struct inode *inode,
 		unlock_buffer(bh);
 		mark_buffer_dirty_inode(bh, inode);
 		/* We used to sync bh here if IS_SYNC(inode).
-		 * But we now rely upon generic_osync_inode()
+		 * But we now rely upon generic_write_sync()
 		 * and b_inode_buffers.  But not for directories.
 		 */
 		if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode))
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 09/16] ext3: Remove syncing logic from ext3_file_write
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (7 preceding siblings ...)
  2009-09-02 13:59 ` [PATCH 08/16] ext2: Update comment about generic_osync_inode Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 13:59 ` [PATCH 10/16] ext4: Remove syncing logic from ext4_file_write Jan Kara
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, linux-ext4

Syncing is now properly done by generic_file_aio_write() so no special logic is
needed in ext3.
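
As a rough userspace model of what the generic path now does after a write (not kernel code), the sketch below mirrors the check in generic_write_sync() and the way the callers fold a sync error into the write result. The names model_needs_sync(), model_finish_write() and MODEL_EIOCBQUEUED are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the kernel-internal -EIOCBQUEUED return code. */
#define MODEL_EIOCBQUEUED 529

/*
 * Mirrors the test in generic_write_sync(): a sync is needed only when the
 * file was opened O_SYNC or the inode is marked IS_SYNC. Both inputs are
 * plain booleans in this model.
 */
static bool model_needs_sync(bool file_o_sync, bool inode_is_sync)
{
	return file_o_sync || inode_is_sync;
}

/*
 * Folds the sync result into the write result the way the generic write
 * path does: a sync error replaces a positive byte count, while an earlier
 * write error or a queued AIO result is passed through unchanged.
 */
static ssize_t model_finish_write(ssize_t ret, bool file_o_sync,
				  bool inode_is_sync, int sync_err)
{
	assert(sync_err <= 0);	/* 0 on success, negative errno on failure */

	if (ret > 0 || ret == -MODEL_EIOCBQUEUED) {
		int err = 0;

		if (model_needs_sync(file_o_sync, inode_is_sync))
			err = sync_err;
		if (err < 0 && ret > 0)
			ret = err;
	}
	return ret;
}
```

In this model a filesystem's write method does not need to know about O_SYNC at all, which is why ext3_file_write() can be dropped in favour of generic_file_aio_write().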

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/file.c |   61 +-------------------------------------------------------
 1 files changed, 1 insertions(+), 60 deletions(-)

diff --git a/fs/ext3/file.c b/fs/ext3/file.c
index 5b49704..51fee5f 100644
--- a/fs/ext3/file.c
+++ b/fs/ext3/file.c
@@ -51,71 +51,12 @@ static int ext3_release_file (struct inode * inode, struct file * filp)
 	return 0;
 }
 
-static ssize_t
-ext3_file_write(struct kiocb *iocb, const struct iovec *iov,
-		unsigned long nr_segs, loff_t pos)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_path.dentry->d_inode;
-	ssize_t ret;
-	int err;
-
-	ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
-
-	/*
-	 * Skip flushing if there was an error, or if nothing was written.
-	 */
-	if (ret <= 0)
-		return ret;
-
-	/*
-	 * If the inode is IS_SYNC, or is O_SYNC and we are doing data
-	 * journalling then we need to make sure that we force the transaction
-	 * to disk to keep all metadata uptodate synchronously.
-	 */
-	if (file->f_flags & O_SYNC) {
-		/*
-		 * If we are non-data-journaled, then the dirty data has
-		 * already been flushed to backing store by generic_osync_inode,
-		 * and the inode has been flushed too if there have been any
-		 * modifications other than mere timestamp updates.
-		 *
-		 * Open question --- do we care about flushing timestamps too
-		 * if the inode is IS_SYNC?
-		 */
-		if (!ext3_should_journal_data(inode))
-			return ret;
-
-		goto force_commit;
-	}
-
-	/*
-	 * So we know that there has been no forced data flush.  If the inode
-	 * is marked IS_SYNC, we need to force one ourselves.
-	 */
-	if (!IS_SYNC(inode))
-		return ret;
-
-	/*
-	 * Open question #2 --- should we force data to disk here too?  If we
-	 * don't, the only impact is that data=writeback filesystems won't
-	 * flush data to disk automatically on IS_SYNC, only metadata (but
-	 * historically, that is what ext2 has done.)
-	 */
-
-force_commit:
-	err = ext3_force_commit(inode->i_sb);
-	if (err)
-		return err;
-	return ret;
-}
-
 const struct file_operations ext3_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
 	.write		= do_sync_write,
 	.aio_read	= generic_file_aio_read,
-	.aio_write	= ext3_file_write,
+	.aio_write	= generic_file_aio_write,
 	.unlocked_ioctl	= ext3_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext3_compat_ioctl,
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 10/16] ext4: Remove syncing logic from ext4_file_write
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (8 preceding siblings ...)
  2009-09-02 13:59 ` [PATCH 09/16] ext3: Remove syncing logic from ext3_file_write Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 13:59 ` [PATCH 11/16] ntfs: Use new syncing helpers and update comments Jan Kara
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, linux-ext4, tytso

The syncing is now properly handled by generic_file_aio_write() so
no special ext4 code is needed.
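
For context, this is roughly what the affected code path looks like from userspace: a write to a file opened with O_SYNC, which should return only after the data has been synced. This is a minimal sketch using standard POSIX calls; the helper name write_osync() is hypothetical and nothing here is specific to ext4.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create or truncate @path, open it with O_SYNC and
 * write @len bytes from @buf. Returns the number of bytes written, or -1
 * on error.
 */
static ssize_t write_osync(const char *path, const void *buf, size_t len)
{
	int fd;
	ssize_t n;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
	if (fd < 0)
		return -1;

	/* With O_SYNC the kernel syncs the written range before returning. */
	n = write(fd, buf, len);

	if (close(fd) < 0 && n >= 0)
		n = -1;
	return n;
}
```

A caller sees the same interface as before the series; only the kernel-side path that performs the sync changes.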

CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c |   53 ++---------------------------------------------------
 1 files changed, 2 insertions(+), 51 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 3f1873f..aafe432 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -58,10 +58,7 @@ static ssize_t
 ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 		unsigned long nr_segs, loff_t pos)
 {
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_path.dentry->d_inode;
-	ssize_t ret;
-	int err;
+	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
 
 	/*
 	 * If we have encountered a bitmap-format file, the size limit
@@ -81,53 +78,7 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 		}
 	}
 
-	ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
-	/*
-	 * Skip flushing if there was an error, or if nothing was written.
-	 */
-	if (ret <= 0)
-		return ret;
-
-	/*
-	 * If the inode is IS_SYNC, or is O_SYNC and we are doing data
-	 * journalling then we need to make sure that we force the transaction
-	 * to disk to keep all metadata uptodate synchronously.
-	 */
-	if (file->f_flags & O_SYNC) {
-		/*
-		 * If we are non-data-journaled, then the dirty data has
-		 * already been flushed to backing store by generic_osync_inode,
-		 * and the inode has been flushed too if there have been any
-		 * modifications other than mere timestamp updates.
-		 *
-		 * Open question --- do we care about flushing timestamps too
-		 * if the inode is IS_SYNC?
-		 */
-		if (!ext4_should_journal_data(inode))
-			return ret;
-
-		goto force_commit;
-	}
-
-	/*
-	 * So we know that there has been no forced data flush.  If the inode
-	 * is marked IS_SYNC, we need to force one ourselves.
-	 */
-	if (!IS_SYNC(inode))
-		return ret;
-
-	/*
-	 * Open question #2 --- should we force data to disk here too?  If we
-	 * don't, the only impact is that data=writeback filesystems won't
-	 * flush data to disk automatically on IS_SYNC, only metadata (but
-	 * historically, that is what ext2 has done.)
-	 */
-
-force_commit:
-	err = ext4_force_commit(inode->i_sb);
-	if (err)
-		return err;
-	return ret;
+	return generic_file_aio_write(iocb, iov, nr_segs, pos);
 }
 
 static struct vm_operations_struct ext4_file_vm_ops = {
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 11/16] ntfs: Use new syncing helpers and update comments
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (9 preceding siblings ...)
  2009-09-02 13:59 ` [PATCH 10/16] ext4: Remove syncing logic from ext4_file_write Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 13:59   ` Jan Kara
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, Anton Altaparmakov, linux-ntfs-dev

Use new syncing helpers in .write and .aio_write functions. Also
remove superfluous syncing in ntfs_file_buffered_write() and update
comments about generic_osync_inode().

CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ntfs/file.c |   16 ++++------------
 fs/ntfs/mft.c  |   13 ++++++-------
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 3140a44..4350d49 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2076,14 +2076,6 @@ err_out:
 	*ppos = pos;
 	if (cached_page)
 		page_cache_release(cached_page);
-	/* For now, when the user asks for O_SYNC, we actually give O_DSYNC. */
-	if (likely(!status)) {
-		if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(vi))) {
-			if (!mapping->a_ops->writepage || !is_sync_kiocb(iocb))
-				status = generic_osync_inode(vi, mapping,
-						OSYNC_METADATA|OSYNC_DATA);
-		}
-  	}
 	pagevec_lru_add_file(&lru_pvec);
 	ntfs_debug("Done.  Returning %s (written 0x%lx, status %li).",
 			written ? "written" : "status", (unsigned long)written,
@@ -2145,8 +2137,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	mutex_lock(&inode->i_mutex);
 	ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
-	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
-		int err = sync_page_range(inode, mapping, pos, ret);
+	if (ret > 0) {
+		int err = generic_write_sync(file, pos, ret);
 		if (err < 0)
 			ret = err;
 	}
@@ -2173,8 +2165,8 @@ static ssize_t ntfs_file_writev(struct file *file, const struct iovec *iov,
 	if (ret == -EIOCBQUEUED)
 		ret = wait_on_sync_kiocb(&kiocb);
 	mutex_unlock(&inode->i_mutex);
-	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
-		int err = sync_page_range(inode, mapping, *ppos - ret, ret);
+	if (ret > 0) {
+		int err = generic_write_sync(file, *ppos - ret, ret);
 		if (err < 0)
 			ret = err;
 	}
diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 23bf684..1caa0ef 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -384,13 +384,12 @@ unm_err_out:
  * it is dirty in the inode meta data rather than the data page cache of the
  * inode, and thus there are no data pages that need writing out.  Therefore, a
  * full mark_inode_dirty() is overkill.  A mark_inode_dirty_sync(), on the
- * other hand, is not sufficient, because I_DIRTY_DATASYNC needs to be set to
- * ensure ->write_inode is called from generic_osync_inode() and this needs to
- * happen or the file data would not necessarily hit the device synchronously,
- * even though the vfs inode has the O_SYNC flag set.  Also, I_DIRTY_DATASYNC
- * simply "feels" better than just I_DIRTY_SYNC, since the file data has not
- * actually hit the block device yet, which is not what I_DIRTY_SYNC on its own
- * would suggest.
+ * other hand, is not sufficient, because ->write_inode needs to be called even
+ * in case of fdatasync. This needs to happen or the file data would not
+ * necessarily hit the device synchronously, even though the vfs inode has the
+ * O_SYNC flag set.  Also, I_DIRTY_DATASYNC simply "feels" better than just
+ * I_DIRTY_SYNC, since the file data has not actually hit the block device yet,
+ * which is not what I_DIRTY_SYNC on its own would suggest.
  */
 void __mark_mft_record_dirty(ntfs_inode *ni)
 {
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 12/16] ocfs2: Update syncing after splicing to match generic version
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
  2009-09-02 13:59 ` [PATCH 01/16] vfs: Introduce filemap_fdatawait_range Jan Kara
@ 2009-09-02 13:59   ` Jan Kara
  2009-09-02 13:59   ` Jan Kara
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, ocfs2-devel

Update the ocfs2-specific splicing code to use the generic syncing helper. The
sync no longer happens under rw_lock because generic_write_sync() acquires
i_mutex, which ranks above rw_lock. That should not matter because the standard
fsync path does not hold rw_lock either.

Acked-by: Joel Becker <Joel.Becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ocfs2/file.c |   27 ++++++---------------------
 1 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 6002273..221c5e9 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1993,31 +1993,16 @@ static ssize_t ocfs2_file_splice_write(struct pipe_inode_info *pipe,
 
 	if (ret > 0) {
 		unsigned long nr_pages;
+		int err;
 
-		*ppos += ret;
 		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 
-		/*
-		 * If file or inode is SYNC and we actually wrote some data,
-		 * sync it.
-		 */
-		if (unlikely((out->f_flags & O_SYNC) || IS_SYNC(inode))) {
-			int err;
-
-			mutex_lock(&inode->i_mutex);
-			err = ocfs2_rw_lock(inode, 1);
-			if (err < 0) {
-				mlog_errno(err);
-			} else {
-				err = generic_osync_inode(inode, mapping,
-						  OSYNC_METADATA|OSYNC_DATA);
-				ocfs2_rw_unlock(inode, 1);
-			}
-			mutex_unlock(&inode->i_mutex);
+		err = generic_write_sync(out, *ppos, ret);
+		if (err)
+			ret = err;
+		else
+			*ppos += ret;
 
-			if (err)
-				ret = err;
-		}
 		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 13/16] xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
@ 2009-09-02 13:59   ` Jan Kara
  2009-09-02 13:59   ` Jan Kara
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, Felix Blyakher, xfs

Christoph Hellwig says that it is enough for XFS to call
filemap_write_and_wait_range() instead of sync_page_range() because we do
all the metadata syncing when forcing the log.
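
One detail of the conversion is the calling convention: sync_page_range() took a start offset and a byte count, while filemap_write_and_wait_range() takes a start offset and an inclusive end offset, hence the `pos + ret - 1` in the diff. The userspace sketch below illustrates that arithmetic; the helper names and the 4 KiB page size are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this model: 4 KiB pages (a page shift of 12). */
#define MODEL_PAGE_SHIFT 12

/* Last byte covered by a write of @count bytes at @pos (inclusive end). */
static int64_t range_last_byte(int64_t pos, int64_t count)
{
	assert(pos >= 0 && count > 0);
	return pos + count - 1;
}

/* Index of the page that contains byte offset @off. */
static uint64_t page_index(int64_t off)
{
	assert(off >= 0);
	return (uint64_t)off >> MODEL_PAGE_SHIFT;
}
```

With these helpers a 4096-byte write at offset 0 ends at byte 4095 and stays within page 0; passing `pos + count` as the end instead would also touch page 1.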

CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/xfs/linux-2.6/xfs_lrw.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
index 7078974..fde63a3 100644
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -817,7 +817,8 @@ write_retry:
 		xfs_iunlock(xip, iolock);
 		if (need_i_mutex)
 			mutex_unlock(&inode->i_mutex);
-		error2 = sync_page_range(inode, mapping, pos, ret);
+		error2 = filemap_write_and_wait_range(mapping, pos,
+						      pos + ret - 1);
 		if (!error)
 			error = error2;
 		if (need_i_mutex)
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 14/16] pohmelfs: Use new syncing helper
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (12 preceding siblings ...)
  2009-09-02 13:59   ` Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 13:59 ` [PATCH 15/16] fat: Opencode sync_page_range_nolock() Jan Kara
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara

Use new generic_write_sync() helper instead of sync_page_range().

Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/staging/pohmelfs/inode.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/pohmelfs/inode.c b/drivers/staging/pohmelfs/inode.c
index 17801a5..a2e5eed 100644
--- a/drivers/staging/pohmelfs/inode.c
+++ b/drivers/staging/pohmelfs/inode.c
@@ -927,10 +927,10 @@ ssize_t pohmelfs_write(struct file *file, const char __user *buf,
 	mutex_unlock(&inode->i_mutex);
 	WARN_ON(ret < 0);
 
-	if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if (ret > 0) {
 		ssize_t err;
 
-		err = sync_page_range(inode, mapping, pos, ret);
+		err = generic_write_sync(file, pos, ret);
 		if (err < 0)
 			ret = err;
 		WARN_ON(ret < 0);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 15/16] fat: Opencode sync_page_range_nolock()
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (13 preceding siblings ...)
  2009-09-02 13:59 ` [PATCH 14/16] pohmelfs: Use new syncing helper Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 13:59 ` [PATCH 16/16] vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() Jan Kara
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara, OGAWA Hirofumi

fat_cont_expand() is the only user of sync_page_range_nolock(). It is also the
only user of generic_osync_inode() that does not have a file open.  So
open-code the needed actions for FAT so that we can convert generic_osync_inode()
to a standard syncing path.
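
The open-coded sequence in the diff runs several steps (start writeback, sync the mapping buffers, write the inode) and keeps the first error it sees. A minimal userspace sketch of that "first error wins" pattern follows; first_error() is a hypothetical helper, and unlike the patch it does not model skipping the final wait when an earlier step failed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the first non-zero entry of results[0..n-1], or 0 if every step
 * succeeded. Each entry is 0 on success or a negative errno on failure.
 */
static int first_error(const int *results, size_t n)
{
	size_t i;

	assert(results != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (results[i])
			return results[i];
	}
	return 0;
}
```

Later failures are still attempted in this pattern, but they never overwrite the error that is reported to the caller.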

Update a comment about generic_osync_inode().

CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fat/file.c |   22 ++++++++++++++++++++--
 fs/fat/misc.c |    4 ++--
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/fat/file.c b/fs/fat/file.c
index f042b96..e8c159d 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -176,8 +176,26 @@ static int fat_cont_expand(struct inode *inode, loff_t size)
 
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
 	mark_inode_dirty(inode);
-	if (IS_SYNC(inode))
-		err = sync_page_range_nolock(inode, mapping, start, count);
+	if (IS_SYNC(inode)) {
+		int err2;
+
+		/*
+		 * Opencode syncing since we don't have a file open to use
+		 * standard fsync path.
+		 */
+		err = filemap_fdatawrite_range(mapping, start,
+					       start + count - 1);
+		err2 = sync_mapping_buffers(mapping);
+		if (!err)
+			err = err2;
+		err2 = write_inode_now(inode, 1);
+		if (!err)
+			err = err2;
+		if (!err) {
+			err = filemap_fdatawait_range(mapping, start,
+						       start + count - 1);
+		}
+	}
 out:
 	return err;
 }
diff --git a/fs/fat/misc.c b/fs/fat/misc.c
index a6c2047..4e35be8 100644
--- a/fs/fat/misc.c
+++ b/fs/fat/misc.c
@@ -119,8 +119,8 @@ int fat_chain_add(struct inode *inode, int new_dclus, int nr_cluster)
 		MSDOS_I(inode)->i_start = new_dclus;
 		MSDOS_I(inode)->i_logstart = new_dclus;
 		/*
-		 * Since generic_osync_inode() synchronize later if
-		 * this is not directory, we don't here.
+		 * Since generic_write_sync() synchronizes regular files later,
+		 * we sync here only directories.
 		 */
 		if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode)) {
 			ret = fat_sync_inode(inode);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 16/16] vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (14 preceding siblings ...)
  2009-09-02 13:59 ` [PATCH 15/16] fat: Opencode sync_page_range_nolock() Jan Kara
@ 2009-09-02 13:59 ` Jan Kara
  2009-09-02 14:16 ` [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Christoph Hellwig
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: LKML, hch, Jan Kara

Remove these three functions since nobody uses them anymore.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c         |   54 --------------------------------------
 include/linux/fs.h        |    5 ---
 include/linux/writeback.h |    4 ---
 mm/filemap.c              |   64 ---------------------------------------------
 4 files changed, 0 insertions(+), 127 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c54226b..d8dbef0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -737,57 +737,3 @@ int sync_inode(struct inode *inode, struct writeback_control *wbc)
 	return ret;
 }
 EXPORT_SYMBOL(sync_inode);
-
-/**
- * generic_osync_inode - flush all dirty data for a given inode to disk
- * @inode: inode to write
- * @mapping: the address_space that should be flushed
- * @what:  what to write and wait upon
- *
- * This can be called by file_write functions for files which have the
- * O_SYNC flag set, to flush dirty writes to disk.
- *
- * @what is a bitmask, specifying which part of the inode's data should be
- * written and waited upon.
- *
- *    OSYNC_DATA:     i_mapping's dirty data
- *    OSYNC_METADATA: the buffers at i_mapping->private_list
- *    OSYNC_INODE:    the inode itself
- */
-
-int generic_osync_inode(struct inode *inode, struct address_space *mapping, int what)
-{
-	int err = 0;
-	int need_write_inode_now = 0;
-	int err2;
-
-	if (what & OSYNC_DATA)
-		err = filemap_fdatawrite(mapping);
-	if (what & (OSYNC_METADATA|OSYNC_DATA)) {
-		err2 = sync_mapping_buffers(mapping);
-		if (!err)
-			err = err2;
-	}
-	if (what & OSYNC_DATA) {
-		err2 = filemap_fdatawait(mapping);
-		if (!err)
-			err = err2;
-	}
-
-	spin_lock(&inode_lock);
-	if ((inode->i_state & I_DIRTY) &&
-	    ((what & OSYNC_INODE) || (inode->i_state & I_DIRTY_DATASYNC)))
-		need_write_inode_now = 1;
-	spin_unlock(&inode_lock);
-
-	if (need_write_inode_now) {
-		err2 = write_inode_now(inode, 1);
-		if (!err)
-			err = err2;
-	}
-	else
-		inode_sync_wait(inode);
-
-	return err;
-}
-EXPORT_SYMBOL(generic_osync_inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 18acaec..8ae3a07 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1458,11 +1458,6 @@ int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
 #define DT_SOCK		12
 #define DT_WHT		14
 
-#define OSYNC_METADATA	(1<<0)
-#define OSYNC_DATA	(1<<1)
-#define OSYNC_INODE	(1<<2)
-int generic_osync_inode(struct inode *, struct address_space *, int);
-
 /*
  * This is the "filldir" function type, used by readdir() to let
  * the kernel specify what kind of dirent layout it wants to have.
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 3224820..1446694 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -157,10 +157,6 @@ int write_cache_pages(struct address_space *mapping,
 		      struct writeback_control *wbc, writepage_t writepage,
 		      void *data);
 int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
-int sync_page_range(struct inode *inode, struct address_space *mapping,
-			loff_t pos, loff_t count);
-int sync_page_range_nolock(struct inode *inode, struct address_space *mapping,
-			   loff_t pos, loff_t count);
 void set_page_dirty_balance(struct page *page, int page_mkwrite);
 void writeback_set_ratelimit(void);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 70988a1..854e10e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -326,70 +326,6 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
- * sync_page_range - write and wait on all pages in the passed range
- * @inode:	target inode
- * @mapping:	target address_space
- * @pos:	beginning offset in pages to write
- * @count:	number of bytes to write
- *
- * Write and wait upon all the pages in the passed range.  This is a "data
- * integrity" operation.  It waits upon in-flight writeout before starting and
- * waiting upon new writeout.  If there was an IO error, return it.
- *
- * We need to re-take i_mutex during the generic_osync_inode list walk because
- * it is otherwise livelockable.
- */
-int sync_page_range(struct inode *inode, struct address_space *mapping,
-			loff_t pos, loff_t count)
-{
-	pgoff_t start = pos >> PAGE_CACHE_SHIFT;
-	pgoff_t end = (pos + count - 1) >> PAGE_CACHE_SHIFT;
-	int ret;
-
-	if (!mapping_cap_writeback_dirty(mapping) || !count)
-		return 0;
-	ret = filemap_fdatawrite_range(mapping, pos, pos + count - 1);
-	if (ret == 0) {
-		mutex_lock(&inode->i_mutex);
-		ret = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		mutex_unlock(&inode->i_mutex);
-	}
-	if (ret == 0)
-		ret = wait_on_page_writeback_range(mapping, start, end);
-	return ret;
-}
-EXPORT_SYMBOL(sync_page_range);
-
-/**
- * sync_page_range_nolock - write & wait on all pages in the passed range without locking
- * @inode:	target inode
- * @mapping:	target address_space
- * @pos:	beginning offset in pages to write
- * @count:	number of bytes to write
- *
- * Note: Holding i_mutex across sync_page_range_nolock() is not a good idea
- * as it forces O_SYNC writers to different parts of the same file
- * to be serialised right until io completion.
- */
-int sync_page_range_nolock(struct inode *inode, struct address_space *mapping,
-			   loff_t pos, loff_t count)
-{
-	pgoff_t start = pos >> PAGE_CACHE_SHIFT;
-	pgoff_t end = (pos + count - 1) >> PAGE_CACHE_SHIFT;
-	int ret;
-
-	if (!mapping_cap_writeback_dirty(mapping) || !count)
-		return 0;
-	ret = filemap_fdatawrite_range(mapping, pos, pos + count - 1);
-	if (ret == 0)
-		ret = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-	if (ret == 0)
-		ret = wait_on_page_writeback_range(mapping, start, end);
-	return ret;
-}
-EXPORT_SYMBOL(sync_page_range_nolock);
-
-/**
  * filemap_fdatawait - wait for all under-writeback pages to complete
  * @mapping: address space structure to wait for
  *
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Ocfs2-devel] [PATCH 07/16] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
@ 2009-09-02 13:59   ` Jan Kara
  0 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-02 13:59 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: LKML, hch, Jan Kara, Evgeniy Polyakov, ocfs2-devel, Joel Becker,
	Felix Blyakher, xfs, Anton Altaparmakov, linux-ntfs-dev,
	OGAWA Hirofumi, linux-ext4, tytso

Introduce new function for generic inode syncing (vfs_fsync_range) and use
it from fsync() path. Introduce also new helper for syncing after a sync
write (generic_write_sync) using the generic function.

Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.

CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/splice.c        |   22 +++++---------------
 fs/sync.c          |   55 +++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h |    3 ++
 mm/filemap.c       |   18 +++++-----------
 4 files changed, 63 insertions(+), 35 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 73766d2..8190237 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -976,25 +976,15 @@ generic_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 	if (ret > 0) {
 		unsigned long nr_pages;
+		int err;
 
-		*ppos += ret;
 		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 
-		/*
-		 * If file or inode is SYNC and we actually wrote some data,
-		 * sync it.
-		 */
-		if (unlikely((out->f_flags & O_SYNC) || IS_SYNC(inode))) {
-			int err;
-
-			mutex_lock(&inode->i_mutex);
-			err = generic_osync_inode(inode, mapping,
-						  OSYNC_METADATA|OSYNC_DATA);
-			mutex_unlock(&inode->i_mutex);
-
-			if (err)
-				ret = err;
-		}
+		err = generic_write_sync(out, *ppos, ret);
+		if (err)
+			ret = err;
+		else
+			*ppos += ret;
 		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
 	}
 
diff --git a/fs/sync.c b/fs/sync.c
index 3422ba6..6fe72e6 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -176,19 +176,23 @@ int file_fsync(struct file *filp, struct dentry *dentry, int datasync)
 }
 
 /**
- * vfs_fsync - perform a fsync or fdatasync on a file
+ * vfs_fsync_range - helper to sync a range of data & metadata to disk
  * @file:		file to sync
  * @dentry:		dentry of @file
- * @data:		only perform a fdatasync operation
+ * @start:		offset in bytes of the beginning of data range to sync
+ * @end:		offset in bytes of the end of data range (inclusive)
+ * @datasync:		perform only datasync
  *
- * Write back data and metadata for @file to disk.  If @datasync is
- * set only metadata needed to access modified file data is written.
+ * Write back data in range @start..@end and metadata for @file to disk.  If
+ * @datasync is set only metadata needed to access modified file data is
+ * written.
  *
  * In case this function is called from nfsd @file may be %NULL and
  * only @dentry is set.  This can only happen when the filesystem
  * implements the export_operations API.
  */
-int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
+int vfs_fsync_range(struct file *file, struct dentry *dentry, loff_t start,
+		    loff_t end, int datasync)
 {
 	const struct file_operations *fop;
 	struct address_space *mapping;
@@ -212,7 +216,7 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 		goto out;
 	}
 
-	ret = filemap_fdatawrite(mapping);
+	ret = filemap_fdatawrite_range(mapping, start, end);
 
 	/*
 	 * We need to protect against concurrent writers, which could cause
@@ -223,12 +227,32 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	if (!ret)
 		ret = err;
 	mutex_unlock(&mapping->host->i_mutex);
-	err = filemap_fdatawait(mapping);
+
+	err = filemap_fdatawait_range(mapping, start, end);
 	if (!ret)
 		ret = err;
 out:
 	return ret;
 }
+EXPORT_SYMBOL(vfs_fsync_range);
+
+/**
+ * vfs_fsync - perform a fsync or fdatasync on a file
+ * @file:		file to sync
+ * @dentry:		dentry of @file
+ * @datasync:		only perform a fdatasync operation
+ *
+ * Write back data and metadata for @file to disk.  If @datasync is
+ * set only metadata needed to access modified file data is written.
+ *
+ * In case this function is called from nfsd @file may be %NULL and
+ * only @dentry is set.  This can only happen when the filesystem
+ * implements the export_operations API.
+ */
+int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
+{
+	return vfs_fsync_range(file, dentry, 0, LLONG_MAX, datasync);
+}
 EXPORT_SYMBOL(vfs_fsync);
 
 static int do_fsync(unsigned int fd, int datasync)
@@ -254,6 +278,23 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
 	return do_fsync(fd, 1);
 }
 
+/**
+ * generic_write_sync - perform syncing after a write if file / inode is sync
+ * @file:	file to which the write happened
+ * @pos:	offset where the write started
+ * @count:	length of the write
+ *
+ * This is just a simple wrapper around our general syncing function.
+ */
+int generic_write_sync(struct file *file, loff_t pos, loff_t count)
+{
+	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
+		return 0;
+	return vfs_fsync_range(file, file->f_path.dentry, pos,
+			       pos + count - 1, 1);
+}
+EXPORT_SYMBOL(generic_write_sync);
+
 /*
  * sys_sync_file_range() permits finely controlled syncing over a segment of
  * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bc7f0f1..18acaec 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2088,7 +2088,10 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 
+extern int vfs_fsync_range(struct file *file, struct dentry *dentry,
+			   loff_t start, loff_t end, int datasync);
 extern int vfs_fsync(struct file *file, struct dentry *dentry, int datasync);
+extern int generic_write_sync(struct file *file, loff_t pos, loff_t count);
 extern void sync_supers(void);
 extern void emergency_sync(void);
 extern void emergency_remount(void);
diff --git a/mm/filemap.c b/mm/filemap.c
index 3955f7e..70988a1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -39,11 +39,10 @@
 /*
  * FIXME: remove all knowledge of the buffer layer from the core VM
  */
-#include <linux/buffer_head.h> /* for generic_osync_inode */
+#include <linux/buffer_head.h> /* for try_to_free_buffers */
 
 #include <asm/mman.h>
 
-
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -2480,19 +2479,16 @@ ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
 			 unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
 	ssize_t ret;
 
 	BUG_ON(iocb->ki_pos != pos);
 
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 
-	if ((ret > 0 || ret == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if (ret > 0 || ret == -EIOCBQUEUED) {
 		ssize_t err;
 
-		err = sync_page_range_nolock(inode, mapping, pos, ret);
+		err = generic_write_sync(file, pos, ret);
 		if (err < 0 && ret > 0)
 			ret = err;
 	}
@@ -2515,8 +2511,7 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		unsigned long nr_segs, loff_t pos)
 {
 	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
+	struct inode *inode = file->f_mapping->host;
 	ssize_t ret;
 
 	BUG_ON(iocb->ki_pos != pos);
@@ -2525,11 +2520,10 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
 
-	if ((ret > 0 || ret == -EIOCBQUEUED) &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if (ret > 0 || ret == -EIOCBQUEUED) {
 		ssize_t err;
 
-		err = sync_page_range(inode, mapping, pos, ret);
+		err = generic_write_sync(file, pos, ret);
 		if (err < 0 && ret > 0)
 			ret = err;
 	}
-- 
1.6.0.2
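
As a rough illustration of the range arithmetic in this patch (a userspace
sketch, not kernel code): generic_write_sync() hands the inclusive byte range
pos .. pos + count - 1 to vfs_fsync_range(), whereas the removed
sync_page_range() helpers additionally shifted that range down to page
indices.  The struct and function names below are hypothetical, and a 4 KiB
page (shift of 12) is assumed in place of PAGE_CACHE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for PAGE_CACHE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical container for an inclusive range. */
struct sketch_range {
	int64_t start;
	int64_t end;	/* inclusive */
};

/* Inclusive byte range covered by a write of count bytes at pos (count > 0). */
static struct sketch_range sketch_byte_range(int64_t pos, int64_t count)
{
	struct sketch_range r = { pos, pos + count - 1 };
	return r;
}

/* Page indices touched by the same write, as the old helpers computed them. */
static struct sketch_range sketch_page_range(int64_t pos, int64_t count)
{
	struct sketch_range r = {
		pos >> SKETCH_PAGE_SHIFT,
		(pos + count - 1) >> SKETCH_PAGE_SHIFT,
	};
	return r;
}
```

For instance, a 200-byte write at offset 4000 covers bytes 4000..4199 and
touches pages 0 and 1 under the 4 KiB assumption.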

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4)
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (15 preceding siblings ...)
  2009-09-02 13:59 ` [PATCH 16/16] vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() Jan Kara
@ 2009-09-02 14:16 ` Christoph Hellwig
  2009-09-02 22:18 ` [PATCH] fsync: wait for data writeout completion before calling ->fsync Christoph Hellwig
  2009-09-10 20:25 ` [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics Christoph Hellwig
  18 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-02 14:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, LKML, hch

On Wed, Sep 02, 2009 at 03:59:10PM +0200, Jan Kara wrote:
>   Hi,
> 
>   here is a new version of my O_SYNC cleanup patches. There are two minor
> changes since last time. XFS now uses filemap_write_and_wait() as Christoph
> asked

The XFS tree now does this already, I did it as part of the XFS-internal
fsync/O_SYNC consolidation.  I'll look into fixing the data writeout
stuff on top of your series, and after that XFS can use the generic
helper, too.

>   If no one objects, I think the patch series is ready to be put in linux-next.

I'll look through it, but in general the sooner we have this in -next,
the better.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/16] vfs: Rename generic_file_aio_write_nolock
  2009-09-02 13:59 ` [PATCH 06/16] vfs: Rename generic_file_aio_write_nolock Jan Kara
@ 2009-09-02 21:47   ` Christoph Hellwig
  2009-09-03 10:24     ` Jan Kara
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-02 21:47 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, LKML, hch

On Wed, Sep 02, 2009 at 03:59:16PM +0200, Jan Kara wrote:
> generic_file_aio_write_nolock() is now used only by block devices and raw
> character device. Filesystems should use __generic_file_aio_write() in case
> generic_file_aio_write() doesn't suit them. So rename the function to
> device_aio_write().

I would recommend this one on top:

Move it to fs/block_dev.c, rename it to blkdev_aio_write, export it _GPL
only and make it very clear it's only for block devices and raw.


And btw, I'm not actually sure it is the right thing for raw.  Raw is
supposed to do direct I/O only, and in fact forces O_DIRECT on.  Because
there are no holes it also can't fall back to direct I/O.  So strictly
speaking we could just use __generic_file_aio_write directly.   That
is until we care about the hw disk caches..

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/drivers/char/raw.c
===================================================================
--- linux-2.6.orig/drivers/char/raw.c	2009-09-02 11:29:21.772122902 -0300
+++ linux-2.6/drivers/char/raw.c	2009-09-02 11:29:31.996123226 -0300
@@ -246,7 +246,7 @@ static const struct file_operations raw_
 	.read	=	do_sync_read,
 	.aio_read = 	generic_file_aio_read,
 	.write	=	do_sync_write,
-	.aio_write =	device_aio_write,
+	.aio_write =	blkdev_aio_write,
 	.open	=	raw_open,
 	.release=	raw_release,
 	.ioctl	=	raw_ioctl,
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c	2009-09-02 11:26:33.344623333 -0300
+++ linux-2.6/fs/block_dev.c	2009-09-02 15:04:15.125012009 -0300
@@ -1405,6 +1405,33 @@ static long block_ioctl(struct file *fil
 }
 
 /*
+ * Write data to the block device.  Only intended for the block device itself
+ * and the raw driver which basically is a fake block device.
+ *
+ * Does not take i_mutex for the write and thus is not for general purpose
+ * use.
+ */
+ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
+			 unsigned long nr_segs, loff_t pos)
+{
+	struct file *file = iocb->ki_filp;
+	ssize_t ret;
+
+	BUG_ON(iocb->ki_pos != pos);
+
+	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
+	if (ret > 0 || ret == -EIOCBQUEUED) {
+		ssize_t err;
+
+		err = generic_write_sync(file, pos, ret);
+		if (err < 0 && ret > 0)
+			ret = err;
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_aio_write);
+
+/*
  * Try to release a page associated with block device when the system
  * is under memory pressure.
  */
@@ -1436,7 +1463,7 @@ const struct file_operations def_blk_fop
 	.read		= do_sync_read,
 	.write		= do_sync_write,
   	.aio_read	= generic_file_aio_read,
-	.aio_write	= device_aio_write,
+	.aio_write	= blkdev_aio_write,
 	.mmap		= generic_file_mmap,
 	.fsync		= block_fsync,
 	.unlocked_ioctl	= block_ioctl,
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2009-09-02 11:28:36.464127763 -0300
+++ linux-2.6/include/linux/fs.h	2009-09-02 15:04:34.740769169 -0300
@@ -2196,8 +2196,6 @@ extern ssize_t generic_file_aio_read(str
 extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long,
 		loff_t *);
 extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t);
-extern ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
-				unsigned long nr_segs, loff_t pos);
 extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
 		unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
@@ -2207,6 +2205,10 @@ extern ssize_t do_sync_write(struct file
 extern int generic_segment_checks(const struct iovec *iov,
 		unsigned long *nr_segs, size_t *count, int access_flags);
 
+/* fs/block_dev.c */
+extern ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
+				unsigned long nr_segs, loff_t pos);
+
 /* fs/splice.c */
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
 		struct pipe_inode_info *, size_t, unsigned int);
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2009-09-02 11:26:33.240134550 -0300
+++ linux-2.6/mm/filemap.c	2009-09-02 11:27:03.488134215 -0300
@@ -2398,41 +2398,6 @@ out:
 EXPORT_SYMBOL(__generic_file_aio_write);
 
 /**
- * device_aio_write - write data
- * @iocb:	IO state structure
- * @iov:	vector with data to write
- * @nr_segs:	number of segments in the vector
- * @pos:	position in file where to write
- *
- * This is a wrapper around __generic_file_aio_write() which takes care of
- * syncing the file in case of O_SYNC file. It does not take i_mutex for the
- * write itself but may do so during syncing. It is meant for users like block
- * devices which do not need i_mutex during write. If your filesystem needs to
- * do a write but already holds i_mutex, use __generic_file_aio_write()
- * directly and then sync the file like generic_file_aio_write().
- */
-ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
-			 unsigned long nr_segs, loff_t pos)
-{
-	struct file *file = iocb->ki_filp;
-	ssize_t ret;
-
-	BUG_ON(iocb->ki_pos != pos);
-
-	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
-
-	if (ret > 0 || ret == -EIOCBQUEUED) {
-		ssize_t err;
-
-		err = generic_write_sync(file, pos, ret);
-		if (err < 0 && ret > 0)
-			ret = err;
-	}
-	return ret;
-}
-EXPORT_SYMBOL(device_aio_write);
-
-/**
  * generic_file_aio_write - write data to a file
  * @iocb:	IO state structure
  * @iov:	vector with data to write
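
The way blkdev_aio_write() folds the sync result into the write result can be
sketched in userspace roughly as follows.  This is an illustration only: the
function name is hypothetical, and SKETCH_EIOCBQUEUED is a stand-in for the
kernel-internal EIOCBQUEUED code, which userspace headers do not provide.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: stand-in value for the kernel-internal EIOCBQUEUED code. */
#define SKETCH_EIOCBQUEUED 529

/*
 * Combine the result of the write (bytes written, or a negative error)
 * with the result of the follow-up sync (0, or a negative error).
 */
static long sketch_merge_write_sync(long write_ret, long sync_err)
{
	/* No sync is attempted for a failed or empty write. */
	if (!(write_ret > 0 || write_ret == -SKETCH_EIOCBQUEUED))
		return write_ret;

	/* A sync failure replaces a positive byte count... */
	if (sync_err < 0 && write_ret > 0)
		return sync_err;

	/* ...but a queued AIO request keeps reporting -EIOCBQUEUED. */
	return write_ret;
}
```

So a 100-byte write whose sync fails with -EIO is reported as -EIO, while a
queued request stays -EIOCBQUEUED regardless of the sync result.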

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] fsync: wait for data writeout completion before calling ->fsync
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (16 preceding siblings ...)
  2009-09-02 14:16 ` [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Christoph Hellwig
@ 2009-09-02 22:18 ` Christoph Hellwig
  2009-09-02 22:37   ` Joel Becker
  2009-09-03 10:47   ` Jan Kara
  2009-09-10 20:25 ` [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics Christoph Hellwig
  18 siblings, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-02 22:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, LKML, hch

I think we should add this one on top:

-- 
Subject: [PATCH] fsync: wait for data writeout completion before calling ->fsync
From: Christoph Hellwig <hch@lst.de>

Currently vfs_fsync(_range) first calls filemap_fdatawrite to write out
the data, then calls into ->fsync to write out the metadata and then finally
calls filemap_fdatawait to wait for the data I/O to complete.  What sounds
like a clever micro-optimization actually is a nasty trap for many filesystems.

For many modern filesystems i_size or other inode information is only
updated on I/O completion and we need to wait for I/O to finish before
we can write out the metadata.  For old-fashioned filesystems that
instantiate blocks during the actual write and also update the metadata
at that point it opens up a large window where we could expose uninitialized
blocks after a crash.  While a few filesystems that need it already wait
for the I/O to finish inside their ->fsync methods it is rather suboptimal
as it is done under the i_mutex and also always for the whole file instead
of just a part as we could do for O_SYNC handling.

Here is a small audit of all fsync instances in the tree:

 - spufs_mfc_fsync:
 - ps3flash_fsync: 
 - vol_cdev_fsync:
 - printer_fsync:
 - fb_deferred_io_fsync:
 - bad_file_fsync:
 - simple_sync_file:

	don't care - filesystems/drivers don't use the page cache or are
	purely in-memory.

 - simple_fsync:
 - file_fsync:
 - affs_file_fsync:
 - fat_file_fsync:
 - jfs_fsync:
 - ubifs_fsync:
 - reiserfs_dir_fsync:
 - reiserfs_sync_file:

	never touch pagecache themselves.  We need to wait before if we do
	not want to expose stale data after an allocation.

 - afs_fsync:
 - fuse_fsync_common:

	do the writeback waiting themselves in awkward ways, would benefit
	from proper semantics

 - block_fsync:

	Does a filemap_write_and_wait on the block device inode.  Because we
	now have f_mapping that is the same inode we call it on in vfs_fsync.
	So just removing it and letting the VFS do the work in one go would
	be an improvement.

 - btrfs_sync_file:
 - cifs_fsync:
 - xfs_file_fsync:

	need the wait first and currently do it themselves. would benefit from
	doing it outside i_mutex.

 - coda_fsync:
 - ecryptfs_fsync:
 - exofs_file_fsync:
 - shm_fsync:

	only passes the fsync through to the lower layer

 - ext3_sync_file:

	doesn't seem to care, comments are confusing.

 - ext4_sync_file:

	would need the wait to work correctly for delalloc mode with late
	i_size updates.  Otherwise the ext3 comment applies.


	currently implements its own writeback and wait in an odd way,
	could benefit from doing it properly.

 - gfs2_fsync:

	not needed for journaled data mode, but probably harmless there.
	Currently writes back data asynchronously itself.  Needs some
	major audit.

 - hostfs_fsync:

	just calls fsync/datasync on the host FD.  Without the wait before
	data might not even be inflight yet if we're unlucky.

 - hpfs_file_fsync:
 - ncp_fsync:

	no-ops.  Dangerous before and after.

 - jffs2_fsync:

	just calls jffs2_flush_wbuf_gc, not sure how this relates to data.

 - nfs_fsync_dir:

	just increments stats, claims all directory operations are synchronous

 - nfs_file_fsync:

	only writes out data???  Looks very odd.

 - nilfs_sync_file:

	looks like it expects all data done, but not sure from the code

 - ntfs_dir_fsync:
 - ntfs_file_fsync:

	appear to do their own data writeback.  Very convoluted code.

 - ocfs2_sync_file:

	does its own data writeback, but no wait.  probably needs the wait.

 - smb_fsync:

	according to a comment expects all pages written already, probably needs
	the wait before.


This patch only changes vfs_fsync_range, removal of the wait in the methods
that have it is left to the filesystem maintainers.  Note that most
filesystems really do need an audit for their fsync methods given the
gems found in this very brief audit.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/fs/sync.c
===================================================================
--- linux-2.6.orig/fs/sync.c	2009-09-02 15:03:41.073271287 -0300
+++ linux-2.6/fs/sync.c	2009-09-02 15:04:34.401269249 -0300
@@ -216,7 +216,7 @@ int vfs_fsync_range(struct file *file, s
 		goto out;
 	}
 
-	ret = filemap_fdatawrite_range(mapping, start, end);
+	ret = filemap_write_and_wait_range(mapping, start, end);
 
 	/*
 	 * We need to protect against concurrent writers, which could cause
@@ -228,9 +228,6 @@ int vfs_fsync_range(struct file *file, s
 		ret = err;
 	mutex_unlock(&mapping->host->i_mutex);
 
-	err = filemap_fdatawait_range(mapping, start, end);
-	if (!ret)
-		ret = err;
 out:
 	return ret;
 }
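
The ordering problem described in the changelog can be shown with a toy
userspace model.  This is only a sketch under stated assumptions: every name
below is hypothetical, and the "inode" simply mimics a filesystem that updates
i_size at data I/O completion, as the changelog says many modern filesystems
do.

```c
#include <assert.h>

/* Hypothetical toy inode: the in-memory size changes only at I/O completion. */
struct toy_inode {
	long i_size;		/* in-memory size */
	long pending_size;	/* size once in-flight I/O completes; 0 if none */
	long on_disk_size;	/* size recorded by the last metadata write */
};

/* Start data writeout: the new size becomes pending, not yet visible. */
static void toy_fdatawrite(struct toy_inode *inode, long new_size)
{
	inode->pending_size = new_size;
}

/* Wait for data I/O: the completion handler updates i_size. */
static void toy_fdatawait(struct toy_inode *inode)
{
	if (inode->pending_size) {
		inode->i_size = inode->pending_size;
		inode->pending_size = 0;
	}
}

/* Stand-in for ->fsync: write whatever metadata is current right now. */
static void toy_fsync_metadata(struct toy_inode *inode)
{
	inode->on_disk_size = inode->i_size;
}

/* Old order: write, ->fsync, wait.  Returns the size that reached disk. */
static long toy_sync_old_order(struct toy_inode *inode, long new_size)
{
	toy_fdatawrite(inode, new_size);
	toy_fsync_metadata(inode);
	toy_fdatawait(inode);
	return inode->on_disk_size;
}

/* New order: write and wait, then ->fsync. */
static long toy_sync_new_order(struct toy_inode *inode, long new_size)
{
	toy_fdatawrite(inode, new_size);
	toy_fdatawait(inode);
	toy_fsync_metadata(inode);
	return inode->on_disk_size;
}
```

In this model the old order records the stale size on disk even though the
in-memory i_size ends up correct, while the new order records the updated
size.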

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] fsync: wait for data writeout completion before calling ->fsync
  2009-09-02 22:18 ` [PATCH] fsync: wait for data writeout completion before calling ->fsync Christoph Hellwig
@ 2009-09-02 22:37   ` Joel Becker
  2009-09-03 10:47   ` Jan Kara
  1 sibling, 0 replies; 52+ messages in thread
From: Joel Becker @ 2009-09-02 22:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, LKML

On Thu, Sep 03, 2009 at 12:18:38AM +0200, Christoph Hellwig wrote:
>  - ocfs2_sync_file:
> 
> 	does its own data writeback, but no wait.  probably needs the wait.

	The journal force commit will wait in ordered mode.  In
writeback mode we currently allow the usual writeback problem.

Joel

-- 

"There are some experiences in life which should not be demanded
 twice from any man, and one of them is listening to the Brahms Requiem."
        - George Bernard Shaw

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/16] vfs: Rename generic_file_aio_write_nolock
  2009-09-02 21:47   ` Christoph Hellwig
@ 2009-09-03 10:24     ` Jan Kara
  2009-09-03 15:37       ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: Jan Kara @ 2009-09-03 10:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, LKML

On Wed 02-09-09 23:47:22, Christoph Hellwig wrote:
> On Wed, Sep 02, 2009 at 03:59:16PM +0200, Jan Kara wrote:
> > generic_file_aio_write_nolock() is now used only by block devices and raw
> > character device. Filesystems should use __generic_file_aio_write() in case
> > generic_file_aio_write() doesn't suit them. So rename the function to
> > device_aio_write().
> 
> I would recommend this one on top:
> 
> Move it to fs/block_dev.c, rename it to blkdev_aio_write, export it _GPL
> only and make it very clear it's only for block devices and raw.
  Yes, fine with me. I'll replace my patch with yours so that we don't
rename the function twice unnecessarily.

> And btw, I'm not actually sure it is the right thing for raw.  Raw is
> supposed to do direct I/O only, and in fact forces O_DIRECT on.  Because
> there are no holes it also can't fall back to direct I/O.  So strictly
> speaking we could just use __generic_file_aio_write directly.   That
> is until we care about the hw disk caches..
  I'm slightly confused by the above - you probably mean it cannot fall
back to buffered I/O and it could use generic_file_direct_write (because
__generic_file_aio_write is just blkdev_aio_write without syncing in case
of O_SYNC). The thing with using generic_file_direct_write() is that we'd
have to duplicate the checks at the beginning of __generic_file_aio_write(),
so I'm not sure the code would be cleaner in the end.

									Honza

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/drivers/char/raw.c
> ===================================================================
> --- linux-2.6.orig/drivers/char/raw.c	2009-09-02 11:29:21.772122902 -0300
> +++ linux-2.6/drivers/char/raw.c	2009-09-02 11:29:31.996123226 -0300
> @@ -246,7 +246,7 @@ static const struct file_operations raw_
>  	.read	=	do_sync_read,
>  	.aio_read = 	generic_file_aio_read,
>  	.write	=	do_sync_write,
> -	.aio_write =	device_aio_write,
> +	.aio_write =	blkdev_aio_write,
>  	.open	=	raw_open,
>  	.release=	raw_release,
>  	.ioctl	=	raw_ioctl,
> Index: linux-2.6/fs/block_dev.c
> ===================================================================
> --- linux-2.6.orig/fs/block_dev.c	2009-09-02 11:26:33.344623333 -0300
> +++ linux-2.6/fs/block_dev.c	2009-09-02 15:04:15.125012009 -0300
> @@ -1405,6 +1405,33 @@ static long block_ioctl(struct file *fil
>  }
>  
>  /*
> + * Write data to the block device.  Only intended for the block device itself
> + * and the raw driver which basically is a fake block device.
> + *
> + * Does not take i_mutex for the write and thus is not for general purpose
> + * use.
> + */
> +ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
> +			 unsigned long nr_segs, loff_t pos)
> +{
> +	struct file *file = iocb->ki_filp;
> +	ssize_t ret;
> +
> +	BUG_ON(iocb->ki_pos != pos);
> +
> +	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
> +	if (ret > 0 || ret == -EIOCBQUEUED) {
> +		ssize_t err;
> +
> +		err = generic_write_sync(file, pos, ret);
> +		if (err < 0 && ret > 0)
> +			ret = err;
> +	}
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(blkdev_aio_write);
> +
> +/*
>   * Try to release a page associated with block device when the system
>   * is under memory pressure.
>   */
> @@ -1436,7 +1463,7 @@ const struct file_operations def_blk_fop
>  	.read		= do_sync_read,
>  	.write		= do_sync_write,
>    	.aio_read	= generic_file_aio_read,
> -	.aio_write	= device_aio_write,
> +	.aio_write	= blkdev_aio_write,
>  	.mmap		= generic_file_mmap,
>  	.fsync		= block_fsync,
>  	.unlocked_ioctl	= block_ioctl,
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h	2009-09-02 11:28:36.464127763 -0300
> +++ linux-2.6/include/linux/fs.h	2009-09-02 15:04:34.740769169 -0300
> @@ -2196,8 +2196,6 @@ extern ssize_t generic_file_aio_read(str
>  extern ssize_t __generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long,
>  		loff_t *);
>  extern ssize_t generic_file_aio_write(struct kiocb *, const struct iovec *, unsigned long, loff_t);
> -extern ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
> -				unsigned long nr_segs, loff_t pos);
>  extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
>  		unsigned long *, loff_t, loff_t *, size_t, size_t);
>  extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
> @@ -2207,6 +2205,10 @@ extern ssize_t do_sync_write(struct file
>  extern int generic_segment_checks(const struct iovec *iov,
>  		unsigned long *nr_segs, size_t *count, int access_flags);
>  
> +/* fs/block_dev.c */
> +extern ssize_t blkdev_aio_write(struct kiocb *iocb, const struct iovec *iov,
> +				unsigned long nr_segs, loff_t pos);
> +
>  /* fs/splice.c */
>  extern ssize_t generic_file_splice_read(struct file *, loff_t *,
>  		struct pipe_inode_info *, size_t, unsigned int);
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c	2009-09-02 11:26:33.240134550 -0300
> +++ linux-2.6/mm/filemap.c	2009-09-02 11:27:03.488134215 -0300
> @@ -2398,41 +2398,6 @@ out:
>  EXPORT_SYMBOL(__generic_file_aio_write);
>  
>  /**
> - * device_aio_write - write data
> - * @iocb:	IO state structure
> - * @iov:	vector with data to write
> - * @nr_segs:	number of segments in the vector
> - * @pos:	position in file where to write
> - *
> - * This is a wrapper around __generic_file_aio_write() which takes care of
> - * syncing the file in case of O_SYNC file. It does not take i_mutex for the
> - * write itself but may do so during syncing. It is meant for users like block
> - * devices which do not need i_mutex during write. If your filesystem needs to
> - * do a write but already holds i_mutex, use __generic_file_aio_write()
> - * directly and then sync the file like generic_file_aio_write().
> - */
> -ssize_t device_aio_write(struct kiocb *iocb, const struct iovec *iov,
> -			 unsigned long nr_segs, loff_t pos)
> -{
> -	struct file *file = iocb->ki_filp;
> -	ssize_t ret;
> -
> -	BUG_ON(iocb->ki_pos != pos);
> -
> -	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
> -
> -	if (ret > 0 || ret == -EIOCBQUEUED) {
> -		ssize_t err;
> -
> -		err = generic_write_sync(file, pos, ret);
> -		if (err < 0 && ret > 0)
> -			ret = err;
> -	}
> -	return ret;
> -}
> -EXPORT_SYMBOL(device_aio_write);
> -
> -/**
>   * generic_file_aio_write - write data to a file
>   * @iocb:	IO state structure
>   * @iov:	vector with data to write
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] fsync: wait for data writeout completion before calling ->fsync
  2009-09-02 22:18 ` [PATCH] fsync: wait for data writeout completion before calling ->fsync Christoph Hellwig
  2009-09-02 22:37   ` Joel Becker
@ 2009-09-03 10:47   ` Jan Kara
  2009-09-03 15:39     ` Christoph Hellwig
  1 sibling, 1 reply; 52+ messages in thread
From: Jan Kara @ 2009-09-03 10:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, LKML

On Thu 03-09-09 00:18:38, Christoph Hellwig wrote:
> I think we should add this one on top:
  Agreed. Added to the series. I'll now push the whole series to linux-next
via my fs tree.

								Honza

> 
> -- 
> Subject: [PATCH] fsync: wait for data writeout completion before calling ->fsync
> From: Christoph Hellwig <hch@lst.de>
> 
> Currently vfs_fsync(_range) first calls filemap_fdatawrite to write out
> the data, then calls into ->fsync to write out the metadata and then finally
> calls filemap_fdatawait to wait for the data I/O to complete.  What sounds
> like a clever micro-optimization actually is a nasty trap for many filesystems.
> 
> For many modern filesystems i_size or other inode information is only
> updated on I/O completion and we need to wait for I/O to finish before
> we can write out the metadata.  For old-fashioned filesystems that
> instantiate blocks during the actual write and also update the metadata
> at that point it opens up a large window where we could expose uninitialized
> blocks after a crash.  While a few filesystems that need it already wait
> for the I/O to finish inside their ->fsync methods it is rather suboptimal
> as it is done under the i_mutex and also always for the whole file instead
> of just a part as we could do for O_SYNC handling.
> 
> Here is a small audit of all fsync instances in the tree:
> 
>  - spufs_mfc_fsync:
>  - ps3flash_fsync: 
>  - vol_cdev_fsync:
>  - printer_fsync:
>  - fb_deferred_io_fsync:
>  - bad_file_fsync:
>  - simple_sync_file:
> 
> 	don't care - filesystems/drivers don't use the page cache or are
> 	purely in-memory.
> 
>  - simple_fsync:
>  - file_fsync:
>  - affs_file_fsync:
>  - fat_file_fsync:
>  - jfs_fsync:
>  - ubifs_fsync:
>  - reiserfs_dir_fsync:
>  - reiserfs_sync_file:
> 
> 	never touch pagecache themselves.  We need to wait before if we do
> 	not want to expose stale data after an allocation.
> 
>  - afs_fsync:
>  - fuse_fsync_common:
> 
> 	do the waiting writeback itself in awkward ways, would benefit from
> 	proper semantics
> 
>  - block_fsync:
> 
> 	Does a filemap_write_and_wait on the block device inode.  Because we
> 	now have f_mapping that is the same inode we call it on in vfs_fsync.
> 	So just removing it and letting the VFS do the work in one go would
> 	be an improvement.
> 
>  - btrfs_sync_file:
>  - cifs_fsync:
>  - xfs_file_fsync:
> 
> 	need the wait first and currently do it themselves. would benefit from
> 	doing it outside i_mutex.
> 
>  - coda_fsync:
>  - ecryptfs_fsync:
>  - exofs_file_fsync:
>  - shm_fsync:
> 
> 	only passes the fsync through to the lower layer
> 
>  - ext3_sync_file:
> 
> 	doesn't seem to care, comments are confusing.
> 
>  - ext4_sync_file:
> 
> 	would need the wait to work correctly for delalloc mode with late
> 	i_size updates.  Otherwise the ext3 comment applies.
> 
> 
> 	currently implements its own writeback and wait in an odd way,
> 	could benefit from doing it properly.
> 
>  - gfs2_fsync:
> 
> 	not needed for journaled data mode, but probably harmless there.
> 	Currently writes back data asynchronously itself.  Needs some
> 	major audit.
> 
>  - hostfs_fsync:
> 
> 	just calls fsync/datasync on the host FD.  Without the wait before
> 	data might not even be inflight yet if we're unlucky.
> 
>  - hpfs_file_fsync:
>  - ncp_fsync:
> 
> 	no-ops.  Dangerous before and after.
> 
>  - jffs2_fsync:
> 
> 	just calls jffs2_flush_wbuf_gc, not sure how this relates to data.
> 
>  - nfs_fsync_dir:
> 
> 	just increments stats, claims all directory operations are synchronous
> 
>  - nfs_file_fsync:
> 
> 	only writes out data???  Looks very odd.
> 
>  - nilfs_sync_file:
> 
> 	looks like it expects all data done, but not sure from the code
> 
>  - ntfs_dir_fsync:
>  - ntfs_file_fsync:
> 
> 	appear to do their own data writeback.  Very convoluted code.
> 
>  - ocfs2_sync_file:
> 
> 	does its own data writeback, but no wait.  probably needs the wait.
> 
>  - smb_fsync:
> 
> 	according to a comment expects all pages written already, probably needs
> 	the wait before.
> 
> 
> This patch only changes vfs_fsync_range, removal of the wait in the methods
> that have it is left to the filesystem maintainers.  Note that most
> filesystems really do need an audit for their fsync methods given the
> gems found in this very brief audit.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/fs/sync.c
> ===================================================================
> --- linux-2.6.orig/fs/sync.c	2009-09-02 15:03:41.073271287 -0300
> +++ linux-2.6/fs/sync.c	2009-09-02 15:04:34.401269249 -0300
> @@ -216,7 +216,7 @@ int vfs_fsync_range(struct file *file, s
>  		goto out;
>  	}
>  
> -	ret = filemap_fdatawrite_range(mapping, start, end);
> +	ret = filemap_write_and_wait_range(mapping, start, end);
>  
>  	/*
>  	 * We need to protect against concurrent writers, which could cause
> @@ -228,9 +228,6 @@ int vfs_fsync_range(struct file *file, s
>  		ret = err;
>  	mutex_unlock(&mapping->host->i_mutex);
>  
> -	err = filemap_fdatawait_range(mapping, start, end);
> -	if (!ret)
> -		ret = err;
>  out:
>  	return ret;
>  }
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/16] vfs: Rename generic_file_aio_write_nolock
  2009-09-03 10:24     ` Jan Kara
@ 2009-09-03 15:37       ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-03 15:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, LKML

On Thu, Sep 03, 2009 at 12:24:36PM +0200, Jan Kara wrote:
> > Move it to fs/block_dev.c, rename it to blkdev_aio_write, export it _GPL
> > only and make it very clear it's only for block devices and raw.
>   Yes, fine with me. I'll replace my patch with yours so that we don't
> rename the function twice unnecessarily.

It's not a replacement, it's ontop of yours.  But folding it into yours
would make a lot of sense.

> > And btw, I'm not actually sure it is the right thing for raw.  Raw is
> > supposed to do direct I/O only, and in fact forced O_DIRECT on.  Because
> > there are no holes it also can't fall back to direct I/O.  So strictly
> > speaking we could just use __generic_file_aio_write directly.   That
> > is until we care about the hw disk caches..
>   I'm slightly confused with the above - probably you mean it cannot fall
> back to buffered I/O and it could use generic_file_direct_write (because
> __generic_file_aio_write is just blkdev_aio_write without syncing in case
> of O_SYNC).

It can not fall back to buffered I/O, yes.  And given that it does not
do buffered I/O and the block/raw device also doesn't have any
inode metadata we could just use __generic_file_aio_write directly.
That is until my patch to flush the disk cache in ->fsync goes in in
which case we'll at least need that one again.  But we might just be
better off to opencode that instead of really using fsync - that avoids
the superfluous call to filemap_write_and_wait and performs the cache
flush without i_mutex which we don't need.

That is the story for the block device, now the raw device is more
difficult as I would be surprised if the user of it used fsync on it.
Then again that would require us to find those users first, although
they apparently exist as removal of this horrible raw device feature
was vetoed by the big distros.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] fsync: wait for data writeout completion before calling ->fsync
  2009-09-03 10:47   ` Jan Kara
@ 2009-09-03 15:39     ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-03 15:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, LKML

On Thu, Sep 03, 2009 at 12:47:04PM +0200, Jan Kara wrote:
> On Thu 03-09-09 00:18:38, Christoph Hellwig wrote:
> > I think we should add this one ontop:
>   Agreed. Added to the series. I'll now push the whole series to linux-next
> via my fs tree.

Thanks.  We do also need something similar in writeback_single_inode,
but I'm not sure how to do that in a good way, and how fixing this now
would interact with Jens' writeback changes.  Probably makes sense to
postpone it until 2.6.32 at least and possibly try to split data
and metadata writeback fully.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
                   ` (17 preceding siblings ...)
  2009-09-02 22:18 ` [PATCH] fsync: wait for data writeout completion before calling ->fsync Christoph Hellwig
@ 2009-09-10 20:25 ` Christoph Hellwig
  2009-09-10 20:38   ` Trond Myklebust
                     ` (2 more replies)
  18 siblings, 3 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-10 20:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, akpm, drepper, viro, kyle

While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC"
comment.  This patch intends to actually give us real O_SYNC semantics
in addition to the O_DSYNC semantics.  After Jan's O_SYNC patches which
are required before this patch it's actually surprisingly simple, we
just need to figure out when to set the datasync flag to vfs_fsync_range
and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping
its numerical value to keep binary compatibility, and adds a new real
O_SYNC flag.  To guarantee backwards compatibility it is defined as
expanding to both the O_DSYNC and the new additional binary flag
(__O_SYNC) to make sure we are backwards-compatible when compiled against
the new headers.

This also means that all places that don't care about the differences
can just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actually care need to check __O_SYNC in addition.  Drivers
and network filesystems have been updated in a fail safe way to always
do the full sync magic if O_DSYNC is set.  The few places setting O_SYNC
for lower layers are kept that way for now to stay failsafe.
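
As a rough userspace illustration of the encoding described above (not
kernel code): the values are the asm-generic ones from this patch, while
the EX_ prefix and the helper names are made up for the sketch so it does
not clash with the real <fcntl.h>.

```c
/*
 * Illustrative copies of the asm-generic values from this patch.  The EX_
 * prefix and the helper names are hypothetical.
 */
#define EX_O_DSYNC	00010000	/* old O_SYNC bit, now O_DSYNC */
#define EX___O_SYNC	04000000	/* new bit, never used on its own */
#define EX_O_SYNC	(EX___O_SYNC | EX_O_DSYNC)

/* Callers that do not care about the difference test O_DSYNC only. */
static int ex_wants_sync(unsigned int flags)
{
	return (flags & EX_O_DSYNC) != 0;
}

/* Callers that do care check the additional bit as well. */
static int ex_wants_full_sync(unsigned int flags)
{
	return (flags & EX___O_SYNC) != 0;
}

/* Same condition as the check added to do_filp_open(): __O_SYNC alone is rejected. */
static int ex_flags_valid(unsigned int flags)
{
	return !((flags & EX___O_SYNC) && !(flags & EX_O_DSYNC));
}
```

A binary built against the new headers and passing O_SYNC sets both bits,
so a test for the old bit alone still sees a synchronous open.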

Note that Parisc really fucked up their headers as they already define
a O_DSYNC that has always been a no-op.  We try to repair it by using it
for the new O_DSYNC and redefining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c	2009-09-10 16:17:51.324254135 -0300
+++ linux-2.6/arch/x86/mm/pat.c	2009-09-10 16:20:55.577254195 -0300
@@ -541,7 +541,7 @@ int phys_mem_access_prot_allowed(struct 
 	if (!range_is_allowed(pfn, size))
 		return 0;
 
-	if (file->f_flags & O_SYNC) {
+	if (file->f_flags & O_DSYNC) {
 		flags = _PAGE_CACHE_UC_MINUS;
 	}
 
Index: linux-2.6/drivers/char/mem.c
===================================================================
--- linux-2.6.orig/drivers/char/mem.c	2009-09-10 16:17:51.331254187 -0300
+++ linux-2.6/drivers/char/mem.c	2009-09-10 16:20:55.662019572 -0300
@@ -44,7 +44,7 @@ static inline int uncached_access(struct
 {
 #if defined(CONFIG_IA64)
 	/*
-	 * On ia64, we ignore O_SYNC because we cannot tolerate memory attribute aliases.
+	 * On ia64, we ignore O_DSYNC because we cannot tolerate memory attribute aliases.
 	 */
 	return !(efi_mem_attributes(addr) & EFI_MEMORY_WB);
 #elif defined(CONFIG_MIPS)
@@ -57,9 +57,9 @@ static inline int uncached_access(struct
 #else
 	/*
 	 * Accessing memory above the top the kernel knows about or through a file pointer
-	 * that was marked O_SYNC will be done non-cached.
+	 * that was marked O_DSYNC will be done non-cached.
 	 */
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 	return addr >= __pa(high_memory);
 #endif
Index: linux-2.6/drivers/staging/me4000/me4000.c
===================================================================
--- linux-2.6.orig/drivers/staging/me4000/me4000.c	2009-09-10 16:17:51.387254052 -0300
+++ linux-2.6/drivers/staging/me4000/me4000.c	2009-09-10 16:20:55.700008546 -0300
@@ -1985,8 +1985,8 @@ static ssize_t me4000_ao_write_cont(stru
 			spin_unlock_irqrestore(&ao_context->int_lock, flags);
 		}
 
-		/* Wait until the state machine is stopped if O_SYNC is set */
-		if (filep->f_flags & O_SYNC) {
+		/* Wait until the state machine is stopped if O_DSYNC is set */
+		if (filep->f_flags & O_DSYNC) {
 			while (inl(ao_context->status_reg) &
 			       ME4000_AO_STATUS_BIT_FSM) {
 				interruptible_sleep_on_timeout(&queue, 1);
Index: linux-2.6/drivers/usb/gadget/file_storage.c
===================================================================
--- linux-2.6.orig/drivers/usb/gadget/file_storage.c	2009-09-10 16:17:51.394254454 -0300
+++ linux-2.6/drivers/usb/gadget/file_storage.c	2009-09-10 16:20:55.710009118 -0300
@@ -1713,7 +1713,7 @@ static int do_write(struct fsg_dev *fsg)
 		}
 		if (fsg->cmnd[1] & 0x08) {	// FUA
 			spin_lock(&curlun->filp->f_lock);
-			curlun->filp->f_flags |= O_SYNC;
+			curlun->filp->f_flags |= O_DSYNC;
 			spin_unlock(&curlun->filp->f_lock);
 		}
 	}
Index: linux-2.6/fs/afs/write.c
===================================================================
--- linux-2.6.orig/fs/afs/write.c	2009-09-10 16:17:51.135254033 -0300
+++ linux-2.6/fs/afs/write.c	2009-09-10 16:20:55.712004414 -0300
@@ -692,8 +692,9 @@ ssize_t afs_file_write(struct kiocb *ioc
 	}
 
 	/* return error values for O_SYNC and IS_SYNC() */
-	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_SYNC) {
-		ret = afs_fsync(iocb->ki_filp, dentry, 1);
+	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_DSYNC) {
+		ret = afs_fsync(iocb->ki_filp, dentry,
+				(iocb->ki_filp->f_flags & O_SYNC) ? 0 : 1);
 		if (ret < 0)
 			result = ret;
 	}
Index: linux-2.6/fs/btrfs/file.c
===================================================================
--- linux-2.6.orig/fs/btrfs/file.c	2009-09-10 16:17:51.140253971 -0300
+++ linux-2.6/fs/btrfs/file.c	2009-09-10 16:20:55.716004922 -0300
@@ -924,7 +924,7 @@ static ssize_t btrfs_file_write(struct f
 	unsigned long last_index;
 	int will_write;
 
-	will_write = ((file->f_flags & O_SYNC) || IS_SYNC(inode) ||
+	will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
 		      (file->f_flags & O_DIRECT));
 
 	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
@@ -1077,7 +1077,7 @@ out_nolock:
 		if (err)
 			num_written = err;
 
-		if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
+		if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 			trans = btrfs_start_transaction(root, 1);
 			ret = btrfs_log_dentry_safe(trans, root,
 						    file->f_dentry);
Index: linux-2.6/fs/cifs/dir.c
===================================================================
--- linux-2.6.orig/fs/cifs/dir.c	2009-09-10 16:17:51.144253851 -0300
+++ linux-2.6/fs/cifs/dir.c	2009-09-10 16:20:55.719004396 -0300
@@ -214,7 +214,8 @@ int cifs_posix_open(char *full_path, str
 		posix_flags |= SMB_O_TRUNC;
 	if (oflags & O_APPEND)
 		posix_flags |= SMB_O_APPEND;
-	if (oflags & O_SYNC)
+	/* be safe and imply O_SYNC for O_DSYNC */
+	if (oflags & O_DSYNC)
 		posix_flags |= SMB_O_SYNC;
 	if (oflags & O_DIRECTORY)
 		posix_flags |= SMB_O_DIRECTORY;
Index: linux-2.6/fs/cifs/file.c
===================================================================
--- linux-2.6.orig/fs/cifs/file.c	2009-09-10 16:17:51.198254020 -0300
+++ linux-2.6/fs/cifs/file.c	2009-09-10 16:20:55.719004396 -0300
@@ -98,8 +98,10 @@ static inline fmode_t cifs_posix_convert
 	   reopening a file.  They had their effect on the original open */
 	if (flags & O_APPEND)
 		posix_flags |= (fmode_t)O_APPEND;
-	if (flags & O_SYNC)
-		posix_flags |= (fmode_t)O_SYNC;
+	if (flags & O_DSYNC)
+		posix_flags |= (fmode_t)O_DSYNC;
+	if (flags & __O_SYNC)
+		posix_flags |= (fmode_t)__O_SYNC;
 	if (flags & O_DIRECTORY)
 		posix_flags |= (fmode_t)O_DIRECTORY;
 	if (flags & O_NOFOLLOW)
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2009-09-10 16:17:51.060253915 -0300
+++ linux-2.6/fs/namei.c	2009-09-10 16:48:04.710299751 -0300
@@ -1676,6 +1676,9 @@ struct file *do_filp_open(int dfd, const
 	int will_write;
 	int flag = open_to_namei_flags(open_flag);
 
+	if ((open_flag & __O_SYNC) && !(open_flag & O_DSYNC))
+		return ERR_PTR(-EINVAL);
+
 	if (!acc_mode)
 		acc_mode = MAY_OPEN | ACC_MODE(flag);
 
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c	2009-09-10 16:17:51.234253295 -0300
+++ linux-2.6/fs/nfs/file.c	2009-09-10 16:20:55.733005337 -0300
@@ -535,7 +535,7 @@ static int nfs_need_sync_write(struct fi
 {
 	struct nfs_open_context *ctx;
 
-	if (IS_SYNC(inode) || (filp->f_flags & O_SYNC))
+	if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
 		return 1;
 	ctx = nfs_file_open_context(filp);
 	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags))
@@ -576,7 +576,7 @@ static ssize_t nfs_file_write(struct kio
 
 	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, count);
 	result = generic_file_aio_write(iocb, iov, nr_segs, pos);
-	/* Return error values for O_SYNC and IS_SYNC() */
+	/* Return error values for O_DSYNC and IS_SYNC() */
 	if (result >= 0 && nfs_need_sync_write(iocb->ki_filp, inode)) {
 		int err = nfs_do_fsync(nfs_file_open_context(iocb->ki_filp), inode);
 		if (err < 0)
Index: linux-2.6/fs/nfs/write.c
===================================================================
--- linux-2.6.orig/fs/nfs/write.c	2009-09-10 16:17:51.301254210 -0300
+++ linux-2.6/fs/nfs/write.c	2009-09-10 16:20:55.734004766 -0300
@@ -762,7 +762,7 @@ int nfs_updatepage(struct file *file, st
 	 */
 	if (nfs_write_pageuptodate(page, inode) &&
 			inode->i_flock == NULL &&
-			!(file->f_flags & O_SYNC)) {
+			!(file->f_flags & O_DSYNC)) {
 		count = max(count + offset, nfs_page_length(page));
 		offset = 0;
 	}
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h	2009-09-10 16:17:51.049254051 -0300
+++ linux-2.6/include/asm-generic/fcntl.h	2009-09-10 16:34:50.890004705 -0300
@@ -3,8 +3,6 @@
 
 #include <linux/types.h>
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_ACCMODE	00000003
 #define O_RDONLY	00000000
 #define O_WRONLY	00000001
@@ -27,8 +25,8 @@
 #ifndef O_NONBLOCK
 #define O_NONBLOCK	00004000
 #endif
-#ifndef O_SYNC
-#define O_SYNC		00010000
+#ifndef O_DSYNC
+#define O_DSYNC		00010000	/* used to be O_SYNC, see below */
 #endif
 #ifndef FASYNC
 #define FASYNC		00020000	/* fcntl, for BSD compatibility */
@@ -51,6 +49,25 @@
 #ifndef O_CLOEXEC
 #define O_CLOEXEC	02000000	/* set close_on_exec */
 #endif
+
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+
+ * Note: __O_SYNC must never be used directly.
+ */
+#ifndef O_SYNC
+#define __O_SYNC	04000000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
+#endif
+
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
Index: linux-2.6/fs/ocfs2/file.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.c	2009-09-10 16:22:58.096253707 -0300
+++ linux-2.6/fs/ocfs2/file.c	2009-09-10 16:23:16.359256714 -0300
@@ -1878,7 +1878,7 @@ out_dio:
 	/* buffered aio wouldn't have proper lock coverage today */
 	BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
 
-	if ((file->f_flags & O_SYNC && !direct_io) || IS_SYNC(inode)) {
+	if ((file->f_flags & O_DSYNC && !direct_io) || IS_SYNC(inode)) {
 		ret = filemap_fdatawrite_range(file->f_mapping, pos,
 					       pos + count - 1);
 		if (ret < 0)
Index: linux-2.6/fs/ubifs/file.c
===================================================================
--- linux-2.6.orig/fs/ubifs/file.c	2009-09-10 16:23:25.507276389 -0300
+++ linux-2.6/fs/ubifs/file.c	2009-09-10 16:24:20.322254305 -0300
@@ -1403,7 +1403,7 @@ static ssize_t ubifs_aio_write(struct ki
 	if (ret < 0)
 		return ret;
 
-	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_SYNC)) {
+	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_DSYNC)) {
 		err = ubifs_sync_wbufs_by_inode(c, inode);
 		if (err)
 			return err;
Index: linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-10 16:24:26.914275924 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-10 16:24:48.495255171 -0300
@@ -811,7 +811,7 @@ write_retry:
 	XFS_STATS_ADD(xs_write_bytes, ret);
 
 	/* Handle various SYNC-type writes */
-	if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
+	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 		int error2;
 
 		xfs_iunlock(xip, iolock);
Index: linux-2.6/sound/core/rawmidi.c
===================================================================
--- linux-2.6.orig/sound/core/rawmidi.c	2009-09-10 16:25:01.675028270 -0300
+++ linux-2.6/sound/core/rawmidi.c	2009-09-10 16:25:13.867256078 -0300
@@ -1258,7 +1258,7 @@ static ssize_t snd_rawmidi_write(struct 
 			break;
 		count -= count1;
 	}
-	if (file->f_flags & O_SYNC) {
+	if (file->f_flags & O_DSYNC) {
 		spin_lock_irq(&runtime->lock);
 		while (runtime->avail != runtime->buffer_size) {
 			wait_queue_t wait;
Index: linux-2.6/arch/alpha/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/alpha/include/asm/fcntl.h	2009-09-10 16:31:47.720004025 -0300
+++ linux-2.6/arch/alpha/include/asm/fcntl.h	2009-09-10 16:33:55.087294444 -0300
@@ -1,8 +1,6 @@
 #ifndef _ALPHA_FCNTL_H
 #define _ALPHA_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_CREAT		 01000	/* not fcntl */
 #define O_TRUNC		 02000	/* not fcntl */
 #define O_EXCL		 04000	/* not fcntl */
@@ -10,13 +8,28 @@
 
 #define O_NONBLOCK	 00004
 #define O_APPEND	 00010
-#define O_SYNC		040000
+#define O_DSYNC		040000	/* used to be O_SYNC, see below */
 #define O_DIRECTORY	0100000	/* must be a directory */
 #define O_NOFOLLOW	0200000 /* don't follow links */
 #define O_LARGEFILE	0400000 /* will be set by the kernel on every open */
 #define O_DIRECT	02000000 /* direct disk access - should check with OSF/1 */
 #define O_NOATIME	04000000
 #define O_CLOEXEC	010000000 /* set close_on_exec */
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+
+ * Note: __O_SYNC must never be used directly.
+ */
+#define __O_SYNC	010000000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 
 #define F_GETLK		7
 #define F_SETLK		8
Index: linux-2.6/arch/blackfin/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/blackfin/include/asm/fcntl.h	2009-09-10 16:26:12.586004008 -0300
+++ linux-2.6/arch/blackfin/include/asm/fcntl.h	2009-09-10 16:26:17.423254257 -0300
@@ -1,8 +1,6 @@
 #ifndef _BFIN_FCNTL_H
 #define _BFIN_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_DIRECTORY	 040000	/* must be a directory */
 #define O_NOFOLLOW	0100000	/* don't follow links */
 #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
Index: linux-2.6/arch/mips/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/mips/include/asm/fcntl.h	2009-09-10 16:33:10.872011143 -0300
+++ linux-2.6/arch/mips/include/asm/fcntl.h	2009-09-10 16:34:35.708004894 -0300
@@ -10,7 +10,7 @@
 
 
 #define O_APPEND	0x0008
-#define O_SYNC		0x0010
+#define O_DSYNC		000010	/* used to be O_SYNC, see below */
 #define O_NONBLOCK	0x0080
 #define O_CREAT         0x0100	/* not fcntl */
 #define O_TRUNC		0x0200	/* not fcntl */
Index: linux-2.6/arch/mips/kernel/kspd.c
===================================================================
--- linux-2.6.orig/arch/mips/kernel/kspd.c	2009-09-10 16:27:50.020272497 -0300
+++ linux-2.6/arch/mips/kernel/kspd.c	2009-09-10 16:28:29.602253909 -0300
@@ -82,6 +82,7 @@ static int sp_stopping = 0;
 #define MTSP_O_SHLOCK		0x0010
 #define MTSP_O_EXLOCK		0x0020
 #define MTSP_O_ASYNC		0x0040
+/* XXX: check which of these is actually O_SYNC vs O_DSYNC */
 #define MTSP_O_FSYNC		O_SYNC
 #define MTSP_O_NOFOLLOW		0x0100
 #define MTSP_O_SYNC		0x0080
Index: linux-2.6/arch/mips/lemote/lm2e/mem.c
===================================================================
--- linux-2.6.orig/arch/mips/lemote/lm2e/mem.c	2009-09-10 16:28:35.860254650 -0300
+++ linux-2.6/arch/mips/lemote/lm2e/mem.c	2009-09-10 16:28:43.011008930 -0300
@@ -11,7 +11,7 @@
 /* override of arch/mips/mm/cache.c: __uncached_access */
 int __uncached_access(struct file *file, unsigned long addr)
 {
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 
 	/*
Index: linux-2.6/arch/mips/mm/cache.c
===================================================================
--- linux-2.6.orig/arch/mips/mm/cache.c	2009-09-10 16:28:47.598003838 -0300
+++ linux-2.6/arch/mips/mm/cache.c	2009-09-10 16:28:51.492351243 -0300
@@ -194,7 +194,7 @@ void __devinit cpu_cache_init(void)
 
 int __weak __uncached_access(struct file *file, unsigned long addr)
 {
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 
 	return addr >= __pa(high_memory);
Index: linux-2.6/arch/parisc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/parisc/include/asm/fcntl.h	2009-09-10 16:26:24.647005349 -0300
+++ linux-2.6/arch/parisc/include/asm/fcntl.h	2009-09-10 17:03:20.287004185 -0300
@@ -1,14 +1,13 @@
 #ifndef _PARISC_FCNTL_H
 #define _PARISC_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_APPEND	000000010
 #define O_BLKSEEK	000000100 /* HPUX only */
 #define O_CREAT		000000400 /* not fcntl */
 #define O_EXCL		000002000 /* not fcntl */
 #define O_LARGEFILE	000004000
-#define O_SYNC		000100000
+#define __O_SYNC	000100000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 #define O_NONBLOCK	000200004 /* HPUX has separate NDELAY & NONBLOCK */
 #define O_NOCTTY	000400000 /* not fcntl */
 #define O_DSYNC		001000000 /* HPUX only */
Index: linux-2.6/arch/sparc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/fcntl.h	2009-09-10 16:29:04.995008494 -0300
+++ linux-2.6/arch/sparc/include/asm/fcntl.h	2009-09-10 16:35:44.147004304 -0300
@@ -1,14 +1,12 @@
 #ifndef _SPARC_FCNTL_H
 #define _SPARC_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_APPEND	0x0008
 #define FASYNC		0x0040	/* fcntl, for BSD compatibility */
 #define O_CREAT		0x0200	/* not fcntl */
 #define O_TRUNC		0x0400	/* not fcntl */
 #define O_EXCL		0x0800	/* not fcntl */
-#define O_SYNC		0x2000
+#define O_DSYNC		0x2000	/* used to be O_SYNC, see below */
 #define O_NONBLOCK	0x4000
 #if defined(__sparc__) && defined(__arch64__)
 #define O_NDELAY	0x0004
@@ -20,6 +18,21 @@
 #define O_DIRECT        0x100000 /* direct disk access hint */
 #define O_NOATIME	0x200000
 #define O_CLOEXEC	0x400000
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+
+ * Note: __O_SYNC must never be used directly.
+ */
+#define __O_SYNC	0x800000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 
 #define F_GETOWN	5	/*  for sockets. */
 #define F_SETOWN	6	/*  for sockets. */
Index: linux-2.6/fs/sync.c
===================================================================
--- linux-2.6.orig/fs/sync.c	2009-09-10 16:30:32.414027738 -0300
+++ linux-2.6/fs/sync.c	2009-09-10 16:31:19.042005715 -0300
@@ -285,10 +285,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
  */
 int generic_write_sync(struct file *file, loff_t pos, loff_t count)
 {
-	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
+	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
 		return 0;
 	return vfs_fsync_range(file, file->f_path.dentry, pos,
-			       pos + count - 1, 1);
+			       pos + count - 1,
+			       (file->f_flags & O_SYNC) ? 1 : 0);
 }
 EXPORT_SYMBOL(generic_write_sync);
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-10 20:25 ` [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics Christoph Hellwig
@ 2009-09-10 20:38   ` Trond Myklebust
  2009-09-10 20:40     ` Christoph Hellwig
  2009-09-10 23:07   ` Andreas Dilger
  2009-09-11 19:16   ` [PATCHv2 " Christoph Hellwig
  2 siblings, 1 reply; 52+ messages in thread
From: Trond Myklebust @ 2009-09-10 20:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-kernel, linux-fsdevel, akpm, drepper, viro, kyle

On Thu, 2009-09-10 at 22:25 +0200, Christoph Hellwig wrote:
> Index: linux-2.6/fs/sync.c
> ===================================================================
> --- linux-2.6.orig/fs/sync.c	2009-09-10 16:30:32.414027738 -0300
> +++ linux-2.6/fs/sync.c	2009-09-10 16:31:19.042005715 -0300
> @@ -285,10 +285,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
>   */
>  int generic_write_sync(struct file *file, loff_t pos, loff_t count)
>  {
> -	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
> +	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
>  		return 0;
>  	return vfs_fsync_range(file, file->f_path.dentry, pos,
> -			       pos + count - 1, 1);
> +			       pos + count - 1,
> +			       (file->f_flags & O_SYNC) ? 1 : 0);
>  }
>  EXPORT_SYMBOL(generic_write_sync);
>  
Shouldn't this be testing for

   file->f_flags & __O_SYNC

?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-10 20:38   ` Trond Myklebust
@ 2009-09-10 20:40     ` Christoph Hellwig
  2009-09-10 20:43       ` Trond Myklebust
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-10 20:40 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, Jan Kara, linux-kernel, linux-fsdevel, akpm,
	drepper, viro, kyle

On Thu, Sep 10, 2009 at 04:38:40PM -0400, Trond Myklebust wrote:
> >  	return vfs_fsync_range(file, file->f_path.dentry, pos,
> > -			       pos + count - 1, 1);
> > +			       pos + count - 1,
> > +			       (file->f_flags & O_SYNC) ? 1 : 0);
> >  }
> >  EXPORT_SYMBOL(generic_write_sync);
> >  
> Shouldn't this be testing for
> 
>    file->f_flags & __O_SYNC
> 
> ?

Doesn't matter, we check early in the open path that __O_SYNC is only
set together with O_SYNC.


* Re: [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-10 20:40     ` Christoph Hellwig
@ 2009-09-10 20:43       ` Trond Myklebust
  2009-09-10 20:44         ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: Trond Myklebust @ 2009-09-10 20:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-kernel, linux-fsdevel, akpm, drepper, viro, kyle

On Thu, 2009-09-10 at 22:40 +0200, Christoph Hellwig wrote:
> On Thu, Sep 10, 2009 at 04:38:40PM -0400, Trond Myklebust wrote:
> > >  	return vfs_fsync_range(file, file->f_path.dentry, pos,
> > > -			       pos + count - 1, 1);
> > > +			       pos + count - 1,
> > > +			       (file->f_flags & O_SYNC) ? 1 : 0);
> > >  }
> > >  EXPORT_SYMBOL(generic_write_sync);
> > >  
> > Shouldn't this be testing for
> > 
> >    file->f_flags & __O_SYNC
> > 
> > ?
> 
> Doesn't matter, we check early in the open path that __O_SYNC is only
> set together with O_SYNC.

Right, but (file->f_flags & O_SYNC) will be non-zero even if only the
O_DSYNC flag is set.
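
To make the mask arithmetic concrete, here is a minimal sketch in plain
userspace C.  It assumes the asm-generic values from the patch
(O_DSYNC = 00010000, __O_SYNC = 04000000).  The EX_* macros and all three
functions are hypothetical names used only for illustration; none of this
is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copies of the asm-generic values proposed in the patch. */
#define EX_O_DSYNC	00010000u
#define EX_O_SYNC_BIT	04000000u	/* stands in for __O_SYNC */
#define EX_O_SYNC	(EX_O_SYNC_BIT | EX_O_DSYNC)

/* Tests the two-bit mask: true as soon as either bit is set. */
static bool full_sync_by_mask(unsigned int f_flags)
{
	return (f_flags & EX_O_SYNC) != 0;
}

/* Tests only the bit that distinguishes O_SYNC from O_DSYNC. */
static bool full_sync_by_bit(unsigned int f_flags)
{
	return (f_flags & EX_O_SYNC_BIT) != 0;
}

/* Self-check covering a file opened with O_DSYNC and one with O_SYNC. */
static void check_sync_tests(void)
{
	assert(full_sync_by_mask(EX_O_DSYNC));	/* O_DSYNC alone is misread */
	assert(!full_sync_by_bit(EX_O_DSYNC));	/* single-bit test is not fooled */
	assert(full_sync_by_bit(EX_O_SYNC));
	assert(full_sync_by_mask(EX_O_SYNC));
}
```

The mask test cannot tell the two open modes apart, which is why the
single-bit test is the one that matters for choosing the datasync argument.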




* Re: [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-10 20:43       ` Trond Myklebust
@ 2009-09-10 20:44         ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-10 20:44 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Christoph Hellwig, Jan Kara, linux-kernel, linux-fsdevel, akpm,
	drepper, viro, kyle

On Thu, Sep 10, 2009 at 04:43:35PM -0400, Trond Myklebust wrote:
> > 
> > Doesn't matter, we check early in the open path that __O_SYNC is only
> > set together with O_SYNC.
> 
> Right, but (file->f_flags & O_SYNC) will be non-zero even if only the
> O_DSYNC flag is set.

Thanks, corrected.


* Re: [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-10 20:25 ` [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics Christoph Hellwig
  2009-09-10 20:38   ` Trond Myklebust
@ 2009-09-10 23:07   ` Andreas Dilger
  2009-09-10 23:18     ` Christoph Hellwig
  2009-09-11 19:16   ` [PATCHv2 " Christoph Hellwig
  2 siblings, 1 reply; 52+ messages in thread
From: Andreas Dilger @ 2009-09-10 23:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-kernel, linux-fsdevel, akpm, drepper, viro, kyle

On Sep 10, 2009  22:25 +0200, Christoph Hellwig wrote:
> +/*
> + * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
> + * the O_SYNC flag.  We continue to use the existing numerical value
> + * for O_DSYNC semantics now, but using the correct symbolic name for it.
> + * This new value is used to request true Posix O_SYNC semantics.  It is
> + * defined in this strange way to make sure applications compiled against
> + * new headers get at least O_DSYNC semantics on older kernels.
> + *
> + * This has the nice side-effect that we can simply test for O_DSYNC
> + * wherever we do not care if O_DSYNC or O_SYNC is used.
> + *
> + * Note: __O_SYNC must never be used directly.

Doesn't it make sense that applications that actually know what they are
doing may want to start using __O_SYNC directly at some point in the
future?  It makes sense to code the kernel to handle both of these flags
appropriately (i.e. if __O_SYNC is set, but O_DSYNC is not then treat
this as the proper "O_SYNC").

> Index: linux-2.6/arch/alpha/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h	2009-09-10 16:31:47.720004025 -0300
> +++ linux-2.6/arch/alpha/include/asm/fcntl.h	2009-09-10 16:33:55.087294444 -0300
>  #define O_CLOEXEC	010000000 /* set close_on_exec */
> +#define __O_SYNC	010000000

These two flags have the same value...

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-10 23:07   ` Andreas Dilger
@ 2009-09-10 23:18     ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-10 23:18 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Christoph Hellwig, Jan Kara, linux-kernel, linux-fsdevel, akpm,
	drepper, viro, kyle

On Fri, Sep 11, 2009 at 01:07:55AM +0200, Andreas Dilger wrote:
> > + * Note: __O_SYNC must never be used directly.
> 
> Doesn't it make sense that applications that actually know what they are
> doing may want to start using __O_SYNC directly at some point in the
> future?  It makes sense to code the kernel to handle both of these flags
> appropriately (i.e. if __O_SYNC is set, but O_DSYNC is not then treat
> this as the proper "O_SYNC").

What would be the benefit of that?  Setting two bits vs one in a data
structure is not going to make any difference, and the way it's done in
this patch is actually much easier to implement in the kernel.

> > Index: linux-2.6/arch/alpha/include/asm/fcntl.h
> > ===================================================================
> > --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h	2009-09-10 16:31:47.720004025 -0300
> > +++ linux-2.6/arch/alpha/include/asm/fcntl.h	2009-09-10 16:33:55.087294444 -0300
> >  #define O_CLOEXEC	010000000 /* set close_on_exec */
> > +#define __O_SYNC	010000000
> 
> These two flags have the same value...

Thanks, corrected.


* [PATCHv2 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-10 20:25 ` [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics Christoph Hellwig
  2009-09-10 20:38   ` Trond Myklebust
  2009-09-10 23:07   ` Andreas Dilger
@ 2009-09-11 19:16   ` Christoph Hellwig
  2009-09-14 16:54     ` Jan Kara
  2 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-11 19:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, akpm, drepper, viro, kyle

While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems.
Since that day we have had generic_osync_around with only minor changes and
the great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC patches,
which are required before this patch, it's actually surprisingly simple: we
just need to figure out when to set the datasync flag to vfs_fsync_range
and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping
its numerical value to keep binary compatibility, and adds a new real
O_SYNC flag.  To guarantee backwards compatibility it is defined as
expanding to both the O_DSYNC and the new additional binary flag
(__O_SYNC), so that applications compiled against the new headers still
get at least O_DSYNC semantics on older kernels.

This also means that all places that don't care about the differences
can just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actually care need to check __O_SYNC in addition.  Drivers
and network filesystems have been updated in a fail-safe way to always
do the full sync magic if O_DSYNC is set.  The few places setting O_SYNC
for lower layers are kept that way for now to stay fail-safe.

Note that parisc really fucked up their headers as they already define
an O_DSYNC that has always been a no-op.  We try to repair it by using it
for the new O_DSYNC and redefining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c	2009-09-10 21:02:06.369009712 -0300
+++ linux-2.6/arch/x86/mm/pat.c	2009-09-11 16:11:50.424274144 -0300
@@ -541,7 +541,7 @@ int phys_mem_access_prot_allowed(struct 
 	if (!range_is_allowed(pfn, size))
 		return 0;
 
-	if (file->f_flags & O_SYNC) {
+	if (file->f_flags & O_DSYNC) {
 		flags = _PAGE_CACHE_UC_MINUS;
 	}
 
Index: linux-2.6/drivers/char/mem.c
===================================================================
--- linux-2.6.orig/drivers/char/mem.c	2009-09-11 14:53:21.023003943 -0300
+++ linux-2.6/drivers/char/mem.c	2009-09-11 16:11:50.424274144 -0300
@@ -44,7 +44,7 @@ static inline int uncached_access(struct
 {
 #if defined(CONFIG_IA64)
 	/*
-	 * On ia64, we ignore O_SYNC because we cannot tolerate memory attribute aliases.
+	 * On ia64, we ignore O_DSYNC because we cannot tolerate memory attribute aliases.
 	 */
 	return !(efi_mem_attributes(addr) & EFI_MEMORY_WB);
 #elif defined(CONFIG_MIPS)
@@ -57,9 +57,9 @@ static inline int uncached_access(struct
 #else
 	/*
 	 * Accessing memory above the top the kernel knows about or through a file pointer
-	 * that was marked O_SYNC will be done non-cached.
+	 * that was marked O_DSYNC will be done non-cached.
 	 */
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 	return addr >= __pa(high_memory);
 #endif
Index: linux-2.6/drivers/staging/me4000/me4000.c
===================================================================
--- linux-2.6.orig/drivers/staging/me4000/me4000.c	2009-09-10 21:02:06.691004468 -0300
+++ linux-2.6/drivers/staging/me4000/me4000.c	2009-09-11 16:11:50.427273688 -0300
@@ -1985,8 +1985,8 @@ static ssize_t me4000_ao_write_cont(stru
 			spin_unlock_irqrestore(&ao_context->int_lock, flags);
 		}
 
-		/* Wait until the state machine is stopped if O_SYNC is set */
-		if (filep->f_flags & O_SYNC) {
+		/* Wait until the state machine is stopped if O_DSYNC is set */
+		if (filep->f_flags & O_DSYNC) {
 			while (inl(ao_context->status_reg) &
 			       ME4000_AO_STATUS_BIT_FSM) {
 				interruptible_sleep_on_timeout(&queue, 1);
Index: linux-2.6/drivers/usb/gadget/file_storage.c
===================================================================
--- linux-2.6.orig/drivers/usb/gadget/file_storage.c	2009-09-10 21:02:06.704004095 -0300
+++ linux-2.6/drivers/usb/gadget/file_storage.c	2009-09-11 16:11:50.434004504 -0300
@@ -1713,7 +1713,7 @@ static int do_write(struct fsg_dev *fsg)
 		}
 		if (fsg->cmnd[1] & 0x08) {	// FUA
 			spin_lock(&curlun->filp->f_lock);
-			curlun->filp->f_flags |= O_SYNC;
+			curlun->filp->f_flags |= O_DSYNC;
 			spin_unlock(&curlun->filp->f_lock);
 		}
 	}
Index: linux-2.6/fs/afs/write.c
===================================================================
--- linux-2.6.orig/fs/afs/write.c	2009-09-10 21:02:06.710003950 -0300
+++ linux-2.6/fs/afs/write.c	2009-09-11 16:11:50.439008144 -0300
@@ -692,8 +692,9 @@ ssize_t afs_file_write(struct kiocb *ioc
 	}
 
 	/* return error values for O_SYNC and IS_SYNC() */
-	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_SYNC) {
-		ret = afs_fsync(iocb->ki_filp, dentry, 1);
+	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_DSYNC) {
+		ret = afs_fsync(iocb->ki_filp, dentry,
+				(iocb->ki_filp->f_flags & __O_SYNC) ? 0 : 1);
 		if (ret < 0)
 			result = ret;
 	}
Index: linux-2.6/fs/btrfs/file.c
===================================================================
--- linux-2.6.orig/fs/btrfs/file.c	2009-09-10 21:02:06.715004446 -0300
+++ linux-2.6/fs/btrfs/file.c	2009-09-11 16:11:50.443016057 -0300
@@ -924,7 +924,7 @@ static ssize_t btrfs_file_write(struct f
 	unsigned long last_index;
 	int will_write;
 
-	will_write = ((file->f_flags & O_SYNC) || IS_SYNC(inode) ||
+	will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
 		      (file->f_flags & O_DIRECT));
 
 	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
@@ -1077,7 +1077,7 @@ out_nolock:
 		if (err)
 			num_written = err;
 
-		if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
+		if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 			trans = btrfs_start_transaction(root, 1);
 			ret = btrfs_log_dentry_safe(trans, root,
 						    file->f_dentry);
Index: linux-2.6/fs/cifs/dir.c
===================================================================
--- linux-2.6.orig/fs/cifs/dir.c	2009-09-10 21:02:06.722004498 -0300
+++ linux-2.6/fs/cifs/dir.c	2009-09-11 16:11:50.448006078 -0300
@@ -214,7 +214,8 @@ int cifs_posix_open(char *full_path, str
 		posix_flags |= SMB_O_TRUNC;
 	if (oflags & O_APPEND)
 		posix_flags |= SMB_O_APPEND;
-	if (oflags & O_SYNC)
+	/* be safe and imply O_SYNC for O_DSYNC */
+	if (oflags & O_DSYNC)
 		posix_flags |= SMB_O_SYNC;
 	if (oflags & O_DIRECTORY)
 		posix_flags |= SMB_O_DIRECTORY;
Index: linux-2.6/fs/cifs/file.c
===================================================================
--- linux-2.6.orig/fs/cifs/file.c	2009-09-10 21:02:06.727003737 -0300
+++ linux-2.6/fs/cifs/file.c	2009-09-11 16:11:50.451006321 -0300
@@ -98,8 +98,10 @@ static inline fmode_t cifs_posix_convert
 	   reopening a file.  They had their effect on the original open */
 	if (flags & O_APPEND)
 		posix_flags |= (fmode_t)O_APPEND;
-	if (flags & O_SYNC)
-		posix_flags |= (fmode_t)O_SYNC;
+	if (flags & O_DSYNC)
+		posix_flags |= (fmode_t)O_DSYNC;
+	if (flags & __O_SYNC)
+		posix_flags |= (fmode_t)__O_SYNC;
 	if (flags & O_DIRECTORY)
 		posix_flags |= (fmode_t)O_DIRECTORY;
 	if (flags & O_NOFOLLOW)
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2009-09-11 14:53:21.111004393 -0300
+++ linux-2.6/fs/namei.c	2009-09-11 16:11:50.457006387 -0300
@@ -1678,6 +1678,9 @@ struct file *do_filp_open(int dfd, const
 	int will_write;
 	int flag = open_to_namei_flags(open_flag);
 
+	if ((open_flag & __O_SYNC) && !(open_flag & O_DSYNC))
+		return ERR_PTR(-EINVAL);
+
 	if (!acc_mode)
 		acc_mode = MAY_OPEN | ACC_MODE(flag);
 
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c	2009-09-10 21:02:06.744005200 -0300
+++ linux-2.6/fs/nfs/file.c	2009-09-11 16:11:50.461006478 -0300
@@ -535,7 +535,7 @@ static int nfs_need_sync_write(struct fi
 {
 	struct nfs_open_context *ctx;
 
-	if (IS_SYNC(inode) || (filp->f_flags & O_SYNC))
+	if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
 		return 1;
 	ctx = nfs_file_open_context(filp);
 	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags))
@@ -576,7 +576,7 @@ static ssize_t nfs_file_write(struct kio
 
 	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, count);
 	result = generic_file_aio_write(iocb, iov, nr_segs, pos);
-	/* Return error values for O_SYNC and IS_SYNC() */
+	/* Return error values for O_DSYNC and IS_SYNC() */
 	if (result >= 0 && nfs_need_sync_write(iocb->ki_filp, inode)) {
 		int err = nfs_do_fsync(nfs_file_open_context(iocb->ki_filp), inode);
 		if (err < 0)
Index: linux-2.6/fs/nfs/write.c
===================================================================
--- linux-2.6.orig/fs/nfs/write.c	2009-09-10 21:02:06.749004230 -0300
+++ linux-2.6/fs/nfs/write.c	2009-09-11 16:11:50.465005940 -0300
@@ -762,7 +762,7 @@ int nfs_updatepage(struct file *file, st
 	 */
 	if (nfs_write_pageuptodate(page, inode) &&
 			inode->i_flock == NULL &&
-			!(file->f_flags & O_SYNC)) {
+			!(file->f_flags & O_DSYNC)) {
 		count = max(count + offset, nfs_page_length(page));
 		offset = 0;
 	}
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h	2009-09-10 21:02:06.869004122 -0300
+++ linux-2.6/include/asm-generic/fcntl.h	2009-09-11 16:11:50.468017357 -0300
@@ -3,8 +3,6 @@
 
 #include <linux/types.h>
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_ACCMODE	00000003
 #define O_RDONLY	00000000
 #define O_WRONLY	00000001
@@ -27,8 +25,8 @@
 #ifndef O_NONBLOCK
 #define O_NONBLOCK	00004000
 #endif
-#ifndef O_SYNC
-#define O_SYNC		00010000
+#ifndef O_DSYNC
+#define O_DSYNC		00010000	/* used to be O_SYNC, see below */
 #endif
 #ifndef FASYNC
 #define FASYNC		00020000	/* fcntl, for BSD compatibility */
@@ -51,6 +49,25 @@
 #ifndef O_CLOEXEC
 #define O_CLOEXEC	02000000	/* set close_on_exec */
 #endif
+
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+ *
+ * Note: __O_SYNC must never be used directly.
+ */
+#ifndef O_SYNC
+#define __O_SYNC	04000000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
+#endif
+
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
Index: linux-2.6/fs/ocfs2/file.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.c	2009-09-11 16:11:47.524253868 -0300
+++ linux-2.6/fs/ocfs2/file.c	2009-09-11 16:11:50.474005341 -0300
@@ -1878,7 +1878,7 @@ out_dio:
 	/* buffered aio wouldn't have proper lock coverage today */
 	BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
 
-	if ((file->f_flags & O_SYNC && !direct_io) || IS_SYNC(inode)) {
+	if ((file->f_flags & O_DSYNC && !direct_io) || IS_SYNC(inode)) {
 		ret = filemap_fdatawrite_range(file->f_mapping, pos,
 					       pos + count - 1);
 		if (ret < 0)
Index: linux-2.6/fs/ubifs/file.c
===================================================================
--- linux-2.6.orig/fs/ubifs/file.c	2009-09-10 21:02:06.801004905 -0300
+++ linux-2.6/fs/ubifs/file.c	2009-09-11 16:11:50.476016351 -0300
@@ -1403,7 +1403,7 @@ static ssize_t ubifs_aio_write(struct ki
 	if (ret < 0)
 		return ret;
 
-	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_SYNC)) {
+	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_DSYNC)) {
 		err = ubifs_sync_wbufs_by_inode(c, inode);
 		if (err)
 			return err;
Index: linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-11 16:11:47.532254189 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-11 16:11:50.481006442 -0300
@@ -811,7 +811,7 @@ write_retry:
 	XFS_STATS_ADD(xs_write_bytes, ret);
 
 	/* Handle various SYNC-type writes */
-	if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
+	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 		int error2;
 
 		xfs_iunlock(xip, iolock);
Index: linux-2.6/sound/core/rawmidi.c
===================================================================
--- linux-2.6.orig/sound/core/rawmidi.c	2009-09-11 14:53:21.207004255 -0300
+++ linux-2.6/sound/core/rawmidi.c	2009-09-11 16:11:50.487006299 -0300
@@ -1258,7 +1258,7 @@ static ssize_t snd_rawmidi_write(struct 
 			break;
 		count -= count1;
 	}
-	if (file->f_flags & O_SYNC) {
+	if (file->f_flags & O_DSYNC) {
 		spin_lock_irq(&runtime->lock);
 		while (runtime->avail != runtime->buffer_size) {
 			wait_queue_t wait;
Index: linux-2.6/arch/alpha/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/alpha/include/asm/fcntl.h	2009-09-10 21:02:06.381022412 -0300
+++ linux-2.6/arch/alpha/include/asm/fcntl.h	2009-09-11 16:11:50.491005971 -0300
@@ -1,8 +1,6 @@
 #ifndef _ALPHA_FCNTL_H
 #define _ALPHA_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_CREAT		 01000	/* not fcntl */
 #define O_TRUNC		 02000	/* not fcntl */
 #define O_EXCL		 04000	/* not fcntl */
@@ -10,13 +8,28 @@
 
 #define O_NONBLOCK	 00004
 #define O_APPEND	 00010
-#define O_SYNC		040000
+#define O_DSYNC		040000	/* used to be O_SYNC, see below */
 #define O_DIRECTORY	0100000	/* must be a directory */
 #define O_NOFOLLOW	0200000 /* don't follow links */
 #define O_LARGEFILE	0400000 /* will be set by the kernel on every open */
 #define O_DIRECT	02000000 /* direct disk access - should check with OSF/1 */
 #define O_NOATIME	04000000
 #define O_CLOEXEC	010000000 /* set close_on_exec */
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+ *
+ * Note: __O_SYNC must never be used directly.
+ */
+#define __O_SYNC	020000000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 
 #define F_GETLK		7
 #define F_SETLK		8
Index: linux-2.6/arch/blackfin/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/blackfin/include/asm/fcntl.h	2009-09-10 21:02:06.390003722 -0300
+++ linux-2.6/arch/blackfin/include/asm/fcntl.h	2009-09-11 16:11:50.494006144 -0300
@@ -1,8 +1,6 @@
 #ifndef _BFIN_FCNTL_H
 #define _BFIN_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_DIRECTORY	 040000	/* must be a directory */
 #define O_NOFOLLOW	0100000	/* don't follow links */
 #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
Index: linux-2.6/arch/mips/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/mips/include/asm/fcntl.h	2009-09-10 21:02:06.443262027 -0300
+++ linux-2.6/arch/mips/include/asm/fcntl.h	2009-09-11 16:11:50.495015560 -0300
@@ -10,7 +10,7 @@
 
 
 #define O_APPEND	0x0008
-#define O_SYNC		0x0010
+#define O_DSYNC		000010	/* used to be O_SYNC, see below */
 #define O_NONBLOCK	0x0080
 #define O_CREAT         0x0100	/* not fcntl */
 #define O_TRUNC		0x0200	/* not fcntl */
Index: linux-2.6/arch/mips/kernel/kspd.c
===================================================================
--- linux-2.6.orig/arch/mips/kernel/kspd.c	2009-09-10 21:02:06.465005782 -0300
+++ linux-2.6/arch/mips/kernel/kspd.c	2009-09-11 16:11:50.499009085 -0300
@@ -82,6 +82,7 @@ static int sp_stopping = 0;
 #define MTSP_O_SHLOCK		0x0010
 #define MTSP_O_EXLOCK		0x0020
 #define MTSP_O_ASYNC		0x0040
+/* XXX: check which of these is actually O_SYNC vs O_DSYNC */
 #define MTSP_O_FSYNC		O_SYNC
 #define MTSP_O_NOFOLLOW		0x0100
 #define MTSP_O_SYNC		0x0080
Index: linux-2.6/arch/mips/lemote/lm2e/mem.c
===================================================================
--- linux-2.6.orig/arch/mips/lemote/lm2e/mem.c	2009-09-10 21:02:06.497028569 -0300
+++ linux-2.6/arch/mips/lemote/lm2e/mem.c	2009-09-11 16:11:50.503021258 -0300
@@ -11,7 +11,7 @@
 /* override of arch/mips/mm/cache.c: __uncached_access */
 int __uncached_access(struct file *file, unsigned long addr)
 {
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 
 	/*
Index: linux-2.6/arch/mips/mm/cache.c
===================================================================
--- linux-2.6.orig/arch/mips/mm/cache.c	2009-09-10 21:02:06.583002680 -0300
+++ linux-2.6/arch/mips/mm/cache.c	2009-09-11 16:11:50.507011921 -0300
@@ -194,7 +194,7 @@ void __devinit cpu_cache_init(void)
 
 int __weak __uncached_access(struct file *file, unsigned long addr)
 {
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 
 	return addr >= __pa(high_memory);
Index: linux-2.6/arch/parisc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/parisc/include/asm/fcntl.h	2009-09-10 21:02:06.618023193 -0300
+++ linux-2.6/arch/parisc/include/asm/fcntl.h	2009-09-11 16:11:50.512006342 -0300
@@ -1,14 +1,13 @@
 #ifndef _PARISC_FCNTL_H
 #define _PARISC_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_APPEND	000000010
 #define O_BLKSEEK	000000100 /* HPUX only */
 #define O_CREAT		000000400 /* not fcntl */
 #define O_EXCL		000002000 /* not fcntl */
 #define O_LARGEFILE	000004000
-#define O_SYNC		000100000
+#define __O_SYNC	000100000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 #define O_NONBLOCK	000200004 /* HPUX has separate NDELAY & NONBLOCK */
 #define O_NOCTTY	000400000 /* not fcntl */
 #define O_DSYNC		001000000 /* HPUX only */
Index: linux-2.6/arch/sparc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/fcntl.h	2009-09-10 21:02:06.671004509 -0300
+++ linux-2.6/arch/sparc/include/asm/fcntl.h	2009-09-11 16:11:50.513006260 -0300
@@ -1,14 +1,12 @@
 #ifndef _SPARC_FCNTL_H
 #define _SPARC_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_APPEND	0x0008
 #define FASYNC		0x0040	/* fcntl, for BSD compatibility */
 #define O_CREAT		0x0200	/* not fcntl */
 #define O_TRUNC		0x0400	/* not fcntl */
 #define O_EXCL		0x0800	/* not fcntl */
-#define O_SYNC		0x2000
+#define O_DSYNC		0x2000	/* used to be O_SYNC, see below */
 #define O_NONBLOCK	0x4000
 #if defined(__sparc__) && defined(__arch64__)
 #define O_NDELAY	0x0004
@@ -20,6 +18,21 @@
 #define O_DIRECT        0x100000 /* direct disk access hint */
 #define O_NOATIME	0x200000
 #define O_CLOEXEC	0x400000
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+ *
+ * Note: __O_SYNC must never be used directly.
+ */
+#define __O_SYNC	0x800000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 
 #define F_GETOWN	5	/*  for sockets. */
 #define F_SETOWN	6	/*  for sockets. */
Index: linux-2.6/fs/sync.c
===================================================================
--- linux-2.6.orig/fs/sync.c	2009-09-11 16:11:49.725278522 -0300
+++ linux-2.6/fs/sync.c	2009-09-11 16:11:50.516015792 -0300
@@ -287,10 +287,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
  */
 int generic_write_sync(struct file *file, loff_t pos, loff_t count)
 {
-	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
+	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
 		return 0;
 	return vfs_fsync_range(file, file->f_path.dentry, pos,
-			       pos + count - 1, 1);
+			       pos + count - 1,
+			       (file->f_flags & __O_SYNC) ? 1 : 0);
 }
 EXPORT_SYMBOL(generic_write_sync);
 


* Re: [PATCHv2 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-11 19:16   ` [PATCHv2 " Christoph Hellwig
@ 2009-09-14 16:54     ` Jan Kara
  2009-09-14 17:02       ` Christoph Hellwig
  2009-09-15 13:12       ` [PATCH] " Christoph Hellwig
  0 siblings, 2 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-14 16:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-kernel, linux-fsdevel, akpm, drepper, viro, kyle

  Hi,

On Fri 11-09-09 21:16:00, Christoph Hellwig wrote:
> While Linux provided an O_SYNC flag basically since day 1, it took until
> Linux 2.4.0-test12pre2 to actually get it implemented for filesystems.
> Since that day we have had generic_osync_around with only minor changes and
> the great "For now, when the user asks for O_SYNC, we'll actually give
> O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
> semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC patches,
> which are required before this patch, it's actually surprisingly simple: we
> just need to figure out when to set the datasync flag to vfs_fsync_range
> and when not.
> 
> This patch renames the existing O_SYNC flag to O_DSYNC while keeping
> its numerical value to keep binary compatibility, and adds a new real
> O_SYNC flag.  To guarantee backwards compatibility it is defined as
> expanding to both the O_DSYNC and the new additional binary flag
> (__O_SYNC), so that applications compiled against the new headers still
> get at least O_DSYNC semantics on older kernels.
> 
> This also means that all places that don't care about the differences
> can just check O_DSYNC and get the right behaviour for O_SYNC, too - only
> places that actually care need to check __O_SYNC in addition.  Drivers
> and network filesystems have been updated in a fail-safe way to always
> do the full sync magic if O_DSYNC is set.  The few places setting O_SYNC
> for lower layers are kept that way for now to stay fail-safe.
> 
> Note that parisc really fucked up their headers as they already define
> an O_DSYNC that has always been a no-op.  We try to repair it by using it
> for the new O_DSYNC and redefining O_SYNC to send both the traditional
> O_SYNC numerical value _and_ the O_DSYNC one.
  I've sent Linus a pull request without this patch (I have some comments
on it). When this patch is ready, you can merge it yourself or I can do
it if you like.

> Index: linux-2.6/fs/afs/write.c
> ===================================================================
> --- linux-2.6.orig/fs/afs/write.c	2009-09-10 21:02:06.710003950 -0300
> +++ linux-2.6/fs/afs/write.c	2009-09-11 16:11:50.439008144 -0300
> @@ -692,8 +692,9 @@ ssize_t afs_file_write(struct kiocb *ioc
>  	}
>  
>  	/* return error values for O_SYNC and IS_SYNC() */
> -	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_SYNC) {
> -		ret = afs_fsync(iocb->ki_filp, dentry, 1);
> +	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_DSYNC) {
> +		ret = afs_fsync(iocb->ki_filp, dentry,
> +				(iocb->ki_filp->f_flags & __O_SYNC) ? 0 : 1);
>  		if (ret < 0)
>  			result = ret;
>  	}
  This code can go away because generic_file_aio_write() already calls
fsync()...

> Index: linux-2.6/arch/mips/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/mips/include/asm/fcntl.h	2009-09-10 21:02:06.443262027 -0300
> +++ linux-2.6/arch/mips/include/asm/fcntl.h	2009-09-11 16:11:50.495015560 -0300
> @@ -10,7 +10,7 @@
>  
>  
>  #define O_APPEND	0x0008
> -#define O_SYNC		0x0010
> +#define O_DSYNC		000010	/* used to be O_SYNC, see below */
  The value used to be in hex, not in octal: 0x0010 is 16, while 000010 is
8 and collides with O_APPEND (0x0008). Moreover I don't see O_SYNC
defined in the header now...

> Index: linux-2.6/arch/mips/kernel/kspd.c
> ===================================================================
> --- linux-2.6.orig/arch/mips/kernel/kspd.c	2009-09-10 21:02:06.465005782 -0300
> +++ linux-2.6/arch/mips/kernel/kspd.c	2009-09-11 16:11:50.499009085 -0300
> @@ -82,6 +82,7 @@ static int sp_stopping = 0;
>  #define MTSP_O_SHLOCK		0x0010
>  #define MTSP_O_EXLOCK		0x0020
>  #define MTSP_O_ASYNC		0x0040
> +/* XXX: check which of these is actually O_SYNC vs O_DSYNC */
>  #define MTSP_O_FSYNC		O_SYNC
>  #define MTSP_O_NOFOLLOW		0x0100
>  #define MTSP_O_SYNC		0x0080
  Since no one uses MTSP_O_FSYNC and it's not exported, I guess it's your
choice ;). Looking at the code, it looks slightly incomplete - probably
open_flags_table should contain all the MTSP_O_... flags but I don't really
know.

> Index: linux-2.6/arch/parisc/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h	2009-09-10 21:02:06.618023193 -0300
> +++ linux-2.6/arch/parisc/include/asm/fcntl.h	2009-09-11 16:11:50.512006342 -0300
> @@ -1,14 +1,13 @@
>  #ifndef _PARISC_FCNTL_H
>  #define _PARISC_FCNTL_H
>  
> -/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
> -   located on an ext2 file system */
>  #define O_APPEND	000000010
>  #define O_BLKSEEK	000000100 /* HPUX only */
>  #define O_CREAT		000000400 /* not fcntl */
>  #define O_EXCL		000002000 /* not fcntl */
>  #define O_LARGEFILE	000004000
> -#define O_SYNC		000100000
> +#define __O_SYNC	000100000
> +#define O_SYNC		(__O_SYNC|O_DSYNC)
>  #define O_NONBLOCK	000200004 /* HPUX has separate NDELAY & NONBLOCK */
>  #define O_NOCTTY	000400000 /* not fcntl */
>  #define O_DSYNC		001000000 /* HPUX only */
  So for parisc, programs compiled against old headers will fail to open
files with O_SYNC because the check you've added in open() will bail out
with EINVAL. I don't like it but I'm not sure we can do better...

> Index: linux-2.6/fs/sync.c
> ===================================================================
> --- linux-2.6.orig/fs/sync.c	2009-09-11 16:11:49.725278522 -0300
> +++ linux-2.6/fs/sync.c	2009-09-11 16:11:50.516015792 -0300
> @@ -287,10 +287,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
>   */
>  int generic_write_sync(struct file *file, loff_t pos, loff_t count)
>  {
> -	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
> +	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
>  		return 0;
>  	return vfs_fsync_range(file, file->f_path.dentry, pos,
> -			       pos + count - 1, 1);
> +			       pos + count - 1,
> +			       (file->f_flags & __O_SYNC) ? 1 : 0);
  The logic is inverted here, isn't it?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHv2 18/16] implement posix O_SYNC and O_DSYNC semantics
  2009-09-14 16:54     ` Jan Kara
@ 2009-09-14 17:02       ` Christoph Hellwig
  2009-09-15 13:12       ` [PATCH] " Christoph Hellwig
  1 sibling, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-14 17:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-kernel, linux-fsdevel, akpm, drepper,
	viro, kyle

On Mon, Sep 14, 2009 at 06:54:19PM +0200, Jan Kara wrote:
>   I've sent Linus a pull request without this patch (I have some comments
> to it). When this patch is ready, you can merge it yourself or I can do
> it if you like.

Yeah, much better anyway.  I'll also post an O_RSYNC implementation later
today.

> > Index: linux-2.6/fs/afs/write.c
> > ===================================================================
> > --- linux-2.6.orig/fs/afs/write.c	2009-09-10 21:02:06.710003950 -0300
> > +++ linux-2.6/fs/afs/write.c	2009-09-11 16:11:50.439008144 -0300
> > @@ -692,8 +692,9 @@ ssize_t afs_file_write(struct kiocb *ioc
> >  	}
> >  
> >  	/* return error values for O_SYNC and IS_SYNC() */
> > -	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_SYNC) {
> > -		ret = afs_fsync(iocb->ki_filp, dentry, 1);
> > +	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_DSYNC) {
> > +		ret = afs_fsync(iocb->ki_filp, dentry,
> > +				(iocb->ki_filp->f_flags & __O_SYNC) ? 0 : 1);
> >  		if (ret < 0)
> >  			result = ret;
> >  	}
>   This code can go away because generic_file_aio_write() already calls
> fsync()...

Yes, but that should be a separate patch.

> > Index: linux-2.6/arch/mips/include/asm/fcntl.h
> > ===================================================================
> > --- linux-2.6.orig/arch/mips/include/asm/fcntl.h	2009-09-10 21:02:06.443262027 -0300
> > +++ linux-2.6/arch/mips/include/asm/fcntl.h	2009-09-11 16:11:50.495015560 -0300
> > @@ -10,7 +10,7 @@
> >  
> >  
> >  #define O_APPEND	0x0008
> > -#define O_SYNC		0x0010
> > +#define O_DSYNC		000010	/* used to be O_SYNC, see below */
>   The value used to be in hex, not in octal. Moreover I don't see O_SYNC
> defined in the header now...

Thanks, fixed up both bits.

> > Index: linux-2.6/arch/mips/kernel/kspd.c
> > ===================================================================
> > --- linux-2.6.orig/arch/mips/kernel/kspd.c	2009-09-10 21:02:06.465005782 -0300
> > +++ linux-2.6/arch/mips/kernel/kspd.c	2009-09-11 16:11:50.499009085 -0300
> > @@ -82,6 +82,7 @@ static int sp_stopping = 0;
> >  #define MTSP_O_SHLOCK		0x0010
> >  #define MTSP_O_EXLOCK		0x0020
> >  #define MTSP_O_ASYNC		0x0040
> > +/* XXX: check which of these is actually O_SYNC vs O_DSYNC */
> >  #define MTSP_O_FSYNC		O_SYNC
> >  #define MTSP_O_NOFOLLOW		0x0100
> >  #define MTSP_O_SYNC		0x0080
>   Since no one uses MTSP_O_FSYNC and it's not exported, I guess it's your
> choice ;). Looking at the code, it looks slightly incomplete - probably
> open_flags_table should contain all the MTSP_O_... flags but I don't really
> know.

Yeah, I hope someone who knows this area better is going to chime in.

>   So for parisc, programs compiled against old headers will fail to open
> files with O_SYNC because the check you've added in open() will bail out
> with EINVAL. I don't like it but I'm not sure we can do better...

Hmm.  Let me think about something for parisc.

> > @@ -287,10 +287,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
> >   */
> >  int generic_write_sync(struct file *file, loff_t pos, loff_t count)
> >  {
> > -	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
> > +	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
> >  		return 0;
> >  	return vfs_fsync_range(file, file->f_path.dentry, pos,
> > -			       pos + count - 1, 1);
> > +			       pos + count - 1,
> > +			       (file->f_flags & __O_SYNC) ? 1 : 0);
>   The logic is inverted here, isn't it?

Yeah, already corrected in my tree after I started doing the barrier
testing in qemu, which noticed it.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] implement posix O_SYNC and O_DSYNC semantics
  2009-09-14 16:54     ` Jan Kara
  2009-09-14 17:02       ` Christoph Hellwig
@ 2009-09-15 13:12       ` Christoph Hellwig
  2009-09-15 14:10         ` Jan Kara
                           ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-15 13:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-arch, akpm, drepper, viro, kyle, sct

While Linux has provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems.
Since that day we have had generic_osync_around with only minor changes and
the great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC patches,
which are required before this patch, it's actually surprisingly simple: we
just need to figure out when to pass the datasync flag to vfs_fsync_range
and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping
its numerical value to keep binary compatibility, and adds a new real
O_SYNC flag.  To guarantee backwards compatibility it is defined as
expanding to both O_DSYNC and the new additional binary flag (__O_SYNC),
so that applications compiled against the new headers still get at least
O_DSYNC semantics on older kernels.

This also means that all places that don't care about the differences
can just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actually care need to check __O_SYNC in addition.  Drivers
and network filesystems have been updated in a failsafe way to always
do the full sync magic if O_DSYNC is set.  The few places setting O_SYNC
for lower layers are kept that way for now to stay failsafe.

We enforce that O_DSYNC is set when __O_SYNC is set early in the
open path to make sure we always get these sane options.
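
As a rough illustration of the scheme described above, here is a minimal
userspace sketch.  The EX_ names and helper functions are hypothetical and
exist only for this example; the numeric values are the asm-generic ones
from this patch, and the datasync convention (1 = data only, 0 = data plus
metadata) follows the generic_write_sync() hunk.

```c
#include <assert.h>

/* asm-generic values from this patch; EX_ prefix avoids clashing with <fcntl.h>. */
#define EX_O_DSYNC   00010000                    /* the old O_SYNC value */
#define EX___O_SYNC  04000000                    /* new additional bit */
#define EX_O_SYNC    (EX___O_SYNC | EX_O_DSYNC)

/* Mirrors the open-path check: __O_SYNC alone is promoted to full O_SYNC. */
static inline int ex_normalize_open_flags(int open_flag)
{
	if (open_flag & EX___O_SYNC)
		open_flag |= EX_O_DSYNC;
	return open_flag;
}

/* Callers that do not care about the difference only test O_DSYNC. */
static inline int ex_wants_sync(int f_flags)
{
	return (f_flags & EX_O_DSYNC) != 0;
}

/* datasync argument: 0 for true O_SYNC, 1 for O_DSYNC-only files. */
static inline int ex_datasync_arg(int f_flags)
{
	return (f_flags & EX___O_SYNC) ? 0 : 1;
}
```

Under these assumptions ex_wants_sync() is true for both EX_O_DSYNC and
EX_O_SYNC, ex_datasync_arg(EX_O_SYNC) evaluates to 0 and
ex_datasync_arg(EX_O_DSYNC) evaluates to 1.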

Note that parisc really fucked up their headers as they already define
an O_DSYNC that has always been a no-op.  We try to repair it by using it
for the new O_DSYNC and redefining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.
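
For the parisc case, a small sketch of what this means for binaries built
against the old headers.  The EX_PARISC_ names and the helper are
hypothetical; the values are taken from the parisc hunk in this patch, and
the sketch assumes the open path ORs in O_DSYNC whenever __O_SYNC is set, as
in the fs/namei.c hunk.

```c
#include <assert.h>

/* parisc values from this patch; the EX_PARISC_ prefix is illustrative only. */
#define EX_PARISC___O_SYNC    000100000  /* the traditional O_SYNC value */
#define EX_PARISC_O_DSYNC     001000000  /* previously defined but a no-op */
#define EX_PARISC_O_SYNC      (EX_PARISC___O_SYNC | EX_PARISC_O_DSYNC)

/* What a binary compiled against the old parisc headers passes as O_SYNC. */
#define EX_PARISC_OLD_O_SYNC  000100000

/* Same promotion as in the open path: __O_SYNC implies O_DSYNC. */
static inline int ex_parisc_normalize(int open_flag)
{
	if (open_flag & EX_PARISC___O_SYNC)
		open_flag |= EX_PARISC_O_DSYNC;
	return open_flag;
}
```

With this promotion an old binary passing only the traditional value ends up
with the same flag bits as one compiled against the new headers.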


Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c	2009-09-15 00:46:32.911256267 -0300
+++ linux-2.6/arch/x86/mm/pat.c	2009-09-15 09:41:27.301253948 -0300
@@ -541,7 +541,7 @@ int phys_mem_access_prot_allowed(struct 
 	if (!range_is_allowed(pfn, size))
 		return 0;
 
-	if (file->f_flags & O_SYNC) {
+	if (file->f_flags & O_DSYNC) {
 		flags = _PAGE_CACHE_UC_MINUS;
 	}
 
Index: linux-2.6/drivers/char/mem.c
===================================================================
--- linux-2.6.orig/drivers/char/mem.c	2009-09-15 00:46:33.096254330 -0300
+++ linux-2.6/drivers/char/mem.c	2009-09-15 09:41:27.302253936 -0300
@@ -44,7 +44,7 @@ static inline int uncached_access(struct
 {
 #if defined(CONFIG_IA64)
 	/*
-	 * On ia64, we ignore O_SYNC because we cannot tolerate memory attribute aliases.
+	 * On ia64, we ignore O_DSYNC because we cannot tolerate memory attribute aliases.
 	 */
 	return !(efi_mem_attributes(addr) & EFI_MEMORY_WB);
 #elif defined(CONFIG_MIPS)
@@ -57,9 +57,9 @@ static inline int uncached_access(struct
 #else
 	/*
 	 * Accessing memory above the top the kernel knows about or through a file pointer
-	 * that was marked O_SYNC will be done non-cached.
+	 * that was marked O_DSYNC will be done non-cached.
 	 */
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 	return addr >= __pa(high_memory);
 #endif
Index: linux-2.6/drivers/staging/me4000/me4000.c
===================================================================
--- linux-2.6.orig/drivers/staging/me4000/me4000.c	2009-09-15 00:46:33.130254399 -0300
+++ linux-2.6/drivers/staging/me4000/me4000.c	2009-09-15 09:41:27.305253618 -0300
@@ -1985,8 +1985,8 @@ static ssize_t me4000_ao_write_cont(stru
 			spin_unlock_irqrestore(&ao_context->int_lock, flags);
 		}
 
-		/* Wait until the state machine is stopped if O_SYNC is set */
-		if (filep->f_flags & O_SYNC) {
+		/* Wait until the state machine is stopped if O_DSYNC is set */
+		if (filep->f_flags & O_DSYNC) {
 			while (inl(ao_context->status_reg) &
 			       ME4000_AO_STATUS_BIT_FSM) {
 				interruptible_sleep_on_timeout(&queue, 1);
Index: linux-2.6/drivers/usb/gadget/file_storage.c
===================================================================
--- linux-2.6.orig/drivers/usb/gadget/file_storage.c	2009-09-15 00:46:33.138253951 -0300
+++ linux-2.6/drivers/usb/gadget/file_storage.c	2009-09-15 09:41:27.311253752 -0300
@@ -1713,7 +1713,7 @@ static int do_write(struct fsg_dev *fsg)
 		}
 		if (fsg->cmnd[1] & 0x08) {	// FUA
 			spin_lock(&curlun->filp->f_lock);
-			curlun->filp->f_flags |= O_SYNC;
+			curlun->filp->f_flags |= O_DSYNC;
 			spin_unlock(&curlun->filp->f_lock);
 		}
 	}
Index: linux-2.6/fs/afs/write.c
===================================================================
--- linux-2.6.orig/fs/afs/write.c	2009-09-15 00:46:33.144254016 -0300
+++ linux-2.6/fs/afs/write.c	2009-09-15 09:41:27.316253550 -0300
@@ -692,8 +692,9 @@ ssize_t afs_file_write(struct kiocb *ioc
 	}
 
 	/* return error values for O_SYNC and IS_SYNC() */
-	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_SYNC) {
-		ret = afs_fsync(iocb->ki_filp, dentry, 1);
+	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_DSYNC) {
+		ret = afs_fsync(iocb->ki_filp, dentry,
+				(iocb->ki_filp->f_flags & __O_SYNC) ? 0 : 1);
 		if (ret < 0)
 			result = ret;
 	}
Index: linux-2.6/fs/btrfs/file.c
===================================================================
--- linux-2.6.orig/fs/btrfs/file.c	2009-09-15 00:46:33.151254279 -0300
+++ linux-2.6/fs/btrfs/file.c	2009-09-15 09:41:27.316253550 -0300
@@ -924,7 +924,7 @@ static ssize_t btrfs_file_write(struct f
 	unsigned long last_index;
 	int will_write;
 
-	will_write = ((file->f_flags & O_SYNC) || IS_SYNC(inode) ||
+	will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
 		      (file->f_flags & O_DIRECT));
 
 	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
@@ -1077,7 +1077,7 @@ out_nolock:
 		if (err)
 			num_written = err;
 
-		if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
+		if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 			trans = btrfs_start_transaction(root, 1);
 			ret = btrfs_log_dentry_safe(trans, root,
 						    file->f_dentry);
Index: linux-2.6/fs/cifs/dir.c
===================================================================
--- linux-2.6.orig/fs/cifs/dir.c	2009-09-15 00:46:33.156254147 -0300
+++ linux-2.6/fs/cifs/dir.c	2009-09-15 09:41:27.319254141 -0300
@@ -214,7 +214,8 @@ int cifs_posix_open(char *full_path, str
 		posix_flags |= SMB_O_TRUNC;
 	if (oflags & O_APPEND)
 		posix_flags |= SMB_O_APPEND;
-	if (oflags & O_SYNC)
+	/* be safe and imply O_SYNC for O_DSYNC */
+	if (oflags & O_DSYNC)
 		posix_flags |= SMB_O_SYNC;
 	if (oflags & O_DIRECTORY)
 		posix_flags |= SMB_O_DIRECTORY;
Index: linux-2.6/fs/cifs/file.c
===================================================================
--- linux-2.6.orig/fs/cifs/file.c	2009-09-15 00:46:33.162254422 -0300
+++ linux-2.6/fs/cifs/file.c	2009-09-15 09:41:27.323254719 -0300
@@ -96,8 +96,10 @@ static inline fmode_t cifs_posix_convert
 	   reopening a file.  They had their effect on the original open */
 	if (flags & O_APPEND)
 		posix_flags |= (fmode_t)O_APPEND;
-	if (flags & O_SYNC)
-		posix_flags |= (fmode_t)O_SYNC;
+	if (flags & O_DSYNC)
+		posix_flags |= (fmode_t)O_DSYNC;
+	if (flags & __O_SYNC)
+		posix_flags |= (fmode_t)__O_SYNC;
 	if (flags & O_DIRECTORY)
 		posix_flags |= (fmode_t)O_DIRECTORY;
 	if (flags & O_NOFOLLOW)
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2009-09-15 00:46:33.168253161 -0300
+++ linux-2.6/fs/namei.c	2009-09-15 09:45:26.694256679 -0300
@@ -1678,6 +1678,15 @@ struct file *do_filp_open(int dfd, const
 	int will_write;
 	int flag = open_to_namei_flags(open_flag);
 
+	/*
+	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
+	 * check for O_DSYNC if they need any syncing at all we enforce it's
+	 * always set instead of having to deal with possibly weird behaviour
+	 * for malicious applications setting only __O_SYNC.
+	 */
+	if (open_flag & __O_SYNC)
+		open_flag |= O_DSYNC;
+
 	if (!acc_mode)
 		acc_mode = MAY_OPEN | ACC_MODE(flag);
 
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c	2009-09-15 00:46:33.174254134 -0300
+++ linux-2.6/fs/nfs/file.c	2009-09-15 09:41:27.330253653 -0300
@@ -580,7 +580,7 @@ static int nfs_need_sync_write(struct fi
 {
 	struct nfs_open_context *ctx;
 
-	if (IS_SYNC(inode) || (filp->f_flags & O_SYNC))
+	if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
 		return 1;
 	ctx = nfs_file_open_context(filp);
 	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags))
@@ -621,7 +621,7 @@ static ssize_t nfs_file_write(struct kio
 
 	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, count);
 	result = generic_file_aio_write(iocb, iov, nr_segs, pos);
-	/* Return error values for O_SYNC and IS_SYNC() */
+	/* Return error values for O_DSYNC and IS_SYNC() */
 	if (result >= 0 && nfs_need_sync_write(iocb->ki_filp, inode)) {
 		int err = nfs_do_fsync(nfs_file_open_context(iocb->ki_filp), inode);
 		if (err < 0)
Index: linux-2.6/fs/nfs/write.c
===================================================================
--- linux-2.6.orig/fs/nfs/write.c	2009-09-15 00:46:33.180254200 -0300
+++ linux-2.6/fs/nfs/write.c	2009-09-15 09:41:27.332254187 -0300
@@ -774,7 +774,7 @@ int nfs_updatepage(struct file *file, st
 	 */
 	if (nfs_write_pageuptodate(page, inode) &&
 			inode->i_flock == NULL &&
-			!(file->f_flags & O_SYNC)) {
+			!(file->f_flags & O_DSYNC)) {
 		count = max(count + offset, nfs_page_length(page));
 		offset = 0;
 	}
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h	2009-09-15 00:46:33.211253817 -0300
+++ linux-2.6/include/asm-generic/fcntl.h	2009-09-15 09:41:27.335253940 -0300
@@ -3,8 +3,6 @@
 
 #include <linux/types.h>
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_ACCMODE	00000003
 #define O_RDONLY	00000000
 #define O_WRONLY	00000001
@@ -27,8 +25,8 @@
 #ifndef O_NONBLOCK
 #define O_NONBLOCK	00004000
 #endif
-#ifndef O_SYNC
-#define O_SYNC		00010000
+#ifndef O_DSYNC
+#define O_DSYNC		00010000	/* used to be O_SYNC, see below */
 #endif
 #ifndef FASYNC
 #define FASYNC		00020000	/* fcntl, for BSD compatibility */
@@ -51,6 +49,25 @@
 #ifndef O_CLOEXEC
 #define O_CLOEXEC	02000000	/* set close_on_exec */
 #endif
+
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+ *
+ * Note: __O_SYNC must never be used directly.
+ */
+#ifndef O_SYNC
+#define __O_SYNC	04000000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
+#endif
+
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
Index: linux-2.6/fs/ocfs2/file.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/file.c	2009-09-15 00:46:33.186253776 -0300
+++ linux-2.6/fs/ocfs2/file.c	2009-09-15 09:41:27.338254042 -0300
@@ -1878,7 +1878,7 @@ out_dio:
 	/* buffered aio wouldn't have proper lock coverage today */
 	BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
 
-	if ((file->f_flags & O_SYNC && !direct_io) || IS_SYNC(inode)) {
+	if ((file->f_flags & O_DSYNC && !direct_io) || IS_SYNC(inode)) {
 		ret = filemap_fdatawrite_range(file->f_mapping, pos,
 					       pos + count - 1);
 		if (ret < 0)
Index: linux-2.6/fs/ubifs/file.c
===================================================================
--- linux-2.6.orig/fs/ubifs/file.c	2009-09-15 00:46:33.192253912 -0300
+++ linux-2.6/fs/ubifs/file.c	2009-09-15 09:41:27.341254213 -0300
@@ -1403,7 +1403,7 @@ static ssize_t ubifs_aio_write(struct ki
 	if (ret < 0)
 		return ret;
 
-	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_SYNC)) {
+	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_DSYNC)) {
 		err = ubifs_sync_wbufs_by_inode(c, inode);
 		if (err)
 			return err;
Index: linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-15 00:46:33.198253488 -0300
+++ linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-15 09:41:27.344254176 -0300
@@ -811,7 +811,7 @@ write_retry:
 	XFS_STATS_ADD(xs_write_bytes, ret);
 
 	/* Handle various SYNC-type writes */
-	if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
+	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 		int error2;
 
 		xfs_iunlock(xip, iolock);
Index: linux-2.6/sound/core/rawmidi.c
===================================================================
--- linux-2.6.orig/sound/core/rawmidi.c	2009-09-15 00:46:33.219253718 -0300
+++ linux-2.6/sound/core/rawmidi.c	2009-09-15 09:41:27.347253859 -0300
@@ -1258,7 +1258,7 @@ static ssize_t snd_rawmidi_write(struct 
 			break;
 		count -= count1;
 	}
-	if (file->f_flags & O_SYNC) {
+	if (file->f_flags & O_DSYNC) {
 		spin_lock_irq(&runtime->lock);
 		while (runtime->avail != runtime->buffer_size) {
 			wait_queue_t wait;
Index: linux-2.6/arch/alpha/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/alpha/include/asm/fcntl.h	2009-09-15 00:46:32.945006724 -0300
+++ linux-2.6/arch/alpha/include/asm/fcntl.h	2009-09-15 09:41:27.348253497 -0300
@@ -1,8 +1,6 @@
 #ifndef _ALPHA_FCNTL_H
 #define _ALPHA_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_CREAT		 01000	/* not fcntl */
 #define O_TRUNC		 02000	/* not fcntl */
 #define O_EXCL		 04000	/* not fcntl */
@@ -10,13 +8,28 @@
 
 #define O_NONBLOCK	 00004
 #define O_APPEND	 00010
-#define O_SYNC		040000
+#define O_DSYNC		040000	/* used to be O_SYNC, see below */
 #define O_DIRECTORY	0100000	/* must be a directory */
 #define O_NOFOLLOW	0200000 /* don't follow links */
 #define O_LARGEFILE	0400000 /* will be set by the kernel on every open */
 #define O_DIRECT	02000000 /* direct disk access - should check with OSF/1 */
 #define O_NOATIME	04000000
 #define O_CLOEXEC	010000000 /* set close_on_exec */
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+ *
+ * Note: __O_SYNC must never be used directly.
+ */
+#define __O_SYNC	020000000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 
 #define F_GETLK		7
 #define F_SETLK		8
Index: linux-2.6/arch/blackfin/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/blackfin/include/asm/fcntl.h	2009-09-15 00:46:32.978006455 -0300
+++ linux-2.6/arch/blackfin/include/asm/fcntl.h	2009-09-15 09:41:27.351254088 -0300
@@ -1,8 +1,6 @@
 #ifndef _BFIN_FCNTL_H
 #define _BFIN_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_DIRECTORY	 040000	/* must be a directory */
 #define O_NOFOLLOW	0100000	/* don't follow links */
 #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
Index: linux-2.6/arch/mips/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/mips/include/asm/fcntl.h	2009-09-15 00:46:33.002006368 -0300
+++ linux-2.6/arch/mips/include/asm/fcntl.h	2009-09-15 09:41:27.354254050 -0300
@@ -10,7 +10,7 @@
 
 
 #define O_APPEND	0x0008
-#define O_SYNC		0x0010
+#define O_DSYNC		0x0010	/* used to be O_SYNC, see below */
 #define O_NONBLOCK	0x0080
 #define O_CREAT         0x0100	/* not fcntl */
 #define O_TRUNC		0x0200	/* not fcntl */
@@ -18,6 +18,21 @@
 #define O_NOCTTY	0x0800	/* not fcntl */
 #define FASYNC		0x1000	/* fcntl, for BSD compatibility */
 #define O_LARGEFILE	0x2000	/* allow large file opens */
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+ *
+ * Note: __O_SYNC must never be used directly.
+ */
+#define __O_SYNC	0x4000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 #define O_DIRECT	0x8000	/* direct disk access hint */
 
 #define F_GETLK		14
Index: linux-2.6/arch/mips/kernel/kspd.c
===================================================================
--- linux-2.6.orig/arch/mips/kernel/kspd.c	2009-09-15 00:46:33.021004807 -0300
+++ linux-2.6/arch/mips/kernel/kspd.c	2009-09-15 09:41:27.357254082 -0300
@@ -82,6 +82,7 @@ static int sp_stopping = 0;
 #define MTSP_O_SHLOCK		0x0010
 #define MTSP_O_EXLOCK		0x0020
 #define MTSP_O_ASYNC		0x0040
+/* XXX: check which of these is actually O_SYNC vs O_DSYNC */
 #define MTSP_O_FSYNC		O_SYNC
 #define MTSP_O_NOFOLLOW		0x0100
 #define MTSP_O_SYNC		0x0080
Index: linux-2.6/arch/mips/lemote/lm2e/mem.c
===================================================================
--- linux-2.6.orig/arch/mips/lemote/lm2e/mem.c	2009-09-15 00:46:33.054254081 -0300
+++ linux-2.6/arch/mips/lemote/lm2e/mem.c	2009-09-15 09:41:27.357254082 -0300
@@ -11,7 +11,7 @@
 /* override of arch/mips/mm/cache.c: __uncached_access */
 int __uncached_access(struct file *file, unsigned long addr)
 {
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 
 	/*
Index: linux-2.6/arch/mips/mm/cache.c
===================================================================
--- linux-2.6.orig/arch/mips/mm/cache.c	2009-09-15 00:46:33.074254183 -0300
+++ linux-2.6/arch/mips/mm/cache.c	2009-09-15 09:41:27.360254044 -0300
@@ -194,7 +194,7 @@ void __devinit cpu_cache_init(void)
 
 int __weak __uncached_access(struct file *file, unsigned long addr)
 {
-	if (file->f_flags & O_SYNC)
+	if (file->f_flags & O_DSYNC)
 		return 1;
 
 	return addr >= __pa(high_memory);
Index: linux-2.6/arch/parisc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/parisc/include/asm/fcntl.h	2009-09-15 00:46:33.082254364 -0300
+++ linux-2.6/arch/parisc/include/asm/fcntl.h	2009-09-15 09:41:27.363254007 -0300
@@ -1,14 +1,13 @@
 #ifndef _PARISC_FCNTL_H
 #define _PARISC_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_APPEND	000000010
 #define O_BLKSEEK	000000100 /* HPUX only */
 #define O_CREAT		000000400 /* not fcntl */
 #define O_EXCL		000002000 /* not fcntl */
 #define O_LARGEFILE	000004000
-#define O_SYNC		000100000
+#define __O_SYNC	000100000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 #define O_NONBLOCK	000200004 /* HPUX has separate NDELAY & NONBLOCK */
 #define O_NOCTTY	000400000 /* not fcntl */
 #define O_DSYNC		001000000 /* HPUX only */
Index: linux-2.6/arch/sparc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/fcntl.h	2009-09-15 00:46:33.090254335 -0300
+++ linux-2.6/arch/sparc/include/asm/fcntl.h	2009-09-15 09:41:27.367253956 -0300
@@ -1,14 +1,12 @@
 #ifndef _SPARC_FCNTL_H
 #define _SPARC_FCNTL_H
 
-/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
-   located on an ext2 file system */
 #define O_APPEND	0x0008
 #define FASYNC		0x0040	/* fcntl, for BSD compatibility */
 #define O_CREAT		0x0200	/* not fcntl */
 #define O_TRUNC		0x0400	/* not fcntl */
 #define O_EXCL		0x0800	/* not fcntl */
-#define O_SYNC		0x2000
+#define O_DSYNC		0x2000	/* used to be O_SYNC, see below */
 #define O_NONBLOCK	0x4000
 #if defined(__sparc__) && defined(__arch64__)
 #define O_NDELAY	0x0004
@@ -20,6 +18,21 @@
 #define O_DIRECT        0x100000 /* direct disk access hint */
 #define O_NOATIME	0x200000
 #define O_CLOEXEC	0x400000
+/*
+ * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
+ * the O_SYNC flag.  We continue to use the existing numerical value
+ * for O_DSYNC semantics now, but using the correct symbolic name for it.
+ * This new value is used to request true Posix O_SYNC semantics.  It is
+ * defined in this strange way to make sure applications compiled against
+ * new headers get at least O_DSYNC semantics on older kernels.
+ *
+ * This has the nice side-effect that we can simply test for O_DSYNC
+ * wherever we do not care if O_DSYNC or O_SYNC is used.
+ *
+ * Note: __O_SYNC must never be used directly.
+ */
+#define __O_SYNC	0x800000
+#define O_SYNC		(__O_SYNC|O_DSYNC)
 
 #define F_GETOWN	5	/*  for sockets. */
 #define F_SETOWN	6	/*  for sockets. */
Index: linux-2.6/fs/sync.c
===================================================================
--- linux-2.6.orig/fs/sync.c	2009-09-15 00:46:33.205253612 -0300
+++ linux-2.6/fs/sync.c	2009-09-15 09:41:27.370254058 -0300
@@ -287,10 +287,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
  */
 int generic_write_sync(struct file *file, loff_t pos, loff_t count)
 {
-	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
+	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
 		return 0;
 	return vfs_fsync_range(file, file->f_path.dentry, pos,
-			       pos + count - 1, 1);
+			       pos + count - 1,
+			       (file->f_flags & __O_SYNC) ? 0 : 1);
 }
 EXPORT_SYMBOL(generic_write_sync);
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] implement posix O_SYNC and O_DSYNC semantics
  2009-09-15 13:12       ` [PATCH] " Christoph Hellwig
@ 2009-09-15 14:10         ` Jan Kara
  2009-09-15 14:50         ` Ulrich Drepper
  2009-09-17 21:03         ` Kyle McMartin
  2 siblings, 0 replies; 52+ messages in thread
From: Jan Kara @ 2009-09-15 14:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-kernel, linux-arch, akpm, drepper, viro, kyle, sct

On Tue 15-09-09 15:12:52, Christoph Hellwig wrote:
> While Linux has provided an O_SYNC flag basically since day 1, it took until
> Linux 2.4.0-test12pre2 to actually get it implemented for filesystems.
> Since that day we have had generic_osync_around with only minor changes and
> the great "For now, when the user asks for O_SYNC, we'll actually give
> O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
> semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC patches,
> which are required before this patch, it's actually surprisingly simple: we
> just need to figure out when to pass the datasync flag to vfs_fsync_range
> and when not.
> 
> This patch renames the existing O_SYNC flag to O_DSYNC while keeping
> its numerical value to keep binary compatibility, and adds a new real
> O_SYNC flag.  To guarantee backwards compatibility it is defined as
> expanding to both O_DSYNC and the new additional binary flag (__O_SYNC),
> so that applications compiled against the new headers still get at least
> O_DSYNC semantics on older kernels.
> 
> This also means that all places that don't care about the differences
> can just check O_DSYNC and get the right behaviour for O_SYNC, too - only
> places that actually care need to check __O_SYNC in addition.  Drivers
> and network filesystems have been updated in a failsafe way to always
> do the full sync magic if O_DSYNC is set.  The few places setting O_SYNC
> for lower layers are kept that way for now to stay failsafe.
> 
> We enforce that O_DSYNC is set when __O_SYNC is set early in the
> open path to make sure we always get these sane options.
> 
> Note that parisc really fucked up their headers as they already define
> an O_DSYNC that has always been a no-op.  We try to repair it by using it
> for the new O_DSYNC and redefining O_SYNC to send both the traditional
> O_SYNC numerical value _and_ the O_DSYNC one.
> 
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  The patch looks fine now.
  Acked-by: Jan Kara <jack@suse.cz>

> Index: linux-2.6/arch/x86/mm/pat.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/pat.c	2009-09-15 00:46:32.911256267 -0300
> +++ linux-2.6/arch/x86/mm/pat.c	2009-09-15 09:41:27.301253948 -0300
> @@ -541,7 +541,7 @@ int phys_mem_access_prot_allowed(struct 
>  	if (!range_is_allowed(pfn, size))
>  		return 0;
>  
> -	if (file->f_flags & O_SYNC) {
> +	if (file->f_flags & O_DSYNC) {
>  		flags = _PAGE_CACHE_UC_MINUS;
>  	}
>  
> Index: linux-2.6/drivers/char/mem.c
> ===================================================================
> --- linux-2.6.orig/drivers/char/mem.c	2009-09-15 00:46:33.096254330 -0300
> +++ linux-2.6/drivers/char/mem.c	2009-09-15 09:41:27.302253936 -0300
> @@ -44,7 +44,7 @@ static inline int uncached_access(struct
>  {
>  #if defined(CONFIG_IA64)
>  	/*
> -	 * On ia64, we ignore O_SYNC because we cannot tolerate memory attribute aliases.
> +	 * On ia64, we ignore O_DSYNC because we cannot tolerate memory attribute aliases.
>  	 */
>  	return !(efi_mem_attributes(addr) & EFI_MEMORY_WB);
>  #elif defined(CONFIG_MIPS)
> @@ -57,9 +57,9 @@ static inline int uncached_access(struct
>  #else
>  	/*
>  	 * Accessing memory above the top the kernel knows about or through a file pointer
> -	 * that was marked O_SYNC will be done non-cached.
> +	 * that was marked O_DSYNC will be done non-cached.
>  	 */
> -	if (file->f_flags & O_SYNC)
> +	if (file->f_flags & O_DSYNC)
>  		return 1;
>  	return addr >= __pa(high_memory);
>  #endif
> Index: linux-2.6/drivers/staging/me4000/me4000.c
> ===================================================================
> --- linux-2.6.orig/drivers/staging/me4000/me4000.c	2009-09-15 00:46:33.130254399 -0300
> +++ linux-2.6/drivers/staging/me4000/me4000.c	2009-09-15 09:41:27.305253618 -0300
> @@ -1985,8 +1985,8 @@ static ssize_t me4000_ao_write_cont(stru
>  			spin_unlock_irqrestore(&ao_context->int_lock, flags);
>  		}
>  
> -		/* Wait until the state machine is stopped if O_SYNC is set */
> -		if (filep->f_flags & O_SYNC) {
> +		/* Wait until the state machine is stopped if O_DSYNC is set */
> +		if (filep->f_flags & O_DSYNC) {
>  			while (inl(ao_context->status_reg) &
>  			       ME4000_AO_STATUS_BIT_FSM) {
>  				interruptible_sleep_on_timeout(&queue, 1);
> Index: linux-2.6/drivers/usb/gadget/file_storage.c
> ===================================================================
> --- linux-2.6.orig/drivers/usb/gadget/file_storage.c	2009-09-15 00:46:33.138253951 -0300
> +++ linux-2.6/drivers/usb/gadget/file_storage.c	2009-09-15 09:41:27.311253752 -0300
> @@ -1713,7 +1713,7 @@ static int do_write(struct fsg_dev *fsg)
>  		}
>  		if (fsg->cmnd[1] & 0x08) {	// FUA
>  			spin_lock(&curlun->filp->f_lock);
> -			curlun->filp->f_flags |= O_SYNC;
> +			curlun->filp->f_flags |= O_DSYNC;
>  			spin_unlock(&curlun->filp->f_lock);
>  		}
>  	}
> Index: linux-2.6/fs/afs/write.c
> ===================================================================
> --- linux-2.6.orig/fs/afs/write.c	2009-09-15 00:46:33.144254016 -0300
> +++ linux-2.6/fs/afs/write.c	2009-09-15 09:41:27.316253550 -0300
> @@ -692,8 +692,9 @@ ssize_t afs_file_write(struct kiocb *ioc
>  	}
>  
>  	/* return error values for O_SYNC and IS_SYNC() */
> -	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_SYNC) {
> -		ret = afs_fsync(iocb->ki_filp, dentry, 1);
> +	if (IS_SYNC(&vnode->vfs_inode) || iocb->ki_filp->f_flags & O_DSYNC) {
> +		ret = afs_fsync(iocb->ki_filp, dentry,
> +				(iocb->ki_filp->f_flags & __O_SYNC) ? 0 : 1);
>  		if (ret < 0)
>  			result = ret;
>  	}
> Index: linux-2.6/fs/btrfs/file.c
> ===================================================================
> --- linux-2.6.orig/fs/btrfs/file.c	2009-09-15 00:46:33.151254279 -0300
> +++ linux-2.6/fs/btrfs/file.c	2009-09-15 09:41:27.316253550 -0300
> @@ -924,7 +924,7 @@ static ssize_t btrfs_file_write(struct f
>  	unsigned long last_index;
>  	int will_write;
>  
> -	will_write = ((file->f_flags & O_SYNC) || IS_SYNC(inode) ||
> +	will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
>  		      (file->f_flags & O_DIRECT));
>  
>  	nrptrs = min((count + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE,
> @@ -1077,7 +1077,7 @@ out_nolock:
>  		if (err)
>  			num_written = err;
>  
> -		if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
> +		if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
>  			trans = btrfs_start_transaction(root, 1);
>  			ret = btrfs_log_dentry_safe(trans, root,
>  						    file->f_dentry);
> Index: linux-2.6/fs/cifs/dir.c
> ===================================================================
> --- linux-2.6.orig/fs/cifs/dir.c	2009-09-15 00:46:33.156254147 -0300
> +++ linux-2.6/fs/cifs/dir.c	2009-09-15 09:41:27.319254141 -0300
> @@ -214,7 +214,8 @@ int cifs_posix_open(char *full_path, str
>  		posix_flags |= SMB_O_TRUNC;
>  	if (oflags & O_APPEND)
>  		posix_flags |= SMB_O_APPEND;
> -	if (oflags & O_SYNC)
> +	/* be safe and imply O_SYNC for O_DSYNC */
> +	if (oflags & O_DSYNC)
>  		posix_flags |= SMB_O_SYNC;
>  	if (oflags & O_DIRECTORY)
>  		posix_flags |= SMB_O_DIRECTORY;
> Index: linux-2.6/fs/cifs/file.c
> ===================================================================
> --- linux-2.6.orig/fs/cifs/file.c	2009-09-15 00:46:33.162254422 -0300
> +++ linux-2.6/fs/cifs/file.c	2009-09-15 09:41:27.323254719 -0300
> @@ -96,8 +96,10 @@ static inline fmode_t cifs_posix_convert
>  	   reopening a file.  They had their effect on the original open */
>  	if (flags & O_APPEND)
>  		posix_flags |= (fmode_t)O_APPEND;
> -	if (flags & O_SYNC)
> -		posix_flags |= (fmode_t)O_SYNC;
> +	if (flags & O_DSYNC)
> +		posix_flags |= (fmode_t)O_DSYNC;
> +	if (flags & __O_SYNC)
> +		posix_flags |= (fmode_t)__O_SYNC;
>  	if (flags & O_DIRECTORY)
>  		posix_flags |= (fmode_t)O_DIRECTORY;
>  	if (flags & O_NOFOLLOW)
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c	2009-09-15 00:46:33.168253161 -0300
> +++ linux-2.6/fs/namei.c	2009-09-15 09:45:26.694256679 -0300
> @@ -1678,6 +1678,15 @@ struct file *do_filp_open(int dfd, const
>  	int will_write;
>  	int flag = open_to_namei_flags(open_flag);
>  
> +	/*
> +	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
> +	 * check for O_DSYNC if they need any syncing at all we enforce it's
> +	 * always set instead of having to deal with possibly weird behaviour
> +	 * for malicious applications setting only __O_SYNC.
> +	 */
> +	if (open_flag & __O_SYNC)
> +		open_flag |= O_DSYNC;
> +
>  	if (!acc_mode)
>  		acc_mode = MAY_OPEN | ACC_MODE(flag);
>  
> Index: linux-2.6/fs/nfs/file.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/file.c	2009-09-15 00:46:33.174254134 -0300
> +++ linux-2.6/fs/nfs/file.c	2009-09-15 09:41:27.330253653 -0300
> @@ -580,7 +580,7 @@ static int nfs_need_sync_write(struct fi
>  {
>  	struct nfs_open_context *ctx;
>  
> -	if (IS_SYNC(inode) || (filp->f_flags & O_SYNC))
> +	if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
>  		return 1;
>  	ctx = nfs_file_open_context(filp);
>  	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags))
> @@ -621,7 +621,7 @@ static ssize_t nfs_file_write(struct kio
>  
>  	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, count);
>  	result = generic_file_aio_write(iocb, iov, nr_segs, pos);
> -	/* Return error values for O_SYNC and IS_SYNC() */
> +	/* Return error values for O_DSYNC and IS_SYNC() */
>  	if (result >= 0 && nfs_need_sync_write(iocb->ki_filp, inode)) {
>  		int err = nfs_do_fsync(nfs_file_open_context(iocb->ki_filp), inode);
>  		if (err < 0)
> Index: linux-2.6/fs/nfs/write.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/write.c	2009-09-15 00:46:33.180254200 -0300
> +++ linux-2.6/fs/nfs/write.c	2009-09-15 09:41:27.332254187 -0300
> @@ -774,7 +774,7 @@ int nfs_updatepage(struct file *file, st
>  	 */
>  	if (nfs_write_pageuptodate(page, inode) &&
>  			inode->i_flock == NULL &&
> -			!(file->f_flags & O_SYNC)) {
> +			!(file->f_flags & O_DSYNC)) {
>  		count = max(count + offset, nfs_page_length(page));
>  		offset = 0;
>  	}
> Index: linux-2.6/include/asm-generic/fcntl.h
> ===================================================================
> --- linux-2.6.orig/include/asm-generic/fcntl.h	2009-09-15 00:46:33.211253817 -0300
> +++ linux-2.6/include/asm-generic/fcntl.h	2009-09-15 09:41:27.335253940 -0300
> @@ -3,8 +3,6 @@
>  
>  #include <linux/types.h>
>  
> -/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
> -   located on an ext2 file system */
>  #define O_ACCMODE	00000003
>  #define O_RDONLY	00000000
>  #define O_WRONLY	00000001
> @@ -27,8 +25,8 @@
>  #ifndef O_NONBLOCK
>  #define O_NONBLOCK	00004000
>  #endif
> -#ifndef O_SYNC
> -#define O_SYNC		00010000
> +#ifndef O_DSYNC
> +#define O_DSYNC		00010000	/* used to be O_SYNC, see below */
>  #endif
>  #ifndef FASYNC
>  #define FASYNC		00020000	/* fcntl, for BSD compatibility */
> @@ -51,6 +49,25 @@
>  #ifndef O_CLOEXEC
>  #define O_CLOEXEC	02000000	/* set close_on_exec */
>  #endif
> +
> +/*
> + * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
> + * the O_SYNC flag.  We continue to use the existing numerical value
> + * for O_DSYNC semantics now, but using the correct symbolic name for it.
> + * This new value is used to request true Posix O_SYNC semantics.  It is
> + * defined in this strange way to make sure applications compiled against
> + * new headers get at least O_DSYNC semantics on older kernels.
> + *
> + * This has the nice side-effect that we can simply test for O_DSYNC
> + * wherever we do not care if O_DSYNC or O_SYNC is used.
> + *
> + * Note: __O_SYNC must never be used directly.
> + */
> +#ifndef O_SYNC
> +#define __O_SYNC	04000000
> +#define O_SYNC		(__O_SYNC|O_DSYNC)
> +#endif
> +
>  #ifndef O_NDELAY
>  #define O_NDELAY	O_NONBLOCK
>  #endif
> Index: linux-2.6/fs/ocfs2/file.c
> ===================================================================
> --- linux-2.6.orig/fs/ocfs2/file.c	2009-09-15 00:46:33.186253776 -0300
> +++ linux-2.6/fs/ocfs2/file.c	2009-09-15 09:41:27.338254042 -0300
> @@ -1878,7 +1878,7 @@ out_dio:
>  	/* buffered aio wouldn't have proper lock coverage today */
>  	BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
>  
> -	if ((file->f_flags & O_SYNC && !direct_io) || IS_SYNC(inode)) {
> +	if ((file->f_flags & O_DSYNC && !direct_io) || IS_SYNC(inode)) {
>  		ret = filemap_fdatawrite_range(file->f_mapping, pos,
>  					       pos + count - 1);
>  		if (ret < 0)
> Index: linux-2.6/fs/ubifs/file.c
> ===================================================================
> --- linux-2.6.orig/fs/ubifs/file.c	2009-09-15 00:46:33.192253912 -0300
> +++ linux-2.6/fs/ubifs/file.c	2009-09-15 09:41:27.341254213 -0300
> @@ -1403,7 +1403,7 @@ static ssize_t ubifs_aio_write(struct ki
>  	if (ret < 0)
>  		return ret;
>  
> -	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_SYNC)) {
> +	if (ret > 0 && (IS_SYNC(inode) || iocb->ki_filp->f_flags & O_DSYNC)) {
>  		err = ubifs_sync_wbufs_by_inode(c, inode);
>  		if (err)
>  			return err;
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-15 00:46:33.198253488 -0300
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c	2009-09-15 09:41:27.344254176 -0300
> @@ -811,7 +811,7 @@ write_retry:
>  	XFS_STATS_ADD(xs_write_bytes, ret);
>  
>  	/* Handle various SYNC-type writes */
> -	if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
> +	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
>  		int error2;
>  
>  		xfs_iunlock(xip, iolock);
> Index: linux-2.6/sound/core/rawmidi.c
> ===================================================================
> --- linux-2.6.orig/sound/core/rawmidi.c	2009-09-15 00:46:33.219253718 -0300
> +++ linux-2.6/sound/core/rawmidi.c	2009-09-15 09:41:27.347253859 -0300
> @@ -1258,7 +1258,7 @@ static ssize_t snd_rawmidi_write(struct 
>  			break;
>  		count -= count1;
>  	}
> -	if (file->f_flags & O_SYNC) {
> +	if (file->f_flags & O_DSYNC) {
>  		spin_lock_irq(&runtime->lock);
>  		while (runtime->avail != runtime->buffer_size) {
>  			wait_queue_t wait;
> Index: linux-2.6/arch/alpha/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h	2009-09-15 00:46:32.945006724 -0300
> +++ linux-2.6/arch/alpha/include/asm/fcntl.h	2009-09-15 09:41:27.348253497 -0300
> @@ -1,8 +1,6 @@
>  #ifndef _ALPHA_FCNTL_H
>  #define _ALPHA_FCNTL_H
>  
> -/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
> -   located on an ext2 file system */
>  #define O_CREAT		 01000	/* not fcntl */
>  #define O_TRUNC		 02000	/* not fcntl */
>  #define O_EXCL		 04000	/* not fcntl */
> @@ -10,13 +8,28 @@
>  
>  #define O_NONBLOCK	 00004
>  #define O_APPEND	 00010
> -#define O_SYNC		040000
> +#define O_DSYNC		040000	/* used to be O_SYNC, see below */
>  #define O_DIRECTORY	0100000	/* must be a directory */
>  #define O_NOFOLLOW	0200000 /* don't follow links */
>  #define O_LARGEFILE	0400000 /* will be set by the kernel on every open */
>  #define O_DIRECT	02000000 /* direct disk access - should check with OSF/1 */
>  #define O_NOATIME	04000000
>  #define O_CLOEXEC	010000000 /* set close_on_exec */
> +/*
> + * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
> + * the O_SYNC flag.  We continue to use the existing numerical value
> + * for O_DSYNC semantics now, but using the correct symbolic name for it.
> + * This new value is used to request true Posix O_SYNC semantics.  It is
> + * defined in this strange way to make sure applications compiled against
> + * new headers get at least O_DSYNC semantics on older kernels.
> + *
> + * This has the nice side-effect that we can simply test for O_DSYNC
> + * wherever we do not care if O_DSYNC or O_SYNC is used.
> + *
> + * Note: __O_SYNC must never be used directly.
> + */
> +#define __O_SYNC	020000000
> +#define O_SYNC		(__O_SYNC|O_DSYNC)
>  
>  #define F_GETLK		7
>  #define F_SETLK		8
> Index: linux-2.6/arch/blackfin/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/blackfin/include/asm/fcntl.h	2009-09-15 00:46:32.978006455 -0300
> +++ linux-2.6/arch/blackfin/include/asm/fcntl.h	2009-09-15 09:41:27.351254088 -0300
> @@ -1,8 +1,6 @@
>  #ifndef _BFIN_FCNTL_H
>  #define _BFIN_FCNTL_H
>  
> -/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
> -   located on an ext2 file system */
>  #define O_DIRECTORY	 040000	/* must be a directory */
>  #define O_NOFOLLOW	0100000	/* don't follow links */
>  #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
> Index: linux-2.6/arch/mips/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/mips/include/asm/fcntl.h	2009-09-15 00:46:33.002006368 -0300
> +++ linux-2.6/arch/mips/include/asm/fcntl.h	2009-09-15 09:41:27.354254050 -0300
> @@ -10,7 +10,7 @@
>  
>  
>  #define O_APPEND	0x0008
> -#define O_SYNC		0x0010
> +#define O_DSYNC		0x0010	/* used to be O_SYNC, see below */
>  #define O_NONBLOCK	0x0080
>  #define O_CREAT         0x0100	/* not fcntl */
>  #define O_TRUNC		0x0200	/* not fcntl */
> @@ -18,6 +18,21 @@
>  #define O_NOCTTY	0x0800	/* not fcntl */
>  #define FASYNC		0x1000	/* fcntl, for BSD compatibility */
>  #define O_LARGEFILE	0x2000	/* allow large file opens */
> +/*
> + * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
> + * the O_SYNC flag.  We continue to use the existing numerical value
> + * for O_DSYNC semantics now, but using the correct symbolic name for it.
> + * This new value is used to request true Posix O_SYNC semantics.  It is
> + * defined in this strange way to make sure applications compiled against
> + * new headers get at least O_DSYNC semantics on older kernels.
> + *
> + * This has the nice side-effect that we can simply test for O_DSYNC
> + * wherever we do not care if O_DSYNC or O_SYNC is used.
> + *
> + * Note: __O_SYNC must never be used directly.
> + */
> +#define __O_SYNC	0x4000
> +#define O_SYNC		(__O_SYNC|O_DSYNC)
>  #define O_DIRECT	0x8000	/* direct disk access hint */
>  
>  #define F_GETLK		14
> Index: linux-2.6/arch/mips/kernel/kspd.c
> ===================================================================
> --- linux-2.6.orig/arch/mips/kernel/kspd.c	2009-09-15 00:46:33.021004807 -0300
> +++ linux-2.6/arch/mips/kernel/kspd.c	2009-09-15 09:41:27.357254082 -0300
> @@ -82,6 +82,7 @@ static int sp_stopping = 0;
>  #define MTSP_O_SHLOCK		0x0010
>  #define MTSP_O_EXLOCK		0x0020
>  #define MTSP_O_ASYNC		0x0040
> +/* XXX: check which of these is actually O_SYNC vs O_DSYNC */
>  #define MTSP_O_FSYNC		O_SYNC
>  #define MTSP_O_NOFOLLOW		0x0100
>  #define MTSP_O_SYNC		0x0080
> Index: linux-2.6/arch/mips/lemote/lm2e/mem.c
> ===================================================================
> --- linux-2.6.orig/arch/mips/lemote/lm2e/mem.c	2009-09-15 00:46:33.054254081 -0300
> +++ linux-2.6/arch/mips/lemote/lm2e/mem.c	2009-09-15 09:41:27.357254082 -0300
> @@ -11,7 +11,7 @@
>  /* override of arch/mips/mm/cache.c: __uncached_access */
>  int __uncached_access(struct file *file, unsigned long addr)
>  {
> -	if (file->f_flags & O_SYNC)
> +	if (file->f_flags & O_DSYNC)
>  		return 1;
>  
>  	/*
> Index: linux-2.6/arch/mips/mm/cache.c
> ===================================================================
> --- linux-2.6.orig/arch/mips/mm/cache.c	2009-09-15 00:46:33.074254183 -0300
> +++ linux-2.6/arch/mips/mm/cache.c	2009-09-15 09:41:27.360254044 -0300
> @@ -194,7 +194,7 @@ void __devinit cpu_cache_init(void)
>  
>  int __weak __uncached_access(struct file *file, unsigned long addr)
>  {
> -	if (file->f_flags & O_SYNC)
> +	if (file->f_flags & O_DSYNC)
>  		return 1;
>  
>  	return addr >= __pa(high_memory);
> Index: linux-2.6/arch/parisc/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h	2009-09-15 00:46:33.082254364 -0300
> +++ linux-2.6/arch/parisc/include/asm/fcntl.h	2009-09-15 09:41:27.363254007 -0300
> @@ -1,14 +1,13 @@
>  #ifndef _PARISC_FCNTL_H
>  #define _PARISC_FCNTL_H
>  
> -/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
> -   located on an ext2 file system */
>  #define O_APPEND	000000010
>  #define O_BLKSEEK	000000100 /* HPUX only */
>  #define O_CREAT		000000400 /* not fcntl */
>  #define O_EXCL		000002000 /* not fcntl */
>  #define O_LARGEFILE	000004000
> -#define O_SYNC		000100000
> +#define __O_SYNC	000100000
> +#define O_SYNC		(__O_SYNC|O_DSYNC)
>  #define O_NONBLOCK	000200004 /* HPUX has separate NDELAY & NONBLOCK */
>  #define O_NOCTTY	000400000 /* not fcntl */
>  #define O_DSYNC		001000000 /* HPUX only */
> Index: linux-2.6/arch/sparc/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/sparc/include/asm/fcntl.h	2009-09-15 00:46:33.090254335 -0300
> +++ linux-2.6/arch/sparc/include/asm/fcntl.h	2009-09-15 09:41:27.367253956 -0300
> @@ -1,14 +1,12 @@
>  #ifndef _SPARC_FCNTL_H
>  #define _SPARC_FCNTL_H
>  
> -/* open/fcntl - O_SYNC is only implemented on blocks devices and on files
> -   located on an ext2 file system */
>  #define O_APPEND	0x0008
>  #define FASYNC		0x0040	/* fcntl, for BSD compatibility */
>  #define O_CREAT		0x0200	/* not fcntl */
>  #define O_TRUNC		0x0400	/* not fcntl */
>  #define O_EXCL		0x0800	/* not fcntl */
> -#define O_SYNC		0x2000
> +#define O_DSYNC		0x2000	/* used to be O_SYNC, see below */
>  #define O_NONBLOCK	0x4000
>  #if defined(__sparc__) && defined(__arch64__)
>  #define O_NDELAY	0x0004
> @@ -20,6 +18,21 @@
>  #define O_DIRECT        0x100000 /* direct disk access hint */
>  #define O_NOATIME	0x200000
>  #define O_CLOEXEC	0x400000
> +/*
> + * Before Linux 2.6.32 only O_DSYNC semantics were implemented, but using
> + * the O_SYNC flag.  We continue to use the existing numerical value
> + * for O_DSYNC semantics now, but using the correct symbolic name for it.
> + * This new value is used to request true Posix O_SYNC semantics.  It is
> + * defined in this strange way to make sure applications compiled against
> + * new headers get at least O_DSYNC semantics on older kernels.
> + *
> + * This has the nice side-effect that we can simply test for O_DSYNC
> + * wherever we do not care if O_DSYNC or O_SYNC is used.
> + *
> + * Note: __O_SYNC must never be used directly.
> + */
> +#define __O_SYNC	0x800000
> +#define O_SYNC		(__O_SYNC|O_DSYNC)
>  
>  #define F_GETOWN	5	/*  for sockets. */
>  #define F_SETOWN	6	/*  for sockets. */
> Index: linux-2.6/fs/sync.c
> ===================================================================
> --- linux-2.6.orig/fs/sync.c	2009-09-15 00:46:33.205253612 -0300
> +++ linux-2.6/fs/sync.c	2009-09-15 09:41:27.370254058 -0300
> @@ -287,10 +287,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
>   */
>  int generic_write_sync(struct file *file, loff_t pos, loff_t count)
>  {
> -	if (!(file->f_flags & O_SYNC) && !IS_SYNC(file->f_mapping->host))
> +	if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
>  		return 0;
>  	return vfs_fsync_range(file, file->f_path.dentry, pos,
> -			       pos + count - 1, 1);
> +			       pos + count - 1,
> +			       (file->f_flags & __O_SYNC) ? 0 : 1);
>  }
>  EXPORT_SYMBOL(generic_write_sync);
>  
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] implement posix O_SYNC and O_DSYNC semantics
  2009-09-15 13:12       ` [PATCH] " Christoph Hellwig
  2009-09-15 14:10         ` Jan Kara
@ 2009-09-15 14:50         ` Ulrich Drepper
  2009-09-17 17:16           ` Christoph Hellwig
  2009-09-17 21:03         ` Kyle McMartin
  2 siblings, 1 reply; 52+ messages in thread
From: Ulrich Drepper @ 2009-09-15 14:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-kernel, linux-arch, akpm, viro, kyle, sct

On 09/15/2009 06:12 AM, Christoph Hellwig wrote:

> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Acked-by: Trond Myklebust<Trond.Myklebust@netapp.com>

Looks OK to me:

Acked-by: Ulrich Drepper <drepper@redhat.com>

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: [PATCH] implement posix O_SYNC and O_DSYNC semantics
  2009-09-15 14:50         ` Ulrich Drepper
@ 2009-09-17 17:16           ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2009-09-17 17:16 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Christoph Hellwig, Jan Kara, linux-kernel, linux-arch, akpm,
	viro, kyle, sct

Btw, a little update on O_RSYNC:  I have a patch that should work,
but surprisingly enough it doesn't.  Seems like the O_ flags grew too
large and somewhere in the middle they get truncated.  Here's what I
have so far:

Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2009-09-15 00:06:09.737003454 -0300
+++ linux-2.6/fs/splice.c	2009-09-15 00:08:23.669254032 -0300
@@ -501,6 +501,10 @@ ssize_t generic_file_splice_read(struct 
 	if (unlikely(left < len))
 		len = left;
 
+	ret = generic_read_sync(in, *ppos, len);
+	if (ret)
+		return ret;
+
 	ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
 	if (ret > 0) {
 		*ppos += ret;
Index: linux-2.6/fs/sync.c
===================================================================
--- linux-2.6.orig/fs/sync.c	2009-09-15 00:08:23.180271144 -0300
+++ linux-2.6/fs/sync.c	2009-09-15 00:28:41.359031442 -0300
@@ -295,6 +295,33 @@ int generic_write_sync(struct file *file
 }
 EXPORT_SYMBOL(generic_write_sync);
 
+/**
+ * generic_read_sync - perform syncing before a read
+ * @file:	file from which the read happens
+ * @pos:	offset where the read starts
+ * @count:	length of the read
+ *
+ * This implements the O_RSYNC semantics:
+ *   O_RSYNC on its own just means the data is successfully transferred to
+ *   the calling process (always the case).
+ *
+ *   O_RSYNC|O_DSYNC means that if a read request hits data that is currently
+ *   in a cache and not yet on the medium, then the write to medium is
+ *   successful before the read succeeds.
+ *
+ *   O_RSYNC|O_SYNC means the same plus the integrity of file meta information
+ *   (access time etc).
+ */
+int generic_read_sync(struct file *file, loff_t pos, loff_t count)
+{
+	if (((file->f_flags & (O_RSYNC|O_DSYNC)) != (O_RSYNC|O_DSYNC)))
+		return 0;
+	return vfs_fsync_range(file, file->f_path.dentry, pos,
+			       pos + count - 1,
+			       (file->f_flags & __O_SYNC) ? 0 : 1);
+}
+EXPORT_SYMBOL(generic_read_sync);
+
 /*
  * sys_sync_file_range() permits finely controlled syncing over a segment of
  * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h	2009-09-15 00:08:23.162254189 -0300
+++ linux-2.6/include/asm-generic/fcntl.h	2009-09-15 00:08:23.672254134 -0300
@@ -68,6 +68,10 @@
 #define O_SYNC		(__O_SYNC|O_DSYNC)
 #endif
 
+#ifndef O_RSYNC
+#define O_RSYNC		010000000
+#endif
+
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2009-09-15 00:06:09.758004312 -0300
+++ linux-2.6/include/linux/fs.h	2009-09-15 00:08:23.673254191 -0300
@@ -2097,6 +2097,7 @@ extern int vfs_fsync_range(struct file *
 			   loff_t start, loff_t end, int datasync);
 extern int vfs_fsync(struct file *file, struct dentry *dentry, int datasync);
 extern int generic_write_sync(struct file *file, loff_t pos, loff_t count);
+extern int generic_read_sync(struct file *file, loff_t pos, loff_t count);
 extern void sync_supers(void);
 extern void emergency_sync(void);
 extern void emergency_remount(void);
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2009-09-15 00:06:09.764004377 -0300
+++ linux-2.6/mm/filemap.c	2009-09-15 00:08:23.676300248 -0300
@@ -1285,6 +1285,10 @@ generic_file_aio_read(struct kiocb *iocb
 	if (retval)
 		return retval;
 
+	retval = generic_read_sync(filp, pos, count);
+	if (retval)
+		return retval;
+
 	/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
 	if (filp->f_flags & O_DIRECT) {
 		loff_t size;
Index: linux-2.6/arch/alpha/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/alpha/include/asm/fcntl.h	2009-09-15 00:08:23.169254241 -0300
+++ linux-2.6/arch/alpha/include/asm/fcntl.h	2009-09-15 00:08:23.678253988 -0300
@@ -30,6 +30,7 @@
  */
 #define __O_SYNC	020000000
 #define O_SYNC		(__O_SYNC|O_DSYNC)
+#define O_RSYNC		040000000
 
 #define F_GETLK		7
 #define F_SETLK		8
Index: linux-2.6/arch/mips/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/mips/include/asm/fcntl.h	2009-09-15 00:08:23.172253854 -0300
+++ linux-2.6/arch/mips/include/asm/fcntl.h	2009-09-15 00:08:23.678253988 -0300
@@ -34,6 +34,7 @@
 #define __O_SYNC	0x4000
 #define O_SYNC		(__O_SYNC|O_DSYNC)
 #define O_DIRECT	0x8000	/* direct disk access hint */
+#define O_RSYNC		0x10000
 
 #define F_GETLK		14
 #define F_SETLK		6
Index: linux-2.6/arch/parisc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/parisc/include/asm/fcntl.h	2009-09-15 00:08:23.178298896 -0300
+++ linux-2.6/arch/parisc/include/asm/fcntl.h	2009-09-15 00:08:23.680301735 -0300
@@ -14,6 +14,7 @@
 #define O_RSYNC		002000000 /* HPUX only */
 #define O_NOATIME	004000000
 #define O_CLOEXEC	010000000 /* set close_on_exec */
+#define O_RSYNC		020000000
 
 #define O_DIRECTORY	000010000 /* must be a directory */
 #define O_NOFOLLOW	000000200 /* don't follow links */
Index: linux-2.6/arch/sparc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/sparc/include/asm/fcntl.h	2009-09-15 00:08:23.179254674 -0300
+++ linux-2.6/arch/sparc/include/asm/fcntl.h	2009-09-15 00:08:23.681254370 -0300
@@ -33,6 +33,7 @@
  */
 #define __O_SYNC	0x800000
 #define O_SYNC		(__O_SYNC|O_DSYNC)
+#define O_RSYNC		0x1000000
 
 #define F_GETOWN	5	/*  for sockets. */
 #define F_SETOWN	6	/*  for sockets. */
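
The flag test in generic_read_sync() above can be tried out in user space.
The following is a rough sketch under assumptions, not part of the patch:
the EX_-prefixed names and the enum are invented for illustration, and the
values are copied from the asm-generic hunks.  It only models the decision
the function makes, not the syncing itself.

```c
#include <assert.h>

/* Hypothetical copies of the asm-generic values from the patches. */
#define EX_O_DSYNC	00010000
#define EX_O_SYNC_BIT	04000000	/* stands in for __O_SYNC */
#define EX_O_SYNC	(EX_O_SYNC_BIT | EX_O_DSYNC)
#define EX_O_RSYNC	010000000

enum ex_read_sync {
	EX_NO_SYNC,	/* O_RSYNC alone, or not set at all */
	EX_DATA_SYNC,	/* O_RSYNC|O_DSYNC: datasync = 1 */
	EX_FULL_SYNC,	/* O_RSYNC|O_SYNC:  datasync = 0 */
};

/* Same mask comparison as the draft generic_read_sync(). */
static inline enum ex_read_sync ex_read_sync_mode(unsigned int flags)
{
	const unsigned int need = EX_O_RSYNC | EX_O_DSYNC;

	/* Both bits must be set; either one alone means no syncing. */
	if ((flags & need) != need)
		return EX_NO_SYNC;
	return (flags & EX_O_SYNC_BIT) ? EX_FULL_SYNC : EX_DATA_SYNC;
}
```

Comparing against the full mask rather than testing for a non-zero result
is what keeps a plain O_DSYNC or O_SYNC open from syncing on every read.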


* Re: [PATCH] implement posix O_SYNC and O_DSYNC semantics
  2009-09-15 13:12       ` [PATCH] " Christoph Hellwig
  2009-09-15 14:10         ` Jan Kara
  2009-09-15 14:50         ` Ulrich Drepper
@ 2009-09-17 21:03         ` Kyle McMartin
  2 siblings, 0 replies; 52+ messages in thread
From: Kyle McMartin @ 2009-09-17 21:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-kernel, linux-arch, akpm, drepper, viro, kyle, sct

On Tue, Sep 15, 2009 at 03:12:52PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 

Acked-by: Kyle McMartin <kyle@redhat.com>

Parisc bits look ok to me.


end of thread, other threads:[~2009-09-17 21:03 UTC | newest]

Thread overview: 52+ messages
2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
2009-09-02 13:59 ` [PATCH 01/16] vfs: Introduce filemap_fdatawait_range Jan Kara
2009-09-02 13:59 ` [PATCH 02/16] vfs: Export __generic_file_aio_write() and add some comments Jan Kara
2009-09-02 13:59   ` [Ocfs2-devel] " Jan Kara
2009-09-02 13:59   ` Jan Kara
2009-09-02 13:59 ` [PATCH 03/16] vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write() Jan Kara
2009-09-02 13:59   ` [Ocfs2-devel] " Jan Kara
2009-09-02 13:59   ` Jan Kara
2009-09-02 13:59   ` Jan Kara
2009-09-02 13:59 ` [PATCH 04/16] pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock Jan Kara
2009-09-02 13:59 ` [PATCH 05/16] ocfs2: " Jan Kara
2009-09-02 13:59   ` [Ocfs2-devel] " Jan Kara
2009-09-02 13:59   ` Jan Kara
2009-09-02 13:59 ` [PATCH 06/16] vfs: Rename generic_file_aio_write_nolock Jan Kara
2009-09-02 21:47   ` Christoph Hellwig
2009-09-03 10:24     ` Jan Kara
2009-09-03 15:37       ` Christoph Hellwig
2009-09-02 13:59 ` [PATCH 07/16] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode Jan Kara
2009-09-02 13:59   ` [Ocfs2-devel] " Jan Kara
2009-09-02 13:59   ` Jan Kara
2009-09-02 13:59 ` [PATCH 08/16] ext2: Update comment about generic_osync_inode Jan Kara
2009-09-02 13:59 ` [PATCH 09/16] ext3: Remove syncing logic from ext3_file_write Jan Kara
2009-09-02 13:59 ` [PATCH 10/16] ext4: Remove syncing logic from ext4_file_write Jan Kara
2009-09-02 13:59 ` [PATCH 11/16] ntfs: Use new syncing helpers and update comments Jan Kara
2009-09-02 13:59 ` [PATCH 12/16] ocfs2: Update syncing after splicing to match generic version Jan Kara
2009-09-02 13:59   ` [Ocfs2-devel] " Jan Kara
2009-09-02 13:59   ` Jan Kara
2009-09-02 13:59 ` [PATCH 13/16] xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() Jan Kara
2009-09-02 13:59   ` Jan Kara
2009-09-02 13:59 ` [PATCH 14/16] pohmelfs: Use new syncing helper Jan Kara
2009-09-02 13:59 ` [PATCH 15/16] fat: Opencode sync_page_range_nolock() Jan Kara
2009-09-02 13:59 ` [PATCH 16/16] vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() Jan Kara
2009-09-02 14:16 ` [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Christoph Hellwig
2009-09-02 22:18 ` [PATCH] fsync: wait for data writeout completion before calling ->fsync Christoph Hellwig
2009-09-02 22:37   ` Joel Becker
2009-09-03 10:47   ` Jan Kara
2009-09-03 15:39     ` Christoph Hellwig
2009-09-10 20:25 ` [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics Christoph Hellwig
2009-09-10 20:38   ` Trond Myklebust
2009-09-10 20:40     ` Christoph Hellwig
2009-09-10 20:43       ` Trond Myklebust
2009-09-10 20:44         ` Christoph Hellwig
2009-09-10 23:07   ` Andreas Dilger
2009-09-10 23:18     ` Christoph Hellwig
2009-09-11 19:16   ` [PATCHv2 " Christoph Hellwig
2009-09-14 16:54     ` Jan Kara
2009-09-14 17:02       ` Christoph Hellwig
2009-09-15 13:12       ` [PATCH] " Christoph Hellwig
2009-09-15 14:10         ` Jan Kara
2009-09-15 14:50         ` Ulrich Drepper
2009-09-17 17:16           ` Christoph Hellwig
2009-09-17 21:03         ` Kyle McMartin
