All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ext4: fake direct I/O mode for data=journal
@ 2011-08-15  2:19 Theodore Ts'o
  2011-08-15 11:59 ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2011-08-15  2:19 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

Currently attempts to open a file with O_DIRECT in data=journal mode
causes the open to fail with -EINVAL.  This makes it very hard to test
data=journal mode.  So we will let the open succeed, but then always
fall back to O_DSYNC buffered writes.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---

With this commit applied ext4 in data=journal mode passes nearly all
of the xfstests -g auto tests:

BEGIN TEST: Ext4 4k block w/data=journal Sun Aug 14 20:10:23 EDT 2011
Ran: 001 002 005 006 007 011 013 014 015 053 069 070 074 075 076 077 079 083 088 089 100 105 112 113 117 120 123 124 125 126 127 128 129 130 131 132 133 135 141 169 184 192 193 198 204 207 208 209 210 211 212 213 214 215 221 223 224 225 226 228 236 237 239 240 243 245 246 247 248 249 256
Failures: 223
END TEST: Ext4 4k block w/data=journal Sun Aug 14 20:24:18 EDT 2011

The #223 failure is a stripe alignment failure; it's interesting that
data=journal is affecting our block allocation, but since ext4's RAID
stripe alignment happens mostly by luck, it's not a huge deal....

 fs/ext4/file.c  |   10 ++++++++++
 fs/ext4/inode.c |    1 +
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index e4095e9..f92981a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -98,6 +98,16 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 	int ret;
 
 	/*
+	 * If O_DIRECT is set and we are doing data journalling we
+	 * don't support O_DIRECT so force it off.
+	 */
+	if ((iocb->ki_filp->f_flags & O_DIRECT) &&
+	    ext4_should_journal_data(inode)) {
+		iocb->ki_filp->f_flags &= ~O_DIRECT;
+		iocb->ki_filp->f_flags |= O_DSYNC;
+	}
+
+	/*
 	 * If we have encountered a bitmap-format file, the size limit
 	 * is smaller than s_maxbytes, which is for extent-mapped files.
 	 */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7dd6981..49ebd3b9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2914,6 +2914,7 @@ static const struct address_space_operations ext4_journalled_aops = {
 	.bmap			= ext4_bmap,
 	.invalidatepage		= ext4_invalidatepage,
 	.releasepage		= ext4_releasepage,
+	.direct_IO		= ext4_direct_IO,
 	.is_partially_uptodate  = block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 };
-- 
1.7.4.1.22.gec8e1.dirty


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] ext4: fake direct I/O mode for data=journal
  2011-08-15  2:19 [PATCH] ext4: fake direct I/O mode for data=journal Theodore Ts'o
@ 2011-08-15 11:59 ` Jan Kara
  2011-08-15 18:03   ` Ted Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2011-08-15 11:59 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List

On Sun 14-08-11 22:19:14, Ted Tso wrote:
> Currently attempts to open a file with O_DIRECT in data=journal mode
> causes the open to fail with -EINVAL.  This makes it very hard to test
> data=journal mode.  So we will let the open succeed, but then always
> fall back to O_DSYNC buffered writes.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> 
> With this commit applied ext4 in data=journal mode passes nearly all
> of the xfstests -g auto tests:
> 
> BEGIN TEST: Ext4 4k block w/data=journal Sun Aug 14 20:10:23 EDT 2011
> Ran: 001 002 005 006 007 011 013 014 015 053 069 070 074 075 076 077 079 083 088 089 100 105 112 113 117 120 123 124 125 126 127 128 129 130 131 132 133 135 141 169 184 192 193 198 204 207 208 209 210 211 212 213 214 215 221 223 224 225 226 228 236 237 239 240 243 245 246 247 248 249 256
> Failures: 223
> END TEST: Ext4 4k block w/data=journal Sun Aug 14 20:24:18 EDT 2011
> 
> The #223 failure is a stripe alignment failure; it's interesting that
> data=journal is affecting our block allocation, but since ext4's RAID
> stripe alignment happens mostly by luck, it's not a huge deal....
> 
>  fs/ext4/file.c  |   10 ++++++++++
>  fs/ext4/inode.c |    1 +
>  2 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index e4095e9..f92981a 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -98,6 +98,16 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
>  	int ret;
>  
>  	/*
> +	 * If O_DIRECT is set and we are doing data journalling we
> +	 * don't support O_DIRECT so force it off.
> +	 */
> +	if ((iocb->ki_filp->f_flags & O_DIRECT) &&
> +	    ext4_should_journal_data(inode)) {
> +		iocb->ki_filp->f_flags &= ~O_DIRECT;
> +		iocb->ki_filp->f_flags |= O_DSYNC;
> +	}
> +
> +	/*
>  	 * If we have encountered a bitmap-format file, the size limit
>  	 * is smaller than s_maxbytes, which is for extent-mapped files.
>  	 */
  Why have you chosen to set O_DSYNC?
  Also what about reads? Mixing of buffered writes and direct IO reads will
not work because filemap_write_and_write() does not write data block to the
final location on disk but only to a journal.

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 7dd6981..49ebd3b9 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2914,6 +2914,7 @@ static const struct address_space_operations ext4_journalled_aops = {
>  	.bmap			= ext4_bmap,
>  	.invalidatepage		= ext4_invalidatepage,
>  	.releasepage		= ext4_releasepage,
> +	.direct_IO		= ext4_direct_IO,
>  	.is_partially_uptodate  = block_is_partially_uptodate,
>  	.error_remove_page	= generic_error_remove_page,
>  };

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ext4: fake direct I/O mode for data=journal
  2011-08-15 11:59 ` Jan Kara
@ 2011-08-15 18:03   ` Ted Ts'o
  2011-08-16 19:32     ` [PATCH -v2] " Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Ted Ts'o @ 2011-08-15 18:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List

On Mon, Aug 15, 2011 at 01:59:39PM +0200, Jan Kara wrote:
>   Why have you chosen to set O_DSYNC?

O_DSYNC was the closest thing to O_DIRECT, in that writes get forced
to disk after the write finished.  I did this fix a test failure
caused by missing direct I/O writes with a buffered read.  OTOH, there
is a performance cost in doing this since it will force a journal
commit after every write.  That's probably not acceptable.  Hmph.

>   Also what about reads? Mixing of buffered writes and direct IO reads will
> not work because filemap_write_and_write() does not write data block to the
> final location on disk but only to a journal.

Given that, we're probably better off forcing reads drop O_DIRECT as
well and drop O_DSYNC.

I'll change the patch and then give it a test.

     	    	      	       	       	 - Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH -v2] ext4: fake direct I/O mode for data=journal
  2011-08-15 18:03   ` Ted Ts'o
@ 2011-08-16 19:32     ` Theodore Ts'o
  2011-08-17 11:42       ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2011-08-16 19:32 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Theodore Ts'o

Currently attempts to open a file with O_DIRECT in data=journal mode
causes the open to fail with -EINVAL.  This makes it very hard to test
data=journal mode.  So we will let the open succeed, but then always
fall back to O_DSYNC buffered writes.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/file.c  |   28 +++++++++++++++++++++++++++-
 fs/ext4/inode.c |    1 +
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index e4095e9..566f33f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -89,6 +89,24 @@ ext4_unaligned_aio(struct inode *inode, const struct iovec *iov,
 	return 0;
 }
 
+ssize_t
+ext4_file_read(struct kiocb *iocb, const struct iovec *iov,
+	       unsigned long nr_segs, loff_t pos)
+{
+	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
+
+	/*
+	 * If O_DIRECT is set and we are doing data journalling we
+	 * don't support O_DIRECT so force it off.
+	 */
+	if ((iocb->ki_filp->f_flags & O_DIRECT) &&
+	    ext4_should_journal_data(inode))
+		iocb->ki_filp->f_flags &= ~O_DIRECT;
+
+	return generic_file_aio_read(iocb, iov, nr_segs, pos);
+}
+
+
 static ssize_t
 ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 		unsigned long nr_segs, loff_t pos)
@@ -98,6 +116,14 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
 	int ret;
 
 	/*
+	 * If O_DIRECT is set and we are doing data journalling we
+	 * don't support O_DIRECT so force it off.
+	 */
+	if ((iocb->ki_filp->f_flags & O_DIRECT) &&
+	    ext4_should_journal_data(inode))
+		iocb->ki_filp->f_flags &= ~O_DIRECT;
+
+	/*
 	 * If we have encountered a bitmap-format file, the size limit
 	 * is smaller than s_maxbytes, which is for extent-mapped files.
 	 */
@@ -277,7 +303,7 @@ const struct file_operations ext4_file_operations = {
 	.llseek		= ext4_llseek,
 	.read		= do_sync_read,
 	.write		= do_sync_write,
-	.aio_read	= generic_file_aio_read,
+	.aio_read	= ext4_file_read,
 	.aio_write	= ext4_file_write,
 	.unlocked_ioctl = ext4_ioctl,
 #ifdef CONFIG_COMPAT
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 762e803..b7088e2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2921,6 +2921,7 @@ static const struct address_space_operations ext4_journalled_aops = {
 	.bmap			= ext4_bmap,
 	.invalidatepage		= ext4_invalidatepage,
 	.releasepage		= ext4_releasepage,
+	.direct_IO		= ext4_direct_IO,
 	.is_partially_uptodate  = block_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 };
-- 
1.7.4.1.22.gec8e1.dirty


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH -v2] ext4: fake direct I/O mode for data=journal
  2011-08-16 19:32     ` [PATCH -v2] " Theodore Ts'o
@ 2011-08-17 11:42       ` Jan Kara
  2011-08-17 16:50         ` Ted Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2011-08-17 11:42 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List

On Tue 16-08-11 15:32:25, Ted Tso wrote:
> Currently attempts to open a file with O_DIRECT in data=journal mode
> causes the open to fail with -EINVAL.  This makes it very hard to test
> data=journal mode.  So we will let the open succeed, but then always
> fall back to O_DSYNC buffered writes.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
>  fs/ext4/file.c  |   28 +++++++++++++++++++++++++++-
>  fs/ext4/inode.c |    1 +
>  2 files changed, 28 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index e4095e9..566f33f 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -89,6 +89,24 @@ ext4_unaligned_aio(struct inode *inode, const struct iovec *iov,
>  	return 0;
>  }
>  
> +ssize_t
> +ext4_file_read(struct kiocb *iocb, const struct iovec *iov,
> +	       unsigned long nr_segs, loff_t pos)
> +{
> +	struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
> +
> +	/*
> +	 * If O_DIRECT is set and we are doing data journalling we
> +	 * don't support O_DIRECT so force it off.
> +	 */
> +	if ((iocb->ki_filp->f_flags & O_DIRECT) &&
> +	    ext4_should_journal_data(inode))
> +		iocb->ki_filp->f_flags &= ~O_DIRECT;
> +
> +	return generic_file_aio_read(iocb, iov, nr_segs, pos);
> +}
> +
> +
  Two empty lines here?

>  static ssize_t
>  ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
>  		unsigned long nr_segs, loff_t pos)
> @@ -98,6 +116,14 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
>  	int ret;
>  
>  	/*
> +	 * If O_DIRECT is set and we are doing data journalling we
> +	 * don't support O_DIRECT so force it off.
> +	 */
> +	if ((iocb->ki_filp->f_flags & O_DIRECT) &&
> +	    ext4_should_journal_data(inode))
> +		iocb->ki_filp->f_flags &= ~O_DIRECT;
> +
> +	/*
>  	 * If we have encountered a bitmap-format file, the size limit
>  	 * is smaller than s_maxbytes, which is for extent-mapped files.
>  	 */
> @@ -277,7 +303,7 @@ const struct file_operations ext4_file_operations = {
>  	.llseek		= ext4_llseek,
>  	.read		= do_sync_read,
>  	.write		= do_sync_write,
> -	.aio_read	= generic_file_aio_read,
> +	.aio_read	= ext4_file_read,
>  	.aio_write	= ext4_file_write,
>  	.unlocked_ioctl = ext4_ioctl,
>  #ifdef CONFIG_COMPAT
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 762e803..b7088e2 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2921,6 +2921,7 @@ static const struct address_space_operations ext4_journalled_aops = {
>  	.bmap			= ext4_bmap,
>  	.invalidatepage		= ext4_invalidatepage,
>  	.releasepage		= ext4_releasepage,
> +	.direct_IO		= ext4_direct_IO,
>  	.is_partially_uptodate  = block_is_partially_uptodate,
>  	.error_remove_page	= generic_error_remove_page,
>  };
  Strictly speaking this is racy wrt. to fcntl() setting O_DIRECT bit.
Fixing it is actually quite simple - just provide ext4_noop_direct_IO
function that will just "return 0" and generic write code will fall
back to buffered IO automatically. We don't want to use the fallback all
the time since it incurs overhead of flushing and invalidating the mapping
so what you did above will catch the common case.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH -v2] ext4: fake direct I/O mode for data=journal
  2011-08-17 11:42       ` Jan Kara
@ 2011-08-17 16:50         ` Ted Ts'o
  0 siblings, 0 replies; 6+ messages in thread
From: Ted Ts'o @ 2011-08-17 16:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List

On Wed, Aug 17, 2011 at 01:42:52PM +0200, Jan Kara wrote:
>   Strictly speaking this is racy wrt. to fcntl() setting O_DIRECT bit.
> Fixing it is actually quite simple - just provide ext4_noop_direct_IO
> function that will just "return 0" and generic write code will fall
> back to buffered IO automatically. We don't want to use the fallback all
> the time since it incurs overhead of flushing and invalidating the mapping
> so what you did above will catch the common case.

Good point!  Yes, that's a much better way to do it.  Actually
ext4_noop_direct_IO() isn't good enough, since even with data=ordered
or data=journalled, because you can set the 'j' attribute to force the
file to use data journalling.  (This is true in ext3 as well, and I'm
not sure something sane happens if you force a file to be data
journalling and then do direct IO on it when in data=ordered or
data=writeback mode.)

						- Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-08-17 16:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-15  2:19 [PATCH] ext4: fake direct I/O mode for data=journal Theodore Ts'o
2011-08-15 11:59 ` Jan Kara
2011-08-15 18:03   ` Ted Ts'o
2011-08-16 19:32     ` [PATCH -v2] " Theodore Ts'o
2011-08-17 11:42       ` Jan Kara
2011-08-17 16:50         ` Ted Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.