linux-kernel.vger.kernel.org archive mirror
* [RFC] VFS: File System Mount Wide O_DIRECT Support
@ 2012-09-04 10:17 Li Wang
  2012-09-04 10:57 ` Christoph Hellwig
  2012-09-04 12:27 ` Jan Kara
  0 siblings, 2 replies; 5+ messages in thread
From: Li Wang @ 2012-09-04 10:17 UTC (permalink / raw)
  To: viro, axboe; +Cc: linux-fsdevel, linux-kernel

For a file system created on a file-backed loop device, two levels of page cache are
present, which typically doubles the memory consumption.
In many cases it is beneficial to turn on O_DIRECT for file I/O on the upper
file system, bypassing the upper page cache. This not only halves the memory
consumption but also improves performance thanks to the shorter copy path.

For example, in the following iozone REREAD test, the run with O_DIRECT is about 10x
faster than the run without it on a machine with 2GB of memory running a 3.2.9 kernel,
because eliminating the redundant cache avoids page cache thrashing.

losetup /dev/loop0 dummy # dummy is a 1.1GB file on an ext4 file system
mkfs -t ext2 /dev/loop0
mount /dev/loop0 /dsk
cd /dsk
iozone -t 1 -s 1G -r 4M -i 0 -+n -w # produce a 1GB test file
iozone -t 1 -s 1G -r 4M -i 1 -w # REREAD test without O_DIRECT
echo 1 > /proc/sys/vm/drop_caches # clean up the page cache
iozone -t 1 -s 1G -r 4M -i 1 -w -I # REREAD test with O_DIRECT
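
For reference, this is roughly what an application has to do today to get the same effect on a per-file basis. It is a minimal sketch, not taken from iozone: the helper names (dio_round_up, dio_alloc, dio_read_head) and the 4096-byte alignment are assumptions chosen for illustration, and the real alignment requirement depends on the file system and the underlying device.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed alignment; the actual requirement is device and fs dependent. */
#define DIO_ALIGN 4096

/* Round len up to the next multiple of DIO_ALIGN. */
static size_t dio_round_up(size_t len)
{
	return (len + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
}

/* Allocate a buffer whose address and size are both DIO_ALIGN-aligned.
 * Returns NULL on failure; release with free(). */
static void *dio_alloc(size_t len)
{
	void *buf = NULL;

	if (posix_memalign(&buf, DIO_ALIGN, dio_round_up(len)) != 0)
		return NULL;
	return buf;
}

/* Read up to len bytes from the start of path with O_DIRECT.
 * Returns the byte count, or -1 with errno set (for example EINVAL
 * when the file system does not support direct I/O). */
static ssize_t dio_read_head(const char *path, void *buf, size_t len)
{
	ssize_t n;
	int fd, saved;

	/* Direct I/O needs an aligned buffer address and transfer size. */
	assert((uintptr_t)buf % DIO_ALIGN == 0 && len % DIO_ALIGN == 0);

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len);
	saved = errno;          /* keep read()'s errno across close() */
	close(fd);
	errno = saved;
	return n;
}
```

A caller would allocate with dio_alloc(len) and pass the rounded-up length to dio_read_head(). The point of the mount option proposed here is that unmodified applications would not need any of this.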

This feature is also expected to be useful for virtualization: the file systems inside
the guest operating system will use much less guest memory, which potentially results in less
host memory use. It may be especially useful if multiple guests run from
the same disk image file.

The idea is simple: leave it to the file system user to decide whether to enable mount-wide
O_DIRECT support with a new mount option, for example,

losetup /dev/loop0 dummy
mount /dev/loop0 -o MS_DIRECT /dsk

Below is the preliminary patch,

---
 fs/open.c          |    5 +++++
 fs/super.c         |    2 ++
 include/linux/fs.h |    1 +
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e1f2cdb..dacac30 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -958,6 +958,11 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
 			} else {
 				fsnotify_open(f);
 				fd_install(fd, f);
+				if (f->f_vfsmnt->mnt_sb &&
+				    (f->f_vfsmnt->mnt_sb->s_flags & MS_DIRECT) &&
+				    S_ISREG(f->f_dentry->d_inode->i_mode) && f->f_mapping->a_ops &&
+				    (f->f_mapping->a_ops->direct_IO || f->f_mapping->a_ops->get_xip_mem))
+					f->f_flags |= O_DIRECT;
 			}
 		}
 		putname(tmp);
diff --git a/fs/super.c b/fs/super.c
index 0902cfa..ab5c4a5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1147,6 +1147,8 @@ mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
 	WARN_ON(!sb->s_bdi);
 	WARN_ON(sb->s_bdi == &default_backing_dev_info);
 	sb->s_flags |= MS_BORN;
+	if (flags & MS_DIRECT)
+		sb->s_flags |= MS_DIRECT;
 
 	error = security_sb_kern_mount(sb, flags, secdata);
 	if (error)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa11047..127cc85 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -225,6 +225,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_DIRECT	(1<<27)
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
-- 
1.7.6.5



* Re: [RFC] VFS: File System Mount Wide O_DIRECT Support
  2012-09-04 10:17 [RFC] VFS: File System Mount Wide O_DIRECT Support Li Wang
@ 2012-09-04 10:57 ` Christoph Hellwig
  2012-09-04 14:09   ` Matthew Wilcox
  2012-09-04 12:27 ` Jan Kara
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2012-09-04 10:57 UTC (permalink / raw)
  To: Li Wang; +Cc: viro, axboe, linux-fsdevel, linux-kernel

On Tue, Sep 04, 2012 at 06:17:47PM +0800, Li Wang wrote:
> For a file system created on a file-backed loop device, two levels of page cache are
> present, which typically doubles the memory consumption.

And the right fix is to not use buffered I/O on the backing file instead
of hacks like this.



* Re: [RFC] VFS: File System Mount Wide O_DIRECT Support
  2012-09-04 10:17 [RFC] VFS: File System Mount Wide O_DIRECT Support Li Wang
  2012-09-04 10:57 ` Christoph Hellwig
@ 2012-09-04 12:27 ` Jan Kara
  2012-09-04 23:28   ` Zach Brown
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2012-09-04 12:27 UTC (permalink / raw)
  To: Li Wang; +Cc: viro, axboe, linux-fsdevel, linux-kernel

On Tue 04-09-12 18:17:47, Li Wang wrote:
> For a file system created on a file-backed loop device, two levels of page cache are
> present, which typically doubles the memory consumption.
> In many cases it is beneficial to turn on O_DIRECT for file I/O on the upper
> file system, bypassing the upper page cache. This not only halves the memory
> consumption but also improves performance thanks to the shorter copy path.
> 
> For example, in the following iozone REREAD test, the run with O_DIRECT is about 10x
> faster than the run without it on a machine with 2GB of memory running a 3.2.9 kernel,
> because eliminating the redundant cache avoids page cache thrashing.
> 
> losetup /dev/loop0 dummy # dummy is a 1.1GB file on an ext4 file system
> mkfs -t ext2 /dev/loop0
> mount /dev/loop0 /dsk
> cd /dsk
> iozone -t 1 -s 1G -r 4M -i 0 -+n -w # produce a 1GB test file
> iozone -t 1 -s 1G -r 4M -i 1 -w # REREAD test without O_DIRECT
> echo 1 > /proc/sys/vm/drop_caches # clean up the page cache
> iozone -t 1 -s 1G -r 4M -i 1 -w -I # REREAD test with O_DIRECT
> 
> This feature is also expected to be useful for virtualization: the file systems inside
> the guest operating system will use much less guest memory, which potentially results in less
> host memory use. It may be especially useful if multiple guests run from
> the same disk image file.
> 
> The idea is simple: leave it to the file system user to decide whether to enable mount-wide
> O_DIRECT support with a new mount option, for example,
  I believe a better approach to your problem is actually to enable the
loopback device driver to use direct IO. Someone was actually working on
this but I'm not sure where this ended up.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [RFC] VFS: File System Mount Wide O_DIRECT Support
  2012-09-04 10:57 ` Christoph Hellwig
@ 2012-09-04 14:09   ` Matthew Wilcox
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2012-09-04 14:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Li Wang, viro, axboe, linux-fsdevel, linux-kernel

On Tue, Sep 04, 2012 at 06:57:14AM -0400, Christoph Hellwig wrote:
> On Tue, Sep 04, 2012 at 06:17:47PM +0800, Li Wang wrote:
> > For a file system created on a file-backed loop device, two levels of page cache are
> > present, which typically doubles the memory consumption.
> 
> And the right fix is to not use buffered I/O on the backing file instead
> of hacks like this.

That was my initial reaction too, but for the case of two VMs operating on
the same device, it's better for it to be cached once in the hypervisor
than twice in the VMs.  Is that a common case worth optimising for?
Probably not ...

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: [RFC] VFS: File System Mount Wide O_DIRECT Support
  2012-09-04 12:27 ` Jan Kara
@ 2012-09-04 23:28   ` Zach Brown
  0 siblings, 0 replies; 5+ messages in thread
From: Zach Brown @ 2012-09-04 23:28 UTC (permalink / raw)
  To: Jan Kara; +Cc: Li Wang, viro, axboe, linux-fsdevel, linux-kernel

> > The idea is simple: leave it to the file system user to decide whether to enable mount-wide
> > O_DIRECT support with a new mount option, for example,

>   I believe a better approach to your problem is actually to enable the
> loopback device driver to use direct IO. Someone was actually working on
> this but I'm not sure where this ended up.

Dave's been working on getting those patches merged.  I'm also not sure
where the work currently is, but here's an older posting:

  http://lwn.net/Articles/489647/

- z

