All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:13 ` Andrea Righi
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Righi @ 2011-04-27 18:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Chinner, Mike Frysinger, Al Viro, Arnd Bergmann,
	linux-fsdevel, linux-api, linux-kernel

Introduce a new fadvise flag to drop page cache pages of a single
filesystem.

At the moment it is possible to drop page cache pages via
/proc/sys/vm/drop_pagecache or via posix_fadvise(POSIX_FADV_DONTNEED).

The first method drops the whole page cache while the second can be used
to drop page cache pages of a single file descriptor. However, there's
not a simple way to drop all the pages of a filesystem (we could scan
all the file descriptors and use posix_fadvise(POSIX_FADV_DONTNEED), but
this solution obviously doesn't scale well).

This functionality requires root privilege to avoid potential DoS in the
system (i.e., a hard loop of posix_fadvise(POSIX_FADV_DONTNEED_FS) on
the root filesystem).

A practical example:

  # ls -lh /mnt/sda/zero /mnt/sdb/zero
  -rw-r--r-- 1 root   root   16M 2011-04-20 10:20 /mnt/sda/zero
  -rw-r--r-- 1 root   root   16M 2011-04-20 10:20 /mnt/sdb/zero

  $ grep ^Cached /proc/meminfo
  Cached:             5660 kB
  $ md5sum /mnt/sda/zero /mnt/sdb/zero
  2c7ab85a893283e98c931e9511add182  /mnt/sda/zero
  2c7ab85a893283e98c931e9511add182  /mnt/sdb/zero
  $ grep ^Cached /proc/meminfo
  Cached:            38544 kB
  $ sudo ./drop-pagecache /mnt/sda/
  $ grep ^Cached /proc/meminfo
  Cached:            22440 kB
  $ sudo ./drop-pagecache /mnt/sdb/
  $ grep ^Cached /proc/meminfo
  Cached:             5056 kB

A previous RFC about this topic can be found here:
  http://marc.info/?l=linux-kernel&m=130385374902114&w=2

ChangeLog (v1 -> v2):

 * use the same value for POSIX_FADV_DONTNEED_FS on all architectures
 * check CAP_SYS_ADMIN capability instead of checking the EUID value

Signed-off-by: Andrea Righi <andrea@betterlinux.com>
---
 fs/drop_caches.c        |    2 +-
 include/linux/fadvise.h |    1 +
 include/linux/mm.h      |    2 ++
 mm/fadvise.c            |    7 +++++++
 4 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 98b77c8..59d6caa 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -13,7 +13,7 @@
 /* A global variable is a bit ugly, but it keeps the code simple */
 int sysctl_drop_caches;
 
-static void drop_pagecache_sb(struct super_block *sb, void *unused)
+void drop_pagecache_sb(struct super_block *sb, void *unused)
 {
 	struct inode *inode, *toput_inode = NULL;
 
diff --git a/include/linux/fadvise.h b/include/linux/fadvise.h
index e8e7471..ab39117 100644
--- a/include/linux/fadvise.h
+++ b/include/linux/fadvise.h
@@ -17,5 +17,6 @@
 #define POSIX_FADV_DONTNEED	4 /* Don't need these pages.  */
 #define POSIX_FADV_NOREUSE	5 /* Data will be accessed once.  */
 #endif
+#define POSIX_FADV_DONTNEED_FS	8 /* Don't need these filesystem pages.  */
 
 #endif	/* FADVISE_H_INCLUDED */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692dbae..004cdbc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -21,6 +21,7 @@ struct anon_vma;
 struct file_ra_state;
 struct user_struct;
 struct writeback_control;
+struct super_block;
 
 #ifndef CONFIG_DISCONTIGMEM          /* Don't use mapnrs, do it properly */
 extern unsigned long max_mapnr;
@@ -1602,6 +1603,7 @@ int in_gate_area_no_mm(unsigned long addr);
 #define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_mm(addr);})
 #endif	/* __HAVE_ARCH_GATE_AREA */
 
+void drop_pagecache_sb(struct super_block *sb, void *unused);
 int drop_caches_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 8d723c9..15155e7 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -57,6 +57,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 		case POSIX_FADV_WILLNEED:
 		case POSIX_FADV_NOREUSE:
 		case POSIX_FADV_DONTNEED:
+		case POSIX_FADV_DONTNEED_FS:
 			/* no bad return value, but ignore advice */
 			break;
 		default:
@@ -127,6 +128,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 			invalidate_mapping_pages(mapping, start_index,
 						end_index);
 		break;
+	case POSIX_FADV_DONTNEED_FS:
+		if (capable(CAP_SYS_ADMIN))
+			drop_pagecache_sb(file->f_dentry->d_sb, NULL);
+		else
+			ret = -EPERM;
+		break;
 	default:
 		ret = -EINVAL;
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:13 ` Andrea Righi
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Righi @ 2011-04-27 18:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Chinner, Mike Frysinger, Al Viro, Arnd Bergmann,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Introduce a new fadvise flag to drop page cache pages of a single
filesystem.

At the moment it is possible to drop page cache pages via
/proc/sys/vm/drop_pagecache or via posix_fadvise(POSIX_FADV_DONTNEED).

The first method drops the whole page cache while the second can be used
to drop page cache pages of a single file descriptor. However, there's
not a simple way to drop all the pages of a filesystem (we could scan
all the file descriptors and use posix_fadvise(POSIX_FADV_DONTNEED), but
this solution obviously doesn't scale well).

This functionality requires root privilege to avoid potential DoS in the
system (i.e., a hard loop of posix_fadvise(POSIX_FADV_DONTNEED_FS) on
the root filesystem).

A practical example:

  # ls -lh /mnt/sda/zero /mnt/sdb/zero
  -rw-r--r-- 1 root   root   16M 2011-04-20 10:20 /mnt/sda/zero
  -rw-r--r-- 1 root   root   16M 2011-04-20 10:20 /mnt/sdb/zero

  $ grep ^Cached /proc/meminfo
  Cached:             5660 kB
  $ md5sum /mnt/sda/zero /mnt/sdb/zero
  2c7ab85a893283e98c931e9511add182  /mnt/sda/zero
  2c7ab85a893283e98c931e9511add182  /mnt/sdb/zero
  $ grep ^Cached /proc/meminfo
  Cached:            38544 kB
  $ sudo ./drop-pagecache /mnt/sda/
  $ grep ^Cached /proc/meminfo
  Cached:            22440 kB
  $ sudo ./drop-pagecache /mnt/sdb/
  $ grep ^Cached /proc/meminfo
  Cached:             5056 kB

A previous RFC about this topic can be found here:
  http://marc.info/?l=linux-kernel&m=130385374902114&w=2

ChangeLog (v1 -> v2):

 * use the same value for POSIX_FADV_DONTNEED_FS on all architectures
 * check CAP_SYS_ADMIN capability instead of checking the EUID value

Signed-off-by: Andrea Righi <andrea-oIIqvOZpAevzfdHfmsDf5w@public.gmane.org>
---
 fs/drop_caches.c        |    2 +-
 include/linux/fadvise.h |    1 +
 include/linux/mm.h      |    2 ++
 mm/fadvise.c            |    7 +++++++
 4 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 98b77c8..59d6caa 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -13,7 +13,7 @@
 /* A global variable is a bit ugly, but it keeps the code simple */
 int sysctl_drop_caches;
 
-static void drop_pagecache_sb(struct super_block *sb, void *unused)
+void drop_pagecache_sb(struct super_block *sb, void *unused)
 {
 	struct inode *inode, *toput_inode = NULL;
 
diff --git a/include/linux/fadvise.h b/include/linux/fadvise.h
index e8e7471..ab39117 100644
--- a/include/linux/fadvise.h
+++ b/include/linux/fadvise.h
@@ -17,5 +17,6 @@
 #define POSIX_FADV_DONTNEED	4 /* Don't need these pages.  */
 #define POSIX_FADV_NOREUSE	5 /* Data will be accessed once.  */
 #endif
+#define POSIX_FADV_DONTNEED_FS	8 /* Don't need these filesystem pages.  */
 
 #endif	/* FADVISE_H_INCLUDED */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692dbae..004cdbc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -21,6 +21,7 @@ struct anon_vma;
 struct file_ra_state;
 struct user_struct;
 struct writeback_control;
+struct super_block;
 
 #ifndef CONFIG_DISCONTIGMEM          /* Don't use mapnrs, do it properly */
 extern unsigned long max_mapnr;
@@ -1602,6 +1603,7 @@ int in_gate_area_no_mm(unsigned long addr);
 #define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_mm(addr);})
 #endif	/* __HAVE_ARCH_GATE_AREA */
 
+void drop_pagecache_sb(struct super_block *sb, void *unused);
 int drop_caches_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 8d723c9..15155e7 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -57,6 +57,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 		case POSIX_FADV_WILLNEED:
 		case POSIX_FADV_NOREUSE:
 		case POSIX_FADV_DONTNEED:
+		case POSIX_FADV_DONTNEED_FS:
 			/* no bad return value, but ignore advice */
 			break;
 		default:
@@ -127,6 +128,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 			invalidate_mapping_pages(mapping, start_index,
 						end_index);
 		break;
+	case POSIX_FADV_DONTNEED_FS:
+		if (capable(CAP_SYS_ADMIN))
+			drop_pagecache_sb(file->f_dentry->d_sb, NULL);
+		else
+			ret = -EPERM;
+		break;
 	default:
 		ret = -EINVAL;
 	}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:25   ` Mike Frysinger
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Frysinger @ 2011-04-27 18:25 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Dave Chinner, Al Viro, Arnd Bergmann,
	linux-fsdevel, linux-api, linux-kernel

On Wed, Apr 27, 2011 at 14:13, Andrea Righi wrote:
> Introduce a new fadvise flag to drop page cache pages of a single
> filesystem.
>
> At the moment it is possible to drop page cache pages via
> /proc/sys/vm/drop_pagecache or via posix_fadvise(POSIX_FADV_DONTNEED).
>
> The first method drops the whole page cache while the second can be used
> to drop page cache pages of a single file descriptor. However, there's
> not a simple way to drop all the pages of a filesystem (we could scan
> all the file descriptors and use posix_fadvise(POSIX_FADV_DONTNEED), but
> this solution obviously doesn't scale well).

what if you open the mount point and use POSIX_FADV_DONTNEED on that
dir handle ?  if you required write access for that level, it'd also
implicitly take care of the permission issue.  but maybe this is just
trying to fit existing code in the wrong way.
-mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:25   ` Mike Frysinger
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Frysinger @ 2011-04-27 18:25 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Dave Chinner, Al Viro, Arnd Bergmann,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 27, 2011 at 14:13, Andrea Righi wrote:
> Introduce a new fadvise flag to drop page cache pages of a single
> filesystem.
>
> At the moment it is possible to drop page cache pages via
> /proc/sys/vm/drop_pagecache or via posix_fadvise(POSIX_FADV_DONTNEED).
>
> The first method drops the whole page cache while the second can be used
> to drop page cache pages of a single file descriptor. However, there's
> not a simple way to drop all the pages of a filesystem (we could scan
> all the file descriptors and use posix_fadvise(POSIX_FADV_DONTNEED), but
> this solution obviously doesn't scale well).

what if you open the mount point and use POSIX_FADV_DONTNEED on that
dir handle ?  if you required write access for that level, it'd also
implicitly take care of the permission issue.  but maybe this is just
trying to fit existing code in the wrong way.
-mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:33   ` Matthew Wilcox
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2011-04-27 18:33 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Dave Chinner, Mike Frysinger, Al Viro,
	Arnd Bergmann, linux-fsdevel, linux-api, linux-kernel

On Wed, Apr 27, 2011 at 08:13:47PM +0200, Andrea Righi wrote:
> @@ -127,6 +128,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>  			invalidate_mapping_pages(mapping, start_index,
>  						end_index);
>  		break;
> +	case POSIX_FADV_DONTNEED_FS:
> +		if (capable(CAP_SYS_ADMIN))
> +			drop_pagecache_sb(file->f_dentry->d_sb, NULL);
> +		else
> +			ret = -EPERM;
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}

Mmm ... what if I open /dev/sdxyz and call fadvise() on it?  I think
you end up flushing /dev's page cache entries, instead of the filesystem
which is on /dev/sdxyz.

If I understand correctly, you want mapping->host->i_sb instead of
file->f_dentry->d_sb.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:33   ` Matthew Wilcox
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2011-04-27 18:33 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Dave Chinner, Mike Frysinger, Al Viro,
	Arnd Bergmann, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 27, 2011 at 08:13:47PM +0200, Andrea Righi wrote:
> @@ -127,6 +128,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>  			invalidate_mapping_pages(mapping, start_index,
>  						end_index);
>  		break;
> +	case POSIX_FADV_DONTNEED_FS:
> +		if (capable(CAP_SYS_ADMIN))
> +			drop_pagecache_sb(file->f_dentry->d_sb, NULL);
> +		else
> +			ret = -EPERM;
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}

Mmm ... what if I open /dev/sdxyz and call fadvise() on it?  I think
you end up flushing /dev's page cache entries, instead of the filesystem
which is on /dev/sdxyz.

If I understand correctly, you want mapping->host->i_sb instead of
file->f_dentry->d_sb.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:39     ` Mike Frysinger
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Frysinger @ 2011-04-27 18:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrea Righi, Andrew Morton, Dave Chinner, Al Viro,
	Arnd Bergmann, linux-fsdevel, linux-api, linux-kernel

On Wed, Apr 27, 2011 at 14:33, Matthew Wilcox wrote:
> On Wed, Apr 27, 2011 at 08:13:47PM +0200, Andrea Righi wrote:
>> @@ -127,6 +128,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>>                       invalidate_mapping_pages(mapping, start_index,
>>                                               end_index);
>>               break;
>> +     case POSIX_FADV_DONTNEED_FS:
>> +             if (capable(CAP_SYS_ADMIN))
>> +                     drop_pagecache_sb(file->f_dentry->d_sb, NULL);
>> +             else
>> +                     ret = -EPERM;
>> +             break;
>>       default:
>>               ret = -EINVAL;
>>       }
>
> Mmm ... what if I open /dev/sdxyz and call fadvise() on it?  I think
> you end up flushing /dev's page cache entries, instead of the filesystem
> which is on /dev/sdxyz.

i was thinking of that, but was trying to come up with situations
where there might not have a node to work on.  fs's in a file go
through loop devs, dm/lvm have ones created, and flash fs's still have
a mtd block.  how about network based fs's ?  how you going to signal
dropping of pages for nfs or cifs or fuse ones ?
-mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:39     ` Mike Frysinger
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Frysinger @ 2011-04-27 18:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrea Righi, Andrew Morton, Dave Chinner, Al Viro,
	Arnd Bergmann, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 27, 2011 at 14:33, Matthew Wilcox wrote:
> On Wed, Apr 27, 2011 at 08:13:47PM +0200, Andrea Righi wrote:
>> @@ -127,6 +128,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>>                       invalidate_mapping_pages(mapping, start_index,
>>                                               end_index);
>>               break;
>> +     case POSIX_FADV_DONTNEED_FS:
>> +             if (capable(CAP_SYS_ADMIN))
>> +                     drop_pagecache_sb(file->f_dentry->d_sb, NULL);
>> +             else
>> +                     ret = -EPERM;
>> +             break;
>>       default:
>>               ret = -EINVAL;
>>       }
>
> Mmm ... what if I open /dev/sdxyz and call fadvise() on it?  I think
> you end up flushing /dev's page cache entries, instead of the filesystem
> which is on /dev/sdxyz.

i was thinking of that, but was trying to come up with situations
where there might not have a node to work on.  fs's in a file go
through loop devs, dm/lvm have ones created, and flash fs's still have
a mtd block.  how about network based fs's ?  how you going to signal
dropping of pages for nfs or cifs or fuse ones ?
-mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
  2011-04-27 18:39     ` Mike Frysinger
  (?)
@ 2011-04-27 18:47     ` Matthew Wilcox
  2011-04-27 18:49         ` Mike Frysinger
  -1 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2011-04-27 18:47 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Andrea Righi, Andrew Morton, Dave Chinner, Al Viro,
	Arnd Bergmann, linux-fsdevel, linux-api, linux-kernel

On Wed, Apr 27, 2011 at 02:39:53PM -0400, Mike Frysinger wrote:
> > Mmm ... what if I open /dev/sdxyz and call fadvise() on it? ??I think
> > you end up flushing /dev's page cache entries, instead of the filesystem
> > which is on /dev/sdxyz.
> 
> i was thinking of that, but was trying to come up with situations
> where there might not have a node to work on.  fs's in a file go
> through loop devs, dm/lvm have ones created, and flash fs's still have
> a mtd block.  how about network based fs's ?  how you going to signal
> dropping of pages for nfs or cifs or fuse ones ?

For a regular file, mapping->host->i_sb points to the superblock this
file is on.  For a device, mapping->host->i_sb points to the superblock
corresponding to this device.  So it's always what we want.

(hm, what about block devices not currently mounted?  do we need to check
whether mapping->host is NULL?)

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:49         ` Mike Frysinger
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Frysinger @ 2011-04-27 18:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrea Righi, Andrew Morton, Dave Chinner, Al Viro,
	Arnd Bergmann, linux-fsdevel, linux-api, linux-kernel

On Wed, Apr 27, 2011 at 14:47, Matthew Wilcox wrote:
> On Wed, Apr 27, 2011 at 02:39:53PM -0400, Mike Frysinger wrote:
>> > Mmm ... what if I open /dev/sdxyz and call fadvise() on it? ??I think
>> > you end up flushing /dev's page cache entries, instead of the filesystem
>> > which is on /dev/sdxyz.
>>
>> i was thinking of that, but was trying to come up with situations
>> where there might not have a node to work on.  fs's in a file go
>> through loop devs, dm/lvm have ones created, and flash fs's still have
>> a mtd block.  how about network based fs's ?  how you going to signal
>> dropping of pages for nfs or cifs or fuse ones ?
>
> For a regular file, mapping->host->i_sb points to the superblock this
> file is on.  For a device, mapping->host->i_sb points to the superblock
> corresponding to this device.  So it's always what we want.

sorry, wrong question.  i misread your original post (suggesting we
should be calling fadvise on the block instead of an arbitrary dir
handle).
-mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-04-27 18:49         ` Mike Frysinger
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Frysinger @ 2011-04-27 18:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrea Righi, Andrew Morton, Dave Chinner, Al Viro,
	Arnd Bergmann, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 27, 2011 at 14:47, Matthew Wilcox wrote:
> On Wed, Apr 27, 2011 at 02:39:53PM -0400, Mike Frysinger wrote:
>> > Mmm ... what if I open /dev/sdxyz and call fadvise() on it? ??I think
>> > you end up flushing /dev's page cache entries, instead of the filesystem
>> > which is on /dev/sdxyz.
>>
>> i was thinking of that, but was trying to come up with situations
>> where there might not have a node to work on.  fs's in a file go
>> through loop devs, dm/lvm have ones created, and flash fs's still have
>> a mtd block.  how about network based fs's ?  how you going to signal
>> dropping of pages for nfs or cifs or fuse ones ?
>
> For a regular file, mapping->host->i_sb points to the superblock this
> file is on.  For a device, mapping->host->i_sb points to the superblock
> corresponding to this device.  So it's always what we want.

sorry, wrong question.  i misread your original post (suggesting we
should be calling fadvise on the block instead of an arbitrary dir
handle).
-mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
  2011-04-27 18:33   ` Matthew Wilcox
  (?)
  (?)
@ 2011-04-28  9:29   ` Andrea Righi
  -1 siblings, 0 replies; 17+ messages in thread
From: Andrea Righi @ 2011-04-28  9:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Dave Chinner, Mike Frysinger, Al Viro,
	Arnd Bergmann, linux-fsdevel, linux-api, linux-kernel

On Wed, Apr 27, 2011 at 12:33:08PM -0600, Matthew Wilcox wrote:
> On Wed, Apr 27, 2011 at 08:13:47PM +0200, Andrea Righi wrote:
> > @@ -127,6 +128,12 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >  			invalidate_mapping_pages(mapping, start_index,
> >  						end_index);
> >  		break;
> > +	case POSIX_FADV_DONTNEED_FS:
> > +		if (capable(CAP_SYS_ADMIN))
> > +			drop_pagecache_sb(file->f_dentry->d_sb, NULL);
> > +		else
> > +			ret = -EPERM;
> > +		break;
> >  	default:
> >  		ret = -EINVAL;
> >  	}
> 
> Mmm ... what if I open /dev/sdxyz and call fadvise() on it?  I think
> you end up flushing /dev's page cache entries, instead of the filesystem
> which is on /dev/sdxyz.
> 
> If I understand correctly, you want mapping->host->i_sb instead of
> file->f_dentry->d_sb.

I did some tests, but I don't get the expected behaviour. In all cases
both if I use mapping->host->i_sb or file->f_dentry->d_sb when I call
fadvise() on any block device all the blockdev pages are dropped
("Buffers" from /proc/meminfo), but page cache pages are not touched:

# df -hT /
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda1     ext4     29G   20G  7.4G  73% /
# grep "^Cached\|Buffers" /proc/meminfo
Buffers:           79772 kB
Cached:            32440 kB
# sudo drop-pagecache /dev/sda1
# grep "^Cached\|Buffers" /proc/meminfo
Buffers:             228 kB
Cached:            32440 kB
# sudo drop-pagecache /
# grep "^Cached\|Buffers" /proc/meminfo
Buffers:             228 kB
Cached:             4884 kB

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
  2011-04-27 18:25   ` Mike Frysinger
  (?)
@ 2011-04-28  9:35   ` Andrea Righi
  -1 siblings, 0 replies; 17+ messages in thread
From: Andrea Righi @ 2011-04-28  9:35 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Andrew Morton, Dave Chinner, Al Viro, Arnd Bergmann,
	linux-fsdevel, linux-api, linux-kernel

On Wed, Apr 27, 2011 at 02:25:17PM -0400, Mike Frysinger wrote:
> On Wed, Apr 27, 2011 at 14:13, Andrea Righi wrote:
> > Introduce a new fadvise flag to drop page cache pages of a single
> > filesystem.
> >
> > At the moment it is possible to drop page cache pages via
> > /proc/sys/vm/drop_pagecache or via posix_fadvise(POSIX_FADV_DONTNEED).
> >
> > The first method drops the whole page cache while the second can be used
> > to drop page cache pages of a single file descriptor. However, there's
> > not a simple way to drop all the pages of a filesystem (we could scan
> > all the file descriptors and use posix_fadvise(POSIX_FADV_DONTNEED), but
> > this solution obviously doesn't scale well).
> 
> what if you open the mount point and use POSIX_FADV_DONTNEED on that
> dir handle ?  if you required write access for that level, it'd also
> implicitly take care of the permission issue.  but maybe this is just
> trying to fit existing code in the wrong way.
> -mike

I still prefer the capability check. I think it's much more simple from
the userspace point of view to be able to specify any file or directory
instead of being forced to retrieve the mountpoint.

However, an advantage with the approach you're proposing is that a
non-privileged user can drop the page cache of a filesystem if it has
write permission in the root of that filesystem.

mmmh.. I don't see big problems also with the interface you propose, if
you all think it's better I can implement this in the next version.

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-05-04 21:44   ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2011-05-04 21:44 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Dave Chinner, Mike Frysinger, Al Viro, Arnd Bergmann,
	linux-fsdevel, linux-api, linux-kernel

On Wed, 27 Apr 2011 20:13:47 +0200
Andrea Righi <andrea@betterlinux.com> wrote:

> Introduce a new fadvise flag to drop page cache pages of a single
> filesystem.

I'm going to object to this on general principle.  We shouldn't toss
new features into the kernel API just because we can.  Each feature
needs a really good argument to justify its inclusion, and I don't
believe that the case has been made for this feature.

IOW, who's going to die if we don't merge it?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-05-04 21:44   ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2011-05-04 21:44 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Dave Chinner, Mike Frysinger, Al Viro, Arnd Bergmann,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, 27 Apr 2011 20:13:47 +0200
Andrea Righi <andrea-oIIqvOZpAevzfdHfmsDf5w@public.gmane.org> wrote:

> Introduce a new fadvise flag to drop page cache pages of a single
> filesystem.

I'm going to object to this on general principle.  We shouldn't toss
new features into the kernel API just because we can.  Each feature
needs a really good argument to justify its inclusion, and I don't
believe that the case has been made for this feature.

IOW, who's going to die if we don't merge it?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-05-04 22:09     ` Andrea Righi
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Righi @ 2011-05-04 22:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Chinner, Mike Frysinger, Al Viro, Arnd Bergmann,
	linux-fsdevel, linux-api, linux-kernel

On Wed, May 04, 2011 at 02:44:11PM -0700, Andrew Morton wrote:
> On Wed, 27 Apr 2011 20:13:47 +0200
> Andrea Righi <andrea@betterlinux.com> wrote:
> 
> > Introduce a new fadvise flag to drop page cache pages of a single
> > filesystem.
> 
> I'm going to object to this on general principle.  We shouldn't toss
> new features into the kernel API just because we can.  Each feature
> needs a really good argument to justify its inclusion, and I don't
> believe that the case has been made for this feature.
> 
> IOW, who's going to die if we don't merge it?

OK. :)

-Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS
@ 2011-05-04 22:09     ` Andrea Righi
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Righi @ 2011-05-04 22:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Chinner, Mike Frysinger, Al Viro, Arnd Bergmann,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, May 04, 2011 at 02:44:11PM -0700, Andrew Morton wrote:
> On Wed, 27 Apr 2011 20:13:47 +0200
> Andrea Righi <andrea-oIIqvOZpAevzfdHfmsDf5w@public.gmane.org> wrote:
> 
> > Introduce a new fadvise flag to drop page cache pages of a single
> > filesystem.
> 
> I'm going to object to this on general principle.  We shouldn't toss
> new features into the kernel API just because we can.  Each feature
> needs a really good argument to justify its inclusion, and I don't
> believe that the case has been made for this feature.
> 
> IOW, who's going to die if we don't merge it?

OK. :)

-Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-05-04 22:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-27 18:13 [PATCH v2] fadvise: introduce POSIX_FADV_DONTNEED_FS Andrea Righi
2011-04-27 18:13 ` Andrea Righi
2011-04-27 18:25 ` Mike Frysinger
2011-04-27 18:25   ` Mike Frysinger
2011-04-28  9:35   ` Andrea Righi
2011-04-27 18:33 ` Matthew Wilcox
2011-04-27 18:33   ` Matthew Wilcox
2011-04-27 18:39   ` Mike Frysinger
2011-04-27 18:39     ` Mike Frysinger
2011-04-27 18:47     ` Matthew Wilcox
2011-04-27 18:49       ` Mike Frysinger
2011-04-27 18:49         ` Mike Frysinger
2011-04-28  9:29   ` Andrea Righi
2011-05-04 21:44 ` Andrew Morton
2011-05-04 21:44   ` Andrew Morton
2011-05-04 22:09   ` Andrea Righi
2011-05-04 22:09     ` Andrea Righi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.