All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/2] btrfs/vfs: Return same device in stat(2) and /proc/pid/maps
@ 2011-05-13 23:18 Mark Fasheh
  2011-05-13 23:18 ` [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device Mark Fasheh
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Fasheh @ 2011-05-13 23:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason, linux-fsdevel

I originally posted about this issue some weeks ago but unfortunately didn't
get a response:

http://comments.gmane.org/gmane.comp.file-systems.btrfs/9682


To recap:

btrfs_getattr() is filling stat->dev with an anonymous device
(for per-snapshot root?):

stat->dev = BTRFS_I(inode)->root->anon_super.s_dev;

but /proc/pid/maps uses the "real" block device:

dev = inode->i_sb->s_dev;


This results in some unfortunate behavior for lsof as it reports some
duplicate paths (except on different block devices). The easiest way to see
this (if your root partition is btrfs):

$ lsof | grep lsof
<snip>
lsof       9238            root  txt       REG               0,19   139736 14478 /usr/bin/lsof
lsof       9238            root  mem       REG               0,17   14478 /usr/bin/lsof (path dev=0,19)


As was noted in my original mail, this winds up breaking userspace software
(zypper in my case).

The following patch series fixes this issue by allowing a file
system to supply a custom method for the device reported in /proc/pid/maps.
It fixes the issue for me, but I'm not 100% sold on the approach taken -
it's an admittedly brute-force solution hence the RFC tag. Any comments are
appreciated and welcome.

Thanks,
	--Mark


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
  2011-05-13 23:18 [RFC][PATCH 0/2] btrfs/vfs: Return same device in stat(2) and /proc/pid/maps Mark Fasheh
@ 2011-05-13 23:18 ` Mark Fasheh
  2011-05-15  3:06   ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Fasheh @ 2011-05-13 23:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason, linux-fsdevel, Mark Fasheh

This patch introduces a callback in the super_operations structure,
'get_maps_dev' which is then used by procfs to query which device to return
for reporting in /proc/[PID]/maps.

btrfs wants this so that it can return the same device as it uses from
stat(2) calls.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
---
 fs/proc/task_mmu.c   |    2 +-
 fs/proc/task_nommu.c |    2 +-
 include/linux/fs.h   |    8 ++++++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2e7addf..c0acc19 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -220,7 +220,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 
 	if (file) {
 		struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
-		dev = inode->i_sb->s_dev;
+		dev = get_maps_dev(inode);
 		ino = inode->i_ino;
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 980de54..6769efe 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -148,7 +148,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
 
 	if (file) {
 		struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
-		dev = inode->i_sb->s_dev;
+		dev = get_maps_dev(inode);
 		ino = inode->i_ino;
 		pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dbd860a..ad683ad 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1637,8 +1637,16 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+	dev_t (*get_maps_dev)(struct inode *);
 };
 
+static inline dev_t get_maps_dev(struct inode *inode)
+{
+	if (inode->i_sb->s_op->get_maps_dev)
+		return inode->i_sb->s_op->get_maps_dev(inode);
+	return inode->i_sb->s_dev;
+}
+
 /*
  * Inode state bits.  Protected by inode->i_lock
  *
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
  2011-05-13 23:18 ` [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device Mark Fasheh
@ 2011-05-15  3:06   ` Eric W. Biederman
  2011-05-19 20:18     ` Mark Fasheh
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2011-05-15  3:06 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: linux-btrfs, Chris Mason, linux-fsdevel

Mark Fasheh <mfasheh@suse.com> writes:

> This patch introduces a callback in the super_operations structure,
> 'get_maps_dev' which is then used by procfs to query which device to return
> for reporting in /proc/[PID]/maps.

No.

It may make sense to call the vfs stat method.  But introducing an extra
vfs operations for this seems like a maintenance nightmare.


Eric

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
  2011-05-15  3:06   ` Eric W. Biederman
@ 2011-05-19 20:18     ` Mark Fasheh
  2011-10-07 18:32       ` Mark Fasheh
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Fasheh @ 2011-05-19 20:18 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-btrfs, Chris Mason, linux-fsdevel

On Sat, May 14, 2011 at 08:06:04PM -0700, Eric W. Biederman wrote:
> Mark Fasheh <mfasheh@suse.com> writes:
> 
> > This patch introduces a callback in the super_operations structure,
> > 'get_maps_dev' which is then used by procfs to query which device to return
> > for reporting in /proc/[PID]/maps.
> 
> No.
> 
> It may make sense to call the vfs stat method.  But introducing an extra
> vfs operations for this seems like a maintenance nightmare.

Yeah I'm not thrilled with the extra method either. My concern with using
->getattr is whether it's too heavy since that implies potential disk /
network i/o.
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
  2011-05-19 20:18     ` Mark Fasheh
@ 2011-10-07 18:32       ` Mark Fasheh
  0 siblings, 0 replies; 5+ messages in thread
From: Mark Fasheh @ 2011-10-07 18:32 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, linux-fsdevel, Eric W. Biederman

Ressurecting an old thread, sorry. Here's the conversation thus far:

http://www.spinics.net/lists/linux-btrfs/msg10099.html

This is still hitting folks wishing to use btrfs on suse based systems.
Using getattr() (unconditionally I might add) is possible, but will affect
performance to a far greater degree than just allowing an optional deref of
whatever structure btrfs (and similar file systems) have to point to the
right block device. Is this really the way we would like to proceed? Chris,
maybe you can chime in here?
	--Mark

On Thu, May 19, 2011 at 01:18:26PM -0700, Mark Fasheh wrote:
> On Sat, May 14, 2011 at 08:06:04PM -0700, Eric W. Biederman wrote:
> > Mark Fasheh <mfasheh@suse.com> writes:
> > 
> > > This patch introduces a callback in the super_operations structure,
> > > 'get_maps_dev' which is then used by procfs to query which device to return
> > > for reporting in /proc/[PID]/maps.
> > 
> > No.
> > 
> > It may make sense to call the vfs stat method.  But introducing an extra
> > vfs operations for this seems like a maintenance nightmare.
> 
> Yeah I'm not thrilled with the extra method either. My concern with using
> ->getattr is whether it's too heavy since that implies potential disk /
> network i/o.
> 	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-10-07 18:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-13 23:18 [RFC][PATCH 0/2] btrfs/vfs: Return same device in stat(2) and /proc/pid/maps Mark Fasheh
2011-05-13 23:18 ` [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device Mark Fasheh
2011-05-15  3:06   ` Eric W. Biederman
2011-05-19 20:18     ` Mark Fasheh
2011-10-07 18:32       ` Mark Fasheh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.