All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space
@ 2018-07-31 21:10 Mark Fasheh
  2018-07-31 21:10 ` [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs Mark Fasheh
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 21:10 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs

Hi,

These patches are follow-up to this conversation on fsdevel which provides a
good break-down of the core problem:

https://www.spinics.net/lists/linux-fsdevel/msg128003.html

That e-mail was in turn a follow-up to the following patch series:

https://lwn.net/Articles/753917/

which tried to solve the dev portion of this problem with a bit of structure
re-routing but was never accepted.


We have an inconsistency in how the kernel is exporting inode number /
device pairs for user space.  I'm not talking about stat(2) or statx(2) here
but everywhere else where an ino/dev is exported into a buffer for user
space.

We typically do this by simply dumping the potentially 32 bit inode->i_ino
and single device from super->s_dev.  Sometimes these values differ from
what is returned via stat(2) or statx(2).  Some filesystems might even show
duplicate (but internally different!) pairs when the raw i_ino/s_dev is
used.

Aside from breaking software which expects these pairs to be unique, this
can put the user in a situation where they might not be able to find an
inode referenced from some kernel log (be it /proc/, printk buffer, etc). 
What's even worse - depending on how ino is exported, they might even find
an existing but /wrong/ file.

The following patches fix these inconsistencies by introducing a VFS helper
function which calls the underlying filesystem ->getattr to get our real
inode number / device pair.  The returned values can then be used at those
places in the kernel where we are incorrectly reporting our ino/dev pair. 
We then update fs/proc/ and fs/locks.c to use this helper when writing to
/proc/PID/maps and /proc/locks respectively.


For anyone who is watching the evolution of these patches, this would be one
implementation of the 'callback' method which I discussed in my last e-mail. 
This approach is very straight forward and easy to understand but has some
drawbacks:

- Not all of the call sites can tolerate errors.  Instead, we fallback to
  inode->i_ino and super->s_dev in case of getattr error.  That behavior is
  no worse than what we have today but what we have today is pretty bad so I
  don't particularly feel good about that. It's better than nothing though.

- There are places in the kernel where we could never call into ->getattr. 
  There are tons of tracepoints for example that are printing the the wrong
  ino/dev pair.

- The helper function has to fake a path struct because getting a vfsmount
  from inside these call sites is impossible.  This isn't actually a big
  deal as nobody except NFS cares and the NFS patch is very trivial.  It's
  ugly though, so if we had consensus on this approach I would happily
  rework our ->getattr function signature, perhaps pulling the vfsmount and
  dentry arguments back out.

- The dentry argument to ->getattr is potentially problematic as we have
  some places which _only_ have an inode. Even if we had a dentry I'm not
  sure what would keep it from going negative while in the middle of our
  ->getattr call.


Comments and review would be greatly appreciated. Thanks,
	--Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs
  2018-07-31 21:10 [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space Mark Fasheh
@ 2018-07-31 21:10 ` Mark Fasheh
  2018-07-31 23:21     ` Mark Fasheh
  2018-07-31 21:10 ` [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr Mark Fasheh
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 21:10 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs, Mark Fasheh

There are places in the VFS where we export an ino/dev pair to userspace.
/proc/PID/maps is a good example - we directly expose inode->i_ino and
inode->i_sb->s_dev to userspace there.  Many filesystems don't put a unique
value in inode->i_ino and instead rely on ->getattr to provide the real
inode number to userspace.  super->s_dev is similar - some filesystems
expose a difference device from what's put in super->s_dev when queried via
stat/statx.

Ultimately this makes it impossible for a user (or software) to match one of
those reported pairs to the right inode.

We can fix this by adding a helper function, vfs_map_unique_ino_dev(), which
will query the owning filesystem (via ->getattr) to get the correct ino/dev
pair.  Later patches will update those places which simply dump inode->i_ino
and super->s_dev to use the helper.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/stat.c          | 23 +++++++++++++++++++++++
 include/linux/fs.h |  2 ++
 2 files changed, 25 insertions(+)

diff --git a/fs/stat.c b/fs/stat.c
index f8e6fb2c3657..80ea42505219 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -84,6 +84,29 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 }
 EXPORT_SYMBOL(vfs_getattr_nosec);
 
+void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev)
+{
+	int ret;
+	struct path path;
+	struct kstat stat;
+
+	path.mnt = NULL;
+	path.dentry = dentry;
+	/* ->dev is always returned, so we only need to specify ino here */
+	ret = vfs_getattr_nosec(&path, &stat, STATX_INO, 0);
+	if (ret) {
+		struct inode *inode = d_inode(dentry);
+		/* Fallback to old behavior in case of getattr error */
+		*ino = inode->i_ino;
+		*dev = inode->i_sb->s_dev;
+		return ret;
+	}
+	*ino = stat.ino;
+	*dev = stat.dev;
+	return 0;
+}
+EXPORT_SYMBOL(vfs_map_unique_ino_dev);
+
 /*
  * vfs_getattr - Get the enhanced basic attributes of a file
  * @path: The file of interest
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d78d146a98da..b80789472438 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3077,6 +3077,8 @@ extern void kfree_link(void *);
 extern void generic_fillattr(struct inode *, struct kstat *);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
 extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
+extern void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev);
+
 void __inode_add_bytes(struct inode *inode, loff_t bytes);
 void inode_add_bytes(struct inode *inode, loff_t bytes);
 void __inode_sub_bytes(struct inode *inode, loff_t bytes);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr
  2018-07-31 21:10 [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space Mark Fasheh
  2018-07-31 21:10 ` [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs Mark Fasheh
@ 2018-07-31 21:10 ` Mark Fasheh
  2018-07-31 22:16   ` Al Viro
  2018-07-31 21:10 ` [PATCH 3/4] proc: use vfs helper to get ino/dev pairs for maps file Mark Fasheh
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 21:10 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs, Mark Fasheh

->getattr from inside the kernel won't always have a vfsmount, check for
this.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/nfs/inode.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b65aee481d13..7a3d90a7cc92 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -795,7 +795,9 @@ int nfs_getattr(const struct path *path, struct kstat *stat,
 	}
 
 	/*
-	 * We may force a getattr if the user cares about atime.
+	 * We may force a getattr if the user cares about atime. A
+	 * NULL vfsmount means we are called from inside the kernel,
+	 * where no atime is required.
 	 *
 	 * Note that we only have to check the vfsmount flags here:
 	 *  - NFS always sets S_NOATIME by so checking it would give a
@@ -803,7 +805,7 @@ int nfs_getattr(const struct path *path, struct kstat *stat,
 	 *  - NFS never sets SB_NOATIME or SB_NODIRATIME so there is
 	 *    no point in checking those.
 	 */
-	if ((path->mnt->mnt_flags & MNT_NOATIME) ||
+	if (!path->mnt || (path->mnt->mnt_flags & MNT_NOATIME) ||
 	    ((path->mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode)))
 		request_mask &= ~STATX_ATIME;
 
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] proc: use vfs helper to get ino/dev pairs for maps file
  2018-07-31 21:10 [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space Mark Fasheh
  2018-07-31 21:10 ` [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs Mark Fasheh
  2018-07-31 21:10 ` [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr Mark Fasheh
@ 2018-07-31 21:10 ` Mark Fasheh
  2018-07-31 21:10 ` [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace Mark Fasheh
  2018-08-02  0:24 ` [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space J. R. Okajima
  4 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 21:10 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs, Mark Fasheh

Proc writes ino/dev into /proc/PID/maps which it gets from i_ino and s_dev.
The problem with this is that i_ino and s_dev are not guaranteed to be
unique - we can have duplicates in the maps file for otherwise distinct
inodes.

This breaks any software expecting them to be unique, including lsof(8).  We
can fix this by using vfs_map_unique_ino_dev() to map the unique ino/dev
pair for us.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/proc/nommu.c      | 11 ++++-------
 fs/proc/task_mmu.c   |  8 +++-----
 fs/proc/task_nommu.c |  8 +++-----
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c
index 3b63be64e436..56c12d02c5ad 100644
--- a/fs/proc/nommu.c
+++ b/fs/proc/nommu.c
@@ -36,7 +36,7 @@
  */
 static int nommu_region_show(struct seq_file *m, struct vm_region *region)
 {
-	unsigned long ino = 0;
+        u64 ino = 0;
 	struct file *file;
 	dev_t dev = 0;
 	int flags;
@@ -44,15 +44,12 @@ static int nommu_region_show(struct seq_file *m, struct vm_region *region)
 	flags = region->vm_flags;
 	file = region->vm_file;
 
-	if (file) {
-		struct inode *inode = file_inode(region->vm_file);
-		dev = inode->i_sb->s_dev;
-		ino = inode->i_ino;
-	}
+	if (file)
+		vfs_map_unique_ino_dev(file_dentry(file), &ino, &dev); {
 
 	seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
 	seq_printf(m,
-		   "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
+		   "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %llu ",
 		   region->vm_start,
 		   region->vm_end,
 		   flags & VM_READ ? 'r' : '-',
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dfd73a4616ce..e9cd33425191 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -276,7 +276,7 @@ static int is_stack(struct vm_area_struct *vma)
 static void show_vma_header_prefix(struct seq_file *m,
 				   unsigned long start, unsigned long end,
 				   vm_flags_t flags, unsigned long long pgoff,
-				   dev_t dev, unsigned long ino)
+				   dev_t dev, u64 ino)
 {
 	seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
 	seq_put_hex_ll(m, NULL, start, 8);
@@ -299,16 +299,14 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 	struct mm_struct *mm = vma->vm_mm;
 	struct file *file = vma->vm_file;
 	vm_flags_t flags = vma->vm_flags;
-	unsigned long ino = 0;
+	u64 ino = 0;
 	unsigned long long pgoff = 0;
 	unsigned long start, end;
 	dev_t dev = 0;
 	const char *name = NULL;
 
 	if (file) {
-		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
-		ino = inode->i_ino;
+		vfs_map_unique_ino_dev(file_dentry(file), &ino, &dev);
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}
 
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 5b62f57bd9bc..1198e979ed3f 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -146,7 +146,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
 			  int is_pid)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long ino = 0;
+	u64 ino = 0;
 	struct file *file;
 	dev_t dev = 0;
 	int flags;
@@ -156,15 +156,13 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
 	file = vma->vm_file;
 
 	if (file) {
-		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
-		ino = inode->i_ino;
+		vfs_map_unique_inode_dev(file_dentry(file), &ino, &dev));
 		pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
 	}
 
 	seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
 	seq_printf(m,
-		   "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
+		   "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %llu ",
 		   vma->vm_start,
 		   vma->vm_end,
 		   flags & VM_READ ? 'r' : '-',
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace
  2018-07-31 21:10 [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space Mark Fasheh
                   ` (2 preceding siblings ...)
  2018-07-31 21:10 ` [PATCH 3/4] proc: use vfs helper to get ino/dev pairs for maps file Mark Fasheh
@ 2018-07-31 21:10 ` Mark Fasheh
  2018-08-01  5:37   ` Amir Goldstein
  2018-08-01 11:03   ` Jeff Layton
  2018-08-02  0:24 ` [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space J. R. Okajima
  4 siblings, 2 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 21:10 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs, Mark Fasheh

/proc/locks does not always print the correct inode number/device pair.
Update lock_get_status() to use vfs_map_unique_ino_dev() to get the real,
unique values for userspace.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/locks.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index db7b6917d9c5..3a012df87fd8 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2621,6 +2621,7 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 			    loff_t id, char *pfx)
 {
 	struct inode *inode = NULL;
+	struct dentry *dentry;
 	unsigned int fl_pid;
 	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
 
@@ -2633,8 +2634,10 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 	if (fl_pid == 0)
 		return;
 
-	if (fl->fl_file != NULL)
+	if (fl->fl_file != NULL) {
 		inode = locks_inode(fl->fl_file);
+		dentry = file_dentry(fl->fl_file);
+	}
 
 	seq_printf(f, "%lld:%s ", id, pfx);
 	if (IS_POSIX(fl)) {
@@ -2681,10 +2684,13 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 			       : (fl->fl_type == F_WRLCK) ? "WRITE" : "READ ");
 	}
 	if (inode) {
+		__u64 ino;
+		dev_t dev;
+
+		vfs_map_unique_ino_dev(dentry, &ino, &dev);
 		/* userspace relies on this representation of dev_t */
 		seq_printf(f, "%d %02x:%02x:%ld ", fl_pid,
-				MAJOR(inode->i_sb->s_dev),
-				MINOR(inode->i_sb->s_dev), inode->i_ino);
+				MAJOR(dev), MINOR(dev), inode->i_ino);
 	} else {
 		seq_printf(f, "%d <none>:0 ", fl_pid);
 	}
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr
  2018-07-31 21:10 ` [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr Mark Fasheh
@ 2018-07-31 22:16   ` Al Viro
  2018-07-31 22:51     ` Mark Fasheh
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2018-07-31 22:16 UTC (permalink / raw)
  To: Mark Fasheh
  Cc: linux-fsdevel, linux-btrfs, Andrew Morton, Amir Goldstein,
	linux-unionfs, Jeff Mahoney, linux-nfs

On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote:
> ->getattr from inside the kernel won't always have a vfsmount, check for
> this.

Really?  Where would that happen?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr
  2018-07-31 22:16   ` Al Viro
@ 2018-07-31 22:51     ` Mark Fasheh
  2018-08-02  0:43       ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 22:51 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, linux-btrfs, Andrew Morton, Amir Goldstein,
	linux-unionfs, Jeff Mahoney, linux-nfs

Hi Al,

On Tue, Jul 31, 2018 at 11:16:05PM +0100, Al Viro wrote:
> On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote:
> > ->getattr from inside the kernel won't always have a vfsmount, check for
> > this.
> 
> Really?  Where would that happen?

It happens in my first patch. FWIW, I'm not tied to that particular bit of
code, or even this (latest) approach. I would actually very much appreciate
your input into how we might fix the problem we have where we are sometimes
not exporting a correct ino/dev pair to user space.

I have a good break-down of the problem and possible solutions here:

https://www.spinics.net/lists/linux-fsdevel/msg128003.html

Thanks,
	--Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs
  2018-07-31 21:10 ` [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs Mark Fasheh
@ 2018-07-31 23:21     ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 23:21 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs


On Tue, Jul 31, 2018 at 02:10:42PM -0700, Mark Fasheh wrote:
> diff --git a/fs/stat.c b/fs/stat.c
> index f8e6fb2c3657..80ea42505219 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -84,6 +84,29 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>  }
>  EXPORT_SYMBOL(vfs_getattr_nosec);
>  
> +void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev)
> +{
> +	int ret;
> +	struct path path;
> +	struct kstat stat;
> +
> +	path.mnt = NULL;
> +	path.dentry = dentry;
> +	/* ->dev is always returned, so we only need to specify ino here */
> +	ret = vfs_getattr_nosec(&path, &stat, STATX_INO, 0);
> +	if (ret) {
> +		struct inode *inode = d_inode(dentry);
> +		/* Fallback to old behavior in case of getattr error */
> +		*ino = inode->i_ino;
> +		*dev = inode->i_sb->s_dev;
> +		return ret;

Ooof, attached is a version of this patch which doesn't try to return a value
from a void function. Apologies for the extra e-mail.

>From Mark Fasheh <mfasheh@suse.de>

[PATCH 1/4] vfs: introduce function to map unique ino/dev pairs

There are places in the VFS where we export an ino/dev pair to userspace.
/proc/PID/maps is a good example - we directly expose inode->i_ino and
inode->i_sb->s_dev to userspace there.  Many filesystems don't put a unique
value in inode->i_ino and instead rely on ->getattr to provide the real
inode number to userspace.  super->s_dev is similar - some filesystems
expose a difference device from what's put in super->s_dev when queried via
stat/statx.

Ultimately this makes it impossible for a user (or software) to match one of
those reported pairs to the right inode.

We can fix this by adding a helper function, vfs_map_unique_ino_dev(), which
will query the owning filesystem (via ->getattr) to get the correct ino/dev
pair.  Later patches will update those places which simply dump inode->i_ino
and super->s_dev to use the helper.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/stat.c          | 22 ++++++++++++++++++++++
 include/linux/fs.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/fs/stat.c b/fs/stat.c
index f8e6fb2c3657..20c72d618ed5 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -84,6 +84,28 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 }
 EXPORT_SYMBOL(vfs_getattr_nosec);
 
+void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev)
+{
+	int ret;
+	struct path path;
+	struct kstat stat;
+
+	path.mnt = NULL;
+	path.dentry = dentry;
+	/* ->dev is always returned, so we only need to specify ino here */
+	ret = vfs_getattr_nosec(&path, &stat, STATX_INO, 0);
+	if (ret) {
+		struct inode *inode = d_inode(dentry);
+		/* Fallback to old behavior in case of getattr error */
+		*ino = inode->i_ino;
+		*dev = inode->i_sb->s_dev;
+		return;
+	}
+	*ino = stat.ino;
+	*dev = stat.dev;
+}
+EXPORT_SYMBOL(vfs_map_unique_ino_dev);
+
 /*
  * vfs_getattr - Get the enhanced basic attributes of a file
  * @path: The file of interest
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d78d146a98da..b80789472438 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3077,6 +3077,8 @@ extern void kfree_link(void *);
 extern void generic_fillattr(struct inode *, struct kstat *);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
 extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
+extern void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev);
+
 void __inode_add_bytes(struct inode *inode, loff_t bytes);
 void inode_add_bytes(struct inode *inode, loff_t bytes);
 void __inode_sub_bytes(struct inode *inode, loff_t bytes);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs
@ 2018-07-31 23:21     ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-07-31 23:21 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs


On Tue, Jul 31, 2018 at 02:10:42PM -0700, Mark Fasheh wrote:
> diff --git a/fs/stat.c b/fs/stat.c
> index f8e6fb2c3657..80ea42505219 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -84,6 +84,29 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>  }
>  EXPORT_SYMBOL(vfs_getattr_nosec);
>  
> +void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev)
> +{
> +	int ret;
> +	struct path path;
> +	struct kstat stat;
> +
> +	path.mnt = NULL;
> +	path.dentry = dentry;
> +	/* ->dev is always returned, so we only need to specify ino here */
> +	ret = vfs_getattr_nosec(&path, &stat, STATX_INO, 0);
> +	if (ret) {
> +		struct inode *inode = d_inode(dentry);
> +		/* Fallback to old behavior in case of getattr error */
> +		*ino = inode->i_ino;
> +		*dev = inode->i_sb->s_dev;
> +		return ret;

Ooof, attached is a version of this patch which doesn't try to return a value
from a void function. Apologies for the extra e-mail.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace
  2018-07-31 21:10 ` [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace Mark Fasheh
@ 2018-08-01  5:37   ` Amir Goldstein
  2018-08-01 17:45     ` Mark Fasheh
  2018-08-01 11:03   ` Jeff Layton
  1 sibling, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2018-08-01  5:37 UTC (permalink / raw)
  To: Mark Fasheh
  Cc: Al Viro, linux-fsdevel, Linux Btrfs, Andrew Morton, overlayfs,
	Jeff Mahoney, Linux NFS Mailing List

On Wed, Aug 1, 2018 at 12:10 AM, Mark Fasheh <mfasheh@suse.de> wrote:
> /proc/locks does not always print the correct inode number/device pair.
> Update lock_get_status() to use vfs_map_unique_ino_dev() to get the real,
> unique values for userspace.
>
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/locks.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index db7b6917d9c5..3a012df87fd8 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2621,6 +2621,7 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>                             loff_t id, char *pfx)
>  {
>         struct inode *inode = NULL;
> +       struct dentry *dentry;
>         unsigned int fl_pid;
>         struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>
> @@ -2633,8 +2634,10 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>         if (fl_pid == 0)
>                 return;
>
> -       if (fl->fl_file != NULL)
> +       if (fl->fl_file != NULL) {
>                 inode = locks_inode(fl->fl_file);
> +               dentry = file_dentry(fl->fl_file);
> +       }
>
>         seq_printf(f, "%lld:%s ", id, pfx);
>         if (IS_POSIX(fl)) {
> @@ -2681,10 +2684,13 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>                                : (fl->fl_type == F_WRLCK) ? "WRITE" : "READ ");
>         }
>         if (inode) {
> +               __u64 ino;
> +               dev_t dev;
> +
> +               vfs_map_unique_ino_dev(dentry, &ino, &dev);
>                 /* userspace relies on this representation of dev_t */
>                 seq_printf(f, "%d %02x:%02x:%ld ", fl_pid,
> -                               MAJOR(inode->i_sb->s_dev),
> -                               MINOR(inode->i_sb->s_dev), inode->i_ino);
> +                               MAJOR(dev), MINOR(dev), inode->i_ino);

Don't you mean ,ino); ?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs
  2018-07-31 23:21     ` Mark Fasheh
  (?)
@ 2018-08-01  5:41     ` Amir Goldstein
  2018-08-01 15:59       ` Mark Fasheh
  -1 siblings, 1 reply; 18+ messages in thread
From: Amir Goldstein @ 2018-08-01  5:41 UTC (permalink / raw)
  To: Mark Fasheh
  Cc: Al Viro, linux-fsdevel, Linux Btrfs, Andrew Morton, overlayfs,
	Jeff Mahoney, Linux NFS Mailing List

On Wed, Aug 1, 2018 at 2:21 AM, Mark Fasheh <mfasheh@suse.de> wrote:
>
> On Tue, Jul 31, 2018 at 02:10:42PM -0700, Mark Fasheh wrote:
>> diff --git a/fs/stat.c b/fs/stat.c
>> index f8e6fb2c3657..80ea42505219 100644
>> --- a/fs/stat.c
>> +++ b/fs/stat.c
>> @@ -84,6 +84,29 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>>  }
>>  EXPORT_SYMBOL(vfs_getattr_nosec);
>>
>> +void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev)
>> +{
>> +     int ret;
>> +     struct path path;
>> +     struct kstat stat;
>> +
>> +     path.mnt = NULL;
>> +     path.dentry = dentry;
>> +     /* ->dev is always returned, so we only need to specify ino here */
>> +     ret = vfs_getattr_nosec(&path, &stat, STATX_INO, 0);
>> +     if (ret) {
>> +             struct inode *inode = d_inode(dentry);
>> +             /* Fallback to old behavior in case of getattr error */
>> +             *ino = inode->i_ino;
>> +             *dev = inode->i_sb->s_dev;
>> +             return ret;
>
> Ooof, attached is a version of this patch which doesn't try to return a value
> from a void function. Apologies for the extra e-mail.
>
> From Mark Fasheh <mfasheh@suse.de>
>
> [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs
>
> There are places in the VFS where we export an ino/dev pair to userspace.
> /proc/PID/maps is a good example - we directly expose inode->i_ino and
> inode->i_sb->s_dev to userspace there.  Many filesystems don't put a unique
> value in inode->i_ino and instead rely on ->getattr to provide the real
> inode number to userspace.  super->s_dev is similar - some filesystems
> expose a difference device from what's put in super->s_dev when queried via
> stat/statx.
>
> Ultimately this makes it impossible for a user (or software) to match one of
> those reported pairs to the right inode.
>
> We can fix this by adding a helper function, vfs_map_unique_ino_dev(), which
> will query the owning filesystem (via ->getattr) to get the correct ino/dev
> pair.  Later patches will update those places which simply dump inode->i_ino
> and super->s_dev to use the helper.
>
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/stat.c          | 22 ++++++++++++++++++++++
>  include/linux/fs.h |  2 ++
>  2 files changed, 24 insertions(+)
>
> diff --git a/fs/stat.c b/fs/stat.c
> index f8e6fb2c3657..20c72d618ed5 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -84,6 +84,28 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>  }
>  EXPORT_SYMBOL(vfs_getattr_nosec);
>
> +void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev)

I find this function name a bit more than function can guaranty.
It's just a fancy wrapper around ->getattr()
How about vfs_get_ino_dev() ?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace
  2018-07-31 21:10 ` [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace Mark Fasheh
  2018-08-01  5:37   ` Amir Goldstein
@ 2018-08-01 11:03   ` Jeff Layton
  2018-08-01 20:38     ` Mark Fasheh
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2018-08-01 11:03 UTC (permalink / raw)
  To: Mark Fasheh, Al Viro, linux-fsdevel
  Cc: linux-btrfs, Andrew Morton, Amir Goldstein, linux-unionfs,
	Jeff Mahoney, linux-nfs

On Tue, 2018-07-31 at 14:10 -0700, Mark Fasheh wrote:
> /proc/locks does not always print the correct inode number/device pair.
> Update lock_get_status() to use vfs_map_unique_ino_dev() to get the real,
> unique values for userspace.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/locks.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index db7b6917d9c5..3a012df87fd8 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2621,6 +2621,7 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>  			    loff_t id, char *pfx)
>  {
>  	struct inode *inode = NULL;
> +	struct dentry *dentry;
>  	unsigned int fl_pid;
>  	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>  
> @@ -2633,8 +2634,10 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>  	if (fl_pid == 0)
>  		return;
>  
> -	if (fl->fl_file != NULL)
> +	if (fl->fl_file != NULL) {
>  		inode = locks_inode(fl->fl_file);
> +		dentry = file_dentry(fl->fl_file);
> +	}
>  
>  	seq_printf(f, "%lld:%s ", id, pfx);
>  	if (IS_POSIX(fl)) {
> @@ -2681,10 +2684,13 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>  			       : (fl->fl_type == F_WRLCK) ? "WRITE" : "READ ");
>  	}
>  	if (inode) {
> +		__u64 ino;
> +		dev_t dev;
> +
> +		vfs_map_unique_ino_dev(dentry, &ino, &dev);

This code is under a spinlock (blocked_locks_lock or ctx->flc_lock). I
don't think it'll be ok to call ->getattr while holding a spinlock.

>  		/* userspace relies on this representation of dev_t */
>  		seq_printf(f, "%d %02x:%02x:%ld ", fl_pid,
> -				MAJOR(inode->i_sb->s_dev),
> -				MINOR(inode->i_sb->s_dev), inode->i_ino);
> +				MAJOR(dev), MINOR(dev), inode->i_ino);
>  	} else {
>  		seq_printf(f, "%d <none>:0 ", fl_pid);
>  	}

-- 
Jeff Layton <jlayton@poochiereds.net>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs
  2018-08-01  5:41     ` Amir Goldstein
@ 2018-08-01 15:59       ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-08-01 15:59 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Al Viro, linux-fsdevel, Linux Btrfs, Andrew Morton, overlayfs,
	Jeff Mahoney, Linux NFS Mailing List

Hi Amir,

On Wed, Aug 01, 2018 at 08:41:14AM +0300, Amir Goldstein wrote:
> > +void vfs_map_unique_ino_dev(struct dentry *dentry, u64 *ino, dev_t *dev)
> 
> I find this function name a bit more than function can guaranty.
> It's just a fancy wrapper around ->getattr()
> How about vfs_get_ino_dev() ?

Yeah I agree with that. An early version actually had the name
vfs_get_ino_dev(), I don't mind going back to it.

Thanks for the review!
	--Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace
  2018-08-01  5:37   ` Amir Goldstein
@ 2018-08-01 17:45     ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-08-01 17:45 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Al Viro, linux-fsdevel, Linux Btrfs, Andrew Morton, overlayfs,
	Jeff Mahoney, Linux NFS Mailing List

On Wed, Aug 01, 2018 at 08:37:31AM +0300, Amir Goldstein wrote:
> >         if (inode) {
> > +               __u64 ino;
> > +               dev_t dev;
> > +
> > +               vfs_map_unique_ino_dev(dentry, &ino, &dev);
> >                 /* userspace relies on this representation of dev_t */
> >                 seq_printf(f, "%d %02x:%02x:%ld ", fl_pid,
> > -                               MAJOR(inode->i_sb->s_dev),
> > -                               MINOR(inode->i_sb->s_dev), inode->i_ino);
> > +                               MAJOR(dev), MINOR(dev), inode->i_ino);
> 
> Don't you mean ,ino); ?

Indeed I do, thanks for catching that one Amir.
	--Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace
  2018-08-01 11:03   ` Jeff Layton
@ 2018-08-01 20:38     ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-08-01 20:38 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-fsdevel, linux-btrfs, Andrew Morton,
	Amir Goldstein, linux-unionfs, Jeff Mahoney, linux-nfs

Hi Jeff,

On Wed, Aug 01, 2018 at 07:03:54AM -0400, Jeff Layton wrote:
> On Tue, 2018-07-31 at 14:10 -0700, Mark Fasheh wrote:
> > /proc/locks does not always print the correct inode number/device pair.
> > Update lock_get_status() to use vfs_map_unique_ino_dev() to get the real,
> > unique values for userspace.
> > 
> > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > ---
> >  fs/locks.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index db7b6917d9c5..3a012df87fd8 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -2621,6 +2621,7 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
> >  			    loff_t id, char *pfx)
> >  {
> >  	struct inode *inode = NULL;
> > +	struct dentry *dentry;
> >  	unsigned int fl_pid;
> >  	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> >  
> > @@ -2633,8 +2634,10 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
> >  	if (fl_pid == 0)
> >  		return;
> >  
> > -	if (fl->fl_file != NULL)
> > +	if (fl->fl_file != NULL) {
> >  		inode = locks_inode(fl->fl_file);
> > +		dentry = file_dentry(fl->fl_file);
> > +	}
> >  
> >  	seq_printf(f, "%lld:%s ", id, pfx);
> >  	if (IS_POSIX(fl)) {
> > @@ -2681,10 +2684,13 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
> >  			       : (fl->fl_type == F_WRLCK) ? "WRITE" : "READ ");
> >  	}
> >  	if (inode) {
> > +		__u64 ino;
> > +		dev_t dev;
> > +
> > +		vfs_map_unique_ino_dev(dentry, &ino, &dev);
> 
> This code is under a spinlock (blocked_locks_lock or ctx->flc_lock). I
> don't think it'll be ok to call ->getattr while holding a spinlock.

Hmm, indeed it does call under spinlock. I'll hve to rethink how to approach
this in fs/locks.c

Thanks,
	--Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space
  2018-07-31 21:10 [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space Mark Fasheh
                   ` (3 preceding siblings ...)
  2018-07-31 21:10 ` [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace Mark Fasheh
@ 2018-08-02  0:24 ` J. R. Okajima
  4 siblings, 0 replies; 18+ messages in thread
From: J. R. Okajima @ 2018-08-02  0:24 UTC (permalink / raw)
  To: Mark Fasheh
  Cc: Al Viro, linux-fsdevel, linux-btrfs, Andrew Morton,
	Amir Goldstein, linux-unionfs, Jeff Mahoney, linux-nfs

Mark Fasheh:
> The following patches fix these inconsistencies by introducing a VFS helper
> function which calls the underlying filesystem ->getattr to get our real
> inode number / device pair.  The returned values can then be used at those
> places in the kernel where we are incorrectly reporting our ino/dev pair. 
> We then update fs/proc/ and fs/locks.c to use this helper when writing to
> /proc/PID/maps and /proc/locks respectively.

I definitly agree that ino/dev pair should be a unique identity on the
system.  But I don't know why you are tryng to solve the problem in
generic VFS layer instead of the problematic FS.  Isn't it an
unnecessary overhead for many FS?
How about creating a new f_op member ->get_ino_dev(), ->show_identity()
or something, and implement the new f_op in the problematic FS only?
I hope it will be a lighter way to get the pair than generic getattr
way.


J. R. Okajima

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr
  2018-07-31 22:51     ` Mark Fasheh
@ 2018-08-02  0:43       ` Al Viro
  2018-08-03  2:04         ` Mark Fasheh
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2018-08-02  0:43 UTC (permalink / raw)
  To: Mark Fasheh
  Cc: linux-fsdevel, linux-btrfs, Andrew Morton, Amir Goldstein,
	linux-unionfs, Jeff Mahoney, linux-nfs

On Tue, Jul 31, 2018 at 10:51:57PM +0000, Mark Fasheh wrote:
> Hi Al,
> 
> On Tue, Jul 31, 2018 at 11:16:05PM +0100, Al Viro wrote:
> > On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote:
> > > ->getattr from inside the kernel won't always have a vfsmount, check for
> > > this.
> > 
> > Really?  Where would that happen?
> 
> It happens in my first patch. FWIW, I'm not tied to that particular bit of
> code, or even this (latest) approach. I would actually very much appreciate
> your input into how we might fix the problem we have where we are sometimes
> not exporting a correct ino/dev pair to user space.

Which callers would those be?  I don't see anything in your series that
wouldn't have vfsmount at hand...

> I have a good break-down of the problem and possible solutions here:
> 
> https://www.spinics.net/lists/linux-fsdevel/msg128003.html

btrfs pulling odd crap with device numbers is a problem, but this is far from
the most obvious unpleasantness caused by that - e.g. userland code expecting
that unequal st_dev for directory and subdirectory means that we have
a mountpoint there.  Or assuming it can compare that to st_dev of mountpoints
(or, indeed, the values in /proc/self/mountinfo) to find which fs it's on...

/proc/*/maps is unfortunate, but it does contain the pathnames, doesn't it?
What _real_ problems are there - the ones where we don't have a saner solution?
Note that /proc/locks is (a) FPOS interface and (b) unsuitable for your
approach anyway.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr
  2018-08-02  0:43       ` Al Viro
@ 2018-08-03  2:04         ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2018-08-03  2:04 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, linux-btrfs, Andrew Morton, Amir Goldstein,
	linux-unionfs, Jeff Mahoney, linux-nfs

Hi Al,

On Thu, Aug 02, 2018 at 01:43:48AM +0100, Al Viro wrote:
> On Tue, Jul 31, 2018 at 10:51:57PM +0000, Mark Fasheh wrote:
> > Hi Al,
> > 
> > On Tue, Jul 31, 2018 at 11:16:05PM +0100, Al Viro wrote:
> > > On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote:
> > > > ->getattr from inside the kernel won't always have a vfsmount, check for
> > > > this.
> > > 
> > > Really?  Where would that happen?
> > 
> > It happens in my first patch. FWIW, I'm not tied to that particular bit of
> > code, or even this (latest) approach. I would actually very much appreciate
> > your input into how we might fix the problem we have where we are sometimes
> > not exporting a correct ino/dev pair to user space.
> 
> Which callers would those be?  I don't see anything in your series that
> wouldn't have vfsmount at hand...

You are correct - there's no callers in my patch series that don't have a
vfsmount available. I can remove patch #2 and pass the vfsmount through.

At some point I was trying to plug this callback into audit_copy_inode()
which AFAICT doesn't have a vfsmount to pass in when we're coming from
vfs_rename (vfs_rename() -> fsnotify_move()). I will eventually have to go
back and handle this for audit though so it seems worth mentioning here.


> > I have a good break-down of the problem and possible solutions here:
> > 
> > https://www.spinics.net/lists/linux-fsdevel/msg128003.html
> 
> btrfs pulling odd crap with device numbers is a problem, but this is far from
> the most obvious unpleasantness caused by that - e.g. userland code expecting
> that unequal st_dev for directory and subdirectory means that we have
> a mountpoint there.  Or assuming it can compare that to st_dev of mountpoints
> (or, indeed, the values in /proc/self/mountinfo) to find which fs it's on...

Indeed, I agree that all of these are pretty gross and things I'd like to
see fixed - I have to deal with subvolume boundaries in some of my own
software. I know Jeff is working on some code to give subvolumes their own
vfsmount but that wouldn't help with our device reporting unless we did
something like move s_dev from the super_block to the vfsmount (also an idea
that's been thrown around).


> /proc/*/maps is unfortunate, but it does contain the pathnames, doesn't it?
> What _real_ problems are there - the ones where we don't have a saner solution?
> Note that /proc/locks is (a) FPOS interface and (b) unsuitable for your
> approach anyway.

Yeah I see now that /proc/locks isn't going to work with something calling
into ->getattr().

Primarily my concerns are /proc/*/maps and audit (which is recording ino/dev
pairs). Even though /proc/*/maps prints a path it is still a problem as
there is no way for userspace to verify that the inode it stats using that
path is the correct one.

I don't mean to overstate the case but the /proc/*/maps issue is real to us
(SUSE) in that it's produced bug reports. For example, lsof produces
confusing output when we run with a btrfs subvolume as root.

As far as problems that can't be solved by this approach - there are
tracepoints in events/lock.h and events/writeback.h which ought to be
reporting the correct ino/dev pairs. There's of course /proc/locks too. I
acknowledge that whether those are severe enough to warrant a different
approach might be up for debate.

Thanks,
	--Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-08-03  2:04 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-31 21:10 [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space Mark Fasheh
2018-07-31 21:10 ` [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs Mark Fasheh
2018-07-31 23:21   ` Mark Fasheh
2018-07-31 23:21     ` Mark Fasheh
2018-08-01  5:41     ` Amir Goldstein
2018-08-01 15:59       ` Mark Fasheh
2018-07-31 21:10 ` [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr Mark Fasheh
2018-07-31 22:16   ` Al Viro
2018-07-31 22:51     ` Mark Fasheh
2018-08-02  0:43       ` Al Viro
2018-08-03  2:04         ` Mark Fasheh
2018-07-31 21:10 ` [PATCH 3/4] proc: use vfs helper to get ino/dev pairs for maps file Mark Fasheh
2018-07-31 21:10 ` [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace Mark Fasheh
2018-08-01  5:37   ` Amir Goldstein
2018-08-01 17:45     ` Mark Fasheh
2018-08-01 11:03   ` Jeff Layton
2018-08-01 20:38     ` Mark Fasheh
2018-08-02  0:24 ` [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space J. R. Okajima

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.