* [PATCH] fix inconsistent device between /proc/pid/maps and stat
@ 2017-03-21 1:53 Satoru Takeuchi
2017-03-24 11:58 ` David Sterba
0 siblings, 1 reply; 4+ messages in thread
From: Satoru Takeuchi @ 2017-03-21 1:53 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Fasheh, David Sterba
There have been some discussions about inconsistent device between /proc/pid/maps and stat(2).
http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
https://www.spinics.net/lists/linux-btrfs/msg09044.html
https://patchwork.kernel.org/patch/2825842/
https://patchwork.kernel.org/patch/2831525/
It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem.
However, unfortunately, that patch is not merged so far.
NOTE:
1) This patch is inspired by the Mark's one and I tweak it as follows.
- move a flag for this fix from sb->s_flags to kernel internal sb->s_iflags
- change that flag's name from MS_STAT_FOR_DEV to SB_I_LOGICALVOL, to make its meaning clearer
2) This patch is for Chris's for-linus-4.11 (commit e1699d2d7bf6e6cce3e1baff19f9dd4595a58664 ("")). It should modify to apply to 4.11-rcX because of statx patches changes
struct inode_operations->getattr() interface.
For more information about this problem, please refer to the patchat the end of this mail.
---
Subject: [PATCH] fix inconsistent devie between /proc/pid/maps and stat
/proc/pid/maps returns each device as follows.
===
dev = inode->i_sb->s_dev;
===
However, stat(2) returns it as follows.
===
stat->dev = BTRFS_I(inode)->root->anon_dev;
===
It results in device mismatch and this inconsistency breaks users programs like lsof(8) as follows.
Without this patch:
===
$ mount | grep /home/vagrant/mnt
/dev/vda5 on /home/vagrant/mnt type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/)
$ /home/vagrant/mnt/lsof | grep /home/vagrant/mnt
lsof 19292 root txt REG 0,44 163136 257 /home/sat/mnt/lsof
lsof 19292 root mem REG 0,42 257 /home/sat/mnt/lsof (path dev=0,44)
lsof 19294 root txt REG 0,44 163136 257 /home/sat/mnt/lsof
lsof 19294 root mem REG 0,42 257 /home/sat/mnt/lsof (path dev=0,44)
===
With this patch:
===
$ mount | grep /home/vagrant/mnt
/dev/vda5 on /home/vagrant/mnt type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/)
$ /home/vagrant/mnt/lsof | grep /home/vagrant/mnt
lsof 752 root txt REG 0,44 163224 263 /home/vagrant/mnt/lsof
lsof 754 root txt REG 0,44 163224 263 /home/vagrant/mnt/lsof
===
Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
CC: Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
fs/btrfs/super.c | 1 +
fs/proc/generic.c | 13 +++++++++++++
fs/proc/internal.h | 1 +
fs/proc/nommu.c | 2 +-
fs/proc/task_mmu.c | 2 +-
fs/proc/task_nommu.c | 2 +-
include/linux/fs.h | 1 +
7 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index da687dc79cce..2ccfdb107ba0 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1133,6 +1133,7 @@ static int btrfs_fill_super(struct super_block *sb,
#endif
sb->s_flags |= MS_I_VERSION;
sb->s_iflags |= SB_I_CGROUPWB;
+ sb->s_iflags |= SB_I_LOGICALVOL;
err = open_ctree(sb, fs_devices, (char *)data);
if (err) {
btrfs_err(fs_info, "open_ctree failed");
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index f6a01f09f79d..d38cd77b297c 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -646,3 +646,16 @@ void *PDE_DATA(const struct inode *inode)
return __PDE_DATA(inode);
}
EXPORT_SYMBOL(PDE_DATA);
+
+dev_t proc_get_map_dev(struct dentry *dentry)
+{
+ struct inode *inode = dentry->d_inode;
+ struct kstat kstat;
+
+ if (inode->i_sb->s_iflags & SB_I_LOGICALVOL &&
+ inode->i_op->getattr &&
+ inode->i_op->getattr(NULL, dentry, &kstat) == 0)
+ return kstat.dev;
+
+ return inode->i_sb->s_dev;
+}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2de5194ba378..bf0f11fc209b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -190,6 +190,7 @@ static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde)
return pde;
}
extern void pde_put(struct proc_dir_entry *);
+dev_t proc_get_map_dev(struct dentry *);
static inline bool is_empty_pde(const struct proc_dir_entry *pde)
{
diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c
index 75634379f82e..e9c29864a50e 100644
--- a/fs/proc/nommu.c
+++ b/fs/proc/nommu.c
@@ -46,7 +46,7 @@ static int nommu_region_show(struct seq_file *m, struct vm_region *region)
if (file) {
struct inode *inode = file_inode(region->vm_file);
- dev = inode->i_sb->s_dev;
+ dev = proc_get_map_dev(vma->vm_file->f_path.dentry);
ino = inode->i_ino;
}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8f96a49178d0..3529adf0faa8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -292,7 +292,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
if (file) {
struct inode *inode = file_inode(vma->vm_file);
- dev = inode->i_sb->s_dev;
+ dev = proc_get_map_dev(vma->vm_file->f_path.dentry);
ino = inode->i_ino;
pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
}
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 37175621e890..4f17f91225c3 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -156,7 +156,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
if (file) {
struct inode *inode = file_inode(vma->vm_file);
- dev = inode->i_sb->s_dev;
+ dev = proc_get_map_dev(vma->vm_file->f_path.dentry);
ino = inode->i_ino;
pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2ba074328894..eec62a5bfd4d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1262,6 +1262,7 @@ struct mm_struct;
#define SB_I_CGROUPWB 0x00000001 /* cgroup-aware writeback enabled */
#define SB_I_NOEXEC 0x00000002 /* Ignore executables on this fs */
#define SB_I_NODEV 0x00000004 /* Ignore devices on this fs */
+#define SB_I_LOGICALVOL 0x00000008 /* this fs supports logical volumes */
/* sb->s_iflags to limit user namespace mounts */
#define SB_I_USERNS_VISIBLE 0x00000010 /* fstype already mounted */
--
2.11.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] fix inconsistent device between /proc/pid/maps and stat
2017-03-21 1:53 [PATCH] fix inconsistent device between /proc/pid/maps and stat Satoru Takeuchi
@ 2017-03-24 11:58 ` David Sterba
2017-03-24 13:32 ` Satoru Takeuchi
0 siblings, 1 reply; 4+ messages in thread
From: David Sterba @ 2017-03-24 11:58 UTC (permalink / raw)
To: Satoru Takeuchi; +Cc: linux-btrfs, Mark Fasheh, David Sterba
On Tue, Mar 21, 2017 at 10:53:09AM +0900, Satoru Takeuchi wrote:
> There have been some discussions about inconsistent device between /proc/pid/maps and stat(2).
>
> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
> https://www.spinics.net/lists/linux-btrfs/msg09044.html
> https://patchwork.kernel.org/patch/2825842/
> https://patchwork.kernel.org/patch/2831525/
>
> It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem.
> However, unfortunately, that patch is not merged so far.
And no variant of the get_map_dev will ever be merged. Reworking this
requires extensions to the superblock and subvolume structures, making
it more generic and suitable for VFS.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] fix inconsistent device between /proc/pid/maps and stat
2017-03-24 11:58 ` David Sterba
@ 2017-03-24 13:32 ` Satoru Takeuchi
0 siblings, 0 replies; 4+ messages in thread
From: Satoru Takeuchi @ 2017-03-24 13:32 UTC (permalink / raw)
To: dsterba, Satoru Takeuchi, linux-btrfs, Mark Fasheh, David Sterba
2017-03-24 20:58 GMT+09:00 David Sterba <dsterba@suse.cz>:
> On Tue, Mar 21, 2017 at 10:53:09AM +0900, Satoru Takeuchi wrote:
>> There have been some discussions about inconsistent device between /proc/pid/maps and stat(2).
>>
>> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
>> https://www.spinics.net/lists/linux-btrfs/msg09044.html
>> https://patchwork.kernel.org/patch/2825842/
>> https://patchwork.kernel.org/patch/2831525/
>>
>> It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem.
>> However, unfortunately, that patch is not merged so far.
>
> And no variant of the get_map_dev will ever be merged. Reworking this
> requires extensions to the superblock and subvolume structures, making
> it more generic and suitable for VFS.
Thank you for your comment. I'm going to reconsider how to fix this problem.
Thanks,
Satoru
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] fix inconsistent device between /proc/pid/maps and stat
@ 2017-03-24 17:29 Hirotaka Yamamoto
0 siblings, 0 replies; 4+ messages in thread
From: Hirotaka Yamamoto @ 2017-03-24 17:29 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, satoru.takeuchi
Hi,
2017-03-24 20:58 GMT+09:00 David Sterba <dsterba@suse.cz>:
> On Tue, Mar 21, 2017 at 10:53:09AM +0900, Satoru Takeuchi wrote:
>> There have been some discussions about inconsistent device between /proc/pid/maps and stat(2).
>>
>> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
>> https://www.spinics.net/lists/linux-btrfs/msg09044.html
>> https://patchwork.kernel.org/patch/2825842/
>> https://patchwork.kernel.org/patch/2831525/
>>
>> It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem.
>> However, unfortunately, that patch is not merged so far.
>
> And no variant of the get_map_dev will ever be merged. Reworking this
> requires extensions to the superblock and subvolume structures, making
> it more generic and suitable for VFS.
Aside from how to fix, yet I think fixing this is quite important.
I have found the problem when I wanted to list processes using vulnerable
shared libraries by, say, "lsof /lib/x86_64-linux-gnu/libcrypto.so.1.0.0".
As lsof did not report any process, I almost failed to restart processes
that were using the library in reality after installing upgrades.
Leaving this problem would open a big security concern as btrfs is
getting more popular.
Satoru, thank you for the proposal, and good luck to further
improvements!
@ymmt2005
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-03-24 17:30 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-21 1:53 [PATCH] fix inconsistent device between /proc/pid/maps and stat Satoru Takeuchi
2017-03-24 11:58 ` David Sterba
2017-03-24 13:32 ` Satoru Takeuchi
2017-03-24 17:29 Hirotaka Yamamoto
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.