From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f66.google.com ([74.125.83.66]:33683 "EHLO mail-pg0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755279AbdCUCCR (ORCPT ); Mon, 20 Mar 2017 22:02:17 -0400 Received: by mail-pg0-f66.google.com with SMTP id 79so14073929pgf.0 for ; Mon, 20 Mar 2017 19:02:16 -0700 (PDT) Date: Tue, 21 Mar 2017 10:53:09 +0900 Message-ID: <871strcs56.wl-satoru.takeuchi@gmail.com> From: Satoru Takeuchi To: linux-btrfs CC: Mark Fasheh CC: David Sterba Subject: [PATCH] fix inconsistent device between /proc/pid/maps and stat MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII Sender: linux-btrfs-owner@vger.kernel.org List-ID: There have been some discussions about inconsistent device between /proc/pid/maps and stat(2). http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps https://www.spinics.net/lists/linux-btrfs/msg09044.html https://patchwork.kernel.org/patch/2825842/ https://patchwork.kernel.org/patch/2831525/ It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem. However, unfortunately, that patch is not merged so far. NOTE: 1) This patch is inspired by the Mark's one and I tweak it as follows. - move a flag for this fix from sb->s_flags to kernel internal sb->s_iflags - change that flag's name from MS_STAT_FOR_DEV to SB_I_LOGICALVOL, to make its meaning clearer 2) This patch is for Chris's for-linus-4.11 (commit e1699d2d7bf6e6cce3e1baff19f9dd4595a58664 ("")). It should modify to apply to 4.11-rcX because of statx patches changes struct inode_operations->getattr() interface. For more information about this problem, please refer to the patchat the end of this mail. --- Subject: [PATCH] fix inconsistent devie between /proc/pid/maps and stat /proc/pid/maps returns each device as follows. === dev = inode->i_sb->s_dev; === However, stat(2) returns it as follows. === stat->dev = BTRFS_I(inode)->root->anon_dev; === It results in device mismatch and this inconsistency breaks users programs like lsof(8) as follows. Without this patch: === $ mount | grep /home/vagrant/mnt /dev/vda5 on /home/vagrant/mnt type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/) $ /home/vagrant/mnt/lsof | grep /home/vagrant/mnt lsof 19292 root txt REG 0,44 163136 257 /home/sat/mnt/lsof lsof 19292 root mem REG 0,42 257 /home/sat/mnt/lsof (path dev=0,44) lsof 19294 root txt REG 0,44 163136 257 /home/sat/mnt/lsof lsof 19294 root mem REG 0,42 257 /home/sat/mnt/lsof (path dev=0,44) === With this patch: === $ mount | grep /home/vagrant/mnt /dev/vda5 on /home/vagrant/mnt type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/) $ /home/vagrant/mnt/lsof | grep /home/vagrant/mnt lsof 752 root txt REG 0,44 163224 263 /home/vagrant/mnt/lsof lsof 754 root txt REG 0,44 163224 263 /home/vagrant/mnt/lsof === Signed-off-by: Satoru Takeuchi CC: Signed-off-by: Mark Fasheh --- fs/btrfs/super.c | 1 + fs/proc/generic.c | 13 +++++++++++++ fs/proc/internal.h | 1 + fs/proc/nommu.c | 2 +- fs/proc/task_mmu.c | 2 +- fs/proc/task_nommu.c | 2 +- include/linux/fs.h | 1 + 7 files changed, 19 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index da687dc79cce..2ccfdb107ba0 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1133,6 +1133,7 @@ static int btrfs_fill_super(struct super_block *sb, #endif sb->s_flags |= MS_I_VERSION; sb->s_iflags |= SB_I_CGROUPWB; + sb->s_iflags |= SB_I_LOGICALVOL; err = open_ctree(sb, fs_devices, (char *)data); if (err) { btrfs_err(fs_info, "open_ctree failed"); diff --git a/fs/proc/generic.c b/fs/proc/generic.c index f6a01f09f79d..d38cd77b297c 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -646,3 +646,16 @@ void *PDE_DATA(const struct inode *inode) return __PDE_DATA(inode); } EXPORT_SYMBOL(PDE_DATA); + +dev_t proc_get_map_dev(struct dentry *dentry) +{ + struct inode *inode = dentry->d_inode; + struct kstat kstat; + + if (inode->i_sb->s_iflags & SB_I_LOGICALVOL && + inode->i_op->getattr && + inode->i_op->getattr(NULL, dentry, &kstat) == 0) + return kstat.dev; + + return inode->i_sb->s_dev; +} diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 2de5194ba378..bf0f11fc209b 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -190,6 +190,7 @@ static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde) return pde; } extern void pde_put(struct proc_dir_entry *); +dev_t proc_get_map_dev(struct dentry *); static inline bool is_empty_pde(const struct proc_dir_entry *pde) { diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c index 75634379f82e..e9c29864a50e 100644 --- a/fs/proc/nommu.c +++ b/fs/proc/nommu.c @@ -46,7 +46,7 @@ static int nommu_region_show(struct seq_file *m, struct vm_region *region) if (file) { struct inode *inode = file_inode(region->vm_file); - dev = inode->i_sb->s_dev; + dev = proc_get_map_dev(vma->vm_file->f_path.dentry); ino = inode->i_ino; } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 8f96a49178d0..3529adf0faa8 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -292,7 +292,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid) if (file) { struct inode *inode = file_inode(vma->vm_file); - dev = inode->i_sb->s_dev; + dev = proc_get_map_dev(vma->vm_file->f_path.dentry); ino = inode->i_ino; pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; } diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 37175621e890..4f17f91225c3 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -156,7 +156,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma, if (file) { struct inode *inode = file_inode(vma->vm_file); - dev = inode->i_sb->s_dev; + dev = proc_get_map_dev(vma->vm_file->f_path.dentry); ino = inode->i_ino; pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 2ba074328894..eec62a5bfd4d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1262,6 +1262,7 @@ struct mm_struct; #define SB_I_CGROUPWB 0x00000001 /* cgroup-aware writeback enabled */ #define SB_I_NOEXEC 0x00000002 /* Ignore executables on this fs */ #define SB_I_NODEV 0x00000004 /* Ignore devices on this fs */ +#define SB_I_LOGICALVOL 0x00000008 /* this fs supports logical volumes */ /* sb->s_iflags to limit user namespace mounts */ #define SB_I_USERNS_VISIBLE 0x00000010 /* fstype already mounted */ -- 2.11.0