* mm: shmem_zero_setup skip security check and lockdep conflict with XFS @ 2015-06-14 16:48 Hugh Dickins 2015-06-15 6:09 ` Daniel Wagner 2015-07-08 13:13 ` Stephen Smalley 0 siblings, 2 replies; 14+ messages in thread From: Hugh Dickins @ 2015-06-14 16:48 UTC (permalink / raw) To: Linus Torvalds Cc: Prarit Bhargava, Morten Stevens, Daniel Wagner, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, linux-kernel It appears that, at some point last year, XFS made directory handling changes which bring it into lockdep conflict with shmem_zero_setup(): it is surprising that mmap() can clone an inode while holding mmap_sem, but that has been so for many years. Since those few lockdep traces that I've seen all implicated selinux, I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which v3.13's commit c7277090927a ("security: shmem: implement kernel private shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers which cloned inode in mmap(), but if so, I cannot locate them now. Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> Reported-by: Daniel Wagner <wagi@monom.org> Reported-by: Morten Stevens <mstevens@fedoraproject.org> Signed-off-by: Hugh Dickins <hughd@google.com> --- mm/shmem.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- 4.1-rc7/mm/shmem.c 2015-04-26 19:16:31.352191298 -0700 +++ linux/mm/shmem.c 2015-06-14 09:26:49.461120166 -0700 @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru struct file *file; loff_t size = vma->vm_end - vma->vm_start; - file = shmem_file_setup("dev/zero", size, vma->vm_flags); + /* + * Cloning a new file under mmap_sem leads to a lock ordering conflict + * between XFS directory reading and selinux: since this file is only + * accessible to the user through its mapping, use S_PRIVATE flag to + * bypass file security, in the same way as shmem_kernel_file_setup(). + */ + file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE); if (IS_ERR(file)) return PTR_ERR(file); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-06-14 16:48 mm: shmem_zero_setup skip security check and lockdep conflict with XFS Hugh Dickins @ 2015-06-15 6:09 ` Daniel Wagner 2015-06-16 20:27 ` Hugh Dickins 2015-06-17 11:45 ` Morten Stevens 2015-07-08 13:13 ` Stephen Smalley 1 sibling, 2 replies; 14+ messages in thread From: Daniel Wagner @ 2015-06-15 6:09 UTC (permalink / raw) To: Hugh Dickins, Linus Torvalds Cc: Prarit Bhargava, Morten Stevens, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, linux-kernel On 06/14/2015 06:48 PM, Hugh Dickins wrote: > It appears that, at some point last year, XFS made directory handling > changes which bring it into lockdep conflict with shmem_zero_setup(): > it is surprising that mmap() can clone an inode while holding mmap_sem, > but that has been so for many years. > > Since those few lockdep traces that I've seen all implicated selinux, > I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which > v3.13's commit c7277090927a ("security: shmem: implement kernel private > shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: > the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. > > This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero > (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers > which cloned inode in mmap(), but if so, I cannot locate them now. > > Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> > Reported-by: Daniel Wagner <wagi@monom.org> Reported-and-tested-by: Daniel Wagner <wagi@monom.org> Sorry for the long delay. It took me a while to figure out my original setup. I could verify that this patch made the lockdep message go away on 4.0-rc6 and also on 4.1-rc8. For the record: SELinux needs to be enabled triggering it. cheers, daniel ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-06-15 6:09 ` Daniel Wagner @ 2015-06-16 20:27 ` Hugh Dickins 2015-06-17 11:45 ` Morten Stevens 1 sibling, 0 replies; 14+ messages in thread From: Hugh Dickins @ 2015-06-16 20:27 UTC (permalink / raw) To: Daniel Wagner Cc: Linus Torvalds, Prarit Bhargava, Morten Stevens, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, linux-kernel On Mon, 15 Jun 2015, Daniel Wagner wrote: > On 06/14/2015 06:48 PM, Hugh Dickins wrote: > > It appears that, at some point last year, XFS made directory handling > > changes which bring it into lockdep conflict with shmem_zero_setup(): > > it is surprising that mmap() can clone an inode while holding mmap_sem, > > but that has been so for many years. > > > > Since those few lockdep traces that I've seen all implicated selinux, > > I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which > > v3.13's commit c7277090927a ("security: shmem: implement kernel private > > shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: > > the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. > > > > This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero > > (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers > > which cloned inode in mmap(), but if so, I cannot locate them now. > > > > Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> > > Reported-by: Daniel Wagner <wagi@monom.org> > > Reported-and-tested-by: Daniel Wagner <wagi@monom.org> Great, thank you Daniel: we look more convincing now :) > > Sorry for the long delay. It took me a while to figure out my original > setup. I could verify that this patch made the lockdep message go away > on 4.0-rc6 and also on 4.1-rc8. Thank you for taking the trouble. > > For the record: SELinux needs to be enabled triggering it. Right, selinux was in all the stacktraces we saw, and I was banking on that security "recursion" being what actually upset lockdep; but couldn't be sure until you tried it out. We didn't make -rc8, and I won't be at all surprised if Linus feels that a year(?)-old lockdep warning is not worth disturbing v4.1 final for, but it should get into v4.2 (thank you, Andrew). Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-06-15 6:09 ` Daniel Wagner 2015-06-16 20:27 ` Hugh Dickins @ 2015-06-17 11:45 ` Morten Stevens 2015-06-18 0:22 ` Hugh Dickins 2015-07-22 12:46 ` Morten Stevens 1 sibling, 2 replies; 14+ messages in thread From: Morten Stevens @ 2015-06-17 11:45 UTC (permalink / raw) To: Daniel Wagner Cc: Hugh Dickins, Linus Torvalds, Prarit Bhargava, Morten Stevens, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, linux-kernel 2015-06-15 8:09 GMT+02:00 Daniel Wagner <wagi@monom.org>: > On 06/14/2015 06:48 PM, Hugh Dickins wrote: >> It appears that, at some point last year, XFS made directory handling >> changes which bring it into lockdep conflict with shmem_zero_setup(): >> it is surprising that mmap() can clone an inode while holding mmap_sem, >> but that has been so for many years. >> >> Since those few lockdep traces that I've seen all implicated selinux, >> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >> v3.13's commit c7277090927a ("security: shmem: implement kernel private >> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >> >> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >> which cloned inode in mmap(), but if so, I cannot locate them now. >> >> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> >> Reported-by: Daniel Wagner <wagi@monom.org> > > Reported-and-tested-by: Daniel Wagner <wagi@monom.org> > > Sorry for the long delay. It took me a while to figure out my original > setup. I could verify that this patch made the lockdep message go away > on 4.0-rc6 and also on 4.1-rc8. Yes, it's also fixed for me after applying this patch to 4.1-rc8. Best regards, Morten ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-06-17 11:45 ` Morten Stevens @ 2015-06-18 0:22 ` Hugh Dickins 2015-07-22 12:46 ` Morten Stevens 1 sibling, 0 replies; 14+ messages in thread From: Hugh Dickins @ 2015-06-18 0:22 UTC (permalink / raw) To: Morten Stevens Cc: Daniel Wagner, Linus Torvalds, Prarit Bhargava, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, linux-kernel Wed, Jun 17, 2015 at 12:45 PM, Morten Stevens <mstevens@fedoraproject.org> wrote: > 2015-06-15 8:09 GMT+02:00 Daniel Wagner <wagi@monom.org>: >> On 06/14/2015 06:48 PM, Hugh Dickins wrote: >>> It appears that, at some point last year, XFS made directory handling >>> changes which bring it into lockdep conflict with shmem_zero_setup(): >>> it is surprising that mmap() can clone an inode while holding mmap_sem, >>> but that has been so for many years. >>> >>> Since those few lockdep traces that I've seen all implicated selinux, >>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >>> v3.13's commit c7277090927a ("security: shmem: implement kernel private >>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >>> >>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >>> which cloned inode in mmap(), but if so, I cannot locate them now. >>> >>> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> >>> Reported-by: Daniel Wagner <wagi@monom.org> >> >> Reported-and-tested-by: Daniel Wagner <wagi@monom.org> >> >> Sorry for the long delay. It took me a while to figure out my original >> setup. I could verify that this patch made the lockdep message go away >> on 4.0-rc6 and also on 4.1-rc8. > > Yes, it's also fixed for me after applying this patch to 4.1-rc8. Thank you - Hugh > > Best regards, > > Morten ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-06-17 11:45 ` Morten Stevens 2015-06-18 0:22 ` Hugh Dickins @ 2015-07-22 12:46 ` Morten Stevens 2015-07-22 21:07 ` Stephen Smalley 1 sibling, 1 reply; 14+ messages in thread From: Morten Stevens @ 2015-07-22 12:46 UTC (permalink / raw) To: Morten Stevens, Stephen Smalley Cc: Daniel Wagner, Hugh Dickins, Linus Torvalds, Prarit Bhargava, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, Linux Kernel 2015-06-17 13:45 GMT+02:00 Morten Stevens <mstevens@fedoraproject.org>: > 2015-06-15 8:09 GMT+02:00 Daniel Wagner <wagi@monom.org>: >> On 06/14/2015 06:48 PM, Hugh Dickins wrote: >>> It appears that, at some point last year, XFS made directory handling >>> changes which bring it into lockdep conflict with shmem_zero_setup(): >>> it is surprising that mmap() can clone an inode while holding mmap_sem, >>> but that has been so for many years. >>> >>> Since those few lockdep traces that I've seen all implicated selinux, >>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >>> v3.13's commit c7277090927a ("security: shmem: implement kernel private >>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >>> >>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >>> which cloned inode in mmap(), but if so, I cannot locate them now. >>> >>> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> >>> Reported-by: Daniel Wagner <wagi@monom.org> >> >> Reported-and-tested-by: Daniel Wagner <wagi@monom.org> >> >> Sorry for the long delay. It took me a while to figure out my original >> setup. I could verify that this patch made the lockdep message go away >> on 4.0-rc6 and also on 4.1-rc8. > > Yes, it's also fixed for me after applying this patch to 4.1-rc8. Here is another deadlock with the latest 4.2.0-rc3: Jul 22 14:36:40 fc23 kernel: ====================================================== Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking dependency detected ] Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W Jul 22 14:36:40 fc23 kernel: ------------------------------------------------------- Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock: Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+++++.}, at: [<ffffffff81385354>] shm_close+0x34/0x130 Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock: Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++++++}, at: [<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180 Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock. Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in reverse order) is: Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++++++}: Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [<ffffffff81217baa>] __might_fault+0x7a/0xa0 Jul 22 14:36:40 fc23 kernel: [<ffffffff81284a1e>] filldir+0x9e/0x130 Jul 22 14:36:40 fc23 kernel: [<ffffffffa019bb08>] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] Jul 22 14:36:40 fc23 kernel: [<ffffffffa019c5b4>] xfs_readdir+0x1b4/0x330 [xfs] Jul 22 14:36:40 fc23 kernel: [<ffffffffa019f38b>] xfs_file_readdir+0x2b/0x30 [xfs] Jul 22 14:36:40 fc23 kernel: [<ffffffff812847e7>] iterate_dir+0x97/0x130 Jul 22 14:36:40 fc23 kernel: [<ffffffff81284d21>] SyS_getdents+0x91/0x120 Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] entry_SYSCALL_64_fastpath+0x12/0x76 Jul 22 14:36:40 fc23 kernel: #012-> #2 (&xfs_dir_ilock_class){++++.+}: Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [<ffffffff81101e97>] down_read_nested+0x57/0xa0 Jul 22 14:36:40 fc23 kernel: [<ffffffffa01b0e57>] xfs_ilock+0x167/0x350 [xfs] Jul 22 14:36:40 fc23 kernel: [<ffffffffa01b10b8>] xfs_ilock_attr_map_shared+0x38/0x50 [xfs] Jul 22 14:36:40 fc23 kernel: [<ffffffffa014799d>] xfs_attr_get+0xbd/0x190 [xfs] Jul 22 14:36:40 fc23 kernel: [<ffffffffa01c17ad>] xfs_xattr_get+0x3d/0x70 [xfs] Jul 22 14:36:40 fc23 kernel: [<ffffffff8129962f>] generic_getxattr+0x4f/0x70 Jul 22 14:36:40 fc23 kernel: [<ffffffff8139ba52>] inode_doinit_with_dentry+0x162/0x670 Jul 22 14:36:40 fc23 kernel: [<ffffffff8139cf69>] sb_finish_set_opts+0xd9/0x230 Jul 22 14:36:40 fc23 kernel: [<ffffffff8139d66c>] selinux_set_mnt_opts+0x35c/0x660 Jul 22 14:36:40 fc23 kernel: [<ffffffff8139ff97>] superblock_doinit+0x77/0xf0 Jul 22 14:36:40 fc23 kernel: [<ffffffff813a0020>] delayed_superblock_init+0x10/0x20 Jul 22 14:36:40 fc23 kernel: [<ffffffff81272d23>] iterate_supers+0xb3/0x110 Jul 22 14:36:40 fc23 kernel: [<ffffffff813a4e5f>] selinux_complete_init+0x2f/0x40 Jul 22 14:36:40 fc23 kernel: [<ffffffff813b47a3>] security_load_policy+0x103/0x600 Jul 22 14:36:40 fc23 kernel: [<ffffffff813a6901>] sel_write_load+0xc1/0x750 Jul 22 14:36:40 fc23 kernel: [<ffffffff8126e817>] __vfs_write+0x37/0x100 Jul 22 14:36:40 fc23 kernel: [<ffffffff8126f229>] vfs_write+0xa9/0x1a0 Jul 22 14:36:40 fc23 kernel: [<ffffffff8126ff48>] SyS_write+0x58/0xd0 Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] entry_SYSCALL_64_fastpath+0x12/0x76 Jul 22 14:36:40 fc23 kernel: #012-> #1 (&isec->lock){+.+.+.}: Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [<ffffffff8186de8f>] mutex_lock_nested+0x7f/0x3e0 Jul 22 14:36:40 fc23 kernel: [<ffffffff8139b9a9>] inode_doinit_with_dentry+0xb9/0x670 Jul 22 14:36:40 fc23 kernel: [<ffffffff8139bf7c>] selinux_d_instantiate+0x1c/0x20 Jul 22 14:36:40 fc23 kernel: [<ffffffff813955f6>] security_d_instantiate+0x36/0x60 Jul 22 14:36:40 fc23 kernel: [<ffffffff81287c34>] d_instantiate+0x54/0x70 Jul 22 14:36:40 fc23 kernel: [<ffffffff8120111c>] __shmem_file_setup+0xdc/0x240 Jul 22 14:36:40 fc23 kernel: [<ffffffff81201290>] shmem_file_setup+0x10/0x20 Jul 22 14:36:40 fc23 kernel: [<ffffffff813856e0>] newseg+0x290/0x3a0 Jul 22 14:36:40 fc23 kernel: [<ffffffff8137e278>] ipcget+0x208/0x2d0 Jul 22 14:36:40 fc23 kernel: [<ffffffff81386074>] SyS_shmget+0x54/0x70 Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] entry_SYSCALL_64_fastpath+0x12/0x76 Jul 22 14:36:40 fc23 kernel: #012-> #0 (&ids->rwsem){+++++.}: Jul 22 14:36:40 fc23 kernel: [<ffffffff81108df8>] __lock_acquire+0x1a78/0x1d00 Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [<ffffffff8186efba>] down_write+0x5a/0xc0 Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] shm_close+0x34/0x130 Jul 22 14:36:40 fc23 kernel: [<ffffffff812203a5>] remove_vma+0x45/0x80 Jul 22 14:36:40 fc23 kernel: [<ffffffff81222a30>] do_munmap+0x2b0/0x460 Jul 22 14:36:40 fc23 kernel: [<ffffffff81386c25>] SyS_shmdt+0xb5/0x180 Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] entry_SYSCALL_64_fastpath+0x12/0x76 Jul 22 14:36:40 fc23 kernel: #012other info that might help us debug this: Jul 22 14:36:40 fc23 kernel: Chain exists of:#012 &ids->rwsem --> &xfs_dir_ilock_class --> &mm->mmap_sem Jul 22 14:36:40 fc23 kernel: Possible unsafe locking scenario: Jul 22 14:36:40 fc23 kernel: CPU0 CPU1 Jul 22 14:36:40 fc23 kernel: ---- ---- Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem); Jul 22 14:36:40 fc23 kernel: lock(&xfs_dir_ilock_class); Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem); Jul 22 14:36:40 fc23 kernel: lock(&ids->rwsem); Jul 22 14:36:40 fc23 kernel: #012 *** DEADLOCK *** Jul 22 14:36:40 fc23 kernel: 1 lock held by httpd/1597: Jul 22 14:36:40 fc23 kernel: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180 Jul 22 14:36:40 fc23 kernel: #012stack backtrace: Jul 22 14:36:40 fc23 kernel: CPU: 7 PID: 1597 Comm: httpd Tainted: G W 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Jul 22 14:36:40 fc23 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 Jul 22 14:36:40 fc23 kernel: 0000000000000000 000000006cb6fe9d ffff88019ff07c58 ffffffff81868175 Jul 22 14:36:40 fc23 kernel: 0000000000000000 ffffffff82aea390 ffff88019ff07ca8 ffffffff81105903 Jul 22 14:36:40 fc23 kernel: ffff88019ff07c78 ffff88019ff07d08 0000000000000001 ffff8800b75108f0 Jul 22 14:36:40 fc23 kernel: Call Trace: Jul 22 14:36:40 fc23 kernel: [<ffffffff81868175>] dump_stack+0x4c/0x65 Jul 22 14:36:40 fc23 kernel: [<ffffffff81105903>] print_circular_bug+0x1e3/0x250 Jul 22 14:36:40 fc23 kernel: [<ffffffff81108df8>] __lock_acquire+0x1a78/0x1d00 Jul 22 14:36:40 fc23 kernel: [<ffffffff81220c33>] ? unlink_file_vma+0x33/0x60 Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] ? shm_close+0x34/0x130 Jul 22 14:36:40 fc23 kernel: [<ffffffff8186efba>] down_write+0x5a/0xc0 Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] ? shm_close+0x34/0x130 Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] shm_close+0x34/0x130 Jul 22 14:36:40 fc23 kernel: [<ffffffff812203a5>] remove_vma+0x45/0x80 Jul 22 14:36:40 fc23 kernel: [<ffffffff81222a30>] do_munmap+0x2b0/0x460 Jul 22 14:36:40 fc23 kernel: [<ffffffff81386bbb>] ? SyS_shmdt+0x4b/0x180 Jul 22 14:36:40 fc23 kernel: [<ffffffff81386c25>] SyS_shmdt+0xb5/0x180 Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] entry_SYSCALL_64_fastpath+0x12/0x76 Best regards, Morten ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-07-22 12:46 ` Morten Stevens @ 2015-07-22 21:07 ` Stephen Smalley 0 siblings, 0 replies; 14+ messages in thread From: Stephen Smalley @ 2015-07-22 21:07 UTC (permalink / raw) To: Morten Stevens Cc: Daniel Wagner, Hugh Dickins, Linus Torvalds, Prarit Bhargava, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, Linux Kernel On 07/22/2015 08:46 AM, Morten Stevens wrote: > 2015-06-17 13:45 GMT+02:00 Morten Stevens <mstevens@fedoraproject.org>: >> 2015-06-15 8:09 GMT+02:00 Daniel Wagner <wagi@monom.org>: >>> On 06/14/2015 06:48 PM, Hugh Dickins wrote: >>>> It appears that, at some point last year, XFS made directory handling >>>> changes which bring it into lockdep conflict with shmem_zero_setup(): >>>> it is surprising that mmap() can clone an inode while holding mmap_sem, >>>> but that has been so for many years. >>>> >>>> Since those few lockdep traces that I've seen all implicated selinux, >>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private >>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >>>> >>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >>>> which cloned inode in mmap(), but if so, I cannot locate them now. >>>> >>>> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> >>>> Reported-by: Daniel Wagner <wagi@monom.org> >>> >>> Reported-and-tested-by: Daniel Wagner <wagi@monom.org> >>> >>> Sorry for the long delay. It took me a while to figure out my original >>> setup. I could verify that this patch made the lockdep message go away >>> on 4.0-rc6 and also on 4.1-rc8. >> >> Yes, it's also fixed for me after applying this patch to 4.1-rc8. > > Here is another deadlock with the latest 4.2.0-rc3: > > Jul 22 14:36:40 fc23 kernel: > ====================================================== > Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking > dependency detected ] > Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 > Tainted: G W > Jul 22 14:36:40 fc23 kernel: > ------------------------------------------------------- > Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock: > Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+++++.}, at: > [<ffffffff81385354>] shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock: > Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++++++}, at: > [<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180 > Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock. > Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in > reverse order) is: > Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++++++}: > Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81217baa>] __might_fault+0x7a/0xa0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81284a1e>] filldir+0x9e/0x130 > Jul 22 14:36:40 fc23 kernel: [<ffffffffa019bb08>] > xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] > Jul 22 14:36:40 fc23 kernel: [<ffffffffa019c5b4>] > xfs_readdir+0x1b4/0x330 [xfs] > Jul 22 14:36:40 fc23 kernel: [<ffffffffa019f38b>] > xfs_file_readdir+0x2b/0x30 [xfs] > Jul 22 14:36:40 fc23 kernel: [<ffffffff812847e7>] iterate_dir+0x97/0x130 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81284d21>] SyS_getdents+0x91/0x120 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012-> #2 (&xfs_dir_ilock_class){++++.+}: > Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81101e97>] > down_read_nested+0x57/0xa0 > Jul 22 14:36:40 fc23 kernel: [<ffffffffa01b0e57>] > xfs_ilock+0x167/0x350 [xfs] > Jul 22 14:36:40 fc23 kernel: [<ffffffffa01b10b8>] > xfs_ilock_attr_map_shared+0x38/0x50 [xfs] > Jul 22 14:36:40 fc23 kernel: [<ffffffffa014799d>] > xfs_attr_get+0xbd/0x190 [xfs] > Jul 22 14:36:40 fc23 kernel: [<ffffffffa01c17ad>] > xfs_xattr_get+0x3d/0x70 [xfs] > Jul 22 14:36:40 fc23 kernel: [<ffffffff8129962f>] > generic_getxattr+0x4f/0x70 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8139ba52>] > inode_doinit_with_dentry+0x162/0x670 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8139cf69>] > sb_finish_set_opts+0xd9/0x230 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8139d66c>] > selinux_set_mnt_opts+0x35c/0x660 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8139ff97>] > superblock_doinit+0x77/0xf0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff813a0020>] > delayed_superblock_init+0x10/0x20 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81272d23>] > iterate_supers+0xb3/0x110 > Jul 22 14:36:40 fc23 kernel: [<ffffffff813a4e5f>] > selinux_complete_init+0x2f/0x40 > Jul 22 14:36:40 fc23 kernel: [<ffffffff813b47a3>] > security_load_policy+0x103/0x600 > Jul 22 14:36:40 fc23 kernel: [<ffffffff813a6901>] > sel_write_load+0xc1/0x750 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8126e817>] __vfs_write+0x37/0x100 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8126f229>] vfs_write+0xa9/0x1a0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8126ff48>] SyS_write+0x58/0xd0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012-> #1 (&isec->lock){+.+.+.}: > Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8186de8f>] > mutex_lock_nested+0x7f/0x3e0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8139b9a9>] > inode_doinit_with_dentry+0xb9/0x670 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8139bf7c>] > selinux_d_instantiate+0x1c/0x20 > Jul 22 14:36:40 fc23 kernel: [<ffffffff813955f6>] > security_d_instantiate+0x36/0x60 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81287c34>] d_instantiate+0x54/0x70 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8120111c>] > __shmem_file_setup+0xdc/0x240 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81201290>] > shmem_file_setup+0x10/0x20 > Jul 22 14:36:40 fc23 kernel: [<ffffffff813856e0>] newseg+0x290/0x3a0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8137e278>] ipcget+0x208/0x2d0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81386074>] SyS_shmget+0x54/0x70 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012-> #0 (&ids->rwsem){+++++.}: > Jul 22 14:36:40 fc23 kernel: [<ffffffff81108df8>] > __lock_acquire+0x1a78/0x1d00 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8186efba>] down_write+0x5a/0xc0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [<ffffffff812203a5>] remove_vma+0x45/0x80 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81222a30>] do_munmap+0x2b0/0x460 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81386c25>] SyS_shmdt+0xb5/0x180 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] > entry_SYSCALL_64_fastpath+0x12/0x76 > Jul 22 14:36:40 fc23 kernel: #012other info that might help us debug this: > Jul 22 14:36:40 fc23 kernel: Chain exists of:#012 &ids->rwsem --> > &xfs_dir_ilock_class --> &mm->mmap_sem > Jul 22 14:36:40 fc23 kernel: Possible unsafe locking scenario: > Jul 22 14:36:40 fc23 kernel: CPU0 CPU1 > Jul 22 14:36:40 fc23 kernel: ---- ---- > Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem); > Jul 22 14:36:40 fc23 kernel: > lock(&xfs_dir_ilock_class); > Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem); > Jul 22 14:36:40 fc23 kernel: lock(&ids->rwsem); > Jul 22 14:36:40 fc23 kernel: #012 *** DEADLOCK *** > Jul 22 14:36:40 fc23 kernel: 1 lock held by httpd/1597: > Jul 22 14:36:40 fc23 kernel: #0: (&mm->mmap_sem){++++++}, at: > [<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180 > Jul 22 14:36:40 fc23 kernel: #012stack backtrace: > Jul 22 14:36:40 fc23 kernel: CPU: 7 PID: 1597 Comm: httpd Tainted: G > W 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 > Jul 22 14:36:40 fc23 kernel: Hardware name: VMware, Inc. VMware > Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 > 05/20/2014 > Jul 22 14:36:40 fc23 kernel: 0000000000000000 000000006cb6fe9d > ffff88019ff07c58 ffffffff81868175 > Jul 22 14:36:40 fc23 kernel: 0000000000000000 ffffffff82aea390 > ffff88019ff07ca8 ffffffff81105903 > Jul 22 14:36:40 fc23 kernel: ffff88019ff07c78 ffff88019ff07d08 > 0000000000000001 ffff8800b75108f0 > Jul 22 14:36:40 fc23 kernel: Call Trace: > Jul 22 14:36:40 fc23 kernel: [<ffffffff81868175>] dump_stack+0x4c/0x65 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81105903>] print_circular_bug+0x1e3/0x250 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81108df8>] __lock_acquire+0x1a78/0x1d00 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81220c33>] ? unlink_file_vma+0x33/0x60 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] ? shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [<ffffffff8186efba>] down_write+0x5a/0xc0 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] ? shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] shm_close+0x34/0x130 > Jul 22 14:36:40 fc23 kernel: [<ffffffff812203a5>] remove_vma+0x45/0x80 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81222a30>] do_munmap+0x2b0/0x460 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81386bbb>] ? SyS_shmdt+0x4b/0x180 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81386c25>] SyS_shmdt+0xb5/0x180 > Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>] > entry_SYSCALL_64_fastpath+0x12/0x76 > > Best regards, > > Morten I would think that we could flip shm over to using shmem_kernel_file_setup(), but you might still encounter the same warning on a hugetlb segment, unless we also start marking those as private. I am a little concerned that we are playing whack-a-mole. What's the original change that has caused these deadlock scenarios to arise now (I see the original patch descriptions mentions XFS directory handling changes, but not clear on the details, nor on how this is relevant to shmem inodes), and are these real deadlock scenarios or merely false positives that you are trying to eliminate? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-06-14 16:48 mm: shmem_zero_setup skip security check and lockdep conflict with XFS Hugh Dickins 2015-06-15 6:09 ` Daniel Wagner @ 2015-07-08 13:13 ` Stephen Smalley 2015-07-08 16:37 ` Stephen Smalley 1 sibling, 1 reply; 14+ messages in thread From: Stephen Smalley @ 2015-07-08 13:13 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Prarit Bhargava, Morten Stevens, Daniel Wagner, Dave Chinner, Eric Paris, Eric Sandeen, Andrew Morton, linux-mm, Linux Kernel, Stephen Smalley, Paul Moore, selinux On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@google.com> wrote: > It appears that, at some point last year, XFS made directory handling > changes which bring it into lockdep conflict with shmem_zero_setup(): > it is surprising that mmap() can clone an inode while holding mmap_sem, > but that has been so for many years. > > Since those few lockdep traces that I've seen all implicated selinux, > I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which > v3.13's commit c7277090927a ("security: shmem: implement kernel private > shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: > the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. > > This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero > (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers > which cloned inode in mmap(), but if so, I cannot locate them now. This causes a regression for SELinux (please, in the future, cc selinux list and Paul Moore on SELinux-related changes). In particular, this change disables SELinux checking of mprotect PROT_EXEC on shared anonymous mappings, so we lose the ability to control executable mappings. That said, we are only getting that check today as a side effect of our file execute check on the tmpfs inode, whereas it would be better (and more consistent with the mmap-time checks) to apply an execmem check in that case, in which case we wouldn't care about the inode-based check. However, I am unclear on how to correctly detect that situation from selinux_file_mprotect() -> file_map_prot_check(), because we do have a non-NULL vma->vm_file so we treat it as a file execute check. In contrast, if directly creating an anonymous shared mapping with PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with a NULL file and therefore we end up applying an execmem check. > > Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> > Reported-by: Daniel Wagner <wagi@monom.org> > Reported-by: Morten Stevens <mstevens@fedoraproject.org> > Signed-off-by: Hugh Dickins <hughd@google.com> > --- > > mm/shmem.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > --- 4.1-rc7/mm/shmem.c 2015-04-26 19:16:31.352191298 -0700 > +++ linux/mm/shmem.c 2015-06-14 09:26:49.461120166 -0700 > @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru > struct file *file; > loff_t size = vma->vm_end - vma->vm_start; > > - file = shmem_file_setup("dev/zero", size, vma->vm_flags); > + /* > + * Cloning a new file under mmap_sem leads to a lock ordering conflict > + * between XFS directory reading and selinux: since this file is only > + * accessible to the user through its mapping, use S_PRIVATE flag to > + * bypass file security, in the same way as shmem_kernel_file_setup(). > + */ > + file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE); > if (IS_ERR(file)) > return PTR_ERR(file); > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-07-08 13:13 ` Stephen Smalley @ 2015-07-08 16:37 ` Stephen Smalley 2015-07-08 20:37 ` Morten Stevens 2015-07-09 8:23 ` Hugh Dickins 0 siblings, 2 replies; 14+ messages in thread From: Stephen Smalley @ 2015-07-08 16:37 UTC (permalink / raw) To: Stephen Smalley, Hugh Dickins Cc: Prarit Bhargava, Morten Stevens, Eric Sandeen, Dave Chinner, Daniel Wagner, Linux Kernel, Eric Paris, linux-mm, selinux, Andrew Morton, Linus Torvalds On 07/08/2015 09:13 AM, Stephen Smalley wrote: > On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@google.com> wrote: >> It appears that, at some point last year, XFS made directory handling >> changes which bring it into lockdep conflict with shmem_zero_setup(): >> it is surprising that mmap() can clone an inode while holding mmap_sem, >> but that has been so for many years. >> >> Since those few lockdep traces that I've seen all implicated selinux, >> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >> v3.13's commit c7277090927a ("security: shmem: implement kernel private >> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >> >> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >> which cloned inode in mmap(), but if so, I cannot locate them now. > > This causes a regression for SELinux (please, in the future, cc > selinux list and Paul Moore on SELinux-related changes). In > particular, this change disables SELinux checking of mprotect > PROT_EXEC on shared anonymous mappings, so we lose the ability to > control executable mappings. That said, we are only getting that > check today as a side effect of our file execute check on the tmpfs > inode, whereas it would be better (and more consistent with the > mmap-time checks) to apply an execmem check in that case, in which > case we wouldn't care about the inode-based check. However, I am > unclear on how to correctly detect that situation from > selinux_file_mprotect() -> file_map_prot_check(), because we do have a > non-NULL vma->vm_file so we treat it as a file execute check. In > contrast, if directly creating an anonymous shared mapping with > PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with > a NULL file and therefore we end up applying an execmem check. Also, can you provide the lockdep traces that motivated this change? > >> >> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> >> Reported-by: Daniel Wagner <wagi@monom.org> >> Reported-by: Morten Stevens <mstevens@fedoraproject.org> >> Signed-off-by: Hugh Dickins <hughd@google.com> >> --- >> >> mm/shmem.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> --- 4.1-rc7/mm/shmem.c 2015-04-26 19:16:31.352191298 -0700 >> +++ linux/mm/shmem.c 2015-06-14 09:26:49.461120166 -0700 >> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru >> struct file *file; >> loff_t size = vma->vm_end - vma->vm_start; >> >> - file = shmem_file_setup("dev/zero", size, vma->vm_flags); >> + /* >> + * Cloning a new file under mmap_sem leads to a lock ordering conflict >> + * between XFS directory reading and selinux: since this file is only >> + * accessible to the user through its mapping, use S_PRIVATE flag to >> + * bypass file security, in the same way as shmem_kernel_file_setup(). >> + */ >> + file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE); >> if (IS_ERR(file)) >> return PTR_ERR(file); >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ > _______________________________________________ > Selinux mailing list > Selinux@tycho.nsa.gov > To unsubscribe, send email to Selinux-leave@tycho.nsa.gov. > To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov. > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-07-08 16:37 ` Stephen Smalley @ 2015-07-08 20:37 ` Morten Stevens 2015-07-09 8:23 ` Hugh Dickins 1 sibling, 0 replies; 14+ messages in thread From: Morten Stevens @ 2015-07-08 20:37 UTC (permalink / raw) To: Stephen Smalley Cc: Stephen Smalley, Hugh Dickins, Prarit Bhargava, Morten Stevens, Eric Sandeen, Dave Chinner, Daniel Wagner, Linux Kernel, Eric Paris, linux-mm, selinux, Andrew Morton, Linus Torvalds 2015-07-08 18:37 GMT+02:00 Stephen Smalley <sds@tycho.nsa.gov>: > On 07/08/2015 09:13 AM, Stephen Smalley wrote: >> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@google.com> wrote: >>> It appears that, at some point last year, XFS made directory handling >>> changes which bring it into lockdep conflict with shmem_zero_setup(): >>> it is surprising that mmap() can clone an inode while holding mmap_sem, >>> but that has been so for many years. >>> >>> Since those few lockdep traces that I've seen all implicated selinux, >>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >>> v3.13's commit c7277090927a ("security: shmem: implement kernel private >>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >>> >>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >>> which cloned inode in mmap(), but if so, I cannot locate them now. >> >> This causes a regression for SELinux (please, in the future, cc >> selinux list and Paul Moore on SELinux-related changes). In >> particular, this change disables SELinux checking of mprotect >> PROT_EXEC on shared anonymous mappings, so we lose the ability to >> control executable mappings. That said, we are only getting that >> check today as a side effect of our file execute check on the tmpfs >> inode, whereas it would be better (and more consistent with the >> mmap-time checks) to apply an execmem check in that case, in which >> case we wouldn't care about the inode-based check. However, I am >> unclear on how to correctly detect that situation from >> selinux_file_mprotect() -> file_map_prot_check(), because we do have a >> non-NULL vma->vm_file so we treat it as a file execute check. In >> contrast, if directly creating an anonymous shared mapping with >> PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with >> a NULL file and therefore we end up applying an execmem check. > > Also, can you provide the lockdep traces that motivated this change? Yes, here is it: [ 28.177939] ====================================================== [ 28.177959] [ INFO: possible circular locking dependency detected ] [ 28.177980] 4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1 Tainted: G W [ 28.178002] ------------------------------------------------------- [ 28.178022] sshd/1764 is trying to acquire lock: [ 28.178037] (&isec->lock){+.+.+.}, at: [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0 [ 28.178078] but task is already holding lock: [ 28.178097] (&mm->mmap_sem){++++++}, at: [<ffffffff81216a0f>] vm_mmap_pgoff+0x8f/0xf0 [ 28.178131] which lock already depends on the new lock. [ 28.178157] the existing dependency chain (in reverse order) is: [ 28.178180] -> #2 (&mm->mmap_sem){++++++}: [ 28.178201] [<ffffffff81114017>] lock_acquire+0xc7/0x2a0 [ 28.178225] [<ffffffff8122853c>] might_fault+0x8c/0xb0 [ 28.178248] [<ffffffff8129af3a>] filldir+0x9a/0x130 [ 28.178269] [<ffffffffa019cfd6>] xfs_dir2_block_getdents.isra.12+0x1a6/0x1d0 [xfs] [ 28.178330] [<ffffffffa019dae4>] xfs_readdir+0x1c4/0x360 [xfs] [ 28.178368] [<ffffffffa01a0a5b>] xfs_file_readdir+0x2b/0x30 [xfs] [ 28.178404] [<ffffffff8129ad0a>] iterate_dir+0x9a/0x140 [ 28.178425] [<ffffffff8129b241>] SyS_getdents+0x91/0x120 [ 28.178447] [<ffffffff818a016e>] system_call_fastpath+0x12/0x76 [ 28.178471] -> #1 (&xfs_dir_ilock_class){++++.+}: [ 28.178494] [<ffffffff81114017>] lock_acquire+0xc7/0x2a0 [ 28.178515] [<ffffffff8110bee7>] down_read_nested+0x57/0xa0 [ 28.178538] [<ffffffffa01b2ed1>] xfs_ilock+0x171/0x390 [xfs] [ 28.178579] [<ffffffffa01b3168>] xfs_ilock_attr_map_shared+0x38/0x50 [xfs] [ 28.178618] [<ffffffffa0145d8d>] xfs_attr_get+0xbd/0x1b0 [xfs] [ 28.178651] [<ffffffffa01c44ad>] xfs_xattr_get+0x3d/0x80 [xfs] [ 28.178688] [<ffffffff812b022f>] generic_getxattr+0x4f/0x70 [ 28.178711] [<ffffffff813b5372>] inode_doinit_with_dentry+0x172/0x6a0 [ 28.178737] [<ffffffff813b68db>] sb_finish_set_opts+0xdb/0x260 [ 28.178759] [<ffffffff813b6ff1>] selinux_set_mnt_opts+0x331/0x670 [ 28.178783] [<ffffffff813b9b47>] superblock_doinit+0x77/0xf0 [ 28.178804] [<ffffffff813b9bd0>] delayed_superblock_init+0x10/0x20 [ 28.178849] [<ffffffff8128691a>] iterate_supers+0xba/0x120 [ 28.178872] [<ffffffff813bef23>] selinux_complete_init+0x33/0x40 [ 28.178897] [<ffffffff813cf313>] security_load_policy+0x103/0x640 [ 28.178920] [<ffffffff813c0a76>] sel_write_load+0xb6/0x790 [ 28.179482] [<ffffffff812821f7>] __vfs_write+0x37/0x110 [ 28.180047] [<ffffffff81282c89>] vfs_write+0xa9/0x1c0 [ 28.180630] [<ffffffff81283a1c>] SyS_write+0x5c/0xd0 [ 28.181168] [<ffffffff818a016e>] system_call_fastpath+0x12/0x76 [ 28.181740] -> #0 (&isec->lock){+.+.+.}: [ 28.182808] [<ffffffff81113331>] __lock_acquire+0x1b31/0x1e40 [ 28.183347] [<ffffffff81114017>] lock_acquire+0xc7/0x2a0 [ 28.183897] [<ffffffff8189c10d>] mutex_lock_nested+0x7d/0x460 [ 28.184427] [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0 [ 28.184944] [<ffffffff813b58bc>] selinux_d_instantiate+0x1c/0x20 [ 28.185470] [<ffffffff813b07ab>] security_d_instantiate+0x1b/0x30 [ 28.185980] [<ffffffff8129e8c4>] d_instantiate+0x54/0x80 [ 28.186495] [<ffffffff81211edc>] __shmem_file_setup+0xdc/0x250 [ 28.186990] [<ffffffff812164a8>] shmem_zero_setup+0x28/0x70 [ 28.187500] [<ffffffff8123471c>] mmap_region+0x66c/0x680 [ 28.188006] [<ffffffff81234a53>] do_mmap_pgoff+0x323/0x410 [ 28.188500] [<ffffffff81216a30>] vm_mmap_pgoff+0xb0/0xf0 [ 28.189005] [<ffffffff81232bf6>] SyS_mmap_pgoff+0x116/0x2b0 [ 28.189490] [<ffffffff810232bb>] SyS_mmap+0x1b/0x30 [ 28.189975] [<ffffffff818a016e>] system_call_fastpath+0x12/0x76 [ 28.190474] other info that might help us debug this: [ 28.191901] Chain exists of: &isec->lock --> &xfs_dir_ilock_class --> &mm->mmap_sem [ 28.193327] Possible unsafe locking scenario: [ 28.194297] CPU0 CPU1 [ 28.194774] ---- ---- [ 28.195254] lock(&mm->mmap_sem); [ 28.195709] lock(&xfs_dir_ilock_class); [ 28.196174] lock(&mm->mmap_sem); [ 28.196654] lock(&isec->lock); [ 28.197108] *** DEADLOCK *** [ 28.198451] 1 lock held by sshd/1764: [ 28.198900] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff81216a0f>] vm_mmap_pgoff+0x8f/0xf0 [ 28.199370] stack backtrace: [ 28.200276] CPU: 2 PID: 1764 Comm: sshd Tainted: G W 4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1 [ 28.200753] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [ 28.201246] 0000000000000000 00000000eda89a94 ffff8800a86a39c8 ffffffff81896375 [ 28.201771] 0000000000000000 ffffffff82a910d0 ffff8800a86a3a18 ffffffff8110fbd6 [ 28.202275] 0000000000000002 ffff8800a86a3a78 0000000000000001 ffff8800a897b008 [ 28.203099] Call Trace: [ 28.204237] [<ffffffff81896375>] dump_stack+0x4c/0x65 [ 28.205362] [<ffffffff8110fbd6>] print_circular_bug+0x206/0x280 [ 28.206502] [<ffffffff81113331>] __lock_acquire+0x1b31/0x1e40 [ 28.207650] [<ffffffff81114017>] lock_acquire+0xc7/0x2a0 [ 28.208758] [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0 [ 28.209902] [<ffffffff8189c10d>] mutex_lock_nested+0x7d/0x460 [ 28.211023] [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0 [ 28.212162] [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0 [ 28.213283] [<ffffffff81027e7d>] ? native_sched_clock+0x2d/0xa0 [ 28.214403] [<ffffffff81027ef9>] ? sched_clock+0x9/0x10 [ 28.215514] [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0 [ 28.216656] [<ffffffff813b58bc>] selinux_d_instantiate+0x1c/0x20 [ 28.217776] [<ffffffff813b07ab>] security_d_instantiate+0x1b/0x30 [ 28.218902] [<ffffffff8129e8c4>] d_instantiate+0x54/0x80 [ 28.219992] [<ffffffff81211edc>] __shmem_file_setup+0xdc/0x250 [ 28.221112] [<ffffffff812164a8>] shmem_zero_setup+0x28/0x70 [ 28.222234] [<ffffffff8123471c>] mmap_region+0x66c/0x680 [ 28.223362] [<ffffffff81234a53>] do_mmap_pgoff+0x323/0x410 [ 28.224493] [<ffffffff81216a0f>] ? vm_mmap_pgoff+0x8f/0xf0 [ 28.225643] [<ffffffff81216a30>] vm_mmap_pgoff+0xb0/0xf0 [ 28.226771] [<ffffffff81232bf6>] SyS_mmap_pgoff+0x116/0x2b0 [ 28.227900] [<ffffffff812996ce>] ? SyS_fcntl+0x5de/0x760 [ 28.229042] [<ffffffff810232bb>] SyS_mmap+0x1b/0x30 [ 28.230156] [<ffffffff818a016e>] system_call_fastpath+0x12/0x76 [ 46.520367] Adjusting tsc more than 11% (5419175 vs 7179037) Best regards, Morten > >> >>> >>> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> >>> Reported-by: Daniel Wagner <wagi@monom.org> >>> Reported-by: Morten Stevens <mstevens@fedoraproject.org> >>> Signed-off-by: Hugh Dickins <hughd@google.com> >>> --- >>> >>> mm/shmem.c | 8 +++++++- >>> 1 file changed, 7 insertions(+), 1 deletion(-) >>> >>> --- 4.1-rc7/mm/shmem.c 2015-04-26 19:16:31.352191298 -0700 >>> +++ linux/mm/shmem.c 2015-06-14 09:26:49.461120166 -0700 >>> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru >>> struct file *file; >>> loff_t size = vma->vm_end - vma->vm_start; >>> >>> - file = shmem_file_setup("dev/zero", size, vma->vm_flags); >>> + /* >>> + * Cloning a new file under mmap_sem leads to a lock ordering conflict >>> + * between XFS directory reading and selinux: since this file is only >>> + * accessible to the user through its mapping, use S_PRIVATE flag to >>> + * bypass file security, in the same way as shmem_kernel_file_setup(). >>> + */ >>> + file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE); >>> if (IS_ERR(file)) >>> return PTR_ERR(file); >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> Please read the FAQ at http://www.tux.org/lkml/ >> _______________________________________________ >> Selinux mailing list >> Selinux@tycho.nsa.gov >> To unsubscribe, send email to Selinux-leave@tycho.nsa.gov. >> To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov. >> >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-07-08 16:37 ` Stephen Smalley 2015-07-08 20:37 ` Morten Stevens @ 2015-07-09 8:23 ` Hugh Dickins 2015-07-09 12:59 ` Stephen Smalley 1 sibling, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2015-07-09 8:23 UTC (permalink / raw) To: Stephen Smalley Cc: Stephen Smalley, Hugh Dickins, Prarit Bhargava, Morten Stevens, Eric Sandeen, Dave Chinner, Daniel Wagner, Linux Kernel, Eric Paris, linux-mm, selinux, Andrew Morton, Linus Torvalds On Wed, 8 Jul 2015, Stephen Smalley wrote: > On 07/08/2015 09:13 AM, Stephen Smalley wrote: > > On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@google.com> wrote: > >> It appears that, at some point last year, XFS made directory handling > >> changes which bring it into lockdep conflict with shmem_zero_setup(): > >> it is surprising that mmap() can clone an inode while holding mmap_sem, > >> but that has been so for many years. > >> > >> Since those few lockdep traces that I've seen all implicated selinux, > >> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which > >> v3.13's commit c7277090927a ("security: shmem: implement kernel private > >> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: > >> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. > >> > >> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero > >> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers > >> which cloned inode in mmap(), but if so, I cannot locate them now. > > > > This causes a regression for SELinux (please, in the future, cc > > selinux list and Paul Moore on SELinux-related changes). In Surprised and sorry about that, yes, I should have Cc'ed. > > particular, this change disables SELinux checking of mprotect > > PROT_EXEC on shared anonymous mappings, so we lose the ability to > > control executable mappings. That said, we are only getting that > > check today as a side effect of our file execute check on the tmpfs > > inode, whereas it would be better (and more consistent with the > > mmap-time checks) to apply an execmem check in that case, in which > > case we wouldn't care about the inode-based check. However, I am > > unclear on how to correctly detect that situation from > > selinux_file_mprotect() -> file_map_prot_check(), because we do have a > > non-NULL vma->vm_file so we treat it as a file execute check. In > > contrast, if directly creating an anonymous shared mapping with > > PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with > > a NULL file and therefore we end up applying an execmem check. If you're willing to go forward with the change, rather than just call for an immediate revert of it, then I think the right way to detect the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)), wouldn't it? > > Also, can you provide the lockdep traces that motivated this change? Thank you for supplying that, Morten. > > > > >> > >> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> > >> Reported-by: Daniel Wagner <wagi@monom.org> > >> Reported-by: Morten Stevens <mstevens@fedoraproject.org> > >> Signed-off-by: Hugh Dickins <hughd@google.com> > >> --- > >> > >> mm/shmem.c | 8 +++++++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> --- 4.1-rc7/mm/shmem.c 2015-04-26 19:16:31.352191298 -0700 > >> +++ linux/mm/shmem.c 2015-06-14 09:26:49.461120166 -0700 > >> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru > >> struct file *file; > >> loff_t size = vma->vm_end - vma->vm_start; > >> > >> - file = shmem_file_setup("dev/zero", size, vma->vm_flags); > >> + /* > >> + * Cloning a new file under mmap_sem leads to a lock ordering conflict > >> + * between XFS directory reading and selinux: since this file is only > >> + * accessible to the user through its mapping, use S_PRIVATE flag to > >> + * bypass file security, in the same way as shmem_kernel_file_setup(). > >> + */ > >> + file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE); > >> if (IS_ERR(file)) > >> return PTR_ERR(file); > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> Please read the FAQ at http://www.tux.org/lkml/ > > _______________________________________________ > > Selinux mailing list > > Selinux@tycho.nsa.gov > > To unsubscribe, send email to Selinux-leave@tycho.nsa.gov. > > To get help, send an email containing "help" to Selinux-request@tycho.nsa.gov. > > > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-07-09 8:23 ` Hugh Dickins @ 2015-07-09 12:59 ` Stephen Smalley 2015-07-10 7:48 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Stephen Smalley @ 2015-07-09 12:59 UTC (permalink / raw) To: Hugh Dickins Cc: Stephen Smalley, Prarit Bhargava, Morten Stevens, Eric Sandeen, Dave Chinner, Daniel Wagner, Linux Kernel, Eric Paris, linux-mm, selinux, Andrew Morton, Linus Torvalds, David Howells On 07/09/2015 04:23 AM, Hugh Dickins wrote: > On Wed, 8 Jul 2015, Stephen Smalley wrote: >> On 07/08/2015 09:13 AM, Stephen Smalley wrote: >>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@google.com> wrote: >>>> It appears that, at some point last year, XFS made directory handling >>>> changes which bring it into lockdep conflict with shmem_zero_setup(): >>>> it is surprising that mmap() can clone an inode while holding mmap_sem, >>>> but that has been so for many years. >>>> >>>> Since those few lockdep traces that I've seen all implicated selinux, >>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private >>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >>>> >>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >>>> which cloned inode in mmap(), but if so, I cannot locate them now. >>> >>> This causes a regression for SELinux (please, in the future, cc >>> selinux list and Paul Moore on SELinux-related changes). In > > Surprised and sorry about that, yes, I should have Cc'ed. > >>> particular, this change disables SELinux checking of mprotect >>> PROT_EXEC on shared anonymous mappings, so we lose the ability to >>> control executable mappings. That said, we are only getting that >>> check today as a side effect of our file execute check on the tmpfs >>> inode, whereas it would be better (and more consistent with the >>> mmap-time checks) to apply an execmem check in that case, in which >>> case we wouldn't care about the inode-based check. However, I am >>> unclear on how to correctly detect that situation from >>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a >>> non-NULL vma->vm_file so we treat it as a file execute check. In >>> contrast, if directly creating an anonymous shared mapping with >>> PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with >>> a NULL file and therefore we end up applying an execmem check. > > If you're willing to go forward with the change, rather than just call > for an immediate revert of it, then I think the right way to detect > the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)), > wouldn't it? That seems misleading and might trigger execmem checks on non-shmem inodes. S_PRIVATE was originally introduced for fs-internal inodes that are never directly exposed to userspace, originally for reiserfs xattr inodes (reiserfs xattrs are internally implemented as their own files that are hidden from userspace) and later also applied to anon inodes. It would be better if we had an explicit way of testing that we are dealing with an anonymous shared mapping in selinux_file_mprotect() -> file_map_prot_check(). ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-07-09 12:59 ` Stephen Smalley @ 2015-07-10 7:48 ` Hugh Dickins 2015-07-10 13:09 ` Stephen Smalley 0 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2015-07-10 7:48 UTC (permalink / raw) To: Stephen Smalley Cc: Hugh Dickins, Stephen Smalley, Prarit Bhargava, Morten Stevens, Eric Sandeen, Dave Chinner, Daniel Wagner, Linux Kernel, Eric Paris, linux-mm, selinux, Andrew Morton, Linus Torvalds, David Howells On Thu, 9 Jul 2015, Stephen Smalley wrote: > On 07/09/2015 04:23 AM, Hugh Dickins wrote: > > On Wed, 8 Jul 2015, Stephen Smalley wrote: > >> On 07/08/2015 09:13 AM, Stephen Smalley wrote: > >>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@google.com> wrote: > >>>> It appears that, at some point last year, XFS made directory handling > >>>> changes which bring it into lockdep conflict with shmem_zero_setup(): > >>>> it is surprising that mmap() can clone an inode while holding mmap_sem, > >>>> but that has been so for many years. > >>>> > >>>> Since those few lockdep traces that I've seen all implicated selinux, > >>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which > >>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private > >>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: > >>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. > >>>> > >>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero > >>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers > >>>> which cloned inode in mmap(), but if so, I cannot locate them now. > >>> > >>> This causes a regression for SELinux (please, in the future, cc > >>> selinux list and Paul Moore on SELinux-related changes). In > > > > Surprised and sorry about that, yes, I should have Cc'ed. > > > >>> particular, this change disables SELinux checking of mprotect > >>> PROT_EXEC on shared anonymous mappings, so we lose the ability to > >>> control executable mappings. That said, we are only getting that > >>> check today as a side effect of our file execute check on the tmpfs > >>> inode, whereas it would be better (and more consistent with the > >>> mmap-time checks) to apply an execmem check in that case, in which > >>> case we wouldn't care about the inode-based check. However, I am > >>> unclear on how to correctly detect that situation from > >>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a > >>> non-NULL vma->vm_file so we treat it as a file execute check. In > >>> contrast, if directly creating an anonymous shared mapping with > >>> PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with > >>> a NULL file and therefore we end up applying an execmem check. > > > > If you're willing to go forward with the change, rather than just call > > for an immediate revert of it, then I think the right way to detect > > the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)), > > wouldn't it? > > That seems misleading and might trigger execmem checks on non-shmem > inodes. S_PRIVATE was originally introduced for fs-internal inodes that > are never directly exposed to userspace, originally for reiserfs xattr > inodes (reiserfs xattrs are internally implemented as their own files > that are hidden from userspace) and later also applied to anon inodes. > It would be better if we had an explicit way of testing that we are > dealing with an anonymous shared mapping in selinux_file_mprotect() -> > file_map_prot_check(). But how would any of those original S_PRIVATE inodes arrive at selinux_file_mprotect()? Now we have added the anon shared mmap case which can arrive there, but the S_PRIVATE check seems just the right tool for the job of distinguishing those from the user-visible inodes. I don't see how adding some other flag for this case would be better - though certainly I can see that adding an "anon shared shmem" comment on its use in that check would be helpful. Or is there some further difficulty in this use of S_PRIVATE, beyond the mprotect case that you've mentioned? Unless there is some further difficulty, duplicating all the code relating to S_PRIVATE for a differently named flag seems counter-productive to me. (There is a bool shmem_mapping(mapping) that could be used to confirm that the inode you're looking at indeed belongs to shmem; but of course that would say yes on all the user-visible shmem inodes too, so it wouldn't be a useful test on its own, and I don't see that adding it to an S_PRIVATE test would add any real value.) Probably you were hoping that there's already some distinguishing feature of anon shared shmem inodes that you could check: I can't think of one offhand, beyond S_PRIVATE: if there is another, it would be accidental. Hugh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS 2015-07-10 7:48 ` Hugh Dickins @ 2015-07-10 13:09 ` Stephen Smalley 0 siblings, 0 replies; 14+ messages in thread From: Stephen Smalley @ 2015-07-10 13:09 UTC (permalink / raw) To: Hugh Dickins Cc: Stephen Smalley, Prarit Bhargava, Morten Stevens, Eric Sandeen, Dave Chinner, Daniel Wagner, Linux Kernel, Eric Paris, linux-mm, selinux, Andrew Morton, Linus Torvalds, David Howells On 07/10/2015 03:48 AM, Hugh Dickins wrote: > On Thu, 9 Jul 2015, Stephen Smalley wrote: >> On 07/09/2015 04:23 AM, Hugh Dickins wrote: >>> On Wed, 8 Jul 2015, Stephen Smalley wrote: >>>> On 07/08/2015 09:13 AM, Stephen Smalley wrote: >>>>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@google.com> wrote: >>>>>> It appears that, at some point last year, XFS made directory handling >>>>>> changes which bring it into lockdep conflict with shmem_zero_setup(): >>>>>> it is surprising that mmap() can clone an inode while holding mmap_sem, >>>>>> but that has been so for many years. >>>>>> >>>>>> Since those few lockdep traces that I've seen all implicated selinux, >>>>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which >>>>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private >>>>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: >>>>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. >>>>>> >>>>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero >>>>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers >>>>>> which cloned inode in mmap(), but if so, I cannot locate them now. >>>>> >>>>> This causes a regression for SELinux (please, in the future, cc >>>>> selinux list and Paul Moore on SELinux-related changes). In >>> >>> Surprised and sorry about that, yes, I should have Cc'ed. >>> >>>>> particular, this change disables SELinux checking of mprotect >>>>> PROT_EXEC on shared anonymous mappings, so we lose the ability to >>>>> control executable mappings. That said, we are only getting that >>>>> check today as a side effect of our file execute check on the tmpfs >>>>> inode, whereas it would be better (and more consistent with the >>>>> mmap-time checks) to apply an execmem check in that case, in which >>>>> case we wouldn't care about the inode-based check. However, I am >>>>> unclear on how to correctly detect that situation from >>>>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a >>>>> non-NULL vma->vm_file so we treat it as a file execute check. In >>>>> contrast, if directly creating an anonymous shared mapping with >>>>> PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file is called with >>>>> a NULL file and therefore we end up applying an execmem check. >>> >>> If you're willing to go forward with the change, rather than just call >>> for an immediate revert of it, then I think the right way to detect >>> the situation would be to check IS_PRIVATE(file_inode(vma->vm_file)), >>> wouldn't it? >> >> That seems misleading and might trigger execmem checks on non-shmem >> inodes. S_PRIVATE was originally introduced for fs-internal inodes that >> are never directly exposed to userspace, originally for reiserfs xattr >> inodes (reiserfs xattrs are internally implemented as their own files >> that are hidden from userspace) and later also applied to anon inodes. >> It would be better if we had an explicit way of testing that we are >> dealing with an anonymous shared mapping in selinux_file_mprotect() -> >> file_map_prot_check(). > > But how would any of those original S_PRIVATE inodes arrive at > selinux_file_mprotect()? Now we have added the anon shared mmap case > which can arrive there, but the S_PRIVATE check seems just the right > tool for the job of distinguishing those from the user-visible inodes. > > I don't see how adding some other flag for this case would be better > - though certainly I can see that adding an "anon shared shmem" > comment on its use in that check would be helpful. > > Or is there some further difficulty in this use of S_PRIVATE, beyond > the mprotect case that you've mentioned? Unless there is some further > difficulty, duplicating all the code relating to S_PRIVATE for a > differently named flag seems counter-productive to me. S_PRIVATE is supposed to disable all security processing on the inode, and often this is checked in the security framework (security/security.c) even before we reach the SELinux hook and causes an immediate return there. In the case of mprotect, we do reach the SELinux code since the hook is on the vma, not merely the inode, so we could apply an execmem check in the SELinux code if IS_PRIVATE() instead of file execute. However, I was trying to figure out if the fact that S_PRIVATE also would disable any read/write checking by SELinux on the inode could potentially open up a bypass of security policy. That would only be an issue if the file returned by shmem_zero_setup() was ever linked to an open file descriptor that could be inherited across a fork+exec or passed across local socket IPC or binder IPC and thereby shared across different security contexts. Uses of shmem_zero_setup() include mmap MAP_ANONYMOUS|MAP_SHARED, drivers/staging/android/ashmem.c (from ashmem_mmap if VM_SHARED), and drivers/char/mem.c (from mmap_zero if VM_SHARED). That all seems fine AFAICS. > (There is a bool shmem_mapping(mapping) that could be used to confirm > that the inode you're looking at indeed belongs to shmem; but of > course that would say yes on all the user-visible shmem inodes too, > so it wouldn't be a useful test on its own, and I don't see that > adding it to an S_PRIVATE test would add any real value.) > > Probably you were hoping that there's already some distinguishing > feature of anon shared shmem inodes that you could check: I can't > think of one offhand, beyond S_PRIVATE: if there is another, > it would be accidental. Yes, I was hoping for that. Ok, I'll spin up a patch for adding an IS_PRIVATE() test to SELinux file_map_prot_check() and cc you all on it. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-07-22 21:08 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-06-14 16:48 mm: shmem_zero_setup skip security check and lockdep conflict with XFS Hugh Dickins 2015-06-15 6:09 ` Daniel Wagner 2015-06-16 20:27 ` Hugh Dickins 2015-06-17 11:45 ` Morten Stevens 2015-06-18 0:22 ` Hugh Dickins 2015-07-22 12:46 ` Morten Stevens 2015-07-22 21:07 ` Stephen Smalley 2015-07-08 13:13 ` Stephen Smalley 2015-07-08 16:37 ` Stephen Smalley 2015-07-08 20:37 ` Morten Stevens 2015-07-09 8:23 ` Hugh Dickins 2015-07-09 12:59 ` Stephen Smalley 2015-07-10 7:48 ` Hugh Dickins 2015-07-10 13:09 ` Stephen Smalley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).