From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx1.redhat.com ([209.132.183.28]:55162 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728075AbfACVyg (ORCPT ); Thu, 3 Jan 2019 16:54:36 -0500
From: David Howells
In-Reply-To: <20190103173442.GA7428@gmail.com>
References: <20190103173442.GA7428@gmail.com> <20190103010000.GA32003@gmail.com> <20190103035426.23526-1-avagin@gmail.com> <20190103083229.GJ2217@ZenIV.linux.org.uk>
To: Andrei Vagin
Cc: dhowells@redhat.com, Al Viro, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org, Li Zefan
Subject: Re: [PATCH vfs/for-next v6] cgroup: fix top cgroup refcnt leak
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <16977.1546552471.1@warthog.procyon.org.uk>
Content-Transfer-Encoding: 8BIT
Date: Thu, 03 Jan 2019 21:54:31 +0000
Message-ID: <16978.1546552471@warthog.procyon.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Hi Andrei,

It turns out that the cgroup-v1 refcounting is slightly broken upstream. If
kernfs_get_inode() fails in kernfs_fill_super(), then there's refcount
breakage in the error handling. This can be provoked by making the following
change:

--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -240,6 +240,8 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	sb->s_shrink.seeks = 0;
 
 	/* get root inode, initialize and unlock it */
+	if (strcmp(current->comm, "foobar") == 0)
+		return -ENOANO;
 	mutex_lock(&kernfs_mutex);
 	inode = kernfs_get_inode(sb, info->root->kn);
 	mutex_unlock(&kernfs_mutex);

and then copying /bin/mount to /tmp/foobar and doing:

	[root@andromeda ~]# /tmp/foobar -t cgroup -o none,name=xxxxy xxx /tmp/x/a1
	foobar: /tmp/x/a1: mount(2) system call failed: No anode.

In dmesg I see the attached traces (see below).

The problem appears to be because cgroup_do_mount() calls kernfs_mount(), but
the refcount on the new root cgroup object hasn't been properly initialised yet.
However, because we make it past sget(), the superblock thinks it has taken
the caller's ref - and this gets eaten by cgroup_kill_sb(). Further, another
ref appears to be released by cgroup_do_mount() in the event that
kernfs_mount() fails.

David

------------[ cut here ]------------
percpu_ref_kill_and_confirm called more than once on css_release!
WARNING: CPU: 3 PID: 3218 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x4b/0x14c
Modules linked in:
CPU: 3 PID: 3218 Comm: bugger Not tainted 4.20.0-fscache+ #1287
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:percpu_ref_kill_and_confirm+0x4b/0x14c
Code: c6 74 29 80 3d 42 0a 00 01 00 75 20 48 8b 53 10 48 c7 c6 f0 cd e9 81 48 c7 c7 82 99 16 82 c6 05 27 0a 00 01 01 e8 04 54 ac ff <0f> 0b 48 83 4b 08 02 4c 89 e6 48 89 df e8 fb fc ff ff 65 ff 05 54
RSP: 0018:ffff8880c5103c60 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff8880d35e8020 RCX: ffff8880c5103b4c
RDX: 0000000000000046 RSI: ffffffff8245fef8 RDI: ffffffff810ac1ec
RBP: ffff8880c5103c78 R08: 0000000000000041 R09: 0000000000021900
R10: ffff8880c5103900 R11: 0000006423114dc0 R12: 0000000000000000
R13: 000000000027e0eb R14: 0000000000000246 R15: 0000000000000000
FS: 00007f8ec2dbb080(0000) GS:ffff8880c6d80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005586974b0198 CR3: 00000000c4bfe002 CR4: 00000000001606e0
Call Trace:
 cgroup_kill_sb+0x131/0x141
 deactivate_locked_super+0x29/0x5b
 kernfs_mount_ns+0x1fa/0x223
 cgroup_do_mount+0x36/0x1c8
 cgroup1_mount+0x5b2/0x610
 cgroup_mount+0x33b/0x37f
 mount_fs+0x6a/0x10b
 vfs_kern_mount+0x67/0x13c
 do_mount+0x90e/0xb7e
 ? kmem_cache_alloc_trace+0x241/0x27d
 ksys_mount+0x72/0x97
 __x64_sys_mount+0x21/0x24
 do_syscall_64+0x7d/0x1a0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ec1e14ada
Code: 48 8b 0d c9 a3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 a3 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2fa32b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00005586974ae2a0 RCX: 00007f8ec1e14ada
RDX: 00005586974ae480 RSI: 00005586974ae500 RDI: 00005586974ae4e0
RBP: 0000000000000000 R08: 00005586974ae4a0 R09: 00005586974ae480
R10: 00000000c0ed0000 R11: 0000000000000246 R12: 00005586974ae4e0
R13: 00005586974ae480 R14: 0000000000000000 R15: 00007f8ec2ba8184
irq event stamp: 3532
hardirqs last enabled at (3531): [] kfree+0x152/0x159
hardirqs last disabled at (3532): [] _raw_spin_lock_irqsave+0x12/0x44
softirqs last enabled at (3326): [] __do_softirq+0x353/0x38f
softirqs last disabled at (3317): [] irq_exit+0x63/0xd1
---[ end trace 9bca09dc135d9213 ]---
WARNING: CPU: 3 PID: 3218 at lib/percpu-refcount.c:359 percpu_ref_reinit+0x10/0x17
Modules linked in:
CPU: 3 PID: 3218 Comm: bugger Tainted: G W 4.20.0-fscache+ #1287
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:percpu_ref_reinit+0x10/0x17
Code: fa ff ff 48 8d 65 f0 4c 89 f6 48 c7 c7 e0 8d 51 82 5b 41 5e 5d e9 3c 50 45 00 48 8b 47 08 a8 03 74 08 48 8b 07 48 85 c0 74 02 <0f> 0b e9 c4 fe ff ff 55 31 f6 53 48 89 fb 48 83 ec 30 65 48 8b 04
RSP: 0018:ffff8880c5103d38 EFLAGS: 00010282
RAX: fffffffffffffffe RBX: ffff8880d35e8000 RCX: ffffffff810f1a8f
RDX: 00000000ffffbf7e RSI: 00000000425b018d RDI: ffff8880d35e8020
RBP: ffff8880c5103da8 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8880c5103d40 R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000000 R14: ffffffffffffffc9 R15: ffff88803f63f001
FS: 00007f8ec2dbb080(0000) GS:ffff8880c6d80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005586974b0198 CR3: 00000000c4bfe002 CR4: 00000000001606e0
Call Trace:
 cgroup1_mount+0x5d1/0x610
 cgroup_mount+0x33b/0x37f
 mount_fs+0x6a/0x10b
 vfs_kern_mount+0x67/0x13c
 do_mount+0x90e/0xb7e
 ? kmem_cache_alloc_trace+0x241/0x27d
 ksys_mount+0x72/0x97
 __x64_sys_mount+0x21/0x24
 do_syscall_64+0x7d/0x1a0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ec1e14ada
Code: 48 8b 0d c9 a3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 a3 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2fa32b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00005586974ae2a0 RCX: 00007f8ec1e14ada
RDX: 00005586974ae480 RSI: 00005586974ae500 RDI: 00005586974ae4e0
RBP: 0000000000000000 R08: 00005586974ae4a0 R09: 00005586974ae480
R10: 00000000c0ed0000 R11: 0000000000000246 R12: 00005586974ae4e0
R13: 00005586974ae480 R14: 0000000000000000 R15: 00007f8ec2ba8184
irq event stamp: 3666
hardirqs last enabled at (3665): [] __call_rcu+0x1dc/0x1fa
hardirqs last disabled at (3666): [] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last enabled at (3608): [] __do_softirq+0x353/0x38f
softirqs last disabled at (3535): [] irq_exit+0x63/0xd1
---[ end trace 9bca09dc135d9214 ]---
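
For illustration only, the double release described in the message (one ref
eaten by cgroup_kill_sb() and another dropped by cgroup_do_mount() when
kernfs_mount() fails) can be modelled in userspace roughly as below. This is
a sketch under assumptions, not kernel code: struct model_ref and all the
model_* functions are hypothetical names invented here, and the "kill once"
flag only loosely imitates the check behind the first WARNING. It may not
match the exact order of events in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a kill-once reference; not a kernel API. */
struct model_ref {
	bool killed;      /* set by the first kill */
	int  extra_kills; /* number of kills seen after the first one */
};

static void model_ref_init(struct model_ref *r)
{
	assert(r != NULL);
	r->killed = false;
	r->extra_kills = 0;
}

/* Returns true if this kill was the first one, false if it was a repeat. */
static bool model_ref_kill(struct model_ref *r)
{
	if (r->killed) {
		r->extra_kills++;
		fprintf(stderr, "model_ref_kill called more than once\n");
		return false;
	}
	r->killed = true;
	return true;
}

/* Stand-in for superblock teardown, which assumes it owns the caller's ref. */
static void model_kill_sb(struct model_ref *root)
{
	model_ref_kill(root);
}

/*
 * Stand-in for the mount path. fill_super_fails mirrors the injected
 * -ENOANO failure. Returns 0 on success, -1 on failure.
 */
static int model_do_mount(struct model_ref *root, bool fill_super_fails)
{
	if (fill_super_fails) {
		model_kill_sb(root);  /* teardown of the half-built superblock */
		model_ref_kill(root); /* caller's error path drops the ref again */
		return -1;
	}
	return 0;
}
```

Calling model_ref_init() on a struct model_ref and then
model_do_mount(&root, true) leaves root.extra_kills at 1, which is the
model's analogue of the "called more than once" warning; with
fill_super_fails set to false nothing is killed at all.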