All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Andrei Vagin <avagin@gmail.com>
Cc: dhowells@redhat.com, Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org,
	Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH vfs/for-next v6] cgroup: fix top cgroup refcnt leak
Date: Thu, 03 Jan 2019 21:54:31 +0000	[thread overview]
Message-ID: <16978.1546552471@warthog.procyon.org.uk> (raw)
In-Reply-To: <20190103173442.GA7428@gmail.com>

Hi Andrei,

It turns out that the cgroup-v1 refcounting is slightly broken upstream.  If
kernfs_get_inode() fails in kernfs_fill_super(), then there's refcount
breakage in the error handling.

This can be provoked by making the following change:

--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -240,6 +240,8 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	sb->s_shrink.seeks = 0;
 
 	/* get root inode, initialize and unlock it */
+	if (strcmp(current->comm, "foobar") == 0)
+		return -ENOANO;
 	mutex_lock(&kernfs_mutex);
 	inode = kernfs_get_inode(sb, info->root->kn);
 	mutex_unlock(&kernfs_mutex);

and then copying /bin/mount to /tmp/foobar and doing:

[root@andromeda ~]# /tmp/foobar -t cgroup -o none,name=xxxxy xxx /tmp/x/a1
foobar: /tmp/x/a1: mount(2) system call failed: No anode.

In dmesg I see the attached traces (see below).

The problem appears to be because cgroup_do_mount() calls kernfs_mount(), but
the refcount on the new root cgroup object hasn't been properly initialised
yet.  However, because we make it past sget(), the superblock thinks it has
taken the caller's ref - and this gets eaten by cgroup_kill_sb().

Further, another ref appears to be released by cgroup_do_mount() in the event
that kernfs_mount() fails.

David

------------[ cut here ]------------
percpu_ref_kill_and_confirm called more than once on css_release!
WARNING: CPU: 3 PID: 3218 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x4b/0x14c
Modules linked in:
CPU: 3 PID: 3218 Comm: bugger Not tainted 4.20.0-fscache+ #1287
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:percpu_ref_kill_and_confirm+0x4b/0x14c
Code: c6 74 29 80 3d 42 0a 00 01 00 75 20 48 8b 53 10 48 c7 c6 f0 cd e9 81 48 c7 c7 82 99 16 82 c6 05 27 0a 00 01 01 e8 04 54 ac ff <0f> 0b 48 83 4b 08 02 4c 89 e6 48 89 df e8 fb fc ff ff 65 ff 05 54
RSP: 0018:ffff8880c5103c60 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff8880d35e8020 RCX: ffff8880c5103b4c
RDX: 0000000000000046 RSI: ffffffff8245fef8 RDI: ffffffff810ac1ec
RBP: ffff8880c5103c78 R08: 0000000000000041 R09: 0000000000021900
R10: ffff8880c5103900 R11: 0000006423114dc0 R12: 0000000000000000
R13: 000000000027e0eb R14: 0000000000000246 R15: 0000000000000000
FS:  00007f8ec2dbb080(0000) GS:ffff8880c6d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005586974b0198 CR3: 00000000c4bfe002 CR4: 00000000001606e0
Call Trace:
 cgroup_kill_sb+0x131/0x141
 deactivate_locked_super+0x29/0x5b
 kernfs_mount_ns+0x1fa/0x223
 cgroup_do_mount+0x36/0x1c8
 cgroup1_mount+0x5b2/0x610
 cgroup_mount+0x33b/0x37f
 mount_fs+0x6a/0x10b
 vfs_kern_mount+0x67/0x13c
 do_mount+0x90e/0xb7e
 ? kmem_cache_alloc_trace+0x241/0x27d
 ksys_mount+0x72/0x97
 __x64_sys_mount+0x21/0x24
 do_syscall_64+0x7d/0x1a0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ec1e14ada
Code: 48 8b 0d c9 a3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 a3 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2fa32b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00005586974ae2a0 RCX: 00007f8ec1e14ada
RDX: 00005586974ae480 RSI: 00005586974ae500 RDI: 00005586974ae4e0
RBP: 0000000000000000 R08: 00005586974ae4a0 R09: 00005586974ae480
R10: 00000000c0ed0000 R11: 0000000000000246 R12: 00005586974ae4e0
R13: 00005586974ae480 R14: 0000000000000000 R15: 00007f8ec2ba8184
irq event stamp: 3532
hardirqs last  enabled at (3531): [<ffffffff811c0584>] kfree+0x152/0x159
hardirqs last disabled at (3532): [<ffffffff819e27e8>] _raw_spin_lock_irqsave+0x12/0x44
softirqs last  enabled at (3326): [<ffffffff81c00353>] __do_softirq+0x353/0x38f
softirqs last disabled at (3317): [<ffffffff81058c16>] irq_exit+0x63/0xd1
---[ end trace 9bca09dc135d9213 ]---
WARNING: CPU: 3 PID: 3218 at lib/percpu-refcount.c:359 percpu_ref_reinit+0x10/0x17
Modules linked in:
CPU: 3 PID: 3218 Comm: bugger Tainted: G        W         4.20.0-fscache+ #1287
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:percpu_ref_reinit+0x10/0x17
Code: fa ff ff 48 8d 65 f0 4c 89 f6 48 c7 c7 e0 8d 51 82 5b 41 5e 5d e9 3c 50 45 00 48 8b 47 08 a8 03 74 08 48 8b 07 48 85 c0 74 02 <0f> 0b e9 c4 fe ff ff 55 31 f6 53 48 89 fb 48 83 ec 30 65 48 8b 04
RSP: 0018:ffff8880c5103d38 EFLAGS: 00010282
RAX: fffffffffffffffe RBX: ffff8880d35e8000 RCX: ffffffff810f1a8f
RDX: 00000000ffffbf7e RSI: 00000000425b018d RDI: ffff8880d35e8020
RBP: ffff8880c5103da8 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8880c5103d40 R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000000 R14: ffffffffffffffc9 R15: ffff88803f63f001
FS:  00007f8ec2dbb080(0000) GS:ffff8880c6d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005586974b0198 CR3: 00000000c4bfe002 CR4: 00000000001606e0
Call Trace:
 cgroup1_mount+0x5d1/0x610
 cgroup_mount+0x33b/0x37f
 mount_fs+0x6a/0x10b
 vfs_kern_mount+0x67/0x13c
 do_mount+0x90e/0xb7e
 ? kmem_cache_alloc_trace+0x241/0x27d
 ksys_mount+0x72/0x97
 __x64_sys_mount+0x21/0x24
 do_syscall_64+0x7d/0x1a0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8ec1e14ada
Code: 48 8b 0d c9 a3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 96 a3 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2fa32b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00005586974ae2a0 RCX: 00007f8ec1e14ada
RDX: 00005586974ae480 RSI: 00005586974ae500 RDI: 00005586974ae4e0
RBP: 0000000000000000 R08: 00005586974ae4a0 R09: 00005586974ae480
R10: 00000000c0ed0000 R11: 0000000000000246 R12: 00005586974ae4e0
R13: 00005586974ae480 R14: 0000000000000000 R15: 00007f8ec2ba8184
irq event stamp: 3666
hardirqs last  enabled at (3665): [<ffffffff810bb0b1>] __call_rcu+0x1dc/0x1fa
hardirqs last disabled at (3666): [<ffffffff81001639>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last  enabled at (3608): [<ffffffff81c00353>] __do_softirq+0x353/0x38f
softirqs last disabled at (3535): [<ffffffff81058c16>] irq_exit+0x63/0xd1
---[ end trace 9bca09dc135d9214 ]---

  parent reply	other threads:[~2019-01-03 21:54 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-28 23:59 [PATCH] cgroup: fix top cgroup refcnt leak Andrei Vagin
2018-12-29  0:04 ` [PATCH vfs/for-next v2] " Andrei Vagin
2018-12-30 19:41   ` Andrei Vagin
2019-01-02  2:28   ` Al Viro
2019-01-02 18:14     ` [PATCH vfs/for-next v3] " Andrei Vagin
2019-01-02 19:37     ` [PATCH vfs/for-next v2] " Andrei Vagin
2019-01-02 19:37     ` [PATCH vfs/for-next v4] " Andrei Vagin
2019-01-02 20:02       ` Al Viro
2019-01-02 21:06         ` Andrei Vagin
2019-01-03  0:26         ` David Howells
2019-01-03  0:43           ` Andrei Vagin
2019-01-03  1:00             ` Andrei Vagin
2019-01-03  3:54               ` [PATCH vfs/for-next v6] " Andrei Vagin
2019-01-03  8:32                 ` Al Viro
2019-01-03 17:34                   ` Andrei Vagin
2019-01-03 21:54                   ` David Howells [this message]
2019-01-02 22:26 ` [PATCH vfs/for-next v2] " David Howells
2019-01-02 23:06   ` Andrei Vagin
2019-01-02 23:31     ` Andrei Vagin
2019-01-03  0:33     ` David Howells
2019-01-03 13:41     ` David Howells
2019-01-03 13:42     ` David Howells
2019-01-03 15:27     ` David Howells
2019-01-03 15:44     ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=16978.1546552471@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=avagin@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.