From: Yang Yingliang <yangyingliang@huawei.com> To: <tj@kernel.org> Cc: <linux-kernel@vger.kernel.org>, <cgroups@vger.kernel.org>, <netdev@vger.kernel.org>, "Libin (Huawei)" <huawei.libin@huawei.com>, <yangyingliang@huawei.com>, <guofan5@huawei.com>, <wangkefeng.wang@huawei.com> Subject: cgroup pointed by sock is leaked on mode switch Date: Sat, 2 May 2020 18:27:21 +0800 [thread overview] Message-ID: <03dab6ab-0ffe-3cae-193f-a7f84e9b14c5@huawei.com> (raw) Hi, I got an oom panic because cgroup is leaked. Here is the steps : - run a docker with --cap-add sys_admin parameter and the systemd process in the docker uses both cgroupv1 and cgroupv2 - ssh/exit from host to docker repeately I find the number nr_dying_descendants is increasing: linux-dVpNUK:~ # find /sys/fs/cgroup/ -name cgroup.stat -exec grep '^nr_dying_descendants [^0]' {} + /sys/fs/cgroup/unified/cgroup.stat:nr_dying_descendants 80 /sys/fs/cgroup/unified/system.slice/cgroup.stat:nr_dying_descendants 1 /sys/fs/cgroup/unified/system.slice/system-hostos.slice/cgroup.stat:nr_dying_descendants 1 /sys/fs/cgroup/unified/lxc/cgroup.stat:nr_dying_descendants 79 /sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/cgroup.stat:nr_dying_descendants 78 /sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/system.slice/cgroup.stat:nr_dying_descendants 78 The situation is as same as the commit bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") describes. "On mode switch, cgroup references which are already being pointed to by socks may be leaked." Do we have a fix for this leak now ? Or how about fix this by record the cgrp2 pointer, then put it when sk is freeing like this: diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index d9bd671105e2..cbb1e76ea305 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -770,6 +770,7 @@ struct sock_cgroup_data { #endif u64 val; }; + struct cgroup *cgrpv2; }; /* @@ -802,6 +803,7 @@ static inline void sock_cgroup_set_prioidx(struct sock_cgroup_data *skcd, return; if (!(skcd_buf.is_data & 1)) { + WRITE_ONCE(skcd->cgrpv2, skcd_buf.val); skcd_buf.val = 0; skcd_buf.is_data = 1; } @@ -819,6 +821,7 @@ static inline void sock_cgroup_set_classid(struct sock_cgroup_data *skcd, return; if (!(skcd_buf.is_data & 1)) { + WRITE_ONCE(skcd->cgrpv2, skcd_buf.val); skcd_buf.val = 0; skcd_buf.is_data = 1; } diff --git a/net/core/sock.c b/net/core/sock.c index a0dda2bf9d7c..7c761ef2d32e 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1520,6 +1520,10 @@ static void sk_prot_free(struct proto *prot, struct sock *sk) slab = prot->slab; cgroup_sk_free(&sk->sk_cgrp_data); + if (sk->sk_cgrp_data.cgrpv2) { + cgroup_put(sk->sk_cgrp_data.cgrpv2); + sk->sk_cgrp_data.cgrpv2 = NULL; + } mem_cgroup_sk_free(sk); security_sk_free(sk); if (slab != NULL) Thanks, Yang
WARNING: multiple messages have this Message-ID (diff)
From: Yang Yingliang <yangyingliang@huawei.com> To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, netdev@vger.kernel.org, "Libin (Huawei)" <huawei.libin@huawei.com>, yangyingliang@huawei.com, guofan5@huawei.com, wangkefeng.wang@huawei.com Subject: cgroup pointed by sock is leaked on mode switch Date: Sat, 2 May 2020 18:27:21 +0800 [thread overview] Message-ID: <03dab6ab-0ffe-3cae-193f-a7f84e9b14c5@huawei.com> (raw) Hi, I got an oom panic because cgroup is leaked. Here is the steps : - run a docker with --cap-add sys_admin parameter and the systemd process in the docker uses both cgroupv1 and cgroupv2 - ssh/exit from host to docker repeately I find the number nr_dying_descendants is increasing: linux-dVpNUK:~ # find /sys/fs/cgroup/ -name cgroup.stat -exec grep '^nr_dying_descendants [^0]' {} + /sys/fs/cgroup/unified/cgroup.stat:nr_dying_descendants 80 /sys/fs/cgroup/unified/system.slice/cgroup.stat:nr_dying_descendants 1 /sys/fs/cgroup/unified/system.slice/system-hostos.slice/cgroup.stat:nr_dying_descendants 1 /sys/fs/cgroup/unified/lxc/cgroup.stat:nr_dying_descendants 79 /sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/cgroup.stat:nr_dying_descendants 78 /sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/system.slice/cgroup.stat:nr_dying_descendants 78 The situation is as same as the commit bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") describes. "On mode switch, cgroup references which are already being pointed to by socks may be leaked." Do we have a fix for this leak now ? Or how about fix this by record the cgrp2 pointer, then put it when sk is freeing like this: diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index d9bd671105e2..cbb1e76ea305 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -770,6 +770,7 @@ struct sock_cgroup_data { #endif u64 val; }; + struct cgroup *cgrpv2; }; /* @@ -802,6 +803,7 @@ static inline void sock_cgroup_set_prioidx(struct sock_cgroup_data *skcd, return; if (!(skcd_buf.is_data & 1)) { + WRITE_ONCE(skcd->cgrpv2, skcd_buf.val); skcd_buf.val = 0; skcd_buf.is_data = 1; } @@ -819,6 +821,7 @@ static inline void sock_cgroup_set_classid(struct sock_cgroup_data *skcd, return; if (!(skcd_buf.is_data & 1)) { + WRITE_ONCE(skcd->cgrpv2, skcd_buf.val); skcd_buf.val = 0; skcd_buf.is_data = 1; } diff --git a/net/core/sock.c b/net/core/sock.c index a0dda2bf9d7c..7c761ef2d32e 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1520,6 +1520,10 @@ static void sk_prot_free(struct proto *prot, struct sock *sk) slab = prot->slab; cgroup_sk_free(&sk->sk_cgrp_data); + if (sk->sk_cgrp_data.cgrpv2) { + cgroup_put(sk->sk_cgrp_data.cgrpv2); + sk->sk_cgrp_data.cgrpv2 = NULL; + } mem_cgroup_sk_free(sk); security_sk_free(sk); if (slab != NULL) Thanks, Yang
next reply other threads:[~2020-05-02 10:27 UTC|newest] Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-05-02 10:27 Yang Yingliang [this message] 2020-05-02 10:27 ` cgroup pointed by sock is leaked on mode switch Yang Yingliang 2020-05-05 16:06 ` Tejun Heo 2020-05-06 1:50 ` Yang Yingliang 2020-05-06 1:50 ` Yang Yingliang 2020-05-06 2:16 ` Zefan Li 2020-05-06 2:16 ` Zefan Li 2020-05-06 7:51 ` Zefan Li 2020-05-06 7:51 ` Zefan Li 2020-05-09 2:31 ` Yang Yingliang 2020-05-09 2:31 ` Yang Yingliang
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=03dab6ab-0ffe-3cae-193f-a7f84e9b14c5@huawei.com \ --to=yangyingliang@huawei.com \ --cc=cgroups@vger.kernel.org \ --cc=guofan5@huawei.com \ --cc=huawei.libin@huawei.com \ --cc=linux-kernel@vger.kernel.org \ --cc=netdev@vger.kernel.org \ --cc=tj@kernel.org \ --cc=wangkefeng.wang@huawei.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.