All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zefan Li <lizefan@huawei.com>
To: Tejun Heo <tj@kernel.org>, David Miller <davem@davemloft.net>
Cc: yangyingliang <yangyingliang@huawei.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	<huawei.libin@huawei.com>, <guofan5@huawei.com>,
	<linux-kernel@vger.kernel.org>, <cgroups@vger.kernel.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: [PATCH v2] netprio_cgroup: Fix unlimited memory leak of v2 cgroups
Date: Sat, 9 May 2020 11:32:10 +0800	[thread overview]
Message-ID: <2fcd921d-8f42-9d33-951c-899d0bbdd92d@huawei.com> (raw)
In-Reply-To: <939566f5-abe3-3526-d4ff-ec6bf8e8c138@huawei.com>

If systemd is configured to use hybrid mode which enables the use of
both cgroup v1 and v2, systemd will create new cgroup on both the default
root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
task to the two cgroups. If the task does some network thing then the v2
cgroup can never be freed after the session exited.

One of our machines ran into OOM due to this memory leak.

In the scenario described above when sk_alloc() is called cgroup_sk_alloc()
thought it's in v2 mode, so it stores the cgroup pointer in sk->sk_cgrp_data
and increments the cgroup refcnt, but then sock_update_netprioidx() thought
it's in v1 mode, so it stores netprioidx value in sk->sk_cgrp_data, so the
cgroup refcnt will never be freed.

Currently we do the mode switch when someone writes to the ifpriomap cgroup
control file. The easiest fix is to also do the switch when a task is attached
to a new cgroup.

Fixes: bd1060a1d671("sock, cgroup: add sock->sk_cgroup")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
---

forgot to rebase to the latest kernel.

---
 net/core/netprio_cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 8881dd9..9bd4cab 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -236,6 +236,8 @@ static void net_prio_attach(struct cgroup_taskset *tset)
 	struct task_struct *p;
 	struct cgroup_subsys_state *css;
 
+	cgroup_sk_alloc_disable();
+
 	cgroup_taskset_for_each(p, css, tset) {
 		void *v = (void *)(unsigned long)css->id;
 
-- 
2.7.4

WARNING: multiple messages have this Message-ID (diff)
From: Zefan Li <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Cc: yangyingliang
	<yangyingliang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
	Kefeng Wang
	<wangkefeng.wang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
	huawei.libin-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	guofan5-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linux Kernel Network Developers
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: [PATCH v2] netprio_cgroup: Fix unlimited memory leak of v2 cgroups
Date: Sat, 9 May 2020 11:32:10 +0800	[thread overview]
Message-ID: <2fcd921d-8f42-9d33-951c-899d0bbdd92d@huawei.com> (raw)
In-Reply-To: <939566f5-abe3-3526-d4ff-ec6bf8e8c138-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

If systemd is configured to use hybrid mode which enables the use of
both cgroup v1 and v2, systemd will create new cgroup on both the default
root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
task to the two cgroups. If the task does some network thing then the v2
cgroup can never be freed after the session exited.

One of our machines ran into OOM due to this memory leak.

In the scenario described above when sk_alloc() is called cgroup_sk_alloc()
thought it's in v2 mode, so it stores the cgroup pointer in sk->sk_cgrp_data
and increments the cgroup refcnt, but then sock_update_netprioidx() thought
it's in v1 mode, so it stores netprioidx value in sk->sk_cgrp_data, so the
cgroup refcnt will never be freed.

Currently we do the mode switch when someone writes to the ifpriomap cgroup
control file. The easiest fix is to also do the switch when a task is attached
to a new cgroup.

Fixes: bd1060a1d671("sock, cgroup: add sock->sk_cgroup")
Reported-by: Yang Yingliang <yangyingliang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Tested-by: Yang Yingliang <yangyingliang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Zefan Li <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---

forgot to rebase to the latest kernel.

---
 net/core/netprio_cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 8881dd9..9bd4cab 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -236,6 +236,8 @@ static void net_prio_attach(struct cgroup_taskset *tset)
 	struct task_struct *p;
 	struct cgroup_subsys_state *css;
 
+	cgroup_sk_alloc_disable();
+
 	cgroup_taskset_for_each(p, css, tset) {
 		void *v = (void *)(unsigned long)css->id;
 
-- 
2.7.4

  reply	other threads:[~2020-05-09  3:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-09  3:19 [PATCH] netprio_cgroup: Fix unlimited memory leak of v2 cgroup Zefan Li
2020-05-09  3:19 ` Zefan Li
2020-05-09  3:32 ` Zefan Li [this message]
2020-05-09  3:32   ` [PATCH v2] netprio_cgroup: Fix unlimited memory leak of v2 cgroups Zefan Li
2020-05-09  5:58   ` Jakub Kicinski
2020-05-09  5:58     ` Jakub Kicinski
2020-05-10  4:02     ` Jakub Kicinski
2020-05-10  4:02       ` Jakub Kicinski
2020-05-21 21:14       ` John Fastabend
2020-05-22  3:26         ` Zefan Li
2020-05-22  3:26           ` Zefan Li
2020-05-09 22:57   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2fcd921d-8f42-9d33-951c-899d0bbdd92d@huawei.com \
    --to=lizefan@huawei.com \
    --cc=cgroups@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=guofan5@huawei.com \
    --cc=huawei.libin@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=wangkefeng.wang@huawei.com \
    --cc=yangyingliang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.