All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Yingliang <yangyingliang@huawei.com>
To: <tj@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <cgroups@vger.kernel.org>,
	<netdev@vger.kernel.org>,
	"Libin (Huawei)" <huawei.libin@huawei.com>,
	<yangyingliang@huawei.com>, <guofan5@huawei.com>,
	<wangkefeng.wang@huawei.com>
Subject: cgroup pointed by sock is leaked on mode switch
Date: Sat, 2 May 2020 18:27:21 +0800	[thread overview]
Message-ID: <03dab6ab-0ffe-3cae-193f-a7f84e9b14c5@huawei.com> (raw)

Hi,

I got an oom panic because cgroup is leaked.

Here is the steps :
   - run a docker with --cap-add sys_admin parameter and the systemd 
process in the docker uses both cgroupv1 and cgroupv2
   - ssh/exit from host to docker repeately

I find the number nr_dying_descendants is increasing:
linux-dVpNUK:~ # find /sys/fs/cgroup/ -name cgroup.stat -exec grep 
'^nr_dying_descendants [^0]'  {} +
/sys/fs/cgroup/unified/cgroup.stat:nr_dying_descendants 80
/sys/fs/cgroup/unified/system.slice/cgroup.stat:nr_dying_descendants 1
/sys/fs/cgroup/unified/system.slice/system-hostos.slice/cgroup.stat:nr_dying_descendants 
1
/sys/fs/cgroup/unified/lxc/cgroup.stat:nr_dying_descendants 79
/sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/cgroup.stat:nr_dying_descendants 
78
/sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/system.slice/cgroup.stat:nr_dying_descendants 
78


The situation is as same as the commit bd1060a1d671 ("sock, cgroup: add 
sock->sk_cgroup") describes.
"On mode switch, cgroup references which are already being pointed to by 
socks may be leaked."

Do we have a fix for this leak now ?

Or how  about fix this by record the cgrp2 pointer, then put it when sk 
is freeing like this:

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index d9bd671105e2..cbb1e76ea305 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -770,6 +770,7 @@ struct sock_cgroup_data {
  #endif
          u64        val;
      };
+    struct cgroup *cgrpv2;
  };

  /*
@@ -802,6 +803,7 @@ static inline void sock_cgroup_set_prioidx(struct 
sock_cgroup_data *skcd,
          return;

      if (!(skcd_buf.is_data & 1)) {
+        WRITE_ONCE(skcd->cgrpv2, skcd_buf.val);
          skcd_buf.val = 0;
          skcd_buf.is_data = 1;
      }
@@ -819,6 +821,7 @@ static inline void sock_cgroup_set_classid(struct 
sock_cgroup_data *skcd,
          return;

      if (!(skcd_buf.is_data & 1)) {
+        WRITE_ONCE(skcd->cgrpv2, skcd_buf.val);
          skcd_buf.val = 0;
          skcd_buf.is_data = 1;
      }
diff --git a/net/core/sock.c b/net/core/sock.c
index a0dda2bf9d7c..7c761ef2d32e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1520,6 +1520,10 @@ static void sk_prot_free(struct proto *prot, 
struct sock *sk)
      slab = prot->slab;

      cgroup_sk_free(&sk->sk_cgrp_data);
+    if (sk->sk_cgrp_data.cgrpv2) {
+        cgroup_put(sk->sk_cgrp_data.cgrpv2);
+        sk->sk_cgrp_data.cgrpv2 = NULL;
+    }
      mem_cgroup_sk_free(sk);
      security_sk_free(sk);
      if (slab != NULL)


Thanks,
Yang


WARNING: multiple messages have this Message-ID (diff)
From: Yang Yingliang <yangyingliang@huawei.com>
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	netdev@vger.kernel.org,
	"Libin (Huawei)" <huawei.libin@huawei.com>,
	yangyingliang@huawei.com, guofan5@huawei.com,
	wangkefeng.wang@huawei.com
Subject: cgroup pointed by sock is leaked on mode switch
Date: Sat, 2 May 2020 18:27:21 +0800	[thread overview]
Message-ID: <03dab6ab-0ffe-3cae-193f-a7f84e9b14c5@huawei.com> (raw)

Hi,

I got an oom panic because cgroup is leaked.

Here is the steps :
   - run a docker with --cap-add sys_admin parameter and the systemd 
process in the docker uses both cgroupv1 and cgroupv2
   - ssh/exit from host to docker repeately

I find the number nr_dying_descendants is increasing:
linux-dVpNUK:~ # find /sys/fs/cgroup/ -name cgroup.stat -exec grep 
'^nr_dying_descendants [^0]'  {} +
/sys/fs/cgroup/unified/cgroup.stat:nr_dying_descendants 80
/sys/fs/cgroup/unified/system.slice/cgroup.stat:nr_dying_descendants 1
/sys/fs/cgroup/unified/system.slice/system-hostos.slice/cgroup.stat:nr_dying_descendants 
1
/sys/fs/cgroup/unified/lxc/cgroup.stat:nr_dying_descendants 79
/sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/cgroup.stat:nr_dying_descendants 
78
/sys/fs/cgroup/unified/lxc/5f1fdb8c54fa40c3e599613dab6e4815058b76ebada8a27bc1fe80c0d4801764/system.slice/cgroup.stat:nr_dying_descendants 
78


The situation is as same as the commit bd1060a1d671 ("sock, cgroup: add 
sock->sk_cgroup") describes.
"On mode switch, cgroup references which are already being pointed to by 
socks may be leaked."

Do we have a fix for this leak now ?

Or how  about fix this by record the cgrp2 pointer, then put it when sk 
is freeing like this:

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index d9bd671105e2..cbb1e76ea305 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -770,6 +770,7 @@ struct sock_cgroup_data {
  #endif
          u64        val;
      };
+    struct cgroup *cgrpv2;
  };

  /*
@@ -802,6 +803,7 @@ static inline void sock_cgroup_set_prioidx(struct 
sock_cgroup_data *skcd,
          return;

      if (!(skcd_buf.is_data & 1)) {
+        WRITE_ONCE(skcd->cgrpv2, skcd_buf.val);
          skcd_buf.val = 0;
          skcd_buf.is_data = 1;
      }
@@ -819,6 +821,7 @@ static inline void sock_cgroup_set_classid(struct 
sock_cgroup_data *skcd,
          return;

      if (!(skcd_buf.is_data & 1)) {
+        WRITE_ONCE(skcd->cgrpv2, skcd_buf.val);
          skcd_buf.val = 0;
          skcd_buf.is_data = 1;
      }
diff --git a/net/core/sock.c b/net/core/sock.c
index a0dda2bf9d7c..7c761ef2d32e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1520,6 +1520,10 @@ static void sk_prot_free(struct proto *prot, 
struct sock *sk)
      slab = prot->slab;

      cgroup_sk_free(&sk->sk_cgrp_data);
+    if (sk->sk_cgrp_data.cgrpv2) {
+        cgroup_put(sk->sk_cgrp_data.cgrpv2);
+        sk->sk_cgrp_data.cgrpv2 = NULL;
+    }
      mem_cgroup_sk_free(sk);
      security_sk_free(sk);
      if (slab != NULL)


Thanks,
Yang


             reply	other threads:[~2020-05-02 10:27 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-02 10:27 Yang Yingliang [this message]
2020-05-02 10:27 ` cgroup pointed by sock is leaked on mode switch Yang Yingliang
2020-05-05 16:06 ` Tejun Heo
2020-05-06  1:50   ` Yang Yingliang
2020-05-06  1:50     ` Yang Yingliang
2020-05-06  2:16     ` Zefan Li
2020-05-06  2:16       ` Zefan Li
2020-05-06  7:51       ` Zefan Li
2020-05-06  7:51         ` Zefan Li
2020-05-09  2:31         ` Yang Yingliang
2020-05-09  2:31           ` Yang Yingliang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=03dab6ab-0ffe-3cae-193f-a7f84e9b14c5@huawei.com \
    --to=yangyingliang@huawei.com \
    --cc=cgroups@vger.kernel.org \
    --cc=guofan5@huawei.com \
    --cc=huawei.libin@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.