linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 2/2] net: memcg: late association of sock to memcg
       [not found] <20200310051606.33121-1-shakeelb@google.com>
@ 2020-03-10  5:16 ` Shakeel Butt
  2020-03-10 15:53   ` Eric Dumazet
                     ` (3 more replies)
  2020-03-10 22:34 ` [PATCH v4 1/2] cgroup: memcg: net: do not associate sock with unrelated cgroup David Miller
  1 sibling, 4 replies; 7+ messages in thread
From: Shakeel Butt @ 2020-03-10  5:16 UTC (permalink / raw)
  To: Eric Dumazet, Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Greg Thelen, Andrew Morton,
	David S . Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-mm, cgroups, linux-kernel, Shakeel Butt

If a TCP socket is allocated in IRQ context or cloned from unassociated
(i.e. not associated to a memcg) in IRQ context then it will remain
unassociated for its whole life. Almost half of the TCPs created on the
system are created in IRQ context, so, memory used by such sockets will
not be accounted by the memcg.

This issue is more widespread in cgroup v1 where network memory
accounting is opt-in but it can happen in cgroup v2 if the source socket
for the cloning was created in root memcg.

To fix the issue, just do the association of the sockets at the accept()
time in the process context and then force charge the memory buffer
already used and reserved by the socket.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
Changes since v3:
- Moved the memcg association completely at accept time.

Changes since v2:
- Additional check for charging.
- Release the sock after charging.

Changes since v1:
- added sk->sk_rmem_alloc to initial charging.
- added synchronization to get memory usage and set sk_memcg race-free.

 mm/memcontrol.c                 | 14 --------------
 net/core/sock.c                 |  5 ++++-
 net/ipv4/inet_connection_sock.c | 20 ++++++++++++++++++++
 3 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 06a889b0538b..351603c6c1c9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6737,20 +6737,6 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 	if (!mem_cgroup_sockets_enabled)
 		return;
 
-	/*
-	 * Socket cloning can throw us here with sk_memcg already
-	 * filled. It won't however, necessarily happen from
-	 * process context. So the test for root memcg given
-	 * the current task's memcg won't help us in this case.
-	 *
-	 * Respecting the original socket's memcg is a better
-	 * decision in this case.
-	 */
-	if (sk->sk_memcg) {
-		css_get(&sk->sk_memcg->css);
-		return;
-	}
-
 	/* Do not associate the sock with unrelated interrupted task's memcg. */
 	if (in_interrupt())
 		return;
diff --git a/net/core/sock.c b/net/core/sock.c
index e4af4dbc1c9e..0fc8937a7ff4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1832,7 +1832,10 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		atomic_set(&newsk->sk_zckey, 0);
 
 		sock_reset_flag(newsk, SOCK_DONE);
-		mem_cgroup_sk_alloc(newsk);
+
+		/* sk->sk_memcg will be populated at accept() time */
+		newsk->sk_memcg = NULL;
+
 		cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
 		rcu_read_lock();
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a4db79b1b643..65a3b2565102 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -482,6 +482,26 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
 		}
 		spin_unlock_bh(&queue->fastopenq.lock);
 	}
+
+	if (mem_cgroup_sockets_enabled) {
+		int amt;
+
+		/* atomically get the memory usage, set and charge the
+		 * sk->sk_memcg.
+		 */
+		lock_sock(newsk);
+
+		/* The sk has not been accepted yet, no need to look at
+		 * sk->sk_wmem_queued.
+		 */
+		amt = sk_mem_pages(newsk->sk_forward_alloc +
+				   atomic_read(&sk->sk_rmem_alloc));
+		mem_cgroup_sk_alloc(newsk);
+		if (newsk->sk_memcg && amt)
+			mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
+
+		release_sock(newsk);
+	}
 out:
 	release_sock(sk);
 	if (req)
-- 
2.25.1.481.gfbce0eb801-goog



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] net: memcg: late association of sock to memcg
  2020-03-10  5:16 ` [PATCH v4 2/2] net: memcg: late association of sock to memcg Shakeel Butt
@ 2020-03-10 15:53   ` Eric Dumazet
  2020-03-10 22:34   ` David Miller
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2020-03-10 15:53 UTC (permalink / raw)
  To: Shakeel Butt, Eric Dumazet, Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Greg Thelen, Andrew Morton,
	David S . Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-mm, cgroups, linux-kernel



On 3/9/20 10:16 PM, Shakeel Butt wrote:
> If a TCP socket is allocated in IRQ context or cloned from unassociated
> (i.e. not associated to a memcg) in IRQ context then it will remain
> unassociated for its whole life. Almost half of the TCPs created on the
> system are created in IRQ context, so, memory used by such sockets will
> not be accounted by the memcg.
> 
> This issue is more widespread in cgroup v1 where network memory
> accounting is opt-in but it can happen in cgroup v2 if the source socket
> for the cloning was created in root memcg.
> 
> To fix the issue, just do the association of the sockets at the accept()
> time in the process context and then force charge the memory buffer
> already used and reserved by the socket.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] cgroup: memcg: net: do not associate sock with unrelated cgroup
       [not found] <20200310051606.33121-1-shakeelb@google.com>
  2020-03-10  5:16 ` [PATCH v4 2/2] net: memcg: late association of sock to memcg Shakeel Butt
@ 2020-03-10 22:34 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2020-03-10 22:34 UTC (permalink / raw)
  To: shakeelb
  Cc: edumazet, guro, hannes, mhocko, gthelen, akpm, kuznet, yoshfuji,
	netdev, linux-mm, cgroups, linux-kernel

From: Shakeel Butt <shakeelb@google.com>
Date: Mon,  9 Mar 2020 22:16:05 -0700

> We are testing network memory accounting in our setup and noticed
> inconsistent network memory usage and often unrelated cgroups network
> usage correlates with testing workload. On further inspection, it
> seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
> irq context specially for cgroup v1.
> 
> mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
> and kind of assumes that this can only happen from sk_clone_lock()
> and the source sock object has already associated cgroup. However in
> cgroup v1, where network memory accounting is opt-in, the source sock
> can be unassociated with any cgroup and the new cloned sock can get
> associated with unrelated interrupted cgroup.
> 
> Cgroup v2 can also suffer if the source sock object was created by
> process in the root cgroup or if sk_alloc() is called in irq context.
> The fix is to just do nothing in interrupt.
> 
> WARNING: Please note that about half of the TCP sockets are allocated
> from the IRQ context, so, memory used by such sockets will not be
> accouted by the memcg.
> 
> The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
 ...
> The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
> Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
> Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Reviewed-by: Roman Gushchin <guro@fb.com>

Applied and queued up for -stable.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] net: memcg: late association of sock to memcg
  2020-03-10  5:16 ` [PATCH v4 2/2] net: memcg: late association of sock to memcg Shakeel Butt
  2020-03-10 15:53   ` Eric Dumazet
@ 2020-03-10 22:34   ` David Miller
  2020-03-10 22:38   ` Roman Gushchin
  2020-03-12 14:03   ` Qian Cai
  3 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2020-03-10 22:34 UTC (permalink / raw)
  To: shakeelb
  Cc: edumazet, guro, hannes, mhocko, gthelen, akpm, kuznet, yoshfuji,
	netdev, linux-mm, cgroups, linux-kernel

From: Shakeel Butt <shakeelb@google.com>
Date: Mon,  9 Mar 2020 22:16:06 -0700

> If a TCP socket is allocated in IRQ context or cloned from unassociated
> (i.e. not associated to a memcg) in IRQ context then it will remain
> unassociated for its whole life. Almost half of the TCPs created on the
> system are created in IRQ context, so, memory used by such sockets will
> not be accounted by the memcg.
> 
> This issue is more widespread in cgroup v1 where network memory
> accounting is opt-in but it can happen in cgroup v2 if the source socket
> for the cloning was created in root memcg.
> 
> To fix the issue, just do the association of the sockets at the accept()
> time in the process context and then force charge the memory buffer
> already used and reserved by the socket.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Applied and queued up for -stable.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] net: memcg: late association of sock to memcg
  2020-03-10  5:16 ` [PATCH v4 2/2] net: memcg: late association of sock to memcg Shakeel Butt
  2020-03-10 15:53   ` Eric Dumazet
  2020-03-10 22:34   ` David Miller
@ 2020-03-10 22:38   ` Roman Gushchin
  2020-03-12 14:03   ` Qian Cai
  3 siblings, 0 replies; 7+ messages in thread
From: Roman Gushchin @ 2020-03-10 22:38 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Eric Dumazet, Johannes Weiner, Michal Hocko, Greg Thelen,
	Andrew Morton, David S . Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, netdev, linux-mm, cgroups, linux-kernel

On Mon, Mar 09, 2020 at 10:16:06PM -0700, Shakeel Butt wrote:
> If a TCP socket is allocated in IRQ context or cloned from unassociated
> (i.e. not associated to a memcg) in IRQ context then it will remain
> unassociated for its whole life. Almost half of the TCPs created on the
> system are created in IRQ context, so, memory used by such sockets will
> not be accounted by the memcg.
> 
> This issue is more widespread in cgroup v1 where network memory
> accounting is opt-in but it can happen in cgroup v2 if the source socket
> for the cloning was created in root memcg.
> 
> To fix the issue, just do the association of the sockets at the accept()
> time in the process context and then force charge the memory buffer
> already used and reserved by the socket.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Reviewed-by: Roman Gushchin <guro@fb.com>

Thank you!


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] net: memcg: late association of sock to memcg
  2020-03-10  5:16 ` [PATCH v4 2/2] net: memcg: late association of sock to memcg Shakeel Butt
                     ` (2 preceding siblings ...)
  2020-03-10 22:38   ` Roman Gushchin
@ 2020-03-12 14:03   ` Qian Cai
  2020-03-12 14:05     ` Shakeel Butt
  3 siblings, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-03-12 14:03 UTC (permalink / raw)
  To: Shakeel Butt, Eric Dumazet, Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Greg Thelen, Andrew Morton,
	David S . Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-mm, cgroups, linux-kernel

On Mon, 2020-03-09 at 22:16 -0700, Shakeel Butt wrote:
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index a4db79b1b643..65a3b2565102 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -482,6 +482,26 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
>  		}
>  		spin_unlock_bh(&queue->fastopenq.lock);
>  	}
> +
> +	if (mem_cgroup_sockets_enabled) {
> +		int amt;
> +
> +		/* atomically get the memory usage, set and charge the
> +		 * sk->sk_memcg.
> +		 */
> +		lock_sock(newsk);

Here we have a deadlock,

[  362.620977][ T4106] WARNING: possible recursive locking detected
[  362.626983][ T4106] 5.6.0-rc5-next-20200312+ #5 Tainted: G             L   
[  362.633941][ T4106] --------------------------------------------
[  362.639944][ T4106] sshd/4106 is trying to acquire lock:
[  362.645251][ T4106] 7bff008a2eae6330 (sk_lock-AF_INET){+.+.}, at:
inet_csk_accept+0x370/0x45c
inet_csk_accept at net/ipv4/inet_connection_sock.c:497
[  362.653791][ T4106] 
[  362.653791][ T4106] but task is already holding lock:
[  362.661007][ T4106] c0ff008a2eae9430 (sk_lock-AF_INET){+.+.}, at:
inet_csk_accept+0x48/0x45c
inet_csk_accept at net/ipv4/inet_connection_sock.c:451
[  362.669452][ T4106] 
[  362.669452][ T4106] other info that might help us debug this:
[  362.677364][ T4106]  Possible unsafe locking scenario:
[  362.677364][ T4106] 
[  362.684666][ T4106]        CPU0
[  362.687801][ T4106]        ----
[  362.690937][ T4106]   lock(sk_lock-AF_INET);
[  362.695204][ T4106]   lock(sk_lock-AF_INET);
[  362.699472][ T4106] 
[  362.699472][ T4106]  *** DEADLOCK ***
[  362.699472][ T4106] 
[  362.707469][ T4106]  May be due to missing lock nesting notation
[  362.707469][ T4106] 
[  362.715643][ T4106] 1 lock held by sshd/4106:
[  362.719993][ T4106]  #0: c0ff008a2eae9430 (sk_lock-AF_INET){+.+.}, at:
inet_csk_accept+0x48/0x45c
[  362.728874][ T4106] 
[  362.728874][ T4106] stack backtrace:
[  362.734622][ T4106] CPU: 22 PID: 4106 Comm: sshd Tainted:
G             L    5.6.0-rc5-next-20200312+ #5
[  362.744096][ T4106] Hardware name: HPE Apollo
70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
[  362.754525][ T4106] Call trace:
[  362.757667][ T4106]  dump_backtrace+0x0/0x2c8
[  362.762022][ T4106]  show_stack+0x20/0x2c
[  362.766032][ T4106]  dump_stack+0xe8/0x150
[  362.770128][ T4106]  validate_chain+0x2f08/0x35e0
[  362.774830][ T4106]  __lock_acquire+0x868/0xc2c
[  362.779358][ T4106]  lock_acquire+0x320/0x360
[  362.783715][ T4106]  lock_sock_nested+0x9c/0xd8
[  362.788243][ T4106]  inet_csk_accept+0x370/0x45c
[  362.792861][ T4106]  inet_accept+0x80/0x1cc
[  362.797045][ T4106]  __sys_accept4_file+0x1b0/0x2bc
[  362.801921][ T4106]  __arm64_sys_accept+0x74/0xc8
[  362.806625][ T4106]  do_el0_svc+0x170/0x240
[  362.810807][ T4106]  el0_sync_handler+0x150/0x250
[  362.815509][ T4106]  el0_sync+0x164/0x180


> +
> +		/* The sk has not been accepted yet, no need to look at
> +		 * sk->sk_wmem_queued.
> +		 */
> +		amt = sk_mem_pages(newsk->sk_forward_alloc +
> +				   atomic_read(&sk->sk_rmem_alloc));
> +		mem_cgroup_sk_alloc(newsk);
> +		if (newsk->sk_memcg && amt)
> +			mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
> +
> +		release_sock(newsk);
> +	}
>  out:
>  	release_sock(sk);
>  	if (req)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] net: memcg: late association of sock to memcg
  2020-03-12 14:03   ` Qian Cai
@ 2020-03-12 14:05     ` Shakeel Butt
  0 siblings, 0 replies; 7+ messages in thread
From: Shakeel Butt @ 2020-03-12 14:05 UTC (permalink / raw)
  To: Qian Cai
  Cc: Eric Dumazet, Roman Gushchin, Johannes Weiner, Michal Hocko,
	Greg Thelen, Andrew Morton, David S . Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, netdev, Linux MM, Cgroups, LKML

On Thu, Mar 12, 2020 at 7:03 AM Qian Cai <cai@lca.pw> wrote:
>
> On Mon, 2020-03-09 at 22:16 -0700, Shakeel Butt wrote:
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index a4db79b1b643..65a3b2565102 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -482,6 +482,26 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
> >               }
> >               spin_unlock_bh(&queue->fastopenq.lock);
> >       }
> > +
> > +     if (mem_cgroup_sockets_enabled) {
> > +             int amt;
> > +
> > +             /* atomically get the memory usage, set and charge the
> > +              * sk->sk_memcg.
> > +              */
> > +             lock_sock(newsk);
>
> Here we have a deadlock,

It's a missing lockdep annotation. Eric already has a patch in
progress to fix this and another typo in the original patch.

>
> [  362.620977][ T4106] WARNING: possible recursive locking detected
> [  362.626983][ T4106] 5.6.0-rc5-next-20200312+ #5 Tainted: G             L
> [  362.633941][ T4106] --------------------------------------------
> [  362.639944][ T4106] sshd/4106 is trying to acquire lock:
> [  362.645251][ T4106] 7bff008a2eae6330 (sk_lock-AF_INET){+.+.}, at:
> inet_csk_accept+0x370/0x45c
> inet_csk_accept at net/ipv4/inet_connection_sock.c:497
> [  362.653791][ T4106]
> [  362.653791][ T4106] but task is already holding lock:
> [  362.661007][ T4106] c0ff008a2eae9430 (sk_lock-AF_INET){+.+.}, at:
> inet_csk_accept+0x48/0x45c
> inet_csk_accept at net/ipv4/inet_connection_sock.c:451
> [  362.669452][ T4106]
> [  362.669452][ T4106] other info that might help us debug this:
> [  362.677364][ T4106]  Possible unsafe locking scenario:
> [  362.677364][ T4106]
> [  362.684666][ T4106]        CPU0
> [  362.687801][ T4106]        ----
> [  362.690937][ T4106]   lock(sk_lock-AF_INET);
> [  362.695204][ T4106]   lock(sk_lock-AF_INET);
> [  362.699472][ T4106]
> [  362.699472][ T4106]  *** DEADLOCK ***
> [  362.699472][ T4106]
> [  362.707469][ T4106]  May be due to missing lock nesting notation
> [  362.707469][ T4106]
> [  362.715643][ T4106] 1 lock held by sshd/4106:
> [  362.719993][ T4106]  #0: c0ff008a2eae9430 (sk_lock-AF_INET){+.+.}, at:
> inet_csk_accept+0x48/0x45c
> [  362.728874][ T4106]
> [  362.728874][ T4106] stack backtrace:
> [  362.734622][ T4106] CPU: 22 PID: 4106 Comm: sshd Tainted:
> G             L    5.6.0-rc5-next-20200312+ #5
> [  362.744096][ T4106] Hardware name: HPE Apollo
> 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
> [  362.754525][ T4106] Call trace:
> [  362.757667][ T4106]  dump_backtrace+0x0/0x2c8
> [  362.762022][ T4106]  show_stack+0x20/0x2c
> [  362.766032][ T4106]  dump_stack+0xe8/0x150
> [  362.770128][ T4106]  validate_chain+0x2f08/0x35e0
> [  362.774830][ T4106]  __lock_acquire+0x868/0xc2c
> [  362.779358][ T4106]  lock_acquire+0x320/0x360
> [  362.783715][ T4106]  lock_sock_nested+0x9c/0xd8
> [  362.788243][ T4106]  inet_csk_accept+0x370/0x45c
> [  362.792861][ T4106]  inet_accept+0x80/0x1cc
> [  362.797045][ T4106]  __sys_accept4_file+0x1b0/0x2bc
> [  362.801921][ T4106]  __arm64_sys_accept+0x74/0xc8
> [  362.806625][ T4106]  do_el0_svc+0x170/0x240
> [  362.810807][ T4106]  el0_sync_handler+0x150/0x250
> [  362.815509][ T4106]  el0_sync+0x164/0x180
>
>
> > +
> > +             /* The sk has not been accepted yet, no need to look at
> > +              * sk->sk_wmem_queued.
> > +              */
> > +             amt = sk_mem_pages(newsk->sk_forward_alloc +
> > +                                atomic_read(&sk->sk_rmem_alloc));
> > +             mem_cgroup_sk_alloc(newsk);
> > +             if (newsk->sk_memcg && amt)
> > +                     mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
> > +
> > +             release_sock(newsk);
> > +     }
> >  out:
> >       release_sock(sk);
> >       if (req)


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-12 14:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200310051606.33121-1-shakeelb@google.com>
2020-03-10  5:16 ` [PATCH v4 2/2] net: memcg: late association of sock to memcg Shakeel Butt
2020-03-10 15:53   ` Eric Dumazet
2020-03-10 22:34   ` David Miller
2020-03-10 22:38   ` Roman Gushchin
2020-03-12 14:03   ` Qian Cai
2020-03-12 14:05     ` Shakeel Butt
2020-03-10 22:34 ` [PATCH v4 1/2] cgroup: memcg: net: do not associate sock with unrelated cgroup David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).