* [PATCH] memcg: net: do not associate sock with unrelated memcg
@ 2020-02-14  7:12 ` Shakeel Butt
  0 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2020-02-14  7:12 UTC (permalink / raw)
  To: Johannes Weiner, Eric Dumazet
  Cc: Greg Thelen, Michal Hocko, Vladimir Davydov, Andrew Morton,
	cgroups, linux-mm, Roman Gushchin, linux-kernel, Shakeel Butt

While testing network memory accounting in our setup, we noticed
inconsistent network memory usage: the network usage of unrelated
memcgs often correlated with the test workload. On further inspection,
it appears that mem_cgroup_sk_alloc() is broken in irq context,
especially for cgroup v1.

mem_cgroup_sk_alloc() can be called in irq context and implicitly
assumes that this only happens from sk_clone_lock(), where the source
sock object already has an associated memcg. However, in cgroup v1,
where network memory accounting is opt-in, the source sock may not be
associated with any memcg, and the newly cloned sock can then get
associated with the memcg of whatever unrelated task was interrupted.

Cgroup v2 can also suffer if the source sock object was created by a
process in the root memcg, or if sk_alloc() is called in irq context.
The fix is to simply do nothing in interrupt context.

Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..f500da82bfe8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6697,6 +6697,10 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 		return;
 	}
 
+	/* Do not associate the sock with unrelated interrupted task's memcg. */
+	if (in_interrupt())
+		return;
+
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
 	if (memcg == root_mem_cgroup)
-- 
2.25.0.265.gbab2e86ba0-goog
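
For context, the hunk above sits inside mem_cgroup_sk_alloc(). A
reconstructed sketch of the patched function follows (approximated from
the hunk plus the v5.5-era mm/memcontrol.c it targets; the parts
outside the quoted context lines, such as the sk_memcg clone branch and
the exact helper names, are assumptions, not part of this patch):

void mem_cgroup_sk_alloc(struct sock *sk)
{
	struct mem_cgroup *memcg;

	/*
	 * Socket cloning can land here with sk_memcg already set: the
	 * clone takes a reference on the source sock's memcg and
	 * returns, so "current" is never consulted on that path.
	 */
	if (sk->sk_memcg) {
		css_get(&sk->sk_memcg->css);
		return;
	}

	/* Do not associate the sock with unrelated interrupted task's memcg. */
	if (in_interrupt())
		return;

	rcu_read_lock();
	memcg = mem_cgroup_from_task(current);
	if (memcg == root_mem_cgroup)
		goto out;
	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcpmem_active)
		goto out;
	if (css_tryget(&memcg->css))
		sk->sk_memcg = memcg;
out:
	rcu_read_unlock();
}

The guard matters because the fall-through path charges the socket to
mem_cgroup_from_task(current), and in softirq context "current" is
whatever task happened to be interrupted on that CPU.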


* Re: [PATCH] memcg: net: do not associate sock with unrelated memcg
@ 2020-02-14 21:47   ` Roman Gushchin
  0 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2020-02-14 21:47 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Eric Dumazet, Greg Thelen, Michal Hocko,
	Vladimir Davydov, Andrew Morton, cgroups, linux-mm, linux-kernel

Hello, Shakeel!

On Thu, Feb 13, 2020 at 11:12:33PM -0800, Shakeel Butt wrote:
> While testing network memory accounting in our setup, we noticed
> inconsistent network memory usage: the network usage of unrelated
> memcgs often correlated with the test workload. On further inspection,
> it appears that mem_cgroup_sk_alloc() is broken in irq context,
> especially for cgroup v1.

A great catch!

> 
> mem_cgroup_sk_alloc() can be called in irq context and implicitly
> assumes that this only happens from sk_clone_lock(), where the source
> sock object already has an associated memcg. However, in cgroup v1,
> where network memory accounting is opt-in, the source sock may not be
> associated with any memcg, and the newly cloned sock can then get
> associated with the memcg of whatever unrelated task was interrupted.
> 
> Cgroup v2 can also suffer if the source sock object was created by a
> process in the root memcg, or if sk_alloc() is called in irq context.

Do you mind sharing a call trace?

Also, shouldn't cgroup_sk_alloc() be changed in a similar way?

Thanks!

Roman

* Re: [PATCH] memcg: net: do not associate sock with unrelated memcg
  2020-02-14 21:47   ` Roman Gushchin
@ 2020-02-14 21:52     ` Shakeel Butt
  0 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2020-02-14 21:52 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Eric Dumazet, Greg Thelen, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Cgroups, Linux MM, LKML

On Fri, Feb 14, 2020 at 1:47 PM Roman Gushchin <guro@fb.com> wrote:
>
> Hello, Shakeel!
>
> On Thu, Feb 13, 2020 at 11:12:33PM -0800, Shakeel Butt wrote:
> > While testing network memory accounting in our setup, we noticed
> > inconsistent network memory usage: the network usage of unrelated
> > memcgs often correlated with the test workload. On further inspection,
> > it appears that mem_cgroup_sk_alloc() is broken in irq context,
> > especially for cgroup v1.
>
> A great catch!
>
> >
> > mem_cgroup_sk_alloc() can be called in irq context and implicitly
> > assumes that this only happens from sk_clone_lock(), where the source
> > sock object already has an associated memcg. However, in cgroup v1,
> > where network memory accounting is opt-in, the source sock may not be
> > associated with any memcg, and the newly cloned sock can then get
> > associated with the memcg of whatever unrelated task was interrupted.
> >
> > Cgroup v2 can also suffer if the source sock object was created by a
> > process in the root memcg, or if sk_alloc() is called in irq context.
>
> Do you mind sharing a call trace?
>

Sure, see below. I added a dump_stack() in mem_cgroup_sk_alloc().

[  647.255327] CPU: 68 PID: 15859 Comm: ssh Tainted: G           O      5.6.0-smp-DEV #1
[  647.255328] Hardware name: ...
[  647.255328] Call Trace:
[  647.255329]  <IRQ>
[  647.255333]  dump_stack+0x57/0x75
[  647.255336]  mem_cgroup_sk_alloc+0xe9/0xf0
[  647.255337]  sk_clone_lock+0x2a7/0x420
[  647.255339]  inet_csk_clone_lock+0x1b/0x110
[  647.255340]  tcp_create_openreq_child+0x23/0x3b0
[  647.255342]  tcp_v6_syn_recv_sock+0x88/0x730
[  647.255343]  tcp_check_req+0x429/0x560
[  647.255345]  tcp_v6_rcv+0x72d/0xa40
[  647.255347]  ip6_protocol_deliver_rcu+0xc9/0x400
[  647.255348]  ip6_input+0x44/0xd0
[  647.255349]  ? ip6_protocol_deliver_rcu+0x400/0x400
[  647.255350]  ip6_rcv_finish+0x71/0x80
[  647.255351]  ipv6_rcv+0x5b/0xe0
[  647.255352]  ? ip6_sublist_rcv+0x2e0/0x2e0
[  647.255354]  process_backlog+0x108/0x1e0
[  647.255355]  net_rx_action+0x26b/0x460
[  647.255357]  __do_softirq+0x104/0x2a6
[  647.255358]  do_softirq_own_stack+0x2a/0x40
[  647.255359]  </IRQ>
[  647.255361]  do_softirq.part.19+0x40/0x50
[  647.255362]  __local_bh_enable_ip+0x51/0x60
[  647.255363]  ip6_finish_output2+0x23d/0x520
[  647.255365]  ? ip6table_mangle_hook+0x55/0x160
[  647.255366]  __ip6_finish_output+0xa1/0x100
[  647.255367]  ip6_finish_output+0x30/0xd0
[  647.255368]  ip6_output+0x73/0x120
[  647.255369]  ? __ip6_finish_output+0x100/0x100
[  647.255370]  ip6_xmit+0x2e3/0x600
[  647.255372]  ? ipv6_anycast_cleanup+0x50/0x50
[  647.255373]  ? inet6_csk_route_socket+0x136/0x1e0
[  647.255374]  ? skb_free_head+0x1e/0x30
[  647.255375]  inet6_csk_xmit+0x95/0xf0
[  647.255377]  __tcp_transmit_skb+0x5b4/0xb20
[  647.255378]  __tcp_send_ack.part.60+0xa3/0x110
[  647.255379]  tcp_send_ack+0x1d/0x20
[  647.255380]  tcp_rcv_state_process+0xe64/0xe80
[  647.255381]  ? tcp_v6_connect+0x5d1/0x5f0
[  647.255383]  tcp_v6_do_rcv+0x1b1/0x3f0
[  647.255384]  ? tcp_v6_do_rcv+0x1b1/0x3f0
[  647.255385]  __release_sock+0x7f/0xd0
[  647.255386]  release_sock+0x30/0xa0
[  647.255388]  __inet_stream_connect+0x1c3/0x3b0
[  647.255390]  ? prepare_to_wait+0xb0/0xb0
[  647.255391]  inet_stream_connect+0x3b/0x60
[  647.255394]  __sys_connect+0x101/0x120
[  647.255395]  ? __sys_getsockopt+0x11b/0x140
[  647.255397]  __x64_sys_connect+0x1a/0x20
[  647.255398]  do_syscall_64+0x51/0x200
[  647.255399]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  647.255401] RIP: 0033:0x7f45464fcd50

> Also, shouldn't cgroup_sk_alloc() be changed in a similar way?
>

I will check cgroup_sk_alloc() too.

Thanks,
Shakeel

* Re: [PATCH] memcg: net: do not associate sock with unrelated memcg
  2020-02-14 21:52     ` Shakeel Butt
@ 2020-02-14 22:09       ` Shakeel Butt
  0 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2020-02-14 22:09 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Eric Dumazet, Greg Thelen, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Cgroups, Linux MM, LKML

On Fri, Feb 14, 2020 at 1:52 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Fri, Feb 14, 2020 at 1:47 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > Hello, Shakeel!
> >
> > On Thu, Feb 13, 2020 at 11:12:33PM -0800, Shakeel Butt wrote:
> > > While testing network memory accounting in our setup, we noticed
> > > inconsistent network memory usage: the network usage of unrelated
> > > memcgs often correlated with the test workload. On further inspection,
> > > it appears that mem_cgroup_sk_alloc() is broken in irq context,
> > > especially for cgroup v1.
> >
> > A great catch!
> >
> > >
> > > mem_cgroup_sk_alloc() can be called in irq context and implicitly
> > > assumes that this only happens from sk_clone_lock(), where the source
> > > sock object already has an associated memcg. However, in cgroup v1,
> > > where network memory accounting is opt-in, the source sock may not be
> > > associated with any memcg, and the newly cloned sock can then get
> > > associated with the memcg of whatever unrelated task was interrupted.
> > >
> > > Cgroup v2 can also suffer if the source sock object was created by a
> > > process in the root memcg, or if sk_alloc() is called in irq context.
> >
> > Do you mind sharing a call trace?
> >
>
> Sure, see below. I added a dump_stack() in mem_cgroup_sk_alloc().
>
> [  647.255327] CPU: 68 PID: 15859 Comm: ssh Tainted: G           O
>  5.6.0-smp-DEV #1
> [  647.255328] Hardware name: ...
> [  647.255328] Call Trace:
> [  647.255329]  <IRQ>
> [  647.255333]  dump_stack+0x57/0x75
> [  647.255336]  mem_cgroup_sk_alloc+0xe9/0xf0
> [  647.255337]  sk_clone_lock+0x2a7/0x420
> [  647.255339]  inet_csk_clone_lock+0x1b/0x110
> [  647.255340]  tcp_create_openreq_child+0x23/0x3b0
> [  647.255342]  tcp_v6_syn_recv_sock+0x88/0x730
> [  647.255343]  tcp_check_req+0x429/0x560
> [  647.255345]  tcp_v6_rcv+0x72d/0xa40
> [  647.255347]  ip6_protocol_deliver_rcu+0xc9/0x400
> [  647.255348]  ip6_input+0x44/0xd0
> [  647.255349]  ? ip6_protocol_deliver_rcu+0x400/0x400
> [  647.255350]  ip6_rcv_finish+0x71/0x80
> [  647.255351]  ipv6_rcv+0x5b/0xe0
> [  647.255352]  ? ip6_sublist_rcv+0x2e0/0x2e0
> [  647.255354]  process_backlog+0x108/0x1e0
> [  647.255355]  net_rx_action+0x26b/0x460
> [  647.255357]  __do_softirq+0x104/0x2a6
> [  647.255358]  do_softirq_own_stack+0x2a/0x40
> [  647.255359]  </IRQ>
> [  647.255361]  do_softirq.part.19+0x40/0x50
> [  647.255362]  __local_bh_enable_ip+0x51/0x60
> [  647.255363]  ip6_finish_output2+0x23d/0x520
> [  647.255365]  ? ip6table_mangle_hook+0x55/0x160
> [  647.255366]  __ip6_finish_output+0xa1/0x100
> [  647.255367]  ip6_finish_output+0x30/0xd0
> [  647.255368]  ip6_output+0x73/0x120
> [  647.255369]  ? __ip6_finish_output+0x100/0x100
> [  647.255370]  ip6_xmit+0x2e3/0x600
> [  647.255372]  ? ipv6_anycast_cleanup+0x50/0x50
> [  647.255373]  ? inet6_csk_route_socket+0x136/0x1e0
> [  647.255374]  ? skb_free_head+0x1e/0x30
> [  647.255375]  inet6_csk_xmit+0x95/0xf0
> [  647.255377]  __tcp_transmit_skb+0x5b4/0xb20
> [  647.255378]  __tcp_send_ack.part.60+0xa3/0x110
> [  647.255379]  tcp_send_ack+0x1d/0x20
> [  647.255380]  tcp_rcv_state_process+0xe64/0xe80
> [  647.255381]  ? tcp_v6_connect+0x5d1/0x5f0
> [  647.255383]  tcp_v6_do_rcv+0x1b1/0x3f0
> [  647.255384]  ? tcp_v6_do_rcv+0x1b1/0x3f0
> [  647.255385]  __release_sock+0x7f/0xd0
> [  647.255386]  release_sock+0x30/0xa0
> [  647.255388]  __inet_stream_connect+0x1c3/0x3b0
> [  647.255390]  ? prepare_to_wait+0xb0/0xb0
> [  647.255391]  inet_stream_connect+0x3b/0x60
> [  647.255394]  __sys_connect+0x101/0x120
> [  647.255395]  ? __sys_getsockopt+0x11b/0x140
> [  647.255397]  __x64_sys_connect+0x1a/0x20
> [  647.255398]  do_syscall_64+0x51/0x200
> [  647.255399]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  647.255401] RIP: 0033:0x7f45464fcd50
>
> > Also, shouldn't cgroup_sk_alloc() be changed in a similar way?
> >
>
> I will check cgroup_sk_alloc() too.
>

Yes, cgroup_sk_alloc() should be changed similarly too.
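
(A minimal sketch of what such a guard could look like in
cgroup_sk_alloc() in kernel/cgroup/cgroup.c; this is an illustration of
the suggestion above, not a posted patch. As in the memcg case, it
would bail out before the function falls back to the interrupted
task's css_set via task_css_set(current):)

	/* Do not associate the sock with unrelated interrupted task's cgroup. */
	if (in_interrupt())
		return;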
