* [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes.
@ 2012-03-29  7:01 KAMEZAWA Hiroyuki
  2012-03-29  7:03 ` [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup KAMEZAWA Hiroyuki
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  7:01 UTC (permalink / raw)
  To: Glauber Costa, netdev; +Cc: David Miller, Andrew Morton, kamezawa.hiroyu

I'm very sorry if you received this e-mail twice.
==
This series is 3 bugfixes for memcg's kmem.tcp memory controller.
Maybe this should go via network tree.
(CC akpm for noticing an ugly change in res_counter.)

All patches are generated against today's Linus git tree.

Brief description:

Patch  1/3 .... tcp memcontrol doesn't see memcg's use_hierarchy value. Fix it.

Patch  2/3 and 3/3 ....
                Because tcp memcontrol does no accounting when limit=RESOURCE_MAX,
                accounting leaks when the limit is changed. This can trigger the
                WARN_ON() in res_counter that checks usage >= 0.

Patch 2/3  .... don't call static_key_slow_dec(&memcg_socket_limit_enabled) until
                the accounted cgroup is destroyed.

Patch 3/3  .... add res_counter_uncharge_nowarn() to ignore leakage.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup
  2012-03-29  7:01 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
@ 2012-03-29  7:03 ` KAMEZAWA Hiroyuki
  2012-03-29  9:14   ` Glauber Costa
  2012-03-29  7:07 ` [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit KAMEZAWA Hiroyuki
  2012-03-29  7:10 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  7:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Glauber Costa, netdev, David Miller, Andrew Morton

 
Currently, the tcp memory control cgroup ignores memcg's use_hierarchy value
and always acts as if use_hierarchy=1. After this patch, tcp memcontrol
works as memcg was designed to.

Note:
   I know there is a discussion about removing use_hierarchy, but this is a bug now.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |    3 +++
 mm/memcontrol.c            |    5 +++++
 net/ipv4/tcp_memcontrol.c  |    2 +-
 3 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f94efd2..e116b7c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -199,6 +199,9 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 bool mem_cgroup_bad_page_check(struct page *page);
 void mem_cgroup_print_bad_page(struct page *page);
 #endif
+
+bool mem_cgroup_use_hierarchy(struct mem_cgroup *memcg);
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7d698df..467881f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -867,6 +867,11 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
 	return memcg;
 }
 
+bool mem_cgroup_use_hierarchy(struct mem_cgroup *memcg)
+{
+	return memcg->use_hierarchy;
+}
+
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index e795272..32764a6 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -75,7 +75,7 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 	tcp->tcp_memory_pressure = 0;
 
 	parent_cg = tcp_prot.proto_cgroup(parent);
-	if (parent_cg)
+	if (parent_cg && mem_cgroup_use_hierarchy(parent))
 		res_parent = parent_cg->memory_allocated;
 
 	res_counter_init(&tcp->tcp_memory_allocated, res_parent);
-- 
1.7.4.1


* [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit
  2012-03-29  7:01 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
  2012-03-29  7:03 ` [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup KAMEZAWA Hiroyuki
@ 2012-03-29  7:07 ` KAMEZAWA Hiroyuki
  2012-03-29 10:58   ` Glauber Costa
  2012-03-29  7:10 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  7:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Glauber Costa, netdev, David Miller, Andrew Morton

tcp memcontrol uses a static_key to optimize the limit=RESOURCE_MAX case.
If every cgroup's limit is RESOURCE_MAX, resource usage is not accounted
at all. But this is buggy now.

For example, run the following:
 # while sleep 1;do
   echo 9223372036854775807 > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
   echo 300M > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
   done

and run a network application under A. tcp usage is sometimes accounted
and sometimes not because of the frequent static_key changes. Then,
finally, you can see a broken tcp.usage_in_bytes: a WARN_ON() is printed
because res_counter->usage goes below 0.
==
kernel: ------------[ cut here ]----------
kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
 <snip>
kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
...
==

This patch removes the static_key_slow_dec() call made when the res_counter's
limit is changed back to RESOURCE_MAX. With this, once accounting has started,
it continues until the tcp cgroup is destroyed.

I think this will not be a problem in real use.
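The failure mode described above can be modelled in a few lines of plain
userspace C (a sketch of the logic only, not kernel code; the struct and
function names are invented for illustration):

```c
/* Charges and uncharges only happen while the static key is on, so a
 * buffer allocated with accounting off but freed with it on is
 * uncharged without ever being charged, driving usage negative --
 * the condition the kernel's WARN_ON() catches. */
struct tcp_acct {
	long usage;	/* models res_counter->usage */
	int  key_on;	/* models memcg_socket_limit_enabled */
};

static void sk_charge(struct tcp_acct *a, long bytes)
{
	if (a->key_on)
		a->usage += bytes;
}

static void sk_uncharge(struct tcp_acct *a, long bytes)
{
	if (a->key_on)
		a->usage -= bytes;	/* can go below 0 */
}
```

With the key off at allocation time and on at free time, usage ends up
at -4096 after a 4096-byte alloc/free pair.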

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/net/tcp_memcontrol.h |    1 +
 net/ipv4/tcp_memcontrol.c    |   24 ++++++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 48410ff..f47e3c7 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -9,6 +9,7 @@ struct tcp_memcontrol {
 	/* those two are read-mostly, leave them at the end */
 	long tcp_prot_mem[3];
 	int tcp_memory_pressure;
+	bool accounting;
 };
 
 struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 32764a6..cd0b47d 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -49,6 +49,20 @@ static void memcg_tcp_enter_memory_pressure(struct sock *sk)
 }
 EXPORT_SYMBOL(memcg_tcp_enter_memory_pressure);
 
+static void tcp_start_accounting(struct tcp_memcontrol *tcp)
+{
+	if (tcp->accounting)
+		return;
+	tcp->accounting = true;
+	static_key_slow_inc(&memcg_socket_limit_enabled);
+}
+
+static void tcp_end_accounting(struct tcp_memcontrol *tcp)
+{
+	if (tcp->accounting)
+		static_key_slow_dec(&memcg_socket_limit_enabled);
+}
+
 int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 {
 	/*
@@ -73,6 +87,7 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
 	tcp->tcp_prot_mem[1] = net->ipv4.sysctl_tcp_mem[1];
 	tcp->tcp_prot_mem[2] = net->ipv4.sysctl_tcp_mem[2];
 	tcp->tcp_memory_pressure = 0;
+	tcp->accounting = false;
 
 	parent_cg = tcp_prot.proto_cgroup(parent);
 	if (parent_cg && mem_cgroup_use_hierarchy(parent))
@@ -110,8 +125,7 @@ void tcp_destroy_cgroup(struct cgroup *cgrp)
 
 	val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
 
-	if (val != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
+	tcp_end_accounting(tcp);
 }
 EXPORT_SYMBOL(tcp_destroy_cgroup);
 
@@ -142,10 +156,8 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
 		tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT,
 					     net->ipv4.sysctl_tcp_mem[i]);
 
-	if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
-	else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
-		static_key_slow_inc(&memcg_socket_limit_enabled);
+	if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
+		tcp_start_accounting(tcp);
 
 	return 0;
 }
-- 
1.7.4.1


* [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started
  2012-03-29  7:01 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
  2012-03-29  7:03 ` [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup KAMEZAWA Hiroyuki
  2012-03-29  7:07 ` [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit KAMEZAWA Hiroyuki
@ 2012-03-29  7:10 ` KAMEZAWA Hiroyuki
  2012-03-29  9:21   ` Glauber Costa
  2 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  7:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Glauber Costa, netdev, David Miller, Andrew Morton

tcp memcontrol starts accounting only after res->limit is set. So, if a
socket starts before res->limit is set, resources are already in use.
After res->limit is set, those already-used resources will be uncharged,
driving the res_counter below 0 because they were never charged. This
causes a warning.

This patch fixes that by adding res_counter_uncharge_nowarn().
(*) We cannot avoid this while we have an 'account start' switch.
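The clamping idea behind res_counter_uncharge_nowarn() can be sketched in
userspace (a simplified model of the patch below, without the locking and
irq handling of the real kernel code):

```c
#include <stddef.h>

/* Clamp the uncharged amount at each level of the hierarchy to the
 * recorded usage, so uncharging bytes that were never charged can
 * never drive a counter negative. */
struct counter {
	unsigned long usage;
	struct counter *parent;
};

static void uncharge_nowarn(struct counter *c, unsigned long val)
{
	for (; c != NULL; c = c->parent) {
		if (c->usage < val)	/* the part that was never charged */
			val = c->usage;
		c->usage -= val;
	}
}
```

An uncharge of 300 against a child holding 100 (parent holding 50) simply
drains both counters to 0 instead of underflowing.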

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    2 ++
 include/net/sock.h          |    3 ++-
 kernel/res_counter.c        |   18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..e081948 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -134,6 +134,8 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
 
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+				unsigned long val);
 
 /**
  * res_counter_margin - calculate chargeable space of a counter
diff --git a/include/net/sock.h b/include/net/sock.h
index a6ba1f8..a1b3f4802 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1048,7 +1048,8 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
 static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
 					      unsigned long amt)
 {
-	res_counter_uncharge(prot->memory_allocated, amt << PAGE_SHIFT);
+	res_counter_uncharge_nowarn(prot->memory_allocated,
+				amt << PAGE_SHIFT);
 }
 
 static inline u64 memcg_memory_allocated_read(struct cg_proto *prot)
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..2bb01ac 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,24 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
 	local_irq_restore(flags);
 }
 
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+					unsigned long val)
+{
+	struct res_counter *c;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	for (c = counter; c != NULL; c = c->parent) {
+		spin_lock(&c->lock);
+		if (c->usage < val)
+			val = c->usage;
+		res_counter_uncharge_locked(c, val);
+		spin_unlock(&c->lock);
+	}
+	local_irq_restore(flags);
+}
+
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
-- 
1.7.4.1


* Re: [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup
  2012-03-29  7:03 ` [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup KAMEZAWA Hiroyuki
@ 2012-03-29  9:14   ` Glauber Costa
  2012-03-29  9:16     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-03-29  9:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

On 03/29/2012 09:03 AM, KAMEZAWA Hiroyuki wrote:
> 
> Now, tcp memory control cgroup ignores memcg's use_hierarchy value
> and act as use_hierarchy=1 always. After this patch, tcp memcontrol will
> work as memcg is designed.
> 
> Note:
>     I know there is a discussion to remove use_hierarchy but this is BUG, now.
> 

Kame,

Are you sure about that?

I just tried it myself, and it seems to work:

root@inf5072-11:~/glommer-temporary/a/b# cat memory.kmem.tcp.usage_in_bytes
724992
root@inf5072-11:~/glommer-temporary/a/b# cat
../memory.kmem.tcp.usage_in_bytes
0


Did you reach this conclusion through testing or code inspection?

As a matter of fact, that's why I believe the current behavior is indeed
correct:

the res_counter is initialized as:

        parent_cg = tcp_prot.proto_cgroup(parent);
        if (parent_cg)
                res_parent = parent_cg->memory_allocated;

        res_counter_init(&tcp->tcp_memory_allocated, res_parent);

now, parent is drawn from parent_mem_cgroup(), that reads as follows:

struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
{
        if (!memcg->res.parent)
                return NULL;
        return mem_cgroup_from_res_counter(memcg->res.parent, res);
}


so if we have use_hierarchy = 0, res.parent should be NULL (because that
is the way we initialize it)
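The argument can be checked with a toy model (userspace sketch; the struct
is a stand-in for the memcg fields, not real kernel definitions):

```c
#include <stddef.h>

/* With use_hierarchy=0 the memcg res_counter is initialized with a
 * NULL parent, so parent_mem_cgroup() returns NULL and the existing
 * "if (parent_cg)" check already leaves the tcp counter unparented --
 * no separate use_hierarchy test is required. */
struct memcg {
	struct memcg *res_parent;	/* models memcg->res.parent */
};

static struct memcg *parent_mem_cgroup(struct memcg *m)
{
	return m->res_parent;		/* NULL unless use_hierarchy was set */
}
```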


* Re: [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup
  2012-03-29  9:14   ` Glauber Costa
@ 2012-03-29  9:16     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  9:16 UTC (permalink / raw)
  To: Glauber Costa; +Cc: netdev, David Miller, Andrew Morton

(2012/03/29 18:14), Glauber Costa wrote:

> On 03/29/2012 09:03 AM, KAMEZAWA Hiroyuki wrote:
>>
>> Now, tcp memory control cgroup ignores memcg's use_hierarchy value
>> and act as use_hierarchy=1 always. After this patch, tcp memcontrol will
>> work as memcg is designed.
>>
>> Note:
>>     I know there is a discussion to remove use_hierarchy but this is BUG, now.
>>
> 
> Kame,
> 
> Are you sure about that?
> 
> I just tried it myself, and it seems to work:
> 
> root@inf5072-11:~/glommer-temporary/a/b# cat memory.kmem.tcp.usage_in_bytes
> 724992
> root@inf5072-11:~/glommer-temporary/a/b# cat
> ../memory.kmem.tcp.usage_in_bytes
> 0
> 
> 
> Did you got this conclusion through testing or code inspection?
> 
> As a matter of fact, that's why I believe the current behavior is indeed
> correct:
> 
> the res_counter is initialized as:
> 
>         parent_cg = tcp_prot.proto_cgroup(parent);
>         if (parent_cg)
>                 res_parent = parent_cg->memory_allocated;
> 
>         res_counter_init(&tcp->tcp_memory_allocated, res_parent);
> 
> now, parent is drawn from parent_mem_cgroup(), that reads as follows:
> 
> struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
> {
>         if (!memcg->res.parent)
>                 return NULL;
>         return mem_cgroup_from_res_counter(memcg->res.parent, res);
> }
> 
> 
> so if we have use_hierarchy = 0, res.parent should be NULL (because that
> is the way we initialize it)
> 


Ah, sorry, you're right. Please forget this patch.

To be honest, I wrote this after seeing the WARN_ON addressed in patches 2
and 3, and I misunderstood the code while writing them.

Thanks,
-Kame


* Re: [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started
  2012-03-29  7:10 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
@ 2012-03-29  9:21   ` Glauber Costa
  2012-03-30  1:44     ` [PATCH] memcg/tcp: fix warning caused by res->usage going negative KAMEZAWA Hiroyuki
  2012-04-02  3:41     ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started David Miller
  0 siblings, 2 replies; 26+ messages in thread
From: Glauber Costa @ 2012-03-29  9:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

On 03/29/2012 09:10 AM, KAMEZAWA Hiroyuki wrote:
> tcp memcontrol starts accouting after res->limit is set. So, if a sockets
> starts before setting res->limit, there are already used resource.
> After setting res->limit, the resource (already used) will be uncharged and
> make res_counter below 0 because they are not charged. This causes warning.
> 
> This patch fixes that by adding res_counter_uncharge_nowarn().
> (*) We cannot avoid this while we have 'account start' switch.
> 
> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>

Fine by me.

Acked-by: Glauber Costa <glommer@parallels.com>


* Re: [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit
  2012-03-29  7:07 ` [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit KAMEZAWA Hiroyuki
@ 2012-03-29 10:58   ` Glauber Costa
  2012-03-29 23:51     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-03-29 10:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

On 03/29/2012 09:07 AM, KAMEZAWA Hiroyuki wrote:
> tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
> If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
> But it's buggy now.
> 
> For example, do following
>   # while sleep 1;do
>     echo 9223372036854775807>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>     echo 300M>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>     done
> 
> and run network application under A. tcp's usage is sometimes accounted
> and sometimes not accounted because of frequent changes of static_branch.
> Then, finally, you can see broken tcp.usage_in_bytes.
> WARN_ON() is printed because res_counter->usage goes below 0.
> ==
> kernel: ------------[ cut here ]----------
> kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
>   <snip>
> kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
> kernel: Call Trace:
> kernel:<IRQ>   [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
> kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
> kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
> kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
> ...
> ==
> 
> This patch removes static_branch_slow_dec() at changing res_counter's
> limit to RESOUCE_MAX. By this, once accounting started, the accountting
> will continue until the tcp cgroup is destroyed.
> 
> I think this will not be problem in real use.
> 

So...

Are the warnings still there if you have your other patch in this series?
Maybe what we should do is flush the resource counters so they go back
to 0, besides decrementing the static branch. That way we get more
consistent behavior.

Another thing to keep in mind is that the static branch will only be
inactive if we turn off *all* controllers. You see this happening
because you are only testing with one.
So even if we go the route you're proposing, we could probably try
doing something at the global level, instead of a per-memcg boolean flag.


* Re: [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit
  2012-03-29 10:58   ` Glauber Costa
@ 2012-03-29 23:51     ` KAMEZAWA Hiroyuki
  2012-03-30  6:18       ` Glauber Costa
  0 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29 23:51 UTC (permalink / raw)
  To: Glauber Costa; +Cc: netdev, David Miller, Andrew Morton

(2012/03/29 19:58), Glauber Costa wrote:

> On 03/29/2012 09:07 AM, KAMEZAWA Hiroyuki wrote:
>> tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
>> If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
>> But it's buggy now.
>>
>> For example, do following
>>   # while sleep 1;do
>>     echo 9223372036854775807>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>     echo 300M>  /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>     done
>>
>> and run network application under A. tcp's usage is sometimes accounted
>> and sometimes not accounted because of frequent changes of static_branch.
>> Then, finally, you can see broken tcp.usage_in_bytes.
>> WARN_ON() is printed because res_counter->usage goes below 0.
>> ==
>> kernel: ------------[ cut here ]----------
>> kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
>>   <snip>
>> kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
>> kernel: Call Trace:
>> kernel:<IRQ>   [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
>> kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
>> kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
>> kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
>> ...
>> ==
>>
>> This patch removes static_branch_slow_dec() at changing res_counter's
>> limit to RESOUCE_MAX. By this, once accounting started, the accountting
>> will continue until the tcp cgroup is destroyed.
>>
>> I think this will not be problem in real use.
>>
> 
> So...
> 
> Are the warnings still there if you have your other patch in this series?


I wrote patch 3/3 after 2/3 because I found that not all cases can be fixed
by this one.

So, compared with patch 3/3, what this patch fixes is a leak.
Consider the following sequence:

	enable accounting
	tcp allocate buffer
	disable accounting
	tcp free buffer

The accounted usage never disappears. This is the problem that cannot be
covered by patch 3/3. Maybe it's better to change the order of the patches
to 3/3 -> 2/3 and describe this explicitly.
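The sequence above is the mirror image of the underflow: in a toy
userspace model (invented names, kernel logic only sketched), the charge
is taken but the matching uncharge is skipped, so the usage sticks:

```c
/* Accounting is on at allocation time but off before the free, so the
 * charge is never balanced and usage stays non-zero forever -- the
 * leak that patch 3/3 cannot cover. */
struct acct_state {
	long usage;
	int  key_on;
};

static void buf_alloc(struct acct_state *a, long bytes)
{
	if (a->key_on)
		a->usage += bytes;
}

static void buf_free(struct acct_state *a, long bytes)
{
	if (a->key_on)
		a->usage -= bytes;	/* skipped when the key is off */
}
```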

> Maybe what we should do is, flush the resource counters so they go back
> to 0 besides decrementing the static branch. This way we get a more
> consistent behavior.
> 

Set every memcg's usage to 0 when accounting is enabled/disabled?
But there is the problem that static_branch() updates are slow. So,
IIUC, we can't catch all cases because of races.


> Another thing to keep in mind, is that the static branch will only be
> inactive if we turn off *all* controllers. You see this happening
> because you are only testing with one.

Yes. So the behavior change from this patch will not affect usual cases.

> So even if we go to the route you're proposing, we could probably try
> doing something on the
> global level, instead of a per-memcg boolean flat.

At the global level, the static_key's counter handles it.

Thanks,
-Kame


* [PATCH] memcg/tcp: fix warning caused by res->usage going negative.
  2012-03-29  9:21   ` Glauber Costa
@ 2012-03-30  1:44     ` KAMEZAWA Hiroyuki
  2012-04-06 15:49       ` Glauber Costa
  2012-04-02  3:41     ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started David Miller
  1 sibling, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-30  1:44 UTC (permalink / raw)
  To: Glauber Costa; +Cc: netdev, David Miller, Andrew Morton

Maybe what we can do before the LSF/MM summit is this (avoid the warning).
This patch is against Linus's git tree. The patch description is updated.

Thanks.
-Kame
==
From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 29 Mar 2012 14:59:04 +0900
Subject: [PATCH] memcg/tcp: fix warning caused by res->usage going negative.

tcp memcontrol starts accounting only after res->limit is set. So, if a
socket starts before res->limit is set, resources are already in use.
When res->limit is set, accounting starts: those already-used resources
will be uncharged, driving the res_counter below 0 because they were
never charged. This causes a warning.

==
kernel: ------------[ cut here ]----------
kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
 <snip>
kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
...
==

This patch fixes that by adding res_counter_uncharge_nowarn()
and hiding the wrong accounting.

The reason for this fix is:
 - To avoid overhead, we don't account a cgroup's tcp usage before the
   limit is set. So, we cannot know the amount of possible error when
   we start accounting. As long as we have this on/off switch for
   performance, this is unavoidable.

Consideration:
 - As in memcg, if resources could be marked the way PCG_USED marks
   pages, we would not have this problem. But that adds overhead.

TODO:
 - When tcp accounting is enabled/disabled frequently, we'll see a usage
   accounting leak. This should be fixed. (Still under discussion.)
 - sockets_allocated has the same trouble, but it has no warning and
   never shows a negative value even when the internal count is negative.

Changelog:
 - updated patch description.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    2 ++
 include/net/sock.h          |    3 ++-
 kernel/res_counter.c        |   18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..e081948 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -134,6 +134,8 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
 
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+				unsigned long val);
 
 /**
  * res_counter_margin - calculate chargeable space of a counter
diff --git a/include/net/sock.h b/include/net/sock.h
index a6ba1f8..a1b3f4802 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1048,7 +1048,8 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
 static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
 					      unsigned long amt)
 {
-	res_counter_uncharge(prot->memory_allocated, amt << PAGE_SHIFT);
+	res_counter_uncharge_nowarn(prot->memory_allocated,
+				amt << PAGE_SHIFT);
 }
 
 static inline u64 memcg_memory_allocated_read(struct cg_proto *prot)
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..2bb01ac 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,24 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
 	local_irq_restore(flags);
 }
 
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+					unsigned long val)
+{
+	struct res_counter *c;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	for (c = counter; c != NULL; c = c->parent) {
+		spin_lock(&c->lock);
+		if (c->usage < val)
+			val = c->usage;
+		res_counter_uncharge_locked(c, val);
+		spin_unlock(&c->lock);
+	}
+	local_irq_restore(flags);
+}
+
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
-- 
1.7.4.1


* Re: [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit
  2012-03-29 23:51     ` KAMEZAWA Hiroyuki
@ 2012-03-30  6:18       ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2012-03-30  6:18 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

On 03/30/2012 01:51 AM, KAMEZAWA Hiroyuki wrote:
> (2012/03/29 19:58), Glauber Costa wrote:
> 
>> On 03/29/2012 09:07 AM, KAMEZAWA Hiroyuki wrote:
>>> tcp memcontrol uses static_branch to optimize limit=RESOURCE_MAX case.
>>> If all cgroup's limit=RESOUCE_MAX, resource usage is not accounted.
>>> But it's buggy now.
>>>
>>> For example, do following
>>>    # while sleep 1;do
>>>      echo 9223372036854775807>   /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>>      echo 300M>   /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
>>>      done
>>>
>>> and run network application under A. tcp's usage is sometimes accounted
>>> and sometimes not accounted because of frequent changes of static_branch.
>>> Then, finally, you can see broken tcp.usage_in_bytes.
>>> WARN_ON() is printed because res_counter->usage goes below 0.
>>> ==
>>> kernel: ------------[ cut here ]----------
>>> kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
>>>    <snip>
>>> kernel: Pid: 17753, comm: bash Tainted: G  W    3.3.0+ #99
>>> kernel: Call Trace:
>>> kernel:<IRQ>    [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
>>> kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
>>> kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
>>> kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
>>> ...
>>> ==
>>>
>>> This patch removes static_branch_slow_dec() at changing res_counter's
>>> limit to RESOUCE_MAX. By this, once accounting started, the accountting
>>> will continue until the tcp cgroup is destroyed.
>>>
>>> I think this will not be problem in real use.
>>>
>>
>> So...
>>
>> Are the warnings still there if you have your other patch in this series?
> 
> 
> I wrote patch 3/3 after 2/3 because I found all case cannot be fixed by this.
> 
> So, comparing patch 3/3 this fixes is leaking.
> Considering following sequence
> 
> 	enable accounting
> 	tcp allocate buffer
> 	disable accounting
> 	tcp free buffer
> 
> The accounted usage nerver disappear. This is the probelem which cannot be
> covered by patch 3/3. Maybe it's better to change order of patches 3/3 ->  2/3
> and describe this explicitly.
> 
>> Maybe what we should do is, flush the resource counters so they go back
>> to 0 besides decrementing the static branch. This way we get a more
>> consistent behavior.
>>
> 
> set all memcg's usage to be 0 at enable/disable accounting ?
> But, there is a problem which static_branch() update is slow. So,
> IIUC, we can't catch all cases because of races.
> 
> 
>> Another thing to keep in mind, is that the static branch will only be
>> inactive if we turn off *all* controllers. You see this happening
>> because you are only testing with one.
> 
> yes. So, the behavior change by this patch will not affect usual cases.
> 
>> So even if we go to the route you're proposing, we could probably try
>> doing something on the
>> global level, instead of a per-memcg boolean flat.
> 
> In global level, static_key's counter handles it.
> 

I gave it a bit more thought through the night... and I guess your
solution is okay.


* Re: [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started
  2012-03-29  9:21   ` Glauber Costa
  2012-03-30  1:44     ` [PATCH] memcg/tcp: fix warning caused by res->usage going negative KAMEZAWA Hiroyuki
@ 2012-04-02  3:41     ` David Miller
  2012-04-03 22:31       ` Glauber Costa
  1 sibling, 1 reply; 26+ messages in thread
From: David Miller @ 2012-04-02  3:41 UTC (permalink / raw)
  To: glommer; +Cc: kamezawa.hiroyu, netdev, akpm

From: Glauber Costa <glommer@parallels.com>
Date: Thu, 29 Mar 2012 11:21:07 +0200

> On 03/29/2012 09:10 AM, KAMEZAWA Hiroyuki wrote:
>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>> starts before res->limit is set, there is already-used resource.
>> After res->limit is set, the resource (already used) will be uncharged and
>> push the res_counter below 0 because it was never charged. This causes a warning.
>> 
>> This patch fixes that by adding res_counter_uncharge_nowarn().
>> (*) We cannot avoid this while we have 'account start' switch.
>> 
>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> 
> Fine by me.
> 
> Acked-by: Glauber Costa <glommer@parallels.com>

I'm not applying patches that simply ignore accounting counter
underflows.

You must either:

1) Integrate the socket's existing usage when the limit is set.

2) Avoid accounting completely for a socket that started before
   the limit was set.

No half-way solutions, please.  Otherwise it is impossible to design
validations of the resource usage for a particular socket or group of
sockets, because they can always be potentially "wrong" and over the
limit.  That's a design for a buggy system.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started
  2012-04-02  3:41     ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started David Miller
@ 2012-04-03 22:31       ` Glauber Costa
  2012-04-09  0:58         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-04-03 22:31 UTC (permalink / raw)
  To: David Miller; +Cc: kamezawa.hiroyu, netdev, akpm

On 04/02/2012 07:41 AM, David Miller wrote:
> From: Glauber Costa<glommer@parallels.com>
> Date: Thu, 29 Mar 2012 11:21:07 +0200
>
>> On 03/29/2012 09:10 AM, KAMEZAWA Hiroyuki wrote:
>>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>>> starts before res->limit is set, there is already-used resource.
>>> After res->limit is set, the resource (already used) will be uncharged and
>>> push the res_counter below 0 because it was never charged. This causes a warning.
>>>
>>> This patch fixes that by adding res_counter_uncharge_nowarn().
>>> (*) We cannot avoid this while we have 'account start' switch.
>>>
>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>
>> Fine by me.
>>
>> Acked-by: Glauber Costa<glommer@parallels.com>
>
> I'm not applying patches that simply ignore accounting counter
> underflows.
>
> You must either:
>
> 1) Integrate the socket's existing usage when the limit is set.
>
> 2) Avoid accounting completely for a socket that started before
>     the limit was set.
>
> No half-way solutions, please.  Otherwise it is impossible to design
> validations of the resource usage for a particular socket or group of
> sockets, because they can always be potentially "wrong" and over the
> limit.  That's a design for a buggy system.
>
>
Kame,

I agree with Dave FWIW.

We should be able to do this by dropping the reference count when the 
cgroup is finally destroyed, instead of from the remove callback. At 
that point, no more pending sockets should be attached to it.

Prior to increasing the static key, they are all assigned to the global 
cgroup, so we shouldn't need to care about them.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-03-30  1:44     ` [PATCH] memcg/tcp: fix warning caused b res->usage go to negative KAMEZAWA Hiroyuki
@ 2012-04-06 15:49       ` Glauber Costa
  2012-04-10  2:37         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-04-06 15:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1297 bytes --]

On 03/30/2012 05:44 AM, KAMEZAWA Hiroyuki wrote:
> Maybe what we can do before the LSF/MM summit is this (avoid the warning).
> This patch is against Linus's git tree. The patch description is updated.
> 
> Thanks.
> -Kame
> ==
>  From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Date: Thu, 29 Mar 2012 14:59:04 +0900
> Subject: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
> 
> tcp memcontrol starts accounting after res->limit is set. So, if a socket
> starts before res->limit is set, there is already-used resource.
> When res->limit is set, accounting starts. The resource will be uncharged,
> pushing the res_counter below 0 because it was never charged.
> This causes a warning.
> 

Kame,

Please test the following patch and see if it fixes your problems (I
tested locally, and it triggers no warnings for me when running the test
script you provided plus an inbound scp -r copy of an iso directory from
a remote machine).

When you are reviewing, keep in mind that we're likely to have the same
problems with the slab jump labels - since the slab pages will outlive the
cgroup as well. It might be worthwhile to provide a central point for the
jump labels to be unset on cgroup destruction.


[-- Attachment #2: 0001-decrement-static-keys-on-real-destroy-time.patch --]
[-- Type: text/x-patch, Size: 3892 bytes --]

>From c40bbd69cbb655b6389c2398ce89abb06e64910d Mon Sep 17 00:00:00 2001
From: Glauber Costa <glommer@parallels.com>
Date: Wed, 4 Apr 2012 21:08:38 +0400
Subject: [PATCH] decrement static keys on real destroy time

We call the destroy function when a cgroup starts to be removed,
such as by a rmdir event.

However, because of our reference counters, some objects are still
inflight. Right now, we are decrementing the static_keys at destroy()
time, meaning that if we get rid of the last static_key reference,
some objects will still have charges, but the code to properly
uncharge them won't be run.

This becomes a problem especially if it is ever enabled again, because
new charges will then be added on top of the stale charges, making it
pretty much impossible to keep the accounting consistent.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 include/net/tcp_memcontrol.h |    2 ++
 mm/memcontrol.c              |   15 +++++++++++++++
 net/ipv4/tcp_memcontrol.c    |   10 ++++------
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 7df18bc..5a2b915 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -9,6 +9,8 @@ struct tcp_memcontrol {
 	/* those two are read-mostly, leave them at the end */
 	long tcp_prot_mem[3];
 	int tcp_memory_pressure;
+	/* if this cgroup was ever limited, having static_keys activated */
+	bool limited;
 };
 
 struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 64a1bcd..74b757b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -442,6 +442,15 @@ void sock_release_memcg(struct sock *sk)
 	}
 }
 
+static void disarm_static_keys(struct mem_cgroup *memcg)
+{
+#ifdef CONFIG_INET
+	if (memcg->tcp_mem.limited)	
+		static_key_slow_dec(&memcg_socket_limit_enabled);
+#endif
+}
+
+
 #ifdef CONFIG_INET
 struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
 {
@@ -452,6 +461,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
 }
 EXPORT_SYMBOL(tcp_proto_cgroup);
 #endif /* CONFIG_INET */
+#else
+static inline void disarm_static_keys(struct mem_cgroup *memcg)
+{
+}
+
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
 
 static void drain_all_stock_async(struct mem_cgroup *memcg);
@@ -4883,6 +4897,7 @@ static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
 {
 	if (atomic_sub_and_test(count, &memcg->refcnt)) {
 		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+		disarm_static_keys(memcg);
 		__mem_cgroup_free(memcg);
 		if (parent)
 			mem_cgroup_put(parent);
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 1517037..93555ab 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -41,6 +41,7 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 	tcp->tcp_prot_mem[1] = net->ipv4.sysctl_tcp_mem[1];
 	tcp->tcp_prot_mem[2] = net->ipv4.sysctl_tcp_mem[2];
 	tcp->tcp_memory_pressure = 0;
+	tcp->limited = false;
 
 	parent_cg = tcp_prot.proto_cgroup(parent);
 	if (parent_cg)
@@ -74,9 +75,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
 	percpu_counter_destroy(&tcp->tcp_sockets_allocated);
 
 	val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
-
-	if (val != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
 }
 EXPORT_SYMBOL(tcp_destroy_cgroup);
 
@@ -107,10 +105,10 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
 		tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT,
 					     net->ipv4.sysctl_tcp_mem[i]);
 
-	if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
-	else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
+	if (old_lim == RESOURCE_MAX && !tcp->limited) {
 		static_key_slow_inc(&memcg_socket_limit_enabled);
+		tcp->limited = true;
+	}
 
 	return 0;
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started
  2012-04-03 22:31       ` Glauber Costa
@ 2012-04-09  0:58         ` KAMEZAWA Hiroyuki
  2012-04-09  1:44           ` Glauber Costa
  0 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-09  0:58 UTC (permalink / raw)
  To: Glauber Costa; +Cc: David Miller, netdev, akpm

(2012/04/04 7:31), Glauber Costa wrote:

> On 04/02/2012 07:41 AM, David Miller wrote:
>> From: Glauber Costa<glommer@parallels.com>
>> Date: Thu, 29 Mar 2012 11:21:07 +0200
>>
>>> On 03/29/2012 09:10 AM, KAMEZAWA Hiroyuki wrote:
>>>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>>>> starts before res->limit is set, there is already-used resource.
>>>> After res->limit is set, the resource (already used) will be uncharged and
>>>> push the res_counter below 0 because it was never charged. This causes a warning.
>>>>
>>>> This patch fixes that by adding res_counter_uncharge_nowarn().
>>>> (*) We cannot avoid this while we have 'account start' switch.
>>>>
>>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>>
>>> Fine by me.
>>>
>>> Acked-by: Glauber Costa<glommer@parallels.com>
>>
>> I'm not applying patches that simply ignore accounting counter
>> underflows.
>>
>> You must either:
>>
>> 1) Integrate the socket's existing usage when the limit is set.
>>
>> 2) Avoid accounting completely for a socket that started before
>>     the limit was set.
>>
>> No half-way solutions, please.  Otherwise it is impossible to design
>> validations of the resource usage for a particular socket or group of
>> sockets, because they can always be potentially "wrong" and over the
>> limit.  That's a design for a buggy system.
>>
>>
> Kame,
> 
> I agree with Dave FWIW.
> 
> We should be able to do this by dropping the reference count when the 
> cgroup is finally destroyed, instead of from the remove callback. At 
> that point, no more pending sockets should be attached to it.
> 
> Prior to increasing the static key, they are all assigned to the global 
> cgroup, so we shouldn't care about them.
> 

Could you do the fix?

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started
  2012-04-09  0:58         ` KAMEZAWA Hiroyuki
@ 2012-04-09  1:44           ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2012-04-09  1:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: David Miller, netdev, akpm

On 04/08/2012 09:58 PM, KAMEZAWA Hiroyuki wrote:
> (2012/04/04 7:31), Glauber Costa wrote:
>
>> On 04/02/2012 07:41 AM, David Miller wrote:
>>> From: Glauber Costa<glommer@parallels.com>
>>> Date: Thu, 29 Mar 2012 11:21:07 +0200
>>>
>>>> On 03/29/2012 09:10 AM, KAMEZAWA Hiroyuki wrote:
>>>>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>>>>> starts before res->limit is set, there is already-used resource.
>>>>> After res->limit is set, the resource (already used) will be uncharged and
>>>>> push the res_counter below 0 because it was never charged. This causes a warning.
>>>>>
>>>>> This patch fixes that by adding res_counter_uncharge_nowarn().
>>>>> (*) We cannot avoid this while we have 'account start' switch.
>>>>>
>>>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>>>
>>>> Fine by me.
>>>>
>>>> Acked-by: Glauber Costa<glommer@parallels.com>
>>>
>>> I'm not applying patches that simply ignore accounting counter
>>> underflows.
>>>
>>> You must either:
>>>
>>> 1) Integrate the socket's existing usage when the limit is set.
>>>
>>> 2) Avoid accounting completely for a socket that started before
>>>      the limit was set.
>>>
>>> No half-way solutions, please.  Otherwise it is impossible to design
>>> validations of the resource usage for a particular socket or group of
>>> sockets, because they can always be potentially "wrong" and over the
>>> limit.  That's a design for a buggy system.
>>>
>>>
>> Kame,
>>
>> I agree with Dave FWIW.
>>
>> We should be able to do this by dropping the reference count when the
>> cgroup is finally destroyed, instead of from the remove callback. At
>> that point, no more pending sockets should be attached to it.
>>
>> Prior to increasing the static key, they are all assigned to the global
>> cgroup, so we shouldn't care about them.
>>
>
> Could you do the fix ?
>
> Thanks,
> -Kame
>
>
I sent you a version already. Please just make sure it works for you.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-04-06 15:49       ` Glauber Costa
@ 2012-04-10  2:37         ` KAMEZAWA Hiroyuki
  2012-04-10  2:51           ` Glauber Costa
  2012-04-13 17:33           ` Glauber Costa
  0 siblings, 2 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10  2:37 UTC (permalink / raw)
  To: Glauber Costa; +Cc: netdev, David Miller, Andrew Morton

(2012/04/07 0:49), Glauber Costa wrote:

> On 03/30/2012 05:44 AM, KAMEZAWA Hiroyuki wrote:
>> Maybe what we can do before the LSF/MM summit is this (avoid the warning).
>> This patch is against Linus's git tree. The patch description is updated.
>>
>> Thanks.
>> -Kame
>> ==
>>  From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001
>> From: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Date: Thu, 29 Mar 2012 14:59:04 +0900
>> Subject: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
>>
>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>> starts before res->limit is set, there is already-used resource.
>> When res->limit is set, accounting starts. The resource will be uncharged,
>> pushing the res_counter below 0 because it was never charged.
>> This causes a warning.
>>
> 
> Kame,
> 
>> Please test the following patch and see if it fixes your problems (I
>> tested locally, and it triggers no warnings for me when running the test
>> script you provided plus an inbound scp -r copy of an iso directory from
>> a remote machine).
> 
>> When you are reviewing, keep in mind that we're likely to have the same
>> problems with the slab jump labels - since the slab pages will outlive the
>> cgroup as well. It might be worthwhile to provide a central point for the
>> jump labels to be unset on cgroup destruction.
> 


Hm. What happens in the following sequence?

  1. a memcg is created
  2. put a task into the memcg, start a tcp stream
  3. set the tcp memory limit

The resource used between 2 and 3 will eventually cause the problem.

Then, Dave's request
==
You must either:

1) Integrate the socket's existing usage when the limit is set.

2) Avoid accounting completely for a socket that started before
   the limit was set.
==
are not satisfied. So, we need to have per-socket state: accounted
or not. I'll look into this problem again today.


Thanks,
-Kame

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-04-10  2:37         ` KAMEZAWA Hiroyuki
@ 2012-04-10  2:51           ` Glauber Costa
  2012-04-10  3:01             ` Glauber Costa
  2012-04-10  3:21             ` KAMEZAWA Hiroyuki
  2012-04-13 17:33           ` Glauber Costa
  1 sibling, 2 replies; 26+ messages in thread
From: Glauber Costa @ 2012-04-10  2:51 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote:
> Hm. What happens in the following sequence?
> 
>    1. a memcg is created
>    2. put a task into the memcg, start a tcp stream
>    3. set the tcp memory limit
> 
> The resource used between 2 and 3 will eventually cause the problem.

I don't get it. If a task is in a memcg but no limit is set,
that socket will be assigned a NULL memcg, and will stay like that
forever. Only new sockets will get the new memcg pointer.

And previously, we could have the memcg pointer alive but the jump
labels disabled. With the patch I posted, this can't happen
anymore, since the jump labels are guaranteed to live throughout the
whole socket lifetime.

> Then, Dave's request
> ==
> You must either:
> 
> 1) Integrate the socket's existing usage when the limit is set.
> 
> 2) Avoid accounting completely for a socket that started before
>     the limit was set.
> ==
> are not satisfied. So, we need to have per-socket state: accounted
> or not. I'll look into this problem again today.
> 

Of course they are.

Every socket created before we set the limit is not accounted.
This is 2) that Dave mentioned, and it was *always* this way.

The problem here was the opposite: You could disable the jump labels
with sockets still in flight, because we were disabling it based on
the limit being set back to unlimited.

What this patch does is defer that until the last limited socket dies.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-04-10  2:51           ` Glauber Costa
@ 2012-04-10  3:01             ` Glauber Costa
  2012-04-10  4:15               ` KAMEZAWA Hiroyuki
  2012-04-10  3:21             ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-04-10  3:01 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

On 04/09/2012 11:51 PM, Glauber Costa wrote:
> On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote:
>> Hm. What happens in the following sequence?
>>
>>     1. a memcg is created
>>     2. put a task into the memcg, start a tcp stream
>>     3. set the tcp memory limit
>>
>> The resource used between 2 and 3 will eventually cause the problem.
> 
> I don't get it. if a task is in memcg, but no limit is set,
> that socket will be assigned null memcg, and will stay like that
> forever. Only new sockets will have the new memcg pointer.
> 
> And previously, we could have the memcg pointer alive, but the jump
> labels to be disabled. With the patch I posted, this can't happen
> anymore, since the jump labels are guaranteed to live throughout the
> whole socket life.
> 
>> Then, Dave's request
>> ==
>> You must either:
>>
>> 1) Integrate the socket's existing usage when the limit is set.
>>
>> 2) Avoid accounting completely for a socket that started before
>>      the limit was set.
>> ==
>> are not satisfied. So, we need to have per-socket state: accounted
>> or not. I'll look into this problem again today.
>>
> 
> Of course they are.
> 
> Every socket created before we set the limit is not accounted.
> This is 2) that Dave mentioned, and it was *always* this way.
> 
> The problem here was the opposite: You could disable the jump labels
> with sockets still in flight, because we were disabling it based on
> the limit being set back to unlimited.
> 
> What this patch does is defer that until the last limited socket dies.
> 

Okay, there is an additional thing to be considered here:

Due to the nature of how jump labels work, once they are enabled for one
of the cgroups, they are enabled for all of them. So the patch I
sent may still break in some scenarios because of the way we record that
the limit was set.

However, if my theory behind what is causing the problem is correct,
this patch should fix the issue for you. Let me know if it does, and
I'll work on the final solution.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-04-10  2:51           ` Glauber Costa
  2012-04-10  3:01             ` Glauber Costa
@ 2012-04-10  3:21             ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10  3:21 UTC (permalink / raw)
  To: Glauber Costa; +Cc: netdev, David Miller, Andrew Morton

(2012/04/10 11:51), Glauber Costa wrote:

> On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote:
>> Hm. What happens in the following sequence?
>>
>>    1. a memcg is created
>>    2. put a task into the memcg, start a tcp stream
>>    3. set the tcp memory limit
>>
>> The resource used between 2 and 3 will eventually cause the problem.
> 
> I don't get it. if a task is in memcg, but no limit is set,
> that socket will be assigned null memcg, and will stay like that
> forever. Only new sockets will have the new memcg pointer.
> 
> And previously, we could have the memcg pointer alive, but the jump
> labels to be disabled. With the patch I posted, this can't happen
> anymore, since the jump labels are guaranteed to live throughout the
> whole socket life.
> 
>> Then, Dave's request
>> ==
>> You must either:
>>
>> 1) Integrate the socket's existing usage when the limit is set.
>>
>> 2) Avoid accounting completely for a socket that started before
>>     the limit was set.
>> ==
>> are not satisfied. So, we need to have per-socket state: accounted
>>     or not. I'll look into this problem again today.
>>
> 
> Of course they are.
> 
> Every socket created before we set the limit is not accounted.
> This is 2) that Dave mentioned, and it was *always* this way.
> 
> The problem here was the opposite: You could disable the jump labels
> with sockets still in flight, because we were disabling it based on
> the limit being set back to unlimited.
> 
> What this patch does is defer that until the last limited socket dies.
> 

Thank you for the explanation. Hmm, the sk->cgrp check?
Ah, yes, it's updated by sock_update_memcg() under the jump label, which is
called by tcp_v4_init_sock().
Hm. And the jump label's atomic counter and mutex_lock will guard against
a set/unset race. OK.

BTW, what will happen in the following case?

Assume that the last memcg is destroyed and calls jump_label_dec(), and the
thread waits on jump_label_mutex for a while.

      CPU A                                           CPU B

    jump_label_dec() # mutex will be held        sock_update_memcg() is called
                                                 sk_cgrp is set.
    ...modify instructions                       some accounting is done.
    mutex_unlock()

I wonder whether you need some serialization somewhere, OR to disallow turning accounting off.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-04-10  3:01             ` Glauber Costa
@ 2012-04-10  4:15               ` KAMEZAWA Hiroyuki
  2012-04-11  2:22                 ` Glauber Costa
  0 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10  4:15 UTC (permalink / raw)
  To: Glauber Costa; +Cc: netdev, David Miller, Andrew Morton

(2012/04/10 12:01), Glauber Costa wrote:

> On 04/09/2012 11:51 PM, Glauber Costa wrote:
>> On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote:
>>> Hm. What happens in the following sequence?
>>>
>>>     1. a memcg is created
>>>     2. put a task into the memcg, start a tcp stream
>>>     3. set the tcp memory limit
>>>
>>> The resource used between 2 and 3 will eventually cause the problem.
>>
>> I don't get it. if a task is in memcg, but no limit is set,
>> that socket will be assigned null memcg, and will stay like that
>> forever. Only new sockets will have the new memcg pointer.
>>
>> And previously, we could have the memcg pointer alive, but the jump
>> labels to be disabled. With the patch I posted, this can't happen
>> anymore, since the jump labels are guaranteed to live throughout the
>> whole socket life.
>>
>>> Then, Dave's request
>>> ==
>>> You must either:
>>>
>>> 1) Integrate the socket's existing usage when the limit is set.
>>>
>>> 2) Avoid accounting completely for a socket that started before
>>>      the limit was set.
>>> ==
>>> are not satisfied. So, we need to have per-socket state: accounted
>>> or not. I'll look into this problem again today.
>>>
>>
>> Of course they are.
>>
>> Every socket created before we set the limit is not accounted.
>> This is 2) that Dave mentioned, and it was *always* this way.
>>
>> The problem here was the opposite: You could disable the jump labels
>> with sockets still in flight, because we were disabling it based on
>> the limit being set back to unlimited.
>>
>> What this patch does is defer that until the last limited socket dies.
>>
> 
> Okay, there is an additional thing to be considered here:
> 
> Due to the nature of how jump labels work, once they are enabled for one
> of the cgroups, they are enabled for all of them. So the patch I
> sent may still break in some scenarios because of the way we record that
> the limit was set.
> 
> However, if my theory behind what is causing the problem is correct,
> this patch should fix the issue for you. 


Now, our issue is the accounting leak, regardless of the warning.

> Let me know if it does, and
> I'll work on the final solution.
> 

The problem is that the jump_label update is not atomic.

I'm _not_ sure about the update order between the jump label in
sock_update_memcg() and the other jump instructions inserted at the
accounting sites.

For example, if the jump instruction in sock_update_memcg() is updated first
and the others are updated later, it's unclear whether sockets which have a
_valid_ sock->sk_cgrp will be accounted or not, because the accounting jump
instruction may not have been updated yet.

Hopefully, the label in sock_update_memcg() will be updated last...

Hm. If I do, I'll add one more key as:

atomic_t	sock_should_memcg_aware;

And update 2 keys in following order.

At enable
	static_key_slow_inc(&memcg_socket_limit_enabled)
	atomic_inc(&sock_should_memcg_aware);

At disable
	atomic_dec(&sock_should_memcg_aware);
	static_key_slow_dec(&memcg_socket_limit_enabled)

And
==
void sock_update_memcg(struct sock *sk)
{
        if (atomic_read(&sock_should_memcg_aware)) {

==


Thanks,
-Kame

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-04-10  4:15               ` KAMEZAWA Hiroyuki
@ 2012-04-11  2:22                 ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2012-04-11  2:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton


> 
> The problem is that the jump_label update is not atomic.

That's indeed really bad.
You seem to be right about that... I wonder, however, if we should
prevent anything from running on any CPU during the jump_label update?
The update can be slow, sure, but it is not expected to be frequent...

> I'm _not_ sure about the update order between the jump label in
> sock_update_memcg() and the other jump instructions inserted at the
> accounting sites.

Any ordering assumptions here would be extremely fragile, even if we
could make one.

> For example, if the jump instruction in sock_update_memcg() is updated first
> and the others are updated later, it's unclear whether sockets which have a
> _valid_ sock->sk_cgrp will be accounted or not, because the accounting jump
> instruction may not have been updated yet.
> 
> Hopefully, the label in sock_update_memcg() will be updated last...
> 
> Hm. If I do, I'll add one more key as:
> 
> atomic_t	sock_should_memcg_aware;
> 
> And update 2 keys in following order.
> 
> At enable
> 	static_key_slow_inc(&memcg_socket_limit_enabled)
> 	atomic_inc(&sock_should_memcg_aware);
> 
> At disable
> 	atomic_dec(&sock_should_memcg_aware);
> 	static_key_slow_dec(&memcg_socket_limit_enabled)
> 
> And
> ==
> void sock_update_memcg(struct sock *sk)
> {
>          if (atomic_read(&sock_should_memcg_aware)) {
> 
> ==

The problem here is that having an atomic variable somewhat defeats the
purpose of the jump labels.

If it is just in the update path, it is not *that* bad.

But unless we go fix the jump labels to prevent such a thing from
happening, maybe it would be better to have two jump labels here?

One for the writer, which is updated last, and one for the readers.
This way we can force the ordering the way we want.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
  2012-04-10  2:37         ` KAMEZAWA Hiroyuki
  2012-04-10  2:51           ` Glauber Costa
@ 2012-04-13 17:33           ` Glauber Costa
  2012-04-18  8:02             ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2012-04-13 17:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2209 bytes --]

On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote:
> (2012/04/07 0:49), Glauber Costa wrote:
> 
>> On 03/30/2012 05:44 AM, KAMEZAWA Hiroyuki wrote:
>>> Maybe what we can do before the LSF/MM summit is this (avoid the warning).
>>> This patch is against Linus's git tree. The patch description is updated.
>>>
>>> Thanks.
>>> -Kame
>>> ==
>>>   From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001
>>> From: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>> Date: Thu, 29 Mar 2012 14:59:04 +0900
>>> Subject: [PATCH] memcg/tcp: fix warning caused b res->usage go to negative.
>>>
>>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>>> starts before res->limit is set, there is already-used resource.
>>> When res->limit is set, accounting starts. The resource will be uncharged,
>>> pushing the res_counter below 0 because it was never charged.
>>> This causes a warning.
>>>
>>
>> Kame,
>>
>> Please test the following patch and see if it fixes your problems (I
>> tested locally, and it triggers no warnings for me when running the test
>> script you provided plus an inbound scp -r copy of an iso directory from
>> a remote machine).
>>
>> When you are reviewing, keep in mind that we're likely to have the same
>> problems with the slab jump labels - since the slab pages will outlive the
>> cgroup as well. It might be worthwhile to provide a central point for the
>> jump labels to be unset on cgroup destruction.
>>
> 
> 
> Hm. What happens in the following sequence?
> 
>    1. a memcg is created
>    2. put a task into the memcg, start a tcp stream
>    3. set the tcp memory limit
> 
> The resource used between 2 and 3 will eventually cause the problem.
> 
> Then, Dave's request
> ==
> You must either:
> 
> 1) Integrate the socket's existing usage when the limit is set.
> 
> 2) Avoid accounting completely for a socket that started before
>     the limit was set.
> ==
> are not satisfied. So, we need to have per-socket state: accounted
> or not. I'll look into this problem again today.
> 

Kame,

Let me know what you think of the attached fix.
I still need to compile-test it in other configs to be sure it doesn't
break, etc. But let's agree on it first.

[-- Attachment #2: 0001-decrement-static-keys-on-real-destroy-time.patch --]
[-- Type: text/x-patch, Size: 5324 bytes --]

>From cdebff23dceeb9d35fd27e5988748a506bb47985 Mon Sep 17 00:00:00 2001
From: Glauber Costa <glommer@parallels.com>
Date: Wed, 4 Apr 2012 21:08:38 +0400
Subject: [PATCH] decrement static keys on real destroy time

We call the destroy function when a cgroup starts to be removed,
such as by a rmdir event.

However, because of our reference counters, some objects are still
inflight. Right now, we are decrementing the static_keys at destroy()
time, meaning that if we get rid of the last static_key reference,
some objects will still have charges, but the code to properly
uncharge them won't be run.

This becomes a problem especially if accounting is ever enabled again,
because new charges will then be added on top of the stale ones, making
it pretty much impossible to keep the counters consistent.

We just need to be careful with the static branch activation:
since there is no particular preferred order of their activation,
we need to make sure that we only start using it after all
call sites are active. This is achieved by having a per-memcg
flag that is only updated after static_key_slow_inc() returns.
At this time, we are sure all sites are active.

This is made per-memcg, not global, for a reason:
it also has the effect of making socket accounting more
consistent. Previously, the first memcg to be limited would trigger
static_key() activation, and therefore accounting - but all the others
would then be accounted no matter what. After this patch, only limited
memcgs will have their sockets accounted.

[v2: changed a tcp limited flag for a generic proto limited flag ]

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 include/net/sock.h        |    1 +
 mm/memcontrol.c           |   20 ++++++++++++++++++--
 net/ipv4/tcp_memcontrol.c |   12 ++++++------
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b3ebe6b..f35ff7d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -913,6 +913,7 @@ struct cg_proto {
 	struct percpu_counter	*sockets_allocated;	/* Current number of sockets. */
 	int			*memory_pressure;
 	long			*sysctl_mem;
+	bool			limited;
 	/*
 	 * memcg field is used to find which memcg we belong directly
 	 * Each memcg struct can hold more than one cg_proto, so container_of
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 02b01d2..61f2d31 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -404,6 +404,7 @@ void sock_update_memcg(struct sock *sk)
 {
 	if (mem_cgroup_sockets_enabled) {
 		struct mem_cgroup *memcg;
+		struct cg_proto *cg_proto;
 
 		BUG_ON(!sk->sk_prot->proto_cgroup);
 
@@ -423,9 +424,10 @@ void sock_update_memcg(struct sock *sk)
 
 		rcu_read_lock();
 		memcg = mem_cgroup_from_task(current);
-		if (!mem_cgroup_is_root(memcg)) {
+		cg_proto = sk->sk_prot->proto_cgroup(memcg);
+		if (!mem_cgroup_is_root(memcg) && cg_proto->limited) {
 			mem_cgroup_get(memcg);
-			sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg);
+			sk->sk_cgrp = cg_proto;
 		}
 		rcu_read_unlock();
 	}
@@ -442,6 +444,14 @@ void sock_release_memcg(struct sock *sk)
 	}
 }
 
+static void disarm_static_keys(struct mem_cgroup *memcg)
+{
+#ifdef CONFIG_INET
+	if (memcg->tcp_mem.cg_proto.limited)
+		static_key_slow_dec(&memcg_socket_limit_enabled);
+#endif
+}
+
 #ifdef CONFIG_INET
 struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
 {
@@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
 }
 EXPORT_SYMBOL(tcp_proto_cgroup);
 #endif /* CONFIG_INET */
+#else
+static inline void disarm_static_keys(struct mem_cgroup *memcg)
+{
+}
+
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
 
 static void drain_all_stock_async(struct mem_cgroup *memcg);
@@ -4883,6 +4898,7 @@ static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
 {
 	if (atomic_sub_and_test(count, &memcg->refcnt)) {
 		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+		disarm_static_keys(memcg);
 		__mem_cgroup_free(memcg);
 		if (parent)
 			mem_cgroup_put(parent);
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 1517037..8005667 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -54,6 +54,7 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 	cg_proto->sysctl_mem = tcp->tcp_prot_mem;
 	cg_proto->memory_allocated = &tcp->tcp_memory_allocated;
 	cg_proto->sockets_allocated = &tcp->tcp_sockets_allocated;
+	cg_proto->limited = false;
 	cg_proto->memcg = memcg;
 
 	return 0;
@@ -74,9 +75,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
 	percpu_counter_destroy(&tcp->tcp_sockets_allocated);
 
 	val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
-
-	if (val != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
 }
 EXPORT_SYMBOL(tcp_destroy_cgroup);
 
@@ -107,10 +105,12 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
 		tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT,
 					     net->ipv4.sysctl_tcp_mem[i]);
 
-	if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
-	else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
+	if (val == RESOURCE_MAX)
+		cg_proto->limited = false;
+	else if (val != RESOURCE_MAX && !cg_proto->limited) {
 		static_key_slow_inc(&memcg_socket_limit_enabled);
+		cg_proto->limited = true;
+	}
 
 	return 0;
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] memcg/tcp: fix warning caused by res->usage going negative.
  2012-04-13 17:33           ` Glauber Costa
@ 2012-04-18  8:02             ` KAMEZAWA Hiroyuki
  2012-04-18 16:32               ` Glauber Costa
  0 siblings, 1 reply; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  8:02 UTC (permalink / raw)
  To: Glauber Costa; +Cc: netdev, David Miller, Andrew Morton

(2012/04/14 2:33), Glauber Costa wrote:

> On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote:
>> (2012/04/07 0:49), Glauber Costa wrote:
>>
>>> On 03/30/2012 05:44 AM, KAMEZAWA Hiroyuki wrote:
>>>> Maybe what we can do before the LSF/MM summit is this (avoid the warning).
>>>> This patch is onto linus's git tree. Patch description is updated.
>>>>
>>>> Thanks.
>>>> -Kame
>>>> ==
>>>>   From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001
>>>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>>> Date: Thu, 29 Mar 2012 14:59:04 +0900
>>>> Subject: [PATCH] memcg/tcp: fix warning caused by res->usage going negative.
>>>>
>>>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>>>> starts before res->limit is set, there is already resource in use.
>>>> Once res->limit is set, accounting starts. That resource will later be
>>>> uncharged and drive res_counter below 0, because it was never charged.
>>>> This causes a warning.
>>>>
>>>
>>> Kame,
>>>
>>> Please test the following patch and see if it fixes your problems (I
>>> tested locally, and it triggers me no warnings running the test script
>>> you provided + an inbound scp -r copy of an iso directory from a remote
>>> machine)
>>>
>>> When you are reviewing, keep in mind that we're likely to have the same
>>> problems with slab jump labels - since the slab pages will outlive the
>>> cgroup as well, and it might be worth keeping this in mind and providing
>>> a central point for the jump labels to be turned off on cgroup destruction.
>>>
>>
>>
>> Hm. What happens in the following sequence?
>>
>>    1. a memcg is created
>>    2. put a task into the memcg, start a tcp stream
>>    3. set tcp memory limit
>>
>> The resource used between steps 2 and 3 will eventually cause the problem.
>>
>> Then, Dave's request
>> ==
>> You must either:
>>
>> 1) Integrate the socket's existing usage when the limit is set.
>>
>> 2) Avoid accounting completely for a socket that started before
>>     the limit was set.
>> ==
>> are not satisfied. So, we need per-socket state recording whether a socket
>> is accounted or not. I'll look into this problem again today.
>>
> 
> Kame,
> 
> Let me know what you think of the attached fix.
>> I still need to compile-test it in other configs to be sure it doesn't
>> break anything. But let's agree on it first.



> Subject: [PATCH] decrement static keys on real destroy time
> 
> We call the destroy function when a cgroup starts to be removed,
> such as by a rmdir event.
> 
> However, because of our reference counters, some objects are still
> inflight. Right now, we are decrementing the static_keys at destroy()
> time, meaning that if we get rid of the last static_key reference,
> some objects will still have charges, but the code to properly
> uncharge them won't be run.
> 
> This becomes a problem especially if accounting is ever enabled again,
> because new charges will then be added on top of the stale ones, making
> it pretty much impossible to keep the counters consistent.
> 
> We just need to be careful with the static branch activation:
> since there is no particular preferred order of their activation,
> we need to make sure that we only start using it after all
> call sites are active. This is achieved by having a per-memcg
> flag that is only updated after static_key_slow_inc() returns.
> At this time, we are sure all sites are active.
> 
> This is made per-memcg, not global, for a reason:
> it also has the effect of making socket accounting more
> consistent. Previously, the first memcg to be limited would trigger
> static_key() activation, and therefore accounting - but all the others
> would then be accounted no matter what. After this patch, only limited
> memcgs will have their sockets accounted.
> 
> [v2: changed a tcp limited flag for a generic proto limited flag ]
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> ---
>  include/net/sock.h        |    1 +
>  mm/memcontrol.c           |   20 ++++++++++++++++++--
>  net/ipv4/tcp_memcontrol.c |   12 ++++++------
>  3 files changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index b3ebe6b..f35ff7d 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -913,6 +913,7 @@ struct cg_proto {
>  	struct percpu_counter	*sockets_allocated;	/* Current number of sockets. */
>  	int			*memory_pressure;
>  	long			*sysctl_mem;
> +	bool			limited;


please add a comment somewhere. Hmm, is 'limited' a good name?

>  	/*
>  	 * memcg field is used to find which memcg we belong directly
>  	 * Each memcg struct can hold more than one cg_proto, so container_of
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 02b01d2..61f2d31 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -404,6 +404,7 @@ void sock_update_memcg(struct sock *sk)
>  {
>  	if (mem_cgroup_sockets_enabled) {
>  		struct mem_cgroup *memcg;
> +		struct cg_proto *cg_proto;
>  
>  		BUG_ON(!sk->sk_prot->proto_cgroup);
>  
> @@ -423,9 +424,10 @@ void sock_update_memcg(struct sock *sk)
>  
>  		rcu_read_lock();
>  		memcg = mem_cgroup_from_task(current);
> -		if (!mem_cgroup_is_root(memcg)) {
> +		cg_proto = sk->sk_prot->proto_cgroup(memcg);
> +		if (!mem_cgroup_is_root(memcg) && cg_proto->limited) {
>  			mem_cgroup_get(memcg);
> -			sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg);
> +			sk->sk_cgrp = cg_proto;
>  		}



Ok, then, sk->sk_cgrp is set only after the jump_label is enabled and
its memcg has a limit set.


>  		rcu_read_unlock();
>  	}
> @@ -442,6 +444,14 @@ void sock_release_memcg(struct sock *sk)
>  	}
>  }
>  
> +static void disarm_static_keys(struct mem_cgroup *memcg)
> +{
> +#ifdef CONFIG_INET
> +	if (memcg->tcp_mem.cg_proto.limited)
> +		static_key_slow_dec(&memcg_socket_limit_enabled);
> +#endif
> +}
> +
>  #ifdef CONFIG_INET
>  struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
>  {
> @@ -452,6 +462,11 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
>  }
>  EXPORT_SYMBOL(tcp_proto_cgroup);
>  #endif /* CONFIG_INET */
> +#else
> +static inline void disarm_static_keys(struct mem_cgroup *memcg)
> +{
> +}
> +
>  #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
>  
>  static void drain_all_stock_async(struct mem_cgroup *memcg);
> @@ -4883,6 +4898,7 @@ static void __mem_cgroup_put(struct mem_cgroup *memcg, int count)
>  {
>  	if (atomic_sub_and_test(count, &memcg->refcnt)) {
>  		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
> +		disarm_static_keys(memcg);
>  		__mem_cgroup_free(memcg);
>  		if (parent)
>  			mem_cgroup_put(parent);
> diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
> index 1517037..8005667 100644
> --- a/net/ipv4/tcp_memcontrol.c
> +++ b/net/ipv4/tcp_memcontrol.c
> @@ -54,6 +54,7 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
>  	cg_proto->sysctl_mem = tcp->tcp_prot_mem;
>  	cg_proto->memory_allocated = &tcp->tcp_memory_allocated;
>  	cg_proto->sockets_allocated = &tcp->tcp_sockets_allocated;
> +	cg_proto->limited = false;
>  	cg_proto->memcg = memcg;
>  
>  	return 0;
> @@ -74,9 +75,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
>  	percpu_counter_destroy(&tcp->tcp_sockets_allocated);
>  
>  	val = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT);
> -
> -	if (val != RESOURCE_MAX)
> -		static_key_slow_dec(&memcg_socket_limit_enabled);
>  }
>  EXPORT_SYMBOL(tcp_destroy_cgroup);
>  
> @@ -107,10 +105,12 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
>  		tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT,
>  					     net->ipv4.sysctl_tcp_mem[i]);
>  
> -	if (val == RESOURCE_MAX && old_lim != RESOURCE_MAX)
> -		static_key_slow_dec(&memcg_socket_limit_enabled);
> -	else if (old_lim == RESOURCE_MAX && val != RESOURCE_MAX)
> +	if (val == RESOURCE_MAX)
> +		cg_proto->limited = false;


Why can we reset this to false? disarm_static_keys() will not work
at destroy()....


> +	else if (val != RESOURCE_MAX && !cg_proto->limited) {
>  		static_key_slow_inc(&memcg_socket_limit_enabled);
> +		cg_proto->limited = true;
> +	}
>  

Hmm, don't we need a mutex?

Thanks,
-Kame


* Re: [PATCH] memcg/tcp: fix warning caused by res->usage going negative.
  2012-04-18  8:02             ` KAMEZAWA Hiroyuki
@ 2012-04-18 16:32               ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2012-04-18 16:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: netdev, David Miller, Andrew Morton

On 04/18/2012 05:02 AM, KAMEZAWA Hiroyuki wrote:
> (2012/04/14 2:33), Glauber Costa wrote:
> 
>> On 04/09/2012 11:37 PM, KAMEZAWA Hiroyuki wrote:
>>> (2012/04/07 0:49), Glauber Costa wrote:
>>>
>>>> On 03/30/2012 05:44 AM, KAMEZAWA Hiroyuki wrote:
>>>>> Maybe what we can do before the LSF/MM summit is this (avoid the warning).
>>>>> This patch is onto linus's git tree. Patch description is updated.
>>>>>
>>>>> Thanks.
>>>>> -Kame
>>>>> ==
>>>>>    From 4ab80f84bbcb02a790342426c1de84aeb17fcbe9 Mon Sep 17 00:00:00 2001
>>>>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>>>> Date: Thu, 29 Mar 2012 14:59:04 +0900
>>>>> Subject: [PATCH] memcg/tcp: fix warning caused by res->usage going negative.
>>>>>
>>>>> tcp memcontrol starts accounting after res->limit is set. So, if a socket
>>>>> starts before res->limit is set, there is already resource in use.
>>>>> Once res->limit is set, accounting starts. That resource will later be
>>>>> uncharged and drive res_counter below 0, because it was never charged.
>>>>> This causes a warning.
>>>>>
>>>>
>>>> Kame,
>>>>
>>>> Please test the following patch and see if it fixes your problems (I
>>>> tested locally, and it triggers me no warnings running the test script
>>>> you provided + an inbound scp -r copy of an iso directory from a remote
>>>> machine)
>>>>
>>>> When you are reviewing, keep in mind that we're likely to have the same
>>>> problems with slab jump labels - since the slab pages will outlive the
>>>> cgroup as well, and it might be worth keeping this in mind and providing
>>>> a central point for the jump labels to be turned off on cgroup destruction.
>>>>
>>>
>>>
>>> Hm. What happens in the following sequence?
>>>
>>>     1. a memcg is created
>>>     2. put a task into the memcg, start a tcp stream
>>>     3. set tcp memory limit
>>>
>>> The resource used between steps 2 and 3 will eventually cause the problem.
>>>
>>> Then, Dave's request
>>> ==
>>> You must either:
>>>
>>> 1) Integrate the socket's existing usage when the limit is set.
>>>
>>> 2) Avoid accounting completely for a socket that started before
>>>      the limit was set.
>>> ==
>>> are not satisfied. So, we need per-socket state recording whether a socket
>>> is accounted or not. I'll look into this problem again today.
>>>
>>
>> Kame,
>>
>> Let me know what you think of the attached fix.
>> I still need to compile-test it in other configs to be sure it doesn't
>> break anything. But let's agree on it first.
> 
> 
> 
>> Subject: [PATCH] decrement static keys on real destroy time
>>
>> We call the destroy function when a cgroup starts to be removed,
>> such as by a rmdir event.
>>
>> However, because of our reference counters, some objects are still
>> inflight. Right now, we are decrementing the static_keys at destroy()
>> time, meaning that if we get rid of the last static_key reference,
>> some objects will still have charges, but the code to properly
>> uncharge them won't be run.
>>
>> This becomes a problem specially if it is ever enabled again, because
>> now new charges will be added to the staled charges making keeping
>> it pretty much impossible.
>>
>> We just need to be careful with the static branch activation:
>> since there is no particular preferred order of their activation,
>> we need to make sure that we only start using it after all
>> call sites are active. This is achieved by having a per-memcg
>> flag that is only updated after static_key_slow_inc() returns.
>> At this time, we are sure all sites are active.
>>
>> This is made per-memcg, not global, for a reason:
>> it also has the effect of making socket accounting more
>> consistent. The first memcg to be limited will trigger static_key()
>> activation, therefore, accounting. But all the others will then be
>> accounted no matter what. After this patch, only limited memcgs
>> will have its sockets accounted.
>>
>> [v2: changed a tcp limited flag for a generic proto limited flag ]
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> ---
>>   include/net/sock.h        |    1 +
>>   mm/memcontrol.c           |   20 ++++++++++++++++++--
>>   net/ipv4/tcp_memcontrol.c |   12 ++++++------
>>   3 files changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index b3ebe6b..f35ff7d 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -913,6 +913,7 @@ struct cg_proto {
>>   	struct percpu_counter	*sockets_allocated;	/* Current number of sockets. */
>>   	int			*memory_pressure;
>>   	long			*sysctl_mem;
>> +	bool			limited;
> 
> 
> please add comment somewhere. Hmm, 'limited' is good name ?

I changed it to two fields in the version I am preparing:
accounted - means it was ever triggered;
account - means we are currently limited.

It also addresses the problem you mention below.

>>   	/*
>>   	 * memcg field is used to find which memcg we belong directly
>>   	 * Each memcg struct can hold more than one cg_proto, so container_of
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 02b01d2..61f2d31 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -404,6 +404,7 @@ void sock_update_memcg(struct sock *sk)
>>   {
>>   	if (mem_cgroup_sockets_enabled) {
>>   		struct mem_cgroup *memcg;
>> +		struct cg_proto *cg_proto;
>>
>>   		BUG_ON(!sk->sk_prot->proto_cgroup);
>>
>> @@ -423,9 +424,10 @@ void sock_update_memcg(struct sock *sk)
>>
>>   		rcu_read_lock();
>>   		memcg = mem_cgroup_from_task(current);
>> -		if (!mem_cgroup_is_root(memcg)) {
>> +		cg_proto = sk->sk_prot->proto_cgroup(memcg);
>> +		if (!mem_cgroup_is_root(memcg) && cg_proto->limited) {
>>   			mem_cgroup_get(memcg);
>> -			sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg);
>> +			sk->sk_cgrp = cg_proto;
>>   		}
> 
> 
> 
> Ok, then, sk->sk_cgrp is set only after the jump_label is enabled and
> its memcg has a limit set.
> 
> 
> 
> Why can we reset this to false? disarm_static_keys() will not work
> at destroy()....

This is solved by the two booleans. What I want is to achieve consistency,
and account only when accounting is turned on.

Accounting only after the first memcg is limited - which is what we have
now - is way more confusing.

> 
>> +	else if (val != RESOURCE_MAX && !cg_proto->limited) {
>>   		static_key_slow_inc(&memcg_socket_limit_enabled);
>> +		cg_proto->limited = true;
>> +	}
>>
> 
> Hmm. don't we need any mutex ?

I thought not.

But now that you pointed this out, I gave it some thought...
What we can't do is take a mutex in sock_update_memcg(); it would kill us.

I think we only need to protect against two processes writing to the
same file (tcp.limit_in_bytes) at the same time.

We don't hold cgroup_mutex for that, so we need locking only if the vfs
does not protect us against this concurrency. I will check that to be sure.


* [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started.
  2012-03-29  6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
@ 2012-03-29  6:31 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 26+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  6:31 UTC (permalink / raw)
  To: Glauber Costa, Linux Kernel; +Cc: Andrew Morton, davem

tcp memcontrol starts accounting after res->limit is set. So, if a socket
starts before res->limit is set, there is already resource in use.
After res->limit is set, that resource will be uncharged, driving res_counter
below 0. This causes a warning.

This patch fixes that by adding res_counter_uncharge_nowarn(), which ignores
the excess usage.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    2 ++
 include/net/sock.h          |    3 ++-
 kernel/res_counter.c        |   18 ++++++++++++++++++
 3 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index da81af0..e081948 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -134,6 +134,8 @@ int __must_check res_counter_charge_nofail(struct res_counter *counter,
 
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+				unsigned long val);
 
 /**
  * res_counter_margin - calculate chargeable space of a counter
diff --git a/include/net/sock.h b/include/net/sock.h
index a6ba1f8..a1b3f4802 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1048,7 +1048,8 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
 static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
 					      unsigned long amt)
 {
-	res_counter_uncharge(prot->memory_allocated, amt << PAGE_SHIFT);
+	res_counter_uncharge_nowarn(prot->memory_allocated,
+				amt << PAGE_SHIFT);
 }
 
 static inline u64 memcg_memory_allocated_read(struct cg_proto *prot)
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d508363..2bb01ac 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -113,6 +113,24 @@ void res_counter_uncharge(struct res_counter *counter, unsigned long val)
 	local_irq_restore(flags);
 }
 
+void res_counter_uncharge_nowarn(struct res_counter *counter,
+					unsigned long val)
+{
+	struct res_counter *c;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	for (c = counter; c != NULL; c = c->parent) {
+		spin_lock(&c->lock);
+		if (c->usage < val)
+			val = c->usage;
+		res_counter_uncharge_locked(c, val);
+		spin_unlock(&c->lock);
+	}
+	local_irq_restore(flags);
+}
+
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
-- 
1.7.4.1




end of thread, other threads:[~2012-04-18 16:34 UTC | newest]

Thread overview: 26+ messages
2012-03-29  7:01 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
2012-03-29  7:03 ` [PATCH 1/3] [BUGFIX] memcg/tcp : fix to see use_hierarchy in tcp memcontrol cgroup KAMEZAWA Hiroyuki
2012-03-29  9:14   ` Glauber Costa
2012-03-29  9:16     ` KAMEZAWA Hiroyuki
2012-03-29  7:07 ` [BUGFIX][PATCH 2/3] memcg/tcp: remove static_branch_slow_dec() at changing limit KAMEZAWA Hiroyuki
2012-03-29 10:58   ` Glauber Costa
2012-03-29 23:51     ` KAMEZAWA Hiroyuki
2012-03-30  6:18       ` Glauber Costa
2012-03-29  7:10 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
2012-03-29  9:21   ` Glauber Costa
2012-03-30  1:44     ` [PATCH] memcg/tcp: fix warning caused by res->usage going negative KAMEZAWA Hiroyuki
2012-04-06 15:49       ` Glauber Costa
2012-04-10  2:37         ` KAMEZAWA Hiroyuki
2012-04-10  2:51           ` Glauber Costa
2012-04-10  3:01             ` Glauber Costa
2012-04-10  4:15               ` KAMEZAWA Hiroyuki
2012-04-11  2:22                 ` Glauber Costa
2012-04-10  3:21             ` KAMEZAWA Hiroyuki
2012-04-13 17:33           ` Glauber Costa
2012-04-18  8:02             ` KAMEZAWA Hiroyuki
2012-04-18 16:32               ` Glauber Costa
2012-04-02  3:41     ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started David Miller
2012-04-03 22:31       ` Glauber Costa
2012-04-09  0:58         ` KAMEZAWA Hiroyuki
2012-04-09  1:44           ` Glauber Costa
  -- strict thread matches above, loose matches on Subject: below --
2012-03-29  6:22 [BUGFIX][PATCH 0/3] memcg: tcp memcontrol fixes KAMEZAWA Hiroyuki
2012-03-29  6:31 ` [BUGFIX][PATCH 3/3] memcg/tcp: ignore tcp usage before accounting started KAMEZAWA Hiroyuki
