* [PATCH 00/13] mm: memcontrol: account socket memory in unified hierarchy v4
@ 2015-11-24 21:51 ` Johannes Weiner
  0 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

Hi,

this is version 4 of the patches to add socket memory accounting to
the unified hierarchy memory controller.

Andrew, absent any new showstoppers, please consider merging this
series for v4.5. Thanks!

Changes since v3 include:

- Restored the local vmpressure reporting while preserving the
  hierarchical pressure semantics of the user interface, such that
  networking is also throttled during global memory shortage, and
  not just when configured cgroup limits are hit. As per Vladimir,
  this will make fully provisioned systems perform more smoothly.

- Made packet submission paths enter direct reclaim when memory is
  tight, and reserved the background balancing worklet for receiving
  packets in softirq context.

- Dropped a buggy shrinker cleanup, spotted by Vladimir.

- Fixed a missing return statement, spotted by Eric.

- Documented cgroup.memory=nosocket, as per Michal.

- Rebased onto latest mmots and added ack tags.

Changes since v2 include:

- Fixed an underflow bug in the mem+swap counter that stemmed from
  the design of the per-cpu charge cache: the unused mem+swap counter
  is now patched out entirely on the unified hierarchy. Double whammy.

- Restored the counting jump label such that the networking callbacks
  get patched out again when the last memory-controlled cgroup goes
  away. The code was already there, so we might as well keep it.

- Broke down the massive tcp_memcontrol rewrite patch into smaller
  logical pieces to (hopefully) make it easier to review and verify.

Changes since v1 include:

- No accounting overhead unless a dedicated cgroup is created and the
  memory controller instructed to track that group's memory footprint.
  Distribution kernels enable CONFIG_MEMCG, and users (incl. systemd)
  might create cgroups only for process control or resources other
  than memory. As noted by David and Michal, these setups shouldn't
  pay any overhead for this.

- Continue to enter the socket pressure state when hitting the memory
  controller's hard limit. Vladimir noted that there is at least some
  value in telling other sockets in the cgroup to not increase their
  transmit windows when one of them is already dropping packets.

- Drop the controversial vmpressure rework. Instead of changing the
  level where pressure is noted, keep noting pressure in its origin
  and then make the pressure check hierarchical. As noted by Michal
  and Vladimir, we shouldn't risk changing user-visible behavior.

---

Socket buffer memory can make up a significant share of a workload's
memory footprint. Because it can be directly linked to userspace
activity, it needs to be part of the memory controller to provide
proper resource isolation and containment.

Historically, socket buffers were accounted in a separate counter,
without any pressure equalization between anonymous memory, page
cache, and the socket buffers. When the socket buffer pool was
exhausted, buffer allocations would fail hard and cause network
performance to tank, regardless of whether there was still memory
available to the group or not. Likewise, struggling anonymous or cache
working sets could not dip into an idle socket memory pool. Because of
this, the feature was not usable for many real-life applications.

To avoid repeating this mistake, the new memory controller accounts
all types of memory pages it tracks on behalf of a cgroup in a single
pool. Under pressure, the VM reclaims, shrinks, and applies pressure
to whatever memory consumers in that pool are within its reach.

For socket memory, pressure feedback is provided through vmpressure
events. When the VM has trouble freeing memory, the network code is
instructed to stop growing the cgroup's transmit windows.

This series begins with a rework of the existing tcp memory controller
that simplifies and cleans up the code while allowing us to have only
one set of networking hooks for both memory controller versions. The
original behavior of the tcp controller should be preserved.

It then adds socket accounting to the v2 memory controller, including
use of the per-cpu charge cache and asynchronous memory.high
enforcement for charges made from socket contexts.
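
For illustration, the unified-hierarchy charge path can be sketched
along these lines (the actual hunks are in the later patches, which
are not shown in this excerpt, and they rework the signature; names
and details here are approximations, not the patch itself):

bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg,
			     unsigned int nr_pages)
{
	/* Receive runs in softirq context and must not block;
	 * transmit paths may enter direct reclaim.
	 */
	gfp_t gfp_mask = in_softirq() ? GFP_NOWAIT : GFP_KERNEL;

	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
		return true;

	/* Overcharge rather than drop the packet; the excess over
	 * memory.high is then reclaimed asynchronously.
	 */
	try_charge(memcg, gfp_mask | __GFP_NOFAIL, nr_pages);
	return false;
}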

Lastly, vmpressure is hooked up to the socket code so that it stops
growing transmit windows when the VM has trouble reclaiming memory.
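
The pressure check itself is hierarchical, roughly as below (an
illustrative sketch only; the field name stands in for whatever state
the vmpressure patches actually maintain):

static bool memcg_under_socket_pressure(struct mem_cgroup *memcg)
{
	/* under pressure if vmpressure flagged this group or any ancestor */
	do {
		if (READ_ONCE(memcg->socket_pressure))
			return true;
	} while ((memcg = parent_mem_cgroup(memcg)));

	return false;
}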

 Documentation/kernel-parameters.txt |   4 +
 include/linux/memcontrol.h          |  71 ++++----
 include/linux/vmpressure.h          |   5 +-
 include/net/sock.h                  | 149 ++---------------
 include/net/tcp.h                   |   5 +-
 include/net/tcp_memcontrol.h        |   1 -
 mm/backing-dev.c                    |   2 +-
 mm/memcontrol.c                     | 296 ++++++++++++++++++++++------------
 mm/vmpressure.c                     |  78 ++++++---
 mm/vmscan.c                         |  10 +-
 net/core/sock.c                     |  78 +++------
 net/ipv4/tcp.c                      |   3 +-
 net/ipv4/tcp_ipv4.c                 |   9 +-
 net/ipv4/tcp_memcontrol.c           |  82 ++++------
 net/ipv4/tcp_output.c               |   7 +-
 net/ipv6/tcp_ipv6.c                 |   3 -
 16 files changed, 383 insertions(+), 420 deletions(-)


* [PATCH 01/13] mm: memcontrol: export root_mem_cgroup
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:51   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

A later patch will need this symbol in files other than memcontrol.c,
so export it now and replace mem_cgroup_root_css at the same time.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h | 3 ++-
 mm/backing-dev.c           | 2 +-
 mm/memcontrol.c            | 5 ++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9d5472b..320b690 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -265,7 +265,8 @@ struct mem_cgroup {
 	struct mem_cgroup_per_node *nodeinfo[0];
 	/* WARNING: nodeinfo must be the last member here */
 };
-extern struct cgroup_subsys_state *mem_cgroup_root_css;
+
+extern struct mem_cgroup *root_mem_cgroup;
 
 /**
  * mem_cgroup_events - count memory events against a cgroup
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 9160853..fdc6f4d 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -707,7 +707,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 
 	ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
 	if (!ret) {
-		bdi->wb.memcg_css = mem_cgroup_root_css;
+		bdi->wb.memcg_css = &root_mem_cgroup->css;
 		bdi->wb.blkcg_css = blkcg_root_css;
 	}
 	return ret;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 79a29d5..f6ea649 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -76,9 +76,9 @@
 struct cgroup_subsys memory_cgrp_subsys __read_mostly;
 EXPORT_SYMBOL(memory_cgrp_subsys);
 
+struct mem_cgroup *root_mem_cgroup __read_mostly;
+
 #define MEM_CGROUP_RECLAIM_RETRIES	5
-static struct mem_cgroup *root_mem_cgroup __read_mostly;
-struct cgroup_subsys_state *mem_cgroup_root_css __read_mostly;
 
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
@@ -4217,7 +4217,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	/* root ? */
 	if (parent_css == NULL) {
 		root_mem_cgroup = memcg;
-		mem_cgroup_root_css = &memcg->css;
 		page_counter_init(&memcg->memory, NULL);
 		memcg->high = PAGE_COUNTER_MAX;
 		memcg->soft_limit = PAGE_COUNTER_MAX;
-- 
2.6.2


* [PATCH 02/13] net: tcp_memcontrol: properly detect ancestor socket pressure
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:51   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

When charging socket memory, the code currently checks only the local
page counter for excess to determine whether the memcg is under socket
pressure. But even if the local counter is fine, one of the ancestors
could have breached its limit, which should also force this child to
enter socket pressure. This currently doesn't happen.

Fix this by using page_counter_try_charge() first. If that fails, it
means that either the local counter or one of its ancestors is over
its limit, and the child should enter socket pressure.
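
For reference, page_counter_try_charge() attempts to charge every
counter from the local one up to the root and fails on the first one
that would exceed its limit, reporting that counter back. A simplified
sketch of those semantics (not the verbatim mm/page_counter.c code):

bool page_counter_try_charge(struct page_counter *counter,
			     unsigned long nr_pages,
			     struct page_counter **fail)
{
	struct page_counter *c;

	for (c = counter; c; c = c->parent) {
		long new = atomic_long_add_return(nr_pages, &c->count);

		if (new > c->limit) {
			/* undo this level, report the culprit */
			atomic_long_sub(nr_pages, &c->count);
			*fail = c;
			goto failed;
		}
	}
	return true;

failed:
	for (c = counter; c != *fail; c = c->parent)
		atomic_long_sub(nr_pages, &c->count);
	return false;
}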

Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/net/sock.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7f89e4b..8133c71 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1190,11 +1190,13 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
 					      unsigned long amt,
 					      int *parent_status)
 {
-	page_counter_charge(&prot->memory_allocated, amt);
+	struct page_counter *counter;
+
+	if (page_counter_try_charge(&prot->memory_allocated, amt, &counter))
+		return;
 
-	if (page_counter_read(&prot->memory_allocated) >
-	    prot->memory_allocated.limit)
-		*parent_status = OVER_LIMIT;
+	page_counter_charge(&prot->memory_allocated, amt);
+	*parent_status = OVER_LIMIT;
 }
 
 static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
-- 
2.6.2


* [PATCH 03/13] net: tcp_memcontrol: remove bogus hierarchy pressure propagation
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:51   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

Currently, when a cgroup breaches its socket memory limit, it enters
memory pressure mode for itself and its *ancestors*. This throttles
transmission in unrelated sibling and cousin subtrees that have
nothing to do with the breached limit.

Instead, breaching a limit should make that group and its *children*
enter memory pressure mode. And this happens already,
albeit lazily: if an ancestor limit is breached, siblings will enter
memory pressure on their own once the next packet arrives for them.

So no additional hierarchy code is needed. Remove the bogus stuff.
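
Concretely, with the previous patch in place, an over-limit ancestor
is detected by the sibling's own next charge attempt. Illustrative
recap of that code path:

	if (!page_counter_try_charge(&prot->memory_allocated, amt, &counter)) {
		/* some counter up the hierarchy is over its limit */
		page_counter_charge(&prot->memory_allocated, amt);
		*parent_status = OVER_LIMIT;	/* caller enters pressure mode */
	}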

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/net/sock.h | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8133c71..e27a8bb 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1152,14 +1152,8 @@ static inline void sk_leave_memory_pressure(struct sock *sk)
 	if (*memory_pressure)
 		*memory_pressure = 0;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-		struct cg_proto *cg_proto = sk->sk_cgrp;
-		struct proto *prot = sk->sk_prot;
-
-		for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
-			cg_proto->memory_pressure = 0;
-	}
-
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		sk->sk_cgrp->memory_pressure = 0;
 }
 
 static inline void sk_enter_memory_pressure(struct sock *sk)
@@ -1167,13 +1161,8 @@ static inline void sk_enter_memory_pressure(struct sock *sk)
 	if (!sk->sk_prot->enter_memory_pressure)
 		return;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-		struct cg_proto *cg_proto = sk->sk_cgrp;
-		struct proto *prot = sk->sk_prot;
-
-		for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
-			cg_proto->memory_pressure = 1;
-	}
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		sk->sk_cgrp->memory_pressure = 1;
 
 	sk->sk_prot->enter_memory_pressure(sk);
 }
-- 
2.6.2


* [PATCH 04/13] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:51   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

Move the jump-label check from sock_update_memcg() and
sock_release_memcg() to their callsites, eliminating those function
calls entirely when socket accounting is not enabled.

This also eliminates the need for dummy functions, because the calls
are optimized away when the Kconfig options are not enabled.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h |  9 +-------
 mm/memcontrol.c            | 56 +++++++++++++++++++++-------------------------
 net/core/sock.c            |  9 ++------
 net/ipv4/tcp.c             |  3 ++-
 net/ipv4/tcp_ipv4.c        |  4 +++-
 5 files changed, 33 insertions(+), 48 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 320b690..724b76a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -697,17 +697,10 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
-struct sock;
 #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+struct sock;
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
-#else
-static inline void sock_update_memcg(struct sock *sk)
-{
-}
-static inline void sock_release_memcg(struct sock *sk)
-{
-}
 #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
 
 #ifdef CONFIG_MEMCG_KMEM
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f6ea649..0b78f82 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -293,46 +293,40 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 
 void sock_update_memcg(struct sock *sk)
 {
-	if (mem_cgroup_sockets_enabled) {
-		struct mem_cgroup *memcg;
-		struct cg_proto *cg_proto;
+	struct mem_cgroup *memcg;
+	struct cg_proto *cg_proto;
 
-		BUG_ON(!sk->sk_prot->proto_cgroup);
+	BUG_ON(!sk->sk_prot->proto_cgroup);
 
-		/* Socket cloning can throw us here with sk_cgrp already
-		 * filled. It won't however, necessarily happen from
-		 * process context. So the test for root memcg given
-		 * the current task's memcg won't help us in this case.
-		 *
-		 * Respecting the original socket's memcg is a better
-		 * decision in this case.
-		 */
-		if (sk->sk_cgrp) {
-			BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
-			css_get(&sk->sk_cgrp->memcg->css);
-			return;
-		}
+	/* Socket cloning can throw us here with sk_cgrp already
+	 * filled. It won't however, necessarily happen from
+	 * process context. So the test for root memcg given
+	 * the current task's memcg won't help us in this case.
+	 *
+	 * Respecting the original socket's memcg is a better
+	 * decision in this case.
+	 */
+	if (sk->sk_cgrp) {
+		BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
+		css_get(&sk->sk_cgrp->memcg->css);
+		return;
+	}
 
-		rcu_read_lock();
-		memcg = mem_cgroup_from_task(current);
-		cg_proto = sk->sk_prot->proto_cgroup(memcg);
-		if (cg_proto && cg_proto->active &&
-		    css_tryget_online(&memcg->css)) {
-			sk->sk_cgrp = cg_proto;
-		}
-		rcu_read_unlock();
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	cg_proto = sk->sk_prot->proto_cgroup(memcg);
+	if (cg_proto && cg_proto->active &&
+	    css_tryget_online(&memcg->css)) {
+		sk->sk_cgrp = cg_proto;
 	}
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(sock_update_memcg);
 
 void sock_release_memcg(struct sock *sk)
 {
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-		struct mem_cgroup *memcg;
-		WARN_ON(!sk->sk_cgrp->memcg);
-		memcg = sk->sk_cgrp->memcg;
-		css_put(&sk->sk_cgrp->memcg->css);
-	}
+	WARN_ON(!sk->sk_cgrp->memcg);
+	css_put(&sk->sk_cgrp->memcg->css);
 }
 
 struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
diff --git a/net/core/sock.c b/net/core/sock.c
index 1e4dd54..04e54bc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1488,12 +1488,6 @@ void sk_free(struct sock *sk)
 }
 EXPORT_SYMBOL(sk_free);
 
-static void sk_update_clone(const struct sock *sk, struct sock *newsk)
-{
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		sock_update_memcg(newsk);
-}
-
 /**
  *	sk_clone_lock - clone a socket, and lock its clone
  *	@sk: the socket to clone
@@ -1589,7 +1583,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		sk_set_socket(newsk, NULL);
 		newsk->sk_wq = NULL;
 
-		sk_update_clone(sk, newsk);
+		if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+			sock_update_memcg(newsk);
 
 		if (newsk->sk_prot->sockets_allocated)
 			sk_sockets_allocated_inc(newsk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c172877..f166c28 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -422,7 +422,8 @@ void tcp_init_sock(struct sock *sk)
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
 	local_bh_disable();
-	sock_update_memcg(sk);
+	if (mem_cgroup_sockets_enabled)
+		sock_update_memcg(sk);
 	sk_sockets_allocated_inc(sk);
 	local_bh_enable();
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index db00343..4027e02 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1813,7 +1813,9 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	tcp_saved_syn_free(tp);
 
 	sk_sockets_allocated_dec(sk);
-	sock_release_memcg(sk);
+
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		sock_release_memcg(sk);
 }
 EXPORT_SYMBOL(tcp_v4_destroy_sock);
 
-- 
2.6.2


* [PATCH 05/13] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:51   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

The number of allocated sockets is used for calculations in the soft
limit phase, where packets are accepted but the socket is under memory
pressure. Since there is no soft limit phase in tcp_memcontrol, and
memory pressure is only entered when packets are already dropped, this
is actually dead code. Remove it.

As this is the last user of parent_cg_proto(), remove that too.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h |  1 -
 include/net/sock.h         | 39 +++------------------------------------
 net/ipv4/tcp_memcontrol.c  |  3 ---
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 724b76a..cc45407 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -87,7 +87,6 @@ enum mem_cgroup_events_target {
 
 struct cg_proto {
 	struct page_counter	memory_allocated;	/* Current allocated memory. */
-	struct percpu_counter	sockets_allocated;	/* Current number of sockets. */
 	int			memory_pressure;
 	bool			active;
 	long			sysctl_mem[3];
diff --git a/include/net/sock.h b/include/net/sock.h
index e27a8bb..7afbdab 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1095,19 +1095,9 @@ static inline void sk_refcnt_debug_release(const struct sock *sk)
 
 #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_NET)
 extern struct static_key memcg_socket_limit_enabled;
-static inline struct cg_proto *parent_cg_proto(struct proto *proto,
-					       struct cg_proto *cg_proto)
-{
-	return proto->proto_cgroup(parent_mem_cgroup(cg_proto->memcg));
-}
 #define mem_cgroup_sockets_enabled static_key_false(&memcg_socket_limit_enabled)
 #else
 #define mem_cgroup_sockets_enabled 0
-static inline struct cg_proto *parent_cg_proto(struct proto *proto,
-					       struct cg_proto *cg_proto)
-{
-	return NULL;
-}
 #endif
 
 static inline bool sk_stream_memory_free(const struct sock *sk)
@@ -1233,41 +1223,18 @@ sk_memory_allocated_sub(struct sock *sk, int amt)
 
 static inline void sk_sockets_allocated_dec(struct sock *sk)
 {
-	struct proto *prot = sk->sk_prot;
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-		struct cg_proto *cg_proto = sk->sk_cgrp;
-
-		for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
-			percpu_counter_dec(&cg_proto->sockets_allocated);
-	}
-
-	percpu_counter_dec(prot->sockets_allocated);
+	percpu_counter_dec(sk->sk_prot->sockets_allocated);
 }
 
 static inline void sk_sockets_allocated_inc(struct sock *sk)
 {
-	struct proto *prot = sk->sk_prot;
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-		struct cg_proto *cg_proto = sk->sk_cgrp;
-
-		for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
-			percpu_counter_inc(&cg_proto->sockets_allocated);
-	}
-
-	percpu_counter_inc(prot->sockets_allocated);
+	percpu_counter_inc(sk->sk_prot->sockets_allocated);
 }
 
 static inline int
 sk_sockets_allocated_read_positive(struct sock *sk)
 {
-	struct proto *prot = sk->sk_prot;
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		return percpu_counter_read_positive(&sk->sk_cgrp->sockets_allocated);
-
-	return percpu_counter_read_positive(prot->sockets_allocated);
+	return percpu_counter_read_positive(sk->sk_prot->sockets_allocated);
 }
 
 static inline int
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index d07579a..6759e0d 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -32,7 +32,6 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 		counter_parent = &parent_cg->memory_allocated;
 
 	page_counter_init(&cg_proto->memory_allocated, counter_parent);
-	percpu_counter_init(&cg_proto->sockets_allocated, 0, GFP_KERNEL);
 
 	return 0;
 }
@@ -46,8 +45,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
 	if (!cg_proto)
 		return;
 
-	percpu_counter_destroy(&cg_proto->sockets_allocated);
-
 	if (cg_proto->active)
 		static_key_slow_dec(&memcg_socket_limit_enabled);
 
-- 
2.6.2


* [PATCH 06/13] net: tcp_memcontrol: simplify the per-memcg limit access
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:51   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

tcp_memcontrol replicates the global sysctl_mem limit array per
cgroup, but it only ever sets these entries to the value of the
memory_allocated page_counter limit. Use the latter directly.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h | 1 -
 include/net/sock.h         | 8 +++++---
 net/ipv4/tcp_memcontrol.c  | 8 --------
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index cc45407..1a658be 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -89,7 +89,6 @@ struct cg_proto {
 	struct page_counter	memory_allocated;	/* Current allocated memory. */
 	int			memory_pressure;
 	bool			active;
-	long			sysctl_mem[3];
 	/*
 	 * memcg field is used to find which memcg we belong directly
 	 * Each memcg struct can hold more than one cg_proto, so container_of
diff --git a/include/net/sock.h b/include/net/sock.h
index 7afbdab..0b333c2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1159,10 +1159,12 @@ static inline void sk_enter_memory_pressure(struct sock *sk)
 
 static inline long sk_prot_mem_limits(const struct sock *sk, int index)
 {
-	long *prot = sk->sk_prot->sysctl_mem;
+	long limit = sk->sk_prot->sysctl_mem[index];
+
 	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		prot = sk->sk_cgrp->sysctl_mem;
-	return prot[index];
+		limit = min_t(long, limit, sk->sk_cgrp->memory_allocated.limit);
+
+	return limit;
 }
 
 static inline void memcg_memory_allocated_add(struct cg_proto *prot,
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 6759e0d..ef4268d 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -21,9 +21,6 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 	if (!cg_proto)
 		return 0;
 
-	cg_proto->sysctl_mem[0] = sysctl_tcp_mem[0];
-	cg_proto->sysctl_mem[1] = sysctl_tcp_mem[1];
-	cg_proto->sysctl_mem[2] = sysctl_tcp_mem[2];
 	cg_proto->memory_pressure = 0;
 	cg_proto->memcg = memcg;
 
@@ -54,7 +51,6 @@ EXPORT_SYMBOL(tcp_destroy_cgroup);
 static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 {
 	struct cg_proto *cg_proto;
-	int i;
 	int ret;
 
 	cg_proto = tcp_prot.proto_cgroup(memcg);
@@ -65,10 +61,6 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 	if (ret)
 		return ret;
 
-	for (i = 0; i < 3; i++)
-		cg_proto->sysctl_mem[i] = min_t(long, nr_pages,
-						sysctl_tcp_mem[i]);
-
 	if (!cg_proto->active) {
 		/*
 		 * The active flag needs to be written after the static_key
-- 
2.6.2


* [PATCH 07/13] net: tcp_memcontrol: sanitize tcp memory accounting callbacks
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:51   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:51 UTC
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

There won't be a tcp control soft limit, so integrating the memcg code
into the global skmem limiting scheme complicates things
unnecessarily. Replace this with simple and clear charge and uncharge
calls--hidden behind a jump label--to account skb memory.

Note that this is not purely aesthetic: as a result of shoehorning the
per-memcg code into the same memory accounting functions that handle
the global level, the old code would compare the per-memcg consumption
against the smaller of the per-memcg limit and the global limit. This
allowed the total consumption of multiple sockets to exceed the global
limit, as long as the individual sockets stayed within bounds. After
this change, the code will always compare the per-memcg consumption to
the per-memcg limit, and the global consumption to the global limit,
and thus close this loophole.

Without a soft limit, the per-memcg memory pressure state in sockets
is generally questionable. However, we did it until now, so we
continue to enter it when the hard limit is hit, and packets are
dropped, to let other sockets in the cgroup know that they shouldn't
grow their transmit windows, either. But keep it simple in the new
callback model and leave memory pressure lazily when the next packet
is accepted (as opposed to doing it synchronously when packets are
processed). When packets are dropped, network performance will
already be in the toilet, so that should be a reasonable trade-off.

As described above, consumption is now checked on the per-memcg level
and the global level separately. Likewise, memory pressure states are
maintained on both the per-memcg level and the global level, and a
socket is considered under pressure when either level asserts as much.
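
Taken together, the new charge callback boils down to the following;
the mm/memcontrol.c hunk is not shown in full in this excerpt, so
read this as a sketch of the model described above rather than the
verbatim patch:

bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages)
{
	struct page_counter *counter;

	if (page_counter_try_charge(&proto->memory_allocated,
				    nr_pages, &counter)) {
		/* packet accepted: leave pressure mode lazily */
		proto->memory_pressure = 0;
		return true;
	}
	/* overcharge to keep the counters coherent, and signal pressure */
	page_counter_charge(&proto->memory_allocated, nr_pages);
	proto->memory_pressure = 1;
	return false;
}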

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h | 12 ++++-----
 include/net/sock.h         | 64 ++++++----------------------------------------
 include/net/tcp.h          |  5 ++--
 mm/memcontrol.c            | 32 +++++++++++++++++++++++
 net/core/sock.c            | 26 +++++++++++--------
 net/ipv4/tcp_output.c      |  7 +++--
 6 files changed, 70 insertions(+), 76 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1a658be..4d80021 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -664,12 +664,6 @@ void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 }
 #endif /* CONFIG_MEMCG */
 
-enum {
-	UNDER_LIMIT,
-	SOFT_LIMIT,
-	OVER_LIMIT,
-};
-
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg);
@@ -699,6 +693,12 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 struct sock;
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
+bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages);
+void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages);
+static inline bool mem_cgroup_under_socket_pressure(struct cg_proto *proto)
+{
+	return proto->memory_pressure;
+}
 #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
 
 #ifdef CONFIG_MEMCG_KMEM
diff --git a/include/net/sock.h b/include/net/sock.h
index 0b333c2..888aa3f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1126,8 +1126,9 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 	if (!sk->sk_prot->memory_pressure)
 		return false;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		return !!sk->sk_cgrp->memory_pressure;
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
+	    mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+		return true;
 
 	return !!*sk->sk_prot->memory_pressure;
 }
@@ -1141,9 +1142,6 @@ static inline void sk_leave_memory_pressure(struct sock *sk)
 
 	if (*memory_pressure)
 		*memory_pressure = 0;
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		sk->sk_cgrp->memory_pressure = 0;
 }
 
 static inline void sk_enter_memory_pressure(struct sock *sk)
@@ -1151,76 +1149,30 @@ static inline void sk_enter_memory_pressure(struct sock *sk)
 	if (!sk->sk_prot->enter_memory_pressure)
 		return;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		sk->sk_cgrp->memory_pressure = 1;
-
 	sk->sk_prot->enter_memory_pressure(sk);
 }
 
 static inline long sk_prot_mem_limits(const struct sock *sk, int index)
 {
-	long limit = sk->sk_prot->sysctl_mem[index];
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		limit = min_t(long, limit, sk->sk_cgrp->memory_allocated.limit);
-
-	return limit;
-}
-
-static inline void memcg_memory_allocated_add(struct cg_proto *prot,
-					      unsigned long amt,
-					      int *parent_status)
-{
-	struct page_counter *counter;
-
-	if (page_counter_try_charge(&prot->memory_allocated, amt, &counter))
-		return;
-
-	page_counter_charge(&prot->memory_allocated, amt);
-	*parent_status = OVER_LIMIT;
-}
-
-static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
-					      unsigned long amt)
-{
-	page_counter_uncharge(&prot->memory_allocated, amt);
+	return sk->sk_prot->sysctl_mem[index];
 }
 
 static inline long
 sk_memory_allocated(const struct sock *sk)
 {
-	struct proto *prot = sk->sk_prot;
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		return page_counter_read(&sk->sk_cgrp->memory_allocated);
-
-	return atomic_long_read(prot->memory_allocated);
+	return atomic_long_read(sk->sk_prot->memory_allocated);
 }
 
 static inline long
-sk_memory_allocated_add(struct sock *sk, int amt, int *parent_status)
+sk_memory_allocated_add(struct sock *sk, int amt)
 {
-	struct proto *prot = sk->sk_prot;
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-		memcg_memory_allocated_add(sk->sk_cgrp, amt, parent_status);
-		/* update the root cgroup regardless */
-		atomic_long_add_return(amt, prot->memory_allocated);
-		return page_counter_read(&sk->sk_cgrp->memory_allocated);
-	}
-
-	return atomic_long_add_return(amt, prot->memory_allocated);
+	return atomic_long_add_return(amt, sk->sk_prot->memory_allocated);
 }
 
 static inline void
 sk_memory_allocated_sub(struct sock *sk, int amt)
 {
-	struct proto *prot = sk->sk_prot;
-
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		memcg_memory_allocated_sub(sk->sk_cgrp, amt);
-
-	atomic_long_sub(amt, prot->memory_allocated);
+	atomic_long_sub(amt, sk->sk_prot->memory_allocated);
 }
 
 static inline void sk_sockets_allocated_dec(struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f80e74c..04517d6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -292,8 +292,9 @@ extern int tcp_memory_pressure;
 /* optimized version of sk_under_memory_pressure() for TCP sockets */
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
 {
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		return !!sk->sk_cgrp->memory_pressure;
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
+	    mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+		return true;
 
 	return tcp_memory_pressure;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0b78f82..4b586ea 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -338,6 +338,38 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
 }
 EXPORT_SYMBOL(tcp_proto_cgroup);
 
+/**
+ * mem_cgroup_charge_skmem - charge socket memory
+ * @proto: proto to charge
+ * @nr_pages: number of pages to charge
+ *
+ * Charges @nr_pages to @proto. Returns %true if the charge fit within
+ * @proto's configured limit, %false if the charge had to be forced.
+ */
+bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+{
+	struct page_counter *counter;
+
+	if (page_counter_try_charge(&proto->memory_allocated,
+				    nr_pages, &counter)) {
+		proto->memory_pressure = 0;
+		return true;
+	}
+	page_counter_charge(&proto->memory_allocated, nr_pages);
+	proto->memory_pressure = 1;
+	return false;
+}
+
+/**
+ * mem_cgroup_uncharge_skmem - uncharge socket memory
+ * @proto - proto to uncharge
+ * @nr_pages - number of pages to uncharge
+ */
+void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+{
+	page_counter_uncharge(&proto->memory_allocated, nr_pages);
+}
+
 #endif
 
 #ifdef CONFIG_MEMCG_KMEM
diff --git a/net/core/sock.c b/net/core/sock.c
index 04e54bc..5b1b96f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2066,27 +2066,27 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
 	struct proto *prot = sk->sk_prot;
 	int amt = sk_mem_pages(size);
 	long allocated;
-	int parent_status = UNDER_LIMIT;
 
 	sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
 
-	allocated = sk_memory_allocated_add(sk, amt, &parent_status);
+	allocated = sk_memory_allocated_add(sk, amt);
+
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
+	    !mem_cgroup_charge_skmem(sk->sk_cgrp, amt))
+		goto suppress_allocation;
 
 	/* Under limit. */
-	if (parent_status == UNDER_LIMIT &&
-			allocated <= sk_prot_mem_limits(sk, 0)) {
+	if (allocated <= sk_prot_mem_limits(sk, 0)) {
 		sk_leave_memory_pressure(sk);
 		return 1;
 	}
 
-	/* Under pressure. (we or our parents) */
-	if ((parent_status > SOFT_LIMIT) ||
-			allocated > sk_prot_mem_limits(sk, 1))
+	/* Under pressure. */
+	if (allocated > sk_prot_mem_limits(sk, 1))
 		sk_enter_memory_pressure(sk);
 
-	/* Over hard limit (we or our parents) */
-	if ((parent_status == OVER_LIMIT) ||
-			(allocated > sk_prot_mem_limits(sk, 2)))
+	/* Over hard limit. */
+	if (allocated > sk_prot_mem_limits(sk, 2))
 		goto suppress_allocation;
 
 	/* guarantee minimum buffer size under pressure */
@@ -2135,6 +2135,9 @@ suppress_allocation:
 
 	sk_memory_allocated_sub(sk, amt);
 
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		mem_cgroup_uncharge_skmem(sk->sk_cgrp, amt);
+
 	return 0;
 }
 EXPORT_SYMBOL(__sk_mem_schedule);
@@ -2150,6 +2153,9 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
 	sk_memory_allocated_sub(sk, amount);
 	sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
 
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		mem_cgroup_uncharge_skmem(sk->sk_cgrp, amount);
+
 	if (sk_under_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
 		sk_leave_memory_pressure(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cb7ca56..7aa168a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2813,13 +2813,16 @@ begin_fwd:
  */
 void sk_forced_mem_schedule(struct sock *sk, int size)
 {
-	int amt, status;
+	int amt;
 
 	if (size <= sk->sk_forward_alloc)
 		return;
 	amt = sk_mem_pages(size);
 	sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
-	sk_memory_allocated_add(sk, amt, &status);
+	sk_memory_allocated_add(sk, amt);
+
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		mem_cgroup_charge_skmem(sk->sk_cgrp, amt);
 }
 
 /* Send a FIN. The caller locks the socket for us.
-- 
2.6.2



* [PATCH 08/13] net: tcp_memcontrol: simplify linkage between socket and page counter
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:52   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

There won't be any separate counters for socket memory consumed by
protocols other than TCP in the future. Remove the indirection and
link sockets directly to their owning memory cgroup.
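
In effect, the access pattern changes like this (a sketch, not the
literal patch):

	/* before: socket -> cg_proto, with a back-pointer to the memcg */
	struct cg_proto *cg = sk->sk_cgrp;
	mem_cgroup_charge_skmem(cg, amt);	/* counter lives in cg */
	css_put(&cg->memcg->css);		/* release via back-pointer */

	/* after: socket -> memcg directly */
	struct mem_cgroup *memcg = sk->sk_memcg;
	mem_cgroup_charge_skmem(memcg, amt);	/* counter in memcg->tcp_mem */
	css_put(&memcg->css);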

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h   | 18 +++---------
 include/net/sock.h           | 36 +++---------------------
 include/net/tcp.h            |  4 +--
 include/net/tcp_memcontrol.h |  1 -
 mm/memcontrol.c              | 57 +++++++++++++++----------------------
 net/core/sock.c              | 52 +++++-----------------------------
 net/ipv4/tcp_ipv4.c          |  7 +----
 net/ipv4/tcp_memcontrol.c    | 67 +++++++++++++++++---------------------------
 net/ipv4/tcp_output.c        |  4 +--
 net/ipv6/tcp_ipv6.c          |  3 --
 10 files changed, 68 insertions(+), 181 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4d80021..d99fefe 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -89,16 +89,6 @@ struct cg_proto {
 	struct page_counter	memory_allocated;	/* Current allocated memory. */
 	int			memory_pressure;
 	bool			active;
-	/*
-	 * memcg field is used to find which memcg we belong directly
-	 * Each memcg struct can hold more than one cg_proto, so container_of
-	 * won't really cut.
-	 *
-	 * The elegant solution would be having an inverse function to
-	 * proto_cgroup in struct proto, but that means polluting the structure
-	 * for everybody, instead of just for memcg users.
-	 */
-	struct mem_cgroup	*memcg;
 };
 
 #ifdef CONFIG_MEMCG
@@ -693,11 +683,11 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 struct sock;
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
-bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages);
-void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages);
-static inline bool mem_cgroup_under_socket_pressure(struct cg_proto *proto)
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
-	return proto->memory_pressure;
+	return memcg->tcp_mem.memory_pressure;
 }
 #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 888aa3f..1a94b85 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -69,22 +69,6 @@
 #include <net/tcp_states.h>
 #include <linux/net_tstamp.h>
 
-struct cgroup;
-struct cgroup_subsys;
-#ifdef CONFIG_NET
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss);
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg);
-#else
-static inline
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
-{
-	return 0;
-}
-static inline
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg)
-{
-}
-#endif
 /*
  * This structure really needs to be cleaned up.
  * Most of it is for TCP, and not used by any of
@@ -310,7 +294,7 @@ struct cg_proto;
   *	@sk_security: used by security modules
   *	@sk_mark: generic packet mark
   *	@sk_classid: this socket's cgroup classid
-  *	@sk_cgrp: this socket's cgroup-specific proto data
+  *	@sk_memcg: this socket's memory cgroup association
   *	@sk_write_pending: a write to stream socket waits to start
   *	@sk_state_change: callback to indicate change in the state of the sock
   *	@sk_data_ready: callback to indicate there is data to be processed
@@ -447,7 +431,7 @@ struct sock {
 #ifdef CONFIG_CGROUP_NET_CLASSID
 	u32			sk_classid;
 #endif
-	struct cg_proto		*sk_cgrp;
+	struct mem_cgroup	*sk_memcg;
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk);
 	void			(*sk_write_space)(struct sock *sk);
@@ -1051,18 +1035,6 @@ struct proto {
 #ifdef SOCK_REFCNT_DEBUG
 	atomic_t		socks;
 #endif
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * cgroup specific init/deinit functions. Called once for all
-	 * protocols that implement it, from cgroups populate function.
-	 * This function has to setup any files the protocol want to
-	 * appear in the kmem cgroup filesystem.
-	 */
-	int			(*init_cgroup)(struct mem_cgroup *memcg,
-					       struct cgroup_subsys *ss);
-	void			(*destroy_cgroup)(struct mem_cgroup *memcg);
-	struct cg_proto		*(*proto_cgroup)(struct mem_cgroup *memcg);
-#endif
 };
 
 int proto_register(struct proto *prot, int alloc_slab);
@@ -1126,8 +1098,8 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 	if (!sk->sk_prot->memory_pressure)
 		return false;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
-	    mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
 	return !!*sk->sk_prot->memory_pressure;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 04517d6..c008535 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -292,8 +292,8 @@ extern int tcp_memory_pressure;
 /* optimized version of sk_under_memory_pressure() for TCP sockets */
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
 {
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
-	    mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
 	return tcp_memory_pressure;
diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 05b94d9..3a17b16 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -1,7 +1,6 @@
 #ifndef _TCP_MEMCG_H
 #define _TCP_MEMCG_H
 
-struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
 int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss);
 void tcp_destroy_cgroup(struct mem_cgroup *memcg);
 #endif /* _TCP_MEMCG_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b586ea..68d67fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -294,9 +294,6 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 void sock_update_memcg(struct sock *sk)
 {
 	struct mem_cgroup *memcg;
-	struct cg_proto *cg_proto;
-
-	BUG_ON(!sk->sk_prot->proto_cgroup);
 
 	/* Socket cloning can throw us here with sk_cgrp already
 	 * filled. It won't however, necessarily happen from
@@ -306,68 +303,58 @@ void sock_update_memcg(struct sock *sk)
 	 * Respecting the original socket's memcg is a better
 	 * decision in this case.
 	 */
-	if (sk->sk_cgrp) {
-		BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
-		css_get(&sk->sk_cgrp->memcg->css);
+	if (sk->sk_memcg) {
+		BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
+		css_get(&sk->sk_memcg->css);
 		return;
 	}
 
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
-	cg_proto = sk->sk_prot->proto_cgroup(memcg);
-	if (cg_proto && cg_proto->active &&
-	    css_tryget_online(&memcg->css)) {
-		sk->sk_cgrp = cg_proto;
-	}
+	if (memcg != root_mem_cgroup &&
+	    memcg->tcp_mem.active &&
+	    css_tryget_online(&memcg->css))
+		sk->sk_memcg = memcg;
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(sock_update_memcg);
 
 void sock_release_memcg(struct sock *sk)
 {
-	WARN_ON(!sk->sk_cgrp->memcg);
-	css_put(&sk->sk_cgrp->memcg->css);
-}
-
-struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
-{
-	if (!memcg || mem_cgroup_is_root(memcg))
-		return NULL;
-
-	return &memcg->tcp_mem;
+	WARN_ON(!sk->sk_memcg);
+	css_put(&sk->sk_memcg->css);
 }
-EXPORT_SYMBOL(tcp_proto_cgroup);
 
 /**
  * mem_cgroup_charge_skmem - charge socket memory
- * @proto: proto to charge
+ * @memcg: memcg to charge
  * @nr_pages: number of pages to charge
  *
- * Charges @nr_pages to @proto. Returns %true if the charge fit within
- * @proto's configured limit, %false if the charge had to be forced.
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if the charge had to be forced.
  */
-bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct page_counter *counter;
 
-	if (page_counter_try_charge(&proto->memory_allocated,
+	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
 				    nr_pages, &counter)) {
-		proto->memory_pressure = 0;
+		memcg->tcp_mem.memory_pressure = 0;
 		return true;
 	}
-	page_counter_charge(&proto->memory_allocated, nr_pages);
-	proto->memory_pressure = 1;
+	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
+	memcg->tcp_mem.memory_pressure = 1;
 	return false;
 }
 
 /**
  * mem_cgroup_uncharge_skmem - uncharge socket memory
- * @proto - proto to uncharge
+ * @memcg - memcg to uncharge
  * @nr_pages - number of pages to uncharge
  */
-void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	page_counter_uncharge(&proto->memory_allocated, nr_pages);
+	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
 }
 
 #endif
@@ -3629,7 +3616,7 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 	if (ret)
 		return ret;
 
-	return mem_cgroup_sockets_init(memcg, ss);
+	return tcp_init_cgroup(memcg, ss);
 }
 
 static void memcg_deactivate_kmem(struct mem_cgroup *memcg)
@@ -3685,7 +3672,7 @@ static void memcg_destroy_kmem(struct mem_cgroup *memcg)
 		static_key_slow_dec(&memcg_kmem_enabled_key);
 		WARN_ON(page_counter_read(&memcg->kmem));
 	}
-	mem_cgroup_sockets_destroy(memcg);
+	tcp_destroy_cgroup(memcg);
 }
 #else
 static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b1b96f..6486b0d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -194,44 +194,6 @@ bool sk_net_capable(const struct sock *sk, int cap)
 }
 EXPORT_SYMBOL(sk_net_capable);
 
-
-#ifdef CONFIG_MEMCG_KMEM
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
-{
-	struct proto *proto;
-	int ret = 0;
-
-	mutex_lock(&proto_list_mutex);
-	list_for_each_entry(proto, &proto_list, node) {
-		if (proto->init_cgroup) {
-			ret = proto->init_cgroup(memcg, ss);
-			if (ret)
-				goto out;
-		}
-	}
-
-	mutex_unlock(&proto_list_mutex);
-	return ret;
-out:
-	list_for_each_entry_continue_reverse(proto, &proto_list, node)
-		if (proto->destroy_cgroup)
-			proto->destroy_cgroup(memcg);
-	mutex_unlock(&proto_list_mutex);
-	return ret;
-}
-
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg)
-{
-	struct proto *proto;
-
-	mutex_lock(&proto_list_mutex);
-	list_for_each_entry_reverse(proto, &proto_list, node)
-		if (proto->destroy_cgroup)
-			proto->destroy_cgroup(memcg);
-	mutex_unlock(&proto_list_mutex);
-}
-#endif
-
 /*
  * Each address family might have different locking rules, so we have
  * one slock key per address family:
@@ -1583,7 +1545,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		sk_set_socket(newsk, NULL);
 		newsk->sk_wq = NULL;
 
-		if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		if (mem_cgroup_sockets_enabled && sk->sk_memcg)
 			sock_update_memcg(newsk);
 
 		if (newsk->sk_prot->sockets_allocated)
@@ -2071,8 +2033,8 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
 
 	allocated = sk_memory_allocated_add(sk, amt);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
-	    !mem_cgroup_charge_skmem(sk->sk_cgrp, amt))
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	    !mem_cgroup_charge_skmem(sk->sk_memcg, amt))
 		goto suppress_allocation;
 
 	/* Under limit. */
@@ -2135,8 +2097,8 @@ suppress_allocation:
 
 	sk_memory_allocated_sub(sk, amt);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		mem_cgroup_uncharge_skmem(sk->sk_cgrp, amt);
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+		mem_cgroup_uncharge_skmem(sk->sk_memcg, amt);
 
 	return 0;
 }
@@ -2153,8 +2115,8 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
 	sk_memory_allocated_sub(sk, amount);
 	sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		mem_cgroup_uncharge_skmem(sk->sk_cgrp, amount);
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+		mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);
 
 	if (sk_under_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4027e02..34c2678 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1814,7 +1814,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
 
 	sk_sockets_allocated_dec(sk);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
 		sock_release_memcg(sk);
 }
 EXPORT_SYMBOL(tcp_v4_destroy_sock);
@@ -2339,11 +2339,6 @@ struct proto tcp_prot = {
 	.compat_setsockopt	= compat_tcp_setsockopt,
 	.compat_getsockopt	= compat_tcp_getsockopt,
 #endif
-#ifdef CONFIG_MEMCG_KMEM
-	.init_cgroup		= tcp_init_cgroup,
-	.destroy_cgroup		= tcp_destroy_cgroup,
-	.proto_cgroup		= tcp_proto_cgroup,
-#endif
 };
 EXPORT_SYMBOL(tcp_prot);
 
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index ef4268d..e507825 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -8,60 +8,47 @@
 
 int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 {
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+	struct page_counter *counter_parent = NULL;
 	/*
 	 * The root cgroup does not use page_counters, but rather,
 	 * rely on the data already collected by the network
 	 * subsystem
 	 */
-	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
-	struct page_counter *counter_parent = NULL;
-	struct cg_proto *cg_proto, *parent_cg;
-
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return 0;
 
-	cg_proto->memory_pressure = 0;
-	cg_proto->memcg = memcg;
+	memcg->tcp_mem.memory_pressure = 0;
 
-	parent_cg = tcp_prot.proto_cgroup(parent);
-	if (parent_cg)
-		counter_parent = &parent_cg->memory_allocated;
+	if (parent)
+		counter_parent = &parent->tcp_mem.memory_allocated;
 
-	page_counter_init(&cg_proto->memory_allocated, counter_parent);
+	page_counter_init(&memcg->tcp_mem.memory_allocated, counter_parent);
 
 	return 0;
 }
-EXPORT_SYMBOL(tcp_init_cgroup);
 
 void tcp_destroy_cgroup(struct mem_cgroup *memcg)
 {
-	struct cg_proto *cg_proto;
-
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return;
 
-	if (cg_proto->active)
+	if (memcg->tcp_mem.active)
 		static_key_slow_dec(&memcg_socket_limit_enabled);
-
 }
-EXPORT_SYMBOL(tcp_destroy_cgroup);
 
 static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 {
-	struct cg_proto *cg_proto;
 	int ret;
 
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return -EINVAL;
 
-	ret = page_counter_limit(&cg_proto->memory_allocated, nr_pages);
+	ret = page_counter_limit(&memcg->tcp_mem.memory_allocated, nr_pages);
 	if (ret)
 		return ret;
 
-	if (!cg_proto->active) {
+	if (!memcg->tcp_mem.active) {
 		/*
 		 * The active flag needs to be written after the static_key
 		 * update. This is what guarantees that the socket activation
@@ -79,7 +66,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 		 * patched in yet.
 		 */
 		static_key_slow_inc(&memcg_socket_limit_enabled);
-		cg_proto->active = true;
+		memcg->tcp_mem.active = true;
 	}
 
 	return 0;
@@ -123,32 +110,32 @@ static ssize_t tcp_cgroup_write(struct kernfs_open_file *of,
 static u64 tcp_cgroup_read(struct cgroup_subsys_state *css, struct cftype *cft)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
-	struct cg_proto *cg_proto = tcp_prot.proto_cgroup(memcg);
 	u64 val;
 
 	switch (cft->private) {
 	case RES_LIMIT:
-		if (!cg_proto)
-			return PAGE_COUNTER_MAX;
-		val = cg_proto->memory_allocated.limit;
+		if (memcg == root_mem_cgroup)
+			val = PAGE_COUNTER_MAX;
+		else
+			val = memcg->tcp_mem.memory_allocated.limit;
 		val *= PAGE_SIZE;
 		break;
 	case RES_USAGE:
-		if (!cg_proto)
+		if (memcg == root_mem_cgroup)
 			val = atomic_long_read(&tcp_memory_allocated);
 		else
-			val = page_counter_read(&cg_proto->memory_allocated);
+			val = page_counter_read(&memcg->tcp_mem.memory_allocated);
 		val *= PAGE_SIZE;
 		break;
 	case RES_FAILCNT:
-		if (!cg_proto)
+		if (memcg == root_mem_cgroup)
 			return 0;
-		val = cg_proto->memory_allocated.failcnt;
+		val = memcg->tcp_mem.memory_allocated.failcnt;
 		break;
 	case RES_MAX_USAGE:
-		if (!cg_proto)
+		if (memcg == root_mem_cgroup)
 			return 0;
-		val = cg_proto->memory_allocated.watermark;
+		val = memcg->tcp_mem.memory_allocated.watermark;
 		val *= PAGE_SIZE;
 		break;
 	default:
@@ -161,19 +148,17 @@ static ssize_t tcp_cgroup_reset(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off)
 {
 	struct mem_cgroup *memcg;
-	struct cg_proto *cg_proto;
 
 	memcg = mem_cgroup_from_css(of_css(of));
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return nbytes;
 
 	switch (of_cft(of)->private) {
 	case RES_MAX_USAGE:
-		page_counter_reset_watermark(&cg_proto->memory_allocated);
+		page_counter_reset_watermark(&memcg->tcp_mem.memory_allocated);
 		break;
 	case RES_FAILCNT:
-		cg_proto->memory_allocated.failcnt = 0;
+		memcg->tcp_mem.memory_allocated.failcnt = 0;
 		break;
 	}
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7aa168a..7b83a65 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2821,8 +2821,8 @@ void sk_forced_mem_schedule(struct sock *sk, int size)
 	sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
 	sk_memory_allocated_add(sk, amt);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		mem_cgroup_charge_skmem(sk->sk_cgrp, amt);
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+		mem_cgroup_charge_skmem(sk->sk_memcg, amt);
 }
 
 /* Send a FIN. The caller locks the socket for us.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c5429a6..1bfb682 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1880,9 +1880,6 @@ struct proto tcpv6_prot = {
 	.compat_setsockopt	= compat_tcp_setsockopt,
 	.compat_getsockopt	= compat_tcp_getsockopt,
 #endif
-#ifdef CONFIG_MEMCG_KMEM
-	.proto_cgroup		= tcp_proto_cgroup,
-#endif
 	.clear_sk		= tcp_v6_clear_sk,
 };
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 08/13] net: tcp_memcontrol: simplify linkage between socket and page counter
@ 2015-11-24 21:52   ` Johannes Weiner
  0 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

There won't be any separate counters for socket memory consumed by
protocols other than TCP in the future. Remove the indirection and
link sockets directly to their owning memory cgroup.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h   | 18 +++---------
 include/net/sock.h           | 36 +++---------------------
 include/net/tcp.h            |  4 +--
 include/net/tcp_memcontrol.h |  1 -
 mm/memcontrol.c              | 57 +++++++++++++++----------------------
 net/core/sock.c              | 52 +++++-----------------------------
 net/ipv4/tcp_ipv4.c          |  7 +----
 net/ipv4/tcp_memcontrol.c    | 67 +++++++++++++++++---------------------------
 net/ipv4/tcp_output.c        |  4 +--
 net/ipv6/tcp_ipv6.c          |  3 --
 10 files changed, 68 insertions(+), 181 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4d80021..d99fefe 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -89,16 +89,6 @@ struct cg_proto {
 	struct page_counter	memory_allocated;	/* Current allocated memory. */
 	int			memory_pressure;
 	bool			active;
-	/*
-	 * memcg field is used to find which memcg we belong directly
-	 * Each memcg struct can hold more than one cg_proto, so container_of
-	 * won't really cut.
-	 *
-	 * The elegant solution would be having an inverse function to
-	 * proto_cgroup in struct proto, but that means polluting the structure
-	 * for everybody, instead of just for memcg users.
-	 */
-	struct mem_cgroup	*memcg;
 };
 
 #ifdef CONFIG_MEMCG
@@ -693,11 +683,11 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 struct sock;
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
-bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages);
-void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages);
-static inline bool mem_cgroup_under_socket_pressure(struct cg_proto *proto)
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
-	return proto->memory_pressure;
+	return memcg->tcp_mem.memory_pressure;
 }
 #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 888aa3f..1a94b85 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -69,22 +69,6 @@
 #include <net/tcp_states.h>
 #include <linux/net_tstamp.h>
 
-struct cgroup;
-struct cgroup_subsys;
-#ifdef CONFIG_NET
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss);
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg);
-#else
-static inline
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
-{
-	return 0;
-}
-static inline
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg)
-{
-}
-#endif
 /*
  * This structure really needs to be cleaned up.
  * Most of it is for TCP, and not used by any of
@@ -310,7 +294,7 @@ struct cg_proto;
   *	@sk_security: used by security modules
   *	@sk_mark: generic packet mark
   *	@sk_classid: this socket's cgroup classid
-  *	@sk_cgrp: this socket's cgroup-specific proto data
+  *	@sk_memcg: this socket's memory cgroup association
   *	@sk_write_pending: a write to stream socket waits to start
   *	@sk_state_change: callback to indicate change in the state of the sock
   *	@sk_data_ready: callback to indicate there is data to be processed
@@ -447,7 +431,7 @@ struct sock {
 #ifdef CONFIG_CGROUP_NET_CLASSID
 	u32			sk_classid;
 #endif
-	struct cg_proto		*sk_cgrp;
+	struct mem_cgroup	*sk_memcg;
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk);
 	void			(*sk_write_space)(struct sock *sk);
@@ -1051,18 +1035,6 @@ struct proto {
 #ifdef SOCK_REFCNT_DEBUG
 	atomic_t		socks;
 #endif
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * cgroup specific init/deinit functions. Called once for all
-	 * protocols that implement it, from cgroups populate function.
-	 * This function has to setup any files the protocol want to
-	 * appear in the kmem cgroup filesystem.
-	 */
-	int			(*init_cgroup)(struct mem_cgroup *memcg,
-					       struct cgroup_subsys *ss);
-	void			(*destroy_cgroup)(struct mem_cgroup *memcg);
-	struct cg_proto		*(*proto_cgroup)(struct mem_cgroup *memcg);
-#endif
 };
 
 int proto_register(struct proto *prot, int alloc_slab);
@@ -1126,8 +1098,8 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 	if (!sk->sk_prot->memory_pressure)
 		return false;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
-	    mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
 	return !!*sk->sk_prot->memory_pressure;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 04517d6..c008535 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -292,8 +292,8 @@ extern int tcp_memory_pressure;
 /* optimized version of sk_under_memory_pressure() for TCP sockets */
 static inline bool tcp_under_memory_pressure(const struct sock *sk)
 {
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
-	    mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
 	return tcp_memory_pressure;
diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 05b94d9..3a17b16 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -1,7 +1,6 @@
 #ifndef _TCP_MEMCG_H
 #define _TCP_MEMCG_H
 
-struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
 int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss);
 void tcp_destroy_cgroup(struct mem_cgroup *memcg);
 #endif /* _TCP_MEMCG_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b586ea..68d67fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -294,9 +294,6 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 void sock_update_memcg(struct sock *sk)
 {
 	struct mem_cgroup *memcg;
-	struct cg_proto *cg_proto;
-
-	BUG_ON(!sk->sk_prot->proto_cgroup);
 
 	/* Socket cloning can throw us here with sk_cgrp already
 	 * filled. It won't however, necessarily happen from
@@ -306,68 +303,58 @@ void sock_update_memcg(struct sock *sk)
 	 * Respecting the original socket's memcg is a better
 	 * decision in this case.
 	 */
-	if (sk->sk_cgrp) {
-		BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
-		css_get(&sk->sk_cgrp->memcg->css);
+	if (sk->sk_memcg) {
+		BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
+		css_get(&sk->sk_memcg->css);
 		return;
 	}
 
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
-	cg_proto = sk->sk_prot->proto_cgroup(memcg);
-	if (cg_proto && cg_proto->active &&
-	    css_tryget_online(&memcg->css)) {
-		sk->sk_cgrp = cg_proto;
-	}
+	if (memcg != root_mem_cgroup &&
+	    memcg->tcp_mem.active &&
+	    css_tryget_online(&memcg->css))
+		sk->sk_memcg = memcg;
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(sock_update_memcg);
 
 void sock_release_memcg(struct sock *sk)
 {
-	WARN_ON(!sk->sk_cgrp->memcg);
-	css_put(&sk->sk_cgrp->memcg->css);
-}
-
-struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
-{
-	if (!memcg || mem_cgroup_is_root(memcg))
-		return NULL;
-
-	return &memcg->tcp_mem;
+	WARN_ON(!sk->sk_memcg);
+	css_put(&sk->sk_memcg->css);
 }
-EXPORT_SYMBOL(tcp_proto_cgroup);
 
 /**
  * mem_cgroup_charge_skmem - charge socket memory
- * @proto: proto to charge
+ * @memcg: memcg to charge
  * @nr_pages: number of pages to charge
  *
- * Charges @nr_pages to @proto. Returns %true if the charge fit within
- * @proto's configured limit, %false if the charge had to be forced.
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if the charge had to be forced.
  */
-bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct page_counter *counter;
 
-	if (page_counter_try_charge(&proto->memory_allocated,
+	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
 				    nr_pages, &counter)) {
-		proto->memory_pressure = 0;
+		memcg->tcp_mem.memory_pressure = 0;
 		return true;
 	}
-	page_counter_charge(&proto->memory_allocated, nr_pages);
-	proto->memory_pressure = 1;
+	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
+	memcg->tcp_mem.memory_pressure = 1;
 	return false;
 }
 
 /**
  * mem_cgroup_uncharge_skmem - uncharge socket memory
- * @proto - proto to uncharge
+ * @memcg - memcg to uncharge
  * @nr_pages - number of pages to uncharge
  */
-void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	page_counter_uncharge(&proto->memory_allocated, nr_pages);
+	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
 }
 
 #endif
@@ -3629,7 +3616,7 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 	if (ret)
 		return ret;
 
-	return mem_cgroup_sockets_init(memcg, ss);
+	return tcp_init_cgroup(memcg, ss);
 }
 
 static void memcg_deactivate_kmem(struct mem_cgroup *memcg)
@@ -3685,7 +3672,7 @@ static void memcg_destroy_kmem(struct mem_cgroup *memcg)
 		static_key_slow_dec(&memcg_kmem_enabled_key);
 		WARN_ON(page_counter_read(&memcg->kmem));
 	}
-	mem_cgroup_sockets_destroy(memcg);
+	tcp_destroy_cgroup(memcg);
 }
 #else
 static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b1b96f..6486b0d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -194,44 +194,6 @@ bool sk_net_capable(const struct sock *sk, int cap)
 }
 EXPORT_SYMBOL(sk_net_capable);
 
-
-#ifdef CONFIG_MEMCG_KMEM
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
-{
-	struct proto *proto;
-	int ret = 0;
-
-	mutex_lock(&proto_list_mutex);
-	list_for_each_entry(proto, &proto_list, node) {
-		if (proto->init_cgroup) {
-			ret = proto->init_cgroup(memcg, ss);
-			if (ret)
-				goto out;
-		}
-	}
-
-	mutex_unlock(&proto_list_mutex);
-	return ret;
-out:
-	list_for_each_entry_continue_reverse(proto, &proto_list, node)
-		if (proto->destroy_cgroup)
-			proto->destroy_cgroup(memcg);
-	mutex_unlock(&proto_list_mutex);
-	return ret;
-}
-
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg)
-{
-	struct proto *proto;
-
-	mutex_lock(&proto_list_mutex);
-	list_for_each_entry_reverse(proto, &proto_list, node)
-		if (proto->destroy_cgroup)
-			proto->destroy_cgroup(memcg);
-	mutex_unlock(&proto_list_mutex);
-}
-#endif
-
 /*
  * Each address family might have different locking rules, so we have
  * one slock key per address family:
@@ -1583,7 +1545,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		sk_set_socket(newsk, NULL);
 		newsk->sk_wq = NULL;
 
-		if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+		if (mem_cgroup_sockets_enabled && sk->sk_memcg)
 			sock_update_memcg(newsk);
 
 		if (newsk->sk_prot->sockets_allocated)
@@ -2071,8 +2033,8 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
 
 	allocated = sk_memory_allocated_add(sk, amt);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
-	    !mem_cgroup_charge_skmem(sk->sk_cgrp, amt))
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+	    !mem_cgroup_charge_skmem(sk->sk_memcg, amt))
 		goto suppress_allocation;
 
 	/* Under limit. */
@@ -2135,8 +2097,8 @@ suppress_allocation:
 
 	sk_memory_allocated_sub(sk, amt);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		mem_cgroup_uncharge_skmem(sk->sk_cgrp, amt);
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+		mem_cgroup_uncharge_skmem(sk->sk_memcg, amt);
 
 	return 0;
 }
@@ -2153,8 +2115,8 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
 	sk_memory_allocated_sub(sk, amount);
 	sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		mem_cgroup_uncharge_skmem(sk->sk_cgrp, amount);
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+		mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);
 
 	if (sk_under_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4027e02..34c2678 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1814,7 +1814,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
 
 	sk_sockets_allocated_dec(sk);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
 		sock_release_memcg(sk);
 }
 EXPORT_SYMBOL(tcp_v4_destroy_sock);
@@ -2339,11 +2339,6 @@ struct proto tcp_prot = {
 	.compat_setsockopt	= compat_tcp_setsockopt,
 	.compat_getsockopt	= compat_tcp_getsockopt,
 #endif
-#ifdef CONFIG_MEMCG_KMEM
-	.init_cgroup		= tcp_init_cgroup,
-	.destroy_cgroup		= tcp_destroy_cgroup,
-	.proto_cgroup		= tcp_proto_cgroup,
-#endif
 };
 EXPORT_SYMBOL(tcp_prot);
 
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index ef4268d..e507825 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -8,60 +8,47 @@
 
 int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 {
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+	struct page_counter *counter_parent = NULL;
 	/*
 	 * The root cgroup does not use page_counters, but rather,
 	 * rely on the data already collected by the network
 	 * subsystem
 	 */
-	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
-	struct page_counter *counter_parent = NULL;
-	struct cg_proto *cg_proto, *parent_cg;
-
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return 0;
 
-	cg_proto->memory_pressure = 0;
-	cg_proto->memcg = memcg;
+	memcg->tcp_mem.memory_pressure = 0;
 
-	parent_cg = tcp_prot.proto_cgroup(parent);
-	if (parent_cg)
-		counter_parent = &parent_cg->memory_allocated;
+	if (parent)
+		counter_parent = &parent->tcp_mem.memory_allocated;
 
-	page_counter_init(&cg_proto->memory_allocated, counter_parent);
+	page_counter_init(&memcg->tcp_mem.memory_allocated, counter_parent);
 
 	return 0;
 }
-EXPORT_SYMBOL(tcp_init_cgroup);
 
 void tcp_destroy_cgroup(struct mem_cgroup *memcg)
 {
-	struct cg_proto *cg_proto;
-
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return;
 
-	if (cg_proto->active)
+	if (memcg->tcp_mem.active)
 		static_key_slow_dec(&memcg_socket_limit_enabled);
-
 }
-EXPORT_SYMBOL(tcp_destroy_cgroup);
 
 static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 {
-	struct cg_proto *cg_proto;
 	int ret;
 
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return -EINVAL;
 
-	ret = page_counter_limit(&cg_proto->memory_allocated, nr_pages);
+	ret = page_counter_limit(&memcg->tcp_mem.memory_allocated, nr_pages);
 	if (ret)
 		return ret;
 
-	if (!cg_proto->active) {
+	if (!memcg->tcp_mem.active) {
 		/*
 		 * The active flag needs to be written after the static_key
 		 * update. This is what guarantees that the socket activation
@@ -79,7 +66,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 		 * patched in yet.
 		 */
 		static_key_slow_inc(&memcg_socket_limit_enabled);
-		cg_proto->active = true;
+		memcg->tcp_mem.active = true;
 	}
 
 	return 0;
@@ -123,32 +110,32 @@ static ssize_t tcp_cgroup_write(struct kernfs_open_file *of,
 static u64 tcp_cgroup_read(struct cgroup_subsys_state *css, struct cftype *cft)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
-	struct cg_proto *cg_proto = tcp_prot.proto_cgroup(memcg);
 	u64 val;
 
 	switch (cft->private) {
 	case RES_LIMIT:
-		if (!cg_proto)
-			return PAGE_COUNTER_MAX;
-		val = cg_proto->memory_allocated.limit;
+		if (memcg == root_mem_cgroup)
+			val = PAGE_COUNTER_MAX;
+		else
+			val = memcg->tcp_mem.memory_allocated.limit;
 		val *= PAGE_SIZE;
 		break;
 	case RES_USAGE:
-		if (!cg_proto)
+		if (memcg == root_mem_cgroup)
 			val = atomic_long_read(&tcp_memory_allocated);
 		else
-			val = page_counter_read(&cg_proto->memory_allocated);
+			val = page_counter_read(&memcg->tcp_mem.memory_allocated);
 		val *= PAGE_SIZE;
 		break;
 	case RES_FAILCNT:
-		if (!cg_proto)
+		if (memcg == root_mem_cgroup)
 			return 0;
-		val = cg_proto->memory_allocated.failcnt;
+		val = memcg->tcp_mem.memory_allocated.failcnt;
 		break;
 	case RES_MAX_USAGE:
-		if (!cg_proto)
+		if (memcg == root_mem_cgroup)
 			return 0;
-		val = cg_proto->memory_allocated.watermark;
+		val = memcg->tcp_mem.memory_allocated.watermark;
 		val *= PAGE_SIZE;
 		break;
 	default:
@@ -161,19 +148,17 @@ static ssize_t tcp_cgroup_reset(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off)
 {
 	struct mem_cgroup *memcg;
-	struct cg_proto *cg_proto;
 
 	memcg = mem_cgroup_from_css(of_css(of));
-	cg_proto = tcp_prot.proto_cgroup(memcg);
-	if (!cg_proto)
+	if (memcg == root_mem_cgroup)
 		return nbytes;
 
 	switch (of_cft(of)->private) {
 	case RES_MAX_USAGE:
-		page_counter_reset_watermark(&cg_proto->memory_allocated);
+		page_counter_reset_watermark(&memcg->tcp_mem.memory_allocated);
 		break;
 	case RES_FAILCNT:
-		cg_proto->memory_allocated.failcnt = 0;
+		memcg->tcp_mem.memory_allocated.failcnt = 0;
 		break;
 	}
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7aa168a..7b83a65 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2821,8 +2821,8 @@ void sk_forced_mem_schedule(struct sock *sk, int size)
 	sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
 	sk_memory_allocated_add(sk, amt);
 
-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		mem_cgroup_charge_skmem(sk->sk_cgrp, amt);
+	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+		mem_cgroup_charge_skmem(sk->sk_memcg, amt);
 }
 
 /* Send a FIN. The caller locks the socket for us.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c5429a6..1bfb682 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1880,9 +1880,6 @@ struct proto tcpv6_prot = {
 	.compat_setsockopt	= compat_tcp_setsockopt,
 	.compat_getsockopt	= compat_tcp_getsockopt,
 #endif
-#ifdef CONFIG_MEMCG_KMEM
-	.proto_cgroup		= tcp_proto_cgroup,
-#endif
 	.clear_sk		= tcp_v6_clear_sk,
 };
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:52   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

The unified hierarchy memory controller is going to use this jump
label as well to control the networking callbacks. Move it to the
memory controller code and give it a more generic name.
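
For illustration of the call-site pattern this enables (a sketch, not
part of the patch; the function wrapping the test is hypothetical):

	/*
	 * mem_cgroup_sockets_enabled expands to
	 * static_key_false(&memcg_sockets_enabled_key): a no-op branch
	 * that is only patched into a jump after static_key_slow_inc()
	 * has been called, so unconfigured systems pay nothing here.
	 */
	static void sk_charge_example(struct sock *sk, unsigned int nr_pages)
	{
		if (mem_cgroup_sockets_enabled && sk->sk_memcg)
			mem_cgroup_charge_skmem(sk->sk_memcg, nr_pages);
	}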

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 include/linux/memcontrol.h | 4 ++++
 include/net/sock.h         | 7 -------
 mm/memcontrol.c            | 3 +++
 net/core/sock.c            | 5 -----
 net/ipv4/tcp_memcontrol.c  | 4 ++--
 5 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d99fefe..dad56ef 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -681,6 +681,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 
 #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
 struct sock;
+extern struct static_key memcg_sockets_enabled_key;
+#define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
@@ -689,6 +691,8 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
 	return memcg->tcp_mem.memory_pressure;
 }
+#else
+#define mem_cgroup_sockets_enabled 0
 #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
 
 #ifdef CONFIG_MEMCG_KMEM
diff --git a/include/net/sock.h b/include/net/sock.h
index 1a94b85..fcc9442 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1065,13 +1065,6 @@ static inline void sk_refcnt_debug_release(const struct sock *sk)
 #define sk_refcnt_debug_release(sk) do { } while (0)
 #endif /* SOCK_REFCNT_DEBUG */
 
-#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_NET)
-extern struct static_key memcg_socket_limit_enabled;
-#define mem_cgroup_sockets_enabled static_key_false(&memcg_socket_limit_enabled)
-#else
-#define mem_cgroup_sockets_enabled 0
-#endif
-
 static inline bool sk_stream_memory_free(const struct sock *sk)
 {
 	if (sk->sk_wmem_queued >= sk->sk_sndbuf)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 68d67fc..0602bee 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -291,6 +291,9 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 /* Writing them here to avoid exposing memcg's inner layout */
 #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
 
+struct static_key memcg_sockets_enabled_key;
+EXPORT_SYMBOL(memcg_sockets_enabled_key);
+
 void sock_update_memcg(struct sock *sk)
 {
 	struct mem_cgroup *memcg;
diff --git a/net/core/sock.c b/net/core/sock.c
index 6486b0d..c5435b5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -201,11 +201,6 @@ EXPORT_SYMBOL(sk_net_capable);
 static struct lock_class_key af_family_keys[AF_MAX];
 static struct lock_class_key af_family_slock_keys[AF_MAX];
 
-#if defined(CONFIG_MEMCG_KMEM)
-struct static_key memcg_socket_limit_enabled;
-EXPORT_SYMBOL(memcg_socket_limit_enabled);
-#endif
-
 /*
  * Make lock validator output more readable. (we pre-construct these
  * strings build-time, so that runtime initialization of socket
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index e507825..9a22e2d 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -34,7 +34,7 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
 		return;
 
 	if (memcg->tcp_mem.active)
-		static_key_slow_dec(&memcg_socket_limit_enabled);
+		static_key_slow_dec(&memcg_sockets_enabled_key);
 }
 
 static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
@@ -65,7 +65,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 		 * because when this value changes, the code to process it is not
 		 * patched in yet.
 		 */
-		static_key_slow_inc(&memcg_socket_limit_enabled);
+		static_key_slow_inc(&memcg_sockets_enabled_key);
 		memcg->tcp_mem.active = true;
 	}
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 10/13] mm: memcontrol: do not account memory+swap on unified hierarchy
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:52   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

The unified hierarchy memory controller doesn't expose the memory+swap
counter to userspace, but its accounting is hardcoded in all charge
paths right now, including the per-cpu charge cache ("the stock").

To avoid adding yet more pointless memory+swap accounting with the
socket memory support in unified hierarchy, disable the counter
altogether when in unified hierarchy mode.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 mm/memcontrol.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0602bee..6b8c0f7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -87,6 +87,12 @@ int do_swap_account __read_mostly;
 #define do_swap_account		0
 #endif
 
+/* Whether legacy memory+swap accounting is active */
+static bool do_memsw_account(void)
+{
+	return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && do_swap_account;
+}
+
 static const char * const mem_cgroup_stat_names[] = {
 	"cache",
 	"rss",
@@ -1177,7 +1183,7 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
 	if (count < limit)
 		margin = limit - count;
 
-	if (do_swap_account) {
+	if (do_memsw_account()) {
 		count = page_counter_read(&memcg->memsw);
 		limit = READ_ONCE(memcg->memsw.limit);
 		if (count <= limit)
@@ -1280,7 +1286,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 		pr_cont(":");
 
 		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
-			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+			if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
 				continue;
 			pr_cont(" %s:%luKB", mem_cgroup_stat_names[i],
 				K(mem_cgroup_read_stat(iter, i)));
@@ -1903,7 +1909,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 
 	if (stock->nr_pages) {
 		page_counter_uncharge(&old->memory, stock->nr_pages);
-		if (do_swap_account)
+		if (do_memsw_account())
 			page_counter_uncharge(&old->memsw, stock->nr_pages);
 		css_put_many(&old->css, stock->nr_pages);
 		stock->nr_pages = 0;
@@ -2033,11 +2039,11 @@ retry:
 	if (consume_stock(memcg, nr_pages))
 		return 0;
 
-	if (!do_swap_account ||
+	if (!do_memsw_account() ||
 	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
 		if (page_counter_try_charge(&memcg->memory, batch, &counter))
 			goto done_restock;
-		if (do_swap_account)
+		if (do_memsw_account())
 			page_counter_uncharge(&memcg->memsw, batch);
 		mem_over_limit = mem_cgroup_from_counter(counter, memory);
 	} else {
@@ -2124,7 +2130,7 @@ force:
 	 * temporarily by force charging it.
 	 */
 	page_counter_charge(&memcg->memory, nr_pages);
-	if (do_swap_account)
+	if (do_memsw_account())
 		page_counter_charge(&memcg->memsw, nr_pages);
 	css_get_many(&memcg->css, nr_pages);
 
@@ -2161,7 +2167,7 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
 		return;
 
 	page_counter_uncharge(&memcg->memory, nr_pages);
-	if (do_swap_account)
+	if (do_memsw_account())
 		page_counter_uncharge(&memcg->memsw, nr_pages);
 
 	css_put_many(&memcg->css, nr_pages);
@@ -2447,7 +2453,7 @@ void __memcg_kmem_uncharge(struct page *page, int order)
 
 	page_counter_uncharge(&memcg->kmem, nr_pages);
 	page_counter_uncharge(&memcg->memory, nr_pages);
-	if (do_swap_account)
+	if (do_memsw_account())
 		page_counter_uncharge(&memcg->memsw, nr_pages);
 
 	page->mem_cgroup = NULL;
@@ -3160,7 +3166,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
 	BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
 
 	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
-		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+		if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
 			continue;
 		seq_printf(m, "%s %lu\n", mem_cgroup_stat_names[i],
 			   mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
@@ -3182,14 +3188,14 @@ static int memcg_stat_show(struct seq_file *m, void *v)
 	}
 	seq_printf(m, "hierarchical_memory_limit %llu\n",
 		   (u64)memory * PAGE_SIZE);
-	if (do_swap_account)
+	if (do_memsw_account())
 		seq_printf(m, "hierarchical_memsw_limit %llu\n",
 			   (u64)memsw * PAGE_SIZE);
 
 	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
 		unsigned long long val = 0;
 
-		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+		if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
 			continue;
 		for_each_mem_cgroup_tree(mi, memcg)
 			val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
@@ -3320,7 +3326,7 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg)
 {
 	while (memcg) {
 		__mem_cgroup_threshold(memcg, false);
-		if (do_swap_account)
+		if (do_memsw_account())
 			__mem_cgroup_threshold(memcg, true);
 
 		memcg = parent_mem_cgroup(memcg);
@@ -4466,7 +4472,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 	 * we call find_get_page() with swapper_space directly.
 	 */
 	page = find_get_page(swap_address_space(ent), ent.val);
-	if (do_swap_account)
+	if (do_memsw_account())
 		entry->val = ent.val;
 
 	return page;
@@ -4501,7 +4507,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
 		page = find_get_entry(mapping, pgoff);
 		if (radix_tree_exceptional_entry(page)) {
 			swp_entry_t swp = radix_to_swp_entry(page);
-			if (do_swap_account)
+			if (do_memsw_account())
 				*entry = swp;
 			page = find_get_page(swap_address_space(swp), swp.val);
 		}
@@ -5276,7 +5282,7 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 		if (page->mem_cgroup)
 			goto out;
 
-		if (do_swap_account) {
+		if (do_memsw_account()) {
 			swp_entry_t ent = { .val = page_private(page), };
 			unsigned short id = lookup_swap_cgroup_id(ent);
 
@@ -5340,7 +5346,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
 	memcg_check_events(memcg, page);
 	local_irq_enable();
 
-	if (do_swap_account && PageSwapCache(page)) {
+	if (do_memsw_account() && PageSwapCache(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		/*
 		 * The swap entry might not get freed for a long time,
@@ -5385,7 +5391,7 @@ static void uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout,
 
 	if (!mem_cgroup_is_root(memcg)) {
 		page_counter_uncharge(&memcg->memory, nr_pages);
-		if (do_swap_account)
+		if (do_memsw_account())
 			page_counter_uncharge(&memcg->memsw, nr_pages);
 		memcg_oom_recover(memcg);
 	}
@@ -5593,7 +5599,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 	VM_BUG_ON_PAGE(page_count(page), page);
 
-	if (!do_swap_account)
+	if (!do_memsw_account())
 		return;
 
 	memcg = page->mem_cgroup;
@@ -5633,7 +5639,7 @@ void mem_cgroup_uncharge_swap(swp_entry_t entry)
 	struct mem_cgroup *memcg;
 	unsigned short id;
 
-	if (!do_swap_account)
+	if (!do_memsw_account())
 		return;
 
 	id = swap_cgroup_record(entry, 0);
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 11/13] mm: memcontrol: move socket code for unified hierarchy accounting
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:52   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

The unified hierarchy memory controller will account socket
memory. Move the infrastructure functions accordingly.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 mm/memcontrol.c | 148 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 74 insertions(+), 74 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6b8c0f7..ed030b5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -294,80 +294,6 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
 	return mem_cgroup_from_css(css);
 }
 
-/* Writing them here to avoid exposing memcg's inner layout */
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
-
-struct static_key memcg_sockets_enabled_key;
-EXPORT_SYMBOL(memcg_sockets_enabled_key);
-
-void sock_update_memcg(struct sock *sk)
-{
-	struct mem_cgroup *memcg;
-
-	/* Socket cloning can throw us here with sk_cgrp already
-	 * filled. It won't however, necessarily happen from
-	 * process context. So the test for root memcg given
-	 * the current task's memcg won't help us in this case.
-	 *
-	 * Respecting the original socket's memcg is a better
-	 * decision in this case.
-	 */
-	if (sk->sk_memcg) {
-		BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
-		css_get(&sk->sk_memcg->css);
-		return;
-	}
-
-	rcu_read_lock();
-	memcg = mem_cgroup_from_task(current);
-	if (memcg != root_mem_cgroup &&
-	    memcg->tcp_mem.active &&
-	    css_tryget_online(&memcg->css))
-		sk->sk_memcg = memcg;
-	rcu_read_unlock();
-}
-EXPORT_SYMBOL(sock_update_memcg);
-
-void sock_release_memcg(struct sock *sk)
-{
-	WARN_ON(!sk->sk_memcg);
-	css_put(&sk->sk_memcg->css);
-}
-
-/**
- * mem_cgroup_charge_skmem - charge socket memory
- * @memcg: memcg to charge
- * @nr_pages: number of pages to charge
- *
- * Charges @nr_pages to @memcg. Returns %true if the charge fit within
- * @memcg's configured limit, %false if the charge had to be forced.
- */
-bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	struct page_counter *counter;
-
-	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
-				    nr_pages, &counter)) {
-		memcg->tcp_mem.memory_pressure = 0;
-		return true;
-	}
-	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
-	memcg->tcp_mem.memory_pressure = 1;
-	return false;
-}
-
-/**
- * mem_cgroup_uncharge_skmem - uncharge socket memory
- * @memcg - memcg to uncharge
- * @nr_pages - number of pages to uncharge
- */
-void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
-}
-
-#endif
-
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * This will be the memcg's index in each cache's ->memcg_params.memcg_caches.
@@ -5544,6 +5470,80 @@ void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
 	commit_charge(newpage, memcg, true);
 }
 
+/* Writing them here to avoid exposing memcg's inner layout */
+#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+
+struct static_key memcg_sockets_enabled_key;
+EXPORT_SYMBOL(memcg_sockets_enabled_key);
+
+void sock_update_memcg(struct sock *sk)
+{
+	struct mem_cgroup *memcg;
+
+	/* Socket cloning can throw us here with sk_cgrp already
+	 * filled. It won't however, necessarily happen from
+	 * process context. So the test for root memcg given
+	 * the current task's memcg won't help us in this case.
+	 *
+	 * Respecting the original socket's memcg is a better
+	 * decision in this case.
+	 */
+	if (sk->sk_memcg) {
+		BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
+		css_get(&sk->sk_memcg->css);
+		return;
+	}
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (memcg != root_mem_cgroup &&
+	    memcg->tcp_mem.active &&
+	    css_tryget_online(&memcg->css))
+		sk->sk_memcg = memcg;
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(sock_update_memcg);
+
+void sock_release_memcg(struct sock *sk)
+{
+	WARN_ON(!sk->sk_memcg);
+	css_put(&sk->sk_memcg->css);
+}
+
+/**
+ * mem_cgroup_charge_skmem - charge socket memory
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ *
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if the charge had to be forced.
+ */
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+	struct page_counter *counter;
+
+	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
+				    nr_pages, &counter)) {
+		memcg->tcp_mem.memory_pressure = 0;
+		return true;
+	}
+	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
+	memcg->tcp_mem.memory_pressure = 1;
+	return false;
+}
+
+/**
+ * mem_cgroup_uncharge_skmem - uncharge socket memory
+ * @memcg - memcg to uncharge
+ * @nr_pages - number of pages to uncharge
+ */
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
+}
+
+#endif
+
 /*
  * subsys_initcall() for memory controller.
  *
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:58   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

Socket memory can be a significant share of overall memory consumed by
common workloads. In order to provide reasonable resource isolation in
the unified hierarchy, this type of memory needs to be included in the
tracking/accounting of a cgroup under active memory resource control.

Overhead is only incurred when a non-root control group is created AND
the memory controller is instructed to track and account the memory
footprint of that group. cgroup.memory=nosocket can be specified on
the boot command line to override any runtime configuration and
forcibly exclude socket memory from active memory resource control.
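
As a usage illustration (the bootloader file and the rest of the
command line are hypothetical; only the cgroup.memory=nosocket token
comes from this patch):

	# /etc/default/grub: append the option to the kernel command
	# line, then regenerate the grub config and reboot.
	GRUB_CMDLINE_LINUX="root=/dev/sda1 ro cgroup.memory=nosocket"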

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/kernel-parameters.txt |   4 ++
 include/linux/memcontrol.h          |  11 +++-
 mm/memcontrol.c                     | 122 +++++++++++++++++++++++++++++-------
 3 files changed, 111 insertions(+), 26 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d..7868f1b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -599,6 +599,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			cut the overhead, others just disable the usage. So
 			only cgroup_disable=memory is actually worthy}
 
+	cgroup.memory=	[KNL] Pass options to the cgroup memory controller.
+			Format: <string>
+			nosocket -- Disable socket memory accounting.
+
 	checkreqprot	[SELINUX] Set initial checkreqprot flag value.
 			Format: { "0" | "1" }
 			See security/selinux/Kconfig help text.
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index dad56ef..fae0aaf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -170,6 +170,9 @@ struct mem_cgroup {
 	unsigned long low;
 	unsigned long high;
 
+	/* Range enforcement for interrupt charges */
+	struct work_struct high_work;
+
 	unsigned long soft_limit;
 
 	/* vmpressure notifications */
@@ -679,7 +682,7 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+#ifdef CONFIG_INET
 struct sock;
 extern struct static_key memcg_sockets_enabled_key;
 #define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
@@ -689,11 +692,15 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
+#ifdef CONFIG_MEMCG_KMEM
 	return memcg->tcp_mem.memory_pressure;
+#else
+	return false;
+#endif
 }
 #else
 #define mem_cgroup_sockets_enabled 0
-#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
+#endif /* CONFIG_INET */
 
 #ifdef CONFIG_MEMCG_KMEM
 extern struct static_key memcg_kmem_enabled_key;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ed030b5..59555b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -80,6 +80,9 @@ struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #define MEM_CGROUP_RECLAIM_RETRIES	5
 
+/* Socket memory accounting disabled? */
+static bool cgroup_memory_nosocket;
+
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
 int do_swap_account __read_mostly;
@@ -1923,6 +1926,26 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+static void reclaim_high(struct mem_cgroup *memcg,
+			 unsigned int nr_pages,
+			 gfp_t gfp_mask)
+{
+	do {
+		if (page_counter_read(&memcg->memory) <= memcg->high)
+			continue;
+		mem_cgroup_events(memcg, MEMCG_HIGH, 1);
+		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
+	} while ((memcg = parent_mem_cgroup(memcg)));
+}
+
+static void high_work_func(struct work_struct *work)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = container_of(work, struct mem_cgroup, high_work);
+	reclaim_high(memcg, CHARGE_BATCH, GFP_KERNEL);
+}
+
 /*
  * Scheduled by try_charge() to be executed from the userland return path
  * and reclaims memory over the high limit.
@@ -1930,20 +1953,13 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb,
 void mem_cgroup_handle_over_high(void)
 {
 	unsigned int nr_pages = current->memcg_nr_pages_over_high;
-	struct mem_cgroup *memcg, *pos;
+	struct mem_cgroup *memcg;
 
 	if (likely(!nr_pages))
 		return;
 
-	pos = memcg = get_mem_cgroup_from_mm(current->mm);
-
-	do {
-		if (page_counter_read(&pos->memory) <= pos->high)
-			continue;
-		mem_cgroup_events(pos, MEMCG_HIGH, 1);
-		try_to_free_mem_cgroup_pages(pos, nr_pages, GFP_KERNEL, true);
-	} while ((pos = parent_mem_cgroup(pos)));
-
+	memcg = get_mem_cgroup_from_mm(current->mm);
+	reclaim_high(memcg, nr_pages, GFP_KERNEL);
 	css_put(&memcg->css);
 	current->memcg_nr_pages_over_high = 0;
 }
@@ -2078,6 +2094,11 @@ done_restock:
 	 */
 	do {
 		if (page_counter_read(&memcg->memory) > memcg->high) {
+			/* Don't bother a random interrupted task */
+			if (in_interrupt()) {
+				schedule_work(&memcg->high_work);
+				break;
+			}
 			current->memcg_nr_pages_over_high += batch;
 			set_notify_resume(current);
 			break;
@@ -4126,6 +4147,8 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 {
 	int node;
 
+	cancel_work_sync(&memcg->high_work);
+
 	mem_cgroup_remove_from_trees(memcg);
 
 	for_each_node(node)
@@ -4172,6 +4195,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		page_counter_init(&memcg->kmem, NULL);
 	}
 
+	INIT_WORK(&memcg->high_work, high_work_func);
 	memcg->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&memcg->oom_notify);
 	memcg->move_charge_at_immigrate = 0;
@@ -4243,6 +4267,11 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	if (ret)
 		return ret;
 
+#ifdef CONFIG_INET
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
+		static_key_slow_inc(&memcg_sockets_enabled_key);
+#endif
+
 	/*
 	 * Make sure the memcg is initialized: mem_cgroup_iter()
 	 * orders reading memcg->initialized against its callers
@@ -4282,6 +4311,10 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
 	memcg_destroy_kmem(memcg);
+#ifdef CONFIG_INET
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
+		static_key_slow_dec(&memcg_sockets_enabled_key);
+#endif
 	__mem_cgroup_free(memcg);
 }
 
@@ -5470,8 +5503,7 @@ void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
 	commit_charge(newpage, memcg, true);
 }
 
-/* Writing them here to avoid exposing memcg's inner layout */
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+#ifdef CONFIG_INET
 
 struct static_key memcg_sockets_enabled_key;
 EXPORT_SYMBOL(memcg_sockets_enabled_key);
@@ -5496,10 +5528,15 @@ void sock_update_memcg(struct sock *sk)
 
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
-	if (memcg != root_mem_cgroup &&
-	    memcg->tcp_mem.active &&
-	    css_tryget_online(&memcg->css))
+	if (memcg == root_mem_cgroup)
+		goto out;
+#ifdef CONFIG_MEMCG_KMEM
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcp_mem.active)
+		goto out;
+#endif
+	if (css_tryget_online(&memcg->css))
 		sk->sk_memcg = memcg;
+out:
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(sock_update_memcg);
@@ -5520,15 +5557,30 @@ void sock_release_memcg(struct sock *sk)
  */
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	struct page_counter *counter;
+	gfp_t gfp_mask = GFP_KERNEL;
 
-	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
-				    nr_pages, &counter)) {
-		memcg->tcp_mem.memory_pressure = 0;
-		return true;
+#ifdef CONFIG_MEMCG_KMEM
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+		struct page_counter *counter;
+
+		if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
+					    nr_pages, &counter)) {
+			memcg->tcp_mem.memory_pressure = 0;
+			return true;
+		}
+		page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
+		memcg->tcp_mem.memory_pressure = 1;
+		return false;
 	}
-	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
-	memcg->tcp_mem.memory_pressure = 1;
+#endif
+	/* Don't block in the packet receive path */
+	if (in_softirq())
+		gfp_mask = GFP_NOWAIT;
+
+	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
+		return true;
+
+	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
 	return false;
 }
 
@@ -5539,10 +5591,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
  */
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
+#ifdef CONFIG_MEMCG_KMEM
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+		page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
+				      nr_pages);
+		return;
+	}
+#endif
+	page_counter_uncharge(&memcg->memory, nr_pages);
+	css_put_many(&memcg->css, nr_pages);
 }
 
-#endif
+#endif /* CONFIG_INET */
+
+static int __init cgroup_memory(char *s)
+{
+	char *token;
+
+	while ((token = strsep(&s, ",")) != NULL) {
+		if (!*token)
+			continue;
+		if (!strcmp(token, "nosocket"))
+			cgroup_memory_nosocket = true;
+	}
+	return 0;
+}
+__setup("cgroup.memory=", cgroup_memory);
 
 /*
  * subsys_initcall() for memory controller.
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 13/13] mm: memcontrol: hook up vmpressure to socket pressure
  2015-11-24 21:51 ` Johannes Weiner
@ 2015-11-24 21:59   ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-24 21:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

Let the networking stack know when a memcg is under reclaim pressure
so that it can clamp its transmit windows accordingly.

Whenever the reclaim efficiency of a cgroup's LRU lists drops low
enough for a MEDIUM or HIGH vmpressure event to occur, assert a
pressure state in the socket and tcp memory code that tells it to curb
consumption growth from sockets associated with said control group.

Traditionally, vmpressure reports for the entire subtree of a memcg
under pressure, which loses information about the individual groups
being reclaimed. However, it's too late to change the user interface,
so add a second reporting mode that reports at the level of reclaim
rather than at the level of the pressure root, and use that mode for
sockets.

vmpressure events are naturally edge-triggered, so for hysteresis
assert socket pressure for one second, allowing subsequent vmpressure
events to occur before the socket code returns to normal.

This will likely need fine-tuning for a wider variety of workloads,
but for now stick to the vmpressure presets and keep the hysteresis
simple.
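
The edge-to-level conversion is just a timestamp check; a minimal
sketch of the mechanism (both sides appear in the diff below):

	/* producer, on a MEDIUM or HIGH vmpressure event: */
	memcg->socket_pressure = jiffies + HZ;

	/* consumer, in the socket pressure check: */
	if (time_before(jiffies, memcg->socket_pressure))
		/* treat the memcg as under socket pressure */;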

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 32 ++++++++++++++++---
 include/linux/vmpressure.h |  5 ++-
 mm/memcontrol.c            | 17 ++--------
 mm/vmpressure.c            | 78 +++++++++++++++++++++++++++++++++++-----------
 mm/vmscan.c                | 10 +++++-
 5 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index fae0aaf..a8df46c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -249,6 +249,10 @@ struct mem_cgroup {
 	struct wb_domain cgwb_domain;
 #endif
 
+#ifdef CONFIG_INET
+	unsigned long		socket_pressure;
+#endif
+
 	/* List of events which userspace want to receive */
 	struct list_head event_list;
 	spinlock_t event_list_lock;
@@ -292,18 +296,34 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
 
 bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
-struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg);
 
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
 }
 
+#define mem_cgroup_from_counter(counter, member)	\
+	container_of(counter, struct mem_cgroup, member)
+
 struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
 				   struct mem_cgroup *,
 				   struct mem_cgroup_reclaim_cookie *);
 void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
 
+/**
+ * parent_mem_cgroup - find the accounting parent of a memcg
+ * @memcg: memcg whose parent to find
+ *
+ * Returns the parent memcg, or NULL if this is the root or the memory
+ * controller is in legacy no-hierarchy mode.
+ */
+static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
+{
+	if (!memcg->memory.parent)
+		return NULL;
+	return mem_cgroup_from_counter(memcg->memory.parent, memory);
+}
+
 static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
 			      struct mem_cgroup *root)
 {
@@ -693,10 +713,14 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
 #ifdef CONFIG_MEMCG_KMEM
-	return memcg->tcp_mem.memory_pressure;
-#else
-	return false;
+	if (memcg->tcp_mem.memory_pressure)
+		return true;
 #endif
+	do {
+		if (time_before(jiffies, memcg->socket_pressure))
+			return true;
+	} while ((memcg = parent_mem_cgroup(memcg)));
+	return false;
 }
 #else
 #define mem_cgroup_sockets_enabled 0
diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 3e45358..a77b142 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -12,6 +12,9 @@
 struct vmpressure {
 	unsigned long scanned;
 	unsigned long reclaimed;
+
+	unsigned long tree_scanned;
+	unsigned long tree_reclaimed;
 	/* The lock is used to keep the scanned/reclaimed above in sync. */
 	struct spinlock sr_lock;
 
@@ -26,7 +29,7 @@ struct vmpressure {
 struct mem_cgroup;
 
 #ifdef CONFIG_MEMCG
-extern void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
+extern void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
 		       unsigned long scanned, unsigned long reclaimed);
 extern void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 59555b0..a0da91f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1091,9 +1091,6 @@ bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)
 	return ret;
 }
 
-#define mem_cgroup_from_counter(counter, member)	\
-	container_of(counter, struct mem_cgroup, member)
-
 /**
  * mem_cgroup_margin - calculate chargeable space of a memory cgroup
  * @memcg: the memory cgroup
@@ -4159,17 +4156,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 	kfree(memcg);
 }
 
-/*
- * Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
- */
-struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
-{
-	if (!memcg->memory.parent)
-		return NULL;
-	return mem_cgroup_from_counter(memcg->memory.parent, memory);
-}
-EXPORT_SYMBOL(parent_mem_cgroup);
-
 static struct cgroup_subsys_state * __ref
 mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 {
@@ -4210,6 +4196,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&memcg->cgwb_list);
 #endif
+#ifdef CONFIG_INET
+	memcg->socket_pressure = jiffies;
+#endif
 	return &memcg->css;
 
 free_out:
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 4c25e62..af262bb 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -137,14 +137,11 @@ struct vmpressure_event {
 };
 
 static bool vmpressure_event(struct vmpressure *vmpr,
-			     unsigned long scanned, unsigned long reclaimed)
+			     enum vmpressure_levels level)
 {
 	struct vmpressure_event *ev;
-	enum vmpressure_levels level;
 	bool signalled = false;
 
-	level = vmpressure_calc_level(scanned, reclaimed);
-
 	mutex_lock(&vmpr->events_lock);
 
 	list_for_each_entry(ev, &vmpr->events, node) {
@@ -164,6 +161,7 @@ static void vmpressure_work_fn(struct work_struct *work)
 	struct vmpressure *vmpr = work_to_vmpressure(work);
 	unsigned long scanned;
 	unsigned long reclaimed;
+	enum vmpressure_levels level;
 
 	spin_lock(&vmpr->sr_lock);
 	/*
@@ -174,19 +172,21 @@ static void vmpressure_work_fn(struct work_struct *work)
 	 * here. No need for any locks here since we don't care if
 	 * vmpr->reclaimed is in sync.
 	 */
-	scanned = vmpr->scanned;
+	scanned = vmpr->tree_scanned;
 	if (!scanned) {
 		spin_unlock(&vmpr->sr_lock);
 		return;
 	}
 
-	reclaimed = vmpr->reclaimed;
-	vmpr->scanned = 0;
-	vmpr->reclaimed = 0;
+	reclaimed = vmpr->tree_reclaimed;
+	vmpr->tree_scanned = 0;
+	vmpr->tree_reclaimed = 0;
 	spin_unlock(&vmpr->sr_lock);
 
+	level = vmpressure_calc_level(scanned, reclaimed);
+
 	do {
-		if (vmpressure_event(vmpr, scanned, reclaimed))
+		if (vmpressure_event(vmpr, level))
 			break;
 		/*
 		 * If not handled, propagate the event upward into the
@@ -199,6 +199,7 @@ static void vmpressure_work_fn(struct work_struct *work)
  * vmpressure() - Account memory pressure through scanned/reclaimed ratio
  * @gfp:	reclaimer's gfp mask
  * @memcg:	cgroup memory controller handle
+ * @tree:	legacy subtree mode
  * @scanned:	number of pages scanned
  * @reclaimed:	number of pages reclaimed
  *
@@ -206,9 +207,16 @@ static void vmpressure_work_fn(struct work_struct *work)
  * "instantaneous" memory pressure (scanned/reclaimed ratio). The raw
  * pressure index is then further refined and averaged over time.
  *
+ * If @tree is set, vmpressure is in traditional userspace reporting
+ * mode: @memcg is considered the pressure root and userspace is
+ * notified of the entire subtree's reclaim efficiency.
+ *
+ * If @tree is not set, reclaim efficiency is recorded for @memcg, and
+ * only in-kernel users are notified.
+ *
  * This function does not return any value.
  */
-void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
+void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
 		unsigned long scanned, unsigned long reclaimed)
 {
 	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
@@ -238,15 +246,47 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	if (!scanned)
 		return;
 
-	spin_lock(&vmpr->sr_lock);
-	vmpr->scanned += scanned;
-	vmpr->reclaimed += reclaimed;
-	scanned = vmpr->scanned;
-	spin_unlock(&vmpr->sr_lock);
+	if (tree) {
+		spin_lock(&vmpr->sr_lock);
+		vmpr->tree_scanned += scanned;
+		vmpr->tree_reclaimed += reclaimed;
+		scanned = vmpr->tree_scanned;
+		spin_unlock(&vmpr->sr_lock);
 
-	if (scanned < vmpressure_win)
-		return;
-	schedule_work(&vmpr->work);
+		if (scanned < vmpressure_win)
+			return;
+		schedule_work(&vmpr->work);
+	} else {
+		enum vmpressure_levels level;
+
+		/* For now, no users for root-level efficiency */
+		if (memcg == root_mem_cgroup)
+			return;
+
+		spin_lock(&vmpr->sr_lock);
+		scanned = vmpr->scanned += scanned;
+		reclaimed = vmpr->reclaimed += reclaimed;
+		if (scanned < vmpressure_win) {
+			spin_unlock(&vmpr->sr_lock);
+			return;
+		}
+		vmpr->scanned = vmpr->reclaimed = 0;
+		spin_unlock(&vmpr->sr_lock);
+
+		level = vmpressure_calc_level(scanned, reclaimed);
+
+		if (level > VMPRESSURE_LOW) {
+			/*
+			 * Let the socket buffer allocator know that
+			 * we are having trouble reclaiming LRU pages.
+			 *
+			 * For hysteresis keep the pressure state
+			 * asserted for a second in which subsequent
+			 * pressure events can occur.
+			 */
+			memcg->socket_pressure = jiffies + HZ;
+		}
+	}
 }
 
 /**
@@ -276,7 +316,7 @@ void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio)
 	 * to the vmpressure() basically means that we signal 'critical'
 	 * level.
 	 */
-	vmpressure(gfp, memcg, vmpressure_win, 0);
+	vmpressure(gfp, memcg, true, vmpressure_win, 0);
 }
 
 /**
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 97ba9e1..50e54c0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2396,6 +2396,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
 		memcg = mem_cgroup_iter(root, NULL, &reclaim);
 		do {
 			unsigned long lru_pages;
+			unsigned long reclaimed;
 			unsigned long scanned;
 			struct lruvec *lruvec;
 			int swappiness;
@@ -2408,6 +2409,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
 
 			lruvec = mem_cgroup_zone_lruvec(zone, memcg);
 			swappiness = mem_cgroup_swappiness(memcg);
+			reclaimed = sc->nr_reclaimed;
 			scanned = sc->nr_scanned;
 
 			shrink_lruvec(lruvec, swappiness, sc, &lru_pages);
@@ -2418,6 +2420,11 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
 					    memcg, sc->nr_scanned - scanned,
 					    lru_pages);
 
+			/* Record the group's reclaim efficiency */
+			vmpressure(sc->gfp_mask, memcg, false,
+				   sc->nr_scanned - scanned,
+				   sc->nr_reclaimed - reclaimed);
+
 			/*
 			 * Direct reclaim and kswapd have to scan all memory
 			 * cgroups to fulfill the overall scan target for the
@@ -2449,7 +2456,8 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
 			reclaim_state->reclaimed_slab = 0;
 		}
 
-		vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
+		/* Record the subtree's reclaim efficiency */
+		vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
 			   sc->nr_scanned - nr_scanned,
 			   sc->nr_reclaimed - nr_reclaimed);
 
-- 
2.6.2



* Re: [PATCH 06/13] net: tcp_memcontrol: simplify the per-memcg limit access
@ 2015-11-25 16:26     ` David Miller
  0 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:26 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:51:58 -0500

> tcp_memcontrol replicates the global sysctl_mem limit array per
> cgroup, but it only ever sets these entries to the value of the
> memory_allocated page_counter limit. Use the latter directly.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 07/13] net: tcp_memcontrol: sanitize tcp memory accounting callbacks
  2015-11-24 21:51   ` Johannes Weiner
@ 2015-11-25 16:28     ` David Miller
  -1 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:28 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:51:59 -0500

> There won't be a tcp control soft limit, so integrating the memcg code
> into the global skmem limiting scheme complicates things
> unnecessarily. Replace this with simple and clear charge and uncharge
> calls--hidden behind a jump label--to account skb memory.
> 
> Note that this is not purely aesthetic: as a result of shoehorning the
> per-memcg code into the same memory accounting functions that handle
> the global level, the old code would compare the per-memcg consumption
> against the smaller of the per-memcg limit and the global limit. This
> allowed the total consumption of multiple sockets to exceed the global
> limit, as long as the individual sockets stayed within bounds. After
> this change, the code will always compare the per-memcg consumption to
> the per-memcg limit, and the global consumption to the global limit,
> and thus close this loophole.
> 
> Without a soft limit, the per-memcg memory pressure state in sockets
> is generally questionable. However, we did it until now, so we
> continue to enter it when the hard limit is hit, and packets are
> dropped, to let other sockets in the cgroup know that they shouldn't
> grow their transmit windows, either. However, keep it simple in the
> new callback model and leave memory pressure lazily when the next
> packet is accepted (as opposed to doing it synchronously when packets
> are processed). When packets are dropped, network performance will
> already be in the toilet, so that should be a reasonable trade-off.
> 
> As described above, consumption is now checked on the per-memcg level
> and the global level separately. Likewise, memory pressure states are
> maintained on both the per-memcg level and the global level, and a
> socket is considered under pressure when either level asserts as much.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 08/13] net: tcp_memcontrol: simplify linkage between socket and page counter
@ 2015-11-25 16:28     ` David Miller
  0 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:28 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:52:00 -0500

> There won't be any separate counters for socket memory consumed by
> protocols other than TCP in the future. Remove the indirection and
> link sockets directly to their owning memory cgroup.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label
  2015-11-24 21:52   ` Johannes Weiner
@ 2015-11-25 16:29     ` David Miller
  -1 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:29 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:52:01 -0500

> The unified hierarchy memory controller is going to use this jump
> label as well to control the networking callbacks. Move it to the
> memory controller code and give it a more generic name.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 10/13] mm: memcontrol: do not account memory+swap on unified hierarchy
  2015-11-24 21:52   ` Johannes Weiner
@ 2015-11-25 16:29     ` David Miller
  -1 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:29 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:52:02 -0500

> The unified hierarchy memory controller doesn't expose the memory+swap
> counter to userspace, but its accounting is hardcoded in all charge
> paths right now, including the per-cpu charge cache ("the stock").
> 
> To avoid adding yet more pointless memory+swap accounting with the
> socket memory support in unified hierarchy, disable the counter
> altogether when in unified hierarchy mode.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 11/13] mm: memcontrol: move socket code for unified hierarchy accounting
  2015-11-24 21:52   ` Johannes Weiner
@ 2015-11-25 16:29     ` David Miller
  -1 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:29 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:52:03 -0500

> The unified hierarchy memory controller will account socket
> memory. Move the infrastructure functions accordingly.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
  2015-11-24 21:58   ` Johannes Weiner
@ 2015-11-25 16:30     ` David Miller
  -1 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:30 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:58:44 -0500

> Socket memory can be a significant share of overall memory consumed by
> common workloads. In order to provide reasonable resource isolation in
> the unified hierarchy, this type of memory needs to be included in the
> tracking/accounting of a cgroup under active memory resource control.
> 
> Overhead is only incurred when a non-root control group is created AND
> the memory controller is instructed to track and account the memory
> footprint of that group. cgroup.memory=nosocket can be specified on
> the boot command line to override any runtime configuration and
> forcibly exclude socket memory from active memory resource control.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 13/13] mm: memcontrol: hook up vmpressure to socket pressure
  2015-11-24 21:59   ` Johannes Weiner
@ 2015-11-25 16:30     ` David Miller
  -1 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2015-11-25 16:30 UTC (permalink / raw)
  To: hannes
  Cc: akpm, vdavydov, mhocko, tj, eric.dumazet, netdev, linux-mm,
	cgroups, linux-kernel, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Tue, 24 Nov 2015 16:59:40 -0500

> Let the networking stack know when a memcg is under reclaim pressure
> so that it can clamp its transmit windows accordingly.
> 
> Whenever the reclaim efficiency of a cgroup's LRU lists drops low
> enough for a MEDIUM or HIGH vmpressure event to occur, assert a
> pressure state in the socket and tcp memory code that tells it to curb
> consumption growth from sockets associated with said control group.
> 
> Traditionally, vmpressure reports for the entire subtree of a memcg
> under pressure, which loses information about the individual groups
> being reclaimed. However, it's too late to change the user interface,
> so add a second reporting mode that reports at the level of reclaim
> rather than at the level of the pressure root, and use that mode for
> sockets.
> 
> vmpressure events are naturally edge-triggered, so for hysteresis
> assert socket pressure for one second, allowing subsequent vmpressure
> events to occur before the socket code returns to normal.
> 
> This will likely need fine-tuning for a wider variety of workloads,
> but for now stick to the vmpressure presets and keep the hysteresis
> simple.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
  2015-11-24 21:58   ` Johannes Weiner
@ 2015-11-30 10:54     ` Vladimir Davydov
  -1 siblings, 0 replies; 77+ messages in thread
From: Vladimir Davydov @ 2015-11-30 10:54 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Tue, Nov 24, 2015 at 04:58:44PM -0500, Johannes Weiner wrote:
...
> @@ -5520,15 +5557,30 @@ void sock_release_memcg(struct sock *sk)
>   */
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> -	struct page_counter *counter;
> +	gfp_t gfp_mask = GFP_KERNEL;
>  
> -	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> -				    nr_pages, &counter)) {
> -		memcg->tcp_mem.memory_pressure = 0;
> -		return true;
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +		struct page_counter *counter;
> +
> +		if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> +					    nr_pages, &counter)) {
> +			memcg->tcp_mem.memory_pressure = 0;
> +			return true;
> +		}
> +		page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> +		memcg->tcp_mem.memory_pressure = 1;
> +		return false;
>  	}
> -	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> -	memcg->tcp_mem.memory_pressure = 1;
> +#endif
> +	/* Don't block in the packet receive path */
> +	if (in_softirq())
> +		gfp_mask = GFP_NOWAIT;
> +
> +	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
> +		return true;
> +
> +	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);

We won't trigger high reclaim if we get here, because try_charge does
not check the high threshold when it fails or forces the charge. I
think this should be fixed regardless of this patch. The fix is
attached below.

Also, I don't like calling try_charge twice: the second call goes
through all the try_charge steps for nothing. What about checking the
page_counter value after calling try_charge instead:

	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
	return page_counter_read(&memcg->memory) <= memcg->memory.limit;

or adding an out parameter to try_charge that would tell us whether the
charge was forced?
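
(For illustration, a single-call variant along those lines -- a sketch
only, not a tested patch, omitting the legacy !cgroup_subsys_on_dfl
branch and assuming the try_charge()/page_counter semantics quoted
above:

	bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg,
				     unsigned int nr_pages)
	{
		gfp_t gfp_mask = GFP_KERNEL;

		/* Don't block in the packet receive path */
		if (in_softirq())
			gfp_mask = GFP_NOWAIT;

		/* Reclaim if possible, force the charge otherwise... */
		try_charge(memcg, gfp_mask | __GFP_NOFAIL, nr_pages);

		/* ...and report success iff we stayed within the limit. */
		return page_counter_read(&memcg->memory) <=
			memcg->memory.limit;
	}
)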

>  	return false;
>  }
>  
> @@ -5539,10 +5591,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>   */
>  void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> -	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +		page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
> +				      nr_pages);
> +		return;
> +	}
> +#endif
> +	page_counter_uncharge(&memcg->memory, nr_pages);
> +	css_put_many(&memcg->css, nr_pages);

cancel_charge(memcg, nr_pages);

?
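
(cancel_charge() is the existing static helper in mm/memcontrol.c that
undoes a charge. Roughly, from the 4.4-era code -- a sketch from
memory, modulo the memsw changes in this series:

	static void cancel_charge(struct mem_cgroup *memcg,
				  unsigned int nr_pages)
	{
		if (mem_cgroup_is_root(memcg))
			return;
		page_counter_uncharge(&memcg->memory, nr_pages);
		if (do_swap_account)
			page_counter_uncharge(&memcg->memsw, nr_pages);
		css_put_many(&memcg->css, nr_pages);
	}

i.e. it already pairs the page_counter_uncharge with the css_put_many
that the hunk above open-codes.)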

---
From: Vladimir Davydov <vdavydov@virtuozzo.com>
Subject: [PATCH] memcg: check high threshold if forcing allocation

try_charge() does not check the high threshold if it forces the
charge. This is incorrect, because we could have failed to reclaim
memory due to the current context, so we do need to check the high
threshold and compensate for the excess once we are in a safe context.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 79a29d564bff..e922965b572b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2112,13 +2112,14 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		page_counter_charge(&memcg->memsw, nr_pages);
 	css_get_many(&memcg->css, nr_pages);
 
-	return 0;
+	goto check_high;
 
 done_restock:
 	css_get_many(&memcg->css, batch);
 	if (batch > nr_pages)
 		refill_stock(memcg, batch - nr_pages);
 
+check_high:
 	/*
 	 * If the hierarchy is above the normal consumption range, schedule
 	 * reclaim on returning to userland.  We can perform reclaim here

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
@ 2015-11-30 10:54     ` Vladimir Davydov
  0 siblings, 0 replies; 77+ messages in thread
From: Vladimir Davydov @ 2015-11-30 10:54 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Tue, Nov 24, 2015 at 04:58:44PM -0500, Johannes Weiner wrote:
...
> @@ -5520,15 +5557,30 @@ void sock_release_memcg(struct sock *sk)
>   */
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> -	struct page_counter *counter;
> +	gfp_t gfp_mask = GFP_KERNEL;
>  
> -	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> -				    nr_pages, &counter)) {
> -		memcg->tcp_mem.memory_pressure = 0;
> -		return true;
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +		struct page_counter *counter;
> +
> +		if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> +					    nr_pages, &counter)) {
> +			memcg->tcp_mem.memory_pressure = 0;
> +			return true;
> +		}
> +		page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> +		memcg->tcp_mem.memory_pressure = 1;
> +		return false;
>  	}
> -	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> -	memcg->tcp_mem.memory_pressure = 1;
> +#endif
> +	/* Don't block in the packet receive path */
> +	if (in_softirq())
> +		gfp_mask = GFP_NOWAIT;
> +
> +	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
> +		return true;
> +
> +	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);

We won't trigger high reclaim if we get here, because try_charge does
not check high threshold if failing or forcing charge. I think this
should be fixed regardless of this patch. The fix is attached below.

Also, I don't like calling try_charge twice: the second time will go
through all the try_charge steps for nothing. What about checking
page_counter value after calling try_charge instead:

	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
	return page_counter_read(&memcg->memory) <= memcg->memory.limit;

or adding an out parameter to try_charge that would inform us if charge
was forced?

>  	return false;
>  }
>  
> @@ -5539,10 +5591,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>   */
>  void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> -	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +		page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
> +				      nr_pages);
> +		return;
> +	}
> +#endif
> +	page_counter_uncharge(&memcg->memory, nr_pages);
> +	css_put_many(&memcg->css, nr_pages);

cancel_charge(memcg, nr_pages);

?

---
From: Vladimir Davydov <vdavydov@virtuozzo.com>
Subject: [PATCH] memcg: check high threshold if forcing allocation

try_charge() does not result in checking high threshold if it forces
charge. This is incorrect, because we could have failed to reclaim
memory due to the current context, so we do need to check high threshold
and try to compensate for the excess once we are in the safe context.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 79a29d564bff..e922965b572b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2112,13 +2112,14 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		page_counter_charge(&memcg->memsw, nr_pages);
 	css_get_many(&memcg->css, nr_pages);
 
-	return 0;
+	goto check_high;
 
 done_restock:
 	css_get_many(&memcg->css, batch);
 	if (batch > nr_pages)
 		refill_stock(memcg, batch - nr_pages);
 
+check_high:
 	/*
 	 * If the hierarchy is above the normal consumption range, schedule
 	 * reclaim on returning to userland.  We can perform reclaim here

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 13/13] mm: memcontrol: hook up vmpressure to socket pressure
@ 2015-11-30 11:36     ` Vladimir Davydov
  0 siblings, 0 replies; 77+ messages in thread
From: Vladimir Davydov @ 2015-11-30 11:36 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Tue, Nov 24, 2015 at 04:59:40PM -0500, Johannes Weiner wrote:
...
> @@ -2396,6 +2396,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  		memcg = mem_cgroup_iter(root, NULL, &reclaim);
>  		do {
>  			unsigned long lru_pages;
> +			unsigned long reclaimed;
>  			unsigned long scanned;
>  			struct lruvec *lruvec;
>  			int swappiness;
> @@ -2408,6 +2409,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  
>  			lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>  			swappiness = mem_cgroup_swappiness(memcg);
> +			reclaimed = sc->nr_reclaimed;
>  			scanned = sc->nr_scanned;
>  
>  			shrink_lruvec(lruvec, swappiness, sc, &lru_pages);
> @@ -2418,6 +2420,11 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  					    memcg, sc->nr_scanned - scanned,
>  					    lru_pages);
>  
> +			/* Record the group's reclaim efficiency */
> +			vmpressure(sc->gfp_mask, memcg, false,
> +				   sc->nr_scanned - scanned,
> +				   sc->nr_reclaimed - reclaimed);
> +

Suppose we have the following cgroup configuration.

A __ B
  \_ C

A is empty (which is natural for the unified hierarchy AFAIU). B has
some workload running in it, and C generates socket pressure. Due to the
socket pressure coming from C we start reclaim in A, which results in
thrashing of B, but we might not put sockets under pressure in A or C,
because vmpressure does not account pages scanned/reclaimed in B when
generating a vmpressure event for A or C. This might result in
aggressive reclaim and thrashing in B w/o generating a signal for C to
stop growing socket buffers.

Do you think such a situation is possible? If so, would it make sense to
switch to post-order walk in shrink_zone and pass sub-tree
scanned/reclaimed stats to vmpressure for each scanned memcg?
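
Something along these lines, where subtree_scanned/subtree_reclaimed
are made-up accumulators, named here only to illustrate the idea:

	/* post-order: memcg and all of its children have been scanned */
	vmpressure(sc->gfp_mask, memcg, true,
		   subtree_scanned, subtree_reclaimed);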

Thanks,
Vladimir

>  			/*
>  			 * Direct reclaim and kswapd have to scan all memory
>  			 * cgroups to fulfill the overall scan target for the
> @@ -2449,7 +2456,8 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  			reclaim_state->reclaimed_slab = 0;
>  		}
>  
> -		vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
> +		/* Record the subtree's reclaim efficiency */
> +		vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
>  			   sc->nr_scanned - nr_scanned,
>  			   sc->nr_reclaimed - nr_reclaimed);
>  

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
  2015-11-30 10:54     ` Vladimir Davydov
@ 2015-11-30 15:26       ` Johannes Weiner
  0 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-30 15:26 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Mon, Nov 30, 2015 at 01:54:21PM +0300, Vladimir Davydov wrote:
> On Tue, Nov 24, 2015 at 04:58:44PM -0500, Johannes Weiner wrote:
> ...
> > @@ -5520,15 +5557,30 @@ void sock_release_memcg(struct sock *sk)
> >   */
> >  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> >  {
> > -	struct page_counter *counter;
> > +	gfp_t gfp_mask = GFP_KERNEL;
> >  
> > -	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > -				    nr_pages, &counter)) {
> > -		memcg->tcp_mem.memory_pressure = 0;
> > -		return true;
> > +#ifdef CONFIG_MEMCG_KMEM
> > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > +		struct page_counter *counter;
> > +
> > +		if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > +					    nr_pages, &counter)) {
> > +			memcg->tcp_mem.memory_pressure = 0;
> > +			return true;
> > +		}
> > +		page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > +		memcg->tcp_mem.memory_pressure = 1;
> > +		return false;
> >  	}
> > -	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > -	memcg->tcp_mem.memory_pressure = 1;
> > +#endif
> > +	/* Don't block in the packet receive path */
> > +	if (in_softirq())
> > +		gfp_mask = GFP_NOWAIT;
> > +
> > +	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
> > +		return true;
> > +
> > +	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> 
> We won't trigger high reclaim if we get here, because try_charge does
> not check high threshold if failing or forcing charge. I think this
> should be fixed regardless of this patch. The fix is attached below.

We kind of assume that max is either set above high, or not at
all. That means when max is hit the high limit has already failed and
it's of limited use to schedule background reclaim.

> Also, I don't like calling try_charge twice: the second time will go
> through all the try_charge steps for nothing. What about checking
> page_counter value after calling try_charge instead:
> 
> 	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> 	return page_counter_read(&memcg->memory) <= memcg->memory.limit;
> 
> or adding an out parameter to try_charge that would inform us if charge
> was forced?

That's a complete cold path where we are going to drop the packet in
all but a few cases. It's not worth the trouble.

> > @@ -5539,10 +5591,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> >   */
> >  void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> >  {
> > -	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > +#ifdef CONFIG_MEMCG_KMEM
> > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > +		page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
> > +				      nr_pages);
> > +		return;
> > +	}
> > +#endif
> > +	page_counter_uncharge(&memcg->memory, nr_pages);
> > +	css_put_many(&memcg->css, nr_pages);
> 
> cancel_charge(memcg, nr_pages);

It does the same, but it's a weird name for regular uncharging.

> From: Vladimir Davydov <vdavydov@virtuozzo.com>
> Subject: [PATCH] memcg: check high threshold if forcing allocation
> 
> try_charge() does not result in checking high threshold if it forces
> charge. This is incorrect, because we could have failed to reclaim
> memory due to the current context, so we do need to check high threshold
> and try to compensate for the excess once we are in the safe context.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 79a29d564bff..e922965b572b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2112,13 +2112,14 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		page_counter_charge(&memcg->memsw, nr_pages);
>  	css_get_many(&memcg->css, nr_pages);
>  
> -	return 0;
> +	goto check_high;
>  
>  done_restock:
>  	css_get_many(&memcg->css, batch);
>  	if (batch > nr_pages)
>  		refill_stock(memcg, batch - nr_pages);
>  
> +check_high:
>  	/*
>  	 * If the hierarchy is above the normal consumption range, schedule
>  	 * reclaim on returning to userland.  We can perform reclaim here

One problem is that OOM victims force their charges so they can exit
quickly. It'd be contradictory to then task them with high reclaim.
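
(Paraphrasing the try_charge() code around the hunk quoted above, so
details may be off: an OOM victim skips reclaim entirely and branches
straight to the forced-charge tail,

	if (unlikely(test_thread_flag(TIF_MEMDIE) ||
		     fatal_signal_pending(current) ||
		     current->flags & PF_EXITING))
		goto force;	/* charge past the limit, return 0 */

so routing that tail through the high-reclaim check would burden
exactly the tasks we want to exit quickly.)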

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 13/13] mm: memcontrol: hook up vmpressure to socket pressure
  2015-11-30 11:36     ` Vladimir Davydov
@ 2015-11-30 15:58       ` Johannes Weiner
  0 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-30 15:58 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Mon, Nov 30, 2015 at 02:36:28PM +0300, Vladimir Davydov wrote:
> Suppose we have the following cgroup configuration.
> 
> A __ B
>   \_ C
> 
> A is empty (which is natural for the unified hierarchy AFAIU). B has
> some workload running in it, and C generates socket pressure. Due to the
> socket pressure coming from C we start reclaim in A, which results in
> thrashing of B, but we might not put sockets under pressure in A or C,
> because vmpressure does not account pages scanned/reclaimed in B when
> generating a vmpressure event for A or C. This might result in
> aggressive reclaim and thrashing in B w/o generating a signal for C to
> stop growing socket buffers.
> 
> Do you think such a situation is possible? If so, would it make sense to
> switch to post-order walk in shrink_zone and pass sub-tree
> scanned/reclaimed stats to vmpressure for each scanned memcg?

In that case the LRU pages in C would experience pressure as well,
which would then reign in the sockets in C. There must be some LRU
pages in there, otherwise who is creating socket pressure?

The same applies to shrinkers. All secondary reclaim is driven by LRU
reclaim results.

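(That's the shrink_zone() pattern visible in the hunks quoted earlier
in the thread - sketched from that era's mm/vmscan.c, so minor details
may differ:

	shrink_lruvec(lruvec, swappiness, sc, &lru_pages);

	if (memcg && is_classzone)
		shrink_slab(sc->gfp_mask, zone_to_nid(zone),
			    memcg, sc->nr_scanned - scanned,
			    lru_pages);

The per-memcg vmpressure() call added by this patch slots into the
same scheme.)
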
I can see that there is some unfairness in distributing memcg reclaim
pressure purely based on LRU size, because there are scenarios where
the auxiliary objects (incl. sockets, but mostly shrinker pools)
amount to a significant portion of the group's memory footprint. But
substitute group for NUMA node and we've had this behavior for
years. I'm not sure it's actually a problem in practice.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 13/13] mm: memcontrol: hook up vmpressure to socket pressure
  2015-11-30 15:58       ` Johannes Weiner
@ 2015-11-30 16:13         ` Vladimir Davydov
  0 siblings, 0 replies; 77+ messages in thread
From: Vladimir Davydov @ 2015-11-30 16:13 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Mon, Nov 30, 2015 at 10:58:38AM -0500, Johannes Weiner wrote:
> On Mon, Nov 30, 2015 at 02:36:28PM +0300, Vladimir Davydov wrote:
> > Suppose we have the following cgroup configuration.
> > 
> > A __ B
> >   \_ C
> > 
> > A is empty (which is natural for the unified hierarchy AFAIU). B has
> > some workload running in it, and C generates socket pressure. Due to the
> > socket pressure coming from C we start reclaim in A, which results in
> > thrashing of B, but we might not put sockets under pressure in A or C,
> > because vmpressure does not account pages scanned/reclaimed in B when
> > generating a vmpressure event for A or C. This might result in
> > aggressive reclaim and thrashing in B w/o generating a signal for C to
> > stop growing socket buffers.
> > 
> > Do you think such a situation is possible? If so, would it make sense to
> > switch to post-order walk in shrink_zone and pass sub-tree
> > scanned/reclaimed stats to vmpressure for each scanned memcg?
> 
> In that case the LRU pages in C would experience pressure as well,
> which would then rein in the sockets in C. There must be some LRU
> pages in there, otherwise who is creating socket pressure?
> 
> The same applies to shrinkers. All secondary reclaim is driven by LRU
> reclaim results.
> 
> I can see that there is some unfairness in distributing memcg reclaim
> pressure purely based on LRU size, because there are scenarios where
> the auxiliary objects (incl. sockets, but mostly shrinker pools)
> amount to a significant portion of the group's memory footprint. But
> substitute group for NUMA node and we've had this behavior for
> years. I'm not sure it's actually a problem in practice.
> 

Fair enough. Let's wait until we hit this problem in the real world, then.

The patch looks good to me.

Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks,
Vladimir

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
@ 2015-11-30 17:08         ` Vladimir Davydov
  0 siblings, 0 replies; 77+ messages in thread
From: Vladimir Davydov @ 2015-11-30 17:08 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Mon, Nov 30, 2015 at 10:26:38AM -0500, Johannes Weiner wrote:
> On Mon, Nov 30, 2015 at 01:54:21PM +0300, Vladimir Davydov wrote:
> > On Tue, Nov 24, 2015 at 04:58:44PM -0500, Johannes Weiner wrote:
> > ...
> > > @@ -5520,15 +5557,30 @@ void sock_release_memcg(struct sock *sk)
> > >   */
> > >  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >  {
> > > -	struct page_counter *counter;
> > > +	gfp_t gfp_mask = GFP_KERNEL;
> > >  
> > > -	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > -				    nr_pages, &counter)) {
> > > -		memcg->tcp_mem.memory_pressure = 0;
> > > -		return true;
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > +		struct page_counter *counter;
> > > +
> > > +		if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > +					    nr_pages, &counter)) {
> > > +			memcg->tcp_mem.memory_pressure = 0;
> > > +			return true;
> > > +		}
> > > +		page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > +		memcg->tcp_mem.memory_pressure = 1;
> > > +		return false;
> > >  	}
> > > -	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > -	memcg->tcp_mem.memory_pressure = 1;
> > > +#endif
> > > +	/* Don't block in the packet receive path */
> > > +	if (in_softirq())
> > > +		gfp_mask = GFP_NOWAIT;
> > > +
> > > +	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
> > > +		return true;
> > > +
> > > +	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> > 
> > We won't trigger high reclaim if we get here, because try_charge does
> > not check high threshold if failing or forcing charge. I think this
> > should be fixed regardless of this patch. The fix is attached below.
> 
> We kind of assume that max is either set above high, or not at
> all. That means when max is hit the high limit has already failed and
> it's of limited use to schedule background reclaim.

Yeah, you're right. No point scheduling the work here - it must
already be running.

> 
> > Also, I don't like calling try_charge twice: the second time will go
> > through all the try_charge steps for nothing. What about checking
> > page_counter value after calling try_charge instead:
> > 
> > 	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> > 	return page_counter_read(&memcg->memory) <= memcg->memory.limit;
> > 
> > or adding an out parameter to try_charge that would inform us if charge
> > was forced?
> 
> That's a complete cold path where we are going to drop the packet in
> all but a few cases. It's not worth the trouble.

Right

> 
> > > @@ -5539,10 +5591,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >   */
> > >  void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >  {
> > > -	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > +		page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
> > > +				      nr_pages);
> > > +		return;
> > > +	}
> > > +#endif
> > > +	page_counter_uncharge(&memcg->memory, nr_pages);
> > > +	css_put_many(&memcg->css, nr_pages);
> > 
> > cancel_charge(memcg, nr_pages);
> 
> It does the same, but it's a weird name for regular uncharging.

Right

> 
> > From: Vladimir Davydov <vdavydov@virtuozzo.com>
> > Subject: [PATCH] memcg: check high threshold if forcing allocation
> > 
> > try_charge() does not result in checking high threshold if it forces
> > charge. This is incorrect, because we could have failed to reclaim
> > memory due to the current context, so we do need to check high threshold
> > and try to compensate for the excess once we are in the safe context.
> > 
> > Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 79a29d564bff..e922965b572b 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2112,13 +2112,14 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  		page_counter_charge(&memcg->memsw, nr_pages);
> >  	css_get_many(&memcg->css, nr_pages);
> >  
> > -	return 0;
> > +	goto check_high;
> >  
> >  done_restock:
> >  	css_get_many(&memcg->css, batch);
> >  	if (batch > nr_pages)
> >  		refill_stock(memcg, batch - nr_pages);
> >  
> > +check_high:
> >  	/*
> >  	 * If the hierarchy is above the normal consumption range, schedule
> >  	 * reclaim on returning to userland.  We can perform reclaim here
> 
> One problem is that OOM victims force their charges so they can exit
> quickly. It'd be contradictory to then task them with high reclaim.
> 

Yeah, scratch that patch. It isn't necessary anyway, because, as you
pointed out, we don't really need to schedule high reclaim when we fail
hard in mem_cgroup_charge_skmem.

No more questions left,

Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks,
Vladimir

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
@ 2015-11-30 17:08         ` Vladimir Davydov
  0 siblings, 0 replies; 77+ messages in thread
From: Vladimir Davydov @ 2015-11-30 17:08 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg

On Mon, Nov 30, 2015 at 10:26:38AM -0500, Johannes Weiner wrote:
> On Mon, Nov 30, 2015 at 01:54:21PM +0300, Vladimir Davydov wrote:
> > On Tue, Nov 24, 2015 at 04:58:44PM -0500, Johannes Weiner wrote:
> > ...
> > > @@ -5520,15 +5557,30 @@ void sock_release_memcg(struct sock *sk)
> > >   */
> > >  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >  {
> > > -	struct page_counter *counter;
> > > +	gfp_t gfp_mask = GFP_KERNEL;
> > >  
> > > -	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > -				    nr_pages, &counter)) {
> > > -		memcg->tcp_mem.memory_pressure = 0;
> > > -		return true;
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > +		struct page_counter *counter;
> > > +
> > > +		if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > +					    nr_pages, &counter)) {
> > > +			memcg->tcp_mem.memory_pressure = 0;
> > > +			return true;
> > > +		}
> > > +		page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > +		memcg->tcp_mem.memory_pressure = 1;
> > > +		return false;
> > >  	}
> > > -	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > -	memcg->tcp_mem.memory_pressure = 1;
> > > +#endif
> > > +	/* Don't block in the packet receive path */
> > > +	if (in_softirq())
> > > +		gfp_mask = GFP_NOWAIT;
> > > +
> > > +	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
> > > +		return true;
> > > +
> > > +	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> > 
> > We won't trigger high reclaim if we get here, because try_charge does
> > not check high threshold if failing or forcing charge. I think this
> > should be fixed regardless of this patch. The fix is attached below.
> 
> We kind of assume that max is either set above high, or not at
> all. That means when max is hit the high limit has already failed and
> it's of limited use to schedule background reclaim.

Yeah, you're right. No point scheduling the work here - it must be
already running.

> 
> > Also, I don't like calling try_charge twice: the second time will go
> > through all the try_charge steps for nothing. What about checking
> > page_counter value after calling try_charge instead:
> > 
> > 	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> > 	return page_counter_read(&memcg->memory) <= memcg->memory.limit;
> > 
> > or adding an out parameter to try_charge that would inform us if charge
> > was forced?
> 
> That's a complete cold path where we are going to drop the packet in
> all but a few cases. It's not worth the trouble.

Right

> 
> > > @@ -5539,10 +5591,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >   */
> > >  void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >  {
> > > -	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > +		page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
> > > +				      nr_pages);
> > > +		return;
> > > +	}
> > > +#endif
> > > +	page_counter_uncharge(&memcg->memory, nr_pages);
> > > +	css_put_many(&memcg->css, nr_pages);
> > 
> > cancel_charge(memcg, nr_pages);
> 
> It does the same, but it's a weird name for regular uncharging.

Right

> 
> > From: Vladimir Davydov <vdavydov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
> > Subject: [PATCH] memcg: check high threshold if forcing allocation
> > 
> > try_charge() does not result in checking high threshold if it forces
> > charge. This is incorrect, because we could have failed to reclaim
> > memory due to the current context, so we do need to check high threshold
> > and try to compensate for the excess once we are in the safe context.
> > 
> > Signed-off-by: Vladimir Davydov <vdavydov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 79a29d564bff..e922965b572b 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2112,13 +2112,14 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  		page_counter_charge(&memcg->memsw, nr_pages);
> >  	css_get_many(&memcg->css, nr_pages);
> >  
> > -	return 0;
> > +	goto check_high;
> >  
> >  done_restock:
> >  	css_get_many(&memcg->css, batch);
> >  	if (batch > nr_pages)
> >  		refill_stock(memcg, batch - nr_pages);
> >  
> > +check_high:
> >  	/*
> >  	 * If the hierarchy is above the normal consumption range, schedule
> >  	 * reclaim on returning to userland.  We can perform reclaim here
> 
> One problem is that OOM victims force their charges so they can exit
> quickly. It'd be contradictory to then task them with high reclaim.
> 

Yeah, scratch that patch. It isn't necessary anyway, because, as you
pointed out, we don't really need to schedule high reclaim when we fail
hard in mem_cgroup_charge_skmem.

No more questions left,

Reviewed-by: Vladimir Davydov <vdavydov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>

Thanks,
Vladimir

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller
@ 2015-11-30 17:08         ` Vladimir Davydov
  0 siblings, 0 replies; 77+ messages in thread
From: Vladimir Davydov @ 2015-11-30 17:08 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team

On Mon, Nov 30, 2015 at 10:26:38AM -0500, Johannes Weiner wrote:
> On Mon, Nov 30, 2015 at 01:54:21PM +0300, Vladimir Davydov wrote:
> > On Tue, Nov 24, 2015 at 04:58:44PM -0500, Johannes Weiner wrote:
> > ...
> > > @@ -5520,15 +5557,30 @@ void sock_release_memcg(struct sock *sk)
> > >   */
> > >  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >  {
> > > -	struct page_counter *counter;
> > > +	gfp_t gfp_mask = GFP_KERNEL;
> > >  
> > > -	if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > -				    nr_pages, &counter)) {
> > > -		memcg->tcp_mem.memory_pressure = 0;
> > > -		return true;
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > +		struct page_counter *counter;
> > > +
> > > +		if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > > +					    nr_pages, &counter)) {
> > > +			memcg->tcp_mem.memory_pressure = 0;
> > > +			return true;
> > > +		}
> > > +		page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > +		memcg->tcp_mem.memory_pressure = 1;
> > > +		return false;
> > >  	}
> > > -	page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > -	memcg->tcp_mem.memory_pressure = 1;
> > > +#endif
> > > +	/* Don't block in the packet receive path */
> > > +	if (in_softirq())
> > > +		gfp_mask = GFP_NOWAIT;
> > > +
> > > +	if (try_charge(memcg, gfp_mask, nr_pages) == 0)
> > > +		return true;
> > > +
> > > +	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> > 
> > We won't trigger high reclaim if we get here, because try_charge does
> > not check high threshold if failing or forcing charge. I think this
> > should be fixed regardless of this patch. The fix is attached below.
> 
> We kind of assume that max is either set above high, or not at
> all. That means when max is hit the high limit has already failed and
> it's of limited use to schedule background reclaim.

Yeah, you're right. No point scheduling the work here - it must be
already running.

> 
> > Also, I don't like calling try_charge twice: the second time will go
> > through all the try_charge steps for nothing. What about checking
> > page_counter value after calling try_charge instead:
> > 
> > 	try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> > 	return page_counter_read(&memcg->memory) <= memcg->memory.limit;
> > 
> > or adding an out parameter to try_charge that would inform us if charge
> > was forced?
> 
> That's a complete cold path where we are going to drop the packet in
> all but a few cases. It's not worth the trouble.

Right

> 
> > > @@ -5539,10 +5591,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >   */
> > >  void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > >  {
> > > -	page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > +		page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
> > > +				      nr_pages);
> > > +		return;
> > > +	}
> > > +#endif
> > > +	page_counter_uncharge(&memcg->memory, nr_pages);
> > > +	css_put_many(&memcg->css, nr_pages);
> > 
> > cancel_charge(memcg, nr_pages);
> 
> It does the same, but it's a weird name for regular uncharging.

Right

> 
> > From: Vladimir Davydov <vdavydov@virtuozzo.com>
> > Subject: [PATCH] memcg: check high threshold if forcing allocation
> > 
> > try_charge() does not result in checking high threshold if it forces
> > charge. This is incorrect, because we could have failed to reclaim
> > memory due to the current context, so we do need to check high threshold
> > and try to compensate for the excess once we are in the safe context.
> > 
> > Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 79a29d564bff..e922965b572b 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2112,13 +2112,14 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  		page_counter_charge(&memcg->memsw, nr_pages);
> >  	css_get_many(&memcg->css, nr_pages);
> >  
> > -	return 0;
> > +	goto check_high;
> >  
> >  done_restock:
> >  	css_get_many(&memcg->css, batch);
> >  	if (batch > nr_pages)
> >  		refill_stock(memcg, batch - nr_pages);
> >  
> > +check_high:
> >  	/*
> >  	 * If the hierarchy is above the normal consumption range, schedule
> >  	 * reclaim on returning to userland.  We can perform reclaim here
> 
> One problem is that OOM victims force their charges so they can exit
> quickly. It'd be contradictory to then task them with high reclaim.
> 

Yeah, scratch that patch. It isn't necessary anyway, because, as you
pointed out, we don't really need to schedule high reclaim when we fail
hard in mem_cgroup_charge_skmem.

No more questions left,

Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks,
Vladimir

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label
@ 2015-11-30 21:08     ` Jason Baron
  0 siblings, 0 replies; 77+ messages in thread
From: Jason Baron @ 2015-11-30 21:08 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: David Miller, Vladimir Davydov, Michal Hocko, Tejun Heo,
	Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team, peterz

Hi,

On 11/24/2015 04:52 PM, Johannes Weiner wrote:
> The unified hierarchy memory controller is going to use this jump
> label as well to control the networking callbacks. Move it to the
> memory controller code and give it a more generic name.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> ---
>  include/linux/memcontrol.h | 4 ++++
>  include/net/sock.h         | 7 -------
>  mm/memcontrol.c            | 3 +++
>  net/core/sock.c            | 5 -----
>  net/ipv4/tcp_memcontrol.c  | 4 ++--
>  5 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d99fefe..dad56ef 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -681,6 +681,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
>  
>  #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
>  struct sock;
> +extern struct static_key memcg_sockets_enabled_key;
> +#define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)


We're trying to move to the updated API, so this should be:
static_branch_unlikely(&memcg_sockets_enabled_key)

see: include/linux/jump_label.h for details.
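
For illustration, a minimal before/after sketch of the two APIs (my_key
and slow_path() are made-up names, not identifiers from this series;
both APIs live in <linux/jump_label.h>):

        /* old, deprecated API: open-coded struct static_key */
        struct static_key my_key;               /* initially false */

        if (static_key_false(&my_key))          /* unlikely, out-of-line branch */
                slow_path();

        static_key_slow_inc(&my_key);           /* patch the branch in */
        static_key_slow_dec(&my_key);           /* patch it back out */

        /* updated API: the key's type encodes its initial state */
        DEFINE_STATIC_KEY_FALSE(my_key);

        if (static_branch_unlikely(&my_key))
                slow_path();

        static_branch_inc(&my_key);
        static_branch_dec(&my_key);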


>  void sock_update_memcg(struct sock *sk);
>  void sock_release_memcg(struct sock *sk);
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
> @@ -689,6 +691,8 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
>  {
>  	return memcg->tcp_mem.memory_pressure;
>  }
> +#else
> +#define mem_cgroup_sockets_enabled 0
>  #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
>  
>  #ifdef CONFIG_MEMCG_KMEM
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 1a94b85..fcc9442 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1065,13 +1065,6 @@ static inline void sk_refcnt_debug_release(const struct sock *sk)
>  #define sk_refcnt_debug_release(sk) do { } while (0)
>  #endif /* SOCK_REFCNT_DEBUG */
>  
> -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_NET)
> -extern struct static_key memcg_socket_limit_enabled;
> -#define mem_cgroup_sockets_enabled static_key_false(&memcg_socket_limit_enabled)
> -#else
> -#define mem_cgroup_sockets_enabled 0
> -#endif
> -
>  static inline bool sk_stream_memory_free(const struct sock *sk)
>  {
>  	if (sk->sk_wmem_queued >= sk->sk_sndbuf)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 68d67fc..0602bee 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -291,6 +291,9 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
>  /* Writing them here to avoid exposing memcg's inner layout */
>  #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
>  
> +struct static_key memcg_sockets_enabled_key;


And this would be:

static DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);


>  void sock_update_memcg(struct sock *sk)
>  {
>  	struct mem_cgroup *memcg;
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 6486b0d..c5435b5 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -201,11 +201,6 @@ EXPORT_SYMBOL(sk_net_capable);
>  static struct lock_class_key af_family_keys[AF_MAX];
>  static struct lock_class_key af_family_slock_keys[AF_MAX];
>  
> -#if defined(CONFIG_MEMCG_KMEM)
> -struct static_key memcg_socket_limit_enabled;
> -EXPORT_SYMBOL(memcg_socket_limit_enabled);
> -#endif
> -
>  /*
>   * Make lock validator output more readable. (we pre-construct these
>   * strings build-time, so that runtime initialization of socket
> diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
> index e507825..9a22e2d 100644
> --- a/net/ipv4/tcp_memcontrol.c
> +++ b/net/ipv4/tcp_memcontrol.c
> @@ -34,7 +34,7 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
>  		return;
>  
>  	if (memcg->tcp_mem.active)
> -		static_key_slow_dec(&memcg_socket_limit_enabled);
> +		static_key_slow_dec(&memcg_sockets_enabled_key);
>  

static_branch_dec(&memcg_sockets_enabled_key);

}
>  
>  static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
> @@ -65,7 +65,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
>  		 * because when this value change, the code to process it is not
>  		 * patched in yet.
>  		 */
> -		static_key_slow_inc(&memcg_socket_limit_enabled);
> +		static_key_slow_inc(&memcg_sockets_enabled_key);
>  		memcg->tcp_mem.active = true;
>  	}
>  
> 

static_branch_inc(&memcg_sockets_enabled_key);

Thanks,

-Jason


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label
  2015-11-30 21:08     ` Jason Baron
@ 2015-11-30 21:50       ` Johannes Weiner
  -1 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-30 21:50 UTC (permalink / raw)
  To: Jason Baron
  Cc: Andrew Morton, David Miller, Vladimir Davydov, Michal Hocko,
	Tejun Heo, Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team, peterz

On Mon, Nov 30, 2015 at 04:08:18PM -0500, Jason Baron wrote:
> We're trying to move to the updated API, so this should be:
> static_branch_unlikely(&memcg_sockets_enabled_key)
> 
> see: include/linux/jump_label.h for details.

Good point. There is another struct static_key in there as well. How
about the following on top of this series?

---
From b784aa0323628d43272e13a67ead2a2ce0e93ea6 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 30 Nov 2015 16:41:38 -0500
Subject: [PATCH] mm: memcontrol: switch to the updated jump-label API

According to <linux/jump_label.h> the direct use of struct static_key
is deprecated. Update the socket and slab accounting code accordingly.

Reported-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h |  8 ++++----
 mm/memcontrol.c            | 12 ++++++------
 net/ipv4/tcp_memcontrol.c  |  4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a8df46c..9a19590 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -704,8 +704,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 
 #ifdef CONFIG_INET
 struct sock;
-extern struct static_key memcg_sockets_enabled_key;
-#define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
+extern struct static_key_false memcg_sockets_enabled_key;
+#define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
 void sock_update_memcg(struct sock *sk);
 void sock_release_memcg(struct sock *sk);
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
@@ -727,7 +727,7 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 #endif /* CONFIG_INET */
 
 #ifdef CONFIG_MEMCG_KMEM
-extern struct static_key memcg_kmem_enabled_key;
+extern struct static_key_false memcg_kmem_enabled_key;
 
 extern int memcg_nr_cache_ids;
 void memcg_get_cache_ids(void);
@@ -743,7 +743,7 @@ void memcg_put_cache_ids(void);
 
 static inline bool memcg_kmem_enabled(void)
 {
-	return static_key_false(&memcg_kmem_enabled_key);
+	return static_branch_unlikely(&memcg_kmem_enabled_key);
 }
 
 static inline bool memcg_kmem_is_active(struct mem_cgroup *memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a0da91f..5fe45d68 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -346,7 +346,7 @@ void memcg_put_cache_ids(void)
  * conditional to this static branch, we'll have to allow modules that does
  * kmem_cache_alloc and the such to see this symbol as well
  */
-struct static_key memcg_kmem_enabled_key;
+DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
 EXPORT_SYMBOL(memcg_kmem_enabled_key);
 
 #endif /* CONFIG_MEMCG_KMEM */
@@ -2883,7 +2883,7 @@ static int memcg_activate_kmem(struct mem_cgroup *memcg,
 	err = page_counter_limit(&memcg->kmem, nr_pages);
 	VM_BUG_ON(err);
 
-	static_key_slow_inc(&memcg_kmem_enabled_key);
+	static_branch_inc(&memcg_kmem_enabled_key);
 	/*
 	 * A memory cgroup is considered kmem-active as soon as it gets
 	 * kmemcg_id. Setting the id after enabling static branching will
@@ -3622,7 +3622,7 @@ static void memcg_destroy_kmem(struct mem_cgroup *memcg)
 {
 	if (memcg->kmem_acct_activated) {
 		memcg_destroy_kmem_caches(memcg);
-		static_key_slow_dec(&memcg_kmem_enabled_key);
+		static_branch_dec(&memcg_kmem_enabled_key);
 		WARN_ON(page_counter_read(&memcg->kmem));
 	}
 	tcp_destroy_cgroup(memcg);
@@ -4258,7 +4258,7 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css)
 
 #ifdef CONFIG_INET
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
-		static_key_slow_inc(&memcg_sockets_enabled_key);
+		static_branch_inc(&memcg_sockets_enabled_key);
 #endif
 
 	/*
@@ -4302,7 +4302,7 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 	memcg_destroy_kmem(memcg);
 #ifdef CONFIG_INET
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
-		static_key_slow_dec(&memcg_sockets_enabled_key);
+		static_branch_dec(&memcg_sockets_enabled_key);
 #endif
 	__mem_cgroup_free(memcg);
 }
@@ -5494,7 +5494,7 @@ void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
 
 #ifdef CONFIG_INET
 
-struct static_key memcg_sockets_enabled_key;
+DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
 EXPORT_SYMBOL(memcg_sockets_enabled_key);
 
 void sock_update_memcg(struct sock *sk)
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 9a22e2d..18bc7f7 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -34,7 +34,7 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
 		return;
 
 	if (memcg->tcp_mem.active)
-		static_key_slow_dec(&memcg_sockets_enabled_key);
+		static_branch_dec(&memcg_sockets_enabled_key);
 }
 
 static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
@@ -65,7 +65,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
 		 * because when this value change, the code to process it is not
 		 * patched in yet.
 		 */
-		static_key_slow_inc(&memcg_sockets_enabled_key);
+		static_branch_inc(&memcg_sockets_enabled_key);
 		memcg->tcp_mem.active = true;
 	}
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label
  2015-11-30 21:50       ` Johannes Weiner
@ 2015-11-30 22:28         ` Jason Baron
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Baron @ 2015-11-30 22:28 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Vladimir Davydov, Michal Hocko,
	Tejun Heo, Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team, peterz

On 11/30/2015 04:50 PM, Johannes Weiner wrote:
> On Mon, Nov 30, 2015 at 04:08:18PM -0500, Jason Baron wrote:
>> We're trying to move to the updated API, so this should be:
>> static_branch_unlikely(&memcg_sockets_enabled_key)
>>
>> see: include/linux/jump_label.h for details.
> 
> Good point. There is another struct static_key in there as well. How
> about the following on top of this series?
> 

Looks fine - you may be able to make use of
'static_branch_enable()/disable()' instead of the inc()/dec() to simply
set the branch direction, if you think it's more readable. Although I
didn't look to see if it would be racy here.

Thanks,

-Jason


> ---
> From b784aa0323628d43272e13a67ead2a2ce0e93ea6 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Mon, 30 Nov 2015 16:41:38 -0500
> Subject: [PATCH] mm: memcontrol: switch to the updated jump-label API
> 
> According to <linux/jump_label.h> the direct use of struct static_key
> is deprecated. Update the socket and slab accounting code accordingly.
> 
> Reported-by: Jason Baron <jbaron@akamai.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h |  8 ++++----
>  mm/memcontrol.c            | 12 ++++++------
>  net/ipv4/tcp_memcontrol.c  |  4 ++--
>  3 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index a8df46c..9a19590 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -704,8 +704,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
>  
>  #ifdef CONFIG_INET
>  struct sock;
> -extern struct static_key memcg_sockets_enabled_key;
> -#define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
> +extern struct static_key_false memcg_sockets_enabled_key;
> +#define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
>  void sock_update_memcg(struct sock *sk);
>  void sock_release_memcg(struct sock *sk);
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
> @@ -727,7 +727,7 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
>  #endif /* CONFIG_INET */
>  
>  #ifdef CONFIG_MEMCG_KMEM
> -extern struct static_key memcg_kmem_enabled_key;
> +extern struct static_key_false memcg_kmem_enabled_key;
>  
>  extern int memcg_nr_cache_ids;
>  void memcg_get_cache_ids(void);
> @@ -743,7 +743,7 @@ void memcg_put_cache_ids(void);
>  
>  static inline bool memcg_kmem_enabled(void)
>  {
> -	return static_key_false(&memcg_kmem_enabled_key);
> +	return static_branch_unlikely(&memcg_kmem_enabled_key);
>  }
>  
>  static inline bool memcg_kmem_is_active(struct mem_cgroup *memcg)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a0da91f..5fe45d68 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -346,7 +346,7 @@ void memcg_put_cache_ids(void)
>   * conditional to this static branch, we'll have to allow modules that does
>   * kmem_cache_alloc and the such to see this symbol as well
>   */
> -struct static_key memcg_kmem_enabled_key;
> +DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
>  EXPORT_SYMBOL(memcg_kmem_enabled_key);
>  
>  #endif /* CONFIG_MEMCG_KMEM */
> @@ -2883,7 +2883,7 @@ static int memcg_activate_kmem(struct mem_cgroup *memcg,
>  	err = page_counter_limit(&memcg->kmem, nr_pages);
>  	VM_BUG_ON(err);
>  
> -	static_key_slow_inc(&memcg_kmem_enabled_key);
> +	static_branch_inc(&memcg_kmem_enabled_key);
>  	/*
>  	 * A memory cgroup is considered kmem-active as soon as it gets
>  	 * kmemcg_id. Setting the id after enabling static branching will
> @@ -3622,7 +3622,7 @@ static void memcg_destroy_kmem(struct mem_cgroup *memcg)
>  {
>  	if (memcg->kmem_acct_activated) {
>  		memcg_destroy_kmem_caches(memcg);
> -		static_key_slow_dec(&memcg_kmem_enabled_key);
> +		static_branch_dec(&memcg_kmem_enabled_key);
>  		WARN_ON(page_counter_read(&memcg->kmem));
>  	}
>  	tcp_destroy_cgroup(memcg);
> @@ -4258,7 +4258,7 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css)
>  
>  #ifdef CONFIG_INET
>  	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
> -		static_key_slow_inc(&memcg_sockets_enabled_key);
> +		static_branch_inc(&memcg_sockets_enabled_key);
>  #endif
>  
>  	/*
> @@ -4302,7 +4302,7 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
>  	memcg_destroy_kmem(memcg);
>  #ifdef CONFIG_INET
>  	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
> -		static_key_slow_dec(&memcg_sockets_enabled_key);
> +		static_branch_dec(&memcg_sockets_enabled_key);
>  #endif
>  	__mem_cgroup_free(memcg);
>  }
> @@ -5494,7 +5494,7 @@ void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
>  
>  #ifdef CONFIG_INET
>  
> -struct static_key memcg_sockets_enabled_key;
> +DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
>  EXPORT_SYMBOL(memcg_sockets_enabled_key);
>  
>  void sock_update_memcg(struct sock *sk)
> diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
> index 9a22e2d..18bc7f7 100644
> --- a/net/ipv4/tcp_memcontrol.c
> +++ b/net/ipv4/tcp_memcontrol.c
> @@ -34,7 +34,7 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
>  		return;
>  
>  	if (memcg->tcp_mem.active)
> -		static_key_slow_dec(&memcg_sockets_enabled_key);
> +		static_branch_dec(&memcg_sockets_enabled_key);
>  }
>  
>  static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
> @@ -65,7 +65,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
>  		 * because when this value change, the code to process it is not
>  		 * patched in yet.
>  		 */
> -		static_key_slow_inc(&memcg_sockets_enabled_key);
> +		static_branch_inc(&memcg_sockets_enabled_key);
>  		memcg->tcp_mem.active = true;
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label
@ 2015-11-30 22:46           ` Johannes Weiner
  0 siblings, 0 replies; 77+ messages in thread
From: Johannes Weiner @ 2015-11-30 22:46 UTC (permalink / raw)
  To: Jason Baron
  Cc: Andrew Morton, David Miller, Vladimir Davydov, Michal Hocko,
	Tejun Heo, Eric Dumazet, netdev, linux-mm, cgroups, linux-kernel,
	kernel-team, peterz

On Mon, Nov 30, 2015 at 05:28:49PM -0500, Jason Baron wrote:
> On 11/30/2015 04:50 PM, Johannes Weiner wrote:
> > On Mon, Nov 30, 2015 at 04:08:18PM -0500, Jason Baron wrote:
> >> We're trying to move to the updated API, so this should be:
> >> static_branch_unlikely(&memcg_sockets_enabled_key)
> >>
> >> see: include/linux/jump_label.h for details.
> > 
> > Good point. There is another struct static_key in there as well. How
> > about the following on top of this series?
> > 
> 
> Looks fine - you may be able to make use of
> 'static_branch_enable()/disable()' instead of the inc()/dec() to simply
> set the branch direction, if you think it's more readable. Although I
> didn't look to see if it would be racy here.

Thanks!

We actually need the reference counting semantics for both these keys.
It counts the number of active cgroups in existence that require the
code behind those static branches.
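
A sketch of the difference, with a made-up event ordering (not code
from the series):

        /* refcounted: the branch stays patched in until the last user is gone */
        static_branch_inc(&memcg_sockets_enabled_key);  /* cgroup A comes online */
        static_branch_inc(&memcg_sockets_enabled_key);  /* cgroup B comes online */
        static_branch_dec(&memcg_sockets_enabled_key);  /* A freed: still enabled */
        static_branch_dec(&memcg_sockets_enabled_key);  /* B freed: now disabled */

        /* boolean enable/disable: freeing A alone would patch the
         * branch out from under B */
        static_branch_enable(&memcg_sockets_enabled_key);
        static_branch_disable(&memcg_sockets_enabled_key);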

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2015-11-30 22:46 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-24 21:51 [PATCH 00/13] mm: memcontrol: account socket memory in unified hierarchy v4 Johannes Weiner
2015-11-24 21:51 ` [PATCH 01/13] mm: memcontrol: export root_mem_cgroup Johannes Weiner
2015-11-24 21:51 ` [PATCH 02/13] net: tcp_memcontrol: properly detect ancestor socket pressure Johannes Weiner
2015-11-24 21:51 ` [PATCH 03/13] net: tcp_memcontrol: remove bogus hierarchy pressure propagation Johannes Weiner
2015-11-24 21:51 ` [PATCH 04/13] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label Johannes Weiner
2015-11-24 21:51 ` [PATCH 05/13] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets Johannes Weiner
2015-11-24 21:51 ` [PATCH 06/13] net: tcp_memcontrol: simplify the per-memcg limit access Johannes Weiner
2015-11-25 16:26   ` David Miller
2015-11-24 21:51 ` [PATCH 07/13] net: tcp_memcontrol: sanitize tcp memory accounting callbacks Johannes Weiner
2015-11-25 16:28   ` David Miller
2015-11-24 21:52 ` [PATCH 08/13] net: tcp_memcontrol: simplify linkage between socket and page counter Johannes Weiner
2015-11-25 16:28   ` David Miller
2015-11-24 21:52 ` [PATCH 09/13] mm: memcontrol: generalize the socket accounting jump label Johannes Weiner
2015-11-25 16:29   ` David Miller
2015-11-30 21:08   ` Jason Baron
2015-11-30 21:50     ` Johannes Weiner
2015-11-30 22:28       ` Jason Baron
2015-11-30 22:46         ` Johannes Weiner
2015-11-24 21:52 ` [PATCH 10/13] mm: memcontrol: do not account memory+swap on unified hierarchy Johannes Weiner
2015-11-25 16:29   ` David Miller
2015-11-24 21:52 ` [PATCH 11/13] mm: memcontrol: move socket code for unified hierarchy accounting Johannes Weiner
2015-11-25 16:29   ` David Miller
2015-11-24 21:58 ` [PATCH 12/13] mm: memcontrol: account socket memory in unified hierarchy memory controller Johannes Weiner
2015-11-25 16:30   ` David Miller
2015-11-30 10:54   ` Vladimir Davydov
2015-11-30 15:26     ` Johannes Weiner
2015-11-30 17:08       ` Vladimir Davydov
2015-11-24 21:59 ` [PATCH 13/13] mm: memcontrol: hook up vmpressure to socket pressure Johannes Weiner
2015-11-25 16:30   ` David Miller
2015-11-30 11:36   ` Vladimir Davydov
2015-11-30 15:58     ` Johannes Weiner
2015-11-30 16:13       ` Vladimir Davydov
