linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
@ 2016-09-14 19:48 Johannes Weiner
  2016-09-14 19:48 ` [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets Johannes Weiner
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Johannes Weiner @ 2016-09-14 19:48 UTC (permalink / raw)
  To: Andrew Morton, Tejun Heo, David S. Miller
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

From: Johannes Weiner <jweiner@fb.com>

During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.

The problem turns out to be the per-cpu charge cache. Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for
all memory that is being tracked. Those caches are safe against
scheduling preemption, but not against interrupts - such as the newly
added packet receive path. When cache draining is interrupted by
network RX taking pages out of the cache, the resuming drain operation
will put references of in-use pages, thus causing the imbalance.

Disable IRQs during all per-cpu charge cache operations.

Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a8d6624758a..60bb830abc34 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1710,17 +1710,22 @@ static DEFINE_MUTEX(percpu_charge_mutex);
 static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
+	unsigned long flags;
 	bool ret = false;
 
 	if (nr_pages > CHARGE_BATCH)
 		return ret;
 
-	stock = &get_cpu_var(memcg_stock);
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
 		stock->nr_pages -= nr_pages;
 		ret = true;
 	}
-	put_cpu_var(memcg_stock);
+
+	local_irq_restore(flags);
+
 	return ret;
 }
 
@@ -1741,15 +1746,18 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 	stock->cached = NULL;
 }
 
-/*
- * This must be called under preempt disabled or must be called by
- * a thread which is pinned to local cpu.
- */
 static void drain_local_stock(struct work_struct *dummy)
 {
-	struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);
+	struct memcg_stock_pcp *stock;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
+
+	local_irq_restore(flags);
 }
 
 /*
@@ -1758,14 +1766,19 @@ static void drain_local_stock(struct work_struct *dummy)
  */
 static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
-	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
+	struct memcg_stock_pcp *stock;
+	unsigned long flags;
+
+	local_irq_save(flags);
 
+	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached != memcg) { /* reset if necessary */
 		drain_stock(stock);
 		stock->cached = memcg;
 	}
 	stock->nr_pages += nr_pages;
-	put_cpu_var(memcg_stock);
+
+	local_irq_restore(flags);
 }
 
 /*
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets
  2016-09-14 19:48 [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Johannes Weiner
@ 2016-09-14 19:48 ` Johannes Weiner
  2016-09-19 12:03   ` Michal Hocko
  2016-09-19 15:43   ` Vladimir Davydov
  2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Johannes Weiner @ 2016-09-14 19:48 UTC (permalink / raw)
  To: Andrew Morton, Tejun Heo, David S. Miller
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

From: Johannes Weiner <jweiner@fb.com>

When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/cgroup.c | 6 ++++++
 net/core/sock.c | 5 ++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0c4db7908264..b0d727d26fc7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6297,6 +6297,12 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 	if (cgroup_sk_alloc_disabled)
 		return;
 
+	/* Socket clone path */
+	if (skcd->val) {
+		cgroup_get(sock_cgroup_ptr(skcd));
+		return;
+	}
+
 	rcu_read_lock();
 
 	while (true) {
diff --git a/net/core/sock.c b/net/core/sock.c
index 51a730485649..038e660ef844 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1340,7 +1340,6 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		if (!try_module_get(prot->owner))
 			goto out_free_sec;
 		sk_tx_queue_clear(sk);
-		cgroup_sk_alloc(&sk->sk_cgrp_data);
 	}
 
 	return sk;
@@ -1400,6 +1399,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_net_set(sk, net);
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
+		cgroup_sk_alloc(&sk->sk_cgrp_data);
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
 	}
@@ -1544,6 +1544,9 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_priority = 0;
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
+
+		cgroup_sk_alloc(&newsk->sk_cgrp_data);
+
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
 		 * (Documentation/RCU/rculist_nulls.txt for details)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-14 19:48 [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Johannes Weiner
  2016-09-14 19:48 ` [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets Johannes Weiner
@ 2016-09-14 19:48 ` Johannes Weiner
  2016-09-14 20:45   ` Tejun Heo
                     ` (4 more replies)
  2016-09-19 12:01 ` [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Michal Hocko
  2016-09-19 15:35 ` Vladimir Davydov
  3 siblings, 5 replies; 14+ messages in thread
From: Johannes Weiner @ 2016-09-14 19:48 UTC (permalink / raw)
  To: Andrew Morton, Tejun Heo, David S. Miller
  Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

The cgroup core and the memory controller need to track socket
ownership for different purposes, but the tracking sites being
entirely different is kind of ugly.

Be a better citizen and rename the memory controller callbacks to
match the cgroup core callbacks, then move them to the same place.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h |  4 ++--
 mm/memcontrol.c            | 19 +++++++++++--------
 net/core/sock.c            |  6 +++---
 net/ipv4/tcp.c             |  2 --
 net/ipv4/tcp_ipv4.c        |  3 ---
 5 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0710143723bc..ca11b3e6dd65 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -773,8 +773,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
-void sock_update_memcg(struct sock *sk);
-void sock_release_memcg(struct sock *sk);
+void mem_cgroup_sk_alloc(struct sock *sk);
+void mem_cgroup_sk_free(struct sock *sk);
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 #ifdef CONFIG_MEMCG
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 60bb830abc34..2caf1ee86e78 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2939,7 +2939,7 @@ static int memcg_update_tcp_limit(struct mem_cgroup *memcg, unsigned long limit)
 		/*
 		 * The active flag needs to be written after the static_key
 		 * update. This is what guarantees that the socket activation
-		 * function is the last one to run. See sock_update_memcg() for
+		 * function is the last one to run. See mem_cgroup_sk_alloc() for
 		 * details, and note that we don't mark any socket as belonging
 		 * to this memcg until that flag is up.
 		 *
@@ -2948,7 +2948,7 @@ static int memcg_update_tcp_limit(struct mem_cgroup *memcg, unsigned long limit)
 		 * as accounted, but the accounting functions are not patched in
 		 * yet, we'll lose accounting.
 		 *
-		 * We never race with the readers in sock_update_memcg(),
+		 * We never race with the readers in mem_cgroup_sk_alloc(),
 		 * because when this value change, the code to process it is not
 		 * patched in yet.
 		 */
@@ -5651,11 +5651,15 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
 EXPORT_SYMBOL(memcg_sockets_enabled_key);
 
-void sock_update_memcg(struct sock *sk)
+void mem_cgroup_sk_alloc(struct sock *sk)
 {
 	struct mem_cgroup *memcg;
 
-	/* Socket cloning can throw us here with sk_cgrp already
+	if (!mem_cgroup_sockets_enabled)
+		return;
+
+	/*
+	 * Socket cloning can throw us here with sk_memcg already
 	 * filled. It won't however, necessarily happen from
 	 * process context. So the test for root memcg given
 	 * the current task's memcg won't help us in this case.
@@ -5680,12 +5684,11 @@ void sock_update_memcg(struct sock *sk)
 out:
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(sock_update_memcg);
 
-void sock_release_memcg(struct sock *sk)
+void mem_cgroup_sk_free(struct sock *sk)
 {
-	WARN_ON(!sk->sk_memcg);
-	css_put(&sk->sk_memcg->css);
+	if (sk->sk_memcg)
+		css_put(&sk->sk_memcg->css);
 }
 
 /**
diff --git a/net/core/sock.c b/net/core/sock.c
index 038e660ef844..c73e28fc9c2a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1363,6 +1363,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 	slab = prot->slab;
 
 	cgroup_sk_free(&sk->sk_cgrp_data);
+	mem_cgroup_sk_free(sk);
 	security_sk_free(sk);
 	if (slab != NULL)
 		kmem_cache_free(slab, sk);
@@ -1399,6 +1400,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sock_net_set(sk, net);
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
+		mem_cgroup_sk_alloc(sk);
 		cgroup_sk_alloc(&sk->sk_cgrp_data);
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
@@ -1545,6 +1547,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
 
+		mem_cgroup_sk_alloc(newsk);
 		cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
 		/*
@@ -1569,9 +1572,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		sk_set_socket(newsk, NULL);
 		newsk->sk_wq = NULL;
 
-		if (mem_cgroup_sockets_enabled && sk->sk_memcg)
-			sock_update_memcg(newsk);
-
 		if (newsk->sk_prot->sockets_allocated)
 			sk_sockets_allocated_inc(newsk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a13fcb369f52..fc76ef51a5f4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -421,8 +421,6 @@ void tcp_init_sock(struct sock *sk)
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
 	local_bh_disable();
-	if (mem_cgroup_sockets_enabled)
-		sock_update_memcg(sk);
 	sk_sockets_allocated_inc(sk);
 	local_bh_enable();
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 04b989328558..b8fc74a66299 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1872,9 +1872,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	local_bh_disable();
 	sk_sockets_allocated_dec(sk);
 	local_bh_enable();
-
-	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
-		sock_release_memcg(sk);
 }
 EXPORT_SYMBOL(tcp_v4_destroy_sock);
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
@ 2016-09-14 20:45   ` Tejun Heo
  2016-09-14 21:12   ` kbuild test robot
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2016-09-14 20:45 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David S. Miller, Michal Hocko, Vladimir Davydov,
	linux-mm, cgroups, netdev, linux-kernel, kernel-team

On Wed, Sep 14, 2016 at 03:48:46PM -0400, Johannes Weiner wrote:
> The cgroup core and the memory controller need to track socket
> ownership for different purposes, but the tracking sites being
> entirely different is kind of ugly.
> 
> Be a better citizen and rename the memory controller callbacks to
> match the cgroup core callbacks, then move them to the same place.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

For 1-3,

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
  2016-09-14 20:45   ` Tejun Heo
@ 2016-09-14 21:12   ` kbuild test robot
  2016-09-14 21:12   ` kbuild test robot
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2016-09-14 21:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kbuild-all, Andrew Morton, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

Hi Johannes,

[auto build test ERROR on net/master]
[also build test ERROR on v4.8-rc6 next-20160914]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
config: parisc-c3000_defconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `sk_alloc':
>> (.text.sk_alloc+0x88): undefined reference to `mem_cgroup_sk_alloc'
   net/built-in.o: In function `__sk_destruct':
>> net/core/sock.o:(.text.__sk_destruct+0xbc): undefined reference to `mem_cgroup_sk_free'
   net/built-in.o: In function `sk_clone_lock':
>> (.text.sk_clone_lock+0x164): undefined reference to `mem_cgroup_sk_alloc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 14256 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
  2016-09-14 20:45   ` Tejun Heo
  2016-09-14 21:12   ` kbuild test robot
@ 2016-09-14 21:12   ` kbuild test robot
  2016-09-15  5:34   ` kbuild test robot
  2016-09-19 12:04   ` Michal Hocko
  4 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2016-09-14 21:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kbuild-all, Andrew Morton, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]

Hi Johannes,

[auto build test ERROR on net/master]
[also build test ERROR on v4.8-rc6 next-20160914]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
config: parisc-b180_defconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `sk_alloc':
   (.text.sk_alloc+0xb4): undefined reference to `mem_cgroup_sk_alloc'
   net/built-in.o: In function `__sk_destruct':
>> net/core/.tmp_sock.o:(.text.__sk_destruct+0xdc): undefined reference to `mem_cgroup_sk_free'
   net/built-in.o: In function `sk_clone_lock':
   (.text.sk_clone_lock+0x184): undefined reference to `mem_cgroup_sk_alloc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 12799 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-15  5:34   ` kbuild test robot
@ 2016-09-14 22:17     ` Andrew Morton
  2016-09-15 14:27       ` Johannes Weiner
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2016-09-14 22:17 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Johannes Weiner, kbuild-all, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

On Thu, 15 Sep 2016 13:34:24 +0800 kbuild test robot <lkp@intel.com> wrote:

> Hi Johannes,
> 
> [auto build test ERROR on net/master]
> [also build test ERROR on v4.8-rc6 next-20160914]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
> 
> url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
> config: m68k-sun3_defconfig (attached as .config)
> compiler: m68k-linux-gcc (GCC) 4.9.0
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=m68k 
> 
> All errors (new ones prefixed by >>):
> 
>    net/built-in.o: In function `sk_alloc':
> >> (.text+0x4076): undefined reference to `mem_cgroup_sk_alloc'
>    net/built-in.o: In function `__sk_destruct':
> >> sock.c:(.text+0x457e): undefined reference to `mem_cgroup_sk_free'
>    net/built-in.o: In function `sk_clone_lock':
>    (.text+0x4f1c): undefined reference to `mem_cgroup_sk_alloc'

This?

--- a/mm/memcontrol.c~mm-memcontrol-consolidate-cgroup-socket-tracking-fix
+++ a/mm/memcontrol.c
@@ -5655,9 +5655,6 @@ void mem_cgroup_sk_alloc(struct sock *sk
 {
 	struct mem_cgroup *memcg;
 
-	if (!mem_cgroup_sockets_enabled)
-		return;
-
 	/*
 	 * Socket cloning can throw us here with sk_memcg already
 	 * filled. It won't however, necessarily happen from
--- a/net/core/sock.c~mm-memcontrol-consolidate-cgroup-socket-tracking-fix
+++ a/net/core/sock.c
@@ -1385,7 +1385,8 @@ static void sk_prot_free(struct proto *p
 	slab = prot->slab;
 
 	cgroup_sk_free(&sk->sk_cgrp_data);
-	mem_cgroup_sk_free(sk);
+	if (mem_cgroup_sockets_enabled)
+		mem_cgroup_sk_free(sk);
 	security_sk_free(sk);
 	if (slab != NULL)
 		kmem_cache_free(slab, sk);
@@ -1422,7 +1423,8 @@ struct sock *sk_alloc(struct net *net, i
 		sock_net_set(sk, net);
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
-		mem_cgroup_sk_alloc(sk);
+		if (mem_cgroup_sockets_enabled)
+			mem_cgroup_sk_alloc(sk);
 		cgroup_sk_alloc(&sk->sk_cgrp_data);
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
@@ -1569,7 +1571,8 @@ struct sock *sk_clone_lock(const struct
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
 
-		mem_cgroup_sk_alloc(newsk);
+		if (mem_cgroup_sockets_enabled)
+			mem_cgroup_sk_alloc(newsk);
 		cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
 		/*
_

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
                     ` (2 preceding siblings ...)
  2016-09-14 21:12   ` kbuild test robot
@ 2016-09-15  5:34   ` kbuild test robot
  2016-09-14 22:17     ` Andrew Morton
  2016-09-19 12:04   ` Michal Hocko
  4 siblings, 1 reply; 14+ messages in thread
From: kbuild test robot @ 2016-09-15  5:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kbuild-all, Andrew Morton, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

[-- Attachment #1: Type: text/plain, Size: 1465 bytes --]

Hi Johannes,

[auto build test ERROR on net/master]
[also build test ERROR on v4.8-rc6 next-20160914]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `sk_alloc':
>> (.text+0x4076): undefined reference to `mem_cgroup_sk_alloc'
   net/built-in.o: In function `__sk_destruct':
>> sock.c:(.text+0x457e): undefined reference to `mem_cgroup_sk_free'
   net/built-in.o: In function `sk_clone_lock':
   (.text+0x4f1c): undefined reference to `mem_cgroup_sk_alloc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 11447 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-14 22:17     ` Andrew Morton
@ 2016-09-15 14:27       ` Johannes Weiner
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Weiner @ 2016-09-15 14:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kbuild test robot, kbuild-all, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team

On Wed, Sep 14, 2016 at 03:17:14PM -0700, Andrew Morton wrote:
> On Thu, 15 Sep 2016 13:34:24 +0800 kbuild test robot <lkp@intel.com> wrote:
> 
> > Hi Johannes,
> > 
> > [auto build test ERROR on net/master]
> > [also build test ERROR on v4.8-rc6 next-20160914]
> > [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> > [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
> > [Check https://git-scm.com/docs/git-format-patch for more information]
> > 
> > url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
> > config: m68k-sun3_defconfig (attached as .config)
> > compiler: m68k-linux-gcc (GCC) 4.9.0
> > reproduce:
> >         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # save the attached .config to linux build tree
> >         make.cross ARCH=m68k 
> > 
> > All errors (new ones prefixed by >>):
> > 
> >    net/built-in.o: In function `sk_alloc':
> > >> (.text+0x4076): undefined reference to `mem_cgroup_sk_alloc'
> >    net/built-in.o: In function `__sk_destruct':
> > >> sock.c:(.text+0x457e): undefined reference to `mem_cgroup_sk_free'
> >    net/built-in.o: In function `sk_clone_lock':
> >    (.text+0x4f1c): undefined reference to `mem_cgroup_sk_alloc'
> 
> This?

Thanks for fixing it up, Andrew.

I think it'd be nicer to declare the dummy functions for !CONFIG_MEMCG;
it also doesn't look like a hotpath that would necessitate the jump
label in that place. Dave, any preference either way?

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ca11b3e6dd65..61d20c17f3b7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -773,13 +773,13 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
-void mem_cgroup_sk_alloc(struct sock *sk);
-void mem_cgroup_sk_free(struct sock *sk);
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 #ifdef CONFIG_MEMCG
 extern struct static_key_false memcg_sockets_enabled_key;
 #define mem_cgroup_sockets_enabled static_branch_unlikely(&memcg_sockets_enabled_key)
+void mem_cgroup_sk_alloc(struct sock *sk);
+void mem_cgroup_sk_free(struct sock *sk);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
@@ -792,6 +792,8 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 }
 #else
 #define mem_cgroup_sockets_enabled 0
+static inline void mem_cgroup_sk_alloc(struct sock *sk) { };
+static inline void mem_cgroup_sk_free(struct sock *sk) { };
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
 	return false;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
  2016-09-14 19:48 [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Johannes Weiner
  2016-09-14 19:48 ` [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets Johannes Weiner
  2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
@ 2016-09-19 12:01 ` Michal Hocko
  2016-09-19 15:35 ` Vladimir Davydov
  3 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2016-09-19 12:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Tejun Heo, David S. Miller, linux-mm, cgroups,
	netdev, linux-kernel, kernel-team, Vladimir Davydov

[Fixup Vladimir's email]

On Wed 14-09-16 15:48:44, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@fb.com>
> 
> During cgroup2 rollout into production, we started encountering css
> refcount underflows and css access crashes in the memory controller.
> Splitting the heavily shared css reference counter into logical users
> narrowed the imbalance down to the cgroup2 socket memory accounting.
> 
> The problem turns out to be the per-cpu charge cache. Cgroup1 had a
> separate socket counter, but the new cgroup2 socket accounting goes
> through the common charge path that uses a shared per-cpu cache for
> all memory that is being tracked. Those caches are safe against
> scheduling preemption, but not against interrupts - such as the newly
> added packet receive path. When cache draining is interrupted by
> network RX taking pages out of the cache, the resuming drain operation
> will put references of in-use pages, thus causing the imbalance.
> 
> Disable IRQs during all per-cpu charge cache operations.
> 
> Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
> Cc: <stable@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 31 ++++++++++++++++++++++---------
>  1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7a8d6624758a..60bb830abc34 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1710,17 +1710,22 @@ static DEFINE_MUTEX(percpu_charge_mutex);
>  static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
>  	struct memcg_stock_pcp *stock;
> +	unsigned long flags;
>  	bool ret = false;
>  
>  	if (nr_pages > CHARGE_BATCH)
>  		return ret;
>  
> -	stock = &get_cpu_var(memcg_stock);
> +	local_irq_save(flags);
> +
> +	stock = this_cpu_ptr(&memcg_stock);
>  	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
>  		stock->nr_pages -= nr_pages;
>  		ret = true;
>  	}
> -	put_cpu_var(memcg_stock);
> +
> +	local_irq_restore(flags);
> +
>  	return ret;
>  }
>  
> @@ -1741,15 +1746,18 @@ static void drain_stock(struct memcg_stock_pcp *stock)
>  	stock->cached = NULL;
>  }
>  
> -/*
> - * This must be called under preempt disabled or must be called by
> - * a thread which is pinned to local cpu.
> - */
>  static void drain_local_stock(struct work_struct *dummy)
>  {
> -	struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);
> +	struct memcg_stock_pcp *stock;
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +
> +	stock = this_cpu_ptr(&memcg_stock);
>  	drain_stock(stock);
>  	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
> +
> +	local_irq_restore(flags);
>  }
>  
>  /*
> @@ -1758,14 +1766,19 @@ static void drain_local_stock(struct work_struct *dummy)
>   */
>  static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> -	struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock);
> +	struct memcg_stock_pcp *stock;
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
>  
> +	stock = this_cpu_ptr(&memcg_stock);
>  	if (stock->cached != memcg) { /* reset if necessary */
>  		drain_stock(stock);
>  		stock->cached = memcg;
>  	}
>  	stock->nr_pages += nr_pages;
> -	put_cpu_var(memcg_stock);
> +
> +	local_irq_restore(flags);
>  }
>  
>  /*
> -- 
> 2.9.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets
  2016-09-14 19:48 ` [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets Johannes Weiner
@ 2016-09-19 12:03   ` Michal Hocko
  2016-09-19 15:43   ` Vladimir Davydov
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2016-09-19 12:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Tejun Heo, David S. Miller, linux-mm, cgroups,
	netdev, linux-kernel, kernel-team, Vladimir Davydov

[Fixup Vladimir's email]

I am not familiar with this code path to give my ack, unfortunatelly.

On Wed 14-09-16 15:48:45, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@fb.com>
> 
> When a socket is cloned, the associated sock_cgroup_data is duplicated
> but not its reference on the cgroup. As a result, the cgroup reference
> count will underflow when both sockets are destroyed later on.
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Cc: <stable@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  kernel/cgroup.c | 6 ++++++
>  net/core/sock.c | 5 ++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 0c4db7908264..b0d727d26fc7 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -6297,6 +6297,12 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
>  	if (cgroup_sk_alloc_disabled)
>  		return;
>  
> +	/* Socket clone path */
> +	if (skcd->val) {
> +		cgroup_get(sock_cgroup_ptr(skcd));
> +		return;
> +	}
> +
>  	rcu_read_lock();
>  
>  	while (true) {
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 51a730485649..038e660ef844 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1340,7 +1340,6 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
>  		if (!try_module_get(prot->owner))
>  			goto out_free_sec;
>  		sk_tx_queue_clear(sk);
> -		cgroup_sk_alloc(&sk->sk_cgrp_data);
>  	}
>  
>  	return sk;
> @@ -1400,6 +1399,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
>  		sock_net_set(sk, net);
>  		atomic_set(&sk->sk_wmem_alloc, 1);
>  
> +		cgroup_sk_alloc(&sk->sk_cgrp_data);
>  		sock_update_classid(&sk->sk_cgrp_data);
>  		sock_update_netprioidx(&sk->sk_cgrp_data);
>  	}
> @@ -1544,6 +1544,9 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
>  		newsk->sk_priority = 0;
>  		newsk->sk_incoming_cpu = raw_smp_processor_id();
>  		atomic64_set(&newsk->sk_cookie, 0);
> +
> +		cgroup_sk_alloc(&newsk->sk_cgrp_data);
> +
>  		/*
>  		 * Before updating sk_refcnt, we must commit prior changes to memory
>  		 * (Documentation/RCU/rculist_nulls.txt for details)
> -- 
> 2.9.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
  2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
                     ` (3 preceding siblings ...)
  2016-09-15  5:34   ` kbuild test robot
@ 2016-09-19 12:04   ` Michal Hocko
  4 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2016-09-19 12:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Tejun Heo, David S. Miller, linux-mm, cgroups,
	netdev, linux-kernel, kernel-team, Vladimir Davydov

[Fixup Vladimir's email]

same here I do not feel familiar with the code enough to give my ack but
Vladimir might be in a better position

On Wed 14-09-16 15:48:46, Johannes Weiner wrote:
> The cgroup core and the memory controller need to track socket
> ownership for different purposes, but the tracking sites being
> entirely different is kind of ugly.
> 
> Be a better citizen and rename the memory controller callbacks to
> match the cgroup core callbacks, then move them to the same place.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/memcontrol.h |  4 ++--
>  mm/memcontrol.c            | 19 +++++++++++--------
>  net/core/sock.c            |  6 +++---
>  net/ipv4/tcp.c             |  2 --
>  net/ipv4/tcp_ipv4.c        |  3 ---
>  5 files changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0710143723bc..ca11b3e6dd65 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -773,8 +773,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
>  #endif	/* CONFIG_CGROUP_WRITEBACK */
>  
>  struct sock;
> -void sock_update_memcg(struct sock *sk);
> -void sock_release_memcg(struct sock *sk);
> +void mem_cgroup_sk_alloc(struct sock *sk);
> +void mem_cgroup_sk_free(struct sock *sk);
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
>  void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
>  #ifdef CONFIG_MEMCG
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 60bb830abc34..2caf1ee86e78 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2939,7 +2939,7 @@ static int memcg_update_tcp_limit(struct mem_cgroup *memcg, unsigned long limit)
>  		/*
>  		 * The active flag needs to be written after the static_key
>  		 * update. This is what guarantees that the socket activation
> -		 * function is the last one to run. See sock_update_memcg() for
> +		 * function is the last one to run. See mem_cgroup_sk_alloc() for
>  		 * details, and note that we don't mark any socket as belonging
>  		 * to this memcg until that flag is up.
>  		 *
> @@ -2948,7 +2948,7 @@ static int memcg_update_tcp_limit(struct mem_cgroup *memcg, unsigned long limit)
>  		 * as accounted, but the accounting functions are not patched in
>  		 * yet, we'll lose accounting.
>  		 *
> -		 * We never race with the readers in sock_update_memcg(),
> +		 * We never race with the readers in mem_cgroup_sk_alloc(),
>  		 * because when this value change, the code to process it is not
>  		 * patched in yet.
>  		 */
> @@ -5651,11 +5651,15 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
>  DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
>  EXPORT_SYMBOL(memcg_sockets_enabled_key);
>  
> -void sock_update_memcg(struct sock *sk)
> +void mem_cgroup_sk_alloc(struct sock *sk)
>  {
>  	struct mem_cgroup *memcg;
>  
> -	/* Socket cloning can throw us here with sk_cgrp already
> +	if (!mem_cgroup_sockets_enabled)
> +		return;
> +
> +	/*
> +	 * Socket cloning can throw us here with sk_memcg already
>  	 * filled. It won't however, necessarily happen from
>  	 * process context. So the test for root memcg given
>  	 * the current task's memcg won't help us in this case.
> @@ -5680,12 +5684,11 @@ void sock_update_memcg(struct sock *sk)
>  out:
>  	rcu_read_unlock();
>  }
> -EXPORT_SYMBOL(sock_update_memcg);
>  
> -void sock_release_memcg(struct sock *sk)
> +void mem_cgroup_sk_free(struct sock *sk)
>  {
> -	WARN_ON(!sk->sk_memcg);
> -	css_put(&sk->sk_memcg->css);
> +	if (sk->sk_memcg)
> +		css_put(&sk->sk_memcg->css);
>  }
>  
>  /**
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 038e660ef844..c73e28fc9c2a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1363,6 +1363,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
>  	slab = prot->slab;
>  
>  	cgroup_sk_free(&sk->sk_cgrp_data);
> +	mem_cgroup_sk_free(sk);
>  	security_sk_free(sk);
>  	if (slab != NULL)
>  		kmem_cache_free(slab, sk);
> @@ -1399,6 +1400,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
>  		sock_net_set(sk, net);
>  		atomic_set(&sk->sk_wmem_alloc, 1);
>  
> +		mem_cgroup_sk_alloc(sk);
>  		cgroup_sk_alloc(&sk->sk_cgrp_data);
>  		sock_update_classid(&sk->sk_cgrp_data);
>  		sock_update_netprioidx(&sk->sk_cgrp_data);
> @@ -1545,6 +1547,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
>  		newsk->sk_incoming_cpu = raw_smp_processor_id();
>  		atomic64_set(&newsk->sk_cookie, 0);
>  
> +		mem_cgroup_sk_alloc(newsk);
>  		cgroup_sk_alloc(&newsk->sk_cgrp_data);
>  
>  		/*
> @@ -1569,9 +1572,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
>  		sk_set_socket(newsk, NULL);
>  		newsk->sk_wq = NULL;
>  
> -		if (mem_cgroup_sockets_enabled && sk->sk_memcg)
> -			sock_update_memcg(newsk);
> -
>  		if (newsk->sk_prot->sockets_allocated)
>  			sk_sockets_allocated_inc(newsk);
>  
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index a13fcb369f52..fc76ef51a5f4 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -421,8 +421,6 @@ void tcp_init_sock(struct sock *sk)
>  	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
>  
>  	local_bh_disable();
> -	if (mem_cgroup_sockets_enabled)
> -		sock_update_memcg(sk);
>  	sk_sockets_allocated_inc(sk);
>  	local_bh_enable();
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 04b989328558..b8fc74a66299 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1872,9 +1872,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
>  	local_bh_disable();
>  	sk_sockets_allocated_dec(sk);
>  	local_bh_enable();
> -
> -	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
> -		sock_release_memcg(sk);
>  }
>  EXPORT_SYMBOL(tcp_v4_destroy_sock);
>  
> -- 
> 2.9.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
  2016-09-14 19:48 [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Johannes Weiner
                   ` (2 preceding siblings ...)
  2016-09-19 12:01 ` [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Michal Hocko
@ 2016-09-19 15:35 ` Vladimir Davydov
  3 siblings, 0 replies; 14+ messages in thread
From: Vladimir Davydov @ 2016-09-19 15:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Tejun Heo, David S. Miller, Michal Hocko,
	linux-mm, cgroups, netdev, linux-kernel, kernel-team

On Wed, Sep 14, 2016 at 03:48:44PM -0400, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@fb.com>
> 
> During cgroup2 rollout into production, we started encountering css
> refcount underflows and css access crashes in the memory controller.
> Splitting the heavily shared css reference counter into logical users
> narrowed the imbalance down to the cgroup2 socket memory accounting.
> 
> The problem turns out to be the per-cpu charge cache. Cgroup1 had a
> separate socket counter, but the new cgroup2 socket accounting goes
> through the common charge path that uses a shared per-cpu cache for
> all memory that is being tracked. Those caches are safe against
> scheduling preemption, but not against interrupts - such as the newly
> added packet receive path. When cache draining is interrupted by
> network RX taking pages out of the cache, the resuming drain operation
> will put references of in-use pages, thus causing the imbalance.
> 
> Disable IRQs during all per-cpu charge cache operations.
> 
> Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
> Cc: <stable@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets
  2016-09-14 19:48 ` [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets Johannes Weiner
  2016-09-19 12:03   ` Michal Hocko
@ 2016-09-19 15:43   ` Vladimir Davydov
  1 sibling, 0 replies; 14+ messages in thread
From: Vladimir Davydov @ 2016-09-19 15:43 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Tejun Heo, David S. Miller, Michal Hocko,
	linux-mm, cgroups, netdev, linux-kernel, kernel-team

On Wed, Sep 14, 2016 at 03:48:45PM -0400, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@fb.com>
> 
> When a socket is cloned, the associated sock_cgroup_data is duplicated
> but not its reference on the cgroup. As a result, the cgroup reference
> count will underflow when both sockets are destroyed later on.
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Cc: <stable@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-09-19 15:43 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14 19:48 [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Johannes Weiner
2016-09-14 19:48 ` [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets Johannes Weiner
2016-09-19 12:03   ` Michal Hocko
2016-09-19 15:43   ` Vladimir Davydov
2016-09-14 19:48 ` [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking Johannes Weiner
2016-09-14 20:45   ` Tejun Heo
2016-09-14 21:12   ` kbuild test robot
2016-09-14 21:12   ` kbuild test robot
2016-09-15  5:34   ` kbuild test robot
2016-09-14 22:17     ` Andrew Morton
2016-09-15 14:27       ` Johannes Weiner
2016-09-19 12:04   ` Michal Hocko
2016-09-19 12:01 ` [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting Michal Hocko
2016-09-19 15:35 ` Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).