Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, borntraeger@de.ibm.com, guro@fb.com,
	hannes@cmpxchg.org, linux-mm@kvack.org, mhocko@suse.com,
	mm-commits@vger.kernel.org, shakeelb@google.com,
	stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 02/86] mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
Date: Wed, 04 Dec 2019 16:49:46 -0800
Message-ID: <20191205004946.YQxscs-3o%akpm@linux-foundation.org> (raw)
In-Reply-To: <20191204164858.fe4ed8886e34ad9f3b34ea00@linux-foundation.org>

From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction

Christian reported a warning like the following obtained during running
some KVM-related tests on s390:

WARNING: CPU: 8 PID: 208 at lib/percpu-refcount.c:108 percpu_ref_exit+0x50/0x58
Modules linked in: kvm(-) xt_CHECKSUM xt_MASQUERADE bonding xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_na>
CPU: 8 PID: 208 Comm: kworker/8:1 Not tainted 5.2.0+ #66
Hardware name: IBM 2964 NC9 712 (LPAR)
Workqueue: events sysfs_slab_remove_workfn
Krnl PSW : 0704e00180000000 0000001529746850 (percpu_ref_exit+0x50/0x58)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 00000000ffff8808 0000001529746740 000003f4e30e8e18 0036008100000000
           0000001f00000000 0035008100000000 0000001fb3573ab8 0000000000000000
           0000001fbdb6de00 0000000000000000 0000001529f01328 0000001fb3573b00
           0000001fbb27e000 0000001fbdb69300 000003e009263d00 000003e009263cd0
Krnl Code: 0000001529746842: f0a0000407fe        srp        4(11,%r0),2046,0
           0000001529746848: 47000700            bc         0,1792
          #000000152974684c: a7f40001            brc        15,152974684e
          >0000001529746850: a7f4fff2            brc        15,1529746834
           0000001529746854: 0707                bcr        0,%r7
           0000001529746856: 0707                bcr        0,%r7
           0000001529746858: eb8ff0580024        stmg       %r8,%r15,88(%r15)
           000000152974685e: a738ffff            lhi        %r3,-1
Call Trace:
([<000003e009263d00>] 0x3e009263d00)
 [<00000015293252ea>] slab_kmem_cache_release+0x3a/0x70
 [<0000001529b04882>] kobject_put+0xaa/0xe8
 [<000000152918cf28>] process_one_work+0x1e8/0x428
 [<000000152918d1b0>] worker_thread+0x48/0x460
 [<00000015291942c6>] kthread+0x126/0x160
 [<0000001529b22344>] ret_from_fork+0x28/0x30
 [<0000001529b2234c>] kernel_thread_starter+0x0/0x10
Last Breaking-Event-Address:
 [<000000152974684c>] percpu_ref_exit+0x4c/0x58
---[ end trace b035e7da5788eb09 ]---

The problem occurs because kmem_cache_destroy() is called immediately
after deleting of a memcg, so it races with the memcg kmem_cache
deactivation.

flush_memcg_workqueue() at the beginning of kmem_cache_destroy() is
supposed to guarantee that all deactivation processes are finished, but
failed to do so.  It waits for an rcu grace period, after which all
children kmem_caches should be deactivated.  During the deactivation
percpu_ref_kill() is called for non root kmem_cache refcounters, but it
requires yet another rcu grace period to finish the transition to the
atomic (dead) state.

So in a rare case when not all children kmem_caches are destroyed at the
moment when the root kmem_cache is about to be gone, we need to wait
another rcu grace period before destroying the root kmem_cache.

This issue can be triggered only with dynamically created kmem_caches
which are used with memcg accounting.  In this case per-memcg child
kmem_caches are created.  They are deactivated from the cgroup removing
path.  If the destruction of the root kmem_cache is racing with the
removal of the cgroup (both are quite complicated multi-stage processes),
the described issue can occur.  The only known way to trigger it in the
real life, is to unload some kernel module which creates a dedicated
kmem_cache, used from different memory cgroups with GFP_ACCOUNT flag.  If
the unloading happens immediately after calling rmdir on the corresponding
cgroup, there is some chance to trigger the issue.

Link: http://lkml.kernel.org/r/20191129025011.3076017-1-guro@fb.com
Fixes: f0a3a24b532d ("mm: memcg/slab: rework non-root kmem_cache lifecycle management")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slab_common.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/mm/slab_common.c~mm-memcg-slab-wait-for-root-kmem_cache-refcnt-killing-on-root-kmem_cache-destruction
+++ a/mm/slab_common.c
@@ -904,6 +904,18 @@ static void flush_memcg_workqueue(struct
 	 * previous workitems on workqueue are processed.
 	 */
 	flush_workqueue(memcg_kmem_cache_wq);
+
+	/*
+	 * If we're racing with children kmem_cache deactivation, it might
+	 * take another rcu grace period to complete their destruction.
+	 * At this moment the corresponding percpu_ref_kill() call should be
+	 * done, but it might take another rcu grace period to complete
+	 * switching to the atomic mode.
+	 * Please, note that we check without grabbing the slab_mutex. It's safe
+	 * because at this moment the children list can't grow.
+	 */
+	if (!list_empty(&s->memcg_params.children))
+		rcu_barrier();
 }
 #else
 static inline int shutdown_memcg_caches(struct kmem_cache *s)
_


  parent reply index

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05  0:48 incoming Andrew Morton
2019-12-05  0:49 ` [patch 01/86] mm/kasan/common.c: fix compile error Andrew Morton
2019-12-05  0:49 ` Andrew Morton [this message]
2019-12-05  0:49 ` [patch 03/86] mm/vmstat: add helpers to get vmstat item names for each enum type Andrew Morton
2019-12-05  0:49 ` [patch 04/86] mm/memcontrol: use vmstat names for printing statistics Andrew Morton
2019-12-05  0:49 ` [patch 05/86] mm/memory.c: replace is_zero_pfn with is_huge_zero_pmd for thp Andrew Morton
2019-12-05  0:49 ` [patch 06/86] proc: change ->nlink under proc_subdir_lock Andrew Morton
2019-12-05  0:50 ` [patch 07/86] fs/proc/generic.c: delete useless "len" variable Andrew Morton
2019-12-05  0:50 ` [patch 08/86] fs/proc/internal.h: shuffle "struct pde_opener" Andrew Morton
2019-12-05  0:50 ` [patch 09/86] include/linux/proc_fs.h: fix confusing macro arg name Andrew Morton
2019-12-05  0:50 ` [patch 10/86] fs/proc/Kconfig: fix indentation Andrew Morton
2019-12-05  0:50 ` [patch 11/86] include/linux/sysctl.h: inline braces for ctl_table and ctl_table_header Andrew Morton
2019-12-05  0:50 ` [patch 12/86] .gitattributes: use 'dts' diff driver for dts files Andrew Morton
2019-12-05  1:00   ` Frank Rowand
2019-12-05  0:50 ` [patch 13/86] linux/build_bug.h: change type to int Andrew Morton
2019-12-05  0:50 ` [patch 14/86] linux/scc.h: make uapi linux/scc.h self-contained Andrew Morton
2019-12-05  0:50 ` [patch 15/86] arch/Kconfig: fix indentation Andrew Morton
2019-12-05  0:50 ` [patch 16/86] scripts/get_maintainer.pl: add signatures from Fixes: <badcommit> lines in commit message Andrew Morton
2019-12-05  0:50 ` [patch 17/86] kernel.h: update comment about simple_strto<foo>() functions Andrew Morton
2019-12-05  0:50 ` [patch 18/86] auxdisplay: charlcd: deduplicate simple_strtoul() Andrew Morton
2019-12-05  0:50 ` [patch 19/86] kernel/notifier.c: intercept duplicate registrations to avoid infinite loops Andrew Morton
2019-12-05  0:50 ` [patch 20/86] kernel/notifier.c: remove notifier_chain_cond_register() Andrew Morton
2019-12-05  0:50 ` [patch 21/86] kernel/notifier.c: remove blocking_notifier_chain_cond_register() Andrew Morton
2019-12-05  0:50 ` [patch 22/86] kernel/profile.c: use cpumask_available to check for NULL cpumask Andrew Morton
2019-12-05  0:50 ` [patch 23/86] kernel/sys.c: avoid copying possible padding bytes in copy_to_user Andrew Morton
2019-12-05  0:50 ` [patch 24/86] bitops: introduce the for_each_set_clump8 macro Andrew Morton
2019-12-05  0:51 ` [patch 25/86] lib/test_bitmap.c: add for_each_set_clump8 test cases Andrew Morton
2019-12-05  0:51 ` [patch 26/86] gpio: 104-dio-48e: utilize for_each_set_clump8 macro Andrew Morton
2019-12-05  0:51 ` [patch 27/86] gpio: 104-idi-48: " Andrew Morton
2019-12-05  0:51 ` [patch 28/86] gpio: gpio-mm: " Andrew Morton
2019-12-05  0:51 ` [patch 29/86] gpio: ws16c48: " Andrew Morton
2019-12-05  0:51 ` [patch 30/86] gpio: pci-idio-16: " Andrew Morton
2019-12-05  0:51 ` [patch 31/86] gpio: pcie-idio-24: " Andrew Morton
2019-12-05  0:51 ` [patch 32/86] gpio: uniphier: " Andrew Morton
2019-12-05  0:51 ` [patch 33/86] gpio: 74x164: utilize the " Andrew Morton
2019-12-05  0:51 ` [patch 34/86] thermal: intel: intel_soc_dts_iosf: Utilize " Andrew Morton
2019-12-05  0:51 ` [patch 35/86] gpio: pisosr: utilize the " Andrew Morton
2019-12-05  0:51 ` [patch 36/86] gpio: max3191x: " Andrew Morton
2019-12-05  0:51 ` [patch 37/86] gpio: pca953x: " Andrew Morton
2019-12-05  0:51 ` [patch 38/86] lib/rbtree: set successor's parent unconditionally Andrew Morton
2019-12-05  0:51 ` [patch 39/86] lib/rbtree: get successor's color directly Andrew Morton
2019-12-05  0:51 ` [patch 40/86] lib/test_meminit.c: add bulk alloc/free tests Andrew Morton
2019-12-05  0:51 ` [patch 41/86] lib/math/rational.c: fix possible incorrect result from rational fractions helper Andrew Morton
2019-12-05  0:52 ` [patch 42/86] lib/genalloc.c: export symbol addr_in_gen_pool Andrew Morton
2019-12-05  0:52 ` [patch 43/86] lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr Andrew Morton
2019-12-05  0:52 ` [patch 44/86] checkpatch: improve ignoring CamelCase SI style variants like mA Andrew Morton
2019-12-05  0:52 ` [patch 45/86] checkpatch: reduce is_maintained_obsolete lookup runtime Andrew Morton
2019-12-05  0:52 ` [patch 46/86] epoll: simplify ep_poll_safewake() for CONFIG_DEBUG_LOCK_ALLOC Andrew Morton
2019-12-05  0:52 ` [patch 47/86] fs/epoll: remove unnecessary wakeups of nested epoll Andrew Morton
2019-12-05  0:52 ` [patch 48/86] selftests: add epoll selftests Andrew Morton
2019-12-05  0:52 ` [patch 49/86] fs/binfmt_elf.c: delete unused "interp_map_addr" argument Andrew Morton
2019-12-05  0:52 ` [patch 50/86] fs/binfmt_elf.c: extract elf_read() function Andrew Morton
2019-12-05  0:52 ` [patch 51/86] init/Kconfig: fix indentation Andrew Morton
2019-12-05  0:52 ` [patch 52/86] drivers/rapidio/rio-driver.c: fix missing include of <linux/rio_drv.h> Andrew Morton
2019-12-05  0:52 ` [patch 53/86] drivers/rapidio/rio-access.c: " Andrew Morton
2019-12-05  0:52 ` [patch 54/86] drm: limit to INT_MAX in create_blob ioctl Andrew Morton
2019-12-05  0:52 ` [patch 55/86] uaccess: disallow > INT_MAX copy sizes Andrew Morton
2019-12-05  0:52 ` [patch 56/86] kcov: remote coverage support Andrew Morton
2019-12-05  0:52 ` [patch 57/86] usb, kcov: collect coverage from hub_event Andrew Morton
2019-12-05  0:52 ` [patch 58/86] vhost, kcov: collect coverage from vhost_worker Andrew Morton
2019-12-05  0:52 ` [patch 59/86] lib/ubsan: don't serialize UBSAN report Andrew Morton
2019-12-05  0:52 ` [patch 60/86] arch: ipcbuf.h: make uapi asm/ipcbuf.h self-contained Andrew Morton
2019-12-05  0:53 ` [patch 61/86] arch: msgbuf.h: make uapi asm/msgbuf.h self-contained Andrew Morton
2019-12-05  0:53 ` [patch 62/86] arch: sembuf.h: make uapi asm/sembuf.h self-contained Andrew Morton
2019-12-05  0:53 ` [patch 63/86] lib/test_bitmap: force argument of bitmap_parselist_user() to proper address space Andrew Morton
2019-12-05  0:53 ` [patch 64/86] lib/test_bitmap: undefine macros after use Andrew Morton
2019-12-05  0:53 ` [patch 65/86] lib/test_bitmap: name EXP_BYTES properly Andrew Morton
2019-12-05  0:53 ` [patch 66/86] lib/test_bitmap: rename exp to exp1 to avoid ambiguous name Andrew Morton
2019-12-05  0:53 ` [patch 67/86] lib/test_bitmap: move exp1 and exp2 upper for others to use Andrew Morton
2019-12-05  0:53 ` [patch 68/86] lib/test_bitmap: fix comment about this file Andrew Morton
2019-12-05  0:53 ` [patch 69/86] lib/bitmap: introduce bitmap_replace() helper Andrew Morton
2019-12-05  0:53 ` [patch 70/86] gpio: pca953x: remove redundant variable and check in IRQ handler Andrew Morton
2019-12-05  0:53 ` [patch 71/86] gpio: pca953x: use input from regs structure in pca953x_irq_pending() Andrew Morton
2019-12-05  0:53 ` [patch 72/86] gpio: pca953x: convert to use bitmap API Andrew Morton
2019-12-05  0:53 ` [patch 73/86] gpio: pca953x: tighten up indentation Andrew Morton
2019-12-05  0:53 ` [patch 74/86] alpha: use pgtable-nopud instead of 4level-fixup Andrew Morton
2019-12-05  0:53 ` [patch 75/86] arm: nommu: " Andrew Morton
2019-12-05  0:53 ` [patch 76/86] c6x: " Andrew Morton
2019-12-05  0:53 ` [patch 77/86] m68k: nommu: " Andrew Morton
2019-12-05  0:53 ` [patch 78/86] m68k: mm: use pgtable-nopXd " Andrew Morton
2019-12-05  0:54 ` [patch 79/86] microblaze: use pgtable-nopmd " Andrew Morton
2019-12-05  0:54 ` [patch 80/86] nds32: " Andrew Morton
2019-12-05  0:54 ` [patch 81/86] parisc: use pgtable-nopXd " Andrew Morton
2019-12-05  0:54 ` [patch 82/86] parisc/hugetlb: " Andrew Morton
2019-12-05  0:54 ` [patch 83/86] sparc32: use pgtable-nopud " Andrew Morton
2019-12-05  0:54 ` [patch 84/86] um: remove unused pxx_offset_proc() and addr_pte() functions Andrew Morton
2019-12-05  0:54 ` [patch 85/86] um: add support for folded p4d page tables Andrew Morton
2019-12-05  0:54 ` [patch 86/86] mm: remove __ARCH_HAS_4LEVEL_HACK and include/asm-generic/4level-fixup.h Andrew Morton

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191205004946.YQxscs-3o%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mm-commits@vger.kernel.org \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git