mm-commits.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, guro@fb.com, hannes@cmpxchg.org,
	linux-mm@kvack.org, ltp@lists.linux.it, mhocko@kernel.org,
	mkoutny@suse.com, mm-commits@vger.kernel.org,
	rpalethorpe@suse.com, shakeelb@google.com,
	stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 04/15] mm: memcg: link page counters to root if use_hierarchy is false
Date: Sun, 01 Nov 2020 17:07:34 -0800	[thread overview]
Message-ID: <20201102010734.L07vYi-Dp%akpm@linux-foundation.org> (raw)
In-Reply-To: <20201101170656.48abbd5e88375219f868af5e@linux-foundation.org>

From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg: link page counters to root if use_hierarchy is false

Richard reported a warning which can be reproduced by running the LTP
madvise6 test (cgroup v1 in the non-hierarchical mode should be used):

[    9.841552] ------------[ cut here ]------------
[    9.841788] WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
[    9.841982] Modules linked in:
[    9.842072] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77
[    9.842266] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
[    9.842571] Workqueue: events drain_local_stock
[    9.842750] RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
[ 9.842894] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 e8 4b f9 88 2a 48 8b 17 48 39 d6 72 41 41 54 49 89
[    9.843438] RSP: 0018:ffffb1c18006be28 EFLAGS: 00010086
[    9.843585] RAX: ffffffffffffffff RBX: ffffffffffffffff RCX: ffff94803bc2cae0
[    9.843806] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff948007d2b248
[    9.844026] RBP: ffff948007d2b248 R08: ffff948007c58eb0 R09: ffff948007da05ac
[    9.844248] R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000001
[    9.844477] R13: ffffffffffffffff R14: 0000000000000000 R15: ffff94803bc2cac0
[    9.844696] FS:  0000000000000000(0000) GS:ffff94803bc00000(0000) knlGS:0000000000000000
[    9.844915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.845096] CR2: 00007f0579ee0384 CR3: 000000002cc0a000 CR4: 00000000000006f0
[    9.845319] Call Trace:
[    9.845429] __memcg_kmem_uncharge (mm/memcontrol.c:3022)
[    9.845582] drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114)
[    9.845684] drain_local_stock (mm/memcontrol.c:2255)
[    9.845789] process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
[    9.845898] worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
[    9.846034] ? process_one_work (kernel/workqueue.c:2358)
[    9.846162] kthread (kernel/kthread.c:292)
[    9.846271] ? __kthread_bind_mask (kernel/kthread.c:245)
[    9.846420] ret_from_fork (arch/x86/entry/entry_64.S:300)
[    9.846531] ---[ end trace 8b5647c1eba9d18a ]---

The problem occurs because in the non-hierarchical mode non-root page
counters are not linked to root page counters, so the charge is not
propagated to the root memory cgroup.

After the removal of the original memory cgroup and reparenting of the
object cgroup, the root cgroup might be uncharged by draining a objcg
stock, for example.  It leads to an eventual underflow of the charge and
triggers a warning.

Fix it by linking all page counters to corresponding root page counters in
the non-hierarchical mode.

Please note, that in the non-hierarchical mode all objcgs are always
reparented to the root memory cgroup, even if the hierarchy has more than
1 level.  This patch doesn't change it.

The patch also doesn't affect how the hierarchical mode is working, which
is the only sane and truly supported mode now.

Thanks to Richard for reporting, debugging and providing an alternative
version of the fix!

Link: https://lkml.kernel.org/r/20201026231326.3212225-1-guro@fb.com
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Debugged-by: Richard Palethorpe <rpalethorpe@suse.com>
Reported-by: <ltp@lists.linux.it>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/mm/memcontrol.c~mm-memcg-link-page-counters-to-root-if-use_hierarchy-is-false
+++ a/mm/memcontrol.c
@@ -5345,17 +5345,22 @@ mem_cgroup_css_alloc(struct cgroup_subsy
 		memcg->swappiness = mem_cgroup_swappiness(parent);
 		memcg->oom_kill_disable = parent->oom_kill_disable;
 	}
-	if (parent && parent->use_hierarchy) {
+	if (!parent) {
+		page_counter_init(&memcg->memory, NULL);
+		page_counter_init(&memcg->swap, NULL);
+		page_counter_init(&memcg->kmem, NULL);
+		page_counter_init(&memcg->tcpmem, NULL);
+	} else if (parent->use_hierarchy) {
 		memcg->use_hierarchy = true;
 		page_counter_init(&memcg->memory, &parent->memory);
 		page_counter_init(&memcg->swap, &parent->swap);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 	} else {
-		page_counter_init(&memcg->memory, NULL);
-		page_counter_init(&memcg->swap, NULL);
-		page_counter_init(&memcg->kmem, NULL);
-		page_counter_init(&memcg->tcpmem, NULL);
+		page_counter_init(&memcg->memory, &root_mem_cgroup->memory);
+		page_counter_init(&memcg->swap, &root_mem_cgroup->swap);
+		page_counter_init(&memcg->kmem, &root_mem_cgroup->kmem);
+		page_counter_init(&memcg->tcpmem, &root_mem_cgroup->tcpmem);
 		/*
 		 * Deeper hierachy with use_hierarchy == false doesn't make
 		 * much sense so let cgroup subsystem know about this
_

  parent reply	other threads:[~2020-11-02  1:07 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02  1:06 incoming Andrew Morton
2020-11-02  1:07 ` [patch 01/15] mm/mremap_pages: fix static key devmap_managed_key updates Andrew Morton
2020-11-02  1:07 ` [patch 02/15] hugetlb_cgroup: fix reservation accounting Andrew Morton
2020-11-02  1:07 ` [patch 03/15] mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg Andrew Morton
2020-11-02  1:07 ` Andrew Morton [this message]
2020-11-02  1:07 ` [patch 05/15] kasan: adopt KUNIT tests to SW_TAGS mode Andrew Morton
2020-11-02  1:07 ` [patch 06/15] mm: mempolicy: fix potential pte_unmap_unlock pte error Andrew Morton
2020-11-02  1:07 ` [patch 07/15] ptrace: fix task_join_group_stop() for the case when current is traced Andrew Morton
2020-11-02  1:07 ` [patch 08/15] lib/crc32test: remove extra local_irq_disable/enable Andrew Morton
2020-11-02  1:07 ` [patch 09/15] mm/truncate.c: make __invalidate_mapping_pages() static Andrew Morton
2020-11-02  1:07 ` [patch 10/15] kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled Andrew Morton
2020-11-02  1:07 ` [patch 11/15] mm, oom: keep oom_adj under or at upper limit when printing Andrew Morton
2020-11-02  1:08 ` [patch 12/15] mm: always have io_remap_pfn_range() set pgprot_decrypted() Andrew Morton
2020-11-02  1:08 ` [patch 13/15] epoll: check ep_events_available() upon timeout Andrew Morton
2020-11-02 17:08   ` Linus Torvalds
2020-11-02 17:48     ` Soheil Hassas Yeganeh
2020-11-02 18:51       ` Linus Torvalds
2020-11-02 19:38         ` Linus Torvalds
2020-11-02 19:54         ` Soheil Hassas Yeganeh
2020-11-02 20:12           ` Linus Torvalds
2020-11-02  1:08 ` [patch 14/15] epoll: add a selftest for epoll timeout race Andrew Morton
2020-11-02  1:08 ` [patch 15/15] kernel/hung_task.c: make type annotations consistent Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201102010734.L07vYi-Dp%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ltp@lists.linux.it \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=mm-commits@vger.kernel.org \
    --cc=rpalethorpe@suse.com \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).