From: Johannes Weiner <hannes@cmpxchg.org>
To: miles.chen@mediatek.com
Cc: Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, wsd_upstream@mediatek.com,
linux-mediatek@lists.infradead.org
Subject: Re: [RFC PATCH] mm: memcontrol: fix use after free in mem_cgroup_iter()
Date: Thu, 25 Jul 2019 15:21:49 -0400 [thread overview]
Message-ID: <20190725192149.GA24234@cmpxchg.org> (raw)
In-Reply-To: <20190725142703.27276-1-miles.chen@mediatek.com>
On Thu, Jul 25, 2019 at 10:27:03PM +0800, miles.chen@mediatek.com wrote:
> From: Miles Chen <miles.chen@mediatek.com>
>
> This RFC patch is sent to report an use after free in mem_cgroup_iter()
> after merging commit: be2657752e9e "mm: memcg: fix use after free in
> mem_cgroup_iter()".
>
> I work with android kernel tree (4.9 & 4.14), and the commit:
> be2657752e9e "mm: memcg: fix use after free in mem_cgroup_iter()" has
> been merged to the trees. However, I can still observe use after free
> issues addressed in the commit be2657752e9e.
> (on low-end devices, a few times this month)
>
> backtrace:
> css_tryget <- crash here
> mem_cgroup_iter
> shrink_node
> shrink_zones
> do_try_to_free_pages
> try_to_free_pages
> __perform_reclaim
> __alloc_pages_direct_reclaim
> __alloc_pages_slowpath
> __alloc_pages_nodemask
>
> To debug, I poisoned mem_cgroup before freeing it:
>
> static void __mem_cgroup_free(struct mem_cgroup *memcg)
> for_each_node(node)
> free_mem_cgroup_per_node_info(memcg, node);
> free_percpu(memcg->stat);
> + /* poison memcg before freeing it */
> + memset(memcg, 0x78, sizeof(struct mem_cgroup));
> kfree(memcg);
> }
>
> The coredump shows the position=0xdbbc2a00 is freed.
>
> (gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8]
> $13 = {position = 0xdbbc2a00, generation = 0x2efd}
>
> 0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100
> 0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000
> 0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000
> 0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878
> 0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0
> 0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000
> 0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000
> 0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000
> 0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000
> 0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000
> 0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878
> 0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878
> 0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878
> 0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878
>
> In the reclaim path, try_to_free_pages() does not setup
> sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ...,
> shrink_node().
>
> In mem_cgroup_iter(), root is set to root_mem_cgroup because
> sc->target_mem_cgroup is NULL.
> It is possible to assign a memcg to root_mem_cgroup.nodeinfo.iter in
> mem_cgroup_iter().
>
> try_to_free_pages
> struct scan_control sc = {...}, target_mem_cgroup is 0x0;
> do_try_to_free_pages
> shrink_zones
> shrink_node
> mem_cgroup *root = sc->target_mem_cgroup;
> memcg = mem_cgroup_iter(root, NULL, &reclaim);
> mem_cgroup_iter()
> if (!root)
> root = root_mem_cgroup;
> ...
>
> css = css_next_descendant_pre(css, &root->css);
> memcg = mem_cgroup_from_css(css);
> cmpxchg(&iter->position, pos, memcg);
>
> My device uses memcg non-hierarchical mode.
> When we release a memcg: invalidate_reclaim_iterators() reaches only
> dead_memcg and its parents. If non-hierarchical mode is used,
> invalidate_reclaim_iterators() never reaches root_mem_cgroup.
>
> static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
> {
> struct mem_cgroup *memcg = dead_memcg;
>
> for (; memcg; memcg = parent_mem_cgroup(memcg)
> ...
> }
>
> So the use after free scenario looks like:
>
> CPU1 CPU2
>
> try_to_free_pages
> do_try_to_free_pages
> shrink_zones
> shrink_node
> mem_cgroup_iter()
> if (!root)
> root = root_mem_cgroup;
> ...
> css = css_next_descendant_pre(css, &root->css);
> memcg = mem_cgroup_from_css(css);
> cmpxchg(&iter->position, pos, memcg);
>
> invalidate_reclaim_iterators(memcg);
> ...
> __mem_cgroup_free()
> kfree(memcg);
>
> try_to_free_pages
> do_try_to_free_pages
> shrink_zones
> shrink_node
> mem_cgroup_iter()
> if (!root)
> root = root_mem_cgroup;
> ...
> mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id);
> iter = &mz->iter[reclaim->priority];
> pos = READ_ONCE(iter->position);
> css_tryget(&pos->css) <- use after free
>
> To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter in
> invalidate_reclaim_iterators().
>
> Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
This looks good to me, but please add a comment that documents why you
need to handle root_mem_cgroup separately:
> +static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
> +{
> + struct mem_cgroup *memcg = dead_memcg;
> + int invalid_root = 0;
> +
> + for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> + __invalidate_reclaim_iterators(memcg, dead_memcg);
> + if (memcg == root_mem_cgroup)
> + invalid_root = 1;
> + }
> +
> + if (!invalid_root)
> + __invalidate_reclaim_iterators(root_mem_cgroup, dead_memcg);
^ This block should have a comment that mentions that non-hierarchy
mode in cgroup1 means that parent_mem_cgroup doesn't walk all the way
up to the cgroup root.
prev parent reply other threads:[~2019-07-25 19:21 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-25 14:27 [RFC PATCH] mm: memcontrol: fix use after free in mem_cgroup_iter() miles.chen
2019-07-25 14:27 ` miles.chen-NuS5LvNUpcJWk0Htik3J/w
2019-07-25 19:21 ` Johannes Weiner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190725192149.GA24234@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=cgroups@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=miles.chen@mediatek.com \
--cc=vdavydov.dev@gmail.com \
--cc=wsd_upstream@mediatek.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.