linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH mmotm] memcg: Ignore partial THP when moving task
@ 2015-12-08 17:13 Michal Hocko
  2015-12-08 17:35 ` Johannes Weiner
  2015-12-09  9:07 ` Michal Hocko
  0 siblings, 2 replies; 3+ messages in thread
From: Michal Hocko @ 2015-12-08 17:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Kirill A. Shutemov,
	Vladimir Davydov, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

After "mm: rework mapcount accounting to enable 4k mapping of THPs"
it is possible to have a partial THP accessible via ptes. Memcg task
migration code is not prepared for this situation and uncharges the tail
page from the original memcg while the original THP is still charged via
the head page which is not mapped to the moved task. The page counter
of the origin memcg will underflow when the whole THP is uncharged later
on and lead to:
WARNING: CPU: 0 PID: 1340 at mm/page_counter.c:26 page_counter_cancel+0x34/0x40()
reported by Minchan Kim.

This patch prevents from the underflow by skipping any partial THP pages
in mem_cgroup_move_charge_pte_range. PageTransCompound is checked when
we do pte walk. This means that a process might leave a partial THP
behind in the original memcg if there is no other process mapping it via
pmd but this is considered acceptable because it shouldn't happen often
and this is not considered a memory leak because the original THP is
still accessible and reclaimable. Moreover the task migration has always
been racy and never guaranteed to move all pages.

Reported-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
this is a patch tested by Minchan in the original thread [1]. I have
only replaced PageCompound with PageTransCompound because other similar
fixes in mmotm used this one. The underlying implementation is the same.
Johannes, I have kept your a-b but let me know if you are not OK with the
changelog.

This is mmotm only material. It can be merged into the page which
has introduced the issue but maybe it is worth having on its own for
documentation purposes. I will leave the decision to you Andrew.

[1] http://lkml.kernel.org/r/20151201133455.GB27574@bbox

 mm/memcontrol.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 79a29d564bff..143c933f0b81 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4895,6 +4895,14 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 		switch (get_mctgt_type(vma, addr, ptent, &target)) {
 		case MC_TARGET_PAGE:
 			page = target.page;
+			/*
+			 * We can have a part of the split pmd here. Moving it
+			 * can be done but it would be too convoluted so simply
+			 * ignore such a partial THP and keep it in original
+			 * memcg. There should be somebody mapping the head.
+			 */
+			if (PageCompound(page))
+				goto put;
 			if (isolate_lru_page(page))
 				goto put;
 			if (!mem_cgroup_move_account(page, false,
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH mmotm] memcg: Ignore partial THP when moving task
  2015-12-08 17:13 [PATCH mmotm] memcg: Ignore partial THP when moving task Michal Hocko
@ 2015-12-08 17:35 ` Johannes Weiner
  2015-12-09  9:07 ` Michal Hocko
  1 sibling, 0 replies; 3+ messages in thread
From: Johannes Weiner @ 2015-12-08 17:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Minchan Kim, Kirill A. Shutemov, Vladimir Davydov,
	linux-mm, LKML, Michal Hocko

On Tue, Dec 08, 2015 at 06:13:09PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> After "mm: rework mapcount accounting to enable 4k mapping of THPs"
> it is possible to have a partial THP accessible via ptes. Memcg task
> migration code is not prepared for this situation and uncharges the tail
> page from the original memcg while the original THP is still charged via
> the head page which is not mapped to the moved task. The page counter
> of the origin memcg will underflow when the whole THP is uncharged later
> on and lead to:
> WARNING: CPU: 0 PID: 1340 at mm/page_counter.c:26 page_counter_cancel+0x34/0x40()
> reported by Minchan Kim.
> 
> This patch prevents from the underflow by skipping any partial THP pages
> in mem_cgroup_move_charge_pte_range. PageTransCompound is checked when
> we do pte walk. This means that a process might leave a partial THP
> behind in the original memcg if there is no other process mapping it via
> pmd but this is considered acceptable because it shouldn't happen often
> and this is not considered a memory leak because the original THP is
> still accessible and reclaimable. Moreover the task migration has always
> been racy and never guaranteed to move all pages.
> 
> Reported-by: Minchan Kim <minchan@kernel.org>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> this is a patch tested by Minchan in the original thread [1]. I have
> only replaced PageCompound with PageTransCompound because other similar
> fixes in mmotm used this one. The underlying implementation is the same.
> Johannes, I have kept your a-b but let me know if you are not OK with the
> changelog.

Looks good to me, thanks Michal. Please keep my Acked-by.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH mmotm] memcg: Ignore partial THP when moving task
  2015-12-08 17:13 [PATCH mmotm] memcg: Ignore partial THP when moving task Michal Hocko
  2015-12-08 17:35 ` Johannes Weiner
@ 2015-12-09  9:07 ` Michal Hocko
  1 sibling, 0 replies; 3+ messages in thread
From: Michal Hocko @ 2015-12-09  9:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Kirill A. Shutemov,
	Vladimir Davydov, linux-mm, LKML

Dohh, forgot to git add after s@PageCoumpound@PageTransCompound@
Updated patch is below:
---
>From efff9d4696cbce6710827a8422a5b285bf9b8052 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 4 Dec 2015 08:30:22 +0100
Subject: [PATCH] memcg: Ignore partial THP when moving task

After "mm: rework mapcount accounting to enable 4k mapping of THPs"
it is possible to have a partial THP accessible via ptes. Memcg task
migration code is not prepared for this situation and uncharges the tail
page from the original memcg while the original THP is still charged via
the head page which is not mapped to the moved task. The page counter
of the origin memcg will underflow when the whole THP is uncharged later
on and lead to:
WARNING: CPU: 0 PID: 1340 at mm/page_counter.c:26 page_counter_cancel+0x34/0x40()
reported by Minchan Kim.

This patch prevents from the underflow by skipping any partial THP pages
in mem_cgroup_move_charge_pte_range. PageTransCompound is checked when
we do pte walk. This means that a process might leave a partial THP
behind in the original memcg if there is no other process mapping it via
pmd but this is considered acceptable because it shouldn't happen often
and this is not considered a memory leak because the original THP is
still accessible and reclaimable. Moreover the task migration has always
been racy and never guaranteed to move all pages.

Reported-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 79a29d564bff..4cecefa4a3b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4895,6 +4895,14 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 		switch (get_mctgt_type(vma, addr, ptent, &target)) {
 		case MC_TARGET_PAGE:
 			page = target.page;
+			/*
+			 * We can have a part of the split pmd here. Moving it
+			 * can be done but it would be too convoluted so simply
+			 * ignore such a partial THP and keep it in original
+			 * memcg. There should be somebody mapping the head.
+			 */
+			if (PageTransCompound(page))
+				goto put;
 			if (isolate_lru_page(page))
 				goto put;
 			if (!mem_cgroup_move_account(page, false,
-- 
2.6.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-12-09  9:07 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-08 17:13 [PATCH mmotm] memcg: Ignore partial THP when moving task Michal Hocko
2015-12-08 17:35 ` Johannes Weiner
2015-12-09  9:07 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).