linux-kernel.vger.kernel.org archive mirror
* [patch 0/3] mm: memcontrol: rewrite uncharge API follow-up fixes
@ 2014-07-07 18:52 Johannes Weiner
  2014-07-07 18:52 ` [patch 1/3] mm: memcontrol: rewrite uncharge API fix - uncharge from IRQ context Johannes Weiner
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Johannes Weiner @ 2014-07-07 18:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Michal Hocko, linux-mm, linux-kernel

Hi Andrew,

here are 3 fixlets on top of the memcg uncharge rewrite, two of which
are based on problems that Hugh reported.  They should apply directly
on top of the existing fixlets for "mm: memcontrol: rewrite uncharge API".

Thanks!


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 1/3] mm: memcontrol: rewrite uncharge API fix - uncharge from IRQ context
  2014-07-07 18:52 [patch 0/3] mm: memcontrol: rewrite uncharge API follow-up fixes Johannes Weiner
@ 2014-07-07 18:52 ` Johannes Weiner
  2014-07-07 18:52 ` [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration Johannes Weiner
  2014-07-07 18:52 ` [patch 3/3] mm: memcontrol: rewrite uncharge API fix - migrate before re-mapping Johannes Weiner
  2 siblings, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2014-07-07 18:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Michal Hocko, linux-mm, linux-kernel

Hugh reports:

======================================================
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
3.16.0-rc2-mm1 #3 Not tainted
------------------------------------------------------
cc1/2771 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (&(&rtpz->lock)->rlock){+.+.-.}, at: [<ffffffff811518b5>] memcg_check_events+0x17e/0x206
and this task is already holding:
 (&(&zone->lru_lock)->rlock){..-.-.}, at: [<ffffffff8110da3f>] release_pages+0xe7/0x239
which would create a new lock dependency:
 (&(&zone->lru_lock)->rlock){..-.-.} -> (&(&rtpz->lock)->rlock){+.+.-.}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&(&zone->lru_lock)->rlock){..-.-.}
... which became SOFTIRQ-irq-safe at:
  [<ffffffff810c201e>] __lock_acquire+0x59f/0x17e8
  [<ffffffff810c38a6>] lock_acquire+0x61/0x78
  [<ffffffff815bdfbd>] _raw_spin_lock_irqsave+0x3f/0x51
  [<ffffffff8110dc0e>] pagevec_lru_move_fn+0x7d/0xf6
  [<ffffffff8110dca4>] pagevec_move_tail+0x1d/0x2c
  [<ffffffff8110e298>] rotate_reclaimable_page+0xb2/0xd4
  [<ffffffff811018bf>] end_page_writeback+0x1c/0x45
  [<ffffffff81134400>] end_swap_bio_write+0x5c/0x69
  [<ffffffff8123473e>] bio_endio+0x50/0x6e
  [<ffffffff81238dee>] blk_update_request+0x163/0x255
  [<ffffffff81238ef7>] blk_update_bidi_request+0x17/0x65
  [<ffffffff81239242>] blk_end_bidi_request+0x1a/0x56
  [<ffffffff81239289>] blk_end_request+0xb/0xd
  [<ffffffff813a075a>] scsi_io_completion+0x16d/0x553
  [<ffffffff81399c0f>] scsi_finish_command+0xb6/0xbf
  [<ffffffff813a0564>] scsi_softirq_done+0xe9/0xf0
  [<ffffffff8123e8e5>] blk_done_softirq+0x79/0x8b
  [<ffffffff81088675>] __do_softirq+0xfc/0x21f
  [<ffffffff8108898f>] irq_exit+0x3d/0x92
  [<ffffffff81032379>] do_IRQ+0xcc/0xe5
  [<ffffffff815bf5ac>] ret_from_intr+0x0/0x13
  [<ffffffff81443ac0>] cpuidle_enter+0x12/0x14
  [<ffffffff810bb4e4>] cpu_startup_entry+0x187/0x243
  [<ffffffff815a90ab>] rest_init+0x12f/0x133
  [<ffffffff81970e7c>] start_kernel+0x396/0x3a3
  [<ffffffff81970489>] x86_64_start_reservations+0x2a/0x2c
  [<ffffffff81970552>] x86_64_start_kernel+0xc7/0xca

to a SOFTIRQ-irq-unsafe lock:
 (&(&rtpz->lock)->rlock){+.+.-.}
... which became SOFTIRQ-irq-unsafe at:
...  [<ffffffff810c2095>] __lock_acquire+0x616/0x17e8
  [<ffffffff810c38a6>] lock_acquire+0x61/0x78
  [<ffffffff815bde9f>] _raw_spin_lock+0x34/0x41
  [<ffffffff811518b5>] memcg_check_events+0x17e/0x206
  [<ffffffff811535bb>] commit_charge+0x260/0x26f
  [<ffffffff81157004>] mem_cgroup_commit_charge+0xb1/0xdb
  [<ffffffff81115b51>] shmem_getpage_gfp+0x400/0x6c2
  [<ffffffff81115ecc>] shmem_write_begin+0x33/0x35
  [<ffffffff81102a24>] generic_perform_write+0xb7/0x1a4
  [<ffffffff8110391e>] __generic_file_write_iter+0x25b/0x29b
  [<ffffffff81103999>] generic_file_write_iter+0x3b/0xa5
  [<ffffffff8115a115>] new_sync_write+0x7b/0x9f
  [<ffffffff8115a56c>] vfs_write+0xb5/0x169
  [<ffffffff8115ae1f>] SyS_write+0x45/0x8c
  [<ffffffff815bead2>] system_call_fastpath+0x16/0x1b

The soft limit tree lock needs to be IRQ-safe as it's acquired while
holding the IRQ-safe zone->lru_lock.

But more importantly, with uncharge happening in release_pages() now,
this path is executed from interrupt context.

Make the soft limit tree lock, uncharge batching, and charge
statistics IRQ-safe.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 108 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 55 insertions(+), 53 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6c3ffb02651e..1e3b27f8dc2f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -754,9 +754,11 @@ static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_zone *mz,
 static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_zone *mz,
 				       struct mem_cgroup_tree_per_zone *mctz)
 {
-	spin_lock(&mctz->lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&mctz->lock, flags);
 	__mem_cgroup_remove_exceeded(mz, mctz);
-	spin_unlock(&mctz->lock);
+	spin_unlock_irqrestore(&mctz->lock, flags);
 }
 
 
@@ -779,7 +781,9 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 		 * mem is over its softlimit.
 		 */
 		if (excess || mz->on_tree) {
-			spin_lock(&mctz->lock);
+			unsigned long flags;
+
+			spin_lock_irqsave(&mctz->lock, flags);
 			/* if on-tree, remove it */
 			if (mz->on_tree)
 				__mem_cgroup_remove_exceeded(mz, mctz);
@@ -788,7 +792,7 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 			 * If excess is 0, no tree ops.
 			 */
 			__mem_cgroup_insert_exceeded(mz, mctz, excess);
-			spin_unlock(&mctz->lock);
+			spin_unlock_irqrestore(&mctz->lock, flags);
 		}
 	}
 }
@@ -839,9 +843,9 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
 {
 	struct mem_cgroup_per_zone *mz;
 
-	spin_lock(&mctz->lock);
+	spin_lock_irq(&mctz->lock);
 	mz = __mem_cgroup_largest_soft_limit_node(mctz);
-	spin_unlock(&mctz->lock);
+	spin_unlock_irq(&mctz->lock);
 	return mz;
 }
 
@@ -904,8 +908,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 					 struct page *page,
 					 int nr_pages)
 {
-	preempt_disable();
-
 	/*
 	 * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
 	 * counted as CACHE even if it's on ANON LRU.
@@ -930,7 +932,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 	}
 
 	__this_cpu_add(memcg->stat->nr_page_events, nr_pages);
-	preempt_enable();
 }
 
 unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
@@ -1009,7 +1010,6 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
  */
 static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 {
-	preempt_disable();
 	/* threshold event is triggered in finer grain than soft limit */
 	if (unlikely(mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_THRESH))) {
@@ -1022,8 +1022,6 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 		do_numainfo = mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_NUMAINFO);
 #endif
-		preempt_enable();
-
 		mem_cgroup_threshold(memcg);
 		if (unlikely(do_softlimit))
 			mem_cgroup_update_tree(memcg, page);
@@ -1031,8 +1029,7 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 		if (unlikely(do_numainfo))
 			atomic_inc(&memcg->numainfo_events);
 #endif
-	} else
-		preempt_enable();
+	}
 }
 
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
@@ -2704,8 +2701,8 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 {
 	struct page_cgroup *pc = lookup_page_cgroup(page);
 	struct zone *uninitialized_var(zone);
-	struct lruvec *lruvec;
 	bool was_on_lru = false;
+	struct lruvec *lruvec;
 
 	VM_BUG_ON_PAGE(PageCgroupUsed(pc), page);
 	/*
@@ -2755,6 +2752,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 		spin_unlock_irq(&zone->lru_lock);
 	}
 
+	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, page, nr_pages);
 	/*
 	 * "charge_statistics" updated event counter. Then, check it.
@@ -2762,6 +2760,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 	 * if they exceeds softlimit.
 	 */
 	memcg_check_events(memcg, page);
+	local_irq_enable();
 }
 
 static DEFINE_MUTEX(set_limit_mutex);
@@ -3522,8 +3521,6 @@ static int mem_cgroup_move_account(struct page *page,
 			       nr_pages);
 	}
 
-	mem_cgroup_charge_statistics(from, page, -nr_pages);
-
 	/*
 	 * It is safe to change pc->mem_cgroup here because the page
 	 * is referenced, charged, and isolated - we can't race with
@@ -3532,14 +3529,15 @@ static int mem_cgroup_move_account(struct page *page,
 
 	/* caller should have done css_get */
 	pc->mem_cgroup = to;
-	mem_cgroup_charge_statistics(to, page, nr_pages);
 	move_unlock_mem_cgroup(from, &flags);
 	ret = 0;
-	/*
-	 * check events
-	 */
+
+	local_irq_disable();
+	mem_cgroup_charge_statistics(to, page, nr_pages);
 	memcg_check_events(to, page);
+	mem_cgroup_charge_statistics(from, page, -nr_pages);
 	memcg_check_events(from, page);
+	local_irq_enable();
 out:
 	return ret;
 }
@@ -3620,6 +3618,9 @@ out:
 
 void mem_cgroup_uncharge_start(void)
 {
+	unsigned long flags;
+
+	local_irq_save(flags);
 	current->memcg_batch.do_batch++;
 	/* We can do nest. */
 	if (current->memcg_batch.do_batch == 1) {
@@ -3627,21 +3628,18 @@ void mem_cgroup_uncharge_start(void)
 		current->memcg_batch.nr_pages = 0;
 		current->memcg_batch.memsw_nr_pages = 0;
 	}
+	local_irq_restore(flags);
 }
 
 void mem_cgroup_uncharge_end(void)
 {
 	struct memcg_batch_info *batch = &current->memcg_batch;
+	unsigned long flags;
 
-	if (!batch->do_batch)
-		return;
-
-	batch->do_batch--;
-	if (batch->do_batch) /* If stacked, do nothing. */
-		return;
-
-	if (!batch->memcg)
-		return;
+	local_irq_save(flags);
+	VM_BUG_ON(!batch->do_batch);
+	if (--batch->do_batch) /* If stacked, do nothing */
+		goto out;
 	/*
 	 * This "batch->memcg" is valid without any css_get/put etc...
 	 * bacause we hide charges behind us.
@@ -3653,8 +3651,8 @@ void mem_cgroup_uncharge_end(void)
 		res_counter_uncharge(&batch->memcg->memsw,
 				     batch->memsw_nr_pages * PAGE_SIZE);
 	memcg_oom_recover(batch->memcg);
-	/* forget this pointer (for sanity check) */
-	batch->memcg = NULL;
+out:
+	local_irq_restore(flags);
 }
 
 #ifdef CONFIG_MEMCG_SWAP
@@ -3912,7 +3910,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 						    gfp_mask, &nr_scanned);
 		nr_reclaimed += reclaimed;
 		*total_scanned += nr_scanned;
-		spin_lock(&mctz->lock);
+		spin_lock_irq(&mctz->lock);
 
 		/*
 		 * If we failed to reclaim anything from this memory cgroup
@@ -3952,7 +3950,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 		 */
 		/* If excess == 0, no tree ops */
 		__mem_cgroup_insert_exceeded(mz, mctz, excess);
-		spin_unlock(&mctz->lock);
+		spin_unlock_irq(&mctz->lock);
 		css_put(&mz->memcg->css);
 		loop++;
 		/*
@@ -6567,6 +6565,7 @@ void mem_cgroup_uncharge(struct page *page)
 	unsigned int nr_pages = 1;
 	struct mem_cgroup *memcg;
 	struct page_cgroup *pc;
+	unsigned long pc_flags;
 	unsigned long flags;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
@@ -6591,35 +6590,38 @@ void mem_cgroup_uncharge(struct page *page)
 	 * exclusive access to the page.
 	 */
 	memcg = pc->mem_cgroup;
-	flags = pc->flags;
+	pc_flags = pc->flags;
 	pc->flags = 0;
 
-	mem_cgroup_charge_statistics(memcg, page, -nr_pages);
-	memcg_check_events(memcg, page);
+	local_irq_save(flags);
 
+	if (nr_pages > 1)
+		goto direct;
+	if (unlikely(test_thread_flag(TIF_MEMDIE)))
+		goto direct;
 	batch = &current->memcg_batch;
+	if (!batch->do_batch)
+		goto direct;
+	if (batch->memcg && batch->memcg != memcg)
+		goto direct;
 	if (!batch->memcg)
 		batch->memcg = memcg;
-	else if (batch->memcg != memcg)
-		goto uncharge;
-	if (nr_pages > 1)
-		goto uncharge;
-	if (!batch->do_batch)
-		goto uncharge;
-	if (test_thread_flag(TIF_MEMDIE))
-		goto uncharge;
-	if (flags & PCG_MEM)
+	if (pc_flags & PCG_MEM)
 		batch->nr_pages++;
-	if (flags & PCG_MEMSW)
+	if (pc_flags & PCG_MEMSW)
 		batch->memsw_nr_pages++;
-	return;
-uncharge:
-	if (flags & PCG_MEM)
+	goto out;
+direct:
+	if (pc_flags & PCG_MEM)
 		res_counter_uncharge(&memcg->res, nr_pages * PAGE_SIZE);
-	if (flags & PCG_MEMSW)
+	if (pc_flags & PCG_MEMSW)
 		res_counter_uncharge(&memcg->memsw, nr_pages * PAGE_SIZE);
-	if (batch->memcg != memcg)
-		memcg_oom_recover(memcg);
+	memcg_oom_recover(memcg);
+out:
+	mem_cgroup_charge_statistics(memcg, page, -nr_pages);
+	memcg_check_events(memcg, page);
+
+	local_irq_restore(flags);
 }
 
 /**
-- 
2.0.0



* [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration
  2014-07-07 18:52 [patch 0/3] mm: memcontrol: rewrite uncharge API follow-up fixes Johannes Weiner
  2014-07-07 18:52 ` [patch 1/3] mm: memcontrol: rewrite uncharge API fix - uncharge from IRQ context Johannes Weiner
@ 2014-07-07 18:52 ` Johannes Weiner
  2014-07-14 19:57   ` Hugh Dickins
  2014-07-07 18:52 ` [patch 3/3] mm: memcontrol: rewrite uncharge API fix - migrate before re-mapping Johannes Weiner
  2 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2014-07-07 18:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Michal Hocko, linux-mm, linux-kernel

Hugh reports:

VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
mm/memcontrol.c:6680!
page had count 1 mapcount 0 mapping anon index 0x196
flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
__alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
handle_mm_fault < __do_page_fault

mem_cgroup_migrate() assumes that a page is only migrated once and
then freed immediately after.

However, putting the page back on the LRU list and dropping the
isolation refcount is not done atomically.  This allows a PFN-based
migrator like compaction to isolate the page, see the expected
anonymous page refcount of 1, and migrate the page once more.

Catch pages that have already been migrated and abort migration
gracefully.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1e3b27f8dc2f..e4afdbdda0a7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6653,7 +6653,10 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
 	if (!PageCgroupUsed(pc))
 		return;
 
-	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
+	/* Already migrated */
+	if (!(pc->flags & PCG_MEM))
+		return;
+
 	VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage);
 	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
 
-- 
2.0.0



* [patch 3/3] mm: memcontrol: rewrite uncharge API fix - migrate before re-mapping
  2014-07-07 18:52 [patch 0/3] mm: memcontrol: rewrite uncharge API follow-up fixes Johannes Weiner
  2014-07-07 18:52 ` [patch 1/3] mm: memcontrol: rewrite uncharge API fix - uncharge from IRQ context Johannes Weiner
  2014-07-07 18:52 ` [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration Johannes Weiner
@ 2014-07-07 18:52 ` Johannes Weiner
  2 siblings, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2014-07-07 18:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, Michal Hocko, linux-mm, linux-kernel

Mapped file accounting depends on the page being charged already,
or it won't get accounted properly, and the mapped file counter will
underflow during unmap later on.

Move mem_cgroup_migrate() before remove_migration_ptes().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index ab43fbfff8ba..7f5a42403fae 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -781,11 +781,11 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 		if (!PageAnon(newpage))
 			newpage->mapping = NULL;
 	} else {
+		mem_cgroup_migrate(page, newpage, false);
 		if (remap_swapcache)
 			remove_migration_ptes(page, newpage);
 		if (!PageAnon(page))
 			page->mapping = NULL;
-		mem_cgroup_migrate(page, newpage, false);
 	}
 
 	unlock_page(newpage);
-- 
2.0.0



* Re: [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration
  2014-07-07 18:52 ` [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration Johannes Weiner
@ 2014-07-14 19:57   ` Hugh Dickins
  2014-07-15 14:45     ` Johannes Weiner
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2014-07-14 19:57 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Hugh Dickins, Michal Hocko, linux-mm, linux-kernel

On Mon, 7 Jul 2014, Johannes Weiner wrote:

> Hugh reports:
> 
> VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> mm/memcontrol.c:6680!
> page had count 1 mapcount 0 mapping anon index 0x196
> flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> handle_mm_fault < __do_page_fault
> 
> mem_cgroup_migrate() assumes that a page is only migrated once and
> then freed immediately after.
> 
> However, putting the page back on the LRU list and dropping the
> isolation refcount is not done atomically.  This allows a PFN-based
> migrator like compaction to isolate the page, see the expected
> anonymous page refcount of 1, and migrate the page once more.
> 
> Catch pages that have already been migrated and abort migration
> gracefully.
> 
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/memcontrol.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1e3b27f8dc2f..e4afdbdda0a7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6653,7 +6653,10 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
>  	if (!PageCgroupUsed(pc))
>  		return;
>  
> -	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
> +	/* Already migrated */
> +	if (!(pc->flags & PCG_MEM))
> +		return;
> +

I am curious why you chose to fix the BUG in this way, instead of
-	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
+	pc->flags = 0;
a few lines further down.

The page that gets left behind with just PCG_USED is anomalous (for an
LRU page, maybe not for a kmem page), isn't it?  And liable to cause
other problems.

For example, won't it go the wrong way in the "Surreptitiously" test
in mem_cgroup_page_lruvec(): the page no longer has a hold on any
memcg, so is in danger of being placed on a gone-memcg's LRU?

Hugh

>  	VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage);
>  	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
>  
> -- 
> 2.0.0


* Re: [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration
  2014-07-14 19:57   ` Hugh Dickins
@ 2014-07-15 14:45     ` Johannes Weiner
  2014-07-15 22:14       ` Hugh Dickins
  2014-07-16  8:34       ` Michal Hocko
  0 siblings, 2 replies; 10+ messages in thread
From: Johannes Weiner @ 2014-07-15 14:45 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

Hi Hugh,

On Mon, Jul 14, 2014 at 12:57:33PM -0700, Hugh Dickins wrote:
> On Mon, 7 Jul 2014, Johannes Weiner wrote:
> 
> > Hugh reports:
> > 
> > VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> > mm/memcontrol.c:6680!
> > page had count 1 mapcount 0 mapping anon index 0x196
> > flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> > mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> > compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> > __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> > handle_mm_fault < __do_page_fault
> > 
> > mem_cgroup_migrate() assumes that a page is only migrated once and
> > then freed immediately after.
> > 
> > However, putting the page back on the LRU list and dropping the
> > isolation refcount is not done atomically.  This allows a PFN-based
> > migrator like compaction to isolate the page, see the expected
> > anonymous page refcount of 1, and migrate the page once more.
> > 
> > Catch pages that have already been migrated and abort migration
> > gracefully.
> > 
> > Reported-by: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/memcontrol.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 1e3b27f8dc2f..e4afdbdda0a7 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6653,7 +6653,10 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
> >  	if (!PageCgroupUsed(pc))
> >  		return;
> >  
> > -	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
> > +	/* Already migrated */
> > +	if (!(pc->flags & PCG_MEM))
> > +		return;
> > +
> 
> I am curious why you chose to fix the BUG in this way, instead of
> -	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
> +	pc->flags = 0;
> a few lines further down.
> 
> The page that gets left behind with just PCG_USED is anomalous (for an
> LRU page, maybe not for a kmem page), isn't it?  And liable to cause
> other problems.
> 
> For example, won't it go the wrong way in the "Surreptitiously" test
> in mem_cgroup_page_lruvec(): the page no longer has a hold on any
> memcg, so is in danger of being placed on a gone-memcg's LRU?

I was worried about unusing the page before we have exclusive access
to it (migration_entry_to_page() can still work at this point, though
the current situation seems safe).

But you are right, with the charge belonging to the new page, the old
page no longer pins the memcg and we have to prevent use-after-free.

How about this as a drop-in replacement?

---
>From 274b94ad83b38fe7dc1707a8eb4015b3ab1673c5 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 10 Jul 2014 01:02:11 +0000
Subject: [patch] mm: memcontrol: rewrite uncharge API fix - double migration

Hugh reports:

VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
mm/memcontrol.c:6680!
page had count 1 mapcount 0 mapping anon index 0x196
flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
__alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
handle_mm_fault < __do_page_fault

mem_cgroup_migrate() assumes that a page is only migrated once and
then freed immediately after.

However, putting the page back on the LRU list and dropping the
isolation refcount is not done atomically.  This allows a PFN-based
migrator like compaction to isolate the page, see the expected
anonymous page refcount of 1, and migrate the page once more.

Furthermore, once the charges are transferred to the new page, the old
page no longer has a pin on the memcg, which might get released before
the page itself now.  pc->mem_cgroup is invalid at this point, but
PCG_USED suggests otherwise, provoking use-after-free.

Properly uncharge the page after it's been migrated, including the
clearing of PCG_USED, so that a subsequent charge migration attempt
will be able to detect it and bail out.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Hugh Dickins <hughd@google.com>
---
 mm/memcontrol.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1e3b27f8dc2f..1439537fe7c9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6655,7 +6655,6 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
 
 	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
 	VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage);
-	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
 
 	if (PageTransHuge(oldpage)) {
 		nr_pages <<= compound_order(oldpage);
@@ -6663,6 +6662,13 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
 		VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage);
 	}
 
+	pc->flags = 0;
+
+	local_irq_disable();
+	mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages);
+	memcg_check_events(pc->mem_cgroup, oldpage);
+	local_irq_enable();
+
 	commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare);
 }
 
-- 
2.0.0



* Re: [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration
  2014-07-15 14:45     ` Johannes Weiner
@ 2014-07-15 22:14       ` Hugh Dickins
  2014-07-16  8:34       ` Michal Hocko
  1 sibling, 0 replies; 10+ messages in thread
From: Hugh Dickins @ 2014-07-15 22:14 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hugh Dickins, Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Tue, 15 Jul 2014, Johannes Weiner wrote:
> On Mon, Jul 14, 2014 at 12:57:33PM -0700, Hugh Dickins wrote:
> > On Mon, 7 Jul 2014, Johannes Weiner wrote:
> > 
> > > Hugh reports:
> > > 
> > > VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> > > mm/memcontrol.c:6680!
> > > page had count 1 mapcount 0 mapping anon index 0x196
> > > flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> > > mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> > > compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> > > __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> > > handle_mm_fault < __do_page_fault
> > > 
> > > mem_cgroup_migrate() assumes that a page is only migrated once and
> > > then freed immediately after.
> > > 
> > > However, putting the page back on the LRU list and dropping the
> > > isolation refcount is not done atomically.  This allows a PFN-based
> > > migrator like compaction to isolate the page, see the expected
> > > anonymous page refcount of 1, and migrate the page once more.
> > > 
> > > Catch pages that have already been migrated and abort migration
> > > gracefully.
> > > 
> > > Reported-by: Hugh Dickins <hughd@google.com>
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > > ---
> > >  mm/memcontrol.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 1e3b27f8dc2f..e4afdbdda0a7 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -6653,7 +6653,10 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
> > >  	if (!PageCgroupUsed(pc))
> > >  		return;
> > >  
> > > -	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
> > > +	/* Already migrated */
> > > +	if (!(pc->flags & PCG_MEM))
> > > +		return;
> > > +
> > 
> > I am curious why you chose to fix the BUG in this way, instead of
> > -	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
> > +	pc->flags = 0;
> > a few lines further down.
> > 
> > The page that gets left behind with just PCG_USED is anomalous (for an
> > LRU page, maybe not for a kmem page), isn't it?  And liable to cause
> > other problems.
> > 
> > For example, won't it go the wrong way in the "Surreptitiously" test
> > in mem_cgroup_page_lruvec(): the page no longer has a hold on any
> > memcg, so is in danger of being placed on a gone-memcg's LRU?
> 
> I was worried about unusing the page before we have exclusive access
> to it (migration_entry_to_page() can still work at this point, though
> the current situation seems safe).
> 
> But you are right, with the charge belonging to the new page, the old
> page no longer pins the memcg and we have to prevent use-after-free.
> 
> How about this as a drop-in replacement?

Yes, that looks much better to me, thanks.  I had not realized that the
mem_cgroup_charge_statistics()/memcg_check_events() would also be needed,
but yes, that looks necessary to complement the commit_charge() on the
new page.  I _think_ it should all add up now, but I've certainly not
reviewed thoroughly.

Hugh

> 
> ---
> From 274b94ad83b38fe7dc1707a8eb4015b3ab1673c5 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Thu, 10 Jul 2014 01:02:11 +0000
> Subject: [patch] mm: memcontrol: rewrite uncharge API fix - double migration
> 
> Hugh reports:
> 
> VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> mm/memcontrol.c:6680!
> page had count 1 mapcount 0 mapping anon index 0x196
> flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> handle_mm_fault < __do_page_fault
> 
> mem_cgroup_migrate() assumes that a page is only migrated once and
> then freed immediately after.
> 
> However, putting the page back on the LRU list and dropping the
> isolation refcount is not done atomically.  This allows a PFN-based
> migrator like compaction to isolate the page, see the expected
> anonymous page refcount of 1, and migrate the page once more.
> 
> Furthermore, once the charges are transferred to the new page, the old
> page no longer has a pin on the memcg, which might get released before
> the page itself now.  pc->mem_cgroup is invalid at this point, but
> PCG_USED suggests otherwise, provoking use-after-free.
> 
> Properly uncharge the page after it's been migrated, including the
> clearing of PCG_USED, so that a subsequent charge migration attempt
> will be able to detect it and bail out.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/memcontrol.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1e3b27f8dc2f..1439537fe7c9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6655,7 +6655,6 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
>  
>  	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
>  	VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage);
> -	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
>  
>  	if (PageTransHuge(oldpage)) {
>  		nr_pages <<= compound_order(oldpage);
> @@ -6663,6 +6662,13 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
>  		VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage);
>  	}
>  
> +	pc->flags = 0;
> +
> +	local_irq_disable();
> +	mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages);
> +	memcg_check_events(pc->mem_cgroup, oldpage);
> +	local_irq_enable();
> +
>  	commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare);
>  }
>  
> -- 
> 2.0.0


* Re: [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration
  2014-07-15 14:45     ` Johannes Weiner
  2014-07-15 22:14       ` Hugh Dickins
@ 2014-07-16  8:34       ` Michal Hocko
  2014-07-16 16:04         ` Johannes Weiner
  1 sibling, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2014-07-16  8:34 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

[Sorry, I missed this thread]

On Tue 15-07-14 10:45:39, Johannes Weiner wrote:
[...]
> From 274b94ad83b38fe7dc1707a8eb4015b3ab1673c5 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Thu, 10 Jul 2014 01:02:11 +0000
> Subject: [patch] mm: memcontrol: rewrite uncharge API fix - double migration
> 
> Hugh reports:
> 
> VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> mm/memcontrol.c:6680!
> page had count 1 mapcount 0 mapping anon index 0x196
> flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> handle_mm_fault < __do_page_fault
> 
> mem_cgroup_migrate() assumes that a page is only migrated once and
> then freed immediately after.
> 
> However, putting the page back on the LRU list and dropping the
> isolation refcount is not done atomically.  This allows a PFN-based
> migrator like compaction to isolate the page, see the expected
> anonymous page refcount of 1, and migrate the page once more.
> 
> Furthermore, once the charges are transferred to the new page, the old
> page no longer has a pin on the memcg, which might now get released
> before the page itself.  pc->mem_cgroup is invalid at this point, but
> PCG_USED suggests otherwise, provoking a use-after-free.

The same applies to the new page because we are transferring only the
statistics. The old page, with PCG_USED still set, would uncharge the
res_counter, leaving the new page not backed by any charge, and so the
memcg can go away. This sounds like a more probable scenario to me
because the old page should go away quite early after a successful
migration.

> Properly uncharge the page after it's been migrated, including the
> clearing of PCG_USED, so that a subsequent charge migration attempt
> will be able to detect it and bail out.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/memcontrol.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1e3b27f8dc2f..1439537fe7c9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6655,7 +6655,6 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
>  
>  	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
>  	VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage);
> -	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
>  
>  	if (PageTransHuge(oldpage)) {
>  		nr_pages <<= compound_order(oldpage);
> @@ -6663,6 +6662,13 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
>  		VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage);
>  	}
>  
> +	pc->flags = 0;
> +
> +	local_irq_disable();
> +	mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages);
> +	memcg_check_events(pc->mem_cgroup, oldpage);
> +	local_irq_enable();
> +
>  	commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare);
>  }

Looks good to me. I am just wondering whether we should really fiddle
with the stats and events when nothing actually changed during the
transition. I would simply extract the core of commit_charge into a
__commit_charge helper, which would be called from here.

The impact is minimal because events are rate-limited and the stats are
per-cpu, so it is not a big deal; it just looks ugly to me.
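A rough sketch of the split being suggested here: the core of
commit_charge factored into a __commit_charge that only transfers
ownership, so migration skips the statistics entirely.  Names and the
reduced state are assumptions for illustration, not the kernel's code:

```c
/* Split commit_charge so that migration, where nothing changes
 * globally, only runs the ownership-transfer core and never touches
 * the per-cpu statistics or events. */
#include <assert.h>

#define PCG_USED 0x1UL
#define PCG_MEM  0x2UL

struct toy_memcg { long nr_stat; };

struct toy_page_cgroup {
	unsigned long flags;
	struct toy_memcg *mem_cgroup;
};

/* Core: record that the page is charged to memcg, nothing else. */
static void __toy_commit_charge(struct toy_page_cgroup *pc,
				struct toy_memcg *memcg)
{
	pc->mem_cgroup = memcg;
	pc->flags = PCG_USED | PCG_MEM;
}

/* Fresh charge: the core plus the statistics update. */
static void toy_commit_charge(struct toy_page_cgroup *pc,
			      struct toy_memcg *memcg, long nr_pages)
{
	__toy_commit_charge(pc, memcg);
	memcg->nr_stat += nr_pages;
}

/* Migration: only the core is needed -- no -nr_pages/+nr_pages pair
 * on the counters, no spurious threshold events. */
static void toy_migrate(struct toy_page_cgroup *old_pc,
			struct toy_page_cgroup *new_pc)
{
	struct toy_memcg *memcg = old_pc->mem_cgroup;

	if (!(old_pc->flags & PCG_USED))
		return;
	old_pc->flags = 0;
	__toy_commit_charge(new_pc, memcg);
}
```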
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration
  2014-07-16  8:34       ` Michal Hocko
@ 2014-07-16 16:04         ` Johannes Weiner
  2014-07-16 19:28           ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2014-07-16 16:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Wed, Jul 16, 2014 at 10:34:56AM +0200, Michal Hocko wrote:
> [Sorry, I missed this thread]
> 
> On Tue 15-07-14 10:45:39, Johannes Weiner wrote:
> [...]
> > From 274b94ad83b38fe7dc1707a8eb4015b3ab1673c5 Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Date: Thu, 10 Jul 2014 01:02:11 +0000
> > Subject: [patch] mm: memcontrol: rewrite uncharge API fix - double migration
> > 
> > Hugh reports:
> > 
> > VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> > mm/memcontrol.c:6680!
> > page had count 1 mapcount 0 mapping anon index 0x196
> > flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> > mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> > compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> > __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> > handle_mm_fault < __do_page_fault
> > 
> > mem_cgroup_migrate() assumes that a page is only migrated once and
> > then freed immediately after.
> > 
> > However, putting the page back on the LRU list and dropping the
> > isolation refcount is not done atomically.  This allows a PFN-based
> > migrator like compaction to isolate the page, see the expected
> > anonymous page refcount of 1, and migrate the page once more.
> > 
> > Furthermore, once the charges are transferred to the new page, the old
> > page no longer has a pin on the memcg, which might now get released
> > before the page itself.  pc->mem_cgroup is invalid at this point, but
> > PCG_USED suggests otherwise, provoking a use-after-free.
> 
> The same applies to the new page because we are transferring only the
> statistics. The old page, with PCG_USED still set, would uncharge the
> res_counter, leaving the new page not backed by any charge, and so the
> memcg can go away. This sounds like a more probable scenario to me
> because the old page should go away quite early after a successful
> migration.

No, the charges are carried by PCG_MEM and PCG_MEMSW, not PCG_USED.
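That division of labor can be written down as a tiny model: the
charge-carrying bits decide whether res_counter charges must be
returned, while PCG_USED only decides whether pc->mem_cgroup may be
dereferenced.  Flag values and helper names below are assumptions for
illustration, not the kernel's definitions:

```c
#include <assert.h>

#define PCG_USED  0x1UL  /* pc->mem_cgroup is valid, may be dereferenced */
#define PCG_MEM   0x2UL  /* holds a charge against the memory res_counter */
#define PCG_MEMSW 0x4UL  /* holds a charge against memory+swap */

/* Freeing the page returns charges to the res_counters only if one of
 * the charge-carrying bits is set; PCG_USED alone carries nothing. */
static int toy_carries_charge(unsigned long flags)
{
	return !!(flags & (PCG_MEM | PCG_MEMSW));
}

/* Whether pc->mem_cgroup may be looked at is a separate question. */
static int toy_owner_valid(unsigned long flags)
{
	return !!(flags & PCG_USED);
}
```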

> > Properly uncharge the page after it's been migrated, including the
> > clearing of PCG_USED, so that a subsequent charge migration attempt
> > will be able to detect it and bail out.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Reported-by: Hugh Dickins <hughd@google.com>
> > ---
> >  mm/memcontrol.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 1e3b27f8dc2f..1439537fe7c9 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6655,7 +6655,6 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
> >  
> >  	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage);
> >  	VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage);
> > -	pc->flags &= ~(PCG_MEM | PCG_MEMSW);
> >  
> >  	if (PageTransHuge(oldpage)) {
> >  		nr_pages <<= compound_order(oldpage);
> > @@ -6663,6 +6662,13 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
> >  		VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage);
> >  	}
> >  
> > +	pc->flags = 0;
> > +
> > +	local_irq_disable();
> > +	mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages);
> > +	memcg_check_events(pc->mem_cgroup, oldpage);
> > +	local_irq_enable();
> > +
> >  	commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare);
> >  }
> 
> Looks good to me. I am just wondering whether we should really fiddle
> with the stats and events when nothing actually changed during the
> transition. I would simply extract the core of commit_charge into a
> __commit_charge helper, which would be called from here.
> 
> The impact is minimal because events are rate-limited and the stats are
> per-cpu, so it is not a big deal; it just looks ugly to me.

Agreed.  This is the minimal change to get it functionally right (we
are already at -rc5); it can always be optimized later.  I'll send a
patch soon.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration
  2014-07-16 16:04         ` Johannes Weiner
@ 2014-07-16 19:28           ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2014-07-16 19:28 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Wed 16-07-14 12:04:14, Johannes Weiner wrote:
> On Wed, Jul 16, 2014 at 10:34:56AM +0200, Michal Hocko wrote:
> > [Sorry, I missed this thread]
> > 
> > On Tue 15-07-14 10:45:39, Johannes Weiner wrote:
> > [...]
> > > From 274b94ad83b38fe7dc1707a8eb4015b3ab1673c5 Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > Date: Thu, 10 Jul 2014 01:02:11 +0000
> > > Subject: [patch] mm: memcontrol: rewrite uncharge API fix - double migration
> > > 
> > > Hugh reports:
> > > 
> > > VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> > > mm/memcontrol.c:6680!
> > > page had count 1 mapcount 0 mapping anon index 0x196
> > > flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> > > mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> > > compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> > > __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> > > handle_mm_fault < __do_page_fault
> > > 
> > > mem_cgroup_migrate() assumes that a page is only migrated once and
> > > then freed immediately after.
> > > 
> > > However, putting the page back on the LRU list and dropping the
> > > isolation refcount is not done atomically.  This allows a PFN-based
> > > migrator like compaction to isolate the page, see the expected
> > > anonymous page refcount of 1, and migrate the page once more.
> > > 
> > > Furthermore, once the charges are transferred to the new page, the old
> > > page no longer has a pin on the memcg, which might get released before
> > > the page itself now.  pc->mem_cgroup is invalid at this point, but
> > > PCG_USED suggests otherwise, provoking use-after-free.
> > 
> > The same applies to the new page because we are transferring only the
> > statistics. The old page, with PCG_USED still set, would uncharge the
> > res_counter, leaving the new page not backed by any charge, and so the
> > memcg can go away. This sounds like a more probable scenario to me
> > because the old page should go away quite early after a successful
> > migration.
> 
> No, the charges are carried by PCG_MEM and PCG_MEMSW, not PCG_USED.

Dang. I am blind. Sorry about the noise...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-07-16 19:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-07 18:52 [patch 0/3] mm: memcontrol: rewrite uncharge API follow-up fixes Johannes Weiner
2014-07-07 18:52 ` [patch 1/3] mm: memcontrol: rewrite uncharge API fix - uncharge from IRQ context Johannes Weiner
2014-07-07 18:52 ` [patch 2/3] mm: memcontrol: rewrite uncharge API fix - double migration Johannes Weiner
2014-07-14 19:57   ` Hugh Dickins
2014-07-15 14:45     ` Johannes Weiner
2014-07-15 22:14       ` Hugh Dickins
2014-07-16  8:34       ` Michal Hocko
2014-07-16 16:04         ` Johannes Weiner
2014-07-16 19:28           ` Michal Hocko
2014-07-07 18:52 ` [patch 3/3] mm: memcontrol: rewrite uncharge API fix - migrate before re-mapping Johannes Weiner
