Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH mmotm] mm: memcontrol: decouple reference counting from page accounting fix
@ 2020-07-31  3:17 Hugh Dickins
  2020-07-31 15:04 ` Johannes Weiner
  2020-08-01  1:20 ` Roman Gushchin
  0 siblings, 2 replies; 3+ messages in thread
From: Hugh Dickins @ 2020-07-31  3:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Roman Gushchin, Johannes Weiner, Shakeel Butt, Hugh Dickins,
	linux-kernel, linux-mm

Moving tasks between mem cgroups with memory.move_charge_at_immigrate 3,
while swapping, crashes soon on mmotm (and so presumably on linux-next):
for example, spinlock found corrupted when lock_page_memcg() is called.
It's as if the mem cgroup structures have been freed too early.

Stab in the dark: what if all the accounting is right, except that the
css_put_many() in __mem_cgroup_clear_mc() is now (worse than) redundant?
Removing it fixes the crashes, but that's hardly surprising; and stats
temporarily hacked into mem_cgroup_css_alloc() and mem_cgroup_css_free()
showed that mem cgroups were not being leaked with this change.

Note: this removes the last call to css_put_many() from the tree; and
mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations.patch
removes the last call to css_get_many(): now that their last references
have gone, I expect them soon to be freed from include/linux/cgroup.h.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
Fixes mm-memcontrol-decouple-reference-counting-from-page-accounting.patch

 mm/memcontrol.c |    2 --
 1 file changed, 2 deletions(-)

--- mmotm/mm/memcontrol.c	2020-07-27 18:55:00.700554752 -0700
+++ linux/mm/memcontrol.c	2020-07-30 12:05:00.640091618 -0700
@@ -5887,8 +5887,6 @@ static void __mem_cgroup_clear_mc(void)
 		if (!mem_cgroup_is_root(mc.to))
 			page_counter_uncharge(&mc.to->memory, mc.moved_swap);
 
-		css_put_many(&mc.to->css, mc.moved_swap);
-
 		mc.moved_swap = 0;
 	}
 	memcg_oom_recover(from);


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH mmotm] mm: memcontrol: decouple reference counting from page accounting fix
  2020-07-31  3:17 [PATCH mmotm] mm: memcontrol: decouple reference counting from page accounting fix Hugh Dickins
@ 2020-07-31 15:04 ` Johannes Weiner
  2020-08-01  1:20 ` Roman Gushchin
  1 sibling, 0 replies; 3+ messages in thread
From: Johannes Weiner @ 2020-07-31 15:04 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Roman Gushchin, Shakeel Butt, linux-kernel, linux-mm

On Thu, Jul 30, 2020 at 08:17:50PM -0700, Hugh Dickins wrote:
> Moving tasks between mem cgroups with memory.move_charge_at_immigrate 3,
> while swapping, crashes soon on mmotm (and so presumably on linux-next):
> for example, spinlock found corrupted when lock_page_memcg() is called.
> It's as if the mem cgroup structures have been freed too early.
> 
> Stab in the dark: what if all the accounting is right, except that the
> css_put_many() in __mem_cgroup_clear_mc() is now (worse than) redundant?
> Removing it fixes the crashes, but that's hardly surprising; and stats
> temporarily hacked into mem_cgroup_css_alloc() and mem_cgroup_css_free()
> showed that mem cgroups were not being leaked with this change.
> 
> Note: this removes the last call to css_put_many() from the tree; and
> mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations.patch
> removes the last call to css_get_many(): now that their last references
> have gone, I expect them soon to be freed from include/linux/cgroup.h.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Thanks, Hugh. This fix looks correct to me.

And I'd agree with the put being worse than redundant. Its counterpart
in try_charge() has been removed, so this a clear-cut ref imbalance.

When moving a task between cgroups, we scan the page tables for pages
and swap entries, and then pre-charge the target group while we're
still allowed to veto the task move (can_attach). In the actual attach
step we then reassign all the pages and swap entries and balance the
books in the cgroup the task emigrated from.

That precharging used to acquire css references for every page charge
and swap entry charge when calling try_charge(). That is gone. Now we
move css references along with the page (move_account), and swap
entries use the mem_cgroup_id references which pin the css indirectly.

Leaving that css_put_many behind in the swap path was an oversight.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

> ---
> Fixes mm-memcontrol-decouple-reference-counting-from-page-accounting.patch
> 
>  mm/memcontrol.c |    2 --
>  1 file changed, 2 deletions(-)
> 
> --- mmotm/mm/memcontrol.c	2020-07-27 18:55:00.700554752 -0700
> +++ linux/mm/memcontrol.c	2020-07-30 12:05:00.640091618 -0700
> @@ -5887,8 +5887,6 @@ static void __mem_cgroup_clear_mc(void)
>  		if (!mem_cgroup_is_root(mc.to))
>  			page_counter_uncharge(&mc.to->memory, mc.moved_swap);
>  
> -		css_put_many(&mc.to->css, mc.moved_swap);
> -
>  		mc.moved_swap = 0;
>  	}
>  	memcg_oom_recover(from);


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH mmotm] mm: memcontrol: decouple reference counting from page accounting fix
  2020-07-31  3:17 [PATCH mmotm] mm: memcontrol: decouple reference counting from page accounting fix Hugh Dickins
  2020-07-31 15:04 ` Johannes Weiner
@ 2020-08-01  1:20 ` Roman Gushchin
  1 sibling, 0 replies; 3+ messages in thread
From: Roman Gushchin @ 2020-08-01  1:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Johannes Weiner, Shakeel Butt, linux-kernel, linux-mm

On Thu, Jul 30, 2020 at 08:17:50PM -0700, Hugh Dickins wrote:
> Moving tasks between mem cgroups with memory.move_charge_at_immigrate 3,
> while swapping, crashes soon on mmotm (and so presumably on linux-next):
> for example, spinlock found corrupted when lock_page_memcg() is called.
> It's as if the mem cgroup structures have been freed too early.
> 
> Stab in the dark: what if all the accounting is right, except that the
> css_put_many() in __mem_cgroup_clear_mc() is now (worse than) redundant?
> Removing it fixes the crashes, but that's hardly surprising; and stats
> temporarily hacked into mem_cgroup_css_alloc() and mem_cgroup_css_free()
> showed that mem cgroups were not being leaked with this change.
> 
> Note: this removes the last call to css_put_many() from the tree; and
> mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations.patch
> removes the last call to css_get_many(): now that their last references
> have gone, I expect them soon to be freed from include/linux/cgroup.h.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> Fixes mm-memcontrol-decouple-reference-counting-from-page-accounting.patch
> 
>  mm/memcontrol.c |    2 --
>  1 file changed, 2 deletions(-)
> 
> --- mmotm/mm/memcontrol.c	2020-07-27 18:55:00.700554752 -0700
> +++ linux/mm/memcontrol.c	2020-07-30 12:05:00.640091618 -0700
> @@ -5887,8 +5887,6 @@ static void __mem_cgroup_clear_mc(void)
>  		if (!mem_cgroup_is_root(mc.to))
>  			page_counter_uncharge(&mc.to->memory, mc.moved_swap);
>  
> -		css_put_many(&mc.to->css, mc.moved_swap);
> -
>  		mc.moved_swap = 0;
>  	}
>  	memcg_oom_recover(from);

Acked-by: Roman Gushchin <guro@fb.com>

Good catch!

Thank you, Hugh!


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-31  3:17 [PATCH mmotm] mm: memcontrol: decouple reference counting from page accounting fix Hugh Dickins
2020-07-31 15:04 ` Johannes Weiner
2020-08-01  1:20 ` Roman Gushchin

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git