linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] mm, memcg: do not retry precharge charges
@ 2017-01-12  4:32 David Rientjes
  2017-01-12 10:00 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: David Rientjes @ 2017-01-12  4:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Vladimir Davydov, Michal Hocko, cgroups,
	linux-kernel, linux-mm

When memory.move_charge_at_immigrate is enabled and precharges are
depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
increase the size of the precharge.

This livelocks if reclaim fails and if an oom killed process attached to
the destination memcg is trying to exit, which requires 
cgroup_threadgroup_rwsem, since we're holding the mutex (we also livelock
while holding mm->mmap_sem for read).

Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
pointless as written.

This also restructures mem_cgroup_wait_acct_move() since it is not
possible for mc.moving_task to be current.

Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/memcontrol.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1125,18 +1125,19 @@ static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
 
 static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
 {
-	if (mc.moving_task && current != mc.moving_task) {
-		if (mem_cgroup_under_move(memcg)) {
-			DEFINE_WAIT(wait);
-			prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
-			/* moving charge context might have finished. */
-			if (mc.moving_task)
-				schedule();
-			finish_wait(&mc.waitq, &wait);
-			return true;
-		}
+	DEFINE_WAIT(wait);
+
+	if (likely(!mem_cgroup_under_move(memcg)))
+		return false;
+
+	prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
+	/* moving charge context might have finished. */
+	if (mc.moving_task) {
+		WARN_ON_ONCE(mc.moving_task == current);
+		schedule();
 	}
-	return false;
+	finish_wait(&mc.waitq, &wait);
+	return true;
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
@@ -4355,9 +4356,14 @@ static int mem_cgroup_do_precharge(unsigned long count)
 		return ret;
 	}
 
-	/* Try charges one by one with reclaim */
+	/*
+	 * Try charges one by one with reclaim, but do not retry.  This avoids
+	 * looping forever when try_charge() cannot reclaim memory and the oom
+	 * killer defers while waiting for a process to exit which is trying to
+	 * acquire cgroup_threadgroup_rwsem in the exit path.
+	 */
 	while (count--) {
-		ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
+		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
 		if (ret)
 			return ret;
 		mc.precharge++;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] mm, memcg: do not retry precharge charges
  2017-01-12  4:32 [patch] mm, memcg: do not retry precharge charges David Rientjes
@ 2017-01-12 10:00 ` Michal Hocko
  2017-01-12 10:17 ` Michal Hocko
  2017-01-12 22:46 ` [patch v2] " David Rientjes
  2 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-01-12 10:00 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

On Wed 11-01-17 20:32:12, David Rientjes wrote:
> When memory.move_charge_at_immigrate is enabled and precharges are
> depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
> increase the size of the precharge.
> 
> This livelocks if reclaim fails and if an oom killed process attached to
> the destination memcg is trying to exit, which requires 
> cgroup_threadgroup_rwsem, since we're holding the mutex (we also livelock
> while holding mm->mmap_sem for read).

Is this really the case? try_charge will return with ENOMEM for
GFP_KERNEL requests and mem_cgroup_do_precharge will bail out. So how
exactly do we livelock? We do not depend on the exiting task to make a
forward progress. Or am I missing something?

> Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
> probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
> pointless as written.

Yes the current code is clearly bogus, I really do not remember why we
ended up with this rather than GFP_KERNEL | __GFP_NORETRY.
 
> This also restructures mem_cgroup_wait_acct_move() since it is not
> possible for mc.moving_task to be current.

Please separate this out to its own patch.

> Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
> Signed-off-by: David Rientjes <rientjes@google.com>

For the mem_cgroup_do_precharge part
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1125,18 +1125,19 @@ static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
>  
>  static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  {
> -	if (mc.moving_task && current != mc.moving_task) {
> -		if (mem_cgroup_under_move(memcg)) {
> -			DEFINE_WAIT(wait);
> -			prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> -			/* moving charge context might have finished. */
> -			if (mc.moving_task)
> -				schedule();
> -			finish_wait(&mc.waitq, &wait);
> -			return true;
> -		}
> +	DEFINE_WAIT(wait);
> +
> +	if (likely(!mem_cgroup_under_move(memcg)))
> +		return false;
> +
> +	prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> +	/* moving charge context might have finished. */
> +	if (mc.moving_task) {
> +		WARN_ON_ONCE(mc.moving_task == current);
> +		schedule();
>  	}
> -	return false;
> +	finish_wait(&mc.waitq, &wait);
> +	return true;
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -4355,9 +4356,14 @@ static int mem_cgroup_do_precharge(unsigned long count)
>  		return ret;
>  	}
>  
> -	/* Try charges one by one with reclaim */
> +	/*
> +	 * Try charges one by one with reclaim, but do not retry.  This avoids
> +	 * looping forever when try_charge() cannot reclaim memory and the oom
> +	 * killer defers while waiting for a process to exit which is trying to
> +	 * acquire cgroup_threadgroup_rwsem in the exit path.
> +	 */
>  	while (count--) {
> -		ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
> +		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
>  		if (ret)
>  			return ret;
>  		mc.precharge++;

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] mm, memcg: do not retry precharge charges
  2017-01-12  4:32 [patch] mm, memcg: do not retry precharge charges David Rientjes
  2017-01-12 10:00 ` Michal Hocko
@ 2017-01-12 10:17 ` Michal Hocko
  2017-01-12 22:46 ` [patch v2] " David Rientjes
  2 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-01-12 10:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

On Wed 11-01-17 20:32:12, David Rientjes wrote:
[...]
> This also restructures mem_cgroup_wait_acct_move() since it is not
> possible for mc.moving_task to be current.

thinking about this some more, I do not think this is the right way to
go. It is true that we will not reach mem_cgroup_wait_acct_move if all
the charges from the task moving code path are __GFP_NORETRY but that
is quite subtle requirement IMHO.

> Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/memcontrol.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1125,18 +1125,19 @@ static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
>  
>  static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  {
> -	if (mc.moving_task && current != mc.moving_task) {
> -		if (mem_cgroup_under_move(memcg)) {
> -			DEFINE_WAIT(wait);
> -			prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> -			/* moving charge context might have finished. */
> -			if (mc.moving_task)
> -				schedule();
> -			finish_wait(&mc.waitq, &wait);
> -			return true;
> -		}
> +	DEFINE_WAIT(wait);
> +
> +	if (likely(!mem_cgroup_under_move(memcg)))
> +		return false;
> +
> +	prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> +	/* moving charge context might have finished. */
> +	if (mc.moving_task) {
> +		WARN_ON_ONCE(mc.moving_task == current);
> +		schedule();
>  	}
> -	return false;
> +	finish_wait(&mc.waitq, &wait);
> +	return true;
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch v2] mm, memcg: do not retry precharge charges
  2017-01-12  4:32 [patch] mm, memcg: do not retry precharge charges David Rientjes
  2017-01-12 10:00 ` Michal Hocko
  2017-01-12 10:17 ` Michal Hocko
@ 2017-01-12 22:46 ` David Rientjes
  2017-01-13  8:40   ` Michal Hocko
  2 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2017-01-12 22:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Vladimir Davydov, Michal Hocko, cgroups,
	linux-kernel, linux-mm

When memory.move_charge_at_immigrate is enabled and precharges are
depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
increase the size of the precharge.

This can be allowed to do reclaim, but should not call the oom killer to
oom kill a process.  It's better to fail the attach rather than oom kill
a process attached to the memcg hierarchy.

Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
pointless as written.

Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/memcontrol.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4353,9 +4353,12 @@ static int mem_cgroup_do_precharge(unsigned long count)
 		return ret;
 	}
 
-	/* Try charges one by one with reclaim */
+	/*
+	 * Try charges one by one with reclaim, but do not retry.  This avoids
+	 * calling the oom killer when the precharge should just fail.
+	 */
 	while (count--) {
-		ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
+		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
 		if (ret)
 			return ret;
 		mc.precharge++;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch v2] mm, memcg: do not retry precharge charges
  2017-01-12 22:46 ` [patch v2] " David Rientjes
@ 2017-01-13  8:40   ` Michal Hocko
  2017-01-13 10:09     ` David Rientjes
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2017-01-13  8:40 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

On Thu 12-01-17 14:46:34, David Rientjes wrote:
> When memory.move_charge_at_immigrate is enabled and precharges are
> depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
> increase the size of the precharge.
> 
> This can be allowed to do reclaim, but should not call the oom killer to
> oom kill a process.  It's better to fail the attach rather than oom kill
> a process attached to the memcg hierarchy.

This is not the case though since 3812c8c8f395 ("mm: memcg: do not trap
chargers with full callstack on OOM") - 3.12. Only the page fault path
is allowed to trigger the oom killer. 

> Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
> probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
> pointless as written.
> 
> Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
> Signed-off-by: David Rientjes <rientjes@google.com>

Without the note about the oom killer you can add
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/memcontrol.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4353,9 +4353,12 @@ static int mem_cgroup_do_precharge(unsigned long count)
>  		return ret;
>  	}
>  
> -	/* Try charges one by one with reclaim */
> +	/*
> +	 * Try charges one by one with reclaim, but do not retry.  This avoids
> +	 * calling the oom killer when the precharge should just fail.
> +	 */
>  	while (count--) {
> -		ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
> +		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
>  		if (ret)
>  			return ret;
>  		mc.precharge++;

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch v2] mm, memcg: do not retry precharge charges
  2017-01-13  8:40   ` Michal Hocko
@ 2017-01-13 10:09     ` David Rientjes
  2017-01-14 16:22       ` Johannes Weiner
  0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2017-01-13 10:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

When memory.move_charge_at_immigrate is enabled and precharges are
depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
increase the size of the precharge.

Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
pointless as written.

Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4353,9 +4353,9 @@ static int mem_cgroup_do_precharge(unsigned long count)
 		return ret;
 	}
 
-	/* Try charges one by one with reclaim */
+	/* Try charges one by one with reclaim, but do not retry */
 	while (count--) {
-		ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_NORETRY, 1);
+		ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
 		if (ret)
 			return ret;
 		mc.precharge++;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch v2] mm, memcg: do not retry precharge charges
  2017-01-13 10:09     ` David Rientjes
@ 2017-01-14 16:22       ` Johannes Weiner
  2017-01-15  5:42         ` David Rientjes
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2017-01-14 16:22 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Michal Hocko, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

On Fri, Jan 13, 2017 at 02:09:53AM -0800, David Rientjes wrote:
> When memory.move_charge_at_immigrate is enabled and precharges are
> depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
> increase the size of the precharge.
> 
> Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
> probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
> pointless as written.

The OOM killer livelock was the motivation for this patch. With that
ruled out, what's the point of this patch? Try a bit less hard to move
charges during task migration?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch v2] mm, memcg: do not retry precharge charges
  2017-01-14 16:22       ` Johannes Weiner
@ 2017-01-15  5:42         ` David Rientjes
  2017-01-15 15:19           ` Johannes Weiner
  2017-01-16  7:35           ` Michal Hocko
  0 siblings, 2 replies; 10+ messages in thread
From: David Rientjes @ 2017-01-15  5:42 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

On Sat, 14 Jan 2017, Johannes Weiner wrote:

> The OOM killer livelock was the motivation for this patch. With that
> ruled out, what's the point of this patch? Try a bit less hard to move
> charges during task migration?
> 

Most important part is to fail ->can_attach() instead of oom killing 
processes when attaching a process to a memcg hierarchy.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch v2] mm, memcg: do not retry precharge charges
  2017-01-15  5:42         ` David Rientjes
@ 2017-01-15 15:19           ` Johannes Weiner
  2017-01-16  7:35           ` Michal Hocko
  1 sibling, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2017-01-15 15:19 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Michal Hocko, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

On Sat, Jan 14, 2017 at 09:42:48PM -0800, David Rientjes wrote:
> On Sat, 14 Jan 2017, Johannes Weiner wrote:
> 
> > The OOM killer livelock was the motivation for this patch. With that
> > ruled out, what's the point of this patch? Try a bit less hard to move
> > charges during task migration?
> > 
> 
> Most important part is to fail ->can_attach() instead of oom killing 
> processes when attaching a process to a memcg hierarchy.

Ah, that makes sense.

Could you please update the changelog to reflect this? Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch v2] mm, memcg: do not retry precharge charges
  2017-01-15  5:42         ` David Rientjes
  2017-01-15 15:19           ` Johannes Weiner
@ 2017-01-16  7:35           ` Michal Hocko
  1 sibling, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-01-16  7:35 UTC (permalink / raw)
  To: David Rientjes
  Cc: Johannes Weiner, Andrew Morton, Vladimir Davydov, cgroups,
	linux-kernel, linux-mm

On Sat 14-01-17 21:42:48, David Rientjes wrote:
> On Sat, 14 Jan 2017, Johannes Weiner wrote:
> 
> > The OOM killer livelock was the motivation for this patch. With that
> > ruled out, what's the point of this patch? Try a bit less hard to move
> > charges during task migration?
> > 
> 
> Most important part is to fail ->can_attach() instead of oom killing 
> processes when attaching a process to a memcg hierarchy.

But we are not invoking the oom killer from this path even without
__GFP_NORETRY. Or am I missing your point?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-01-16  7:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12  4:32 [patch] mm, memcg: do not retry precharge charges David Rientjes
2017-01-12 10:00 ` Michal Hocko
2017-01-12 10:17 ` Michal Hocko
2017-01-12 22:46 ` [patch v2] " David Rientjes
2017-01-13  8:40   ` Michal Hocko
2017-01-13 10:09     ` David Rientjes
2017-01-14 16:22       ` Johannes Weiner
2017-01-15  5:42         ` David Rientjes
2017-01-15 15:19           ` Johannes Weiner
2017-01-16  7:35           ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).