[PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation
@ 2018-09-17 23:10 Roman Gushchin
  2018-09-25 15:58 ` Roman Gushchin
  2018-09-25 18:58 ` Michal Hocko
  0 siblings, 2 replies; 5+ messages in thread
From: Roman Gushchin @ 2018-09-17 23:10 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Kernel Team, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Vladimir Davydov

The memcg OOM killer is never invoked due to a failed high-order
allocation, however the MEMCG_OOM event can be raised.

As shown below, it can happen under conditions, which are very
far from a real OOM: e.g. there is plenty of clean pagecache
and low memory pressure.

There is no sense in raising an OOM event in such a case,
as it might confuse a user and lead to wrong and excessive actions.

Let's look at the charging path in try_caharge(). If the memory usage
is about memory.max, which is absolutely natural for most memory cgroups,
we try to reclaim some pages. Even if we were able to reclaim
enough memory for the allocation, the following check can fail due to
a race with another concurrent allocation:

    if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
        goto retry;

For regular pages the following condition will save us from triggering
the OOM:

   if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
       goto retry;

But for high-order allocation this condition will intentionally fail.
The reason behind is that we'll likely fall to regular pages anyway,
so it's ok and even preferred to return ENOMEM.

In this case the idea of raising MEMCG_OOM looks dubious.

Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after
allocation order check, so that the event won't be raised for high
order allocations. This change doesn't affect regular pages allocation
and charging.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fcec9b39e2a3..103ca3c31c04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1669,6 +1669,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	if (order > PAGE_ALLOC_COSTLY_ORDER)
 		return OOM_SKIPPED;

+	memcg_memory_event(memcg, MEMCG_OOM);
+
 	/*
 	 * We are in the middle of the charge context here, so we
 	 * don't want to block when potentially sitting on a callstack
@@ -2250,8 +2252,6 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (fatal_signal_pending(current))
 		goto force;

-	memcg_memory_event(mem_over_limit, MEMCG_OOM);
-
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation
  2018-09-17 23:10 [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation Roman Gushchin
@ 2018-09-25 15:58 ` Roman Gushchin
  2018-09-25 18:58 ` Michal Hocko
  1 sibling, 0 replies; 5+ messages in thread
From: Roman Gushchin @ 2018-09-25 15:58 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, kernel-team, Johannes Weiner, Michal Hocko,
	Vladimir Davydov

On Mon, Sep 17, 2018 at 04:08:46PM -0700, Roman Gushchin wrote:
> The memcg OOM killer is never invoked due to a failed high-order
> allocation, however the MEMCG_OOM event can be raised.
> 
> As shown below, it can happen under conditions, which are very
> far from a real OOM: e.g. there is plenty of clean pagecache
> and low memory pressure.
> 
> There is no sense in raising an OOM event in such a case,
> as it might confuse a user and lead to wrong and excessive actions.
> 
> Let's look at the charging path in try_caharge(). If the memory usage
> is about memory.max, which is absolutely natural for most memory cgroups,
> we try to reclaim some pages. Even if we were able to reclaim
> enough memory for the allocation, the following check can fail due to
> a race with another concurrent allocation:
> 
>     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
>         goto retry;
> 
> For regular pages the following condition will save us from triggering
> the OOM:
> 
>    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
>        goto retry;
> 
> But for high-order allocation this condition will intentionally fail.
> The reason behind is that we'll likely fall to regular pages anyway,
> so it's ok and even preferred to return ENOMEM.
> 
> In this case the idea of raising MEMCG_OOM looks dubious.
> 
> Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after
> allocation order check, so that the event won't be raised for high
> order allocations. This change doesn't affect regular pages allocation
> and charging.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>

I've tried to address all concerns and questions in the updated
changelog, so, hopefully, now it's clear why do we need this change.

Are there any comments, thoughts or objections left?

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation
  2018-09-17 23:10 [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation Roman Gushchin
  2018-09-25 15:58 ` Roman Gushchin
@ 2018-09-25 18:58 ` Michal Hocko
  2018-09-26  8:13   ` Roman Gushchin
  1 sibling, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2018-09-25 18:58 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, linux-kernel, Kernel Team, Johannes Weiner, Vladimir Davydov

On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> The memcg OOM killer is never invoked due to a failed high-order
> allocation, however the MEMCG_OOM event can be raised.
> 
> As shown below, it can happen under conditions, which are very
> far from a real OOM: e.g. there is plenty of clean pagecache
> and low memory pressure.
> 
> There is no sense in raising an OOM event in such a case,
> as it might confuse a user and lead to wrong and excessive actions.
> 
> Let's look at the charging path in try_caharge(). If the memory usage
> is about memory.max, which is absolutely natural for most memory cgroups,
> we try to reclaim some pages. Even if we were able to reclaim
> enough memory for the allocation, the following check can fail due to
> a race with another concurrent allocation:
> 
>     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
>         goto retry;
> 
> For regular pages the following condition will save us from triggering
> the OOM:
> 
>    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
>        goto retry;
> 
> But for high-order allocation this condition will intentionally fail.
> The reason behind is that we'll likely fall to regular pages anyway,
> so it's ok and even preferred to return ENOMEM.
> 
> In this case the idea of raising MEMCG_OOM looks dubious.

I would really appreciate an example of application that would get
confused by consuming this event and an explanation why. I do agree that
the event itself is kinda weird because it doesn't give you any context
for what kind of requests the memcg is OOM. Costly orders are a little
different story than others and users shouldn't care about this because
this is a mere implementation detail.

In other words, do we have any users to actually care about this half
baked event at all? Shouldn't we simply stop emiting it (or make it an
alias of OOM_KILL) rather than making it slightly better but yet kinda
incomplete?

Jeez, we really suck at defining proper interfaces. Things seem so cool
when they are proposed, then those users come and ruin our lives...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation
  2018-09-25 18:58 ` Michal Hocko
@ 2018-09-26  8:13   ` Roman Gushchin
  2018-09-26  8:24     ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Roman Gushchin @ 2018-09-26  8:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Kernel Team, Johannes Weiner, Vladimir Davydov

On Tue, Sep 25, 2018 at 08:58:45PM +0200, Michal Hocko wrote:
> On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> > The memcg OOM killer is never invoked due to a failed high-order
> > allocation, however the MEMCG_OOM event can be raised.
> > 
> > As shown below, it can happen under conditions, which are very
> > far from a real OOM: e.g. there is plenty of clean pagecache
> > and low memory pressure.
> > 
> > There is no sense in raising an OOM event in such a case,
> > as it might confuse a user and lead to wrong and excessive actions.
> > 
> > Let's look at the charging path in try_caharge(). If the memory usage
> > is about memory.max, which is absolutely natural for most memory cgroups,
> > we try to reclaim some pages. Even if we were able to reclaim
> > enough memory for the allocation, the following check can fail due to
> > a race with another concurrent allocation:
> > 
> >     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> >         goto retry;
> > 
> > For regular pages the following condition will save us from triggering
> > the OOM:
> > 
> >    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> >        goto retry;
> > 
> > But for high-order allocation this condition will intentionally fail.
> > The reason behind is that we'll likely fall to regular pages anyway,
> > so it's ok and even preferred to return ENOMEM.
> > 
> > In this case the idea of raising MEMCG_OOM looks dubious.
> 
> I would really appreciate an example of application that would get
> confused by consuming this event and an explanation why. I do agree that
> the event itself is kinda weird because it doesn't give you any context
> for what kind of requests the memcg is OOM. Costly orders are a little
> different story than others and users shouldn't care about this because
> this is a mere implementation detail.

Our container management system (called Tupperware) used the OOM event
as a signal that a workload might be affected by the OOM killer, so
it restarted the corresponding container.

I started looking at this problem, when I was reported, that it sometimes
happens when there is a plenty of inactive page cache, and also there were
no signs that the OOM killer has been invoking at all.
The proposed patch resolves this problem.

> 
> In other words, do we have any users to actually care about this half
> baked event at all? Shouldn't we simply stop emiting it (or make it an
> alias of OOM_KILL) rather than making it slightly better but yet kinda
> incomplete?

The only problem with OOM_KILL I see is that OOM_KILL might not be raised
at all, if the OOM killer is not able to find an appropriate victim.
For instance, if all tasks are oom protected (oom_score_adj set to -1000).
Also, with oom.group it might be raised multiple times.

I'm not against an idea to deprecate it in some way (neither I'm completely
sold), but in any case the proposed change isn't a step into a wrong direction:
under the conditions I described there is absolutely no sense in raising
the OOM event.

> 
> Jeez, we really suck at defining proper interfaces. Things seem so cool
> when they are proposed, then those users come and ruin our lives...

:)


Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation
  2018-09-26  8:13   ` Roman Gushchin
@ 2018-09-26  8:24     ` Michal Hocko
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2018-09-26  8:24 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, linux-kernel, Kernel Team, Johannes Weiner, Vladimir Davydov

On Wed 26-09-18 09:13:43, Roman Gushchin wrote:
> On Tue, Sep 25, 2018 at 08:58:45PM +0200, Michal Hocko wrote:
> > On Mon 17-09-18 23:10:59, Roman Gushchin wrote:
> > > The memcg OOM killer is never invoked due to a failed high-order
> > > allocation, however the MEMCG_OOM event can be raised.
> > > 
> > > As shown below, it can happen under conditions, which are very
> > > far from a real OOM: e.g. there is plenty of clean pagecache
> > > and low memory pressure.
> > > 
> > > There is no sense in raising an OOM event in such a case,
> > > as it might confuse a user and lead to wrong and excessive actions.
> > > 
> > > Let's look at the charging path in try_caharge(). If the memory usage
> > > is about memory.max, which is absolutely natural for most memory cgroups,
> > > we try to reclaim some pages. Even if we were able to reclaim
> > > enough memory for the allocation, the following check can fail due to
> > > a race with another concurrent allocation:
> > > 
> > >     if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> > >         goto retry;
> > > 
> > > For regular pages the following condition will save us from triggering
> > > the OOM:
> > > 
> > >    if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> > >        goto retry;
> > > 
> > > But for high-order allocation this condition will intentionally fail.
> > > The reason behind is that we'll likely fall to regular pages anyway,
> > > so it's ok and even preferred to return ENOMEM.
> > > 
> > > In this case the idea of raising MEMCG_OOM looks dubious.
> > 
> > I would really appreciate an example of application that would get
> > confused by consuming this event and an explanation why. I do agree that
> > the event itself is kinda weird because it doesn't give you any context
> > for what kind of requests the memcg is OOM. Costly orders are a little
> > different story than others and users shouldn't care about this because
> > this is a mere implementation detail.
> 
> Our container management system (called Tupperware) used the OOM event
> as a signal that a workload might be affected by the OOM killer, so
> it restarted the corresponding container.
> 
> I started looking at this problem, when I was reported, that it sometimes
> happens when there is a plenty of inactive page cache, and also there were
> no signs that the OOM killer has been invoking at all.
> The proposed patch resolves this problem.

Thanks! This is exactly the kind of information that should be in the
changelog. With the changelog updated and an explicit note in the
documentation that the event is triggered only when the memcg is _going_
to consider the oom killer as the only option you can add

Acked-by: Michal Hocko <mhocko@suse.com>

> > In other words, do we have any users to actually care about this half
> > baked event at all? Shouldn't we simply stop emiting it (or make it an
> > alias of OOM_KILL) rather than making it slightly better but yet kinda
> > incomplete?
> 
> The only problem with OOM_KILL I see is that OOM_KILL might not be raised
> at all, if the OOM killer is not able to find an appropriate victim.
> For instance, if all tasks are oom protected (oom_score_adj set to -1000).

This is a very good point.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-09-26  8:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17 23:10 [PATCH RESEND] mm: don't raise MEMCG_OOM event due to failed high-order allocation Roman Gushchin
2018-09-25 15:58 ` Roman Gushchin
2018-09-25 18:58 ` Michal Hocko
2018-09-26  8:13   ` Roman Gushchin
2018-09-26  8:24     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).