From: Johannes Weiner <hannes@cmpxchg.org>
To: Greg Thelen <gthelen@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg
Date: Thu, 26 Oct 2017 10:31:40 -0400
Message-ID: <20171026143140.GB21147@cmpxchg.org>
In-Reply-To: <xr931slqdery.fsf@gthelen.svl.corp.google.com>

On Wed, Oct 25, 2017 at 03:49:21PM -0700, Greg Thelen wrote:
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > On Wed, Oct 25, 2017 at 09:00:57PM +0200, Michal Hocko wrote:
> >> On Wed 25-10-17 14:11:06, Johannes Weiner wrote:
> >> > "Safe" is a vague term, and it doesn't make much sense to me in this
> >> > situation. The OOM behavior should be predictable and consistent.
> >> > 
> >> > Yes, global might in the rarest cases also return -ENOMEM. Maybe. We
> >> > don't have to do that in memcg because we're not physically limited.
> >> 
> >> OK, so here seems to be the biggest disconnect. Being physically or
> >> artificially constrained shouldn't make much difference IMHO. In both
> >> cases the resource is simply limited for the consumer. And once all the
> >> attempts to fit within the limit fail then the request for the resource
> >> has to fail.
> >
> > It's a huge difference. In the global case, we have to make trade-offs
> > to not deadlock the kernel. In the memcg case, we have to make a trade
> > off between desirable OOM behavior and desirable meaning of memory.max.
> >
> > If we can borrow a resource temporarily from the ether to resolve the
> > OOM situation, I don't see why we shouldn't. We're only briefly
> > ignoring the limit to make sure the allocating task isn't preventing
> > the OOM victim from exiting or the OOM reaper from reaping. It's more
> > of an implementation detail than interface.
> >
> > The only scenario you brought up where this might be the permanent
> > overrun is the single, oom-disabled task. And I explained why that is
> > a silly argument, why that's the least problematic consequence of
> > oom-disabling, and why it probably shouldn't even be configurable.
> >
> > The idea that memory.max must never be breached is an extreme and
> > narrow view. As Greg points out, there are allocations we do not even
> > track. There are other scenarios that force allocations. They may
> > violate the limit on paper, but they're not notably weakening the goal
> > of memory.max - isolating workloads from each other.
> >
> > Let's look at it this way.
> >
> > There are two deadlock relationships the OOM killer needs to solve
> > between the triggerer and the potential OOM victim:
> >
> > 	#1 Memory. The triggerer needs memory that the victim has,
> > 	    but the victim needs some additional memory to release it.
> >
> > 	#2 Locks. The triggerer needs memory that the victim has, but
> > 	    the victim needs a lock the triggerer holds to release it.
> >
> > We have no qualms letting the victim temporarily (until the victim's
> > exit) ignore memory.max to resolve the memory deadlock #1.
> >
> > I don't understand why it's such a stretch to let the triggerer
> > temporarily (until the victim's exit) ignore memory.max to resolve the
> > locks deadlock #2. [1]
> >
> > We need both for the OOM killer to function correctly.
> >
> > We've solved #1 both for memcg and globally. But we haven't solved #2.
> > Global can still deadlock, and memcg copped out and returns -ENOMEM.
> >
> > Adding speculative OOM killing before the -ENOMEM makes things more
> > muddy and unpredictable. It doesn't actually solve deadlock #2.
> >
> > [1] And arguably that's what we should be doing in the global case
> >     too: give the triggerer access to reserves. If you recall this
> >     thread here: https://patchwork.kernel.org/patch/6088511/
> >
> >> > > So the only change I am really proposing is to keep retrying as long
> >> > > as the oom killer makes a forward progress and ENOMEM otherwise.
> >> > 
> >> > That's the behavior change I'm against.
> >> 
> >> So just to make it clear you would be OK with the retry on successful
> >> OOM killer invocation and force charge on oom failure, right?
> >
> > Yeah, that sounds reasonable to me.
> 
> Assuming we're talking about retrying within try_charge(), then there's
> a detail to iron out...
> 
> If there is a pending oom victim blocked on a lock held by try_charge() caller
> (the "#2 Locks" case), then I think repeated calls to out_of_memory() will
> return true until the victim either gets MMF_OOM_SKIP or disappears.  So a force
> charge fallback might be needed even with successful oom killer invocations.
> Or we'll need to teach out_of_memory() to return three values (e.g. NO_VICTIM,
> NEW_VICTIM, PENDING_VICTIM) and try_charge() can loop on NEW_VICTIM.

True. I was assuming we'd retry MEM_CGROUP_RECLAIM_RETRIES times at a
maximum, even if the OOM killer indicates a kill has been issued. What
you propose makes sense too.
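To make the control flow concrete, here is a rough userspace sketch of
the policy being discussed: bounded retries, a tri-state out_of_memory()
return along the lines Greg suggested, and a force-charge fallback
instead of -ENOMEM. All names, the enum, and the stubs are hypothetical
illustrations; the real try_charge() in mm/memcontrol.c differs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state return for out_of_memory(), per Greg's suggestion. */
enum oom_result {
	OOM_NO_VICTIM,      /* nothing killable: no point retrying */
	OOM_NEW_VICTIM,     /* fresh kill issued: retrying reclaim may help */
	OOM_PENDING_VICTIM, /* earlier victim still exiting (possibly blocked
			     * on a lock we hold): retrying is futile */
};

#define MEM_CGROUP_RECLAIM_RETRIES 5

/* Test knobs standing in for the real memcg machinery. */
static int reclaim_attempts_needed;  /* reclaim succeeds after N failed tries */
static enum oom_result oom_answer;   /* what out_of_memory() reports */

static bool try_to_free_mem_cgroup_pages(void)
{
	return reclaim_attempts_needed-- <= 0;
}

static enum oom_result out_of_memory(void)
{
	return oom_answer;
}

/*
 * Sketch of the retry policy: retry at most MEM_CGROUP_RECLAIM_RETRIES
 * times even when the OOM killer reports a fresh kill, stop early when
 * no forward progress is expected, and fall back to force-charging
 * (briefly overrunning memory.max) rather than returning -ENOMEM, so a
 * triggerer holding a lock the victim needs cannot deadlock the cgroup.
 * Returns 0 if the charge fit under the limit, 1 if it was forced.
 */
static int try_charge(void)
{
	int retries = MEM_CGROUP_RECLAIM_RETRIES;

	while (retries--) {
		if (try_to_free_mem_cgroup_pages())
			return 0;               /* charge fits under the limit */
		if (out_of_memory() != OOM_NEW_VICTIM)
			break;                  /* no new victim: stop retrying */
	}
	/* Force the charge: a temporary overrun beats a deadlock. */
	return 1;
}
```

The point of the PENDING_VICTIM case is exactly the "#2 Locks" scenario
above: out_of_memory() keeps reporting an unreaped victim, so looping on
it gains nothing and the caller force-charges instead.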
