linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joonsoo Kim <js1304@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	greg@suse.cz, Linus Torvalds <torvalds@linux-foundation.org>,
	Markus Trippelsdorf <markus@trippelsdorf.de>,
	Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>,
	Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>,
	Jiri Slaby <jslaby@suse.com>, Olaf Hering <olaf@aepfle.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: OOM detection regressions since 4.7
Date: Wed, 24 Aug 2016 16:29:20 +0900	[thread overview]
Message-ID: <CAAmzW4MTbBbo54op_9sZ1kE9XPBsvsx=n=_vRXa21KtuFMiJ3w@mail.gmail.com> (raw)
In-Reply-To: <20160824070442.GB31179@dhcp22.suse.cz>

2016-08-24 16:04 GMT+09:00 Michal Hocko <mhocko@kernel.org>:
> On Wed 24-08-16 14:01:57, Joonsoo Kim wrote:
>> Looks like my mail client eat my reply so I resend.
>>
>> On Tue, Aug 23, 2016 at 09:33:18AM +0200, Michal Hocko wrote:
>> > On Tue 23-08-16 13:52:45, Joonsoo Kim wrote:
>> > [...]
>> > > Hello, Michal.
>> > >
>> > > I agree with partial revert but revert should be a different form.
>> > > Below change try to reuse should_compact_retry() version for
>> > > !CONFIG_COMPACTION but it turned out that it also causes regression in
>> > > Markus report [1].
>> >
>> > I would argue that CONFIG_COMPACTION=n behaves so arbitrary for high
>> > order workloads that calling any change in that behavior a regression
>> > is little bit exaggerated. Disabling compaction should have a very
>> > strong reason. I haven't heard any so far. I am even wondering whether
>> > there is a legitimate reason for that these days.
>> >
>> > > Theoretical reason for this regression is that it would stop retry
>> > > even if there are enough lru pages. It only checks if freepage
>> > > excesses min watermark or not for retry decision. To prevent
>> > > pre-mature OOM killer, we need to keep allocation loop when there are
>> > > enough lru pages. So, logic should be something like that.
>> > >
>> > > should_compact_retry()
>> > > {
>> > >         for_each_zone_zonelist_nodemask {
>> > >                 available = zone_reclaimable_pages(zone);
>> > >                 available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
>> > >                 if (__zone_watermark_ok(zone, *0*, min_wmark_pages(zone),
>> > >                         ac_classzone_idx(ac), alloc_flags, available))
>> > >                         return true;
>> > >
>> > >         }
>> > > }
>> > >
>> > > I suggested it before and current situation looks like it is indeed
>> > > needed.
>> >
>> > this just opens doors for an unbounded reclaim/threshing becacause
>> > you can reclaim as much as you like and there is no guarantee of a
>> > forward progress. The reason why !COMPACTION should_compact_retry only
>> > checks for the min_wmark without the reclaimable bias is that this will
>> > guarantee a retry if we are failing due to high order wmark check rather
>> > than a lack of memory. This condition is guaranteed to converge and the
>> > probability of the unbounded reclaim is much more reduced.
>>
>> In case of a lack of memory with a lot of reclaimable lru pages, why
>> do we stop reclaim/compaction?
>>
>> With your partial reverting patch, allocation logic would be like as
>> following.
>>
>> Assume following situation:
>> o a lot of reclaimable lru pages
>> o no order-2 freepage
>> o not enough order-0 freepage for min watermark
>> o order-2 allocation
>>
>> 1. order-2 allocation failed due to min watermark
>> 2. go to reclaim/compaction
>> 3. reclaim some pages (maybe SWAP_CLUSTER_MAX (32) pages) but still
>> min watermark isn't met for order-0
>> 4. compaction is skipped due to not enough freepage
>> 5. should_reclaim_retry() returns false because min watermark for
>> order-2 page isn't met
>> 6. should_compact_retry() returns false because min watermark for
>> order-0 page isn't met
>> 6. allocation is failed without any retry and OOM is invoked.
>
> If the direct reclaim is not able to get us over min wmark for order-0
> then we would be likely to hit the oom even for order-0 requests.

No, this situation is that direct reclaim can get us over min wmark for order-0
but it needs retry. IIUC, direct reclaim would not reclaim enough memory
at once. It tries to reclaim small amount of lru pages and break out to check
watermark.

>> Is it what you want?
>>
>> And, please elaborate more on how your logic guarantee to converge.
>> After order-0 freepage exceed min watermark, there is no way to stop
>> reclaim/threshing. Number of freepage just increase monotonically and
>> retry cannot be stopped until order-2 allocation succeed. Am I missing
>> something?
>
> My statement was imprecise at best. You are right that there is no
> guarantee to fullfil order-2 request. What I meant to say is that we
> should converge when we are getting out of memory (aka even order-0
> would have hard time to succeed). should_reclaim_retry does that by
> the back off scaling of the reclaimable pages. should_compact_retry
> would have to do the same thing which would effectively turn it into
> should_reclaim_retry.

So, I suggested to change should_reclaim_retry() for high order request,
before.

>> > > And, I still think that your OOM detection rework has some flaws.
>> > >
>> > > 1) It doesn't consider freeable objects that can be freed by shrink_slab().
>> > > There are many subsystems that cache many objects and they will be
>> > > freed by shrink_slab() interface. But, you don't account them when
>> > > making the OOM decision.
>> >
>> > I fully rely on the reclaim and compaction feedback. And that is the
>> > place where we should strive for improvements. So if we are growing way
>> > too many slab objects we should take care about that in the slab reclaim
>> > which is tightly coupled with the LRU reclaim rather than up the layer
>> > in the page allocator.
>>
>> No. slab shrink logic which is tightly coupled with the LRU reclaim
>> totally makes sense.
>
> Once the number of slab object is much larger than LRU pages (what we
> have seen in some oom reports) then the way how they are coupled just
> stops making a sense because the current approach no longer scales.  We
> might not have cared before because we used to retry blindly.  At least
> that is my understanding.

If your logic guarantee to retry until number of lru pages are scanned,
it would work well. It's not a problem of slab shrink.

> I am sorry to skip large parts of your email but I believe those things
> have been discussed and we would just repeat here. I full understand

Okay. We discussed it several times and I'm also tired to discuss this topic.

Thanks.

      reply	other threads:[~2016-08-24  7:29 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-22  9:32 OOM detection regressions since 4.7 Michal Hocko
2016-08-22  9:37 ` Michal Hocko
2016-08-22 10:05   ` Greg KH
2016-08-22 10:54     ` Michal Hocko
2016-08-22 13:31       ` Greg KH
2016-08-22 13:42         ` Michal Hocko
2016-08-22 14:02           ` Greg KH
2016-08-22 22:05           ` Andrew Morton
2016-08-23  7:43             ` Michal Hocko
2016-08-25  7:11               ` Michal Hocko
2016-08-25  7:17                 ` Olaf Hering
2016-08-29 14:52                   ` Olaf Hering
2016-08-29 14:54                     ` Olaf Hering
2016-08-29 15:07                     ` Michal Hocko
2016-08-29 15:59                       ` Olaf Hering
2016-08-29 17:28                     ` Linus Torvalds
2016-08-29 17:52                       ` Jeff Layton
2016-08-28  5:50                 ` Arkadiusz Miskiewicz
2016-08-25 20:30               ` Ralf-Peter Rohbeck
2016-08-26  6:26                 ` Michal Hocko
2016-08-26 20:17                   ` Ralf-Peter Rohbeck
2016-08-22 10:16 ` Markus Trippelsdorf
2016-08-22 10:56   ` Michal Hocko
2016-08-22 11:01     ` Markus Trippelsdorf
2016-08-22 11:13       ` Michal Hocko
2016-08-22 11:20         ` Markus Trippelsdorf
2016-08-23  4:52 ` Joonsoo Kim
2016-08-23  7:33   ` Michal Hocko
2016-08-23  7:40     ` Markus Trippelsdorf
2016-08-23  7:48       ` Michal Hocko
2016-08-23 19:08     ` Linus Torvalds
2016-08-24  6:32       ` Michal Hocko
2016-08-24  5:01     ` Joonsoo Kim
2016-08-24  7:04       ` Michal Hocko
2016-08-24  7:29         ` Joonsoo Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAmzW4MTbBbo54op_9sZ1kE9XPBsvsx=n=_vRXa21KtuFMiJ3w@mail.gmail.com' \
    --to=js1304@gmail.com \
    --cc=Ralf-Peter.Rohbeck@quantum.com \
    --cc=a.miskiewicz@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=greg@suse.cz \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jslaby@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=markus@trippelsdorf.de \
    --cc=mhocko@kernel.org \
    --cc=olaf@aepfle.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).