LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>
Cc: mhocko@kernel.org, ying.huang@intel.com, s.priebe@profihost.ag,
	mgorman@techsingularity.net,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	alex.williamson@redhat.com, lkp@01.org,
	David Rientjes <rientjes@google.com>,
	kirill@shutemov.name, Andrew Morton <akpm@linux-foundation.org>,
	zi.yan@cs.rutgers.edu
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Date: Tue, 4 Dec 2018 10:22:26 +0100
Message-ID: <64a4aec6-3275-a716-8345-f021f6186d9b@suse.cz> (raw)
In-Reply-To: <CAHk-=whDg5+e2-eXYo-jwC1spt2r7JjLQSaLm4OyfGMQHLTrdw@mail.gmail.com>

On 12/3/18 11:27 PM, Linus Torvalds wrote:
> On Mon, Dec 3, 2018 at 2:04 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> so I think all of David's patch is somewhat sensible, even if that
>> specific "order == pageblock_order" test really looks like it might
>> want to be clarified.
> 
> Side note: I think maybe people should just look at that whole
> compaction logic for that block, because it doesn't make much sense to
> me:
> 
>                 /*
>                  * Checks for costly allocations with __GFP_NORETRY, which
>                  * includes THP page fault allocations
>                  */
>                 if (costly_order && (gfp_mask & __GFP_NORETRY)) {
>                         /*
>                          * If compaction is deferred for high-order allocations,
>                          * it is because sync compaction recently failed. If
>                          * this is the case and the caller requested a THP
>                          * allocation, we do not want to heavily disrupt the
>                          * system, so we fail the allocation instead of entering
>                          * direct reclaim.
>                          */
>                         if (compact_result == COMPACT_DEFERRED)
>                                 goto nopage;
> 
>                         /*
>                          * Looks like reclaim/compaction is worth trying, but
>                          * sync compaction could be very expensive, so keep
>                          * using async compaction.
>                          */
>                         compact_priority = INIT_COMPACT_PRIORITY;
>                 }
> 
> this is where David wants to add *his* odd test, and I think everybody
> looks at that added case
> 
> +                       if (order == pageblock_order &&
> +                                       !(current->flags & PF_KTHREAD))
> +                               goto nopage;
> 
> and just goes "Eww".
> 
> But I think the real problem is that it's the "goto nopage" thing that
> makes _sense_, and the current cases for "let's try compaction" that

More precisely it's "let's try reclaim + compaction".

> are the odd ones, and then David adds one new special case for the
> sensible behavior.
> 
> For example, why would COMPACT_DEFERRED mean "don't bother", but not
> all the other reasons it didn't really make sense?

COMPACT_DEFERRED means that compaction was failing recently, even with
sufficient free pages (e.g. freed by direct reclaim), so it doesn't make
sense to continue.
What are "all the other reasons"? __alloc_pages_direct_compact() could
have also returned COMPACT_SKIPPED, which means compaction actually
didn't happen at all, because there's not enough free pages.

> So does it really make sense to fall through AT ALL to that "retry"
> case, when we explicitly already had (gfp_mask & __GFP_NORETRY)?

Well if there was no free memory to begin with, and thus compaction
returned COMPACT_SKIPPED, then we didn't really "try" anything yet, so
there's nothing to "not retry".

> Maybe the real fix is to instead of adding yet another special case
> for "goto nopage", it should just be unconditional: simply don't try
> to compact large-pages if __GFP_NORETRY was set.

I think that would destroy THP success rates too much, in situations
where reclaim and compaction would succeed, because there's enough
easily reclaimable and migratable memory.

> Hmm? I dunno. Right now - for 4.20, I'd obviously want to keep changes
> smallish, so a hacky added special case might be the right thing to
> do. But the code does look odd, doesn't it?
> 
> I think part of it comes from the fact that we *used* to do the
> compaction first, and then we did the reclaim, and then it was
> re-orghanized to do reclaim first, but it tried to keep semantic
> changes minimal and some of the above comes from that re-org.

IIRC the point of reorg was that in typical case we actually do want to
try the reclaim first (or only), and the exception are those THP-ish
allocations where typically the problem is fragmentation, and not number
of free pages, so we check first if we can defragment the memory or
whether it makes sense to free pages in case the defragmentation is
expected to help afterwards. It seemed better to put this special case
out of the main reclaim/compaction retry-with-increasing-priority loop
for non-costly-order allocations that in general can't fail.

Vlastimil

> I think.
> 
>                 Linus
> 


  parent reply index

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-27  6:25 kernel test robot
2018-11-27 17:08 ` Linus Torvalds
2018-11-27 18:17   ` Michal Hocko
2018-11-27 18:21     ` Michal Hocko
2018-11-27 19:05   ` Vlastimil Babka
2018-11-27 19:16     ` Vlastimil Babka
2018-11-27 20:57   ` Andrea Arcangeli
2018-11-27 22:50     ` Linus Torvalds
2018-11-28  6:30       ` Michal Hocko
2018-11-28  3:20     ` Huang\, Ying
2018-11-28 16:48       ` Linus Torvalds
2018-11-28 18:39         ` Andrea Arcangeli
2018-11-28 23:10         ` David Rientjes
2018-12-03 18:01         ` Linus Torvalds
2018-12-03 18:14           ` Michal Hocko
2018-12-03 18:19             ` Linus Torvalds
2018-12-03 18:30               ` Michal Hocko
2018-12-03 18:45                 ` Linus Torvalds
2018-12-03 18:59                   ` Michal Hocko
2018-12-03 19:23                     ` Andrea Arcangeli
2018-12-03 20:26                       ` David Rientjes
2018-12-03 19:28                     ` Linus Torvalds
2018-12-03 20:12                       ` Andrea Arcangeli
2018-12-03 20:36                         ` David Rientjes
2018-12-03 22:04                         ` Linus Torvalds
2018-12-03 22:27                           ` Linus Torvalds
2018-12-03 22:57                             ` David Rientjes
2018-12-04  9:22                             ` Vlastimil Babka [this message]
2018-12-04 10:45                               ` Mel Gorman
2018-12-05  0:47                                 ` David Rientjes
2018-12-05  9:08                                   ` Michal Hocko
2018-12-05 10:43                                     ` Mel Gorman
2018-12-05 11:43                                       ` Michal Hocko
2018-12-05 10:06                                 ` Mel Gorman
2018-12-05 20:40                                 ` Andrea Arcangeli
2018-12-05 21:59                                   ` David Rientjes
2018-12-06  0:00                                     ` Andrea Arcangeli
2018-12-05 22:03                                   ` Linus Torvalds
2018-12-05 22:12                                     ` David Rientjes
2018-12-05 23:36                                     ` Andrea Arcangeli
2018-12-05 23:51                                       ` Linus Torvalds
2018-12-06  0:58                                         ` Linus Torvalds
2018-12-06  9:14                                           ` MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression) Michal Hocko
2018-12-06 23:49                                             ` David Rientjes
2018-12-07  7:34                                               ` Michal Hocko
2018-12-07  4:31                                             ` Linus Torvalds
2018-12-07  7:49                                               ` Michal Hocko
2018-12-07  9:06                                                 ` Vlastimil Babka
2018-12-07 23:15                                                   ` David Rientjes
2018-12-06 23:43                                           ` [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression David Rientjes
2018-12-07  4:01                                             ` Linus Torvalds
2018-12-10  0:29                                               ` David Rientjes
2018-12-10  4:49                                                 ` Andrea Arcangeli
2018-12-12  0:37                                                   ` David Rientjes
2018-12-12  9:50                                                     ` Michal Hocko
2018-12-12 17:00                                                       ` Andrea Arcangeli
2018-12-14 11:32                                                         ` Michal Hocko
2018-12-12 10:14                                                     ` Vlastimil Babka
2018-12-14 21:04                                                       ` David Rientjes
2018-12-14 21:33                                                         ` Vlastimil Babka
2018-12-21 22:18                                                           ` David Rientjes
2018-12-22 12:08                                                             ` Mel Gorman
2018-12-14 23:11                                                         ` Mel Gorman
2018-12-21 22:15                                                           ` David Rientjes
2018-12-12 10:44                                                   ` Andrea Arcangeli
2019-04-15 11:48                                             ` Michal Hocko
2018-12-06  0:18                                       ` David Rientjes
2018-12-06  0:54                                         ` Andrea Arcangeli
2018-12-06  9:23                                           ` Vlastimil Babka
2018-12-03 20:39                     ` David Rientjes
2018-12-03 21:25                       ` Michal Hocko
2018-12-03 21:53                         ` David Rientjes
2018-12-04  8:48                           ` Michal Hocko
2018-12-05  0:07                             ` David Rientjes
2018-12-05 10:18                               ` Michal Hocko
2018-12-05 19:16                                 ` David Rientjes
2018-11-27  7:23 kernel test robot

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=64a4aec6-3275-a716-8345-f021f6186d9b@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=s.priebe@profihost.ag \
    --cc=torvalds@linux-foundation.org \
    --cc=ying.huang@intel.com \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox