LKML Archive on lore.kernel.org
 help / color / Atom feed
From: David Rientjes <rientjes@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	mgorman@techsingularity.net, Michal Hocko <mhocko@kernel.org>,
	ying.huang@intel.com, s.priebe@profihost.ag,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
	Andrew Morton <akpm@linux-foundation.org>,
	zi.yan@cs.rutgers.edu
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Date: Fri, 14 Dec 2018 13:04:11 -0800 (PST)
Message-ID: <alpine.DEB.2.21.1812141244450.186427@chino.kir.corp.google.com> (raw)
In-Reply-To: <0bbf4202-6187-28fb-37b7-da6885b89cce@suse.cz>

On Wed, 12 Dec 2018, Vlastimil Babka wrote:

> > Regarding the role of direct reclaim in the allocator, I think we need 
> > work on the feedback from compaction to determine whether it's worthwhile.  
> > That's difficult because of the point I continue to bring up: 
> > isolate_freepages() is not necessarily always able to access this freed 
> > memory.
> 
> That's one of the *many* reasons why having free base pages doesn't
> guarantee compaction success. We can and will improve on that. But I
> don't think it would be e.g. practical to check the pfns of free pages
> wrt compaction scanner positions and decide based on that.

Yeah, agreed.  Rather than proposing that memory is only reclaimed if its 
known that it can be accessible to isolate_freepages(), I'm wondering 
about the implementation of the freeing scanner entirely.

In other words, I think there is a lot of potential stranding that occurs 
for both scanners that could otherwise result in completely free 
pageblocks.  If there a single movable page present near the end of the 
zone in an otherwise fully free pageblock, surely we can do better than 
the current implementation that would never consider this very easy to 
compact memory.

For hugepages, we don't care what pageblock we allocate from.  There are 
requirements for MAX_ORDER-1, but I assume we shouldn't optimize for these 
cases (and if CMA has requirements for a migration/freeing scanner 
redesign, I think that can be special cased).

The same problem occurs for the migration scanner where we can iterate 
over a ton of free memory that is never considered a suitable migration 
target.  The implementation that attempts to migrate all memory toward the 
end of the zone penalizes the freeing scanner when it is reset: we just 
iterate over a ton of used pages.

Reclaim likely could be deterministically useful if we consider a redesign 
of how migration sources and targets are determined in compaction.

Has anybody tried a migration scanner that isn't linearly based, rather 
finding the highest-order free page of the same migratetype, iterating the 
pages of its pageblock, and using this to determine whether the actual 
migration will be worthwhile or not?  I could imagine pageblock_skip being 
repurposed for this as the heuristic.

Finding migration targets would be more tricky, but if we iterate the 
pages of the pageblock for low-order free pages and find them to be mostly 
used, that seems more appropriate than just pushing all memory to the end 
of the zone?

It would be interesting to know if anybody has tried using the per-zone 
free_area's to determine migration targets and set a bit if it should be 
considered a migration source or a migration target.  If all pages for a 
pageblock are not on free_areas, they are fully used.

> > otherwise we fail and defer because it wasn't able 
> > to make a hugepage available.
> 
> Note that THP fault compaction doesn't actually defer itself, which I
> think is a weakness of the current implementation and hope that patch 3
> in my series from yesterday [1] can address that. Because defering is
> the general feedback mechanism that we have for suppressing compaction
> (and thus associated reclaim) in cases it fails for any reason, not just
> the one you mention. Instead of inspecting failure conditions in detail,
> which would be costly, it's a simple statistical approach. And when
> compaction is improved to fail less, defering automatically also happens
> less.
> 

I couldn't get the link to work, unfortunately, I don't think the patch 
series made it to LKML :/  I do see it archived for linux-mm, though, so 
I'll take a look, thanks!

> [1] https://lkml.kernel.org/r/20181211142941.20500-1-vbabka@suse.cz
> 

  reply index

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-27  6:25 kernel test robot
2018-11-27 17:08 ` Linus Torvalds
2018-11-27 18:17   ` Michal Hocko
2018-11-27 18:21     ` Michal Hocko
2018-11-27 19:05   ` Vlastimil Babka
2018-11-27 19:16     ` Vlastimil Babka
2018-11-27 20:57   ` Andrea Arcangeli
2018-11-27 22:50     ` Linus Torvalds
2018-11-28  6:30       ` Michal Hocko
2018-11-28  3:20     ` Huang\, Ying
2018-11-28 16:48       ` Linus Torvalds
2018-11-28 18:39         ` Andrea Arcangeli
2018-11-28 23:10         ` David Rientjes
2018-12-03 18:01         ` Linus Torvalds
2018-12-03 18:14           ` Michal Hocko
2018-12-03 18:19             ` Linus Torvalds
2018-12-03 18:30               ` Michal Hocko
2018-12-03 18:45                 ` Linus Torvalds
2018-12-03 18:59                   ` Michal Hocko
2018-12-03 19:23                     ` Andrea Arcangeli
2018-12-03 20:26                       ` David Rientjes
2018-12-03 19:28                     ` Linus Torvalds
2018-12-03 20:12                       ` Andrea Arcangeli
2018-12-03 20:36                         ` David Rientjes
2018-12-03 22:04                         ` Linus Torvalds
2018-12-03 22:27                           ` Linus Torvalds
2018-12-03 22:57                             ` David Rientjes
2018-12-04  9:22                             ` Vlastimil Babka
2018-12-04 10:45                               ` Mel Gorman
2018-12-05  0:47                                 ` David Rientjes
2018-12-05  9:08                                   ` Michal Hocko
2018-12-05 10:43                                     ` Mel Gorman
2018-12-05 11:43                                       ` Michal Hocko
2018-12-05 10:06                                 ` Mel Gorman
2018-12-05 20:40                                 ` Andrea Arcangeli
2018-12-05 21:59                                   ` David Rientjes
2018-12-06  0:00                                     ` Andrea Arcangeli
2018-12-05 22:03                                   ` Linus Torvalds
2018-12-05 22:12                                     ` David Rientjes
2018-12-05 23:36                                     ` Andrea Arcangeli
2018-12-05 23:51                                       ` Linus Torvalds
2018-12-06  0:58                                         ` Linus Torvalds
2018-12-06  9:14                                           ` MADV_HUGEPAGE vs. NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression) Michal Hocko
2018-12-06 23:49                                             ` David Rientjes
2018-12-07  7:34                                               ` Michal Hocko
2018-12-07  4:31                                             ` Linus Torvalds
2018-12-07  7:49                                               ` Michal Hocko
2018-12-07  9:06                                                 ` Vlastimil Babka
2018-12-07 23:15                                                   ` David Rientjes
2018-12-06 23:43                                           ` [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression David Rientjes
2018-12-07  4:01                                             ` Linus Torvalds
2018-12-10  0:29                                               ` David Rientjes
2018-12-10  4:49                                                 ` Andrea Arcangeli
2018-12-12  0:37                                                   ` David Rientjes
2018-12-12  9:50                                                     ` Michal Hocko
2018-12-12 17:00                                                       ` Andrea Arcangeli
2018-12-14 11:32                                                         ` Michal Hocko
2018-12-12 10:14                                                     ` Vlastimil Babka
2018-12-14 21:04                                                       ` David Rientjes [this message]
2018-12-14 21:33                                                         ` Vlastimil Babka
2018-12-21 22:18                                                           ` David Rientjes
2018-12-22 12:08                                                             ` Mel Gorman
2018-12-14 23:11                                                         ` Mel Gorman
2018-12-21 22:15                                                           ` David Rientjes
2018-12-12 10:44                                                   ` Andrea Arcangeli
2019-04-15 11:48                                             ` Michal Hocko
2018-12-06  0:18                                       ` David Rientjes
2018-12-06  0:54                                         ` Andrea Arcangeli
2018-12-06  9:23                                           ` Vlastimil Babka
2018-12-03 20:39                     ` David Rientjes
2018-12-03 21:25                       ` Michal Hocko
2018-12-03 21:53                         ` David Rientjes
2018-12-04  8:48                           ` Michal Hocko
2018-12-05  0:07                             ` David Rientjes
2018-12-05 10:18                               ` Michal Hocko
2018-12-05 19:16                                 ` David Rientjes
2018-11-27  7:23 kernel test robot

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1812141244450.186427@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=s.priebe@profihost.ag \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=ying.huang@intel.com \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox