From: Michal Hocko <mhocko@suse.com>
To: David Hildenbrand <david@redhat.com>
Cc: Rik van Riel <riel@surriel.com>, Zi Yan <ziy@nvidia.com>,
	Roman Gushchin <guro@fb.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-mm@kvack.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Shakeel Butt <shakeelb@google.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	David Nellans <dnellans@nvidia.com>,
	linux-kernel@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64
Date: Thu, 10 Sep 2020 09:32:13 +0200
Message-ID: <20200910073213.GC28354@dhcp22.suse.cz>
In-Reply-To: <6135d2c5-2a74-6ca8-4b3b-8ceb25c0d4b1@redhat.com>

[Cc Vlastimil and Mel - the whole email thread starts at
 http://lkml.kernel.org/r/20200902180628.4052244-1-zi.yan@sent.com
 but this particular subthread has diverged a bit and you might find it
 interesting]

On Wed 09-09-20 15:43:55, David Hildenbrand wrote:
> On 09.09.20 15:19, Rik van Riel wrote:
> > On Wed, 2020-09-09 at 09:04 +0200, Michal Hocko wrote:
> >> On Tue 08-09-20 10:41:10, Rik van Riel wrote:
> >>> On Tue, 2020-09-08 at 16:35 +0200, Michal Hocko wrote:
> >>>
> >>>> A global knob is insufficient. 1G pages will become a very precious
> >>>> resource as it requires a pre-allocation (reservation). So it really
> >>>> has to be an opt-in and the question is whether there is also some
> >>>> sort of access control needed.
> >>>
> >>> The 1GB pages do not require that much in the way of
> >>> pre-allocation. The memory can be obtained through CMA,
> >>> which means it can be used for movable 4kB and 2MB
> >>> allocations when not being used for 1GB pages.
> >>
> >> That CMA has to be pre-reserved, right? That requires a
> >> configuration.
> > 
> > To some extent, yes.
> > 
> > > However, because that pool can be used for movable 4kB and 2MB
> > > pages as well as for 1GB pages, it would be easy to just set
> > > the size of that pool to eg. 1/3 or even 1/2 of memory for every
> > > system.
> > 
> > It isn't like the pool needs to be the exact right size. We
> > just need to avoid the "highmem problem" of having too little
> > memory for kernel allocations.
> > 
> 
> I am not sure I like the trend towards CMA that we are seeing, reserving
> huge buffers for specific users (and eventually even doing it
> automatically).
> 
> What we actually want is ZONE_MOVABLE with relaxed guarantees, such that
> anybody who requires large, unmovable allocations can use it.
> 
> I once played with the idea of having ZONE_PREFER_MOVABLE, which
> a) Is the primary choice for movable allocations
> b) Is allowed to contain unmovable allocations (esp., gigantic pages)
> c) Is the fallback for ZONE_NORMAL for unmovable allocations, instead of
> running out of memory

I might be missing something but how can this work long term? Or, put in
other words, why would this work any better than the existing
fragmentation avoidance techniques that the page allocator already
implements (movability grouping etc.)? Please note that I am not deeply
familiar with those but my high level understanding is that we already
try hard not to mix movable and unmovable objects in the same page
blocks as much as we can.

My suspicion is that a separate zone would work in a similar fashion. As
long as there is a lot of free memory the zone will be effectively
MOVABLE. The same applies to the Normal zone when unmovable allocations
are in the minority. As soon as the Normal zone fills up with unmovable
objects they start overflowing to ZONE_PREFER_MOVABLE, and that will
resemble page block stealing, when unmovable objects start spreading
over movable page blocks.
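
To make the overflow scenario above concrete, here is a toy model in
Python. It is only a sketch under loud assumptions: the names Zone,
allocate, simulate and PAGEBLOCK_PAGES are hypothetical, the zone and
block sizes are made up, and the placement policy (prefer a block that
already holds the same kind of page, then an empty block, then mix) is
a crude stand-in for movability grouping, not the kernel's algorithm.

```python
from dataclasses import dataclass, field

PAGEBLOCK_PAGES = 4  # toy value, real pageblocks are far larger


@dataclass
class Zone:
    name: str
    nr_blocks: int
    blocks: list = field(init=False)

    def __post_init__(self):
        # Each pageblock is a list of page kinds: "M" movable, "U" unmovable.
        self.blocks = [[] for _ in range(self.nr_blocks)]

    def _rank(self, block, kind):
        """Lower is better; None means the block is full."""
        if len(block) >= PAGEBLOCK_PAGES:
            return None
        if block and all(p == kind for p in block):
            return 0  # block already holds only this kind
        if not block:
            return 1  # empty block
        return 2      # would mix kinds: last resort

    def alloc(self, kind):
        ranked = [(self._rank(b, kind), i) for i, b in enumerate(self.blocks)]
        usable = [(r, i) for r, i in ranked if r is not None]
        if not usable:
            return False  # zone is full
        _, idx = min(usable)
        self.blocks[idx].append(kind)
        return True

    def blocks_with_unmovable(self):
        return sum(1 for b in self.blocks if "U" in b)

    def mixed_blocks(self):
        return sum(1 for b in self.blocks if "U" in b and "M" in b)


def allocate(kind, normal, prefer_movable):
    # Movable pages try the prefer-movable zone first; unmovable pages try
    # Normal first and fall back to the prefer-movable zone when it is full.
    order = (prefer_movable, normal) if kind == "M" else (normal, prefer_movable)
    return any(zone.alloc(kind) for zone in order)


def simulate(n_movable, n_unmovable):
    normal = Zone("Normal", nr_blocks=2)
    prefer_movable = Zone("PreferMovable", nr_blocks=4)
    for _ in range(n_movable):
        allocate("M", normal, prefer_movable)
    for _ in range(n_unmovable):
        allocate("U", normal, prefer_movable)
    return prefer_movable.blocks_with_unmovable(), prefer_movable.mixed_blocks()


if __name__ == "__main__":
    for n_unmovable in (8, 12, 14):
        print(n_unmovable, simulate(10, n_unmovable))
```

In this model the prefer-movable zone stays clean while the unmovable
load fits in Normal (8 pages here). Beyond that, unmovable pages first
take over a free block and then start sharing blocks with movable
pages, which is the block-stealing-like behaviour suspected above.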

Again, my level of expertise in the page allocator is quite low so all
of the above might be simply wrong...

> If someone messes up the zone ratio, issues known from zone imbalances
> are avoided - large allocations simply become less likely to succeed. In
> contrast to ZONE_MOVABLE, memory offlining is not guaranteed to work.
-- 
Michal Hocko
SUSE Labs