From: Zi Yan <ziy@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: Roman Gushchin <guro@fb.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>, <linux-mm@kvack.org>,
	Rik van Riel <riel@surriel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Shakeel Butt <shakeelb@google.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	David Nellans <dnellans@nvidia.com>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64
Date: Tue, 8 Sep 2020 11:36:53 -0400	[thread overview]
Message-ID: <DD3CCCDB-55CA-4981-AF3D-14853C8855C6@nvidia.com> (raw)
In-Reply-To: <32e979c4-68c0-f309-d9d7-d274724bd23e@redhat.com>

On 8 Sep 2020, at 10:22, David Hildenbrand wrote:

> On 08.09.20 16:05, Zi Yan wrote:
>> On 8 Sep 2020, at 7:57, David Hildenbrand wrote:
>>
>>> On 03.09.20 18:30, Roman Gushchin wrote:
>>>> On Thu, Sep 03, 2020 at 05:23:00PM +0300, Kirill A. Shutemov wrote:
>>>>> On Wed, Sep 02, 2020 at 02:06:12PM -0400, Zi Yan wrote:
>>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> This patchset adds support for 1GB THP on x86_64. It is on top of
>>>>>> v5.9-rc2-mmots-2020-08-25-21-13.
>>>>>>
>>>>>> Compared to hugetlb, 1GB THP is a more flexible way of reducing translation
>>>>>> overhead and increasing the performance of applications with large memory
>>>>>> footprints, without requiring application changes.
>>>>>
>>>>> This statement needs a lot of justification. I don't see 1GB THP as viable
>>>>> for any workload. Opportunistic 1GB allocation is a very questionable
>>>>> strategy.
>>>>
>>>> Hello, Kirill!
>>>>
>>>> I share your skepticism about opportunistic 1 GB allocations; however, it might be
>>>> useful if backed by madvise() annotations from the userspace application. In this
>>>> case, 1 GB THPs might be an alternative to 1 GB hugetlbfs pages, but with a more
>>>> convenient interface.
>>>
>>> I have concerns if we would silently use 1 GB THPs in most scenarios
>>> where we would have used 2 MB THPs. I'd appreciate a trigger to
>>> explicitly enable that - MADV_HUGEPAGE is not sufficient because some
>>> applications relying on that assume that the THP size will be 2 MB
>>> (especially, if you want sparse, large VMAs).
>>
>> This patchset is not intended to silently use 1GB THP in place of 2MB THP.
>> First of all, there is a knob, /sys/kernel/mm/transparent_hugepage/enable_1GB,
>> to enable 1GB THP explicitly. Also, 1GB THP is allocated from a reserved CMA
>> region (although I had alloc_contig_pages as a fallback, which can be removed
>> in the next version), so users need to add the hugepage_cma=nG kernel parameter
>> to enable 1GB THP allocation. If finer control is necessary, we can add
>> a new MADV_HUGEPAGE_1GB for 1GB THP.
>
> Thanks for the information - I would have loved to see important
> information like that (esp. how to use) in the cover letter.
>
> So what you propose is (excluding alloc_contig_pages()) really just
> automatically using (previously reserved) 1GB huge pages as 1GB THP
> instead of explicitly using them in an application via hugetlbfs.
> Still, I'm not convinced how helpful that actually is - most certainly you
> really want a mechanism to control this per application (+ maybe make
> the application indicate actual ranges where it makes sense - but then
> you can directly modify the application to use hugetlbfs).
>
> I guess the interesting thing of this approach is that we can
> mix-and-match THP of differing granularity within a single mapping -
> whereby a hugetlbfs allocation would fail in case there aren't sufficient
> 1GB pages available. However, there are no guarantees for applications
> anymore (thinking about RT KVM and similar, we really want gigantic
> pages and cannot tolerate falling back to smaller granularity).

I agree that currently THP allocation does not provide a strong guarantee
like hugetlbfs, which can pre-allocate pages at boot time. For users like
RT KVM, pre-allocated hugetlb might be the only choice, since allocating
huge pages from CMA (for either hugetlb or 1GB THP) would fail if pinned
pages are scattered across the CMA region, preventing the allocation of
contiguous huge pages.
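
For reference, the pre-allocated hugetlbfs path that such users would stay
on looks roughly like the sketch below (assuming 1GB pages were reserved at
boot, e.g. with hugepagesz=1G hugepages=N; the MAP_HUGE_1GB fallback define
is only there in case the libc headers lack it). The mapping fails outright
instead of falling back to smaller pages, which is exactly the guarantee
THP cannot give:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << 26)     /* log2(1GB) << MAP_HUGE_SHIFT */
#endif

int main(void)
{
        size_t len = 1UL << 30;

        /* Map one boot-time-reserved 1GB hugetlb page; mmap() simply
         * fails if no gigantic page is available, no fallback happens. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                       -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap(MAP_HUGETLB|MAP_HUGE_1GB)");
                return 1;
        }
        memset(p, 0, len);          /* fault the gigantic page in */
        munmap(p, len);
        return 0;
}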

In other cases, if the user can tolerate fallbacks but does not like the
unpredictable outcome of huge page formation, we could add a madvise()
option like Michal suggested [1], so the user knows whether they got
huge pages and can act accordingly.
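
Until something like that exists, the best an application can do is infer
the outcome after the fact, e.g. by scanning /proc/self/smaps; a rough
sketch (AnonHugePages today only counts PMD-mapped THP, which is why patch
07/16 teaches the smaps stats about PUD THPs):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        size_t len = 1UL << 30;
        char line[256];
        FILE *f;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED)
                return 1;

        /* Hint that huge pages are wanted; with the current interface
         * this only covers 2MB THP and success is never reported back. */
        madvise(p, len, MADV_HUGEPAGE);
        memset(p, 1, len);          /* fault the range in */

        /* Guess what we actually got: AnonHugePages counts PMD-mapped
         * THP backing our (and every other) anonymous mapping. */
        f = fopen("/proc/self/smaps", "r");
        if (!f)
                return 1;
        while (fgets(line, sizeof(line), f))
                if (strstr(line, "AnonHugePages:"))
                        fputs(line, stdout);
        fclose(f);
        return 0;
}

A madvise() flag that reports whether the request was satisfied would let
the application act immediately instead of guessing from counters.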


> What are intended use cases/applications that could benefit? I doubt
> databases and virtualization are really a good fit - they know how to
> handle hugetlbfs just fine.

Roman and Jason have provided some use cases [2,3].

[1] https://lore.kernel.org/linux-mm/20200907072014.GD30144@dhcp22.suse.cz/
[2] https://lore.kernel.org/linux-mm/20200903162527.GF60440@carbon.dhcp.thefacebook.com/
[3] https://lore.kernel.org/linux-mm/20200903165051.GN24045@ziepe.ca/

—
Best Regards,
Yan Zi

Thread overview: 82+ messages
2020-09-02 18:06 [RFC PATCH 00/16] 1GB THP support on x86_64 Zi Yan
2020-09-02 18:06 ` [RFC PATCH 01/16] mm: add pagechain container for storing multiple pages Zi Yan
2020-09-02 20:29   ` Randy Dunlap
2020-09-02 20:48     ` Zi Yan
2020-09-03  3:15   ` Matthew Wilcox
2020-09-07 12:22   ` Kirill A. Shutemov
2020-09-07 15:11     ` Zi Yan
2020-09-09 13:46       ` Kirill A. Shutemov
2020-09-09 14:15         ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 02/16] mm: thp: 1GB anonymous page implementation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 03/16] mm: proc: add 1GB THP kpageflag Zi Yan
2020-09-09 13:46   ` Kirill A. Shutemov
2020-09-02 18:06 ` [RFC PATCH 04/16] mm: thp: 1GB THP copy on write implementation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 05/16] mm: thp: handling 1GB THP reference bit Zi Yan
2020-09-09 14:09   ` Kirill A. Shutemov
2020-09-09 14:36     ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 06/16] mm: thp: add 1GB THP split_huge_pud_page() function Zi Yan
2020-09-09 14:18   ` Kirill A. Shutemov
2020-09-09 14:19     ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 07/16] mm: stats: make smap stats understand PUD THPs Zi Yan
2020-09-02 18:06 ` [RFC PATCH 08/16] mm: page_vma_walk: teach it about PMD-mapped PUD THP Zi Yan
2020-09-02 18:06 ` [RFC PATCH 09/16] mm: thp: 1GB THP support in try_to_unmap() Zi Yan
2020-09-02 18:06 ` [RFC PATCH 10/16] mm: thp: split 1GB THPs at page reclaim Zi Yan
2020-09-02 18:06 ` [RFC PATCH 11/16] mm: thp: 1GB THP follow_p*d_page() support Zi Yan
2020-09-02 18:06 ` [RFC PATCH 12/16] mm: support 1GB THP pagemap support Zi Yan
2020-09-02 18:06 ` [RFC PATCH 13/16] mm: thp: add a knob to enable/disable 1GB THPs Zi Yan
2020-09-02 18:06 ` [RFC PATCH 14/16] mm: page_alloc: >=MAX_ORDER pages allocation and deallocation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 15/16] hugetlb: cma: move cma reserve function to cma.c Zi Yan
2020-09-02 18:06 ` [RFC PATCH 16/16] mm: thp: use cma reservation for pud thp allocation Zi Yan
2020-09-02 18:40 ` [RFC PATCH 00/16] 1GB THP support on x86_64 Jason Gunthorpe
2020-09-02 18:45   ` Zi Yan
2020-09-02 18:48     ` Jason Gunthorpe
2020-09-02 19:05       ` Zi Yan
2020-09-02 19:57         ` Jason Gunthorpe
2020-09-02 20:29           ` Zi Yan
2020-09-03 16:40             ` Jason Gunthorpe
2020-09-03 16:55               ` Matthew Wilcox
2020-09-03 17:08                 ` Jason Gunthorpe
2020-09-03  7:32 ` Michal Hocko
2020-09-03 16:25   ` Roman Gushchin
2020-09-03 16:50     ` Jason Gunthorpe
2020-09-03 17:01       ` Matthew Wilcox
2020-09-03 17:18         ` Jason Gunthorpe
2020-09-03 20:57     ` Mike Kravetz
2020-09-03 21:06       ` Roman Gushchin
2020-09-04  7:42     ` Michal Hocko
2020-09-04 21:10       ` Roman Gushchin
2020-09-07  7:20         ` Michal Hocko
2020-09-08 15:09           ` Zi Yan
2020-09-08 19:58             ` Roman Gushchin
2020-09-09  4:01               ` John Hubbard
2020-09-09  7:15               ` Michal Hocko
2020-09-03 14:23 ` Kirill A. Shutemov
2020-09-03 16:30   ` Roman Gushchin
2020-09-08 11:57     ` David Hildenbrand
2020-09-08 14:05       ` Zi Yan
2020-09-08 14:22         ` David Hildenbrand
2020-09-08 15:36           ` Zi Yan [this message]
2020-09-08 14:27         ` Matthew Wilcox
2020-09-08 15:50           ` Zi Yan
2020-09-09 12:11           ` Jason Gunthorpe
2020-09-09 12:32             ` Matthew Wilcox
2020-09-09 13:14               ` Jason Gunthorpe
2020-09-09 13:27                 ` David Hildenbrand
2020-09-10 10:02                   ` William Kucharski
2020-09-08 14:35         ` Michal Hocko
2020-09-08 14:41           ` Rik van Riel
2020-09-08 15:02             ` David Hildenbrand
2020-09-09  7:04             ` Michal Hocko
2020-09-09 13:19               ` Rik van Riel
2020-09-09 13:43                 ` David Hildenbrand
2020-09-09 13:49                   ` Rik van Riel
2020-09-09 13:54                     ` David Hildenbrand
2020-09-10  7:32                   ` Michal Hocko
2020-09-10  8:27                     ` David Hildenbrand
2020-09-10 14:21                       ` Zi Yan
2020-09-10 14:34                         ` David Hildenbrand
2020-09-10 14:41                           ` Zi Yan
2020-09-10 15:15                             ` David Hildenbrand
2020-09-10 13:32                     ` Rik van Riel
2020-09-10 14:30                       ` Zi Yan
2020-09-09 13:59                 ` Michal Hocko
