From: Michal Hocko <mhocko@suse.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Roman Gushchin <guro@fb.com>, Rik van Riel <riel@surriel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Shakeel Butt <shakeelb@google.com>,
	Yang Shi <shy828301@gmail.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	David Hildenbrand <david@redhat.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	David Nellans <dnellans@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64
Date: Fri, 2 Oct 2020 09:32:05 +0200
Message-ID: <20201002073205.GC20872@dhcp22.suse.cz>
In-Reply-To: <73394A41-16D8-431C-9E48-B14D44F045F8@nvidia.com>

On Thu 01-10-20 11:14:14, Zi Yan wrote:
> On 30 Sep 2020, at 7:55, Michal Hocko wrote:
> 
> > On Mon 28-09-20 13:53:58, Zi Yan wrote:
> >> From: Zi Yan <ziy@nvidia.com>
> >>
> >> Hi all,
> >>
> >> This patchset adds support for 1GB PUD THP on x86_64. It is on top of
> >> v5.9-rc5-mmots-2020-09-18-21-23. It is also available at:
> >> https://github.com/x-y-z/linux-1gb-thp/tree/1gb_thp_v5.9-rc5-mmots-2020-09-18-21-23
> >>
> >> Other than PUD THP, we had some discussion on generating THPs and contiguous
> >> physical memory via a synchronous system call [0]. I am planning to send out a
> >> separate patchset on it later, since I feel that it can be done independently of
> >> PUD THP support.
> >
> > While the technical challenges for the kernel implementation can be
> > discussed before the user API is decided I believe we cannot simply add
> > something now and then decide about a proper interface. I have raised a
> > few basic questions we should find answers for before any interface is
> > added. Let me copy them here for easier reference.
> Sure. Thank you for doing this.
> 
> For this new interface, I think it should generate THPs out of populated
> memory regions synchronously. It would be a complement to khugepaged,
> which generates THPs asynchronously in the background.
> 
> > - THP allocation time - #PF and/or madvise context
> I am not sure this is relevant, since the new interface is supposed to
> operate on populated memory regions. For THP allocation, madvise and
> the options from /sys/kernel/mm/transparent_hugepage/defrag should give
> enough choices to users.

OK, so no #PF, this makes things easier.
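
To make this concrete, a minimal userspace sketch of such a synchronous
call could look as follows; MADV_COLLAPSE_SYNC is a hypothetical advice
value used purely for illustration, not an existing one:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>

  #define MADV_COLLAPSE_SYNC 26           /* hypothetical advice value */

  int main(void)
  {
          size_t len = 2UL << 20;         /* one PMD-sized (2MB) region */
          void *buf = aligned_alloc(len, len);

          if (!buf)
                  return 1;
          memset(buf, 0, len);            /* populate the region first */
          /* synchronously collapse the populated range into a THP */
          if (madvise(buf, len, MADV_COLLAPSE_SYNC))
                  perror("madvise");
          return 0;
  }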

> > - lazy/sync instantiation
> 
> I would say the new interface only does sync instantiation. madvise
> already provides the lazy option: apply MADV_HUGEPAGE to populated
> memory regions and let khugepaged generate THPs from them.

OK
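
For contrast, the existing lazy path is a one-liner (addr and len stand
for an already populated mapping):

  /* mark the range eligible and let khugepaged collapse it in the
   * background, at its own pace */
  if (madvise(addr, len, MADV_HUGEPAGE))
          perror("madvise");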

> > - huge page sizes controllable by the userspace?
> 
> It might be good to allow advanced users to choose the page sizes, so they
> have better control of their applications.

Could you elaborate more? Those advanced users can use hugetlb, right?
They get a very good control over page size and pool preallocation etc.
So they can get what they need - assuming there is enough memory.
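
For instance, an application can already map an explicit 1GB page today,
provided the admin has preallocated the pool (e.g. through
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages):

  #include <stdio.h>
  #include <sys/mman.h>
  #include <linux/mman.h>         /* MAP_HUGE_1GB */

  void *p = mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                 -1, 0);
  if (p == MAP_FAILED)
          perror("mmap");         /* fails if the 1GB pool is empty */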

> For normal users, we can provide
> best-effort service. Different options can be provided for these two cases.

Do we really need two sync mechanisms to compact physical memory? This
adds API complexity because it has to cover all possible huge page
sizes, and that can be a large set. We already have that choice in the
hugetlb mmap interface, but there it is needed to cover all existing
setups. I would argue this doesn't make the API particularly easy to use.

> The new interface might want to inform the user how many THPs were
> generated after the call, so they can decide what to do with the memory
> region.

Why would that be useful? /proc/<pid>/smaps should give a good picture
already, right?
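
Each mapping in smaps already carries a THP counter, e.g.:

  AnonHugePages:      4096 kB

which covers the PMD-mapped THP portion of the mapping (the value shown
is just illustrative), and patch 19/30 of this series teaches the smaps
stats about PUD THPs.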

> > - aggressiveness - how hard to try
> 
> The new interface would try as hard as it can, since I assume users really
> want THPs when they use this interface.
> 
> > - internal fragmentation - allow creating THPs on sparsely populated
> >   or unpopulated ranges
> 
> The new interface would only operate on populated memory regions. A
> MAP_POPULATE-like option can be added if necessary.

OK, so initially you do not want to populate more memory. How do you
envision a future extension to provide such functionality? A different
API, or a modification to the existing one?
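
For reference, a MAP_POPULATE-like option would presumably mirror the
existing populate-at-mmap semantics (len stands for the requested size):

  /* prefault the whole mapping up front instead of on first touch */
  void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);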

> > - do we need some sort of access control or privilege check, as some
> >   THPs would be a really scarce resource (like those that require
> >   pre-reservation)?
> 
> It seems too much to me. I suppose if we provide page size options to
> users when generating THPs, user apps could coordinate among themselves.
> BTW, do we have access control for hugetlb pages? If yes, we could
> borrow their method.

We do not. Well, there is a hugetlb cgroup controller, but I am not sure
that is the right method. The lack of hugetlb access control is a serious
shortcoming which has turned this interface into an "only first-class
citizens" feature requiring very close coordination with an admin.
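
For completeness, what the hugetlb cgroup controller does provide is a
per-cgroup usage cap rather than a privilege check; e.g. with cgroup v1
(the "grp" cgroup name is hypothetical):

  #include <stdio.h>

  /* cap the cgroup at two 1GB hugetlb pages; this limits usage,
   * it does not grant or deny the right to allocate */
  FILE *f = fopen("/sys/fs/cgroup/hugetlb/grp/hugetlb.1GB.limit_in_bytes", "w");
  if (f) {
          fprintf(f, "%llu", 2ULL << 30);
          fclose(f);
  }
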
-- 
Michal Hocko
SUSE Labs


Thread overview: 56+ messages
2020-09-28 17:53 [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64 Zi Yan
2020-09-28 17:53 ` [RFC PATCH v2 01/30] mm/pagewalk: use READ_ONCE when reading the PUD entry unlocked Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 02/30] mm: pagewalk: use READ_ONCE when reading the PMD " Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 03/30] mm: thp: use single linked list for THP page table page deposit Zi Yan
2020-09-28 19:34   ` Matthew Wilcox
2020-09-28 20:34     ` Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 04/30] mm: add new helper functions to allocate one PMD page with 512 PTE pages Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 05/30] mm: thp: add page table deposit/withdraw functions for PUD THP Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 06/30] mm: change thp_order and thp_nr as we will have not just PMD THPs Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 07/30] mm: thp: add anonymous PUD THP page fault support without enabling it Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 08/30] mm: thp: add PUD THP support for copy_huge_pud Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 09/30] mm: thp: add PUD THP support to zap_huge_pud Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 10/30] fs: proc: add PUD THP kpageflag Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 11/30] mm: thp: handling PUD THP reference bit Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 12/30] mm: rmap: add mappped/unmapped page order to anonymous page rmap functions Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 13/30] mm: rmap: add map_order to page_remove_anon_compound_rmap Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 14/30] mm: thp: add PUD THP split_huge_pud_page() function Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 15/30] mm: thp: add PUD THP to deferred split list when PUD mapping is gone Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 16/30] mm: debug: adapt dump_page to PUD THP Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 17/30] mm: thp: PUD THP COW splits PUD page and falls back to PMD page Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 18/30] mm: thp: PUD THP follow_p*d_page() support Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 19/30] mm: stats: make smap stats understand PUD THPs Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 20/30] mm: page_vma_walk: teach it about PMD-mapped PUD THP Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 21/30] mm: thp: PUD THP support in try_to_unmap() Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 22/30] mm: thp: split PUD THPs at page reclaim Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 23/30] mm: support PUD THP pagemap support Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 24/30] mm: madvise: add page size options to MADV_HUGEPAGE and MADV_NOHUGEPAGE Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 25/30] mm: vma: add VM_HUGEPAGE_PUD to vm_flags at bit 37 Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 26/30] mm: thp: add a global knob to enable/disable PUD THPs Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 27/30] mm: thp: make PUD THP size public Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 28/30] hugetlb: cma: move cma reserve function to cma.c Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 29/30] mm: thp: use cma reservation for pud thp allocation Zi Yan
2020-09-28 17:54 ` [RFC PATCH v2 30/30] mm: thp: enable anonymous PUD THP at page fault path Zi Yan
2020-09-30 11:55 ` [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64 Michal Hocko
2020-10-01 15:14   ` Zi Yan
2020-10-02  7:32     ` Michal Hocko [this message]
2020-10-02  7:50       ` David Hildenbrand
2020-10-02  8:10         ` Michal Hocko
2020-10-02  8:30           ` David Hildenbrand
2020-10-05 15:03             ` Zi Yan
2020-10-05 15:55               ` Matthew Wilcox
2020-10-05 17:04                 ` Roman Gushchin
2020-10-05 19:12                 ` Zi Yan
2020-10-05 19:37                   ` Matthew Wilcox
2020-10-05 17:16               ` Roman Gushchin
2020-10-05 17:27                 ` David Hildenbrand
2020-10-05 18:25                   ` Roman Gushchin
2020-10-05 18:33                     ` David Hildenbrand
2020-10-05 19:11                       ` Roman Gushchin
2020-10-06  8:25                         ` David Hildenbrand
2020-10-05 17:39               ` David Hildenbrand
2020-10-05 18:05                 ` Zi Yan
2020-10-05 18:48                   ` David Hildenbrand
2020-10-06 11:59                   ` Michal Hocko
2020-10-05 15:34         ` Zi Yan
2020-10-05 17:30           ` David Hildenbrand
