linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>,
	nouveau@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>,
	linux-kselftest@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Christoph Hellwig <hch@lst.de>,
	Jason Gunthorpe <jgg@mellanox.com>,
	Ben Skeggs <bskeggs@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>,
	"Huang, Ying" <ying.huang@intel.com>
Subject: Re: [PATCH 13/16] mm: support THP migration to device private memory
Date: Mon, 22 Jun 2020 15:30:18 -0700	[thread overview]
Message-ID: <CAHbLzkqe50+KUsRH92O4Be2PjuwAYGw9nK+d-73syxi2Xnf9-Q@mail.gmail.com> (raw)
In-Reply-To: <4C364E23-0716-4D59-85A1-0C293B86BC2C@nvidia.com>

On Mon, Jun 22, 2020 at 2:53 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 22 Jun 2020, at 17:31, Ralph Campbell wrote:
>
> > On 6/22/20 1:10 PM, Zi Yan wrote:
> >> On 22 Jun 2020, at 15:36, Ralph Campbell wrote:
> >>
> >>> On 6/21/20 4:20 PM, Zi Yan wrote:
> >>>> On 19 Jun 2020, at 17:56, Ralph Campbell wrote:
> >>>>
> >>>>> Support transparent huge page migration to ZONE_DEVICE private memory.
> >>>>> A new flag (MIGRATE_PFN_COMPOUND) is added to the input PFN array to
> >>>>> indicate the huge page was fully mapped by the CPU.
> >>>>> Export prep_compound_page() so that device drivers can create huge
> >>>>> device private pages after calling memremap_pages().
> >>>>>
> >>>>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> >>>>> ---
> >>>>>    include/linux/migrate.h |   1 +
> >>>>>    include/linux/mm.h      |   1 +
> >>>>>    mm/huge_memory.c        |  30 ++++--
> >>>>>    mm/internal.h           |   1 -
> >>>>>    mm/memory.c             |  10 +-
> >>>>>    mm/memremap.c           |   9 +-
> >>>>>    mm/migrate.c            | 226 ++++++++++++++++++++++++++++++++--------
> >>>>>    mm/page_alloc.c         |   1 +
> >>>>>    8 files changed, 226 insertions(+), 53 deletions(-)
> >>>>>
> >>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> >>>>> index 3e546cbf03dd..f6a64965c8bd 100644
> >>>>> --- a/include/linux/migrate.h
> >>>>> +++ b/include/linux/migrate.h
> >>>>> @@ -166,6 +166,7 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> >>>>>    #define MIGRATE_PFN_MIGRATE    (1UL << 1)
> >>>>>    #define MIGRATE_PFN_LOCKED     (1UL << 2)
> >>>>>    #define MIGRATE_PFN_WRITE      (1UL << 3)
> >>>>> +#define MIGRATE_PFN_COMPOUND     (1UL << 4)
> >>>>>    #define MIGRATE_PFN_SHIFT      6
> >>>>>
> >>>>>    static inline struct page *migrate_pfn_to_page(unsigned long mpfn)
> >>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>>> index dc7b87310c10..020b9dd3cddb 100644
> >>>>> --- a/include/linux/mm.h
> >>>>> +++ b/include/linux/mm.h
> >>>>> @@ -932,6 +932,7 @@ static inline unsigned int page_shift(struct page *page)
> >>>>>    }
> >>>>>
> >>>>>    void free_compound_page(struct page *page);
> >>>>> +void prep_compound_page(struct page *page, unsigned int order);
> >>>>>
> >>>>>    #ifdef CONFIG_MMU
> >>>>>    /*
> >>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>>> index 78c84bee7e29..25d95f7b1e98 100644
> >>>>> --- a/mm/huge_memory.c
> >>>>> +++ b/mm/huge_memory.c
> >>>>> @@ -1663,23 +1663,35 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>>>>           } else {
> >>>>>                   struct page *page = NULL;
> >>>>>                   int flush_needed = 1;
> >>>>> +         bool is_anon = false;
> >>>>>
> >>>>>                   if (pmd_present(orig_pmd)) {
> >>>>>                           page = pmd_page(orig_pmd);
> >>>>> +                 is_anon = PageAnon(page);
> >>>>>                           page_remove_rmap(page, true);
> >>>>>                           VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
> >>>>>                           VM_BUG_ON_PAGE(!PageHead(page), page);
> >>>>>                   } else if (thp_migration_supported()) {
> >>>>>                           swp_entry_t entry;
> >>>>>
> >>>>> -                 VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
> >>>>>                           entry = pmd_to_swp_entry(orig_pmd);
> >>>>> -                 page = pfn_to_page(swp_offset(entry));
> >>>>> +                 if (is_device_private_entry(entry)) {
> >>>>> +                         page = device_private_entry_to_page(entry);
> >>>>> +                         is_anon = PageAnon(page);
> >>>>> +                         page_remove_rmap(page, true);
> >>>>> +                         VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
> >>>>> +                         VM_BUG_ON_PAGE(!PageHead(page), page);
> >>>>> +                         put_page(page);
> >>>>
> >>>> Why do you hide this code behind thp_migration_supported()? It seems that you just need
> >>>> pmd swap entry not pmd migration entry. Also the condition is not consistent with the code
> >>>> in __handle_mm_fault(), in which you handle is_device_private_entry() directly without
> >>>> checking thp_migration_support().
> >>>
> >>> Good point, I think "else if (thp_migration_supported())" should be
> >>> "else if (is_pmd_migration_entry(orig_pmd))" since if the PMD *is*
> >>> a device private or migration entry, then it should be handled and the
> >>> VM_BUG_ON() should be that thp_migration_supported() is true
> >>> (or maybe remove the VM_BUG_ON?).
> >>
> >> I disagree. A device private entry is independent of a PMD migration entry, since a device private
> >> entry is just a swap entry, which is available when CONFIG_TRANSPARENT_HUGEPAGE. So for architectures
> >> support THP but not THP migration (like ARM64), your code should still work.
> >
> > I'll fix this up for v2 and you can double check me.
>
> Sure.
>
> >
> >> I would suggest you to check all the use of is_swap_pmd() and make sure the code
> >> can handle is_device_private_entry().
> >
> > OK.
> >
> >> For new device private code, you might need to guard it either statically or dynamically in case
> >> CONFIG_DEVICE_PRIVATE is disabled. Potentially, you would like to make sure a system without
> >> CONFIG_DEVICE_PRIVATE will not see is_device_private_entry() == true and give errors when it does.
> >
> > I have compiled and run with CONFIG_DEVICE_PRIVATE off but I can test more combinations of
> > config settings.
>
> Thanks.
>
> >
> >>>
> >>>> Do we need to support split_huge_pmd() if a page is migrated to device? Any new code
> >>>> needed in split_huge_pmd()?
> >>>
> >>> I was thinking that any CPU usage of the device private page would cause it to be
> >>> migrated back to system memory as a whole PMD/PUD page but I'll double check.
> >>> At least there should be a check that the page isn't a device private page.
> >>
> >> Well, that depends. If we can allocate a THP on CPU memory, we can migrate the whole page back.
> >> But if no THP is allocated due to low on free memory or memory fragmentation, I think you
> >> might need a fallback plan, either splitting the device private page and migrating smaller
> >> pages instead or reclaiming CPU memory until you get a THP. IMHO, the former might be preferred,
> >> since the latter might cost a lot of CPU cycles but still gives no THP after all.
> >
> > Sounds reasonable. I'll work on adding the fallback path for v2.
>
> Ying(cc’d) developed the code to swapout and swapin THP in one piece: https://lore.kernel.org/linux-mm/20181207054122.27822-1-ying.huang@intel.com/.
> I am not sure whether the patchset makes into mainstream or not. It could be a good technical reference
> for swapping in device private pages, although swapping in pages from disk and from device private
> memory are two different scenarios.
>
> Since the device private memory swapin impacts core mm performance, we might want to discuss your patches
> with more people, like the ones from Ying’s patchset, in the next version.

I believe Ying will give you more insights about how THP swap works.

But, IMHO device memory migration (migrate to system memory) seems
like THP CoW more than swap.

When migrating in:

>
>
>
> —
> Best Regards,
> Yan Zi

  reply	other threads:[~2020-06-22 22:30 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-19 21:56 [PATCH 00/16] mm/hmm/nouveau: THP mapping and migration Ralph Campbell
2020-06-19 21:56 ` [PATCH 01/16] mm: fix migrate_vma_setup() src_owner and normal pages Ralph Campbell
2020-06-19 21:56 ` [PATCH 02/16] nouveau: fix migrate page regression Ralph Campbell
2020-06-19 21:56 ` [PATCH 03/16] nouveau: fix mixed normal and device private page migration Ralph Campbell
2020-06-19 21:56 ` [PATCH 04/16] mm/hmm: fix test timeout on slower machines Ralph Campbell
2020-06-19 21:56 ` [PATCH 05/16] mm/hmm/test: remove redundant page table invalidate Ralph Campbell
2020-06-19 21:56 ` [PATCH 06/16] mm/hmm: test mixed normal and device private migrations Ralph Campbell
2020-06-19 21:56 ` [PATCH 07/16] nouveau: make nvkm_vmm_ctor() and nvkm_mmu_ptp_get() static Ralph Campbell
2020-06-19 21:56 ` [PATCH 08/16] nouveau/hmm: fault one page at a time Ralph Campbell
2020-06-22 17:22   ` Jason Gunthorpe
2020-06-22 18:44     ` Ralph Campbell
2020-06-19 21:56 ` [PATCH 09/16] mm/hmm: add output flag for compound page mapping Ralph Campbell
2020-06-22 17:25   ` Jason Gunthorpe
2020-06-22 18:10     ` Ralph Campbell
2020-06-22 23:18       ` Jason Gunthorpe
2020-06-22 23:26         ` Ralph Campbell
2020-06-19 21:56 ` [PATCH 10/16] nouveau/hmm: support mapping large sysmem pages Ralph Campbell
2020-06-19 21:56 ` [PATCH 11/16] hmm: add tests for HMM_PFN_COMPOUND flag Ralph Campbell
2020-06-19 21:56 ` [PATCH 12/16] mm/hmm: optimize migrate_vma_setup() for holes Ralph Campbell
2020-06-19 21:56 ` [PATCH 13/16] mm: support THP migration to device private memory Ralph Campbell
2020-06-21 23:20   ` Zi Yan
2020-06-22 19:36     ` Ralph Campbell
2020-06-22 20:10       ` Zi Yan
2020-06-22 21:31         ` Ralph Campbell
2020-06-22 21:53           ` Zi Yan
2020-06-22 22:30             ` Yang Shi [this message]
2020-06-22 22:33               ` Yang Shi
2020-06-22 23:01                 ` John Hubbard
2020-06-22 23:54                   ` Yang Shi
2020-06-23  0:05                     ` Ralph Campbell
2020-06-23  2:51                       ` Huang, Ying
2020-06-19 21:56 ` [PATCH 14/16] mm/thp: add THP allocation helper Ralph Campbell
2020-06-22  0:15   ` Zi Yan
2020-06-22 21:33     ` Ralph Campbell
2020-06-19 21:56 ` [PATCH 15/16] mm/hmm/test: add self tests for THP migration Ralph Campbell
2020-06-19 21:56 ` [PATCH 16/16] nouveau: support THP migration to private memory Ralph Campbell
2020-06-22 12:39 ` [PATCH 00/16] mm/hmm/nouveau: THP mapping and migration Jason Gunthorpe
2020-06-22 16:58   ` Ralph Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHbLzkqe50+KUsRH92O4Be2PjuwAYGw9nK+d-73syxi2Xnf9-Q@mail.gmail.com \
    --to=shy828301@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bskeggs@redhat.com \
    --cc=hch@lst.de \
    --cc=jgg@mellanox.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=rcampbell@nvidia.com \
    --cc=shuah@kernel.org \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).