From: Yang Shi
Date: Mon, 22 Jun 2020 15:30:18 -0700
Subject: Re: [PATCH 13/16] mm: support THP migration to device private memory
To: Zi Yan
Cc: Ralph Campbell, nouveau@lists.freedesktop.org, linux-rdma@vger.kernel.org,
 Linux MM, linux-kselftest@vger.kernel.org, Linux Kernel Mailing List,
 Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe, Ben Skeggs,
 Andrew Morton, Shuah Khan, "Huang, Ying"
References: <20200619215649.32297-1-rcampbell@nvidia.com> <20200619215649.32297-14-rcampbell@nvidia.com> <4C364E23-0716-4D59-85A1-0C293B86BC2C@nvidia.com>
In-Reply-To: <4C364E23-0716-4D59-85A1-0C293B86BC2C@nvidia.com>

On Mon, Jun 22, 2020 at 2:53 PM Zi Yan wrote:
>
> On 22 Jun 2020, at 17:31, Ralph Campbell wrote:
>
> > On 6/22/20 1:10 PM, Zi Yan wrote:
> >> On 22 Jun 2020, at 15:36, Ralph Campbell wrote:
> >>
> >>> On 6/21/20 4:20 PM, Zi Yan wrote:
> >>>> On 19 Jun 2020, at 17:56, Ralph Campbell wrote:
> >>>>
> >>>>> Support transparent huge page migration to ZONE_DEVICE private memory.
> >>>>> A new flag (MIGRATE_PFN_COMPOUND) is added to the input PFN array to
> >>>>> indicate the huge page was fully mapped by the CPU.
> >>>>> Export prep_compound_page() so that device drivers can create huge
> >>>>> device private pages after calling memremap_pages().
> >>>>>
> >>>>> Signed-off-by: Ralph Campbell
> >>>>> ---
> >>>>>  include/linux/migrate.h |   1 +
> >>>>>  include/linux/mm.h      |   1 +
> >>>>>  mm/huge_memory.c        |  30 ++++-
> >>>>>  mm/internal.h           |   1 -
> >>>>>  mm/memory.c             |  10 +-
> >>>>>  mm/memremap.c           |   9 +-
> >>>>>  mm/migrate.c            | 226 ++++++++++++++++++++++++++++++++--------
> >>>>>  mm/page_alloc.c         |   1 +
> >>>>>  8 files changed, 226 insertions(+), 53 deletions(-)
> >>>>>
> >>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> >>>>> index 3e546cbf03dd..f6a64965c8bd 100644
> >>>>> --- a/include/linux/migrate.h
> >>>>> +++ b/include/linux/migrate.h
> >>>>> @@ -166,6 +166,7 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> >>>>>  #define MIGRATE_PFN_MIGRATE	(1UL << 1)
> >>>>>  #define MIGRATE_PFN_LOCKED	(1UL << 2)
> >>>>>  #define MIGRATE_PFN_WRITE	(1UL << 3)
> >>>>> +#define MIGRATE_PFN_COMPOUND	(1UL << 4)
> >>>>>  #define MIGRATE_PFN_SHIFT	6
> >>>>>
> >>>>>  static inline struct page *migrate_pfn_to_page(unsigned long mpfn)
> >>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>>> index dc7b87310c10..020b9dd3cddb 100644
> >>>>> --- a/include/linux/mm.h
> >>>>> +++ b/include/linux/mm.h
> >>>>> @@ -932,6 +932,7 @@ static inline unsigned int page_shift(struct page *page)
> >>>>>  }
> >>>>>
> >>>>>  void free_compound_page(struct page *page);
> >>>>> +void prep_compound_page(struct page *page, unsigned int order);
> >>>>>
> >>>>>  #ifdef CONFIG_MMU
> >>>>>  /*
> >>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>>> index 78c84bee7e29..25d95f7b1e98 100644
> >>>>> --- a/mm/huge_memory.c
> >>>>> +++ b/mm/huge_memory.c
> >>>>> @@ -1663,23 +1663,35 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >>>>>  	} else {
> >>>>>  		struct page *page = NULL;
> >>>>>  		int flush_needed = 1;
> >>>>> +		bool is_anon = false;
> >>>>>
> >>>>>  		if (pmd_present(orig_pmd)) {
> >>>>>  			page = pmd_page(orig_pmd);
> >>>>> +			is_anon = PageAnon(page);
> >>>>>  			page_remove_rmap(page, true);
> >>>>>  			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
> >>>>>  			VM_BUG_ON_PAGE(!PageHead(page), page);
> >>>>>  		} else if (thp_migration_supported()) {
> >>>>>  			swp_entry_t entry;
> >>>>>
> >>>>> -			VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
> >>>>>  			entry = pmd_to_swp_entry(orig_pmd);
> >>>>> -			page = pfn_to_page(swp_offset(entry));
> >>>>> +			if (is_device_private_entry(entry)) {
> >>>>> +				page = device_private_entry_to_page(entry);
> >>>>> +				is_anon = PageAnon(page);
> >>>>> +				page_remove_rmap(page, true);
> >>>>> +				VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
> >>>>> +				VM_BUG_ON_PAGE(!PageHead(page), page);
> >>>>> +				put_page(page);
> >>>>
> >>>> Why do you hide this code behind thp_migration_supported()? It seems that you
> >>>> just need a pmd swap entry, not a pmd migration entry. Also, the condition is
> >>>> not consistent with the code in __handle_mm_fault(), in which you handle
> >>>> is_device_private_entry() directly without checking thp_migration_supported().
> >>>
> >>> Good point. I think "else if (thp_migration_supported())" should be
> >>> "else if (is_pmd_migration_entry(orig_pmd))", since if the PMD *is*
> >>> a device private or migration entry, then it should be handled, and the
> >>> VM_BUG_ON() should be that thp_migration_supported() is true
> >>> (or maybe remove the VM_BUG_ON?).
> >>
> >> I disagree. A device private entry is independent of a PMD migration entry,
> >> since a device private entry is just a swap entry, which is available whenever
> >> CONFIG_TRANSPARENT_HUGEPAGE is enabled. So for architectures that support THP
> >> but not THP migration (like ARM64), your code should still work.
> >
> > I'll fix this up for v2 and you can double check me.
>
> Sure.
>
> >
> >> I would suggest you check all the uses of is_swap_pmd() and make sure the code
> >> can handle is_device_private_entry().
> >
> > OK.
> >
> >> For new device private code, you might need to guard it either statically or
> >> dynamically in case CONFIG_DEVICE_PRIVATE is disabled. Potentially, you would
> >> like to make sure a system without CONFIG_DEVICE_PRIVATE will never see
> >> is_device_private_entry() == true, and gives errors when it does.
> >
> > I have compiled and run with CONFIG_DEVICE_PRIVATE off, but I can test more
> > combinations of config settings.
>
> Thanks.
>
> >
> >>>
> >>>> Do we need to support split_huge_pmd() if a page is migrated to device? Any
> >>>> new code needed in split_huge_pmd()?
> >>>
> >>> I was thinking that any CPU usage of the device private page would cause it
> >>> to be migrated back to system memory as a whole PMD/PUD page, but I'll
> >>> double check. At least there should be a check that the page isn't a device
> >>> private page.
> >>
> >> Well, that depends. If we can allocate a THP in CPU memory, we can migrate the
> >> whole page back. But if no THP can be allocated, due to low free memory or
> >> memory fragmentation, I think you might need a fallback plan: either split the
> >> device private page and migrate smaller pages instead, or reclaim CPU memory
> >> until you get a THP. IMHO, the former might be preferred, since the latter
> >> might cost a lot of CPU cycles and still give no THP after all.
> >
> > Sounds reasonable. I'll work on adding the fallback path for v2.
>
> Ying (cc'd) developed the code to swap out and swap in a THP in one piece:
> https://lore.kernel.org/linux-mm/20181207054122.27822-1-ying.huang@intel.com/.
> I am not sure whether the patchset made it into mainline or not. It could be a
> good technical reference for swapping in device private pages, although swapping
> in pages from disk and from device private memory are two different scenarios.
> Since the device private memory swapin impacts core mm performance, we might
> want to discuss your patches with more people, like the ones from Ying's
> patchset, in the next version.

I believe Ying will give you more insights about how THP swap works.

But, IMHO, device memory migration (migrating back to system memory) seems more
like THP CoW than like swap. When migrating in:

>
> —
> Best Regards,
> Yan Zi