From: Joao Martins <joao.m.martins@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux MM <linux-mm@kvack.org>, Ira Weiny <ira.weiny@intel.com>,
Matthew Wilcox <willy@infradead.org>,
Jason Gunthorpe <jgg@ziepe.ca>, Jane Chu <jane.chu@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux NVDIMM <nvdimm@lists.linux.dev>
Subject: Re: [PATCH v1 10/11] device-dax: compound pagemap support
Date: Mon, 7 Jun 2021 14:59:35 +0100
Message-ID: <9191a120-2728-51f7-a57e-e16644f33bc1@oracle.com>
In-Reply-To: <CAPcyv4jeY0K7ciWeCLjxXmiWs7NNeM-_zEdZ2XAdYnyZc9PvWA@mail.gmail.com>
On 6/2/21 1:36 AM, Dan Williams wrote:
> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>> dax devices are created with a fixed @align (huge page size), which
>> is also enforced at mmap() of the device. Consequently, faults
>> happen at the @align specified at creation, which does not change
>> throughout the dax device's lifetime. MCEs poison a whole dax huge
>> page, and splits likewise occur at the configured page size.
>
> This paragraph last...
>
/me nods
>>
>> Use the newly added compound pagemap facility, which maps the
>> assigned dax ranges as compound pages with a page size of @align.
>> Currently, this means that region/namespace bootstrap takes
>> considerably less time, given that considerably fewer pages need
>> to be initialized.
>
> This paragraph should go first...
>
/me nods
>>
>> On setups with 128G NVDIMMs, initialization with struct pages stored in
>> DRAM improves from ~268-358 ms to ~78-100 ms with 2M pages, and to less
>> than 1 ms with 1G pages.
>
> This paragraph second...
>
/me nods
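For scale, the arithmetic behind the 128G example, assuming 4 KiB base
pages and a 64-byte struct page (typical on x86-64, but an assumption
here): the number of base-page struct pages versus the number of
compound heads at 2M and 1G. It only counts objects; it is not a model
of the measured initialization times. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (4ULL << 10)
#define SZ_2M (2ULL << 20)
#define SZ_1G (1ULL << 30)

/* Assumption: 64-byte struct page (common x86-64 configuration). */
#define STRUCT_PAGE_SIZE 64ULL

/* Number of @unit-sized pages covering @bytes; @bytes must be a multiple. */
static uint64_t nr_units(uint64_t bytes, uint64_t unit)
{
	assert(unit != 0 && bytes % unit == 0);
	return bytes / unit;
}

/* Bytes of memmap needed for one struct page per 4 KiB base page. */
static uint64_t memmap_bytes(uint64_t bytes)
{
	return nr_units(bytes, SZ_4K) * STRUCT_PAGE_SIZE;
}
```

For 128 GiB this gives 33,554,432 base pages (2 GiB of memmap), but
only 65,536 compound heads at 2M and 128 at 1G; each 2M head fronts 512
base pages and each 1G head fronts 262,144.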
>
> The reason for this ordering is to have increasingly more detail as
> the changelog is read so that people that don't care about the details
> can get the main theme immediately, and others that wonder why
> device-dax is able to support this can read deeper.
>
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> drivers/dax/device.c | 58 ++++++++++++++++++++++++++++++++++----------
>> 1 file changed, 45 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
>> index db92573c94e8..e3dcc4ad1727 100644
>> --- a/drivers/dax/device.c
>> +++ b/drivers/dax/device.c
>> @@ -192,6 +192,43 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
>> }
>> #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>
>> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
>> +			     unsigned long fault_size,
>> +			     struct address_space *f_mapping)
>> +{
>> +	unsigned long i;
>> +	pgoff_t pgoff;
>> +
>> +	pgoff = linear_page_index(vmf->vma, vmf->address
>> +			& ~(fault_size - 1));
>
> I know you are just copying this style from whomever wrote it this way
> originally, but that person (me) was wrong; this should be:
>
> pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size));
>
> ...you might do a lead-in cleanup patch before this one.
>
Yep, will do.
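The mask `vmf->address & ~(fault_size - 1)` rounds the faulting address
down to a `fault_size` boundary, which is what the kernel's
`ALIGN_DOWN()` computes; `ALIGN()` rounds up. The userspace sketch below
re-implements both macros locally (the function names are hypothetical)
to check the equivalence for power-of-two sizes:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's ALIGN(): round @x up to a multiple of @a. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);	/* power of two only */
	return (x + (a - 1)) & ~(a - 1);
}

/*
 * Mirrors the kernel's ALIGN_DOWN(), expressed (as in the kernel) as a
 * round-up of x - (a - 1). For x < a - 1 the subtraction wraps, and the
 * addition inside align_up() wraps it back, so the result is still 0.
 */
static uint64_t align_down(uint64_t x, uint64_t a)
{
	return align_up(x - (a - 1), a);
}

/* The expression used by the patch: vmf->address & ~(fault_size - 1). */
static uint64_t fault_base(uint64_t address, uint64_t fault_size)
{
	return address & ~(fault_size - 1);
}
```

With a 2M fault size and an address such as 0x7f0000201000,
fault_base() and align_down() both yield 0x7f0000200000, while
align_up() yields 0x7f0000400000; the two roundings only agree when the
address is already aligned.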
>
>> +
>> +	for (i = 0; i < fault_size / PAGE_SIZE; i++) {
>> +		struct page *page;
>> +
>> +		page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
>> +		if (page->mapping)
>> +			continue;
>> +		page->mapping = f_mapping;
>> +		page->index = pgoff + i;
>> +	}
>> +}
>> +
>> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn,
>> +				 unsigned long fault_size,
>> +				 struct address_space *f_mapping)
>> +{
>> +	struct page *head;
>> +
>> +	head = pfn_to_page(pfn_t_to_pfn(pfn));
>> +	head = compound_head(head);
>> +	if (head->mapping)
>> +		return;
>> +
>> +	head->mapping = f_mapping;
>> +	head->index = linear_page_index(vmf->vma, vmf->address
>> +			& ~(fault_size - 1));
>> +}
>> +
>>  static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>>  		enum page_entry_size pe_size)
>>  {
>> @@ -225,8 +262,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>>  	}
>>
>>  	if (rc == VM_FAULT_NOPAGE) {
>> -		unsigned long i;
>> -		pgoff_t pgoff;
>> +		struct dev_pagemap *pgmap = pfn_t_to_page(pfn)->pgmap;
>
> The device should already know its pagemap...
>
> There is a distinction in dev_dax_probe() for "static" vs "dynamic"
> pgmap, but once the pgmap is allocated it should be fine to assign it
> back to dev_dax->pgmap in the "dynamic" case. That could be a lead-in
> patch to make dev_dax->pgmap always valid.
>
I suppose you mean to always set dev_dax->pgmap at the end of the
'if (!pgmap)' in dev_dax_probe() after we allocate the pgmap.
I will make this a separate cleanup patch as you suggested.
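To make the two paths concrete, here is a userspace model of the helpers
in this patch, dispatching on a `dev_dax->pgmap` that is assumed to be
always valid, per the suggestion above. The `model_*` functions and the
trimmed `struct page`, `struct dev_pagemap` and `struct dev_dax` are
hypothetical stand-ins, not kernel definitions; the `head` field stands
in for `compound_head()`.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-ins: only the fields the helpers touch are modelled. */
struct page {
	void *mapping;
	unsigned long index;
	struct page *head;		/* stand-in for compound_head() */
};

struct dev_pagemap {
	unsigned long align;		/* compound page size, or 0 */
};

struct dev_dax {
	struct dev_pagemap *pgmap;	/* assumed always valid after probe */
};

/* Per-page variant: touches every struct page covered by the fault. */
static size_t model_set_page_mapping(struct page *pages, unsigned long pgoff,
				     unsigned long fault_size, void *f_mapping)
{
	size_t i, written = 0;

	for (i = 0; i < fault_size / PAGE_SIZE; i++) {
		if (pages[i].mapping)
			continue;
		pages[i].mapping = f_mapping;
		pages[i].index = pgoff + i;
		written++;
	}
	return written;
}

/* Compound variant: only the head page carries mapping/index. */
static size_t model_set_compound_mapping(struct page *pages,
					 unsigned long pgoff, void *f_mapping)
{
	struct page *head = pages[0].head;

	if (head->mapping)
		return 0;
	head->mapping = f_mapping;
	head->index = pgoff;
	return 1;
}

/*
 * Dispatch on the device's own pgmap instead of pfn_t_to_page(pfn)->pgmap.
 * @pgoff is precomputed by the caller (linear_page_index() in the kernel).
 * Returns the number of struct pages written.
 */
static size_t model_set_mapping(struct dev_dax *dev_dax, struct page *pages,
				unsigned long pgoff, unsigned long fault_size,
				void *f_mapping)
{
	if (dev_dax->pgmap->align > PAGE_SIZE)
		return model_set_compound_mapping(pages, pgoff, f_mapping);
	return model_set_page_mapping(pages, pgoff, fault_size, f_mapping);
}
```

With a 2M fault on a compound-mapped device, one struct page is written
(and none on a repeat fault), versus 512 on the per-page path.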
>>
>>  		/*
>>  		 * In the device-dax case the only possibility for a
>> @@ -234,17 +270,10 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>>  		 * mapped. No need to consider the zero page, or racing
>>  		 * conflicting mappings.
>>  		 */
>> -		pgoff = linear_page_index(vmf->vma, vmf->address
>> -				& ~(fault_size - 1));
>> -		for (i = 0; i < fault_size / PAGE_SIZE; i++) {
>> -			struct page *page;
>> -
>> -			page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
>> -			if (page->mapping)
>> -				continue;
>> -			page->mapping = filp->f_mapping;
>> -			page->index = pgoff + i;
>> -		}
>> +		if (pgmap->align > PAGE_SIZE)
>> +			set_compound_mapping(vmf, pfn, fault_size, filp->f_mapping);
>> +		else
>> +			set_page_mapping(vmf, pfn, fault_size, filp->f_mapping);
>>  	}
>>  	dax_read_unlock(id);
>>
>> @@ -426,6 +455,9 @@ int dev_dax_probe(struct dev_dax *dev_dax)
>>  	}
>>
>>  	pgmap->type = MEMORY_DEVICE_GENERIC;
>> +	if (dev_dax->align > PAGE_SIZE)
>> +		pgmap->align = dev_dax->align;
>
> Just needs updates for whatever renames you do for the "compound
> geometry" terminology rather than subtle side effects of "align".
>
> Other than that, looks good to me.
>
OK, will do.
Thanks!