From: Dan Williams <dan.j.williams@intel.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Linux MM <linux-mm@kvack.org>, Ira Weiny <ira.weiny@intel.com>,
	 linux-nvdimm <linux-nvdimm@lists.01.org>,
	Matthew Wilcox <willy@infradead.org>,
	 Jason Gunthorpe <jgg@ziepe.ca>, Jane Chu <jane.chu@oracle.com>,
	 Muchun Song <songmuchun@bytedance.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	 Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v1 10/11] device-dax: compound pagemap support
Date: Tue, 1 Jun 2021 17:36:48 -0700
Message-ID: <CAPcyv4jeY0K7ciWeCLjxXmiWs7NNeM-_zEdZ2XAdYnyZc9PvWA@mail.gmail.com>
In-Reply-To: <20210325230938.30752-11-joao.m.martins@oracle.com>

On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> dax devices are created with a fixed @align (huge page size) which
> is enforced through as well at mmap() of the device. Faults,
> consequently happen too at the specified @align specified at the
> creation, and those don't change through out dax device lifetime.
> MCEs poisons a whole dax huge page, as well as splits occurring at
> at the configured page size.

This paragraph last...

>
> Use the newly added compound pagemap facility which maps the
> assigned dax ranges as compound pages at a page size of @align.
> Currently, this means, that region/namespace bootstrap would take
> considerably less, given that you would initialize considerably less
> pages.

This paragraph should go first...

>
> On setups with 128G NVDIMMs the initialization with DRAM stored struct pages
> improves from ~268-358 ms to ~78-100 ms with 2M pages, and to less than
> a 1msec with 1G pages.

This paragraph second...


The reason for this ordering is to add detail progressively as the
changelog is read: people who don't care about the details get the
main theme immediately, and those who wonder why device-dax is able
to support this can read deeper.
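
For a sense of scale behind the 128G numbers quoted above, here is a
small userspace sketch of the page-count arithmetic. It only counts how
many pages of a given size cover the region; it does not model how many
struct pages the series actually initializes per compound page. The
helper name nr_pages() is made up for illustration.

```c
#include <stdint.h>

/*
 * Number of pages of size page_size needed to cover region_size bytes.
 * Both sizes are assumed to be powers of two with
 * region_size >= page_size.
 */
static uint64_t nr_pages(uint64_t region_size, uint64_t page_size)
{
	return region_size / page_size;
}
```

For a 128G region this gives 33554432 pages at 4K, 65536 at 2M, and 128
at 1G, which is the gap the changelog's timing numbers reflect.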

>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/dax/device.c | 58 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index db92573c94e8..e3dcc4ad1727 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -192,6 +192,43 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
>  }
>  #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>
> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
> +                            unsigned long fault_size,
> +                            struct address_space *f_mapping)
> +{
> +       unsigned long i;
> +       pgoff_t pgoff;
> +
> +       pgoff = linear_page_index(vmf->vma, vmf->address
> +                       & ~(fault_size - 1));

I know you are just copying this style from whoever wrote it this way
originally, but that person (me) was wrong; this should be:

pgoff = linear_page_index(vmf->vma, ALIGN_DOWN(vmf->address, fault_size));

...you might do a lead-in cleanup patch before this one.
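
As a quick userspace check of the rounding direction (a sketch, not
kernel code): the open-coded "& ~(fault_size - 1)" rounds the address
down, which is what the kernel's ALIGN_DOWN() does, whereas ALIGN()
rounds up. The SKETCH_* macros and helper names below are made-up
stand-ins and assume fault_size is a power of two.

```c
#include <stdint.h>

/* Userspace stand-ins for ALIGN()/ALIGN_DOWN(); 'a' must be a power of two. */
#define SKETCH_ALIGN(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))
#define SKETCH_ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

/* The open-coded form used in the patch. */
static uint64_t mask_down(uint64_t addr, uint64_t fault_size)
{
	return addr & ~(fault_size - 1);
}

/* Round-down helper: same result as mask_down(). */
static uint64_t align_down(uint64_t addr, uint64_t fault_size)
{
	return SKETCH_ALIGN_DOWN(addr, fault_size);
}

/* Round-up helper: differs whenever addr is not already aligned. */
static uint64_t align_up(uint64_t addr, uint64_t fault_size)
{
	return SKETCH_ALIGN(addr, fault_size);
}
```

With a 2M fault_size, an address 4K past a 2M boundary maps back to
that boundary with mask_down() and align_down(), but to the next 2M
boundary with align_up(), so only the round-down helper preserves the
existing behavior.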


> +
> +       for (i = 0; i < fault_size / PAGE_SIZE; i++) {
> +               struct page *page;
> +
> +               page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
> +               if (page->mapping)
> +                       continue;
> +               page->mapping = f_mapping;
> +               page->index = pgoff + i;
> +       }
> +}
> +
> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn,
> +                                unsigned long fault_size,
> +                                struct address_space *f_mapping)
> +{
> +       struct page *head;
> +
> +       head = pfn_to_page(pfn_t_to_pfn(pfn));
> +       head = compound_head(head);
> +       if (head->mapping)
> +               return;
> +
> +       head->mapping = f_mapping;
> +       head->index = linear_page_index(vmf->vma, vmf->address
> +                       & ~(fault_size - 1));
> +}
> +
>  static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>                 enum page_entry_size pe_size)
>  {
> @@ -225,8 +262,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>         }
>
>         if (rc == VM_FAULT_NOPAGE) {
> -               unsigned long i;
> -               pgoff_t pgoff;
> +               struct dev_pagemap *pgmap = pfn_t_to_page(pfn)->pgmap;

The device should already know its pagemap...

There is a distinction in dev_dax_probe() for "static" vs "dynamic"
pgmap, but once the pgmap is allocated it should be fine to assign it
back to dev_dax->pgmap in the "dynamic" case. That could be a lead-in
patch to make dev_dax->pgmap always valid.
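
Roughly, the shape of that lead-in change could look like the userspace
model below. This is only a sketch under assumptions: struct
pgmap_model, struct dev_dax_model and probe_model() are hypothetical
stand-ins, not the kernel structures or dev_dax_probe() itself, and
teardown/lifetime handling is not modeled.

```c
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's dev_pagemap / dev_dax. */
struct pgmap_model {
	int type;
};

struct dev_dax_model {
	struct pgmap_model *pgmap;	/* NULL before probe in the "dynamic" case */
};

/* Returns 0 on success, -1 if the allocation fails. */
static int probe_model(struct dev_dax_model *dev_dax)
{
	struct pgmap_model *pgmap = dev_dax->pgmap;

	if (!pgmap) {
		/* "dynamic" case: the pagemap is allocated at probe time */
		pgmap = calloc(1, sizeof(*pgmap));
		if (!pgmap)
			return -1;
		/* assign it back so the device always knows its pagemap */
		dev_dax->pgmap = pgmap;
	}

	pgmap->type = 1;	/* placeholder for the real type setup */
	return 0;
}
```

With that in place the fault path could read the pagemap from the
device in both the "static" and "dynamic" cases instead of going
through a page.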

>
>                 /*
>                  * In the device-dax case the only possibility for a
> @@ -234,17 +270,10 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>                  * mapped. No need to consider the zero page, or racing
>                  * conflicting mappings.
>                  */
> -               pgoff = linear_page_index(vmf->vma, vmf->address
> -                               & ~(fault_size - 1));
> -               for (i = 0; i < fault_size / PAGE_SIZE; i++) {
> -                       struct page *page;
> -
> -                       page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
> -                       if (page->mapping)
> -                               continue;
> -                       page->mapping = filp->f_mapping;
> -                       page->index = pgoff + i;
> -               }
> +               if (pgmap->align > PAGE_SIZE)
> +                       set_compound_mapping(vmf, pfn, fault_size, filp->f_mapping);
> +               else
> +                       set_page_mapping(vmf, pfn, fault_size, filp->f_mapping);
>         }
>         dax_read_unlock(id);
>
> @@ -426,6 +455,9 @@ int dev_dax_probe(struct dev_dax *dev_dax)
>         }
>
>         pgmap->type = MEMORY_DEVICE_GENERIC;
> +       if (dev_dax->align > PAGE_SIZE)
> +               pgmap->align = dev_dax->align;

This just needs updating for whatever renames you do to adopt the
"compound geometry" terminology, rather than relying on subtle side
effects of "align".
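
For readers following the fault-path hunk, here is a userspace model of
what the two branches do differently: the base-page path writes
mapping/index into every page of the fault range, while the compound
path touches only the head. Everything here (struct page_model,
MODEL_PAGE_SIZE, the *_model() helpers and their return values) is a
hypothetical illustration, not the kernel code.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the two struct page fields involved. */
struct page_model {
	const void *mapping;
	unsigned long index;
};

/*
 * Base-page path: set mapping/index on each page that is not yet
 * mapped. Returns how many pages were written.
 */
static unsigned long set_page_mapping_model(struct page_model *pages,
					    unsigned long fault_size,
					    const void *mapping,
					    unsigned long pgoff)
{
	unsigned long i, written = 0;

	for (i = 0; i < fault_size / MODEL_PAGE_SIZE; i++) {
		if (pages[i].mapping)
			continue;
		pages[i].mapping = mapping;
		pages[i].index = pgoff + i;
		written++;
	}
	return written;
}

/*
 * Compound path: only the head page carries mapping/index.
 * Returns 1 if the head was written, 0 if it was already set.
 */
static unsigned long set_compound_mapping_model(struct page_model *head,
						const void *mapping,
						unsigned long pgoff)
{
	if (head->mapping)
		return 0;
	head->mapping = mapping;
	head->index = pgoff;
	return 1;
}
```

For a 2M fault with 4K base pages the first helper writes 512 entries
on the first fault, while the second writes one.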

Other than that, looks good to me.

