All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	 Dave Jiang <dave.jiang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	John Hubbard <jhubbard@nvidia.com>,
	 Jane Chu <jane.chu@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	 Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Jonathan Corbet <corbet@lwn.net>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	 Linux Doc Mailing List <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v3 04/14] mm/memremap: add ZONE_DEVICE support for compound pages
Date: Wed, 14 Jul 2021 18:08:14 -0700	[thread overview]
Message-ID: <CAPcyv4jwd_dzTH1H+cbiKqfK5=Xaa9JY=EVKHhPbjicVZA-URQ@mail.gmail.com> (raw)
In-Reply-To: <20210714193542.21857-5-joao.m.martins@oracle.com>

On Wed, Jul 14, 2021 at 12:36 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> Add a new align property for struct dev_pagemap which specifies that a

s/align/@geometry/

> pagemap is composed of a set of compound pages of size @align,

s/@align/@geometry/

> instead of
> base pages. When a compound page geometry is requested, all but the first
> page are initialised as tail pages instead of order-0 pages.
>
> For certain ZONE_DEVICE users like device-dax which have a fixed page size,
> this creates an opportunity to optimize GUP and GUP-fast walkers, treating
> it the same way as THP or hugetlb pages.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  include/linux/memremap.h | 17 +++++++++++++++++
>  mm/memremap.c            |  8 ++++++--
>  mm/page_alloc.c          | 34 +++++++++++++++++++++++++++++++++-
>  3 files changed, 56 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 119f130ef8f1..e5ab6d4525c1 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -99,6 +99,10 @@ struct dev_pagemap_ops {
>   * @done: completion for @internal_ref
>   * @type: memory type: see MEMORY_* in memory_hotplug.h
>   * @flags: PGMAP_* flags to specify defailed behavior
> + * @geometry: structural definition of how the vmemmap metadata is populated.
> + *     A zero or PAGE_SIZE defaults to using base pages as the memmap metadata
> + *     representation. A bigger value but also multiple of PAGE_SIZE will set
> + *     up compound struct pages representative of the requested geometry size.
>   * @ops: method table
>   * @owner: an opaque pointer identifying the entity that manages this
>   *     instance.  Used by various helpers to make sure that no
> @@ -114,6 +118,7 @@ struct dev_pagemap {
>         struct completion done;
>         enum memory_type type;
>         unsigned int flags;
> +       unsigned long geometry;
>         const struct dev_pagemap_ops *ops;
>         void *owner;
>         int nr_range;
> @@ -130,6 +135,18 @@ static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
>         return NULL;
>  }
>
> +static inline unsigned long pgmap_geometry(struct dev_pagemap *pgmap)
> +{
> +       if (!pgmap || !pgmap->geometry)
> +               return PAGE_SIZE;
> +       return pgmap->geometry;
> +}
> +
> +static inline unsigned long pgmap_pfn_geometry(struct dev_pagemap *pgmap)
> +{
> +       return PHYS_PFN(pgmap_geometry(pgmap));
> +}

Are both needed? Maybe just have ->geometry natively be in nr_pages
units directly, because pgmap_pfn_geometry() makes it confusing
whether it's a geometry of the pfn or the geometry of the pgmap.

> +
>  #ifdef CONFIG_ZONE_DEVICE
>  bool pfn_zone_device_reserved(unsigned long pfn);
>  void *memremap_pages(struct dev_pagemap *pgmap, int nid);
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 805d761740c4..ffcb924eb6a5 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -318,8 +318,12 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
>         memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
>                                 PHYS_PFN(range->start),
>                                 PHYS_PFN(range_len(range)), pgmap);
> -       percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> -                       - pfn_first(pgmap, range_id));
> +       if (pgmap_geometry(pgmap) > PAGE_SIZE)

This would become

if (pgmap_geometry(pgmap) > 1)

> +               percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
> +                       - pfn_first(pgmap, range_id)) / pgmap_pfn_geometry(pgmap));

...and this would be pgmap_geometry()

> +       else
> +               percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> +                               - pfn_first(pgmap, range_id));
>         return 0;
>
>  err_add_memory:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 79f3b38afeca..188cb5f8c308 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6597,6 +6597,31 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>         }
>  }
>
> +static void __ref memmap_init_compound(struct page *page, unsigned long pfn,

I'd feel better if @page was renamed @head... more below:

> +                                       unsigned long zone_idx, int nid,
> +                                       struct dev_pagemap *pgmap,
> +                                       unsigned long nr_pages)
> +{
> +       unsigned int order_align = order_base_2(nr_pages);
> +       unsigned long i;
> +
> +       __SetPageHead(page);
> +
> +       for (i = 1; i < nr_pages; i++) {

The switch of loop styles is jarring. I.e. the switch from
memmap_init_zone_device() that is using pfn, end_pfn, and a local
'struct page *' variable to this helper using pfn + i and a mix of
helpers (__init_zone_device_page,  prep_compound_tail) that have
different expectations of head page + tail_idx and current page.

I.e. this reads more obviously correct to me, but maybe I'm just in
the wrong headspace:

        for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) {
                struct page *page = pfn_to_page(pfn);

                __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
                prep_compound_tail(head, pfn - head_pfn);

> +               __init_zone_device_page(page + i, pfn + i, zone_idx,
> +                                       nid, pgmap);
> +               prep_compound_tail(page, i);
> +
> +               /*
> +                * The first and second tail pages need to
> +                * initialized first, hence the head page is
> +                * prepared last.

I'd change this comment to say why rather than restate what can be
gleaned from the code. It's actually not clear to me why this order is
necessary.

> +                */
> +               if (i == 2)
> +                       prep_compound_head(page, order_align);
> +       }
> +}
> +
>  void __ref memmap_init_zone_device(struct zone *zone,
>                                    unsigned long start_pfn,
>                                    unsigned long nr_pages,
> @@ -6605,6 +6630,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
>         unsigned long pfn, end_pfn = start_pfn + nr_pages;
>         struct pglist_data *pgdat = zone->zone_pgdat;
>         struct vmem_altmap *altmap = pgmap_altmap(pgmap);
> +       unsigned int pfns_per_compound = pgmap_pfn_geometry(pgmap);
>         unsigned long zone_idx = zone_idx(zone);
>         unsigned long start = jiffies;
>         int nid = pgdat->node_id;
> @@ -6622,10 +6648,16 @@ void __ref memmap_init_zone_device(struct zone *zone,
>                 nr_pages = end_pfn - start_pfn;
>         }
>
> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +       for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
>                 struct page *page = pfn_to_page(pfn);
>
>                 __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
> +
> +               if (pfns_per_compound == 1)
> +                       continue;
> +
> +               memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
> +                                    pfns_per_compound);

I otherwise don't see anything broken with this patch, so feel free to include:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...on the resend with the fixups.

  reply	other threads:[~2021-07-15  1:08 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-14 19:35 [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Joao Martins
2021-07-14 19:35 ` [PATCH v3 01/14] memory-failure: fetch compound_head after pgmap_pfn_valid() Joao Martins
2021-07-15  0:17   ` Dan Williams
2021-07-15  0:17     ` Dan Williams
2021-07-15  2:51   ` [External] " Muchun Song
2021-07-15  2:51     ` Muchun Song
2021-07-15  6:40     ` Christoph Hellwig
2021-07-15  9:19       ` Muchun Song
2021-07-15  9:19         ` Muchun Song
2021-07-15 13:17     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 02/14] mm/page_alloc: split prep_compound_page into head and tail subparts Joao Martins
2021-07-15  0:19   ` Dan Williams
2021-07-15  0:19     ` Dan Williams
2021-07-15  2:53   ` [External] " Muchun Song
2021-07-15  2:53     ` Muchun Song
2021-07-15 13:17     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 03/14] mm/page_alloc: refactor memmap_init_zone_device() page init Joao Martins
2021-07-15  0:20   ` Dan Williams
2021-07-15  0:20     ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 04/14] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
2021-07-15  1:08   ` Dan Williams [this message]
2021-07-15  1:08     ` Dan Williams
2021-07-15 12:52     ` Joao Martins
2021-07-15 13:06       ` Joao Martins
2021-07-15 19:48       ` Dan Williams
2021-07-15 19:48         ` Dan Williams
2021-07-30 16:13         ` Joao Martins
2021-07-22  0:38       ` Jane Chu
2021-07-22 10:56         ` Joao Martins
2021-07-15 12:59     ` Christoph Hellwig
2021-07-15 13:15       ` Joao Martins
2021-07-15  6:48   ` Christoph Hellwig
2021-07-15 13:15     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 05/14] mm/sparse-vmemmap: add a pgmap argument to section activation Joao Martins
2021-07-28  5:56   ` Dan Williams
2021-07-28  5:56     ` Dan Williams
2021-07-28  9:43     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 06/14] mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper Joao Martins
2021-07-28  6:04   ` Dan Williams
2021-07-28  6:04     ` Dan Williams
2021-07-28 10:48     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 07/14] mm/hugetlb_vmemmap: move comment block to Documentation/vm Joao Martins
2021-07-15  2:47   ` [External] " Muchun Song
2021-07-15  2:47     ` Muchun Song
2021-07-15 13:16     ` Joao Martins
2021-07-28  6:09   ` Dan Williams
2021-07-28  6:09     ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 08/14] mm/sparse-vmemmap: populate compound pagemaps Joao Martins
2021-07-28  6:55   ` Dan Williams
2021-07-28  6:55     ` Dan Williams
2021-07-28 15:35     ` Joao Martins
2021-07-28 18:03       ` Dan Williams
2021-07-28 18:03         ` Dan Williams
2021-07-28 18:54         ` Joao Martins
2021-07-28 20:04           ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 09/14] mm/page_alloc: reuse tail struct pages for " Joao Martins
2021-07-28  7:28   ` Dan Williams
2021-07-28  7:28     ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-07-28 16:08       ` Dan Williams
2021-07-28 16:08         ` Dan Williams
2021-07-28 16:12         ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 10/14] device-dax: use ALIGN() for determining pgoff Joao Martins
2021-07-28  7:29   ` Dan Williams
2021-07-28  7:29     ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 11/14] device-dax: ensure dev_dax->pgmap is valid for dynamic devices Joao Martins
2021-07-28  7:30   ` Dan Williams
2021-07-28  7:30     ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-08-06 12:28       ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 12/14] device-dax: compound pagemap support Joao Martins
2021-07-14 23:36   ` Dan Williams
2021-07-14 23:36     ` Dan Williams
2021-07-15 12:00     ` Joao Martins
2021-07-27 23:51       ` Dan Williams
2021-07-27 23:51         ` Dan Williams
2021-07-28  9:36         ` Joao Martins
2021-07-28 18:51           ` Dan Williams
2021-07-28 18:51             ` Dan Williams
2021-07-28 18:59             ` Joao Martins
2021-07-28 19:03               ` Dan Williams
2021-07-28 19:03                 ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 13/14] mm/gup: grab head page refcount once for group of subpages Joao Martins
2021-07-28 19:55   ` Dan Williams
2021-07-28 19:55     ` Dan Williams
2021-07-28 20:07     ` Joao Martins
2021-07-28 20:23       ` Dan Williams
2021-07-28 20:23         ` Dan Williams
2021-08-25 19:10         ` Joao Martins
2021-08-25 19:15           ` Matthew Wilcox
2021-08-25 19:26             ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 14/14] mm/sparse-vmemmap: improve memory savings for compound pud geometry Joao Martins
2021-07-28 20:03   ` Dan Williams
2021-07-28 20:03     ` Dan Williams
2021-07-28 20:08     ` Joao Martins
2021-07-14 21:48 ` [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Andrew Morton
2021-07-14 23:47   ` Dan Williams
2021-07-14 23:47     ` Dan Williams
2021-07-22  2:24   ` Matthew Wilcox
2021-07-22 10:53     ` Joao Martins
2021-07-27 23:23       ` Dan Williams
2021-07-27 23:23         ` Dan Williams
2021-08-02 10:40         ` Joao Martins
2021-08-02 14:06           ` Dan Williams
2021-08-02 14:06             ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4jwd_dzTH1H+cbiKqfK5=Xaa9JY=EVKHhPbjicVZA-URQ@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=dave.jiang@intel.com \
    --cc=jane.chu@oracle.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=songmuchun@bytedance.com \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.