From: Dan Williams <dan.j.williams@intel.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Logan Gunthorpe <logang@deltatee.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	owner-linux-mm@kvack.org
Subject: Re: [PATCH v8 01/12] mm/sparsemem: Introduce struct mem_section_usage
Date: Fri, 10 May 2019 12:38:43 -0700
Message-ID: <CAPcyv4i1zQb-D-8iB3hr8ipMHH2yV8ssxh+Zeh2aeMw0ZJASfg@mail.gmail.com>
In-Reply-To: <dd7b53bd986d79a94ac0b08e32336e44@suse.de>

On Fri, May 10, 2019 at 6:30 AM <osalvador@suse.de> wrote:
>
> On 2019-05-07 01:39, Dan Williams wrote:
> > Towards enabling memory hotplug to track partial population of a
> > section, introduce 'struct mem_section_usage'.
> >
> > A pointer to a 'struct mem_section_usage' instance replaces the existing
> > pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
> > 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to
> > house a new 'subsection_map' bitmap.  The new bitmap enables the memory
> > hot{plug,remove} implementation to act on incremental sub-divisions of a
> > section.
> >
> > The default SUBSECTION_SHIFT is chosen to keep the 'subsection_map' no
> > larger than a single 'unsigned long' on the major architectures.
> > Alternatively an architecture can define ARCH_SUBSECTION_SHIFT to
> > override the default PMD_SHIFT. Note that PowerPC needs to use
> > ARCH_SUBSECTION_SHIFT to workaround PMD_SHIFT being a non-constant
> > expression on PowerPC.
> >
> > The primary motivation for this functionality is to support platforms
> > that mix "System RAM" and "Persistent Memory" within a single section,
> > or multiple PMEM ranges with different mapping lifetimes within a single
> > section. The section restriction for hotplug has caused an ongoing saga
> > of hacks and bugs for devm_memremap_pages() users.
> >
> > Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> > from a section, and updates to usemap allocation path, there are no
> > expected behavior changes.
> >
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Cc: Oscar Salvador <osalvador@suse.de>
> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  arch/powerpc/include/asm/sparsemem.h |    3 +
> >  include/linux/mmzone.h               |   48 +++++++++++++++++++-
> >  mm/memory_hotplug.c                  |   18 ++++----
> >  mm/page_alloc.c                      |    2 -
> >  mm/sparse.c                          |   81 +++++++++++++++++-----------------
> >  5 files changed, 99 insertions(+), 53 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> > index 3192d454a733..1aa3c9303bf8 100644
> > --- a/arch/powerpc/include/asm/sparsemem.h
> > +++ b/arch/powerpc/include/asm/sparsemem.h
> > @@ -10,6 +10,9 @@
> >   */
> >  #define SECTION_SIZE_BITS       24
> >
> > +/* Reflect the largest possible PMD-size as the subsection-size constant */
> > +#define ARCH_SUBSECTION_SHIFT 24
> > +
>
> I guess this is done because PMD_SHIFT is defined at runtime rather than
> at compile time, right?

Correct, PowerPC has:

    #define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
    #define PTE_INDEX_SIZE  __pte_index_size

...where __pte_index_size is a variable established at kernel init time.
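
To make the constraint concrete, here is a minimal userspace sketch of why the shift has to be a compile-time constant. The macro values mirror the quoted PowerPC hunk; 'struct demo_section_usage' and the DEMO_* helpers are hypothetical stand-ins for illustration, not the definitions from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Values taken from the quoted PowerPC hunk. */
#define SECTION_SIZE_BITS     24
#define ARCH_SUBSECTION_SHIFT 24
#define SUBSECTION_SHIFT      (ARCH_SUBSECTION_SHIFT)

#if SUBSECTION_SHIFT > SECTION_SIZE_BITS
#error Subsection size exceeds section size
#endif
#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))

/* Hypothetical userspace stand-ins for the kernel's bitmap sizing helpers. */
#define DEMO_BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/*
 * Hypothetical struct: an array that is a struct member needs a constant
 * bound, so SUBSECTION_SHIFT cannot expand to an expression that reads a
 * runtime variable such as __pte_index_size.
 */
struct demo_section_usage {
	unsigned long subsection_map[DEMO_BITS_TO_LONGS(SUBSECTIONS_PER_SECTION)];
};

/* With a 24-bit section and a 24-bit subsection the map is one word. */
static_assert(sizeof(struct demo_section_usage) == sizeof(unsigned long),
	      "one subsection fits in a single unsigned long");

/* Number of 'unsigned long' words backing the demo subsection_map. */
static inline unsigned long demo_subsection_map_longs(void)
{
	return sizeof(((struct demo_section_usage *)0)->subsection_map) /
	       sizeof(unsigned long);
}
```

If SUBSECTION_SHIFT expanded to something like (PAGE_SHIFT + __pte_index_size), the array bound above would no longer be a constant expression and the struct declaration would not compile.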

> >  #endif /* CONFIG_SPARSEMEM */
> >
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 70394cabaf4e..ef8d878079f9 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1160,6 +1160,44 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
> >  #define SECTION_ALIGN_UP(pfn)        (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
> >  #define SECTION_ALIGN_DOWN(pfn)      ((pfn) & PAGE_SECTION_MASK)
> >
> > +/*
> > + * SUBSECTION_SHIFT must be constant since it is used to declare
> > + * subsection_map and related bitmaps without triggering the generation
> > + * of variable-length arrays. The most natural size for a subsection is
> > + * a PMD-page. For architectures that do not have a constant PMD-size
> > + * ARCH_SUBSECTION_SHIFT can be set to a constant max size, or otherwise
> > + * fallback to 2MB.
> > + */
> > +#if defined(ARCH_SUBSECTION_SHIFT)
> > +#define SUBSECTION_SHIFT (ARCH_SUBSECTION_SHIFT)
> > +#elif defined(PMD_SHIFT)
> > +#define SUBSECTION_SHIFT (PMD_SHIFT)
> > +#else
> > +/*
> > + * Memory hotplug enabled platforms avoid this default because they
> > + * either define ARCH_SUBSECTION_SHIFT, or PMD_SHIFT is a constant, but
> > + * this is kept as a backstop to allow compilation on
> > + * !ARCH_ENABLE_MEMORY_HOTPLUG archs.
> > + */
> > +#define SUBSECTION_SHIFT 21
> > +#endif
> > +
> > +#define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
> > +#define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
> > +#define PAGE_SUBSECTION_MASK ((~(PAGES_PER_SUBSECTION-1)))
> > +
> > +#if SUBSECTION_SHIFT > SECTION_SIZE_BITS
> > +#error Subsection size exceeds section size
> > +#else
> > +#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))
> > +#endif
>
> On powerpc, SUBSECTIONS_PER_SECTION will equal 1 (so one big section),
> is that to be expected?

Yes, it turns out that PowerPC has no real need for subsection support
since they were already using small 16MB sections from day one.
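
As a quick check of the arithmetic, the sketch below evaluates the SUBSECTIONS_PER_SECTION formula from the hunk for two configurations. The PowerPC numbers (24 and 24) come from the quoted patch; the x86_64 numbers (SECTION_SIZE_BITS of 27 and a PMD_SHIFT of 21) are filled in here for comparison and are not part of this message. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring:
 *   SUBSECTIONS_PER_SECTION = 1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT)
 * The runtime assert plays the role of the "#error Subsection size exceeds
 * section size" check in the patch.
 */
static inline unsigned long demo_subsections_per_section(unsigned int section_size_bits,
							 unsigned int subsection_shift)
{
	assert(subsection_shift <= section_size_bits);
	return 1UL << (section_size_bits - subsection_shift);
}

/* Size in bytes of a region described by a shift, e.g. 24 gives 16MB. */
static inline unsigned long demo_bytes_for_shift(unsigned int shift)
{
	return 1UL << shift;
}
```

With 24/24 the result is a single subsection covering the whole 16MB section. With 27/21 it is 64 subsections of 2MB each, which fits in one 'unsigned long' on a 64-bit build.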

> Will subsection_map_init handle this right?

Yes, it should work, since subsection_map_index() will always return 0,
which means that 'end' will always be 0:

    pfns = min(nr_pages, PAGES_PER_SECTION
        - (pfn & ~PAGE_SECTION_MASK));
    end = subsection_map_index(pfn + pfns - 1);

...and then the bitmap manipulation:

    bitmap_set(ms->usage->subsection_map, idx, end - idx + 1);

...will only ever set bit0.
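
The same reasoning can be modeled in userspace. This is only a sketch under stated assumptions: the body of subsection_map_index() is not shown in this message, so the version below (section-relative pfn divided by PAGES_PER_SUBSECTION) is a guess at its shape, DEMO_PAGE_SHIFT of 16 (64K pages) is an arbitrary choice, and the loop stands in for bitmap_set() on a single-word map. All demo_* and DEMO_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT        16	/* assumption: 64K pages */
#define DEMO_SECTION_SIZE_BITS 24	/* from the quoted PowerPC header */
#define DEMO_SUBSECTION_SHIFT  24	/* ARCH_SUBSECTION_SHIFT in the hunk */

#define DEMO_PAGES_PER_SECTION    (1UL << (DEMO_SECTION_SIZE_BITS - DEMO_PAGE_SHIFT))
#define DEMO_PAGE_SECTION_MASK    (~(DEMO_PAGES_PER_SECTION - 1))
#define DEMO_PAGES_PER_SUBSECTION (1UL << (DEMO_SUBSECTION_SHIFT - DEMO_PAGE_SHIFT))

/* Assumed shape of subsection_map_index(): index within the section. */
static inline unsigned long demo_subsection_map_index(unsigned long pfn)
{
	return (pfn & ~DEMO_PAGE_SECTION_MASK) / DEMO_PAGES_PER_SUBSECTION;
}

static inline unsigned long demo_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Model the quoted snippet for the section containing 'pfn' and return the
 * resulting one-word subsection map.
 */
static inline unsigned long demo_mark_section(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long map = 0;
	unsigned long idx, pfns, end, bit;

	assert(nr_pages > 0);

	idx = demo_subsection_map_index(pfn);
	pfns = demo_min(nr_pages, DEMO_PAGES_PER_SECTION
			- (pfn & ~DEMO_PAGE_SECTION_MASK));
	end = demo_subsection_map_index(pfn + pfns - 1);

	/* Stand-in for bitmap_set(map, idx, end - idx + 1). */
	for (bit = idx; bit <= end; bit++)
		map |= 1UL << bit;

	return map;
}
```

Because the subsection is as large as the section here, both 'idx' and 'end' come out as 0 for any pfn and length, so the returned map is always 0x1.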
