From: Dan Williams <dan.j.williams@intel.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
Logan Gunthorpe <logang@deltatee.com>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
Linux MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7 09/12] mm/sparsemem: Support sub-section hotplug
Date: Mon, 3 Jun 2019 21:17:44 -0700
Message-ID: <CAPcyv4jx-+QJC3Aw-wY9PWshCWpu2VZKZz=PjTO7jN5Ojxz+pg@mail.gmail.com>
In-Reply-To: <20190503125634.GH15740@linux>
On Fri, May 3, 2019 at 5:56 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, May 01, 2019 at 10:56:10PM -0700, Dan Williams wrote:
> > The libnvdimm sub-system has suffered a series of hacks and broken
> > workarounds for the memory-hotplug implementation's awkward
> > section-aligned (128MB) granularity. For example, the following backtrace
> > is emitted when attempting arch_add_memory() with physical address
> > ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
> > within a given section:
> >
> > WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
> > devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
> > [..]
> > Call Trace:
> > dump_stack+0x86/0xc3
> > __warn+0xcb/0xf0
> > warn_slowpath_fmt+0x5f/0x80
> > devm_memremap_pages+0x3b5/0x4c0
> > __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
> > pmem_attach_disk+0x19a/0x440 [nd_pmem]
> >
> > Recently it was discovered that the problem goes beyond RAM vs PMEM
> > collisions, as some platforms produce PMEM vs PMEM collisions within a
> > given section. The libnvdimm workaround for that case revealed that the
> > libnvdimm section-alignment-padding implementation has been broken for a
> > long while. A fix for that long-standing breakage introduces as many
> > problems as it solves, since it would require a backward-incompatible
> > change to the namespace metadata interpretation. Instead of that dubious
> > route [1], address the root problem in the memory-hotplug implementation.
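As a worked illustration of the granularity problem (not part of the patch), the sketch below checks the range from the warning above against section and sub-section boundaries. It assumes x86_64 defaults of 128MB sections (a section shift of 27) and 2MB sub-sections; the macros and helpers are hypothetical names chosen for this example, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 defaults: 128MB sections, 2MB sub-sections. */
#define SECTION_SHIFT    27
#define SUBSECTION_SHIFT 21

/* True when @addr is a multiple of (1 << @shift). */
static bool is_aligned_to(uint64_t addr, unsigned int shift)
{
	return (addr & ((UINT64_C(1) << shift) - 1)) == 0;
}

/* Index of the section containing physical address @addr. */
static uint64_t addr_to_section(uint64_t addr)
{
	return addr >> SECTION_SHIFT;
}

/*
 * True when the half-open range [start, end) begins or ends part-way
 * through a section, leaving the rest of that section to some other
 * resource (System RAM or another PMEM range).
 */
static bool range_shares_section(uint64_t start, uint64_t end)
{
	return !is_aligned_to(start, SECTION_SHIFT) ||
	       !is_aligned_to(end, SECTION_SHIFT);
}

/* True when both ends of [start, end) fall on sub-section boundaries. */
static bool range_subsection_aligned(uint64_t start, uint64_t end)
{
	return is_aligned_to(start, SUBSECTION_SHIFT) &&
	       is_aligned_to(end, SUBSECTION_SHIFT);
}
```

Under those assumptions, [0x200000000, 0x2fc000000) covers sections 64 through 95 and ends 64MB into section 95, so whatever resource follows it shares that section, yet both ends are 2MB aligned. That is the gap sub-section hotplug is meant to close.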
> >
> > [1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > mm/sparse.c | 223 ++++++++++++++++++++++++++++++++++++++++-------------------
> > 1 file changed, 150 insertions(+), 73 deletions(-)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 198371e5fc87..419a3620af6e 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -83,8 +83,15 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
> > unsigned long root = SECTION_NR_TO_ROOT(section_nr);
> > struct mem_section *section;
> >
> > + /*
> > + * An existing section is possible in the sub-section hotplug
> > + * case. First hot-add instantiates, follow-on hot-add reuses
> > + * the existing section.
> > + *
> > + * The mem_hotplug_lock resolves the apparent race below.
> > + */
> > if (mem_section[root])
> > - return -EEXIST;
> > + return 0;
>
> Just a sidenote: we do not bail out on -EEXIST, so it should be fine if we
> stick with it.
> But if not, I would then clean up sparse_add_section:
>
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -901,13 +901,12 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> int ret;
>
> ret = sparse_index_init(section_nr, nid);
> - if (ret < 0 && ret != -EEXIST)
> + if (ret < 0)
> return ret;
>
> memmap = section_activate(nid, start_pfn, nr_pages, altmap);
> if (IS_ERR(memmap))
> return PTR_ERR(memmap);
> - ret = 0;
Good catch; I folded in the cleanup.
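The effect of the sparse_index_init() change can be sketched in plain C: once an already-present root is treated as success, the caller needs only a plain `ret < 0` check. This is a simplified userspace model with hypothetical names; the real function allocates the root, can fail with -ENOMEM, and relies on mem_hotplug_lock, none of which is modeled here. The bounds check returning -EINVAL is likewise an invention of the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, much-simplified stand-in for the mem_section root table. */
#define NR_ROOTS 8
static bool root_present[NR_ROOTS];

/*
 * Idempotent init: finding the root already present is success, because
 * a follow-on sub-section hot-add may target a section whose root was
 * instantiated by an earlier add.
 */
static int index_init_sketch(unsigned long root)
{
	if (root >= NR_ROOTS)
		return -EINVAL;		/* invented for the sketch */
	if (root_present[root])
		return 0;		/* previously -EEXIST */
	root_present[root] = true;	/* stands in for allocating the root */
	return 0;
}

/* With no -EEXIST special case, the caller just propagates errors. */
static int add_section_sketch(unsigned long root)
{
	int ret = index_init_sketch(root);

	if (ret < 0)
		return ret;
	/* section_activate() etc. would follow in the real code. */
	return 0;
}
```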
>
>
> > +
> > + if (!mask)
> > + rc = -EINVAL;
> > + else if (mask & ms->usage->map_active)
>
> else if (ms->usage->map_active) should be enough?
>
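On whether testing map_active alone would do: the sketch below models the quoted -EINVAL / -EEXIST / set-bits decision for a single section. It assumes 4KB pages, 2MB sub-sections and a 64-bit map_active word (64 sub-sections per 128MB section); subsection_mask() and activate_subsections() are hypothetical helpers, not kernel functions.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed: 4KB pages, 2MB sub-sections, 64 sub-sections per section. */
#define PAGES_PER_SUBSECTION    512UL
#define SUBSECTIONS_PER_SECTION 64UL

/*
 * Bitmap of the sub-sections touched by @nr_pages pages starting
 * @first_page pages into the section; 0 if the range is empty or
 * runs past the end of the section.
 */
static uint64_t subsection_mask(unsigned long first_page, unsigned long nr_pages)
{
	unsigned long first, last;

	if (nr_pages == 0)
		return 0;
	first = first_page / PAGES_PER_SUBSECTION;
	last = (first_page + nr_pages - 1) / PAGES_PER_SUBSECTION;
	if (last >= SUBSECTIONS_PER_SECTION)
		return 0;
	/* Set bits first..last inclusive. */
	return (UINT64_MAX >> (63 - last)) & (UINT64_MAX << first);
}

/* Same decision as the quoted hunk, applied to one section's map. */
static int activate_subsections(uint64_t *map_active, uint64_t mask)
{
	if (!mask)
		return -EINVAL;
	if (mask & *map_active)		/* only overlapping sub-sections conflict */
		return -EEXIST;
	*map_active |= mask;
	return 0;
}
```

In this model a first add of sub-sections 0-15 leaves map_active nonzero, so a bare map_active test would reject a follow-on add of the disjoint sub-sections 16-31, even though the sparse_index_init() comment expects a follow-on hot-add to reuse the section; the intersection test rejects only genuinely overlapping requests.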
> > + rc = -EEXIST;
> > + else
> > + ms->usage->map_active |= mask;
> > +
> > + if (rc) {
> > + if (usage)
> > + ms->usage = NULL;
> > + kfree(usage);
> > + return ERR_PTR(rc);
> > + }
> > +
> > + /*
> > + * The early init code does not consider partially populated
> > + * initial sections, it simply assumes that memory will never be
> > + * referenced. If we hot-add memory into such a section then we
> > + * do not need to populate the memmap and can simply reuse what
> > + * is already there.
> > + */
>
> This puzzles me a bit.
> I think we cannot have partially populated early sections, can we?
Yes. At boot, memory need not be section-aligned, and such a section has
historically been handled as an unremovable section of memory with holes.
> And how do we even come to hot-add memory into those?
>
> Could you please elaborate a bit here?
Those sections are excluded from having more memory added via
add_memory_resource(), but arch_add_memory() with sub-section support
can fill in the sub-section holes in mem_map.
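A rough model of the early-section case described above: a boot-time section already carries a memmap for the whole section, holes included, so a sub-section hot-add into it can hand back a pointer into that existing memmap instead of populating a new one. The types, the PAGES_PER_SECTION value (128MB sections with 4KB pages) and activate_memmap() are assumptions for illustration; in particular, allocating a whole-section memmap on the non-early path is a simplification of the real populate path.

```c
#include <stdbool.h>
#include <stdlib.h>

#define PAGES_PER_SECTION 32768UL	/* 128MB / 4KB, assumed */

struct page_stub { unsigned long flags; };

/* Hypothetical, simplified per-section state. */
struct section_stub {
	bool early;			/* set up by boot-time init */
	struct page_stub *memmap;	/* one entry per page in the section */
};

/*
 * Return the memmap entry for a hot-added range that starts @offset
 * pages into @ms. A partial add into an early section reuses the memmap
 * boot already allocated for the whole section, holes included.
 * Otherwise (simplification) allocate a whole-section memmap once.
 */
static struct page_stub *activate_memmap(struct section_stub *ms,
					 unsigned long offset,
					 unsigned long nr_pages)
{
	if (ms->early && nr_pages < PAGES_PER_SECTION)
		return ms->memmap + offset;

	if (!ms->memmap) {
		ms->memmap = calloc(PAGES_PER_SECTION, sizeof(*ms->memmap));
		if (!ms->memmap)
			return NULL;
	}
	return ms->memmap + offset;
}
```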
>
> > + ms = __pfn_to_section(start_pfn);
> > section_mark_present(ms);
> > - sparse_init_one_section(ms, section_nr, memmap, usage);
> > + sparse_init_one_section(ms, section_nr, memmap, ms->usage);
> >
> > -out:
> > - if (ret < 0) {
> > - kfree(usage);
> > - depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
> > - }
> > + if (ret < 0)
> > + section_deactivate(start_pfn, nr_pages, nid, altmap);
>
> Uhm, if my eyes do not trick me, ret is only used for the return value from
> sparse_index_init(), so this is not needed. Can we get rid of it?
Yes, these can go.
Apologies for the delay, and for missing these comments in the v8 posting.
Thread overview: 25+ messages
2019-05-02 5:55 [PATCH v7 00/12] mm: Sub-section memory hotplug support Dan Williams
2019-05-02 5:55 ` [PATCH v7 01/12] mm/sparsemem: Introduce struct mem_section_usage Dan Williams
2019-05-03 7:35 ` Oscar Salvador
2019-05-02 5:55 ` [PATCH v7 02/12] mm/sparsemem: Introduce common definitions for the size and mask of a section Dan Williams
2019-05-03 8:06 ` Oscar Salvador
2019-05-02 5:55 ` [PATCH v7 03/12] mm/sparsemem: Add helpers track active portions of a section at boot Dan Williams
2019-05-02 7:48 ` Oscar Salvador
2019-05-02 14:03 ` Dan Williams
2019-05-03 7:31 ` Oscar Salvador
2019-05-03 19:52 ` Pavel Tatashin
2019-05-02 5:55 ` [PATCH v7 04/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal Dan Williams
2019-05-02 5:55 ` [PATCH v7 05/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap() Dan Williams
2019-05-03 8:46 ` Oscar Salvador
2019-05-02 5:55 ` [PATCH v7 06/12] mm/hotplug: Kill is_dev_zone() usage in __remove_pages() Dan Williams
2019-05-02 11:27 ` David Hildenbrand
2019-05-03 7:37 ` Oscar Salvador
2019-05-02 5:55 ` [PATCH v7 07/12] mm: Kill is_dev_zone() helper Dan Williams
2019-05-02 5:56 ` [PATCH v7 08/12] mm/sparsemem: Prepare for sub-section ranges Dan Williams
2019-05-03 11:00 ` Oscar Salvador
2019-05-02 5:56 ` [PATCH v7 09/12] mm/sparsemem: Support sub-section hotplug Dan Williams
2019-05-03 12:56 ` Oscar Salvador
2019-06-04 4:17 ` Dan Williams [this message]
2019-05-02 5:56 ` [PATCH v7 10/12] mm/devm_memremap_pages: Enable sub-section remap Dan Williams
2019-05-02 5:56 ` [PATCH v7 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields Dan Williams
2019-05-02 5:56 ` [PATCH v7 12/12] libnvdimm/pfn: Stop padding pmem namespaces to section alignment Dan Williams