From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Michal Hocko <mhocko@suse.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Qian Cai <cai@lca.pw>,
	Vlastimil Babka <vbabka@suse.cz>,
	Oscar Salvador <osalvador@suse.de>
Subject: [PATCH v10 03/13] mm/sparsemem: Add helpers track active portions of a section at boot
Date: Tue, 18 Jun 2019 22:51:48 -0700
Message-ID: <156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <156092349300.979959.17603710711957735135.stgit@dwillia2-desk3.amr.corp.intel.com>

Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
sub-section active bitmask, each bit representing a PMD_SIZE span of the
architecture's memory hotplug section size.

The implication of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and either determine that the
section is an "early section", or read the sub-section active ranges
from the bitmask. The expectation is that the bitmask (subsection_map)
fits in the same cacheline as the valid_section() / early_section()
data, so the incremental performance overhead to pfn_valid() should be
negligible.

The rationale for using early_section() to short-circuit the
subsection_map check is that there are legacy code paths that use
pfn_valid() at section granularity before validating the pfn against
pgdat data. So, the early_section() check allows those traditional
assumptions to persist while also permitting subsection_map to tell the
truth for purposes of populating the unused portions of early sections
with PMEM and other ZONE_DEVICE mappings.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mmzone.h |   33 ++++++++++++++++++++++++++++++++-
 mm/page_alloc.c        |   10 ++++++++--
 mm/sparse.c            |   35 +++++++++++++++++++++++++++++++++++
 3 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d081c9a1d25d..c4e8843e283c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1179,6 +1179,8 @@ struct mem_section_usage {
 	unsigned long pageblock_flags[0];
 };
 
+void subsection_map_init(unsigned long pfn, unsigned long nr_pages);
+
 struct page;
 struct page_ext;
 struct mem_section {
@@ -1322,12 +1324,40 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 
 extern int __highest_present_section_nr;
 
+static inline int subsection_map_index(unsigned long pfn)
+{
+	return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION;
+}
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	int idx = subsection_map_index(pfn);
+
+	return test_bit(idx, ms->usage->subsection_map);
+}
+#else
+static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
+{
+	return 1;
+}
+#endif
+
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
+	struct mem_section *ms;
+
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
-	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+	ms = __nr_to_section(pfn_to_section_nr(pfn));
+	if (!valid_section(ms))
+		return 0;
+	/*
+	 * Traditionally early sections always returned pfn_valid() for
+	 * the entire section-sized span.
+	 */
+	return early_section(ms) || pfn_section_valid(ms, pfn);
 }
 #endif
 
@@ -1359,6 +1389,7 @@ void sparse_init(void);
 #define sparse_init()	do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #define pfn_present pfn_valid
+#define subsection_map_init(_pfn, _nr_pages) do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8cc091e87200..8e7215fb6976 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7306,12 +7306,18 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 			(u64)zone_movable_pfn[i] << PAGE_SHIFT);
 	}
 
-	/* Print out the early node map */
+	/*
+	 * Print out the early node map, and initialize the
+	 * subsection-map relative to active online memory ranges to
+	 * enable future "sub-section" extensions of the memory map.
+	 */
 	pr_info("Early memory node ranges\n");
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
 		pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid,
 			(u64)start_pfn << PAGE_SHIFT,
 			((u64)end_pfn << PAGE_SHIFT) - 1);
+		subsection_map_init(start_pfn, end_pfn - start_pfn);
+	}
 
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
diff --git a/mm/sparse.c b/mm/sparse.c
index 2031a0694f35..e9fec3c2f7ec 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -210,6 +210,41 @@ static inline unsigned long first_present_section_nr(void)
 	return next_present_section_nr(-1);
 }
 
+void subsection_mask_set(unsigned long *map, unsigned long pfn,
+		unsigned long nr_pages)
+{
+	int idx = subsection_map_index(pfn);
+	int end = subsection_map_index(pfn + nr_pages - 1);
+
+	bitmap_set(map, idx, end - idx + 1);
+}
+
+void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
+{
+	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
+	int i, start_sec = pfn_to_section_nr(pfn);
+
+	if (!nr_pages)
+		return;
+
+	for (i = start_sec; i <= end_sec; i++) {
+		struct mem_section *ms;
+		unsigned long pfns;
+
+		pfns = min(nr_pages, PAGES_PER_SECTION
+				- (pfn & ~PAGE_SECTION_MASK));
+		ms = __nr_to_section(i);
+		subsection_mask_set(ms->usage->subsection_map, pfn, pfns);
+
+		pr_debug("%s: sec: %d pfns: %ld set(%d, %d)\n", __func__, i,
+				pfns, subsection_map_index(pfn),
+				subsection_map_index(pfn + pfns - 1));
+
+		pfn += pfns;
+		nr_pages -= pfns;
+	}
+}
+
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
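
For readers who want to trace the logic outside the kernel, the following is a minimal userspace sketch of the scheme the patch describes. It is not part of the patch. The constants assume an x86_64-like layout (128 MiB sections, 4 KiB pages, 2 MiB subsections, so 64 subsection bits per section), and `struct toy_section`, `sections[]`, `NR_SECTIONS`, `toy_subsection_map_init()` and `toy_pfn_valid()` are hypothetical stand-ins for the kernel's mem_section machinery. Plain bools replace valid_section() and early_section(), and a single 64-bit word replaces the subsection_map bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel takes these from arch headers. */
#define PAGES_PER_SECTION    32768UL	/* 128 MiB section / 4 KiB pages */
#define PAGES_PER_SUBSECTION 512UL	/* 2 MiB subsection / 4 KiB pages */
#define PAGE_SECTION_MASK    (~(PAGES_PER_SECTION - 1))
#define NR_SECTIONS          4		/* hypothetical toy limit */

/* Hypothetical stand-in for struct mem_section and its usage data. */
struct toy_section {
	bool valid;			/* models valid_section() */
	bool early;			/* models early_section() */
	uint64_t subsection_map;	/* one bit per subsection */
};

static struct toy_section sections[NR_SECTIONS];

/* Same arithmetic as the patch: offset within the section, in subsections. */
static int subsection_map_index(unsigned long pfn)
{
	return (pfn & ~PAGE_SECTION_MASK) / PAGES_PER_SUBSECTION;
}

/* Walk [pfn, pfn + nr_pages) one section at a time and set the covered bits. */
static void toy_subsection_map_init(unsigned long pfn, unsigned long nr_pages)
{
	while (nr_pages) {
		unsigned long sec = pfn / PAGES_PER_SECTION;
		unsigned long left = PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK);
		unsigned long pfns = nr_pages < left ? nr_pages : left;
		int idx, end, bit;

		if (sec >= NR_SECTIONS)
			break;		/* outside the toy table */

		idx = subsection_map_index(pfn);
		end = subsection_map_index(pfn + pfns - 1);
		for (bit = idx; bit <= end; bit++)
			sections[sec].subsection_map |= UINT64_C(1) << bit;
		sections[sec].valid = true;

		pfn += pfns;
		nr_pages -= pfns;
	}
}

/* Section check first, then the early shortcut, then the per-subsection bit. */
static bool toy_pfn_valid(unsigned long pfn)
{
	unsigned long sec = pfn / PAGES_PER_SECTION;

	if (sec >= NR_SECTIONS || !sections[sec].valid)
		return false;
	return sections[sec].early ||
	       ((sections[sec].subsection_map >> subsection_map_index(pfn)) & 1);
}
```

Under these assumptions, calling `toy_subsection_map_init(1024, 1024)` sets bits 2 and 3 of section 0, so `toy_pfn_valid(1024)` holds while `toy_pfn_valid(0)` does not. Setting `sections[0].early = true` afterwards makes every pfn in that section valid again, which is the section-granularity behaviour the commit message says legacy callers depend on.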