linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API
@ 2021-06-02 11:10 Max Gurtovoy
  2021-06-02 11:10 ` [PATCH 1/3] mm,memory_hotplug: export mhp min alignment Max Gurtovoy
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Max Gurtovoy @ 2021-06-02 11:10 UTC (permalink / raw)
  To: linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, david, oren, linux-kernel, akpm, Max Gurtovoy

Hi all,
In hotplugged memory (from check_pfn_span function):
"
Disallow all operations smaller than a sub-section and only
allow operations smaller than a section for
SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
enforces a larger memory_block_size_bytes() granularity for
memory that will be marked online, so this check should only
fire for direct arch_{add,remove}_memory() users outside of
add_memory_resource()
"

This restriction will disqualify, for example, large NVMe CMBs that might have
non power of 2 number of pages (e.g. 32767 pages of 4KB). For these
devices, the CMB size will be rounded down from 0x7fff000 to 0x7e00000
but it's better than having un-mapped CMB.

If we all agree on the approach, this RFC can assist as-is to these NVMe
devices and other P2PMEM devices in the future and can considered for
the next merge window.

Max Gurtovoy (3):
  mm,memory_hotplug: export mhp min alignment
  PCI/P2PMEM: introduce pci_p2pdma_align_size API
  nvme-pci: align CMB size according to P2PMEM alignment

 drivers/nvme/host/pci.c        |  7 +++++++
 drivers/pci/p2pdma.c           | 23 +++++++++++++++++++++++
 include/linux/memory_hotplug.h |  5 +++++
 include/linux/pci-p2pdma.h     |  5 +++++
 mm/memory_hotplug.c            | 33 +++++++++++++++++++--------------
 5 files changed, 59 insertions(+), 14 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] mm,memory_hotplug: export mhp min alignment
  2021-06-02 11:10 [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
@ 2021-06-02 11:10 ` Max Gurtovoy
  2021-06-02 12:14   ` David Hildenbrand
  2021-06-02 11:10 ` [PATCH 2/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2021-06-02 11:10 UTC (permalink / raw)
  To: linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, david, oren, linux-kernel, akpm, Max Gurtovoy

Hotplugged memory has alignmet restrictions. E.g, it disallows all
operations smaller than a sub-section and only allow operations smaller
than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
for mhp users.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 include/linux/memory_hotplug.h |  5 +++++
 mm/memory_hotplug.c            | 33 +++++++++++++++++++--------------
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 28f32fd00fe9..c55a9049b11e 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -76,6 +76,7 @@ struct mhp_params {
 
 bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
 struct range mhp_get_pluggable_range(bool need_mapping);
+unsigned long mhp_get_min_align(void);
 
 /*
  * Zone resizing functions
@@ -248,6 +249,10 @@ void mem_hotplug_done(void);
 	___page;				\
  })
 
+static inline unsigned long mhp_get_min_align(void)
+{
+	return 0;
+}
 static inline unsigned zone_span_seqbegin(struct zone *zone)
 {
 	return 0;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9e86e9ee0a10..161bb6704a9b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -270,24 +270,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 }
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
+/*
+ * Disallow all operations smaller than a sub-section and only
+ * allow operations smaller than a section for
+ * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
+ * enforces a larger memory_block_size_bytes() granularity for
+ * memory that will be marked online, so this check should only
+ * fire for direct arch_{add,remove}_memory() users outside of
+ * add_memory_resource().
+ */
+unsigned long mhp_get_min_align(void)
+{
+	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+		return PAGES_PER_SUBSECTION;
+	return PAGES_PER_SECTION;
+}
+EXPORT_SYMBOL_GPL(mhp_get_min_align);
+
+
 static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 		const char *reason)
 {
-	/*
-	 * Disallow all operations smaller than a sub-section and only
-	 * allow operations smaller than a section for
-	 * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
-	 * enforces a larger memory_block_size_bytes() granularity for
-	 * memory that will be marked online, so this check should only
-	 * fire for direct arch_{add,remove}_memory() users outside of
-	 * add_memory_resource().
-	 */
-	unsigned long min_align;
+	unsigned long min_align = mhp_get_min_align();
 
-	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
-		min_align = PAGES_PER_SUBSECTION;
-	else
-		min_align = PAGES_PER_SECTION;
 	if (!IS_ALIGNED(pfn, min_align) || !IS_ALIGNED(nr_pages, min_align)) {
 		WARN(1, "Misaligned __%s_pages min_align: %#lx start: %#lx end: %#lx\n",
 		     reason, min_align, pfn, pfn + nr_pages - 1);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API
  2021-06-02 11:10 [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
  2021-06-02 11:10 ` [PATCH 1/3] mm,memory_hotplug: export mhp min alignment Max Gurtovoy
@ 2021-06-02 11:10 ` Max Gurtovoy
  2021-06-02 11:10 ` [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment Max Gurtovoy
  2021-06-02 12:08 ` [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API David Hildenbrand
  3 siblings, 0 replies; 10+ messages in thread
From: Max Gurtovoy @ 2021-06-02 11:10 UTC (permalink / raw)
  To: linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, david, oren, linux-kernel, akpm, Max Gurtovoy

The mhp layer has alignment restrictions that may cause a failure in
adding a large P2PMEM that has non-aligned number of pages. The new API
pci_p2pdma_align_size will align the P2PMEM size to be inline with the
mhp restrictions.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/pci/p2pdma.c       | 23 +++++++++++++++++++++++
 include/linux/pci-p2pdma.h |  5 +++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 196382630363..d232bc4ada2b 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -220,6 +220,29 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
 
+/**
+ * pci_p2pdma_align_size - align p2p memory size
+ * @size: size of the initial memory to align before adding.
+ *
+ * The size will be aligned according to mph alignment rules. This function
+ * should be called before adding new p2pdma resource since non-aligned memory
+ * will not be added.
+ */
+size_t pci_p2pdma_align_size(size_t size)
+{
+	unsigned int min_align, nr_pages;
+
+	min_align = mhp_get_min_align();
+	nr_pages = size >> PAGE_SHIFT;
+	if (!IS_ALIGNED(nr_pages, min_align)) {
+		nr_pages = ALIGN_DOWN(nr_pages, min_align);
+		size = nr_pages << PAGE_SHIFT;
+	}
+
+	return size;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_align_size);
+
 /*
  * Note this function returns the parent PCI device with a
  * reference taken. It is the caller's responsibility to drop
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..33a8fce52bec 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -38,6 +38,7 @@ int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
 			    bool *use_p2pdma);
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
 			       bool use_p2pdma);
+size_t pci_p2pdma_align_size(size_t size);
 #else /* CONFIG_PCI_P2PDMA */
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
@@ -105,6 +106,10 @@ static inline ssize_t pci_p2pdma_enable_show(char *page,
 {
 	return sprintf(page, "none\n");
 }
+static inline size_t pci_p2pdma_align_size(size_t size)
+{
+	return 0;
+}
 #endif /* CONFIG_PCI_P2PDMA */
 
 
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment
  2021-06-02 11:10 [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
  2021-06-02 11:10 ` [PATCH 1/3] mm,memory_hotplug: export mhp min alignment Max Gurtovoy
  2021-06-02 11:10 ` [PATCH 2/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
@ 2021-06-02 11:10 ` Max Gurtovoy
  2021-06-02 14:39   ` Keith Busch
  2021-06-02 12:08 ` [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API David Hildenbrand
  3 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2021-06-02 11:10 UTC (permalink / raw)
  To: linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, david, oren, linux-kernel, akpm, Max Gurtovoy

P2PMEM is using mhp framework to connect to the memory subsystem. In
case the CMB size is not compatible to mhp alignment, the CMB mapping
will fail. Use pci_p2pdma_align_size to align CMB size in order to
successfully map non-aligned original CMB.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/host/pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a29b170701fc..1197263b4cd0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1817,6 +1817,7 @@ static void nvme_map_cmb(struct nvme_dev *dev)
 	u64 size, offset;
 	resource_size_t bar_size;
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
+	unsigned long nr_pages, min_align;
 	int bar;
 
 	if (dev->cmb_size)
@@ -1856,6 +1857,12 @@ static void nvme_map_cmb(struct nvme_dev *dev)
 	if (size > bar_size - offset)
 		size = bar_size - offset;
 
+	size = pci_p2pdma_align_size(size);
+	if (!size) {
+		dev_warn(dev->ctrl.device, "CMB size is 0 after alignment\n");
+		return;
+	}
+
 	if (pci_p2pdma_add_resource(pdev, bar, size, offset)) {
 		dev_warn(dev->ctrl.device,
 			 "failed to register the CMB\n");
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API
  2021-06-02 11:10 [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
                   ` (2 preceding siblings ...)
  2021-06-02 11:10 ` [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment Max Gurtovoy
@ 2021-06-02 12:08 ` David Hildenbrand
  3 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-06-02 12:08 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, oren, linux-kernel, akpm

On 02.06.21 13:10, Max Gurtovoy wrote:
> Hi all,
> In hotplugged memory (from check_pfn_span function):
> "
> Disallow all operations smaller than a sub-section and only
> allow operations smaller than a section for
> SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
> enforces a larger memory_block_size_bytes() granularity for
> memory that will be marked online, so this check should only
> fire for direct arch_{add,remove}_memory() users outside of
> add_memory_resource()
> "
> 
> This restriction will disqualify, for example, large NVMe CMBs that might have
> non power of 2 number of pages (e.g. 32767 pages of 4KB). For these
> devices, the CMB size will be rounded down from 0x7fff000 to 0x7e00000
> but it's better than having un-mapped CMB.

Just some high-level questions:

A CMB is just a PCI BAR used for communicating with the device, to be 
mapped into physical address space, right? I assume the relevant hotplug 
code is:

drivers/pci/p2pdma.c:   addr = devm_memremap_pages(&pdev->dev, pgmap);

correct?


Having a BAR span such weird sizes will most probably never be fully 
supported. But if sub-sections work for you, great.



-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm,memory_hotplug: export mhp min alignment
  2021-06-02 11:10 ` [PATCH 1/3] mm,memory_hotplug: export mhp min alignment Max Gurtovoy
@ 2021-06-02 12:14   ` David Hildenbrand
  2021-06-03 10:52     ` Max Gurtovoy
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-06-02 12:14 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, oren, linux-kernel, akpm

On 02.06.21 13:10, Max Gurtovoy wrote:
> Hotplugged memory has alignmet restrictions. E.g, it disallows all
> operations smaller than a sub-section and only allow operations smaller
> than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
> for mhp users.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>   include/linux/memory_hotplug.h |  5 +++++
>   mm/memory_hotplug.c            | 33 +++++++++++++++++++--------------
>   2 files changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 28f32fd00fe9..c55a9049b11e 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -76,6 +76,7 @@ struct mhp_params {
>   
>   bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
>   struct range mhp_get_pluggable_range(bool need_mapping);
> +unsigned long mhp_get_min_align(void);
>   
>   /*
>    * Zone resizing functions
> @@ -248,6 +249,10 @@ void mem_hotplug_done(void);
>   	___page;				\
>    })
>   
> +static inline unsigned long mhp_get_min_align(void)
> +{
> +	return 0;
> +}
>   static inline unsigned zone_span_seqbegin(struct zone *zone)
>   {
>   	return 0;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9e86e9ee0a10..161bb6704a9b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -270,24 +270,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>   }
>   #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>   
> +/*
> + * Disallow all operations smaller than a sub-section and only
> + * allow operations smaller than a section for
> + * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
> + * enforces a larger memory_block_size_bytes() granularity for
> + * memory that will be marked online, so this check should only
> + * fire for direct arch_{add,remove}_memory() users outside of
> + * add_memory_resource().
> + */
> +unsigned long mhp_get_min_align(void)
> +{
> +	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
> +		return PAGES_PER_SUBSECTION;
> +	return PAGES_PER_SECTION;
> +}
> +EXPORT_SYMBOL_GPL(mhp_get_min_align);

We have to main interfaces to "hotplug" memory:

a) add_memory() and friends for System RAM, which have memory block 
alignment requirements.

b) memremap_pages(), which has the alignemnt requirements you mention here.

I feel like what you need would better be exposed in mm/memremap.c, for 
example, via "memremap_min_alignment" so it matches the "memremap_pages" 
semantics.

And then, memremap_pages() is only available with CONFIG_ZONE_DEVICE, 
which depends on SPARSEMEM_VMEMMAP. So you'll always have 
PAGES_PER_SUBSECTION.

I can already spot "memremap_compat_align", maybe you can reuse that or 
handle it accordingly in there?


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment
  2021-06-02 11:10 ` [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment Max Gurtovoy
@ 2021-06-02 14:39   ` Keith Busch
  2021-06-02 14:51     ` Max Gurtovoy
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2021-06-02 14:39 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-nvme, dan.j.williams, logang, linux-mm, hch, sagi, david,
	oren, linux-kernel, akpm

On Wed, Jun 02, 2021 at 02:10:55PM +0300, Max Gurtovoy wrote:
> P2PMEM is using mhp framework to connect to the memory subsystem. In
> case the CMB size is not compatible to mhp alignment, the CMB mapping
> will fail. Use pci_p2pdma_align_size to align CMB size in order to
> successfully map non-aligned original CMB.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  drivers/nvme/host/pci.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a29b170701fc..1197263b4cd0 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1817,6 +1817,7 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>  	u64 size, offset;
>  	resource_size_t bar_size;
>  	struct pci_dev *pdev = to_pci_dev(dev->dev);
> +	unsigned long nr_pages, min_align;

These new variables don't appear to be used anywhere.

>  	int bar;
>  
>  	if (dev->cmb_size)
> @@ -1856,6 +1857,12 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>  	if (size > bar_size - offset)
>  		size = bar_size - offset;
>  
> +	size = pci_p2pdma_align_size(size);
> +	if (!size) {
> +		dev_warn(dev->ctrl.device, "CMB size is 0 after alignment\n");
> +		return;
> +	}
> +
>  	if (pci_p2pdma_add_resource(pdev, bar, size, offset)) {
>  		dev_warn(dev->ctrl.device,
>  			 "failed to register the CMB\n");
> -- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment
  2021-06-02 14:39   ` Keith Busch
@ 2021-06-02 14:51     ` Max Gurtovoy
  0 siblings, 0 replies; 10+ messages in thread
From: Max Gurtovoy @ 2021-06-02 14:51 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, dan.j.williams, logang, linux-mm, hch, sagi, david,
	oren, linux-kernel, akpm


On 6/2/2021 5:39 PM, Keith Busch wrote:
> On Wed, Jun 02, 2021 at 02:10:55PM +0300, Max Gurtovoy wrote:
>> P2PMEM is using mhp framework to connect to the memory subsystem. In
>> case the CMB size is not compatible to mhp alignment, the CMB mapping
>> will fail. Use pci_p2pdma_align_size to align CMB size in order to
>> successfully map non-aligned original CMB.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   drivers/nvme/host/pci.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index a29b170701fc..1197263b4cd0 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -1817,6 +1817,7 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>>   	u64 size, offset;
>>   	resource_size_t bar_size;
>>   	struct pci_dev *pdev = to_pci_dev(dev->dev);
>> +	unsigned long nr_pages, min_align;
> These new variables don't appear to be used anywhere.

oh, I have some leftovers..

Sorry.


>
>>   	int bar;
>>   
>>   	if (dev->cmb_size)
>> @@ -1856,6 +1857,12 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>>   	if (size > bar_size - offset)
>>   		size = bar_size - offset;
>>   
>> +	size = pci_p2pdma_align_size(size);
>> +	if (!size) {
>> +		dev_warn(dev->ctrl.device, "CMB size is 0 after alignment\n");
>> +		return;
>> +	}
>> +
>>   	if (pci_p2pdma_add_resource(pdev, bar, size, offset)) {
>>   		dev_warn(dev->ctrl.device,
>>   			 "failed to register the CMB\n");
>> -- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm,memory_hotplug: export mhp min alignment
  2021-06-02 12:14   ` David Hildenbrand
@ 2021-06-03 10:52     ` Max Gurtovoy
  2021-06-21 16:11       ` Max Gurtovoy
  0 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2021-06-03 10:52 UTC (permalink / raw)
  To: David Hildenbrand, linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, oren, linux-kernel, akpm


On 6/2/2021 3:14 PM, David Hildenbrand wrote:
> On 02.06.21 13:10, Max Gurtovoy wrote:
>> Hotplugged memory has alignmet restrictions. E.g, it disallows all
>> operations smaller than a sub-section and only allow operations smaller
>> than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
>> for mhp users.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   include/linux/memory_hotplug.h |  5 +++++
>>   mm/memory_hotplug.c            | 33 +++++++++++++++++++--------------
>>   2 files changed, 24 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/memory_hotplug.h 
>> b/include/linux/memory_hotplug.h
>> index 28f32fd00fe9..c55a9049b11e 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -76,6 +76,7 @@ struct mhp_params {
>>     bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
>>   struct range mhp_get_pluggable_range(bool need_mapping);
>> +unsigned long mhp_get_min_align(void);
>>     /*
>>    * Zone resizing functions
>> @@ -248,6 +249,10 @@ void mem_hotplug_done(void);
>>       ___page;                \
>>    })
>>   +static inline unsigned long mhp_get_min_align(void)
>> +{
>> +    return 0;
>> +}
>>   static inline unsigned zone_span_seqbegin(struct zone *zone)
>>   {
>>       return 0;
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 9e86e9ee0a10..161bb6704a9b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -270,24 +270,29 @@ void __init 
>> register_page_bootmem_info_node(struct pglist_data *pgdat)
>>   }
>>   #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>   +/*
>> + * Disallow all operations smaller than a sub-section and only
>> + * allow operations smaller than a section for
>> + * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
>> + * enforces a larger memory_block_size_bytes() granularity for
>> + * memory that will be marked online, so this check should only
>> + * fire for direct arch_{add,remove}_memory() users outside of
>> + * add_memory_resource().
>> + */
>> +unsigned long mhp_get_min_align(void)
>> +{
>> +    if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>> +        return PAGES_PER_SUBSECTION;
>> +    return PAGES_PER_SECTION;
>> +}
>> +EXPORT_SYMBOL_GPL(mhp_get_min_align);
>
> We have to main interfaces to "hotplug" memory:
>
> a) add_memory() and friends for System RAM, which have memory block 
> alignment requirements.
>
> b) memremap_pages(), which has the alignemnt requirements you mention 
> here.
>
> I feel like what you need would better be exposed in mm/memremap.c, 
> for example, via "memremap_min_alignment" so it matches the 
> "memremap_pages" semantics.
>
> And then, memremap_pages() is only available with CONFIG_ZONE_DEVICE, 
> which depends on SPARSEMEM_VMEMMAP. So you'll always have 
> PAGES_PER_SUBSECTION.
>
> I can already spot "memremap_compat_align", maybe you can reuse that 
> or handle it accordingly in there?

Yes I think that since subsection is aligned to PAGE_SIZE I can do:

size_t pci_p2pdma_align_size(size_t size)
{
         unsigned long min_align;

         min_align = memremap_compat_align();
         if (!IS_ALIGNED(size, min_align))
                 return ALIGN_DOWN(size, min_align);

         return size;
}


thoughts ?

>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm,memory_hotplug: export mhp min alignment
  2021-06-03 10:52     ` Max Gurtovoy
@ 2021-06-21 16:11       ` Max Gurtovoy
  0 siblings, 0 replies; 10+ messages in thread
From: Max Gurtovoy @ 2021-06-21 16:11 UTC (permalink / raw)
  To: David Hildenbrand, linux-nvme, dan.j.williams, logang, linux-mm, hch
  Cc: sagi, oren, linux-kernel, akpm

hi David,

do we have a conclusion for this series ?

Is the below suggestion accepted by the maintainers ?

I would like to send a new series before closing 5.14 merge window.

On 6/3/2021 1:52 PM, Max Gurtovoy wrote:
>
> On 6/2/2021 3:14 PM, David Hildenbrand wrote:
>> On 02.06.21 13:10, Max Gurtovoy wrote:
>>> Hotplugged memory has alignmet restrictions. E.g, it disallows all
>>> operations smaller than a sub-section and only allow operations smaller
>>> than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
>>> for mhp users.
>>>
>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>> ---
>>>   include/linux/memory_hotplug.h |  5 +++++
>>>   mm/memory_hotplug.c            | 33 +++++++++++++++++++--------------
>>>   2 files changed, 24 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/linux/memory_hotplug.h 
>>> b/include/linux/memory_hotplug.h
>>> index 28f32fd00fe9..c55a9049b11e 100644
>>> --- a/include/linux/memory_hotplug.h
>>> +++ b/include/linux/memory_hotplug.h
>>> @@ -76,6 +76,7 @@ struct mhp_params {
>>>     bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
>>>   struct range mhp_get_pluggable_range(bool need_mapping);
>>> +unsigned long mhp_get_min_align(void);
>>>     /*
>>>    * Zone resizing functions
>>> @@ -248,6 +249,10 @@ void mem_hotplug_done(void);
>>>       ___page;                \
>>>    })
>>>   +static inline unsigned long mhp_get_min_align(void)
>>> +{
>>> +    return 0;
>>> +}
>>>   static inline unsigned zone_span_seqbegin(struct zone *zone)
>>>   {
>>>       return 0;
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 9e86e9ee0a10..161bb6704a9b 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -270,24 +270,29 @@ void __init 
>>> register_page_bootmem_info_node(struct pglist_data *pgdat)
>>>   }
>>>   #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>>   +/*
>>> + * Disallow all operations smaller than a sub-section and only
>>> + * allow operations smaller than a section for
>>> + * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
>>> + * enforces a larger memory_block_size_bytes() granularity for
>>> + * memory that will be marked online, so this check should only
>>> + * fire for direct arch_{add,remove}_memory() users outside of
>>> + * add_memory_resource().
>>> + */
>>> +unsigned long mhp_get_min_align(void)
>>> +{
>>> +    if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>>> +        return PAGES_PER_SUBSECTION;
>>> +    return PAGES_PER_SECTION;
>>> +}
>>> +EXPORT_SYMBOL_GPL(mhp_get_min_align);
>>
>> We have to main interfaces to "hotplug" memory:
>>
>> a) add_memory() and friends for System RAM, which have memory block 
>> alignment requirements.
>>
>> b) memremap_pages(), which has the alignemnt requirements you mention 
>> here.
>>
>> I feel like what you need would better be exposed in mm/memremap.c, 
>> for example, via "memremap_min_alignment" so it matches the 
>> "memremap_pages" semantics.
>>
>> And then, memremap_pages() is only available with CONFIG_ZONE_DEVICE, 
>> which depends on SPARSEMEM_VMEMMAP. So you'll always have 
>> PAGES_PER_SUBSECTION.
>>
>> I can already spot "memremap_compat_align", maybe you can reuse that 
>> or handle it accordingly in there?
>
> Yes I think that since subsection is aligned to PAGE_SIZE I can do:
>
> size_t pci_p2pdma_align_size(size_t size)
> {
>         unsigned long min_align;
>
>         min_align = memremap_compat_align();
>         if (!IS_ALIGNED(size, min_align))
>                 return ALIGN_DOWN(size, min_align);
>
>         return size;
> }
>
>
> thoughts ?
>
>>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-06-21 16:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 11:10 [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
2021-06-02 11:10 ` [PATCH 1/3] mm,memory_hotplug: export mhp min alignment Max Gurtovoy
2021-06-02 12:14   ` David Hildenbrand
2021-06-03 10:52     ` Max Gurtovoy
2021-06-21 16:11       ` Max Gurtovoy
2021-06-02 11:10 ` [PATCH 2/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API Max Gurtovoy
2021-06-02 11:10 ` [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment Max Gurtovoy
2021-06-02 14:39   ` Keith Busch
2021-06-02 14:51     ` Max Gurtovoy
2021-06-02 12:08 ` [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).