* [PATCH v4 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA
From: Barry Song @ 2020-07-23 13:13 UTC (permalink / raw)
  To: hch, m.szyprowski, robin.murphy, will, ganapatrao.kulkarni,
	catalin.marinas
  Cc: linux-kernel, linuxarm, huangdaode, iommu, prime.zeng, linux-arm-kernel

Ganapatrao Kulkarni has put some effort into making arm-smmu-v3 use local
memory for its command queues[1]. I did a similar job in the patch
"iommu/arm-smmu-v3: allocate the memory of queues in local numa node"
[2] without realizing Ganapatrao had already done it.

It seems much better, though, to make dma_alloc_coherent() inherently
NUMA-aware on NUMA-capable systems.

Right now, the SMMU driver uses dma_alloc_coherent() to get memory for its
queues and tables. Typically, on an ARM64 server, there is a single default
CMA located on node0, which can be far away from node2, node3 etc.
Placing the queues and tables remotely increases ARM SMMU latency
significantly. For example, with the SMMU on node2 and the default global
CMA on node0, after sending a CMD_SYNC in an empty command queue, we have
to wait more than 550ns for the command to complete. If the memory is
local, we only need to wait 240ns.

With per-numa CMA, the SMMU gets memory from its local numa node for
command queues and page tables, which shrinks dma_unmap latency considerably.

Meanwhile, when iommu.passthrough is on, device drivers that call
dma_alloc_coherent() also get local memory and avoid cross-node traffic.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.html
[2] https://www.spinics.net/lists/iommu/msg44767.html
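
As an illustration, a driver running on (say) node 2 keeps using the
normal API and gets local memory transparently; the device below is
hypothetical:

	/* a device whose dev_to_node() returns 2 */
	void *vaddr;
	dma_addr_t dma;

	vaddr = dma_alloc_coherent(&pdev->dev, SZ_64K, &dma, GFP_KERNEL);
	if (vaddr) {
		/* with per-numa CMA, and without GFP_DMA/GFP_DMA32, the
		 * backing memory comes from node 2's CMA area
		 */
		dma_free_coherent(&pdev->dev, SZ_64K, vaddr, dma);
	}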

-v4:
 * rebase on top of Christoph Hellwig's patch:
 [PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous
 https://lore.kernel.org/linux-iommu/20200723120133.94105-1-hch@lst.de/
 * cleanup according to Christoph's comment
 * patch 2/2 rebased on top of linux-next to avoid arch/arm64 conflicts
 * reserve cma by checking N_MEMORY rather than N_ONLINE

-v3:
  * move to page_to_nid() when freeing CMA, per Robin's comment; note
  this will only work after applying my patch below:
  "mm/cma.c: use exact_nid true to fix possible per-numa cma leak"
  https://marc.info/?l=linux-mm&m=159333034726647&w=2

  * handle the count <= 1 case more properly, per Robin's comment;

  * add a pernuma_cma parameter to support setting the per-numa CMA
  size dynamically (a usage example appears after this item);
  ideally we could leverage CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE and the
  "cma=" kernel parameter and avoid a separate parameter for per-numa
  CMA. Practically, that is too complicated, considering the problems
  below:
  (1) if we reuse the default CMA size for per-numa CMA, we have to
  avoid creating two CMAs of the same size on node0, since the default
  CMA is probably on node0.
  (2) the default CMA can honour the address limitations of old devices,
  while per-numa CMA doesn't support GFP_DMA and GFP_DMA32; all
  allocations with limitation flags fall back to the default area.
  (3) CMA_SIZE_PERCENTAGE is hard to apply per-numa: it is hard to
  decide whether the percentage should apply to the whole memory size
  or only to the memory size of a specific numa node.
  (4) the default CMA size also involves CMA_SIZE_SEL_MIN and
  CMA_SIZE_SEL_MAX, which makes things even more complicated for
  per-numa CMA.

  I haven't figured out a good way to leverage the default CMA size for
  per-numa CMA; a separate parameter for per-numa makes life easier.
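
  A usage example, with illustrative values:
    pernuma_cma=16M    reserves a 16MiB CMA area on each node with memory
    pernuma_cma=0      disables per-numa CMA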

  * move dma_pernuma_cma_reserve() after hugetlb_cma_reserve() so that
  the comment before hugetlb_cma_reserve() can be reused, per Robin's
  comment

-v2: 
  * fix some issues reported by the kernel test robot
  * fall back to the default CMA when allocation from the per-numa CMA fails
  * free memory properly

Barry Song (2):
  dma-direct: provide the ability to reserve per-numa CMA
  arm64: mm: reserve per-numa CMA to localize coherent dma buffers

 .../admin-guide/kernel-parameters.txt         |  9 ++
 arch/arm64/mm/init.c                          |  2 +
 include/linux/dma-contiguous.h                |  4 +
 kernel/dma/Kconfig                            | 10 ++
 kernel/dma/contiguous.c                       | 94 +++++++++++++++++--
 5 files changed, 109 insertions(+), 10 deletions(-)

-- 
2.27.0



* [PATCH v4 1/2] dma-direct: provide the ability to reserve per-numa CMA
From: Barry Song @ 2020-07-23 13:13 UTC (permalink / raw)
  To: hch, m.szyprowski, robin.murphy, will, ganapatrao.kulkarni,
	catalin.marinas
  Cc: Steve Capper, linux-kernel, linuxarm, huangdaode, iommu,
	prime.zeng, Andrew Morton, Mike Rapoport, linux-arm-kernel

Right now, drivers like ARM SMMU use dma_alloc_coherent() to get coherent
DMA buffers for their command queues and page tables. As there is only one
default CMA in the whole system, SMMUs on nodes other than node0 get
remote memory. This leads to significant extra latency.

This patch provides per-numa CMA so that drivers like the SMMU can get
local memory. Tests show that localizing CMA decreases dma_unmap latency
considerably. For instance, before this patch, the SMMU on node2 had to
wait more than 560ns for a CMD_SYNC to complete in an empty command queue;
with this patch, it only needs 240ns.

A positive side effect of this patch is a further performance improvement
for users who care more about performance than DMA security and use
iommu.passthrough=1 to skip the IOMMU. With local CMA, all drivers get
local coherent DMA buffers.

Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
-v4:
 * rebase on top of Christoph Hellwig's patch:
 [PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous
 https://lore.kernel.org/linux-iommu/20200723120133.94105-1-hch@lst.de/
 * cleanup according to Christoph's comment
 * reserve cma by checking N_MEMORY rather than N_ONLINE


 .../admin-guide/kernel-parameters.txt         |  9 ++
 include/linux/dma-contiguous.h                |  4 +
 kernel/dma/Kconfig                            | 10 ++
 kernel/dma/contiguous.c                       | 94 +++++++++++++++++--
 4 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b8d6cde06463..6d963484bbdc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,15 @@
 			altogether. For more information, see
 			include/linux/dma-contiguous.h
 
+	pernuma_cma=nn[MG]@[start[MG][-end[MG]]]
+			[ARM,X86,KNL]
+			Sets the size of the kernel per-numa memory area for
+			contiguous memory allocations. A value of 0 disables
+			per-numa CMA altogether. DMA users on node nid will
+			first try to allocate buffers from the per-numa area
+			located on node nid; if that allocation fails, they
+			will fall back to the global default memory area.
+
 	cmo_free_hint=	[PPC] Format: { yes | no }
 			Specify whether pages are marked as being inactive
 			when they are freed.  This is used in CMO environments
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index 03f8e98e3bcc..278a80a40456 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -79,6 +79,8 @@ static inline void dma_contiguous_set_default(struct cma *cma)
 
 void dma_contiguous_reserve(phys_addr_t addr_limit);
 
+void dma_pernuma_cma_reserve(void);
+
 int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 				       phys_addr_t limit, struct cma **res_cma,
 				       bool fixed);
@@ -128,6 +130,8 @@ static inline void dma_contiguous_set_default(struct cma *cma) { }
 
 static inline void dma_contiguous_reserve(phys_addr_t limit) { }
 
+static inline void dma_pernuma_cma_reserve(void) { }
+
 static inline int dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 				       phys_addr_t limit, struct cma **res_cma,
 				       bool fixed)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 5d18456b5f01..37e0e3559624 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -120,6 +120,16 @@ config DMA_CMA
 if  DMA_CMA
 comment "Default contiguous memory area size:"
 
+config CMA_PERNUMA_SIZE_MBYTES
+	int "Size in Mega Bytes for per-numa CMA areas"
+	depends on NUMA
+	default 16 if ARM64
+	default 0
+	help
+	  Defines the size (in MiB) of the per-numa memory area for Contiguous
+	  Memory Allocator. Every numa node will get a separate CMA with this
+	  size. If the size of 0 is selected, per-numa CMA is disabled.
+
 config CMA_SIZE_MBYTES
 	int "Size in Mega Bytes"
 	depends on !CMA_SIZE_SEL_PERCENTAGE
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index cff7e60968b9..c0482231c0d7 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -30,7 +30,14 @@
 #define CMA_SIZE_MBYTES 0
 #endif
 
+#ifdef CONFIG_CMA_PERNUMA_SIZE_MBYTES
+#define CMA_SIZE_PERNUMA_MBYTES CONFIG_CMA_PERNUMA_SIZE_MBYTES
+#else
+#define CMA_SIZE_PERNUMA_MBYTES 0
+#endif
+
 struct cma *dma_contiguous_default_area;
+static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
 
 /*
  * Default global CMA area size can be defined in kernel's .config.
@@ -44,6 +51,8 @@ struct cma *dma_contiguous_default_area;
  */
 static const phys_addr_t size_bytes __initconst =
 	(phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
+static phys_addr_t pernuma_size_bytes __initdata =
+	(phys_addr_t)CMA_SIZE_PERNUMA_MBYTES * SZ_1M;
 static phys_addr_t  size_cmdline __initdata = -1;
 static phys_addr_t base_cmdline __initdata;
 static phys_addr_t limit_cmdline __initdata;
@@ -69,6 +78,13 @@ static int __init early_cma(char *p)
 }
 early_param("cma", early_cma);
 
+static int __init early_pernuma_cma(char *p)
+{
+	pernuma_size_bytes = memparse(p, &p);
+	return 0;
+}
+early_param("pernuma_cma", early_pernuma_cma);
+
 #ifdef CONFIG_CMA_SIZE_PERCENTAGE
 
 static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
@@ -96,6 +112,33 @@ static inline __maybe_unused phys_addr_t cma_early_percent_memory(void)
 
 #endif
 
+void __init dma_pernuma_cma_reserve(void)
+{
+	int nid;
+
+	if (!pernuma_size_bytes)
+		return;
+
+	for_each_node_state(nid, N_MEMORY) {
+		int ret;
+		char name[20];
+
+		snprintf(name, sizeof(name), "pernuma%d", nid);
+		ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
+						 0, false, name,
+						 &dma_contiguous_pernuma_area[nid],
+						 nid);
+		if (ret) {
+			pr_warn("%s: reservation failed: err %d, node %d\n", __func__,
+				ret, nid);
+			continue;
+		}
+
+		pr_debug("%s: reserved %llu MiB on node %d\n", __func__,
+			(unsigned long long)pernuma_size_bytes / SZ_1M, nid);
+	}
+}
+
 /**
  * dma_contiguous_reserve() - reserve area(s) for contiguous memory handling
  * @limit: End address of the reserved memory (optional, 0 for any).
@@ -228,23 +271,38 @@ static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp)
  * @size:  Requested allocation size.
  * @gfp:   Allocation flags.
  *
- * This function allocates contiguous memory buffer for specified device. It
- * tries to use device specific contiguous memory area if available, or the
- * default global one.
+ * This function allocates contiguous memory buffers for the specified
+ * device, trying the device-specific area first, then the per-numa CMA
+ * area; if that allocation fails, it falls back to the default global one.
  *
- * Note that it byapss one-page size of allocations from the global area as
- * the addresses within one page are always contiguous, so there is no need
- * to waste CMA pages for that kind; it also helps reduce fragmentations.
+ * Note that it bypasses one-page allocations from the per-numa and
+ * global areas, as the addresses within one page are always contiguous;
+ * there is no need to waste CMA pages for those, and it also helps
+ * reduce fragmentation.
  */
 struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 {
+	int nid = dev_to_node(dev);
+
 	/* CMA can be used only in the context which permits sleeping */
 	if (!gfpflags_allow_blocking(gfp))
 		return NULL;
 	if (dev->cma_area)
 		return cma_alloc_aligned(dev->cma_area, size, gfp);
-	if (size <= PAGE_SIZE || !dma_contiguous_default_area)
+	if (size <= PAGE_SIZE)
 		return NULL;
+
+	if ((nid != NUMA_NO_NODE) && !(gfp & (GFP_DMA | GFP_DMA32))) {
+		struct cma *cma = dma_contiguous_pernuma_area[nid];
+		struct page *page;
+
+		if (cma) {
+			page = cma_alloc_aligned(cma, size, gfp);
+			if (page)
+				return page;
+		}
+	}
+
 	return cma_alloc_aligned(dma_contiguous_default_area, size, gfp);
 }
 
@@ -261,9 +319,25 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
  */
 void dma_free_contiguous(struct device *dev, struct page *page, size_t size)
 {
-	if (!cma_release(dev_get_cma_area(dev), page,
-			 PAGE_ALIGN(size) >> PAGE_SHIFT))
-		__free_pages(page, get_order(size));
+	/* if dev has its own cma, free page from there */
+	if (dev->cma_area) {
+		if (cma_release(dev->cma_area, page, PAGE_ALIGN(size) >> PAGE_SHIFT))
+			return;
+	} else {
+		/*
+		 * otherwise, page is from either per-numa cma or default cma
+		 */
+		if (cma_release(dma_contiguous_pernuma_area[page_to_nid(page)],
+					page, PAGE_ALIGN(size) >> PAGE_SHIFT))
+			return;
+
+		if (cma_release(dma_contiguous_default_area, page,
+					PAGE_ALIGN(size) >> PAGE_SHIFT))
+			return;
+	}
+
+	/* not in any cma, free from buddy */
+	__free_pages(page, get_order(size));
 }
 
 /*
-- 
2.27.0



* [PATCH v4 2/2] arm64: mm: reserve per-numa CMA to localize coherent dma buffers
From: Barry Song @ 2020-07-23 13:13 UTC (permalink / raw)
  To: hch, m.szyprowski, robin.murphy, will, ganapatrao.kulkarni,
	catalin.marinas
  Cc: Steve Capper, linux-kernel, linuxarm, huangdaode, iommu,
	prime.zeng, Andrew Morton, Mike Rapoport, linux-arm-kernel

Right now, the SMMU driver uses dma_alloc_coherent() to get memory for its
queues and tables. Typically, on an ARM64 server, there is a single default
CMA located on node0, which can be far away from node2, node3 etc.
With this patch, the SMMU gets memory from its local numa node for command
queues and page tables, which shrinks dma_unmap latency considerably.
Meanwhile, when iommu.passthrough is on, device drivers that call
dma_alloc_coherent() also get local memory and avoid cross-node traffic.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 -v4:
 * rebase on top of linux-next to avoid arch/arm64 conflicts

 arch/arm64/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index b6881d61b818..a6e19145ebb3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -437,6 +437,8 @@ void __init bootmem_init(void)
 	arm64_hugetlb_cma_reserve();
 #endif
 
+	dma_pernuma_cma_reserve();
+
 	memblock_dump_all();
 }
 
-- 
2.27.0



* Re: [PATCH v4 1/2] dma-direct: provide the ability to reserve per-numa CMA
From: Christoph Hellwig @ 2020-07-28 11:52 UTC (permalink / raw)
  To: Barry Song
  Cc: catalin.marinas, Mike Rapoport, Steve Capper, robin.murphy,
	linuxarm, linux-kernel, iommu, prime.zeng, ganapatrao.kulkarni,
	Andrew Morton, huangdaode, will, hch, linux-arm-kernel

On Fri, Jul 24, 2020 at 01:13:43AM +1200, Barry Song wrote:
> +config CMA_PERNUMA_SIZE_MBYTES
> +	int "Size in Mega Bytes for per-numa CMA areas"
> +	depends on NUMA
> +	default 16 if ARM64
> +	default 0
> +	help
> +	  Defines the size (in MiB) of the per-numa memory area for Contiguous
> +	  Memory Allocator. Every numa node will get a separate CMA with this
> +	  size. If the size of 0 is selected, per-numa CMA is disabled.

I'm still not a fan of the config option.  You can just hardcode the
value in CONFIG_CMDLINE based on the kernel parameter.  Also I wonder
if a way to expose this in the device tree might be useful, but people
more familiar with the device tree and the arm code will have to chime
in on that.

>  struct cma *dma_contiguous_default_area;
> +static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
>  
>  /*
>   * Default global CMA area size can be defined in kernel's .config.
> @@ -44,6 +51,8 @@ struct cma *dma_contiguous_default_area;
>   */
>  static const phys_addr_t size_bytes __initconst =
>  	(phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
> +static phys_addr_t pernuma_size_bytes __initdata =
> +	(phys_addr_t)CMA_SIZE_PERNUMA_MBYTES * SZ_1M;
>  static phys_addr_t  size_cmdline __initdata = -1;
>  static phys_addr_t base_cmdline __initdata;
>  static phys_addr_t limit_cmdline __initdata;
> @@ -69,6 +78,13 @@ static int __init early_cma(char *p)
>  }
>  early_param("cma", early_cma);
>  
> +static int __init early_pernuma_cma(char *p)
> +{
> +	pernuma_size_bytes = memparse(p, &p);
> +	return 0;
> +}
> +early_param("pernuma_cma", early_pernuma_cma);
> +
>  #ifdef CONFIG_CMA_SIZE_PERCENTAGE
>  
>  static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
> @@ -96,6 +112,33 @@ static inline __maybe_unused phys_addr_t cma_early_percent_memory(void)
>  
>  #endif
>  
> +void __init dma_pernuma_cma_reserve(void)
> +{
> +	int nid;
> +
> +	if (!pernuma_size_bytes)
> +		return;
> +
> +	for_each_node_state(nid, N_MEMORY) {
> +		int ret;
> +		char name[20];
> +
> +		snprintf(name, sizeof(name), "pernuma%d", nid);
> +		ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
> +						 0, false, name,
> +						 &dma_contiguous_pernuma_area[nid],
> +						 nid);

This adds a > 80 char line.

>  struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
>  {
> +	int nid = dev_to_node(dev);
> +
>  	/* CMA can be used only in the context which permits sleeping */
>  	if (!gfpflags_allow_blocking(gfp))
>  		return NULL;
>  	if (dev->cma_area)
>  		return cma_alloc_aligned(dev->cma_area, size, gfp);
> -	if (size <= PAGE_SIZE || !dma_contiguous_default_area)
> +	if (size <= PAGE_SIZE)
>  		return NULL;
> +
> +	if ((nid != NUMA_NO_NODE) && !(gfp & (GFP_DMA | GFP_DMA32))) {

No need for the braces around the nid check.

> +		struct cma *cma = dma_contiguous_pernuma_area[nid];
> +		struct page *page;
> +
> +		if (cma) {
> +			page = cma_alloc_aligned(cma, size, gfp);
> +			if (page)
> +				return page;
> +		}
> +	}
> +
>  	return cma_alloc_aligned(dma_contiguous_default_area, size, gfp);

This seems to have lost the dma_contiguous_default_area NULL check.

> +	/* if dev has its own cma, free page from there */
> +	if (dev->cma_area) {
> +		if (cma_release(dev->cma_area, page, PAGE_ALIGN(size) >> PAGE_SHIFT))
> +			return;

Another overly long line.

> +	} else {
> +		/*
> +		 * otherwise, page is from either per-numa cma or default cma
> +		 */
> +		if (cma_release(dma_contiguous_pernuma_area[page_to_nid(page)],
> +					page, PAGE_ALIGN(size) >> PAGE_SHIFT))
> +			return;
> +
> +		if (cma_release(dma_contiguous_default_area, page,
> +					PAGE_ALIGN(size) >> PAGE_SHIFT))
> +			return;
> +	}

I'd introduce a count variable for the value of "PAGE_ALIGN(size) >>
PAGE_SHIFT" to clean all this up a bit.
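
Something like (untested):

	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

	if (dev->cma_area) {
		if (cma_release(dev->cma_area, page, count))
			return;
	} else {
		if (cma_release(dma_contiguous_pernuma_area[page_to_nid(page)],
				page, count))
			return;
		if (cma_release(dma_contiguous_default_area, page, count))
			return;
	}
	__free_pages(page, get_order(size));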

Also please add a CONFIG_PERCPU_DMA_CMA config variable so that we
don't build this code for the vast majority of users that don't need it.

* Re: [PATCH v4 2/2] arm64: mm: reserve per-numa CMA to localize coherent dma buffers
From: Christoph Hellwig @ 2020-07-28 11:53 UTC (permalink / raw)
  To: Barry Song
  Cc: catalin.marinas, Mike Rapoport, Steve Capper, robin.murphy,
	linuxarm, linux-kernel, iommu, prime.zeng, ganapatrao.kulkarni,
	Andrew Morton, huangdaode, will, hch, linux-arm-kernel

Looks ok to me, but before I pick the series up in the dma-mapping tree
I really want an ACK from the arm64 maintainers.

* RE: [PATCH v4 1/2] dma-direct: provide the ability to reserve per-numa CMA
From: Song Bao Hua (Barry Song) @ 2020-07-28 12:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: catalin.marinas, Mike Rapoport, Steve Capper, robin.murphy,
	Linuxarm, linux-kernel, iommu, Zengtao (B),
	ganapatrao.kulkarni, Andrew Morton, huangdaode, will,
	linux-arm-kernel



> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Tuesday, July 28, 2020 11:53 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: hch@lst.de; m.szyprowski@samsung.com; robin.murphy@arm.com;
> will@kernel.org; ganapatrao.kulkarni@cavium.com;
> catalin.marinas@arm.com; iommu@lists.linux-foundation.org; Linuxarm
> <linuxarm@huawei.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; Zengtao (B) <prime.zeng@hisilicon.com>;
> huangdaode <huangdaode@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de>; Steve Capper <steve.capper@arm.com>; Andrew
> Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>
> Subject: Re: [PATCH v4 1/2] dma-direct: provide the ability to reserve
> per-numa CMA
> 
> On Fri, Jul 24, 2020 at 01:13:43AM +1200, Barry Song wrote:
> > +config CMA_PERNUMA_SIZE_MBYTES
> > +	int "Size in Mega Bytes for per-numa CMA areas"
> > +	depends on NUMA
> > +	default 16 if ARM64
> > +	default 0
> > +	help
> > +	  Defines the size (in MiB) of the per-numa memory area for Contiguous
> > +	  Memory Allocator. Every numa node will get a separate CMA with this
> > +	  size. If the size of 0 is selected, per-numa CMA is disabled.
> 
> I'm still not a fan of the config option.  You can just hardcode the
> value in CONFIG_CMDLINE based on the kernel parameter.  Also I wonder

I am sorry I haven't got your point yet. Do you mean something like the below?

arch/arm64/Kconfig:
config CMDLINE
	string "Default kernel command string"
-	default ""
+	default "pernuma_cma=16M"
	help
	  Provide a set of default command-line options at build time by
	  entering them here. As a minimum, you should specify the the
	  root device (e.g. root=/dev/nfs).

A background of the current code is that Linux distributions can usually use arch/arm64/configs/defconfig
directly to build kernel. cmdline can be easily ignored during the generation of Linux distributions.

> if a way to expose this in the device tree might be useful, but people
> more familiar with the device tree and the arm code will have to chime
> in on that.

Not sure if it is an useful user case as we are using ACPI but not device tree since it is an ARM64
server with NUMA.

> 
> >  struct cma *dma_contiguous_default_area;
> > +static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
> >
> >  /*
> >   * Default global CMA area size can be defined in kernel's .config.
> > @@ -44,6 +51,8 @@ struct cma *dma_contiguous_default_area;
> >   */
> >  static const phys_addr_t size_bytes __initconst =
> >  	(phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
> > +static phys_addr_t pernuma_size_bytes __initdata =
> > +	(phys_addr_t)CMA_SIZE_PERNUMA_MBYTES * SZ_1M;
> >  static phys_addr_t  size_cmdline __initdata = -1;
> >  static phys_addr_t base_cmdline __initdata;
> >  static phys_addr_t limit_cmdline __initdata;
> > @@ -69,6 +78,13 @@ static int __init early_cma(char *p)
> >  }
> >  early_param("cma", early_cma);
> >
> > +static int __init early_pernuma_cma(char *p)
> > +{
> > +	pernuma_size_bytes = memparse(p, &p);
> > +	return 0;
> > +}
> > +early_param("pernuma_cma", early_pernuma_cma);
> > +
> >  #ifdef CONFIG_CMA_SIZE_PERCENTAGE
> >
> >  static phys_addr_t __init __maybe_unused
> cma_early_percent_memory(void)
> > @@ -96,6 +112,33 @@ static inline __maybe_unused phys_addr_t
> cma_early_percent_memory(void)
> >
> >  #endif
> >
> > +void __init dma_pernuma_cma_reserve(void)
> > +{
> > +	int nid;
> > +
> > +	if (!pernuma_size_bytes)
> > +		return;
> > +
> > +	for_each_node_state(nid, N_MEMORY) {
> > +		int ret;
> > +		char name[20];
> > +
> > +		snprintf(name, sizeof(name), "pernuma%d", nid);
> > +		ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
> > +						 0, false, name,
> > +						 &dma_contiguous_pernuma_area[nid],
> > +						 nid);
> 
> This adds a > 80 char line.

Will refine.

> 
> >  struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t
> gfp)
> >  {
> > +	int nid = dev_to_node(dev);
> > +
> >  	/* CMA can be used only in the context which permits sleeping */
> >  	if (!gfpflags_allow_blocking(gfp))
> >  		return NULL;
> >  	if (dev->cma_area)
> >  		return cma_alloc_aligned(dev->cma_area, size, gfp);
> > -	if (size <= PAGE_SIZE || !dma_contiguous_default_area)
> > +	if (size <= PAGE_SIZE)
> >  		return NULL;
> > +
> > +	if ((nid != NUMA_NO_NODE) && !(gfp & (GFP_DMA | GFP_DMA32))) {
> 
> No need for the braces around the nid check.

Will refine.

> 
> > +		struct cma *cma = dma_contiguous_pernuma_area[nid];
> > +		struct page *page;
> > +
> > +		if (cma) {
> > +			page = cma_alloc_aligned(cma, size, gfp);
> > +			if (page)
> > +				return page;
> > +		}
> > +	}
> > +
> >  	return cma_alloc_aligned(dma_contiguous_default_area, size, gfp);
> 
> This seems to have lost the dma_contiguous_default_area NULL check.

cma_alloc() is doing the check by returning NULL if cma is NULL.

struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
		       bool no_warn)
{
	...
	if (!cma || !cma->count)
		return NULL;
}

But I agree here the code can check before calling cma_alloc_aligned.

> 
> > +	/* if dev has its own cma, free page from there */
> > +	if (dev->cma_area) {
> > +		if (cma_release(dev->cma_area, page, PAGE_ALIGN(size) >>
> PAGE_SHIFT))
> > +			return;
> 
> Another overly long line.

Will refine.

> 
> > +	} else {
> > +		/*
> > +		 * otherwise, page is from either per-numa cma or default cma
> > +		 */
> > +		if (cma_release(dma_contiguous_pernuma_area[page_to_nid(page)],
> > +					page, PAGE_ALIGN(size) >> PAGE_SHIFT))
> > +			return;
> > +
> > +		if (cma_release(dma_contiguous_default_area, page,
> > +					PAGE_ALIGN(size) >> PAGE_SHIFT))
> > +			return;
> > +	}
> 
> I'd introduce a count variable for the value of "PAGE_ALIGN(size) >>
> PAGE_SHIFT" to clean al lthis up a bit.

Good idea.

> 
> Also please add a CONFIG_PERCPU_DMA_CMA config variable so that we
> don't build this code for the vast majority of users that don't need it.

agreed.

Thanks
Barry


* Re: [PATCH v4 1/2] dma-direct: provide the ability to reserve per-numa CMA
From: Christoph Hellwig @ 2020-07-28 12:22 UTC (permalink / raw)
  To: Song Bao Hua (Barry Song)
  Cc: catalin.marinas, Mike Rapoport, Steve Capper, robin.murphy,
	Linuxarm, linux-kernel, iommu, Zengtao (B),
	ganapatrao.kulkarni, Andrew Morton, huangdaode, will,
	Christoph Hellwig, linux-arm-kernel

On Tue, Jul 28, 2020 at 12:19:03PM +0000, Song Bao Hua (Barry Song) wrote:
> I am sorry I haven't got your point yet. Do you mean something like the below?
> 
> arch/arm64/Kconfig:
> config CMDLINE
> 	string "Default kernel command string"
> -	default ""
> +	default "pernuma_cma=16M"
> 	help
> 	  Provide a set of default command-line options at build time by
> 	  entering them here. As a minimum, you should specify the the
> 	  root device (e.g. root=/dev/nfs).

Yes.

> A background of the current code is that Linux distributions can usually use arch/arm64/configs/defconfig
> directly to build kernel. cmdline can be easily ignored during the generation of Linux distributions.

I've not actually heard of a distro shipping defconfig yet...

> 
> > if a way to expose this in the device tree might be useful, but people
> > more familiar with the device tree and the arm code will have to chime
> > in on that.
> 
> Not sure if it is an useful user case as we are using ACPI but not device tree since it is an ARM64
> server with NUMA.

Well, then maybe ACPI experts need to chime in on this.

> > This seems to have lost the dma_contiguous_default_area NULL check.
> 
> cma_alloc() is doing the check by returning NULL if cma is NULL.
> 
> struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> 		       bool no_warn)
> {
> 	...
> 	if (!cma || !cma->count)
> 		return NULL;
> }
> 
> But I agree here the code can check before calling cma_alloc_aligned.

Oh, indeed.  Please split the removal of the NULL check into a prep
patch then.

* RE: [PATCH v4 1/2] dma-direct: provide the ability to reserve per-numa CMA
From: Song Bao Hua (Barry Song) @ 2020-07-29 11:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: catalin.marinas, Mike Rapoport, Steve Capper, robin.murphy,
	Linuxarm, linux-kernel, iommu, Zengtao (B),
	ganapatrao.kulkarni, Andrew Morton, huangdaode, will,
	linux-arm-kernel



> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Wednesday, July 29, 2020 12:23 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: Christoph Hellwig <hch@lst.de>; m.szyprowski@samsung.com;
> robin.murphy@arm.com; will@kernel.org; ganapatrao.kulkarni@cavium.com;
> catalin.marinas@arm.com; iommu@lists.linux-foundation.org; Linuxarm
> <linuxarm@huawei.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; Zengtao (B) <prime.zeng@hisilicon.com>;
> huangdaode <huangdaode@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de>; Steve Capper <steve.capper@arm.com>; Andrew
> Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>
> Subject: Re: [PATCH v4 1/2] dma-direct: provide the ability to reserve
> per-numa CMA
> 
> On Tue, Jul 28, 2020 at 12:19:03PM +0000, Song Bao Hua (Barry Song) wrote:
> > I am sorry I haven't got your point yet. Do you mean something like the
> below?
> >
> > arch/arm64/Kconfig:
> > config CMDLINE
> > 	string "Default kernel command string"
> > -	default ""
> > +	default "pernuma_cma=16M"
> > 	help
> > 	  Provide a set of default command-line options at build time by
> > 	  entering them here. As a minimum, you should specify the the
> > 	  root device (e.g. root=/dev/nfs).
> 
> Yes.
> 
> > A background of the current code is that Linux distributions can usually use
> arch/arm64/configs/defconfig
> > directly to build kernel. cmdline can be easily ignored during the generation
> of Linux distributions.
> 
> I've not actually heard of a distro shipping defconfig yet...
> 
> >
> > > if a way to expose this in the device tree might be useful, but people
> > > more familiar with the device tree and the arm code will have to chime
> > > in on that.
> >
> > Not sure if it is an useful user case as we are using ACPI but not device tree
> since it is an ARM64
> > server with NUMA.
> 
> Well, then maybe ACPI experts need to chime in on this.
> 
> > > This seems to have lost the dma_contiguous_default_area NULL check.
> >
> > cma_alloc() is doing the check by returning NULL if cma is NULL.
> >
> > struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> > 		       bool no_warn)
> > {
> > 	...
> > 	if (!cma || !cma->count)
> > 		return NULL;
> > }
> >
> > But I agree here the code can check before calling cma_alloc_aligned.
> 
> Oh, indeed.  Please split the removal of the NULL check into a prep
> patch then.

Do you mean removing the NULL check in cma_alloc()? If so, it seems a lot
of places need to be changed:

struct page *dma_alloc_from_contiguous(struct device *dev, size_t count,
				       unsigned int align, bool no_warn)
{
	if (align > CONFIG_CMA_ALIGNMENT)
		align = CONFIG_CMA_ALIGNMENT;
+ code to check dev_get_cma_area(dev) is not NULL
	return cma_alloc(dev_get_cma_area(dev), count, align, no_warn);
}

bool dma_release_from_contiguous(struct device *dev, struct page *pages,
				 int count)
{
+ code to check dev_get_cma_area(dev) is not NULL
	return cma_release(dev_get_cma_area(dev), pages, count);
}

bool cma_release(struct cma *cma, const struct page *pages, unsigned int count)
{
	unsigned long pfn;
+ do we need to remove this !cma too if we remove it in cma_alloc()?
	if (!cma || !pages)
		return false;
	...
}

And some other places where cma_alloc() and cma_release() are called:

arch/powerpc/kvm/book3s_hv_builtin.c
drivers/dma-buf/heaps/cma_heap.c
drivers/s390/char/vmcp.c
drivers/staging/android/ion/ion_cma_heap.c
mm/hugetlb.c

it seems a lot of code was written with the assumption that cma_alloc()
and cma_release() check whether cma is NULL, so callers don't check it
before calling them.

And I am not sure whether the kernel test robot would report errors like
"pointer dereferenced before check" if the !cma check were removed from
cma_alloc().

Thanks
Barry
