linux-kernel.vger.kernel.org archive mirror
* fix dma coherent pool sizing
@ 2022-08-17  6:06 Christoph Hellwig
  2022-08-17  6:06 ` [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-17  6:06 UTC (permalink / raw)
  To: iommu, Marek Szyprowski, Robin Murphy
  Cc: Michal Hocko, David Rientjes, linux-kernel

Hi all,

Michal pointed out that the sizing of the dma coherent pools is bonkers
as it uses the total memory to size the pool for each zone, which leads
to comically large pools for the 16-MB ZONE_DMA on x86 and tends to
cause allocation failure warnings.  This series switches to sizing
the ZONE_DMA and ZONE_DMA32 pools based on the number of pages that
actually reside in those zones instead.
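
The sizing rule being applied per zone (128KB of pool per 1GB of memory,
clamped between 128KB and MAX_ORDER_NR_PAGES worth of pages) can be
sketched in userspace C as follows; the PAGE_SHIFT and MAX_ORDER_NR_PAGES
values are assumptions for a typical x86-64 config, not taken from this
thread:

```c
#include <assert.h>

#define SZ_128K            (128UL * 1024)
#define SZ_1G              (1024UL * 1024 * 1024)
#define PAGE_SHIFT         12               /* assumed: 4KB pages */
#define MAX_ORDER_NR_PAGES (1UL << 10)      /* assumed: order-10, i.e. 4MB */

/* 128KB of atomic pool per 1GB of zone memory, min 128KB, max MAX_ORDER-1 */
static unsigned long calculate_pool_size(unsigned long zone_pages)
{
	unsigned long nr_pages = zone_pages / (SZ_1G / SZ_128K);

	if (nr_pages > MAX_ORDER_NR_PAGES)
		nr_pages = MAX_ORDER_NR_PAGES;
	if ((nr_pages << PAGE_SHIFT) < SZ_128K)
		return SZ_128K;
	return nr_pages << PAGE_SHIFT;
}
```

With a 16-MB ZONE_DMA (4096 pages of 4KB) the division yields zero and
the 128KB minimum applies, whereas sizing by totalram_pages() on a large
machine hits the MAX_ORDER cap instead; that mismatch is what this
series fixes.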

Diffstat:
 pool.c |   65 +++++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper
  2022-08-17  6:06 fix dma coherent pool sizing Christoph Hellwig
@ 2022-08-17  6:06 ` Christoph Hellwig
  2022-08-17 12:32   ` Robin Murphy
  2022-08-17  6:06 ` [PATCH 2/3] dma-pool: don't return errors from dma_atomic_pool_init Christoph Hellwig
  2022-08-17  6:06 ` [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools Christoph Hellwig
  2 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-17  6:06 UTC (permalink / raw)
  To: iommu, Marek Szyprowski, Robin Murphy
  Cc: Michal Hocko, David Rientjes, linux-kernel

Add a helper to calculate the pool size from dma_atomic_pool_init,
and fix up the last max_t to use the proper type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/pool.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 4d40dcce7604b..56f96678934bf 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -184,6 +184,15 @@ static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size,
 	return pool;
 }
 
+static unsigned long calculate_pool_size(unsigned long zone_pages)
+{
+	unsigned long nr_pages = min_t(unsigned long,
+				       zone_pages / (SZ_1G / SZ_128K),
+				       MAX_ORDER_NR_PAGES);
+
+	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
+}
+
 static int __init dma_atomic_pool_init(void)
 {
 	int ret = 0;
@@ -192,11 +201,9 @@ static int __init dma_atomic_pool_init(void)
 	 * If coherent_pool was not used on the command line, default the pool
 	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
 	 */
-	if (!atomic_pool_size) {
-		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
-		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
-		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
-	}
+	if (!atomic_pool_size)
+		atomic_pool_size = calculate_pool_size(totalram_pages());
+
 	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
 
 	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
-- 
2.30.2



* [PATCH 2/3] dma-pool: don't return errors from dma_atomic_pool_init
  2022-08-17  6:06 fix dma coherent pool sizing Christoph Hellwig
  2022-08-17  6:06 ` [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper Christoph Hellwig
@ 2022-08-17  6:06 ` Christoph Hellwig
  2022-08-17 12:36   ` Robin Murphy
  2022-08-17  6:06 ` [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools Christoph Hellwig
  2 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-17  6:06 UTC (permalink / raw)
  To: iommu, Marek Szyprowski, Robin Murphy
  Cc: Michal Hocko, David Rientjes, linux-kernel

Returning an error from an initcall has no effect, so don't bother
returning -ENOMEM when one of the pool allocations fails.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/pool.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 56f96678934bf..5b07b0379a501 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -195,8 +195,6 @@ static unsigned long calculate_pool_size(unsigned long zone_pages)
 
 static int __init dma_atomic_pool_init(void)
 {
-	int ret = 0;
-
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
 	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
@@ -208,23 +206,17 @@ static int __init dma_atomic_pool_init(void)
 
 	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
 						    GFP_KERNEL);
-	if (!atomic_pool_kernel)
-		ret = -ENOMEM;
 	if (has_managed_dma()) {
 		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
 						GFP_KERNEL | GFP_DMA);
-		if (!atomic_pool_dma)
-			ret = -ENOMEM;
 	}
 	if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
 		atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
 						GFP_KERNEL | GFP_DMA32);
-		if (!atomic_pool_dma32)
-			ret = -ENOMEM;
 	}
 
 	dma_atomic_pool_debugfs_init();
-	return ret;
+	return 0;
 }
 postcore_initcall(dma_atomic_pool_init);
 
-- 
2.30.2



* [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-17  6:06 fix dma coherent pool sizing Christoph Hellwig
  2022-08-17  6:06 ` [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper Christoph Hellwig
  2022-08-17  6:06 ` [PATCH 2/3] dma-pool: don't return errors from dma_atomic_pool_init Christoph Hellwig
@ 2022-08-17  6:06 ` Christoph Hellwig
  2022-08-17 12:50   ` Robin Murphy
  2022-08-17 12:53   ` Michal Hocko
  2 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-17  6:06 UTC (permalink / raw)
  To: iommu, Marek Szyprowski, Robin Murphy
  Cc: Michal Hocko, David Rientjes, linux-kernel

Limit the sizing of the atomic pools for allocations from
ZONE_DMA and ZONE_DMA32 based on the number of pages actually
present in those zones instead of the total memory.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask")
Reported-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/pool.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5b07b0379a501..f629c6dfd8555 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -193,28 +193,46 @@ static unsigned long calculate_pool_size(unsigned long zone_pages)
 	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
 }
 
+#if defined(CONFIG_ZONE_DMA) || defined(CONFIG_ZONE_DMA32)
+static unsigned long __init nr_managed_pages(int zone_idx)
+{
+	struct pglist_data *pgdat;
+	unsigned long nr_pages = 0;
+
+	for_each_online_pgdat(pgdat)
+		nr_pages += zone_managed_pages(&pgdat->node_zones[zone_idx]);
+	return nr_pages;
+}
+#endif /* CONFIG_ZONE_DMA || CONFIG_ZONE_DMA32 */
+
 static int __init dma_atomic_pool_init(void)
 {
+	unsigned long nr_zone_dma_pages, nr_zone_dma32_pages;
+
+#ifdef CONFIG_ZONE_DMA
+	nr_zone_dma_pages = nr_managed_pages(ZONE_DMA);
+	if (nr_zone_dma_pages)
+		atomic_pool_dma = __dma_atomic_pool_init(
+				calculate_pool_size(nr_zone_dma_pages),
+				GFP_DMA);
+#endif
+#ifdef CONFIG_ZONE_DMA32
+	nr_zone_dma32_pages = nr_managed_pages(ZONE_DMA32);
+	if (nr_zone_dma32_pages)
+		atomic_pool_dma32 = __dma_atomic_pool_init(
+				calculate_pool_size(nr_zone_dma32_pages),
+				GFP_DMA32);
+#endif
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
 	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
 	 */
 	if (!atomic_pool_size)
 		atomic_pool_size = calculate_pool_size(totalram_pages());
-
-	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
-
 	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
 						    GFP_KERNEL);
-	if (has_managed_dma()) {
-		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
-						GFP_KERNEL | GFP_DMA);
-	}
-	if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
-		atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
-						GFP_KERNEL | GFP_DMA32);
-	}
 
+	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
 	dma_atomic_pool_debugfs_init();
 	return 0;
 }
-- 
2.30.2



* Re: [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper
  2022-08-17  6:06 ` [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper Christoph Hellwig
@ 2022-08-17 12:32   ` Robin Murphy
  2022-08-21 10:41     ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2022-08-17 12:32 UTC (permalink / raw)
  To: Christoph Hellwig, iommu, Marek Szyprowski
  Cc: Michal Hocko, David Rientjes, linux-kernel

On 2022-08-17 07:06, Christoph Hellwig wrote:
> Add a helper to calculate the pool size from dma_atomic_pool_init,
> and fix up the last max_t to use the proper type.

Hmm, both atomic_pool_size and the argument to __dma_atomic_pool_init() 
where this gets directly passed later are size_t, not to mention that 
the function name says we're calculating a size, so I'd say size_t *is* 
the proper type to return here.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   kernel/dma/pool.c | 17 ++++++++++++-----
>   1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 4d40dcce7604b..56f96678934bf 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -184,6 +184,15 @@ static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size,
>   	return pool;
>   }
>   
> +static unsigned long calculate_pool_size(unsigned long zone_pages)
> +{
> +	unsigned long nr_pages = min_t(unsigned long,
> +				       zone_pages / (SZ_1G / SZ_128K),
> +				       MAX_ORDER_NR_PAGES);

Nit: this is arguably less readable, and objectively one line longer, 
than the original two statements.

Other that those nits though,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> +
> +	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
> +}
> +
>   static int __init dma_atomic_pool_init(void)
>   {
>   	int ret = 0;
> @@ -192,11 +201,9 @@ static int __init dma_atomic_pool_init(void)
>   	 * If coherent_pool was not used on the command line, default the pool
>   	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
>   	 */
> -	if (!atomic_pool_size) {
> -		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
> -		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
> -		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
> -	}
> +	if (!atomic_pool_size)
> +		atomic_pool_size = calculate_pool_size(totalram_pages());
> +
>   	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
>   
>   	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,


* Re: [PATCH 2/3] dma-pool: don't return errors from dma_atomic_pool_init
  2022-08-17  6:06 ` [PATCH 2/3] dma-pool: don't return errors from dma_atomic_pool_init Christoph Hellwig
@ 2022-08-17 12:36   ` Robin Murphy
  0 siblings, 0 replies; 17+ messages in thread
From: Robin Murphy @ 2022-08-17 12:36 UTC (permalink / raw)
  To: Christoph Hellwig, iommu, Marek Szyprowski
  Cc: Michal Hocko, David Rientjes, linux-kernel

On 2022-08-17 07:06, Christoph Hellwig wrote:
> Returning errors from initcalls does not change a thing, thus don't
> bother with returning -ENOMEM if one of the pool allocations failed.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   kernel/dma/pool.c | 10 +---------
>   1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 56f96678934bf..5b07b0379a501 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -195,8 +195,6 @@ static unsigned long calculate_pool_size(unsigned long zone_pages)
>   
>   static int __init dma_atomic_pool_init(void)
>   {
> -	int ret = 0;
> -
>   	/*
>   	 * If coherent_pool was not used on the command line, default the pool
>   	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
> @@ -208,23 +206,17 @@ static int __init dma_atomic_pool_init(void)
>   
>   	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
>   						    GFP_KERNEL);
> -	if (!atomic_pool_kernel)
> -		ret = -ENOMEM;
>   	if (has_managed_dma()) {
>   		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
>   						GFP_KERNEL | GFP_DMA);
> -		if (!atomic_pool_dma)
> -			ret = -ENOMEM;
>   	}
>   	if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
>   		atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
>   						GFP_KERNEL | GFP_DMA32);
> -		if (!atomic_pool_dma32)
> -			ret = -ENOMEM;
>   	}
>   
>   	dma_atomic_pool_debugfs_init();
> -	return ret;
> +	return 0;
>   }
>   postcore_initcall(dma_atomic_pool_init);
>   


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-17  6:06 ` [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools Christoph Hellwig
@ 2022-08-17 12:50   ` Robin Murphy
  2022-08-21 10:43     ` Christoph Hellwig
  2022-08-17 12:53   ` Michal Hocko
  1 sibling, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2022-08-17 12:50 UTC (permalink / raw)
  To: Christoph Hellwig, iommu, Marek Szyprowski
  Cc: Michal Hocko, David Rientjes, linux-kernel

On 2022-08-17 07:06, Christoph Hellwig wrote:
> Limit the sizing of the atomic pools for allocations from
> ZONE_DMA and ZONE_DMA32 based on the number of pages actually
> present in those zones instead of the total memory.
> 
> Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask")
> Reported-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   kernel/dma/pool.c | 40 +++++++++++++++++++++++++++++-----------
>   1 file changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 5b07b0379a501..f629c6dfd8555 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -193,28 +193,46 @@ static unsigned long calculate_pool_size(unsigned long zone_pages)
>   	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
>   }
>   
> +#if defined(CONFIG_ZONE_DMA) || defined(CONFIG_ZONE_DMA32)

This #ifdeffery seems horribly clunky - I think it would be much nicer 
to mark this __maybe_unused, and preserve the has_managed_dma/IS_ENABLED 
logic below.

> +static unsigned long __init nr_managed_pages(int zone_idx)
> +{
> +	struct pglist_data *pgdat;
> +	unsigned long nr_pages = 0;
> +
> +	for_each_online_pgdat(pgdat)
> +		nr_pages += zone_managed_pages(&pgdat->node_zones[zone_idx]);
> +	return nr_pages;
> +}
> +#endif /* CONFIG_ZONE_DMA || CONFIG_ZONE_DMA32 */
> +
>   static int __init dma_atomic_pool_init(void)
>   {
> +	unsigned long nr_zone_dma_pages, nr_zone_dma32_pages;

...otherwise, I expect the buildbots will be along shortly with unused 
variable warnings for these :)

Cheers,
Robin.

> +
> +#ifdef CONFIG_ZONE_DMA
> +	nr_zone_dma_pages = nr_managed_pages(ZONE_DMA);
> +	if (nr_zone_dma_pages)
> +		atomic_pool_dma = __dma_atomic_pool_init(
> +				calculate_pool_size(nr_zone_dma_pages),
> +				GFP_DMA);
> +#endif
> +#ifdef CONFIG_ZONE_DMA32
> +	nr_zone_dma32_pages = nr_managed_pages(ZONE_DMA32);
> +	if (nr_zone_dma32_pages)
> +		atomic_pool_dma32 = __dma_atomic_pool_init(
> +				calculate_pool_size(nr_zone_dma32_pages),
> +				GFP_DMA32);
> +#endif
>   	/*
>   	 * If coherent_pool was not used on the command line, default the pool
>   	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
>   	 */
>   	if (!atomic_pool_size)
>   		atomic_pool_size = calculate_pool_size(totalram_pages());
> -
> -	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
> -
>   	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
>   						    GFP_KERNEL);
> -	if (has_managed_dma()) {
> -		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
> -						GFP_KERNEL | GFP_DMA);
> -	}
> -	if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
> -		atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
> -						GFP_KERNEL | GFP_DMA32);
> -	}
>   
> +	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
>   	dma_atomic_pool_debugfs_init();
>   	return 0;
>   }


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-17  6:06 ` [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools Christoph Hellwig
  2022-08-17 12:50   ` Robin Murphy
@ 2022-08-17 12:53   ` Michal Hocko
  2022-08-21 10:44     ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2022-08-17 12:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Marek Szyprowski, Robin Murphy, David Rientjes, linux-kernel

On Wed 17-08-22 08:06:47, Christoph Hellwig wrote:
> Limit the sizing of the atomic pools for allocations from
> ZONE_DMA and ZONE_DMA32 based on the number of pages actually
> present in those zones instead of the total memory.
> 
> Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask")
> Reported-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Yes, this is a much more reasonable estimation, especially for the DMA
zone. I still believe some downscaling would make sense but, as said
earlier, it is nothing to really insist on. Also, you likely want to
have a pr_warn for a non-populated DMA zone, as pointed out in the
other thread[1].

Other than that looks good to me. FWIW
Acked-by: Michal Hocko <mhocko@suse.com>

[1] https://lore.kernel.org/all/YvyN/vI7+cTv0geS@dhcp22.suse.cz/?q=http%3A%2F%2Flkml.kernel.org%2Fr%2FYvyN%2FvI7%2BcTv0geS%40dhcp22.suse.cz

> ---
>  kernel/dma/pool.c | 40 +++++++++++++++++++++++++++++-----------
>  1 file changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index 5b07b0379a501..f629c6dfd8555 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -193,28 +193,46 @@ static unsigned long calculate_pool_size(unsigned long zone_pages)
>  	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
>  }
>  
> +#if defined(CONFIG_ZONE_DMA) || defined(CONFIG_ZONE_DMA32)
> +static unsigned long __init nr_managed_pages(int zone_idx)
> +{
> +	struct pglist_data *pgdat;
> +	unsigned long nr_pages = 0;
> +
> +	for_each_online_pgdat(pgdat)
> +		nr_pages += zone_managed_pages(&pgdat->node_zones[zone_idx]);
> +	return nr_pages;
> +}
> +#endif /* CONFIG_ZONE_DMA || CONFIG_ZONE_DMA32 */
> +
>  static int __init dma_atomic_pool_init(void)
>  {
> +	unsigned long nr_zone_dma_pages, nr_zone_dma32_pages;
> +
> +#ifdef CONFIG_ZONE_DMA
> +	nr_zone_dma_pages = nr_managed_pages(ZONE_DMA);
> +	if (nr_zone_dma_pages)
> +		atomic_pool_dma = __dma_atomic_pool_init(
> +				calculate_pool_size(nr_zone_dma_pages),
> +				GFP_DMA);
> +#endif
> +#ifdef CONFIG_ZONE_DMA32
> +	nr_zone_dma32_pages = nr_managed_pages(ZONE_DMA32);
> +	if (nr_zone_dma32_pages)
> +		atomic_pool_dma32 = __dma_atomic_pool_init(
> +				calculate_pool_size(nr_zone_dma32_pages),
> +				GFP_DMA32);
> +#endif
>  	/*
>  	 * If coherent_pool was not used on the command line, default the pool
>  	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
>  	 */
>  	if (!atomic_pool_size)
>  		atomic_pool_size = calculate_pool_size(totalram_pages());
> -
> -	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
> -
>  	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
>  						    GFP_KERNEL);
> -	if (has_managed_dma()) {
> -		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
> -						GFP_KERNEL | GFP_DMA);
> -	}
> -	if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
> -		atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
> -						GFP_KERNEL | GFP_DMA32);
> -	}
>  
> +	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
>  	dma_atomic_pool_debugfs_init();
>  	return 0;
>  }
> -- 
> 2.30.2

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper
  2022-08-17 12:32   ` Robin Murphy
@ 2022-08-21 10:41     ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-21 10:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, iommu, Marek Szyprowski, Michal Hocko,
	David Rientjes, linux-kernel

On Wed, Aug 17, 2022 at 01:32:42PM +0100, Robin Murphy wrote:
> On 2022-08-17 07:06, Christoph Hellwig wrote:
>> Add a helper to calculate the pool size from dma_atomic_pool_init,
>> and fix up the last max_t to use the proper type.
>
> Hmm, both atomic_pool_size and the argument to __dma_atomic_pool_init() 
> where this gets directly passed later are size_t, not to mention that the 
> function name says we're calculating a size, so I'd say size_t *is* the 
> proper type to return here.

But the type argument passed to min_t/max_t in calculate_pool_size isn't
about the return type, but rather the type to use for comparing the
other two arguments.

That should generally be the largest of the involved types.  Besides
that, using a size_t for a number of pages is not a correct use of
size_t, but that's a separate story.

But I'll go back to what we had originally; there's no good reason
to change it in this series.
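
The point about the type parameter can be illustrated with a simplified
stand-in for the kernel macro (this is not the real include/linux/minmax.h
definition, and the example assumes an LP64 target where unsigned long
is 64-bit):

```c
#include <assert.h>

/*
 * Simplified stand-in for the kernel's min_t: the type parameter is the
 * type both operands are cast to *before* comparing, so it governs the
 * comparison itself, independent of what the result is later stored in.
 */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* comparing in a type narrower than the operands silently truncates */
static unsigned long narrow_min(unsigned long a, unsigned long b)
{
	return min_t(unsigned int, a, b);
}

/* comparing in the widest involved type preserves the values */
static unsigned long wide_min(unsigned long a, unsigned long b)
{
	return min_t(unsigned long, a, b);
}
```

With a page count above 4G pages, e.g. (1UL << 32) + 5 against 100, the
narrow comparison truncates the first operand to 5 and picks it, while
the wide one correctly returns 100; hence the comparison type should be
the largest of the involved types.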


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-17 12:50   ` Robin Murphy
@ 2022-08-21 10:43     ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-21 10:43 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, iommu, Marek Szyprowski, Michal Hocko,
	David Rientjes, linux-kernel

On Wed, Aug 17, 2022 at 01:50:09PM +0100, Robin Murphy wrote:
>>   +#if defined(CONFIG_ZONE_DMA) || defined(CONFIG_ZONE_DMA32)
>
> This #ifdeffery seems horribly clunky - I think it would be much nicer to 
> mark this __maybe_unused, and preserve the has_managed_dma/IS_ENABLED logic 
> below.

>>   static int __init dma_atomic_pool_init(void)
>>   {
>> +	unsigned long nr_zone_dma_pages, nr_zone_dma32_pages;
>
> ...otherwise, I expect the buildbots will be along shortly with unused 
> variable warnings for these :)

Amazingly they haven't complained yet, even though they should.  But a
simple IS_ENABLED won't work either, as the ZONE_DMA / ZONE_DMA32
constants are not defined unless the corresponding config options
are set.  I'll see what I can do here using helpers.


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-17 12:53   ` Michal Hocko
@ 2022-08-21 10:44     ` Christoph Hellwig
  2022-08-22  8:41       ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-21 10:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, iommu, Marek Szyprowski, Robin Murphy,
	David Rientjes, linux-kernel

On Wed, Aug 17, 2022 at 02:53:47PM +0200, Michal Hocko wrote:
> Yes, this is a much more reasonable estimation, especially for the DMA
> zone. I still believe some downscaling would make sense but, as said
> earlier, it is nothing to really insist on. Also, you likely want to
> have a pr_warn for a non-populated DMA zone, as pointed out in the
> other thread[1].

So I think the warning would be useful, but how do we detect that the
zone should exist and is empty vs the case where it doesn't even
exist based on a runtime decision?


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-21 10:44     ` Christoph Hellwig
@ 2022-08-22  8:41       ` Michal Hocko
  2022-08-22 11:14         ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2022-08-22  8:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Marek Szyprowski, Robin Murphy, David Rientjes, linux-kernel

On Sun 21-08-22 12:44:16, Christoph Hellwig wrote:
> On Wed, Aug 17, 2022 at 02:53:47PM +0200, Michal Hocko wrote:
> > Yes, this is a much more reasonable estimation, especially for the DMA
> > zone. I still believe some downscaling would make sense but, as said
> > earlier, it is nothing to really insist on. Also, you likely want to
> > have a pr_warn for a non-populated DMA zone, as pointed out in the
> > other thread[1].
> 
> So I think the warning would be useful, but how do we detect that the
> zone should exist and is empty vs the case where it doesn't even
> exist based on a runtime decision?

ZONE_DMA existence is not runtime controllable (at least on x86). We only
have 2 possible cases why there is no memory released into the page
allocator (zone_managed_pages == 0): the physical memory range has been
reserved during early boot, or it has been consumed by somebody.
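
The warning condition under discussion can be mocked in userspace like
this (should_warn_empty_dma_zone is a hypothetical name for illustration,
not actual kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: with ZONE_DMA compiled in, zone_managed_pages == 0
 * means the zone's range was reserved at boot or consumed by somebody, so
 * populating an atomic pool from it can only fail; warn up front instead.
 */
static bool should_warn_empty_dma_zone(bool config_zone_dma,
				       unsigned long managed_pages)
{
	return config_zone_dma && managed_pages == 0;
}
```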

From the DMA pool allocator's POV they are effectively the same case
because the pool fails to allocate. From the reporting POV those cases
could arguably be different, because the latter case can report the
meminfo while the !managed case doesn't have much to report about
except for "You do not have anything there". TBH I am not really sure
whether the full meminfo report is all that useful, because there is not
much you can learn from it anyway. The full report looks like this

: [   14.017417][    T1] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-7
: [   14.017429][    T1] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.14.21-150400.22-default #1 SLE15-SP4 0b6a6578ade2de5c4a0b916095dff44f76ef1704
: [   14.017434][    T1] Hardware name: FUJITSU PRIMEQUEST 2800E3/D3752, BIOS PRIMEQUEST 2000 Series BIOS Version 01.52 11/23/2021
: [   14.017437][    T1] Call Trace:
: [   14.017444][    T1]  <TASK>
: [   14.017449][    T1]  dump_stack_lvl+0x45/0x57
: [   14.017469][    T1]  warn_alloc+0xfe/0x160
: [   14.017490][    T1]  __alloc_pages_slowpath.constprop.112+0xc27/0xc60
: [   14.017497][    T1]  ? rdinit_setup+0x2b/0x2b
: [   14.017509][    T1]  ? rdinit_setup+0x2b/0x2b
: [   14.017512][    T1]  __alloc_pages+0x2d5/0x320
: [   14.017517][    T1]  alloc_page_interleave+0xf/0x70
: [   14.017531][    T1]  atomic_pool_expand+0x4a/0x200
: [   14.017541][    T1]  ? rdinit_setup+0x2b/0x2b
: [   14.017544][    T1]  __dma_atomic_pool_init+0x44/0x90
: [   14.017556][    T1]  dma_atomic_pool_init+0xad/0x13f
: [   14.017560][    T1]  ? __dma_atomic_pool_init+0x90/0x90
: [   14.017562][    T1]  do_one_initcall+0x41/0x200
: [   14.017581][    T1]  kernel_init_freeable+0x236/0x298
: [   14.017589][    T1]  ? rest_init+0xd0/0xd0
: [   14.017596][    T1]  kernel_init+0x16/0x120
: [   14.017599][    T1]  ret_from_fork+0x22/0x30
: [   14.017604][    T1]  </TASK>
: [   14.017605][    T1] Mem-Info:
: [   14.017956][    T1] active_anon:0 inactive_anon:0 isolated_anon:0
: [   14.017956][    T1]  active_file:0 inactive_file:0 isolated_file:0
: [   14.017956][    T1]  unevictable:0 dirty:0 writeback:0
: [   14.017956][    T1]  slab_reclaimable:876 slab_unreclaimable:30300
: [   14.017956][    T1]  mapped:0 shmem:0 pagetables:12 bounce:0
: [   14.017956][    T1]  free:3170151735 free_pcp:6868 free_cma:0
: [   14.017962][    T1] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:7200kB pagetables:4kB all_unreclaimable? no
: [   14.017969][    T1] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:6728kB pagetables:16kB all_unreclaimable? no
: [   14.017973][    T1] Node 2 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:6800kB pagetables:0kB all_unreclaimable? no
: [   14.017977][    T1] Node 3 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:6960kB pagetables:0kB all_unreclaimable? no
: [   14.017982][    T1] Node 4 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:6656kB pagetables:0kB all_unreclaimable? no
: [   14.017985][    T1] Node 5 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:6784kB pagetables:24kB all_unreclaimable? no
: [   14.017989][    T1] Node 6 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:6928kB pagetables:4kB all_unreclaimable? no
: [   14.017994][    T1] Node 7 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:7072kB pagetables:0kB all_unreclaimable? no
: [   14.018026][    T1] Node 0 DMA free:160kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
: [   14.018035][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018063][    T1] Node 0 DMA32 free:0kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1313380kB managed:1154392kB mlocked:0kB bounce:0kB free_pcp:216kB local_pcp:0kB free_cma:0kB
: [   14.018070][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018098][    T1] Node 0 Normal free:1583344560kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1609039872kB managed:1583858824kB mlocked:0kB bounce:0kB free_pcp:12692kB local_pcp:256kB free_cma:0kB
: [   14.018105][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018132][    T1] Node 1 Normal free:1585397152kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1610612736kB managed:1585429652kB mlocked:0kB bounce:0kB free_pcp:1772kB local_pcp:76kB free_cma:0kB
: [   14.018138][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018161][    T1] Node 2 Normal free:1585394152kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1610612736kB managed:1585429652kB mlocked:0kB bounce:0kB free_pcp:2292kB local_pcp:48kB free_cma:0kB
: [   14.018166][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018191][    T1] Node 3 Normal free:1585394052kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1610612736kB managed:1585429652kB mlocked:0kB bounce:0kB free_pcp:2336kB local_pcp:0kB free_cma:0kB
: [   14.018197][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018232][    T1] Node 4 Normal free:1585397584kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1610612736kB managed:1585429652kB mlocked:0kB bounce:0kB free_pcp:1764kB local_pcp:68kB free_cma:0kB
: [   14.018238][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018264][    T1] Node 5 Normal free:1585349060kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1610612736kB managed:1585386604kB mlocked:0kB bounce:0kB free_pcp:1924kB local_pcp:32kB free_cma:0kB
: [   14.018270][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018300][    T1] Node 6 Normal free:1585395340kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1610612736kB managed:1585429652kB mlocked:0kB bounce:0kB free_pcp:2160kB local_pcp:72kB free_cma:0kB
: [   14.018305][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018330][    T1] Node 7 Normal free:1584934880kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1610612736kB managed:1584970756kB mlocked:0kB bounce:0kB free_pcp:2316kB local_pcp:100kB free_cma:0kB
: [   14.018335][    T1] lowmem_reserve[]: 0 0 0 0 0
: [   14.018339][    T1] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 160kB
: [   14.018352][    T1] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
: [   14.018361][    T1] Node 0 Normal: 1*4kB (U) 1*8kB (M) 0*16kB 1*32kB (E) 1*64kB (U) 2*128kB (ME) 2*256kB (ME) 2*512kB (UM) 1*1024kB (E) 0*2048kB 386558*4096kB (M) = 1583344492kB
: [   14.018377][    T1] Node 1 Normal: 17*4kB (UM) 1*8kB (U) 2*16kB (UM) 2*32kB (UE) 1*64kB (E) 2*128kB (UM) 3*256kB (UME) 2*512kB (ME) 1*1024kB (E) 2*2048kB (ME) 387058*4096kB (M) = 1585396972kB
: [   14.018393][    T1] Node 2 Normal: 44*4kB (UM) 1*8kB (U) 1*16kB (M) 1*32kB (U) 2*64kB (UE) 3*128kB (UME) 3*256kB (UME) 2*512kB (UM) 0*1024kB 3*2048kB (UME) 387057*4096kB (M) = 1585394152kB
: [   14.018409][    T1] Node 3 Normal: 37*4kB (UM) 0*8kB 1*16kB (M) 1*32kB (U) 1*64kB (E) 3*128kB (UME) 3*256kB (UME) 2*512kB (ME) 0*1024kB 3*2048kB (UME) 387057*4096kB (M) = 1585394052kB
: [   14.018423][    T1] Node 4 Normal: 43*4kB (UM) 1*8kB (E) 2*16kB (UM) 1*32kB (E) 2*64kB (UE) 1*128kB (M) 3*256kB (UME) 3*512kB (UME) 1*1024kB (E) 2*2048kB (ME) 387058*4096kB (M) = 1585397492kB
: [   14.018438][    T1] Node 5 Normal: 27*4kB (UM) 0*8kB 1*16kB (M) 0*32kB 1*64kB (E) 2*128kB (ME) 2*256kB (UM) 3*512kB (UME) 2*1024kB (UE) 2*2048kB (ME) 387046*4096kB (M) = 1585349052kB
: [   14.018451][    T1] Node 6 Normal: 46*4kB (UM) 0*8kB 2*16kB (UM) 0*32kB 2*64kB (UE) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 2*1024kB (UE) 3*2048kB (UME) 387057*4096kB (M) = 1585395288kB
: [   14.018465][    T1] Node 7 Normal: 30*4kB (UM) 3*8kB (UME) 0*16kB 2*32kB (ME) 2*64kB (ME) 2*128kB (ME) 1*256kB (U) 2*512kB (UM) 2*1024kB (ME) 2*2048kB (UE) 386945*4096kB (M) = 1584934736kB
: [   14.018480][    T1] 0 total pagecache pages
: [   14.018483][    T1] 0 pages in swap cache
: [   14.018484][    T1] Swap cache stats: add 0, delete 0, find 0/0
: [   14.018486][    T1] Free swap  = 0kB
: [   14.018487][    T1] Total swap = 0kB
: [   14.018488][    T1] 3221164600 pages RAM
: [   14.018489][    T1] 0 pages HighMem/MovableOnly
: [   14.018490][    T1] 50531051 pages reserved
: [   14.018491][    T1] 0 pages cma reserved
: [   14.018492][    T1] 0 pages hwpoisoned

From that the only useful information for me as the MM developer would
be 
: swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-7

because this tells me that this is a 4MB request (order 10 of 4kB pages)
[...]
: Node 0 DMA free:160kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

because this tells me that the zone is populated (managed is 16MB) but mostly consumed (free == 160kB)

and 
: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 160kB

because this would tell me whether the zone was too fragmented to
satisfy the allocation, had the free memory been large enough to cover
the request (not the case here)

We could improve the reporting here and drop all the nodes which do not
have the DMA zone populated, which would shrink the overall size. On the
other hand there are parts which are not all that interesting in this
case but are useful in most other situations - e.g. the stack trace or
cumulative stats.

IMHO it would be more useful to have something like this instead. This
would reduce the output to the
Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 160kB
line for all nodes, which should tell the whole story because it covers
both the fragmentation and the lack-of-memory case.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7898e29bcfb5..5e16a899c620 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1828,6 +1828,7 @@ extern void pagefault_out_of_memory(void);
 #define SHOW_MEM_FILTER_NODES		(0x0001u)	/* disallowed nodes */
 
 extern void show_free_areas(unsigned int flags, nodemask_t *nodemask);
+extern void show_zone_free_area(struct zone *zone);
 
 #ifdef CONFIG_MMU
 extern bool can_do_mlock(void);
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 2a6c71311a17..a4b125a7ae90 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -197,14 +197,27 @@ static struct gen_pool * __init __maybe_unused zone_pool_init(int zone_idx,
 {
 	struct pglist_data *pgdat;
 	unsigned long nr_pages = 0;
+	struct gen_pool *pool = NULL;
+	size_t pool_size = 0;
 
 	for_each_online_pgdat(pgdat)
 		nr_pages += zone_managed_pages(&pgdat->node_zones[zone_idx]);
 
-	if (!nr_pages)
-		return NULL;
+	/*
+	 * The memory failure report can be pretty noisy so limit it to the
+	 * necessary minimum.
+	 */
+	if (nr_pages) {
+		pool_size = calculate_pool_size(nr_pages);
+		pool = __dma_atomic_pool_init(pool_size, gfp | __GFP_NOWARN);
+	}
 
-	return __dma_atomic_pool_init(calculate_pool_size(nr_pages), gfp);
+	if (!pool) {
+		pr_info("Failed to pre-allocate %zu byte atomic pool\n", pool_size);
+		for_each_online_pgdat(pgdat)
+			show_zone_free_area(&pgdat->node_zones[zone_idx]);
+	}
+	return pool;
 }
 
 static int __init dma_atomic_pool_init(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b5b14b78c4fd..49ff8f156b7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5926,6 +5926,38 @@ static void show_migration_types(unsigned char type)
 	printk(KERN_CONT "(%s) ", tmp);
 }
 
+void show_zone_free_area(struct zone *zone)
+{
+	unsigned int order;
+	unsigned long nr[MAX_ORDER], flags, total = 0;
+	unsigned char types[MAX_ORDER];
+
+	show_node(zone);
+	printk(KERN_CONT "%s: ", zone->name);
+
+	spin_lock_irqsave(&zone->lock, flags);
+	for (order = 0; order < MAX_ORDER; order++) {
+		struct free_area *area = &zone->free_area[order];
+		int type;
+
+		nr[order] = area->nr_free;
+		total += nr[order] << order;
+
+		types[order] = 0;
+		for (type = 0; type < MIGRATE_TYPES; type++) {
+			if (!free_area_empty(area, type))
+				types[order] |= 1 << type;
+		}
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+	for (order = 0; order < MAX_ORDER; order++) {
+		printk(KERN_CONT "%lu*%lukB ",
+		       nr[order], K(1UL) << order);
+		if (nr[order])
+			show_migration_types(types[order]);
+	}
+	printk(KERN_CONT "= %lukB\n", K(total));
+}
 /*
  * Show free area list (used inside shift_scroll-lock stuff)
  * We also calculate the percentage fragmentation. We do this by counting the
@@ -6093,37 +6125,9 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 	}
 
 	for_each_populated_zone(zone) {
-		unsigned int order;
-		unsigned long nr[MAX_ORDER], flags, total = 0;
-		unsigned char types[MAX_ORDER];
-
 		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
 			continue;
-		show_node(zone);
-		printk(KERN_CONT "%s: ", zone->name);
-
-		spin_lock_irqsave(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
-			struct free_area *area = &zone->free_area[order];
-			int type;
-
-			nr[order] = area->nr_free;
-			total += nr[order] << order;
-
-			types[order] = 0;
-			for (type = 0; type < MIGRATE_TYPES; type++) {
-				if (!free_area_empty(area, type))
-					types[order] |= 1 << type;
-			}
-		}
-		spin_unlock_irqrestore(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
-			printk(KERN_CONT "%lu*%lukB ",
-			       nr[order], K(1UL) << order);
-			if (nr[order])
-				show_migration_types(types[order]);
-		}
-		printk(KERN_CONT "= %lukB\n", K(total));
+		show_zone_free_area(zone);
 	}
 
 	hugetlb_show_meminfo();
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-22  8:41       ` Michal Hocko
@ 2022-08-22 11:14         ` Christoph Hellwig
  2022-08-22 12:09           ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-22 11:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, iommu, Marek Szyprowski, Robin Murphy,
	David Rientjes, linux-kernel

On Mon, Aug 22, 2022 at 10:41:01AM +0200, Michal Hocko wrote:
> ZONE_DMA existence is not runtime controllable (at least on x86).

This is more about ZONE_DMA32 that can disappear if all memory is
inside ZONE_DMA.  That is highly unusual (but still possible) with a
tiny x86-style ZONE_DMA, but can happen with a larger one on other
architectures.


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-22 11:14         ` Christoph Hellwig
@ 2022-08-22 12:09           ` Michal Hocko
  2022-08-22 12:27             ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2022-08-22 12:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Marek Szyprowski, Robin Murphy, David Rientjes, linux-kernel

On Mon 22-08-22 13:14:47, Christoph Hellwig wrote:
> On Mon, Aug 22, 2022 at 10:41:01AM +0200, Michal Hocko wrote:
> > ZONE_DMA existence is not runtime controllable (at least on x86).
> 
> This is more about ZONE_DMA32 that can disappear if all memory is
> inside ZONE_DMA.  That is highly unusual (but still possible) with a
> tiny x86-style ZONE_DMA, but can happen with a larger one on other
> architectures.

OK, so you are worried about configurations that lack ZONE_DMA32 because
that memory range is covered by ZONE_DMA? I thought those simply do not
have CONFIG_ZONE_DMA32. Or am I wrong here?

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-22 12:09           ` Michal Hocko
@ 2022-08-22 12:27             ` Christoph Hellwig
  2022-08-22 14:21               ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-22 12:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, iommu, Marek Szyprowski, Robin Murphy,
	David Rientjes, linux-kernel

On Mon, Aug 22, 2022 at 02:09:13PM +0200, Michal Hocko wrote:
> OK, so you are worried about configurations that lack ZONE_DMA32 because
> that memory range is covered by ZONE_DMA? I thought those simply do not
> have CONFIG_ZONE_DMA32. Or am I wrong here?

Zones can disappear when they are empty.  Normally that affects
ZONE_NORMAL and ZONE_HIGHMEM, but it can also happen to ZONE_DMA32
when there is not enough memory.


* Re: [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-22 12:27             ` Christoph Hellwig
@ 2022-08-22 14:21               ` Michal Hocko
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Hocko @ 2022-08-22 14:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Marek Szyprowski, Robin Murphy, David Rientjes, linux-kernel

On Mon 22-08-22 14:27:59, Christoph Hellwig wrote:
> On Mon, Aug 22, 2022 at 02:09:13PM +0200, Michal Hocko wrote:
> > OK, so you are worried about configurations that lack ZONE_DMA32 because
> > that memory range is covered by ZONE_DMA? I thought those simply do not
> > have CONFIG_ZONE_DMA32. Or am I wrong here?
> 
> Zones can disappear when they are empty.  Normally that affects
> ZONE_NORMAL and ZONE_HIGHMEM, but it can also happen to ZONE_DMA32
> when there is not enough memory.

Correct, but that is essentially the same problem as ZONE_DMA becoming
empty. Or maybe I am just completely missing what you are trying to
say/ask?

My whole point was that if those zones are configured and you cannot
pre-allocate the pool, then this can be either a result of OOM on that
zone or of the zone being completely empty. In either case the pool would
be empty, and if that is something worth bothering^Wnotifying an admin
about then it is better to provide something useful rather than
over-messaging for one case while being silent about the other. See my
point?

-- 
Michal Hocko
SUSE Labs


* [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools
  2022-08-22  6:12 fix dma coherent pool sizing Christoph Hellwig
@ 2022-08-22  6:12 ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-08-22  6:12 UTC (permalink / raw)
  To: iommu, Marek Szyprowski, Robin Murphy
  Cc: Michal Hocko, David Rientjes, linux-kernel

Limit the sizing of the atomic pools for allocations from
ZONE_DMA and ZONE_DMA32 based on the number of pages actually
present in those zones instead of the total memory.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask")
Reported-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 kernel/dma/pool.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 4b4c59d05e436..2a6c71311a175 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -192,28 +192,40 @@ static size_t calculate_pool_size(unsigned long zone_pages)
 	return max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
 }
 
+static struct gen_pool * __init __maybe_unused zone_pool_init(int zone_idx,
+		gfp_t gfp)
+{
+	struct pglist_data *pgdat;
+	unsigned long nr_pages = 0;
+
+	for_each_online_pgdat(pgdat)
+		nr_pages += zone_managed_pages(&pgdat->node_zones[zone_idx]);
+
+	if (!nr_pages)
+		return NULL;
+
+	return __dma_atomic_pool_init(calculate_pool_size(nr_pages), gfp);
+}
+
 static int __init dma_atomic_pool_init(void)
 {
+#ifdef CONFIG_ZONE_DMA
+	atomic_pool_dma = zone_pool_init(ZONE_DMA, GFP_DMA);
+#endif
+#ifdef CONFIG_ZONE_DMA32
+	atomic_pool_dma32 = zone_pool_init(ZONE_DMA32, GFP_DMA32);
+#endif
+
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
 	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
 	 */
 	if (!atomic_pool_size)
 		atomic_pool_size = calculate_pool_size(totalram_pages());
-
-	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
-
 	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
 						    GFP_KERNEL);
-	if (has_managed_dma()) {
-		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
-						GFP_KERNEL | GFP_DMA);
-	}
-	if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
-		atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
-						GFP_KERNEL | GFP_DMA32);
-	}
 
+	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
 	dma_atomic_pool_debugfs_init();
 	return 0;
 }
-- 
2.30.2



end of thread, other threads:[~2022-08-22 14:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-17  6:06 fix dma coherent pool sizing Christoph Hellwig
2022-08-17  6:06 ` [PATCH 1/3] dma-pool: factor out a calculate_pool_size helper Christoph Hellwig
2022-08-17 12:32   ` Robin Murphy
2022-08-21 10:41     ` Christoph Hellwig
2022-08-17  6:06 ` [PATCH 2/3] dma-pool: don't return errors from dma_atomic_pool_init Christoph Hellwig
2022-08-17 12:36   ` Robin Murphy
2022-08-17  6:06 ` [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools Christoph Hellwig
2022-08-17 12:50   ` Robin Murphy
2022-08-21 10:43     ` Christoph Hellwig
2022-08-17 12:53   ` Michal Hocko
2022-08-21 10:44     ` Christoph Hellwig
2022-08-22  8:41       ` Michal Hocko
2022-08-22 11:14         ` Christoph Hellwig
2022-08-22 12:09           ` Michal Hocko
2022-08-22 12:27             ` Christoph Hellwig
2022-08-22 14:21               ` Michal Hocko
2022-08-22  6:12 fix dma coherent pool sizing Christoph Hellwig
2022-08-22  6:12 ` [PATCH 3/3] dma-pool: limit DMA and DMA32 zone size pools Christoph Hellwig
