linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/2] mm/cma: better error handling and count pages per zone
@ 2021-01-27 10:18 David Hildenbrand
  2021-01-27 10:18 ` [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails David Hildenbrand
  2021-01-27 10:18 ` [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo David Hildenbrand
  0 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-27 10:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko,
	Mike Rapoport, Oscar Salvador, Peter Zijlstra (Intel),
	Thomas Gleixner, Wei Yang

Two improvements for CMA:
1. When activation of a CMA area fails, hand back all pages to the buddy
2. Count CMA pages per zone and print them in /proc/zoneinfo

David Hildenbrand (2):
  mm/cma: expose all pages to the buddy if activation of an area fails
  mm/page_alloc: count CMA pages per zone and print them in
    /proc/zoneinfo

 include/linux/mmzone.h |  4 ++++
 mm/cma.c               | 43 +++++++++++++++++++++---------------------
 mm/page_alloc.c        |  1 +
 mm/vmstat.c            |  6 ++++--
 4 files changed, 30 insertions(+), 24 deletions(-)

-- 
2.29.2



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails
  2021-01-27 10:18 [PATCH v1 0/2] mm/cma: better error handling and count pages per zone David Hildenbrand
@ 2021-01-27 10:18 ` David Hildenbrand
  2021-01-27 15:58   ` Zi Yan
  2021-01-28  9:59   ` Oscar Salvador
  2021-01-27 10:18 ` [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo David Hildenbrand
  1 sibling, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-27 10:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang

Right now, if activation fails, we might already have exposed some pages to
the buddy for CMA use (although they will never get actually used by CMA),
and some pages won't be exposed to the buddy at all.

Let's check for "single zone" early and on error, don't expose any pages
for CMA use - instead, expose them to the buddy available for any use.
Simply call free_reserved_page() on every single page - easier than
going via free_reserved_area(), converting back and forth between pfns
and virt addresses.

In addition, make sure to fixup totalcma_pages properly.

Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...":
  [    0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
  [    0.006893] cma: Reserved 2048 MiB at 0x0000000100000000
  [    0.006893] hugetlb_cma: reserved 2048 MiB on node 0
  ...
  [    0.175433] cma: CMA area hugetlb0 could not be activated

Before this patch:
  # cat /proc/meminfo
  MemTotal:        5867348 kB
  MemFree:         5692808 kB
  MemAvailable:    5542516 kB
  ...
  CmaTotal:        2097152 kB
  CmaFree:         1884160 kB

After this patch:
  # cat /proc/meminfo
  MemTotal:        6077308 kB
  MemFree:         5904208 kB
  MemAvailable:    5747968 kB
  ...
  CmaTotal:              0 kB
  CmaFree:               0 kB

Note: cma_init_reserved_mem() makes sure that we always cover full
pageblocks / MAX_ORDER - 1 pages.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/cma.c | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 0ba69cd16aeb..23d4a97c834a 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -94,34 +94,29 @@ static void cma_clear_bitmap(struct cma *cma, unsigned long pfn,
 
 static void __init cma_activate_area(struct cma *cma)
 {
-	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-	unsigned i = cma->count >> pageblock_order;
+	unsigned long base_pfn = cma->base_pfn, pfn;
 	struct zone *zone;
 
 	cma->bitmap = bitmap_zalloc(cma_bitmap_maxno(cma), GFP_KERNEL);
 	if (!cma->bitmap)
 		goto out_error;
 
-	WARN_ON_ONCE(!pfn_valid(pfn));
-	zone = page_zone(pfn_to_page(pfn));
-
-	do {
-		unsigned j;
-
-		base_pfn = pfn;
-		for (j = pageblock_nr_pages; j; --j, pfn++) {
-			WARN_ON_ONCE(!pfn_valid(pfn));
-			/*
-			 * alloc_contig_range requires the pfn range
-			 * specified to be in the same zone. Make this
-			 * simple by forcing the entire CMA resv range
-			 * to be in the same zone.
-			 */
-			if (page_zone(pfn_to_page(pfn)) != zone)
-				goto not_in_zone;
-		}
-		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
-	} while (--i);
+	/*
+	 * alloc_contig_range() requires the pfn range specified to be in the
+	 * same zone. Simplify by forcing the entire CMA resv range to be in the
+	 * same zone.
+	 */
+	WARN_ON_ONCE(!pfn_valid(base_pfn));
+	zone = page_zone(pfn_to_page(base_pfn));
+	for (pfn = base_pfn + 1; pfn < base_pfn + cma->count; pfn++) {
+		WARN_ON_ONCE(!pfn_valid(pfn));
+		if (page_zone(pfn_to_page(pfn)) != zone)
+			goto not_in_zone;
+	}
+
+	for (pfn = base_pfn; pfn < base_pfn + cma->count;
+	     pfn += pageblock_nr_pages)
+		init_cma_reserved_pageblock(pfn_to_page(pfn));
 
 	mutex_init(&cma->lock);
 
@@ -135,6 +130,10 @@ static void __init cma_activate_area(struct cma *cma)
 not_in_zone:
 	bitmap_free(cma->bitmap);
 out_error:
+	/* Expose all pages to the buddy, they are useless for CMA. */
+	for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++)
+		free_reserved_page(pfn_to_page(pfn));
+	totalcma_pages -= cma->count;
 	cma->count = 0;
 	pr_err("CMA area %s could not be activated\n", cma->name);
 	return;
-- 
2.29.2



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-27 10:18 [PATCH v1 0/2] mm/cma: better error handling and count pages per zone David Hildenbrand
  2021-01-27 10:18 ` [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails David Hildenbrand
@ 2021-01-27 10:18 ` David Hildenbrand
  2021-01-28 10:22   ` Oscar Salvador
                     ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-27 10:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang

Let's count the number of CMA pages per zone and print them in
/proc/zoneinfo.

Having access to the total number of CMA pages per zone is helpful for
debugging purposes to know where exactly the CMA pages ended up, and to
figure out how many pages of a zone might behave differently (e.g., like
ZONE_MOVABLE) - even after some of these pages might already have been
allocated.

For now, we are only able to get the global nr+free cma pages from
/proc/meminfo and the free cma pages per zone from /proc/zoneinfo.

Note: Track/print that information even without CONFIG_CMA, similar to
"nr_free_cma" in /proc/zoneinfo. This is different to /proc/meminfo -
maybe we want to make that consistent in the future (however, changing
/proc/zoneinfo output might uglify the code a bit).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mmzone.h | 4 ++++
 mm/page_alloc.c        | 1 +
 mm/vmstat.c            | 6 ++++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae588b2f87ef..3bc18c9976fd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -503,6 +503,9 @@ struct zone {
 	 * bootmem allocator):
 	 *	managed_pages = present_pages - reserved_pages;
 	 *
+	 * cma pages is present pages that are assigned for CMA use
+	 * (MIGRATE_CMA).
+	 *
 	 * So present_pages may be used by memory hotplug or memory power
 	 * management logic to figure out unmanaged pages by checking
 	 * (present_pages - managed_pages). And managed_pages should be used
@@ -527,6 +530,7 @@ struct zone {
 	atomic_long_t		managed_pages;
 	unsigned long		spanned_pages;
 	unsigned long		present_pages;
+	unsigned long		cma_pages;
 
 	const char		*name;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b031a5ae0bd5..9a82375bbcb2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 	}
 
 	adjust_managed_page_count(page, pageblock_nr_pages);
+	page_zone(page)->cma_pages += pageblock_nr_pages;
 }
 #endif
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7758486097f9..97fc32a53320 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1642,14 +1642,16 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   "\n        high     %lu"
 		   "\n        spanned  %lu"
 		   "\n        present  %lu"
-		   "\n        managed  %lu",
+		   "\n        managed  %lu"
+		   "\n        cma      %lu",
 		   zone_page_state(zone, NR_FREE_PAGES),
 		   min_wmark_pages(zone),
 		   low_wmark_pages(zone),
 		   high_wmark_pages(zone),
 		   zone->spanned_pages,
 		   zone->present_pages,
-		   zone_managed_pages(zone));
+		   zone_managed_pages(zone),
+		   zone->cma_pages);
 
 	seq_printf(m,
 		   "\n        protection: (%ld",
-- 
2.29.2



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails
  2021-01-27 10:18 ` [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails David Hildenbrand
@ 2021-01-27 15:58   ` Zi Yan
  2021-01-28  9:59   ` Oscar Salvador
  1 sibling, 0 replies; 20+ messages in thread
From: Zi Yan @ 2021-01-27 15:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang

[-- Attachment #1: Type: text/plain, Size: 2137 bytes --]

On 27 Jan 2021, at 5:18, David Hildenbrand wrote:

> Right now, if activation fails, we might already have exposed some pages to
> the buddy for CMA use (although they will never get actually used by CMA),
> and some pages won't be exposed to the buddy at all.
>
> Let's check for "single zone" early and on error, don't expose any pages
> for CMA use - instead, expose them to the buddy available for any use.
> Simply call free_reserved_page() on every single page - easier than
> going via free_reserved_area(), converting back and forth between pfns
> and virt addresses.
>
> In addition, make sure to fixup totalcma_pages properly.
>
> Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...":
>   [    0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
>   [    0.006893] cma: Reserved 2048 MiB at 0x0000000100000000
>   [    0.006893] hugetlb_cma: reserved 2048 MiB on node 0
>   ...
>   [    0.175433] cma: CMA area hugetlb0 could not be activated
>
> Before this patch:
>   # cat /proc/meminfo
>   MemTotal:        5867348 kB
>   MemFree:         5692808 kB
>   MemAvailable:    5542516 kB
>   ...
>   CmaTotal:        2097152 kB
>   CmaFree:         1884160 kB
>
> After this patch:
>   # cat /proc/meminfo
>   MemTotal:        6077308 kB
>   MemFree:         5904208 kB
>   MemAvailable:    5747968 kB
>   ...
>   CmaTotal:              0 kB
>   CmaFree:               0 kB
>
> Note: cma_init_reserved_mem() makes sure that we always cover full
> pageblocks / MAX_ORDER - 1 pages.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/cma.c | 43 +++++++++++++++++++++----------------------
>  1 file changed, 21 insertions(+), 22 deletions(-)

LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails
  2021-01-27 10:18 ` [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails David Hildenbrand
  2021-01-27 15:58   ` Zi Yan
@ 2021-01-28  9:59   ` Oscar Salvador
  1 sibling, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2021-01-28  9:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang

On Wed, Jan 27, 2021 at 11:18:12AM +0100, David Hildenbrand wrote:
> Right now, if activation fails, we might already have exposed some pages to
> the buddy for CMA use (although they will never get actually used by CMA),
> and some pages won't be exposed to the buddy at all.
> 
> Let's check for "single zone" early and on error, don't expose any pages
> for CMA use - instead, expose them to the buddy available for any use.
> Simply call free_reserved_page() on every single page - easier than
> going via free_reserved_area(), converting back and forth between pfns
> and virt addresses.
> 
> In addition, make sure to fixup totalcma_pages properly.
> 
> Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...":
>   [    0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
>   [    0.006893] cma: Reserved 2048 MiB at 0x0000000100000000
>   [    0.006893] hugetlb_cma: reserved 2048 MiB on node 0
>   ...
>   [    0.175433] cma: CMA area hugetlb0 could not be activated
> 
> Before this patch:
>   # cat /proc/meminfo
>   MemTotal:        5867348 kB
>   MemFree:         5692808 kB
>   MemAvailable:    5542516 kB
>   ...
>   CmaTotal:        2097152 kB
>   CmaFree:         1884160 kB
> 
> After this patch:
>   # cat /proc/meminfo
>   MemTotal:        6077308 kB
>   MemFree:         5904208 kB
>   MemAvailable:    5747968 kB
>   ...
>   CmaTotal:              0 kB
>   CmaFree:               0 kB
> 
> Note: cma_init_reserved_mem() makes sure that we always cover full
> pageblocks / MAX_ORDER - 1 pages.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Besides benefit of the error handling, I find this code much more
cleaer:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-27 10:18 ` [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo David Hildenbrand
@ 2021-01-28 10:22   ` Oscar Salvador
  2021-01-28 10:43     ` David Hildenbrand
  2021-01-28 16:45   ` [PATCH v2] " David Hildenbrand
  2021-01-29 11:34   ` [PATCH v3] " David Hildenbrand
  2 siblings, 1 reply; 20+ messages in thread
From: Oscar Salvador @ 2021-01-28 10:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang

On Wed, Jan 27, 2021 at 11:18:13AM +0100, David Hildenbrand wrote:
> Let's count the number of CMA pages per zone and print them in
> /proc/zoneinfo.
> 
> Having access to the total number of CMA pages per zone is helpful for
> debugging purposes to know where exactly the CMA pages ended up, and to
> figure out how many pages of a zone might behave differently (e.g., like
> ZONE_MOVABLE) - even after some of these pages might already have been
> allocated.

My knowledge of CMA tends to be quite low, actually I though that CMA
was somehow tied to ZONE_MOVABLE.

I see how tracking CMA pages per zona might give you a clue, but what do
you mean by "might behave differently - even after some of these pages might
already have been allocated"

> For now, we are only able to get the global nr+free cma pages from
> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
> 
> Note: Track/print that information even without CONFIG_CMA, similar to
> "nr_free_cma" in /proc/zoneinfo. This is different to /proc/meminfo -
> maybe we want to make that consistent in the future (however, changing
> /proc/zoneinfo output might uglify the code a bit).
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/mmzone.h | 4 ++++
>  mm/page_alloc.c        | 1 +
>  mm/vmstat.c            | 6 ++++--
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ae588b2f87ef..3bc18c9976fd 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -503,6 +503,9 @@ struct zone {
>  	 * bootmem allocator):
>  	 *	managed_pages = present_pages - reserved_pages;
>  	 *
> +	 * cma pages is present pages that are assigned for CMA use
> +	 * (MIGRATE_CMA).
> +	 *
>  	 * So present_pages may be used by memory hotplug or memory power
>  	 * management logic to figure out unmanaged pages by checking
>  	 * (present_pages - managed_pages). And managed_pages should be used
> @@ -527,6 +530,7 @@ struct zone {
>  	atomic_long_t		managed_pages;
>  	unsigned long		spanned_pages;
>  	unsigned long		present_pages;
> +	unsigned long		cma_pages;

I see that NR_FREE_CMA_PAGES is there even without CONFIG_CMA, as you
said, but I am not sure about adding size to a zone unconditionally.
I mean, it is not terrible as IIRC, the maximum MAX_NUMNODES can get
is 1024, and on x86_64 that would be (1024 * 4 zones) * 8 = 32K.
So not a big deal, but still.

Besides following NR_FREE_CMA_PAGES, is there any reason for not doing:

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96734e0..2d8a830d168d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -436,6 +436,9 @@ struct zone {
        unsigned long           managed_pages;
        unsigned long           spanned_pages;
        unsigned long           present_pages;
+#ifdef CONFIG_CMA
+       unsigned long           cma_pages;
+#endif
 
        const char              *name;
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8ba0870ecddd..5757df4bfd45 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1559,13 +1559,15 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
                   "\n        spanned  %lu"
                   "\n        present  %lu"
                   "\n        managed  %lu",
+                  "\n        cma      %lu",
                   zone_page_state(zone, NR_FREE_PAGES),
                   min_wmark_pages(zone),
                   low_wmark_pages(zone),
                   high_wmark_pages(zone),
                   zone->spanned_pages,
                   zone->present_pages,
-                  zone->managed_pages);
+                  zone->managed_pages,
+                  IS_ENABLED(CONFIG_CMA) ? zone->cma_pages : 0);
 
        seq_printf(m,
                   "\n        protection: (%ld",


I do not see it that ugly, but just my taste.


-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 10:22   ` Oscar Salvador
@ 2021-01-28 10:43     ` David Hildenbrand
  2021-01-28 13:44       ` Oscar Salvador
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2021-01-28 10:43 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang

On 28.01.21 11:22, Oscar Salvador wrote:
> On Wed, Jan 27, 2021 at 11:18:13AM +0100, David Hildenbrand wrote:
>> Let's count the number of CMA pages per zone and print them in
>> /proc/zoneinfo.
>>
>> Having access to the total number of CMA pages per zone is helpful for
>> debugging purposes to know where exactly the CMA pages ended up, and to
>> figure out how many pages of a zone might behave differently (e.g., like
>> ZONE_MOVABLE) - even after some of these pages might already have been
>> allocated.
> 
> My knowledge of CMA tends to be quite low, actually I though that CMA
> was somehow tied to ZONE_MOVABLE.

CMA is often placed into one of the kernel zones, but can also end up in the movable zone.

> 
> I see how tracking CMA pages per zona might give you a clue, but what do
> you mean by "might behave differently - even after some of these pages might
> already have been allocated"

Assume you have 4GB in ZONE_NORMAL but 1GB is assigned for CMA. You actually only have 3GB available for random kernel allocations, not 4GB.

Currently, you can only observe the free CMA pages, excluding any pages that are already allocated. Having that information how many CMA pages we have can be helpful - similar to what we already have in /proc/meminfo.

> 
>> For now, we are only able to get the global nr+free cma pages from
>> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
>>
>> Note: Track/print that information even without CONFIG_CMA, similar to
>> "nr_free_cma" in /proc/zoneinfo. This is different to /proc/meminfo -
>> maybe we want to make that consistent in the future (however, changing
>> /proc/zoneinfo output might uglify the code a bit).
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Mike Rapoport <rppt@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   include/linux/mmzone.h | 4 ++++
>>   mm/page_alloc.c        | 1 +
>>   mm/vmstat.c            | 6 ++++--
>>   3 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index ae588b2f87ef..3bc18c9976fd 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -503,6 +503,9 @@ struct zone {
>>   	 * bootmem allocator):
>>   	 *	managed_pages = present_pages - reserved_pages;
>>   	 *
>> +	 * cma pages is present pages that are assigned for CMA use
>> +	 * (MIGRATE_CMA).
>> +	 *
>>   	 * So present_pages may be used by memory hotplug or memory power
>>   	 * management logic to figure out unmanaged pages by checking
>>   	 * (present_pages - managed_pages). And managed_pages should be used
>> @@ -527,6 +530,7 @@ struct zone {
>>   	atomic_long_t		managed_pages;
>>   	unsigned long		spanned_pages;
>>   	unsigned long		present_pages;
>> +	unsigned long		cma_pages;
> 
> I see that NR_FREE_CMA_PAGES is there even without CONFIG_CMA, as you
> said, but I am not sure about adding size to a zone unconditionally.
> I mean, it is not terrible as IIRC, the maximum MAX_NUMNODES can get
> is 1024, and on x86_64 that would be (1024 * 4 zones) * 8 = 32K.
> So not a big deal, but still.

I'm asking myself how many such systems will run without
CONFIG_CMA in the future.

> 
> Besides following NR_FREE_CMA_PAGES, is there any reason for not doing:
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 1e22d96734e0..2d8a830d168d 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -436,6 +436,9 @@ struct zone {
>          unsigned long           managed_pages;
>          unsigned long           spanned_pages;
>          unsigned long           present_pages;
> +#ifdef CONFIG_CMA
> +       unsigned long           cma_pages;
> +#endif
>   
>          const char              *name;
>   
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 8ba0870ecddd..5757df4bfd45 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1559,13 +1559,15 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>                     "\n        spanned  %lu"
>                     "\n        present  %lu"
>                     "\n        managed  %lu",
> +                  "\n        cma      %lu",
>                     zone_page_state(zone, NR_FREE_PAGES),
>                     min_wmark_pages(zone),
>                     low_wmark_pages(zone),
>                     high_wmark_pages(zone),
>                     zone->spanned_pages,
>                     zone->present_pages,
> -                  zone->managed_pages);
> +                  zone->managed_pages,
> +                  IS_ENABLED(CONFIG_CMA) ? zone->cma_pages : 0);
>   
>          seq_printf(m,
>                     "\n        protection: (%ld",
> 
> 
> I do not see it that ugly, but just my taste.

IIRC, that does not work. The compiler will still complain
about a missing struct members. We would have to provide a
zone_cma_pages() helper with some ifdefery.



We could do something like this on top

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -530,7 +530,9 @@ struct zone {
         atomic_long_t           managed_pages;
         unsigned long           spanned_pages;
         unsigned long           present_pages;
+#ifdef CONFIG_CMA
         unsigned long           cma_pages;
+#endif
  
         const char              *name;
  
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 97fc32a53320..b753a64f099f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1643,7 +1643,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
                    "\n        spanned  %lu"
                    "\n        present  %lu"
                    "\n        managed  %lu"
-                  "\n        cma      %lu",
+#ifdef CONFIG_CMA
+                  "\n        cma      %lu"
+#endif
+                  "%s",
                    zone_page_state(zone, NR_FREE_PAGES),
                    min_wmark_pages(zone),
                    low_wmark_pages(zone),
@@ -1651,7 +1654,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
                    zone->spanned_pages,
                    zone->present_pages,
                    zone_managed_pages(zone),
-                  zone->cma_pages);
+#ifdef CONFIG_CMA
+                  zone->cma_pages,
+#endif
+                  "");
  
         seq_printf(m,
                    "\n        protection: (%ld",


Getting rid of NR_FREE_CMA_PAGES will be more ugly.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 10:43     ` David Hildenbrand
@ 2021-01-28 13:44       ` Oscar Salvador
  2021-01-28 13:46         ` Oscar Salvador
  2021-01-28 14:01         ` David Hildenbrand
  0 siblings, 2 replies; 20+ messages in thread
From: Oscar Salvador @ 2021-01-28 13:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang

On Thu, Jan 28, 2021 at 11:43:41AM +0100, David Hildenbrand wrote:
> > My knowledge of CMA tends to be quite low, actually I though that CMA
> > was somehow tied to ZONE_MOVABLE.
> 
> CMA is often placed into one of the kernel zones, but can also end up in the movable zone.

Ok good to know.

> > I see how tracking CMA pages per zona might give you a clue, but what do
> > you mean by "might behave differently - even after some of these pages might
> > already have been allocated"
> 
> Assume you have 4GB in ZONE_NORMAL but 1GB is assigned for CMA. You actually only have 3GB available for random kernel allocations, not 4GB.
> 
> Currently, you can only observe the free CMA pages, excluding any pages that are already allocated. Having that information how many CMA pages we have can be helpful - similar to what we already have in /proc/meminfo.

I see, I agree that it can provide some guidance. 

> > I see that NR_FREE_CMA_PAGES is there even without CONFIG_CMA, as you
> > said, but I am not sure about adding size to a zone unconditionally.
> > I mean, it is not terrible as IIRC, the maximum MAX_NUMNODES can get
> > is 1024, and on x86_64 that would be (1024 * 4 zones) * 8 = 32K.
> > So not a big deal, but still.
> 
> I'm asking myself how many such systems will run without
> CONFIG_CMA in the future.

I am not sure, my comment was just to point out that even the added size might
not be that large, hiding it under CONFIG_CMA seemed the right thing to
do.

> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 8ba0870ecddd..5757df4bfd45 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1559,13 +1559,15 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
> >                     "\n        spanned  %lu"
> >                     "\n        present  %lu"
> >                     "\n        managed  %lu",
> > +                  "\n        cma      %lu",
> >                     zone_page_state(zone, NR_FREE_PAGES),
> >                     min_wmark_pages(zone),
> >                     low_wmark_pages(zone),
> >                     high_wmark_pages(zone),
> >                     zone->spanned_pages,
> >                     zone->present_pages,
> > -                  zone->managed_pages);
> > +                  zone->managed_pages,
> > +                  IS_ENABLED(CONFIG_CMA) ? zone->cma_pages : 0);
> >          seq_printf(m,
> >                     "\n        protection: (%ld",
> > 
> > 
> > I do not see it that ugly, but just my taste.
> 
> IIRC, that does not work. The compiler will still complain
> about a missing struct members. We would have to provide a
> zone_cma_pages() helper with some ifdefery.

Of course, it seems I switched off my brain.

> We could do something like this on top
> 
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -530,7 +530,9 @@ struct zone {
>         atomic_long_t           managed_pages;
>         unsigned long           spanned_pages;
>         unsigned long           present_pages;
> +#ifdef CONFIG_CMA
>         unsigned long           cma_pages;
> +#endif
>         const char              *name;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 97fc32a53320..b753a64f099f 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1643,7 +1643,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>                    "\n        spanned  %lu"
>                    "\n        present  %lu"
>                    "\n        managed  %lu"
> -                  "\n        cma      %lu",
> +#ifdef CONFIG_CMA
> +                  "\n        cma      %lu"
> +#endif
> +                  "%s",
>                    zone_page_state(zone, NR_FREE_PAGES),
>                    min_wmark_pages(zone),
>                    low_wmark_pages(zone),
> @@ -1651,7 +1654,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>                    zone->spanned_pages,
>                    zone->present_pages,
>                    zone_managed_pages(zone),
> -                  zone->cma_pages);
> +#ifdef CONFIG_CMA
> +                  zone->cma_pages,
> +#endif
> +                  "");
>         seq_printf(m,
>                    "\n        protection: (%ld",

Looks good to me, but I can see how those #ifdef can raise some
eyebrows.
Let us see what other thinks as well.

Btw, should linux-uapi be CCed, as /proc/vmstat layout will change?

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 13:44       ` Oscar Salvador
@ 2021-01-28 13:46         ` Oscar Salvador
  2021-01-28 14:01         ` David Hildenbrand
  1 sibling, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2021-01-28 13:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang

On Thu, Jan 28, 2021 at 02:44:58PM +0100, Oscar Salvador wrote:
> Btw, should linux-uapi be CCed, as /proc/vmstat layout will change?

I meant /proc/zoneinfo

> 
> -- 
> Oscar Salvador
> SUSE L3
> 

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 13:44       ` Oscar Salvador
  2021-01-28 13:46         ` Oscar Salvador
@ 2021-01-28 14:01         ` David Hildenbrand
  1 sibling, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-28 14:01 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang

On 28.01.21 14:44, Oscar Salvador wrote:
> On Thu, Jan 28, 2021 at 11:43:41AM +0100, David Hildenbrand wrote:
>>> My knowledge of CMA tends to be quite low, actually I though that CMA
>>> was somehow tied to ZONE_MOVABLE.
>>
>> CMA is often placed into one of the kernel zones, but can also end up in the movable zone.
> 
> Ok good to know.
> 
>>> I see how tracking CMA pages per zona might give you a clue, but what do
>>> you mean by "might behave differently - even after some of these pages might
>>> already have been allocated"
>>
>> Assume you have 4GB in ZONE_NORMAL but 1GB is assigned for CMA. You actually only have 3GB available for random kernel allocations, not 4GB.
>>
>> Currently, you can only observe the free CMA pages, excluding any pages that are already allocated. Having that information how many CMA pages we have can be helpful - similar to what we already have in /proc/meminfo.
> 
> I see, I agree that it can provide some guidance.
> 
>>> I see that NR_FREE_CMA_PAGES is there even without CONFIG_CMA, as you
>>> said, but I am not sure about adding size to a zone unconditionally.
>>> I mean, it is not terrible as IIRC, the maximum MAX_NUMNODES can get
>>> is 1024, and on x86_64 that would be (1024 * 4 zones) * 8 = 32K.
>>> So not a big deal, but still.
>>
>> I'm asking myself how many such systems will run without
>> CONFIG_CMA in the future.
> 
> I am not sure, my comment was just to point out that even the added size might
> not be that large, hiding it under CONFIG_CMA seemed the right thing to
> do.
> 
>>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>>> index 8ba0870ecddd..5757df4bfd45 100644
>>> --- a/mm/vmstat.c
>>> +++ b/mm/vmstat.c
>>> @@ -1559,13 +1559,15 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>>                      "\n        spanned  %lu"
>>>                      "\n        present  %lu"
>>>                      "\n        managed  %lu",
>>> +                  "\n        cma      %lu",
>>>                      zone_page_state(zone, NR_FREE_PAGES),
>>>                      min_wmark_pages(zone),
>>>                      low_wmark_pages(zone),
>>>                      high_wmark_pages(zone),
>>>                      zone->spanned_pages,
>>>                      zone->present_pages,
>>> -                  zone->managed_pages);
>>> +                  zone->managed_pages,
>>> +                  IS_ENABLED(CONFIG_CMA) ? zone->cma_pages : 0);
>>>           seq_printf(m,
>>>                      "\n        protection: (%ld",
>>>
>>>
>>> I do not see it that ugly, but just my taste.
>>
>> IIRC, that does not work. The compiler will still complain
>> about a missing struct members. We would have to provide a
>> zone_cma_pages() helper with some ifdefery.
> 
> Of course, it seems I switched off my brain.
> 
>> We could do something like this on top
>>
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -530,7 +530,9 @@ struct zone {
>>          atomic_long_t           managed_pages;
>>          unsigned long           spanned_pages;
>>          unsigned long           present_pages;
>> +#ifdef CONFIG_CMA
>>          unsigned long           cma_pages;
>> +#endif
>>          const char              *name;
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 97fc32a53320..b753a64f099f 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1643,7 +1643,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>                     "\n        spanned  %lu"
>>                     "\n        present  %lu"
>>                     "\n        managed  %lu"
>> -                  "\n        cma      %lu",
>> +#ifdef CONFIG_CMA
>> +                  "\n        cma      %lu"
>> +#endif
>> +                  "%s",
>>                     zone_page_state(zone, NR_FREE_PAGES),
>>                     min_wmark_pages(zone),
>>                     low_wmark_pages(zone),
>> @@ -1651,7 +1654,10 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>                     zone->spanned_pages,
>>                     zone->present_pages,
>>                     zone_managed_pages(zone),
>> -                  zone->cma_pages);
>> +#ifdef CONFIG_CMA
>> +                  zone->cma_pages,
>> +#endif
>> +                  "");
>>          seq_printf(m,
>>                     "\n        protection: (%ld",
> 
> Looks good to me, but I can see how those #ifdef can raise some
> eyebrows.

We could print it further above to avoid the "%s" ... "", or print it 
separately below. Then we'd only need a single ifdef. Might make sense

> Let us see what other thinks as well.
> 
> Btw, should linux-uapi be CCed, as /proc/vmstat layout will change?

Is there a linux-uapi@ list? I know linux-api@ ("forum to discuss 
changes that affect the Linux programming interface (API or ABI)".

Good question, I can certainly cc linux-api@, although I doubt it's 
strictly necessary when adding something here.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-27 10:18 ` [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo David Hildenbrand
  2021-01-28 10:22   ` Oscar Salvador
@ 2021-01-28 16:45   ` David Hildenbrand
  2021-01-28 21:42     ` Oscar Salvador
  2021-01-28 21:54     ` David Rientjes
  2021-01-29 11:34   ` [PATCH v3] " David Hildenbrand
  2 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-28 16:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang, linux-api

Let's count the number of CMA pages per zone and print them in
/proc/zoneinfo.

Having access to the total number of CMA pages per zone is helpful for
debugging purposes to know where exactly the CMA pages ended up, and to
figure out how many pages of a zone might behave differently, even after
some of these pages might already have been allocated.

As one example, CMA pages part of a kernel zone cannot be used for
ordinary kernel allocations but instead behave more like ZONE_MOVABLE.

For now, we are only able to get the global nr+free cma pages from
/proc/meminfo and the free cma pages per zone from /proc/zoneinfo.

Example after this patch when booting a 6 GiB QEMU VM with
"hugetlb_cma=2G":
  # cat /proc/zoneinfo | grep cma
          cma      0
        nr_free_cma  0
          cma      0
        nr_free_cma  0
          cma      524288
        nr_free_cma  493016
          cma      0
          cma      0
  # cat /proc/meminfo | grep Cma
  CmaTotal:        2097152 kB
  CmaFree:         1972064 kB

Note: We track/print only with CONFIG_CMA; "nr_free_cma" in /proc/zoneinfo
is currently also printed without CONFIG_CMA.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: linux-api@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---

v1 -> v2:
- Print/track only with CONFIG_CMA
- Extend patch description

---
 include/linux/mmzone.h | 6 ++++++
 mm/page_alloc.c        | 1 +
 mm/vmstat.c            | 5 +++++
 3 files changed, 12 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae588b2f87ef..27d22fb22e05 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -503,6 +503,9 @@ struct zone {
 	 * bootmem allocator):
 	 *	managed_pages = present_pages - reserved_pages;
 	 *
+	 * cma pages is present pages that are assigned for CMA use
+	 * (MIGRATE_CMA).
+	 *
 	 * So present_pages may be used by memory hotplug or memory power
 	 * management logic to figure out unmanaged pages by checking
 	 * (present_pages - managed_pages). And managed_pages should be used
@@ -527,6 +530,9 @@ struct zone {
 	atomic_long_t		managed_pages;
 	unsigned long		spanned_pages;
 	unsigned long		present_pages;
+#ifdef CONFIG_CMA
+	unsigned long		cma_pages;
+#endif
 
 	const char		*name;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b031a5ae0bd5..9a82375bbcb2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 	}
 
 	adjust_managed_page_count(page, pageblock_nr_pages);
+	page_zone(page)->cma_pages += pageblock_nr_pages;
 }
 #endif
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7758486097f9..957680db41fa 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1650,6 +1650,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   zone->spanned_pages,
 		   zone->present_pages,
 		   zone_managed_pages(zone));
+#ifdef CONFIG_CMA
+	seq_printf(m,
+		   "\n        cma      %lu",
+		   zone->cma_pages);
+#endif
 
 	seq_printf(m,
 		   "\n        protection: (%ld",
-- 
2.29.2



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 16:45   ` [PATCH v2] " David Hildenbrand
@ 2021-01-28 21:42     ` Oscar Salvador
  2021-01-28 21:54     ` David Rientjes
  1 sibling, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2021-01-28 21:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang, linux-api

On Thu, Jan 28, 2021 at 05:45:33PM +0100, David Hildenbrand wrote:
> Let's count the number of CMA pages per zone and print them in
> /proc/zoneinfo.
> 
> Having access to the total number of CMA pages per zone is helpful for
> debugging purposes to know where exactly the CMA pages ended up, and to
> figure out how many pages of a zone might behave differently, even after
> some of these pages might already have been allocated.
> 
> As one example, CMA pages part of a kernel zone cannot be used for
> ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
> 
> For now, we are only able to get the global nr+free cma pages from
> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
> 
> Example after this patch when booting a 6 GiB QEMU VM with
> "hugetlb_cma=2G":
>   # cat /proc/zoneinfo | grep cma
>           cma      0
>         nr_free_cma  0
>           cma      0
>         nr_free_cma  0
>           cma      524288
>         nr_free_cma  493016
>           cma      0
>           cma      0
>   # cat /proc/meminfo | grep Cma
>   CmaTotal:        2097152 kB
>   CmaFree:         1972064 kB
> 
> Note: We track/print only with CONFIG_CMA; "nr_free_cma" in /proc/zoneinfo
> is currently also printed without CONFIG_CMA.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: linux-api@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>

IMHO looks better to me, thanks:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
> 
> v1 -> v2:
> - Print/track only with CONFIG_CMA
> - Extend patch description
> 
> ---
>  include/linux/mmzone.h | 6 ++++++
>  mm/page_alloc.c        | 1 +
>  mm/vmstat.c            | 5 +++++
>  3 files changed, 12 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ae588b2f87ef..27d22fb22e05 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -503,6 +503,9 @@ struct zone {
>  	 * bootmem allocator):
>  	 *	managed_pages = present_pages - reserved_pages;
>  	 *
> +	 * cma pages is present pages that are assigned for CMA use
> +	 * (MIGRATE_CMA).
> +	 *
>  	 * So present_pages may be used by memory hotplug or memory power
>  	 * management logic to figure out unmanaged pages by checking
>  	 * (present_pages - managed_pages). And managed_pages should be used
> @@ -527,6 +530,9 @@ struct zone {
>  	atomic_long_t		managed_pages;
>  	unsigned long		spanned_pages;
>  	unsigned long		present_pages;
> +#ifdef CONFIG_CMA
> +	unsigned long		cma_pages;
> +#endif
>  
>  	const char		*name;
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b031a5ae0bd5..9a82375bbcb2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  	}
>  
>  	adjust_managed_page_count(page, pageblock_nr_pages);
> +	page_zone(page)->cma_pages += pageblock_nr_pages;
>  }
>  #endif
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 7758486097f9..957680db41fa 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1650,6 +1650,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  		   zone->spanned_pages,
>  		   zone->present_pages,
>  		   zone_managed_pages(zone));
> +#ifdef CONFIG_CMA
> +	seq_printf(m,
> +		   "\n        cma      %lu",
> +		   zone->cma_pages);
> +#endif
>  
>  	seq_printf(m,
>  		   "\n        protection: (%ld",
> -- 
> 2.29.2
> 

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 16:45   ` [PATCH v2] " David Hildenbrand
  2021-01-28 21:42     ` Oscar Salvador
@ 2021-01-28 21:54     ` David Rientjes
  2021-01-28 22:03       ` David Hildenbrand
  1 sibling, 1 reply; 20+ messages in thread
From: David Rientjes @ 2021-01-28 21:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang, linux-api

On Thu, 28 Jan 2021, David Hildenbrand wrote:

> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 7758486097f9..957680db41fa 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1650,6 +1650,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  		   zone->spanned_pages,
>  		   zone->present_pages,
>  		   zone_managed_pages(zone));
> +#ifdef CONFIG_CMA
> +	seq_printf(m,
> +		   "\n        cma      %lu",
> +		   zone->cma_pages);
> +#endif
>  
>  	seq_printf(m,
>  		   "\n        protection: (%ld",

Hmm, not sure about this.  If cma is only printed for CONFIG_CMA, we can't 
distinguish between (1) a kernel without your patch without including some 
version checking and (2) a kernel without CONFIG_CMA enabled.  IOW, 
"cma 0" carries value: we know immediately that we do not have any CMA 
pages on this zone, period.

/proc/zoneinfo is also not known for its conciseness so I think printing 
"cma 0" even for !CONFIG_CMA is helpful :)

I think this #ifdef should be removed and it should call into a 
zone_cma_pages(struct zone *zone) which returns 0UL if disabled.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 21:54     ` David Rientjes
@ 2021-01-28 22:03       ` David Hildenbrand
  2021-01-28 22:28         ` David Rientjes
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2021-01-28 22:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang, linux-api


> Am 28.01.2021 um 22:54 schrieb David Rientjes <rientjes@google.com>:
> 
> On Thu, 28 Jan 2021, David Hildenbrand wrote:
> 
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 7758486097f9..957680db41fa 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1650,6 +1650,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>           zone->spanned_pages,
>>           zone->present_pages,
>>           zone_managed_pages(zone));
>> +#ifdef CONFIG_CMA
>> +    seq_printf(m,
>> +           "\n        cma      %lu",
>> +           zone->cma_pages);
>> +#endif
>> 
>>    seq_printf(m,
>>           "\n        protection: (%ld",
> 
> Hmm, not sure about this.  If cma is only printed for CONFIG_CMA, we can't 
> distinguish between (1) a kernel without your patch without including some 
> version checking and (2) a kernel without CONFIG_CMA enabled.  IOW, 
> "cma 0" carries value: we know immediately that we do not have any CMA 
> pages on this zone, period.
> 
> /proc/zoneinfo is also not known for its conciseness so I think printing 
> "cma 0" even for !CONFIG_CMA is helpful :)
> 
> I think this #ifdef should be removed and it should call into a 
> zone_cma_pages(struct zone *zone) which returns 0UL if disabled.
> 

Yeah, that’s also what I proposed in a sub-thread here.

The last option would be going the full mile and not printing nr_free_cma. Code might get a bit uglier though, but we could also remove that stats counter ;)

I don‘t particularly care, while printing „0“ might be easier, removing nr_free_cma might be cleaner.

But then, maybe there are tools that expect that value to be around on any kernel?

Thoughts?

Thanks



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 22:03       ` David Hildenbrand
@ 2021-01-28 22:28         ` David Rientjes
  2021-01-28 22:30           ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: David Rientjes @ 2021-01-28 22:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang, linux-api

[-- Attachment #1: Type: text/plain, Size: 2498 bytes --]

On Thu, 28 Jan 2021, David Hildenbrand wrote:

> > On Thu, 28 Jan 2021, David Hildenbrand wrote:
> > 
> >> diff --git a/mm/vmstat.c b/mm/vmstat.c
> >> index 7758486097f9..957680db41fa 100644
> >> --- a/mm/vmstat.c
> >> +++ b/mm/vmstat.c
> >> @@ -1650,6 +1650,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
> >>           zone->spanned_pages,
> >>           zone->present_pages,
> >>           zone_managed_pages(zone));
> >> +#ifdef CONFIG_CMA
> >> +    seq_printf(m,
> >> +           "\n        cma      %lu",
> >> +           zone->cma_pages);
> >> +#endif
> >> 
> >>    seq_printf(m,
> >>           "\n        protection: (%ld",
> > 
> > Hmm, not sure about this.  If cma is only printed for CONFIG_CMA, we can't 
> > distinguish between (1) a kernel without your patch without including some 
> > version checking and (2) a kernel without CONFIG_CMA enabled.  IOW, 
> > "cma 0" carries value: we know immediately that we do not have any CMA 
> > pages on this zone, period.
> > 
> > /proc/zoneinfo is also not known for its conciseness so I think printing 
> > "cma 0" even for !CONFIG_CMA is helpful :)
> > 
> > I think this #ifdef should be removed and it should call into a 
> > zone_cma_pages(struct zone *zone) which returns 0UL if disabled.
> > 
> 
> Yeah, that’s also what I proposed in a sub-thread here.
> 

Ah, I certainly think your original intuition was correct.

> The last option would be going the full mile and not printing nr_free_cma. Code might get a bit uglier though, but we could also remove that stats counter ;)
> 
> I don‘t particularly care, while printing „0“ might be easier, removing nr_free_cma might be cleaner.
> 
> But then, maybe there are tools that expect that value to be around on any kernel?
> 

Yeah, that's probably undue risk, the ship has sailed and there's no 
significant upside.

I still think "cma 0" in /proc/zoneinfo carries value, though, especially 
for NUMA and it looks like this is how it's done in linux-next.  With a 
single read of the file, userspace can make the determination what CMA 
pages exist on this node.

In general, I think the rule-of-thumb is that the fewer ifdefs in 
/proc/zoneinfo, the easier it is for userspace to parse it.

(I made that change to /proc/zoneinfo to even print non-existant zones for 
each node because otherwise you cannot determine what the indices of 
things like vm.lowmem_reserve_ratio represent.)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-28 22:28         ` David Rientjes
@ 2021-01-28 22:30           ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-28 22:30 UTC (permalink / raw)
  To: David Rientjes
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang, linux-api


> Am 28.01.2021 um 23:28 schrieb David Rientjes <rientjes@google.com>:
> 
> On Thu, 28 Jan 2021, David Hildenbrand wrote:
> 
>>> On Thu, 28 Jan 2021, David Hildenbrand wrote:
>>> 
>>>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>>>> index 7758486097f9..957680db41fa 100644
>>>> --- a/mm/vmstat.c
>>>> +++ b/mm/vmstat.c
>>>> @@ -1650,6 +1650,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>>>          zone->spanned_pages,
>>>>          zone->present_pages,
>>>>          zone_managed_pages(zone));
>>>> +#ifdef CONFIG_CMA
>>>> +    seq_printf(m,
>>>> +           "\n        cma      %lu",
>>>> +           zone->cma_pages);
>>>> +#endif
>>>> 
>>>>   seq_printf(m,
>>>>          "\n        protection: (%ld",
>>> 
>>> Hmm, not sure about this.  If cma is only printed for CONFIG_CMA, we can't 
>>> distinguish between (1) a kernel without your patch without including some 
>>> version checking and (2) a kernel without CONFIG_CMA enabled.  IOW, 
>>> "cma 0" carries value: we know immediately that we do not have any CMA 
>>> pages on this zone, period.
>>> 
>>> /proc/zoneinfo is also not known for its conciseness so I think printing 
>>> "cma 0" even for !CONFIG_CMA is helpful :)
>>> 
>>> I think this #ifdef should be removed and it should call into a 
>>> zone_cma_pages(struct zone *zone) which returns 0UL if disabled.
>>> 
>> 
>> Yeah, that’s also what I proposed in a sub-thread here.
>> 
> 
> Ah, I certainly think your original intuition was correct.
> 
>> The last option would be going the full mile and not printing nr_free_cma. Code might get a bit uglier though, but we could also remove that stats counter ;)
>> 
>> I don‘t particularly care, while printing „0“ might be easier, removing nr_free_cma might be cleaner.
>> 
>> But then, maybe there are tools that expect that value to be around on any kernel?
>> 
> 
> Yeah, that's probably undue risk, the ship has sailed and there's no 
> significant upside.
> 
> I still think "cma 0" in /proc/zoneinfo carries value, though, especially 
> for NUMA and it looks like this is how it's done in linux-next.  With a 
> single read of the file, userspace can make the determination what CMA 
> pages exist on this node.
> 
> In general, I think the rule-of-thumb is that the fewer ifdefs in 
> /proc/zoneinfo, the easier it is for userspace to parse it.

Makes sense, I‘ll send an updated version tomorrow - thanks!


> 
> (I made that change to /proc/zoneinfo to even print non-existant zones for 
> each node because otherwise you cannot determine what the indices of 
> things like vm.lowmem_reserve_ratio represent.)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-27 10:18 ` [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo David Hildenbrand
  2021-01-28 10:22   ` Oscar Salvador
  2021-01-28 16:45   ` [PATCH v2] " David Hildenbrand
@ 2021-01-29 11:34   ` David Hildenbrand
  2021-01-29 11:46     ` Oscar Salvador
  2021-01-30  8:48     ` David Rientjes
  2 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-29 11:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang,
	David Rientjes, linux-api

Let's count the number of CMA pages per zone and print them in
/proc/zoneinfo.

Having access to the total number of CMA pages per zone is helpful for
debugging purposes to know where exactly the CMA pages ended up, and to
figure out how many pages of a zone might behave differently, even after
some of these pages might already have been allocated.

As one example, CMA pages part of a kernel zone cannot be used for
ordinary kernel allocations but instead behave more like ZONE_MOVABLE.

For now, we are only able to get the global nr+free cma pages from
/proc/meminfo and the free cma pages per zone from /proc/zoneinfo.

Example after this patch when booting a 6 GiB QEMU VM with
"hugetlb_cma=2G":
  # cat /proc/zoneinfo | grep cma
          cma      0
        nr_free_cma  0
          cma      0
        nr_free_cma  0
          cma      524288
        nr_free_cma  493016
          cma      0
          cma      0
  # cat /proc/meminfo | grep Cma
  CmaTotal:        2097152 kB
  CmaFree:         1972064 kB

Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
      one can be sure when spotting "cma 0", that there are definetly no
      CMA pages located in a zone.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: linux-api@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---

The third time is the charm.

v2 -> v3:
- Print even without CONFIG_CMA. Use zone_cma_pages().
- Adjust patch description
- Dropped Oscar's RB due to the changes

v1 -> v2:
- Print/track only with CONFIG_CMA
- Extend patch description

---
 include/linux/mmzone.h | 15 +++++++++++++++
 mm/page_alloc.c        |  1 +
 mm/vmstat.c            |  6 ++++--
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ae588b2f87ef..caafd5e37080 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -503,6 +503,9 @@ struct zone {
 	 * bootmem allocator):
 	 *	managed_pages = present_pages - reserved_pages;
 	 *
+	 * cma pages is present pages that are assigned for CMA use
+	 * (MIGRATE_CMA).
+	 *
 	 * So present_pages may be used by memory hotplug or memory power
 	 * management logic to figure out unmanaged pages by checking
 	 * (present_pages - managed_pages). And managed_pages should be used
@@ -527,6 +530,9 @@ struct zone {
 	atomic_long_t		managed_pages;
 	unsigned long		spanned_pages;
 	unsigned long		present_pages;
+#ifdef CONFIG_CMA
+	unsigned long		cma_pages;
+#endif
 
 	const char		*name;
 
@@ -624,6 +630,15 @@ static inline unsigned long zone_managed_pages(struct zone *zone)
 	return (unsigned long)atomic_long_read(&zone->managed_pages);
 }
 
+static inline unsigned long zone_cma_pages(struct zone *zone)
+{
+#ifdef CONFIG_CMA
+	return zone->cma_pages;
+#else
+	return 0;
+#endif
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
 	return zone->zone_start_pfn + zone->spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b031a5ae0bd5..9a82375bbcb2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 	}
 
 	adjust_managed_page_count(page, pageblock_nr_pages);
+	page_zone(page)->cma_pages += pageblock_nr_pages;
 }
 #endif
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7758486097f9..b2537852d498 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1642,14 +1642,16 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   "\n        high     %lu"
 		   "\n        spanned  %lu"
 		   "\n        present  %lu"
-		   "\n        managed  %lu",
+		   "\n        managed  %lu"
+		   "\n        cma      %lu",
 		   zone_page_state(zone, NR_FREE_PAGES),
 		   min_wmark_pages(zone),
 		   low_wmark_pages(zone),
 		   high_wmark_pages(zone),
 		   zone->spanned_pages,
 		   zone->present_pages,
-		   zone_managed_pages(zone));
+		   zone_managed_pages(zone),
+		   zone_cma_pages(zone));
 
 	seq_printf(m,
 		   "\n        protection: (%ld",
-- 
2.29.2



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-29 11:34   ` [PATCH v3] " David Hildenbrand
@ 2021-01-29 11:46     ` Oscar Salvador
  2021-01-29 11:51       ` David Hildenbrand
  2021-01-30  8:48     ` David Rientjes
  1 sibling, 1 reply; 20+ messages in thread
From: Oscar Salvador @ 2021-01-29 11:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang, David Rientjes, linux-api

On Fri, Jan 29, 2021 at 12:34:51PM +0100, David Hildenbrand wrote:
> Let's count the number of CMA pages per zone and print them in
> /proc/zoneinfo.
> 
> Having access to the total number of CMA pages per zone is helpful for
> debugging purposes to know where exactly the CMA pages ended up, and to
> figure out how many pages of a zone might behave differently, even after
> some of these pages might already have been allocated.
> 
> As one example, CMA pages part of a kernel zone cannot be used for
> ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
> 
> For now, we are only able to get the global nr+free cma pages from
> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
> 
> Example after this patch when booting a 6 GiB QEMU VM with
> "hugetlb_cma=2G":
>   # cat /proc/zoneinfo | grep cma
>           cma      0
>         nr_free_cma  0
>           cma      0
>         nr_free_cma  0
>           cma      524288
>         nr_free_cma  493016
>           cma      0
>           cma      0
>   # cat /proc/meminfo | grep Cma
>   CmaTotal:        2097152 kB
>   CmaFree:         1972064 kB
> 
> Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
>       one can be sure when spotting "cma 0", that there are definetly no
>       CMA pages located in a zone.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: linux-api@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>

Looks good to me, I guess it is better to print it unconditionally
so the layout does not change.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

thanks

> ---
> 
> The third time is the charm.
> 
> v2 -> v3:
> - Print even without CONFIG_CMA. Use zone_cma_pages().
> - Adjust patch description
> - Dropped Oscar's RB due to the changes
> 
> v1 -> v2:
> - Print/track only with CONFIG_CMA
> - Extend patch description
> 
> ---
>  include/linux/mmzone.h | 15 +++++++++++++++
>  mm/page_alloc.c        |  1 +
>  mm/vmstat.c            |  6 ++++--
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ae588b2f87ef..caafd5e37080 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -503,6 +503,9 @@ struct zone {
>  	 * bootmem allocator):
>  	 *	managed_pages = present_pages - reserved_pages;
>  	 *
> +	 * cma pages is present pages that are assigned for CMA use
> +	 * (MIGRATE_CMA).
> +	 *
>  	 * So present_pages may be used by memory hotplug or memory power
>  	 * management logic to figure out unmanaged pages by checking
>  	 * (present_pages - managed_pages). And managed_pages should be used
> @@ -527,6 +530,9 @@ struct zone {
>  	atomic_long_t		managed_pages;
>  	unsigned long		spanned_pages;
>  	unsigned long		present_pages;
> +#ifdef CONFIG_CMA
> +	unsigned long		cma_pages;
> +#endif
>  
>  	const char		*name;
>  
> @@ -624,6 +630,15 @@ static inline unsigned long zone_managed_pages(struct zone *zone)
>  	return (unsigned long)atomic_long_read(&zone->managed_pages);
>  }
>  
> +static inline unsigned long zone_cma_pages(struct zone *zone)
> +{
> +#ifdef CONFIG_CMA
> +	return zone->cma_pages;
> +#else
> +	return 0;
> +#endif
> +}
> +
>  static inline unsigned long zone_end_pfn(const struct zone *zone)
>  {
>  	return zone->zone_start_pfn + zone->spanned_pages;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b031a5ae0bd5..9a82375bbcb2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  	}
>  
>  	adjust_managed_page_count(page, pageblock_nr_pages);
> +	page_zone(page)->cma_pages += pageblock_nr_pages;
>  }
>  #endif
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 7758486097f9..b2537852d498 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1642,14 +1642,16 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  		   "\n        high     %lu"
>  		   "\n        spanned  %lu"
>  		   "\n        present  %lu"
> -		   "\n        managed  %lu",
> +		   "\n        managed  %lu"
> +		   "\n        cma      %lu",
>  		   zone_page_state(zone, NR_FREE_PAGES),
>  		   min_wmark_pages(zone),
>  		   low_wmark_pages(zone),
>  		   high_wmark_pages(zone),
>  		   zone->spanned_pages,
>  		   zone->present_pages,
> -		   zone_managed_pages(zone));
> +		   zone_managed_pages(zone),
> +		   zone_cma_pages(zone));
>  
>  	seq_printf(m,
>  		   "\n        protection: (%ld",
> -- 
> 2.29.2
> 
> 

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-29 11:46     ` Oscar Salvador
@ 2021-01-29 11:51       ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2021-01-29 11:51 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Michal Hocko, Wei Yang, David Rientjes, linux-api

On 29.01.21 12:46, Oscar Salvador wrote:
> On Fri, Jan 29, 2021 at 12:34:51PM +0100, David Hildenbrand wrote:
>> Let's count the number of CMA pages per zone and print them in
>> /proc/zoneinfo.
>>
>> Having access to the total number of CMA pages per zone is helpful for
>> debugging purposes to know where exactly the CMA pages ended up, and to
>> figure out how many pages of a zone might behave differently, even after
>> some of these pages might already have been allocated.
>>
>> As one example, CMA pages part of a kernel zone cannot be used for
>> ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
>>
>> For now, we are only able to get the global nr+free cma pages from
>> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
>>
>> Example after this patch when booting a 6 GiB QEMU VM with
>> "hugetlb_cma=2G":
>>    # cat /proc/zoneinfo | grep cma
>>            cma      0
>>          nr_free_cma  0
>>            cma      0
>>          nr_free_cma  0
>>            cma      524288
>>          nr_free_cma  493016
>>            cma      0
>>            cma      0
>>    # cat /proc/meminfo | grep Cma
>>    CmaTotal:        2097152 kB
>>    CmaFree:         1972064 kB
>>
>> Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
>>        one can be sure when spotting "cma 0", that there are definetly no
>>        CMA pages located in a zone.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Mike Rapoport <rppt@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: linux-api@vger.kernel.org
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Looks good to me, I guess it is better to print it unconditionally
> so the layout does not change.
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks for the fast review!


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-01-29 11:34   ` [PATCH v3] " David Hildenbrand
  2021-01-29 11:46     ` Oscar Salvador
@ 2021-01-30  8:48     ` David Rientjes
  1 sibling, 0 replies; 20+ messages in thread
From: David Rientjes @ 2021-01-30  8:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Mike Rapoport, Oscar Salvador, Michal Hocko, Wei Yang, linux-api

On Fri, 29 Jan 2021, David Hildenbrand wrote:

> Let's count the number of CMA pages per zone and print them in
> /proc/zoneinfo.
> 
> Having access to the total number of CMA pages per zone is helpful for
> debugging purposes to know where exactly the CMA pages ended up, and to
> figure out how many pages of a zone might behave differently, even after
> some of these pages might already have been allocated.
> 
> As one example, CMA pages part of a kernel zone cannot be used for
> ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
> 
> For now, we are only able to get the global nr+free cma pages from
> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
> 
> Example after this patch when booting a 6 GiB QEMU VM with
> "hugetlb_cma=2G":
>   # cat /proc/zoneinfo | grep cma
>           cma      0
>         nr_free_cma  0
>           cma      0
>         nr_free_cma  0
>           cma      524288
>         nr_free_cma  493016
>           cma      0
>           cma      0
>   # cat /proc/meminfo | grep Cma
>   CmaTotal:        2097152 kB
>   CmaFree:         1972064 kB
> 
> Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
>       one can be sure when spotting "cma 0", that there are definetly no
>       CMA pages located in a zone.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: linux-api@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: David Rientjes <rientjes@google.com>


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-01-30  8:49 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-27 10:18 [PATCH v1 0/2] mm/cma: better error handling and count pages per zone David Hildenbrand
2021-01-27 10:18 ` [PATCH v1 1/2] mm/cma: expose all pages to the buddy if activation of an area fails David Hildenbrand
2021-01-27 15:58   ` Zi Yan
2021-01-28  9:59   ` Oscar Salvador
2021-01-27 10:18 ` [PATCH v1 2/2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo David Hildenbrand
2021-01-28 10:22   ` Oscar Salvador
2021-01-28 10:43     ` David Hildenbrand
2021-01-28 13:44       ` Oscar Salvador
2021-01-28 13:46         ` Oscar Salvador
2021-01-28 14:01         ` David Hildenbrand
2021-01-28 16:45   ` [PATCH v2] " David Hildenbrand
2021-01-28 21:42     ` Oscar Salvador
2021-01-28 21:54     ` David Rientjes
2021-01-28 22:03       ` David Hildenbrand
2021-01-28 22:28         ` David Rientjes
2021-01-28 22:30           ` David Hildenbrand
2021-01-29 11:34   ` [PATCH v3] " David Hildenbrand
2021-01-29 11:46     ` Oscar Salvador
2021-01-29 11:51       ` David Hildenbrand
2021-01-30  8:48     ` David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).