* [PATCH RESEND v2 0/4] Free the 2nd vmemmap page associated with each HugeTLB page
@ 2021-09-17  3:48 Muchun Song
  2021-09-17  3:48 ` [PATCH RESEND v2 1/4] mm: hugetlb: free " Muchun Song
                   ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-17  3:48 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet, willy
  Cc: duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc,
	linux-kernel, linux-mm, Muchun Song

Hi,

This series significantly reduces the struct page overhead of 2MB HugeTLB
pages; I'd like to get some review input. Thanks.

After the feature of "Free some vmemmap pages of HugeTLB page" is enabled,
the mapping of the vmemmap addresses associated with a 2MB HugeTLB page
becomes the figure below.

     HugeTLB                  struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
 |           |                     +-----------+                   | | | | |
 |           |                     |     3     | ------------------+ | | | |
 |           |                     +-----------+                     | | | |
 |           |                     |     4     | --------------------+ | | |
 |    2MB    |                     +-----------+                       | | |
 |           |                     |     5     | ----------------------+ | |
 |           |                     +-----------+                         | |
 |           |                     |     6     | ------------------------+ |
 |           |                     +-----------+                           |
 |           |                     |     7     | --------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and remapped.
However, the 2nd vmemmap page frame can also be freed to the buddy allocator,
which changes the mapping from the figure above to the figure below.

    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
 |           |                     +-----------+                  | | | | | |
 |           |                     |     2     | -----------------+ | | | | |
 |           |                     +-----------+                    | | | | |
 |           |                     |     3     | -------------------+ | | | |
 |           |                     +-----------+                      | | | |
 |           |                     |     4     | ---------------------+ | | |
 |    2MB    |                     +-----------+                        | | |
 |           |                     |     5     | -----------------------+ | |
 |           |                     +-----------+                          | |
 |           |                     |     6     | -------------------------+ |
 |           |                     +-----------+                            |
 |           |                     |     7     | ---------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

After we do this, all tail vmemmap pages (1-7) are mapped to the head vmemmap
page frame (0). In other words, there is more than one page struct with
PG_head associated with each HugeTLB page.  We __know__ that there is only one
real head page struct; the tail page structs with PG_head are fake head page
structs.  We need an approach to distinguish between those two different types
of page structs so that compound_head(), PageHead() and PageTail() can work
properly if the parameter is a tail page struct that has PG_head set.

The following code snippet describes how to distinguish between real and fake
head page structs.

	if (test_bit(PG_head, &page->flags)) {
		unsigned long head = READ_ONCE(page[1].compound_head);

		if (head & 1) {
			if (head == (unsigned long)page + 1)
				==> head page struct
			else
				==> tail page struct
		} else
			==> head page struct
	}

We can safely access the fields of @page[1] when @page has PG_head set because
@page is then a compound page composed of at least two contiguous pages. The
main implementation is in patch 1.
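
For reference, the check can be folded into a single helper along the lines
of page_head_if_fake() in patch 1 (a condensed sketch; the real version also
guards on hugetlb_free_vmemmap_enabled and on the PAGE_SIZE alignment of
@page, and the helper name here is illustrative):

	static inline const struct page *real_head(const struct page *page)
	{
		if (test_bit(PG_head, &page->flags)) {
			unsigned long head = READ_ONCE(page[1].compound_head);

			/*
			 * Bit 0 set means page[1] is a tail page, so
			 * head - 1 is the real head page struct: @page
			 * itself for a real head, or the first struct
			 * page of the HugeTLB page for a fake head.
			 */
			if (head & 1)
				return (const struct page *)(head - 1);
		}
		return page;
	}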

On our servers, this patchset saves an extra 2GB of memory when there are
1TB of HugeTLB (2MB) pages. If the HugeTLB page size is 1GB, it only saves
4MB. For 2MB HugeTLB pages, it is a nice gain.
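
For reference, the arithmetic behind those numbers (assuming 4KB base pages
and a 64-byte struct page, as on x86_64):

	2MB HugeTLB page:  512 struct pages * 64B = 32KB = 8 vmemmap pages,
	                   7 freed now vs 6 before = 1 extra 4KB page, so
	                   (1TB / 2MB) * 4KB = 524288 * 4KB = 2GB saved
	1GB HugeTLB page:  (1TB / 1GB) * 4KB = 1024 * 4KB = 4MB saved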

Changelog in v2:
  1. Drop two patches of introducing PAGEFLAGS_MASK from this series.
  2. Let page_head_if_fake() return page instead of NULL.
  3. Add a selftest to check whether PageHead and PageTail work correctly.

Muchun Song (4):
  mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB
    page
  mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  mm: sparsemem: use page table lock to protect kernel pmd operations
  selftests: vm: add a hugetlb test case

 Documentation/admin-guide/kernel-parameters.txt |   2 +-
 include/linux/hugetlb.h                         |   6 +-
 include/linux/page-flags.h                      |  77 ++++++++++++-
 mm/hugetlb_vmemmap.c                            |  64 ++++++-----
 mm/ptdump.c                                     |  16 ++-
 mm/sparse-vmemmap.c                             |  70 +++++++++---
 tools/testing/selftests/vm/vmemmap_hugetlb.c    | 139 ++++++++++++++++++++++++
 7 files changed, 320 insertions(+), 54 deletions(-)
 create mode 100644 tools/testing/selftests/vm/vmemmap_hugetlb.c

-- 
2.11.0


* [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-17  3:48 [PATCH RESEND v2 0/4] Free the 2nd vmemmap page associated with each HugeTLB page Muchun Song
@ 2021-09-17  3:48 ` Muchun Song
  2021-09-18  4:38     ` Barry Song
  2021-09-17  3:48 ` [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key Muchun Song
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Muchun Song @ 2021-09-17  3:48 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet, willy
  Cc: duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc,
	linux-kernel, linux-mm, Muchun Song

Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
page. However, we can remap all tail vmemmap pages to the page frame
mapped by the head vmemmap page, and then free 7 vmemmap pages per
2MB HugeTLB page. It is a nice gain (e.g. we can save an extra 2GB of
memory when there are 1TB of HugeTLB pages in the system compared with
the current implementation).

But the head vmemmap page is not freed to the buddy allocator and all
tail vmemmap pages are mapped to the head vmemmap page frame. So we
can see more than one struct page with PG_head (e.g. 8 per 2MB
HugeTLB page) associated with each HugeTLB page. We should adjust
compound_head() to make it return the real head struct page when the
parameter is a tail struct page that has the PG_head flag set.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
 mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
 mm/sparse-vmemmap.c                             | 21 +++++++
 4 files changed, 126 insertions(+), 32 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb22006f713..a154a7b3b9a5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1606,7 +1606,7 @@
 			[KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
 			enabled.
 			Allows heavy hugetlb users to free up some more
-			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
+			memory (7 * PAGE_SIZE for each 2MB hugetlb page).
 			Format: { on | off (default) }
 
 			on:  enable the feature
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8e1d97d8f3bd..7b1a918ebd43 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -184,13 +184,64 @@ enum pageflags {
 
 #ifndef __GENERATING_BOUNDS_H
 
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+
+/*
+ * If the feature of freeing some vmemmap pages associated with each HugeTLB
+ * page is enabled, the head vmemmap page frame is reused and all of the tail
+ * vmemmap addresses map to the head vmemmap page frame (furture details can
+ * refer to the figure at the head of the mm/hugetlb_vmemmap.c).  In other
+ * word, there are more than one page struct with PG_head associated with each
+ * HugeTLB page.  We __know__ that there is only one head page struct, the tail
+ * page structs with PG_head are fake head page structs.  We need an approach
+ * to distinguish between those two different types of page structs so that
+ * compound_head() can return the real head page struct when the parameter is
+ * the tail page struct but with PG_head.
+ *
+ * The page_head_if_fake() returns the real head page struct iff the @page may
+ * be fake, otherwise, returns the @page if it cannot be a fake page struct.
+ */
+static __always_inline const struct page *page_head_if_fake(const struct page *page)
+{
+	if (!hugetlb_free_vmemmap_enabled)
+		return page;
+
+	/*
+	 * Only addresses aligned with PAGE_SIZE of struct page may be fake head
+	 * struct page. The alignment check aims to avoid access the fields (
+	 * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
+	 * cold cacheline in some cases.
+	 */
+	if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
+	    test_bit(PG_head, &page->flags)) {
+		/*
+		 * We can safely access the field of the @page[1] with PG_head
+		 * because the @page is a compound page composed with at least
+		 * two contiguous pages.
+		 */
+		unsigned long head = READ_ONCE(page[1].compound_head);
+
+		if (likely(head & 1))
+			return (const struct page *)(head - 1);
+	}
+
+	return page;
+}
+#else
+static __always_inline const struct page *page_head_if_fake(const struct page *page)
+{
+	return page;
+}
+#endif
+
 static inline unsigned long _compound_head(const struct page *page)
 {
 	unsigned long head = READ_ONCE(page->compound_head);
 
 	if (unlikely(head & 1))
 		return head - 1;
-	return (unsigned long)page;
+	return (unsigned long)page_head_if_fake(page);
 }
 
 #define compound_head(page)	((typeof(page))_compound_head(page))
@@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
 
 static __always_inline int PageTail(struct page *page)
 {
-	return READ_ONCE(page->compound_head) & 1;
+	return READ_ONCE(page->compound_head) & 1 ||
+	       page_head_if_fake(page) != page;
 }
 
 static __always_inline int PageCompound(struct page *page)
 {
-	return test_bit(PG_head, &page->flags) || PageTail(page);
+	return test_bit(PG_head, &page->flags) ||
+	       READ_ONCE(page->compound_head) & 1;
 }
 
 #define	PAGE_POISON_PATTERN	-1l
@@ -675,7 +728,21 @@ static inline bool test_set_page_writeback(struct page *page)
 	return set_page_writeback(page);
 }
 
-__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
+static __always_inline bool folio_test_head(struct folio *folio)
+{
+	return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY));
+}
+
+static __always_inline int PageHead(struct page *page)
+{
+	PF_POISONED_CHECK(page);
+	return test_bit(PG_head, &page->flags) &&
+	       page_head_if_fake(page) == page;
+}
+
+__SETPAGEFLAG(Head, head, PF_ANY)
+__CLEARPAGEFLAG(Head, head, PF_ANY)
+CLEARPAGEFLAG(Head, head, PF_ANY)
 
 /* Whether there are one or multiple pages in a folio */
 static inline bool folio_single(struct folio *folio)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c540c21e26f5..527bcaa44a48 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -124,9 +124,9 @@
  * page of page structs (page 0) associated with the HugeTLB page contains the 4
  * page structs necessary to describe the HugeTLB. The only use of the remaining
  * pages of page structs (page 1 to page 7) is to point to page->compound_head.
- * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
+ * Therefore, we can remap pages 1 to 7 to page 0. Only 1 pages of page structs
  * will be used for each HugeTLB page. This will allow us to free the remaining
- * 6 pages to the buddy allocator.
+ * 7 pages to the buddy allocator.
  *
  * Here is how things look after remapping.
  *
@@ -134,30 +134,30 @@
  * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
  * |           |                     |     0     | -------------> |     0     |
  * |           |                     +-----------+                +-----------+
- * |           |                     |     1     | -------------> |     1     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
- * |           |                     +-----------+                   | | | | |
- * |           |                     |     3     | ------------------+ | | | |
- * |           |                     +-----------+                     | | | |
- * |           |                     |     4     | --------------------+ | | |
- * |    PMD    |                     +-----------+                       | | |
- * |   level   |                     |     5     | ----------------------+ | |
- * |  mapping  |                     +-----------+                         | |
- * |           |                     |     6     | ------------------------+ |
- * |           |                     +-----------+                           |
- * |           |                     |     7     | --------------------------+
+ * |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
+ * |           |                     +-----------+                  | | | | | |
+ * |           |                     |     2     | -----------------+ | | | | |
+ * |           |                     +-----------+                    | | | | |
+ * |           |                     |     3     | -------------------+ | | | |
+ * |           |                     +-----------+                      | | | |
+ * |           |                     |     4     | ---------------------+ | | |
+ * |    PMD    |                     +-----------+                        | | |
+ * |   level   |                     |     5     | -----------------------+ | |
+ * |  mapping  |                     +-----------+                          | |
+ * |           |                     |     6     | -------------------------+ |
+ * |           |                     +-----------+                            |
+ * |           |                     |     7     | ---------------------------+
  * |           |                     +-----------+
  * |           |
  * |           |
  * |           |
  * +-----------+
  *
- * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
+ * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
  * vmemmap pages and restore the previous mapping relationship.
  *
  * For the HugeTLB page of the pud level mapping. It is similar to the former.
- * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages.
+ * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
  *
  * Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
  * (e.g. aarch64) provides a contiguous bit in the translation table entries
@@ -166,7 +166,13 @@
  *
  * The contiguous bit is used to increase the mapping size at the pmd and pte
  * (last) level. So this type of HugeTLB page can be optimized only when its
- * size of the struct page structs is greater than 2 pages.
+ * size of the struct page structs is greater than 1 pages.
+ *
+ * Notice: The head vmemmap page is not freed to the buddy allocator and all
+ * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
+ * more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
+ * associated with each HugeTLB page. The compound_head() can handle this
+ * correctly (more details refer to the comment above compound_head()).
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt
 
@@ -175,14 +181,16 @@
 /*
  * There are a lot of struct page structures associated with each HugeTLB page.
  * For tail pages, the value of compound_head is the same. So we can reuse first
- * page of tail page structures. We map the virtual addresses of the remaining
- * pages of tail page structures to the first tail page struct, and then free
- * these page frames. Therefore, we need to reserve two pages as vmemmap areas.
+ * page of head page structures. We map the virtual addresses of all the pages
+ * of tail page structures to the head page struct, and then free these page
+ * frames. Therefore, we need to reserve one pages as vmemmap areas.
  */
-#define RESERVE_VMEMMAP_NR		2U
+#define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
-bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+bool hugetlb_free_vmemmap_enabled __read_mostly =
+	IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
 
 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
 				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-
 	if (!ret)
 		ClearHPageVmemmapOptimized(head);
 
@@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
 	/*
-	 * The head page and the first tail page are not to be freed to buddy
-	 * allocator, the other pages will map to the first tail page, so they
-	 * can be freed.
+	 * The head page is not to be freed to buddy allocator, the other tail
+	 * pages will map to the head page, so they can be freed.
 	 *
 	 * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true
 	 * on some architectures (e.g. aarch64). See Documentation/arm64/
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bdce883f9286..62e3d20648ce 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -53,6 +53,17 @@ struct vmemmap_remap_walk {
 	struct list_head *vmemmap_pages;
 };
 
+/*
+ * How many struct page structs need to be reset. When we reuse the head
+ * struct page, the special metadata (e.g. page->flags or page->mapping)
+ * cannot copy to the tail struct page structs. The invalid value will be
+ * checked in the free_tail_pages_check(). In order to avoid the message
+ * of "corrupted mapping in tail page". We need to reset at least 3 (one
+ * head struct page struct and two tail struct page structs) struct page
+ * structs.
+ */
+#define NR_RESET_STRUCT_PAGE		3
+
 static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
 				  struct vmemmap_remap_walk *walk)
 {
@@ -245,6 +256,15 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
 	set_pte_at(&init_mm, addr, pte, entry);
 }
 
+static inline void reset_struct_pages(struct page *start)
+{
+	int i;
+	struct page *from = start + NR_RESET_STRUCT_PAGE;
+
+	for (i = 0; i < NR_RESET_STRUCT_PAGE; i++)
+		memcpy(start + i, from, sizeof(*from));
+}
+
 static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 				struct vmemmap_remap_walk *walk)
 {
@@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 	list_del(&page->lru);
 	to = page_to_virt(page);
 	copy_page(to, (void *)walk->reuse_addr);
+	reset_struct_pages(to);
 
 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
 }
-- 
2.11.0


* [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  2021-09-17  3:48 [PATCH RESEND v2 0/4] Free the 2nd vmemmap page associated with each HugeTLB page Muchun Song
  2021-09-17  3:48 ` [PATCH RESEND v2 1/4] mm: hugetlb: free " Muchun Song
@ 2021-09-17  3:48 ` Muchun Song
  2021-09-18  4:55     ` Barry Song
  2021-09-17  3:48 ` [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations Muchun Song
  2021-09-17  3:48 ` [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case Muchun Song
  3 siblings, 1 reply; 47+ messages in thread
From: Muchun Song @ 2021-09-17  3:48 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet, willy
  Cc: duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc,
	linux-kernel, linux-mm, Muchun Song

page_head_if_fake() is used throughout memory management, and its
conditional check requires reading a global variable. Although the
overhead of this check may be small, it increases when the memory
cache comes under pressure. Also, the global variable will not be
modified after system boot, so it is very appropriate to use the
static key mechanism here.
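
Condensed from the diff below, the moving parts look like this (the check
compiles down to a patched jump/nop instead of a load and test of a global):

	DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
				hugetlb_free_vmemmap_enabled_key);

	/* boot-time toggle from the "hugetlb_free_vmemmap=" parameter */
	static_branch_enable(&hugetlb_free_vmemmap_enabled_key);

	/* fast path in page_head_if_fake() */
	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
				 &hugetlb_free_vmemmap_enabled_key))
		return page;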

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/hugetlb.h    |  6 +++++-
 include/linux/page-flags.h |  6 ++++--
 mm/hugetlb_vmemmap.c       | 10 +++++-----
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7ca1a3870ea..ee3ddf3d12cf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1057,7 +1057,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
+DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			 hugetlb_free_vmemmap_enabled_key);
+#define hugetlb_free_vmemmap_enabled					 \
+	static_key_enabled(&hugetlb_free_vmemmap_enabled_key)
+
 #else
 #define hugetlb_free_vmemmap_enabled	false
 #endif
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 7b1a918ebd43..d68d2cf30d76 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -185,7 +185,8 @@ enum pageflags {
 #ifndef __GENERATING_BOUNDS_H
 
 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
+DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			 hugetlb_free_vmemmap_enabled_key);
 
 /*
  * If the feature of freeing some vmemmap pages associated with each HugeTLB
@@ -204,7 +205,8 @@ extern bool hugetlb_free_vmemmap_enabled;
  */
 static __always_inline const struct page *page_head_if_fake(const struct page *page)
 {
-	if (!hugetlb_free_vmemmap_enabled)
+	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+				 &hugetlb_free_vmemmap_enabled_key))
 		return page;
 
 	/*
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 527bcaa44a48..5b80129c684c 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -188,9 +188,9 @@
 #define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
-bool hugetlb_free_vmemmap_enabled __read_mostly =
-	IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
-EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
+DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			hugetlb_free_vmemmap_enabled_key);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);
 
 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -204,9 +204,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)
 		return -EINVAL;
 
 	if (!strcmp(buf, "on"))
-		hugetlb_free_vmemmap_enabled = true;
+		static_branch_enable(&hugetlb_free_vmemmap_enabled_key);
 	else if (!strcmp(buf, "off"))
-		hugetlb_free_vmemmap_enabled = false;
+		static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
 	else
 		return -EINVAL;
 
-- 
2.11.0


* [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations
  2021-09-17  3:48 [PATCH RESEND v2 0/4] Free the 2nd vmemmap page associated with each HugeTLB page Muchun Song
  2021-09-17  3:48 ` [PATCH RESEND v2 1/4] mm: hugetlb: free " Muchun Song
  2021-09-17  3:48 ` [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key Muchun Song
@ 2021-09-17  3:48 ` Muchun Song
  2021-09-18  5:06     ` Barry Song
  2021-09-17  3:48 ` [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case Muchun Song
  3 siblings, 1 reply; 47+ messages in thread
From: Muchun Song @ 2021-09-17  3:48 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet, willy
  Cc: duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc,
	linux-kernel, linux-mm, Muchun Song

The init_mm.page_table_lock is used to protect kernel page tables, so we
can use it to serialize splitting vmemmap PMD mappings instead of the mmap
write lock, which increases the concurrency of vmemmap_remap_free().
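
Condensed from the diff below, the split re-checks pmd_leaf() under
init_mm.page_table_lock so that two concurrent splitters cannot both
populate the same pmd:

	spin_lock(&init_mm.page_table_lock);
	if (likely(pmd_leaf(*pmd))) {
		/* Make pte visible before pmd. See comment in __pte_alloc(). */
		smp_wmb();
		pmd_populate_kernel(&init_mm, pmd, pgtable);
		flush_tlb_kernel_range(start, start + PMD_SIZE);
		spin_unlock(&init_mm.page_table_lock);
		return 0;
	}
	spin_unlock(&init_mm.page_table_lock);
	/* Lost the race: someone else already split this pmd. */
	pte_free_kernel(&init_mm, pgtable);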

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/ptdump.c         | 16 ++++++++++++----
 mm/sparse-vmemmap.c | 49 ++++++++++++++++++++++++++++++++++---------------
 2 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..eea3d28d173c 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 0, pgd_val(val));
 
-	if (pgd_leaf(val))
+	if (pgd_leaf(val)) {
 		st->note_page(st, addr, 0, pgd_val(val));
+		walk->action = ACTION_CONTINUE;
+	}
 
 	return 0;
 }
@@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 1, p4d_val(val));
 
-	if (p4d_leaf(val))
+	if (p4d_leaf(val)) {
 		st->note_page(st, addr, 1, p4d_val(val));
+		walk->action = ACTION_CONTINUE;
+	}
 
 	return 0;
 }
@@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 2, pud_val(val));
 
-	if (pud_leaf(val))
+	if (pud_leaf(val)) {
 		st->note_page(st, addr, 2, pud_val(val));
+		walk->action = ACTION_CONTINUE;
+	}
 
 	return 0;
 }
@@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
 
 	if (st->effective_prot)
 		st->effective_prot(st, 3, pmd_val(val));
-	if (pmd_leaf(val))
+	if (pmd_leaf(val)) {
 		st->note_page(st, addr, 3, pmd_val(val));
+		walk->action = ACTION_CONTINUE;
+	}
 
 	return 0;
 }
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 62e3d20648ce..e636943ccfc4 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -64,8 +64,8 @@ struct vmemmap_remap_walk {
  */
 #define NR_RESET_STRUCT_PAGE		3
 
-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
-				  struct vmemmap_remap_walk *walk)
+static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				    struct vmemmap_remap_walk *walk)
 {
 	pmd_t __pmd;
 	int i;
@@ -87,15 +87,37 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
 		set_pte_at(&init_mm, addr, pte, entry);
 	}
 
-	/* Make pte visible before pmd. See comment in __pte_alloc(). */
-	smp_wmb();
-	pmd_populate_kernel(&init_mm, pmd, pgtable);
+	spin_lock(&init_mm.page_table_lock);
+	if (likely(pmd_leaf(*pmd))) {
+		/* Make pte visible before pmd. See comment in __pte_alloc(). */
+		smp_wmb();
+		pmd_populate_kernel(&init_mm, pmd, pgtable);
+		flush_tlb_kernel_range(start, start + PMD_SIZE);
+		spin_unlock(&init_mm.page_table_lock);
 
-	flush_tlb_kernel_range(start, start + PMD_SIZE);
+		return 0;
+	}
+	spin_unlock(&init_mm.page_table_lock);
+	pte_free_kernel(&init_mm, pgtable);
 
 	return 0;
 }
 
+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				  struct vmemmap_remap_walk *walk)
+{
+	int ret;
+
+	spin_lock(&init_mm.page_table_lock);
+	ret = pmd_leaf(*pmd);
+	spin_unlock(&init_mm.page_table_lock);
+
+	if (ret)
+		ret = __split_vmemmap_huge_pmd(pmd, start, walk);
+
+	return ret;
+}
+
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)
@@ -132,13 +154,12 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
 
 	pmd = pmd_offset(pud, addr);
 	do {
-		if (pmd_leaf(*pmd)) {
-			int ret;
+		int ret;
+
+		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
+		if (ret)
+			return ret;
 
-			ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
-			if (ret)
-				return ret;
-		}
 		next = pmd_addr_end(addr, end);
 		vmemmap_pte_range(pmd, addr, next, walk);
 	} while (pmd++, addr = next, addr != end);
@@ -321,10 +342,8 @@ int vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	BUG_ON(start - reuse != PAGE_SIZE);
 
-	mmap_write_lock(&init_mm);
+	mmap_read_lock(&init_mm);
 	ret = vmemmap_remap_range(reuse, end, &walk);
-	mmap_write_downgrade(&init_mm);
-
 	if (ret && walk.nr_walked) {
 		end = reuse + walk.nr_walked * PAGE_SIZE;
 		/*
-- 
2.11.0


* [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case
  2021-09-17  3:48 [PATCH RESEND v2 0/4] Free the 2nd vmemmap page associated with each HugeTLB page Muchun Song
                   ` (2 preceding siblings ...)
  2021-09-17  3:48 ` [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations Muchun Song
@ 2021-09-17  3:48 ` Muchun Song
  2021-09-18  5:20     ` Barry Song
  3 siblings, 1 reply; 47+ messages in thread
From: Muchun Song @ 2021-09-17  3:48 UTC (permalink / raw)
  To: mike.kravetz, akpm, osalvador, mhocko, song.bao.hua, david,
	chenhuang5, bodeddub, corbet, willy
  Cc: duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc,
	linux-kernel, linux-mm, Muchun Song

Since the head vmemmap page frame associated with each HugeTLB page is
reused, we should hide the PG_head flag of tail struct pages from the
user. Add a test case to check whether it works properly.
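
The test needs a few reserved default-sized huge pages and root access to
read /proc/kpageflags; a usage sketch (built standalone here, assuming the
test is also wired into the vm selftests Makefile as usual):

	# as root
	echo 8 > /proc/sys/vm/nr_hugepages
	gcc -o vmemmap_hugetlb tools/testing/selftests/vm/vmemmap_hugetlb.c
	./vmemmap_hugetlb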

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 tools/testing/selftests/vm/vmemmap_hugetlb.c | 139 +++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 tools/testing/selftests/vm/vmemmap_hugetlb.c

diff --git a/tools/testing/selftests/vm/vmemmap_hugetlb.c b/tools/testing/selftests/vm/vmemmap_hugetlb.c
new file mode 100644
index 000000000000..b6e945bf4053
--- /dev/null
+++ b/tools/testing/selftests/vm/vmemmap_hugetlb.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A test case of using hugepage memory in a user application using the
+ * mmap system call with MAP_HUGETLB flag.  Before running this program
+ * make sure the administrator has allocated enough default sized huge
+ * pages to cover the 2 MB allocation.
+ *
+ * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages.
+ * That means the addresses starting with 0x800000... will need to be
+ * specified.  Specifying a fixed address is not required on ppc64, i386
+ * or x86_64.
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+
+#define MAP_LENGTH		(2UL * 1024 * 1024)
+
+#ifndef MAP_HUGETLB
+#define MAP_HUGETLB		0x40000	/* arch specific */
+#endif
+
+#define PAGE_SIZE		4096
+
+#define PAGE_COMPOUND_HEAD	(1UL << 15)
+#define PAGE_COMPOUND_TAIL	(1UL << 16)
+#define PAGE_HUGE		(1UL << 17)
+
+#define HEAD_PAGE_FLAGS		(PAGE_COMPOUND_HEAD | PAGE_HUGE)
+#define TAIL_PAGE_FLAGS		(PAGE_COMPOUND_TAIL | PAGE_HUGE)
+
+#define PM_PFRAME_BITS		55
+#define PM_PFRAME_MASK		~((1UL << PM_PFRAME_BITS) - 1)
+
+/* Only ia64 requires this */
+#ifdef __ia64__
+#define MAP_ADDR		(void *)(0x8000000000000000UL)
+#define MAP_FLAGS		(MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
+#else
+#define MAP_ADDR		NULL
+#define MAP_FLAGS		(MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
+#endif
+
+static void write_bytes(char *addr, size_t length)
+{
+	unsigned long i;
+
+	for (i = 0; i < length; i++)
+		*(addr + i) = (char)i;
+}
+
+static unsigned long virt_to_pfn(void *addr)
+{
+	int fd;
+	unsigned long pagemap;
+
+	fd = open("/proc/self/pagemap", O_RDONLY);
+	if (fd < 0)
+		return -1UL;
+
+	lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
+	read(fd, &pagemap, sizeof(pagemap));
+	close(fd);
+
+	return pagemap & ~PM_PFRAME_MASK;
+}
+
+static int check_page_flags(unsigned long pfn)
+{
+	int fd, i;
+	unsigned long pageflags;
+
+	fd = open("/proc/kpageflags", O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	lseek(fd, pfn * sizeof(pageflags), SEEK_SET);
+
+	read(fd, &pageflags, sizeof(pageflags));
+	if ((pageflags & HEAD_PAGE_FLAGS) != HEAD_PAGE_FLAGS) {
+		close(fd);
+		printf("Head page flags (%lx) is invalid\n", pageflags);
+		return -1;
+	}
+
+	for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+		read(fd, &pageflags, sizeof(pageflags));
+		if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
+		    (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
+			close(fd);
+			printf("Tail page flags (%lx) is invalid\n", pageflags);
+			return -1;
+		}
+	}
+
+	close(fd);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	void *addr;
+	unsigned long pfn;
+
+	addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* Trigger allocation of HugeTLB page. */
+	write_bytes(addr, MAP_LENGTH);
+
+	pfn = virt_to_pfn(addr);
+	if (pfn == -1UL) {
+		munmap(addr, MAP_LENGTH);
+		perror("virt_to_pfn");
+		exit(1);
+	}
+
+	printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
+
+	if (check_page_flags(pfn) < 0) {
+		munmap(addr, MAP_LENGTH);
+		perror("check_page_flags");
+		exit(1);
+	}
+
+	/* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
+	if (munmap(addr, MAP_LENGTH)) {
+		perror("munmap");
+		exit(1);
+	}
+
+	return 0;
+}
-- 
2.11.0


* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-17  3:48 ` [PATCH RESEND v2 1/4] mm: hugetlb: free " Muchun Song
@ 2021-09-18  4:38     ` Barry Song
  0 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18  4:38 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, Andrew Morton, osalvador, mhocko, Barry Song,
	david, chenhuang5, bodeddub, Jonathan Corbet, Matthew Wilcox,
	duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc, LKML,
	Linux-MM

On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> page. However, we can remap all tail vmemmap pages to the page frame
> mapped to with the head vmemmap page. Finally, we can free 7 vmemmap
> pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save
> extra 2GB memory when there is 1TB HugeTLB pages in the system
> compared with the current implementation).
>
> But the head vmemmap page is not freed to the buddy allocator and all
> tail vmemmap pages are mapped to the head vmemmap page frame. So we
> can see more than one struct page struct with PG_head (e.g. 8 per 2 MB
> HugeTLB page) associated with each HugeTLB page. We should adjust
> compound_head() to make it returns the real head struct page when the
> parameter is the tail struct page but with PG_head flag.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  2 +-
>  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
>  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
>  mm/sparse-vmemmap.c                             | 21 +++++++
>  4 files changed, 126 insertions(+), 32 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index bdb22006f713..a154a7b3b9a5 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1606,7 +1606,7 @@
>                         [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
>                         enabled.
>                         Allows heavy hugetlb users to free up some more
> -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
>                         Format: { on | off (default) }
>
>                         on:  enable the feature
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 8e1d97d8f3bd..7b1a918ebd43 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -184,13 +184,64 @@ enum pageflags {
>
>  #ifndef __GENERATING_BOUNDS_H
>
> +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> +extern bool hugetlb_free_vmemmap_enabled;
> +
> +/*
> + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> + * page is enabled, the head vmemmap page frame is reused and all of the tail
> + * vmemmap addresses map to the head vmemmap page frame (furture details can
> + * refer to the figure at the head of the mm/hugetlb_vmemmap.c).  In other
> + * word, there are more than one page struct with PG_head associated with each
> + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> + * page structs with PG_head are fake head page structs.  We need an approach
> + * to distinguish between those two different types of page structs so that
> + * compound_head() can return the real head page struct when the parameter is
> + * the tail page struct but with PG_head.
> + *
> + * The page_head_if_fake() returns the real head page struct iff the @page may
> + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> + */
> +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> +{
> +       if (!hugetlb_free_vmemmap_enabled)
> +               return page;
> +
> +       /*
> +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> +        * struct page. The alignment check aims to avoid access the fields (
> +        * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
> +        * cold cacheline in some cases.
> +        */
> +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> +           test_bit(PG_head, &page->flags)) {
> +               /*
> +                * We can safely access the field of the @page[1] with PG_head
> +                * because the @page is a compound page composed with at least
> +                * two contiguous pages.
> +                */
> +               unsigned long head = READ_ONCE(page[1].compound_head);
> +
> +               if (likely(head & 1))
> +                       return (const struct page *)(head - 1);
> +       }
> +
> +       return page;
> +}
> +#else
> +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> +{
> +       return page;
> +}
> +#endif
> +
>  static inline unsigned long _compound_head(const struct page *page)
>  {
>         unsigned long head = READ_ONCE(page->compound_head);
>
>         if (unlikely(head & 1))
>                 return head - 1;
> -       return (unsigned long)page;
> +       return (unsigned long)page_head_if_fake(page);

hard to read. page_head_if_fake,  what is the other side of
page_head_if_not_fake?
I would expect something like
page_to_page_head()
or
get_page_head()

Anyway, I am not quite sure what is the best name. but page_head_if_fake(page)
sounds odd to me. just like the things have two sides, but if_fake  presents
one side only.

>  }
>
>  #define compound_head(page)    ((typeof(page))_compound_head(page))
> @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
>
>  static __always_inline int PageTail(struct page *page)
>  {
> -       return READ_ONCE(page->compound_head) & 1;
> +       return READ_ONCE(page->compound_head) & 1 ||
> +              page_head_if_fake(page) != page;

i would expect a wrapper like:
page_is_fake_head()

and the above page_to_page_head() can leverage the wrapper.
here too.
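
Something like the below, perhaps (a sketch of the suggestion, not code
from the series):

	static __always_inline bool page_is_fake_head(struct page *page)
	{
		return page_head_if_fake(page) != page;
	}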

>  }
>
>  static __always_inline int PageCompound(struct page *page)
>  {
> -       return test_bit(PG_head, &page->flags) || PageTail(page);
> +       return test_bit(PG_head, &page->flags) ||
> +              READ_ONCE(page->compound_head) & 1;

hard to read. could it be something like the below?
return PageHead(page) || PageTail(page);

or do we really need to change this function? even a fake head still has
the true test_bit(PG_head, &page->flags), though it is not a real head, it
is still a pagecompound, right?


>  }
>
>  #define        PAGE_POISON_PATTERN     -1l
> @@ -675,7 +728,21 @@ static inline bool test_set_page_writeback(struct page *page)
>         return set_page_writeback(page);
>  }
>
> -__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
> +static __always_inline bool folio_test_head(struct folio *folio)
> +{
> +       return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY));
> +}
> +
> +static __always_inline int PageHead(struct page *page)
> +{
> +       PF_POISONED_CHECK(page);
> +       return test_bit(PG_head, &page->flags) &&
> +              page_head_if_fake(page) == page;
> +}
> +
> +__SETPAGEFLAG(Head, head, PF_ANY)
> +__CLEARPAGEFLAG(Head, head, PF_ANY)
> +CLEARPAGEFLAG(Head, head, PF_ANY)
>
>  /* Whether there are one or multiple pages in a folio */
>  static inline bool folio_single(struct folio *folio)
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index c540c21e26f5..527bcaa44a48 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -124,9 +124,9 @@
>   * page of page structs (page 0) associated with the HugeTLB page contains the 4
>   * page structs necessary to describe the HugeTLB. The only use of the remaining
>   * pages of page structs (page 1 to page 7) is to point to page->compound_head.
> - * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
> + * Therefore, we can remap pages 1 to 7 to page 0. Only 1 pages of page structs
>   * will be used for each HugeTLB page. This will allow us to free the remaining
> - * 6 pages to the buddy allocator.
> + * 7 pages to the buddy allocator.
>   *
>   * Here is how things look after remapping.
>   *
> @@ -134,30 +134,30 @@
>   * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
>   * |           |                     |     0     | -------------> |     0     |
>   * |           |                     +-----------+                +-----------+
> - * |           |                     |     1     | -------------> |     1     |
> - * |           |                     +-----------+                +-----------+
> - * |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
> - * |           |                     +-----------+                   | | | | |
> - * |           |                     |     3     | ------------------+ | | | |
> - * |           |                     +-----------+                     | | | |
> - * |           |                     |     4     | --------------------+ | | |
> - * |    PMD    |                     +-----------+                       | | |
> - * |   level   |                     |     5     | ----------------------+ | |
> - * |  mapping  |                     +-----------+                         | |
> - * |           |                     |     6     | ------------------------+ |
> - * |           |                     +-----------+                           |
> - * |           |                     |     7     | --------------------------+
> + * |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
> + * |           |                     +-----------+                  | | | | | |
> + * |           |                     |     2     | -----------------+ | | | | |
> + * |           |                     +-----------+                    | | | | |
> + * |           |                     |     3     | -------------------+ | | | |
> + * |           |                     +-----------+                      | | | |
> + * |           |                     |     4     | ---------------------+ | | |
> + * |    PMD    |                     +-----------+                        | | |
> + * |   level   |                     |     5     | -----------------------+ | |
> + * |  mapping  |                     +-----------+                          | |
> + * |           |                     |     6     | -------------------------+ |
> + * |           |                     +-----------+                            |
> + * |           |                     |     7     | ---------------------------+
>   * |           |                     +-----------+
>   * |           |
>   * |           |
>   * |           |
>   * +-----------+
>   *
> - * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
> + * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
>   * vmemmap pages and restore the previous mapping relationship.
>   *
>   * For the HugeTLB page of the pud level mapping. It is similar to the former.
> - * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages.
> + * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
>   *
>   * Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
>   * (e.g. aarch64) provides a contiguous bit in the translation table entries
> @@ -166,7 +166,13 @@
>   *
>   * The contiguous bit is used to increase the mapping size at the pmd and pte
>   * (last) level. So this type of HugeTLB page can be optimized only when its
> - * size of the struct page structs is greater than 2 pages.
> + * size of the struct page structs is greater than 1 pages.
> + *
> + * Notice: The head vmemmap page is not freed to the buddy allocator and all
> + * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
> + * more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
> + * associated with each HugeTLB page. The compound_head() can handle this
> + * correctly (more details refer to the comment above compound_head()).
>   */
>  #define pr_fmt(fmt)    "HugeTLB: " fmt
>
> @@ -175,14 +181,16 @@
>  /*
>   * There are a lot of struct page structures associated with each HugeTLB page.
>   * For tail pages, the value of compound_head is the same. So we can reuse first
> - * page of tail page structures. We map the virtual addresses of the remaining
> - * pages of tail page structures to the first tail page struct, and then free
> - * these page frames. Therefore, we need to reserve two pages as vmemmap areas.
> + * page of head page structures. We map the virtual addresses of all the pages
> + * of tail page structures to the head page struct, and then free these page
> + * frames. Therefore, we need to reserve one pages as vmemmap areas.
>   */
> -#define RESERVE_VMEMMAP_NR             2U
> +#define RESERVE_VMEMMAP_NR             1U
>  #define RESERVE_VMEMMAP_SIZE           (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
>
> -bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
> +bool hugetlb_free_vmemmap_enabled __read_mostly =
> +       IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
> +EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
>
>  static int __init early_hugetlb_free_vmemmap_param(char *buf)
>  {
> @@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
>          */
>         ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
>                                   GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> -
>         if (!ret)
>                 ClearHPageVmemmapOptimized(head);
>
> @@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
>
>         vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
>         /*
> -        * The head page and the first tail page are not to be freed to buddy
> -        * allocator, the other pages will map to the first tail page, so they
> -        * can be freed.
> +        * The head page is not to be freed to buddy allocator, the other tail
> +        * pages will map to the head page, so they can be freed.
>          *
>          * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true
>          * on some architectures (e.g. aarch64). See Documentation/arm64/
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index bdce883f9286..62e3d20648ce 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -53,6 +53,17 @@ struct vmemmap_remap_walk {
>         struct list_head *vmemmap_pages;
>  };
>
> +/*
> + * How many struct page structs need to be reset. When we reuse the head
> + * struct page, the special metadata (e.g. page->flags or page->mapping)
> + * cannot copy to the tail struct page structs. The invalid value will be
> + * checked in the free_tail_pages_check(). In order to avoid the message
> + * of "corrupted mapping in tail page". We need to reset at least 3 (one
> + * head struct page struct and two tail struct page structs) struct page
> + * structs.
> + */
> +#define NR_RESET_STRUCT_PAGE           3
> +
>  static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
>                                   struct vmemmap_remap_walk *walk)
>  {
> @@ -245,6 +256,15 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>         set_pte_at(&init_mm, addr, pte, entry);
>  }
>
> +static inline void reset_struct_pages(struct page *start)
> +{
> +       int i;
> +       struct page *from = start + NR_RESET_STRUCT_PAGE;
> +
> +       for (i = 0; i < NR_RESET_STRUCT_PAGE; i++)
> +               memcpy(start + i, from, sizeof(*from));
> +}
> +
>  static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
>                                 struct vmemmap_remap_walk *walk)
>  {
> @@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
>         list_del(&page->lru);
>         to = page_to_virt(page);
>         copy_page(to, (void *)walk->reuse_addr);
> +       reset_struct_pages(to);
>
>         set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
>  }
> --
> 2.11.0
>

Thanks
barry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
@ 2021-09-18  4:38     ` Barry Song
  0 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18  4:38 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, Andrew Morton, osalvador, mhocko, Barry Song,
	david, chenhuang5, bodeddub, Jonathan Corbet, Matthew Wilcox,
	duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc, LKML,
	Linux-MM

On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> page. However, we can remap all tail vmemmap pages to the page frame
> mapped to with the head vmemmap page. Finally, we can free 7 vmemmap
> pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save
> extra 2GB memory when there is 1TB HugeTLB pages in the system
> compared with the current implementation).
>
> But the head vmemmap page is not freed to the buddy allocator and all
> tail vmemmap pages are mapped to the head vmemmap page frame. So we
> can see more than one struct page struct with PG_head (e.g. 8 per 2 MB
> HugeTLB page) associated with each HugeTLB page. We should adjust
> compound_head() to make it returns the real head struct page when the
> parameter is the tail struct page but with PG_head flag.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  2 +-
>  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
>  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
>  mm/sparse-vmemmap.c                             | 21 +++++++
>  4 files changed, 126 insertions(+), 32 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index bdb22006f713..a154a7b3b9a5 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1606,7 +1606,7 @@
>                         [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
>                         enabled.
>                         Allows heavy hugetlb users to free up some more
> -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
>                         Format: { on | off (default) }
>
>                         on:  enable the feature
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 8e1d97d8f3bd..7b1a918ebd43 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -184,13 +184,64 @@ enum pageflags {
>
>  #ifndef __GENERATING_BOUNDS_H
>
> +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> +extern bool hugetlb_free_vmemmap_enabled;
> +
> +/*
> + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> + * page is enabled, the head vmemmap page frame is reused and all of the tail
> + * vmemmap addresses map to the head vmemmap page frame (furture details can
> + * refer to the figure at the head of the mm/hugetlb_vmemmap.c).  In other
> + * word, there are more than one page struct with PG_head associated with each
> + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> + * page structs with PG_head are fake head page structs.  We need an approach
> + * to distinguish between those two different types of page structs so that
> + * compound_head() can return the real head page struct when the parameter is
> + * the tail page struct but with PG_head.
> + *
> + * The page_head_if_fake() returns the real head page struct iff the @page may
> + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> + */
> +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> +{
> +       if (!hugetlb_free_vmemmap_enabled)
> +               return page;
> +
> +       /*
> +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> +        * struct page. The alignment check aims to avoid access the fields (
> +        * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
> +        * cold cacheline in some cases.
> +        */
> +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> +           test_bit(PG_head, &page->flags)) {
> +               /*
> +                * We can safely access the field of the @page[1] with PG_head
> +                * because the @page is a compound page composed with at least
> +                * two contiguous pages.
> +                */
> +               unsigned long head = READ_ONCE(page[1].compound_head);
> +
> +               if (likely(head & 1))
> +                       return (const struct page *)(head - 1);
> +       }
> +
> +       return page;
> +}
> +#else
> +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> +{
> +       return page;
> +}
> +#endif
> +
>  static inline unsigned long _compound_head(const struct page *page)
>  {
>         unsigned long head = READ_ONCE(page->compound_head);
>
>         if (unlikely(head & 1))
>                 return head - 1;
> -       return (unsigned long)page;
> +       return (unsigned long)page_head_if_fake(page);

Hard to read. page_head_if_fake: what is the other side of it,
page_head_if_not_fake?
I would expect something like
page_to_page_head()
or
get_page_head()

Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
sounds odd to me. Things have two sides, but if_fake presents only one
side.

>  }
>
>  #define compound_head(page)    ((typeof(page))_compound_head(page))
> @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
>
>  static __always_inline int PageTail(struct page *page)
>  {
> -       return READ_ONCE(page->compound_head) & 1;
> +       return READ_ONCE(page->compound_head) & 1 ||
> +              page_head_if_fake(page) != page;

I would expect a wrapper like:
page_is_fake_head()

and the above page_to_page_head() can leverage the wrapper.
here too.
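
For instance, a minimal sketch (untested, assuming the page_head_if_fake()
from this patch):

static __always_inline bool page_is_fake_head(const struct page *page)
{
	return page_head_if_fake(page) != page;
}

static __always_inline int PageTail(struct page *page)
{
	return READ_ONCE(page->compound_head) & 1 ||
	       page_is_fake_head(page);
}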

>  }
>
>  static __always_inline int PageCompound(struct page *page)
>  {
> -       return test_bit(PG_head, &page->flags) || PageTail(page);
> +       return test_bit(PG_head, &page->flags) ||
> +              READ_ONCE(page->compound_head) & 1;

Hard to read. Could it be something like the below?
return PageHead(page) || PageTail(page);

Or do we really need to change this function? Even a fake head still has
test_bit(PG_head, &page->flags) set, so though it is not a real head, it
is still a compound page, right?


>  }
>
>  #define        PAGE_POISON_PATTERN     -1l
> @@ -675,7 +728,21 @@ static inline bool test_set_page_writeback(struct page *page)
>         return set_page_writeback(page);
>  }
>
> -__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
> +static __always_inline bool folio_test_head(struct folio *folio)
> +{
> +       return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY));
> +}
> +
> +static __always_inline int PageHead(struct page *page)
> +{
> +       PF_POISONED_CHECK(page);
> +       return test_bit(PG_head, &page->flags) &&
> +              page_head_if_fake(page) == page;
> +}
> +
> +__SETPAGEFLAG(Head, head, PF_ANY)
> +__CLEARPAGEFLAG(Head, head, PF_ANY)
> +CLEARPAGEFLAG(Head, head, PF_ANY)
>
>  /* Whether there are one or multiple pages in a folio */
>  static inline bool folio_single(struct folio *folio)
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index c540c21e26f5..527bcaa44a48 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -124,9 +124,9 @@
>   * page of page structs (page 0) associated with the HugeTLB page contains the 4
>   * page structs necessary to describe the HugeTLB. The only use of the remaining
>   * pages of page structs (page 1 to page 7) is to point to page->compound_head.
> - * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
> + * Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
>   * will be used for each HugeTLB page. This will allow us to free the remaining
> - * 6 pages to the buddy allocator.
> + * 7 pages to the buddy allocator.
>   *
>   * Here is how things look after remapping.
>   *
> @@ -134,30 +134,30 @@
>   * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
>   * |           |                     |     0     | -------------> |     0     |
>   * |           |                     +-----------+                +-----------+
> - * |           |                     |     1     | -------------> |     1     |
> - * |           |                     +-----------+                +-----------+
> - * |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
> - * |           |                     +-----------+                   | | | | |
> - * |           |                     |     3     | ------------------+ | | | |
> - * |           |                     +-----------+                     | | | |
> - * |           |                     |     4     | --------------------+ | | |
> - * |    PMD    |                     +-----------+                       | | |
> - * |   level   |                     |     5     | ----------------------+ | |
> - * |  mapping  |                     +-----------+                         | |
> - * |           |                     |     6     | ------------------------+ |
> - * |           |                     +-----------+                           |
> - * |           |                     |     7     | --------------------------+
> + * |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
> + * |           |                     +-----------+                  | | | | | |
> + * |           |                     |     2     | -----------------+ | | | | |
> + * |           |                     +-----------+                    | | | | |
> + * |           |                     |     3     | -------------------+ | | | |
> + * |           |                     +-----------+                      | | | |
> + * |           |                     |     4     | ---------------------+ | | |
> + * |    PMD    |                     +-----------+                        | | |
> + * |   level   |                     |     5     | -----------------------+ | |
> + * |  mapping  |                     +-----------+                          | |
> + * |           |                     |     6     | -------------------------+ |
> + * |           |                     +-----------+                            |
> + * |           |                     |     7     | ---------------------------+
>   * |           |                     +-----------+
>   * |           |
>   * |           |
>   * |           |
>   * +-----------+
>   *
> - * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
> + * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
>   * vmemmap pages and restore the previous mapping relationship.
>   *
>   * For the HugeTLB page of the pud level mapping. It is similar to the former.
> - * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages.
> + * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
>   *
>   * Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
>   * (e.g. aarch64) provides a contiguous bit in the translation table entries
> @@ -166,7 +166,13 @@
>   *
>   * The contiguous bit is used to increase the mapping size at the pmd and pte
>   * (last) level. So this type of HugeTLB page can be optimized only when its
> - * size of the struct page structs is greater than 2 pages.
> + * size of the struct page structs is greater than 1 page.
> + *
> + * Notice: The head vmemmap page is not freed to the buddy allocator and all
> + * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
> + * more than one struct page with PG_head (e.g. 8 per 2MB HugeTLB page)
> + * associated with each HugeTLB page. The compound_head() can handle this
> + * correctly (see the comment above compound_head() for more details).
>   */
>  #define pr_fmt(fmt)    "HugeTLB: " fmt
>
> @@ -175,14 +181,16 @@
>  /*
>   * There are a lot of struct page structures associated with each HugeTLB page.
>   * For tail pages, the value of compound_head is the same. So we can reuse first
> - * page of tail page structures. We map the virtual addresses of the remaining
> - * pages of tail page structures to the first tail page struct, and then free
> - * these page frames. Therefore, we need to reserve two pages as vmemmap areas.
> + * page of head page structures. We map the virtual addresses of all the pages
> + * of tail page structures to the head page struct, and then free these page
> + * frames. Therefore, we need to reserve one page as the vmemmap area.
>   */
> -#define RESERVE_VMEMMAP_NR             2U
> +#define RESERVE_VMEMMAP_NR             1U
>  #define RESERVE_VMEMMAP_SIZE           (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
>
> -bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
> +bool hugetlb_free_vmemmap_enabled __read_mostly =
> +       IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
> +EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
>
>  static int __init early_hugetlb_free_vmemmap_param(char *buf)
>  {
> @@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
>          */
>         ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
>                                   GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> -
>         if (!ret)
>                 ClearHPageVmemmapOptimized(head);
>
> @@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
>
>         vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
>         /*
> -        * The head page and the first tail page are not to be freed to buddy
> -        * allocator, the other pages will map to the first tail page, so they
> -        * can be freed.
> +        * The head page is not to be freed to the buddy allocator; the other
> +        * tail pages will map to the head page, so they can be freed.
>          *
>          * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true
>          * on some architectures (e.g. aarch64). See Documentation/arm64/
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index bdce883f9286..62e3d20648ce 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -53,6 +53,17 @@ struct vmemmap_remap_walk {
>         struct list_head *vmemmap_pages;
>  };
>
> +/*
> + * How many struct page structs need to be reset. When we reuse the head
> + * struct page, the special metadata (e.g. page->flags or page->mapping)
> + * cannot be copied to the tail struct page structs. The invalid values
> + * will be checked by free_tail_pages_check(). In order to avoid the
> + * message of "corrupted mapping in tail page", we need to reset at least
> + * 3 struct page structs (one head struct page and two tail struct
> + * pages).
> + */
> +#define NR_RESET_STRUCT_PAGE           3
> +
>  static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
>                                   struct vmemmap_remap_walk *walk)
>  {
> @@ -245,6 +256,15 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>         set_pte_at(&init_mm, addr, pte, entry);
>  }
>
> +static inline void reset_struct_pages(struct page *start)
> +{
> +       int i;
> +       struct page *from = start + NR_RESET_STRUCT_PAGE;
> +
> +       for (i = 0; i < NR_RESET_STRUCT_PAGE; i++)
> +               memcpy(start + i, from, sizeof(*from));
> +}
> +
>  static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
>                                 struct vmemmap_remap_walk *walk)
>  {
> @@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
>         list_del(&page->lru);
>         to = page_to_virt(page);
>         copy_page(to, (void *)walk->reuse_addr);
> +       reset_struct_pages(to);
>
>         set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
>  }
> --
> 2.11.0
>

Thanks
barry


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  2021-09-17  3:48 ` [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key Muchun Song
@ 2021-09-18  4:55     ` Barry Song
  0 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18  4:55 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, Andrew Morton, osalvador, mhocko, Barry Song,
	david, chenhuang5, bodeddub, Jonathan Corbet, Matthew Wilcox,
	duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc, LKML,
	Linux-MM

On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> The page_head_if_fake() is used throughout memory management, and the
> conditional check requires checking a global variable. Although the
> overhead of this check may be small, it increases when the memory
> cache comes under pressure. Also, the global variable will not be
> modified after system boot, so it is very appropriate to use the
> static key mechanism.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/hugetlb.h    |  6 +++++-
>  include/linux/page-flags.h |  6 ++++--
>  mm/hugetlb_vmemmap.c       | 10 +++++-----
>  3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index f7ca1a3870ea..ee3ddf3d12cf 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1057,7 +1057,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
>  #endif /* CONFIG_HUGETLB_PAGE */
>
>  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> -extern bool hugetlb_free_vmemmap_enabled;
> +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> +                        hugetlb_free_vmemmap_enabled_key);
> +#define hugetlb_free_vmemmap_enabled                                    \
> +       static_key_enabled(&hugetlb_free_vmemmap_enabled_key)
> +
>  #else
>  #define hugetlb_free_vmemmap_enabled   false
>  #endif
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 7b1a918ebd43..d68d2cf30d76 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -185,7 +185,8 @@ enum pageflags {
>  #ifndef __GENERATING_BOUNDS_H
>
>  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> -extern bool hugetlb_free_vmemmap_enabled;
> +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> +                        hugetlb_free_vmemmap_enabled_key);
>
>  /*
>   * If the feature of freeing some vmemmap pages associated with each HugeTLB
> @@ -204,7 +205,8 @@ extern bool hugetlb_free_vmemmap_enabled;
>   */
>  static __always_inline const struct page *page_head_if_fake(const struct page *page)
>  {
> -       if (!hugetlb_free_vmemmap_enabled)
> +       if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> +                                &hugetlb_free_vmemmap_enabled_key))

A question bothering me is that we still have hugetlb_free_vmemmap_enabled
defined as static_key_enabled(&hugetlb_free_vmemmap_enabled_key), but here
you are using static_branch_maybe() with the CONFIG and referring to the
key directly.
Do we only need one of them? Or is something wrong?
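
Presumably the two serve different paths: the hugetlb_free_vmemmap_enabled
macro is a plain static_key_enabled() read of the key for slow-path callers,
while static_branch_maybe() compiles to a patchable jump label with no memory
load on the hot path. A rough sketch of the difference (do_something() is a
placeholder, not code from the patch):

	/* slow path, e.g. setup code: an ordinary load of the key state */
	if (hugetlb_free_vmemmap_enabled)
		do_something();

	/* hot path in page_head_if_fake(): a nop/jmp patched at boot */
	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
				 &hugetlb_free_vmemmap_enabled_key))
		return page;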

>                 return page;
>
>         /*
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 527bcaa44a48..5b80129c684c 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -188,9 +188,9 @@
>  #define RESERVE_VMEMMAP_NR             1U
>  #define RESERVE_VMEMMAP_SIZE           (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
>
> -bool hugetlb_free_vmemmap_enabled __read_mostly =
> -       IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
> -EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
> +DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> +                       hugetlb_free_vmemmap_enabled_key);
> +EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);
>
>  static int __init early_hugetlb_free_vmemmap_param(char *buf)
>  {
> @@ -204,9 +204,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)
>                 return -EINVAL;
>
>         if (!strcmp(buf, "on"))
> -               hugetlb_free_vmemmap_enabled = true;
> +               static_branch_enable(&hugetlb_free_vmemmap_enabled_key);
>         else if (!strcmp(buf, "off"))
> -               hugetlb_free_vmemmap_enabled = false;
> +               static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
>         else
>                 return -EINVAL;
>
> --
> 2.11.0
>

Thanks
barry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations
  2021-09-17  3:48 ` [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations Muchun Song
@ 2021-09-18  5:06     ` Barry Song
  0 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18  5:06 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, Andrew Morton, osalvador, mhocko, Barry Song,
	david, chenhuang5, bodeddub, Jonathan Corbet, Matthew Wilcox,
	duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc, LKML,
	Linux-MM

On Sat, Sep 18, 2021 at 12:09 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> The init_mm.page_table_lock is used to protect kernel page tables, so we
> can use it to serialize splitting vmemmap PMD mappings instead of the
> mmap write lock, which can increase the concurrency of vmemmap_remap_free().
>

Curious what actual benefit we get in user scenarios from this patch:
1. we set bootargs to reserve hugetlb statically;
2. we "echo" some figures to sysfs or procfs.

In other words, who is going to care about this concurrency?
Can we have some details on this to put in the commit log?
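
(For example, case 2 would presumably be something like
"echo 10240 > /proc/sys/vm/nr_hugepages" run at boot, where every page
allocated in that loop now goes through vmemmap_remap_free(); it is not
obvious what else would contend on the init_mm mmap lock at that point.)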

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/ptdump.c         | 16 ++++++++++++----
>  mm/sparse-vmemmap.c | 49 ++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 46 insertions(+), 19 deletions(-)
>
> diff --git a/mm/ptdump.c b/mm/ptdump.c
> index da751448d0e4..eea3d28d173c 100644
> --- a/mm/ptdump.c
> +++ b/mm/ptdump.c
> @@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
>         if (st->effective_prot)
>                 st->effective_prot(st, 0, pgd_val(val));
>
> -       if (pgd_leaf(val))
> +       if (pgd_leaf(val)) {
>                 st->note_page(st, addr, 0, pgd_val(val));
> +               walk->action = ACTION_CONTINUE;
> +       }
>
>         return 0;
>  }
> @@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
>         if (st->effective_prot)
>                 st->effective_prot(st, 1, p4d_val(val));
>
> -       if (p4d_leaf(val))
> +       if (p4d_leaf(val)) {
>                 st->note_page(st, addr, 1, p4d_val(val));
> +               walk->action = ACTION_CONTINUE;
> +       }
>
>         return 0;
>  }
> @@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
>         if (st->effective_prot)
>                 st->effective_prot(st, 2, pud_val(val));
>
> -       if (pud_leaf(val))
> +       if (pud_leaf(val)) {
>                 st->note_page(st, addr, 2, pud_val(val));
> +               walk->action = ACTION_CONTINUE;
> +       }
>
>         return 0;
>  }
> @@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
>
>         if (st->effective_prot)
>                 st->effective_prot(st, 3, pmd_val(val));
> -       if (pmd_leaf(val))
> +       if (pmd_leaf(val)) {
>                 st->note_page(st, addr, 3, pmd_val(val));
> +               walk->action = ACTION_CONTINUE;
> +       }
>
>         return 0;
>  }
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 62e3d20648ce..e636943ccfc4 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -64,8 +64,8 @@ struct vmemmap_remap_walk {
>   */
>  #define NR_RESET_STRUCT_PAGE           3
>
> -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> -                                 struct vmemmap_remap_walk *walk)
> +static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> +                                   struct vmemmap_remap_walk *walk)
>  {
>         pmd_t __pmd;
>         int i;
> @@ -87,15 +87,37 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
>                 set_pte_at(&init_mm, addr, pte, entry);
>         }
>
> -       /* Make pte visible before pmd. See comment in __pte_alloc(). */
> -       smp_wmb();
> -       pmd_populate_kernel(&init_mm, pmd, pgtable);
> +       spin_lock(&init_mm.page_table_lock);
> +       if (likely(pmd_leaf(*pmd))) {
> +               /* Make pte visible before pmd. See comment in __pte_alloc(). */
> +               smp_wmb();
> +               pmd_populate_kernel(&init_mm, pmd, pgtable);
> +               flush_tlb_kernel_range(start, start + PMD_SIZE);
> +               spin_unlock(&init_mm.page_table_lock);
>
> -       flush_tlb_kernel_range(start, start + PMD_SIZE);
> +               return 0;
> +       }
> +       spin_unlock(&init_mm.page_table_lock);
> +       pte_free_kernel(&init_mm, pgtable);
>
>         return 0;
>  }
>
> +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> +                                 struct vmemmap_remap_walk *walk)
> +{
> +       int ret;
> +
> +       spin_lock(&init_mm.page_table_lock);
> +       ret = pmd_leaf(*pmd);
> +       spin_unlock(&init_mm.page_table_lock);
> +
> +       if (ret)
> +               ret = __split_vmemmap_huge_pmd(pmd, start, walk);
> +
> +       return ret;
> +}
> +
>  static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
>                               unsigned long end,
>                               struct vmemmap_remap_walk *walk)
> @@ -132,13 +154,12 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>
>         pmd = pmd_offset(pud, addr);
>         do {
> -               if (pmd_leaf(*pmd)) {
> -                       int ret;
> +               int ret;
> +
> +               ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
> +               if (ret)
> +                       return ret;
>
> -                       ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
> -                       if (ret)
> -                               return ret;
> -               }
>                 next = pmd_addr_end(addr, end);
>                 vmemmap_pte_range(pmd, addr, next, walk);
>         } while (pmd++, addr = next, addr != end);
> @@ -321,10 +342,8 @@ int vmemmap_remap_free(unsigned long start, unsigned long end,
>          */
>         BUG_ON(start - reuse != PAGE_SIZE);
>
> -       mmap_write_lock(&init_mm);
> +       mmap_read_lock(&init_mm);
>         ret = vmemmap_remap_range(reuse, end, &walk);
> -       mmap_write_downgrade(&init_mm);
> -
>         if (ret && walk.nr_walked) {
>                 end = reuse + walk.nr_walked * PAGE_SIZE;
>                 /*
> --
> 2.11.0
>

Thanks
barry


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case
  2021-09-17  3:48 ` [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case Muchun Song
@ 2021-09-18  5:20     ` Barry Song
  0 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18  5:20 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, Andrew Morton, osalvador, mhocko, Barry Song,
	david, chenhuang5, bodeddub, Jonathan Corbet, Matthew Wilcox,
	duanxiongchun, fam.zheng, smuchun, zhengqi.arch, linux-doc, LKML,
	Linux-MM

On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> Since the head vmemmap page frame associated with each HugeTLB page is
> reused, we should hide the PG_head flag of the tail struct pages from
> the user. Add a test case to check whether it works properly.
>

TBH, I am a bit confused. I was thinking about some kernel unit tests to make
sure the kernel APIs touched by this patchset still work as before.
This userspace test, while certainly useful for checking that the content of
page frames is as expected, doesn't directly prove things haven't changed.

In patch 1/4, a couple of APIs have a fixup for the fake head issue.
Do you think a test like the below would be more sensible (a rough sketch
follows the list)?
1. alloc a 2MB hugeTLB;
2. get each page frame;
3. apply those APIs to each page frame;
4. check that those APIs work exactly the same as before.
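
Something like this untested sketch (check_hugetlb_page() is a made-up
name; it assumes @head is the head struct page of an already allocated
2MB HugeTLB page):

/*
 * Walk all struct pages of a HugeTLB page and check that the compound
 * helpers behave exactly as they did before the vmemmap remapping.
 */
static int check_hugetlb_page(struct page *head)
{
	unsigned long i, nr = 1UL << compound_order(head);

	if (!PageHead(head) || PageTail(head) || compound_head(head) != head)
		return -EINVAL;

	for (i = 1; i < nr; i++) {
		struct page *tail = head + i;

		/* A fake head must be reported as a tail, never a head. */
		if (PageHead(tail) || !PageTail(tail) || !PageCompound(tail))
			return -EINVAL;
		if (compound_head(tail) != head)
			return -EINVAL;
	}
	return 0;
}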

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  tools/testing/selftests/vm/vmemmap_hugetlb.c | 139 +++++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
>  create mode 100644 tools/testing/selftests/vm/vmemmap_hugetlb.c
>
> diff --git a/tools/testing/selftests/vm/vmemmap_hugetlb.c b/tools/testing/selftests/vm/vmemmap_hugetlb.c
> new file mode 100644
> index 000000000000..b6e945bf4053
> --- /dev/null
> +++ b/tools/testing/selftests/vm/vmemmap_hugetlb.c
> @@ -0,0 +1,139 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * A test case of using hugepage memory in a user application via the
> + * mmap system call with the MAP_HUGETLB flag.  Before running this program,
> + * make sure the administrator has allocated enough default sized huge
> + * pages to cover the 2 MB allocation.
> + *
> + * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages.
> + * That means the addresses starting with 0x800000... will need to be
> + * specified.  Specifying a fixed address is not required on ppc64, i386
> + * or x86_64.
> + */
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <fcntl.h>
> +
> +#define MAP_LENGTH             (2UL * 1024 * 1024)
> +
> +#ifndef MAP_HUGETLB
> +#define MAP_HUGETLB            0x40000 /* arch specific */
> +#endif
> +
> +#define PAGE_SIZE              4096
> +
> +#define PAGE_COMPOUND_HEAD     (1UL << 15)
> +#define PAGE_COMPOUND_TAIL     (1UL << 16)
> +#define PAGE_HUGE              (1UL << 17)
> +
> +#define HEAD_PAGE_FLAGS                (PAGE_COMPOUND_HEAD | PAGE_HUGE)
> +#define TAIL_PAGE_FLAGS                (PAGE_COMPOUND_TAIL | PAGE_HUGE)
> +
> +#define PM_PFRAME_BITS         55
> +#define PM_PFRAME_MASK         ~((1UL << PM_PFRAME_BITS) - 1)
> +
> +/* Only ia64 requires this */
> +#ifdef __ia64__
> +#define MAP_ADDR               (void *)(0x8000000000000000UL)
> +#define MAP_FLAGS              (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
> +#else
> +#define MAP_ADDR               NULL
> +#define MAP_FLAGS              (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
> +#endif
> +
> +static void write_bytes(char *addr, size_t length)
> +{
> +       unsigned long i;
> +
> +       for (i = 0; i < length; i++)
> +               *(addr + i) = (char)i;
> +}
> +
> +static unsigned long virt_to_pfn(void *addr)
> +{
> +       int fd;
> +       unsigned long pagemap;
> +
> +       fd = open("/proc/self/pagemap", O_RDONLY);
> +       if (fd < 0)
> +               return -1UL;
> +
> +       lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
> +       read(fd, &pagemap, sizeof(pagemap));
> +       close(fd);
> +
> +       return pagemap & ~PM_PFRAME_MASK;
> +}
> +
> +static int check_page_flags(unsigned long pfn)
> +{
> +       int fd, i;
> +       unsigned long pageflags;
> +
> +       fd = open("/proc/kpageflags", O_RDONLY);
> +       if (fd < 0)
> +               return -1;
> +
> +       lseek(fd, pfn * sizeof(pageflags), SEEK_SET);
> +
> +       read(fd, &pageflags, sizeof(pageflags));
> +       if ((pageflags & HEAD_PAGE_FLAGS) != HEAD_PAGE_FLAGS) {
> +               close(fd);
> +               printf("Head page flags (%lx) is invalid\n", pageflags);
> +               return -1;
> +       }
> +
> +       for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
> +               read(fd, &pageflags, sizeof(pageflags));
> +               if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
> +                   (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
> +                       close(fd);
> +                       printf("Tail page flags (%lx) is invalid\n", pageflags);
> +                       return -1;
> +               }
> +       }
> +
> +       close(fd);
> +
> +       return 0;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +       void *addr;
> +       unsigned long pfn;
> +
> +       addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
> +       if (addr == MAP_FAILED) {
> +               perror("mmap");
> +               exit(1);
> +       }
> +
> +       /* Trigger allocation of HugeTLB page. */
> +       write_bytes(addr, MAP_LENGTH);
> +
> +       pfn = virt_to_pfn(addr);
> +       if (pfn == -1UL) {
> +               munmap(addr, MAP_LENGTH);
> +               perror("virt_to_pfn");
> +               exit(1);
> +       }
> +
> +       printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
> +
> +       if (check_page_flags(pfn) < 0) {
> +               munmap(addr, MAP_LENGTH);
> +               perror("check_page_flags");
> +               exit(1);
> +       }
> +
> +       /* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
> +       if (munmap(addr, MAP_LENGTH)) {
> +               perror("munmap");
> +               exit(1);
> +       }
> +
> +       return 0;
> +}
> --
> 2.11.0
>

Thanks
Barry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-18  4:38     ` Barry Song
@ 2021-09-18 10:06       ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-18 10:06 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > page. However, we can remap all tail vmemmap pages to the page frame
> > that the head vmemmap page maps to. Finally, we can free 7 vmemmap
> > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save an
> > extra 2GB of memory when there are 1TB of HugeTLB pages in the system
> > compared with the current implementation).
> >
> > But the head vmemmap page is not freed to the buddy allocator and all
> > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > can see more than one struct page struct with PG_head (e.g. 8 per 2 MB
> > HugeTLB page) associated with each HugeTLB page. We should adjust
> > compound_head() to make it return the real head struct page when the
> > parameter is a tail struct page with the PG_head flag.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> >  mm/sparse-vmemmap.c                             | 21 +++++++
> >  4 files changed, 126 insertions(+), 32 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index bdb22006f713..a154a7b3b9a5 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1606,7 +1606,7 @@
> >                         [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> >                         enabled.
> >                         Allows heavy hugetlb users to free up some more
> > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> >                         Format: { on | off (default) }
> >
> >                         on:  enable the feature
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -184,13 +184,64 @@ enum pageflags {
> >
> >  #ifndef __GENERATING_BOUNDS_H
> >
> > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > +extern bool hugetlb_free_vmemmap_enabled;
> > +
> > +/*
> > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > + * vmemmap addresses map to the head vmemmap page frame (further details
> > + * can be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > + * words, there is more than one page struct with PG_head associated with each
> > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > + * page structs with PG_head are fake head page structs.  We need an approach
> > + * to distinguish between those two different types of page structs so that
> > + * compound_head() can return the real head page struct when the parameter is
> > + * the tail page struct but with PG_head.
> > + *
> > + * The page_head_if_fake() returns the real head page struct if the @page is
> > + * a fake head page struct; otherwise, it returns the @page itself.
> > + */
> > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > +{
> > +       if (!hugetlb_free_vmemmap_enabled)
> > +               return page;
> > +
> > +       /*
> > +        * Only a struct page whose address is aligned with PAGE_SIZE may be
> > +        * a fake head struct page. The alignment check aims to avoid accessing
> > +        * the fields (e.g. compound_head) of the @page[1], which avoids
> > +        * touching a (possibly) cold cacheline in some cases.
> > +        */
> > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > +           test_bit(PG_head, &page->flags)) {
> > +               /*
> > +                * We can safely access the field of the @page[1] with PG_head
> > +                * because the @page is a compound page composed of at least
> > +                * two contiguous pages.
> > +                */
> > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > +
> > +               if (likely(head & 1))
> > +                       return (const struct page *)(head - 1);
> > +       }
> > +
> > +       return page;
> > +}
> > +#else
> > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > +{
> > +       return page;
> > +}
> > +#endif
> > +
> >  static inline unsigned long _compound_head(const struct page *page)
> >  {
> >         unsigned long head = READ_ONCE(page->compound_head);
> >
> >         if (unlikely(head & 1))
> >                 return head - 1;
> > -       return (unsigned long)page;
> > +       return (unsigned long)page_head_if_fake(page);
>
> Hard to read. page_head_if_fake: what is the other side of it,
> page_head_if_not_fake?

1) return itself if the @page is not a fake head page.
2) return the real head page if @page is a fake head page.

So I want to express that page_head_if_fake returns a different
page (the real head) if and only if the @page parameter is a fake
head page. Otherwise, it returns the page itself.
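
That is, the full behavior has three cases:

	page_head_if_fake(fake head)  -> the real head page
	page_head_if_fake(real head)  -> the page itself
	page_head_if_fake(other page) -> the page itself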

> I would expect something like
> page_to_page_head()
> or
> get_page_head()
>

Those names do not seem appropriate either, because the
function does not guarantee that it returns a head page.
If the parameter is a head page, it definitely returns a
head page; otherwise, it may return itself, which may be
a tail page.

From this point of view, I still prefer page_head_if_fake.

> Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> sounds odd to me. Things have two sides, but if_fake presents only one
> side.

If others have any ideas, comments are welcome.

>
> >  }
> >
> >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> >
> >  static __always_inline int PageTail(struct page *page)
> >  {
> > -       return READ_ONCE(page->compound_head) & 1;
> > +       return READ_ONCE(page->compound_head) & 1 ||
> > +              page_head_if_fake(page) != page;
>
> I would expect a wrapper like:
> page_is_fake_head()

Good point. Will do.

>
> and the above page_to_page_head() can leverage the wrapper.
> here too.
>
> >  }
> >
> >  static __always_inline int PageCompound(struct page *page)
> >  {
> > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > +       return test_bit(PG_head, &page->flags) ||
> > +              READ_ONCE(page->compound_head) & 1;
>
> Hard to read. Could it be something like the below?
> return PageHead(page) || PageTail(page);
>
> Or do we really need to change this function? Even a fake head still has
> test_bit(PG_head, &page->flags) set, so though it is not a real head, it
> is still a compound page, right?

Right. The semantics of PageCompound() cannot change. It is odd but
efficient because the call to page_head_if_fake() is eliminated.
So I chose performance over readability. I'm not sure if it's
worth it.
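
For reference, both forms should be equivalent semantically (a fake head
still has PG_head set, so it is reported as compound either way); the
difference is only cost. A rough sketch of the two alternatives (the
names are made up for illustration):

/* Readable form: PageHead() and PageTail() may each end up calling
 * page_head_if_fake(), i.e. up to two fake-head checks per call.
 */
static __always_inline int PageCompound_readable(struct page *page)
{
	return PageHead(page) || PageTail(page);
}

/* Open-coded form from the patch: never touches page[1] at all. */
static __always_inline int PageCompound_fast(struct page *page)
{
	return test_bit(PG_head, &page->flags) ||
	       READ_ONCE(page->compound_head) & 1;
}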

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  2021-09-18  4:55     ` Barry Song
@ 2021-09-18 10:30       ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-18 10:30 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 12:55 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > page_head_if_fake() is used throughout memory management, and its
> > conditional check requires reading a global variable. Although the
> > overhead of this check may be small, it increases when the memory
> > cache comes under pressure. Also, the global variable will not be
> > modified after system boot, so it is very appropriate to use the
> > static key mechanism.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/hugetlb.h    |  6 +++++-
> >  include/linux/page-flags.h |  6 ++++--
> >  mm/hugetlb_vmemmap.c       | 10 +++++-----
> >  3 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index f7ca1a3870ea..ee3ddf3d12cf 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -1057,7 +1057,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
> >  #endif /* CONFIG_HUGETLB_PAGE */
> >
> >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > -extern bool hugetlb_free_vmemmap_enabled;
> > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > +                        hugetlb_free_vmemmap_enabled_key);
> > +#define hugetlb_free_vmemmap_enabled                                    \
> > +       static_key_enabled(&hugetlb_free_vmemmap_enabled_key)
> > +
> >  #else
> >  #define hugetlb_free_vmemmap_enabled   false
> >  #endif
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 7b1a918ebd43..d68d2cf30d76 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -185,7 +185,8 @@ enum pageflags {
> >  #ifndef __GENERATING_BOUNDS_H
> >
> >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > -extern bool hugetlb_free_vmemmap_enabled;
> > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > +                        hugetlb_free_vmemmap_enabled_key);
> >
> >  /*
> >   * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > @@ -204,7 +205,8 @@ extern bool hugetlb_free_vmemmap_enabled;
> >   */
> >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> >  {
> > -       if (!hugetlb_free_vmemmap_enabled)
> > +       if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > +                                &hugetlb_free_vmemmap_enabled_key))
>
> A question bothering me is that we still have hugetlb_free_vmemmap_enabled
> defined as static_key_enabled(&hugetlb_free_vmemmap_enabled_key),
> but here you are using static_branch_maybe() with the CONFIG and referring
> to the key directly.
> Do we only need one of them? Or is something wrong?
>

Yeah, we only need one. But my consideration is that we
use static_branch_maybe() for performance-sensitive places.
So I do not change hugetlb_free_vmemmap_enabled
to static_branch_maybe(); this reduces the amount of code
that needs to be updated when the static key is enabled.
Actually, the users of hugetlb_free_vmemmap_enabled
are not performance sensitive.
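
For reference, a minimal sketch of the split being described, reusing
the declarations from the quoted patch; the call sites themselves are
illustrative:

/* hot path, e.g. page_head_if_fake(): compiled as a patchable jump */
if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
                         &hugetlb_free_vmemmap_enabled_key))
        return page;

/* cold path: static_key_enabled() is a plain read of the key's state,
 * so this site needs no code patching when the key flips */
if (hugetlb_free_vmemmap_enabled)
        pr_info("HugeTLB vmemmap freeing is enabled\n");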

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations
  2021-09-18  5:06     ` Barry Song
@ 2021-09-18 10:51       ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-18 10:51 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 1:07 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:09 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > The init_mm.page_table_lock is used to protect kernel page tables, so
> > we can use it to serialize splitting vmemmap PMD mappings instead of the
> > mmap write lock, which increases the concurrency of vmemmap_remap_free().
> >
>
> Curious what is the actual benefit we get in user scenarios from this patch,
> 1. we set bootargs to reserve hugetlb statically
> 2. we "echo" some figures to sys or proc.
>
> In other words, who is going to care about this concurrency?

Actually, it increases the concurrency between allocations of
HugeTLB pages, but that is not my first consideration. There are
a lot of users of the mmap read lock of init_mm. The mmap write
lock is held throughout vmemmap_remap_free(); I want to make
sure it does not affect other users of the mmap read lock.

I suppose a lot of developers are trying to avoid using the mmap
write lock. I am also one of them.
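
A minimal sketch of the locking change being described; only the locks
come from the commit message, the helper name is hypothetical:

/* before: blocks every user of init_mm's mmap read lock */
mmap_write_lock(&init_mm);
vmemmap_split_pmd(pmd, start);  /* hypothetical PMD-split helper */
mmap_write_unlock(&init_mm);

/* after: the kernel page-table spinlock is sufficient, and
 * readers of init_mm's mmap lock proceed concurrently */
spin_lock(&init_mm.page_table_lock);
vmemmap_split_pmd(pmd, start);
spin_unlock(&init_mm.page_table_lock);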

> Can we have some details on this to put in the commit log?

For sure. The justifications above should be placed in the
commit log.

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations
  2021-09-18 10:51       ` Muchun Song
@ 2021-09-18 11:01         ` Barry Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18 11:01 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 10:51 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 1:07 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:09 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > The init_mm.page_table_lock is used to protect kernel page tables, so
> > > we can use it to serialize splitting vmemmap PMD mappings instead of the
> > > mmap write lock, which increases the concurrency of vmemmap_remap_free().
> > >
> >
> > Curious what is the actual benefit we get in user scenarios from this patch,
> > 1. we set bootargs to reserve hugetlb statically
> > 2. we "echo" some figures to sys or proc.
> >
> > In other words, who is going to care about this concurrency?
>
> Actually, it increases the concurrency between allocations of
> HugeTLB pages, but that is not my first consideration. There are
> a lot of users of the mmap read lock of init_mm. The mmap write
> lock is held throughout vmemmap_remap_free(); I want to make
> sure it does not affect other users of the mmap read lock.

Generically this makes sense. I guess it wouldn't be critical at all
for hugetlb allocation, as in practice we are not going to reserve and
release hugetlb pages often; they are not THP.

Anyway, it is not making anything worse, and it is always a win to move
forward.

>
> I suppose a lot of developers are trying to avoid using the mmap
> write lock. I am also one of them.
>
> > Can we have some details on this to put in the commit log?
>
> For sure. The justifications above should be placed in the
> commit log.
>
> Thanks.

Thanks
barry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  2021-09-18 10:30       ` Muchun Song
@ 2021-09-18 11:14         ` Barry Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18 11:14 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 10:31 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:55 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > page_head_if_fake() is used throughout memory management, and its
> > > conditional check requires reading a global variable. Although the
> > > overhead of this check may be small, it increases when the memory
> > > cache comes under pressure. Also, the global variable will not be
> > > modified after system boot, so it is very appropriate to use the
> > > static key mechanism.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  include/linux/hugetlb.h    |  6 +++++-
> > >  include/linux/page-flags.h |  6 ++++--
> > >  mm/hugetlb_vmemmap.c       | 10 +++++-----
> > >  3 files changed, 14 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > index f7ca1a3870ea..ee3ddf3d12cf 100644
> > > --- a/include/linux/hugetlb.h
> > > +++ b/include/linux/hugetlb.h
> > > @@ -1057,7 +1057,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
> > >  #endif /* CONFIG_HUGETLB_PAGE */
> > >
> > >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > -extern bool hugetlb_free_vmemmap_enabled;
> > > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > +                        hugetlb_free_vmemmap_enabled_key);
> > > +#define hugetlb_free_vmemmap_enabled                                    \
> > > +       static_key_enabled(&hugetlb_free_vmemmap_enabled_key)
> > > +
> > >  #else
> > >  #define hugetlb_free_vmemmap_enabled   false
> > >  #endif
> > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > index 7b1a918ebd43..d68d2cf30d76 100644
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -185,7 +185,8 @@ enum pageflags {
> > >  #ifndef __GENERATING_BOUNDS_H
> > >
> > >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > -extern bool hugetlb_free_vmemmap_enabled;
> > > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > +                        hugetlb_free_vmemmap_enabled_key);
> > >
> > >  /*
> > >   * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > @@ -204,7 +205,8 @@ extern bool hugetlb_free_vmemmap_enabled;
> > >   */
> > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > >  {
> > > -       if (!hugetlb_free_vmemmap_enabled)
> > > +       if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > +                                &hugetlb_free_vmemmap_enabled_key))
> >
> > A question bothering me is that we still have hugetlb_free_vmemmap_enabled
> > defined as static_key_enabled(&hugetlb_free_vmemmap_enabled_key),
> > but here you are using static_branch_maybe() with the CONFIG and referring
> > to the key directly.
> > Do we only need one of them? Or is something wrong?
> >
>
> Yeah, we only need one. But my consideration is that we
> use static_branch_maybe() for performance-sensitive places.
> So I do not change hugetlb_free_vmemmap_enabled
> to static_branch_maybe(); this reduces the amount of code
> that needs to be updated when the static key is enabled.
> Actually, the users of hugetlb_free_vmemmap_enabled
> are not performance sensitive.

Not quite sure if a unified inline API would be better, e.g.

#ifdef CONFIG_SCHED_SMT
extern struct static_key_false sched_smt_present;

static __always_inline bool sched_smt_active(void)
{
        return static_branch_likely(&sched_smt_present);
}
#else
static inline bool sched_smt_active(void) { return false; }
#endif

But in your case, CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
is always true in your page_head_if_fake(). Why do we check it
again?

>
> Thanks.

Thanks
barry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  2021-09-18 11:14         ` Barry Song
@ 2021-09-18 11:47           ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-18 11:47 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 7:15 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Sep 18, 2021 at 10:31 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:55 PM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > page_head_if_fake() is used throughout memory management, and its
> > > > conditional check requires reading a global variable. Although the
> > > > overhead of this check may be small, it increases when the memory
> > > > cache comes under pressure. Also, the global variable will not be
> > > > modified after system boot, so it is very appropriate to use the
> > > > static key mechanism.
> > > >
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > ---
> > > >  include/linux/hugetlb.h    |  6 +++++-
> > > >  include/linux/page-flags.h |  6 ++++--
> > > >  mm/hugetlb_vmemmap.c       | 10 +++++-----
> > > >  3 files changed, 14 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > > index f7ca1a3870ea..ee3ddf3d12cf 100644
> > > > --- a/include/linux/hugetlb.h
> > > > +++ b/include/linux/hugetlb.h
> > > > @@ -1057,7 +1057,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
> > > >  #endif /* CONFIG_HUGETLB_PAGE */
> > > >
> > > >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > -extern bool hugetlb_free_vmemmap_enabled;
> > > > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > > +                        hugetlb_free_vmemmap_enabled_key);
> > > > +#define hugetlb_free_vmemmap_enabled                                    \
> > > > +       static_key_enabled(&hugetlb_free_vmemmap_enabled_key)
> > > > +
> > > >  #else
> > > >  #define hugetlb_free_vmemmap_enabled   false
> > > >  #endif
> > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > index 7b1a918ebd43..d68d2cf30d76 100644
> > > > --- a/include/linux/page-flags.h
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -185,7 +185,8 @@ enum pageflags {
> > > >  #ifndef __GENERATING_BOUNDS_H
> > > >
> > > >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > -extern bool hugetlb_free_vmemmap_enabled;
> > > > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > > +                        hugetlb_free_vmemmap_enabled_key);
> > > >
> > > >  /*
> > > >   * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > @@ -204,7 +205,8 @@ extern bool hugetlb_free_vmemmap_enabled;
> > > >   */
> > > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > >  {
> > > > -       if (!hugetlb_free_vmemmap_enabled)
> > > > +       if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > > +                                &hugetlb_free_vmemmap_enabled_key))
> > >
> > > A question bothering me is that we still have hugetlb_free_vmemmap_enabled
> > > defined as static_key_enabled(&hugetlb_free_vmemmap_enabled_key),
> > > but here you are using static_branch_maybe() with the CONFIG and referring
> > > to the key directly.
> > > Do we only need one of them? Or is something wrong?
> > >
> >
> > Yeah, we only need one. But my consideration is that we
> > use static_branch_maybe() for performance-sensitive places.
> > So I do not change hugetlb_free_vmemmap_enabled
> > to static_branch_maybe(); this reduces the amount of code
> > that needs to be updated when the static key is enabled.
> > Actually, the users of hugetlb_free_vmemmap_enabled
> > are not performance sensitive.
>
> Not quite sure if a unified inline API would be better, e.g.
>
> #ifdef CONFIG_SCHED_SMT
> extern struct static_key_false sched_smt_present;
>
> static __always_inline bool sched_smt_active(void)
> {
>         return static_branch_likely(&sched_smt_present);
> }
> #else
> static inline bool sched_smt_active(void) { return false; }
> #endif

Alright, I can change hugetlb_free_vmemmap_enabled to
an inline function.

>
> But in your case, CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> is always true in your page_head_if_fake(). Why do we check it
> again?

That is CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
not CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.
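
For reference, a sketch of how the two options relate; the definition
form below is assumed to follow DEFINE_STATIC_KEY_MAYBE() from
include/linux/jump_label.h:

/* compiled only when CONFIG_HUGETLB_PAGE_FREE_VMEMMAP=y; the
 * ..._DEFAULT_ON option merely selects the key's initial state */
DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
                        hugetlb_free_vmemmap_enabled_key);
/* i.e. it expands to DEFINE_STATIC_KEY_TRUE() if the option is set,
 * and to DEFINE_STATIC_KEY_FALSE() otherwise */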

Thanks

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  2021-09-18 11:47           ` Muchun Song
@ 2021-09-18 12:27             ` Barry Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-18 12:27 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 11:48 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 7:15 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 10:31 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > On Sat, Sep 18, 2021 at 12:55 PM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > >
> > > > > page_head_if_fake() is used throughout memory management, and its
> > > > > conditional check requires reading a global variable. Although the
> > > > > overhead of this check may be small, it increases when the memory
> > > > > cache comes under pressure. Also, the global variable will not be
> > > > > modified after system boot, so it is very appropriate to use the
> > > > > static key mechanism.
> > > > >
> > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > ---
> > > > >  include/linux/hugetlb.h    |  6 +++++-
> > > > >  include/linux/page-flags.h |  6 ++++--
> > > > >  mm/hugetlb_vmemmap.c       | 10 +++++-----
> > > > >  3 files changed, 14 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > > > index f7ca1a3870ea..ee3ddf3d12cf 100644
> > > > > --- a/include/linux/hugetlb.h
> > > > > +++ b/include/linux/hugetlb.h
> > > > > @@ -1057,7 +1057,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
> > > > >  #endif /* CONFIG_HUGETLB_PAGE */
> > > > >
> > > > >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > -extern bool hugetlb_free_vmemmap_enabled;
> > > > > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > > > +                        hugetlb_free_vmemmap_enabled_key);
> > > > > +#define hugetlb_free_vmemmap_enabled                                    \
> > > > > +       static_key_enabled(&hugetlb_free_vmemmap_enabled_key)
> > > > > +
> > > > >  #else
> > > > >  #define hugetlb_free_vmemmap_enabled   false
> > > > >  #endif
> > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > > index 7b1a918ebd43..d68d2cf30d76 100644
> > > > > --- a/include/linux/page-flags.h
> > > > > +++ b/include/linux/page-flags.h
> > > > > @@ -185,7 +185,8 @@ enum pageflags {
> > > > >  #ifndef __GENERATING_BOUNDS_H
> > > > >
> > > > >  #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > -extern bool hugetlb_free_vmemmap_enabled;
> > > > > +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > > > +                        hugetlb_free_vmemmap_enabled_key);
> > > > >
> > > > >  /*
> > > > >   * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > > @@ -204,7 +205,8 @@ extern bool hugetlb_free_vmemmap_enabled;
> > > > >   */
> > > > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > >  {
> > > > > -       if (!hugetlb_free_vmemmap_enabled)
> > > > > +       if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> > > > > +                                &hugetlb_free_vmemmap_enabled_key))
> > > >
> > > > A question bothering me is that we still have hugetlb_free_vmemmap_enabled
> > > > defined as static_key_enabled(&hugetlb_free_vmemmap_enabled_key),
> > > > but here you are using static_branch_maybe() with the CONFIG and referring
> > > > to the key directly.
> > > > Do we only need one of them? Or is something wrong?
> > > >
> > >
> > > Yeah, we only need one. But my consideration is that we
> > > use static_branch_maybe() for performance-sensitive places.
> > > So I do not change hugetlb_free_vmemmap_enabled
> > > to static_branch_maybe(); this reduces the amount of code
> > > that needs to be updated when the static key is enabled.
> > > Actually, the users of hugetlb_free_vmemmap_enabled
> > > are not performance sensitive.
> >
> > Not quite sure if a unified inline API would be better, e.g.
> >
> > #ifdef CONFIG_SCHED_SMT
> > extern struct static_key_false sched_smt_present;
> >
> > static __always_inline bool sched_smt_active(void)
> > {
> >         return static_branch_likely(&sched_smt_present);
> > }
> > #else
> > static inline bool sched_smt_active(void) { return false; }
> > #endif
>
> Alright, I can change hugetlb_free_vmemmap_enabled to
> an inline function.
>
> >
> > But in your case, CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > is always true in your page_head_if_fake(). Why do we check it
> > again?
>
> That is CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
> not CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.

Oops, sorry for missing that.

>
> Thanks

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case
  2021-09-18  5:20     ` Barry Song
@ 2021-09-20 14:26       ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-20 14:26 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 1:20 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > Since the head vmemmap page frame associated with each HugeTLB page is
> > reused, we should hide the PG_head flag of the tail struct pages from
> > user space. Add a test case to check whether it works properly.
> >
>
> TBH, I am a bit confused. I was thinking about some kernel unit tests to make
> sure those kernel APIs touched by this patchset are still working as before.
> This userspace test, while certainly useful for checking that the content
> of page frames is as expected, doesn't directly prove things haven't changed.
>
> In patch 1/4, a couple of APIs have the fixup for the fake head issue.
> Do you think a test like the below would be more sensible?
> 1. alloc 2MB hugeTLB

It is done in main().

> 2. get each page frame
> 3. apply those APIs in each page frame
> 4. Those APIs work completely the same as before.

Reading the flags of a page via /proc/kpageflags goes through
stable_page_flags(), which invokes PageHead(),
PageTail(), PageCompound() and compound_head().
If those APIs work properly, the head page must have
bits 15 and 17 set, and the tail pages must have bits 16 and 17
set but bit 15 clear.

So I think check_page_flags() covers steps 2 to 4.
What do you think?
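
A minimal userspace sketch of that check; the bit numbers are
KPF_COMPOUND_HEAD = 15, KPF_COMPOUND_TAIL = 16 and KPF_HUGE = 17 from
include/uapi/linux/kernel-page-flags.h, while the function name and
pfn handling are illustrative:

#include <stdint.h>
#include <unistd.h>

/* /proc/kpageflags exposes one u64 of KPF_* bits per pfn */
static int check_pfn(int kpageflags_fd, uint64_t pfn, int expect_head)
{
        uint64_t flags;

        if (pread(kpageflags_fd, &flags, sizeof(flags),
                  pfn * sizeof(flags)) != sizeof(flags))
                return -1;

        int head = (flags >> 15) & 1;   /* KPF_COMPOUND_HEAD */
        int tail = (flags >> 16) & 1;   /* KPF_COMPOUND_TAIL */
        int huge = (flags >> 17) & 1;   /* KPF_HUGE */

        if (expect_head)
                return (head && !tail && huge) ? 0 : -1;
        return (!head && tail && huge) ? 0 : -1;
}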

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-21 10:22         ` Muchun Song
@ 2021-09-21  0:11           ` Barry Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-21  0:11 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Tue, Sep 21, 2021 at 10:23 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > > page. However, we can remap all tail vmemmap pages to the page frame
> > > > mapped by the head vmemmap page. Finally, we can free 7 vmemmap
> > > > pages for a 2MB HugeTLB page. It is a nice gain (e.g. we can save
> > > > an extra 2GB of memory when there are 1TB of HugeTLB pages in the
> > > > system compared with the current implementation).
> > > >
> > > > But the head vmemmap page is not freed to the buddy allocator and all
> > > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > > can see more than one struct page with PG_head set (e.g. 8 per 2MB
> > > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > > compound_head() to make it return the real head struct page when the
> > > > parameter is a tail struct page with the PG_head flag.
> > > >
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > ---
> > > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > > >
> > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > index bdb22006f713..a154a7b3b9a5 100644
> > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > @@ -1606,7 +1606,7 @@
> > > >                         [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > >                         enabled.
> > > >                         Allows heavy hugetlb users to free up some more
> > > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > > >                         Format: { on | off (default) }
> > > >
> > > >                         on:  enable the feature
> > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > > --- a/include/linux/page-flags.h
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -184,13 +184,64 @@ enum pageflags {
> > > >
> > > >  #ifndef __GENERATING_BOUNDS_H
> > > >
> > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > +extern bool hugetlb_free_vmemmap_enabled;
> > > > +
> > > > +/*
> > > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > > + * vmemmap addresses map to the head vmemmap page frame (furture details can
> > > > + * refer to the figure at the head of the mm/hugetlb_vmemmap.c).  In other
> > > > + * word, there are more than one page struct with PG_head associated with each
> > > > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > > > + * page structs with PG_head are fake head page structs.  We need an approach
> > > > + * to distinguish between those two different types of page structs so that
> > > > + * compound_head() can return the real head page struct when the parameter is
> > > > + * the tail page struct but with PG_head.
> > > > + *
> > > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > + */
> > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > +{
> > > > +       if (!hugetlb_free_vmemmap_enabled)
> > > > +               return page;
> > > > +
> > > > +       /*
> > > > +        * Only a struct page whose address is aligned with PAGE_SIZE may be a
> > > > +        * fake head struct page. The alignment check aims to avoid accessing the
> > > > +        * fields (e.g. compound_head) of @page[1], which avoids touching a
> > > > +        * (possibly) cold cacheline in some cases.
> > > > +        */
> > > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > > +           test_bit(PG_head, &page->flags)) {
> > > > +               /*
> > > > +                * We can safely access the fields of @page[1], because a page
> > > > +                * with PG_head set is a compound page composed of at least
> > > > +                * two contiguous pages.
> > > > +                */
> > > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > > +
> > > > +               if (likely(head & 1))
> > > > +                       return (const struct page *)(head - 1);
> > > > +       }
> > > > +
> > > > +       return page;
> > > > +}
> > > > +#else
> > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > +{
> > > > +       return page;
> > > > +}
> > > > +#endif
> > > > +
> > > >  static inline unsigned long _compound_head(const struct page *page)
> > > >  {
> > > >         unsigned long head = READ_ONCE(page->compound_head);
> > > >
> > > >         if (unlikely(head & 1))
> > > >                 return head - 1;
> > > > -       return (unsigned long)page;
> > > > +       return (unsigned long)page_head_if_fake(page);
> > >
> > > hard to read. page_head_if_fake: what is the other side,
> > > page_head_if_not_fake?
> >
> > 1) return itself if the @page is not a fake head page.
> > 2) return head page if @page is a fake head page.
> >
> > So I want to express that page_head_if_fake returns the
> > real head page if and only if the @page parameter is a
> > fake head page. Otherwise, it returns @page itself.
> >
> > > I would expect something like
> > > page_to_page_head()
> > > or
> > > get_page_head()
> > >
> >
> > Those names do not seem appropriate either, because
> > the function does not guarantee that it returns a head
> > page. If the parameter is a head page, it definitely
> > returns a head page; otherwise, it may return the page
> > itself, which may be a tail page.
> >
> > From this point of view, I still prefer page_head_if_fake.
> >
> > > Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> > > sounds odd to me. Things have two sides, but if_fake presents
> > > one side only.
> >
> > If others have any ideas, comments are welcome.
> >
> > >
> > > >  }
> > > >
> > > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > > >
> > > >  static __always_inline int PageTail(struct page *page)
> > > >  {
> > > > -       return READ_ONCE(page->compound_head) & 1;
> > > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > > +              page_head_if_fake(page) != page;
> > >
> > > I would expect a wrapper like:
> > > page_is_fake_head()
> >
> > Good point. Will do.
> >
> > >
> > > And the above page_to_page_head() can leverage the wrapper,
> > > here too.
> > >
> > > >  }
> > > >
> > > >  static __always_inline int PageCompound(struct page *page)
> > > >  {
> > > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > > +       return test_bit(PG_head, &page->flags) ||
> > > > +              READ_ONCE(page->compound_head) & 1;
> > >
> > > hard to read. Could it be something like the below?
> > > return PageHead(page) || PageTail(page);
> > >
> > > Or do we really need to change this function? Even a fake head still has
> > > test_bit(PG_head, &page->flags) true; though it is not a real head, it
> > > is still a compound page, right?
> >
> > Right. What PageCompound() returns cannot change. The new
> > form is odd but efficient because the call to page_head_if_fake
> > is eliminated. So I chose performance over readability. I'm not
> > sure if it's worth it.
>
> In order to improve readability, I'll introduce 3 helpers as follows.
>
> 1) page_head_or_fake(), which returns true for a head page
>    or a fake head page.
> 2) page_head_is_fake(), which returns true for a fake head page.
> 3) page_tail_not_fake_head(), which returns true for a tail page
>    that is not a fake head page.
>
> In the end, PageHead(), PageTail() and PageCompound() become
> the following.
>
> static __always_inline int PageHead(struct page *page)
> {
>     return page_head_or_fake(page) && !page_head_is_fake(page);
> }
>
> static __always_inline int PageTail(struct page *page)
> {
>     return page_tail_not_fake_head(page) || page_head_is_fake(page);
> }
>
> static __always_inline int PageCompound(struct page *page)
> {
>     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> }
>
> Do those look more readable?
>

Still not good enough. On second thought, page_head_if_fake seems
to have the best performance, though the function returns an odd value.
I just made a small refinement to your code and its doc comment:

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 2c0d11e71e26..240c2fca13c7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -197,8 +197,9 @@ extern bool hugetlb_free_vmemmap_enabled;
  * compound_head() can return the real head page struct when the parameter is
  * the tail page struct but with PG_head.
  *
- * The page_head_if_fake() returns the real head page struct iff the @page may
- * be fake, otherwise, returns the @page if it cannot be a fake page struct.
+ * The page_head_if_fake() returns the real head page struct if the @page is
+ * a fake page head; otherwise, it returns @page, which can be either a true
+ * page head or a tail page.
  */
 static __always_inline const struct page *page_head_if_fake(const struct page *page)
 {
@@ -226,6 +227,12 @@ static __always_inline const struct page *page_head_if_fake(const struct page *p

        return page;
 }
+
+static __always_inline bool page_is_fake_head(const struct page *page)
+{
+       return page_head_if_fake(page) != page;
+}
+
 #else
 static __always_inline const struct page *page_head_if_fake(const struct page *page)
 {
@@ -247,7 +254,7 @@ static inline unsigned long _compound_head(const struct page *page)
 static __always_inline int PageTail(struct page *page)
 {
        return READ_ONCE(page->compound_head) & 1 ||
-              page_head_if_fake(page) != page;
+              page_is_fake_head(page);
 }

 static __always_inline int PageCompound(struct page *page)

> Thanks.

Thanks
barry

^ permalink raw reply related	[flat|nested] 47+ messages in thread
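
[Illustration] A quick check of the arithmetic in the commit message:
1TB of 2MB HugeTLB pages is 524288 pages, and freeing one extra 4KB
vmemmap page per HugeTLB page saves 524288 * 4KB = 2GB. The bit-0
encoding that page_head_if_fake() and Barry's page_is_fake_head()
wrapper rely on can also be modeled end to end in user space. The
sketch below is a simplified model, not kernel code: it collapses the
vmemmap layout to the 8-struct-page figure from the cover letter and
omits the PAGE_SIZE alignment check the kernel performs before
touching page[1].

#include <stdio.h>

struct page {
	unsigned long flags;		/* bit 0 stands in for PG_head here */
	unsigned long compound_head;	/* bit 0 set: rest is the real head */
};

#define PG_HEAD 0x1UL

/* Model of page_head_if_fake(): resolve a possibly fake head page. */
static const struct page *page_head_if_fake(const struct page *page)
{
	if (page->flags & PG_HEAD) {
		/* page[1] is a tail whose compound_head encodes the head */
		unsigned long head = page[1].compound_head;

		if (head & 1)
			return (const struct page *)(head - 1);
	}
	return page;
}

int main(void)
{
	struct page vmemmap[8] = { 0 };
	int i;

	/* Page 0 is the real head; pages 1..7 are tails pointing at it. */
	vmemmap[0].flags = PG_HEAD;
	for (i = 1; i < 8; i++)
		vmemmap[i].compound_head = (unsigned long)&vmemmap[0] | 1;

	/*
	 * Once the tail vmemmap pages are remapped to the head page
	 * frame, reading a tail's flags returns the head's flags, so
	 * a tail can appear to carry PG_head: a "fake head". Model
	 * that for page 2.
	 */
	vmemmap[2].flags |= PG_HEAD;

	printf("fake head resolves to real head: %d\n",
	       page_head_if_fake(&vmemmap[2]) == &vmemmap[0]);
	printf("real head resolves to itself: %d\n",
	       page_head_if_fake(&vmemmap[0]) == &vmemmap[0]);
	return 0;
}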

* Re: [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case
  2021-09-20 14:26       ` Muchun Song
@ 2021-09-21  0:28         ` Barry Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-21  0:28 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Tue, Sep 21, 2021 at 2:26 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 1:20 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > Since the head vmemmap page frame associated with each HugeTLB page is
> > > reused, we should hide the PG_head flag of the tail struct pages from the
> > > user. Add a test case to check whether it works properly.
> > >
> >
> > TBH, I am a bit confused. I was thinking about some kernel unit tests to make
> > sure those kernel APIs touched by this patchset are still working as before.
> > This userspace test, while certainly useful for checking that the content of
> > page frames is as expected, doesn't directly prove things haven't changed.
> >
> > In patch 1/4, a couple of APIs have the fixup for the fake head issue.
> > Do you think a test like the below would be more sensible?
> > 1. alloc 2MB hugeTLB
>
> It is done in main().
>
> > 2. get each page frame
> > 3. apply those APIs in each page frame
> > 4. Those APIs work completely the same as before.
>
> Reading the flags of a page via /proc/kpageflags is done
> in stable_page_flags(), which invokes PageHead(),
> PageTail(), PageCompound() and compound_head().
> If those APIs work properly, the head page must have
> bits 15 and 17 set, and the tail pages must have bits 16
> and 17 set but bit 15 unset.
>
> So I think check_page_flags() covers steps 2 to 4.
> What do you think?

Yes, thanks for your explanation. So I think we just need some documentation
here to explain what it is checking, something like:
/*
 * Pages other than the first page must be tail pages and must not be head;
 * this also verifies that the kernel correctly exposes the fake page head
 * as a tail while hugetlb_free_vmemmap is enabled.
 */
+       for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+               read(fd, &pageflags, sizeof(pageflags));
+               if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
+                   (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
+                       close(fd);
+                       printf("Tail page flags (%lx) is invalid\n", pageflags);
+                       return -1;
+               }
+       }
>
> Thanks.

Thanks
barry

^ permalink raw reply	[flat|nested] 47+ messages in thread
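
[Illustration] The bit numbers Muchun cites are from
include/uapi/linux/kernel-page-flags.h: KPF_COMPOUND_HEAD is 15,
KPF_COMPOUND_TAIL is 16 and KPF_HUGE is 17. Below is a minimal sketch
of the check in user-space C; it is illustrative only, not the
selftest from this series. It assumes root privileges (kpageflags
requires them), a 2MB hugetlb mapping at va, and the documented
pagemap entry layout (PFN in bits 0-54, 8 bytes per entry).

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define KPF_COMPOUND_HEAD	15
#define KPF_COMPOUND_TAIL	16
#define KPF_HUGE		17

static uint64_t kpageflags_of(int pagemap, int kpageflags, uint64_t va)
{
	uint64_t ent = 0, flags = 0;

	/* look up the PFN of va, then index /proc/kpageflags by PFN */
	pread(pagemap, &ent, sizeof(ent), (va / 4096) * 8);
	pread(kpageflags, &flags, sizeof(flags),
	      (ent & ((1ULL << 55) - 1)) * 8);
	return flags;
}

/* returns 0 iff the 2MB hugetlb page at va has sane head/tail flags */
static int check_hugetlb_flags(uint64_t va)
{
	int pagemap = open("/proc/self/pagemap", O_RDONLY);
	int kpageflags = open("/proc/kpageflags", O_RDONLY);
	int i, ret = 0;

	for (i = 0; i < 512; i++) {	/* 512 * 4KB = 2MB */
		uint64_t f = kpageflags_of(pagemap, kpageflags,
					   va + i * 4096ULL);
		int head = !!(f & (1ULL << KPF_COMPOUND_HEAD));
		int tail = !!(f & (1ULL << KPF_COMPOUND_TAIL));
		int huge = !!(f & (1ULL << KPF_HUGE));

		/* page 0: head+huge; pages 1..511: tail+huge, never head */
		if (!huge || (i == 0 ? (!head || tail) : (!tail || head))) {
			fprintf(stderr, "page %d: bad flags %llx\n", i,
				(unsigned long long)f);
			ret = -1;
			break;
		}
	}
	close(pagemap);
	close(kpageflags);
	return ret;
}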

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-18 10:06       ` Muchun Song
@ 2021-09-21  6:43         ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-21  6:43 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > page. However, we can remap all tail vmemmap pages to the page frame
> > > mapped by the head vmemmap page. Finally, we can free 7 vmemmap
> > > pages for a 2MB HugeTLB page. It is a nice gain (e.g. we can save an
> > > extra 2GB of memory when there are 1TB of HugeTLB pages in the system,
> > > compared with the current implementation).
> > >
> > > But the head vmemmap page is not freed to the buddy allocator and all
> > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > can see more than one struct page with PG_head (e.g. 8 per 2MB
> > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > compound_head() to make it return the real head struct page when the
> > > parameter is a tail struct page but with the PG_head flag set.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index bdb22006f713..a154a7b3b9a5 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -1606,7 +1606,7 @@
> > >                         [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > >                         enabled.
> > >                         Allows heavy hugetlb users to free up some more
> > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > >                         Format: { on | off (default) }
> > >
> > >                         on:  enable the feature
> > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -184,13 +184,64 @@ enum pageflags {
> > >
> > >  #ifndef __GENERATING_BOUNDS_H
> > >
> > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > +extern bool hugetlb_free_vmemmap_enabled;
> > > +
> > > +/*
> > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > + * vmemmap addresses map to the head vmemmap page frame (further details
> > > + * can be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > > + * words, there is more than one page struct with PG_head associated with each
> > > + * HugeTLB page.  We __know__ that there is only one real head page struct; the
> > > + * tail page structs with PG_head are fake head page structs.  We need an approach
> > > + * to distinguish between those two different types of page structs so that
> > > + * compound_head() can return the real head page struct when the parameter is
> > > + * the tail page struct but with PG_head.
> > > + *
> > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > + */
> > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > +{
> > > +       if (!hugetlb_free_vmemmap_enabled)
> > > +               return page;
> > > +
> > > +       /*
> > > +        * Only a struct page whose address is aligned with PAGE_SIZE may be a
> > > +        * fake head struct page. The alignment check aims to avoid accessing the
> > > +        * fields (e.g. compound_head) of @page[1], which avoids touching a
> > > +        * (possibly) cold cacheline in some cases.
> > > +        */
> > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > +           test_bit(PG_head, &page->flags)) {
> > > +               /*
> > > +                * We can safely access the fields of @page[1], because a page
> > > +                * with PG_head set is a compound page composed of at least
> > > +                * two contiguous pages.
> > > +                */
> > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > +
> > > +               if (likely(head & 1))
> > > +                       return (const struct page *)(head - 1);
> > > +       }
> > > +
> > > +       return page;
> > > +}
> > > +#else
> > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > +{
> > > +       return page;
> > > +}
> > > +#endif
> > > +
> > >  static inline unsigned long _compound_head(const struct page *page)
> > >  {
> > >         unsigned long head = READ_ONCE(page->compound_head);
> > >
> > >         if (unlikely(head & 1))
> > >                 return head - 1;
> > > -       return (unsigned long)page;
> > > +       return (unsigned long)page_head_if_fake(page);
> >
> > hard to read. page_head_if_fake: what is the other side,
> > page_head_if_not_fake?
>
> 1) return itself if the @page is not a fake head page.
> 2) return head page if @page is a fake head page.
>
> So I want to express that page_head_if_fake returns the
> real head page if and only if the @page parameter is a
> fake head page. Otherwise, it returns @page itself.
>
> > I would expect something like
> > page_to_page_head()
> > or
> > get_page_head()
> >
>
> Those names do not seem appropriate either, because
> the function does not guarantee that it returns a head
> page. If the parameter is a head page, it definitely
> returns a head page; otherwise, it may return the page
> itself, which may be a tail page.
>
> From this point of view, I still prefer page_head_if_fake.

After some thinking, I figured out 2 names.

page_head_if_fake() always returns a head page if the @page
parameter is not a compound page or has PG_head set in its ->flags
(you can think of a non-compound page as being its own head
page). All of its callers already guarantee this. It means it has to
return a head page unless @page is a tail page (other than a fake
head page). So I propose two names as follows.

1) page_head_unless_tail
2) page_head_filter_fake

The former means it always returns a head page unless the
caller passes a tail page as the parameter. The latter means
it always returns a head page while filtering out the fake head
page. The former is inspired by get_page_unless_zero.

What do you think?

Thanks.

>
> > Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> > sounds odd to me. Things have two sides, but if_fake presents
> > one side only.
>
> If others have any ideas, comments are welcome.
>
> >
> > >  }
> > >
> > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > >
> > >  static __always_inline int PageTail(struct page *page)
> > >  {
> > > -       return READ_ONCE(page->compound_head) & 1;
> > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > +              page_head_if_fake(page) != page;
> >
> > I would expect a wrapper like:
> > page_is_fake_head()
>
> Good point. Will do.
>
> >
> > And the above page_to_page_head() can leverage the wrapper,
> > here too.
> >
> > >  }
> > >
> > >  static __always_inline int PageCompound(struct page *page)
> > >  {
> > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > +       return test_bit(PG_head, &page->flags) ||
> > > +              READ_ONCE(page->compound_head) & 1;
> >
> > hard to read. Could it be something like the below?
> > return PageHead(page) || PageTail(page);
> >
> > Or do we really need to change this function? Even a fake head still has
> > test_bit(PG_head, &page->flags) true; though it is not a real head, it
> > is still a compound page, right?
>
> Right. What PageCompound() returns cannot change. The new
> form is odd but efficient because the call to page_head_if_fake
> is eliminated. So I chose performance over readability. I'm not
> sure if it's worth it.
>
> Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-18 10:06       ` Muchun Song
@ 2021-09-21 10:22         ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-21 10:22 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > page. However, we can remap all tail vmemmap pages to the page frame
> > > mapped by the head vmemmap page. Finally, we can free 7 vmemmap
> > > pages for a 2MB HugeTLB page. It is a nice gain (e.g. we can save an
> > > extra 2GB of memory when there are 1TB of HugeTLB pages in the system,
> > > compared with the current implementation).
> > >
> > > But the head vmemmap page is not freed to the buddy allocator and all
> > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > can see more than one struct page with PG_head (e.g. 8 per 2MB
> > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > compound_head() to make it return the real head struct page when the
> > > parameter is a tail struct page but with the PG_head flag set.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index bdb22006f713..a154a7b3b9a5 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -1606,7 +1606,7 @@
> > >                         [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > >                         enabled.
> > >                         Allows heavy hugetlb users to free up some more
> > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > >                         Format: { on | off (default) }
> > >
> > >                         on:  enable the feature
> > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -184,13 +184,64 @@ enum pageflags {
> > >
> > >  #ifndef __GENERATING_BOUNDS_H
> > >
> > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > +extern bool hugetlb_free_vmemmap_enabled;
> > > +
> > > +/*
> > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > + * vmemmap addresses map to the head vmemmap page frame (further details
> > > + * can be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > > + * words, there is more than one page struct with PG_head associated with each
> > > + * HugeTLB page.  We __know__ that there is only one real head page struct; the
> > > + * tail page structs with PG_head are fake head page structs.  We need an approach
> > > + * to distinguish between those two different types of page structs so that
> > > + * compound_head() can return the real head page struct when the parameter is
> > > + * the tail page struct but with PG_head.
> > > + *
> > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > + */
> > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > +{
> > > +       if (!hugetlb_free_vmemmap_enabled)
> > > +               return page;
> > > +
> > > +       /*
> > > +        * Only a struct page whose address is aligned with PAGE_SIZE may be a
> > > +        * fake head struct page. The alignment check aims to avoid accessing the
> > > +        * fields (e.g. compound_head) of @page[1], which avoids touching a
> > > +        * (possibly) cold cacheline in some cases.
> > > +        */
> > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > +           test_bit(PG_head, &page->flags)) {
> > > +               /*
> > > +                * We can safely access the fields of @page[1], because a page
> > > +                * with PG_head set is a compound page composed of at least
> > > +                * two contiguous pages.
> > > +                */
> > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > +
> > > +               if (likely(head & 1))
> > > +                       return (const struct page *)(head - 1);
> > > +       }
> > > +
> > > +       return page;
> > > +}
> > > +#else
> > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > +{
> > > +       return page;
> > > +}
> > > +#endif
> > > +
> > >  static inline unsigned long _compound_head(const struct page *page)
> > >  {
> > >         unsigned long head = READ_ONCE(page->compound_head);
> > >
> > >         if (unlikely(head & 1))
> > >                 return head - 1;
> > > -       return (unsigned long)page;
> > > +       return (unsigned long)page_head_if_fake(page);
> >
> > hard to read. page_head_if_fake: what is the other side,
> > page_head_if_not_fake?
>
> 1) return itself if the @page is not a fake head page.
> 2) return head page if @page is a fake head page.
>
> So I want to express that page_head_if_fake returns the
> real head page if and only if the @page parameter is a
> fake head page. Otherwise, it returns @page itself.
>
> > I would expect something like
> > page_to_page_head()
> > or
> > get_page_head()
> >
>
> Those names do not seem appropriate either, because
> the function does not guarantee that it returns a head
> page. If the parameter is a head page, it definitely
> returns a head page; otherwise, it may return the page
> itself, which may be a tail page.
>
> From this point of view, I still prefer page_head_if_fake.
>
> > Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> > sounds odd to me. Things have two sides, but if_fake presents
> > one side only.
>
> If others have any ideas, comments are welcome.
>
> >
> > >  }
> > >
> > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > >
> > >  static __always_inline int PageTail(struct page *page)
> > >  {
> > > -       return READ_ONCE(page->compound_head) & 1;
> > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > +              page_head_if_fake(page) != page;
> >
> > I would expect a wrapper like:
> > page_is_fake_head()
>
> Good point. Will do.
>
> >
> > And the above page_to_page_head() can leverage the wrapper,
> > here too.
> >
> > >  }
> > >
> > >  static __always_inline int PageCompound(struct page *page)
> > >  {
> > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > +       return test_bit(PG_head, &page->flags) ||
> > > +              READ_ONCE(page->compound_head) & 1;
> >
> > hard to read. Could it be something like the below?
> > return PageHead(page) || PageTail(page);
> >
> > Or do we really need to change this function? Even a fake head still has
> > test_bit(PG_head, &page->flags) true; though it is not a real head, it
> > is still a compound page, right?
>
> Right. What PageCompound() returns cannot change. The new
> form is odd but efficient because the call to page_head_if_fake
> is eliminated. So I chose performance over readability. I'm not
> sure if it's worth it.

In order to improve readability, I'll introduce 3 helpers as follows.

1) page_head_or_fake(), which returns true for a head page
   or a fake head page.
2) page_head_is_fake(), which returns true for a fake head page.
3) page_tail_not_fake_head(), which returns true for a tail page
   that is not a fake head page.

In the end, PageHead(), PageTail() and PageCompound() become
the following.

static __always_inline int PageHead(struct page *page)
{
    return page_head_or_fake(page) && !page_head_is_fake(page);
}

static __always_inline int PageTail(struct page *page)
{
    return page_tail_not_fake_head(page) || page_head_is_fake(page);
}

static __always_inline int PageCompound(struct page *page)
{
    return page_head_or_fake(page) || page_tail_not_fake_head(page);
}

Do those look more readable?

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread
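
[Illustration] The three helpers are only named in the message above.
A minimal sketch of how they could sit on top of page_head_if_fake()
from patch 1/4 follows; the bodies are inferred from the descriptions
in this thread, not taken from the series:

static __always_inline bool page_head_is_fake(struct page *page)
{
	/* a fake head resolves to a head page struct other than itself */
	return page_head_if_fake(page) != page;
}

static __always_inline bool page_head_or_fake(struct page *page)
{
	/* both the real head and fake heads carry PG_head in ->flags */
	return test_bit(PG_head, &page->flags);
}

static __always_inline bool page_tail_not_fake_head(struct page *page)
{
	/* an ordinary tail page: bit 0 of ->compound_head is set */
	return READ_ONCE(page->compound_head) & 1;
}

Note that a fake head's own ->compound_head reads through to the real
head's field, whose bit 0 is clear, so the three predicates really are
disjoint in the way the PageHead()/PageTail()/PageCompound() rewrites
above assume. PageCompound() also keeps its old meaning: a fake head
is still part of a compound page, matching Barry's observation earlier
in the thread.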

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
@ 2021-09-21 10:22         ` Muchun Song
  0 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-21 10:22 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > page. However, we can remap all tail vmemmap pages to the page frame
> > > mapped to with the head vmemmap page. Finally, we can free 7 vmemmap
> > > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save
> > > extra 2GB memory when there is 1TB HugeTLB pages in the system
> > > compared with the current implementation).
> > >
> > > But the head vmemmap page is not freed to the buddy allocator and all
> > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > can see more than one struct page struct with PG_head (e.g. 8 per 2 MB
> > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > compound_head() to make it returns the real head struct page when the
> > > parameter is the tail struct page but with PG_head flag.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index bdb22006f713..a154a7b3b9a5 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -1606,7 +1606,7 @@
> > >                         [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > >                         enabled.
> > >                         Allows heavy hugetlb users to free up some more
> > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > >                         Format: { on | off (default) }
> > >
> > >                         on:  enable the feature
> > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -184,13 +184,64 @@ enum pageflags {
> > >
> > >  #ifndef __GENERATING_BOUNDS_H
> > >
> > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > +extern bool hugetlb_free_vmemmap_enabled;
> > > +
> > > +/*
> > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > + * vmemmap addresses map to the head vmemmap page frame (furture details can
> > > + * refer to the figure at the head of the mm/hugetlb_vmemmap.c).  In other
> > > + * word, there are more than one page struct with PG_head associated with each
> > > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > > + * page structs with PG_head are fake head page structs.  We need an approach
> > > + * to distinguish between those two different types of page structs so that
> > > + * compound_head() can return the real head page struct when the parameter is
> > > + * the tail page struct but with PG_head.
> > > + *
> > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > + */
> > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > +{
> > > +       if (!hugetlb_free_vmemmap_enabled)
> > > +               return page;
> > > +
> > > +       /*
> > > +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> > > +        * struct page. The alignment check aims to avoid accessing the fields
> > > +        * (e.g. compound_head) of @page[1]. It can avoid touching a (possibly)
> > > +        * cold cacheline in some cases.
> > > +        */
> > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > +           test_bit(PG_head, &page->flags)) {
> > > +               /*
> > > +                * We can safely access the fields of @page[1] when @page has PG_head
> > > +                * because the @page is a compound page composed of at least
> > > +                * two contiguous pages.
> > > +                */
> > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > +
> > > +               if (likely(head & 1))
> > > +                       return (const struct page *)(head - 1);
> > > +       }
> > > +
> > > +       return page;
> > > +}
> > > +#else
> > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > +{
> > > +       return page;
> > > +}
> > > +#endif
> > > +
> > >  static inline unsigned long _compound_head(const struct page *page)
> > >  {
> > >         unsigned long head = READ_ONCE(page->compound_head);
> > >
> > >         if (unlikely(head & 1))
> > >                 return head - 1;
> > > -       return (unsigned long)page;
> > > +       return (unsigned long)page_head_if_fake(page);
> >
> > hard to read. page_head_if_fake,  what is the other side of
> > page_head_if_not_fake?
>
> 1) return itself if the @page is not a fake head page.
> 2) return head page if @page is a fake head page.
>
> So I want to express that page_head_if_fake returns the
> real head page if and only if the parameter @page is a
> fake head page. Otherwise, it returns itself.
>
> > I would expect something like
> > page_to_page_head()
> > or
> > get_page_head()
> >
>
> Those names do not seem appropriate either, because the
> function does not guarantee that it returns a head
> page. If the parameter is a head page, it definitely
> returns a head page; otherwise, it may return itself, which
> may be a tail page.
>
> From this point of view, I still prefer page_head_if_fake.
>
> > Anyway, I am not quite sure what is the best name. but page_head_if_fake(page)
> > sounds odd to me. just like the things have two sides, but if_fake  presents
> > one side only.
>
> If others have any ideas, comments are welcome.
>
> >
> > >  }
> > >
> > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > >
> > >  static __always_inline int PageTail(struct page *page)
> > >  {
> > > -       return READ_ONCE(page->compound_head) & 1;
> > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > +              page_head_if_fake(page) != page;
> >
> > i would expect a wrapper like:
> > page_is_fake_head()
>
> Good point. Will do.
>
> >
> > and the above page_to_page_head() can leverage the wrapper.
> > here too.
> >
> > >  }
> > >
> > >  static __always_inline int PageCompound(struct page *page)
> > >  {
> > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > +       return test_bit(PG_head, &page->flags) ||
> > > +              READ_ONCE(page->compound_head) & 1;
> >
> > hard to read. could it be something like the below?
> > return PageHead(page) || PageTail(page);
> >
> > or do we really need to change this function? even a fake head still has
> > the true test_bit(PG_head, &page->flags), though it is not a real head, it
> > is still a pagecompound, right?
>
> Right. PageCompound()'s result cannot change. The new form is odd
> but efficient because a call to page_head_if_fake is eliminated.
> So I chose performance over readability. I'm not sure if it's
> worth it.

In order to improve readability, I'll introduce 3 helpers as follows.

1) page_head_or_fake(), which returns true for the head page
   or a fake head page.
2) page_head_is_fake(), which returns true for a fake head page.
3) page_tail_not_fake_head(), which returns true for a tail page
   other than a fake head page.

In the end, PageHead(), PageTail() and PageCompound() become
the following.

static __always_inline int PageHead(struct page *page)
{
    return page_head_or_fake(page) && !page_head_is_fake(page);
}

static __always_inline int PageTail(struct page *page)
{
    return page_tail_not_fake_head(page) || page_head_is_fake(page);
}

static __always_inline int PageCompound(struct page *page)
{
    return page_head_or_fake(page) || page_tail_not_fake_head(page);
}
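
Concretely, assuming the natural definitions (a sketch only:
page_head_or_fake() tests PG_head, page_tail_not_fake_head() tests
bit 0 of compound_head, and page_head_is_fake() checks
page_head_if_fake(page) != page), the helpers and the rewritten
tests should evaluate as:

                              real head  fake head  plain tail  base page
    page_head_or_fake()           1          1           0          0
    page_head_is_fake()           0          1           0          0
    page_tail_not_fake_head()     0          0           1          0
    PageHead()                    1          0           0          0
    PageTail()                    0          1           1          0
    PageCompound()                1          1           1          0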

Do those look more readable?

Thanks.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case
  2021-09-21  0:28         ` Barry Song
@ 2021-09-21 13:18           ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-21 13:18 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Tue, Sep 21, 2021 at 8:29 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 2:26 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 1:20 PM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > Since the head vmemmap page frame associated with each HugeTLB page is
> > > > reused, we should hide the PG_head flag of tail struct page from the
> > > > user. Add a test case to check whether it works properly.
> > > >
> > >
> > > TBH, I am a bit confused. I was thinking about some kernel unit tests to make
> > > sure those kernel APIs touched by this patchset are still working as before.
> > > This userspace test, while certainly useful for checking the content of page
> > > frames as expected, doesn't directly prove things haven't changed.
> > >
> > > In patch 1/4, a couple of APIs have the fixup for the fake head issue.
> > > Do you think a test like the below would be more sensible?
> > > 1. alloc 2MB hugeTLB
> >
> > It is done in main().
> >
> > > 2. get each page frame
> > > 3. apply those APIs in each page frame
> > > 4. Those APIs work completely the same as before.
> >
> > Reading the flags of a page via /proc/kpageflags is done
> > in stable_page_flags(), which invokes PageHead(),
> > PageTail(), PageCompound() and compound_head().
> > If those APIs work properly, the head page must have
> > bits 15 and 17 set, and tail pages must have bits 16 and 17
> > set but bit 15 clear.
> >
> > So I think check_page_flags() covers steps 2 to 4.
> > What do you think?
>
> yes. Thanks for your explanation. Given that, I think we just need some doc
> here to explain what it is checking. Something like:
> /*
>  * Pages other than the first page must be tail pages and must not be head;
>  * this also verifies that the kernel has correctly set the fake page_head
>  * to tail while hugetlb_free_vmemmap is enabled.
>  */

Got it. Will do. Thanks.

> +       for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
> +               read(fd, &pageflags, sizeof(pageflags));
> +               if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
> +                   (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
> +                       close(fd);
> +                       printf("Tail page flags (%lx) is invalid\n", pageflags);
> +                       return -1;
> +               }
> +       }
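
For reference, here is the same check as a standalone sketch (the KPF_*
bit numbers come from include/uapi/linux/kernel-page-flags.h; the
check_kpageflags() name and error handling are illustrative only, and
reading /proc/kpageflags requires root):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define KPF_COMPOUND_HEAD	15
#define KPF_COMPOUND_TAIL	16
#define KPF_HUGE		17

/*
 * Return 1 if the kpageflags entry for @pfn looks like a real head
 * (bits 15 and 17 set) when @head is true, or like a tail (bits 16
 * and 17 set, bit 15 clear) when @head is false; 0 otherwise, -1 on
 * error.
 */
static int check_kpageflags(uint64_t pfn, int head)
{
	uint64_t flags;
	int ret = -1;
	int fd = open("/proc/kpageflags", O_RDONLY);

	if (fd < 0)
		return -1;
	if (pread(fd, &flags, sizeof(flags), pfn * sizeof(flags)) ==
	    sizeof(flags)) {
		if (!(flags & (1ULL << KPF_HUGE)))
			ret = 0;
		else if (head)
			ret = !!(flags & (1ULL << KPF_COMPOUND_HEAD));
		else
			ret = (flags & (1ULL << KPF_COMPOUND_TAIL)) &&
			      !(flags & (1ULL << KPF_COMPOUND_HEAD));
	}
	close(fd);
	return ret;
}

The first pfn of the huge page should then give check_kpageflags(pfn, 1)
== 1, and every following pfn should give check_kpageflags(pfn, 0) == 1.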
> >
> > Thanks.
>
> Thanks
> barry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-21  0:11           ` Barry Song
@ 2021-09-21 13:46             ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-21 13:46 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Tue, Sep 21, 2021 at 8:11 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 10:23 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > >
> > > > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > > > page. However, we can remap all tail vmemmap pages to the page frame
> > > > > mapped by the head vmemmap page. Finally, we can free 7 vmemmap
> > > > > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save an
> > > > > extra 2GB of memory when there are 1TB of HugeTLB pages in the system
> > > > > compared with the current implementation).
> > > > >
> > > > > But the head vmemmap page is not freed to the buddy allocator and all
> > > > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > > > can see more than one struct page with PG_head (e.g. 8 per 2 MB
> > > > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > > > compound_head() to make it return the real head struct page when the
> > > > > parameter is a tail struct page that has the PG_head flag.
> > > > >
> > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > ---
> > > > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > > > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > > > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > > > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > > > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > index bdb22006f713..a154a7b3b9a5 100644
> > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > @@ -1606,7 +1606,7 @@
> > > > >                         [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > >                         enabled.
> > > > >                         Allows heavy hugetlb users to free up some more
> > > > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > > > >                         Format: { on | off (default) }
> > > > >
> > > > >                         on:  enable the feature
> > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > > > --- a/include/linux/page-flags.h
> > > > > +++ b/include/linux/page-flags.h
> > > > > @@ -184,13 +184,64 @@ enum pageflags {
> > > > >
> > > > >  #ifndef __GENERATING_BOUNDS_H
> > > > >
> > > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > +extern bool hugetlb_free_vmemmap_enabled;
> > > > > +
> > > > > +/*
> > > > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > > > + * vmemmap addresses map to the head vmemmap page frame (further details can
> > > > > + * be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > > > > + * words, there is more than one page struct with PG_head associated with each
> > > > > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > > > > + * page structs with PG_head are fake head page structs.  We need an approach
> > > > > + * to distinguish between those two different types of page structs so that
> > > > > + * compound_head() can return the real head page struct when the parameter is
> > > > > + * the tail page struct but with PG_head.
> > > > > + *
> > > > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > > + */
> > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > +{
> > > > > +       if (!hugetlb_free_vmemmap_enabled)
> > > > > +               return page;
> > > > > +
> > > > > +       /*
> > > > > +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> > > > > +        * struct page. The alignment check aims to avoid accessing the fields
> > > > > +        * (e.g. compound_head) of @page[1]. It can avoid touching a (possibly)
> > > > > +        * cold cacheline in some cases.
> > > > > +        */
> > > > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > > > +           test_bit(PG_head, &page->flags)) {
> > > > > +               /*
> > > > > +                * We can safely access the fields of @page[1] when @page has PG_head
> > > > > +                * because the @page is a compound page composed of at least
> > > > > +                * two contiguous pages.
> > > > > +                */
> > > > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > > > +
> > > > > +               if (likely(head & 1))
> > > > > +                       return (const struct page *)(head - 1);
> > > > > +       }
> > > > > +
> > > > > +       return page;
> > > > > +}
> > > > > +#else
> > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > +{
> > > > > +       return page;
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > >  static inline unsigned long _compound_head(const struct page *page)
> > > > >  {
> > > > >         unsigned long head = READ_ONCE(page->compound_head);
> > > > >
> > > > >         if (unlikely(head & 1))
> > > > >                 return head - 1;
> > > > > -       return (unsigned long)page;
> > > > > +       return (unsigned long)page_head_if_fake(page);
> > > >
> > > > hard to read. page_head_if_fake,  what is the other side of
> > > > page_head_if_not_fake?
> > >
> > > 1) return itself if the @page is not a fake head page.
> > > 2) return head page if @page is a fake head page.
> > >
> > > So I want to express that page_head_if_fake returns the
> > > real head page if and only if the parameter @page is a
> > > fake head page. Otherwise, it returns itself.
> > >
> > > > I would expect something like
> > > > page_to_page_head()
> > > > or
> > > > get_page_head()
> > > >
> > >
> > > Those names do not seem appropriate either, because the
> > > function does not guarantee that it returns a head
> > > page. If the parameter is a head page, it definitely
> > > returns a head page; otherwise, it may return itself, which
> > > may be a tail page.
> > >
> > > From this point of view, I still prefer page_head_if_fake.
> > >
> > > > Anyway, I am not quite sure what is the best name. but page_head_if_fake(page)
> > > > sounds odd to me. just like the things have two sides, but if_fake  presents
> > > > one side only.
> > >
> > > If others have any ideas, comments are welcome.
> > >
> > > >
> > > > >  }
> > > > >
> > > > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > > > >
> > > > >  static __always_inline int PageTail(struct page *page)
> > > > >  {
> > > > > -       return READ_ONCE(page->compound_head) & 1;
> > > > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > > > +              page_head_if_fake(page) != page;
> > > >
> > > > i would expect a wrapper like:
> > > > page_is_fake_head()
> > >
> > > Good point. Will do.
> > >
> > > >
> > > > and the above page_to_page_head() can leverage the wrapper.
> > > > here too.
> > > >
> > > > >  }
> > > > >
> > > > >  static __always_inline int PageCompound(struct page *page)
> > > > >  {
> > > > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > > > +       return test_bit(PG_head, &page->flags) ||
> > > > > +              READ_ONCE(page->compound_head) & 1;
> > > >
> > > > hard to read. could it be something like the below?
> > > > return PageHead(page) || PageTail(page);
> > > >
> > > > or do we really need to change this function? even a fake head still has
> > > > the true test_bit(PG_head, &page->flags), though it is not a real head, it
> > > > is still a pagecompound, right?
> > >
> > > Right. PageCompound()'s result cannot change. The new form is odd
> > > but efficient because a call to page_head_if_fake is eliminated.
> > > So I chose performance over readability. I'm not sure if it's
> > > worth it.
> >
> > In order to improve readability, I'll introduce 3 helpers as follows.
> >
> > 1) page_head_or_fake(), which returns true for the head page
> >    or a fake head page.
> > 2) page_head_is_fake(), which returns true for a fake head page.
> > 3) page_tail_not_fake_head(), which returns true for a tail page
> >    other than a fake head page.
> >
> > In the end, PageHead(), PageTail() and PageCompound() become
> > the following.
> >
> > static __always_inline int PageHead(struct page *page)
> > {
> >     return page_head_or_fake(page) && !page_head_is_fake(page);
> > }
> >
> > static __always_inline int PageTail(struct page *page)
> > {
> >     return page_tail_not_fake_head(page) || page_head_is_fake(page);
> > }
> >
> > static __always_inline int PageCompound(struct page *page)
> > {
> >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > }
> >
> > Do those look more readable?
> >
>
> still not good enough. On second thought, page_head_if_fake seems
> to have the best performance, though what this function returns is a bit odd.
> I just made a small refinement to the comment in your code:

Right. page_head_if_fake is the choice for performance.

>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 2c0d11e71e26..240c2fca13c7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -197,8 +197,9 @@ extern bool hugetlb_free_vmemmap_enabled;
>   * compound_head() can return the real head page struct when the parameter is
>   * the tail page struct but with PG_head.
>   *
> - * The page_head_if_fake() returns the real head page struct iff the @page may
> - * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> + * The page_head_if_fake() returns the real head page struct if the @page is
> + * a fake page head, otherwise, it returns @page, which can either be a true
> + * page head or a tail.
>   */

Good annotation.

>  static __always_inline const struct page *page_head_if_fake(const struct page *page)
>  {
> @@ -226,6 +227,12 @@ static __always_inline const struct page *page_head_if_fake(const struct page *p
>
>         return page;
>  }
> +
> +static __always_inline bool page_is_fake_head(const struct page *page)
> +{
> +       return page_head_if_fake(page) != page;
> +}
> +
>  #else
>  static __always_inline const struct page *page_head_if_fake(const struct page *page)
>  {
> @@ -247,7 +254,7 @@ static inline unsigned long _compound_head(const struct page *page)
>  static __always_inline int PageTail(struct page *page)
>  {
>         return READ_ONCE(page->compound_head) & 1 ||
> -              page_head_if_fake(page) != page;
> +              page_is_fake_head(page);
>  }

Yeah, this makes PageTail more readable. In your previous reply,
you asked why not use PageTail in PageCompound directly
to improve code readability. So I want to introduce 2 more helpers
besides page_is_fake_head().

static __always_inline int page_tail_not_fake_head(struct page *page)
{
    return READ_ONCE(page->compound_head) & 1;
}

static __always_inline int page_head_or_fake(struct page *page)
{
    return test_bit(PG_head, &page->flags);
}

Then PageTail() and PageCompound() change to the following.

static __always_inline int PageTail(struct page *page)
{
    return page_tail_not_fake_head(page) || page_is_fake_head(page);
}

static __always_inline int PageCompound(struct page *page)
{
    return page_head_or_fake(page) || page_tail_not_fake_head(page);
}
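
For completeness, PageHead() would then follow the same pattern (a
sketch only, reusing page_is_fake_head() from your diff above):

static __always_inline int PageHead(struct page *page)
{
    return page_head_or_fake(page) && !page_is_fake_head(page);
}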

The helper names act as self-annotation, so I think PageTail
and PageCompound become readable as well. But you said "still
not good enough". Is that because of the helper names or the
added complexity?

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-21 13:46             ` Muchun Song
@ 2021-09-21 20:43               ` Barry Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-21 20:43 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Wed, Sep 22, 2021 at 1:46 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Sep 21, 2021 at 8:11 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Tue, Sep 21, 2021 at 10:23 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> > > > >
> > > > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > >
> > > > > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > > > > page. However, we can remap all tail vmemmap pages to the page frame
> > > > > > mapped by the head vmemmap page. Finally, we can free 7 vmemmap
> > > > > > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save an
> > > > > > extra 2GB of memory when there are 1TB of HugeTLB pages in the system
> > > > > > compared with the current implementation).
> > > > > >
> > > > > > But the head vmemmap page is not freed to the buddy allocator and all
> > > > > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > > > > can see more than one struct page with PG_head (e.g. 8 per 2 MB
> > > > > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > > > > compound_head() to make it return the real head struct page when the
> > > > > > parameter is a tail struct page that has the PG_head flag.
> > > > > >
> > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > > ---
> > > > > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > > > > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > > > > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > > > > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > > > > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > > > > >
> > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > index bdb22006f713..a154a7b3b9a5 100644
> > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > @@ -1606,7 +1606,7 @@
> > > > > >                         [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > >                         enabled.
> > > > > >                         Allows heavy hugetlb users to free up some more
> > > > > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > >                         Format: { on | off (default) }
> > > > > >
> > > > > >                         on:  enable the feature
> > > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > > > > --- a/include/linux/page-flags.h
> > > > > > +++ b/include/linux/page-flags.h
> > > > > > @@ -184,13 +184,64 @@ enum pageflags {
> > > > > >
> > > > > >  #ifndef __GENERATING_BOUNDS_H
> > > > > >
> > > > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > +extern bool hugetlb_free_vmemmap_enabled;
> > > > > > +
> > > > > > +/*
> > > > > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > > > > + * vmemmap addresses map to the head vmemmap page frame (further details can
> > > > > > + * be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > > > > > + * words, there is more than one page struct with PG_head associated with each
> > > > > > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > > > > > + * page structs with PG_head are fake head page structs.  We need an approach
> > > > > > + * to distinguish between those two different types of page structs so that
> > > > > > + * compound_head() can return the real head page struct when the parameter is
> > > > > > + * the tail page struct but with PG_head.
> > > > > > + *
> > > > > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > > > + */
> > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > +{
> > > > > > +       if (!hugetlb_free_vmemmap_enabled)
> > > > > > +               return page;
> > > > > > +
> > > > > > +       /*
> > > > > > +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> > > > > > +        * struct page. The alignment check aims to avoid accessing the fields
> > > > > > +        * (e.g. compound_head) of @page[1]. It can avoid touching a (possibly)
> > > > > > +        * cold cacheline in some cases.
> > > > > > +        */
> > > > > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > > > > +           test_bit(PG_head, &page->flags)) {
> > > > > > +               /*
> > > > > > +                * We can safely access the fields of @page[1] when @page has PG_head
> > > > > > +                * because the @page is a compound page composed of at least
> > > > > > +                * two contiguous pages.
> > > > > > +                */
> > > > > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > > > > +
> > > > > > +               if (likely(head & 1))
> > > > > > +                       return (const struct page *)(head - 1);
> > > > > > +       }
> > > > > > +
> > > > > > +       return page;
> > > > > > +}
> > > > > > +#else
> > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > +{
> > > > > > +       return page;
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > >  static inline unsigned long _compound_head(const struct page *page)
> > > > > >  {
> > > > > >         unsigned long head = READ_ONCE(page->compound_head);
> > > > > >
> > > > > >         if (unlikely(head & 1))
> > > > > >                 return head - 1;
> > > > > > -       return (unsigned long)page;
> > > > > > +       return (unsigned long)page_head_if_fake(page);
> > > > >
> > > > > hard to read. page_head_if_fake,  what is the other side of
> > > > > page_head_if_not_fake?
> > > >
> > > > 1) return itself if the @page is not a fake head page.
> > > > 2) return head page if @page is a fake head page.
> > > >
> > > > So I want to express that page_head_if_fake returns the
> > > > real head page if and only if the parameter @page is a
> > > > fake head page. Otherwise, it returns itself.
> > > >
> > > > > I would expect something like
> > > > > page_to_page_head()
> > > > > or
> > > > > get_page_head()
> > > > >
> > > >
> > > > Those names do not seem appropriate either, because the
> > > > function does not guarantee that it returns a head
> > > > page. If the parameter is a head page, it definitely
> > > > returns a head page; otherwise, it may return itself, which
> > > > may be a tail page.
> > > >
> > > > From this point of view, I still prefer page_head_if_fake.
> > > >
> > > > > Anyway, I am not quite sure what is the best name. but page_head_if_fake(page)
> > > > > sounds odd to me. just like the things have two sides, but if_fake  presents
> > > > > one side only.
> > > >
> > > > If others have any ideas, comments are welcome.
> > > >
> > > > >
> > > > > >  }
> > > > > >
> > > > > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > > > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > > > > >
> > > > > >  static __always_inline int PageTail(struct page *page)
> > > > > >  {
> > > > > > -       return READ_ONCE(page->compound_head) & 1;
> > > > > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > > > > +              page_head_if_fake(page) != page;
> > > > >
> > > > > i would expect a wrapper like:
> > > > > page_is_fake_head()
> > > >
> > > > Good point. Will do.
> > > >
> > > > >
> > > > > and the above page_to_page_head() can leverage the wrapper.
> > > > > here too.
> > > > >
> > > > > >  }
> > > > > >
> > > > > >  static __always_inline int PageCompound(struct page *page)
> > > > > >  {
> > > > > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > > > > +       return test_bit(PG_head, &page->flags) ||
> > > > > > +              READ_ONCE(page->compound_head) & 1;
> > > > >
> > > > > hard to read. could it be something like the below?
> > > > > return PageHead(page) || PageTail(page);
> > > > >
> > > > > or do we really need to change this function? even a fake head still has
> > > > > the true test_bit(PG_head, &page->flags), though it is not a real head, it
> > > > > is still a pagecompound, right?
> > > >
> > > > Right. PageCompound()'s result cannot change. The new form is odd
> > > > but efficient because a call to page_head_if_fake is eliminated.
> > > > So I chose performance over readability. I'm not sure if it's
> > > > worth it.
> > >
> > > In order to improve readability, I'll introduce 3 helpers as follows.
> > >
> > > 1) page_head_or_fake(), which returns true for the head page
> > >    or a fake head page.
> > > 2) page_head_is_fake(), which returns true for a fake head page.
> > > 3) page_tail_not_fake_head(), which returns true for a tail page
> > >    other than a fake head page.
> > >
> > > In the end, PageHead(), PageTail() and PageCompound() become
> > > the following.
> > >
> > > static __always_inline int PageHead(struct page *page)
> > > {
> > >     return page_head_or_fake(page) && !page_head_is_fake(page);
> > > }
> > >
> > > static __always_inline int PageTail(struct page *page)
> > > {
> > >     return page_tail_not_fake_head(page) || page_head_is_fake(page);
> > > }
> > >
> > > static __always_inline int PageCompound(struct page *page)
> > > {
> > >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > > }
> > >
> > > Do those look more readable?
> > >
> >
> > still not good enough. On second thought, page_head_if_fake seems
> > to have the best performance, though what this function returns is a bit odd.
> > I just made a small refinement to the comment in your code:
>
> Right. page_head_if_fake is the choice for performance.
>
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 2c0d11e71e26..240c2fca13c7 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -197,8 +197,9 @@ extern bool hugetlb_free_vmemmap_enabled;
> >   * compound_head() can return the real head page struct when the parameter is
> >   * the tail page struct but with PG_head.
> >   *
> > - * The page_head_if_fake() returns the real head page struct iff the @page may
> > - * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > + * The page_head_if_fake() returns the real head page struct if the @page is
> > + * a fake page head, otherwise, it returns @page, which can either be a true
> > + * page head or a tail.
> >   */
>
> Good annotation.
>
> >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> >  {
> > @@ -226,6 +227,12 @@ static __always_inline const struct page *page_head_if_fake(const struct page *p
> >
> >         return page;
> >  }
> > +
> > +static __always_inline bool page_is_fake_head(const struct page *page)
> > +{
> > +       return page_head_if_fake(page) != page;
> > +}
> > +
> >  #else
> >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> >  {
> > @@ -247,7 +254,7 @@ static inline unsigned long _compound_head(const struct page *page)
> >  static __always_inline int PageTail(struct page *page)
> >  {
> >         return READ_ONCE(page->compound_head) & 1 ||
> > -              page_head_if_fake(page) != page;
> > +              page_is_fake_head(page);
> >  }
>
> Yeah, this makes PageTail more readable. In your previous reply,
> you asked why not use PageTail in PageCompound directly
> to improve code readability. So I want to introduce 2 more helpers
> besides page_is_fake_head().
>
> static __always_inline int page_tail_not_fake_head(struct page *page)
> {
>     return READ_ONCE(page->compound_head) & 1;
> }
>
> static __always_inline int page_head_or_fake(struct page *page)
> {
>     return test_bit(PG_head, &page->flags);
> }
>
> Then PageTail() and PageCompound() change to the following.
>
> static __always_inline int PageTail(struct page *page)
> {
>     return page_tail_not_fake_head(page) || page_is_fake_head(page);
> }
>
> static __always_inline int PageCompound(struct page *page)
> {
>     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> }
>
> The helper names act as self-annotation, so I think PageTail
> and PageCompound become readable as well. But you said "still
> not good enough". Is that because of the helper names or the
> added complexity?

I really don't think it is worth this complexity. If there is anything to make
the code more readable, I would rename page_head_if_fake() to
page_fixed_dup_head().

This function fixes up the page:
1. if the page is a fake head, we need to return its true head (things
get fixed);
2. if the page is not a fake head, in other words, it is either a true
head or a tail, there is no need to fix anything.
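
In code, the suggestion amounts to something like the below (a sketch
only, shown as a wrapper to keep it short; the real change would just
rename the existing function):

static __always_inline const struct page *page_fixed_dup_head(const struct page *page)
{
    /*
     * A fake head gets fixed up to the real head; anything else
     * (true head or tail) is returned unchanged.
     */
    return page_head_if_fake(page);
}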

>
> Thanks.

Thanks
barry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
@ 2021-09-21 20:43               ` Barry Song
  0 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-21 20:43 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Wed, Sep 22, 2021 at 1:46 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Sep 21, 2021 at 8:11 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Tue, Sep 21, 2021 at 10:23 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> > > > >
> > > > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > >
> > > > > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > > > > page. However, we can remap all tail vmemmap pages to the page frame
> > > > > > mapped to with the head vmemmap page. Finally, we can free 7 vmemmap
> > > > > > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save
> > > > > > extra 2GB memory when there is 1TB HugeTLB pages in the system
> > > > > > compared with the current implementation).
> > > > > >
> > > > > > But the head vmemmap page is not freed to the buddy allocator and all
> > > > > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > > > > can see more than one struct page struct with PG_head (e.g. 8 per 2 MB
> > > > > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > > > > compound_head() to make it returns the real head struct page when the
> > > > > > parameter is the tail struct page but with PG_head flag.
> > > > > >
> > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > > ---
> > > > > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > > > > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > > > > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > > > > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > > > > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > > > > >
> > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > index bdb22006f713..a154a7b3b9a5 100644
> > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > @@ -1606,7 +1606,7 @@
> > > > > >                         [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > >                         enabled.
> > > > > >                         Allows heavy hugetlb users to free up some more
> > > > > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > >                         Format: { on | off (default) }
> > > > > >
> > > > > >                         on:  enable the feature
> > > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > > > > --- a/include/linux/page-flags.h
> > > > > > +++ b/include/linux/page-flags.h
> > > > > > @@ -184,13 +184,64 @@ enum pageflags {
> > > > > >
> > > > > >  #ifndef __GENERATING_BOUNDS_H
> > > > > >
> > > > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > +extern bool hugetlb_free_vmemmap_enabled;
> > > > > > +
> > > > > > +/*
> > > > > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > > > > + * vmemmap addresses map to the head vmemmap page frame (furture details can
> > > > > > + * refer to the figure at the head of the mm/hugetlb_vmemmap.c).  In other
> > > > > > + * word, there are more than one page struct with PG_head associated with each
> > > > > > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > > > > > + * page structs with PG_head are fake head page structs.  We need an approach
> > > > > > + * to distinguish between those two different types of page structs so that
> > > > > > + * compound_head() can return the real head page struct when the parameter is
> > > > > > + * the tail page struct but with PG_head.
> > > > > > + *
> > > > > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > > > + */
> > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > +{
> > > > > > +       if (!hugetlb_free_vmemmap_enabled)
> > > > > > +               return page;
> > > > > > +
> > > > > > +       /*
> > > > > > +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> > > > > > +        * struct page. The alignment check aims to avoid access the fields (
> > > > > > +        * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
> > > > > > +        * cold cacheline in some cases.
> > > > > > +        */
> > > > > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > > > > +           test_bit(PG_head, &page->flags)) {
> > > > > > +               /*
> > > > > > +                * We can safely access the fields of @page[1] when PG_head is set,
> > > > > > +                * because @page is part of a compound page composed of at least
> > > > > > +                * two contiguous pages.
> > > > > > +                */
> > > > > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > > > > +
> > > > > > +               if (likely(head & 1))
> > > > > > +                       return (const struct page *)(head - 1);
> > > > > > +       }
> > > > > > +
> > > > > > +       return page;
> > > > > > +}
> > > > > > +#else
> > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > +{
> > > > > > +       return page;
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > >  static inline unsigned long _compound_head(const struct page *page)
> > > > > >  {
> > > > > >         unsigned long head = READ_ONCE(page->compound_head);
> > > > > >
> > > > > >         if (unlikely(head & 1))
> > > > > >                 return head - 1;
> > > > > > -       return (unsigned long)page;
> > > > > > +       return (unsigned long)page_head_if_fake(page);
> > > > >
> > > > > hard to read. page_head_if_fake: what is the other side,
> > > > > page_head_if_not_fake?
> > > >
> > > > 1) return itself if the @page is not a fake head page.
> > > > 2) return head page if @page is a fake head page.
> > > >
> > > > So I want to express that page_head_if_fake returns a
> > > > head page if and only if the @page parameter is a
> > > > fake head page. Otherwise, it returns itself.
> > > >
> > > > > I would expect something like
> > > > > page_to_page_head()
> > > > > or
> > > > > get_page_head()
> > > > >
> > > >
> > > > Those names do not seem appropriate either, because
> > > > the function does not guarantee that it returns a head
> > > > page. If the parameter is a head page, it definitely
> > > > returns a head page; otherwise, it may return itself, which
> > > > may be a tail page.
> > > >
> > > > From this point of view, I still prefer page_head_if_fake.
> > > >
> > > > > Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> > > > > sounds odd to me. Things have two sides, but if_fake presents
> > > > > one side only.
> > > >
> > > > If others have any ideas, comments are welcome.
> > > >
> > > > >
> > > > > >  }
> > > > > >
> > > > > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > > > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > > > > >
> > > > > >  static __always_inline int PageTail(struct page *page)
> > > > > >  {
> > > > > > -       return READ_ONCE(page->compound_head) & 1;
> > > > > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > > > > +              page_head_if_fake(page) != page;
> > > > >
> > > > > I would expect a wrapper like:
> > > > > page_is_fake_head()
> > > >
> > > > Good point. Will do.
> > > >
> > > > >
> > > > > and the above page_to_page_head() can leverage the wrapper.
> > > > > here too.
> > > > >
> > > > > >  }
> > > > > >
> > > > > >  static __always_inline int PageCompound(struct page *page)
> > > > > >  {
> > > > > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > > > > +       return test_bit(PG_head, &page->flags) ||
> > > > > > +              READ_ONCE(page->compound_head) & 1;
> > > > >
> > > > > hard to read. could it be something like the below?
> > > > > return PageHead(page) || PageTail(page);
> > > > >
> > > > > Or do we really need to change this function? Even a fake head still has
> > > > > test_bit(PG_head, &page->flags) true; though it is not a real head, it
> > > > > is still a compound page, right?
> > > >
> > > > Right. PageCompound() does not need to change semantically. The
> > > > open-coded form is odd but efficient because the call to
> > > > page_head_if_fake is eliminated. So I chose performance over
> > > > readability. I'm not sure if it's worth it.
> > >
> > > In order to improve readability, I'll introduce 3 helpers as follows.
> > >
> > > 1) page_head_or_fake(), which returns true for the head page
> > >    or fake head page.
> > > 2) page_head_is_fake(), which returns true for fake head page.
> > > 3) page_tail_not_fake_head(), which returns true for the tail page
> > >    except the fake head page.
> > >
> > > In the end, PageHead(), PageTail() and PageCompound() become
> > > the following.
> > >
> > > static __always_inline int PageHead(struct page *page)
> > > {
> > >     return page_head_or_fake(page) && !page_head_is_fake(page);
> > > }
> > >
> > > static __always_inline int PageTail(struct page *page)
> > > {
> > >     return page_tail_not_fake_head(page) || page_head_is_fake(page);
> > > }
> > >
> > > static __always_inline int PageCompound(struct page *page)
> > > {
> > >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > > }
> > >
> > > Do those look more readable?
> > >
> >
> > still not good enough. On second thought, page_head_if_fake seems
> > to have the best performance, though this function returns an odd value.
> > I just made a small refinement to your code in the doc:
>
> Right. page_head_if_fake is the choice for performance.
>
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 2c0d11e71e26..240c2fca13c7 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -197,8 +197,9 @@ extern bool hugetlb_free_vmemmap_enabled;
> >   * compound_head() can return the real head page struct when the parameter is
> >   * the tail page struct but with PG_head.
> >   *
> > - * The page_head_if_fake() returns the real head page struct iff the @page may
> > - * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > + * The page_head_if_fake() returns the real head page struct if the @page is
> > + * a fake page head; otherwise, it returns @page, which can be either a true
> > + * page head or a tail.
> >   */
>
> Good annotation.
>
> >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> >  {
> > @@ -226,6 +227,12 @@ static __always_inline const struct page *page_head_if_fake(const struct page *p
> >
> >         return page;
> >  }
> > +
> > +static __always_inline bool page_is_fake_head(const struct page *page)
> > +{
> > +       return page_head_if_fake(page) != page;
> > +}
> > +
> >  #else
> >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> >  {
> > @@ -247,7 +254,7 @@ static inline unsigned long _compound_head(const struct page *page)
> >  static __always_inline int PageTail(struct page *page)
> >  {
> >         return READ_ONCE(page->compound_head) & 1 ||
> > -              page_head_if_fake(page) != page;
> > +              page_is_fake_head(page);
> >  }
>
> Yeah, this makes PageTail more readable. In your previous reply,
> you asked why not use PageTail in PageCompound directly
> to improve code readability. So I want to introduce 2 more helpers
> besides page_is_fake_head().
>
> static __always_inline int page_tail_not_fake_head(struct page *page)
> {
>     return READ_ONCE(page->compound_head) & 1;
> }
>
> static __always_inline int page_head_or_fake(struct page *page)
> {
>     return test_bit(PG_head, &page->flags);
> }
>
> Then PageTail() and PageCompound() change to the following.
>
> static __always_inline int PageTail(struct page *page)
> {
>     return page_tail_not_fake_head(page) || page_is_fake_head(page);
> }
>
> static __always_inline int PageCompound(struct page *page)
> {
>     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> }
>
> From the point of view of the helper names, they act as self-annotation.
> So I think PageTail and PageCompound become readable
> as well. But you said "still not good enough". Is it because of
> the names of the helpers, or because of the added complexity?

I really don't think it is worth this complexity. If there is anything that would make
the code more readable, it would be renaming page_head_if_fake() to
page_fixed_dup_head().

this function fixes up the page:
1. if the page is a fake head, we need to return its true head (things
get fixed);
2. if the page is not a fake head, in other words it is either a true
head or a tail, there is nothing to fix.

>
> Thanks.

Thanks
barry


^ permalink raw reply	[flat|nested] 47+ messages in thread
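
[A note for readers working through the thread: the fixup discussed above
hinges on the bit-0 encoding of compound_head. Below is a minimal,
self-contained userspace sketch of that encoding, not kernel code. It drops
the hugetlb_free_vmemmap_enabled and PAGE_SIZE-alignment checks, and it
simulates the vmemmap aliasing by copying the head's bytes into one tail
slot; all names are simplified stand-ins for the real struct page and
helpers.]

#include <assert.h>
#include <stdio.h>

#define PG_HEAD 0x1UL

struct page {
	unsigned long flags;		/* PG_HEAD only, in this model */
	unsigned long compound_head;	/* tail: address of head | 1 */
};

/* Simplified stand-in for the kernel's page_head_if_fake(). */
static const struct page *page_head_if_fake(const struct page *page)
{
	if (page->flags & PG_HEAD) {
		/*
		 * Bit 0 set in page[1].compound_head means the next page
		 * is a tail, and it encodes the address of the true head.
		 */
		unsigned long head = page[1].compound_head;

		if (head & 1)
			return (const struct page *)(head - 1);
	}
	return page;
}

int main(void)
{
	struct page pages[8] = { { 0 } };
	int i;

	pages[0].flags = PG_HEAD;		/* the real head */
	for (i = 1; i < 8; i++)			/* tails point at the head */
		pages[i].compound_head = (unsigned long)&pages[0] | 1;
	pages[4] = pages[0];	/* aliasing: this tail reads like the head */

	assert(page_head_if_fake(&pages[0]) == &pages[0]);	/* real head */
	assert(page_head_if_fake(&pages[4]) == &pages[0]);	/* fake head fixed up */
	assert(page_head_if_fake(&pages[2]) == &pages[2]);	/* plain tail kept */
	puts("ok");
	return 0;
}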

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-21 20:43               ` Barry Song
@ 2021-09-22  2:38                 ` Muchun Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Muchun Song @ 2021-09-22  2:38 UTC (permalink / raw)
  To: Barry Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Wed, Sep 22, 2021 at 4:43 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Sep 22, 2021 at 1:46 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Tue, Sep 21, 2021 at 8:11 PM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Tue, Sep 21, 2021 at 10:23 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > >
> > > > > On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > > >
> > > > > > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > > > > > page. However, we can remap all tail vmemmap pages to the page frame
> > > > > > > mapped by the head vmemmap page. Finally, we can free 7 vmemmap
> > > > > > > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save an
> > > > > > > extra 2GB of memory when there is 1TB of HugeTLB pages in the system
> > > > > > > compared with the current implementation).
> > > > > > >
> > > > > > > But the head vmemmap page is not freed to the buddy allocator and all
> > > > > > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > > > > > can see more than one struct page with PG_head (e.g. 8 per 2MB
> > > > > > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > > > > > compound_head() to make it return the real head struct page when the
> > > > > > > parameter is a tail struct page but with the PG_head flag.
> > > > > > >
> > > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > > > ---
> > > > > > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > > > > > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > > > > > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > > > > > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > > > > > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > > > > > >
> > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > index bdb22006f713..a154a7b3b9a5 100644
> > > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > @@ -1606,7 +1606,7 @@
> > > > > > >                         [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > >                         enabled.
> > > > > > >                         Allows heavy hugetlb users to free up some more
> > > > > > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > >                         Format: { on | off (default) }
> > > > > > >
> > > > > > >                         on:  enable the feature
> > > > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > > > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > > > > > --- a/include/linux/page-flags.h
> > > > > > > +++ b/include/linux/page-flags.h
> > > > > > > @@ -184,13 +184,64 @@ enum pageflags {
> > > > > > >
> > > > > > >  #ifndef __GENERATING_BOUNDS_H
> > > > > > >
> > > > > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > > +extern bool hugetlb_free_vmemmap_enabled;
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > > > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > > > > > + * vmemmap addresses map to the head vmemmap page frame (further details can
> > > > > > > + * be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > > > > > > + * words, there is more than one page struct with PG_head associated with each
> > > > > > > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > > > > > > + * page structs with PG_head are fake head page structs.  We need an approach
> > > > > > > + * to distinguish between those two different types of page structs so that
> > > > > > > + * compound_head() can return the real head page struct when the parameter is
> > > > > > > + * the tail page struct but with PG_head.
> > > > > > > + *
> > > > > > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > > > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > > > > + */
> > > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > > +{
> > > > > > > +       if (!hugetlb_free_vmemmap_enabled)
> > > > > > > +               return page;
> > > > > > > +
> > > > > > > +       /*
> > > > > > > +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> > > > > > > +        * struct pages. The alignment check aims to avoid accessing the fields
> > > > > > > +        * (e.g. compound_head) of @page[1], which avoids touching a (possibly)
> > > > > > > +        * cold cacheline in some cases.
> > > > > > > +        */
> > > > > > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > > > > > +           test_bit(PG_head, &page->flags)) {
> > > > > > > +               /*
> > > > > > > +                * We can safely access the fields of @page[1] when PG_head is set,
> > > > > > > +                * because @page is part of a compound page composed of at least
> > > > > > > +                * two contiguous pages.
> > > > > > > +                */
> > > > > > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > > > > > +
> > > > > > > +               if (likely(head & 1))
> > > > > > > +                       return (const struct page *)(head - 1);
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       return page;
> > > > > > > +}
> > > > > > > +#else
> > > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > > +{
> > > > > > > +       return page;
> > > > > > > +}
> > > > > > > +#endif
> > > > > > > +
> > > > > > >  static inline unsigned long _compound_head(const struct page *page)
> > > > > > >  {
> > > > > > >         unsigned long head = READ_ONCE(page->compound_head);
> > > > > > >
> > > > > > >         if (unlikely(head & 1))
> > > > > > >                 return head - 1;
> > > > > > > -       return (unsigned long)page;
> > > > > > > +       return (unsigned long)page_head_if_fake(page);
> > > > > >
> > > > > > hard to read. page_head_if_fake: what is the other side,
> > > > > > page_head_if_not_fake?
> > > > >
> > > > > 1) return itself if the @page is not a fake head page.
> > > > > 2) return head page if @page is a fake head page.
> > > > >
> > > > > So I want to express that page_head_if_fake returns a
> > > > > head page if and only if the @page parameter is a
> > > > > fake head page. Otherwise, it returns itself.
> > > > >
> > > > > > I would expect something like
> > > > > > page_to_page_head()
> > > > > > or
> > > > > > get_page_head()
> > > > > >
> > > > >
> > > > > Those names do not seem appropriate either, because
> > > > > the function does not guarantee that it returns a head
> > > > > page. If the parameter is a head page, it definitely
> > > > > returns a head page; otherwise, it may return itself, which
> > > > > may be a tail page.
> > > > >
> > > > > From this point of view, I still prefer page_head_if_fake.
> > > > >
> > > > > > Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> > > > > > sounds odd to me. Things have two sides, but if_fake presents
> > > > > > one side only.
> > > > >
> > > > > If others have any ideas, comments are welcome.
> > > > >
> > > > > >
> > > > > > >  }
> > > > > > >
> > > > > > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > > > > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > > > > > >
> > > > > > >  static __always_inline int PageTail(struct page *page)
> > > > > > >  {
> > > > > > > -       return READ_ONCE(page->compound_head) & 1;
> > > > > > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > > > > > +              page_head_if_fake(page) != page;
> > > > > >
> > > > > > I would expect a wrapper like:
> > > > > > page_is_fake_head()
> > > > >
> > > > > Good point. Will do.
> > > > >
> > > > > >
> > > > > > and the above page_to_page_head() can leverage the wrapper.
> > > > > > here too.
> > > > > >
> > > > > > >  }
> > > > > > >
> > > > > > >  static __always_inline int PageCompound(struct page *page)
> > > > > > >  {
> > > > > > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > > > > > +       return test_bit(PG_head, &page->flags) ||
> > > > > > > +              READ_ONCE(page->compound_head) & 1;
> > > > > >
> > > > > > hard to read. could it be something like the below?
> > > > > > return PageHead(page) || PageTail(page);
> > > > > >
> > > > > > Or do we really need to change this function? Even a fake head still has
> > > > > > test_bit(PG_head, &page->flags) true; though it is not a real head, it
> > > > > > is still a compound page, right?
> > > > >
> > > > > Right. PageCompound() does not need to change semantically. The
> > > > > open-coded form is odd but efficient because the call to
> > > > > page_head_if_fake is eliminated. So I chose performance over
> > > > > readability. I'm not sure if it's worth it.
> > > >
> > > > In order to improve readability, I'll introduce 3 helpers as follows.
> > > >
> > > > 1) page_head_or_fake(), which returns true for the head page
> > > >    or fake head page.
> > > > 2) page_head_is_fake(), which returns true for fake head page.
> > > > 3) page_tail_not_fake_head(), which returns true for the tail page
> > > >    except the fake head page.
> > > >
> > > > In the end, PageHead(), PageTail() and PageCompound() become
> > > > the following.
> > > >
> > > > static __always_inline int PageHead(struct page *page)
> > > > {
> > > >     return page_head_or_fake(page) && !page_head_is_fake(page);
> > > > }
> > > >
> > > > static __always_inline int PageTail(struct page *page)
> > > > {
> > > >     return page_tail_not_fake_head(page) || page_head_is_fake(page);
> > > > }
> > > >
> > > > static __always_inline int PageCompound(struct page *page)
> > > > {
> > > >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > > > }
> > > >
> > > > Do those look more readable?
> > > >
> > >
> > > still not good enough. On second thought, page_head_if_fake seems
> > > to have the best performance, though this function returns an odd value.
> > > I just made a small refinement to your code in the doc:
> >
> > Right. page_head_if_fake is the choice for performance.
> >
> > >
> > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > index 2c0d11e71e26..240c2fca13c7 100644
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -197,8 +197,9 @@ extern bool hugetlb_free_vmemmap_enabled;
> > >   * compound_head() can return the real head page struct when the parameter is
> > >   * the tail page struct but with PG_head.
> > >   *
> > > - * The page_head_if_fake() returns the real head page struct iff the @page may
> > > - * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > + * The page_head_if_fake() returns the real head page struct if the @page is
> > > + * a fake page head; otherwise, it returns @page, which can be either a true
> > > + * page head or a tail.
> > >   */
> >
> > Good annotation.
> >
> > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > >  {
> > > @@ -226,6 +227,12 @@ static __always_inline const struct page *page_head_if_fake(const struct page *p
> > >
> > >         return page;
> > >  }
> > > +
> > > +static __always_inline bool page_is_fake_head(const struct page *page)
> > > +{
> > > +       return page_head_if_fake(page) != page;
> > > +}
> > > +
> > >  #else
> > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > >  {
> > > @@ -247,7 +254,7 @@ static inline unsigned long _compound_head(const struct page *page)
> > >  static __always_inline int PageTail(struct page *page)
> > >  {
> > >         return READ_ONCE(page->compound_head) & 1 ||
> > > -              page_head_if_fake(page) != page;
> > > +              page_is_fake_head(page);
> > >  }
> >
> > Yeah, this makes PageTail more readable. In your previous reply,
> > you asked why not use PageTail in PageCompound directly
> > to improve code readability. So I want to introduce 2 more helpers
> > besides page_is_fake_head().
> >
> > static __always_inline int page_tail_not_fake_head(struct page *page)
> > {
> >     return READ_ONCE(page->compound_head) & 1;
> > }
> >
> > static __always_inline int page_head_or_fake(struct page *page)
> > {
> >     return test_bit(PG_head, &page->flags);
> > }
> >
> > Then PageTail() and PageCompound() change to the following.
> >
> > static __always_inline int PageTail(struct page *page)
> > {
> >     return page_tail_not_fake_head(page) || page_is_fake_head(page);
> > }
> >
> > static __always_inline int PageCompound(struct page *page)
> > {
> >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > }
> >
> > From the point of view of the helper names, they act as self-annotation.
> > So I think PageTail and PageCompound become readable
> > as well. But you said "still not good enough". Is it because of
> > the names of the helpers, or because of the added complexity?
>
> I really don't think it is worth this complexity. If there is anything that would make

Got it.

> the code more readable, it would be renaming page_head_if_fake() to
> page_fixed_dup_head().

You mean page_fixed_up_head here, right? Is it a typo?

Thanks.

>
> this function fixes up the page:
> 1. if the page is a fake head, we need to return its true head (things
> get fixed);
> 2. if the page is not a fake head, in other words it is either a true
> head or a tail, there is nothing to fix.
>
> >
> > Thanks.
>
> Thanks
> barry

^ permalink raw reply	[flat|nested] 47+ messages in thread
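
[To sanity-check the numbers quoted in the patch above: with the common
x86_64 assumptions of 4KiB base pages and a 64-byte struct page, a 2MB
HugeTLB page spans 512 base pages, so its struct pages occupy 512 * 64B =
32KiB = 8 vmemmap pages; freeing 7 of the 8 (instead of 6) saves one extra
4KiB page per hugepage, which at 1TB of 2MB hugepages is 524288 * 4KiB =
2GB, matching the commit message. A throwaway check:]

#include <stdio.h>

int main(void)
{
	/* Assumed x86_64 defaults: 4KiB pages, 64-byte struct page. */
	const unsigned long page_size	= 4096;
	const unsigned long hpage_size	= 2UL << 20;	/* 2MiB */
	const unsigned long sp_size	= 64;

	unsigned long vmemmap_pages = hpage_size / page_size * sp_size / page_size;
	unsigned long hpages_per_tb = (1UL << 40) / hpage_size;

	printf("vmemmap pages per 2MiB hugepage: %lu\n", vmemmap_pages);	/* 8 */
	printf("freed per hugepage now: %lu KiB\n", 7 * page_size / 1024);	/* 28 */
	printf("extra saving at 1TiB of hugepages: %lu MiB\n",
	       hpages_per_tb * page_size >> 20);				/* 2048 */
	return 0;
}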

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
  2021-09-22  2:38                 ` Muchun Song
@ 2021-09-22  7:36                   ` Barry Song
  -1 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-22  7:36 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Wed, Sep 22, 2021 at 2:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Wed, Sep 22, 2021 at 4:43 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, Sep 22, 2021 at 1:46 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > On Tue, Sep 21, 2021 at 8:11 PM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Tue, Sep 21, 2021 at 10:23 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > >
> > > > > On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > >
> > > > > > On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > > > >
> > > > > > > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > > > > > > page. However, we can remap all tail vmemmap pages to the page frame
> > > > > > > > mapped by the head vmemmap page. Finally, we can free 7 vmemmap
> > > > > > > > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save an
> > > > > > > > extra 2GB of memory when there is 1TB of HugeTLB pages in the system
> > > > > > > > compared with the current implementation).
> > > > > > > >
> > > > > > > > But the head vmemmap page is not freed to the buddy allocator and all
> > > > > > > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > > > > > > can see more than one struct page with PG_head (e.g. 8 per 2MB
> > > > > > > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > > > > > > compound_head() to make it return the real head struct page when the
> > > > > > > > parameter is a tail struct page but with the PG_head flag.
> > > > > > > >
> > > > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > > > > ---
> > > > > > > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > > > > > > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > > > > > > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > > > > > > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > > > > > > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > index bdb22006f713..a154a7b3b9a5 100644
> > > > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > @@ -1606,7 +1606,7 @@
> > > > > > > >                         [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > > >                         enabled.
> > > > > > > >                         Allows heavy hugetlb users to free up some more
> > > > > > > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > > >                         Format: { on | off (default) }
> > > > > > > >
> > > > > > > >                         on:  enable the feature
> > > > > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > > > > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > > > > > > --- a/include/linux/page-flags.h
> > > > > > > > +++ b/include/linux/page-flags.h
> > > > > > > > @@ -184,13 +184,64 @@ enum pageflags {
> > > > > > > >
> > > > > > > >  #ifndef __GENERATING_BOUNDS_H
> > > > > > > >
> > > > > > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > > > +extern bool hugetlb_free_vmemmap_enabled;
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > > > > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > > > > > > + * vmemmap addresses map to the head vmemmap page frame (further details can
> > > > > > > > + * be found in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > > > > > > > + * words, there is more than one page struct with PG_head associated with each
> > > > > > > > + * HugeTLB page.  We __know__ that there is only one head page struct, the tail
> > > > > > > > + * page structs with PG_head are fake head page structs.  We need an approach
> > > > > > > > + * to distinguish between those two different types of page structs so that
> > > > > > > > + * compound_head() can return the real head page struct when the parameter is
> > > > > > > > + * the tail page struct but with PG_head.
> > > > > > > > + *
> > > > > > > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > > > > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > > > > > + */
> > > > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > > > +{
> > > > > > > > +       if (!hugetlb_free_vmemmap_enabled)
> > > > > > > > +               return page;
> > > > > > > > +
> > > > > > > > +       /*
> > > > > > > > +        * Only addresses aligned with PAGE_SIZE of struct page may be fake head
> > > > > > > > +        * struct pages. The alignment check aims to avoid accessing the fields
> > > > > > > > +        * (e.g. compound_head) of @page[1], which avoids touching a (possibly)
> > > > > > > > +        * cold cacheline in some cases.
> > > > > > > > +        */
> > > > > > > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > > > > > > +           test_bit(PG_head, &page->flags)) {
> > > > > > > > +               /*
> > > > > > > > +                * We can safely access the fields of @page[1] when PG_head is set,
> > > > > > > > +                * because @page is part of a compound page composed of at least
> > > > > > > > +                * two contiguous pages.
> > > > > > > > +                */
> > > > > > > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > > > > > > +
> > > > > > > > +               if (likely(head & 1))
> > > > > > > > +                       return (const struct page *)(head - 1);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       return page;
> > > > > > > > +}
> > > > > > > > +#else
> > > > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > > > +{
> > > > > > > > +       return page;
> > > > > > > > +}
> > > > > > > > +#endif
> > > > > > > > +
> > > > > > > >  static inline unsigned long _compound_head(const struct page *page)
> > > > > > > >  {
> > > > > > > >         unsigned long head = READ_ONCE(page->compound_head);
> > > > > > > >
> > > > > > > >         if (unlikely(head & 1))
> > > > > > > >                 return head - 1;
> > > > > > > > -       return (unsigned long)page;
> > > > > > > > +       return (unsigned long)page_head_if_fake(page);
> > > > > > >
> > > > > > > hard to read. page_head_if_fake: what is the other side,
> > > > > > > page_head_if_not_fake?
> > > > > >
> > > > > > 1) return itself if the @page is not a fake head page.
> > > > > > 2) return head page if @page is a fake head page.
> > > > > >
> > > > > > So I want to express that page_head_if_fake returns a
> > > > > > head page if and only if the @page parameter is a
> > > > > > fake head page. Otherwise, it returns itself.
> > > > > >
> > > > > > > I would expect something like
> > > > > > > page_to_page_head()
> > > > > > > or
> > > > > > > get_page_head()
> > > > > > >
> > > > > >
> > > > > > Those names do not seem appropriate either, because
> > > > > > the function does not guarantee that it returns a head
> > > > > > page. If the parameter is a head page, it definitely
> > > > > > returns a head page; otherwise, it may return itself, which
> > > > > > may be a tail page.
> > > > > >
> > > > > > From this point of view, I still prefer page_head_if_fake.
> > > > > >
> > > > > > > Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> > > > > > > sounds odd to me. Things have two sides, but if_fake presents
> > > > > > > one side only.
> > > > > >
> > > > > > If others have any ideas, comments are welcome.
> > > > > >
> > > > > > >
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > > > > > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > > > > > > >
> > > > > > > >  static __always_inline int PageTail(struct page *page)
> > > > > > > >  {
> > > > > > > > -       return READ_ONCE(page->compound_head) & 1;
> > > > > > > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > > > > > > +              page_head_if_fake(page) != page;
> > > > > > >
> > > > > > > I would expect a wrapper like:
> > > > > > > page_is_fake_head()
> > > > > >
> > > > > > Good point. Will do.
> > > > > >
> > > > > > >
> > > > > > > and the above page_to_page_head() can leverage the wrapper.
> > > > > > > here too.
> > > > > > >
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static __always_inline int PageCompound(struct page *page)
> > > > > > > >  {
> > > > > > > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > > > > > > +       return test_bit(PG_head, &page->flags) ||
> > > > > > > > +              READ_ONCE(page->compound_head) & 1;
> > > > > > >
> > > > > > > hard to read. could it be something like the below?
> > > > > > > return PageHead(page) || PageTail(page);
> > > > > > >
> > > > > > > Or do we really need to change this function? Even a fake head still has
> > > > > > > test_bit(PG_head, &page->flags) true; though it is not a real head, it
> > > > > > > is still a compound page, right?
> > > > > >
> > > > > > Right. PageCompound() does not need to change semantically. The
> > > > > > open-coded form is odd but efficient because the call to
> > > > > > page_head_if_fake is eliminated. So I chose performance over
> > > > > > readability. I'm not sure if it's worth it.
> > > > >
> > > > > In order to improve readability, I'll introduce 3 helpers as follows.
> > > > >
> > > > > 1) page_head_or_fake(), which returns true for the head page
> > > > >    or fake head page.
> > > > > 2) page_head_is_fake(), which returns true for fake head page.
> > > > > 3) page_tail_not_fake_head(), which returns true for the tail page
> > > > >    except the fake head page.
> > > > >
> > > > > In the end, PageHead(), PageTail() and PageCompound() become
> > > > > the following.
> > > > >
> > > > > static __always_inline int PageHead(struct page *page)
> > > > > {
> > > > >     return page_head_or_fake(page) && !page_head_is_fake(page);
> > > > > }
> > > > >
> > > > > static __always_inline int PageTail(struct page *page)
> > > > > {
> > > > >     return page_tail_not_fake_head(page) || page_head_is_fake(page);
> > > > > }
> > > > >
> > > > > static __always_inline int PageCompound(struct page *page)
> > > > > {
> > > > >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > > > > }
> > > > >
> > > > > Do those look more readable?
> > > > >
> > > >
> > > > still not good enough. On second thought, page_head_if_fake seems
> > > > to have the best performance, though this function returns an odd value.
> > > > I just made a small refinement to your code in the doc:
> > >
> > > Right. page_head_if_fake is the choice for performance.
> > >
> > > >
> > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > index 2c0d11e71e26..240c2fca13c7 100644
> > > > --- a/include/linux/page-flags.h
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -197,8 +197,9 @@ extern bool hugetlb_free_vmemmap_enabled;
> > > >   * compound_head() can return the real head page struct when the parameter is
> > > >   * the tail page struct but with PG_head.
> > > >   *
> > > > - * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > - * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > + * The page_head_if_fake() returns the real head page struct if the @page is
> > > > + * a fake page head; otherwise, it returns @page, which can be either a true
> > > > + * page head or a tail.
> > > >   */
> > >
> > > Good annotation.
> > >
> > > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > >  {
> > > > @@ -226,6 +227,12 @@ static __always_inline const struct page *page_head_if_fake(const struct page *p
> > > >
> > > >         return page;
> > > >  }
> > > > +
> > > > +static __always_inline bool page_is_fake_head(const struct page *page)
> > > > +{
> > > > +       return page_head_if_fake(page) != page;
> > > > +}
> > > > +
> > > >  #else
> > > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > >  {
> > > > @@ -247,7 +254,7 @@ static inline unsigned long _compound_head(const struct page *page)
> > > >  static __always_inline int PageTail(struct page *page)
> > > >  {
> > > >         return READ_ONCE(page->compound_head) & 1 ||
> > > > -              page_head_if_fake(page) != page;
> > > > +              page_is_fake_head(page);
> > > >  }
> > >
> > > Yeah, this makes PageTail more readable. In your previous reply,
> > > you asked why not use PageTail in PageCompound directly
> > > to improve code readability. So I want to introduce 2 more helpers
> > > besides page_is_fake_head().
> > >
> > > static __always_inline int page_tail_not_fake_head(struct page *page)
> > > {
> > >     return READ_ONCE(page->compound_head) & 1;
> > > }
> > >
> > > static __always_inline int page_head_or_fake(struct page *page)
> > > {
> > >     return test_bit(PG_head, &page->flags);
> > > }
> > >
> > > Then PageTail() and PageCompound() change to the following.
> > >
> > > static __always_inline int PageTail(struct page *page)
> > > {
> > >     return page_tail_not_fake_head(page) || page_is_fake_head(page);
> > > }
> > >
> > > static __always_inline int PageCompound(struct page *page)
> > > {
> > >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > > }
> > >
> > > From the point of view of the helper names, they act as self-annotation.
> > > So I think PageTail and PageCompound become readable
> > > as well. But you said "still not good enough". Is it because of
> > > the names of the helpers, or because of the added complexity?
> >
> > I really don't think it is worth this complexity. If there is anything that would make
>
> Got it.
>
> > the code more readable, it would be renaming page_head_if_fake() to
> > page_fixed_dup_head().
>
> You mean page_fixed_up_head here, right? Is it a typo?

I actually meant "duplicated", but in your case, it is "fake".
It doesn't matter too much. Both are ok.

>
> Thanks.
>
> >
> > this function fixes up the page:
> > 1. if the page is a fake head, we need to return its true head (things
> > get fixed);
> > 2. if the page is not a fake head, in other words it is either a true
> > head or a tail, there is nothing to fix.
> >
> > >
> > > Thanks.

Thanks
barry

^ permalink raw reply	[flat|nested] 47+ messages in thread
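
[For completeness, here is a compile-able model of roughly where the
discussion lands: a boolean page_is_fake_head() wrapper plus the open-coded
PageCompound() that avoids the fixup call. This is a userspace sketch under
the same simplifying assumptions as the earlier one (plain flag test instead
of test_bit, aliasing simulated by copying the head into a tail slot); it is
not the final kernel code.]

#include <stdbool.h>
#include <stdio.h>

#define PG_HEAD 0x1UL

struct page {
	unsigned long flags;
	unsigned long compound_head;	/* tail: address of head | 1 */
};

static bool page_is_fake_head(const struct page *page)
{
	/*
	 * A fake head has PG_HEAD set, but decoding the head address
	 * stored in its successor moves us to a different struct page.
	 */
	return (page->flags & PG_HEAD) && (page[1].compound_head & 1) &&
	       (const struct page *)(page[1].compound_head - 1) != page;
}

static bool PageTail(const struct page *page)
{
	return (page->compound_head & 1) || page_is_fake_head(page);
}

static bool PageHead(const struct page *page)
{
	return (page->flags & PG_HEAD) && !page_is_fake_head(page);
}

static bool PageCompound(const struct page *page)
{
	/*
	 * Open-coded as in the patch: a fake head also has PG_HEAD set,
	 * so no page_is_fake_head() call is needed here.
	 */
	return (page->flags & PG_HEAD) || (page->compound_head & 1);
}

int main(void)
{
	struct page pages[8] = { { 0 } };
	int i;

	pages[0].flags = PG_HEAD;
	for (i = 1; i < 8; i++)
		pages[i].compound_head = (unsigned long)&pages[0] | 1;
	pages[4] = pages[0];	/* the aliased, fake head */

	printf("head:%d fake-is-tail:%d tail-is-compound:%d\n",
	       PageHead(&pages[0]), PageTail(&pages[4]),
	       PageCompound(&pages[2]));	/* prints 1 1 1 */
	return 0;
}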

* Re: [PATCH RESEND v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
@ 2021-09-22  7:36                   ` Barry Song
  0 siblings, 0 replies; 47+ messages in thread
From: Barry Song @ 2021-09-22  7:36 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Oscar Salvador, Michal Hocko,
	Barry Song, David Hildenbrand, Chen Huang, Bodeddula,
	Balasubramaniam, Jonathan Corbet, Matthew Wilcox, Xiongchun duan,
	fam.zheng, Muchun Song, Qi Zheng, linux-doc, LKML, Linux-MM

On Wed, Sep 22, 2021 at 2:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Wed, Sep 22, 2021 at 4:43 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, Sep 22, 2021 at 1:46 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > On Tue, Sep 21, 2021 at 8:11 PM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Tue, Sep 21, 2021 at 10:23 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > >
> > > > > On Sat, Sep 18, 2021 at 6:06 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > >
> > > > > > On Sat, Sep 18, 2021 at 12:39 PM Barry Song <21cnbao@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sat, Sep 18, 2021 at 12:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > > > >
> > > > > > > > Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
> > > > > > > > page. However, we can remap all tail vmemmap pages to the page frame
> > > > > > > > mapped to with the head vmemmap page. Finally, we can free 7 vmemmap
> > > > > > > > pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save
> > > > > > > > extra 2GB memory when there is 1TB HugeTLB pages in the system
> > > > > > > > compared with the current implementation).
> > > > > > > >
> > > > > > > > But the head vmemmap page is not freed to the buddy allocator and all
> > > > > > > > tail vmemmap pages are mapped to the head vmemmap page frame. So we
> > > > > > > > can see more than one struct page struct with PG_head (e.g. 8 per 2 MB
> > > > > > > > HugeTLB page) associated with each HugeTLB page. We should adjust
> > > > > > > > compound_head() to make it returns the real head struct page when the
> > > > > > > > parameter is the tail struct page but with PG_head flag.
> > > > > > > >
> > > > > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > > > > ---
> > > > > > > >  Documentation/admin-guide/kernel-parameters.txt |  2 +-
> > > > > > > >  include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
> > > > > > > >  mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
> > > > > > > >  mm/sparse-vmemmap.c                             | 21 +++++++
> > > > > > > >  4 files changed, 126 insertions(+), 32 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > index bdb22006f713..a154a7b3b9a5 100644
> > > > > > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > > > > > @@ -1606,7 +1606,7 @@
> > > > > > > >                         [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > > >                         enabled.
> > > > > > > >                         Allows heavy hugetlb users to free up some more
> > > > > > > > -                       memory (6 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > > > +                       memory (7 * PAGE_SIZE for each 2MB hugetlb page).
> > > > > > > >                         Format: { on | off (default) }
> > > > > > > >
> > > > > > > >                         on:  enable the feature
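
(Aside: where 7 out of 8 comes from: a 2MB HugeTLB page spans 512 base
pages, and with the usual 64-byte struct page its vmemmap occupies
512 * 64B = 32KB, i.e. 8 pages; the head vmemmap page must be kept, so at
most 7 can be returned to the buddy allocator.)
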
> > > > > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > > > > > index 8e1d97d8f3bd..7b1a918ebd43 100644
> > > > > > > > --- a/include/linux/page-flags.h
> > > > > > > > +++ b/include/linux/page-flags.h
> > > > > > > > @@ -184,13 +184,64 @@ enum pageflags {
> > > > > > > >
> > > > > > > >  #ifndef __GENERATING_BOUNDS_H
> > > > > > > >
> > > > > > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > > > > > > > +extern bool hugetlb_free_vmemmap_enabled;
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * If the feature of freeing some vmemmap pages associated with each HugeTLB
> > > > > > > > + * page is enabled, the head vmemmap page frame is reused and all of the tail
> > > > > > > > + * vmemmap addresses map to the head vmemmap page frame (further details are
> > > > > > > > + * given in the figure at the head of mm/hugetlb_vmemmap.c).  In other
> > > > > > > > + * words, there is more than one page struct with PG_head associated with each
> > > > > > > > + * HugeTLB page.  We __know__ that there is only one real head page struct; the
> > > > > > > > + * tail page structs with PG_head are fake head page structs.  We need an approach
> > > > > > > > + * to distinguish between those two different types of page structs so that
> > > > > > > > + * compound_head() can return the real head page struct when the parameter is
> > > > > > > > + * the tail page struct but with PG_head.
> > > > > > > > + *
> > > > > > > > + * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > > > > > + * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > > > > > + */
> > > > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > > > +{
> > > > > > > > +       if (!hugetlb_free_vmemmap_enabled)
> > > > > > > > +               return page;
> > > > > > > > +
> > > > > > > > +       /*
> > > > > > > > +        * Only a PAGE_SIZE-aligned address of a struct page may be a fake head
> > > > > > > > +        * struct page. The alignment check aims to avoid accessing the fields
> > > > > > > > +        * (e.g. compound_head) of @page[1], which avoids touching a (possibly)
> > > > > > > > +        * cold cacheline in some cases.
> > > > > > > > +        */
> > > > > > > > +       if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
> > > > > > > > +           test_bit(PG_head, &page->flags)) {
> > > > > > > > +               /*
> > > > > > > > +                * We can safely access the fields of @page[1] here: with PG_head
> > > > > > > > +                * set, @page belongs to a compound page composed of at least
> > > > > > > > +                * two contiguous pages.
> > > > > > > > +                */
> > > > > > > > +               unsigned long head = READ_ONCE(page[1].compound_head);
> > > > > > > > +
> > > > > > > > +               if (likely(head & 1))
> > > > > > > > +                       return (const struct page *)(head - 1);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       return page;
> > > > > > > > +}
> > > > > > > > +#else
> > > > > > > > +static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > > > > > > +{
> > > > > > > > +       return page;
> > > > > > > > +}
> > > > > > > > +#endif
> > > > > > > > +
> > > > > > > >  static inline unsigned long _compound_head(const struct page *page)
> > > > > > > >  {
> > > > > > > >         unsigned long head = READ_ONCE(page->compound_head);
> > > > > > > >
> > > > > > > >         if (unlikely(head & 1))
> > > > > > > >                 return head - 1;
> > > > > > > > -       return (unsigned long)page;
> > > > > > > > +       return (unsigned long)page_head_if_fake(page);
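
(Aside: a minimal sketch, using hypothetical helpers set_tail() and
resolve_head(), of the compound_head encoding that the decode above relies
on; the setter mirrors what the kernel's set_compound_head() does.)

	/* For a tail page, page->compound_head holds the address of the
	 * head page with bit 0 set; clearing bit 0 recovers the head.
	 */
	static inline void set_tail(struct page *tail, struct page *head)
	{
		WRITE_ONCE(tail->compound_head, (unsigned long)head + 1);
	}

	static inline struct page *resolve_head(struct page *page)
	{
		unsigned long v = READ_ONCE(page->compound_head);

		return (v & 1) ? (struct page *)(v - 1) : page;
	}
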
> > > > > > >
> > > > > > > hard to read. page_head_if_fake: what is the other side,
> > > > > > > page_head_if_not_fake?
> > > > > >
> > > > > > 1) return @page itself if it is not a fake head page.
> > > > > > 2) return the real head page if @page is a fake head page.
> > > > > >
> > > > > > So I want to express that page_head_if_fake returns the
> > > > > > real head page if and only if the @page parameter is a
> > > > > > fake head page. Otherwise, it returns @page itself.
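
(Aside: the contract tabulated for the three possible inputs:

	@page passed in                         return value
	--------------------------------------  ------------------
	real head (aligned, PG_head set)        @page itself
	real tail (compound_head bit 0 set)     @page itself
	fake head (aligned tail, PG_head set)   the real head page

a real head also takes the page[1].compound_head path, but decoding it
yields @page itself, so only a fake head is actually changed.)
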
> > > > > >
> > > > > > > I would expect something like
> > > > > > > page_to_page_head()
> > > > > > > or
> > > > > > > get_page_head()
> > > > > > >
> > > > > >
> > > > > > Those names do not seem appropriate either, because the
> > > > > > function does not guarantee that it returns a head page.
> > > > > > If the parameter is a head page, it definitely returns a
> > > > > > head page; otherwise, it may return the parameter itself,
> > > > > > which may be a tail page.
> > > > > >
> > > > > > From this point of view, I still prefer page_head_if_fake.
> > > > > >
> > > > > > > Anyway, I am not quite sure what the best name is, but page_head_if_fake(page)
> > > > > > > sounds odd to me. It is as if the thing has two sides, but if_fake presents
> > > > > > > one side only.
> > > > > >
> > > > > > If others have any ideas, comments are welcome.
> > > > > >
> > > > > > >
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  #define compound_head(page)    ((typeof(page))_compound_head(page))
> > > > > > > > @@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)
> > > > > > > >
> > > > > > > >  static __always_inline int PageTail(struct page *page)
> > > > > > > >  {
> > > > > > > > -       return READ_ONCE(page->compound_head) & 1;
> > > > > > > > +       return READ_ONCE(page->compound_head) & 1 ||
> > > > > > > > +              page_head_if_fake(page) != page;
> > > > > > >
> > > > > > > I would expect a wrapper like:
> > > > > > > page_is_fake_head()
> > > > > >
> > > > > > Good point. Will do.
> > > > > >
> > > > > > >
> > > > > > > and the above page_to_page_head() can leverage the wrapper.
> > > > > > > here too.
> > > > > > >
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static __always_inline int PageCompound(struct page *page)
> > > > > > > >  {
> > > > > > > > -       return test_bit(PG_head, &page->flags) || PageTail(page);
> > > > > > > > +       return test_bit(PG_head, &page->flags) ||
> > > > > > > > +              READ_ONCE(page->compound_head) & 1;
> > > > > > >
> > > > > > > hard to read. could it be something like the below?
> > > > > > > return PageHead(page) || PageTail(page);
> > > > > > >
> > > > > > > or do we really need to change this function? Even for a fake head,
> > > > > > > test_bit(PG_head, &page->flags) is still true; though it is not a real head, it
> > > > > > > is still a compound page, right?
> > > > > >
> > > > > > Right. PageCompound()'s result cannot change. The open-coded
> > > > > > form is odd but efficient because the call to page_head_if_fake()
> > > > > > is eliminated. So I chose performance over readability. I'm not
> > > > > > sure if it's worth it.
> > > > >
> > > > > In order to improve readability, I'll introduce 3 helpers as follows.
> > > > >
> > > > > 1) page_head_or_fake(), which returns true for the head page
> > > > >    or fake head page.
> > > > > 2) page_head_is_fake(), which returns true for fake head page.
> > > > > 3) page_tail_not_fake_head(), which returns true for tail pages
> > > > >    other than fake head pages.
> > > > >
> > > > > In the end, PageHead(), PageTail() and PageCompound() become
> > > > > the following.
> > > > >
> > > > > static __always_inline int PageHead(struct page *page)
> > > > > {
> > > > >     return page_head_or_fake(page) && !page_head_is_fake(page);
> > > > > }
> > > > >
> > > > > static __always_inline int PageTail(struct page *page)
> > > > > {
> > > > >     return page_tail_not_fake_head(page) || page_head_is_fake(page);
> > > > > }
> > > > >
> > > > > static __always_inline int PageCompound(struct page *page)
> > > > > {
> > > > >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > > > > }
> > > > >
> > > > > Do those look more readable?
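
(Aside: a quick sanity check of these definitions for the three page types;
per the helper descriptions above, a fake head has page_head_or_fake() true
and page_tail_not_fake_head() false:

	page type   PageHead  PageTail  PageCompound
	real head   true      false     true
	real tail   false     true      true
	fake head   false     true      true

which matches the intended semantics.)
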
> > > > >
> > > >
> > > > still not good enough. On second thought, page_head_if_fake seems
> > > > to have the best performance, though this function returns an odd value.
> > > > I just made a small refinement to your code and doc comment:
> > >
> > > Right. page_head_if_fake is the choice for performance.
> > >
> > > >
> > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > > > index 2c0d11e71e26..240c2fca13c7 100644
> > > > --- a/include/linux/page-flags.h
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -197,8 +197,9 @@ extern bool hugetlb_free_vmemmap_enabled;
> > > >   * compound_head() can return the real head page struct when the parameter is
> > > >   * the tail page struct but with PG_head.
> > > >   *
> > > > - * The page_head_if_fake() returns the real head page struct iff the @page may
> > > > - * be fake, otherwise, returns the @page if it cannot be a fake page struct.
> > > > + * The page_head_if_fake() returns the real head page struct if the @page is
> > > > + * fake page_head, otherwise, returns @page which can either be a true page_
> > > > + * head or tail.
> > > >   */
> > >
> > > Good annotation.
> > >
> > > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > >  {
> > > > @@ -226,6 +227,12 @@ static __always_inline const struct page *page_head_if_fake(const struct page *p
> > > >
> > > >         return page;
> > > >  }
> > > > +
> > > > +static __always_inline bool page_is_fake_head(const struct page *page)
> > > > +{
> > > > +       return page_head_if_fake(page) != page;
> > > > +}
> > > > +
> > > >  #else
> > > >  static __always_inline const struct page *page_head_if_fake(const struct page *page)
> > > >  {
> > > > @@ -247,7 +254,7 @@ static inline unsigned long _compound_head(const struct page *page)
> > > >  static __always_inline int PageTail(struct page *page)
> > > >  {
> > > >         return READ_ONCE(page->compound_head) & 1 ||
> > > > -              page_head_if_fake(page) != page;
> > > > +              page_is_fake_head(page);
> > > >  }
> > >
> > > Yeah, this makes PageTail more readable. In your previous reply,
> > > you asked why not use PageTail in PageCompound directly
> > > to improve code readability. So I want to introduce 2 more helpers
> > > besides page_is_fake_head().
> > >
> > > static __always_inline int page_tail_not_fake_head(struct page *page)
> > > {
> > >     return READ_ONCE(page->compound_head) & 1;
> > > }
> > >
> > > static __always_inline int page_head_or_fake(struct page *page)
> > > {
> > >     return test_bit(PG_head, &page->flags);
> > > }
> > >
> > > Then PageTail() and PageCompound() change to the following.
> > >
> > > static __always_inline int PageTail(struct page *page)
> > > {
> > >     return page_tail_not_fake_head(page) || page_is_fake_head(page);
> > > }
> > >
> > > static __always_inline int PageCompound(struct page *page)
> > > {
> > >     return page_head_or_fake(page) || page_tail_not_fake_head(page);
> > > }
> > >
> > > From the naming point of view, the helpers act as self-annotation,
> > > so I think PageTail and PageCompound become readable
> > > as well. But you said "still not good enough". Is that because of
> > > the helper names or because of the added complexity?
> >
> > I really don't think it is worth this complexity. If there is anything to make
>
> Got it.
>
> > the code more readable, I would rename page_head_if_fake() to
> > page_fixed_dup_head().
>
> You mean page_fixed_up_head here, right? Or is it a typo?

I actually meant "duplicated", but in your case, it is "fake".
It doesn't matter too much. Both are ok.

>
> Thanks.
>
> >
> > this function fixes up the page:
> > 1. if the page is a fake head, we need to return its true head (things
> > get fixed).
> > 2. if the page is not a fake head, in other words, it is either a true
> > head or a tail, there is no need to fix anything.
> >
> > >
> > > Thanks.

Thanks
barry


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2021-09-22  7:37 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-17  3:48 [PATCH RESEND v2 0/4] Free the 2nd vmemmap page associated with each HugeTLB page Muchun Song
2021-09-17  3:48 ` [PATCH RESEND v2 1/4] mm: hugetlb: free " Muchun Song
2021-09-18  4:38   ` Barry Song
2021-09-18  4:38     ` Barry Song
2021-09-18 10:06     ` Muchun Song
2021-09-18 10:06       ` Muchun Song
2021-09-21  6:43       ` Muchun Song
2021-09-21  6:43         ` Muchun Song
2021-09-21 10:22       ` Muchun Song
2021-09-21 10:22         ` Muchun Song
2021-09-21  0:11         ` Barry Song
2021-09-21  0:11           ` Barry Song
2021-09-21 13:46           ` Muchun Song
2021-09-21 13:46             ` Muchun Song
2021-09-21 20:43             ` Barry Song
2021-09-21 20:43               ` Barry Song
2021-09-22  2:38               ` Muchun Song
2021-09-22  2:38                 ` Muchun Song
2021-09-22  7:36                 ` Barry Song
2021-09-22  7:36                   ` Barry Song
2021-09-17  3:48 ` [PATCH RESEND v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key Muchun Song
2021-09-18  4:55   ` Barry Song
2021-09-18  4:55     ` Barry Song
2021-09-18 10:30     ` Muchun Song
2021-09-18 10:30       ` Muchun Song
2021-09-18 11:14       ` Barry Song
2021-09-18 11:14         ` Barry Song
2021-09-18 11:47         ` Muchun Song
2021-09-18 11:47           ` Muchun Song
2021-09-18 12:27           ` Barry Song
2021-09-18 12:27             ` Barry Song
2021-09-17  3:48 ` [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations Muchun Song
2021-09-18  5:06   ` Barry Song
2021-09-18  5:06     ` Barry Song
2021-09-18 10:51     ` Muchun Song
2021-09-18 10:51       ` Muchun Song
2021-09-18 11:01       ` Barry Song
2021-09-18 11:01         ` Barry Song
2021-09-17  3:48 ` [PATCH RESEND v2 4/4] selftests: vm: add a hugetlb test case Muchun Song
2021-09-18  5:20   ` Barry Song
2021-09-18  5:20     ` Barry Song
2021-09-20 14:26     ` Muchun Song
2021-09-20 14:26       ` Muchun Song
2021-09-21  0:28       ` Barry Song
2021-09-21  0:28         ` Barry Song
2021-09-21 13:18         ` Muchun Song
2021-09-21 13:18           ` Muchun Song
