linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/7] A few cleanup patches for khugepaged
@ 2022-06-11  8:47 Miaohe Lin
  2022-06-11  8:47 ` [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check Miaohe Lin
                   ` (6 more replies)
  0 siblings, 7 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

Hi everyone,
This series contains a few cleanup patches to remove an unneeded return
value, use a helper macro, fix typos and so on. More details can be
found in the respective changelogs. Thanks!

Miaohe Lin (7):
  mm/khugepaged: remove unneeded shmem_huge_enabled() check
  mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  mm/khugepaged: trivial typo and codestyle cleanup
  mm/khugepaged: minor cleanup for collapse_file
  mm/khugepaged: use helper macro __ATTR_RW
  mm/khugepaged: remove unneeded return value of
    khugepaged_add_pte_mapped_thp()
  mm/khugepaged: try to free transhuge swapcache when possible

 include/linux/swap.h |  5 +++
 mm/khugepaged.c      | 85 +++++++++++++++++++-------------------------
 mm/swap.h            |  5 ---
 3 files changed, 41 insertions(+), 54 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check
  2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
@ 2022-06-11  8:47 ` Miaohe Lin
  2022-06-11 20:33   ` Andrew Morton
                     ` (2 more replies)
  2022-06-11  8:47 ` [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs Miaohe Lin
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

If we reach here, hugepage_vma_check() has already made sure that hugepage
is enabled for shmem. Remove this duplicated check.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/khugepaged.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 476d79360101..73570dfffcec 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2153,8 +2153,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 		if (khugepaged_scan.address < hstart)
 			khugepaged_scan.address = hstart;
 		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
-		if (shmem_file(vma->vm_file) && !shmem_huge_enabled(vma))
-			goto skip;
 
 		while (khugepaged_scan.address < hend) {
 			int ret;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
  2022-06-11  8:47 ` [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check Miaohe Lin
@ 2022-06-11  8:47 ` Miaohe Lin
  2022-06-15 15:14   ` Zach O'Keefe
  2022-06-15 17:49   ` Yang Shi
  2022-06-11  8:47 ` [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup Miaohe Lin
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
swap entry will remain in pagetable. This will result in later failure.
So stop swapping in pages in this case to save cpu cycles.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/khugepaged.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 73570dfffcec..a8adb2d1e9c6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 		swapped_in++;
 		ret = do_swap_page(&vmf);
 
-		/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
+		/*
+		 * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
+		 * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
+		 * we do not retry here and swap entry will remain in pagetable
+		 * resulting in later failure.
+		 */
 		if (ret & VM_FAULT_RETRY) {
 			mmap_read_lock(mm);
-			if (hugepage_vma_revalidate(mm, haddr, &vma)) {
-				/* vma is no longer available, don't continue to swapin */
-				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
-				return false;
-			}
-			/* check if the pmd is still valid */
-			if (mm_find_pmd(mm, haddr) != pmd) {
-				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
-				return false;
-			}
+			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
+			return false;
 		}
 		if (ret & VM_FAULT_ERROR) {
 			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup
  2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
  2022-06-11  8:47 ` [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check Miaohe Lin
  2022-06-11  8:47 ` [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs Miaohe Lin
@ 2022-06-11  8:47 ` Miaohe Lin
  2022-06-15  0:23   ` Zach O'Keefe
  2022-06-15 17:53   ` Yang Shi
  2022-06-11  8:47 ` [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file Miaohe Lin
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

Fix some typos and tweak the code to meet codestyle. No functional
change intended.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/khugepaged.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a8adb2d1e9c6..1b5dd3820eac 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -260,7 +260,7 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
 	unsigned long max_ptes_none;
 
 	err = kstrtoul(buf, 10, &max_ptes_none);
-	if (err || max_ptes_none > HPAGE_PMD_NR-1)
+	if (err || max_ptes_none > HPAGE_PMD_NR - 1)
 		return -EINVAL;
 
 	khugepaged_max_ptes_none = max_ptes_none;
@@ -286,7 +286,7 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
 	unsigned long max_ptes_swap;
 
 	err  = kstrtoul(buf, 10, &max_ptes_swap);
-	if (err || max_ptes_swap > HPAGE_PMD_NR-1)
+	if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
 		return -EINVAL;
 
 	khugepaged_max_ptes_swap = max_ptes_swap;
@@ -313,7 +313,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
 	unsigned long max_ptes_shared;
 
 	err  = kstrtoul(buf, 10, &max_ptes_shared);
-	if (err || max_ptes_shared > HPAGE_PMD_NR-1)
+	if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
 		return -EINVAL;
 
 	khugepaged_max_ptes_shared = max_ptes_shared;
@@ -599,7 +599,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
 	bool writable = false;
 
-	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (pte_none(pteval) || (pte_present(pteval) &&
@@ -1216,7 +1216,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 
 	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
+	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
@@ -1306,7 +1306,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		/*
 		 * Check if the page has any GUP (or other external) pins.
 		 *
-		 * Here the check is racy it may see totmal_mapcount > refcount
+		 * Here the check is racy it may see total_mapcount > refcount
 		 * in some cases.
 		 * For example, one process with one forked child process.
 		 * The parent has the PMD split due to MADV_DONTNEED, then
@@ -1557,7 +1557,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * mmap_write_lock(mm) as PMD-mapping is likely to be split
 		 * later.
 		 *
-		 * Not that vma->anon_vma check is racy: it can be set up after
+		 * Note that vma->anon_vma check is racy: it can be set up after
 		 * the check but before we took mmap_lock by the fault path.
 		 * But page lock would prevent establishing any new ptes of the
 		 * page, so we are safe.
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file
  2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
                   ` (2 preceding siblings ...)
  2022-06-11  8:47 ` [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup Miaohe Lin
@ 2022-06-11  8:47 ` Miaohe Lin
  2022-06-15 15:54   ` Zach O'Keefe
  2022-06-11  8:47 ` [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW Miaohe Lin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

nr_none is always 0 for the non-shmem case because the pages can be read
from the backing store. So when nr_none != 0, it must be the is_shmem case.
Also, only adjust nrpages and uncharge shmem when nr_none != 0 to save
cpu cycles.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/khugepaged.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1b5dd3820eac..8e6fad7c7bd9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1885,8 +1885,7 @@ static void collapse_file(struct mm_struct *mm,
 
 	if (nr_none) {
 		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
-		if (is_shmem)
-			__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
+		__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
 	}
 
 	/* Join all the small entries into a single multi-index entry */
@@ -1950,10 +1949,10 @@ static void collapse_file(struct mm_struct *mm,
 
 		/* Something went wrong: roll back page cache changes */
 		xas_lock_irq(&xas);
-		mapping->nrpages -= nr_none;
-
-		if (is_shmem)
+		if (nr_none) {
+			mapping->nrpages -= nr_none;
 			shmem_uncharge(mapping->host, nr_none);
+		}
 
 		xas_set(&xas, start);
 		xas_for_each(&xas, page, end - 1) {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW
  2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
                   ` (3 preceding siblings ...)
  2022-06-11  8:47 ` [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file Miaohe Lin
@ 2022-06-11  8:47 ` Miaohe Lin
  2022-06-15  0:29   ` Zach O'Keefe
  2022-06-15 21:28   ` Yang Shi
  2022-06-11  8:47 ` [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp() Miaohe Lin
  2022-06-11  8:47 ` [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible Miaohe Lin
  6 siblings, 2 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

Use helper macro __ATTR_RW to define the khugepaged attributes. Minor
readability improvement.
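
__ATTR_RW(name) expands to roughly __ATTR(name, 0644, name##_show,
name##_store), so each attribute's show/store helpers are renamed to drop
the khugepaged_ prefix, e.g.:

	static struct kobj_attribute khugepaged_defrag_attr =
		__ATTR_RW(defrag);	/* wants defrag_show()/defrag_store() */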

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/khugepaged.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8e6fad7c7bd9..142e26e4bdbf 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -147,8 +147,7 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
 	return count;
 }
 static struct kobj_attribute scan_sleep_millisecs_attr =
-	__ATTR(scan_sleep_millisecs, 0644, scan_sleep_millisecs_show,
-	       scan_sleep_millisecs_store);
+	__ATTR_RW(scan_sleep_millisecs);
 
 static ssize_t alloc_sleep_millisecs_show(struct kobject *kobj,
 					  struct kobj_attribute *attr,
@@ -175,8 +174,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
 	return count;
 }
 static struct kobj_attribute alloc_sleep_millisecs_attr =
-	__ATTR(alloc_sleep_millisecs, 0644, alloc_sleep_millisecs_show,
-	       alloc_sleep_millisecs_store);
+	__ATTR_RW(alloc_sleep_millisecs);
 
 static ssize_t pages_to_scan_show(struct kobject *kobj,
 				  struct kobj_attribute *attr,
@@ -200,8 +198,7 @@ static ssize_t pages_to_scan_store(struct kobject *kobj,
 	return count;
 }
 static struct kobj_attribute pages_to_scan_attr =
-	__ATTR(pages_to_scan, 0644, pages_to_scan_show,
-	       pages_to_scan_store);
+	__ATTR_RW(pages_to_scan);
 
 static ssize_t pages_collapsed_show(struct kobject *kobj,
 				    struct kobj_attribute *attr,
@@ -221,13 +218,13 @@ static ssize_t full_scans_show(struct kobject *kobj,
 static struct kobj_attribute full_scans_attr =
 	__ATTR_RO(full_scans);
 
-static ssize_t khugepaged_defrag_show(struct kobject *kobj,
+static ssize_t defrag_show(struct kobject *kobj,
 				      struct kobj_attribute *attr, char *buf)
 {
 	return single_hugepage_flag_show(kobj, attr, buf,
 					 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
 }
-static ssize_t khugepaged_defrag_store(struct kobject *kobj,
+static ssize_t defrag_store(struct kobject *kobj,
 				       struct kobj_attribute *attr,
 				       const char *buf, size_t count)
 {
@@ -235,8 +232,7 @@ static ssize_t khugepaged_defrag_store(struct kobject *kobj,
 				 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
 }
 static struct kobj_attribute khugepaged_defrag_attr =
-	__ATTR(defrag, 0644, khugepaged_defrag_show,
-	       khugepaged_defrag_store);
+	__ATTR_RW(defrag);
 
 /*
  * max_ptes_none controls if khugepaged should collapse hugepages over
@@ -246,13 +242,13 @@ static struct kobj_attribute khugepaged_defrag_attr =
  * runs. Increasing max_ptes_none will instead potentially reduce the
  * free memory in the system during the khugepaged scan.
  */
-static ssize_t khugepaged_max_ptes_none_show(struct kobject *kobj,
+static ssize_t max_ptes_none_show(struct kobject *kobj,
 					     struct kobj_attribute *attr,
 					     char *buf)
 {
 	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_none);
 }
-static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
+static ssize_t max_ptes_none_store(struct kobject *kobj,
 					      struct kobj_attribute *attr,
 					      const char *buf, size_t count)
 {
@@ -268,17 +264,16 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
 	return count;
 }
 static struct kobj_attribute khugepaged_max_ptes_none_attr =
-	__ATTR(max_ptes_none, 0644, khugepaged_max_ptes_none_show,
-	       khugepaged_max_ptes_none_store);
+	__ATTR_RW(max_ptes_none);
 
-static ssize_t khugepaged_max_ptes_swap_show(struct kobject *kobj,
+static ssize_t max_ptes_swap_show(struct kobject *kobj,
 					     struct kobj_attribute *attr,
 					     char *buf)
 {
 	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_swap);
 }
 
-static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
+static ssize_t max_ptes_swap_store(struct kobject *kobj,
 					      struct kobj_attribute *attr,
 					      const char *buf, size_t count)
 {
@@ -295,17 +290,16 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
 }
 
 static struct kobj_attribute khugepaged_max_ptes_swap_attr =
-	__ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show,
-	       khugepaged_max_ptes_swap_store);
+	__ATTR_RW(max_ptes_swap);
 
-static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
+static ssize_t max_ptes_shared_show(struct kobject *kobj,
 					       struct kobj_attribute *attr,
 					       char *buf)
 {
 	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_shared);
 }
 
-static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
+static ssize_t max_ptes_shared_store(struct kobject *kobj,
 					      struct kobj_attribute *attr,
 					      const char *buf, size_t count)
 {
@@ -322,8 +316,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
 }
 
 static struct kobj_attribute khugepaged_max_ptes_shared_attr =
-	__ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
-	       khugepaged_max_ptes_shared_store);
+	__ATTR_RW(max_ptes_shared);
 
 static struct attribute *khugepaged_attr[] = {
 	&khugepaged_defrag_attr.attr,
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp()
  2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
                   ` (4 preceding siblings ...)
  2022-06-11  8:47 ` [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW Miaohe Lin
@ 2022-06-11  8:47 ` Miaohe Lin
  2022-06-15  0:19   ` Zach O'Keefe
  2022-06-15 21:35   ` Yang Shi
  2022-06-11  8:47 ` [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible Miaohe Lin
  6 siblings, 2 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

The return value of khugepaged_add_pte_mapped_thp() is always 0 and also
ignored. Remove it to clean up the code.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/khugepaged.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 142e26e4bdbf..ee0a719c8be9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1372,7 +1372,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
  * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
  * khugepaged should try to collapse the page table.
  */
-static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
+static void khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
 					 unsigned long addr)
 {
 	struct mm_slot *mm_slot;
@@ -1384,7 +1384,6 @@ static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
 	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
 		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
 	spin_unlock(&khugepaged_mm_lock);
-	return 0;
 }
 
 static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
                   ` (5 preceding siblings ...)
  2022-06-11  8:47 ` [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp() Miaohe Lin
@ 2022-06-11  8:47 ` Miaohe Lin
  2022-06-15 17:13   ` Zach O'Keefe
  2022-06-15 23:58   ` Yang Shi
  6 siblings, 2 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-11  8:47 UTC (permalink / raw)
  To: akpm
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel, linmiaohe

Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
That is because release_pte_page() is not called for these pages and
thus free_page_and_swap_cache can't grab the page lock. These pages
won't be freed from the swap cache even if we are the only user until
the next reclaim. It shouldn't really hurt, but we could try to
free these pages to save more memory for the system.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/swap.h | 5 +++++
 mm/khugepaged.c      | 1 +
 mm/swap.h            | 5 -----
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8672a7123ccd..ccb83b12b724 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
 	return global_node_page_state(NR_SWAPCACHE);
 }
 
+extern void free_swap_cache(struct page *page);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
 /* linux/mm/swapfile.c */
@@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
 /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
 #define free_swap_and_cache(e) is_pfn_swap_entry(e)
 
+static inline void free_swap_cache(struct page *page)
+{
+}
+
 static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
 {
 	return 0;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ee0a719c8be9..52109ad13f78 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
 		list_del(&src_page->lru);
 		release_pte_page(src_page);
+		free_swap_cache(src_page);
 	}
 }
 
diff --git a/mm/swap.h b/mm/swap.h
index 0193797b0c92..863f6086c916 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
 void delete_from_swap_cache(struct page *page);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
 				  unsigned long end);
-void free_swap_cache(struct page *page);
 struct page *lookup_swap_cache(swp_entry_t entry,
 			       struct vm_area_struct *vma,
 			       unsigned long addr);
@@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
 	return NULL;
 }
 
-static inline void free_swap_cache(struct page *page)
-{
-}
-
 static inline void show_swap_cache_info(void)
 {
 }
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check
  2022-06-11  8:47 ` [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check Miaohe Lin
@ 2022-06-11 20:33   ` Andrew Morton
  2022-06-13  1:48     ` Miaohe Lin
  2022-06-15  0:13   ` Zach O'Keefe
  2022-06-15 17:35   ` Yang Shi
  2 siblings, 1 reply; 40+ messages in thread
From: Andrew Morton @ 2022-06-11 20:33 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel

On Sat, 11 Jun 2022 16:47:25 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem. Remove this duplicated check.

I updated this to

If we reach here, hugepage_vma_check() has already made sure that hugepage
is enabled for shmem, via its call to hugepage_vma_check().  Remove this
duplicated check.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check
  2022-06-11 20:33   ` Andrew Morton
@ 2022-06-13  1:48     ` Miaohe Lin
  2022-06-13 18:02       ` Andrew Morton
  0 siblings, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-13  1:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel

On 2022/6/12 4:33, Andrew Morton wrote:
> On Sat, 11 Jun 2022 16:47:25 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
>> If we reach here, hugepage_vma_check() has already made sure that hugepage
>> is enabled for shmem. Remove this duplicated check.
> 
> I updated this to
> 
> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem, via its call to hugepage_vma_check().  Remove this
> duplicated check.

Do you mean "khugepaged_scan_mm_slot() has already made sure that hugepage is
enabled for shmem, via its call to hugepage_vma_check()"?

Thanks!

> 
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check
  2022-06-13  1:48     ` Miaohe Lin
@ 2022-06-13 18:02       ` Andrew Morton
  0 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2022-06-13 18:02 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: aarcange, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, linux-mm, linux-kernel

On Mon, 13 Jun 2022 09:48:27 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> > I updated this to
> > 
> > If we reach here, hugepage_vma_check() has already made sure that hugepage
> > is enabled for shmem, via its call to hugepage_vma_check().  Remove this
> > duplicated check.
> 
> Do you mean "khugepaged_scan_mm_slot() has already made sure that hugepage is
> enabled for shmem, via its call to hugepage_vma_check()"?

yup, thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check
  2022-06-11  8:47 ` [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check Miaohe Lin
  2022-06-11 20:33   ` Andrew Morton
@ 2022-06-15  0:13   ` Zach O'Keefe
  2022-06-15 17:35   ` Yang Shi
  2 siblings, 0 replies; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-15  0:13 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 11 Jun 16:47, Miaohe Lin wrote:
> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem. Remove this duplicated check.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 476d79360101..73570dfffcec 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2153,8 +2153,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>  		if (khugepaged_scan.address < hstart)
>  			khugepaged_scan.address = hstart;
>  		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
> -		if (shmem_file(vma->vm_file) && !shmem_huge_enabled(vma))
> -			goto skip;
>  
>  		while (khugepaged_scan.address < hend) {
>  			int ret;
> -- 
> 2.23.0
> 
> 

Thanks for these cleanups, Miaohe.

Reviewed-by: Zach O'Keefe <zokeefe@google.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp()
  2022-06-11  8:47 ` [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp() Miaohe Lin
@ 2022-06-15  0:19   ` Zach O'Keefe
  2022-06-15 21:35   ` Yang Shi
  1 sibling, 0 replies; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-15  0:19 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 11 Jun 16:47, Miaohe Lin wrote:
> The return value of khugepaged_add_pte_mapped_thp() is always 0 and also
> ignored. Remove it to clean up the code.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 142e26e4bdbf..ee0a719c8be9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1372,7 +1372,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
>   * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
>   * khugepaged should try to collapse the page table.
>   */
> -static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
> +static void khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
>  					 unsigned long addr)

Since this is a cleanup, should keep the second param aligned with opening
bracket to satisfy checkpatch.
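
I.e. something like:

	static void khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
						  unsigned long addr)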

Otherwise can add Reviewed-by: Zach O'Keefe <zokeefe@google.com>

>  {
>  	struct mm_slot *mm_slot;
> @@ -1384,7 +1384,6 @@ static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
>  	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
>  		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
>  	spin_unlock(&khugepaged_mm_lock);
> -	return 0;
>  }
>  
>  static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> -- 
> 2.23.0
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup
  2022-06-11  8:47 ` [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup Miaohe Lin
@ 2022-06-15  0:23   ` Zach O'Keefe
  2022-06-15 17:53   ` Yang Shi
  1 sibling, 0 replies; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-15  0:23 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 11 Jun 16:47, Miaohe Lin wrote:
> Fix some typos and tweak the code to meet codestyle. No functional
> change intended.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a8adb2d1e9c6..1b5dd3820eac 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -260,7 +260,7 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>  	unsigned long max_ptes_none;
>  
>  	err = kstrtoul(buf, 10, &max_ptes_none);
> -	if (err || max_ptes_none > HPAGE_PMD_NR-1)
> +	if (err || max_ptes_none > HPAGE_PMD_NR - 1)
>  		return -EINVAL;
>  
>  	khugepaged_max_ptes_none = max_ptes_none;
> @@ -286,7 +286,7 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>  	unsigned long max_ptes_swap;
>  
>  	err  = kstrtoul(buf, 10, &max_ptes_swap);
> -	if (err || max_ptes_swap > HPAGE_PMD_NR-1)
> +	if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
>  		return -EINVAL;
>  
>  	khugepaged_max_ptes_swap = max_ptes_swap;
> @@ -313,7 +313,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>  	unsigned long max_ptes_shared;
>  
>  	err  = kstrtoul(buf, 10, &max_ptes_shared);
> -	if (err || max_ptes_shared > HPAGE_PMD_NR-1)
> +	if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
>  		return -EINVAL;
>  
>  	khugepaged_max_ptes_shared = max_ptes_shared;
> @@ -599,7 +599,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  	int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
>  	bool writable = false;
>  
> -	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
> +	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, address += PAGE_SIZE) {
>  		pte_t pteval = *_pte;
>  		if (pte_none(pteval) || (pte_present(pteval) &&
> @@ -1216,7 +1216,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>  
>  	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
>  	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> -	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
> +	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, _address += PAGE_SIZE) {
>  		pte_t pteval = *_pte;
>  		if (is_swap_pte(pteval)) {
> @@ -1306,7 +1306,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>  		/*
>  		 * Check if the page has any GUP (or other external) pins.
>  		 *
> -		 * Here the check is racy it may see totmal_mapcount > refcount
> +		 * Here the check is racy it may see total_mapcount > refcount
>  		 * in some cases.
>  		 * For example, one process with one forked child process.
>  		 * The parent has the PMD split due to MADV_DONTNEED, then
> @@ -1557,7 +1557,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  		 * mmap_write_lock(mm) as PMD-mapping is likely to be split
>  		 * later.
>  		 *
> -		 * Not that vma->anon_vma check is racy: it can be set up after
> +		 * Note that vma->anon_vma check is racy: it can be set up after
>  		 * the check but before we took mmap_lock by the fault path.
>  		 * But page lock would prevent establishing any new ptes of the
>  		 * page, so we are safe.
> -- 
> 2.23.0
> 
> 

Reviewed-by: Zach O'Keefe <zokeefe@google.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW
  2022-06-11  8:47 ` [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW Miaohe Lin
@ 2022-06-15  0:29   ` Zach O'Keefe
  2022-06-15  7:48     ` Miaohe Lin
  2022-06-15 21:28   ` Yang Shi
  1 sibling, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-15  0:29 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 11 Jun 16:47, Miaohe Lin wrote:
> Use helper macro __ATTR_RW to define the khugepaged attributes. Minor
> readability improvement.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 37 +++++++++++++++----------------------
>  1 file changed, 15 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 8e6fad7c7bd9..142e26e4bdbf 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -147,8 +147,7 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
>  	return count;
>  }
>  static struct kobj_attribute scan_sleep_millisecs_attr =
> -	__ATTR(scan_sleep_millisecs, 0644, scan_sleep_millisecs_show,
> -	       scan_sleep_millisecs_store);
> +	__ATTR_RW(scan_sleep_millisecs);
>  
>  static ssize_t alloc_sleep_millisecs_show(struct kobject *kobj,
>  					  struct kobj_attribute *attr,
> @@ -175,8 +174,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
>  	return count;
>  }
>  static struct kobj_attribute alloc_sleep_millisecs_attr =
> -	__ATTR(alloc_sleep_millisecs, 0644, alloc_sleep_millisecs_show,
> -	       alloc_sleep_millisecs_store);
> +	__ATTR_RW(alloc_sleep_millisecs);
>  
>  static ssize_t pages_to_scan_show(struct kobject *kobj,
>  				  struct kobj_attribute *attr,
> @@ -200,8 +198,7 @@ static ssize_t pages_to_scan_store(struct kobject *kobj,
>  	return count;
>  }
>  static struct kobj_attribute pages_to_scan_attr =
> -	__ATTR(pages_to_scan, 0644, pages_to_scan_show,
> -	       pages_to_scan_store);
> +	__ATTR_RW(pages_to_scan);
>  
>  static ssize_t pages_collapsed_show(struct kobject *kobj,
>  				    struct kobj_attribute *attr,
> @@ -221,13 +218,13 @@ static ssize_t full_scans_show(struct kobject *kobj,
>  static struct kobj_attribute full_scans_attr =
>  	__ATTR_RO(full_scans);
>  
> -static ssize_t khugepaged_defrag_show(struct kobject *kobj,
> +static ssize_t defrag_show(struct kobject *kobj,
>  				      struct kobj_attribute *attr, char *buf)
>  {
>  	return single_hugepage_flag_show(kobj, attr, buf,
>  					 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>  }
> -static ssize_t khugepaged_defrag_store(struct kobject *kobj,
> +static ssize_t defrag_store(struct kobject *kobj,
>  				       struct kobj_attribute *attr,
>  				       const char *buf, size_t count)
>  {
> @@ -235,8 +232,7 @@ static ssize_t khugepaged_defrag_store(struct kobject *kobj,
>  				 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>  }
>  static struct kobj_attribute khugepaged_defrag_attr =
> -	__ATTR(defrag, 0644, khugepaged_defrag_show,
> -	       khugepaged_defrag_store);
> +	__ATTR_RW(defrag);
>  
>  /*
>   * max_ptes_none controls if khugepaged should collapse hugepages over
> @@ -246,13 +242,13 @@ static struct kobj_attribute khugepaged_defrag_attr =
>   * runs. Increasing max_ptes_none will instead potentially reduce the
>   * free memory in the system during the khugepaged scan.
>   */
> -static ssize_t khugepaged_max_ptes_none_show(struct kobject *kobj,
> +static ssize_t max_ptes_none_show(struct kobject *kobj,
>  					     struct kobj_attribute *attr,
>  					     char *buf)
>  {
>  	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_none);
>  }
> -static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
> +static ssize_t max_ptes_none_store(struct kobject *kobj,
>  					      struct kobj_attribute *attr,
>  					      const char *buf, size_t count)
>  {
> @@ -268,17 +264,16 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>  	return count;
>  }
>  static struct kobj_attribute khugepaged_max_ptes_none_attr =
> -	__ATTR(max_ptes_none, 0644, khugepaged_max_ptes_none_show,
> -	       khugepaged_max_ptes_none_store);
> +	__ATTR_RW(max_ptes_none);
>  
> -static ssize_t khugepaged_max_ptes_swap_show(struct kobject *kobj,
> +static ssize_t max_ptes_swap_show(struct kobject *kobj,
>  					     struct kobj_attribute *attr,
>  					     char *buf)
>  {
>  	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_swap);
>  }
>  
> -static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
> +static ssize_t max_ptes_swap_store(struct kobject *kobj,
>  					      struct kobj_attribute *attr,
>  					      const char *buf, size_t count)
>  {
> @@ -295,17 +290,16 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>  }
>  
>  static struct kobj_attribute khugepaged_max_ptes_swap_attr =
> -	__ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show,
> -	       khugepaged_max_ptes_swap_store);
> +	__ATTR_RW(max_ptes_swap);
>  
> -static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
> +static ssize_t max_ptes_shared_show(struct kobject *kobj,
>  					       struct kobj_attribute *attr,
>  					       char *buf)
>  {
>  	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_shared);
>  }
>  
> -static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
> +static ssize_t max_ptes_shared_store(struct kobject *kobj,
>  					      struct kobj_attribute *attr,
>  					      const char *buf, size_t count)
>  {
> @@ -322,8 +316,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>  }
>  
>  static struct kobj_attribute khugepaged_max_ptes_shared_attr =
> -	__ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
> -	       khugepaged_max_ptes_shared_store);
> +	__ATTR_RW(max_ptes_shared);
>  
>  static struct attribute *khugepaged_attr[] = {
>  	&khugepaged_defrag_attr.attr,
> -- 
> 2.23.0
> 
> 

For function names that changed, can we align args that don't fit on opening
line with the opening brace?
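
E.g. something like:

	static ssize_t max_ptes_none_show(struct kobject *kobj,
					  struct kobj_attribute *attr,
					  char *buf)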

Thanks,
Zach

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW
  2022-06-15  0:29   ` Zach O'Keefe
@ 2022-06-15  7:48     ` Miaohe Lin
  0 siblings, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-15  7:48 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 2022/6/15 8:29, Zach O'Keefe wrote:
> On 11 Jun 16:47, Miaohe Lin wrote:
>> Use helper macro __ATTR_RW to define the khugepaged attributes. Minor
>> readability improvement.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/khugepaged.c | 37 +++++++++++++++----------------------
>>  1 file changed, 15 insertions(+), 22 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 8e6fad7c7bd9..142e26e4bdbf 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -147,8 +147,7 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
>>  	return count;
>>  }
>>  static struct kobj_attribute scan_sleep_millisecs_attr =
>> -	__ATTR(scan_sleep_millisecs, 0644, scan_sleep_millisecs_show,
>> -	       scan_sleep_millisecs_store);
>> +	__ATTR_RW(scan_sleep_millisecs);
>>  
>>  static ssize_t alloc_sleep_millisecs_show(struct kobject *kobj,
>>  					  struct kobj_attribute *attr,
>> @@ -175,8 +174,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
>>  	return count;
>>  }
>>  static struct kobj_attribute alloc_sleep_millisecs_attr =
>> -	__ATTR(alloc_sleep_millisecs, 0644, alloc_sleep_millisecs_show,
>> -	       alloc_sleep_millisecs_store);
>> +	__ATTR_RW(alloc_sleep_millisecs);
>>  
>>  static ssize_t pages_to_scan_show(struct kobject *kobj,
>>  				  struct kobj_attribute *attr,
>> @@ -200,8 +198,7 @@ static ssize_t pages_to_scan_store(struct kobject *kobj,
>>  	return count;
>>  }
>>  static struct kobj_attribute pages_to_scan_attr =
>> -	__ATTR(pages_to_scan, 0644, pages_to_scan_show,
>> -	       pages_to_scan_store);
>> +	__ATTR_RW(pages_to_scan);
>>  
>>  static ssize_t pages_collapsed_show(struct kobject *kobj,
>>  				    struct kobj_attribute *attr,
>> @@ -221,13 +218,13 @@ static ssize_t full_scans_show(struct kobject *kobj,
>>  static struct kobj_attribute full_scans_attr =
>>  	__ATTR_RO(full_scans);
>>  
>> -static ssize_t khugepaged_defrag_show(struct kobject *kobj,
>> +static ssize_t defrag_show(struct kobject *kobj,
>>  				      struct kobj_attribute *attr, char *buf)
>>  {
>>  	return single_hugepage_flag_show(kobj, attr, buf,
>>  					 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>>  }
>> -static ssize_t khugepaged_defrag_store(struct kobject *kobj,
>> +static ssize_t defrag_store(struct kobject *kobj,
>>  				       struct kobj_attribute *attr,
>>  				       const char *buf, size_t count)
>>  {
>> @@ -235,8 +232,7 @@ static ssize_t khugepaged_defrag_store(struct kobject *kobj,
>>  				 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>>  }
>>  static struct kobj_attribute khugepaged_defrag_attr =
>> -	__ATTR(defrag, 0644, khugepaged_defrag_show,
>> -	       khugepaged_defrag_store);
>> +	__ATTR_RW(defrag);
>>  
>>  /*
>>   * max_ptes_none controls if khugepaged should collapse hugepages over
>> @@ -246,13 +242,13 @@ static struct kobj_attribute khugepaged_defrag_attr =
>>   * runs. Increasing max_ptes_none will instead potentially reduce the
>>   * free memory in the system during the khugepaged scan.
>>   */
>> -static ssize_t khugepaged_max_ptes_none_show(struct kobject *kobj,
>> +static ssize_t max_ptes_none_show(struct kobject *kobj,
>>  					     struct kobj_attribute *attr,
>>  					     char *buf)
>>  {
>>  	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_none);
>>  }
>> -static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>> +static ssize_t max_ptes_none_store(struct kobject *kobj,
>>  					      struct kobj_attribute *attr,
>>  					      const char *buf, size_t count)
>>  {
>> @@ -268,17 +264,16 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>>  	return count;
>>  }
>>  static struct kobj_attribute khugepaged_max_ptes_none_attr =
>> -	__ATTR(max_ptes_none, 0644, khugepaged_max_ptes_none_show,
>> -	       khugepaged_max_ptes_none_store);
>> +	__ATTR_RW(max_ptes_none);
>>  
>> -static ssize_t khugepaged_max_ptes_swap_show(struct kobject *kobj,
>> +static ssize_t max_ptes_swap_show(struct kobject *kobj,
>>  					     struct kobj_attribute *attr,
>>  					     char *buf)
>>  {
>>  	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_swap);
>>  }
>>  
>> -static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>> +static ssize_t max_ptes_swap_store(struct kobject *kobj,
>>  					      struct kobj_attribute *attr,
>>  					      const char *buf, size_t count)
>>  {
>> @@ -295,17 +290,16 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>>  }
>>  
>>  static struct kobj_attribute khugepaged_max_ptes_swap_attr =
>> -	__ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show,
>> -	       khugepaged_max_ptes_swap_store);
>> +	__ATTR_RW(max_ptes_swap);
>>  
>> -static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
>> +static ssize_t max_ptes_shared_show(struct kobject *kobj,
>>  					       struct kobj_attribute *attr,
>>  					       char *buf)
>>  {
>>  	return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_shared);
>>  }
>>  
>> -static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>> +static ssize_t max_ptes_shared_store(struct kobject *kobj,
>>  					      struct kobj_attribute *attr,
>>  					      const char *buf, size_t count)
>>  {
>> @@ -322,8 +316,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>>  }
>>  
>>  static struct kobj_attribute khugepaged_max_ptes_shared_attr =
>> -	__ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
>> -	       khugepaged_max_ptes_shared_store);
>> +	__ATTR_RW(max_ptes_shared);
>>  
>>  static struct attribute *khugepaged_attr[] = {
>>  	&khugepaged_defrag_attr.attr,
>> -- 
>> 2.23.0
>>
>>
> 
> For function names that changed, can we align args that don't fit on opening
> line with the opening brace?

Sorry, I forgot to align the arguments with the opening brace, and checkpatch.pl didn't
complain about it. Will fix these and similar cases in another thread. Many thanks for
your review and comments!

> 
> Thanks,
> Zach
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  2022-06-11  8:47 ` [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs Miaohe Lin
@ 2022-06-15 15:14   ` Zach O'Keefe
  2022-06-15 17:51     ` Yang Shi
  2022-06-15 17:49   ` Yang Shi
  1 sibling, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-15 15:14 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 11 Jun 16:47, Miaohe Lin wrote:
> When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
> swap entry will remain in pagetable. This will result in later failure.
> So stop swapping in pages in this case to save cpu cycles.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 73570dfffcec..a8adb2d1e9c6 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
>  		swapped_in++;
>  		ret = do_swap_page(&vmf);
>  
> -		/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
> +		/*
> +		 * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
> +		 * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
> +		 * we do not retry here and swap entry will remain in pagetable
> +		 * resulting in later failure.
> +		 */
>  		if (ret & VM_FAULT_RETRY) {
>  			mmap_read_lock(mm);
> -			if (hugepage_vma_revalidate(mm, haddr, &vma)) {
> -				/* vma is no longer available, don't continue to swapin */
> -				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> -				return false;
> -			}
> -			/* check if the pmd is still valid */
> -			if (mm_find_pmd(mm, haddr) != pmd) {
> -				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> -				return false;
> -			}
> +			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> +			return false;
>  		}
>  		if (ret & VM_FAULT_ERROR) {
>  			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> -- 
> 2.23.0
> 
>

I've convinced myself this is correct, but don't understand how we got here.
AFAICT, we've always continued to fault in pages, and, as you mention, don't
retry ones that have failed with VM_FAULT_RETRY - so
__collapse_huge_page_isolate() should fail. I don't think (?) there is any
benefit to continuing to swap if we don't handle VM_FAULT_RETRY appropriately.

So, I think this change looks good from that perspective. I suppose the only
other question would be: should we handle the VM_FAULT_RETRY case? Maybe 1
additional attempt then fail? AFAIK, this mostly (?) happens when the page is
locked.  Maybe it's not worth the extra complexity though..

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file
  2022-06-11  8:47 ` [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file Miaohe Lin
@ 2022-06-15 15:54   ` Zach O'Keefe
  2022-06-15 18:18     ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-15 15:54 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 11 Jun 16:47, Miaohe Lin wrote:
> nr_none is always 0 for the non-shmem case because the pages can be read
> from the backing store. So when nr_none != 0, it must be the is_shmem case.
> Also, only adjust nrpages and uncharge shmem when nr_none != 0 to save
> cpu cycles.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1b5dd3820eac..8e6fad7c7bd9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1885,8 +1885,7 @@ static void collapse_file(struct mm_struct *mm,
>  
>  	if (nr_none) {
>  		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
> -		if (is_shmem)
> -			__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> +		__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
>  	}


Might be worth a small comment here - even though folks can see in the above
code that this is only incremented in the shmem path, it might be nice to say
why it's always 0 for non-shmem (or conversely, why it can only be non-zero
for shmem).
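
E.g. something along the lines of (wording just for illustration):

	if (nr_none) {
		/*
		 * nr_none can only be non-zero in the shmem case: for
		 * non-shmem files the missing pages are read in from the
		 * backing store rather than being accounted here.
		 */
		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
		__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
	}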

>  
>  	/* Join all the small entries into a single multi-index entry */
> @@ -1950,10 +1949,10 @@ static void collapse_file(struct mm_struct *mm,
>  
>  		/* Something went wrong: roll back page cache changes */
>  		xas_lock_irq(&xas);
> -		mapping->nrpages -= nr_none;
> -
> -		if (is_shmem)
> +		if (nr_none) {
> +			mapping->nrpages -= nr_none;
>  			shmem_uncharge(mapping->host, nr_none);
> +		}
>  
>  		xas_set(&xas, start);
>  		xas_for_each(&xas, page, end - 1) {
> -- 
> 2.23.0
> 
> 

Otherwise,

Reviewed-by: Zach O'Keefe <zokeefe@google.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-11  8:47 ` [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible Miaohe Lin
@ 2022-06-15 17:13   ` Zach O'Keefe
  2022-06-16  3:38     ` Mika Penttilä
  2022-06-16  7:33     ` Miaohe Lin
  2022-06-15 23:58   ` Yang Shi
  1 sibling, 2 replies; 40+ messages in thread
From: Zach O'Keefe @ 2022-06-15 17:13 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 11 Jun 16:47, Miaohe Lin wrote:
> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> That is because release_pte_page() is not called for these pages and
> thus free_page_and_swap_cache can't grab the page lock. These pages
> won't be freed from the swap cache even if we are the only user until
> the next reclaim. It shouldn't really hurt, but we could try to
> free these pages to save more memory for the system.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  include/linux/swap.h | 5 +++++
>  mm/khugepaged.c      | 1 +
>  mm/swap.h            | 5 -----
>  3 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 8672a7123ccd..ccb83b12b724 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>  	return global_node_page_state(NR_SWAPCACHE);
>  }
>  
> +extern void free_swap_cache(struct page *page);
>  extern void free_page_and_swap_cache(struct page *);
>  extern void free_pages_and_swap_cache(struct page **, int);
>  /* linux/mm/swapfile.c */
> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>  
> +static inline void free_swap_cache(struct page *page)
> +{
> +}
> +
>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>  {
>  	return 0;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ee0a719c8be9..52109ad13f78 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>  	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>  		list_del(&src_page->lru);
>  		release_pte_page(src_page);
> +		free_swap_cache(src_page);
>  	}
>  }

Aside: in __collapse_huge_page_isolate() (and also here) why can't we just check
PageCompound(page) && page == compound_head(page) to only act on compound pages
once? AFAIK this would alleviate this compound_pagelist business..
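
I.e., roughly, skipping tail pages in the pte walk, just to illustrate the idea:

	/* only act once per compound page, when its head is encountered */
	if (PageCompound(page) && page != compound_head(page))
		continue;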

Anyways, as-is, free_page_and_swap_cache() won't be able to do
try_to_free_swap(), since it can't grab the page lock, but it will call put_page().
I think (?) the last page ref might be dropped in release_pte_page(), so should
free_swap_cache() come before it?
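
I.e. the ordering being asked about would be something like (untested, and
free_swap_cache() would still need to be able to take the page lock):

	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
		list_del(&src_page->lru);
		/* try to drop the swap cache entry before the final put */
		free_swap_cache(src_page);
		release_pte_page(src_page);
	}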

>  
> diff --git a/mm/swap.h b/mm/swap.h
> index 0193797b0c92..863f6086c916 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>  void delete_from_swap_cache(struct page *page);
>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>  				  unsigned long end);
> -void free_swap_cache(struct page *page);
>  struct page *lookup_swap_cache(swp_entry_t entry,
>  			       struct vm_area_struct *vma,
>  			       unsigned long addr);
> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>  	return NULL;
>  }
>  
> -static inline void free_swap_cache(struct page *page)
> -{
> -}
> -
>  static inline void show_swap_cache_info(void)
>  {
>  }
> -- 
> 2.23.0
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check
  2022-06-11  8:47 ` [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check Miaohe Lin
  2022-06-11 20:33   ` Andrew Morton
  2022-06-15  0:13   ` Zach O'Keefe
@ 2022-06-15 17:35   ` Yang Shi
  2 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-15 17:35 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem. Remove this duplicated check.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
>  mm/khugepaged.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 476d79360101..73570dfffcec 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2153,8 +2153,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>                 if (khugepaged_scan.address < hstart)
>                         khugepaged_scan.address = hstart;
>                 VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
> -               if (shmem_file(vma->vm_file) && !shmem_huge_enabled(vma))
> -                       goto skip;
>
>                 while (khugepaged_scan.address < hend) {
>                         int ret;
> --
> 2.23.0
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  2022-06-11  8:47 ` [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs Miaohe Lin
  2022-06-15 15:14   ` Zach O'Keefe
@ 2022-06-15 17:49   ` Yang Shi
  2022-06-16  6:40     ` Miaohe Lin
  1 sibling, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-15 17:49 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
> swap entry will remain in pagetable. This will result in later failure.
> So stop swapping in pages in this case to save cpu cycles.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 73570dfffcec..a8adb2d1e9c6 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
>                 swapped_in++;
>                 ret = do_swap_page(&vmf);
>
> -               /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
> +               /*
> +                * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
> +                * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
> +                * we do not retry here and swap entry will remain in pagetable
> +                * resulting in later failure.

Yeah, it makes sense.

> +                */
>                 if (ret & VM_FAULT_RETRY) {
>                         mmap_read_lock(mm);

A further optimization, you should not need to relock mmap_lock. You
may consider returning a different value or passing in *locked and
setting it to false, then check this value in the caller to skip
unlock.

> -                       if (hugepage_vma_revalidate(mm, haddr, &vma)) {
> -                               /* vma is no longer available, don't continue to swapin */
> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> -                               return false;
> -                       }
> -                       /* check if the pmd is still valid */
> -                       if (mm_find_pmd(mm, haddr) != pmd) {
> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> -                               return false;
> -                       }
> +                       trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> +                       return false;
>                 }
>                 if (ret & VM_FAULT_ERROR) {
>                         trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);

And I think "swapped_in++" needs to be moved after error handling.

> --
> 2.23.0
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  2022-06-15 15:14   ` Zach O'Keefe
@ 2022-06-15 17:51     ` Yang Shi
  2022-06-16  6:08       ` Miaohe Lin
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-15 17:51 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Miaohe Lin, Andrew Morton, Andrea Arcangeli, Matthew Wilcox,
	Vlastimil Babka, David Howells, NeilBrown, Alistair Popple,
	David Hildenbrand, Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Wed, Jun 15, 2022 at 8:14 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On 11 Jun 16:47, Miaohe Lin wrote:
> > When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
> > swap entry will remain in pagetable. This will result in later failure.
> > So stop swapping in pages in this case to save cpu cycles.
> >
> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> >  mm/khugepaged.c | 19 ++++++++-----------
> >  1 file changed, 8 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 73570dfffcec..a8adb2d1e9c6 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> >               swapped_in++;
> >               ret = do_swap_page(&vmf);
> >
> > -             /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
> > +             /*
> > +              * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
> > +              * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
> > +              * we do not retry here and swap entry will remain in pagetable
> > +              * resulting in later failure.
> > +              */
> >               if (ret & VM_FAULT_RETRY) {
> >                       mmap_read_lock(mm);
> > -                     if (hugepage_vma_revalidate(mm, haddr, &vma)) {
> > -                             /* vma is no longer available, don't continue to swapin */
> > -                             trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > -                             return false;
> > -                     }
> > -                     /* check if the pmd is still valid */
> > -                     if (mm_find_pmd(mm, haddr) != pmd) {
> > -                             trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > -                             return false;
> > -                     }
> > +                     trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > +                     return false;
> >               }
> >               if (ret & VM_FAULT_ERROR) {
> >                       trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > --
> > 2.23.0
> >
> >
>
> I've convinced myself this is correct, but don't understand how we got here.
> AFAICT, we've always continued to fault in pages, and, as you mention, don't
> retry ones that have failed with VM_FAULT_RETRY - so
> __collapse_huge_page_isolate() should fail. I don't think (?) there is any
> benefit to continuing to swap if we don't handle VM_FAULT_RETRY appropriately.
>
> So, I think this change looks good from that perspective. I suppose the only
> other question would be: should we handle the VM_FAULT_RETRY case? Maybe 1
> additional attempt then fail? AFAIK, this mostly (?) happens when the page is
> locked.  Maybe it's not worth the extra complexity though..

It should be unnecessary for khugepaged IMHO since it will scan all
the valid mm periodically, so it will come back eventually.

>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup
  2022-06-11  8:47 ` [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup Miaohe Lin
  2022-06-15  0:23   ` Zach O'Keefe
@ 2022-06-15 17:53   ` Yang Shi
  1 sibling, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-15 17:53 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> Fix some typos and tweak the code to meet codestyle. No functional
> change intended.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
>  mm/khugepaged.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a8adb2d1e9c6..1b5dd3820eac 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -260,7 +260,7 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>         unsigned long max_ptes_none;
>
>         err = kstrtoul(buf, 10, &max_ptes_none);
> -       if (err || max_ptes_none > HPAGE_PMD_NR-1)
> +       if (err || max_ptes_none > HPAGE_PMD_NR - 1)
>                 return -EINVAL;
>
>         khugepaged_max_ptes_none = max_ptes_none;
> @@ -286,7 +286,7 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>         unsigned long max_ptes_swap;
>
>         err  = kstrtoul(buf, 10, &max_ptes_swap);
> -       if (err || max_ptes_swap > HPAGE_PMD_NR-1)
> +       if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
>                 return -EINVAL;
>
>         khugepaged_max_ptes_swap = max_ptes_swap;
> @@ -313,7 +313,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>         unsigned long max_ptes_shared;
>
>         err  = kstrtoul(buf, 10, &max_ptes_shared);
> -       if (err || max_ptes_shared > HPAGE_PMD_NR-1)
> +       if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
>                 return -EINVAL;
>
>         khugepaged_max_ptes_shared = max_ptes_shared;
> @@ -599,7 +599,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>         int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
>         bool writable = false;
>
> -       for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
> +       for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>              _pte++, address += PAGE_SIZE) {
>                 pte_t pteval = *_pte;
>                 if (pte_none(pteval) || (pte_present(pteval) &&
> @@ -1216,7 +1216,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>
>         memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
>         pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> -       for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
> +       for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>              _pte++, _address += PAGE_SIZE) {
>                 pte_t pteval = *_pte;
>                 if (is_swap_pte(pteval)) {
> @@ -1306,7 +1306,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>                 /*
>                  * Check if the page has any GUP (or other external) pins.
>                  *
> -                * Here the check is racy it may see totmal_mapcount > refcount
> +                * Here the check is racy it may see total_mapcount > refcount
>                  * in some cases.
>                  * For example, one process with one forked child process.
>                  * The parent has the PMD split due to MADV_DONTNEED, then
> @@ -1557,7 +1557,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>                  * mmap_write_lock(mm) as PMD-mapping is likely to be split
>                  * later.
>                  *
> -                * Not that vma->anon_vma check is racy: it can be set up after
> +                * Note that vma->anon_vma check is racy: it can be set up after
>                  * the check but before we took mmap_lock by the fault path.
>                  * But page lock would prevent establishing any new ptes of the
>                  * page, so we are safe.
> --
> 2.23.0
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file
  2022-06-15 15:54   ` Zach O'Keefe
@ 2022-06-15 18:18     ` Yang Shi
  2022-06-16  6:10       ` Miaohe Lin
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-15 18:18 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Miaohe Lin, Andrew Morton, Andrea Arcangeli, Matthew Wilcox,
	Vlastimil Babka, David Howells, NeilBrown, Alistair Popple,
	David Hildenbrand, Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Wed, Jun 15, 2022 at 8:55 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On 11 Jun 16:47, Miaohe Lin wrote:
> > nr_none is always 0 for non-shmem case because the page can be read from
> > the backend store. So when nr_none ! = 0, it must be in is_shmem case.
> > Also only adjust the nrpages and uncharge shmem when nr_none != 0 to save
> > cpu cycles.
> >
> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> >  mm/khugepaged.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 1b5dd3820eac..8e6fad7c7bd9 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1885,8 +1885,7 @@ static void collapse_file(struct mm_struct *mm,
> >
> >       if (nr_none) {
> >               __mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
> > -             if (is_shmem)
> > -                     __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> > +             __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> >       }
>
>
> Might be worth a small comment here - even though folks can see in above code
> that this is only incremented in shmem path, might be nice to say why it's
> always 0 for non-shmem (or conversely, why it's only possible to be non 0 on
> shmem).

Agreed, better to have some comments in the code.
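
A sketch of one possible wording for such a comment (illustrative only, not taken from the patch):

	if (nr_none) {
		/*
		 * nr_none can only be non-zero in the shmem case: for regular
		 * files the missing pages are read in from the backing store,
		 * so khugepaged never fills "none" slots there.
		 */
		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
		__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
	}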

>
> >
> >       /* Join all the small entries into a single multi-index entry */
> > @@ -1950,10 +1949,10 @@ static void collapse_file(struct mm_struct *mm,
> >
> >               /* Something went wrong: roll back page cache changes */
> >               xas_lock_irq(&xas);
> > -             mapping->nrpages -= nr_none;
> > -
> > -             if (is_shmem)
> > +             if (nr_none) {
> > +                     mapping->nrpages -= nr_none;
> >                       shmem_uncharge(mapping->host, nr_none);
> > +             }
> >
> >               xas_set(&xas, start);
> >               xas_for_each(&xas, page, end - 1) {
> > --
> > 2.23.0
> >
> >
>
> Otherwise,
>
> Reviewed-by: Zach O'Keefe <zokeefe@google.com>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW
  2022-06-11  8:47 ` [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW Miaohe Lin
  2022-06-15  0:29   ` Zach O'Keefe
@ 2022-06-15 21:28   ` Yang Shi
  2022-06-16  7:07     ` Miaohe Lin
  1 sibling, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-15 21:28 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> Use helper macro __ATTR_RW to define the khugepaged attributes. Minor
> readability improvement.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/khugepaged.c | 37 +++++++++++++++----------------------
>  1 file changed, 15 insertions(+), 22 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 8e6fad7c7bd9..142e26e4bdbf 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -147,8 +147,7 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
>         return count;
>  }
>  static struct kobj_attribute scan_sleep_millisecs_attr =
> -       __ATTR(scan_sleep_millisecs, 0644, scan_sleep_millisecs_show,
> -              scan_sleep_millisecs_store);
> +       __ATTR_RW(scan_sleep_millisecs);
>
>  static ssize_t alloc_sleep_millisecs_show(struct kobject *kobj,
>                                           struct kobj_attribute *attr,
> @@ -175,8 +174,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
>         return count;
>  }
>  static struct kobj_attribute alloc_sleep_millisecs_attr =
> -       __ATTR(alloc_sleep_millisecs, 0644, alloc_sleep_millisecs_show,
> -              alloc_sleep_millisecs_store);
> +       __ATTR_RW(alloc_sleep_millisecs);
>
>  static ssize_t pages_to_scan_show(struct kobject *kobj,
>                                   struct kobj_attribute *attr,
> @@ -200,8 +198,7 @@ static ssize_t pages_to_scan_store(struct kobject *kobj,
>         return count;
>  }
>  static struct kobj_attribute pages_to_scan_attr =
> -       __ATTR(pages_to_scan, 0644, pages_to_scan_show,
> -              pages_to_scan_store);
> +       __ATTR_RW(pages_to_scan);
>
>  static ssize_t pages_collapsed_show(struct kobject *kobj,
>                                     struct kobj_attribute *attr,
> @@ -221,13 +218,13 @@ static ssize_t full_scans_show(struct kobject *kobj,
>  static struct kobj_attribute full_scans_attr =
>         __ATTR_RO(full_scans);
>
> -static ssize_t khugepaged_defrag_show(struct kobject *kobj,
> +static ssize_t defrag_show(struct kobject *kobj,
>                                       struct kobj_attribute *attr, char *buf)

Why do you rename all the functions? Seems unnecessary and less intriguing TBH.

>  {
>         return single_hugepage_flag_show(kobj, attr, buf,
>                                          TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>  }
> -static ssize_t khugepaged_defrag_store(struct kobject *kobj,
> +static ssize_t defrag_store(struct kobject *kobj,
>                                        struct kobj_attribute *attr,
>                                        const char *buf, size_t count)
>  {
> @@ -235,8 +232,7 @@ static ssize_t khugepaged_defrag_store(struct kobject *kobj,
>                                  TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>  }
>  static struct kobj_attribute khugepaged_defrag_attr =
> -       __ATTR(defrag, 0644, khugepaged_defrag_show,
> -              khugepaged_defrag_store);
> +       __ATTR_RW(defrag);
>
>  /*
>   * max_ptes_none controls if khugepaged should collapse hugepages over
> @@ -246,13 +242,13 @@ static struct kobj_attribute khugepaged_defrag_attr =
>   * runs. Increasing max_ptes_none will instead potentially reduce the
>   * free memory in the system during the khugepaged scan.
>   */
> -static ssize_t khugepaged_max_ptes_none_show(struct kobject *kobj,
> +static ssize_t max_ptes_none_show(struct kobject *kobj,
>                                              struct kobj_attribute *attr,
>                                              char *buf)
>  {
>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_none);
>  }
> -static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
> +static ssize_t max_ptes_none_store(struct kobject *kobj,
>                                               struct kobj_attribute *attr,
>                                               const char *buf, size_t count)
>  {
> @@ -268,17 +264,16 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>         return count;
>  }
>  static struct kobj_attribute khugepaged_max_ptes_none_attr =
> -       __ATTR(max_ptes_none, 0644, khugepaged_max_ptes_none_show,
> -              khugepaged_max_ptes_none_store);
> +       __ATTR_RW(max_ptes_none);
>
> -static ssize_t khugepaged_max_ptes_swap_show(struct kobject *kobj,
> +static ssize_t max_ptes_swap_show(struct kobject *kobj,
>                                              struct kobj_attribute *attr,
>                                              char *buf)
>  {
>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_swap);
>  }
>
> -static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
> +static ssize_t max_ptes_swap_store(struct kobject *kobj,
>                                               struct kobj_attribute *attr,
>                                               const char *buf, size_t count)
>  {
> @@ -295,17 +290,16 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>  }
>
>  static struct kobj_attribute khugepaged_max_ptes_swap_attr =
> -       __ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show,
> -              khugepaged_max_ptes_swap_store);
> +       __ATTR_RW(max_ptes_swap);
>
> -static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
> +static ssize_t max_ptes_shared_show(struct kobject *kobj,
>                                                struct kobj_attribute *attr,
>                                                char *buf)
>  {
>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_shared);
>  }
>
> -static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
> +static ssize_t max_ptes_shared_store(struct kobject *kobj,
>                                               struct kobj_attribute *attr,
>                                               const char *buf, size_t count)
>  {
> @@ -322,8 +316,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>  }
>
>  static struct kobj_attribute khugepaged_max_ptes_shared_attr =
> -       __ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
> -              khugepaged_max_ptes_shared_store);
> +       __ATTR_RW(max_ptes_shared);
>
>  static struct attribute *khugepaged_attr[] = {
>         &khugepaged_defrag_attr.attr,
> --
> 2.23.0
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp()
  2022-06-11  8:47 ` [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp() Miaohe Lin
  2022-06-15  0:19   ` Zach O'Keefe
@ 2022-06-15 21:35   ` Yang Shi
  1 sibling, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-15 21:35 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> The return value of khugepaged_add_pte_mapped_thp() is always 0 and also
> ignored. Remove it to clean up the code.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
>  mm/khugepaged.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 142e26e4bdbf..ee0a719c8be9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1372,7 +1372,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
>   * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
>   * khugepaged should try to collapse the page table.
>   */
> -static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
> +static void khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
>                                          unsigned long addr)
>  {
>         struct mm_slot *mm_slot;
> @@ -1384,7 +1384,6 @@ static int khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
>         if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP))
>                 mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
>         spin_unlock(&khugepaged_mm_lock);
> -       return 0;
>  }
>
>  static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> --
> 2.23.0
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-11  8:47 ` [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible Miaohe Lin
  2022-06-15 17:13   ` Zach O'Keefe
@ 2022-06-15 23:58   ` Yang Shi
  2022-06-16  7:42     ` Miaohe Lin
  1 sibling, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-15 23:58 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> It's because release_pte_page() is not called for these pages and
> thus free_page_and_swap_cache can't grab the page lock. These pages
> won't be freed from swap cache even if we are the only user until
> next time reclaim. It shouldn't hurt indeed, but we could try to
> free these pages to save more memory for system.


>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  include/linux/swap.h | 5 +++++
>  mm/khugepaged.c      | 1 +
>  mm/swap.h            | 5 -----
>  3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 8672a7123ccd..ccb83b12b724 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>         return global_node_page_state(NR_SWAPCACHE);
>  }
>
> +extern void free_swap_cache(struct page *page);
>  extern void free_page_and_swap_cache(struct page *);
>  extern void free_pages_and_swap_cache(struct page **, int);
>  /* linux/mm/swapfile.c */
> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>
> +static inline void free_swap_cache(struct page *page)
> +{
> +}
> +
>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>  {
>         return 0;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ee0a719c8be9..52109ad13f78 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>                 list_del(&src_page->lru);
>                 release_pte_page(src_page);
> +               free_swap_cache(src_page);

Will this really work? The free_swap_cache() will just dec refcounts
without putting the page back to buddy. So the hugepage is not
actually freed at all. Am I missing something?
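
For reference, free_swap_cache() (mm/swap_state.c) roughly looks like the sketch below: it only
tries to drop the swap cache reference when it can take the page lock, and it never returns the
page to the buddy allocator itself:

	void free_swap_cache(struct page *page)
	{
		if (PageSwapCache(page) && !page_mapped(page) &&
		    trylock_page(page)) {
			try_to_free_swap(page);
			unlock_page(page);
		}
	}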

>         }
>  }
>
> diff --git a/mm/swap.h b/mm/swap.h
> index 0193797b0c92..863f6086c916 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>  void delete_from_swap_cache(struct page *page);
>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>                                   unsigned long end);
> -void free_swap_cache(struct page *page);
>  struct page *lookup_swap_cache(swp_entry_t entry,
>                                struct vm_area_struct *vma,
>                                unsigned long addr);
> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>         return NULL;
>  }
>
> -static inline void free_swap_cache(struct page *page)
> -{
> -}
> -
>  static inline void show_swap_cache_info(void)
>  {
>  }
> --
> 2.23.0
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-15 17:13   ` Zach O'Keefe
@ 2022-06-16  3:38     ` Mika Penttilä
  2022-06-16  7:33     ` Miaohe Lin
  1 sibling, 0 replies; 40+ messages in thread
From: Mika Penttilä @ 2022-06-16  3:38 UTC (permalink / raw)
  To: Zach O'Keefe, Miaohe Lin
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel


On 15.6.2022 20.13, Zach O'Keefe wrote:
> On 11 Jun 16:47, Miaohe Lin wrote:
>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>> It's because release_pte_page() is not called for these pages and
>> thus free_page_and_swap_cache can't grab the page lock. These pages
>> won't be freed from swap cache even if we are the only user until
>> next time reclaim. It shouldn't hurt indeed, but we could try to
>> free these pages to save more memory for system.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>   include/linux/swap.h | 5 +++++
>>   mm/khugepaged.c      | 1 +
>>   mm/swap.h            | 5 -----
>>   3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 8672a7123ccd..ccb83b12b724 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>>   	return global_node_page_state(NR_SWAPCACHE);
>>   }
>>   
>> +extern void free_swap_cache(struct page *page);
>>   extern void free_page_and_swap_cache(struct page *);
>>   extern void free_pages_and_swap_cache(struct page **, int);
>>   /* linux/mm/swapfile.c */
>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>>   /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>>   #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>   
>> +static inline void free_swap_cache(struct page *page)
>> +{
>> +}
>> +
>>   static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>>   {
>>   	return 0;
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index ee0a719c8be9..52109ad13f78 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>   	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>   		list_del(&src_page->lru);
>>   		release_pte_page(src_page);
>> +		free_swap_cache(src_page);
>>   	}
>>   }
> 
> Aside: in __collapse_huge_page_isolate() (and also here) why can't we just check
> PageCompound(page) && page == compound_head(page) to only act on compound pages
> once? AFAIK this would alleviate this compound_pagelist business..



It is for pte-mapped compound pages. Things like lock/unlock page and 
isolate/putback LRU have to operate on head pages, and you have to do 
operations like copying from tail pages before releasing the heads. So while 
you could maybe add head/tail tests here and there, work the pages 
backwards, etc., it easily gets messier than the current solution with 
compound_pagelist.



> 
> Anyways, as-is, free_page_and_swap_cache() won't be able to do
> try_to_free_swap(), since it can't grab page lock, but it will call put_page().
> I think (?) the last page ref might be dropped in release_pte_page(), so should
> free_swap_cache() come before it?
> 
>>   
>> diff --git a/mm/swap.h b/mm/swap.h
>> index 0193797b0c92..863f6086c916 100644
>> --- a/mm/swap.h
>> +++ b/mm/swap.h
>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>>   void delete_from_swap_cache(struct page *page);
>>   void clear_shadow_from_swap_cache(int type, unsigned long begin,
>>   				  unsigned long end);
>> -void free_swap_cache(struct page *page);
>>   struct page *lookup_swap_cache(swp_entry_t entry,
>>   			       struct vm_area_struct *vma,
>>   			       unsigned long addr);
>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>>   	return NULL;
>>   }
>>   
>> -static inline void free_swap_cache(struct page *page)
>> -{
>> -}
>> -
>>   static inline void show_swap_cache_info(void)
>>   {
>>   }
>> -- 
>> 2.23.0
>>
>>
> 

--Mika


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  2022-06-15 17:51     ` Yang Shi
@ 2022-06-16  6:08       ` Miaohe Lin
  0 siblings, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-16  6:08 UTC (permalink / raw)
  To: Yang Shi, Zach O'Keefe
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On 2022/6/16 1:51, Yang Shi wrote:
> On Wed, Jun 15, 2022 at 8:14 AM Zach O'Keefe <zokeefe@google.com> wrote:
>>
>> On 11 Jun 16:47, Miaohe Lin wrote:
>>> When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
>>> swap entry will remain in pagetable. This will result in later failure.
>>> So stop swapping in pages in this case to save cpu cycles.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/khugepaged.c | 19 ++++++++-----------
>>>  1 file changed, 8 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 73570dfffcec..a8adb2d1e9c6 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
>>>               swapped_in++;
>>>               ret = do_swap_page(&vmf);
>>>
>>> -             /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
>>> +             /*
>>> +              * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
>>> +              * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
>>> +              * we do not retry here and swap entry will remain in pagetable
>>> +              * resulting in later failure.
>>> +              */
>>>               if (ret & VM_FAULT_RETRY) {
>>>                       mmap_read_lock(mm);
>>> -                     if (hugepage_vma_revalidate(mm, haddr, &vma)) {
>>> -                             /* vma is no longer available, don't continue to swapin */
>>> -                             trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> -                             return false;
>>> -                     }
>>> -                     /* check if the pmd is still valid */
>>> -                     if (mm_find_pmd(mm, haddr) != pmd) {
>>> -                             trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> -                             return false;
>>> -                     }
>>> +                     trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> +                     return false;
>>>               }
>>>               if (ret & VM_FAULT_ERROR) {
>>>                       trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>>> --
>>> 2.23.0
>>>
>>>
>>
>> I've convinced myself this is correct, but don't understand how we got here.
>> AFAICT, we've always continued to fault in pages, and, as you mention, don't
>> retry ones that have failed with VM_FAULT_RETRY - so
>> __collapse_huge_page_isolate() should fail. I don't think (?) there is any
>> benefit to continuing to swap if we don't handle VM_FAULT_RETRY appropriately.
>>
>> So, I think this change looks good from that perspective. I suppose the only
>> other question would be: should we handle the VM_FAULT_RETRY case? Maybe 1
>> additional attempt then fail? AFAIK, this mostly (?) happens when the page is
>> locked.  Maybe it's not worth the extra complexity though..
> 
> It should be unnecessary for khugepaged IMHO since it will scan all
> the valid mm periodically, so it will come back eventually.

I tend to agree with Yang. Khugepaged will come back eventually so it's not
worth the extra complexity.

Thanks both!

> 
>>
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file
  2022-06-15 18:18     ` Yang Shi
@ 2022-06-16  6:10       ` Miaohe Lin
  0 siblings, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-16  6:10 UTC (permalink / raw)
  To: Yang Shi, Zach O'Keefe
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On 2022/6/16 2:18, Yang Shi wrote:
> On Wed, Jun 15, 2022 at 8:55 AM Zach O'Keefe <zokeefe@google.com> wrote:
>>
>> On 11 Jun 16:47, Miaohe Lin wrote:
>>> nr_none is always 0 for non-shmem case because the page can be read from
>>> the backend store. So when nr_none != 0, it must be in is_shmem case.
>>> Also only adjust the nrpages and uncharge shmem when nr_none != 0 to save
>>> cpu cycles.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/khugepaged.c | 9 ++++-----
>>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 1b5dd3820eac..8e6fad7c7bd9 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1885,8 +1885,7 @@ static void collapse_file(struct mm_struct *mm,
>>>
>>>       if (nr_none) {
>>>               __mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
>>> -             if (is_shmem)
>>> -                     __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
>>> +             __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
>>>       }
>>
>>
>> Might be worth a small comment here - even though folks can see in above code
>> that this is only incremented in shmem path, might be nice to say why it's
>> always 0 for non-shmem (or conversely, why it's only possible to be non 0 on
>> shmem).
> 
> Agreed, better to have some comments in the code.

Will try to add comments in next version. Thanks both!

> 
>>
>>>
>>>       /* Join all the small entries into a single multi-index entry */
>>> @@ -1950,10 +1949,10 @@ static void collapse_file(struct mm_struct *mm,
>>>
>>>               /* Something went wrong: roll back page cache changes */
>>>               xas_lock_irq(&xas);
>>> -             mapping->nrpages -= nr_none;
>>> -
>>> -             if (is_shmem)
>>> +             if (nr_none) {
>>> +                     mapping->nrpages -= nr_none;
>>>                       shmem_uncharge(mapping->host, nr_none);
>>> +             }
>>>
>>>               xas_set(&xas, start);
>>>               xas_for_each(&xas, page, end - 1) {
>>> --
>>> 2.23.0
>>>
>>>
>>
>> Otherwise,
>>
>> Reviewed-by: Zach O'Keefe <zokeefe@google.com>
>>
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  2022-06-15 17:49   ` Yang Shi
@ 2022-06-16  6:40     ` Miaohe Lin
  2022-06-16 15:46       ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-16  6:40 UTC (permalink / raw)
  To: Yang Shi
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On 2022/6/16 1:49, Yang Shi wrote:
> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
>> swap entry will remain in pagetable. This will result in later failure.
>> So stop swapping in pages in this case to save cpu cycles.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/khugepaged.c | 19 ++++++++-----------
>>  1 file changed, 8 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 73570dfffcec..a8adb2d1e9c6 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
>>                 swapped_in++;
>>                 ret = do_swap_page(&vmf);
>>
>> -               /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
>> +               /*
>> +                * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
>> +                * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
>> +                * we do not retry here and swap entry will remain in pagetable
>> +                * resulting in later failure.
> 
> Yeah, it makes sense.
> 
>> +                */
>>                 if (ret & VM_FAULT_RETRY) {
>>                         mmap_read_lock(mm);
> 
> A further optimization, you should not need to relock mmap_lock. You
> may consider returning a different value or passing in *locked and
> setting it to false, then check this value in the caller to skip
> unlock.

Could we just keep the mmap_sem unlocked when __collapse_huge_page_swapin() fails, since the caller
always does mmap_read_unlock() when __collapse_huge_page_swapin() returns false, and add some comments
about this behavior? This looks like a simple way to me.

> 
>> -                       if (hugepage_vma_revalidate(mm, haddr, &vma)) {
>> -                               /* vma is no longer available, don't continue to swapin */
>> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>> -                               return false;
>> -                       }
>> -                       /* check if the pmd is still valid */
>> -                       if (mm_find_pmd(mm, haddr) != pmd) {
>> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>> -                               return false;
>> -                       }
>> +                       trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
>> +                       return false;
>>                 }
>>                 if (ret & VM_FAULT_ERROR) {
>>                         trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> 
> And I think "swapped_in++" needs to be moved after error handling.

Do you mean do "swapped_in++" only after pages are swapped in successfully?

Thanks!

> 
>> --
>> 2.23.0
>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW
  2022-06-15 21:28   ` Yang Shi
@ 2022-06-16  7:07     ` Miaohe Lin
  2022-06-16 15:48       ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-16  7:07 UTC (permalink / raw)
  To: Yang Shi
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On 2022/6/16 5:28, Yang Shi wrote:
> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> Use helper macro __ATTR_RW to define the khugepaged attributes. Minor
>> readability improvement.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/khugepaged.c | 37 +++++++++++++++----------------------
>>  1 file changed, 15 insertions(+), 22 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 8e6fad7c7bd9..142e26e4bdbf 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -147,8 +147,7 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
>>         return count;
>>  }
>>  static struct kobj_attribute scan_sleep_millisecs_attr =
>> -       __ATTR(scan_sleep_millisecs, 0644, scan_sleep_millisecs_show,
>> -              scan_sleep_millisecs_store);
>> +       __ATTR_RW(scan_sleep_millisecs);
>>
>>  static ssize_t alloc_sleep_millisecs_show(struct kobject *kobj,
>>                                           struct kobj_attribute *attr,
>> @@ -175,8 +174,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
>>         return count;
>>  }
>>  static struct kobj_attribute alloc_sleep_millisecs_attr =
>> -       __ATTR(alloc_sleep_millisecs, 0644, alloc_sleep_millisecs_show,
>> -              alloc_sleep_millisecs_store);
>> +       __ATTR_RW(alloc_sleep_millisecs);
>>
>>  static ssize_t pages_to_scan_show(struct kobject *kobj,
>>                                   struct kobj_attribute *attr,
>> @@ -200,8 +198,7 @@ static ssize_t pages_to_scan_store(struct kobject *kobj,
>>         return count;
>>  }
>>  static struct kobj_attribute pages_to_scan_attr =
>> -       __ATTR(pages_to_scan, 0644, pages_to_scan_show,
>> -              pages_to_scan_store);
>> +       __ATTR_RW(pages_to_scan);
>>
>>  static ssize_t pages_collapsed_show(struct kobject *kobj,
>>                                     struct kobj_attribute *attr,
>> @@ -221,13 +218,13 @@ static ssize_t full_scans_show(struct kobject *kobj,
>>  static struct kobj_attribute full_scans_attr =
>>         __ATTR_RO(full_scans);
>>
>> -static ssize_t khugepaged_defrag_show(struct kobject *kobj,
>> +static ssize_t defrag_show(struct kobject *kobj,
>>                                       struct kobj_attribute *attr, char *buf)
> 
> Why do you rename all the functions? Seems unnecessary and less intriguing TBH.

It's because, e.g., __ATTR_RW(defrag) expects defrag_show and defrag_store instead
of khugepaged_defrag_show and khugepaged_defrag_store.
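
For reference, __ATTR_RW() in include/linux/sysfs.h token-pastes the attribute name into the
handler names, roughly:

	#define __ATTR_RW(_name) __ATTR(_name, 0644, _name##_show, _name##_store)

so the show/store functions have to be named <attr>_show and <attr>_store for the macro to pick
them up.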

Thanks.

> 
>>  {
>>         return single_hugepage_flag_show(kobj, attr, buf,
>>                                          TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>>  }
>> -static ssize_t khugepaged_defrag_store(struct kobject *kobj,
>> +static ssize_t defrag_store(struct kobject *kobj,
>>                                        struct kobj_attribute *attr,
>>                                        const char *buf, size_t count)
>>  {
>> @@ -235,8 +232,7 @@ static ssize_t khugepaged_defrag_store(struct kobject *kobj,
>>                                  TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
>>  }
>>  static struct kobj_attribute khugepaged_defrag_attr =
>> -       __ATTR(defrag, 0644, khugepaged_defrag_show,
>> -              khugepaged_defrag_store);
>> +       __ATTR_RW(defrag);
>>
>>  /*
>>   * max_ptes_none controls if khugepaged should collapse hugepages over
>> @@ -246,13 +242,13 @@ static struct kobj_attribute khugepaged_defrag_attr =
>>   * runs. Increasing max_ptes_none will instead potentially reduce the
>>   * free memory in the system during the khugepaged scan.
>>   */
>> -static ssize_t khugepaged_max_ptes_none_show(struct kobject *kobj,
>> +static ssize_t max_ptes_none_show(struct kobject *kobj,
>>                                              struct kobj_attribute *attr,
>>                                              char *buf)
>>  {
>>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_none);
>>  }
>> -static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>> +static ssize_t max_ptes_none_store(struct kobject *kobj,
>>                                               struct kobj_attribute *attr,
>>                                               const char *buf, size_t count)
>>  {
>> @@ -268,17 +264,16 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
>>         return count;
>>  }
>>  static struct kobj_attribute khugepaged_max_ptes_none_attr =
>> -       __ATTR(max_ptes_none, 0644, khugepaged_max_ptes_none_show,
>> -              khugepaged_max_ptes_none_store);
>> +       __ATTR_RW(max_ptes_none);
>>
>> -static ssize_t khugepaged_max_ptes_swap_show(struct kobject *kobj,
>> +static ssize_t max_ptes_swap_show(struct kobject *kobj,
>>                                              struct kobj_attribute *attr,
>>                                              char *buf)
>>  {
>>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_swap);
>>  }
>>
>> -static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>> +static ssize_t max_ptes_swap_store(struct kobject *kobj,
>>                                               struct kobj_attribute *attr,
>>                                               const char *buf, size_t count)
>>  {
>> @@ -295,17 +290,16 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
>>  }
>>
>>  static struct kobj_attribute khugepaged_max_ptes_swap_attr =
>> -       __ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show,
>> -              khugepaged_max_ptes_swap_store);
>> +       __ATTR_RW(max_ptes_swap);
>>
>> -static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
>> +static ssize_t max_ptes_shared_show(struct kobject *kobj,
>>                                                struct kobj_attribute *attr,
>>                                                char *buf)
>>  {
>>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_shared);
>>  }
>>
>> -static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>> +static ssize_t max_ptes_shared_store(struct kobject *kobj,
>>                                               struct kobj_attribute *attr,
>>                                               const char *buf, size_t count)
>>  {
>> @@ -322,8 +316,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
>>  }
>>
>>  static struct kobj_attribute khugepaged_max_ptes_shared_attr =
>> -       __ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
>> -              khugepaged_max_ptes_shared_store);
>> +       __ATTR_RW(max_ptes_shared);
>>
>>  static struct attribute *khugepaged_attr[] = {
>>         &khugepaged_defrag_attr.attr,
>> --
>> 2.23.0
>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-15 17:13   ` Zach O'Keefe
  2022-06-16  3:38     ` Mika Penttilä
@ 2022-06-16  7:33     ` Miaohe Lin
  1 sibling, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-16  7:33 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: akpm, aarcange, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, linux-mm, linux-kernel

On 2022/6/16 1:13, Zach O'Keefe wrote:
> On 11 Jun 16:47, Miaohe Lin wrote:
>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>> It's because release_pte_page() is not called for these pages and
>> thus free_page_and_swap_cache can't grab the page lock. These pages
>> won't be freed from swap cache even if we are the only user until
>> next time reclaim. It shouldn't hurt indeed, but we could try to
>> free these pages to save more memory for system.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  include/linux/swap.h | 5 +++++
>>  mm/khugepaged.c      | 1 +
>>  mm/swap.h            | 5 -----
>>  3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 8672a7123ccd..ccb83b12b724 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>>  	return global_node_page_state(NR_SWAPCACHE);
>>  }
>>  
>> +extern void free_swap_cache(struct page *page);
>>  extern void free_page_and_swap_cache(struct page *);
>>  extern void free_pages_and_swap_cache(struct page **, int);
>>  /* linux/mm/swapfile.c */
>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>  
>> +static inline void free_swap_cache(struct page *page)
>> +{
>> +}
>> +
>>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>>  {
>>  	return 0;
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index ee0a719c8be9..52109ad13f78 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>  	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>  		list_del(&src_page->lru);
>>  		release_pte_page(src_page);
>> +		free_swap_cache(src_page);
>>  	}
>>  }
> 
> Aside: in __collapse_huge_page_isolate() (and also here) why can't we just check
> PageCompound(page) && page == compound_head(page) to only act on compound pages
> once? AFAIK this would alleviate this compound_pagelist business..
> 
> Anyways, as-is, free_page_and_swap_cache() won't be able to do
> > try_to_free_swap(), since it can't grab page lock, but it will call put_page().
> I think (?) the last page ref might be dropped in release_pte_page(), so should
> free_swap_cache() come before it?

Thanks for catching this! If the page is not in the swap cache, the last page ref might be dropped
in release_pte_page(). So it's wrong to call free_swap_cache() after it. Thanks!

> 
>>  
>> diff --git a/mm/swap.h b/mm/swap.h
>> index 0193797b0c92..863f6086c916 100644
>> --- a/mm/swap.h
>> +++ b/mm/swap.h
>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>>  void delete_from_swap_cache(struct page *page);
>>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>>  				  unsigned long end);
>> -void free_swap_cache(struct page *page);
>>  struct page *lookup_swap_cache(swp_entry_t entry,
>>  			       struct vm_area_struct *vma,
>>  			       unsigned long addr);
>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>>  	return NULL;
>>  }
>>  
>> -static inline void free_swap_cache(struct page *page)
>> -{
>> -}
>> -
>>  static inline void show_swap_cache_info(void)
>>  {
>>  }
>> -- 
>> 2.23.0
>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-15 23:58   ` Yang Shi
@ 2022-06-16  7:42     ` Miaohe Lin
  2022-06-16 15:53       ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-16  7:42 UTC (permalink / raw)
  To: Yang Shi
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On 2022/6/16 7:58, Yang Shi wrote:
> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>> It's because release_pte_page() is not called for these pages and
>> thus free_page_and_swap_cache can't grab the page lock. These pages
>> won't be freed from swap cache even if we are the only user until
>> next time reclaim. It shouldn't hurt indeed, but we could try to
>> free these pages to save more memory for system.
> 
> 
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  include/linux/swap.h | 5 +++++
>>  mm/khugepaged.c      | 1 +
>>  mm/swap.h            | 5 -----
>>  3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 8672a7123ccd..ccb83b12b724 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>>         return global_node_page_state(NR_SWAPCACHE);
>>  }
>>
>> +extern void free_swap_cache(struct page *page);
>>  extern void free_page_and_swap_cache(struct page *);
>>  extern void free_pages_and_swap_cache(struct page **, int);
>>  /* linux/mm/swapfile.c */
>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>
>> +static inline void free_swap_cache(struct page *page)
>> +{
>> +}
>> +
>>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>>  {
>>         return 0;
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index ee0a719c8be9..52109ad13f78 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>                 list_del(&src_page->lru);
>>                 release_pte_page(src_page);
>> +               free_swap_cache(src_page);
> 
> Will this really work? The free_swap_cache() will just dec refcounts
> without putting the page back to buddy. So the hugepage is not
> actually freed at all. Am I missing something?

Thanks for catching this! If the page is on the percpu lru_pvecs cache, it will be released when
the lru_pvecs are drained. But if not, free_swap_cache() won't free the page, as it assumes the
caller holds a reference on the page and thus only does page_ref_sub(). Does the change below
make sense to you?

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 52109ad13f78..b8c96e33591d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,

        list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
                list_del(&src_page->lru);
-               release_pte_page(src_page);
+               mod_node_page_state(page_pgdat(src_page),
+                                   NR_ISOLATED_ANON + page_is_file_lru(src_page),
+                                   -compound_nr(src_page));
+               unlock_page(src_page);
                free_swap_cache(src_page);
+               putback_lru_page(src_page);
        }
 }

Thanks!

> 
>>         }
>>  }
>>
>> diff --git a/mm/swap.h b/mm/swap.h
>> index 0193797b0c92..863f6086c916 100644
>> --- a/mm/swap.h
>> +++ b/mm/swap.h
>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>>  void delete_from_swap_cache(struct page *page);
>>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>>                                   unsigned long end);
>> -void free_swap_cache(struct page *page);
>>  struct page *lookup_swap_cache(swp_entry_t entry,
>>                                struct vm_area_struct *vma,
>>                                unsigned long addr);
>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>>         return NULL;
>>  }
>>
>> -static inline void free_swap_cache(struct page *page)
>> -{
>> -}
>> -
>>  static inline void show_swap_cache_info(void)
>>  {
>>  }
>> --
>> 2.23.0
>>
>>
> .
> 


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
  2022-06-16  6:40     ` Miaohe Lin
@ 2022-06-16 15:46       ` Yang Shi
  0 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-16 15:46 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Wed, Jun 15, 2022 at 11:40 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/6/16 1:49, Yang Shi wrote:
> > On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>
> >> When do_swap_page returns VM_FAULT_RETRY, we do not retry here and thus
> >> swap entry will remain in pagetable. This will result in later failure.
> >> So stop swapping in pages in this case to save cpu cycles.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/khugepaged.c | 19 ++++++++-----------
> >>  1 file changed, 8 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index 73570dfffcec..a8adb2d1e9c6 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -1003,19 +1003,16 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> >>                 swapped_in++;
> >>                 ret = do_swap_page(&vmf);
> >>
> >> -               /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
> >> +               /*
> >> +                * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
> >> +                * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
> >> +                * we do not retry here and swap entry will remain in pagetable
> >> +                * resulting in later failure.
> >
> > Yeah, it makes sense.
> >
> >> +                */
> >>                 if (ret & VM_FAULT_RETRY) {
> >>                         mmap_read_lock(mm);
> >
> > A further optimization, you should not need to relock mmap_lock. You
> > may consider returning a different value or passing in *locked and
> > setting it to false, then check this value in the caller to skip
> > unlock.
>
> Could we just keep the mmap_sem unlocked when __collapse_huge_page_swapin() fails due to the caller
> always doing mmap_read_unlock when __collapse_huge_page_swapin() returns false and add some comments
> about this behavior? This looks like a simple way for me.

Yeah, that sounds better.
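
A minimal sketch of that convention (hypothetical shape, not the actual patch):
return false with mmap_lock already released on VM_FAULT_RETRY and let the
caller skip the unlock on that path:

        /* in __collapse_huge_page_swapin(): */
        if (ret & VM_FAULT_RETRY) {
                /* do_swap_page() dropped mmap_lock; don't re-take it */
                trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
                return false;
        }

        /* in the caller, collapse_huge_page() (hypothetical shape): */
        if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
                                                     pmd, referenced)) {
                /* false means mmap_lock is already released, so no unlock here */
                goto out_nolock;
        }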

>
> >
> >> -                       if (hugepage_vma_revalidate(mm, haddr, &vma)) {
> >> -                               /* vma is no longer available, don't continue to swapin */
> >> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >> -                               return false;
> >> -                       }
> >> -                       /* check if the pmd is still valid */
> >> -                       if (mm_find_pmd(mm, haddr) != pmd) {
> >> -                               trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >> -                               return false;
> >> -                       }
> >> +                       trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >> +                       return false;
> >>                 }
> >>                 if (ret & VM_FAULT_ERROR) {
> >>                         trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> >
> > And I think "swapped_in++" needs to be moved after error handling.
>
> Do you mean do "swapped_in++" only after pages are swapped in successfully?

Yes.
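
Something like this, I guess (rough sketch of the suggestion, not the actual
patch):

                ret = do_swap_page(&vmf);

                if (ret & VM_FAULT_RETRY) {
                        trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
                        return false;
                }
                if (ret & VM_FAULT_ERROR) {
                        trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
                        return false;
                }
                /* count only pages that were actually swapped in */
                swapped_in++;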

>
> Thanks!
>
> >
> >> --
> >> 2.23.0
> >>
> >>
> > .
> >
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW
  2022-06-16  7:07     ` Miaohe Lin
@ 2022-06-16 15:48       ` Yang Shi
  0 siblings, 0 replies; 40+ messages in thread
From: Yang Shi @ 2022-06-16 15:48 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Thu, Jun 16, 2022 at 12:07 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/6/16 5:28, Yang Shi wrote:
> > On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>
> >> Use helper macro __ATTR_RW to define the khugepaged attributes. Minor
> >> readability improvement.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/khugepaged.c | 37 +++++++++++++++----------------------
> >>  1 file changed, 15 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index 8e6fad7c7bd9..142e26e4bdbf 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -147,8 +147,7 @@ static ssize_t scan_sleep_millisecs_store(struct kobject *kobj,
> >>         return count;
> >>  }
> >>  static struct kobj_attribute scan_sleep_millisecs_attr =
> >> -       __ATTR(scan_sleep_millisecs, 0644, scan_sleep_millisecs_show,
> >> -              scan_sleep_millisecs_store);
> >> +       __ATTR_RW(scan_sleep_millisecs);
> >>
> >>  static ssize_t alloc_sleep_millisecs_show(struct kobject *kobj,
> >>                                           struct kobj_attribute *attr,
> >> @@ -175,8 +174,7 @@ static ssize_t alloc_sleep_millisecs_store(struct kobject *kobj,
> >>         return count;
> >>  }
> >>  static struct kobj_attribute alloc_sleep_millisecs_attr =
> >> -       __ATTR(alloc_sleep_millisecs, 0644, alloc_sleep_millisecs_show,
> >> -              alloc_sleep_millisecs_store);
> >> +       __ATTR_RW(alloc_sleep_millisecs);
> >>
> >>  static ssize_t pages_to_scan_show(struct kobject *kobj,
> >>                                   struct kobj_attribute *attr,
> >> @@ -200,8 +198,7 @@ static ssize_t pages_to_scan_store(struct kobject *kobj,
> >>         return count;
> >>  }
> >>  static struct kobj_attribute pages_to_scan_attr =
> >> -       __ATTR(pages_to_scan, 0644, pages_to_scan_show,
> >> -              pages_to_scan_store);
> >> +       __ATTR_RW(pages_to_scan);
> >>
> >>  static ssize_t pages_collapsed_show(struct kobject *kobj,
> >>                                     struct kobj_attribute *attr,
> >> @@ -221,13 +218,13 @@ static ssize_t full_scans_show(struct kobject *kobj,
> >>  static struct kobj_attribute full_scans_attr =
> >>         __ATTR_RO(full_scans);
> >>
> >> -static ssize_t khugepaged_defrag_show(struct kobject *kobj,
> >> +static ssize_t defrag_show(struct kobject *kobj,
> >>                                       struct kobj_attribute *attr, char *buf)
> >
> > Why do you rename all the functions? Seems unnecessary and less intriguing TBH.
>
> It's because e.g. __ATTR_RW(defrag) expects the defrag_show and defrag_store instead
> of khugepaged_defrag_show and khugepaged_defrag_store.

Aha, I see. I missed this.
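
For the record, the helper ties the handler names to the attribute name; as
far as I recall include/linux/sysfs.h has:

#define __ATTR_RW(_name) __ATTR(_name, 0644, _name##_show, _name##_store)

so __ATTR_RW(defrag) expects defrag_show()/defrag_store(), hence the renames.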

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Thanks.
>
> >
> >>  {
> >>         return single_hugepage_flag_show(kobj, attr, buf,
> >>                                          TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
> >>  }
> >> -static ssize_t khugepaged_defrag_store(struct kobject *kobj,
> >> +static ssize_t defrag_store(struct kobject *kobj,
> >>                                        struct kobj_attribute *attr,
> >>                                        const char *buf, size_t count)
> >>  {
> >> @@ -235,8 +232,7 @@ static ssize_t khugepaged_defrag_store(struct kobject *kobj,
> >>                                  TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
> >>  }
> >>  static struct kobj_attribute khugepaged_defrag_attr =
> >> -       __ATTR(defrag, 0644, khugepaged_defrag_show,
> >> -              khugepaged_defrag_store);
> >> +       __ATTR_RW(defrag);
> >>
> >>  /*
> >>   * max_ptes_none controls if khugepaged should collapse hugepages over
> >> @@ -246,13 +242,13 @@ static struct kobj_attribute khugepaged_defrag_attr =
> >>   * runs. Increasing max_ptes_none will instead potentially reduce the
> >>   * free memory in the system during the khugepaged scan.
> >>   */
> >> -static ssize_t khugepaged_max_ptes_none_show(struct kobject *kobj,
> >> +static ssize_t max_ptes_none_show(struct kobject *kobj,
> >>                                              struct kobj_attribute *attr,
> >>                                              char *buf)
> >>  {
> >>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_none);
> >>  }
> >> -static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
> >> +static ssize_t max_ptes_none_store(struct kobject *kobj,
> >>                                               struct kobj_attribute *attr,
> >>                                               const char *buf, size_t count)
> >>  {
> >> @@ -268,17 +264,16 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
> >>         return count;
> >>  }
> >>  static struct kobj_attribute khugepaged_max_ptes_none_attr =
> >> -       __ATTR(max_ptes_none, 0644, khugepaged_max_ptes_none_show,
> >> -              khugepaged_max_ptes_none_store);
> >> +       __ATTR_RW(max_ptes_none);
> >>
> >> -static ssize_t khugepaged_max_ptes_swap_show(struct kobject *kobj,
> >> +static ssize_t max_ptes_swap_show(struct kobject *kobj,
> >>                                              struct kobj_attribute *attr,
> >>                                              char *buf)
> >>  {
> >>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_swap);
> >>  }
> >>
> >> -static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
> >> +static ssize_t max_ptes_swap_store(struct kobject *kobj,
> >>                                               struct kobj_attribute *attr,
> >>                                               const char *buf, size_t count)
> >>  {
> >> @@ -295,17 +290,16 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
> >>  }
> >>
> >>  static struct kobj_attribute khugepaged_max_ptes_swap_attr =
> >> -       __ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show,
> >> -              khugepaged_max_ptes_swap_store);
> >> +       __ATTR_RW(max_ptes_swap);
> >>
> >> -static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
> >> +static ssize_t max_ptes_shared_show(struct kobject *kobj,
> >>                                                struct kobj_attribute *attr,
> >>                                                char *buf)
> >>  {
> >>         return sysfs_emit(buf, "%u\n", khugepaged_max_ptes_shared);
> >>  }
> >>
> >> -static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
> >> +static ssize_t max_ptes_shared_store(struct kobject *kobj,
> >>                                               struct kobj_attribute *attr,
> >>                                               const char *buf, size_t count)
> >>  {
> >> @@ -322,8 +316,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
> >>  }
> >>
> >>  static struct kobj_attribute khugepaged_max_ptes_shared_attr =
> >> -       __ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
> >> -              khugepaged_max_ptes_shared_store);
> >> +       __ATTR_RW(max_ptes_shared);
> >>
> >>  static struct attribute *khugepaged_attr[] = {
> >>         &khugepaged_defrag_attr.attr,
> >> --
> >> 2.23.0
> >>
> >>
> > .
> >
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-16  7:42     ` Miaohe Lin
@ 2022-06-16 15:53       ` Yang Shi
  2022-06-17  2:26         ` Miaohe Lin
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-16 15:53 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/6/16 7:58, Yang Shi wrote:
> > On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>
> >> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> >> It's because release_pte_page() is not called for these pages and
> >> thus free_page_and_swap_cache can't grab the page lock. These pages
> >> won't be freed from swap cache even if we are the only user until
> >> next time reclaim. It shouldn't hurt indeed, but we could try to
> >> free these pages to save more memory for system.
> >
> >
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  include/linux/swap.h | 5 +++++
> >>  mm/khugepaged.c      | 1 +
> >>  mm/swap.h            | 5 -----
> >>  3 files changed, 6 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> index 8672a7123ccd..ccb83b12b724 100644
> >> --- a/include/linux/swap.h
> >> +++ b/include/linux/swap.h
> >> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
> >>         return global_node_page_state(NR_SWAPCACHE);
> >>  }
> >>
> >> +extern void free_swap_cache(struct page *page);
> >>  extern void free_page_and_swap_cache(struct page *);
> >>  extern void free_pages_and_swap_cache(struct page **, int);
> >>  /* linux/mm/swapfile.c */
> >> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
> >>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
> >>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
> >>
> >> +static inline void free_swap_cache(struct page *page)
> >> +{
> >> +}
> >> +
> >>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
> >>  {
> >>         return 0;
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index ee0a719c8be9..52109ad13f78 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> >>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> >>                 list_del(&src_page->lru);
> >>                 release_pte_page(src_page);
> >> +               free_swap_cache(src_page);
> >
> > Will this really work? The free_swap_cache() will just dec refcounts
> > without putting the page back to buddy. So the hugepage is not
> > actually freed at all. Am I missing something?
>
> Thanks for catching this! If page is on percpu lru_pvecs cache, page will
> be released when lru_pvecs are drained. But if not, free_swap_cache() won't
> free the page as it assumes the caller has a reference on the page and thus
> only does page_ref_sub(). Does the below change looks sense for you?

THPs get drained immediately, so they won't stay in pagevecs.
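
Roughly, because the pagevec code flushes immediately when a compound page is
added; from mm/swap.c of this era, as I recall:

static bool pagevec_add_and_need_flush(struct pagevec *pvec, struct page *page)
{
        bool ret = false;

        /* compound pages trigger an immediate drain instead of sitting in the pagevec */
        if (!pagevec_add(pvec, page) || PageCompound(page) ||
                        lru_cache_disabled())
                ret = true;

        return ret;
}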

>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 52109ad13f78..b8c96e33591d 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>
>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>                 list_del(&src_page->lru);
> -               release_pte_page(src_page);
> +               mod_node_page_state(page_pgdat(src_page),
> +                                   NR_ISOLATED_ANON + page_is_file_lru(src_page),
> +                                   -compound_nr(src_page));
> +               unlock_page(src_page);
>                 free_swap_cache(src_page);
> +               putback_lru_page(src_page);

I'm not sure it is worth it for a rare corner case, since a THP should not
stay in the swapcache unless try_to_unmap() in vmscan fails, IIUC. And it is
not guaranteed that free_swap_cache() will get the page lock.

>         }
>  }
>
> Thanks!
>
> >
> >>         }
> >>  }
> >>
> >> diff --git a/mm/swap.h b/mm/swap.h
> >> index 0193797b0c92..863f6086c916 100644
> >> --- a/mm/swap.h
> >> +++ b/mm/swap.h
> >> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
> >>  void delete_from_swap_cache(struct page *page);
> >>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
> >>                                   unsigned long end);
> >> -void free_swap_cache(struct page *page);
> >>  struct page *lookup_swap_cache(swp_entry_t entry,
> >>                                struct vm_area_struct *vma,
> >>                                unsigned long addr);
> >> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
> >>         return NULL;
> >>  }
> >>
> >> -static inline void free_swap_cache(struct page *page)
> >> -{
> >> -}
> >> -
> >>  static inline void show_swap_cache_info(void)
> >>  {
> >>  }
> >> --
> >> 2.23.0
> >>
> >>
> > .
> >
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-16 15:53       ` Yang Shi
@ 2022-06-17  2:26         ` Miaohe Lin
  2022-06-17 16:35           ` Yang Shi
  0 siblings, 1 reply; 40+ messages in thread
From: Miaohe Lin @ 2022-06-17  2:26 UTC (permalink / raw)
  To: Yang Shi
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On 2022/6/16 23:53, Yang Shi wrote:
> On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/6/16 7:58, Yang Shi wrote:
>>> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>>>> It's because release_pte_page() is not called for these pages and
>>>> thus free_page_and_swap_cache can't grab the page lock. These pages
>>>> won't be freed from swap cache even if we are the only user until
>>>> next time reclaim. It shouldn't hurt indeed, but we could try to
>>>> free these pages to save more memory for system.
>>>
>>>
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  include/linux/swap.h | 5 +++++
>>>>  mm/khugepaged.c      | 1 +
>>>>  mm/swap.h            | 5 -----
>>>>  3 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>> index 8672a7123ccd..ccb83b12b724 100644
>>>> --- a/include/linux/swap.h
>>>> +++ b/include/linux/swap.h
>>>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>>>>         return global_node_page_state(NR_SWAPCACHE);
>>>>  }
>>>>
>>>> +extern void free_swap_cache(struct page *page);
>>>>  extern void free_page_and_swap_cache(struct page *);
>>>>  extern void free_pages_and_swap_cache(struct page **, int);
>>>>  /* linux/mm/swapfile.c */
>>>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>>>>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>>>>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>>>
>>>> +static inline void free_swap_cache(struct page *page)
>>>> +{
>>>> +}
>>>> +
>>>>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>>>>  {
>>>>         return 0;
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index ee0a719c8be9..52109ad13f78 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>>>                 list_del(&src_page->lru);
>>>>                 release_pte_page(src_page);
>>>> +               free_swap_cache(src_page);
>>>
>>> Will this really work? The free_swap_cache() will just dec refcounts
>>> without putting the page back to buddy. So the hugepage is not
>>> actually freed at all. Am I missing something?
>>
>> Thanks for catching this! If page is on percpu lru_pvecs cache, page will
>> be released when lru_pvecs are drained. But if not, free_swap_cache() won't
>> free the page as it assumes the caller has a reference on the page and thus
>> only does page_ref_sub(). Does the below change looks sense for you?
> 
> THP gets drained immediately so they won't stay in pagevecs.

Yes, you're right. I missed this.

> 
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 52109ad13f78..b8c96e33591d 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>
>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>                 list_del(&src_page->lru);
>> -               release_pte_page(src_page);
>> +               mod_node_page_state(page_pgdat(src_page),
>> +                                   NR_ISOLATED_ANON + page_is_file_lru(src_page),
>> +                                   -compound_nr(src_page));
>> +               unlock_page(src_page);
>>                 free_swap_cache(src_page);
>> +               putback_lru_page(src_page);
> 
> I'm not sure if it is worth it or not for a rare corner case since THP
> should not stay in swapcache unless try_to_unmap() in vmscan fails

IIUC, even if try_to_unmap() in vmscan succeeds, the THP might still be in the
swapcache if shrink_page_list is not called again for this THP after writeback
is done, e.g. when shrink_page_list is called from madvise so there might be no
memory pressure, or when do_swap_page puts the THP into the page table again.
Also, the THP might not be split when deferred_split_shrinker is not called,
e.g. because memory is not short. Even if there is memory pressure, the THP
will stay in the swapcache until the next round of page reclaim for this THP
is done. So there should be a non-negligible window during which the THP stays
in the swapcache.
Or am I missing something?

> IIUC. And it is not guaranteed that free_swap_cache() will get the
> page lock.

IMHO, we're not guaranteed that free_swap_cache() will get the page lock for a
normal page anyway.

Thanks!

> 
>>         }
>>  }
>>
>> Thanks!
>>
>>>
>>>>         }
>>>>  }
>>>>
>>>> diff --git a/mm/swap.h b/mm/swap.h
>>>> index 0193797b0c92..863f6086c916 100644
>>>> --- a/mm/swap.h
>>>> +++ b/mm/swap.h
>>>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>>>>  void delete_from_swap_cache(struct page *page);
>>>>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>>>>                                   unsigned long end);
>>>> -void free_swap_cache(struct page *page);
>>>>  struct page *lookup_swap_cache(swp_entry_t entry,
>>>>                                struct vm_area_struct *vma,
>>>>                                unsigned long addr);
>>>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>>>>         return NULL;
>>>>  }
>>>>
>>>> -static inline void free_swap_cache(struct page *page)
>>>> -{
>>>> -}
>>>> -
>>>>  static inline void show_swap_cache_info(void)
>>>>  {
>>>>  }
>>>> --
>>>> 2.23.0
>>>>
>>>>
>>> .
>>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-17  2:26         ` Miaohe Lin
@ 2022-06-17 16:35           ` Yang Shi
  2022-06-18  3:13             ` Miaohe Lin
  0 siblings, 1 reply; 40+ messages in thread
From: Yang Shi @ 2022-06-17 16:35 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On Thu, Jun 16, 2022 at 7:27 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/6/16 23:53, Yang Shi wrote:
> > On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>
> >> On 2022/6/16 7:58, Yang Shi wrote:
> >>> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >>>>
> >>>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> >>>> It's because release_pte_page() is not called for these pages and
> >>>> thus free_page_and_swap_cache can't grab the page lock. These pages
> >>>> won't be freed from swap cache even if we are the only user until
> >>>> next time reclaim. It shouldn't hurt indeed, but we could try to
> >>>> free these pages to save more memory for system.
> >>>
> >>>
> >>>>
> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >>>> ---
> >>>>  include/linux/swap.h | 5 +++++
> >>>>  mm/khugepaged.c      | 1 +
> >>>>  mm/swap.h            | 5 -----
> >>>>  3 files changed, 6 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >>>> index 8672a7123ccd..ccb83b12b724 100644
> >>>> --- a/include/linux/swap.h
> >>>> +++ b/include/linux/swap.h
> >>>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
> >>>>         return global_node_page_state(NR_SWAPCACHE);
> >>>>  }
> >>>>
> >>>> +extern void free_swap_cache(struct page *page);
> >>>>  extern void free_page_and_swap_cache(struct page *);
> >>>>  extern void free_pages_and_swap_cache(struct page **, int);
> >>>>  /* linux/mm/swapfile.c */
> >>>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
> >>>>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
> >>>>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
> >>>>
> >>>> +static inline void free_swap_cache(struct page *page)
> >>>> +{
> >>>> +}
> >>>> +
> >>>>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
> >>>>  {
> >>>>         return 0;
> >>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>>> index ee0a719c8be9..52109ad13f78 100644
> >>>> --- a/mm/khugepaged.c
> >>>> +++ b/mm/khugepaged.c
> >>>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> >>>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> >>>>                 list_del(&src_page->lru);
> >>>>                 release_pte_page(src_page);
> >>>> +               free_swap_cache(src_page);
> >>>
> >>> Will this really work? The free_swap_cache() will just dec refcounts
> >>> without putting the page back to buddy. So the hugepage is not
> >>> actually freed at all. Am I missing something?
> >>
> >> Thanks for catching this! If page is on percpu lru_pvecs cache, page will
> >> be released when lru_pvecs are drained. But if not, free_swap_cache() won't
> >> free the page as it assumes the caller has a reference on the page and thus
> >> only does page_ref_sub(). Does the below change looks sense for you?
> >
> > THP gets drained immediately so they won't stay in pagevecs.
>
> Yes, you're right. I missed this.
>
> >
> >>
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index 52109ad13f78..b8c96e33591d 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> >>
> >>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> >>                 list_del(&src_page->lru);
> >> -               release_pte_page(src_page);
> >> +               mod_node_page_state(page_pgdat(src_page),
> >> +                                   NR_ISOLATED_ANON + page_is_file_lru(src_page),
> >> +                                   -compound_nr(src_page));
> >> +               unlock_page(src_page);
> >>                 free_swap_cache(src_page);
> >> +               putback_lru_page(src_page);
> >
> > I'm not sure if it is worth it or not for a rare corner case since THP
> > should not stay in swapcache unless try_to_unmap() in vmscan fails
>
> IIUC, even if try_to_unmap() in vmscan succeeds, THP might be still in the
> swapcache if shrink_page_list is not called for this THP again after writeback
> is done, e.g. when shrink_page_list is called from madvise, so there might be

I don't get it; doesn't __remove_mapping() delete the page from the swap cache?

> no memory pressure, or do_swap_page puts the THP into page table again. Also THP

do_swap_page() just swaps in base pages, never a whole THP.

> might not be splited when deferred_split_shrinker is not called, e.g. due to

I don't see how deferred split is related to this.

> not lacking of memory. Even if there is memory pressure, the THP will stay in
> swapcache until next round page reclaim for this THP is done. So there should
> be a non-negligible window that THP will stay in the swapcache.
> Or am I miss something?

I guess you may have misunderstood what I meant. This patch is trying to
optimize freeing a THP in the swapcache. But it should be very rare that
khugepaged sees a THP from the swap cache. The only case I can think of is
try_to_unmap() in vmscan failing. That might leave the THP in the swap cache
so that khugepaged could see it.


>
> > IIUC. And it is not guaranteed that free_swap_cache() will get the
> > page lock.
>
> IMHO, we're not guaranteed that free_swap_cache() will get the page lock for the normal
> page anyway.
>
> Thanks!
>
> >
> >>         }
> >>  }
> >>
> >> Thanks!
> >>
> >>>
> >>>>         }
> >>>>  }
> >>>>
> >>>> diff --git a/mm/swap.h b/mm/swap.h
> >>>> index 0193797b0c92..863f6086c916 100644
> >>>> --- a/mm/swap.h
> >>>> +++ b/mm/swap.h
> >>>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
> >>>>  void delete_from_swap_cache(struct page *page);
> >>>>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
> >>>>                                   unsigned long end);
> >>>> -void free_swap_cache(struct page *page);
> >>>>  struct page *lookup_swap_cache(swp_entry_t entry,
> >>>>                                struct vm_area_struct *vma,
> >>>>                                unsigned long addr);
> >>>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
> >>>>         return NULL;
> >>>>  }
> >>>>
> >>>> -static inline void free_swap_cache(struct page *page)
> >>>> -{
> >>>> -}
> >>>> -
> >>>>  static inline void show_swap_cache_info(void)
> >>>>  {
> >>>>  }
> >>>> --
> >>>> 2.23.0
> >>>>
> >>>>
> >>> .
> >>>
> >>
> > .
> >
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
  2022-06-17 16:35           ` Yang Shi
@ 2022-06-18  3:13             ` Miaohe Lin
  0 siblings, 0 replies; 40+ messages in thread
From: Miaohe Lin @ 2022-06-18  3:13 UTC (permalink / raw)
  To: Yang Shi
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
	David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
	Suren Baghdasaryan, Peter Xu, Linux MM,
	Linux Kernel Mailing List

On 2022/6/18 0:35, Yang Shi wrote:
> On Thu, Jun 16, 2022 at 7:27 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/6/16 23:53, Yang Shi wrote:
>>> On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> On 2022/6/16 7:58, Yang Shi wrote:
>>>>> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>
>>>>>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>>>>>> It's because release_pte_page() is not called for these pages and
>>>>>> thus free_page_and_swap_cache can't grab the page lock. These pages
>>>>>> won't be freed from swap cache even if we are the only user until
>>>>>> next time reclaim. It shouldn't hurt indeed, but we could try to
>>>>>> free these pages to save more memory for system.
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>>> ---
>>>>>>  include/linux/swap.h | 5 +++++
>>>>>>  mm/khugepaged.c      | 1 +
>>>>>>  mm/swap.h            | 5 -----
>>>>>>  3 files changed, 6 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>>>> index 8672a7123ccd..ccb83b12b724 100644
>>>>>> --- a/include/linux/swap.h
>>>>>> +++ b/include/linux/swap.h
>>>>>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>>>>>>         return global_node_page_state(NR_SWAPCACHE);
>>>>>>  }
>>>>>>
>>>>>> +extern void free_swap_cache(struct page *page);
>>>>>>  extern void free_page_and_swap_cache(struct page *);
>>>>>>  extern void free_pages_and_swap_cache(struct page **, int);
>>>>>>  /* linux/mm/swapfile.c */
>>>>>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>>>>>>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>>>>>>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>>>>>
>>>>>> +static inline void free_swap_cache(struct page *page)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>>>>>>  {
>>>>>>         return 0;
>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>>> index ee0a719c8be9..52109ad13f78 100644
>>>>>> --- a/mm/khugepaged.c
>>>>>> +++ b/mm/khugepaged.c
>>>>>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>>>>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>>>>>                 list_del(&src_page->lru);
>>>>>>                 release_pte_page(src_page);
>>>>>> +               free_swap_cache(src_page);
>>>>>
>>>>> Will this really work? The free_swap_cache() will just dec refcounts
>>>>> without putting the page back to buddy. So the hugepage is not
>>>>> actually freed at all. Am I missing something?
>>>>
>>>> Thanks for catching this! If page is on percpu lru_pvecs cache, page will
>>>> be released when lru_pvecs are drained. But if not, free_swap_cache() won't
>>>> free the page as it assumes the caller has a reference on the page and thus
>>>> only does page_ref_sub(). Does the below change looks sense for you?
>>>
>>> THP gets drained immediately so they won't stay in pagevecs.
>>
>> Yes, you're right. I missed this.
>>
>>>
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index 52109ad13f78..b8c96e33591d 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>>>
>>>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>>>                 list_del(&src_page->lru);
>>>> -               release_pte_page(src_page);
>>>> +               mod_node_page_state(page_pgdat(src_page),
>>>> +                                   NR_ISOLATED_ANON + page_is_file_lru(src_page),
>>>> +                                   -compound_nr(src_page));
>>>> +               unlock_page(src_page);
>>>>                 free_swap_cache(src_page);
>>>> +               putback_lru_page(src_page);
>>>
>>> I'm not sure if it is worth it or not for a rare corner case since THP
>>> should not stay in swapcache unless try_to_unmap() in vmscan fails
>>
>> IIUC, even if try_to_unmap() in vmscan succeeds, THP might be still in the
>> swapcache if shrink_page_list is not called for this THP again after writeback
>> is done, e.g. when shrink_page_list is called from madvise, so there might be
> 
> I don't get, doesn't __remove_mapping() delete the page from swap cache?

Sorry for the confusion. :(
IIUC, __remove_mapping() is only called when the page is clean and its writeback is done in
shrink_page_list(). So in the first round of shrink_page_list(), the THP is under writeback
and __remove_mapping() won't be called. The THP will be removed from the swapcache via
__remove_mapping() in the next round of shrink_page_list(), if it is clean and not under
writeback. So the THP should stay in the swapcache until that next round of shrink_page_list().

And if shrink_page_list is called from madvise, the next round of shrink_page_list() for this
THP won't arrive if there is no memory pressure, because madvise can't shrink pages that are
already in the swapcache (the !pte_present case is ignored in madvise_cold_or_pageout_pte_range()).
So the THP might stay in the swapcache for a long time.
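
Roughly, as I read madvise_cold_or_pageout_pte_range(), non-present PTEs
(i.e. swap entries) are skipped early in the loop:

        for (; addr < end; pte++, addr += PAGE_SIZE) {
                ptent = *pte;

                if (pte_none(ptent))
                        continue;

                /* swap entries are skipped, so swapcache-only pages are never paged out here */
                if (!pte_present(ptent))
                        continue;
                ...
        }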

Does this make sense to you?

> 
>> no memory pressure, or do_swap_page puts the THP into page table again. Also THP
> 
> do_swap_page() just swaps in base page, never THP.

If the THP is not removed from the swapcache, do_swap_page() can find it via lookup_swap_cache().
So we "swap in" the THP.

> 
>> might not be splited when deferred_split_shrinker is not called, e.g. due to
> 
> I don't see how deferred split is related to this.

What I mean is that if the THP is split, khugepaged won't find that THP. So deferred split
should be taken into account, shouldn't it?

> 
>> not lacking of memory. Even if there is memory pressure, the THP will stay in
>> swapcache until next round page reclaim for this THP is done. So there should
>> be a non-negligible window that THP will stay in the swapcache.
>> Or am I miss something?
> 
> I guess you may misunderstand what I meant. This patch is trying to
> optimize freeing THP in swapcache. But it should be very rare that
> khugepaged sees THP from swap cache. The only case I could think of is
> try_to_unmap() in vmscan fails. That might leave THP in swap cache so
> that khugepaged could see it.

I was trying to show you how a THP can stay in the swapcache. If it's not
removed from the swapcache via __remove_mapping, and not split or swapped in
before it's removed, it will stay in the swapcache. Or am I missing something?

I hope I made my point clear this time. ;)

> 
> 
>>
>>> IIUC. And it is not guaranteed that free_swap_cache() will get the
>>> page lock.
>>
>> IMHO, we're not guaranteed that free_swap_cache() will get the page lock for the normal
>> page anyway.
>>
>> Thanks!
>>
>>>
>>>>         }
>>>>  }
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>>>         }
>>>>>>  }
>>>>>>
>>>>>> diff --git a/mm/swap.h b/mm/swap.h
>>>>>> index 0193797b0c92..863f6086c916 100644
>>>>>> --- a/mm/swap.h
>>>>>> +++ b/mm/swap.h
>>>>>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>>>>>>  void delete_from_swap_cache(struct page *page);
>>>>>>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>>>>>>                                   unsigned long end);
>>>>>> -void free_swap_cache(struct page *page);
>>>>>>  struct page *lookup_swap_cache(swp_entry_t entry,
>>>>>>                                struct vm_area_struct *vma,
>>>>>>                                unsigned long addr);
>>>>>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>>>>>>         return NULL;
>>>>>>  }
>>>>>>
>>>>>> -static inline void free_swap_cache(struct page *page)
>>>>>> -{
>>>>>> -}
>>>>>> -
>>>>>>  static inline void show_swap_cache_info(void)
>>>>>>  {
>>>>>>  }
>>>>>> --
>>>>>> 2.23.0
>>>>>>
>>>>>>
>>>>> .
>>>>>
>>>>
>>> .
>>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2022-06-18  3:13 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-11  8:47 [PATCH 0/7] A few cleanup patches for khugepaged Miaohe Lin
2022-06-11  8:47 ` [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check Miaohe Lin
2022-06-11 20:33   ` Andrew Morton
2022-06-13  1:48     ` Miaohe Lin
2022-06-13 18:02       ` Andrew Morton
2022-06-15  0:13   ` Zach O'Keefe
2022-06-15 17:35   ` Yang Shi
2022-06-11  8:47 ` [PATCH 2/7] mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs Miaohe Lin
2022-06-15 15:14   ` Zach O'Keefe
2022-06-15 17:51     ` Yang Shi
2022-06-16  6:08       ` Miaohe Lin
2022-06-15 17:49   ` Yang Shi
2022-06-16  6:40     ` Miaohe Lin
2022-06-16 15:46       ` Yang Shi
2022-06-11  8:47 ` [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup Miaohe Lin
2022-06-15  0:23   ` Zach O'Keefe
2022-06-15 17:53   ` Yang Shi
2022-06-11  8:47 ` [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file Miaohe Lin
2022-06-15 15:54   ` Zach O'Keefe
2022-06-15 18:18     ` Yang Shi
2022-06-16  6:10       ` Miaohe Lin
2022-06-11  8:47 ` [PATCH 5/7] mm/khugepaged: use helper macro __ATTR_RW Miaohe Lin
2022-06-15  0:29   ` Zach O'Keefe
2022-06-15  7:48     ` Miaohe Lin
2022-06-15 21:28   ` Yang Shi
2022-06-16  7:07     ` Miaohe Lin
2022-06-16 15:48       ` Yang Shi
2022-06-11  8:47 ` [PATCH 6/7] mm/khugepaged: remove unneeded return value of khugepaged_add_pte_mapped_thp() Miaohe Lin
2022-06-15  0:19   ` Zach O'Keefe
2022-06-15 21:35   ` Yang Shi
2022-06-11  8:47 ` [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible Miaohe Lin
2022-06-15 17:13   ` Zach O'Keefe
2022-06-16  3:38     ` Mika Penttilä
2022-06-16  7:33     ` Miaohe Lin
2022-06-15 23:58   ` Yang Shi
2022-06-16  7:42     ` Miaohe Lin
2022-06-16 15:53       ` Yang Shi
2022-06-17  2:26         ` Miaohe Lin
2022-06-17 16:35           ` Yang Shi
2022-06-18  3:13             ` Miaohe Lin
