* [PATCH v2 0/6] A few fixup patches for hugetlb
@ 2022-08-23  3:02 Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages Miaohe Lin
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23  3:02 UTC (permalink / raw)
  To: akpm, mike.kravetz, songmuchun
  Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe

Hi everyone,
This series contains a few fixup patches to fix an incorrect update of
max_huge_pages, a WARN_ON(!kobj) in sysfs_create_group(), and so on.
More details can be found in the respective changelogs.
Thanks!

---
v2:
 Collect Reviewed-by tags from Mike, Muchun and Fengwei. Thanks!
 Remove the err == -EEXIST check and retry logic in 3/6.
 Remove the unused local variable dst_entry per Lukas. Thanks!
---
Miaohe Lin (6):
  mm/hugetlb: fix incorrect update of max_huge_pages
  mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group()
  mm/hugetlb: fix missing call to restore_reserve_on_error()
  mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at()
  mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node()
  mm/hugetlb: make detecting shared pte more reliable

 mm/hugetlb.c         | 61 +++++++++++++++++++++++++++-----------------
 mm/hugetlb_vmemmap.c |  5 ++++
 2 files changed, 42 insertions(+), 24 deletions(-)

-- 
2.23.0




* [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages
  2022-08-23  3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
@ 2022-08-23  3:02 ` Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 2/6] mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group() Miaohe Lin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23  3:02 UTC (permalink / raw)
  To: akpm, mike.kravetz, songmuchun
  Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe

When a huge page is demoted, target_hstate->max_huge_pages should be
incremented by pages_per_huge_page(h) / pages_per_huge_page(target_hstate),
i.e. the number of target-sized pages one demoted page produces, not by
pages_per_huge_page(h). Update max_huge_pages accordingly for consistency.
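
For example, on x86-64 with a 4 KiB base page, demoting one 1 GiB huge
page (pages_per_huge_page = 262144) into 2 MiB huge pages
(pages_per_huge_page = 512) should add 262144 / 512 = 512 pages to the
target pool, not 262144.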

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f8f1540108ab..19a7a83af569 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3473,7 +3473,8 @@ static int demote_free_huge_page(struct hstate *h, struct page *page)
 	 * based on pool changes for the demoted page.
 	 */
 	h->max_huge_pages--;
-	target_hstate->max_huge_pages += pages_per_huge_page(h);
+	target_hstate->max_huge_pages +=
+		pages_per_huge_page(h) / pages_per_huge_page(target_hstate);
 
 	return rc;
 }
-- 
2.23.0




* [PATCH v2 2/6] mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group()
  2022-08-23  3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages Miaohe Lin
@ 2022-08-23  3:02 ` Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error() Miaohe Lin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23  3:02 UTC (permalink / raw)
  To: akpm, mike.kravetz, songmuchun
  Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe

If sysfs_create_group() fails with hstate_attr_group, hstate_kobjs[hi] is
set to NULL and then passed to sysfs_create_group() again if
h->demote_order != 0, triggering the WARN_ON(!kobj) check there. Fix this
by making sure hstate_kobjs[hi] != NULL when calling sysfs_create_group().
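
Condensed from the context in the diff below (a sketch of the failing
sequence before this change, not the exact code):

	retval = sysfs_create_group(hstate_kobjs[hi], hstate_attr_group);
	if (retval) {
		kobject_put(hstate_kobjs[hi]);
		hstate_kobjs[hi] = NULL;
		/* no early return: execution falls through */
	}

	if (h->demote_order) {
		/* hstate_kobjs[hi] can be NULL here -> WARN_ON(!kobj) */
		sysfs_create_group(hstate_kobjs[hi],
				   &hstate_demote_attr_group);
	}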

Fixes: 79dfc695525f ("hugetlb: add demote hugetlb page sysfs interfaces")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19a7a83af569..d46dfe5ba62c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3847,6 +3847,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 	if (retval) {
 		kobject_put(hstate_kobjs[hi]);
 		hstate_kobjs[hi] = NULL;
+		return retval;
 	}
 
 	if (h->demote_order) {
-- 
2.23.0




* [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error()
  2022-08-23  3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 2/6] mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group() Miaohe Lin
@ 2022-08-23  3:02 ` Miaohe Lin
  2022-08-24 18:21   ` Mike Kravetz
  2022-08-23  3:02 ` [PATCH v2 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at() Miaohe Lin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23  3:02 UTC (permalink / raw)
  To: akpm, mike.kravetz, songmuchun
  Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe

When huge_add_to_page_cache() fails, the page is freed directly without
calling restore_reserve_on_error() to restore the reserve for newly
allocated pages not in the page cache. Fix this by calling
restore_reserve_on_error() when huge_add_to_page_cache() fails.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/hugetlb.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d46dfe5ba62c..8e62da153c64 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5576,7 +5576,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	if (idx >= size)
 		goto out;
 
-retry:
 	new_page = false;
 	page = find_lock_page(mapping, idx);
 	if (!page) {
@@ -5616,9 +5615,15 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err = huge_add_to_page_cache(page, mapping, idx);
 			if (err) {
+				/*
+				 * err can't be -EEXIST, which would imply
+				 * someone else consumed the reservation: the
+				 * hugetlb fault mutex is held when adding a
+				 * hugetlb page to the page cache. So it's safe
+				 * to call restore_reserve_on_error() here.
+				 */
+				restore_reserve_on_error(h, vma, haddr, page);
 				put_page(page);
-				if (err == -EEXIST)
-					goto retry;
 				goto out;
 			}
 			new_pagecache_page = true;
-- 
2.23.0




* [PATCH v2 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at()
  2022-08-23  3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
                   ` (2 preceding siblings ...)
  2022-08-23  3:02 ` [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error() Miaohe Lin
@ 2022-08-23  3:02 ` Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 5/6] mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node() Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 6/6] mm/hugetlb: make detecting shared pte more reliable Miaohe Lin
  5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23  3:02 UTC (permalink / raw)
  To: akpm, mike.kravetz, songmuchun
  Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe

The memory barrier smp_wmb() is needed to make sure that preceding stores
to the page contents become visible before the set_pte_at() write below.
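
A sketch of the ordering argument (the reader side shown here is
hypothetical and, as usual, relies on the address dependency through
the pte it loads):

	CPU 0 (vmemmap_restore_pte)	CPU 1 (a struct page reader)
	write page contents		load pte
	smp_wmb()			read contents via the mapped pte
	set_pte_at()

Without the smp_wmb(), CPU 1 could observe the new pte while still
seeing stale page contents.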

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb_vmemmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 196159000897..ba2a2596fb4e 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -286,6 +286,11 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 	copy_page(to, (void *)walk->reuse_addr);
 	reset_struct_pages(to);
 
+	/*
+	 * Makes sure that preceding stores to the page contents become visible
+	 * before the set_pte_at() write.
+	 */
+	smp_wmb();
 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
 }
 
-- 
2.23.0




* [PATCH v2 5/6] mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node()
  2022-08-23  3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
                   ` (3 preceding siblings ...)
  2022-08-23  3:02 ` [PATCH v2 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at() Miaohe Lin
@ 2022-08-23  3:02 ` Miaohe Lin
  2022-08-23  3:02 ` [PATCH v2 6/6] mm/hugetlb: make detecting shared pte more reliable Miaohe Lin
  5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23  3:02 UTC (permalink / raw)
  To: akpm, mike.kravetz, songmuchun
  Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe

The sysfs group per_node_hstate_attr_group, and hstate_demote_attr_group
when h->demote_order != 0, are created in hugetlb_register_node(). But
these sysfs groups are not removed when the node is unregistered, so they
are leaked. Use sysfs_remove_group() to fix this issue.
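
Note that the teardown in hugetlb_unregister_node() below mirrors the
creation order: the demote group, created last when h->demote_order != 0,
is removed first, then per_node_hstate_attr_group, and only then is the
kobject reference dropped.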

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
---
 mm/hugetlb.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8e62da153c64..2dfd10599f98 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3851,12 +3851,18 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
 	}
 
 	if (h->demote_order) {
-		if (sysfs_create_group(hstate_kobjs[hi],
-					&hstate_demote_attr_group))
+		retval = sysfs_create_group(hstate_kobjs[hi],
+					    &hstate_demote_attr_group);
+		if (retval) {
 			pr_warn("HugeTLB unable to create demote interfaces for %s\n", h->name);
+			sysfs_remove_group(hstate_kobjs[hi], hstate_attr_group);
+			kobject_put(hstate_kobjs[hi]);
+			hstate_kobjs[hi] = NULL;
+			return retval;
+		}
 	}
 
-	return retval;
+	return 0;
 }
 
 static void __init hugetlb_sysfs_init(void)
@@ -3942,10 +3948,15 @@ static void hugetlb_unregister_node(struct node *node)
 
 	for_each_hstate(h) {
 		int idx = hstate_index(h);
-		if (nhs->hstate_kobjs[idx]) {
-			kobject_put(nhs->hstate_kobjs[idx]);
-			nhs->hstate_kobjs[idx] = NULL;
-		}
+		struct kobject *hstate_kobj = nhs->hstate_kobjs[idx];
+
+		if (!hstate_kobj)
+			continue;
+		if (h->demote_order)
+			sysfs_remove_group(hstate_kobj, &hstate_demote_attr_group);
+		sysfs_remove_group(hstate_kobj, &per_node_hstate_attr_group);
+		kobject_put(hstate_kobj);
+		nhs->hstate_kobjs[idx] = NULL;
 	}
 
 	kobject_put(nhs->hugepages_kobj);
-- 
2.23.0




* [PATCH v2 6/6] mm/hugetlb: make detecting shared pte more reliable
  2022-08-23  3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
                   ` (4 preceding siblings ...)
  2022-08-23  3:02 ` [PATCH v2 5/6] mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node() Miaohe Lin
@ 2022-08-23  3:02 ` Miaohe Lin
  5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23  3:02 UTC (permalink / raw)
  To: akpm, mike.kravetz, songmuchun
  Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe

If the pagetables are shared, we shouldn't copy or take references. Since
src could have unshared while dst shares with another vma, huge_pte_none()
is used to determine whether dst_pte is shared. But this check isn't
reliable: a shared pte could in fact be pte_none in the pagetable. The
page count of the ptep page should be checked instead to reliably
determine whether the pte is shared.

[Thanks to Lukas for cleaning up the unused local variable dst_entry.]
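
A minimal sketch of the idea, assuming the hugetlb PMD-sharing convention
that every additional sharer takes a reference on the page backing the
page table (the helper name below is hypothetical):

	static inline bool hugetlb_pte_shared(pte_t *ptep)
	{
		/*
		 * More than one reference means another page table also
		 * maps this page of ptes, i.e. the pte is shared.
		 */
		return page_count(virt_to_page(ptep)) > 1;
	}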

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
---
 mm/hugetlb.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2dfd10599f98..8aa62765a055 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4763,7 +4763,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *dst_vma,
 			    struct vm_area_struct *src_vma)
 {
-	pte_t *src_pte, *dst_pte, entry, dst_entry;
+	pte_t *src_pte, *dst_pte, entry;
 	struct page *ptepage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
@@ -4808,15 +4808,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 		/*
 		 * If the pagetables are shared don't copy or take references.
-		 * dst_pte == src_pte is the common case of src/dest sharing.
 		 *
+		 * dst_pte == src_pte is the common case of src/dest sharing.
 		 * However, src could have 'unshared' and dst shares with
-		 * another vma.  If dst_pte !none, this implies sharing.
-		 * Check here before taking page table lock, and once again
-		 * after taking the lock below.
+		 * another vma. So page_count of ptep page is checked instead
+		 * to reliably determine whether pte is shared.
 		 */
-		dst_entry = huge_ptep_get(dst_pte);
-		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+		if (page_count(virt_to_page(dst_pte)) > 1) {
 			addr |= last_addr_mask;
 			continue;
 		}
@@ -4825,13 +4823,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		src_ptl = huge_pte_lockptr(h, src, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
-		dst_entry = huge_ptep_get(dst_pte);
 again:
-		if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+		if (huge_pte_none(entry)) {
 			/*
-			 * Skip if src entry none.  Also, skip in the
-			 * unlikely case dst entry !none as this implies
-			 * sharing with another vma.
+			 * Skip if src entry none.
 			 */
 			;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
@@ -4910,7 +4905,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 					restore_reserve_on_error(h, dst_vma, addr,
 								new);
 					put_page(new);
-					/* dst_entry won't change as in child */
+					/* huge_ptep of dst_pte won't change as in child */
 					goto again;
 				}
 				hugetlb_install_page(dst_vma, dst_pte, addr, new);
-- 
2.23.0




* Re: [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error()
  2022-08-23  3:02 ` [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error() Miaohe Lin
@ 2022-08-24 18:21   ` Mike Kravetz
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-08-24 18:21 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, songmuchun, lukas.bulwahn, linux-mm, linux-kernel

On 08/23/22 11:02, Miaohe Lin wrote:
> When huge_add_to_page_cache() fails, the page is freed directly without
> calling restore_reserve_on_error() to restore the reserve for newly
> allocated pages not in the page cache. Fix this by calling
> restore_reserve_on_error() when huge_add_to_page_cache() fails.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/hugetlb.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d46dfe5ba62c..8e62da153c64 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5576,7 +5576,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  	if (idx >= size)
>  		goto out;
>  
> -retry:
>  	new_page = false;
>  	page = find_lock_page(mapping, idx);
>  	if (!page) {
> @@ -5616,9 +5615,15 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  		if (vma->vm_flags & VM_MAYSHARE) {
>  			int err = huge_add_to_page_cache(page, mapping, idx);
>  			if (err) {
> +				/*
> +				 * err can't be -EEXIST, which would imply
> +				 * someone else consumed the reservation: the
> +				 * hugetlb fault mutex is held when adding a
> +				 * hugetlb page to the page cache. So it's safe
> +				 * to call restore_reserve_on_error() here.
> +				 */
> +				restore_reserve_on_error(h, vma, haddr, page);
>  				put_page(page);
> -				if (err == -EEXIST)
> -					goto retry;

Thanks for removing this check and adding the comment.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

>  				goto out;
>  			}
>  			new_pagecache_page = true;
> -- 
> 2.23.0
> 



