* [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
@ 2021-01-04  6:58 Muchun Song
  2021-01-04  6:58 ` [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Muchun Song @ 2021-01-04  6:58 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm,
	linux-kernel, Muchun Song

If the refcount of a HugeTLB page is one when it is about to be
migrated, it means that the page was freed from under us, so we are
done and do not need to migrate it.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/migrate.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 4385f2fb5d18..a6631c4eb6a6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 		return -ENOSYS;
 	}
 
+	if (page_count(hpage) == 1) {
+		/* page was freed from under us. So we are done. */
+		putback_active_hugepage(hpage);
+		return MIGRATEPAGE_SUCCESS;
+	}
+
 	new_hpage = get_new_page(hpage, private);
 	if (!new_hpage)
 		return -ENOMEM;
-- 
2.11.0
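
For context, the non-HugeTLB migration path already handles this case:
unmap_and_move() in mm/migrate.c bails out early when it sees a page
with a reference count of one. A lightly abridged sketch of that
existing check (minor details may differ in the exact base tree):

	if (page_count(page) == 1) {
		/* page was freed from under us. So we are done. */
		ClearPageActive(page);
		ClearPageUnevictable(page);
		/* (handling of __PageMovable pages elided) */
		goto out;
	}

Patch 1 simply gives unmap_and_move_huge_page() the same early exit.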



* [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
@ 2021-01-04  6:58 ` Muchun Song
  2021-01-04 22:38   ` Mike Kravetz
  2021-01-04  6:58 ` [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-04  6:58 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm,
	linux-kernel, Muchun Song

Because we can only isolate an active page via isolate_huge_page()
and hugetlbfs_fallocate() forgets to mark the newly allocated page
as active, we cannot isolate and migrate those pages.

Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 fs/hugetlbfs/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b5c109703daa..2aceb085d202 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -737,10 +737,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 
 		/*
 		 * unlock_page because locked by add_to_page_cache()
-		 * page_put due to reference from alloc_huge_page()
+		 * put_page() (which is in the putback_active_hugepage())
+		 * due to reference from alloc_huge_page()
 		 */
 		unlock_page(page);
-		put_page(page);
+		putback_active_hugepage(page);
 	}
 
 	if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
-- 
2.11.0
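
For reference, isolate_huge_page() (as it stands before patch 5 of
this series changes it) only succeeds for pages that have been marked
active. A lightly abridged sketch of the current mm/hugetlb.c code;
minor details may differ in the exact base tree:

bool isolate_huge_page(struct page *page, struct list_head *list)
{
	bool ret = true;

	VM_BUG_ON_PAGE(!PageHead(page), page);
	spin_lock(&hugetlb_lock);
	/* a page that was never marked active fails this check */
	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
		ret = false;
		goto unlock;
	}
	clear_page_huge_active(page);
	list_move_tail(&page->lru, list);
unlock:
	spin_unlock(&hugetlb_lock);
	return ret;
}

A page allocated by hugetlbfs_fallocate() that is never marked active
therefore always fails the page_huge_active() check and can never be
isolated for migration.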



* [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
  2021-01-04  6:58 ` [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
@ 2021-01-04  6:58 ` Muchun Song
  2021-01-05  0:00   ` Mike Kravetz
  2021-01-04  6:58 ` [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-04  6:58 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm,
	linux-kernel, Muchun Song

There is a race condition between __free_huge_page()
and dissolve_free_huge_page().

CPU0:                         CPU1:

// page_count(page) == 1
put_page(page)
  __free_huge_page(page)
                              dissolve_free_huge_page(page)
                                spin_lock(&hugetlb_lock)
                                // PageHuge(page) && !page_count(page)
                                update_and_free_page(page)
                                // page is freed to the buddy
                                spin_unlock(&hugetlb_lock)
    spin_lock(&hugetlb_lock)
    clear_page_huge_active(page)
    enqueue_huge_page(page)
    // It is wrong, the page is already freed
    spin_unlock(&hugetlb_lock)

The race window is between put_page() and the spin_lock() in
__free_huge_page().

We should make sure that the page is already on the free list
when it is dissolved.

Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1f3bf1710b66..72608008f8b4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+static inline bool PageHugeFreed(struct page *head)
+{
+	return page_private(head) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+	set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+	set_page_private(head + 4, 0);
+}
+
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
 	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	SetPageHugeFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 
 		list_move(&page->lru, &h->hugepage_activelist);
 		set_page_refcounted(page);
+		ClearPageHugeFreed(page);
 		h->free_huge_pages--;
 		h->free_huge_pages_node[nid]--;
 		return page;
@@ -1504,6 +1521,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
+	ClearPageHugeFreed(page);
 	spin_unlock(&hugetlb_lock);
 }
 
@@ -1770,6 +1788,36 @@ int dissolve_free_huge_page(struct page *page)
 		int nid = page_to_nid(head);
 		if (h->free_huge_pages - h->resv_huge_pages == 0)
 			goto out;
+
+		/*
+		 * There is a race condition between __free_huge_page()
+		 * and dissolve_free_huge_page().
+		 *
+		 * CPU0:                         CPU1:
+		 *
+		 * // page_count(page) == 1
+		 * put_page(page)
+		 *   __free_huge_page(page)
+		 *                               dissolve_free_huge_page(page)
+		 *                                 spin_lock(&hugetlb_lock)
+		 *                                 // PageHuge(page) && !page_count(page)
+		 *                                 update_and_free_page(page)
+		 *                                 // page is freed to the buddy
+		 *                                 spin_unlock(&hugetlb_lock)
+		 *     spin_lock(&hugetlb_lock)
+		 *     enqueue_huge_page(page)
+		 *     // It is wrong, the page is already freed
+		 *     spin_unlock(&hugetlb_lock)
+		 *
+		 * The race window is between put_page() and spin_lock() which
+		 * is in the __free_huge_page().
+		 *
+		 * We should make sure that the page is already on the free list
+		 * when it is dissolved.
+		 */
+		if (unlikely(!PageHugeFreed(head)))
+			goto out;
+
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
 		 * which makes any subpages rather than the error page reusable.
-- 
2.11.0



* [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
  2021-01-04  6:58 ` [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
  2021-01-04  6:58 ` [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
@ 2021-01-04  6:58 ` Muchun Song
  2021-01-05  1:32   ` Mike Kravetz
  2021-01-05  6:37   ` HORIGUCHI NAOYA(堀口 直也)
  2021-01-04  6:58 ` [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Muchun Song @ 2021-01-04  6:58 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm,
	linux-kernel, Muchun Song

When dissolve_free_huge_page() races with __free_huge_page(), we can
just retry, because the race window is small.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 72608008f8b4..db00ae375d2a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  * nothing for in-use hugepages and non-hugepages.
  * This function returns values like below:
  *
- *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
- *          (allocated or reserved.)
- *       0: successfully dissolved free hugepages or the page is not a
- *          hugepage (considered as already dissolved)
+ *  -EAGAIN: race with __free_huge_page() and can do a retry
+ *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
+ *           (allocated or reserved.)
+ *       0:  successfully dissolved free hugepages or the page is not a
+ *           hugepage (considered as already dissolved)
  */
 int dissolve_free_huge_page(struct page *page)
 {
@@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
 		 * We should make sure that the page is already on the free list
 		 * when it is dissolved.
 		 */
-		if (unlikely(!PageHugeFreed(head)))
+		if (unlikely(!PageHugeFreed(head))) {
+			rc = -EAGAIN;
 			goto out;
+		}
 
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
@@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
+retry:
 		rc = dissolve_free_huge_page(page);
+		if (rc == -EAGAIN)
+			goto retry;
 		if (rc)
 			break;
 	}
-- 
2.11.0



* [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
                   ` (2 preceding siblings ...)
  2021-01-04  6:58 ` [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
@ 2021-01-04  6:58 ` Muchun Song
  2021-01-05  1:42   ` Mike Kravetz
  2021-01-04  6:58 ` [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-04  6:58 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm,
	linux-kernel, Muchun Song

There is a race between isolate_huge_page() and __free_huge_page().

CPU0:                                       CPU1:

if (PageHuge(page))
                                            put_page(page)
                                              __free_huge_page(page)
                                                  spin_lock(&hugetlb_lock)
                                                  update_and_free_page(page)
                                                    set_compound_page_dtor(page,
                                                      NULL_COMPOUND_DTOR)
                                                  spin_unlock(&hugetlb_lock)
  isolate_huge_page(page)
    // trigger BUG_ON
    VM_BUG_ON_PAGE(!PageHead(page), page)
    spin_lock(&hugetlb_lock)
    page_huge_active(page)
      // trigger BUG_ON
      VM_BUG_ON_PAGE(!PageHuge(page), page)
    spin_unlock(&hugetlb_lock)

When we isolate a HugeTLB page on CPU0 while it is being freed to the
buddy allocator on CPU1, we can trigger the BUG_ON on CPU0 because the
page has already been freed to the buddy allocator.

Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index db00ae375d2a..5c2f64f53177 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5587,9 +5587,9 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
 {
 	bool ret = true;
 
-	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
-	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+	if (!PageHeadHuge(page) || !page_huge_active(page) ||
+	    !get_page_unless_zero(page)) {
 		ret = false;
 		goto unlock;
 	}
-- 
2.11.0



* [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
                   ` (3 preceding siblings ...)
  2021-01-04  6:58 ` [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
@ 2021-01-04  6:58 ` Muchun Song
  2021-01-05  1:50   ` Mike Kravetz
  2021-01-04 22:17 ` [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Mike Kravetz
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-04  6:58 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm,
	linux-kernel, Muchun Song

page_huge_active() can be called from scan_movable_pages(), which
does not hold a reference count on the HugeTLB page. So when we call
page_huge_active() from scan_movable_pages(), the HugeTLB page can
be freed in parallel. Then we will trigger the BUG_ON in
page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
VM_BUG_ON_PAGE.

Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5c2f64f53177..4c8631114e88 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1360,7 +1360,6 @@ struct hstate *size_to_hstate(unsigned long size)
  */
 bool page_huge_active(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageHuge(page), page);
 	return PageHead(page) && PagePrivate(&page[1]);
 }
 
-- 
2.11.0
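
For reference, the caller in question is the HugeTLB part of
scan_movable_pages() in mm/memory_hotplug.c, which looks roughly like
the sketch below; note that no reference on the page is taken before
page_huge_active() is called:

		if (!PageHuge(page))
			continue;
		head = compound_head(page);
		/* no reference is held on 'head' at this point */
		if (page_huge_active(head))
			goto found;
		skip = compound_nr(head) - (page - head);
		pfn += skip - 1;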



* Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
                   ` (4 preceding siblings ...)
  2021-01-04  6:58 ` [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
@ 2021-01-04 22:17 ` Mike Kravetz
  2021-01-05 16:58 ` David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Mike Kravetz @ 2021-01-04 22:17 UTC (permalink / raw)
  To: Muchun Song, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm, linux-kernel

On 1/3/21 10:58 PM, Muchun Song wrote:
> If the refcount of a HugeTLB page is one when it is about to be
> migrated, it means that the page was freed from under us, so we are
> done and do not need to migrate it.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/migrate.c | 6 ++++++
>  1 file changed, 6 insertions(+)

Thanks!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz


* Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-04  6:58 ` [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
@ 2021-01-04 22:38   ` Mike Kravetz
  2021-01-05  2:44     ` [External] " Muchun Song
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Kravetz @ 2021-01-04 22:38 UTC (permalink / raw)
  To: Muchun Song, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm, linux-kernel

On 1/3/21 10:58 PM, Muchun Song wrote:
> Because we can only isolate an active page via isolate_huge_page()
> and hugetlbfs_fallocate() forgets to mark the newly allocated page
> as active, we cannot isolate and migrate those pages.
> 
> Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  fs/hugetlbfs/inode.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Good catch.  This is indeed an issue.

> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..2aceb085d202 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -737,10 +737,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>  
>  		/*
>  		 * unlock_page because locked by add_to_page_cache()
> -		 * page_put due to reference from alloc_huge_page()
> +		 * put_page() (which is in the putback_active_hugepage())
> +		 * due to reference from alloc_huge_page()

Thanks for fixing the comment.

>  		 */
>  		unlock_page(page);
> -		put_page(page);
> +		putback_active_hugepage(page);

I'm curious why you used putback_active_hugepage() here instead of simply
calling set_page_huge_active() before the put_page()?

When the page was allocated, it was placed on the active list (alloc_huge_page).
Therefore, the hugetlb_lock locking and list movement should not be necessary.
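
Something along these lines (just a sketch of that alternative):

		/*
		 * Mark the page active while still holding the page lock,
		 * then simply drop the reference taken by alloc_huge_page()
		 * (assumes set_page_huge_active() is visible here).
		 */
		set_page_huge_active(page);
		unlock_page(page);
		put_page(page);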

-- 
Mike Kravetz

>  	}
>  
>  	if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
> 


* Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-04  6:58 ` [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
@ 2021-01-05  0:00   ` Mike Kravetz
  2021-01-05  2:55     ` [External] " Muchun Song
  2021-01-05  6:12     ` Muchun Song
  0 siblings, 2 replies; 31+ messages in thread
From: Mike Kravetz @ 2021-01-05  0:00 UTC (permalink / raw)
  To: Muchun Song, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm, linux-kernel

On 1/3/21 10:58 PM, Muchun Song wrote:
> There is a race condition between __free_huge_page()
> and dissolve_free_huge_page().
> 
> CPU0:                         CPU1:
> 
> // page_count(page) == 1
> put_page(page)
>   __free_huge_page(page)
>                               dissolve_free_huge_page(page)
>                                 spin_lock(&hugetlb_lock)
>                                 // PageHuge(page) && !page_count(page)
>                                 update_and_free_page(page)
>                                 // page is freed to the buddy
>                                 spin_unlock(&hugetlb_lock)
>     spin_lock(&hugetlb_lock)
>     clear_page_huge_active(page)
>     enqueue_huge_page(page)
>     // It is wrong, the page is already freed
>     spin_unlock(&hugetlb_lock)
> 
> The race window is between put_page() and the spin_lock() in
> __free_huge_page().
> 
> We should make sure that the page is already on the free list
> when it is dissolved.
> 
> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1f3bf1710b66..72608008f8b4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
>  static int num_fault_mutexes;
>  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>  
> +static inline bool PageHugeFreed(struct page *head)
> +{
> +	return page_private(head) == -1UL;

	return page_private(head + 4) == -1UL;

> +}
> +
> +static inline void SetPageHugeFreed(struct page *head)
> +{
> +	set_page_private(head + 4, -1UL);
> +}
> +
> +static inline void ClearPageHugeFreed(struct page *head)
> +{
> +	set_page_private(head + 4, 0);
> +}

It is unfortunate that we can not use some existing value like
page_huge_active() to determine if dissolve_free_huge_page() should
proceed with freeing the page to buddy.  If the existing check,

	if (!page_count(page)) {

was changed to

	if (!page_count(page) && !page_huge_active(page)) {

the race window would be shrunk.  However, the most straightforward
way to fully close the window is the approach taken here.

> +
>  /* Forward declaration */
>  static int hugetlb_acct_memory(struct hstate *h, long delta);
>  
> @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
>  	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
> +	SetPageHugeFreed(page);
>  }
>  
>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
>  
>  		list_move(&page->lru, &h->hugepage_activelist);
>  		set_page_refcounted(page);
> +		ClearPageHugeFreed(page);
>  		h->free_huge_pages--;
>  		h->free_huge_pages_node[nid]--;
>  		return page;
> @@ -1504,6 +1521,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  	spin_lock(&hugetlb_lock);
>  	h->nr_huge_pages++;
>  	h->nr_huge_pages_node[nid]++;
> +	ClearPageHugeFreed(page);
>  	spin_unlock(&hugetlb_lock);
>  }
>  
> @@ -1770,6 +1788,36 @@ int dissolve_free_huge_page(struct page *page)
>  		int nid = page_to_nid(head);
>  		if (h->free_huge_pages - h->resv_huge_pages == 0)
>  			goto out;
> +
> +		/*
> +		 * There is a race condition between __free_huge_page()
> +		 * and dissolve_free_huge_page().
> +		 *
> +		 * CPU0:                         CPU1:
> +		 *
> +		 * // page_count(page) == 1
> +		 * put_page(page)
> +		 *   __free_huge_page(page)
> +		 *                               dissolve_free_huge_page(page)
> +		 *                                 spin_lock(&hugetlb_lock)
> +		 *                                 // PageHuge(page) && !page_count(page)
> +		 *                                 update_and_free_page(page)
> +		 *                                 // page is freed to the buddy
> +		 *                                 spin_unlock(&hugetlb_lock)
> +		 *     spin_lock(&hugetlb_lock)
> +		 *     enqueue_huge_page(page)
> +		 *     // It is wrong, the page is already freed
> +		 *     spin_unlock(&hugetlb_lock)
> +		 *
> +		 * The race window is between put_page() and spin_lock() which
> +		 * is in the __free_huge_page().

IMO, the description of the race condition in the commit message is
sufficient.  It does not need to be here in the code.  The below comment
should be sufficient.

-- 
Mike Kravetz

> +		 *
> +		 * We should make sure that the page is already on the free list
> +		 * when it is dissolved.
> +		 */
> +		if (unlikely(!PageHugeFreed(head)))
> +			goto out;
> +
>  		/*
>  		 * Move PageHWPoison flag from head page to the raw error page,
>  		 * which makes any subpages rather than the error page reusable.
> 


* Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-04  6:58 ` [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
@ 2021-01-05  1:32   ` Mike Kravetz
  2021-01-05  3:14     ` [External] " Muchun Song
  2021-01-05  6:37   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 31+ messages in thread
From: Mike Kravetz @ 2021-01-05  1:32 UTC (permalink / raw)
  To: Muchun Song, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm, linux-kernel

On 1/3/21 10:58 PM, Muchun Song wrote:
> When dissolve_free_huge_page() races with __free_huge_page(), we can
> just retry, because the race window is small.

In general, I agree that the race window is small.  However, worst case
would be if the freeing of the page is put on a work queue.  Is it acceptable
to keep retrying in that case?  In addition, the 'Free some vmemmap' series
may slow the free_huge_page path even more.

In these worst case scenarios, I am not sure we want to just spin retrying.

-- 
Mike Kravetz

> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 72608008f8b4..db00ae375d2a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>   * nothing for in-use hugepages and non-hugepages.
>   * This function returns values like below:
>   *
> - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> - *          (allocated or reserved.)
> - *       0: successfully dissolved free hugepages or the page is not a
> - *          hugepage (considered as already dissolved)
> + *  -EAGAIN: race with __free_huge_page() and can do a retry
> + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> + *           (allocated or reserved.)
> + *       0:  successfully dissolved free hugepages or the page is not a
> + *           hugepage (considered as already dissolved)
>   */
>  int dissolve_free_huge_page(struct page *page)
>  {
> @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
>  		 * We should make sure that the page is already on the free list
>  		 * when it is dissolved.
>  		 */
> -		if (unlikely(!PageHugeFreed(head)))
> +		if (unlikely(!PageHugeFreed(head))) {
> +			rc = -EAGAIN;
>  			goto out;
> +		}
>  
>  		/*
>  		 * Move PageHWPoison flag from head page to the raw error page,
> @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>  		page = pfn_to_page(pfn);
> +retry:
>  		rc = dissolve_free_huge_page(page);
> +		if (rc == -EAGAIN)
> +			goto retry;
>  		if (rc)
>  			break;
>  	}
> 


* Re: [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page
  2021-01-04  6:58 ` [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
@ 2021-01-05  1:42   ` Mike Kravetz
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Kravetz @ 2021-01-05  1:42 UTC (permalink / raw)
  To: Muchun Song, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm, linux-kernel

On 1/3/21 10:58 PM, Muchun Song wrote:
> There is a race between isolate_huge_page() and __free_huge_page().
> 
> CPU0:                                       CPU1:
> 
> if (PageHuge(page))
>                                             put_page(page)
>                                               __free_huge_page(page)
>                                                   spin_lock(&hugetlb_lock)
>                                                   update_and_free_page(page)
>                                                     set_compound_page_dtor(page,
>                                                       NULL_COMPOUND_DTOR)
>                                                   spin_unlock(&hugetlb_lock)
>   isolate_huge_page(page)
>     // trigger BUG_ON
>     VM_BUG_ON_PAGE(!PageHead(page), page)
>     spin_lock(&hugetlb_lock)
>     page_huge_active(page)
>       // trigger BUG_ON
>       VM_BUG_ON_PAGE(!PageHuge(page), page)
>     spin_unlock(&hugetlb_lock)
> 
> When we isolate a HugeTLB page on CPU0 while it is being freed to the
> buddy allocator on CPU1, we can trigger the BUG_ON on CPU0 because the
> page has already been freed to the buddy allocator.
> 
> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz


* Re: [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
  2021-01-04  6:58 ` [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
@ 2021-01-05  1:50   ` Mike Kravetz
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Kravetz @ 2021-01-05  1:50 UTC (permalink / raw)
  To: Muchun Song, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm, linux-kernel

On 1/3/21 10:58 PM, Muchun Song wrote:
> page_huge_active() can be called from scan_movable_pages(), which
> does not hold a reference count on the HugeTLB page. So when we call
> page_huge_active() from scan_movable_pages(), the HugeTLB page can
> be freed in parallel. Then we will trigger the BUG_ON in
> page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
> VM_BUG_ON_PAGE.
> 
> Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 1 -
>  1 file changed, 1 deletion(-)

Thanks!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz


* Re: [External] Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-04 22:38   ` Mike Kravetz
@ 2021-01-05  2:44     ` Muchun Song
  2021-01-05 22:27       ` Mike Kravetz
  0 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-05  2:44 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Tue, Jan 5, 2021 at 6:40 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/3/21 10:58 PM, Muchun Song wrote:
> > Because we can only isolate an active page via isolate_huge_page()
> > and hugetlbfs_fallocate() forgets to mark the newly allocated page
> > as active, we cannot isolate and migrate those pages.
> >
> > Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  fs/hugetlbfs/inode.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
>
> Good catch.  This is indeed an issue.
>
> >
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index b5c109703daa..2aceb085d202 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -737,10 +737,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> >
> >               /*
> >                * unlock_page because locked by add_to_page_cache()
> > -              * page_put due to reference from alloc_huge_page()
> > +              * put_page() (which is in the putback_active_hugepage())
> > +              * due to reference from alloc_huge_page()
>
> Thanks for fixing the comment.
>
> >                */
> >               unlock_page(page);
> > -             put_page(page);
> > +             putback_active_hugepage(page);
>
> I'm curious why you used putback_active_hugepage() here instead of simply
> calling set_page_huge_active() before the put_page()?
>
> When the page was allocated, it was placed on the active list (alloc_huge_page).
> Therefore, the hugetlb_lock locking and list movement should not be necessary.

I agree with you. But set_page_huge_active() is not exported (it is a
static function), and exporting only set_page_huge_active() while
leaving clear_page_huge_active() unexported seems strange. This is
just my opinion. What's yours, Mike?

Thanks.

>
> --
> Mike Kravetz
>
> >       }
> >
> >       if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
> >


* Re: [External] Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-05  0:00   ` Mike Kravetz
@ 2021-01-05  2:55     ` Muchun Song
  2021-01-05 23:22       ` Mike Kravetz
  2021-01-05  6:12     ` Muchun Song
  1 sibling, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-05  2:55 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/3/21 10:58 PM, Muchun Song wrote:
> > There is a race condition between __free_huge_page()
> > and dissolve_free_huge_page().
> >
> > CPU0:                         CPU1:
> >
> > // page_count(page) == 1
> > put_page(page)
> >   __free_huge_page(page)
> >                               dissolve_free_huge_page(page)
> >                                 spin_lock(&hugetlb_lock)
> >                                 // PageHuge(page) && !page_count(page)
> >                                 update_and_free_page(page)
> >                                 // page is freed to the buddy
> >                                 spin_unlock(&hugetlb_lock)
> >     spin_lock(&hugetlb_lock)
> >     clear_page_huge_active(page)
> >     enqueue_huge_page(page)
> >     // It is wrong, the page is already freed
> >     spin_unlock(&hugetlb_lock)
> >
> > The race window is between put_page() and the spin_lock() in
> > __free_huge_page().
> >
> > We should make sure that the page is already on the free list
> > when it is dissolved.
> >
> > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 48 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 1f3bf1710b66..72608008f8b4 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
> >  static int num_fault_mutexes;
> >  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >
> > +static inline bool PageHugeFreed(struct page *head)
> > +{
> > +     return page_private(head) == -1UL;
>
>         return page_private(head + 4) == -1UL;
>
> > +}
> > +
> > +static inline void SetPageHugeFreed(struct page *head)
> > +{
> > +     set_page_private(head + 4, -1UL);
> > +}
> > +
> > +static inline void ClearPageHugeFreed(struct page *head)
> > +{
> > +     set_page_private(head + 4, 0);
> > +}
>
> It is unfortunate that we can not use some existing value like
> page_huge_active() to determine if dissolve_free_huge_page() should
> proceed with freeing the page to buddy.  If the existing check,
>
>         if (!page_count(page)) {
>
> was changed to
>
>         if (!page_count(page) && !page_huge_active(page)) {
>
> the race window would be shrunk.  However, the most straightforward
> way to fully close the window is the approach taken here.

I also thought about this fix, but it is not enough, because in some
places (e.g. error handling routines) we just call put_page() to free
the HugeTLB page without setting it active.

If we relied on page_huge_active(), we would have to set the page
active before every put_page(), and we cannot guarantee that.

Thanks.

>
> > +
> >  /* Forward declaration */
> >  static int hugetlb_acct_memory(struct hstate *h, long delta);
> >
> > @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
> >       list_move(&page->lru, &h->hugepage_freelists[nid]);
> >       h->free_huge_pages++;
> >       h->free_huge_pages_node[nid]++;
> > +     SetPageHugeFreed(page);
> >  }
> >
> >  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >
> >               list_move(&page->lru, &h->hugepage_activelist);
> >               set_page_refcounted(page);
> > +             ClearPageHugeFreed(page);
> >               h->free_huge_pages--;
> >               h->free_huge_pages_node[nid]--;
> >               return page;
> > @@ -1504,6 +1521,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >       spin_lock(&hugetlb_lock);
> >       h->nr_huge_pages++;
> >       h->nr_huge_pages_node[nid]++;
> > +     ClearPageHugeFreed(page);
> >       spin_unlock(&hugetlb_lock);
> >  }
> >
> > @@ -1770,6 +1788,36 @@ int dissolve_free_huge_page(struct page *page)
> >               int nid = page_to_nid(head);
> >               if (h->free_huge_pages - h->resv_huge_pages == 0)
> >                       goto out;
> > +
> > +             /*
> > +              * There is a race condition between __free_huge_page()
> > +              * and dissolve_free_huge_page().
> > +              *
> > +              * CPU0:                         CPU1:
> > +              *
> > +              * // page_count(page) == 1
> > +              * put_page(page)
> > +              *   __free_huge_page(page)
> > +              *                               dissolve_free_huge_page(page)
> > +              *                                 spin_lock(&hugetlb_lock)
> > +              *                                 // PageHuge(page) && !page_count(page)
> > +              *                                 update_and_free_page(page)
> > +              *                                 // page is freed to the buddy
> > +              *                                 spin_unlock(&hugetlb_lock)
> > +              *     spin_lock(&hugetlb_lock)
> > +              *     enqueue_huge_page(page)
> > +              *     // It is wrong, the page is already freed
> > +              *     spin_unlock(&hugetlb_lock)
> > +              *
> > +              * The race window is between put_page() and spin_lock() which
> > +              * is in the __free_huge_page().
>
> IMO, the description of the race condition in the commit message is
> sufficient.  It does not need to be here in the code.  The below comment
> should be sufficient.

Thanks. Will do.

>
> --
> Mike Kravetz
>
> > +              *
> > +              * We should make sure that the page is already on the free list
> > +              * when it is dissolved.
> > +              */
> > +             if (unlikely(!PageHugeFreed(head)))
> > +                     goto out;
> > +
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> >                * which makes any subpages rather than the error page reusable.
> >


* Re: [External] Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-05  1:32   ` Mike Kravetz
@ 2021-01-05  3:14     ` Muchun Song
  2021-01-05  3:46       ` Muchun Song
  0 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-05  3:14 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/3/21 10:58 PM, Muchun Song wrote:
> > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > just retry, because the race window is small.
>
> In general, I agree that the race window is small.  However, worst case
> would be if the freeing of the page is put on a work queue.  Is it acceptable
> to keep retrying in that case?  In addition, the 'Free some vmemmap' series
> may slow the free_huge_page path even more.

I also considered the 'Free some vmemmap' series case. In the next
version of this series, I will flush the work before
dissolve_free_huge_page() returns when it encounters this race, so
the retry is acceptable. Right?

Thanks.

>
> In these worst case scenarios, I am not sure we want to just spin retrying.
>
> --
> Mike Kravetz
>
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 16 +++++++++++-----
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 72608008f8b4..db00ae375d2a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> >   * nothing for in-use hugepages and non-hugepages.
> >   * This function returns values like below:
> >   *
> > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > - *          (allocated or reserved.)
> > - *       0: successfully dissolved free hugepages or the page is not a
> > - *          hugepage (considered as already dissolved)
> > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > + *           (allocated or reserved.)
> > + *       0:  successfully dissolved free hugepages or the page is not a
> > + *           hugepage (considered as already dissolved)
> >   */
> >  int dissolve_free_huge_page(struct page *page)
> >  {
> > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> >                * We should make sure that the page is already on the free list
> >                * when it is dissolved.
> >                */
> > -             if (unlikely(!PageHugeFreed(head)))
> > +             if (unlikely(!PageHugeFreed(head))) {
> > +                     rc = -EAGAIN;
> >                       goto out;
> > +             }
> >
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >
> >       for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> >               page = pfn_to_page(pfn);
> > +retry:
> >               rc = dissolve_free_huge_page(page);
> > +             if (rc == -EAGAIN)
> > +                     goto retry;
> >               if (rc)
> >                       break;
> >       }
> >


* Re: [External] Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-05  3:14     ` [External] " Muchun Song
@ 2021-01-05  3:46       ` Muchun Song
  2021-01-06  0:07         ` Mike Kravetz
  0 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-05  3:46 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Tue, Jan 5, 2021 at 11:14 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 1/3/21 10:58 PM, Muchun Song wrote:
> > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > just retry, because the race window is small.
> >
> > In general, I agree that the race window is small.  However, worst case
> > would be if the freeing of the page is put on a work queue.  Is it acceptable
> > to keep retrying in that case?  In addition, the 'Free some vmemmap' series
> > may slow the free_huge_page path even more.
>
> I also considered the 'Free some vmemmap' series case. In the next
> version of this series, I will flush the work before
> dissolve_free_huge_page() returns when it encounters this race, so
> the retry is acceptable. Right?

Hi Mike,

How about flushing the @free_hpage_work when
encountering this race?
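
Roughly like this, inside dissolve_free_huge_page() (a rough sketch
only; it assumes the deferred-free machinery, free_hpage_work and
free_hpage_workfn, in mm/hugetlb.c, and has not been tested):

		if (unlikely(!PageHugeFreed(head))) {
			/*
			 * The racing __free_huge_page() may be pending on
			 * the deferred-free workqueue; drain it before
			 * asking the caller to retry. hugetlb_lock must be
			 * dropped first, since free_hpage_workfn() ends up
			 * taking it.
			 */
			spin_unlock(&hugetlb_lock);
			flush_work(&free_hpage_work);
			spin_lock(&hugetlb_lock);
			rc = -EAGAIN;
			goto out;
		}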

>
> Thanks.
>
> >
> > In these worst case scenarios, I am not sure we want to just spin retrying.
> >
> > --
> > Mike Kravetz
> >
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/hugetlb.c | 16 +++++++++++-----
> > >  1 file changed, 11 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 72608008f8b4..db00ae375d2a 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> > >   * nothing for in-use hugepages and non-hugepages.
> > >   * This function returns values like below:
> > >   *
> > > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > > - *          (allocated or reserved.)
> > > - *       0: successfully dissolved free hugepages or the page is not a
> > > - *          hugepage (considered as already dissolved)
> > > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > > + *           (allocated or reserved.)
> > > + *       0:  successfully dissolved free hugepages or the page is not a
> > > + *           hugepage (considered as already dissolved)
> > >   */
> > >  int dissolve_free_huge_page(struct page *page)
> > >  {
> > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> > >                * We should make sure that the page is already on the free list
> > >                * when it is dissolved.
> > >                */
> > > -             if (unlikely(!PageHugeFreed(head)))
> > > +             if (unlikely(!PageHugeFreed(head))) {
> > > +                     rc = -EAGAIN;
> > >                       goto out;
> > > +             }
> > >
> > >               /*
> > >                * Move PageHWPoison flag from head page to the raw error page,
> > > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> > >
> > >       for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> > >               page = pfn_to_page(pfn);
> > > +retry:
> > >               rc = dissolve_free_huge_page(page);
> > > +             if (rc == -EAGAIN)
> > > +                     goto retry;
> > >               if (rc)
> > >                       break;
> > >       }
> > >


* Re: [External] Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-05  0:00   ` Mike Kravetz
  2021-01-05  2:55     ` [External] " Muchun Song
@ 2021-01-05  6:12     ` Muchun Song
  1 sibling, 0 replies; 31+ messages in thread
From: Muchun Song @ 2021-01-05  6:12 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/3/21 10:58 PM, Muchun Song wrote:
> > There is a race condition between __free_huge_page()
> > and dissolve_free_huge_page().
> >
> > CPU0:                         CPU1:
> >
> > // page_count(page) == 1
> > put_page(page)
> >   __free_huge_page(page)
> >                               dissolve_free_huge_page(page)
> >                                 spin_lock(&hugetlb_lock)
> >                                 // PageHuge(page) && !page_count(page)
> >                                 update_and_free_page(page)
> >                                 // page is freed to the buddy
> >                                 spin_unlock(&hugetlb_lock)
> >     spin_lock(&hugetlb_lock)
> >     clear_page_huge_active(page)
> >     enqueue_huge_page(page)
> >     // It is wrong, the page is already freed
> >     spin_unlock(&hugetlb_lock)
> >
> > The race window is between put_page() and the spin_lock() in
> > __free_huge_page().
> >
> > We should make sure that the page is already on the free list
> > when it is dissolved.
> >
> > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 48 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 1f3bf1710b66..72608008f8b4 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
> >  static int num_fault_mutexes;
> >  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >
> > +static inline bool PageHugeFreed(struct page *head)
> > +{
> > +     return page_private(head) == -1UL;
>
>         return page_private(head + 4) == -1UL;

Many thanks. That was my mistake when rebasing. I will
update it in the next version.

>
> > +}
> > +
> > +static inline void SetPageHugeFreed(struct page *head)
> > +{
> > +     set_page_private(head + 4, -1UL);
> > +}
> > +
> > +static inline void ClearPageHugeFreed(struct page *head)
> > +{
> > +     set_page_private(head + 4, 0);
> > +}
>
> It is unfortunate that we can not use some existing value like
> page_huge_active() to determine if dissolve_free_huge_page() should
> proceed with freeing the page to buddy.  If the existing check,
>
>         if (!page_count(page)) {
>
> was changed to
>
>         if (!page_count(page) && !page_huge_active(page)) {
>
> the race window would be shrunk.  However, the most straightforward
> way to fully close the window is the approach taken here.
>
> > +
> >  /* Forward declaration */
> >  static int hugetlb_acct_memory(struct hstate *h, long delta);
> >
> > @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
> >       list_move(&page->lru, &h->hugepage_freelists[nid]);
> >       h->free_huge_pages++;
> >       h->free_huge_pages_node[nid]++;
> > +     SetPageHugeFreed(page);
> >  }
> >
> >  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >
> >               list_move(&page->lru, &h->hugepage_activelist);
> >               set_page_refcounted(page);
> > +             ClearPageHugeFreed(page);
> >               h->free_huge_pages--;
> >               h->free_huge_pages_node[nid]--;
> >               return page;
> > @@ -1504,6 +1521,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >       spin_lock(&hugetlb_lock);
> >       h->nr_huge_pages++;
> >       h->nr_huge_pages_node[nid]++;
> > +     ClearPageHugeFreed(page);
> >       spin_unlock(&hugetlb_lock);
> >  }
> >
> > @@ -1770,6 +1788,36 @@ int dissolve_free_huge_page(struct page *page)
> >               int nid = page_to_nid(head);
> >               if (h->free_huge_pages - h->resv_huge_pages == 0)
> >                       goto out;
> > +
> > +             /*
> > +              * There is a race condition between __free_huge_page()
> > +              * and dissolve_free_huge_page().
> > +              *
> > +              * CPU0:                         CPU1:
> > +              *
> > +              * // page_count(page) == 1
> > +              * put_page(page)
> > +              *   __free_huge_page(page)
> > +              *                               dissolve_free_huge_page(page)
> > +              *                                 spin_lock(&hugetlb_lock)
> > +              *                                 // PageHuge(page) && !page_count(page)
> > +              *                                 update_and_free_page(page)
> > +              *                                 // page is freed to the buddy
> > +              *                                 spin_unlock(&hugetlb_lock)
> > +              *     spin_lock(&hugetlb_lock)
> > +              *     enqueue_huge_page(page)
> > +              *     // It is wrong, the page is already freed
> > +              *     spin_unlock(&hugetlb_lock)
> > +              *
> > +              * The race window is between put_page() and spin_lock() which
> > +              * is in the __free_huge_page().
>
> IMO, the description of the race condition in the commit message is
> sufficient.  It does not need to be here in the code.  The below comment
> should be sufficient.
>
> --
> Mike Kravetz
>
> > +              *
> > +              * We should make sure that the page is already on the free list
> > +              * when it is dissolved.
> > +              */
> > +             if (unlikely(!PageHugeFreed(head)))
> > +                     goto out;
> > +
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> >                * which makes any subpages rather than the error page reusable.
> >


* Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-04  6:58 ` [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
  2021-01-05  1:32   ` Mike Kravetz
@ 2021-01-05  6:37   ` HORIGUCHI NAOYA(堀口 直也)
  2021-01-05  7:10     ` [External] " Muchun Song
  1 sibling, 1 reply; 31+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-01-05  6:37 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, akpm, hillf.zj, n-horiguchi, ak, yongjun_wei,
	mhocko, linux-mm, linux-kernel

On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> When dissolve_free_huge_page() races with __free_huge_page(), we can
> just retry, because the race window is small.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 72608008f8b4..db00ae375d2a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>   * nothing for in-use hugepages and non-hugepages.
>   * This function returns values like below:
>   *
> - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> - *          (allocated or reserved.)
> - *       0: successfully dissolved free hugepages or the page is not a
> - *          hugepage (considered as already dissolved)
> + *  -EAGAIN: race with __free_huge_page() and can do a retry
> + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> + *           (allocated or reserved.)
> + *       0:  successfully dissolved free hugepages or the page is not a
> + *           hugepage (considered as already dissolved)
>   */
>  int dissolve_free_huge_page(struct page *page)
>  {
> @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
>  		 * We should make sure that the page is already on the free list
>  		 * when it is dissolved.
>  		 */
> -		if (unlikely(!PageHugeFreed(head)))
> +		if (unlikely(!PageHugeFreed(head))) {
> +			rc = -EAGAIN;
>  			goto out;
> +		}

If dissolve_free_huge_page() races with __free_huge_page() and we detect
it with this check, the hugepage is expected to be freed or dissolved by
__free_huge_page(), so is it enough just to return with rc = 0 without retrying?

Thanks,
Naoya Horiguchi

>  
>  		/*
>  		 * Move PageHWPoison flag from head page to the raw error page,
> @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>  		page = pfn_to_page(pfn);
> +retry:
>  		rc = dissolve_free_huge_page(page);
> +		if (rc == -EAGAIN)
> +			goto retry;
>  		if (rc)
>  			break;
>  	}
> -- 
> 2.11.0
> 


* Re: [External] Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-05  6:37   ` HORIGUCHI NAOYA(堀口 直也)
@ 2021-01-05  7:10     ` Muchun Song
  2021-01-05  7:30       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 31+ messages in thread
From: Muchun Song @ 2021-01-05  7:10 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: mike.kravetz, akpm, n-horiguchi, ak, mhocko, linux-mm, linux-kernel

On Tue, Jan 5, 2021 at 2:38 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > just retry, because the race window is small.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 16 +++++++++++-----
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 72608008f8b4..db00ae375d2a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> >   * nothing for in-use hugepages and non-hugepages.
> >   * This function returns values like below:
> >   *
> > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > - *          (allocated or reserved.)
> > - *       0: successfully dissolved free hugepages or the page is not a
> > - *          hugepage (considered as already dissolved)
> > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > + *           (allocated or reserved.)
> > + *       0:  successfully dissolved free hugepages or the page is not a
> > + *           hugepage (considered as already dissolved)
> >   */
> >  int dissolve_free_huge_page(struct page *page)
> >  {
> > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> >                * We should make sure that the page is already on the free list
> >                * when it is dissolved.
> >                */
> > -             if (unlikely(!PageHugeFreed(head)))
> > +             if (unlikely(!PageHugeFreed(head))) {
> > +                     rc = -EAGAIN;
> >                       goto out;
> > +             }
>
> If dissolve_free_huge_page() races with __free_huge_page() and we detect
> it with this check, the hugepage is expected to be freed or dissolved by
> __free_huge_page(), so is it enough just to return with rc = 0 without retrying?

dissolve_free_huge_page() aims to free the page to the buddy
allocator, not back to the hugepage pool. So it is not enough just to
return 0, right? Or did you mean that we mark the page as temporary
here and let __free_huge_page() do the freeing for us later? Thanks.
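
For reference, a simplified excerpt of the free path in
dissolve_free_huge_page() (abridged; the surplus-reservation check and
the HWPoison handling are dropped), which takes the page off the hstate
free list and releases it to the buddy allocator instead of
re-enqueueing it into the pool:

	spin_lock(&hugetlb_lock);
	if (PageHuge(page) && !page_count(page)) {
		struct page *head = compound_head(page);
		struct hstate *h = page_hstate(head);
		int nid = page_to_nid(head);

		list_del(&head->lru);
		h->free_huge_pages--;
		h->free_huge_pages_node[nid]--;
		h->max_huge_pages--;
		update_and_free_page(h, head);	/* hand the memory back to buddy */
		rc = 0;
	}
	spin_unlock(&hugetlb_lock);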

>
> Thanks,
> Naoya Horiguchi
>
> >
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >
> >       for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> >               page = pfn_to_page(pfn);
> > +retry:
> >               rc = dissolve_free_huge_page(page);
> > +             if (rc == -EAGAIN)
> > +                     goto retry;
> >               if (rc)
> >                       break;
> >       }
> > --
> > 2.11.0
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External] Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-05  7:10     ` [External] " Muchun Song
@ 2021-01-05  7:30       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 31+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-01-05  7:30 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, akpm, n-horiguchi, ak, mhocko, linux-mm, linux-kernel

On Tue, Jan 05, 2021 at 03:10:35PM +0800, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 2:38 PM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@nec.com> wrote:
> >
> > On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > do a retry. Because the race window is small.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/hugetlb.c | 16 +++++++++++-----
> > >  1 file changed, 11 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 72608008f8b4..db00ae375d2a 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> > >   * nothing for in-use hugepages and non-hugepages.
> > >   * This function returns values like below:
> > >   *
> > > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > > - *          (allocated or reserved.)
> > > - *       0: successfully dissolved free hugepages or the page is not a
> > > - *          hugepage (considered as already dissolved)
> > > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > > + *           (allocated or reserved.)
> > > + *       0:  successfully dissolved free hugepages or the page is not a
> > > + *           hugepage (considered as already dissolved)
> > >   */
> > >  int dissolve_free_huge_page(struct page *page)
> > >  {
> > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> > >                * We should make sure that the page is already on the free list
> > >                * when it is dissolved.
> > >                */
> > > -             if (unlikely(!PageHugeFreed(head)))
> > > +             if (unlikely(!PageHugeFreed(head))) {
> > > +                     rc = -EAGAIN;
> > >                       goto out;
> > > +             }
> >
> > If dissolve_free_huge_page() races with __free_huge_page() and we detect
> > it with this check, the hugepage is expected to be freed or dissolved by
> > __free_huge_page(), so is it enough just to return with rc = 0 without retrying?
> 
> dissolve_free_huge_page() aims to free the page to the buddy
> allocator, not back to the hugepage pool. So it is not enough just to
> return 0, right? Or did you mean that we mark the page as temporary
> here and let __free_huge_page() do the freeing for us later? Thanks.

Ah, OK. You're right.
Thank you for the answer (and finding/fixing a few bugs).

- Naoya

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
                   ` (5 preceding siblings ...)
  2021-01-04 22:17 ` [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Mike Kravetz
@ 2021-01-05 16:58 ` David Hildenbrand
  2021-01-05 18:04   ` Yang Shi
  2021-01-05 18:04 ` Yang Shi
  2021-01-06 16:11 ` Michal Hocko
  8 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2021-01-05 16:58 UTC (permalink / raw)
  To: Muchun Song, mike.kravetz, akpm
  Cc: hillf.zj, n-horiguchi, ak, yongjun_wei, mhocko, linux-mm, linux-kernel

On 04.01.21 07:58, Muchun Song wrote:
> If the refcount is one when it is migrated, it means that the page
> was freed from under us. So we are done and do not need to migrate.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/migrate.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4385f2fb5d18..a6631c4eb6a6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  		return -ENOSYS;
>  	}
>  
> +	if (page_count(hpage) == 1) {
> +		/* page was freed from under us. So we are done. */
> +		putback_active_hugepage(hpage);
> +		return MIGRATEPAGE_SUCCESS;
> +	}
> +
>  	new_hpage = get_new_page(hpage, private);
>  	if (!new_hpage)
>  		return -ENOMEM;
> 

This series seems to fix quite some important cases (thanks). Do we want
to cc stable some/all?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-05 16:58 ` David Hildenbrand
@ 2021-01-05 18:04   ` Yang Shi
  2021-01-05 18:05     ` David Hildenbrand
  0 siblings, 1 reply; 31+ messages in thread
From: Yang Shi @ 2021-01-05 18:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Muchun Song, Mike Kravetz, Andrew Morton, hillf.zj,
	Naoya Horiguchi, ak, yongjun_wei, mhocko, Linux MM,
	Linux Kernel Mailing List

On Tue, Jan 5, 2021 at 8:58 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.01.21 07:58, Muchun Song wrote:
> > If the refcount is one when it is migrated, it means that the page
> > was freed from under us. So we are done and do not need to migrate.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/migrate.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 4385f2fb5d18..a6631c4eb6a6 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> >               return -ENOSYS;
> >       }
> >
> > +     if (page_count(hpage) == 1) {
> > +             /* page was freed from under us. So we are done. */
> > +             putback_active_hugepage(hpage);
> > +             return MIGRATEPAGE_SUCCESS;
> > +     }
> > +
> >       new_hpage = get_new_page(hpage, private);
> >       if (!new_hpage)
> >               return -ENOMEM;
> >
>
> This series seems to fix quite some important cases (thanks). Do we want
> to cc stable some/all?

For this particular one, I don't think so IMHO. It is an optimization
rather than a bug fix.

>
> --
> Thanks,
>
> David / dhildenb
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
                   ` (6 preceding siblings ...)
  2021-01-05 16:58 ` David Hildenbrand
@ 2021-01-05 18:04 ` Yang Shi
  2021-01-06 16:11 ` Michal Hocko
  8 siblings, 0 replies; 31+ messages in thread
From: Yang Shi @ 2021-01-05 18:04 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, hillf.zj, Naoya Horiguchi, ak,
	yongjun_wei, mhocko, Linux MM, Linux Kernel Mailing List

On Sun, Jan 3, 2021 at 11:01 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> If the refcount is one when it is migrated, it means that the page
> was freed from under us. So we are done and do not need to migrate.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/migrate.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4385f2fb5d18..a6631c4eb6a6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>                 return -ENOSYS;
>         }
>
> +       if (page_count(hpage) == 1) {
> +               /* page was freed from under us. So we are done. */
> +               putback_active_hugepage(hpage);
> +               return MIGRATEPAGE_SUCCESS;
> +       }
> +

Acked-by: Yang Shi <shy828301@gmail.com>

>         new_hpage = get_new_page(hpage, private);
>         if (!new_hpage)
>                 return -ENOMEM;
> --
> 2.11.0
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-05 18:04   ` Yang Shi
@ 2021-01-05 18:05     ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2021-01-05 18:05 UTC (permalink / raw)
  To: Yang Shi
  Cc: Muchun Song, Mike Kravetz, Andrew Morton, hillf.zj,
	Naoya Horiguchi, ak, yongjun_wei, mhocko, Linux MM,
	Linux Kernel Mailing List

On 05.01.21 19:04, Yang Shi wrote:
> On Tue, Jan 5, 2021 at 8:58 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 04.01.21 07:58, Muchun Song wrote:
>>> If the refcount is one when it is migrated, it means that the page
>>> was freed from under us. So we are done and do not need to migrate.
>>>
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>>  mm/migrate.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 4385f2fb5d18..a6631c4eb6a6 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>>               return -ENOSYS;
>>>       }
>>>
>>> +     if (page_count(hpage) == 1) {
>>> +             /* page was freed from under us. So we are done. */
>>> +             putback_active_hugepage(hpage);
>>> +             return MIGRATEPAGE_SUCCESS;
>>> +     }
>>> +
>>>       new_hpage = get_new_page(hpage, private);
>>>       if (!new_hpage)
>>>               return -ENOMEM;
>>>
>>
>> This series seems to fix quite some important cases (thanks). Do we want
>> to cc stable some/all?
> 
> For this particular one, I don't think so IMHO. It is an optimization
> rather than a bug fix.

I'm rather referring to the actual fixes (I received no cover letter,
so I picked the first patch in the series).


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External] Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-05  2:44     ` [External] " Muchun Song
@ 2021-01-05 22:27       ` Mike Kravetz
  2021-01-06  2:57         ` Muchun Song
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Kravetz @ 2021-01-05 22:27 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On 1/4/21 6:44 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 6:40 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>> Because we only can isolate a active page via isolate_huge_page()
>>> and hugetlbfs_fallocate() forget to mark it as active, we cannot
>>> isolate and migrate those pages.
>>>
>>> Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>>  fs/hugetlbfs/inode.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> Good catch.  This is indeed an issue.
>>
>>>
>>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>>> index b5c109703daa..2aceb085d202 100644
>>> --- a/fs/hugetlbfs/inode.c
>>> +++ b/fs/hugetlbfs/inode.c
>>> @@ -737,10 +737,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>>>
>>>               /*
>>>                * unlock_page because locked by add_to_page_cache()
>>> -              * page_put due to reference from alloc_huge_page()
>>> +              * put_page() (which is in the putback_active_hugepage())
>>> +              * due to reference from alloc_huge_page()
>>
>> Thanks for fixing the comment.
>>
>>>                */
>>>               unlock_page(page);
>>> -             put_page(page);
>>> +             putback_active_hugepage(page);
>>
>> I'm curious why you used putback_active_hugepage() here instead of simply
>> calling set_page_huge_active() before the put_page()?
>>
>> When the page was allocated, it was placed on the active list (alloc_huge_page).
>> Therefore, the hugetlb_lock locking and list movement should not be necessary.
> 
> I agree with you. However, set_page_huge_active is not exported (it
> is a static function). Only exporting set_page_huge_active seems
> strange (leaving clear_page_huge_active not exported). This is just
> my opinion. What's yours, Mike?

I'm thinking that we should export (make external) set_page_huge_active.
We can leave clear_page_huge_active as static and just add something to
the commit log noting that there are no external users.

My primary reason for doing this is to eliminate the extra and unnecessary
per-page lock/unlock cycle.  I believe there are some applications that
use fallocate to pre-allocate very large hugetlbfs files.  They may notice
the extra overhead.
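
For what it's worth, a minimal sketch of that alternative in
hugetlbfs_fallocate(), assuming set_page_huge_active() is made
non-static and declared in include/linux/hugetlb.h (the declaration
site is an assumption here, not something from the series):

		/*
		 * unlock_page because locked by add_to_page_cache().
		 * Mark the page active before dropping the reference
		 * taken in alloc_huge_page(), so isolate_huge_page()
		 * can isolate it later without an extra hugetlb_lock
		 * round trip here.
		 */
		set_page_huge_active(page);
		unlock_page(page);
		put_page(page);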
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External] Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-05  2:55     ` [External] " Muchun Song
@ 2021-01-05 23:22       ` Mike Kravetz
  2021-01-06  6:05         ` Muchun Song
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Kravetz @ 2021-01-05 23:22 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On 1/4/21 6:55 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>> There is a race condition between __free_huge_page()
>>> and dissolve_free_huge_page().
>>>
>>> CPU0:                         CPU1:
>>>
>>> // page_count(page) == 1
>>> put_page(page)
>>>   __free_huge_page(page)
>>>                               dissolve_free_huge_page(page)
>>>                                 spin_lock(&hugetlb_lock)
>>>                                 // PageHuge(page) && !page_count(page)
>>>                                 update_and_free_page(page)
>>>                                 // page is freed to the buddy
>>>                                 spin_unlock(&hugetlb_lock)
>>>     spin_lock(&hugetlb_lock)
>>>     clear_page_huge_active(page)
>>>     enqueue_huge_page(page)
>>>     // It is wrong, the page is already freed
>>>     spin_unlock(&hugetlb_lock)
>>>
>>> The race windows is between put_page() and spin_lock() which
>>> is in the __free_huge_page().
>>>
>>> We should make sure that the page is already on the free list
>>> when it is dissolved.
>>>
>>> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>>  mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 48 insertions(+)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 1f3bf1710b66..72608008f8b4 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
>>>  static int num_fault_mutexes;
>>>  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>>>
>>> +static inline bool PageHugeFreed(struct page *head)
>>> +{
>>> +     return page_private(head) == -1UL;
>>
>>         return page_private(head + 4) == -1UL;
>>
>>> +}
>>> +
>>> +static inline void SetPageHugeFreed(struct page *head)
>>> +{
>>> +     set_page_private(head + 4, -1UL);
>>> +}
>>> +
>>> +static inline void ClearPageHugeFreed(struct page *head)
>>> +{
>>> +     set_page_private(head + 4, 0);
>>> +}
>>
>> It is unfortunate that we can not use some existing value like
>> page_huge_active() to determine if dissolve_free_huge_page() should
>> proceed with freeing the page to buddy.  If the existing check,
>>
>>         if (!page_count(page)) {
>>
>> was changed to
>>
>>         if (!page_count(page) && !page_huge_active(page)) {
>>
>> the race window would be shrunk.  However, the most straight forward
>> way to fully close the window is with the approach taken here.
> 
> I also thought about this fix. But this is not enough. Because
> we just call put_page to free the HugeTLB page without
> setting activeness in some place (e.g. error handling
> routines).
> 
> If we use page_huge_active, we should set activeness
> before put_page. But we cannot guarantee this.

Just FYI,
I went back and explored the option of doing set_page_huge_active
when a page was put on the active list and clear_page_huge_active
when put on the free list.  This would be much like what you are
doing with PageHugeFreed.  Commit bcc54222309c which added page_huge_active
implied that this was possible.  Then I remembered a race fixed in
cb6acd01e2e4 that required delaying the call to set_page_huge_active
in hugetlb_no_page.  So, such a scheme would not work.

Also, it seems we could use head[3].mapping for PageHugeFreed?  Not much
of an advantage, but it does not require another tail page to store the
page metadata, and it fits within the already defined
HUGETLB_CGROUP_MIN_ORDER.
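
A rough sketch of what the head[3].mapping variant could look like (the
sentinel and helper shape are illustrative, not taken from the series;
it assumes TAIL_MAPPING from mm/internal.h is visible in mm/hugetlb.c):

/* Any pointer value that can never be a valid mapping works as a marker. */
#define HUGETLB_PAGE_FREED	((struct address_space *)-1UL)

static inline bool PageHugeFreed(struct page *head)
{
	return head[3].mapping == HUGETLB_PAGE_FREED;
}

static inline void SetPageHugeFreed(struct page *head)
{
	head[3].mapping = HUGETLB_PAGE_FREED;
}

static inline void ClearPageHugeFreed(struct page *head)
{
	/* Restore the TAIL_MAPPING poison that prep_compound_page() put here. */
	head[3].mapping = TAIL_MAPPING;
}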
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External] Re: [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-05  3:46       ` Muchun Song
@ 2021-01-06  0:07         ` Mike Kravetz
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Kravetz @ 2021-01-06  0:07 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On 1/4/21 7:46 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 11:14 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>
>> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>
>>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>>> When dissolve_free_huge_page() races with __free_huge_page(), we can
>>>> do a retry. Because the race window is small.
>>>
>>> In general, I agree that the race window is small.  However, worst case
>>> would be if the freeing of the page is put on a work queue.  Is it acceptable
>>> to keep retrying in that case?  In addition, the 'Free some vmemmap' series
>>> may slow the free_huge_page path even more.
>>
>> I also considered the 'Free some vmemmap' series case. In the next
>> version of this series, I will flush the work before
>> dissolve_free_huge_page returns when encountering this race. So the
>> retry is acceptable, right?
> 
> Hi Mike,
> 
> How about flushing the @free_hpage_work when
> encountering this race?

Flushing should be fine.

The more I think about it, the more I think spinning to retry is not
going to be an issue.  Why?  As you mentioned, the race window is quite
small.  It just makes me a bit nervous to retry in a tight loop.
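
For illustration, a rough sketch of how that flush could look in the
dissolve_free_huge_pages() retry loop, assuming the deferred free path
is the existing free_hpage_work item in mm/hugetlb.c (whether the flush
lives in the caller or inside dissolve_free_huge_page() is up to the
next version, and the cond_resched() is just one way to avoid a tight
spin, not something from the series):

	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
		page = pfn_to_page(pfn);
retry:
		rc = dissolve_free_huge_page(page);
		if (rc == -EAGAIN) {
			/* Let any deferred __free_huge_page() work finish. */
			flush_work(&free_hpage_work);
			cond_resched();
			goto retry;
		}
		if (rc)
			break;
	}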
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External] Re: [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-05 22:27       ` Mike Kravetz
@ 2021-01-06  2:57         ` Muchun Song
  0 siblings, 0 replies; 31+ messages in thread
From: Muchun Song @ 2021-01-06  2:57 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Wed, Jan 6, 2021 at 6:29 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/4/21 6:44 PM, Muchun Song wrote:
> > On Tue, Jan 5, 2021 at 6:40 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >>
> >> On 1/3/21 10:58 PM, Muchun Song wrote:
> >>> Because we only can isolate a active page via isolate_huge_page()
> >>> and hugetlbfs_fallocate() forget to mark it as active, we cannot
> >>> isolate and migrate those pages.
> >>>
> >>> Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
> >>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>> ---
> >>>  fs/hugetlbfs/inode.c | 5 +++--
> >>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> Good catch.  This is indeed an issue.
> >>
> >>>
> >>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> >>> index b5c109703daa..2aceb085d202 100644
> >>> --- a/fs/hugetlbfs/inode.c
> >>> +++ b/fs/hugetlbfs/inode.c
> >>> @@ -737,10 +737,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> >>>
> >>>               /*
> >>>                * unlock_page because locked by add_to_page_cache()
> >>> -              * page_put due to reference from alloc_huge_page()
> >>> +              * put_page() (which is in the putback_active_hugepage())
> >>> +              * due to reference from alloc_huge_page()
> >>
> >> Thanks for fixing the comment.
> >>
> >>>                */
> >>>               unlock_page(page);
> >>> -             put_page(page);
> >>> +             putback_active_hugepage(page);
> >>
> >> I'm curious why you used putback_active_hugepage() here instead of simply
> >> calling set_page_huge_active() before the put_page()?
> >>
> >> When the page was allocated, it was placed on the active list (alloc_huge_page).
> >> Therefore, the hugetlb_lock locking and list movement should not be necessary.
> >
> > I agree with you. However, set_page_huge_active is not exported (it
> > is a static function). Only exporting set_page_huge_active seems
> > strange (leaving clear_page_huge_active not exported). This is just
> > my opinion. What's yours, Mike?
>
> I'm thinking that we should export (make external) set_page_huge_active.
> We can leave clear_page_huge_active as static and just add something to
> the commit log noting that there are no external users.
>
> My primary reason for doing this is to eliminate the extra and unnecessary
> per-page lock/unlock cycle.  I believe there are some applications that
> use fallocate to pre-allocate very large hugetlbfs files.  They may notice
> the extra overhead.

Agree. Will do in the next version. Thanks.

> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External] Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-05 23:22       ` Mike Kravetz
@ 2021-01-06  6:05         ` Muchun Song
  0 siblings, 0 replies; 31+ messages in thread
From: Muchun Song @ 2021-01-06  6:05 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Wed, Jan 6, 2021 at 7:22 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/4/21 6:55 PM, Muchun Song wrote:
> > On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >>
> >> On 1/3/21 10:58 PM, Muchun Song wrote:
> >>> There is a race condition between __free_huge_page()
> >>> and dissolve_free_huge_page().
> >>>
> >>> CPU0:                         CPU1:
> >>>
> >>> // page_count(page) == 1
> >>> put_page(page)
> >>>   __free_huge_page(page)
> >>>                               dissolve_free_huge_page(page)
> >>>                                 spin_lock(&hugetlb_lock)
> >>>                                 // PageHuge(page) && !page_count(page)
> >>>                                 update_and_free_page(page)
> >>>                                 // page is freed to the buddy
> >>>                                 spin_unlock(&hugetlb_lock)
> >>>     spin_lock(&hugetlb_lock)
> >>>     clear_page_huge_active(page)
> >>>     enqueue_huge_page(page)
> >>>     // It is wrong, the page is already freed
> >>>     spin_unlock(&hugetlb_lock)
> >>>
> >>> The race windows is between put_page() and spin_lock() which
> >>> is in the __free_huge_page().
> >>>
> >>> We should make sure that the page is already on the free list
> >>> when it is dissolved.
> >>>
> >>> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> >>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>> ---
> >>>  mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 48 insertions(+)
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 1f3bf1710b66..72608008f8b4 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
> >>>  static int num_fault_mutexes;
> >>>  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >>>
> >>> +static inline bool PageHugeFreed(struct page *head)
> >>> +{
> >>> +     return page_private(head) == -1UL;
> >>
> >>         return page_private(head + 4) == -1UL;
> >>
> >>> +}
> >>> +
> >>> +static inline void SetPageHugeFreed(struct page *head)
> >>> +{
> >>> +     set_page_private(head + 4, -1UL);
> >>> +}
> >>> +
> >>> +static inline void ClearPageHugeFreed(struct page *head)
> >>> +{
> >>> +     set_page_private(head + 4, 0);
> >>> +}
> >>
> >> It is unfortunate that we can not use some existing value like
> >> page_huge_active() to determine if dissolve_free_huge_page() should
> >> proceed with freeing the page to buddy.  If the existing check,
> >>
> >>         if (!page_count(page)) {
> >>
> >> was changed to
> >>
> >>         if (!page_count(page) && !page_huge_active(page)) {
> >>
> >> the race window would be shrunk.  However, the most straight forward
> >> way to fully close the window is with the approach taken here.
> >
> > I also thought about this fix. But this is not enough. Because
> > we just call put_page to free the HugeTLB page without
> > setting activeness in some place (e.g. error handling
> > routines).
> >
> > If we use page_huge_active, we should set activeness
> > before put_page. But we cannot guarantee this.
>
> Just FYI,
> I went back and explored the option of doing set_page_huge_active
> when a page was put on the active list and clear_page_huge_active
> when put on the free list.  This would be much like what you are
> doing with PageHugeFreed.  Commit bcc54222309c which added page_huge_active
> implied that this was possible.  Then I remembered a race fixed in
> cb6acd01e2e4 that required delaying the call to set_page_huge_active
> in hugetlb_no_page.  So, such a scheme would not work.

Sounds like a tortuous story. :)

>
> Also,
> It seems we could use head[3].mapping for PageHugeFreed ?  Not much
> of an advantage.  It does not add another tail page needed to store
> page metadata.  And, this fits within the already defined
> HUGETLB_CGROUP_MIN_ORDER.

That is fine with me. Will do. Thanks.

> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
                   ` (7 preceding siblings ...)
  2021-01-05 18:04 ` Yang Shi
@ 2021-01-06 16:11 ` Michal Hocko
  2021-01-06 16:12   ` Michal Hocko
  8 siblings, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2021-01-06 16:11 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, akpm, hillf.zj, n-horiguchi, ak, yongjun_wei,
	linux-mm, linux-kernel

On Mon 04-01-21 14:58:38, Muchun Song wrote:
> If the refcount is one when it is migrated, it means that the page
> was freed from under us. So we are done and do not need to migrate.

Is this common enough that it would warrant the explicit check for each
migration?

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/migrate.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4385f2fb5d18..a6631c4eb6a6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  		return -ENOSYS;
>  	}
>  
> +	if (page_count(hpage) == 1) {
> +		/* page was freed from under us. So we are done. */
> +		putback_active_hugepage(hpage);
> +		return MIGRATEPAGE_SUCCESS;
> +	}
> +
>  	new_hpage = get_new_page(hpage, private);
>  	if (!new_hpage)
>  		return -ENOMEM;
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-06 16:11 ` Michal Hocko
@ 2021-01-06 16:12   ` Michal Hocko
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2021-01-06 16:12 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, akpm, hillf.zj, n-horiguchi, ak, yongjun_wei,
	linux-mm, linux-kernel

On Wed 06-01-21 17:11:39, Michal Hocko wrote:
> On Mon 04-01-21 14:58:38, Muchun Song wrote:
> > If the refcount is one when it is migrated, it means that the page
> > was freed from under us. So we are done and do not need to migrate.
> 
> Is this common enough that it would warrant the explicit check for each
> migration?

Oops, just noticed that you have posted a newer version. I will follow up
there.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-01-06 16:13 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-04  6:58 [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
2021-01-04  6:58 ` [PATCH 2/6] hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
2021-01-04 22:38   ` Mike Kravetz
2021-01-05  2:44     ` [External] " Muchun Song
2021-01-05 22:27       ` Mike Kravetz
2021-01-06  2:57         ` Muchun Song
2021-01-04  6:58 ` [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
2021-01-05  0:00   ` Mike Kravetz
2021-01-05  2:55     ` [External] " Muchun Song
2021-01-05 23:22       ` Mike Kravetz
2021-01-06  6:05         ` Muchun Song
2021-01-05  6:12     ` Muchun Song
2021-01-04  6:58 ` [PATCH 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
2021-01-05  1:32   ` Mike Kravetz
2021-01-05  3:14     ` [External] " Muchun Song
2021-01-05  3:46       ` Muchun Song
2021-01-06  0:07         ` Mike Kravetz
2021-01-05  6:37   ` HORIGUCHI NAOYA(堀口 直也)
2021-01-05  7:10     ` [External] " Muchun Song
2021-01-05  7:30       ` HORIGUCHI NAOYA(堀口 直也)
2021-01-04  6:58 ` [PATCH 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
2021-01-05  1:42   ` Mike Kravetz
2021-01-04  6:58 ` [PATCH 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
2021-01-05  1:50   ` Mike Kravetz
2021-01-04 22:17 ` [PATCH 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Mike Kravetz
2021-01-05 16:58 ` David Hildenbrand
2021-01-05 18:04   ` Yang Shi
2021-01-05 18:05     ` David Hildenbrand
2021-01-05 18:04 ` Yang Shi
2021-01-06 16:11 ` Michal Hocko
2021-01-06 16:12   ` Michal Hocko
